WO2024264010A1 - Methods of enriching and analyzing methylated dna - Google Patents
Methods of enriching and analyzing methylated DNA
- Publication number
- WO2024264010A1 (application PCT/US2024/035148)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- dna
- methylation
- methylated
- cpg
- implementations
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6844—Nucleic acid amplification reactions
- C12Q1/6853—Nucleic acid amplification reactions using modified primers or templates
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/154—Methylation markers
Definitions
- The present disclosure relates to methods and compositions for characterization and analysis of epigenetic features of DNA.
- DNA methylation is an epigenetic modification that plays important roles in many biological processes, including gene regulation, maintenance of genome stability, embryo development, X-chromosome inactivation, genomic imprinting, and cellular differentiation.
- DNA methylation involves the addition of a methyl group to the DNA molecule, primarily at cytosine bases in a CpG dinucleotide context, producing a 5-methylcytosine base.
- This modification is catalyzed by a group of enzymes known as DNA methyltransferases, and methylation patterns in a cell’s genome are tightly regulated. The patterns can be maintained during cell division, providing a mechanism for stable gene silencing or activation.
- DNA methylation patterns have been associated with several human diseases, including cancer, cardiovascular diseases, metabolic diseases, neurological disorders, autoimmune disorders, and with aging. Therefore, DNA methylation has emerged as a promising biomarker of disease.
- Aberrant methylation patterns can occur at specific genomic regions, such as gene promoters or CpG islands, leading to the dysregulation of gene expression and contributing to disease development.
- methylation signatures that are cell-lineage-specific, tissue-specific, or cancer-specific can be used to identify the origin of cell-free DNA fragments in biological fluids. Measurement of such methylation signatures can be used to infer the rate of death of the corresponding cell types, providing a means to assess cell death from specific tissues, transplanted organs, a fetus, or cancer.
- DNA methylation patterns can be analyzed in various readily accessible biological samples, including blood, urine, saliva, sputum, and stool, making it a convenient and accessible biomarker. Methylation patterns can also be analyzed in biospecimens obtained from medical procedures, such as tissue, cerebrospinal fluid, pleural fluid, ascites fluid, uterine lavage fluid, and Pap smear fluid. The measurement of DNA methylation patterns can aid in disease diagnosis, prognosis, prediction or monitoring of treatment response, and detection of residual or recurrent disease.
- cfDNA: cell-free DNA
- Dying cancer cells release DNA fragments into the bloodstream, and this cfDNA contains characteristic alterations in DNA methylation patterns.
- Enzymatic processes that determine methylation levels in various genomic regions are tightly regulated in healthy cells, resulting in highly consistent genome-wide methylation patterns in a given tissue or cell type. These processes become dysregulated in cancer cells, yielding aberrant patterns of methylation that are rarely found in healthy tissues.
- The transcriptional silencing of tumor suppressor genes, developmental regulators, and many other genes by hypermethylation of promoters is a fundamental mechanism of carcinogenesis.
- In humans, approximately 70% of promoters located near a gene’s transcription start site contain a CpG island. CpG islands are stretches of genomic DNA containing a relatively high density of CpG sites, which are targets of methylation (a CpG site is the sequence 5’-C-phosphate-G-3’). Typically, several hundred to a few thousand gene promoters at CpG islands become aberrantly hypermethylated in a tumor, with substantial heterogeneity in methylation patterns across different patients and different tumors. Some promoter hypermethylation patterns can also be organ-specific, enabling tissue-of-origin prediction from hypermethylated cfDNA fragment analysis.
- DNA methylation is being used to measure biological aging. As individuals age, patterns of DNA methylation are known to change predictably across various genomic sites. These methylation changes can be used as epigenetic clocks to estimate the biological age of an individual or of a particular organ (e.g. the liver, kidney, heart, etc.). This approach is believed to provide a more accurate reflection of an individual’s physiological state than simple chronological age, potentially providing insights into disease risk and a means to measure the effectiveness of anti-aging therapies.
- These clocks, such as the Horvath clock, assess biological age by examining methylation levels at particular CpG sites. Many of the CpG sites that are strongly predictive of biological age are found at or in the vicinity of CpG islands. Thus, genome-wide assessment of methylation levels at CpG islands could provide a more accurate measurement of biological age than current methods, which are focused on a limited number of pre-defined CpG sites.
- Hypermethylated cfDNA fragments mapping to CpG islands typically have 10 or more methylated CpG sites in a fragment of ~160-180 base pairs, whereas the enriched pool of methylated cfDNA captured by cell-free MeDIP contains an average of 4 methylated CpG sites.
- Achieving more selective enrichment of fragments with higher methylation density has been challenging.
- One approach to selectively capture more densely methylated fragments is to include methylated competitor DNA fragments (which cannot participate in downstream sequencing) during antibody-based or methyl binding domain-based affinity purification.
- The methylation density distribution of captured DNA fragments can be tuned by adjusting the amount, methylation density, methylation content, and/or fragment size of the competitor DNA added.
- Increasing selectivity for densely methylated DNA fragments also leads to greater loss of such fragments because of increased competition for binding. Such losses can degrade the relevant signal when input molecules are limited, as with cfDNA.
- Such a method could permit comprehensive capture of densely methylated DNA fragments mapping to CpG islands throughout the genome. Importantly, the method would not require pre-specification of genomic target sequences. Instead, the method would be able to dynamically capture hypermethylated CpG island and/or promoter sequences wherever they occur in a genome. The method would be able to detect aberrant hypermethylation patterns of CpG islands from any type of cancer as well as from other disease states.
- the method could be used to measure aberrant cancer-derived hypermethylation signals in biofluids or biospecimens to enable early detection of cancer, to assess prognosis, to predict therapy efficacy, to monitor treatment response and disease progression, to identify changes in hypermethylation patterns that could be indicative of treatment response or resistance, and to detect residual or recurrent cancer. Additionally, because the method would enable identification of patient-specific methylation patterns in a biospecimen obtained from a patient at a certain point in time, knowledge of that pattern could be used to improve the sensitivity of detection of similar patterns in biospecimens obtained from the same patient at a different point in time.
- this approach could enable the development of tumor-informed or plasma-informed personalized assays requiring only modifications to computational algorithms, and not requiring physical or experimental changes to the assay methodology. Because the method would enrich DNA fragments based on methylation density rather than sequence, it would be able to enrich densely methylated DNA fragments regardless of their genomic origin, including from essentially any vertebrate species as well as from viruses and microbes.
- The current disclosure is directed to methods and compositions that enable efficient measurement and analysis of biologically or medically informative DNA methylation patterns.
- Some disclosed methods and compositions permit characterization of aberrant hypermethylation patterns at CpG islands throughout a genome without needing to target prespecified sequences of interest.
- Methods and compositions are described for enriching DNA molecules based on density of methylated CpG sites while minimizing loss of unique sequences derived from densely methylated DNA molecules.
- Methods and compositions are also described to enable selective sequencing of densely methylated DNA molecules from a population of DNA molecules, while minimizing the loss of unique sequences derived from said densely methylated DNA molecules.
- Some methods and compositions described herein enable conversion and amplification of DNA with restoration of CpG methylation patterns in the DNA copies.
- Disclosed methods can be applied to detection or monitoring of cancer. In some implementations, disclosed methods can be applied to assessment of evolving methylation patterns during or after a therapy as well as identification of potential mechanisms of therapy resistance. In some implementations, methods described herein can be applied to sensitive detection of residual, recurrent, or progressing cancer by detecting aberrantly hypermethylated DNA fragments in post-treatment biospecimens that match patient-specific patterns of aberrant hypermethylation identified in pre-treatment biospecimens. In some implementations, disclosed methods can be applied to diagnosis and monitoring of non-cancer disease conditions that produce altered methylation patterns in patient biospecimens.
- disclosed methods can be applied to assessment of organ-specific cell-death in various disease states based on measurement of organ-specific methylation patterns in cell-free DNA. In some implementations, disclosed methods can be applied to measurement of biological aging. In some implementations, disclosed methods can be applied to assessment of CpG island hypermethylation in vertebrate species for veterinary, agricultural, and/or animal model research applications. In some implementations, disclosed methods can be applied to assessment of densely methylated viral and/or microbial DNA fragments. In some implementations, disclosed methods can be developed into a kit. In some implementations, disclosed methods can be combined with spatial labeling techniques to permit assessment of spatial patterns of CpG island hypermethylation in tissues.
- a method of DNA conversion and amplification with restoration of CpG methylation patterns comprising the following steps: a. converting unmodified cytosine bases in template DNA molecules to uracil bases by deamination, resulting in converted template DNA molecules; b. performing a polymerase chain reaction (PCR) to generate DNA copies of the converted template DNA molecules; c. methylating cytosine bases at unconverted CpG sites in the DNA copies using an enzyme, thereby providing converted and amplified copies of DNA with CpG methylation patterns restored.
- PCR: polymerase chain reaction
- a method of enriching DNA molecules based on density of methylated CpG sites while minimizing loss of unique sequences derived from densely methylated DNA molecules comprising: a. converting unmodified cytosine bases in template DNA molecules to uracil bases by deamination, resulting in converted template DNA molecules; b. performing a polymerase chain reaction (PCR) to generate DNA copies of the converted template DNA molecules; c. methylating cytosine bases at unconverted CpG sites in the DNA copies using an enzyme, resulting in DNA copies with restored methylation; d. enriching densely methylated members of the population of DNA copies with restored methylation via selective capture based on methylation density.
- a method of selectively sequencing densely methylated DNA molecules from a population of DNA molecules, while minimizing the loss of unique sequences derived from said densely methylated DNA molecules comprising the following steps: a. converting unmodified cytosine bases in template DNA molecules to uracil bases by deamination, resulting in converted template DNA molecules; b. performing a polymerase chain reaction (PCR) to generate DNA copies of the converted template DNA molecules; c. methylating cytosine bases at unconverted CpG sites in the DNA copies using an enzyme, resulting in converted DNA copies with restored methylation; d.
- a method for detecting residual, recurrent, or progressing cancer comprising: a. sequencing densely methylated DNA fragments from tumor tissue, blood, plasma, serum, or urine of a patient diagnosed with cancer to identify a plurality of aberrantly hypermethylated CpG island regions that are specific to that patient’s cancer; b. obtaining one or more longitudinal samples of blood, plasma, serum, or urine from the patient after the patient has received a cancer treatment; c. sequencing densely methylated DNA fragments from the post-treatment longitudinal sample; d.
- Figure 1 provides a schematic illustration of an example method of selectively sequencing densely methylated DNA molecules from a population of DNA molecules with varying methylation density, while minimizing the loss of unique sequences derived from said densely methylated DNA molecules.
- the schematic representation shows 3 examples of biologically derived input DNA molecules at the top of the figure: one with relatively dense methylation, a second with relatively sparse methylation, and a third with no methylation.
- The densely methylated DNA is shown with 4 symmetrically methylated CpG sites.
- Densely methylated DNA fragments can have 8 or more methylated CpG sites (symmetric or asymmetric) in fragments of ~100 to 250 base pairs in length.
- the example shows that adapters are ligated to input DNA fragments, and then the DNA undergoes enzymatic methyl conversion (or bisulfite conversion) followed by PCR amplification.
- the resulting converted and amplified DNA copies have sequences in which unmodified C bases were converted to T bases, whereas methylated or hydroxymethylated C bases were retained as C bases.
- some amplified copies shown in the schematic are derived from the converted Watson strand of the input DNA and some are derived from the converted Crick strand.
- a CpG methyltransferase enzyme is used to restore methylation at unconverted CpG sites in the converted, amplified DNA copies.
- CpG methyltransferase enables restoration of original methylation patterns.
- the amplified DNA copies with restored methylation patterns are then shown undergoing selective enrichment of densely methylated DNA copies by competitive binding to methyl binding domain protein (or antibody to 5-mC) and capture on magnetic beads. Densely methylated competitor DNA fragments are added to the capture mix to competitively inhibit the capture of fragments with lower methylation density.
- the ability to generate multiple redundant copies of each DNA input molecule with methylation patterns restored enables use of stringent capture and enrichment conditions (including an option of more than one round of capture) while preserving representation of unique sequences of densely methylated DNA fragments.
- the schematic illustrates the purification of a next-generation sequencing (NGS) library of densely methylated, converted sequences. Resulting sequences can be mapped to a reference genome and the original methylation status of cytosine bases can be inferred based on C to T conversion.
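- The inference of methylation status from C to T conversion can be illustrated with a minimal Python sketch. It assumes a read derived from the converted top strand that is already aligned to the reference without gaps; the function name `infer_cpg_methylation` and the toy sequences are hypothetical and are not taken from the disclosure. Reads derived from the opposite strand would show G to A changes instead and are not handled here.

```python
def infer_cpg_methylation(reference: str, read: str) -> dict:
    """Map the position of each reference CpG cytosine to a methylation call.

    True  = the read retains C (cytosine was protected, i.e. 5-mC or 5-hmC)
    False = the read shows T (cytosine was unmodified and was converted)
    Assumption: ungapped, same-length alignment of a top-strand read.
    """
    if len(reference) != len(read):
        raise ValueError("sketch assumes an ungapped, same-length alignment")
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            if read[i] == "C":
                calls[i] = True
            elif read[i] == "T":
                calls[i] = False
            # any other base is treated as uninformative and skipped
    return calls


# Toy example: CpG cytosines at positions 1, 5 and 9 of the reference
reference = "ACGTTCGACCGA"
read = "ACGTTTGATCGA"
calls = infer_cpg_methylation(reference, read)
print(calls)
```

- In this toy example the CpG at position 5 reads as T and is called unmethylated, while the CpGs at positions 1 and 9 retain C and are called methylated.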
- NGS: next-generation sequencing
- Figure 2 provides a more detailed schematic illustration of an example method in which double-stranded biologically derived input DNA is converted and amplified with restoration of CpG methylation patterns.
- In the double-stranded biologically derived input DNA shown at the top of the figure, a symmetrically methylated CpG site is shown (with methylated cytosines on both strands), and several unmethylated cytosines are also shown within and outside of a CpG context.
- Bisulfite conversion or enzymatic methyl conversion results in unmodified cytosines being converted to uracils by deamination. 5-methylcytosines and 5-hydroxymethylcytosines are rarely converted to uracils.
- Figure 3 provides a detailed schematic illustration of an example method in which single-stranded biologically derived input DNA is converted and amplified with restoration of CpG methylation patterns. The process is largely analogous to that shown in Figure 2, but the resulting converted, amplified, and re-methylated sequences are derived from conversion of a single-strand sequence.
- Figure 4 shows a schematic overview of an example method which enables enrichment of DNA molecules based on density of methylated CpG sites while minimizing loss of unique sequences derived from densely methylated DNA molecules.
- the figure highlights the creation of a converted, amplified NGS library with methylation patterns restored, which can then be subjected to stringent selection conditions to enrich densely methylated DNA while taking advantage of the sequence redundancy to preserve unique sequence representation of densely methylated DNA fragments.
- Figure 5 shows a schematic of a two-step ligation scheme that enables ligation of adapter sequences to double-stranded DNA fragments of interest while minimizing formation of adapter dimers.
- The illustrated two-step ligation scheme was used in Example 1 in the Detailed Description section to attach adapter sequences to blunted and 5’-phosphorylated cell-free DNA fragments and genomic DNA fragments.
- The first step involves ligation of stem-loop adapters to the insert DNA by forming a phosphodiester linkage between a 5’-phosphorylated end of the insert DNA and the 3’-hydroxyl end of the stem-loop adapter.
- The 5’-end of the stem-loop adapter lacks a phosphate, and therefore cannot be ligated to the insert or to another stem-loop adapter molecule (thereby avoiding adapter dimer formation).
- USER enzyme is used to cleave at deoxyuridine positions in the stem of the stem-loop adapter to destabilize base-pairing. The DNA is then cleaned up to remove unligated adapters, ligase, and USER enzyme.
- A displacer oligonucleotide is added to displace one strand of the stem-loop adapter by hybridization to the opposite (ligated) strand, as shown in the figure.
- A nick-sealing ligase (HiFi Taq DNA ligase) is used to ligate the 5’-phosphorylated displacer oligonucleotide to the 3’-end of the insert DNA.
- The stem-loop adapter and displacer oligonucleotides were designed to attach sample barcodes and Illumina adapter sequences to the input DNA fragments.
- adapter sequences for other sequencing platforms could be readily substituted.
- Figure 6 shows histograms comparing the CpG dinucleotide content of sequenced cfDNA fragments before vs. after two rounds of selective capture and elution of densely methylated DNA fragments (which is referred to here as high density methyl-capture).
- the CpG dinucleotide count refers to the number of CpG sites (methylated, hydroxymethylated, or unmethylated) in a biologically derived input DNA fragment, not the remaining (unconverted) CpG sites after conversion and amplification.
- Red boxes are included to highlight the robust enrichment of fragments harboring 8 or more CpG sites (in fragments averaging ~170-180 bp in length), which is the methylation density range typically found in CpG islands and promoters.
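- The CpG dinucleotide count used for such histograms can be sketched as follows. This is an illustrative Python example only: the function names and the synthetic fragments are hypothetical, and the threshold of 8 sites is taken from the range highlighted in the figure. The count is taken on the unconverted (input or reference) sequence, so it includes every CpG site regardless of methylation status.

```python
def count_cpg_sites(fragment: str) -> int:
    """Count CpG dinucleotides (5'-CG-3') in an unconverted fragment sequence."""
    # "CG" cannot overlap itself, so a non-overlapping count is exact.
    return fragment.upper().count("CG")


def is_cpg_dense(fragment: str, min_sites: int = 8) -> bool:
    """Flag fragments with at least `min_sites` CpG sites (assumed threshold)."""
    return count_cpg_sites(fragment) >= min_sites


# Synthetic 170 bp fragments, built only for illustration
dense_fragment = "CG" * 8 + "AT" * 77
sparse_fragment = "ACGT" + "AT" * 83

print(len(dense_fragment), count_cpg_sites(dense_fragment), is_cpg_dense(dense_fragment))
print(len(sparse_fragment), count_cpg_sites(sparse_fragment), is_cpg_dense(sparse_fragment))
```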
- Figure 7 presents a genomic map showing a change in alignment and coverage of sequenced cell-free DNA fragments in the region of the PAX8 gene on Chromosome 2 before vs. after two rounds of selective capture and elution of densely methylated DNA fragments.
- Preparation of the native library comprised steps of conversion, amplification, and restoration of methylation patterns using methods disclosed herein.
- the enriched library was further subjected to two rounds of methyl binding domain-based affinity capture and elution with competitive binding of a 226 base-pair competitor DNA containing 10 methylated CpG sites.
- sequences mapped in a largely random pattern throughout the genome.
- Figure 8 shows a heat map displaying genomic regions at which aberrantly hypermethylated sequences from cell-free DNA fragments were observed to map in plasma of 11 patients with various types of cancer (advanced stage) and 8 non-cancer control subjects who were heavy smokers participating in a lung cancer screening program. Results are displayed for chromosome 2 (chosen arbitrarily), which is representative of genome-wide patterns. Dark bars represent genomic regions at which mapping is observed of one or more cfDNA fragments that are categorized as aberrantly hypermethylated.
- Such fragments are densely methylated but map to genomic regions that are expected to have a methylation level of less than 40% (averaged across all CpG sites) in multiple types of healthy cells and tissues based on publicly available whole genome bisulfite sequencing data (from Roadmap and Blueprint studies).
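- The categorization described above can be sketched in Python. The sketch assumes that each fragment has already been assigned to a named genomic region and that a single healthy-tissue methylation level per region has been precomputed from reference data; the `Fragment` class, the region names, and the cutoff of 8 methylated CpG sites for "densely methylated" are illustrative assumptions, while the 40% level is the one stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    region: str           # hypothetical label of the region the fragment maps to
    methylated_cpgs: int  # methylated CpG sites observed in the fragment


def is_aberrantly_hypermethylated(fragment, healthy_level,
                                  min_methylated_cpgs=8,
                                  max_healthy_level=0.40):
    """True if the fragment is densely methylated but maps to a region
    whose expected methylation level in healthy tissues is below the cutoff."""
    level = healthy_level.get(fragment.region)
    if level is None:
        return False  # no healthy reference for this region: make no call
    densely_methylated = fragment.methylated_cpgs >= min_methylated_cpgs
    return densely_methylated and level < max_healthy_level


# Hypothetical reference levels (fraction methylated, averaged over CpG sites)
healthy_level = {"region_A": 0.05, "region_B": 0.85}
fragments = [Fragment("region_A", 11), Fragment("region_B", 12), Fragment("region_A", 3)]
flags = [is_aberrantly_hypermethylated(f, healthy_level) for f in fragments]
print(flags)
```

- Only the first fragment is flagged: the second maps to a region that is normally methylated, and the third is not densely methylated.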
- Figure 9 shows the evolution of plasma cell-free DNA hypermethylation patterns over time in a patient with metastatic non-small cell lung cancer being treated with olaparib and cediranib.
- the patient’s cancer initially showed a modest response to therapy (considered stable disease by RECIST criteria), and subsequently showed progression.
- Initial shrinkage followed by enlargement of a liver metastasis is shown in computed tomography (CT) scan images taken at baseline (prior to therapy) and at cycles 4 and 8 of therapy (each cycle is 28 days).
- CT: computed tomography
- a graph is also provided showing changes over time in tumor burden (defined as sum of diameters of target lesions according to RECIST guidelines) and in tumor-derived cfDNA level (measured as the variant allele fraction [VAF] of a tumor-specific KRAS mutation in plasma cell-free DNA).
- The tumor burden initially decreases with the drug therapy but then increases, likely because of growth of treatment-resistant tumor clones.
- The mutant tumor-derived cfDNA level shows a transient spike (possibly due to initial cell kill) followed by a decline, indicative of tumor response. After ~4 months, the tumor-derived cfDNA level began to increase, indicative of tumor progression.
- aberrantly hypermethylated cfDNA fragment counts mapping to chromosome 10 are shown at 4 time points: at baseline, shortly after beginning treatment, at the nadir of mutant tumor-derived cfDNA level, and when the cancer has clearly progressed.
- Each circle indicates the observation of one or more aberrantly hypermethylated cfDNA fragments mapping to a CpG island at that genomic location.
- Circle size is proportional to the number of fragments mapping to a given CpG island. Blue circles indicate CpG islands at which aberrantly hypermethylated cfDNA fragments mapped at baseline and during therapy. Green circles indicate CpG islands at which aberrantly hypermethylated fragments were observed at either of the first two time points but not thereafter.
- Red circles indicate CpG islands at which aberrantly hypermethylated fragments were not observed at either of the first two time points but emerged thereafter. Analysis of such evolving aberrant hypermethylation patterns at CpG islands can provide biological and clinical insights pertaining to epigenetic resistance mechanisms, tumor heterogeneity, prognosis, and response or lack of response to therapy.
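- The color assignment described for Figure 9 can be expressed as a small classification rule. The Python sketch below assumes four fragment counts per CpG island, ordered by time point, and interprets "blue" as observed in the first two time points and again later; that interpretation, the function name, and the example counts are assumptions made for illustration.

```python
def classify_island(counts):
    """Classify a CpG island from fragment counts at 4 ordered time points.

    counts[:2] = first two time points, counts[2:] = later time points.
    """
    early = any(c > 0 for c in counts[:2])
    late = any(c > 0 for c in counts[2:])
    if early and late:
        return "blue"   # present early and still present later
    if early:
        return "green"  # present early, not observed thereafter
    if late:
        return "red"    # absent early, emerged later
    return "none"       # never observed


# Hypothetical counts for four islands
examples = {
    "island_1": [3, 2, 1, 5],
    "island_2": [4, 1, 0, 0],
    "island_3": [0, 0, 2, 6],
    "island_4": [0, 0, 0, 0],
}
labels = {name: classify_island(c) for name, c in examples.items()}
print(labels)
```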
- Figure 10 compares cell-free DNA hypermethylation patterns at CpG islands in plasma samples obtained at two time points (pre- and post-treatment) in a 76-year-old male patient with metastatic non-small cell lung cancer who received immune checkpoint inhibitor therapy with the drug pembrolizumab. Plasma samples were obtained from the patient prior to initiating therapy (on cycle 1 day 1) and again after completing one cycle of treatment (on cycle 2 day 1, prior to receiving the second cycle). Cell-free DNA was extracted from plasma and was tested according to the methods described in Example 1. The figure shows a graph in which cell-free DNA fragment counts mapping to various CpG islands are displayed for two time points (pre-treatment on the X-axis, and after 1 cycle on the Y-axis).
- Each data point on the graph shows cell-free DNA fragment counts mapping to an individual CpG island at the two time points.
- the graph shows that at many CpG islands, the relative fragment counts mapping to those CpG islands remain fairly stable over time, suggesting that these CpG islands are unlikely to be cancer-associated (considered background signal).
- the graph also shows that at several other CpG islands, the relative fragment counts mapping to those CpG islands decrease substantially from the pre-treatment sample to the post-treatment sample, suggesting that these CpG islands are likely to be cancer-associated.
- Such analysis can facilitate identification of CpG islands that show cancer-associated hypermethylation in a given patient (i.e., a personalized cancer-associated hypermethylation pattern).
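- One way such a comparison could be carried out is sketched below in Python. The normalization by total fragment count, the pseudocount, the 4-fold cutoff, and all island names and counts are assumptions chosen for illustration; the disclosure does not specify these details.

```python
def flag_decreasing_islands(pre, post, min_fold_decrease=4.0, pseudocount=1.0):
    """Return CpG islands whose relative fragment count drops substantially
    from the pre-treatment sample to the post-treatment sample."""
    pre_total = sum(pre.values())
    post_total = sum(post.values())
    if pre_total == 0 or post_total == 0:
        raise ValueError("each sample needs at least one mapped fragment")
    flagged = []
    for island in sorted(pre):
        # Relative counts; the pseudocount avoids division by zero
        pre_rel = (pre[island] + pseudocount) / pre_total
        post_rel = (post.get(island, 0) + pseudocount) / post_total
        if pre_rel / post_rel >= min_fold_decrease:
            flagged.append(island)
    return flagged


# Hypothetical fragment counts per CpG island
pre_counts = {"cgi_1": 50, "cgi_2": 30, "cgi_3": 40, "cgi_4": 20}
post_counts = {"cgi_1": 48, "cgi_2": 33, "cgi_3": 3, "cgi_4": 0}
candidates = flag_decreasing_islands(pre_counts, post_counts)
print(candidates)
```

- Here `cgi_1` and `cgi_2` stay roughly stable and would be treated as background, while `cgi_3` and `cgi_4` fall sharply and are flagged as candidate cancer-associated islands.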
- Figure 11 shows that densely methylated viral DNA can be captured and sequenced using methods disclosed herein.
- the data for the figure were generated from a 1 mL plasma sample that was obtained from a male patient with HIV who developed diffuse large B-cell lymphoma (DLBCL). Densely methylated viral DNA fragments were captured from this patient’s plasma in parallel with densely methylated cell-free DNA fragments derived from the patient’s genome. It is known that DLBCL in the setting of HIV is often associated with latent Epstein-Barr Virus (EBV) infection of B-cells. The plasma sample was obtained prior to initiation of any therapy.
- EBV: Epstein-Barr Virus
- Cell-free DNA (including viral DNA) was extracted from the plasma sample and was tested according to the methods described in Example 1, with a modification in the bioinformatic analysis to include alignment of DNA fragment sequences to viral genomes including Epstein-Barr Virus, HIV-1, Human Papilloma Virus, Kaposi’s Sarcoma Herpesvirus, Hepatitis B Virus, and Hepatitis C virus, in addition to the human genome reference (hg38).
- The figure shows densely methylated cell-free DNA fragments mapping to the EBV genome in the plasma of this patient. Note that the periodicity of sequence coverage suggests phased nucleosomal protection of cfDNA fragments. Red bars in magnified views indicate methylated CpG sites; blue bars indicate unmethylated sites.
- the current disclosure is directed to methods and compositions relating to medical diagnostics and biomedical research. Some methods enable enrichment of densely methylated DNA fragments from a mixture of DNA fragments with varying methylation density. Some methods enable enrichment of densely methylated DNA fragments that map to CpG island regions or promoter regions (or both) in a genome. Some methods enable enrichment of densely methylated DNA fragments while reducing loss of unique sequence representation of such fragments. Some methods enable enrichment of densely methylated DNA fragments using affinity capture techniques in which an antibody or protein preferentially binds to 5-methylcytosine or a methylated CpG site.
- Some methods enable identification and quantification of DNA molecules that harbor epigenetic modifications including 5-methylcytosine, 5-hydroxymethylcytosine, or both. Some methods include methylated competitor DNA during affinity capture to modify the methylation density profile of the captured DNA. Some methods use next-generation sequencing to obtain the sequences of the enriched, densely methylated DNA fragments. Some methods include library preparation steps prior to and/or after the enrichment of densely methylated DNA fragments to enable next-generation sequencing of the densely methylated DNA. Some methods reduce loss of unique sequence representation of densely methylated DNA fragments during affinity capture by producing a plurality of copies of the template DNA fragments (with CpG methylation patterns of the template DNA molecules restored on the DNA copies) prior to performing selective affinity capture based on methylation density. Some methods enable conversion and amplification of DNA with restoration of CpG methylation patterns in the DNA copies. Some methods are well-suited to analysis of DNA from biological specimens in which the DNA quantity is limited, such as cell-free DNA from blood.
- methods described herein address the challenge of selectively capturing and sequencing densely methylated DNA from CpG islands and/or promoter regions of vertebrate genomes without pre-defining target sequences and with minimal loss of desired sequence information during selective capture.
- Enrichment of densely methylated DNA from CpG islands for sequencing is desirable because these genomic regions can be especially rich in biologically informative methylation signals.
- Signals of interest (e.g., for biomarker development), such as differentially methylated regions (DMRs), frequently occur at CpG islands, even though CpG islands have been estimated to constitute only approximately 1-2% of most mammalian genomes.
- some methods described herein enable generation of many redundant copies of each original (biologically derived) DNA fragment prior to performing selective capture of densely methylated DNA. In this way, representation of unique sequences is preserved even if a large proportion of desired molecules are lost under high-stringency capture conditions. For example, if an original densely methylated DNA molecule is amplified by PCR to 200 copies and only 5% of these copies are recovered after capture, 10 copies of that unique original molecule will still remain available for sequencing. However, standard PCR amplification does not copy methylation marks, making it impossible to subsequently perform affinity capture based on DNA methylation density. To enable amplification of DNA with restoration of methylation patterns, some methods are described herein that permit conversion and amplification of DNA with restoration of methylation at CpG sites.
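The arithmetic in the example above can be checked with a short sketch. It assumes, purely for illustration, that each copy is recovered independently with the same probability; the function names are hypothetical and are not part of the disclosure.

```python
def expected_copies_recovered(copies: int, recovery_rate: float) -> float:
    """Expected number of copies of one original molecule left after capture."""
    return copies * recovery_rate


def prob_molecule_retained(copies: int, recovery_rate: float) -> float:
    """Probability that at least one copy of an original molecule survives capture.

    Assumes every copy is recovered independently (an illustrative simplification).
    """
    return 1.0 - (1.0 - recovery_rate) ** copies


# Example from the text: 200 PCR copies, 5% of copies recovered after capture.
amplified = expected_copies_recovered(200, 0.05)
# Without prior amplification, a single molecule survives only 5% of the time.
p_single = prob_molecule_retained(1, 0.05)
# With 200 redundant copies, losing every copy becomes very unlikely.
p_amplified = prob_molecule_retained(200, 0.05)
```

Under this simplified model, the redundancy created before capture is what keeps a unique sequence represented even when most individual copies are lost.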
- some methods described herein comprise conversion of original (biologically derived) DNA molecules resulting in deamination of unmodified cytosine bases to uracil bases.
- conversion can be performed by treatment of DNA with bisulfite or with enzymatic treatment (TET2 then APOBEC) as in Enzymatic Methylation sequencing (EM-Seq).
- cytosine bases which are methylated (5-mC) or hydroxymethylated (5-hmC) are protected from deamination.
- the converted DNA can then undergo PCR amplification in which the uracil bases in the original converted DNA molecules are replaced by thymine bases in the DNA copies.
- some methods described herein use a CpG methyltransferase (such as M.SssI) to restore methylation at CpG sites in the amplified DNA copies which correspond to CpG sites that were either methylated or hydroxymethylated in the original DNA molecules.
- the CpG methyltransferase enzyme M.SssI catalyzes methylation at the C5 position of all cytosine residues within the double-stranded dinucleotide recognition sequence 5’...CG...3’ (CG dinucleotides are also known as CpG sites).
- CpG methyltransferase can be used to restore the original methylation patterns on the DNA copies.
- with some conversion methods, such as bisulfite conversion and EM-Seq conversion, the DNA copies will sometimes (rarely) contain methylated CpG sites where the original DNA was not methylated or hydroxymethylated, or conversely, will lack methylation at a CpG site that was methylated or hydroxymethylated on the original DNA.
- the end result of this conversion, amplification, and re-methylation process is the production of amplified DNA copies in which unmodified C bases in the original DNA fragments are converted to T bases in the DNA copies, and methylated or hydroxymethylated CpG sites in the original DNA fragments are restored as methylated CpG sites in the DNA copies.
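The conversion, amplification, and re-methylation process described above can be illustrated with a small single-strand model. This is a sketch under stated assumptions: only the top strand is modeled, conversion is treated as perfect, and the sequence, positions, and function names are hypothetical.

```python
def convert_and_amplify(seq: str, modified: set[int]) -> str:
    """Return the top-strand sequence of a PCR copy made from converted DNA.

    seq: original top-strand sequence (A/C/G/T).
    modified: 0-based positions of 5-mC or 5-hmC cytosines, which are protected.
    Unmodified C is deaminated to U and is read as T in the amplified copy.
    """
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in modified:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def remethylate_cpg(copy: str) -> set[int]:
    """Positions a CpG methyltransferase would methylate: every C followed by G."""
    return {i for i in range(len(copy) - 1) if copy[i:i + 2] == "CG"}


original = "ACGTCCGATCGA"
# CpG cytosines sit at positions 1, 5 and 9; assume those at 1 and 9 are methylated.
original_methylated = {1, 9}

copy = convert_and_amplify(original, original_methylated)
restored = remethylate_cpg(copy)
```

In this toy case the unmethylated CpG at position 5 becomes TG in the copy, so the methyltransferase has no site to act on there, and the restored set equals the original methylated set.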
- Taking advantage of the sequence redundancy of the converted DNA copies with methylation patterns restored, some methods described herein are able to enrich DNA fragments based on their methylation density while minimizing loss of unique sequence representation of densely methylated DNA fragments.
- the sequence redundancy permits use of enrichment or capture conditions that are highly selective for densely methylated DNA sequences. Because of the sequence redundancy, loss of some copies of an original (biologically derived) densely methylated DNA fragment under stringent enrichment conditions would be unlikely to result in complete loss of sequence representation of that fragment.
- enrichment of densely methylated DNA copies could be performed using affinity purification methods based on antibodies or proteins that bind to 5-methylcytosine or to symmetrically methylated CpG sites in double-stranded DNA.
- densely methylated competitor DNA molecules can be added to the affinity purification mixture to preferentially occupy binding sites to reduce the probability of capture of DNA copies with low methylation density.
- the enriched, densely methylated, converted DNA copies can undergo next-generation sequencing to enable characterization of the sequences, genomic mapping locations, and methylation status of the DNA copies.
- the term “enrichment of densely methylated DNA” refers to at least a 2-fold increase in the fraction of densely methylated DNA molecules in a population, where the fraction is the number of densely methylated DNA molecules divided by the total number of DNA molecules in the population. In a preferred implementation, this term refers to at least a 10-fold increase in that fraction. In a more preferred implementation, this term refers to at least a 100-fold increase in that fraction. In a most preferred implementation, this term refers to at least a 500-fold increase in that fraction.
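The fold increase used in this definition can be computed as in the following sketch. The molecule counts are hypothetical, and the helper names are illustrative only.

```python
from typing import Optional

# Fold-increase thresholds named in the definition above.
THRESHOLDS = (500, 100, 10, 2)


def fold_enrichment(dense_before: int, total_before: int,
                    dense_after: int, total_after: int) -> float:
    """Ratio of the densely methylated fraction after capture to that before capture.

    Written as a cross-multiplication to avoid rounding in intermediate fractions.
    """
    return (dense_after * total_before) / (total_after * dense_before)


def highest_threshold_met(fold: float) -> Optional[int]:
    """Largest threshold (2, 10, 100 or 500) that the fold increase reaches, if any."""
    for threshold in THRESHOLDS:
        if fold >= threshold:
            return threshold
    return None


# Hypothetical counts: 1,000 dense molecules in 1,000,000 before capture,
# 500 dense molecules in 1,000 after capture.
fold = fold_enrichment(1_000, 1_000_000, 500, 1_000)
```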
- methylation status of the DNA fragments can be directly assessed because of the sequence conversion.
- Other existing methods such as MeDIP and MBD Capture that enrich methylated DNA have been developed to directly capture methylated DNA fragments derived from the biological source (sometimes preceded by or followed by steps to incorporate adapters and/or indices for next-generation sequencing), without conversion or amplification of the DNA prior to capture. With these methods, because the DNA did not undergo conversion, the captured fragments are assumed to be methylated at CpG sites based on the fact that they were selectively captured, but there is no direct sequence-based evidence of their methylation state.
- DNA used as input for the assays and/or methods described herein can be derived from biological sources (such DNA is referred to herein as input DNA, original DNA, original input DNA, or biologically derived DNA).
- input DNA can be cell-free DNA (cfDNA) derived from biofluids or biospecimens including but not limited to blood, plasma, serum, saliva, sputum, stool, cerebrospinal fluid, Papanicolaou smear fluid, uterine lavage fluid, peritoneal fluid, pleural fluid, or urine.
- input DNA can be cell-derived DNA or exosome-derived DNA obtained from biofluids or biospecimens including but not limited to tissue, blood, plasma, serum, saliva, sputum, stool, cerebrospinal fluid, Papanicolaou smear fluid, uterine lavage fluid, peritoneal fluid, pleural fluid, or urine.
- input DNA can be double-stranded, single-stranded, or a combination of both.
- DNA can be obtained from patients with cancer.
- input DNA can be obtained from individuals being screened for cancer.
- input DNA can be obtained from individuals with inflammatory, autoimmune, or infectious disease processes.
- DNA can be obtained from healthy individuals with no known disease. In some implementations DNA can be obtained from forensic specimens including but not limited to hair, blood, semen, vaginal fluid, and skin. In some implementations, DNA can be obtained from sources that combine the DNA of multiple individuals or organisms including but not limited to human wastewater, agricultural wastewater, agricultural food stocks, and animal-derived food products.
- adapter oligonucleotides that are compatible with a particular sequencing platform can be ligated to the DNA (to produce a next-generation sequencing library).
- adapter sequences can be compatible with one or more of the following sequencing platforms, including but not limited to Illumina, Ion Torrent, Pacific Biosciences, BGI, Complete Genomics, and Oxford Nanopore.
- the ends of the DNA inserts may be prepared for ligation to adapter oligonucleotides by enzymatic treatment to phosphorylate the 5’-ends, to produce blunt ends, or to produce ends with appropriate overhangs that are compatible with the adapters that are to be ligated.
- a tagmentation approach can be used to attach adapters.
- DNA adapter molecules are ligated to both paired DNA strands of a double-stranded DNA fragment (on one end or both ends of the DNA fragment).
- adapter molecules can be attached in a similar manner by a transposase enzyme or by primer extension.
- an adapter molecule can be ligated to the 5’-end of one strand of DNA, and a polymerase can be used to extend the 3’-end of the opposite strand to make a reverse-complement copy of the ligated adapter molecule, thereby attaching adapter sequences to both strands of the DNA.
- the adapter molecule can comprise a DNA sequence tag that is substantially unique to the adapter (e.g., a Unique Molecular Identifier).
- the adapter molecule can comprise a Molecular Lineage Tag (which may have diverse sequences but not necessarily sufficient diversity to be unique).
- the adapter can be fully double-stranded, or can be partially double-stranded and partially single-stranded.
- the adapter can be fully single-stranded.
- the adapter can comprise the 4 unmodified DNA bases (A, C, T, and G).
- the adapter can comprise modified DNA bases, including but not limited to 5-methylcytosine and/or 5-hydroxymethylcytosine.
- partially-double-stranded adapters can be ligated to both strands of the double-stranded DNA fragments.
- adapters can be ligated to the biologically derived input DNA molecules prior to conversion.
- adapters can be ligated to converted DNA molecules prior to amplification.
- adapters can be ligated to converted and amplified DNA molecules prior to restoration of methylation patterns.
- adapters can be ligated to converted, amplified DNA copies with methylation patterns restored prior to enrichment of densely methylated copies.
- adapters can be ligated to DNA that has been converted, amplified, restored to its methylation pattern, and enriched for dense methylation, prior to next-generation sequencing.
- adapters can comprise 5-methylcytosine (or 5-hydroxymethylcytosine or 5-carboxycytosine or 5-formylcytosine) bases to prevent conversion in adapter sequences.
- adapter sequences can be designed to avoid incorporation of CpG sequences which can subsequently become methylated by CpG methyltransferase.
- adapters can be designed to minimize the formation of adapter dimers during ligation.
- the process of adapter ligation can be optimized to reduce or prevent the formation of adapter dimers.
- a two-step ligation approach is used to minimize formation of adapter dimers.
- a two-step ligation approach (as schematized in Figure 5) comprises: (1) ligation of a 3’-end of a stem-loop adapter oligonucleotide to a 5’-end of a double-stranded insert DNA, without ligation of the opposite strand; and (2) displacement of the unligated strand of the stem-loop adapter by a displacer oligonucleotide, followed by ligation of the 5’-end of the displacer oligonucleotide to a 3’-end of the insert DNA.
- a variety of adapter ligation methods are known in the art, including single stranded and double stranded ligation methods; in some implementations, any of these ligation methods can be utilized to produce next generation sequencing libraries of the densely methylated DNA.
- the biologically derived DNA can undergo chemical or enzymatic (or both) conversion processes to enable methylated cytosines to be distinguished from unmethylated cytosines in the DNA.
- the conversion process comprises bisulfite conversion.
- the conversion process comprises the conversion methods used in Enzymatic Methylation Sequencing (EM-seq).
- unmodified cytosine bases in the biologically derived DNA are converted to uracils by deamination, and can be subsequently replaced by thymine bases in PCR-amplified DNA copies.
- 5-methylcytosine bases and 5-hydroxymethylcytosine bases are protected from conversion, and are represented as cytosine bases in PCR-amplified DNA copies.
- alternative conversion methods could be used to produce the conversion patterns shown in Table 1.
- Abbreviations: OxBS-seq, Oxidative Bisulfite sequencing; TAB-seq, TET-Assisted Bisulfite sequencing; ACE-seq, APOBEC-Coupled Epigenetic sequencing; TAPS-seq, TET-Assisted Pyridine Borane sequencing.
- Sequenced base* is sequencing output after conversion of input DNA and PCR.
- Table 1 is not comprehensive; additional conversion methods exist and could be used with our approach.
- the conversion is performed using chemical reagents, including but not limited to sodium bisulfite, potassium perruthenate, and/or pyridine borane.
- the conversion is performed using enzymatic methods including, but not limited to APOBEC3A, TET2, and/or T4-betaGal.
- the conversion is performed using a combination of enzymatic and chemical methods.
- adapters may contain modified bases which would be resistant to conversion.
- the conversion is such that methylated CpG sites in the original biologically derived DNA are retained as CpG sites in the converted, PCR-amplified DNA copies, and unmethylated CpG sites in the original biologically derived DNA are converted to a non-CG sequence in the converted, PCR-amplified DNA copies.
- converted DNA is amplified via a polymerase chain reaction (PCR).
- PCR amplification results in replacement of uracil bases in the converted DNA with thymine bases in the amplified DNA copies.
- dUTP nucleotides can be included in the PCR buffer to retain uracil bases in the converted DNA as uracil bases in the amplified DNA copies.
- the PCR amplification can be catalyzed by an enzyme that has the ability to read and amplify DNA templates containing uracil bases, including but not limited to Q5U polymerase (New England Biolabs), Phusion U polymerase (Thermo Fisher), and ZymoTaq Polymerase (Zymo Research).
- the polymerase chain reaction can be facilitated by thermocycling.
- an isothermal amplification reaction can be used to amplify the DNA, such as loop-mediated isothermal amplification (LAMP) or rolling circle amplification.
- thermocycling can be stopped before the PCR amplification reaches plateau phase to ensure that most amplified products remain double-stranded.
- fluorescence signal can be monitored via real time quantitative PCR to determine when thermocycling should be stopped prior to plateau phase. When a PCR amplification approaches plateau phase (saturation), the amplified products can become denatured with a low probability of becoming double-stranded by primer-extension in the next cycle (since PCR reagents have been exhausted).
- when the amplified products have high sequence diversity (such as when genomic DNA is amplified), there is very low probability of re-annealing of top and bottom strand copies of a given template DNA molecule. If PCR reaches plateau phase, many amplified copies will be single-stranded, and will therefore be unable to undergo subsequent methylation via a CpG methyltransferase enzyme. In some implementations, single-stranded DNA copies from a PCR amplification that was allowed to reach plateau phase could be restored to double-stranded DNA by one or more rounds of primer extension in a separate enzymatic reaction.
- PCR-amplified DNA products can be run on an electrophoretic gel (e.g., agarose) to selectively purify double-stranded DNA fragments of the desired size range.
- PCR-amplified DNA products can be size-selected based on binding to solid-phase reversible immobilization (SPRI) paramagnetic beads.
- a CpG methyltransferase can be used to methylate cytosine bases at CpG sites in the converted, amplified double-stranded DNA copies.
- the CpG methyltransferase can be M.SssI.
- the CpG methyltransferase can be a member of the family of DNMT3 enzymes.
- the CpG methyltransferase can be a member of the DNMT1 family of enzymes.
- the CpG methyltransferase can be any methyltransferase with specificity for methylation of CpG sites.
- the converted, amplified DNA copies with methylation patterns restored can be subjected to enrichment based on methylation density of the DNA copies.
- densely methylated DNA copies are enriched.
- enrichment of densely methylated DNA copies is enabled by affinity capture using one or more antibodies that specifically bind to 5-methylcytosine.
- enrichment of densely methylated DNA copies is enabled by affinity capture using any member of the family of methyl binding domain proteins (MBD), or derivatives thereof, that have binding affinity for methylated double-stranded CpG sites.
- enrichment of densely methylated DNA copies is enabled by affinity capture using MeCP2.
- 5-methylcytosine bases in the methylated DNA copies can be converted to 5-hydroxymethylcytosine or 5-formylcytosine or 5-carboxycytosine, and enrichment of DNA copies can be enabled by affinity capture based on binding to the correspondingly modified cytosine base.
- affinity capture of densely methylated DNA can be mediated by any of (but not limited to) the following: antibodies, aptamers, Affibodies, proteins, or peptides.
- the selectivity of enrichment of densely methylated DNA can be increased by including methylated competitor DNA molecules in the affinity purification mixture.
- the methylated competitor DNA can comprise DNA molecules with a high methylation density to competitively inhibit capture of DNA copies with a lower methylation density, and to promote capture of DNA copies with a high methylation density.
- the methylated competitor DNA can be synthesized via chemical means on an oligonucleotide synthesizer.
- the methylated competitor DNA can be produced via PCR amplification of a template that contains multiple CG dinucleotides, followed by methylation of CpG sites in the amplified competitor DNA copies using a CpG methyltransferase.
- the methylated competitor DNA can be derived from a natural source.
- the methylated competitor DNA can comprise many copies of a single defined sequence with a defined number of methylated CpG sites.
- the methylated competitor DNA can comprise many different sequences with a defined number of methylated CpG sites.
- the methylated competitor DNA can comprise many different sequences with a range of CpG density or CpG content.
- the methylated competitor DNA can be derived from a biological source including but not limited to animals, plants, microbes, or viruses. In some implementations, the methylated competitor DNA can be derived from chemical synthesis. In some implementations, the methylated competitor DNA can be derived from in vitro enzymatic reactions. In some implementations, the methylated competitor DNA can have a length between 200 base pairs and 400 base pairs. In some implementations, the methylated competitor DNA can have a length between 20 base pairs and 1000 base pairs. In some implementations, the competitor DNA can have a broad range of lengths without any specified limits. In some implementations, the competitor DNA can have an average CpG methylation density of between 3 and 20 methylated CpG sites per 100 base pairs.
- the competitor DNA can have an average CpG methylation density of between 5 and 15 methylated CpG sites per 100 base pairs. In some implementations, the competitor DNA can have an average CpG methylation density of between 6 and 10 methylated CpG sites per 100 base pairs. In some implementations, the methylation density of the competitor DNA can be adjusted to a level that yields a desired methylation density profile in the captured DNA of interest.
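The competitor length and methylation density ranges above can be checked for a candidate sequence with a sketch like the following. It assumes every CpG in the competitor is methylated (for example after treatment with a CpG methyltransferase); the sequence itself is a made-up illustration.

```python
def cpg_count(seq: str) -> int:
    """Number of CG dinucleotides on one strand of the sequence."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def methylated_cpg_per_100bp(seq: str) -> float:
    """Methylated CpG sites per 100 base pairs, assuming every CpG is methylated."""
    return 100.0 * cpg_count(seq) / len(seq)


# Hypothetical competitor: a 20 bp unit with two CpG sites, repeated 15 times.
competitor = "ACGTTAGCTAACGATTAGCA" * 15

length_bp = len(competitor)
density = methylated_cpg_per_100bp(competitor)

# Ranges quoted in the text (length in base pairs, density per 100 base pairs).
in_length_range = 200 <= length_bp <= 400
in_narrowest_density_range = 6 <= density <= 10
```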
- various parameters of the affinity purification can be adjusted to achieve a desired methylation density profile in the captured DNA of interest; the parameters include but are not limited to: amount of competitor DNA, methylation density of competitor DNA, amount of binding protein (or antibody), amount of affinity capture beads, density of capture sites on the beads or surface, temperature of the capture, buffer conditions of the capture, washing conditions, and conditions of elution.
- the selectivity of the enrichment method can be adjusted to capture mostly DNA fragments that map to CpG islands.
- a single round of capture can be performed.
- two or more rounds of capture can be performed to further remove fragments with low methylation density. Sequence redundancy of the methylated DNA copies enables highly selective enrichment of densely methylated DNA, optionally including two or more rounds of enrichment, with minimal loss of unique sequence representation.
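The trade-off between selectivity and sequence representation across capture rounds can be sketched numerically. The per-round recovery rates below are hypothetical values chosen only for illustration, and copies are assumed to be recovered independently.

```python
def recovery_after_rounds(per_round: float, rounds: int) -> float:
    """Fraction of copies still present after the given number of capture rounds."""
    return per_round ** rounds


def prob_sequence_represented(copies: int, per_round: float, rounds: int) -> float:
    """Probability that at least one copy of an original fragment remains."""
    lost_one_copy = 1.0 - recovery_after_rounds(per_round, rounds)
    return 1.0 - lost_one_copy ** copies


DENSE_RECOVERY = 0.30   # hypothetical per-round recovery of densely methylated copies
SPARSE_RECOVERY = 0.01  # hypothetical per-round recovery of sparsely methylated copies

# Selectivity: how much more likely a dense copy is to survive than a sparse copy.
selectivity_one_round = (recovery_after_rounds(DENSE_RECOVERY, 1)
                         / recovery_after_rounds(SPARSE_RECOVERY, 1))
selectivity_two_rounds = (recovery_after_rounds(DENSE_RECOVERY, 2)
                          / recovery_after_rounds(SPARSE_RECOVERY, 2))

# Representation of a dense fragment after two rounds, with and without redundancy.
represented_200_copies = prob_sequence_represented(200, DENSE_RECOVERY, 2)
represented_1_copy = prob_sequence_represented(1, DENSE_RECOVERY, 2)
```

With these assumed rates, a second round multiplies selectivity while a fragment present as 200 copies remains almost certain to be represented; a fragment present as a single copy would usually be lost.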
- PCR amplification is necessary to produce sufficient DNA for next-generation sequencing of the densely methylated DNA copies that were converted, amplified, re-methylated, and enriched via selective capture.
- a post-enrichment PCR amplification can be performed.
- primers used in a post-enrichment PCR amplification can incorporate additional sequences in the amplified DNA copies, including but not limited to indices or barcodes to enable sample multiplexing and sequences needed for compatibility with a sequencing platform (e.g., Illumina P5 and P7 sequences).
- the amplified next-generation sequencing library can be purified and/or undergo size-selection to enrich for DNA products of the appropriate length.
- Figure 1 provides a schematic illustration of an example method of selectively sequencing densely methylated DNA molecules from a population of DNA molecules with varying methylation density, while minimizing the loss of unique sequences derived from said densely methylated DNA molecules.
- the schematic representation shows 3 examples of biologically-derived input DNA molecules at the top of the figure: one with relatively dense methylation, a second with relatively sparse methylation, and a third with no methylation.
- the densely methylated DNA is shown with 4 symmetrically methylated CpG sites.
- densely methylated DNA fragments can have 8 or more methylated CpG sites (symmetric or asymmetric) in fragments of approximately 100 to 250 base pairs in length.
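As an illustration of the density criterion above, a converted sequence can be screened by counting retained CG dinucleotides, since a CpG that was unmethylated in the original no longer reads as CG after conversion. This sketch treats the "8 or more sites in roughly 100 to 250 base pairs" figures as hard cut-offs, which is an assumption made for the example; the sequences are invented.

```python
def retained_cpg_count(converted_seq: str) -> int:
    """Count CG dinucleotides that survived conversion (methylated in the original)."""
    return sum(1 for i in range(len(converted_seq) - 1)
               if converted_seq[i:i + 2] == "CG")


def is_densely_methylated(converted_seq: str, min_sites: int = 8,
                          min_len: int = 100, max_len: int = 250) -> bool:
    """Apply the illustrative length and site-count cut-offs to one converted sequence."""
    return (min_len <= len(converted_seq) <= max_len
            and retained_cpg_count(converted_seq) >= min_sites)


# Hypothetical converted sequences, each 120 bases long.
dense = "TACGTTAGTA" * 12                    # 12 retained CpG sites
sparse = "TATGTTAGTA" * 11 + "TACGTTAGTA"    # 1 retained CpG site
```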
- the example shows that adapters are ligated to input DNA fragments, and then the DNA undergoes enzymatic methyl conversion (or bisulfite conversion) followed by PCR amplification.
- the resulting converted and amplified DNA copies have sequences in which unmodified C bases were converted to T bases, whereas methylated or hydroxymethylated C bases were retained as C bases.
- some amplified copies shown in the schematic are derived from the converted Watson strand of the input DNA and some are derived from the converted Crick strand.
- a CpG methyltransferase enzyme is used to restore methylation at unconverted CpG sites in the converted, amplified DNA copies.
- CpG methyltransferase enables restoration of original methylation patterns.
- the amplified DNA copies with restored methylation patterns are then shown undergoing selective enrichment of densely methylated DNA copies by competitive binding to methyl binding domain protein (or antibody to 5-mC) and capture on magnetic beads. Densely methylated competitor DNA fragments are added to the capture mix to competitively inhibit the capture of fragments with lower methylation density.
- the ability to generate multiple redundant copies of each DNA input molecule with methylation patterns restored enables use of stringent capture and enrichment conditions (including an option of more than one round of capture) while preserving representation of unique sequences of densely methylated DNA fragments.
- the schematic illustrates the purification of a next-generation sequencing (NGS) library of densely methylated, converted sequences. Resulting sequences can be mapped to a reference genome and the original methylation status of cytosine bases can be inferred based on C to T conversion.
- Figure 2 provides a more detailed schematic illustration of an example method in which double-stranded biologically-derived input DNA is converted and amplified with restoration of CpG methylation patterns.
- in the double-stranded biologically-derived input DNA shown at the top of the figure, a symmetrically methylated CpG site is shown (with methylated cytosines on both strands), and several unmethylated cytosines are also shown within and outside of a CpG context.
- Bisulfite conversion or Enzymatic Methyl conversion results in unmodified cytosines being converted to uracils by deamination. 5-methylcytosines and 5-hydroxymethylcytosines are rarely converted to uracils.
- Figure 3 provides a detailed schematic illustration of an example method in which single-stranded biologically-derived input DNA is converted and amplified with restoration of CpG methylation patterns. The process is largely analogous to that shown in Figure 2, but the resulting converted, amplified, and re-methylated sequences are derived from conversion of a single-strand sequence.
- Figure 4 shows a schematic overview of an example method which enables enrichment of DNA molecules based on density of methylated CpG sites while minimizing loss of unique sequences derived from densely methylated DNA molecules.
- the figure highlights the creation of a converted, amplified NGS library with methylation patterns restored, which can then be subjected to stringent selection conditions to enrich densely methylated DNA while taking advantage of the sequence redundancy to preserve unique sequence representation of densely methylated DNA fragments.
- sequencing includes but is not limited to next-generation sequencing (NGS) or massively parallel sequencing.
- an NGS platform used for analysis can be a sequencer made by Illumina.
- next-generation sequencing can be performed on an instrument manufactured by companies including but not limited to Illumina, Ion Torrent, Pacific Biosciences, Qiagen, Thermo Fisher, Roche, BGI, Complete Genomics, and Oxford Nanopore.
- sequencing can be performed in paired-end mode or in single-end mode.
- sequencing read lengths can be between 30 and 1000 bases.
- long-read sequencing can be used, in which read lengths are not defined.
- sequencing is performed with 150- or 100-base read-lengths, in paired-end mode.
- the sequencing output yields a plurality of converted sequences.
- the converted, densely methylated DNA copies produced using methods described herein can be analyzed via other analytical means including but not limited to microarrays, pyrosequencing, primer extension assays, hybridization with complementary oligonucleotides, and/or analysis via fluorescence in microfluidic devices.
- next-generation sequencing of DNA libraries produced using methods described herein yields a plurality of converted DNA sequences.
- converted sequences comprise sequences in which unmodified cytosine bases in the original DNA molecules are read as thymine bases in the converted sequences.
- converted sequences comprise sequences in which 5-methylcytosine bases in the original DNA molecules are read as cytosine bases in the converted sequences.
- converted sequences comprise sequences in which 5-hydroxymethylcytosine bases in the original DNA molecules are read as cytosine bases in the converted sequences.
- converted sequences can be aligned to reference genome sequences that have been converted in silico.
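One common way to convert a reference in silico, used by several bisulfite-aware aligners, is to build a C-to-T version for reads derived from the top strand and a G-to-A version for reads derived from the bottom strand. The sketch below shows that idea on a toy sequence; it is an illustration of the general technique and not necessarily the procedure used in the disclosure.

```python
def c_to_t(seq: str) -> str:
    """Fully C-to-T converted sequence (matches reads from the converted top strand)."""
    return seq.upper().replace("C", "T")


def g_to_a(seq: str) -> str:
    """Fully G-to-A converted sequence (matches reads from the converted bottom strand)."""
    return seq.upper().replace("G", "A")


# Toy reference sequence (hypothetical).
reference = "ACGTCCGATCGA"
reference_ct = c_to_t(reference)
reference_ga = g_to_a(reference)

# A read is typically converted the same way before alignment so that
# methylated and unmethylated cytosines do not affect where it maps.
read = "ACGTTTGATCGA"
read_ct = c_to_t(read)
matches_converted_reference = read_ct == reference_ct
```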
- the plurality of converted sequences can be grouped into sets, wherein each set of sequences is determined to be derived from an individual DNA fragment.
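Grouping sequences into sets derived from an individual DNA fragment could be sketched as follows. The grouping key used here (mapping coordinates, strand, and a molecular tag read from the adapter) is an assumption for illustration; the text does not specify the grouping criteria, and the `Read` type and field names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str
    umi: str   # molecular tag read from the adapter (hypothetical field)
    seq: str


def group_by_fragment(reads: list[Read]) -> dict[tuple, list[Read]]:
    """Group reads sharing coordinates, strand and tag into one set per original fragment."""
    groups: dict[tuple, list[Read]] = defaultdict(list)
    for read in reads:
        key = (read.chrom, read.start, read.end, read.strand, read.umi)
        groups[key].append(read)
    return dict(groups)


reads = [
    Read("chr1", 100, 250, "+", "AACGT", "TACGTTAG"),
    Read("chr1", 100, 250, "+", "AACGT", "TACGTTAG"),
    Read("chr1", 100, 250, "+", "GGTCA", "TATGTTAG"),
]
groups = group_by_fragment(reads)
group_sizes = sorted(len(members) for members in groups.values())
```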
- converted sequences can be compared to reference genome sequences that have not been converted to infer methylation states of cytosine bases in the original, unconverted DNA molecules.
- methylation states of multiple CpG sites in a DNA fragment can be used to evaluate a methylation level of the fragment.
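A minimal sketch of inferring CpG methylation states by comparison with an unconverted reference, and of summarizing them as a fragment-level methylation value, is shown below. It assumes a top-strand read aligned without gaps starting at reference position 0; the sequences and function names are hypothetical.

```python
from typing import Optional


def cpg_methylation_calls(reference: str, read: str) -> dict[int, bool]:
    """Call each reference CpG covered by the read as methylated (True) or not (False).

    A C in the read at a reference CpG means the cytosine was protected from
    conversion; a T means it was converted. Other bases are skipped as mismatches.
    """
    calls: dict[int, bool] = {}
    for i in range(min(len(reference) - 1, len(read))):
        if reference[i:i + 2] != "CG":
            continue
        if read[i] == "C":
            calls[i] = True
        elif read[i] == "T":
            calls[i] = False
    return calls


def fragment_methylation_level(calls: dict[int, bool]) -> Optional[float]:
    """Fraction of called CpG sites that are methylated, or None if no site was called."""
    if not calls:
        return None
    return sum(calls.values()) / len(calls)


reference = "ACGTCCGATCGA"   # unconverted reference (toy example)
read = "ACGTTTGATCGA"        # converted read aligned at position 0

calls = cpg_methylation_calls(reference, read)
level = fragment_methylation_level(calls)
```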
- most converted sequences map to genomic regions with a high density of CpG sites, including but not limited to CpG islands.
- the converted sequence data can be used to evaluate fragment-level methylation patterns at CpG islands across a genome.
- fragment-level methylation patterns can be compared to methylation patterns obtained from independent evaluations of DNA derived from any of (but not limited to) the following: healthy tissues, diseased tissues, healthy cells, diseased cells, biospecimens from healthy individuals, biospecimens from individuals with disease, cancer cells, cancer tissues, or biospecimens from individuals with cancer.
- comparisons of fragment-level methylation patterns with independently obtained methylation data can enable identification of fragments that match expected methylation patterns for a cell type, a tissue, or a disease state.
- comparisons of fragment-level methylation patterns with reference methylation data can enable identification of fragments that do not match expected methylation patterns (aberrantly methylated fragments).
- identification of fragments that match expected methylation patterns for a disease state can be used to aid in diagnosis of said disease state.
- identification of fragments that match expected methylation patterns of a tissue or cell type can be used to infer the presence of DNA or measure the amount of DNA from that tissue or cell type in a biospecimen.
- identification of fragments that match expected methylation patterns for a particular cancer type can be used to aid in diagnosis of that cancer type.
- identification of fragments that do not match expected methylation patterns of healthy (non-cancerous) cells or tissues can be used to identify the presence of aberrant methylation patterns in a biospecimen that could be an indication of cancer-derived DNA.
- the number of fragments that have aberrant methylation patterns in a biospecimen can be used to infer the amount of cancer cell death contributing to tumor-derived cell-free DNA in the biospecimen.
- the number of fragments that have disease-associated methylation patterns in a biospecimen can be used to infer the amount of disease-associated cell-free DNA in the biospecimen.
- measurement of the number of fragments with cancer-associated or disease-associated methylation patterns can aid in evaluating the extent or degree of the cancer or disease.
- disclosed methods can be used for clinical purposes. In some implementations, disclosed methods can be used for research purposes. In some implementations, disclosed methods can be used to determine if a person has a disease state. In some implementations, disclosed methods can be used to determine if a person has cancer. In some implementations, disclosed methods can be used to aid in early detection of cancer. For example, the detection of cancer-specific hypermethylated DNA fragment patterns in a clinical biospecimen such as plasma or urine can be used to identify patients who are likely to have cancer. In some implementations, disclosed methods can be used to estimate probabilities that a cancer originated from a particular type of tissue. For example, different cancer types are known to have cancer-type-specific methylation patterns.
- hypermethylated DNA patterns can be compared to expected patterns for various types of cancer to find similarities in patterns which can suggest that the hypermethylated DNA fragments were derived from a particular type of cancer.
- disclosed methods can be used to assess the stage of a cancer, the extent of a cancer, or the burden of tumor. For example, increased amounts of DNA fragments bearing tumor-specific methylation patterns in a biospecimen such as plasma may indicate a greater amount of tumor-DNA shedding which may be associated with a greater tumor burden (or cancer stage).
- disclosed methods can be used to assess prognosis of a disease based on evaluation of either the amount or the pattern of disease-specific hypermethylated DNA fragments, or both.
- disclosed methods can be used to assess the regression or progression of cancer. For example, changes over time in levels of DNA fragments bearing tumor-specific methylation patterns in a biofluid may indicate a corresponding change in the tumor burden of the patient.
- disclosed methods can be used to assess treatment response to cancer therapy. For example, it has been shown in many studies that a patient whose cancer is responding to therapy will often have a decrease over time in tumor-derived cell-free DNA (cfDNA) fragments measurable in his or her plasma. In some patients, tumor-derived cell-free DNA is shed at a higher rate initially as cancer cells are killed by the therapy and spill their DNA into the blood (a transient spike).
- tumor-derived cfDNA would be expected to decrease.
- changes in tumor-derived cell-free DNA levels can be measured by quantifying the amount of DNA fragments bearing tumor-specific methylation patterns.
- disclosed methods can be used to assess the presence of residual cancer after a patient receives a curative-intent therapy.
- a patient’s plasma can be tested following curative-intent therapy to detect the presence of cfDNA fragments containing cancer-associated methylation patterns.
- detection of small amounts of residual cancer after a curative-intent therapy can be challenging due to the very small amount of tumor-derived cfDNA fragments that may be shed into the blood.
- a patient-specific pattern of aberrant hypermethylation can be identified by initial testing of a biospecimen from that patient (for example, testing of tumor tissue or pre-treatment plasma).
- such a patient-specific pattern can be used to personalize the signal detection algorithm to improve detection sensitivity.
- a tumor-specific set of aberrantly hypermethylated CpG islands could be identified for a particular patient by testing tumor tissue or pre-treatment plasma of said patient (in which cancer-specific hypermethylated fragments are likely to be more abundant, providing a stronger signal).
- by identifying such aberrantly hypermethylated genomic regions that are specific to a particular patient’s tumor(s) one could look in post-treatment plasma for residual hypermethylation signal mapping to those same genomic regions (which would suggest the presence of persistent cancer after therapy).
- when applied to measurement of cancer signals in a subsequent biospecimen (e.g., post-treatment plasma), a personalized algorithm could assign greater weight to signals from hypermethylated DNA fragments that match aberrant methylation patterns already identified in that patient’s tumor tissue or pre-treatment plasma.
- the disclosed methods do not require any physical or experimental alterations to the assay, as personalization could be achieved simply by bioinformatic modifications.
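The purely bioinformatic personalization described above can be illustrated with a minimal sketch. It assumes that each enriched fragment has already been assigned to a named genomic region; the region labels, the function name `personalized_score`, and the weight values are hypothetical and are not taken from the disclosure.

```python
from typing import Iterable, Set


def personalized_score(fragment_regions: Iterable[str],
                       patient_regions: Set[str],
                       matched_weight: float = 5.0,
                       default_weight: float = 1.0) -> float:
    """Sum one weight per hypermethylated fragment.

    Fragments mapping to regions already found hypermethylated in this
    patient's tumor tissue or pre-treatment plasma receive the larger
    (hypothetical) weight; all other fragments receive the default weight.
    """
    return sum(matched_weight if region in patient_regions else default_weight
               for region in fragment_regions)


# Hypothetical region labels identified at baseline for one patient.
baseline_regions = {"chr1:CGI_17", "chr7:CGI_204"}

# Hypothetical regions hit by hypermethylated fragments in post-treatment plasma.
post_treatment = ["chr1:CGI_17", "chr3:CGI_9", "chr7:CGI_204", "chr7:CGI_204"]

score = personalized_score(post_treatment, baseline_regions)
print(score)
```

Because the weighting is applied after sequencing, the same sequencing data could be rescored with a different baseline region set without repeating the assay.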
- disclosed methods can be used to assess recurrence of disease after a patient receives cancer therapy. Early detection of recurrent cancer can also require very high detection sensitivity.
- a personalized signal detection approach could also be employed for this purpose.
- disclosed methods can be used to monitor changes in tumor-specific methylation patterns over time in a patient to assess the epigenetic evolution of a tumor. Because the disclosed methods are able to assess hypermethylated CpG island and/or promoter sequences from anywhere in a genome, in some implementations, the methods can dynamically capture the evolution of hypermethylation patterns in a tumor over time. This can be done without pre-defining genomic target regions based on sequence-specific targeting. In some implementations, disclosed methods can be used to identify epigenetic mechanisms of resistance to drug therapy. Because changes in methylation patterns can be monitored dynamically over time in an untargeted manner, in some implementations, the disclosed methods can enable identification of methylation changes that give rise to drug resistance without requiring pre-defined hypotheses for resistance mechanisms.
- monitoring of dynamic changes in CpG island and/or promoter hypermethylation patterns could enable assessment of evolving cell states (e.g., epithelial to mesenchymal transition, transformation from adenocarcinoma to small cell carcinoma, etc.).
- disclosed methods can be used to assess a variety of pathologies by identifying tissue-specific patterns of hypermethylation in blood. For example, a patient with liver cirrhosis may shed liver-derived DNA into the bloodstream, allowing DNA fragments with liver-specific methylation patterns to be detected at higher levels than in the general population. In some cases, the amount of such a signal could be correlated with the severity or extent of disease. In some implementations, changes in tissue-specific hypermethylation signals over time could be used to monitor for exacerbations or improvements in a disease process.
- methylation patterns derived from non-diseased cells
- significant changes in disease-related methylation signals can be more readily identified by comparing signal in the same patient at different time points rather than comparing a patient’s signal against measurements in a population.
- Similar analysis could be applied to methylation patterns that are specific to other organs including but not limited to: kidney, heart, lung, brain, muscles, bones, intestines, and pancreas.
- disclosed methods can be used to assess transplanted organ rejection based on measurement of organ-specific methylation patterns.
- disclosed methods can be used to assess methylation patterns of fetal or placental DNA from the maternal circulation in pregnancy.
- disclosed methods can be used to assess changes in cells of a person’s immune system as an indication of health or disease.
- disclosed methods can be used to assess aging. For example, changes in DNA methylation are known to occur as individuals age. Such changes could be measured to evaluate health status via an assessment of epigenetic age.
- Biological aging can be measured via epigenetic clocks, which are based on evaluation of DNA methylation changes. These clocks, such as the Horvath and Hannum clocks, analyze the methylation status of specific age-associated CpG sites across the genome.
- disclosed methods which enrich densely methylated DNA fragments mapping to CpG islands can be used to evaluate methylation levels and/or patterns that can provide an estimation of biological age.
- organ-specific methylation changes could be evaluated to assess pathology or stress in an organ. For example, an individual who has a long history of excessive alcohol consumption may have a disproportionately high epigenetically measured age of his or her liver.
- disclosed methods can be used in biomedical research to characterize hypermethylation patterns that can provide an assessment of gene expression states.
- disclosed methods can be used in biomedical research to identify hypermethylation patterns to provide an understanding of fundamental cellular or developmental epigenetic processes.
- disclosed methods can be used in biomedical research or clinical applications to evaluate methylation patterns in single cells or small clusters of cells because some methods are compatible with analysis of very small amounts of input DNA.
- disclosed methods can be used to characterize methylation patterns in an ovum, sperm, or embryo to guide clinical decisions pertaining to in vitro fertilization.
- disclosed methods can be used to evaluate methylation patterns in DNA derived from vertebrate organisms.
- CpG islands, which are genomic regions that have a high density of CpG sites, are found in the genomes of nearly all vertebrate organisms. Because disclosed methods can enrich densely methylated DNA fragments regardless of the genomic origin of said fragments, disclosed methods can be applied to analysis of methylation patterns in human and/or non-human vertebrate species.
- disclosed methods can be used in veterinary medical applications in a manner that is analogous to human medical applications.
- disclosed methods can be used to detect, diagnose, and/or monitor cancer in vertebrate animals, including but not limited to household pets, farm animals, and horses.
- disclosed methods can be used to detect, diagnose, and/or monitor various diseases in vertebrate animals.
- disclosed methods can be used for agricultural applications to detect, diagnose, and/or monitor disease in livestock.
- disclosed methods can be used in biomedical research applications to study methylation patterns in model organisms including but not limited to mice, rats, frogs, and fish.
- disclosed methods can be used in laboratory animals having human xenografted tumors.
- disclosed methods can be used to distinguish methylation patterns arising from xenografted tumor cells versus from the host animal’s cells and/or tissues.
- disclosed methods can be used to enrich densely methylated DNA fragments arising from a virus.
- methylation state of DNA in a DNA virus can change depending on whether the DNA is in the virion or in a host cell, and also depending on the state of the host cell.
- the Epstein-Barr Virus genome is known to be unmethylated in virions but becomes highly methylated during latent infection and in transformed B cells.
- the proliferation and turnover of EBV-infected B cells can lead to increased shedding of hypermethylated EBV DNA into plasma, which can be exploited as a biomarker signal for lymphoma detection.
- the DNA of several viruses has been observed to become hypermethylated in virus-associated malignancies (e.g.
- because hypermethylated viral DNA has a methylation density similar to that of CpG islands in vertebrate genomes, it can be enriched from complex DNA mixtures (for example, cell-free DNA) using methods disclosed herein.
- disclosed methods can be used to enrich densely methylated viral DNA in parallel with densely methylated human and/or vertebrate animal DNA.
- disclosed methods can be used to measure densely methylated viral DNA as a biomarker of cancer.
- disclosed methods can be used to enrich densely methylated DNA fragments arising from Epstein-Barr Virus as a biomarker for detection of lymphoma in patients with HIV. In some implementations, disclosed methods can be used to enrich densely methylated DNA fragments arising from Epstein-Barr Virus as a biomarker for detection of post-transplant lymphoproliferative disorder (PTLD) in patients receiving immunosuppressive therapy after organ transplantation. In some implementations, disclosed methods can be used to measure densely methylated viral DNA as a biomarker to evaluate latent or lytic viral state. In some implementations, disclosed methods can be used to measure densely methylated viral DNA as a biomarker of disease involving shedding of said viral DNA from infected cells. For example, shedding of hepatitis B or hepatitis C viral DNA into blood could serve as a biomarker of liver cell death.
- disclosed methods can be combined with spatially encoded DNA barcoding techniques to permit genome-wide analysis of methylation patterns at CpG islands and/or promoters in tissues in a spatially resolved manner.
- spatially encoded DNA barcodes can be incorporated in or added to sequencing adapters.
- spatial DNA barcodes can be attached by ligation.
- spatial DNA barcodes can be attached by primer extension.
- spatial DNA barcode attachment can be facilitated by a transposase.
- disclosed methods can be used to evaluate additional features of enriched densely methylated DNA fragments including but not limited to mutations, DNA fragment size, fragment location within the genome, and/or nucleosome protection pattern.
- information gained from analysis of such additional DNA features could enable improved biomarker performance compared to analysis of DNA methylation patterns alone.
- disclosed methods for enriching densely methylated DNA fragments can be preceded by chromatin immunoprecipitation (ChIP) to selectively enrich DNA fragments associated with histones having particular modifications.
- immunoprecipitation using antibodies that specifically bind to, for example, Histone H3K27me3, Histone H3K9me3, Histone H3K4me3, and/or Histone H3K27ac could be used to enrich DNA fragments based on chromatin features prior to enrichment based on methylation density.
- such sequential, orthogonal enrichment steps could yield more nuanced biomarker signals and/or improve the sampling of cancer-specific signals.
- the ability to directly determine methylation status of CpG sites from converted sequence data results in greater accuracy in measurement of densely methylated DNA fragments.
- methods such as MeDIP or MBD Capture enrich methylated DNA directly from biological sources without conversion, and captured fragments are presumed to be methylated because they were captured based on methylation-specific binding.
- some fragments that have zero or few methylated CpG sites can also be non-specifically captured. Without the ability to assess methylation status, such fragments may be incorrectly presumed to have high CpG methylation content, thereby adding background noise and reducing the accuracy of an assay.
- the methods described herein can improve the accuracy of measuring densely methylated DNA fragments because enriched fragments are converted, and can be verified to have a high CpG methylation density by comparison to aligned reference genomic sequences.
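The verification step described above can be sketched as a per-fragment count of retained cytosines at reference CpG positions. This is a simplified illustration, not the analysis pipeline of the disclosure: it assumes an ungapped alignment of a converted read to the top strand of the reference, and the function name and example sequences are hypothetical.

```python
from typing import Tuple


def count_methylated_cpgs(reference: str, read: str) -> Tuple[int, int]:
    """Return (methylated, total) CpG counts for one aligned, converted read.

    A reference CpG whose cytosine is still read as C is scored as
    methylated; one read as T is scored as unmethylated (converted).
    Assumes an ungapped, top-strand alignment of equal length.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be the same length")
    reference, read = reference.upper(), read.upper()
    methylated = 0
    total = 0
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            total += 1
            if read[i] == "C":
                methylated += 1
    return methylated, total


# Hypothetical example: three reference CpGs, two retained as C in the read.
reference = "ACGTTCGACCGA"
read = "ACGTTTGATCGA"
print(count_methylated_cpgs(reference, read))
```

A fragment captured non-specifically would show few or no retained cytosines at reference CpG positions and could be excluded on that basis.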
- kits for performing the methods disclosed herein can comprise the reagents and materials necessary for the conversion of DNA, for PCR amplification, for CpG methylation, and for enrichment of densely methylated DNA.
- a kit for performing the methods disclosed herein can additionally comprise reagents and materials necessary for production of next-generation sequencing libraries.
- a kit for performing the methods disclosed herein can additionally comprise instructions and quality control materials to ensure accurate and reproducible results.
- a kit for performing the methods disclosed herein can additionally comprise software and/or access to computational resources to enable analysis of next-generation sequencing data.
- a method of attaching adapter oligonucleotides to double-stranded DNA fragments can be employed which utilizes two sequential enzymatic ligation steps to minimize formation of adapter dimers.
- the adapters can be ligated to double-stranded DNA fragments of interest for the purpose of facilitating analysis by next-generation sequencing.
- Adapter dimers can be formed via ligation of one adapter molecule to another adapter molecule.
- Adapter dimers can be problematic for next-generation sequencing (NGS) libraries. These dimers can dominate the sequencing output, overwhelming the sequence output from the desired DNA fragments of interest.
- Adapter dimers are more likely to form when the DNA fragments of interest are very low in abundance, as reaction stoichiometry in such situations favors ligation of adapters to other adapters over ligation of adapters to the DNA fragments of interest.
- Adapter dimers can also be more efficiently amplified during PCR than the desired product of adapters ligated to the DNA fragments of interest because of the dimers’ short length (generally, shorter targets amplify more efficiently in PCR). Therefore, it is important to minimize formation of adapter dimers, especially when DNA input quantities for NGS library preparation are low.
- a two-step ligation method disclosed herein is able to greatly reduce adapter dimer formation.
- the method uses stem-loop adapters that lack a 5’-phosphate, which would be required for adapter self-ligation.
- a stem-loop adapter is able to ligate via its 3’-end to a 5’-phosphorylated strand of a double-stranded DNA fragment of interest (insert DNA), but not to the opposite strand.
- a stem-loop adapter molecule is unable to ligate to another stem-loop adapter molecule because of the lack of 5’-phosphate ends on adapter molecules.
- a second oligonucleotide having a 5’-phosphate can be hybridized to the ligated stem-loop adapter (by displacing one strand of DNA at the stem) and then ligated to the target DNA in a second enzymatic step.
- the adapter used in the first ligation step is a stem-loop adapter.
- the adapter used in the first ligation step can comprise two strands which are partially complementary and hybridized (known in the art as a Y-shaped adapter).
- the adapter used in the first step can comprise two strands of DNA: a first strand having a 3’-end that is available for ligation to a 5’-phosphorylated DNA fragment of interest, and a second strand that is either partially or fully hybridized to the first strand in a manner that would enable a DNA ligase to catalyze ligation of the first strand to the DNA of interest but wherein the second strand lacks a 5’-phosphate and therefore cannot be ligated.
- the target DNA fragments can be blunt-ended.
- the target DNA fragments can have overhangs at their ends, such as a 3’-dA tail.
- the first ligation step can utilize a DNA ligase that has optimal efficiency for ligation of double-stranded DNA (such as T4 DNA ligase or NEBNext Ultra II DNA ligase).
- the DNA ligase and the excess unligated adapter molecules can be removed prior to the second ligation step by performing a DNA clean-up step.
- cleavable positions (such as dU) can be incorporated into adapter oligonucleotides to facilitate hybridization of the displacer oligonucleotides in the second ligation step.
- a 5’-phosphorylated displacer molecule can be ligated to the double-stranded DNA of interest in the second ligation step using a nick-sealing ligase such as HiFi Taq DNA ligase (New England Biolabs).
- the two-step ligation method disclosed herein can enable ligation of adapters and displacer oligonucleotides to very low amounts of input DNA (double-stranded DNA of interest).
- the two-step ligation method disclosed herein can enable next-generation sequencing from DNA derived from a small number of cells (less than 10 cells or less than 100 cells). In some implementations, the two-step ligation method disclosed herein can enable next-generation sequencing from DNA derived from a single cell.
- Figure 5 shows an example schematic of a two-step ligation scheme that enables ligation of adapter sequences to double-stranded DNA fragments of interest while minimizing formation of adapter dimers.
- the illustrated two-step ligation scheme was used in Example 1 in the Detailed Description section to attach adapter sequences to blunted and 5’-phosphorylated cell-free DNA fragments and genomic DNA fragments.
- the first step involves ligation of stem-loop adapters to the insert DNA by forming a phosphodiester linkage between a 5’-phosphorylated end of the insert DNA and the 3’-hydroxyl end of the stem-loop adapter.
- the 5’-end of the stem-loop adapter lacks a phosphate, and therefore cannot be ligated to the insert or to another stem-loop adapter molecule (thereby avoiding adapter dimer formation).
- USER enzyme is used to cleave at deoxyUridine positions in the stem of the stem-loop adapter to destabilize base-pairing. DNA is then cleaned up to remove unligated adapters, ligase, and USER enzyme.
- a displacer oligonucleotide is added to displace one strand of the stem-loop adapter by hybridization to the opposite (ligated) strand, as shown in the figure.
- a nick-sealing ligase (HiFi Taq DNA ligase) is used to ligate the 5’-phosphorylated displacer oligonucleotide to the 3’-end of the insert DNA.
- the stem-loop adapter and displacer oligonucleotides were designed to attach sample barcodes and Illumina adapter sequences to the input DNA fragments.
- adapter sequences for other sequencing platforms could be readily substituted.
- Blood was collected by venipuncture into a blood collection tube containing potassium-EDTA or containing a proprietary anticoagulation and stabilization cocktail designed to limit cellular degradation and to stabilize cell-free DNA (Cell-free DNA BCT from Streck). Tubes had 10 mL capacity, and at least 8 mL blood volume was required to be collected in each tube. Blood was inverted in the tube several times at the time of collection to ensure even mixing with the anticoagulant and/or stabilizer. Samples were kept at room temperature (20-25°C) during temporary storage and transportation prior to separation of plasma. Plasma was separated and frozen as soon as possible after blood collection, preferably within four hours if collection was in an EDTA tube or within 2 weeks if collection was in a Streck tube.
- the collection tubes were centrifuged at 1000 x g for 10 minutes in a clinical centrifuge with a swinging bucket rotor with slow acceleration and deceleration (brake off). Plasma was removed from the red blood cells and buffy coat using a 1 mL pipette, being careful not to disturb the cells in the tube. The plasma was dispensed into 1.5 mL cryovials in 0.5 to 1 mL aliquots. The plasma was then frozen at -80° C until needed for further processing.
- Blood was obtained from patients with various types of cancer at various stages. For some patients, blood was obtained at multiple time points before and during therapy. Blood was also obtained from individuals who did not have a cancer diagnosis (control subjects). Some of these control subjects had a history of heavy smoking and were participating in a lung cancer screening program based on eligibility according to the guidelines of the United States Preventive Services Task Force. All subjects provided informed written consent for participation in the study, which was approved by the Human Investigation Committee of Yale University.
- Plasma was removed from the -80° C freezer and was thawed at room temperature for 15 to 30 minutes before proceeding with DNA extraction. Thawed plasma was then centrifuged at 6800 x g for 3 minutes to remove any cryoprecipitate. The supernatant was transferred to a fresh tube for further processing.
- a QiaAmp® MinElute® Virus Vacuum Kit (Qiagen) was used for extraction of DNA from plasma volumes up to 1 mL (elution volume as low as 20 µL). For larger volumes of plasma up to 5 mL, the QiaAmp® Circulating Nucleic Acid Kit was used for DNA purification (elution volume as low as 20 µL).
- kits were used according to the manufacturer's instructions, generally eluting the DNA into the lowest recommended volume (preferably 20 µL).
- cRNA carrier RNA
- Qiagen carrier RNA
- Genomic DNA was extracted from frozen tumor tissue samples or cancer cell lines using a DNeasy Blood & Tissue Kit (Qiagen), according to the manufacturer’s instructions.
- Before further processing tissue- or cell-derived gDNA for next-generation sequencing library preparation, the DNA was sheared into fragments with an average length of 180-200 bp using focused ultrasonication (Covaris).
- the cell-free DNA and fragmented gDNA samples were quantified by real-time quantitative PCR using a KAPA Human Genomic DNA Quantification and QC Kit for Illumina platforms (Roche) with the 129 bp Primer Premix, suitable for the expected fragment size distribution of the samples.
- Varying amounts of cell-free DNA or fragmented gDNA were obtained and quantified.
- minimum and maximum input DNA limits were set at 1 ng and 30 ng, respectively.
- a quantitative spike-in control DNA mixture was added to each sample to enable comparison of library preparation efficiency across samples.
- the spike-in control mixture consisted of PhiX 174 RF I DNA (New England Biolabs) that was fragmented to an average size of 180-200 base pairs by ultrasonication (Covaris). Approximately 50% of the fragments in the mixture were unmethylated, and 50% of the fragments had undergone CpG methylation using a CpG Methyltransferase (M.SssI; New England Biolabs) according to the manufacturer’s instructions. A total of approximately 2 picograms of the spike-in control mixture was added to each DNA sample.
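As a rough worked example, the number of spike-in fragments corresponding to the stated mass can be estimated. The sketch below assumes an average mass of about 650 g/mol per double-stranded base pair (a common approximation that is not stated in the source) and a mean fragment length of 190 bp, the midpoint of the stated 180-200 bp range; the function name is hypothetical.

```python
AVOGADRO = 6.022e23        # molecules per mole
G_PER_MOL_PER_BP = 650.0   # assumed average mass of one double-stranded base pair


def fragment_copies(mass_grams: float, fragment_length_bp: float) -> float:
    """Estimate the number of double-stranded fragments in a given mass of DNA."""
    moles = mass_grams / (fragment_length_bp * G_PER_MOL_PER_BP)
    return moles * AVOGADRO


# 2 picograms of spike-in mixture at an assumed mean length of 190 bp.
total_copies = fragment_copies(2e-12, 190)

# About half of the fragments were CpG-methylated with M.SssI.
methylated_copies = 0.5 * total_copies

print(f"total fragments: {total_copies:.2e}")
print(f"methylated fragments: {methylated_copies:.2e}")
```

Under these assumptions, 2 picograms corresponds to roughly ten million fragments, about half of which are methylated.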
- the DNA samples (each in 20 microliters of 10 mM Tris-HCl pH 7.8 buffer) were treated with an enzyme mix comprising T4 DNA Polymerase and T4 Polynucleotide Kinase as provided in the Quick Blunting Kit (New England Biolabs; following manufacturer’s protocol), to produce 5’-phosphorylated, blunt-ended DNA. Enzymes were then heat-inactivated by incubation at 70°C for 10 minutes.
- the blunted, 5’-phosphorylated DNA was then ligated to custom stem-loop oligonucleotide adapters using the NEBNext Ultra II Ligation Module (New England Biolabs) according to the manufacturer’s protocol.
- the custom stem loop adapters and accompanying displacer oligonucleotides were designed to greatly reduce the formation of adapter dimers, thereby enabling preparation of sequencing libraries from very small amounts of input DNA.
- the adapter oligonucleotide sequences are as follows: EMSv2m- 1 AGTXYAAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTTAGAXT
- A two-step scheme of one-strand ligation followed by displacement and ligation of the second strand is shown in Figure 5. Because the 5’-end of the stem-loop adapter was not phosphorylated, an adapter molecule is unable to become ligated to another adapter molecule, thereby minimizing formation of adapter dimers. The 3’-end of the adapter is able to become ligated to the 5’-ends of the double-stranded insert DNA fragment, which is phosphorylated at its 5’-ends.
- the stem-loop adapters also contained deoxyUridine (dU) within the stem sequence to permit site-specific cleavage by USER enzyme (New
- sample-specific barcode sequences were included in the adapter sequence to enable multiplexed sequencing of a plurality of samples on the same lane of a next-generation sequencing instrument (because sequences can be sorted into sample-specific datasets based on their barcode sequence).
- a single uniquely barcoded adapter sequence was used for ligation to each sample, such that 24 individual samples would be ligated to adapters labeled with 24 distinct barcodes (1 to 1 mapping).
- the bioinformatic demultiplexing algorithm would require both ends of the sequence to be labeled with the same barcode.
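The demultiplexing rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the barcodes at the two ends of each sequenced fragment are taken to be already extracted as strings, exact matching is used, and the barcode sequences and function name are hypothetical placeholders rather than the barcodes used in the example.

```python
from typing import Optional, Set


def assign_sample(barcode_end1: str,
                  barcode_end2: str,
                  known_barcodes: Set[str]) -> Optional[str]:
    """Assign a fragment to a sample only if both ends carry the same known barcode.

    Returns the barcode (used here as the sample identifier) or None when the
    two ends disagree or the barcode is not in the expected set.
    """
    if barcode_end1 == barcode_end2 and barcode_end1 in known_barcodes:
        return barcode_end1
    return None


# Hypothetical placeholder barcodes for two samples.
known_barcodes = {"ACGTAC", "TTGCAA"}

print(assign_sample("ACGTAC", "ACGTAC", known_barcodes))  # both ends agree
print(assign_sample("ACGTAC", "TTGCAA", known_barcodes))  # ends disagree
```

Discarding fragments with mismatched end barcodes is one way such a rule could guard against chimeric molecules or barcode misassignment between samples.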
- the concentration of stem-loop oligonucleotide adapter used in the ligation reaction was 1 micromolar in a final reaction volume of 45 microliters.
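To put the stated adapter concentration in context, the molar excess of adapter over insert DNA can be estimated at the stated input limits of 1 ng and 30 ng. The sketch assumes a mean insert length of 190 bp and about 650 g/mol per base pair; both values are approximations introduced for illustration, and the function names are hypothetical.

```python
def picomoles_from_concentration(micromolar: float, microliters: float) -> float:
    """Micromolar multiplied by microliters gives picomoles."""
    return micromolar * microliters


def picomoles_of_dsdna(nanograms: float,
                       length_bp: float,
                       g_per_mol_per_bp: float = 650.0) -> float:
    """Convert a mass of double-stranded DNA to picomoles of fragments.

    ng divided by g/mol gives nmol; multiplying by 1000 gives pmol.
    """
    return nanograms / (length_bp * g_per_mol_per_bp) * 1000.0


adapter_pmol = picomoles_from_concentration(1.0, 45.0)  # 1 micromolar in 45 microliters

# Adapter-to-insert molar ratio at the minimum and maximum input amounts.
ratio_at_1_ng = adapter_pmol / picomoles_of_dsdna(1.0, 190)
ratio_at_30_ng = adapter_pmol / picomoles_of_dsdna(30.0, 190)

print(f"adapter: {adapter_pmol} pmol")
print(f"adapter:insert at 1 ng: {ratio_at_1_ng:.0f}")
print(f"adapter:insert at 30 ng: {ratio_at_30_ng:.0f}")
```

Under these assumptions the adapter is in several-thousand-fold molar excess at the lowest input, which is the stoichiometric situation in which adapter designs that cannot self-ligate are most useful.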
- USER enzyme (New England Biolabs) was added at the manufacturer’s recommended concentration and the sample was incubated at 37°C for 30 minutes to cleave dU sites in the adapters.
- DNA was then cleaned up to remove enzymes, buffers, and unligated adapters using AMPure XP beads (Beckman Coulter) according to the manufacturer’s protocol (adding 1.3 x the volume of bead slurry relative to the reaction volume to be purified).
- the clean-up process included wash steps followed by elution in 12 microliters of 10 mM Tris-HCl pH 7.8 for each sample.
- displacer oligonucleotides, which have a phosphorylated 5’-end, were hybridized to the complementary sequence of the ligated adapter oligonucleotide and were ligated using HiFi Taq Ligase (New England Biolabs) according to the manufacturer’s protocol.
- HiFi Taq Ligase efficiently seals nicks in DNA with very high fidelity, exhibiting greatly reduced ligation efficiency if there are mismatched base pairs at either side of the ligation junction.
- the two-step ligation method which greatly reduces adapter dimer formation is schematized in Figure 5.
- the sequence of the displacer oligonucleotide (with 24 distinct barcode sequences that match the barcode sequences of the stem-loop adapter oligonucleotides) is as follows:
- the displacer oligonucleotide used in the second ligation step had the same barcode as the stem-loop adapter used in the first ligation step, to ensure perfect base pairing between the displacer and adapter sequences in the vicinity of the ligation junction.
- the concentration of the displacer oligonucleotide in the reaction was 0.5 micromolar, and the final reaction volume was 25 microliters (for each sample).
- the reaction was incubated at 60°C for 30 minutes.
- Ligated DNA was then cleaned up using AMPure XP beads (Beckman Coulter) according to the manufacturer’s protocol (adding 1.3 x the volume of bead slurry relative to the reaction volume to be purified).
- Enzymatic conversion of ligated DNA was performed using the NEBNext® Enzymatic Methyl-seq (EM-seq) Conversion Module kit (New England Biolabs), according to the manufacturer’s instructions.
- This method is an alternative to bisulfite conversion, causing less damage, fragmentation, GC bias, and degradation of DNA.
- the method enables identification of 5-methylcytosine (5-mC) and 5-hydroxymethylcytosine (5-hmC) bases in DNA by efficiently converting unmodified cytosine bases (not 5-mC or 5-hmC) to uracil bases.
- the EM-seq method comprises two steps: (1) The enzyme TET2 is used to oxidize 5-mC and 5-hmC to 5-carboxycytosine (5-caC), providing protection from deamination by APOBEC enzyme; (2) The enzyme APOBEC is used to deaminate unmodified cytosines to uracils, while the 5-mC and 5-hmC bases which were oxidized to 5-caC in the first step are protected from deamination. Between the two steps, TET2-converted DNA was cleaned up according to the manufacturer’s protocol.
- APOBEC-mediated deamination of cytosine is more efficient with single-stranded DNA
- formamide was used to denature the DNA prior to the APOBEC enzymatic reaction, according to the manufacturer’s protocol.
- the ligated adapter and displacer oligonucleotides contained several 5-mC positions which were protected from deamination and conversion to uracils.
- Converted DNA was then cleaned up using AMPure XP beads (Beckman Coulter) according to the manufacturer’s protocol (adding 1.3 x the volume of bead slurry relative to the reaction volume to be purified). Cleaned-up DNA was eluted in 20 microliters of 10 mM Tris-HCl pH 7.8 for each batch of 12 samples.
- PCR polymerase chain reaction
- PCR-amplification was carried out using the NEBNext® Q5U Master Mix (New England Biolabs), according to the manufacturer’s protocol.
- the Q5U high fidelity DNA polymerase harbors a mutation which enables amplification of templates containing uracil bases.
- PCR primers were designed to hybridize to the adapter and displacer sequences, and the primers comprised the following sequences (5-carboxy-cytosine bases were included to prevent methylation at those bases in subsequent steps):
- Primers were added to the reaction at a final concentration of 200 nanomolar for each primer.
- SYBR Green I dye (Thermo Fisher Scientific) was added to the PCR reaction at the concentration recommended by the manufacturer to permit fluorescence-based measurement of double-stranded DNA amplification during real-time quantitative PCR.
- Quantitative PCR was carried out on a CFX96™ System (Bio-Rad) thermocycler, and change in fluorescence signal during the reactions was monitored in real-time. Samples were removed from the thermocycler as the amplification neared saturation (plateau of fluorescence signal), but approximately 2-3 cycles prior to reaching saturation.
- Thermocycling parameters were as follows: (1) 98°C for 30 seconds, (2) 98°C for 10 seconds, (3) 62°C for 30 seconds, (4) 65°C for 60 seconds, (5) repeat thermocycling steps #2-4 until the real-time fluorescence signal begins to plateau. Samples were removed from the thermocycler after the 65°C extension step, approximately 2-3 cycles prior to reaching plateau of fluorescence.
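The decision to stop cycling shortly before plateau could, in principle, be approximated computationally. The sketch below is one hypothetical heuristic and is not the procedure described in the source, which relied on real-time monitoring of the fluorescence signal: it flags the first cycle at which the per-cycle gain in fluorescence drops below a chosen fraction of the largest gain seen so far. The function name, the `drop_fraction` parameter, and the fluorescence values are invented for illustration.

```python
from typing import Optional, Sequence


def stop_cycle(fluorescence: Sequence[float],
               drop_fraction: float = 0.8) -> Optional[int]:
    """Return the 1-based cycle at which amplification first decelerates.

    Deceleration is defined (as an assumption) as a per-cycle fluorescence
    gain smaller than drop_fraction times the largest gain observed so far.
    Returns None if no such cycle is found.
    """
    best_gain = 0.0
    for index in range(1, len(fluorescence)):
        gain = fluorescence[index] - fluorescence[index - 1]
        if best_gain > 0 and gain < drop_fraction * best_gain:
            return index + 1  # convert 0-based index to 1-based cycle number
        best_gain = max(best_gain, gain)
    return None


# Invented fluorescence readings, one per cycle, flattening near 75 units.
readings = [1, 1, 1, 2, 4, 8, 16, 30, 50, 65, 72, 75]
print(stop_cycle(readings))
```

With these invented readings the heuristic flags cycle 10, two cycles before the last reading, where the signal has nearly flattened. Real traces would likely need baseline subtraction and smoothing before such a rule could be applied.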
- uracil bases in the template DNA were replaced with thymine bases in the DNA copies, whereas 5-carboxycytosine (oxidation product of 5-mC and/or 5-hmC) bases in the template DNA were replaced with cytosine bases in the DNA copies.
- methylated cytosine bases (5-mC or 5-hmC) in the original template DNA were retained as cytosine bases in the converted, PCR-amplified copies.
- Any unmodified cytosine bases in the original template DNA were converted to thymine bases in the converted, PCR-amplified copies. Notably, the conversion process did not achieve completely accurate discrimination of methylated vs.
- PCR-amplified DNA was then cleaned up using AMPure XP beads (Beckman Coulter) according to the manufacturer’s protocol (adding 1.3 x the volume of bead slurry relative to the reaction volume to be purified). Cleaned-up DNA was eluted in 12 microliters of 10 mM Tris-HCl pH 7.8 for each batch of 12 samples.
- the double-stranded, converted, amplified DNA copies underwent CpG methylation using M.SssI according to the manufacturer’s protocol (including a buffer supplemented with S-adenosylmethionine).
- CpG sites CG dinucleotides
- TG dinucleotides or CA if an unmethylated C on the opposite strand was converted
- cytosines that were either methylated or unmethylated outside of a CpG context were not methylated by M.SssI.
- the methylation pattern at CpG sites in an original template DNA molecule could be reconstituted on the DNA copies after conversion and PCR-amplification using a CpG methyltransferase.
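The single-strand logic of conversion, amplification, and CpG methyltransferase treatment can be illustrated with a minimal Python sketch. The function names and the toy sequence are hypothetical and are not part of the disclosed protocol. The model tracks only one strand and assumes perfect conversion, which the text notes was not achieved in practice.

```python
def convert_and_amplify(seq, methylated):
    """Return the sequence of a PCR copy of a converted template strand.

    Unmodified cytosines read as T in the copy; cytosines whose positions
    are listed in `methylated` (5-mC or 5-hmC in the template) stay C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def restore_cpg_methylation(copy):
    """Positions a CpG methyltransferase would methylate: each C followed by G."""
    return {i for i in range(len(copy) - 1) if copy[i:i + 2] == "CG"}


# Toy template (hypothetical): positions 1 and 10 are methylated CpG cytosines,
# position 4 is an unmethylated CpG cytosine, position 7 is a methylated
# cytosine outside a CpG context.
template = "ACGTCGACATCG"
methylated = {1, 7, 10}

copy = convert_and_amplify(template, methylated)
restored = restore_cpg_methylation(copy)
```

In this toy case the unmethylated CpG becomes TG and is not re-methylated, while the methylated non-CpG cytosine survives as C but lies outside a CpG context, so only the two originally methylated CpG positions are restored.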
- Converted, amplified DNA copies with methylation patterns restored were then cleaned up using AMPure XP beads (Beckman Coulter) according to the manufacturer’s protocol (adding 1.3 x the volume of bead slurry relative to the reaction volume to be purified).
- Cleaned-up DNA was eluted in 12 microliters of 10 mM Tris-HCl pH 7.8 for each batch of 12 samples.
- MBD Methyl-CpG-binding domain
- the MethylCap Kit first bound MethylCap proteins to methylated DNA fragments in solution, and then the complexes were captured with glutathione-coated magnetic beads. A magnetic field was used to isolate the beads, and after two wash steps, the methylated DNA was eluted from the beads into a high salt buffer provided in the kit. A densely methylated competitor DNA (1 microgram) was mixed with the methylated library DNA copies prior to binding with MethylCap protein to reduce the probability of capture of less densely methylated DNA fragments. The competitive capture and elution process was repeated a second time to yield a library with even lower representation of fragments with low or moderate methylation density (less than 8 methylated CpG sites per DNA fragment).
- the competitor DNA consisted of a double-stranded, 226 base-pair long PCR product (amplicon) containing 10 CpG sites, derived from amplification of a segment of PhiX174 RF I phage DNA (New England Biolabs). Importantly, the competitor DNA did not contain sequences that would be required for next-generation sequencing (e.g., Illumina Read 1 and Read 2 sequences), so although the competitor DNA was captured and eluted along with the densely methylated library DNA fragments, the competitor DNA was not able to participate in downstream PCR or sequencing reactions.
- the following primers were used to generate the PCR-amplified competitor DNA: 10CpGFWD:
- PCR was performed using EmeraldAMP® Max HS PCR Mastermix (Takara) to produce a high yield of amplified DNA according to the manufacturer’s instructions, using 0.5 ng of PhiX174 RF I phage DNA (New England Biolabs) as a template and 0.5 micromolar concentration of each primer. Thermocycling parameters were set according to the manufacturer’s recommendations, with an annealing temperature of 60°C. PCR was carried out to saturation (plateau phase) to maximize product yield. The PCR product was purified using QIAquick PCR Purification kit (Qiagen) according to the manufacturer’s protocol. The purified DNA underwent CpG methylation using M.SssI (New England Biolabs) according to the manufacturer’s protocol.
- Methylated competitor DNA was again purified using a QIAquick PCR Purification kit (Qiagen) according to the manufacturer’s protocol. For each round of competitive capture performed on a batch of 12 samples, 1 microgram of methylated competitor DNA was used.
- the selectively captured densely methylated DNA copies were then further amplified by PCR to produce enough DNA library copies for loading onto a flow cell of an Illumina NovaSeq next-generation sequencing instrument.
- the PCR amplification was carried out using NEBNext® Dual Index Primers for Illumina® (with 8 base-pair indices) according to the manufacturer’s protocol.
- a distinct Illumina index pair was used for each batch of 12 samples that were intended to be sequenced on the same lane of the sequencing flow cell (allowing multiple batches of 12 samples to be sequenced in a multiplexed fashion on a single flow cell lane). As many as 8 batches (96 samples total) have been successfully multiplexed on a single flow cell lane.
- NEBNext® Q5U Master Mix (New England Biolabs) was used for the PCR amplification.
- Primers were added to the reaction at a final concentration of 200 nanomolar for each primer.
- SYBR Green I dye (Thermo Fisher Scientific) was added to the PCR reaction at the concentration recommended by the manufacturer to permit fluorescence-based measurement of double-stranded DNA amplification during real-time quantitative PCR.
- Quantitative PCR was carried out on a CFX96™ System (Bio-Rad) thermocycler, and change in fluorescence signal during the reactions was monitored in real-time.
- thermocycling parameters were used: (1) 98°C for 30 seconds, (2) 98°C for 10 seconds, (3) 62°C for 30 seconds, (4) 65°C for 60 seconds, (5) repeat thermocycling steps #2-4 until the real-time fluorescence signal begins to plateau. Samples were removed from the thermocycler after the 65°C extension step, approximately 1-2 cycles prior to reaching plateau of fluorescence signal. Amplified, indexed sequencing libraries were then cleaned up using AMPure XP beads (Beckman Coulter) according to the manufacturer’s protocol (adding 1.3 x the volume of bead slurry relative to the reaction volume to be purified). Cleaned-up DNA was eluted in 12 microliters of 10 mM Tris-HCl pH 7.8 for each batch of 12 samples.
- the amplified, indexed libraries were further purified on a precast E-Gel™ SizeSelect™ II Agarose Gel, 2% (Thermo Fisher) using an E-Gel Power Snap Electrophoresis System (Thermo Fisher), according to the manufacturer’s protocol.
- a DNA ladder run in a separate gel lane as a size reference, a band in the size range of approximately 320-360 base-pairs (representing the expected size of the library) was recovered from the gel for libraries produced from cfDNA with a mononucleosomal size distribution.
- a library was produced from sheared genomic DNA, a broader size distribution was expected due to random fragmentation, and accordingly a broader band was recovered in a size range of approximately 300-380 base pairs.
- the DNA was recovered in deionized water and could be used without further purification as a library input for next-generation sequencing on an Illumina flow cell (after appropriate adjustment of concentration).
- Next-generation sequencing. To prepare the library for loading onto an Illumina NovaSeq flow cell, the concentration of DNA was measured using a KAPA Library Quantification Kit (Roche) according to the manufacturer’s protocol. The size profile and concentration of the libraries were also evaluated on a Bioanalyzer (Agilent). Libraries were diluted to the concentration recommended for the flow cell to be used (both S1 and S4 flow cells were used in different experiments). Cluster formation was carried out on the flow cell according to Illumina’s protocol. Sequencing was performed on a NovaSeq 6000 instrument in multiplexed paired-end mode, with a read length of 150 base pairs in each direction (2 x 150 bp mode). Two index reads were also performed, with read lengths of 8 bases each. Data was output to a server from which files could be downloaded for further processing.
- the sequence output from the Illumina sequencer was analyzed according to the following general scheme. First, read pairs were demultiplexed based on Illumina indexes to sort read pairs arising from different sample batches. Then, read pairs were further sorted based on sample barcodes to yield sample-specific sets of read pairs. Read pairs were discarded if their sample barcode sequences did not exactly match one of the used barcodes or if the barcodes of a pair of reads did not match each other. Low-quality reads were also filtered out according to quality filtering parameters recommended by Illumina. Next, any adapter sequences identified at the ends of reads were trimmed.
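The sample-barcode rule in this scheme (exact match to a used barcode, and agreement between the two reads of a pair) can be sketched as follows. This is a minimal illustration only: the barcode sequences, sample names, and function name are hypothetical, and the sketch assumes the barcodes have already been extracted from each read in the same orientation.

```python
def assign_sample(barcode_read1, barcode_read2, barcode_to_sample):
    """Return the sample for a read pair, or None if the pair is discarded.

    A pair is kept only if both reads carry the same barcode and that
    barcode exactly matches one of the barcodes used in the batch.
    """
    if barcode_read1 != barcode_read2:
        return None  # barcodes of the two reads disagree
    return barcode_to_sample.get(barcode_read1)  # None if barcode is unknown


# Hypothetical barcodes for illustration only.
used = {"ACGTAC": "sample_01", "TGCATG": "sample_02"}
kept = assign_sample("ACGTAC", "ACGTAC", used)
mismatched = assign_sample("ACGTAC", "TGCATG", used)
unknown = assign_sample("GGGGGG", "GGGGGG", used)
```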
- Each read-pair from a given cluster was then joined by overlapping the 3’-regions to re-create a full sequence of a DNA insert fragment (merged read pairs). Any read-pairs that had less than 95% sequence agreement in their overlapping 3’-regions (imperfect complementarity) were discarded because such discrepancies would be indicative of sequencer errors.
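A minimal sketch of joining a read pair by its overlapping 3’-regions is given here. It is not the software used in the example: the function names, the minimum overlap of 10 bases, and the longest-overlap-first search are assumptions made for illustration, and quality scores are ignored.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T, N."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_read_pair(read1, read2, min_overlap=10, min_agreement=0.95):
    """Merge a read pair into one insert sequence, or return None to discard.

    The reverse complement of read 2 is slid against the 3' end of read 1,
    trying the longest overlap first; the first overlap whose fraction of
    agreeing bases reaches `min_agreement` is accepted.
    """
    mate = reverse_complement(read2)
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        matches = sum(a == b for a, b in zip(read1[-overlap:], mate[:overlap]))
        if matches / overlap >= min_agreement:
            return read1 + mate[overlap:]
    return None  # no overlap met the agreement threshold


# Toy 30-base insert read with 20-base reads from each end (hypothetical).
fragment = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
read1 = fragment[:20]
read2 = reverse_complement(fragment[10:])
merged = merge_read_pair(read1, read2)
```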
- an initial de-duplication was performed to remove any replicate sequences that had exactly identical sequences. Such deduplicated sequences were annotated to record the number of replicate sequences that were collapsed into a single sequence.
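Collapsing exact duplicates while recording how many replicates each unique sequence represents can be sketched with the standard library. The function name is hypothetical and the sketch assumes the merged sequences fit in memory.

```python
from collections import Counter


def collapse_exact_duplicates(sequences):
    """Return (sequence, replicate_count) pairs in order of first appearance."""
    return list(Counter(sequences).items())


collapsed = collapse_exact_duplicates(["AAC", "GGT", "AAC", "AAC"])
```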
- Resulting sequences were then further processed using Bismark software (Babraham Bioinformatics Institute) to map sequences to the human genome (using an in silico C to T converted reference genome) and to perform methylation status calling (using an unconverted reference genome).
- Build version hg38 of the human reference genome was used.
- Bismark used the short read aligner Bowtie 2 to map sequences to the human genome.
- a further de-duplication step was performed by Bismark to remove alignments mapping to the same position (including start and end positions) in the genome, unless the sequences aligned to the same genomic position but on different strands.
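The position-based rule can be expressed as keeping one alignment per combination of chromosome, start, end, and strand. The sketch below only illustrates that rule and is not Bismark's implementation; the dictionary keys and function name are assumptions.

```python
def deduplicate_by_position(alignments):
    """Keep the first alignment seen for each (chrom, start, end, strand)."""
    seen = set()
    kept = []
    for aln in alignments:
        key = (aln["chrom"], aln["start"], aln["end"], aln["strand"])
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept


# Hypothetical alignments: the second is a positional duplicate of the first,
# the third shares coordinates but lies on the opposite strand and is kept.
alignments = [
    {"chrom": "chr2", "start": 100, "end": 270, "strand": "+"},
    {"chrom": "chr2", "start": 100, "end": 270, "strand": "+"},
    {"chrom": "chr2", "start": 100, "end": 270, "strand": "-"},
]
unique = deduplicate_by_position(alignments)
```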
- sequences that were considered to be truly densely methylated sequences were required to meet all of the following filter criteria: (1) must contain no more than 20% cytosines that were read as being methylated outside of a CpG context, (2) must contain a minimum of 10 CpG sites, and (3) must contain a minimum of 80% methylated cytosines at CpG sites.
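The three filter criteria can be written as a single predicate over per-fragment methylation call counts. This is a sketch under stated assumptions: the function and argument names are hypothetical, and criterion (1) is interpreted here as the fraction of non-CpG cytosines that were called methylated, which is one possible reading of the text.

```python
def is_densely_methylated(meth_cpg, unmeth_cpg, meth_non_cpg, unmeth_non_cpg):
    """Apply the three filter criteria to one fragment's methylation calls."""
    n_cpg = meth_cpg + unmeth_cpg
    n_non_cpg = meth_non_cpg + unmeth_non_cpg
    # Assumed reading of criterion (1): methylated fraction among non-CpG cytosines.
    non_cpg_fraction = meth_non_cpg / n_non_cpg if n_non_cpg else 0.0
    return (
        non_cpg_fraction <= 0.20      # (1) at most 20% methylated outside CpG context
        and n_cpg >= 10               # (2) at least 10 CpG sites
        and meth_cpg / n_cpg >= 0.80  # (3) at least 80% of CpG sites methylated
    )


passes = is_densely_methylated(11, 1, 2, 30)
too_few_cpg = is_densely_methylated(8, 0, 0, 30)
```

Criterion (1) acts as a check on incomplete conversion, since a fragment with many apparently methylated non-CpG cytosines is more likely a conversion failure than a truly methylated molecule.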
- WGBS whole genome bisulfite sequencing
- WGBS data from the following cell types were used: alternatively activated macrophage, band form neutrophil, CD14-positive CD16-negative classical monocyte, CD3-negative CD4-positive CD8-positive double positive thymocyte, CD3-positive CD4-positive CD8-positive double positive thymocyte, CD34-negative CD41-positive CD42-positive megakaryocyte cell, CD38-negative naive B cell, CD4-positive alpha-beta T cell, CD4-positive alpha-beta thymocyte, CD8-positive alpha-beta T cell, CD8-positive alpha-beta thymocyte, central memory CD4-positive alpha-beta T cell, central memory CD8-positive alpha-beta T cell, class switched memory B cell, conventional dendritic cell, cytotoxic CD56-dim natural killer cell, effector memory CD4-positive alpha-beta T cell, effector memory CD8-positive alpha-beta T cell
- WGBS data from the following cell types were used: aorta, esophagus, left ventricle, liver, lung, macrophage, natural killer cell, pancreas, primary hematopoietic stem cells GCSF-mobilized, psoas muscle, sigmoid colon, small intestine, spleen, stomach, T Cell, and thymus.
- an expected average methylation level was calculated for each healthy tissue or cell type by averaging the beta values at all CpG sites in the genomic region covered by the sequence. For example, if a sequence was mapped to a 170 base-pair region of a CpG island on chromosome 2, and this region contained 13 CpG sites, an average methylation level would be calculated for each healthy tissue or cell type by averaging the 13 beta values at the corresponding genomic region in the WGBS data. Thus, for each DNA sequence, a list of corresponding expected average methylation level values was generated from the healthy tissue/cell public WGBS datasets.
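The averaging step can be sketched as follows, assuming the public WGBS data for one chromosome have been loaded into a dictionary that maps each tissue or cell type to a dictionary of CpG position to beta value. That data layout, the half-open coordinate convention, and the function name are assumptions made for illustration.

```python
from statistics import mean


def expected_average_methylation(beta_by_tissue, start, end):
    """Average the beta values of all CpG sites in [start, end) per tissue.

    Tissues with no covered CpG site in the region are omitted.
    """
    result = {}
    for tissue, betas in beta_by_tissue.items():
        in_region = [beta for pos, beta in betas.items() if start <= pos < end]
        if in_region:
            result[tissue] = mean(in_region)
    return result


# Hypothetical beta values for two tissues on one chromosome.
wgbs = {
    "liver": {100: 0.25, 120: 0.75, 400: 1.0},
    "lung": {100: 0.0, 120: 0.5},
}
expected = expected_average_methylation(wgbs, 90, 260)
```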
- a sequence (fragment) was considered to be aberrantly hypermethylated if it passed the filters for being considered a truly densely methylated sequence, and additionally, none of the expected average methylation level values from all healthy samples exceeded 0.4 (or 40%).
- a truly densely methylated sequence was considered to be aberrantly hypermethylated if it mapped to a genomic region that was known to have a low expected average methylation level in all queried healthy cell types and tissues.
- an aberrantly hypermethylated sequence also mapped to a genomic region annotated as a CpG island, it was considered an aberrantly hypermethylated CpG island sequence.
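The classification rules for aberrant hypermethylation can be summarized in a short sketch. The function name and labels are hypothetical, and treating a fragment with no healthy reference values as not aberrant is an assumption of this sketch rather than something stated in the text.

```python
def classify_fragment(densely_methylated, expected_levels, in_cpg_island,
                      max_expected=0.4):
    """Label one fragment from its filter result and healthy reference levels.

    `expected_levels` holds the expected average methylation level of the
    fragment's genomic region in each healthy tissue or cell type.
    """
    if not densely_methylated or not expected_levels:
        return "not aberrant"  # assumption: no reference data means no call
    if max(expected_levels) > max_expected:
        return "not aberrant"  # region is methylated in some healthy sample
    if in_cpg_island:
        return "aberrantly hypermethylated CpG island"
    return "aberrantly hypermethylated"


label = classify_fragment(True, [0.05, 0.12, 0.31], in_cpg_island=True)
```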
- Figure 6 shows histograms comparing the CpG dinucleotide content of sequenced cfDNA fragments before vs. after two rounds of selective capture and elution of densely methylated DNA fragments (which is referred to here as high density methyl-capture).
- the CpG dinucleotide count refers to the number of CpG sites (methylated, hydroxymethylated, or unmethylated) in a biologically derived input DNA fragment, not the remaining (unconverted) CpG sites after conversion and amplification.
- Red boxes are included to highlight the robust enrichment of fragments harboring 8 or more CpG sites (in fragments averaging ~170-180 bp in length), which is the methylation density range typically found in CpG islands and promoters.
- Figure 7 presents a genomic map showing a change in alignment and coverage of sequenced cell-free DNA fragments in the region of the PAX8 gene on Chromosome 2 before vs. after two rounds of selective capture and elution of densely methylated DNA fragments.
- Preparation of the native library comprised steps of conversion, amplification, and restoration of methylation patterns using methods disclosed herein.
- the enriched library was further subjected to two rounds of methyl binding domain-based affinity capture and elution with competitive binding of a 226 base-pair competitor DNA containing 10 methylated CpG sites.
- sequences mapped in a largely random pattern throughout the genome.
- Example 3 Patterns of aberrant hypermethylation of cell-free DNA fragments in plasma from patients with various types of cancers and from non-cancer control subjects.
- Plasma samples ( ⁇ 1 mL) were tested from 11 patients with various types of advanced-stage cancer and from 8 individuals with no known cancer history who were undergoing lung cancer screening because they had a heavy smoking history (meeting US Preventive Services Task Force eligibility criteria). Samples were tested according to the methods described in Example 1.
- Figure 8 shows a heat map displaying genomic regions at which aberrantly hypermethylated sequences from cell-free DNA fragments were observed to map in plasma of 11 patients with various types of cancer (advanced stage) and 8 non-cancer control subjects who were heavy smokers participating in a lung cancer screening program. Results are displayed for chromosome 2 (chosen arbitrarily), which is representative of genome-wide patterns. Dark bars represent genomic regions at which mapping is observed of one or more cfDNA fragments that are categorized as aberrantly hypermethylated.
- Such fragments are densely methylated but map to genomic regions that are expected to have a methylation level of less than 40% (averaged across all CpG sites) in multiple types of healthy cells and tissues based on publicly available whole genome bisulfite sequencing data (from Roadmap and Blueprint studies). The difference in signal between cancer cases and non-cancer control subjects is striking. The distinct patterns of hypermethylation between samples underscore the importance of untargeted capture. If a panel of targeted hybrid-capture oligonucleotides had been used instead, such comprehensive capture for all cancer types would not have been possible. These results demonstrate the ability of the assay to capture aberrant promoter hypermethylation signals regardless of genomic location and from multiple types of cancer.
- initial shrinkage followed by enlargement of a liver metastasis is shown in computed tomography (CT) scan images taken at baseline (prior to therapy) and at cycles 4 and 8 of therapy (each cycle is 28 days).
- CT computed tomography
- a graph is also provided showing changes over time in tumor burden (defined as sum of diameters of target lesions according to RECIST guidelines) and in tumor-derived cfDNA level (measured as the variant allele fraction [VAF] of a tumor-specific KRAS mutation in plasma cell-free DNA).
- the tumor burden initially decreases with the drug therapy but then increases, likely because of growth of treatment-resistant tumor clones.
- the mutant tumor-derived cfDNA level shows a transient spike (possibly due to initial cell kill) followed by a decline, indicative of tumor response.
- aberrantly hypermethylated cfDNA fragment counts mapping to chromosome 10 are shown at 4 time points: at baseline, shortly after beginning treatment, at the nadir of mutant tumor-derived cfDNA level, and when the cancer has clearly progressed.
- Each circle indicates the observation of one or more aberrantly hypermethylated cfDNA fragments mapping to a CpG island at that genomic location.
- Circle size is proportional to the number of fragments mapping to a given CpG island. Blue circles indicate CpG islands at which aberrantly hypermethylated cfDNA fragments mapped at baseline and during therapy.
- Green circles indicate CpG islands at which aberrantly hypermethylated fragments were observed at either of the first two time points but not thereafter.
- Red circles indicate CpG islands at which aberrantly hypermethylated fragments were not observed at either of the first two time points but emerged thereafter. Analysis of such evolving aberrant hypermethylation patterns at CpG islands can provide biological and clinical insights pertaining to epigenetic resistance mechanisms, tumor heterogeneity, prognosis, and response or lack of response to therapy.
- Analyzing longitudinal changes in methylation patterns over time in the same patient facilitates identification and monitoring of personalized disease-associated methylation signals.
- various logical approaches can be applied alone or in combination: (1) identify hypermethylated cell-free DNA fragments that map to CpG islands which are rarely hypermethylated in healthy plasma, (2) identify hypermethylated cell-free DNA fragments that map to CpG islands which are known to commonly become hypermethylated in cancer cells based on data from studies of other patients, and/or (3) identify hypermethylated cell-free DNA fragments that map to CpG islands whose fragment counts (relative to other CpG islands in the same biospecimen) change over time in concert with changes in tumor burden (e.g., relative DNA fragment counts mapping to a CpG island can increase over time with disease progression or decrease over time when tumors shrink in response to effective therapy).
- this information can be used to improve sensitivity and/or specificity for detecting tumor-derived signals in subsequent biospecimens obtained from the same patient.
- observation of hypermethylated DNA fragments mapping to those genomic regions in a subsequent biospecimen can be considered to have a greater probability of being tumor-derived.
- observation of hypermethylated DNA fragments mapping outside of those genomic regions would be less likely to be tumor-derived. Similar approaches can be applied to DNA derived from other biological samples beyond just plasma.
- longitudinal plasma samples ( ⁇ 1 mL each) were obtained from a 76-year-old male patient with metastatic non-small cell lung cancer who received immune checkpoint inhibitor therapy with the drug pembrolizumab. Plasma samples were obtained from the patient prior to initiating therapy (on cycle 1 day 1) and again after completing one cycle of treatment (on cycle 2 day 1, prior to receiving the second cycle). Cell-free DNA was extracted from plasma and was tested according to the methods described in Example 1.
- Figure 10 shows a graph in which cell-free DNA fragment counts mapping to various CpG islands are displayed for two time points (pre-treatment on the X-axis, and after 1 cycle on the Y-axis). Each data point on the graph shows cell-free DNA fragment counts mapping to an individual CpG island at the two time points.
- the graph shows that at many CpG islands, the relative fragment counts mapping to those CpG islands remain fairly stable over time, suggesting that these CpG islands are unlikely to be cancer-associated (considered background signal).
- the graph also shows that at some other CpG islands, the relative fragment counts mapping to those CpG islands decrease substantially from the pre-treatment sample to the post-treatment sample, suggesting that these CpG islands are likely to be cancer-associated.
- Such analysis can facilitate identification of CpG islands that show cancer-associated hypermethylation in a given patient (i.e., a personalized cancer-associated hypermethylation pattern).
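The two-time-point comparison can be sketched by normalizing the fragment count at each CpG island to the total count in the same sample and then flagging islands whose relative count drops after treatment. The ratio threshold of 0.5, the island names, and the function name are hypothetical choices for illustration and are not values taken from the example.

```python
def decreasing_islands(pre_counts, post_counts, max_ratio=0.5):
    """Return CpG islands whose relative fragment count fell after treatment.

    Counts are normalized to each sample's total, and an island is flagged
    when post-treatment share / pre-treatment share is at most `max_ratio`.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    flagged = []
    for island, pre in pre_counts.items():
        if pre == 0 or pre_total == 0:
            continue  # no pre-treatment signal to compare against
        pre_share = pre / pre_total
        post_share = post_counts.get(island, 0) / post_total if post_total else 0.0
        if post_share / pre_share <= max_ratio:
            flagged.append(island)
    return flagged


# Hypothetical fragment counts per CpG island at the two time points.
pre = {"island_A": 40, "island_B": 30, "island_C": 30}
post = {"island_A": 4, "island_B": 28, "island_C": 32}
candidates = decreasing_islands(pre, post)
```

Because the shares are relative, a large drop at a few islands raises the apparent share of the stable ones, so a real analysis would likely need a more robust normalization than this sketch uses.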
- This example shows that densely methylated viral DNA can be captured and sequenced using methods disclosed herein.
- densely methylated viral DNA fragments were captured from a patient’s plasma in parallel with densely methylated cell-free DNA fragments derived from the patient’s genome.
- a 1 mL plasma sample was obtained from a male patient with HIV who developed diffuse large B-cell lymphoma (DLBCL). It is known that DLBCL in the setting of HIV is often associated with latent Epstein-Barr Virus (EBV) infection of B-cells. The plasma sample was obtained prior to initiation of any therapy.
- DLBCL diffuse large B-cell lymphoma
- EBV Epstein-Barr Virus
- Cell-free DNA (including viral DNA) was extracted from the plasma sample and was tested according to the methods described in Example 1, with a modification in the bioinformatic analysis to include alignment of DNA fragment sequences to viral genomes including Epstein-Barr Virus, HIV-1, Human Papilloma Virus, Kaposi’s Sarcoma Herpesvirus, Hepatitis B Virus, and Hepatitis C virus, in addition to the human genome reference (hg38).
- Figure 11 shows densely methylated cell-free DNA fragments mapping to the EBV genome in the plasma of this patient with HIV and DLBCL. Note the periodicity of sequence coverage suggests phased nucleosomal protection of cfDNA fragments.
- Red bars in magnified views indicate methylated CpG sites; blue bars indicate unmethylated sites.
- no fragments were found to map to any of the other viral reference genomes that were included in the bioinformatic analysis (besides EBV), suggesting that other viral DNA was not present in the blood, or if present, may not have been methylated with sufficient density to be captured and sequenced.
Abstract
Methods and compositions are provided that enable efficient and accurate characterization of epigenetic modifications of DNA. Some methods enable enrichment of DNA based on density of methylated CpG sites while preserving unique sequence representation of densely methylated DNA. Some methods enable amplification of DNA with restoration of methylation patterns at CpG sites in the DNA copies. Some methods can be applied to detection of cancer-specific or disease-specific methylation patterns from biological specimens.
Description
METHODS OF ENRICHING AND ANALYZING METHYLATED DNA
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
This invention was made with government support under R01CA197486 and U01CA233364 awarded by the National Institutes of Health. The government has certain rights in the invention.
FIELD OF THE INVENTION
The present disclosure relates to methods and compositions for characterization and analysis of epigenetic features of DNA.
BACKGROUND
DNA methylation is an epigenetic modification that plays important roles in many biological processes, including gene regulation, maintenance of genome stability, embryo development, X-chromosome inactivation, genomic imprinting, and cellular differentiation. DNA methylation involves the addition of a methyl group to the DNA molecule, primarily at cytosine bases in a CpG dinucleotide context, producing a 5-methylcytosine base. This modification is catalyzed by a group of enzymes known as DNA methyltransferases, and methylation patterns in a cell’s genome are tightly regulated. The patterns can be maintained during cell division, providing a mechanism for stable gene silencing or activation.
Aberrant DNA methylation patterns have been associated with several human diseases, including cancer, cardiovascular diseases, metabolic diseases, neurological disorders, autoimmune disorders, and with aging. Therefore, DNA methylation has emerged as a promising biomarker of disease. The aberrant methylation patterns can occur at specific genomic regions, such as gene promoters or CpG islands, leading to the dysregulation of gene expression and contributing to disease development. Additionally, methylation signatures that are cell-lineage-specific, tissue-specific, or cancer-specific can be used to identify the origin of cell-free DNA fragments in biological fluids. Measurement of such methylation signatures can be used to infer the rate of death of the corresponding cell types, providing a means to assess cell death from specific tissues, transplanted organs, a fetus, or cancer.
Importantly, DNA methylation patterns can be analyzed in various readily accessible biological samples, including blood, urine, saliva, sputum, and stool, making it a convenient and accessible biomarker. Methylation patterns can also be analyzed in biospecimens obtained from medical procedures, such as tissue, cerebrospinal fluid, pleural fluid, ascites fluid, uterine lavage fluid, and Pap smear fluid. The measurement of DNA methylation patterns can aid in disease diagnosis, prognosis, prediction or monitoring of treatment response, and detection of residual or recurrent disease.
Analysis of cancer-specific methylation patterns in cell-free DNA (cfDNA) has especially gained attention as a promising avenue for noninvasive cancer detection and monitoring. Dying cancer cells release DNA fragments into the bloodstream, and this cfDNA contains characteristic alterations in DNA methylation patterns. Enzymatic processes that determine methylation levels in various genomic regions are tightly regulated in healthy cells, resulting in highly consistent genome-wide methylation patterns in a given tissue or cell type. These processes become dysregulated in cancer cells, yielding aberrant patterns of methylation that are rarely found in healthy tissues. The transcriptional silencing of tumor suppressor genes, developmental regulators, and many other genes by hypermethylation of promoters is a fundamental mechanism of carcinogenesis. In humans, approximately 70% of promoters located near a gene’s transcription start site contain a CpG island. CpG islands are stretches of genomic DNA containing a relatively high density of CpG sites which are targets of methylation (CpG site refers to the sequence 5’-C-phosphate-G-3’). There are typically several hundred to a few thousand gene promoters at CpG Islands that become aberrantly hypermethylated in a tumor, with substantial heterogeneity in methylation patterns across different patients and different tumors. Some promoter hypermethylation patterns can also be organ-specific, enabling tissue of origin prediction from hypermethylated cfDNA fragment analysis. Because promoter methylation is tightly regulated and occurs in highly consistent patterns in healthy cells and tissues, the aberrant hypermethylation of large numbers of promoters in cancer cells provides an attractive biomarker signal in cfDNA that is both abundant and highly cancer specific. 
Similar patterns of aberrant CpG island hypermethylation are observed in cancers occurring in nearly all vertebrate species (beyond just humans), providing an avenue for cfDNA-based cancer signal detection and/or monitoring for veterinary, agricultural, and animal model research applications.
Additionally, DNA methylation is being used to measure biological aging. As individuals age, patterns of DNA methylation are known to change predictably across various genomic sites. These methylation changes can be used as epigenetic clocks to estimate the biological age of an individual or of a particular organ (e.g. the liver, kidney, heart, etc.). This approach is believed to provide a more accurate reflection of an individual’s physiological state than simple chronological age, potentially providing insights into disease risk and a means to measure the effectiveness of anti-aging therapies. These clocks, such as the Horvath clock, assess biological age by examining methylation levels at particular CpG sites. Many of the CpG sites that are strongly predictive of biological age are found at or in the vicinity of CpG islands. Thus, genome-wide assessment of methylation levels at CpG islands could provide a more accurate measurement of biological age than current methods which are focused on a limited number of pre-defined CpG sites.
However, efficient, genome-wide capture and sequencing of hypermethylated CpG islands has been challenging. Several existing methods have been developed to target specific promoter sequences and/or CpG island sequences based on hybridization of complementary oligonucleotide capture probes. These existing approaches require predefining of target sequences, usually based on identification of genomic regions that are prone to being differentially methylated in healthy vs. diseased cells. Such differentially methylated regions (DMRs) may be informative as biomarkers of disease states, including cancer. However, methylation patterns differ across tissue types and among different cancers, making it difficult to design a universal panel of capture probes to cover all CpG islands that could be hypermethylated in a variety of cancer types, normal tissues, and disease states. Such a broad hybrid-capture panel that comprehensively covers nearly all potentially hypermethylated CpG islands would result in wasteful sequencing of many irrelevant genomic regions at higher cost. Alternative approaches to evaluate genome-wide methylation patterns are based on enrichment of methylated DNA fragments via affinity purification using either an antibody raised against 5-methylcytosine (MeDIP-seq) or a methyl binding domain (MBD) protein which binds to symmetrically methylated CpG sites on both strands of double-stranded DNA (MBD-seq). However, these approaches achieve limited enrichment of densely methylated promoter and/or CpG island sequences because they also capture a large proportion of fragments with moderate methylation density which are much more abundant throughout the genome. For example, hypermethylated cfDNA fragments mapping to CpG islands typically have 10 or more methylated CpG sites in a fragment of ~160-180 base pairs, whereas the enriched pool of methylated cfDNA captured by cell-free MeDIP contains an average of 4 methylated CpG sites.
Achieving more selective enrichment of fragments with higher methylation density (mapping to CpG islands) has been challenging. One approach to selectively capture more densely methylated fragments is to include methylated competitor DNA fragments (which cannot participate in downstream sequencing) during antibody- or methyl binding domain-based affinity purification. The methylation density distribution of captured DNA fragments can be tuned by adjusting the amount, methylation density, methylation content, and/or fragment size of the competitor DNA added. However, increasing selectivity for densely
methylated DNA fragments also leads to greater loss of such fragments because of increased competition for binding. Such losses can degrade the relevant signal when input molecules are limited, as with cfDNA.
Thus, it would be desirable to develop an approach for selectively enriching densely methylated DNA fragments while minimizing loss of unique sequence representation of these fragments. Such a method could permit comprehensive capture of densely methylated DNA fragments mapping to CpG islands throughout the genome. Importantly, the method would not require pre-specification of genomic target sequences. Instead, the method would be able to dynamically capture hypermethylated CpG island and/or promoter sequences wherever they occur in a genome. The method would be able to detect aberrant hypermethylation patterns of CpG islands from any type of cancer as well as from other disease states. In the area of cancer diagnostics, the method could be used to measure aberrant cancer-derived hypermethylation signals in biofluids or biospecimens to enable early detection of cancer, to assess prognosis, to predict therapy efficacy, to monitor treatment response and disease progression, to identify changes in hypermethylation patterns that could be indicative of treatment response or resistance, and to detect residual or recurrent cancer. Additionally, because the method would enable identification of patient-specific methylation patterns in a biospecimen obtained from a patient at a certain point in time, knowledge of that pattern could be used to improve the sensitivity of detection of similar patterns in biospecimens obtained from the same patient at a different point in time. For example, this approach could enable the development of tumor-informed or plasma-informed personalized assays requiring only modifications to computational algorithms, and not requiring physical or experimental changes to the assay methodology. 
Because the method would enrich DNA fragments based on methylation density rather than sequence, it would be able to enrich densely methylated DNA fragments regardless of their genomic origin, including from essentially any vertebrate species as well as from viruses and microbes.
SUMMARY OF EMBODIMENTS
The current disclosure is directed to methods and compositions that enable efficient measurement and analysis of biologically or medically informative DNA methylation patterns. Some disclosed methods and compositions permit characterization of aberrant hypermethylation patterns at CpG Islands throughout a genome without needing to target prespecified sequences of interest. Methods and compositions are described for enriching DNA molecules based on density of methylated CpG sites while minimizing loss of unique sequences derived from densely methylated DNA molecules. Methods and compositions are
also described to enable selective sequencing of densely methylated DNA molecules from a population of DNA molecules, while minimizing the loss of unique sequences derived from said densely methylated DNA molecules. Some methods and compositions described herein enable conversion and amplification of DNA with restoration of CpG methylation patterns in the DNA copies. In some implementations, disclosed methods can be applied to detection or monitoring of cancer. In some implementations, disclosed methods can be applied to assessment of evolving methylation patterns during or after a therapy as well as identification of potential mechanisms of therapy resistance. In some implementations, methods described herein can be applied to sensitive detection of residual, recurrent, or progressing cancer by detecting aberrantly hypermethylated DNA fragments in post-treatment biospecimens that match patient-specific patterns of aberrant hypermethylation identified in pre-treatment biospecimens. In some implementations, disclosed methods can be applied to diagnosis and monitoring of non-cancer disease conditions that produce altered methylation patterns in patient biospecimens. In some implementations, disclosed methods can be applied to assessment of organ-specific cell-death in various disease states based on measurement of organ-specific methylation patterns in cell-free DNA. In some implementations, disclosed methods can be applied to measurement of biological aging. In some implementations, disclosed methods can be applied to assessment of CpG island hypermethylation in vertebrate species for veterinary, agricultural, and/or animal model research applications. In some implementations, disclosed methods can be applied to assessment of densely methylated viral and/or microbial DNA fragments. In some implementations, disclosed methods can be developed into a kit.
In some implementations, disclosed methods can be combined with spatial labeling techniques to permit assessment of spatial patterns of CpG island hypermethylation in tissues.
In accordance with an aspect of at least one embodiment, there is provided a method of DNA conversion and amplification with restoration of CpG methylation patterns, comprising the following steps: a. converting unmodified cytosine bases in template DNA molecules to uracil bases by deamination, resulting in converted template DNA molecules; b. performing a polymerase chain reaction (PCR) to generate DNA copies of the converted template DNA molecules; c. methylating cytosine bases at unconverted CpG sites in the DNA copies using an enzyme, thereby providing converted and amplified copies of DNA with CpG methylation patterns restored.
In accordance with an aspect of at least one embodiment, there is provided a method of enriching DNA molecules based on density of methylated CpG sites while minimizing loss
of unique sequences derived from densely methylated DNA molecules, the method comprising: a. converting unmodified cytosine bases in template DNA molecules to uracil bases by deamination, resulting in converted template DNA molecules; b. performing a polymerase chain reaction (PCR) to generate DNA copies of the converted template DNA molecules; c. methylating cytosine bases at unconverted CpG sites in the DNA copies using an enzyme, resulting in DNA copies with restored methylation; d. enriching densely methylated members of the population of DNA copies with restored methylation via selective capture based on methylation density.
In accordance with an aspect of at least one embodiment, there is provided a method of selectively sequencing densely methylated DNA molecules from a population of DNA molecules, while minimizing the loss of unique sequences derived from said densely methylated DNA molecules, comprising the following steps: a. converting unmodified cytosine bases in template DNA molecules to uracil bases by deamination, resulting in converted template DNA molecules; b. performing a polymerase chain reaction (PCR) to generate DNA copies of the converted template DNA molecules; c. methylating cytosine bases at unconverted CpG sites in the DNA copies using an enzyme, resulting in converted DNA copies with restored methylation; d. enriching densely methylated members of the population of converted DNA copies with restored methylation through selective capture based on methylation density; e. ligating sequencing-compatible adapters to DNA molecules either before or after any one of steps a, b, c, or d, ultimately resulting in the formation of a sequencing library; f. sequencing at least a portion of the sequencing library to obtain a plurality of sequencing reads.
In accordance with an aspect of at least one embodiment, there is provided a method for detecting residual, recurrent, or progressing cancer, the method comprising: a. sequencing densely methylated DNA fragments from tumor tissue, blood, plasma, serum, or urine of a patient diagnosed with cancer to identify a plurality of aberrantly hypermethylated CpG island regions that are specific to that patient’s cancer; b. obtaining one or more longitudinal samples of blood, plasma, serum, or urine from the patient after the patient has received a cancer treatment; c. sequencing densely methylated DNA fragments from the post-treatment longitudinal sample; d. identifying aberrantly hypermethylated DNA sequences in the post-treatment longitudinal sample that map to the same patient-specific hypermethylated CpG island regions that were identified in the earlier sample, wherein detection of one or more such sequences matching patient-specific hypermethylation patterns in the post-treatment sample is indicative of residual, recurrent, or progressing cancer.
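As an illustration only, the matching in step d can be sketched as a set intersection between CpG island regions identified in an earlier sample and regions observed in a post-treatment sample. The function name and the region identifiers below are hypothetical and are not taken from the disclosure; an actual analysis would work on genomic coordinates and would account for background signal.

```python
def matching_regions(baseline_regions, followup_regions):
    """Return CpG island regions seen at baseline that are seen again later.

    Both arguments are iterables of region identifiers (hypothetical labels
    here; real data would use genomic coordinates).
    """
    return sorted(set(baseline_regions) & set(followup_regions))


# Hypothetical region labels, for illustration only.
baseline = {"CGI_A", "CGI_B", "CGI_C"}
post_treatment = {"CGI_B", "CGI_D"}

shared = matching_regions(baseline, post_treatment)
# One or more shared regions would be treated as a signal to follow up.
signal_detected = len(shared) > 0
```

Only the computational step changes between patients in such a scheme, since the list of regions of interest is derived from each patient's own earlier sample.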
BRIEF DESCRIPTION OF THE DRAWINGS
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Figure 1 provides a schematic illustration of an example method of selectively sequencing densely methylated DNA molecules from a population of DNA molecules with varying methylation density, while minimizing the loss of unique sequences derived from said densely methylated DNA molecules. The schematic representation shows 3 examples of biologically derived input DNA molecules at the top of the figure: one with relatively dense methylation, a second with relatively sparse methylation, and a third with no methylation. For illustrative purposes, the densely methylated DNA is shown with 4 symmetrically methylated CpG sites. In some examples of biological specimens, densely methylated DNA fragments can have 8 or more methylated CpG sites (symmetric or asymmetric) in fragments of ~100 to 250 base pairs in length. The example shows that adapters are ligated to input DNA fragments, and then the DNA undergoes enzymatic methyl conversion (or bisulfite conversion) followed by PCR amplification. The resulting converted and amplified DNA copies have sequences in which unmodified C bases were converted to T bases, whereas methylated or hydroxymethylated C bases were retained as C bases. Of note, some amplified copies shown in the schematic are derived from the converted Watson strand of the input DNA and some are derived from the converted Crick strand. Next, a CpG methyltransferase enzyme is used to restore methylation at unconverted CpG sites in the converted, amplified DNA copies. Because only CpG sites that were methylated in the biologically derived input DNA are retained as CpG sites in the converted, amplified copies, the use of a CpG methyltransferase enables restoration of original methylation patterns. The amplified DNA copies with restored methylation patterns are then shown undergoing selective enrichment of densely methylated DNA copies by competitive binding to methyl binding domain protein (or antibody to 5-mC) and capture on magnetic beads.
Densely methylated competitor DNA fragments are added to the capture mix to competitively inhibit the capture of fragments with lower methylation density. The ability to generate multiple redundant copies of each DNA input molecule with methylation patterns restored enables use of stringent capture and enrichment conditions (including an option of more than one round of capture) while preserving representation of unique sequences of densely methylated DNA fragments. The schematic illustrates the purification of a next-generation sequencing (NGS) library of
densely methylated, converted sequences. Resulting sequences can be mapped to a reference genome and the original methylation status of cytosine bases can be inferred based on C to T conversion.
Figure 2 provides a more detailed schematic illustration of an example method in which double-stranded biologically derived input DNA is converted and amplified with restoration of CpG methylation patterns. In the double-stranded biologically derived input DNA shown at the top of the figure, a symmetrically methylated CpG site is shown (with methylated cytosines on both strands), and several unmethylated cytosines are also shown within and outside of a CpG context. Bisulfite conversion or Enzymatic Methyl conversion results in unmodified cytosines being converted to uracils by deamination. 5-methylcytosines and 5-hydroxymethylcytosines are rarely converted to uracils. In this example, subsequent PCR amplification replaces the converted uracils with thymines. Therefore, the only remaining CpG sites after conversion and amplification are those which were methylated (or hydroxymethylated) in the input DNA. A CpG methyltransferase enzyme (such as M.SssI) can thus be used to restore the original methylation patterns in the converted, amplified DNA copies.
Figure 3 provides a detailed schematic illustration of an example method in which single-stranded biologically derived input DNA is converted and amplified with restoration of CpG methylation patterns. The process is largely analogous to that shown in Figure 2, but the resulting converted, amplified, and re-methylated sequences are derived from conversion of a single-strand sequence.
Figure 4 shows a schematic overview of an example method which enables enrichment of DNA molecules based on density of methylated CpG sites while minimizing loss of unique sequences derived from densely methylated DNA molecules. The figure highlights the creation of a converted, amplified NGS library with methylation patterns restored, which can then be subjected to stringent selection conditions to enrich densely methylated DNA while taking advantage of the sequence redundancy to preserve unique sequence representation of densely methylated DNA fragments.
Figure 5 shows a schematic of a two-step ligation scheme that enables ligation of adapter sequences to double-stranded DNA fragments of interest while minimizing formation of adapter dimers. The illustrated two-step ligation scheme was used in Example 1 in the Detailed Description section to attach adapter sequences to blunted and 5’-phosphorylated cell-free DNA fragments and genomic DNA fragments. The first step involves ligation of stem-loop adapters to the insert DNA by forming a phosphodiester linkage between a
5’-phosphorylated end of the insert DNA and the 3’-hydroxyl end of the stem-loop adapter. The 5’-end of the stem-loop adapter lacks a phosphate, and therefore cannot be ligated to the insert or to another stem-loop adapter molecule (thereby avoiding adapter dimer formation). Next, USER enzyme is used to cleave at deoxyUridine positions in the stem of the stem-loop adapter to destabilize base-pairing. DNA is then cleaned up to remove unligated adapters, ligase, and USER enzyme. In the second step, a displacer oligonucleotide is added to displace one strand of the stem-loop adapter by hybridization to the opposite (ligated) strand, as shown in the figure. Finally, a nick-sealing ligase (HiFi Taq DNA ligase) is used to ligate the 5’-phosphorylated displacer oligonucleotide to the 3’-end of the insert DNA. In the example provided herein, the stem-loop adapter and displacer oligonucleotides were designed to attach sample barcodes and Illumina adapter sequences to the input DNA fragments. However, adapter sequences for other sequencing platforms could be readily substituted.
Figure 6 shows histograms comparing the CpG dinucleotide content of sequenced cfDNA fragments before vs. after two rounds of selective capture and elution of densely methylated DNA fragments (which is referred to here as high density methyl-capture). In these histograms, the CpG dinucleotide count refers to the number of CpG sites (methylated, hydroxymethylated, or unmethylated) in a biologically derived input DNA fragment, not the remaining (unconverted) CpG sites after conversion and amplification. Red boxes are included to highlight the robust enrichment of fragments harboring 8 or more CpG sites (in fragments averaging ~170-180 bp in length), which is the methylation density range typically found in CpG islands and promoters.
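For illustration, the CpG dinucleotide count plotted in such histograms can be sketched as a simple count of "CG" occurrences in the original (unconverted) fragment sequence. The function names and the toy sequences below are hypothetical; the threshold of 8 sites follows the range highlighted in the figure.

```python
from collections import Counter


def count_cpg(seq):
    """Count CpG dinucleotides (5'-CG-3') in an unconverted fragment sequence."""
    return seq.upper().count("CG")


def cpg_histogram(fragments):
    """Map CpG count -> number of fragments with that count."""
    return Counter(count_cpg(f) for f in fragments)


def fraction_dense(fragments, min_sites=8):
    """Fraction of fragments with at least `min_sites` CpG sites."""
    if not fragments:
        return 0.0
    dense = sum(1 for f in fragments if count_cpg(f) >= min_sites)
    return dense / len(fragments)


# Toy sequences, far shorter than real cfDNA fragments (illustration only).
toy = ["ACGTCGCG", "ATATATAT", "CGCGCGCGCGCGCGCG"]
hist = cpg_histogram(toy)
```

Comparing `fraction_dense` for a library before and after capture would give a single-number summary of the enrichment shown in the histograms.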
Figure 7 presents a genomic map showing a change in alignment and coverage of sequenced cell-free DNA fragments in the region of the PAX8 gene on Chromosome 2 before vs. after two rounds of selective capture and elution of densely methylated DNA fragments. Preparation of the native library comprised steps of conversion, amplification, and restoration of methylation patterns using methods disclosed herein. The enriched library was further subjected to two rounds of methyl binding domain-based affinity capture and elution with competitive binding of a 226 base-pair competitor DNA containing 10 methylated CpG sites. In the unenriched (native) library, sequences mapped in a largely random pattern throughout the genome. In the library that was enriched for densely methylated fragments, the vast majority of reads mapped to CpG islands. This enrichment results in a high yield of cancer-relevant signal after enrichment. Importantly, the enrichment is based on methylation density, not based on hybrid-capture of pre-defined sequences. It is also noteworthy that
multiple cfDNA fragments from a sample can map to the same CpG island region, providing strong signal confirmation.
Figure 8 shows a heat map displaying genomic regions at which aberrantly hypermethylated sequences from cell-free DNA fragments were observed to map in plasma of 11 patients with various types of cancer (advanced stage) and 8 non-cancer control subjects who were heavy smokers participating in a lung cancer screening program. Results are displayed for chromosome 2 (chosen arbitrarily), which is representative of genome-wide patterns. Dark bars represent genomic regions at which mapping is observed of one or more cfDNA fragments that are categorized as aberrantly hypermethylated. Such fragments are densely methylated but map to genomic regions that are expected to have a methylation level of less than 40% (averaged across all CpG sites) in multiple types of healthy cells and tissues based on publicly available whole genome bisulfite sequencing data (from Roadmap and Blueprint studies).
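The categorization described for this figure can be sketched as a two-part test: the fragment must be densely methylated, and the region it maps to must have an expected methylation level below 40% in healthy reference tissues. The 40% cutoff comes from the description above; the density cutoff of 8 methylated CpG sites and all names below are assumptions made for this illustration.

```python
def is_aberrantly_hypermethylated(
    methylated_cpg_count,
    reference_methylation,
    min_methylated_sites=8,      # assumed density cutoff, not from the source
    max_reference_level=0.40,    # "less than 40%" per the figure description
):
    """Return True if a fragment is dense AND its region is normally low.

    methylated_cpg_count: methylated CpG sites observed on the fragment.
    reference_methylation: mean methylation fraction (0..1) of the mapped
        region across healthy reference tissues.
    """
    is_dense = methylated_cpg_count >= min_methylated_sites
    normally_low = reference_methylation < max_reference_level
    return is_dense and normally_low
```

A fragment with 10 methylated CpG sites mapping to a region with a reference level of 0.15 would be flagged, whereas the same fragment mapping to a region with a reference level of 0.60 would not.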
Figure 9 shows the evolution of plasma cell-free DNA hypermethylation patterns over time in a patient with metastatic non-small cell lung cancer being treated with olaparib and cediranib. The patient’s cancer initially showed a modest response to therapy (considered stable disease by RECIST criteria), and subsequently showed progression. As an example, initial shrinkage followed by enlargement of a liver metastasis are shown in computed tomography (CT) scan images taken at baseline (prior to therapy) and at cycles 4 and 8 of therapy (each cycle is 28 days). A graph is also provided showing changes over time in tumor burden (defined as sum of diameters of target lesions according to RECIST guidelines) and in tumor-derived cfDNA level (measured as the variant allele fraction [VAF] of a tumor-specific KRAS mutation in plasma cell-free DNA). The tumor burden initially decreases with the drug therapy but then increases, likely because of growth of treatment-resistant tumor clones. The mutant tumor-derived cfDNA level shows a transient spike in level (possibly due to initial cell kill) followed by a decline, indicative of tumor response. After ~4 months, the tumor-derived cfDNA level began to increase, indicative of tumor progression. In the plot below, aberrantly hypermethylated cfDNA fragment counts mapping to chromosome 10 (as a representative example) are shown at 4 time points: at baseline, shortly after beginning treatment, at the nadir of mutant tumor-derived cfDNA level, and when the cancer has clearly progressed. Each circle indicates the observation of one or more aberrantly hypermethylated cfDNA fragments mapping to a CpG island at that genomic location. Circle size is proportional to the number of fragments mapping to a given CpG island. Blue circles indicate CpG islands at which aberrantly hypermethylated cfDNA
fragments mapped at baseline and during therapy. Green circles indicate CpG islands at which aberrantly hypermethylated fragments were observed at either of the first two time points but not thereafter. Red circles indicate CpG islands at which aberrantly hypermethylated fragments were not observed at either of the first two time points but emerged thereafter. Analysis of such evolving aberrant hypermethylation patterns at CpG islands can provide biological and clinical insights pertaining to epigenetic resistance mechanisms, tumor heterogeneity, prognosis, and response or lack of response to therapy.
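The color coding described for this figure can be sketched as a rule over per-island fragment counts at the 4 time points. The green and red rules follow the description directly; treating "blue" as "observed in the first two time points and also in the last two" is an interpretation made for this sketch, and the function name is hypothetical.

```python
def classify_island(counts):
    """Classify a CpG island from fragment counts at 4 ordered time points.

    counts: sequence of 4 non-negative integers (baseline, early treatment,
        nadir, progression).
    Returns "blue" (early and later), "green" (early only),
    "red" (later only), or "none" (never observed).
    """
    if len(counts) != 4:
        raise ValueError("expected counts for exactly 4 time points")
    early = any(c > 0 for c in counts[:2])   # either of the first two
    late = any(c > 0 for c in counts[2:])    # either of the last two
    if early and late:
        return "blue"
    if early:
        return "green"
    if late:
        return "red"
    return "none"
```

Under this rule, islands labeled "red" are the ones that emerge only after the first two time points, which is the group of interest when looking for candidate resistance-associated changes.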
Figure 10 compares cell-free DNA hypermethylation patterns at CpG islands in plasma samples obtained at two time points (pre- and post-treatment) in a 76 year-old male patient with metastatic non-small cell lung cancer who received immune checkpoint inhibitor therapy with the drug Pembrolizumab. Plasma samples were obtained from the patient prior to initiating therapy (on cycle 1 day 1) and again after completing one cycle of treatment (on cycle 2 day 1, prior to receiving the second cycle). Cell-free DNA was extracted from plasma and was tested according to the methods described in Example 1. The Figure shows a graph in which cell-free DNA fragment counts mapping to various CpG islands are displayed for two time points (pre-treatment on the X-axis, and after 1 cycle on the Y-axis). Each data point on the graph shows cell-free DNA fragment counts mapping to an individual CpG island at the two time points. The graph shows that at many CpG islands, the relative fragment counts mapping to those CpG islands remain fairly stable over time, suggesting that these CpG islands are unlikely to be cancer-associated (considered background signal). The graph also shows that at several other CpG islands, the relative fragment counts mapping to those CpG islands decrease substantially from the pre-treatment sample to the post-treatment sample, suggesting that these CpG islands are likely to be cancer-associated. Such analysis can facilitate identification of CpG islands that show cancer-associated hypermethylation in a given patient (i.e., a personalized cancer-associated hypermethylation pattern).
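A comparison of this kind can be sketched by labeling each CpG island according to how its fragment count changes between the two time points. The cutoff used below (a post-treatment count at or below half of the pre-treatment count) is an arbitrary assumption for illustration, as are the function name and the island labels; the counts are assumed to have already been normalized for sequencing depth.

```python
def classify_change(pre_count, post_count, drop_ratio=0.5):
    """Label a CpG island by the change in (normalized) fragment count.

    drop_ratio is an assumed cutoff: post/pre at or below it counts as a
    substantial decrease.
    """
    if pre_count <= 0:
        return "not evaluable"   # no pre-treatment signal to compare against
    if post_count / pre_count <= drop_ratio:
        return "decreased"       # candidate cancer-associated island
    return "stable"              # consistent with background signal


# Hypothetical islands with (pre, post) counts, for illustration only.
islands = {"CGI_1": (20, 4), "CGI_2": (10, 9), "CGI_3": (0, 2)}
labels = {name: classify_change(pre, post) for name, (pre, post) in islands.items()}
```

The set of islands labeled "decreased" would then form a candidate personalized pattern for that patient, subject to confirmation at later time points.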
Figure 11 shows that densely methylated viral DNA can be captured and sequenced using methods disclosed herein. The data for the figure were generated from a 1 mL plasma sample that was obtained from a male patient with HIV who developed diffuse large B-cell lymphoma (DLBCL). Densely methylated viral DNA fragments were captured from this patient’s plasma in parallel with densely methylated cell-free DNA fragments derived from the patient’s genome. It is known that DLBCL in the setting of HIV is often associated with latent Epstein-Barr Virus (EBV) infection of B-cells. The plasma sample was obtained prior to initiation of any therapy. Cell-free DNA (including viral DNA) was extracted from the plasma sample and was tested according to the methods described in Example 1, with a
modification in the bioinformatic analysis to include alignment of DNA fragment sequences to viral genomes including Epstein-Barr Virus, HIV-1, Human Papilloma Virus, Kaposi’s Sarcoma Herpesvirus, Hepatitis B Virus, and Hepatitis C virus, in addition to the human genome reference (hg38). The Figure shows densely methylated cell-free DNA fragments mapping to the EBV genome in the plasma of this patient. Note the periodicity of sequence coverage suggests phased nucleosomal protection of cfDNA fragments. Red bars in magnified views indicate methylated CpG sites; blue bars indicate unmethylated sites. Notably, no fragments were found to map to the genomes of any of the other viral reference genomes that were included in the bioinformatic analysis (besides EBV), suggesting that other viral DNA was not present in the blood, or if present, may not have been methylated with sufficient density to be captured and sequenced.
DETAILED DESCRIPTION
The current disclosure is directed to methods and compositions relating to medical diagnostics and biomedical research. Some methods enable enrichment of densely methylated DNA fragments from a mixture of DNA fragments with varying methylation density. Some methods enable enrichment of densely methylated DNA fragments that map to CpG island regions or promoter regions (or both) in a genome. Some methods enable enrichment of densely methylated DNA fragments while reducing loss of unique sequence representation of such fragments. Some methods enable enrichment of densely methylated DNA fragments using affinity capture techniques in which an antibody or protein preferentially binds to 5-methylcytosine or a methylated CpG site. Some methods enable identification and quantification of DNA molecules that harbor epigenetic modifications including 5-methylcytosine, 5-hydroxymethylcytosine, or both. Some methods include methylated competitor DNA during affinity capture to modify the methylation density profile of the captured DNA. Some methods use next-generation sequencing to obtain the sequences of the enriched, densely methylated DNA fragments. Some methods include library preparation steps prior to and/or after the enrichment of densely methylated DNA fragments to enable next-generation sequencing of the densely methylated DNA. Some methods reduce loss of unique sequence representation of densely methylated DNA fragments during affinity capture by producing a plurality of copies of the template DNA fragments (with CpG methylation patterns of the template DNA molecules restored on the DNA copies) prior to performing selective affinity capture based on methylation density. Some methods enable conversion and amplification of DNA with restoration of CpG methylation patterns in the
DNA copies. Some methods are well-suited to analysis of DNA from biological specimens in which the DNA quantity is limited, such as cell-free DNA from blood.
In some implementations, methods described herein address the challenge of selectively capturing and sequencing densely methylated DNA from CpG islands and/or promoter regions of vertebrate genomes without pre-defining target sequences and with minimal loss of desired sequence information during selective capture. Enrichment of densely methylated DNA from CpG islands for sequencing is desirable because these genomic regions can be especially rich in biologically informative methylation signals. Generally, signals of interest (e.g., for biomarker development) are found at genomic regions that have differential methylation patterns among different tissues or cell types. Such differentially methylated regions (DMRs) frequently occur at CpG islands, even though CpG islands have been estimated to constitute only approximately 1-2% of most mammalian genomes. Also, because the methylation of CpG islands and promoters plays an important role in the regulation of gene expression, characterizing the methylation status of these genomic regions is of particular interest in biomedicine. However, most existing methods for methylation analysis either probe pre-specified genomic target regions or they broadly analyze the genome or methylome. The former approach limits the analysis to specific genomic targets whereas the latter approach produces large volumes of data containing relatively sparse signals that are of biological or medical interest (at higher sequencing cost). Thus, a method that could greatly enrich and sequence densely methylated CpG island and/or promoter DNA based on highly selective capture of densely methylated DNA fragments would provide a much-needed analytical tool for use in biomedicine.
However, a key challenge with most enrichment techniques is that increasing selectivity of enrichment is usually accompanied by greater loss of the molecules being selected for. As the selectivity of an enrichment method is increased, the undesirable molecules are removed more efficiently, but the capture efficiency of the desired molecules is also usually reduced. In situations where the desired molecules are available in limited abundance, such increased selectivity can lead to reduced yield of the molecules of interest (thereby reducing the signal of interest). In the specific instance of enriching methylated DNA with affinity purification techniques based on binding to 5-methylcytosine or methylated CpG sites, increasing selectivity for capture of densely methylated DNA would more efficiently remove DNA molecules with low methylation density, but it would also lead to decreased yields of DNA molecules with high methylation density. For example, addition of a densely methylated competitor DNA to competitively reduce the capture efficiency of
low methylation density DNA molecules would also result in reduced yields of high methylation density DNA molecules (because of increased competition for binding to limited binding sites). This can be especially problematic when the densely methylated DNA molecules of interest are present in limited abundance in the sample being evaluated (for example, cell-free DNA from a patient’s plasma). Because of this concern, most existing methods for capturing methylated DNA (such as Methylated DNA Immunoprecipitation [MeDIP] and MBD Capture) are not applied with high selectivity for capturing densely methylated DNA fragments.
To address this challenge, some methods described herein enable generation of many redundant copies of each original (biologically derived) DNA fragment prior to performing selective capture of densely methylated DNA. In this way, representation of unique sequences is preserved even if a large proportion of desired molecules are lost under high-stringency capture conditions. For example, if an original densely methylated DNA molecule is amplified by PCR to 200 copies and only 5% of these copies are recovered after capture, 10 copies of that unique original molecule will still remain available for sequencing. However, standard PCR amplification does not copy methylation marks, making it impossible to subsequently perform affinity capture based on DNA methylation density. To enable amplification of DNA with restoration of methylation patterns, some methods are described herein that permit conversion and amplification of DNA with restoration of methylation at CpG sites.
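The arithmetic in the example above can be made explicit with a short sketch. It computes the expected number of recovered copies and, under the added assumption that each copy is recovered independently with the same probability (a simplification not stated in the disclosure), the chance that every copy of a given original molecule is lost. The function names are hypothetical.

```python
def expected_recovered(copies, recovery_rate):
    """Expected number of copies recovered after capture."""
    return copies * recovery_rate


def prob_all_lost(copies, recovery_rate):
    """Probability that no copy is recovered.

    Assumes each copy is recovered independently with probability
    `recovery_rate` (a simplifying assumption for illustration).
    """
    return (1.0 - recovery_rate) ** copies


# Values from the example in the text: 200 copies, 5% recovery.
with_amplification = expected_recovered(200, 0.05)
loss_with_amplification = prob_all_lost(200, 0.05)

# Same capture stringency applied to a single unamplified molecule.
loss_without_amplification = prob_all_lost(1, 0.05)
```

Under these assumptions, the expected yield with amplification is 10 copies and the chance of losing the unique sequence entirely is far below 1%, whereas a single unamplified molecule would be lost 95% of the time at the same recovery rate.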
As schematized in Figures 1, 2, 3, and 4, some methods described herein comprise conversion of original (biologically derived) DNA molecules resulting in deamination of unmodified cytosine bases to uracil bases. In some implementations, conversion can be performed by treatment of DNA with bisulfite or with enzymatic treatment (TET2 then APOBEC) as in Enzymatic Methylation sequencing (EM-Seq). In some implementations, cytosine bases which are methylated (5-mC) or hydroxymethylated (5-hmC) are protected from deamination. In some implementations, the converted DNA can then undergo PCR amplification in which the uracil bases in the original converted DNA molecules are replaced by thymine bases in the DNA copies. This results in unmethylated-C in the original DNA molecules being converted to T in the PCR-amplified DNA copies, whereas 5-methylcytosine or 5-hydroxymethylcytosine in the original DNA molecules are copied as C in the PCR-amplified DNA copies. Accordingly, any methylated or hydroxymethylated CpG sites (5’ ...mCG...3’ or 5’ ...hmCG...3’) in the original DNA molecules are copied as 5’...CG...3’ dinucleotide sequences in the amplified DNA copies, whereas any unmodified CpG sites in
the original DNA molecules become converted to 5’...TG...3’ dinucleotide sequences (or 5’...CA...3’ if the converted C was on the opposite strand of the original DNA molecule) in the amplified DNA copies. Therefore, some methods described herein use a CpG methyltransferase (such as M.SssI) to restore methylation at CpG sites in the amplified DNA copies which correspond to CpG sites that were either methylated or hydroxymethylated in the original DNA molecules. The CpG methyltransferase enzyme M.SssI catalyzes methylation at the C5 position of all cytosine residues within the double-stranded dinucleotide recognition sequence 5’...CG...3’. Because CG dinucleotides (also known as CpG sites) that were methylated or hydroxymethylated in the original DNA molecule are expected to be the only remaining CG dinucleotides in the converted, amplified DNA copies, a CpG methyltransferase can be used to restore the original methylation patterns on the DNA copies. Of note, some conversion methods (such as bisulfite conversion and EM-seq conversion) do not distinguish between methylated and hydroxymethylated cytosines in the template DNA, and therefore, the CpG methylation pattern on the copied DNA could reflect either CpG methylation or hydroxymethylation in the original DNA. Also, because DNA conversion processes are not perfect, the DNA copies will sometimes (rarely) contain methylated CpG sites where the original DNA was not methylated or hydroxymethylated, or conversely, will lack methylation at a CpG site that was methylated or hydroxymethylated on the original DNA. The end result of this conversion, amplification, and re-methylation process is the production of amplified DNA copies in which unmodified C bases in the original DNA fragments are converted to T bases in the DNA copies, and methylated or hydroxymethylated CpG sites in the original DNA fragments are restored as methylated CpG sites in the DNA copies.
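The conversion, amplification, and re-methylation logic described above can be sketched in a few lines. This is a minimal illustration that assumes perfect conversion and models only copies derived from the top strand (copies derived from the opposite strand would show the corresponding G-to-A changes when read against the top strand); the function names and the example sequence are hypothetical.

```python
def convert_and_amplify(top_strand, modified_positions):
    """Model the top-strand PCR copy after idealized conversion.

    Unmodified C is deaminated to U and copied as T; a C at a position listed
    in modified_positions (5-mC or 5-hmC) is protected and copied as C.
    """
    return "".join(
        "T" if base == "C" and i not in modified_positions else base
        for i, base in enumerate(top_strand)
    )


def cpg_sites(sequence):
    """0-based positions of the C in every CG dinucleotide."""
    return [i for i in range(len(sequence) - 1) if sequence[i:i + 2] == "CG"]


original = "ACGTCGACCG"       # CpG sites at positions 1, 4 and 8
methylated = {1, 8}           # hypothetical: sites 1 and 8 are methylated
copy = convert_and_amplify(original, methylated)
restorable = cpg_sites(copy)  # sites a CpG methyltransferase could methylate
```

In this example the only CG dinucleotides left in the copy are at the positions that were methylated in the original, which is why a CpG-specific methyltransferase can restore the pattern.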
Taking advantage of the sequence redundancy of the converted DNA copies with methylation patterns restored, some methods described herein are able to enrich DNA fragments based on their methylation density while minimizing loss of unique sequence representation of densely methylated DNA fragments. The sequence redundancy permits use of enrichment or capture conditions that are highly selective for densely methylated DNA sequences. Because of the sequence redundancy, loss of some copies of an original (biologically derived) densely methylated DNA fragment under stringent enrichment conditions would be unlikely to result in complete loss of sequence representation of that fragment. In some implementations, enrichment of densely methylated DNA copies could be performed using affinity purification methods based on antibodies or proteins that bind to 5-methylcytosine or to symmetrically methylated CpG sites in double-stranded DNA. In some
implementations, to preferentially capture the most densely methylated DNA fragments, densely methylated competitor DNA molecules can be added to the affinity purification mixture to preferentially occupy binding sites to reduce the probability of capture of DNA copies with low methylation density. In some implementations, the enriched, densely methylated, converted DNA copies can undergo next-generation sequencing to enable characterization of the sequences, genomic mapping locations, and methylation status of the DNA copies.
In some implementations, the term “densely methylated”, when referring to cell-free DNA fragments (which typically have a length of approximately 150 to 190 base pairs), refers to fragments that have at least 5 methylated CpG sites. In a preferred implementation, such “densely methylated” fragments have at least 7 methylated CpG sites. In a more preferred implementation, such “densely methylated” fragments have at least 10 methylated CpG sites. In a most preferred implementation, such “densely methylated” fragments have between 12 and 30 methylated CpG sites.
In some implementations, the term “densely methylated”, when referring to DNA without a well-defined fragment size distribution (such as cellular genomic DNA), refers to fragments that have at least 3 methylated CpG sites per 100 base pairs. In a preferred implementation, such “densely methylated” fragments have at least 5 methylated CpG sites per 100 base pairs. In a more preferred implementation, such “densely methylated” fragments have at least 7 methylated CpG sites per 100 base pairs. In a most preferred implementation, such “densely methylated” fragments have between 8 and 18 methylated CpG sites per 100 base pairs.
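The two definitions of “densely methylated” given above amount to simple threshold tests. The sketch below encodes the broadest thresholds from the text as defaults (at least 5 methylated CpG sites per cell-free DNA fragment; at least 3 methylated CpG sites per 100 base pairs otherwise); the function and constant names are hypothetical, and the stricter preferred thresholds can be passed as arguments.

```python
CFDNA_MIN_SITES = 5          # "at least 5 methylated CpG sites" per cfDNA fragment
GENOMIC_MIN_PER_100BP = 3.0  # "at least 3 methylated CpG sites per 100 base pairs"


def is_dense_cfdna(methylated_cpgs, min_sites=CFDNA_MIN_SITES):
    """Per-fragment count test for cell-free DNA fragments."""
    return methylated_cpgs >= min_sites


def methylated_cpgs_per_100bp(methylated_cpgs, length_bp):
    """Methylated CpG density normalized to 100 base pairs."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return 100.0 * methylated_cpgs / length_bp


def is_dense_by_density(methylated_cpgs, length_bp,
                        min_per_100bp=GENOMIC_MIN_PER_100BP):
    """Density test for DNA without a well-defined fragment size distribution."""
    return methylated_cpgs_per_100bp(methylated_cpgs, length_bp) >= min_per_100bp
```

For example, a 300 base pair fragment with 12 methylated CpG sites has a density of 4 per 100 base pairs, which satisfies the broadest definition but not the preferred threshold of 5 per 100 base pairs.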
In some implementations, the term “enrichment of densely methylated DNA” (or variations of this phrase) as used herein, refers to at least a 2-fold increase in the fraction of densely methylated DNA molecules divided by the total number of DNA molecules in a population. In a preferred implementation, this term refers to at least a 10-fold increase in the fraction of densely methylated DNA molecules divided by the total number of DNA molecules in a population. In a more preferred implementation, this term refers to at least a 100-fold increase in the fraction of densely methylated DNA molecules divided by the total number of DNA molecules in a population. In a most preferred implementation, this term refers to at least a 500-fold increase in the fraction of densely methylated DNA molecules divided by the total number of DNA molecules in a population.
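The fold-enrichment definition above can be written as a ratio of two fractions. The sketch below reads “fraction” as the number of densely methylated molecules divided by the total number of molecules in the population, measured before and after enrichment; that reading, the function names, and the example counts are assumptions made for illustration.

```python
def dense_fraction(dense_count, total_count):
    """Fraction of molecules in a population that are densely methylated."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return dense_count / total_count


def fold_enrichment(dense_before, total_before, dense_after, total_after):
    """Ratio of the densely methylated fraction after vs. before enrichment."""
    before = dense_fraction(dense_before, total_before)
    if before == 0:
        raise ValueError("no densely methylated molecules before enrichment")
    return dense_fraction(dense_after, total_after) / before


# Hypothetical counts: 10 of 10,000 molecules are dense before capture,
# 50 of 100 molecules are dense after capture.
fold = fold_enrichment(10, 10_000, 50, 100)
```

With these hypothetical counts the fraction rises from 0.1% to 50%, a 500-fold enrichment.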
In addition to preserving sequence representation of densely methylated DNA fragments, another important advantage of the methods described herein is that the
methylation status of the DNA fragments can be directly assessed because of the sequence conversion. Other existing methods such as MeDIP and MBD Capture that enrich methylated DNA have been developed to directly capture methylated DNA fragments derived from the biological source (sometimes preceded by or followed by steps to incorporate adapters and/or indices for next-generation sequencing), without conversion or amplification of the DNA prior to capture. With these methods, because the DNA did not undergo conversion, the captured fragments are assumed to be methylated at CpG sites based on the fact that they were selectively captured, but there is no direct sequence-based evidence of their methylation state. Indeed, because some DNA fragments bind non-specifically to the capture beads, some unmethylated fragments will be inappropriately captured, contributing to non-specific background signal (noise). In contrast, using some of the methods described herein, such fragments with zero or few methylated CpG sites which bind non-specifically to the capture beads would be identified based on comparison of their converted sequence to the genomic reference sequence data (using standard bioinformatic tools for evaluating converted sequence data, such as Bismark [Babraham Bioinformatics]).
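The sequence-based check described above can be illustrated with a toy comparison of a converted read against its reference. This sketch is not Bismark and does not reproduce its behavior; it assumes a top-strand read that is already aligned to the reference without gaps, and the function names and the default threshold are hypothetical.

```python
def call_cpg_methylation(reference, converted_read):
    """Map each reference CpG position to True (read shows C) or False (read shows T)."""
    if len(reference) != len(converted_read):
        raise ValueError("sketch assumes a gap-free, equal-length alignment")
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            if converted_read[i] == "C":
                calls[i] = True       # protected from conversion: methylated
            elif converted_read[i] == "T":
                calls[i] = False      # converted: unmethylated
            # any other base is left uncalled (mismatch or sequencing error)
    return calls


def is_likely_nonspecific(reference, converted_read, min_methylated=5):
    """Flag a captured fragment whose read shows too few methylated CpG sites."""
    calls = call_cpg_methylation(reference, converted_read)
    return sum(calls.values()) < min_methylated


reference = "ACGTCGACCG"
methylated_read = "ACGTTGATCG"    # CpGs at 1 and 8 retained as CG
unmethylated_read = "ATGTTGATTG"  # every CpG converted to TG
```

A fragment that was captured but whose converted read retains few or no CG dinucleotides at reference CpG positions can be set aside as probable non-specific background.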
DNA sources:
In some implementations, DNA used as input for the assays and/or methods described herein can be derived from biological sources (such DNA is referred to herein as input DNA, original DNA, original input DNA, or biologically derived DNA). In some implementations, input DNA can be cell-free DNA (cfDNA) derived from biofluids or biospecimens including but not limited to blood, plasma, serum, saliva, sputum, stool, cerebrospinal fluid, Papanicolaou smear fluid, uterine lavage fluid, peritoneal fluid, pleural fluid, or urine. In some implementations, input DNA can be cell-derived DNA or exosome-derived DNA obtained from biofluids or biospecimens including but not limited to tissue, blood, plasma, serum, saliva, sputum, stool, cerebrospinal fluid, Papanicolaou smear fluid, uterine lavage fluid, peritoneal fluid, pleural fluid, or urine. In some implementations, input DNA can be double-stranded, single-stranded, or a combination of both. In some implementations, DNA can be obtained from patients with cancer. In some implementations, input DNA can be obtained from individuals being screened for cancer. In some implementations, input DNA can be obtained from individuals with inflammatory, autoimmune, or infectious disease processes. In some implementations, DNA can be obtained from healthy individuals with no known disease. In some implementations DNA can be obtained from forensic specimens including but not limited to hair, blood, semen, vaginal fluid, and skin. In some implementations, DNA can be obtained from sources that combine the DNA of multiple
individuals or organisms including but not limited to human wastewater, agricultural wastewater, agricultural food stocks, and animal-derived food products.
Ligation of oligonucleotide adapters that facilitate next-generation sequencing:
In some implementations, to facilitate next-generation sequencing of the DNA that is undergoing conversion, amplification, restoration of methylation patterns, and enrichment of densely methylated copies, adapter oligonucleotides that are compatible with a particular sequencing platform can be ligated to the DNA (to produce a next-generation sequencing library). In some implementations, adapter sequences can be compatible with one or more of the following sequencing platforms, including but not limited to Illumina, Ion Torrent, Pacific Biosciences, BGI, Complete Genomics, and Oxford Nanopore. In some implementations, the ends of the DNA inserts may be prepared for ligation to adapter oligonucleotides by enzymatic treatment to phosphorylate the 5’-ends, to produce blunt ends, or to produce ends with appropriate overhangs that are compatible with the adapters that are to be ligated. In some implementations, a tagmentation approach can be used to attach adapters.
In some implementations, DNA adapter molecules are ligated to both paired DNA strands of a double-stranded DNA fragment (on one end or both ends of the DNA fragment). In an alternate implementation, adapter molecules can be attached in a similar manner by a transposase enzyme or by primer extension. In a further alternate implementation, an adapter molecule can be ligated to the 5’-end of one strand of DNA, and a polymerase can be used to extend the 3’-end of the opposite strand to make a reverse-complement copy of the ligated adapter molecule, thereby attaching adapter sequences to both strands of the DNA. In some implementations, the adapter molecule can comprise a DNA sequence tag that is substantially unique to the adapter (e.g., a Unique Molecular Identifier). In some implementations, the adapter molecule can comprise a Molecular Lineage Tag (which may have diverse sequences but not necessarily sufficient diversity to be unique). In some implementations, the adapter can be fully double-stranded, or can be partially double-stranded and partially single-stranded. In some implementations, the adapter can be fully single-stranded. In some implementations, the adapter can comprise the 4 unmodified DNA bases (A, C, T, and G). In some implementations, the adapter can comprise modified DNA bases, including but not limited to 5-methylcytosine and/or 5-hydroxymethylcytosine. In some implementations, partially double-stranded adapters (e.g., Y-shaped) can be ligated to both strands of the double-stranded DNA fragments. In some implementations, adapters can be ligated to the biologically derived input DNA molecules prior to conversion. In some implementations,
adapters can be ligated to converted DNA molecules prior to amplification. In some implementations, adapters can be ligated to converted and amplified DNA molecules prior to restoration of methylation patterns. In some implementations, adapters can be ligated to converted, amplified DNA copies with methylation patterns restored prior to enrichment of densely methylated copies. In some implementations, adapters can be ligated to DNA that has been converted, amplified, re-methylated, and enriched for dense methylation, prior to next-generation sequencing. In some implementations, adapters can comprise 5-methylcytosine (or 5-hydroxymethylcytosine or 5-carboxycytosine or 5-formylcytosine) bases to prevent conversion in adapter sequences. In some implementations, adapter sequences can be designed to avoid incorporation of CpG sequences which can subsequently become methylated by CpG methyltransferase. In some implementations, adapters can be designed to minimize the formation of adapter dimers during ligation. In some implementations, the process of adapter ligation can be optimized to reduce or prevent the formation of adapter dimers. In some implementations, a two-step ligation approach is used to minimize formation of adapter dimers. In some implementations, a two-step ligation approach (as schematized in Figure 5) comprises: (1) ligation of a 3’-end of a stem-loop adapter oligonucleotide to a 5’-end of a double-stranded insert DNA, without ligation of the opposite strand; and (2) displacement of the unligated strand of the stem-loop adapter by a displacer oligonucleotide, followed by ligation of the 5’-end of the displacer oligonucleotide to a 3’-end of the insert DNA. A variety of adapter ligation methods are known in the art, including single-stranded and double-stranded ligation methods; in some implementations, any of these ligation methods can be utilized to produce next-generation sequencing libraries of the densely methylated DNA.
DNA Conversion:
In some implementations, the biologically derived DNA can undergo chemical or enzymatic (or both) conversion processes to enable methylated cytosines to be distinguished from unmethylated cytosines in the DNA. In some implementations, the conversion process comprises bisulfite conversion. In some implementations, the conversion process comprises the conversion methods used in Enzymatic Methyl sequencing (EM-seq). In some implementations, unmodified cytosine bases in the biologically derived DNA are converted to uracils by deamination, and can be subsequently replaced by thymine bases in PCR-amplified DNA copies. In some implementations, 5-methylcytosine bases and 5-hydroxymethylcytosine bases are protected from conversion, and are represented as cytosine
bases in PCR-amplified DNA copies. In some other implementations, alternative conversion methods could be used to produce the conversion patterns shown in Table 1.
C = cytosine
T = thymine
5mC = 5-methyl-cytosine
5hmC = 5-hydroxymethyl-cytosine
BS-seq = Bisulfite sequencing
EM-seq = Enzymatic Methyl sequencing (New England Biolabs)
OxBS-seq = Oxidative Bisulfite sequencing
TAB-seq = TET-Assisted Bisulfite sequencing
ACE-seq = APOBEC-Coupled Epigenetic sequencing
TAPS-seq = TET-Assisted Pyridine Borane sequencing
Sequenced base* is sequencing output after conversion of input DNA and PCR.
Table 1 is not comprehensive; additional conversion methods exist and could be used with our approach.
In some implementations, the conversion is performed using chemical reagents, including but not limited to sodium bisulfite, potassium perruthenate, and/or pyridine borane. In some implementations, the conversion is performed using enzymatic methods including, but not limited to APOBEC3A, TET2, and/or T4-betaGal. In some implementations, the conversion is performed using a combination of enzymatic and chemical methods. In some
implementations, adapters may contain modified bases which would be resistant to conversion. To enable restoration of methylation patterns after amplification using a CpG methyltransferase enzyme, it is important that methylated CpG sites in the original biologically derived DNA be retained as CpG sites in the converted, PCR-amplified DNA copies, and that unmethylated CpG sites in the original biologically derived DNA be converted to a non-CG sequence in the converted, PCR-amplified DNA copies.
Amplification of converted DNA:
In some implementations, converted DNA is amplified via a polymerase chain reaction (PCR). In some implementations, PCR amplification results in replacement of uracil bases in the converted DNA with thymine bases in the amplified DNA copies. In some implementations, dUTP nucleotides can be included in the PCR buffer to retain uracil bases in the converted DNA as uracil bases in the amplified DNA copies. In some implementations, the PCR amplification can be catalyzed by an enzyme that has the ability to read and amplify DNA templates containing uracil bases, including but not limited to Q5U polymerase (New England Biolabs), Phusion U polymerase (Thermo Fisher), and ZymoTaq Polymerase (Zymo Research). In some implementations, the polymerase chain reaction can be facilitated by thermocycling. In some implementations, an isothermal amplification reaction can be used to amplify the DNA, such as loop-mediated isothermal amplification (LAMP) or rolling circle amplification. In some implementations, PCR-amplified DNA copies should be maintained in a double-stranded form to enable subsequent methylation via CpG methyltransferase, which requires double-stranded CpG sequences as substrate. In some implementations, thermocycling can be stopped before the PCR amplification reaches plateau phase to ensure that most amplified products remain double-stranded. In some implementations, fluorescence signal can be monitored via real-time quantitative PCR to determine when thermocycling should be stopped prior to plateau phase. When a PCR amplification approaches plateau phase (saturation), the amplified products can become denatured with a low probability of becoming double-stranded by primer extension in the next cycle (since PCR reagents have been exhausted). If the amplified products have high sequence diversity (such as when genomic DNA is amplified), there is a very low probability of re-annealing of top and bottom strand copies of a given template DNA molecule. 
If PCR reaches plateau phase, many amplified copies will be single-stranded, and will therefore be unable to undergo subsequent methylation via a CpG methyltransferase enzyme. In some implementations, single stranded DNA copies from a PCR amplification that was allowed to
reach plateau phase could be restored to double-stranded DNA by one or more rounds of primer extension in a separate enzymatic reaction. In some implementations, PCR-amplified DNA products can be run on an electrophoretic gel (e.g., agarose) to selectively purify double-stranded DNA fragments of the desired size range. In some implementations, PCR- amplified DNA products can be size-selected based on binding to solid-phase reversible immobilization (SPRI) paramagnetic beads.
Restoration of methylation patterns in converted, amplified DNA:
In some implementations, a CpG methyltransferase can be used to methylate cytosine bases at CpG sites in the converted, amplified double-stranded DNA copies. In some implementations, the CpG methyltransferase can be M.SssI. In some implementations, the CpG methyltransferase can be a member of the family of DNMT3 enzymes. In some implementations, the CpG methyltransferase can be a member of the family of DNMT1 enzymes. In some implementations, the CpG methyltransferase can be any methyltransferase with specificity for methylation of CpG sites.
Selective enrichment of densely methylated, converted, amplified DNA:
In some implementations, the converted, amplified DNA copies with methylation patterns restored can be subjected to enrichment based on methylation density of the DNA copies. In some implementations, densely methylated DNA copies are enriched. In some implementations, enrichment of densely methylated DNA copies is enabled by affinity capture using one or more antibodies that specifically bind to 5-methylcytosine. In some implementations, enrichment of densely methylated DNA copies is enabled by affinity capture using any member of the family of methyl binding domain proteins (MBD), or derivatives thereof, that have binding affinity for methylated double-stranded CpG sites. In some implementations, enrichment of densely methylated DNA copies is enabled by affinity capture using MeCP2. In some implementations, 5-methylcytosine bases in the methylated DNA copies can be converted to 5-hydroxymethylcytosine or 5-formylcytosine or 5-carboxycytosine, and enrichment of DNA copies can be enabled by affinity capture based on binding to the correspondingly modified cytosine base. In some implementations, affinity capture of densely methylated DNA can be mediated by any of (but not limited to) the following: antibodies, aptamers, Affibodies, proteins, or peptides.
In some implementations, the selectivity of enrichment of densely methylated DNA can be increased by including methylated competitor DNA molecules in the affinity
purification mixture. In some implementations, the methylated competitor DNA can comprise DNA molecules with a high methylation density to competitively inhibit capture of DNA copies with a lower methylation density, and to promote capture of DNA copies with a high methylation density. In some implementations, the methylated competitor DNA can be synthesized via chemical means on an oligonucleotide synthesizer. In some implementations, the methylated competitor DNA can be produced via PCR amplification of a template that contains multiple CG dinucleotides, followed by methylation of CpG sites in the amplified competitor DNA copies using a CpG methyltransferase. In some implementations, the methylated competitor DNA can be derived from a natural source. In some implementations, the methylated competitor DNA can comprise many copies of a single defined sequence with a defined number of methylated CpG sites. In some implementations, the methylated competitor DNA can comprise many different sequences with a defined number of methylated CpG sites. In some implementations, the methylated competitor DNA can comprise many different sequences with a range of CpG density or CpG content. In some implementations, the methylated competitor DNA can be derived from a biological source including but not limited to animals, plants, microbes, or viruses. In some implementations, the methylated competitor DNA can be derived from chemical synthesis. In some implementations, the methylated competitor DNA can be derived from in vitro enzymatic reactions. In some implementations, the methylated competitor DNA can have a length between 200 base pairs and 400 base pairs. In some implementations, the methylated competitor DNA can have a length between 20 base pairs and 1000 base pairs. In some implementations, the competitor DNA can have a broad range of lengths without any specified limits. 
In some implementations, the competitor DNA can have an average CpG methylation density of between 3 and 20 methylated CpG sites per 100 base pairs. In some implementations, the competitor DNA can have an average CpG methylation density of between 5 and 15 methylated CpG sites per 100 base pairs. In some implementations, the competitor DNA can have an average CpG methylation density of between 6 and 10 methylated CpG sites per 100 base pairs. In some implementations, the methylation density of the competitor DNA can be adjusted to a level that yields a desired methylation density profile in the captured DNA of interest. In some implementations, various parameters of the affinity purification can be adjusted to achieve a desired methylation density profile in the captured DNA of interest; these parameters include but are not limited to: amount of competitor DNA, methylation density of competitor DNA, amount of binding protein (or antibody), amount of affinity capture beads, density of capture sites on the beads or surface, temperature
of the capture, buffer conditions of the capture, washing conditions, and conditions of elution. In some implementations, the selectivity of the enrichment method can be adjusted to capture mostly DNA fragments that map to CpG islands. In some implementations, a single round of capture can be performed. In some implementations, two or more rounds of capture can be performed to further remove fragments with low-methylation density. Sequence redundancy of the methylated DNA copies enables highly selective enrichment of densely methylated DNA, optionally including two or more rounds of enrichment, with minimal loss of unique sequence representation.
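The interplay between repeated capture rounds and copy redundancy can be sketched with a simple probability model. The model below assumes that every copy is recovered independently with a fixed per-round probability that depends only on whether the copy is densely or sparsely methylated; the recovery rates used in the example are hypothetical values chosen for illustration, not measured results, and the function names are likewise hypothetical.

```python
def copy_survival(per_round_recovery, rounds):
    """Probability that a single copy is recovered in every capture round."""
    return per_round_recovery ** rounds


def molecule_retained(n_copies, per_round_recovery, rounds):
    """Probability that at least one copy of an original molecule survives."""
    lost_one_copy = 1.0 - copy_survival(per_round_recovery, rounds)
    return 1.0 - lost_one_copy ** n_copies


def selectivity(dense_recovery, sparse_recovery, rounds):
    """Per-copy recovery of dense relative to sparse DNA after all rounds."""
    return (dense_recovery / sparse_recovery) ** rounds


# Hypothetical per-round recovery: 30% for dense copies, 1% for sparse copies.
one_round = selectivity(0.30, 0.01, rounds=1)
two_rounds = selectivity(0.30, 0.01, rounds=2)
kept_with_redundancy = molecule_retained(200, 0.30, rounds=2)
kept_without_redundancy = molecule_retained(1, 0.30, rounds=2)
```

In this model a second round multiplies the selectivity for dense copies, while 200 redundant copies keep the chance of retaining each unique dense molecule close to 1; a single unamplified copy would survive two rounds only 9% of the time.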
Post-enrichment amplification:
In some implementations, further PCR amplification is necessary to produce sufficient DNA for next-generation sequencing of the densely methylated DNA copies that were converted, amplified, re-methylated, and enriched via selective capture. In some implementations, a post-enrichment PCR amplification can be performed. In some implementations, primers used in a post-enrichment PCR amplification can incorporate additional sequences in the amplified DNA copies, including but not limited to indices or barcodes to enable sample multiplexing and sequences needed for compatibility with a sequencing platform (e.g., Illumina P5 and P7 sequences). In some implementations, the amplified next-generation sequencing library can be purified and/or undergo size-selection to enrich for DNA products of the appropriate length.
Illustrative figures of the methods:
The methods described herein are further illustrated in Figures 1, 2, 3, and 4. These figures are provided as examples intended to be representative of specific implementations. However, these examples are for illustrative purposes and are not meant to limit the scope of the invention. Various changes and modifications will be apparent to those skilled in the art and such changes and modifications including, without limitation, those relating to the formulations and/or methods of the invention may be made without departing from the spirit of the invention and the scope of the appended claims.
Figure 1 provides a schematic illustration of an example method of selectively sequencing densely methylated DNA molecules from a population of DNA molecules with varying methylation density, while minimizing the loss of unique sequences derived from said densely methylated DNA molecules. The schematic representation shows 3 examples of biologically-derived input DNA molecules at the top of the figure: one with relatively dense
methylation, a second with relatively sparse methylation, and a third with no methylation. For illustrative purposes, the densely methylated DNA is shown with 4 symmetrically methylated CpG sites. In some examples of biological specimens, densely methylated DNA fragments can have 8 or more methylated CpG sites (symmetric or asymmetric) in fragments of approximately 100 to 250 base pairs in length. The example shows that adapters are ligated to input DNA fragments, and then the DNA undergoes enzymatic methyl conversion (or bisulfite conversion) followed by PCR amplification. The resulting converted and amplified DNA copies have sequences in which unmodified C bases were converted to T bases, whereas methylated or hydroxymethylated C bases were retained as C bases. Of note, some amplified copies shown in the schematic are derived from the converted Watson strand of the input DNA and some are derived from the converted Crick strand. Next, a CpG methyltransferase enzyme is used to restore methylation at unconverted CpG sites in the converted, amplified DNA copies. Because only CpG sites that were methylated in the biologically derived input DNA are retained as CpG sites in the converted, amplified copies, the use of a CpG methyltransferase enables restoration of original methylation patterns. The amplified DNA copies with restored methylation patterns are then shown undergoing selective enrichment of densely methylated DNA copies by competitive binding to methyl binding domain protein (or antibody to 5-mC) and capture on magnetic beads. Densely methylated competitor DNA fragments are added to the capture mix to competitively inhibit the capture of fragments with lower methylation density. 
The ability to generate multiple redundant copies of each DNA input molecule with methylation patterns restored enables use of stringent capture and enrichment conditions (including an option of more than one round of capture) while preserving representation of unique sequences of densely methylated DNA fragments. The schematic illustrates the purification of a next-generation sequencing (NGS) library of densely methylated, converted sequences. Resulting sequences can be mapped to a reference genome and the original methylation status of cytosine bases can be inferred based on C to T conversion.
Figure 2 provides a more detailed schematic illustration of an example method in which double-stranded biologically derived input DNA is converted and amplified with restoration of CpG methylation patterns. In the double-stranded biologically derived input DNA shown at the top of the figure, a symmetrically methylated CpG site is shown (with methylated cytosines on both strands), and several unmethylated cytosines are also shown within and outside of a CpG context. Bisulfite conversion or Enzymatic Methyl conversion results in unmodified cytosines being converted to uracils by deamination.
5-methylcytosines and 5-hydroxymethylcytosines are rarely converted to uracils. In this example, subsequent PCR amplification replaces the converted uracils with thymines. Therefore, the only remaining CpG sites after conversion and amplification are those which were methylated (or hydroxymethylated) in the input DNA. A CpG methyltransferase enzyme (such as M.SssI) can thus be used to restore the original methylation patterns in the converted, amplified DNA copies.
Figure 3 provides a detailed schematic illustration of an example method in which single-stranded biologically-derived input DNA is converted and amplified with restoration of CpG methylation patterns. The process is largely analogous to that shown in Figure 2, but the resulting converted, amplified, and re-methylated sequences are derived from conversion of a single-strand sequence.
Figure 4 shows a schematic overview of an example method which enables enrichment of DNA molecules based on density of methylated CpG sites while minimizing loss of unique sequences derived from densely methylated DNA molecules. The figure highlights the creation of a converted, amplified NGS library with methylation patterns restored, which can then be subjected to stringent selection conditions to enrich densely methylated DNA while taking advantage of the sequence redundancy to preserve unique sequence representation of densely methylated DNA fragments.
Next-Generation sequencing or other analytical read-outs:
In some implementations, the converted, densely methylated DNA copies produced using methods described herein can be analyzed via sequencing. In one implementation, sequencing includes but is not limited to next-generation sequencing (NGS) or massively parallel sequencing. In some implementations, an NGS platform used for analysis can be a sequencer made by Illumina. In some implementations, next-generation sequencing can be performed on an instrument manufactured by companies including but not limited to Illumina, Ion Torrent, Pacific Biosciences, Qiagen, Thermo Fisher, Roche, BGI, Complete Genomics, and Oxford Nanopore. In some implementations, sequencing can be performed in paired-end mode or in single-end mode. In some implementations, sequencing read lengths can be between 30 and 1000 bases. In some implementations, long-read sequencing can be used, in which read lengths are not defined. In some implementations, sequencing is performed with 150- or 100-base read lengths, in paired-end mode. In some implementations, the sequencing output yields a plurality of converted sequences. In some implementations, the converted, densely methylated DNA copies produced using methods
described herein can be analyzed via other analytical means including but not limited to microarrays, pyrosequencing, primer extension assays, hybridization with complementary oligonucleotides, and/or analysis via fluorescence in microfluidic devices.
Analysis of Sequence Data:
In some implementations, next-generation sequencing of DNA libraries produced using methods described herein yields a plurality of converted DNA sequences. In some implementations, converted sequences comprise sequences in which unmodified cytosine bases in the original DNA molecules are read as thymine bases in the converted sequences. In some implementations, converted sequences comprise sequences in which 5-methylcytosine bases in the original DNA molecules are read as cytosine bases in the converted sequences. In some implementations, converted sequences comprise sequences in which 5-hydroxymethylcytosine bases in the original DNA molecules are read as cytosine bases in the converted sequences. In some implementations, converted sequences can be aligned to reference genome sequences that have been converted in silico. In some implementations, the plurality of converted sequences can be grouped into sets, wherein each set of sequences is determined to be derived from an individual DNA fragment. In some implementations, converted sequences can be compared to reference genome sequences that have not been converted to infer methylation states of cytosine bases in the original, unconverted DNA molecules. In some implementations, methylation states of multiple CpG sites in a DNA fragment can be used to evaluate a methylation level of the fragment. In some implementations, most converted sequences map to genomic regions with a high density of CpG sites, including but not limited to CpG islands. In some implementations, the converted sequence data can be used to evaluate fragment-level methylation patterns at CpG islands across a genome.
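By way of a non-limiting illustration, the in silico conversion of a reference sequence and the inference of CpG methylation states from an aligned converted read could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the analysis pipeline used in the examples: the function names are hypothetical, only top-strand (C-to-T) conversion is modeled, and the read is assumed to be already aligned without gaps at a known offset.

```python
from typing import Dict, Optional


def convert_in_silico(reference: str) -> str:
    """Return a fully converted top-strand reference (every C becomes T)."""
    return reference.upper().replace("C", "T")


def call_cpg_states(read: str, reference: str, offset: int = 0) -> Dict[int, bool]:
    """Infer CpG methylation states from a converted read.

    The read is compared to the UNCONVERTED reference. At each reference
    CpG cytosine, a C in the read means the original base was protected
    (5-methylcytosine or 5-hydroxymethylcytosine), and a T means it was
    an unmodified cytosine. Any other base is treated as an error and skipped.
    Returns {reference position of the CpG cytosine: True if methylated}.
    """
    ref = reference.upper()
    states: Dict[int, bool] = {}
    for i, base in enumerate(read.upper()):
        pos = offset + i
        if pos + 1 < len(ref) and ref[pos] == "C" and ref[pos + 1] == "G":
            if base == "C":
                states[pos] = True
            elif base == "T":
                states[pos] = False
    return states


def methylation_level(states: Dict[int, bool]) -> Optional[float]:
    """Fraction of covered CpG sites that are methylated (None if no CpGs)."""
    if not states:
        return None
    return sum(states.values()) / len(states)


# Hypothetical toy sequences: CpG cytosines sit at positions 1, 5 and 9.
reference = "ACGTTCGACCGA"
read = "ACGTTTGATCGA"  # CpGs at 1 and 9 protected, CpG at 5 converted
states = call_cpg_states(read, reference)
print(convert_in_silico(reference))  # ATGTTTGATTGA
print(states)                        # {1: True, 5: False, 9: True}
```

Because 5-methylcytosine and 5-hydroxymethylcytosine are both read as cytosine, a call of "methylated" in this sketch does not distinguish between the two modifications.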
In some implementations, fragment-level methylation patterns can be compared to methylation patterns obtained from independent evaluations of DNA derived from any of (but not limited to) the following: healthy tissues, diseased tissues, healthy cells, diseased cells, biospecimens from healthy individuals, biospecimens from individuals with disease, cancer cells, cancer tissues, or biospecimens from individuals with cancer. In some implementations, comparisons of fragment-level methylation patterns with independently obtained methylation data (reference methylation data) can enable identification of fragments that match expected methylation patterns for a cell type, a tissue, or a disease state. In some implementations, comparisons of fragment-level methylation patterns with reference methylation data can enable identification of fragments that do not match expected
methylation patterns (aberrantly methylated fragments). In some implementations, identification of fragments that match expected methylation patterns for a disease state can be used to aid in diagnosis of said disease state. In some implementations, identification of fragments that match expected methylation patterns of a tissue or cell type can be used to infer the presence of DNA or measure the amount of DNA from that tissue or cell type in a biospecimen. In some implementations, identification of fragments that match expected methylation patterns for a particular cancer type can be used to aid in diagnosis of that cancer type. In some implementations, identification of fragments that do not match expected methylation patterns of healthy (non-cancerous) cells or tissues can be used to identify the presence of aberrant methylation patterns in a biospecimen that could be an indication of cancer-derived DNA. In some implementations, the number of fragments that have aberrant methylation patterns in a biospecimen can be used to infer the amount of cancer cell death contributing to tumor-derived cell-free DNA in the biospecimen. In some implementations, the number of fragments that have disease-associated methylation patterns in a biospecimen can be used to infer the amount of disease-associated cell-free DNA in the biospecimen. In some implementations, measurement of the number of fragments with cancer-associated or disease-associated methylation patterns can aid in evaluating the extent or degree of the cancer or disease.
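As a non-limiting illustration of the comparison described above, the following Python sketch counts fragments whose methylation level departs from reference methylation data for healthy tissue. The region labels, the reference values, the minimum CpG count, and the difference threshold are all hypothetical values chosen for the example; the disclosure does not prescribe a particular classification rule.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Fragment:
    region: str              # hypothetical label for the CpG island the fragment maps to
    cpg_states: List[bool]   # True = methylated CpG, in fragment order


def fragment_level(fragment: Fragment) -> float:
    """Fraction of CpG sites in the fragment that are methylated."""
    return sum(fragment.cpg_states) / len(fragment.cpg_states)


def count_aberrant(
    fragments: List[Fragment],
    healthy_reference: Dict[str, float],
    min_cpgs: int = 3,       # assumed minimum number of CpGs for a reliable call
    min_excess: float = 0.5, # assumed excess over the healthy level
) -> int:
    """Count fragments hypermethylated relative to the healthy reference level."""
    aberrant = 0
    for fragment in fragments:
        if len(fragment.cpg_states) < min_cpgs:
            continue  # too few CpG sites to evaluate
        if fragment.region not in healthy_reference:
            continue  # no reference methylation data for this region
        excess = fragment_level(fragment) - healthy_reference[fragment.region]
        if excess >= min_excess:
            aberrant += 1
    return aberrant


# Hypothetical reference: CGI_A is normally unmethylated, CGI_B normally methylated.
healthy_reference = {"CGI_A": 0.05, "CGI_B": 0.90}
fragments = [
    Fragment("CGI_A", [True, True, True, True, True]),  # aberrant
    Fragment("CGI_A", [False, False, True, False]),     # close to expected
    Fragment("CGI_B", [True, True, True, True]),        # matches expected pattern
    Fragment("CGI_A", [True, True]),                    # skipped: too few CpGs
]
print(count_aberrant(fragments, healthy_reference))  # 1
```

The resulting count could then be normalized, for example to the total number of evaluated fragments, before being compared across biospecimens.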
Applications:
In some implementations, disclosed methods can be used for clinical purposes. In some implementations, disclosed methods can be used for research purposes. In some implementations, disclosed methods can be used to determine if a person has a disease state. In some implementations, disclosed methods can be used to determine if a person has cancer. In some implementations, disclosed methods can be used to aid in early detection of cancer. For example, the detection of cancer-specific hypermethylated DNA fragment patterns in a clinical biospecimen such as plasma or urine can be used to identify patients who are likely to have cancer. In some implementations, disclosed methods can be used to estimate probabilities that a cancer originated from a particular type of tissue. For example, different cancer types are known to have cancer-type-specific methylation patterns. In some implementations, hypermethylated DNA patterns can be compared to expected patterns for various types of cancer to find similarities in patterns which can suggest that the hypermethylated DNA fragments were derived from a particular type of cancer. In some implementations, disclosed methods can be used to assess the stage of a cancer, the extent of
a cancer, or the burden of tumor. For example, increased amounts of DNA fragments bearing tumor-specific methylation patterns in a biospecimen such as plasma may indicate a greater amount of tumor-DNA shedding which may be associated with a greater tumor burden (or cancer stage). In some implementations, disclosed methods can be used to assess prognosis of a disease based on evaluation of either the amount or the pattern of disease-specific hypermethylated DNA fragments, or both.
In some implementations, disclosed methods can be used to assess the regression or progression of cancer. For example, changes over time in levels of DNA fragments bearing tumor-specific methylation patterns in a biofluid may indicate a corresponding change in the tumor burden of the patient. In some implementations, disclosed methods can be used to assess treatment response to cancer therapy. For example, it has been shown in many studies that a patient whose cancer is responding to therapy will often have a decrease over time in tumor-derived cell-free DNA (cfDNA) fragments measurable in his or her plasma. In some patients, tumor-derived cell-free DNA is shed at a higher rate initially as cancer cells are killed by the therapy and spill their DNA into the blood (a transient spike). However, eventually as the tumor bulk is reduced with therapy, the level of tumor-derived cfDNA would be expected to decrease. In some implementations, such changes in tumor-derived cell-free DNA levels can be measured by quantifying the amount of DNA fragments bearing tumor-specific methylation patterns.
In some implementations, disclosed methods can be used to assess the presence of residual cancer after a patient receives a curative-intent therapy. For example, a patient’s plasma can be tested following curative-intent therapy to detect the presence of cfDNA fragments containing cancer-associated methylation patterns. However, detection of small amounts of residual cancer after a curative-intent therapy can be challenging due to the very small amount of tumor-derived cfDNA fragments that may be shed into the blood. To improve detection sensitivity for this application, in some implementations, a patient-specific pattern of aberrant hypermethylation can be identified by initial testing of a biospecimen from that patient (for example, testing of tumor tissue or pre-treatment plasma). In some implementations, such a patient-specific pattern can be used to personalize the signal detection algorithm to improve detection sensitivity. In some implementations, a tumor-specific set of aberrantly hypermethylated CpG islands could be identified for a particular patient by testing tumor tissue or pre-treatment plasma of said patient (in which cancer-specific hypermethylated fragments are likely to be more abundant, providing a stronger signal). In some implementations, by identifying such aberrantly hypermethylated genomic regions that are specific to a particular patient’s tumor(s), one could look in post-treatment plasma for residual hypermethylation signal mapping to those same genomic regions (which would suggest the presence of persistent cancer after therapy). For example, when applied to measurement of cancer signals in a subsequent biospecimen (e.g., post-treatment plasma), a personalized algorithm could assign greater weight to signals from hypermethylated DNA fragments that match aberrant methylation patterns already identified to be present in that patient’s tumor tissue or pre-treatment plasma. In contrast to many existing bespoke tumor-informed mutation detection approaches, the disclosed methods do not require any physical or experimental alterations to the assay, as personalization could be achieved simply by bioinformatic modifications. In some implementations, disclosed methods can be used to assess recurrence of disease after a patient receives cancer therapy. Early detection of recurrent cancer can also require very high detection sensitivity. In some implementations, a personalized signal detection approach could also be employed for this purpose.
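The purely bioinformatic personalization described above could, as one non-limiting illustration, take the form of a weighted sum over hypermethylated fragment counts. In the Python sketch below, the function name, the region labels, and the two weights are hypothetical; the disclosure states only that greater weight could be assigned to signals matching the patient's previously identified aberrant regions, and does not specify how the weights are chosen.

```python
from typing import Dict, Set


def personalized_signal(
    hypermethylated_counts: Dict[str, int],
    patient_regions: Set[str],
    informed_weight: float = 5.0,   # assumed weight for patient-matched regions
    baseline_weight: float = 1.0,   # assumed weight for all other regions
) -> float:
    """Weighted count of hypermethylated fragments in a follow-up specimen.

    hypermethylated_counts maps a region label to the number of
    hypermethylated fragments observed there (e.g., in post-treatment plasma).
    patient_regions holds regions found to be aberrantly hypermethylated in
    the same patient's tumor tissue or pre-treatment plasma.
    """
    score = 0.0
    for region, count in hypermethylated_counts.items():
        weight = informed_weight if region in patient_regions else baseline_weight
        score += weight * count
    return score


# Hypothetical data: two regions were previously identified in this patient.
patient_regions = {"CGI_1", "CGI_2"}
post_treatment = {"CGI_1": 2, "CGI_2": 1, "CGI_9": 3}
print(personalized_signal(post_treatment, patient_regions))  # 18.0
```

The same specimen and sequencing data are used with or without personalization; only the weights change, which is why no alteration to the laboratory assay is needed.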
In some implementations, disclosed methods can be used to monitor changes in tumor-specific methylation patterns over time in a patient to assess the epigenetic evolution of a tumor. Because the disclosed methods are able to assess hypermethylated CpG island and/or promoter sequences from anywhere in a genome, in some implementations, the methods can dynamically capture the evolution of hypermethylation patterns in a tumor over time. This can be done without pre-defining genomic target regions based on sequence-specific targeting. In some implementations, disclosed methods can be used to identify epigenetic mechanisms of resistance to drug therapy. Because changes in methylation patterns can be monitored dynamically over time in an untargeted manner, in some implementations, the disclosed methods can enable identification of methylation changes that give rise to drug resistance without requiring pre-defined hypotheses for resistance mechanisms. In some implementations, monitoring of dynamic changes in CpG island and/or promoter hypermethylation patterns could enable assessment of evolving cell states (e.g., epithelial to mesenchymal transition, transformation from adenocarcinoma to small cell carcinoma, etc.).
In some implementations, disclosed methods can be used to assess a variety of pathologies by identifying tissue-specific patterns of hypermethylation in blood. For example, a patient with liver cirrhosis may shed liver-derived DNA into the blood stream, allowing DNA fragments with liver-specific methylation patterns to be detected at higher levels than in the general population. In some cases, the amount of such a signal could be correlated with the severity or extent of disease. In some implementations, changes in tissue-specific hypermethylation signals over time could be used to monitor for exacerbations or improvements in a disease process. Because background methylation patterns (derived from non-diseased cells) tend to be relatively stable over time in a given person, significant changes in disease-related methylation signals can be more readily identified by comparing signal in the same patient at different time points rather than comparing a patient’s signal against measurements in a population. Similar analysis could be applied to methylation patterns that are specific to other organs including but not limited to: kidney, heart, lung, brain, muscles, bones, intestines, and pancreas. In some implementations, disclosed methods can be used to assess transplanted organ rejection based on measurement of organ-specific methylation patterns.
In some implementations, disclosed methods can be used to assess methylation patterns of fetal or placental DNA from the maternal circulation in pregnancy. In some implementations, disclosed methods can be used to assess changes in cells of a person’s immune system as an indication of health or disease. In some implementations, disclosed methods can be used to assess aging. For example, changes in DNA methylation are known to occur as individuals age. Such changes could be measured to evaluate health status via an assessment of epigenetic age. Biological aging can be measured via epigenetic clocks which are based on evaluation of DNA methylation changes. These clocks, such as the Horvath and Hannum clocks, analyze the methylation status of specific age-associated CpG sites across the genome. By comparing the methylation patterns at these sites, it is possible to estimate an individual's biological age, which may reflect the cumulative effect of environmental factors, lifestyle, and genetic predisposition on their aging process. This estimation may correlate with age-related phenotypes and diseases more closely than chronological age. Moreover, tracking DNA methylation changes over time can also help evaluate the effectiveness of interventions aimed at slowing down the aging process, offering a powerful tool for research in aging and regenerative medicine. Importantly, many of the CpG sites whose methylation status is associated with age are found within or in the vicinity of CpG islands. Thus, in some implementations, disclosed methods which enrich densely methylated DNA fragments mapping to CpG islands can be used to evaluate methylation levels and/or patterns that can provide an estimation of biological age. In some implementations, organ-specific methylation changes could be evaluated to assess pathology or stress in an organ. For example, an individual who has a long history of excessive alcohol consumption may have a disproportionately high epigenetically measured age of his or her liver.
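As a non-limiting illustration of how a clock of this general kind produces an age estimate, the Python sketch below applies a linear model to per-site methylation levels. The site names, coefficients, and intercept are invented for the example and are not those of the Horvath or Hannum clocks, which are fitted by penalized regression over many more sites (and, in the Horvath clock, use a transformed age scale that is omitted here).

```python
from typing import Dict


def estimate_epigenetic_age(
    site_levels: Dict[str, float],
    coefficients: Dict[str, float],
    intercept: float,
) -> float:
    """Linear clock: intercept plus the weighted sum of methylation levels.

    site_levels maps each CpG site to its methylation level between 0 and 1.
    coefficients maps each clock site to its (here hypothetical) weight.
    """
    missing = sorted(set(coefficients) - set(site_levels))
    if missing:
        raise ValueError(f"no methylation level for clock sites: {missing}")
    return intercept + sum(
        weight * site_levels[site] for site, weight in coefficients.items()
    )


# Made-up model: positive weights gain methylation with age, negative lose it.
coefficients = {"site_a": 40.0, "site_b": -25.0, "site_c": 10.0}
site_levels = {"site_a": 0.5, "site_b": 0.2, "site_c": 0.8}
age = estimate_epigenetic_age(site_levels, coefficients, intercept=30.0)
print(round(age, 1))  # 53.0 for these made-up numbers
```

When methylation levels are derived from enriched fragments rather than from an array, only clock sites that fall within sufficiently covered regions (for example, CpG islands) would have usable levels, so a model restricted to such sites would be an assumption of any practical implementation.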
In some implementations, disclosed methods can be used in biomedical research to characterize hypermethylation patterns that can provide an assessment of gene expression states. In some implementations, disclosed methods can be used in biomedical research to identify hypermethylation patterns to provide an understanding of fundamental cellular or developmental epigenetic processes. In some implementations, disclosed methods can be used in biomedical research or clinical applications to evaluate methylation patterns in single cells or small clusters of cells because some methods are compatible with analysis of very small amounts of input DNA. In some implementations, disclosed methods can be used to characterize methylation patterns in an ovum, sperm, or embryo to guide clinical decisions pertaining to in vitro fertilization.
In some implementations, disclosed methods can be used to evaluate methylation patterns in DNA derived from vertebrate organisms. CpG islands, which are genomic regions that have a high density of CpG sites, are found in the genomes of nearly all vertebrate organisms. Because disclosed methods can enrich densely methylated DNA fragments regardless of the genomic origin of said fragments, disclosed methods can be applied to analysis of methylation patterns in human and/or non-human vertebrate species. In some implementations, disclosed methods can be used in veterinary medical applications in a manner that is analogous to human medical applications. In some implementations, disclosed methods can be used to detect, diagnose, and/or monitor cancer in vertebrate animals, including but not limited to household pets, farm animals, and horses. In some implementations, disclosed methods can be used to detect, diagnose, and/or monitor various diseases in vertebrate animals. In some implementations, disclosed methods can be used for agricultural applications to detect, diagnose, and/or monitor disease in livestock. In some implementations, disclosed methods can be used in biomedical research applications to study methylation patterns in model organisms including but not limited to mice, rats, frogs, and fish. In some implementations, disclosed methods can be used in laboratory animals having human xenografted tumors. In some implementations, disclosed methods can be used to distinguish methylation patterns arising from xenografted tumor cells versus from the host animal’s cells and/or tissues.
In some implementations, disclosed methods can be used to enrich densely methylated DNA fragments arising from a virus. Prior studies have shown that the methylation state of DNA in a DNA virus can change depending on whether the DNA is in the virion or in a host cell, and also depending on the state of the host cell. For example, the Epstein-Barr Virus (EBV) genome is known to be unmethylated in virions but becomes highly methylated during latent infection and in transformed B cells. The proliferation and turnover of EBV-infected B cells can lead to increased shedding of hypermethylated EBV DNA into plasma, which can be exploited as a biomarker signal for lymphoma detection. The DNA of several viruses has been observed to become hypermethylated in virus-associated malignancies (e.g., Epstein-Barr Virus in diffuse large B-cell lymphoma and nasopharyngeal carcinoma, Human Papilloma Virus in cervical and head-and-neck cancers, etc.). Since hypermethylated viral DNA has a methylation density similar to that of CpG islands in vertebrate genomes, it can be enriched from complex DNA mixtures (for example, cell-free DNA) using methods disclosed herein. In some implementations, disclosed methods can be used to enrich densely methylated viral DNA in parallel with densely methylated human and/or vertebrate animal DNA. In some implementations, disclosed methods can be used to measure densely methylated viral DNA as a biomarker of cancer. In some implementations, disclosed methods can be used to enrich densely methylated DNA fragments arising from Epstein-Barr Virus as a biomarker for detection of lymphoma in patients with HIV. In some implementations, disclosed methods can be used to enrich densely methylated DNA fragments arising from Epstein-Barr Virus as a biomarker for detection of post-transplant lymphoproliferative disorder (PTLD) in patients receiving immunosuppressive therapy after organ transplantation. In some implementations, disclosed methods can be used to measure densely methylated viral DNA as a biomarker to evaluate latent or lytic viral state.
In some implementations, disclosed methods can be used to measure densely methylated viral DNA as a biomarker of disease involving shedding of said viral DNA from infected cells. For example, shedding of hepatitis B or hepatitis C viral DNA into blood could serve as a biomarker of liver cell death.
In some implementations, disclosed methods can be combined with spatially encoded DNA barcoding techniques to permit genome-wide analysis of methylation patterns at CpG islands and/or promoters in tissues in a spatially resolved manner. In some implementations, spatially encoded DNA barcodes can be incorporated in or added to sequencing adapters. In some implementations, spatial DNA barcodes can be attached by ligation. In some implementations, spatial DNA barcodes can be attached by primer extension. In some implementations, spatial DNA barcode attachment can be facilitated by a transposase.
In some implementations, disclosed methods can be used to evaluate additional features of enriched densely methylated DNA fragments including but not limited to mutations, DNA fragment size, fragment location within the genome, and/or nucleosome protection pattern. In some implementations, information gained from analysis of such additional DNA features could enable improved biomarker performance compared to analysis of DNA methylation patterns alone. In some implementations, disclosed methods for enriching densely methylated DNA fragments can be preceded by chromatin immunoprecipitation (ChIP) to selectively enrich DNA fragments associated with histones having particular modifications. In some implementations, immunoprecipitation using antibodies that specifically bind to, for example, Histone H3K27me3, Histone H3K9me3, Histone H3K4me3, and/or Histone H3K27ac could be used to enrich DNA fragments based on chromatin features prior to enrichment based on methylation density. In some implementations, such sequential, orthogonal enrichment steps could yield more nuanced biomarker signals and/or improve the sampling of cancer-specific signals.
In some implementations, the ability to directly determine methylation status of CpG sites from converted sequence data results in greater accuracy in measurement of densely methylated DNA fragments. In contrast, methods such as MeDIP or MBD Capture enrich methylated DNA directly from biological sources without conversion, and captured fragments are presumed to be methylated because they were captured based on methylation-specific binding. However, some fragments that have zero or few methylated CpG sites can also be non-specifically captured. Without the ability to assess methylation status, such fragments may be incorrectly presumed to have high CpG methylation content, thereby contributing to the background noise of an assay and reducing its accuracy. In some implementations, the methods described herein can improve the accuracy of measuring densely methylated DNA fragments because enriched fragments are converted, and can be verified to have a high CpG methylation density by comparison to aligned reference genomic sequences.
Although the currently described methods have been optimized for measurement of small amounts of DNA input such as cell-free DNA in plasma, it is understood that they could be applied more broadly to the analysis of epigenetic modifications from a variety of sources. Examples of such sources include, but are not limited to lymph nodes, tumor margins, pleural fluid, urine, stool, serum, bone marrow, peripheral white blood cells, cheek swabs, circulating tumor cells, cerebrospinal fluid, peritoneal fluid, amniotic fluid, cystic
fluid, frozen tumor specimens, and tumor specimens that have been formalin-fixed and paraffin-embedded.
Kit production:
In some implementations, the methods described herein can be developed into a kit format. In some implementations, a kit for performing the methods disclosed herein can comprise the reagents and materials necessary for the conversion of DNA, for PCR amplification, for CpG methylation, and for enrichment of densely methylated DNA. In some implementations, a kit for performing the methods disclosed herein can additionally comprise reagents and materials necessary for production of next-generation sequencing libraries. In some implementations, a kit for performing the methods disclosed herein can additionally comprise instructions and quality control materials to ensure accurate and reproducible results. In some implementations, a kit for performing the methods disclosed herein can additionally comprise software and/or access to computational resources to enable analysis of next-generation sequencing data.
Two-step ligation approach to minimize formation of adapter dimers:
In some implementations, a method of attaching adapter oligonucleotides to double-stranded DNA fragments can be employed which utilizes two sequential enzymatic ligation steps to minimize formation of adapter dimers. In some implementations, the adapters can be ligated to double-stranded DNA fragments of interest for the purpose of facilitating analysis by next-generation sequencing. Adapter dimers can be formed via ligation of one adapter molecule to another adapter molecule. Adapter dimers can be problematic for next-generation sequencing (NGS) libraries. These dimers can dominate the sequencing output, overwhelming the sequence output from the desired DNA fragments of interest. Adapter dimers are more likely to form when the DNA fragments of interest are very low in abundance, as reaction stoichiometry in such situations favors ligation of adapters to other adapters over ligation of adapters to the DNA fragments of interest. Adapter dimers can also be more efficiently amplified during PCR than the desired product of adapters ligated to the DNA fragments of interest because of the dimers’ short length (generally, shorter targets amplify more efficiently in PCR). Therefore, it is important to minimize formation of adapter dimers, especially when DNA input quantities for NGS library preparation are low. A two-step ligation method disclosed herein is able to greatly reduce adapter dimer formation. In some implementations, the method uses stem-loop adapters that lack a 5’-phosphate which would be required for adapter self-ligation. As schematized in Figure 5, such a stem-loop adapter is able to ligate via its 3’-end to a 5’-phosphorylated strand of a double-stranded DNA fragment of interest (insert DNA), but not to the opposite strand. Importantly, such a stem-loop adapter molecule is unable to ligate to another stem-loop adapter molecule because of the lack of 5’-phosphate ends on adapter molecules. In some implementations, after removing excess unligated adapters and double-stranded DNA ligase enzyme, a second oligonucleotide having a 5’-phosphate, which we refer to as a displacer oligonucleotide, can be hybridized to the ligated stem-loop adapter (by displacing one strand of DNA at the stem) and then ligated to the target DNA in a second enzymatic step. In some implementations, the adapter used in the first ligation step is a stem-loop adapter. In some implementations, the adapter used in the first ligation step can comprise two strands which are partially complementary and hybridized (known in the art as a Y-shaped adapter). In some implementations, the adapter used in the first step can comprise two strands of DNA: a first strand having a 3’-end that is available for ligation to a 5’-phosphorylated DNA fragment of interest, and a second strand that is either partially or fully hybridized to the first strand in a manner that would enable a DNA ligase to catalyze ligation of the first strand to the DNA of interest but wherein the second strand lacks a 5’-phosphate and therefore cannot be ligated. In some implementations, the target DNA fragments can be blunt-ended. In some implementations, the target DNA fragments can have overhangs at their ends, such as a 3’-dA tail. In some implementations, the first ligation step can utilize a DNA ligase that has optimal efficiency for ligation of double-stranded DNA (such as T4 DNA ligase or NEBNext Ultra II DNA ligase). In some implementations, the DNA ligase and the excess unligated adapter molecules can be removed prior to the second ligation step by performing a DNA clean-up step.
In some implementations, cleavable positions (such as dU) can be incorporated into adapter oligonucleotides to facilitate hybridization of the displacer oligonucleotides in the second ligation step. In some implementations, a 5’-phosphorylated displacer molecule can be ligated to the double-stranded DNA of interest in the second ligation step using a nick-sealing ligase such as HiFi Taq DNA ligase (New England Biolabs). Use of a nick-sealing ligase in the second step of ligation ensures that only adapters that were ligated in the first step can serve as complementary templates to facilitate ligation (via nick sealing) in the second step. In some implementations, the two-step ligation method disclosed herein can enable ligation of adapters and displacer oligonucleotides to very low amounts of input DNA (double-stranded DNA of interest). In some implementations, the two-step ligation method disclosed herein can enable next-generation sequencing from DNA derived from a small number of cells (less than 10 cells or less than 100 cells). In some implementations, the two-step ligation method disclosed herein can enable next-generation sequencing from DNA derived from a single cell.
Figure 5 shows an example schematic of a two-step ligation scheme that enables ligation of adapter sequences to double-stranded DNA fragments of interest while minimizing formation of adapter dimers. The illustrated two-step ligation scheme was used in Example 1 in the Detailed Description section to attach adapter sequences to blunted and 5’-phosphorylated cell-free DNA fragments and genomic DNA fragments. The first step involves ligation of stem-loop adapters to the insert DNA by forming a phosphodiester linkage between a 5’-phosphorylated end of the insert DNA and the 3’-hydroxyl end of the stem-loop adapter. The 5’-end of the stem-loop adapter lacks a phosphate, and therefore cannot be ligated to the insert or to another stem-loop adapter molecule (thereby avoiding adapter dimer formation). Next, USER enzyme is used to cleave at deoxyuridine positions in the stem of the stem-loop adapter to destabilize base-pairing. DNA is then cleaned up to remove unligated adapters, ligase, and USER enzyme. In the second step, a displacer oligonucleotide is added to displace one strand of the stem-loop adapter by hybridization to the opposite (ligated) strand, as shown in the figure. Finally, a nick-sealing ligase (HiFi Taq DNA ligase) is used to ligate the 5’-phosphorylated displacer oligonucleotide to the 3’-end of the insert DNA. In Example 1 provided herein, the stem-loop adapter and displacer oligonucleotides were designed to attach sample barcodes and Illumina adapter sequences to the input DNA fragments. However, adapter sequences for other sequencing platforms could be readily substituted.
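The end-chemistry logic of the two-step scheme can be summarized, in a non-limiting way, by the rule that a ligase joins a 3’-hydroxyl end to a 5’-phosphate end. The Python sketch below encodes only that rule; the class and function names are hypothetical, the 3’-hydroxyl on the displacer is an assumption not stated in the text, and the sketch deliberately ignores hybridization, so it does not model the requirement that the nick-sealing ligase in step two needs a complementary template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    name: str
    has_5p_phosphate: bool
    has_3p_hydroxyl: bool


def can_join(three_prime_side: Molecule, five_prime_side: Molecule) -> bool:
    """True if the 3'-OH of one molecule can be ligated to the 5'-phosphate of the other."""
    return three_prime_side.has_3p_hydroxyl and five_prime_side.has_5p_phosphate


# End chemistry as described for the two-step scheme.
stem_loop = Molecule("stem-loop adapter", has_5p_phosphate=False, has_3p_hydroxyl=True)
insert = Molecule("insert DNA strand", has_5p_phosphate=True, has_3p_hydroxyl=True)
# The displacer's 3'-hydroxyl is assumed here; the text specifies only its 5'-phosphate.
displacer = Molecule("displacer oligonucleotide", has_5p_phosphate=True, has_3p_hydroxyl=True)

# Step 1: the adapter 3'-end joins the insert 5'-phosphate ...
print(can_join(stem_loop, insert))      # True
# ... but the insert 3'-end cannot join the unphosphorylated adapter 5'-end,
print(can_join(insert, stem_loop))      # False
# and two adapters cannot join each other, so no adapter dimers form.
print(can_join(stem_loop, stem_loop))   # False

# Step 2: the 5'-phosphorylated displacer is sealed to the insert 3'-end.
print(can_join(insert, displacer))      # True
```

In this simplified model a free adapter 3’-end could also join a displacer; in the described scheme that product is avoided because unligated adapters are removed before step two and because nick sealing requires an adapter strand already ligated to the insert to serve as template.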
EXAMPLES
The present technology may be better understood by reference to the following examples. These examples are intended to be representative of specific implementations. However, these examples are for illustrative purposes and are not meant to limit the scope of the invention. Various changes and modifications will be apparent to those skilled in the art and such changes and modifications including, without limitation, those relating to the formulations and/or methods of the invention may be made without departing from the spirit of the invention and the scope of the appended claims.
Example 1:
Enrichment and Sequencing of Densely Methylated Cell-Free DNA Fragments for Identification and Quantitation of Cancer-Specific Hypermethylation Patterns. Materials and Methods:
Collection and processing of patient plasma samples:
Blood was collected by venipuncture into a blood collection tube containing potassium-EDTA or containing a proprietary anticoagulation and stabilization cocktail designed to limit cellular degradation and to stabilize cell-free DNA (Cell-free DNA BCT from Streck). Tubes had 10 mL capacity, and at least 8 mL blood volume was required to be collected in each tube. Blood was inverted in the tube several times at the time of collection to ensure even mixing with the anticoagulant and/or stabilizer. Samples were kept at room temperature (20-25°C) during temporary storage and transportation prior to separation of plasma. Plasma was separated and frozen as soon as possible after blood collection, preferably within four hours if collection was in an EDTA tube or within 2 weeks if collection was in a Streck tube. The collection tubes were centrifuged at 1000 x g for 10 minutes in a clinical centrifuge with a swinging bucket rotor with slow acceleration and deceleration (brake off). Plasma was removed from the red blood cells and buffy coat using a 1 mL pipette, being careful not to disturb the cells in the tube. The plasma was dispensed into 1.5 mL cryovials in 0.5 to 1 mL aliquots. The plasma was then frozen at -80° C until needed for further processing.
Blood was obtained from patients with various types of cancer at various stages. For some patients, blood was obtained at multiple time points before and during therapy. Blood was also obtained from individuals who did not have a cancer diagnosis (control subjects). Some of these control subjects had a history of heavy smoking and were participating in a lung cancer screening program based on eligibility according to the guidelines of the United States Preventive Services Task Force. All subjects provided informed written consent for participation in the study, which was approved by the Human Investigation Committee of Yale University.
Extraction and purification of DNA from plasma, tissue, or cells:
Plasma was removed from the -80° C freezer and was thawed at room temperature for 15 to 30 minutes before proceeding with DNA extraction. Thawed plasma was then centrifuged at 6800 x g for 3 minutes to remove any cryoprecipitate. The supernatant was transferred to a fresh tube for further processing. A QiaAmp® MinElute® Virus Vacuum Kit (Qiagen) was used for extraction of DNA from plasma volumes up to 1 mL (elution volume as low as 20 µL). For larger volumes of plasma up to 5 mL, the QiaAmp® Circulating Nucleic Acid Kit was used for DNA purification (elution volume as low as 20 µL). All kits were used according to the manufacturer's instructions, generally eluting the DNA into the lowest recommended volume (preferably 20 µL). To process 1 mL of plasma using the QiaAmp® MinElute® Virus Vacuum Kit, 5 micrograms of carrier RNA (cRNA; Qiagen) were added per mL, and the user-developed protocol found on the Qiagen website was followed. Genomic DNA (gDNA) was extracted from frozen tumor tissue samples or cancer cell lines using a DNeasy Blood & Tissue Kit (Qiagen), according to the manufacturer’s instructions. Before further processing tissue- or cell-derived gDNA for next-generation sequencing library preparation, the DNA was sheared into fragments with an average length of 180 - 200 bp using focused ultrasonication (Covaris). The cell-free DNA and fragmented gDNA samples were quantified by real-time quantitative PCR using a KAPA Human Genomic DNA Quantification and QC Kit for Illumina platforms (Roche) with the 129 bp Primer Premix, suitable for the expected fragment size distribution of the samples.
Library Preparation for Next-Generation Sequencing:
Blunting and ligation of stem-loop adapters:
Varying amounts of cell-free DNA or fragmented gDNA were obtained and quantified. For library preparation, minimum and maximum input DNA limits were set at 1 ng and 30 ng, respectively. A quantitative spike-in control DNA mixture was added to each sample to enable comparison of library preparation efficiency across samples. The spike-in control mixture consisted of PhiX 174 RF I DNA (New England Biolabs) that was fragmented to an average size of 180-200 base pairs by ultrasonication (Covaris). Approximately 50% of the fragments in the mixture were unmethylated, and 50% of the fragments had undergone CpG methylation using a CpG Methyltransferase (M.SssI; New England Biolabs) according to the manufacturer’s instructions. A total of approximately 2 picograms of the spike-in control mixture was added to each DNA sample.
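For a rough sense of scale, the number of spike-in fragments contained in 2 picograms can be estimated. The sketch below assumes an average mass of about 650 g/mol per base pair of double-stranded DNA and a mean fragment length of 190 bp (the midpoint of the stated 180-200 bp range). Both values and the function name are illustrative assumptions, not figures taken from this disclosure.

```python
AVOGADRO = 6.022e23          # molecules per mole
G_PER_MOL_PER_BP = 650.0     # assumed average mass of one dsDNA base pair

def estimate_fragment_copies(mass_pg: float, mean_length_bp: float) -> float:
    """Approximate number of double-stranded fragments in a given DNA mass."""
    mass_g = mass_pg * 1e-12
    grams_per_mole = mean_length_bp * G_PER_MOL_PER_BP
    return mass_g / grams_per_mole * AVOGADRO

total = estimate_fragment_copies(2.0, 190.0)
methylated = total * 0.5     # about half of the spike-in fragments were M.SssI-methylated
print(f"~{total:.2e} spike-in fragments, ~{methylated:.2e} of them methylated")
```

Under these assumptions the spike-in contributes on the order of ten million fragments per sample; a different assumed fragment length or per-base-pair mass shifts the estimate proportionally.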
To make DNA fragment ends compatible with ligation, the DNA samples (each in 20 microliters of 10 mM Tris-HCl pH 7.8 buffer) were treated with an enzyme mix comprising T4 DNA Polymerase and T4 Polynucleotide Kinase as provided in the Quick Blunting Kit (New England Biolabs; following manufacturer’s protocol), to produce 5’-phosphorylated, blunt-ended DNA. Enzymes were then heat-inactivated by incubation at 70°C for 10 minutes.
The blunted, 5’-phosphorylated DNA was then ligated to custom stem-loop oligonucleotide adapters using the NEBNext Ultra II Ligation Module (New England Biolabs) according to the manufacturer’s protocol. The custom stem-loop adapters and accompanying displacer oligonucleotides (see below) were designed to greatly reduce the formation of adapter dimers, thereby enabling preparation of sequencing libraries from very small amounts of input DNA. The adapter oligonucleotide sequences are as follows:
EMSv2m-1 AGTXYAAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTTAGAXT
EMSv2m-2 TXAGYTAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTAAXTGA
EMSv2m-3 AAXAYTAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTAATGTT
EMSv2m-4 XTAAYGAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTXATTAG
EMSv2m-5 AAGTYTAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTAAAXTT
EMSv2m-6 TATXYXAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTGAGATA
EMSv2m-7 AXAXYTAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTAAGTGT
EMSv2m-8 TGAAYGAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTXATTXA
EMSv2m-9 AXTAYAAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTTATAGT
EMSv2m-10 XATXYTAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTAAGATG
EMSv2m-11 AXTGYTAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTAAXAGT
EMSv2m-12 TGGTYXAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTGAAXXA
EMSv2m-13 AGXTYTAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTAAAGXT
EMSv2m-14 TGATYAAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTTAATXA
EMSv2m-15 ATAXYGAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTXAGTAT
EMSv2m-16 TGAXYXAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTGAGTXA
EMSv2m-17 AATGYXAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTGAXATT
EMSv2m-18 TXXAYGAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTXATGGA
EMSv2m-19 ATTXYXAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTGAGAAT
EMSv2m-20 TAXTYGAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTXAAGTA
EMSv2m-21 ATXAYXAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTGATGAT
EMSv2m-22 TGTXYGAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTXAGAXA
EMSv2m-23 XAXAYXAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTGATGTG
EMSv2m-24 TTAGYGAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTXAXTAA
(Note: X denotes 5-methyl-dC and Y denotes deoxyUridine)
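The notation above can be checked programmatically. The following is a minimal sketch that decodes an adapter into plain bases and measures how many bases at its 5’ end can pair with its 3’ end when the oligonucleotide folds back on itself. It assumes that 5-methyl-dC pairs like C and that dU pairs like T; the function names are illustrative and not part of the disclosure.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def decode(oligo: str) -> str:
    """Map the modified-base notation to plain bases: X (5-methyl-dC) -> C, Y (dU) -> T."""
    return oligo.replace("X", "C").replace("Y", "T")

def stem_length(oligo: str) -> int:
    """Count how many 5'-terminal bases pair with the 3' end when the oligo folds back."""
    seq = decode(oligo)
    n = 0
    while n < len(seq) // 2 and seq[n] == COMPLEMENT[seq[-1 - n]]:
        n += 1
    return n

adapter_1 = "AGTXYAAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTTAGAXT"  # EMSv2m-1 as listed above
print(decode(adapter_1))
print(stem_length(adapter_1))
```

For EMSv2m-1 as listed, the two ends are complementary over 10 bases, and both dU positions fall inside that stretch, consistent with the statement that dU lies within the stem.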
A two-step scheme of one-strand ligation followed by displacement and ligation of the second strand is shown in Figure 5. Because the 5’-end of the stem-loop adapter was not phosphorylated, an adapter molecule is unable to become ligated to another adapter molecule, thereby minimizing formation of adapter dimers. The 3’-end of the adapter is able to become ligated to the 5’-ends of the double-stranded insert DNA fragment, which is phosphorylated at its 5’-ends. The adapter and displacer oligonucleotides contained several 5-methyl-dC nucleotides to prevent conversion in subsequent steps. The stem-loop adapters also contained deoxyUridine (dU) within the stem sequence to permit site-specific cleavage by USER enzyme (New England Biolabs). Upon cleavage at dU, the hairpin structure of the adapters would become unstable, allowing the displacer oligonucleotide to hybridize as described below. Sample-specific barcode sequences were included in the adapter sequence to enable multiplexed sequencing of a plurality of samples on the same lane of a next-generation sequencing instrument (because sequences can be sorted into sample-specific datasets based on their barcode sequence). We used a set of 24 distinct barcodes, each 5 nucleotides in length. A single uniquely barcoded adapter sequence was used for ligation to each sample, such that 24 individual samples would be ligated to adapters labeled with 24 distinct barcodes (1 to 1 mapping). Because the same barcode sequence was ligated to both ends of a DNA insert, the bioinformatic demultiplexing algorithm would require both ends of the sequence to be labeled with the same barcode. The concentration of stem-loop oligonucleotide adapter used in the ligation reaction was 1 micromolar in a final reaction volume of 45 microliters. Upon completion of the ligation reaction (1 hour at 20°C), USER enzyme (New England Biolabs) was added at the manufacturer’s recommended concentration and the sample was incubated at 37°C for 30 minutes to cleave dU sites in the adapters. DNA was then cleaned up to remove enzymes, buffers, and unligated adapters using AMPure XP beads (Beckman Coulter) according to the manufacturer’s protocol (adding 1.3 x the volume of bead slurry relative to the reaction volume to be purified). The clean-up process included wash steps followed by elution in 12 microliters of 10 mM Tris-HCl pH 7.8 for each sample.
Ligation of displacer oligonucleotides:
As the second step in the two-step ligation reaction, displacer oligonucleotides which have a phosphorylated 5’-end were hybridized to the complementary sequence of the ligated adapter oligonucleotide and were ligated using HiFi Taq Ligase (New England Biolabs) according to the manufacturer’s protocol. HiFi Taq Ligase efficiently seals nicks in DNA with very high fidelity, exhibiting greatly reduced ligation efficiency if there are mismatched base pairs at either side of the ligation junction. The two-step ligation method which greatly reduces adapter dimer formation is schematized in Figure 5. The sequence of the displacer oligonucleotide (with 24 distinct barcode sequences that match the barcode sequences of the stem-loop adapter oligonucleotides) is as follows:
EMSv2m-di1 phosphate-AGTXTAAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di2 phosphate-TXAGTTAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di3 phosphate-AAXATTAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di4 phosphate-XTAATGAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di5 phosphate-AAGTTTAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di6 phosphate-TATXTXAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di7 phosphate-AXAXTTAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di8 phosphate-TGAATGAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di9 phosphate-AXTATAAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di10 phosphate-XATXTTAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di11 phosphate-AXTGTTAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di12 phosphate-TGGTTXAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di13 phosphate-AGXTTTAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di14 phosphate-TGATTAAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di15 phosphate-ATAXTGAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di16 phosphate-TGAXTXAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di17 phosphate-AATGTXAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di18 phosphate-TXXATGAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di19 phosphate-ATTXTXAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di20 phosphate-TAXTTGAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di21 phosphate-ATXATXAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di22 phosphate-TGTXTGAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di23 phosphate-XAXATXAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
EMSv2m-di24 phosphate-TTAGTGAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX
(Note: X denotes 5-methyl-dC)
For any given sample, the displacer oligonucleotide used in the second ligation step had the same barcode as the stem-loop adapter used in the first ligation step, to ensure perfect base pairing between the displacer and adapter sequences in the vicinity of the ligation junction. The concentration of the displacer oligonucleotide in the reaction was 0.5 micromolar, and the final reaction volume was 25 microliters (for each sample). The reaction was incubated at 60°C for 30 minutes. Ligated DNA was then cleaned up using AMPure XP beads (Beckman Coulter) according to the manufacturer’s protocol (adding 1.3 x the volume of bead slurry relative to the reaction volume to be purified). At this point, because adapter and displacer oligonucleotides containing sample-specific barcodes were ligated to the original template DNA molecules (considered the ligation inserts), multiple samples were pooled into a combined volume for further processing. Batches of 12 samples were pooled into a combined volume, wherein each sample had a distinct sample-specific barcode within the ligated adapter and displacer sequences. Cleaned-up DNA was eluted in 42 microliters of 10 mM Tris-HCl pH 7.8 for each batch.
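The requirement that a displacer carry the same barcode as its adapter can be illustrated with a short complementarity check. This is a sketch only: it assumes that the 5’ end of the displacer anneals to the 3’ end of the ligated adapter, treats X as C and Y as T for pairing, and uses a hypothetical 10-base comparison window that is not specified in the disclosure.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def decode(oligo: str) -> str:
    """Map X (5-methyl-dC) to C and Y (dU) to T."""
    return oligo.replace("X", "C").replace("Y", "T")

def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def pairs_at_junction(adapter: str, displacer: str, window: int = 10) -> bool:
    """True if the displacer 5' end is fully complementary to the adapter 3' end over `window` bases."""
    adapter_3prime = decode(adapter)[-window:]
    displacer_5prime = decode(displacer)[:window]
    return displacer_5prime == reverse_complement(adapter_3prime)

adapter_1 = "AGTXYAAGAYAXAXTXTTTXXXTAXAXGAXGXTXTTXTGATXTTAGAXT"   # EMSv2m-1
displacer_1 = "AGTXTAAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX"          # EMSv2m-di1
displacer_2 = "TXAGTTAGATXAGAAGAGXAXAXGTXTGAAXTXXAGTXAX"          # EMSv2m-di2

print(pairs_at_junction(adapter_1, displacer_1))  # matched barcodes
print(pairs_at_junction(adapter_1, displacer_2))  # mismatched barcodes
```

A matched pair passes the check, whereas pairing adapter 1 with displacer 2 leaves mismatches next to the junction, which is the situation in which a high-fidelity ligase would be expected to ligate poorly.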
Enzymatic conversion of unmethylated cytosine bases to uracil bases
Enzymatic conversion of ligated DNA was performed using the NEBNext® Enzymatic Methyl-seq (EM-seq) Conversion Module kit (New England Biolabs), according to the manufacturer’s instructions. This method is an alternative to bisulfite conversion, causing less damage, fragmentation, GC bias, and degradation of DNA. The method enables identification of 5-methylcytosine (5-mC) and 5-hydroxymethylcytosine (5-hmC) bases in DNA by efficiently converting unmodified cytosine bases (not 5-mC or 5-hmC) to uracil bases. The EM-seq method comprises two steps: (1) The enzyme TET2 is used to oxidize 5-mC and 5-hmC to 5-carboxycytosine (5-caC), providing protection from deamination by APOBEC enzyme; (2) The enzyme APOBEC is used to deaminate unmodified cytosines to uracils, while the 5-mC and 5-hmC bases which were oxidized to 5-caC in the first step are protected from deamination. Between the two steps, TET2-converted DNA was cleaned up according to the manufacturer’s protocol. Because APOBEC-mediated deamination of cytosine is more efficient with single-stranded DNA, formamide was used to denature the DNA prior to the APOBEC enzymatic reaction, according to the manufacturer’s protocol. Notably, the ligated adapter and displacer oligonucleotides contained several 5-mC positions which were protected from deamination and conversion to uracils. Converted DNA was then cleaned up using AMPure XP beads (Beckman Coulter) according to the manufacturer’s protocol (adding 1.3 x the volume of bead slurry relative to the reaction volume to be purified). Cleaned-up DNA was eluted in 20 microliters of 10 mM Tris-HCl pH 7.8 for each batch of 12 samples.
Amplification of converted DNA by PCR:
The DNA which was converted using the EM-seq Conversion Module kit was then amplified via a polymerase chain reaction (PCR). PCR-amplification was carried out using the NEBNext® Q5U Master Mix (New England Biolabs), according to the manufacturer’s protocol. The Q5U high fidelity DNA polymerase harbors a mutation which enables amplification of templates containing uracil bases. PCR primers were designed to hybridize to the adapter and displacer sequences, and the primers comprised the following sequences (5-carboxy-cytosine bases were included to prevent methylation at those bases in subsequent steps):
R1: ACACTCTTTCCCTACAZGAZGCTCTTCTGATCT
R2: GTGACTGGAGTTCAGAZGTGTGCTCTTCTGATCT
(Note: The Z positions in both sequences denote 5-carboxycytosine, which was incorporated via an oligonucleotide synthesizer using 5-Carboxy-dC-CE Phosphoramidite [Glen Research])
Primers were added to the reaction at a final concentration of 200 nanomolar for each primer. SYBR Green I dye (Thermo Fisher Scientific) was added to the PCR reaction at the concentration recommended by the manufacturer to permit fluorescence-based measurement of double-stranded DNA amplification during real-time quantitative PCR. Quantitative PCR was carried out on a CFX96™ System (Bio-Rad) thermocycler, and change in fluorescence signal during the reactions was monitored in real-time. Samples were removed from the thermocycler as the amplification neared saturation (plateau of fluorescence signal), but approximately 2-3 cycles prior to reaching saturation. This was done to ensure that most amplified molecules would be double-stranded, because the CpG methyltransferase used in the next step is only able to methylate CpG sites in double-stranded DNA. Thermocycling parameters were as follows: (1) 98°C for 30 seconds, (2) 98°C for 10 seconds, (3) 62°C for 30 seconds, (4) 65°C for 60 seconds, (5) repeat thermocycling steps #2-4 until the real-time fluorescence signal begins to plateau. Samples were removed from the thermocycler after the 65°C extension step, approximately 2-3 cycles prior to reaching plateau of fluorescence. During PCR, uracil bases in the template DNA were replaced with thymine bases in the DNA copies, whereas 5-carboxycytosine (oxidation product of 5-mC and/or 5-hmC) bases in the template DNA were replaced with cytosine bases in the DNA copies. As a result, methylated cytosine bases (5-mC or 5-hmC) in the original template DNA were retained as cytosine bases in the converted, PCR-amplified copies. Any unmodified cytosine bases in the original template DNA were converted to thymine bases in the converted, PCR-amplified copies. Notably, the conversion process did not achieve completely accurate discrimination of methylated vs. unmethylated cytosines, resulting in a small percentage of methylated cytosines being converted to thymines and a small percentage of unmodified cytosines being retained as cytosines in the converted, PCR-amplified copies. Converted, PCR-amplified DNA was then cleaned up using AMPure XP beads (Beckman Coulter) according to the manufacturer’s protocol (adding 1.3 x the volume of bead slurry relative to the reaction volume to be purified). Cleaned-up DNA was eluted in 12 microliters of 10 mM Tris-HCl pH 7.8 for each batch of 12 samples.
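The net effect of conversion followed by PCR on a single strand can be sketched as a simple string transformation. The example below is idealized: it assumes perfect conversion (no residual unconverted cytosines and no loss of methylated cytosines), considers only the strand that is read, and uses a toy sequence and a hypothetical function name.

```python
def convert_and_amplify(template: str, methylated_positions: set[int]) -> str:
    """Idealized readout after enzymatic conversion and PCR.

    Methylated (or hydroxymethylated) cytosines are read as C;
    unmodified cytosines are read as T; all other bases are unchanged.
    """
    copy = []
    for i, base in enumerate(template):
        if base == "C" and i not in methylated_positions:
            copy.append("T")
        else:
            copy.append(base)
    return "".join(copy)

template = "ACGTCCGA"      # toy sequence with CpG sites at indices 1 and 5
methylated = {1}           # assume only the first CpG cytosine is methylated
print(convert_and_amplify(template, methylated))
```

In this toy case the methylated CpG survives as CG, while the unmethylated CpG at index 5 is read as TG and the non-CpG cytosine at index 4 is read as T.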
Restoration of methylation patterns at CpG sites on converted, PCR-amplified DNA:
Taking advantage of the observation that only CpG sites which were methylated (or hydroxymethylated) in the original template DNA were retained as CpG sites in the converted, amplified DNA copies (with rare exceptions due to imperfect conversion), the methylation patterns in the original template DNA were restored in the amplified DNA copies using a CpG Methyltransferase enzyme (M.SssI, New England Biolabs). This enzyme methylates cytosine residues to produce 5-methylcytosine on both strands within the double-stranded dinucleotide recognition sequence 5’...CG...3’. The double-stranded, converted, amplified DNA copies underwent CpG methylation using M.SssI according to the manufacturer’s protocol (including a buffer supplemented with S-adenosylmethionine). CpG sites (CG dinucleotides) that were unmethylated in the original template DNA were converted to TG dinucleotides (or CA if an unmethylated C on the opposite strand was converted), which were not targets for methylation by M.SssI. Similarly, cytosines that were either methylated or unmethylated outside of a CpG context were not methylated by M.SssI. Hence, the methylation pattern at CpG sites in an original template DNA molecule could be reconstituted on the DNA copies after conversion and PCR-amplification using a CpG methyltransferase. Converted, amplified DNA copies with methylation patterns restored were then cleaned up using AMPure XP beads (Beckman Coulter) according to the manufacturer’s protocol (adding 1.3 x the volume of bead slurry relative to the reaction volume to be purified). Cleaned-up DNA was eluted in 12 microliters of 10 mM Tris-HCl pH 7.8 for each batch of 12 samples.
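The logic of the restoration step can be illustrated by listing the CG dinucleotides that remain in a converted copy, since only those are substrates for a CpG methyltransferase. This is a minimal sketch with a toy sequence; the converted string is written out by hand under the assumption of perfect conversion, and the function name is illustrative.

```python
def cpg_sites(seq: str) -> list[int]:
    """Return 0-based positions of the C in every CG dinucleotide."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

original = "ACGTCCGA"    # toy template; assume only the C at index 1 was methylated
converted = "ACGTTTGA"   # same strand after idealized conversion and PCR

print(cpg_sites(original))   # all CpG sites in the template
print(cpg_sites(converted))  # CpG sites still present, and so available for re-methylation
```

The sites remaining in the converted copy coincide with the sites that were methylated in the template, which is why re-methylating every remaining CG reconstitutes the original CpG methylation pattern.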
Selective capture of densely methylated DNA copies
Next, densely methylated DNA copies were selectively captured using an affinity purification approach based on binding to a Methyl-CpG-binding domain (MBD) protein. This protein binds to double-stranded DNA that contains one or more symmetrically methylated CpG sites (methylated on both DNA strands). A MethylCap kit (Diagenode) was used to enrich densely methylated DNA copies from a mixture of DNA copies, according to the manufacturer’s instructions. The kit uses a MethylCap protein which consists of the MBD of human MeCP2 as a C-terminal fusion with Glutathione-S-transferase (GST) containing an N-terminal His6-tag. The MethylCap Kit first bound MethylCap proteins to methylated DNA fragments in solution, and then the complexes were captured with glutathione-coated magnetic beads. A magnetic field was used to isolate the beads, and after two wash steps, the methylated DNA was eluted from the beads into a high salt buffer provided in the kit. A densely methylated competitor DNA (1 microgram) was mixed with the methylated library DNA copies prior to binding with MethylCap protein to reduce the probability of capture of less densely methylated DNA fragments. The competitive capture and elution process was repeated a second time to yield a library with even lower representation of fragments with low or moderate methylation density (less than 8 methylated CpG sites per DNA fragment). Two sequential rounds of capture and elution could be performed with minimal loss of representation of unique sequences of densely methylated DNA because each original DNA fragment (from a biological source) was represented by multiple copies. Therefore, loss of some copies under stringent competitive capture conditions (even two rounds) would be unlikely to result in complete loss of unique sequence representation from densely methylated original DNA fragments. Following each round of capture and elution of the densely methylated DNA into high-salt elution buffer, the DNA was cleaned up using AMPure XP beads (Beckman Coulter) according to the manufacturer’s protocol (adding 1.3 x the volume of bead slurry relative to the reaction volume to be purified). Cleaned-up DNA was eluted in 12 microliters of 10 mM Tris-HCl pH 7.8 for each batch of 12 samples.
The competitor DNA consisted of a double-stranded, 226 base-pair long PCR product (amplicon) containing 10 CpG sites, derived from amplification of a segment of PhiX174 RF I phage DNA (New England Biolabs). Importantly, the competitor DNA did not contain sequences that would be required for next-generation sequencing (e.g., Illumina Read 1 and Read 2 sequences), so although the competitor DNA was captured and eluted along with the densely methylated library DNA fragments, the competitor DNA was not able to participate in downstream PCR or sequencing reactions. The following primers were used to generate the PCR-amplified competitor DNA:
10CpGFWD: 5’-TGGCCTTATGGTTACAGTATGCCCATC-3’
10CpGREV: 5’-CTACGATGCTCGGTTTTTAGTGAGTTGTTCCGTTCTTTAGCTCCTAGACCTTTAGCAGCGAGGTC-3’
PCR was performed using EmeraldAMP® Max HS PCR Mastermix (Takara) to produce a high yield of amplified DNA according to the manufacturer’s instructions, using 0.5 ng of PhiX174 RF I phage DNA (New England Biolabs) as a template and 0.5 micromolar concentration of each primer. Thermocycling parameters were set according to the manufacturer’s recommendations, with an annealing temperature of 60°C. PCR was carried out to saturation (plateau phase) to maximize product yield. The PCR product was purified using QIAquick PCR Purification kit (Qiagen) according to the manufacturer’s protocol. The purified DNA underwent CpG methylation using M.SssI (New England Biolabs) according to the manufacturer’s protocol. Methylated competitor DNA was again purified using a QIAquick PCR Purification kit (Qiagen) according to the manufacturer’s protocol. For each round of competitive capture performed on a batch of 12 samples, 1 microgram of methylated competitor DNA was used.
Further PCR amplification of selectively captured DNA copies using index primers:
The selectively captured densely methylated DNA copies were then further amplified by PCR to produce enough DNA library copies for loading onto a flow cell of an Illumina NovaSeq next-generation sequencing instrument. The PCR amplification was carried out using NEBNext® Dual Index Primers for Illumina® (with 8 base-pair indices) according to the manufacturer’s protocol. A distinct Illumina index pair was used for each batch of 12 samples that were intended to be sequenced on the same lane of the sequencing flow cell (allowing multiple batches of 12 samples to be sequenced in a multiplexed fashion on a single flow cell lane). As many as 8 batches (96 samples total) have been successfully multiplexed on a single flow cell lane. NEBNext® Q5U Master Mix (New England Biolabs) was used for the PCR amplification. Primers were added to the reaction at a final concentration of 200 nanomolar for each primer. SYBR Green I dye (Thermo Fisher Scientific) was added to the PCR reaction at the concentration recommended by the manufacturer to permit fluorescence-based measurement of double-stranded DNA amplification during real-time quantitative PCR. Quantitative PCR was carried out on a CFX96™ System (Bio-Rad) thermocycler, and change in fluorescence signal during the reactions was monitored in real-time. The following thermocycling parameters were used: (1) 98°C for 30 seconds, (2) 98°C for 10 seconds, (3) 62°C for 30 seconds, (4) 65°C for 60 seconds, (5) repeat thermocycling steps #2-4 until the real-time fluorescence signal begins to plateau. Samples were removed from the thermocycler after the 65°C extension step, approximately 1-2 cycles prior to reaching plateau of fluorescence signal. Amplified, indexed sequencing libraries were then cleaned up using AMPure XP beads (Beckman Coulter) according to the manufacturer’s protocol (adding 1.3 x the volume of bead slurry relative to the reaction volume to be purified). Cleaned-up DNA was eluted in 12 microliters of 10 mM Tris-HCl pH 7.8 for each batch of 12 samples.
Gel-purification of sequencing library:
The amplified, indexed libraries were further purified on a precast E-Gel™ SizeSelect™ II Agarose Gel, 2% (Thermo Fisher) using an E-Gel Power Snap Electrophoresis System (Thermo Fisher), according to the manufacturer’s protocol. Using a DNA ladder run in a separate gel lane as a size reference, a band in the size range of approximately 320-360 base-pairs (representing the expected size of the library) was recovered from the gel for libraries produced from cfDNA with a mononucleosomal size distribution. If a library was produced from sheared genomic DNA, a broader size distribution was expected due to random fragmentation, and accordingly a broader band was recovered in a size range of approximately 300-380 base pairs. The DNA was recovered in deionized water and could be used without further purification as a library input for next-generation sequencing on an Illumina flow cell (after appropriate adjustment of concentration).
Next-generation sequencing:
To prepare the library for loading onto an Illumina NovaSeq flow cell, the concentration of DNA was measured using a KAPA Library Quantification Kit (Roche) according to the manufacturer’s protocol. The size profile and concentration of the libraries were also evaluated on a Bioanalyzer (Agilent). Libraries were diluted to the concentration recommended for the flow cell to be used (both S1 and S4 flow cells were used in different experiments). Cluster formation was carried out on the flow cell according to Illumina’s protocol. Sequencing was performed on a NovaSeq 6000 instrument in multiplexed paired-end mode, with a read length of 150 base pairs in each direction (2 x 150 bp mode). Two index reads were also performed, with read lengths of 8 bases each. Data was output to a server from which files could be downloaded for further processing.
Processing of next-generation sequencing data:
The sequence output from the Illumina sequencer was analyzed according to the following general scheme. First, read pairs were demultiplexed based on Illumina indexes to sort read pairs arising from different sample batches. Then, read pairs were further sorted based on sample barcodes to yield sample-specific sets of read pairs. Read pairs were discarded if their sample barcode sequences did not exactly match one of the used barcodes or if the barcodes of a pair of reads did not match each other. Low-quality reads were also filtered out according to quality filtering parameters recommended by Illumina. Next, any adapter sequences identified at the ends of reads were trimmed. Each read-pair from a given cluster was then joined by overlapping the 3’-regions to re-create a full sequence of a DNA insert fragment (merged read pairs). Any read-pairs that had <95% sequence agreement in their overlapping 3’-regions (imperfect complementarity) were discarded because such discrepancies would be indicative of sequencer errors. Next, an initial de-duplication was performed to remove any replicate sequences that had exactly identical sequences. Such deduplicated sequences were annotated to record the number of replicate sequences that were collapsed into a single sequence. Resulting sequences were then further processed using Bismark software (Babraham Bioinformatics Institute) to map sequences to the human genome (using an in silico C to T converted reference genome) and to perform methylation status calling (using an unconverted reference genome). Build version hg38 of the human reference genome was used. Bismark used the short read aligner Bowtie 2 to map sequences to the human genome. A further de-duplication step was performed by Bismark to remove alignments mapping to the same position (including start and end positions) in the genome, unless the sequences aligned to the same genomic position but on different strands.
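The sample-barcode sorting and the initial exact-match de-duplication described above can be sketched as follows. The input format (a tuple of the barcode seen on each read plus the merged insert sequence), the toy reads, and the function name are hypothetical; index demultiplexing, quality filtering, trimming, merging, and Bismark mapping are outside the scope of this sketch.

```python
from collections import Counter, defaultdict

def demultiplex(read_pairs, known_barcodes):
    """Group merged inserts by sample barcode.

    A pair is kept only if both reads carry the same barcode and that
    barcode exactly matches one of the barcodes that were used.
    Identical inserts are collapsed, and their replicate count is recorded.
    """
    per_sample = defaultdict(Counter)
    for barcode_r1, barcode_r2, insert in read_pairs:
        if barcode_r1 != barcode_r2 or barcode_r1 not in known_barcodes:
            continue  # discard mismatched or unrecognized barcodes
        per_sample[barcode_r1][insert] += 1
    return per_sample

known = {"AGTCT", "TCAGT"}                      # toy 5-nt barcodes
toy_pairs = [
    ("AGTCT", "AGTCT", "TTGACG"),
    ("AGTCT", "AGTCT", "TTGACG"),               # exact duplicate, collapsed with a count of 2
    ("AGTCT", "TCAGT", "TTGACG"),               # barcodes disagree, discarded
    ("GGGGG", "GGGGG", "AAAA"),                 # unknown barcode, discarded
    ("TCAGT", "TCAGT", "CGCGTT"),
]
for sample, inserts in demultiplex(toy_pairs, known).items():
    print(sample, dict(inserts))
```

Keeping the replicate count alongside each collapsed sequence mirrors the annotation step described in the text, so that copy-number information is not lost by de-duplication.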
Analysis of converted, mapped sequences to identify aberrant CpG island hypermethylation:
The mapped, deduplicated DNA sequences (derived from individual strands of the original input DNA fragments) with cytosine methylation status information were then further filtered to distinguish sequences that were derived from fragments with dense CpG methylation vs. those that were derived from fragments that did not undergo efficient conversion. Fragments with a low conversion efficiency could be identified as those in which many cytosines were read as being methylated outside of the CpG context. Because methylation of cytosines outside of CpG sites is known to occur rarely in DNA from vertebrates, such events can be presumed to be mostly representative of technical artifacts resulting from unmodified cytosines that were not efficiently converted to uracils. Thus, to select sequences that were considered to be truly densely methylated, sequences were required to meet all of the following filter criteria: (1) must contain no more than 20% cytosines that were read as being methylated outside of a CpG context, (2) must contain a minimum of 10 CpG sites, and (3) must contain a minimum of 80% methylated cytosines at CpG sites.
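The three filter criteria can be expressed as a single predicate over per-sequence counts. This sketch assumes that criterion (1) is computed as the fraction of non-CpG cytosines that were read as methylated; the disclosure does not spell out the denominator, so that interpretation, along with the function and parameter names, is an assumption.

```python
def is_truly_densely_methylated(n_cpg: int,
                                n_cpg_methylated: int,
                                n_noncpg_c: int,
                                n_noncpg_c_methylated: int) -> bool:
    """Apply the three filters: non-CpG methylation <= 20%, >= 10 CpG sites, CpG methylation >= 80%."""
    if n_cpg < 10:
        return False                                  # criterion (2)
    if n_cpg_methylated / n_cpg < 0.80:
        return False                                  # criterion (3)
    if n_noncpg_c > 0 and n_noncpg_c_methylated / n_noncpg_c > 0.20:
        return False                                  # criterion (1): likely poor conversion
    return True

print(is_truly_densely_methylated(12, 11, 30, 2))    # dense CpG methylation, well converted
print(is_truly_densely_methylated(12, 12, 30, 10))   # many unconverted non-CpG cytosines
print(is_truly_densely_methylated(9, 9, 30, 0))      # too few CpG sites
```

Checking the non-CpG fraction separately is what distinguishes a genuinely hypermethylated fragment from one that merely escaped conversion, since both would otherwise show cytosines retained at CpG sites.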
Next, to identify sequences that were aberrantly hypermethylated among the truly densely methylated sequences, the methylation levels of the sequences were compared to expected methylation levels at the same genomic locations in a variety of healthy cells. Publicly available data from whole genome bisulfite sequencing (WGBS) of various cells and tissues were used for this comparison. Since cell-free DNA in healthy plasma is known to be mostly derived from physiological turnover of blood cells, WGBS data from a variety of blood cells were used. WGBS data from several other organs were also used since some healthy cfDNA has been shown to arise from organ tissues (e.g., liver). WGBS datasets were obtained from the Blueprint Epigenome Project (International Human Epigenome Consortium) and the Roadmap Epigenomics Project (NIH Roadmap Reference Epigenome Mapping Consortium).
From the Blueprint Project, WGBS data from the following cell types were used: alternatively activated macrophage, band form neutrophil, CD14-positive CD16-negative classical monocyte, CD3-negative CD4-positive CD8-positive double positive thymocyte, CD3-positive CD4-positive CD8-positive double positive thymocyte, CD34-negative CD41-positive CD42-positive megakaryocyte cell, CD38-negative naive B cell, CD4-positive alpha-beta T cell, CD4-positive alpha-beta thymocyte, CD8-positive alpha-beta T cell, CD8-positive alpha-beta thymocyte, central memory CD4-positive alpha-beta T cell, central memory CD8-positive alpha-beta T cell, class switched memory B cell, conventional dendritic cell, cytotoxic CD56-dim natural killer cell, effector memory CD4-positive alpha-beta T cell, effector memory CD8-positive alpha-beta T cell terminally differentiated, effector memory CD8-positive alpha-beta T cell, erythroblast, germinal center B cell, hematopoietic multipotent progenitor cell, inflammatory macrophage, macrophage, mature eosinophil, mature neutrophil, memory B cell, monocyte, naive B cell, neutrophilic metamyelocyte, neutrophilic myelocyte, plasma cell, precursor B cell, precursor lymphocyte of B lineage, regulatory T cell, segmented neutrophil of bone marrow, and thymocyte.
From the Roadmap Project, WGBS data from the following cell types were used: aorta, esophagus, left ventricle, liver, lung, macrophage, natural killer cell, pancreas, primary hematopoietic stem cells GCSF-mobilized, psoas muscle, sigmoid colon, small intestine, spleen, stomach, T Cell, and thymus.
For each mapped, deduplicated sequence (derived from an individual strand of a DNA fragment), an expected average methylation level was calculated for each healthy tissue or cell type by averaging the beta values at all CpG sites in the genomic region covered by the sequence. For example, if a sequence was mapped to a 170 base-pair region of a CpG island on chromosome 2, and this region contained 13 CpG sites, an average methylation level would be calculated for each healthy tissue or cell type by averaging the 13 beta values at the corresponding genomic region in the WGBS data. Thus, for each DNA sequence, a list of corresponding expected average methylation level values was generated from the healthy tissue/cell public WGBS datasets. A sequence (fragment) was considered to be aberrantly hypermethylated if it passed the filters for being considered a truly densely methylated sequence, and additionally, none of the expected average methylation level values from all healthy samples exceeded 0.4 (or 40%). In other words, a truly densely methylated sequence was considered to be aberrantly hypermethylated if it mapped to a genomic region that was known to have a low expected average methylation level in all queried healthy cell types and tissues. Furthermore, if an aberrantly hypermethylated sequence also mapped to a genomic region annotated as a CpG island, it was considered an aberrantly hypermethylated CpG island sequence. Analysis of cfDNA from plasma samples showed the ratio of aberrantly hypermethylated sequences to truly densely methylated sequences was frequently higher in the plasma of cancer patients than in non-cancer control subjects. Cancer-specific patterns of aberrantly hypermethylated CpG island sequences can be established by testing plasma from many patients with various types of cancer and from many non-cancer control subjects. 
Sequence data from subsequent patient samples could then be analyzed in the context of these patterns to predict the presence or absence of cancer as well as the tissue of origin of a cancer signal.
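As a non-limiting illustration, the classification rule described above can be sketched in Python. The function names, the dictionary layout of the healthy-tissue beta values, and the handling of tissues without coverage at a site (they are skipped) are assumptions made for this sketch and are not taken from the disclosure; only the averaging step and the 0.4 cutoff come from the text.

```python
from statistics import mean

# Cutoff stated in the text: no healthy tissue may exceed 0.4 (40%).
MAX_HEALTHY_LEVEL = 0.4


def expected_levels(cpg_positions, beta_by_tissue):
    """Average the beta values at a fragment's CpG sites, per healthy tissue.

    beta_by_tissue maps tissue name -> {CpG position: beta value}.
    Sites missing from a tissue are skipped (an assumption of this sketch).
    """
    levels = {}
    for tissue, betas in beta_by_tissue.items():
        values = [betas[pos] for pos in cpg_positions if pos in betas]
        if values:
            levels[tissue] = mean(values)
    return levels


def is_aberrantly_hypermethylated(densely_methylated, levels,
                                  max_level=MAX_HEALTHY_LEVEL):
    """True if the fragment is densely methylated and no healthy tissue
    has an expected average methylation level above the cutoff."""
    if not densely_methylated or not levels:
        return False
    return all(level <= max_level for level in levels.values())


def aberrant_to_dense_ratio(calls):
    """Ratio of aberrantly hypermethylated to densely methylated fragments.

    calls holds one boolean per densely methylated fragment.
    """
    return sum(calls) / len(calls) if calls else 0.0


# Hypothetical beta values for three CpG sites in two healthy tissues.
healthy = {
    "liver": {100: 0.1, 120: 0.2, 150: 0.3},
    "lung": {100: 0.0, 120: 0.1, 150: 0.2},
}
levels = expected_levels([100, 120, 150], healthy)
print(is_aberrantly_hypermethylated(True, levels))
```

In this sketch a fragment with no healthy-tissue data is not called aberrant, which is a conservative choice rather than a rule stated in the example.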
Example 2:
Change in distribution of CpG site content and of genomic mapping pattern of DNA fragments with enrichment of densely methylated sequences.
In this example, the effectiveness of the disclosed methods for enriching densely methylated DNA molecules is evaluated. Analysis of CpG content and genomic mapping patterns of cell-free DNA obtained from ~1 mL plasma of a patient with metastatic non-small cell lung cancer (before vs. after enrichment of densely methylated DNA fragments) is presented in Figures 6 and 7. NGS libraries were prepared according to the methods described in Example 1, but the unenriched library was directly subjected to post-capture amplification without undergoing enrichment.
Figure 6 shows histograms comparing the CpG dinucleotide content of sequenced cfDNA fragments before vs. after two rounds of selective capture and elution of densely methylated DNA fragments (which is referred to here as high density methyl-capture). In these histograms, the CpG dinucleotide count refers to the number of CpG sites (methylated, hydroxymethylated, or unmethylated) in a biologically derived input DNA fragment, not the remaining (unconverted) CpG sites after conversion and amplification. Red boxes are included to highlight the robust enrichment of fragments harboring 8 or more CpG sites (in fragments averaging ~170-180 bp in length), which is the methylation density range typically found in CpG islands and promoters.
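As a non-limiting illustration, the CpG dinucleotide count used in such histograms can be sketched in Python. The sketch assumes that the unconverted reference sequence spanned by each mapped fragment is available as a string and serves as a proxy for the input fragment; the function names are hypothetical, and the default of 8 CpG sites mirrors the range highlighted in Figure 6.

```python
def count_cpg(reference_sequence):
    """Count CpG dinucleotides (a C immediately followed by a G)."""
    seq = reference_sequence.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def fraction_cpg_rich(sequences, min_cpg=8):
    """Fraction of fragments whose sequence holds at least min_cpg CpG sites."""
    if not sequences:
        return 0.0
    rich = sum(1 for seq in sequences if count_cpg(seq) >= min_cpg)
    return rich / len(sequences)


# Hypothetical fragment sequences: one CpG-dense, one CpG-free.
fragments = ["CG" * 8, "ATATATAT"]
print(count_cpg(fragments[0]), fraction_cpg_rich(fragments))
```

Comparing `fraction_cpg_rich` for the unenriched and enriched libraries gives a single summary number for the shift that the histograms display.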
Figure 7 presents a genomic map showing a change in alignment and coverage of sequenced cell-free DNA fragments in the region of the PAX8 gene on Chromosome 2 before vs. after two rounds of selective capture and elution of densely methylated DNA fragments. Preparation of the native library comprised steps of conversion, amplification, and restoration of methylation patterns using methods disclosed herein. The enriched library was further subjected to two rounds of methyl binding domain-based affinity capture and elution with competitive binding of a 226 base-pair competitor DNA containing 10 methylated CpG sites. In the unenriched (native) library, sequences mapped in a largely random pattern throughout the genome. In the library that was enriched for densely methylated fragments, the vast majority of reads mapped to CpG islands. This enrichment results in a high yield of cancer-relevant signal. Importantly, the enrichment is based on methylation density, not on hybrid-capture of pre-defined sequences. It is also noteworthy that multiple cfDNA fragments from a sample can map to the same CpG island region, providing strong signal confirmation.
Example 3:
Patterns of aberrant hypermethylation of cell-free DNA fragments in plasma from patients with various types of cancers and from non-cancer control subjects.
In this example, the ability of the disclosed methods to capture cancer-specific hypermethylation signals from cell-free DNA in clinical plasma samples was demonstrated. Plasma samples (~1 mL) were tested from 11 patients with various types of advanced-stage cancer and from 8 individuals with no known cancer history who were undergoing lung cancer screening because they had a heavy smoking history (meeting US Preventive Services Task Force eligibility criteria). Samples were tested according to the methods described in Example 1.
Figure 8 shows a heat map displaying genomic regions at which aberrantly hypermethylated sequences from cell-free DNA fragments were observed to map in plasma of 11 patients with various types of cancer (advanced stage) and 8 non-cancer control subjects who were heavy smokers participating in a lung cancer screening program. Results are displayed for chromosome 2 (chosen arbitrarily), which is representative of genome-wide patterns. Dark bars represent genomic regions to which one or more cfDNA fragments categorized as aberrantly hypermethylated were observed to map. Such fragments are densely methylated but map to genomic regions that are expected to have a methylation level of less than 40% (averaged across all CpG sites) in multiple types of healthy cells and tissues based on publicly available whole genome bisulfite sequencing data (from Roadmap and Blueprint studies). The difference in signal between cancer cases and non-cancer control subjects is striking. The distinct patterns of hypermethylation between samples underscore the importance of untargeted capture. If a panel of targeted hybrid-capture oligonucleotides had been used instead, such comprehensive capture for all cancer types would not have been possible. These results demonstrate the ability of the assay to capture aberrant promoter hypermethylation signals regardless of genomic location and from multiple types of cancer.
Example 4:
Non-invasive monitoring of the evolution of tumor derived aberrant hypermethylation patterns in cell-free DNA.
In this example, changes in aberrant hypermethylation patterns in cell-free DNA during cancer therapy were evaluated. Longitudinal plasma samples (~1 mL each) were obtained from a 66-year-old male patient with metastatic non-small cell lung cancer who received therapy with the drugs olaparib and cediranib. Cell-free DNA was extracted from plasma and was tested according to the methods described in Example 1.
Figure 9 shows the evolution of plasma cell-free DNA hypermethylation patterns over time in a patient with metastatic non-small cell lung cancer being treated with olaparib and cediranib. The patient’s cancer initially showed a modest response to therapy (considered stable disease by RECIST criteria), and subsequently showed progression. As an example, initial shrinkage followed by enlargement of a liver metastasis is shown in computed tomography (CT) scan images taken at baseline (prior to therapy) and at cycles 4 and 8 of therapy (each cycle is 28 days). A graph is also provided showing changes over time in tumor burden (defined as sum of diameters of target lesions according to RECIST guidelines) and in tumor-derived cfDNA level (measured as the variant allele fraction [VAF] of a tumor-specific KRAS mutation in plasma cell-free DNA). The tumor burden initially decreases with the drug therapy but then increases, likely because of growth of treatment-resistant tumor clones. The mutant tumor-derived cfDNA level shows a transient spike (possibly due to initial cell kill) followed by a decline, indicative of tumor response. After ~4 months, the tumor-derived cfDNA level began to increase, indicative of tumor progression. In the plot below, aberrantly hypermethylated cfDNA fragment counts mapping to chromosome 10 (as a representative example) are shown at 4 time points: at baseline, shortly after beginning treatment, at the nadir of mutant tumor-derived cfDNA level, and when the cancer has clearly progressed. Each circle indicates the observation of one or more aberrantly hypermethylated cfDNA fragments mapping to a CpG island at that genomic location. Circle size is proportional to the number of fragments mapping to a given CpG island. Blue circles indicate CpG islands at which aberrantly hypermethylated cfDNA fragments mapped at baseline and during therapy.
Green circles indicate CpG islands at which aberrantly hypermethylated fragments were observed at either of the first two time points but not thereafter. Red circles indicate CpG islands at which aberrantly hypermethylated fragments were not observed at either of the first two time points but emerged thereafter. Analysis of such evolving aberrant hypermethylation patterns at CpG islands can provide biological and clinical insights pertaining to epigenetic resistance mechanisms, tumor heterogeneity, prognosis, and response or lack of response to therapy.
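As a non-limiting illustration, the grouping of CpG islands by their temporal pattern can be sketched in Python. The labels "persistent", "lost", and "emergent" are hypothetical names for the blue, green, and red categories of Figure 9. Treating "persistent" as observed in at least one of the first two time points and at least one later time point is an interpretation made for this sketch.

```python
def classify_islands(counts_by_timepoint, n_early=2):
    """Label each CpG island by when aberrantly hypermethylated fragments appear.

    counts_by_timepoint is a time-ordered list of dicts mapping a CpG island
    identifier to its fragment count at that time point.
    """
    early, late = set(), set()
    for counts in counts_by_timepoint[:n_early]:
        early.update(island for island, n in counts.items() if n > 0)
    for counts in counts_by_timepoint[n_early:]:
        late.update(island for island, n in counts.items() if n > 0)

    labels = {}
    for island in early | late:
        if island in early and island in late:
            labels[island] = "persistent"  # blue circles (assumed reading)
        elif island in early:
            labels[island] = "lost"        # green circles
        else:
            labels[island] = "emergent"    # red circles
    return labels


# Hypothetical counts at four time points for three CpG islands.
timepoints = [
    {"cgi_A": 3, "cgi_B": 1},
    {"cgi_A": 2},
    {"cgi_A": 1},
    {"cgi_A": 4, "cgi_C": 2},
]
print(classify_islands(timepoints))
```

The `n_early` parameter reflects the example's use of the first two time points as the reference window and can be changed for other sampling schedules.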
Example 5:
Analyzing longitudinal changes in methylation patterns over time in the same patient facilitates identification and monitoring of personalized disease-associated methylation signals.
Because the methods disclosed herein for enrichment of densely methylated DNA fragments do not require pre-specification of a panel of targeted genomic regions, a comprehensive assessment of genome-wide hypermethylation patterns at CpG islands can be obtained. This approach can permit identification of patient-specific hypermethylation patterns in a hypothesis-free manner. However, considering cancer signal detection in cell-free DNA in plasma as an example, some CpG islands are commonly hypermethylated in healthy cell-free DNA (derived from healthy cells), whereas some CpG islands are aberrantly hypermethylated in cell-free DNA derived from cancer cells and are rarely hypermethylated in healthy cell-free DNA. To identify which hypermethylated CpG islands are likely to be cancer-associated in a given patient’s cell-free DNA, various logical approaches can be applied alone or in combination: (1) identify hypermethylated cell-free DNA fragments that map to CpG islands which are rarely hypermethylated in healthy plasma, (2) identify hypermethylated cell-free DNA fragments that map to CpG islands which are known to commonly become hypermethylated in cancer cells based on data from studies of other patients, and/or (3) identify hypermethylated cell-free DNA fragments that map to CpG islands whose fragment counts (relative to other CpG islands in the same biospecimen) change over time in concert with changes in tumor burden (e.g., relative DNA fragment counts mapping to a CpG island can increase over time with disease progression or decrease over time when tumors shrink in response to effective therapy). Once a patient-specific (personalized) set of aberrantly hypermethylated genomic regions has been identified, this information can be used to improve sensitivity and/or specificity for detecting tumor-derived signals in subsequent biospecimens obtained from the same patient. By knowing which genomic regions are likely aberrantly hypermethylated in a particular patient’s tumor(s), observation of hypermethylated DNA fragments mapping to those genomic regions in a subsequent biospecimen can be considered to have a greater probability of being tumor-derived. In contrast, hypermethylated DNA fragments mapping outside of those genomic regions would be less likely to be tumor-derived. Similar approaches can be applied to DNA derived from other biological samples beyond just plasma.
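As a non-limiting illustration, the use of a personalized set of genomic regions to sort fragments from a subsequent biospecimen can be sketched in Python. The tuple layout (chromosome, start, end), the half-open coordinate convention, and the function names are assumptions made for this sketch, which only separates fragments into those overlapping the personalized regions and those that do not.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_by_personalized_regions(fragments, regions):
    """Separate hypermethylated fragments by whether they map to a region
    previously identified as aberrantly hypermethylated in this patient."""
    inside, outside = [], []
    for fragment in fragments:
        if any(overlaps(fragment, region) for region in regions):
            inside.append(fragment)   # greater probability of being tumor-derived
        else:
            outside.append(fragment)  # less likely to be tumor-derived
    return inside, outside


# Hypothetical personalized region and fragments from a later sample.
regions = [("chr2", 1000, 2000)]
fragments = [("chr2", 1900, 2070), ("chr2", 2000, 2170), ("chr10", 1500, 1670)]
print(split_by_personalized_regions(fragments, regions))
```

A linear scan over regions is adequate for a small personalized set; an interval index would be a natural substitute for genome-wide region lists.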
In this example, densely methylated cell-free DNA fragment counts mapping to CpG islands were compared in pre-treatment versus post-treatment plasma of the same patient. This example illustrates that counts of cell-free DNA fragments mapping to CpG islands which are commonly hypermethylated in healthy cells remain fairly stable over time in a given patient (considered to be background signal). In contrast, counts of cell-free DNA fragments mapping to CpG islands which are hypermethylated in cancer cells (but are rarely methylated in healthy cells) can vary over time as a patient undergoes cancer therapy. As tumor burden decreases with therapy, shedding of cell-free DNA from the tumor can decrease, and tumor-specific methylation signals can decrease relative to the more stable background methylation signal from healthy cells.
In this example, longitudinal plasma samples (~1 mL each) were obtained from a 76-year-old male patient with metastatic non-small cell lung cancer who received immune checkpoint inhibitor therapy with the drug pembrolizumab. Plasma samples were obtained from the patient prior to initiating therapy (on cycle 1 day 1) and again after completing one cycle of treatment (on cycle 2 day 1, prior to receiving the second cycle). Cell-free DNA was extracted from plasma and was tested according to the methods described in Example 1.
Figure 10 shows a graph in which cell-free DNA fragment counts mapping to various CpG islands are displayed for two time points (pre-treatment on the X-axis, and after 1 cycle on the Y-axis). Each data point on the graph shows cell-free DNA fragment counts mapping to an individual CpG island at the two time points. The graph shows that at many CpG islands, the relative fragment counts mapping to those CpG islands remain fairly stable over time, suggesting that these CpG islands are unlikely to be cancer-associated (considered background signal). The graph also shows that at some other CpG islands, the relative fragment counts mapping to those CpG islands decrease substantially from the pre-treatment sample to the post-treatment sample, suggesting that these CpG islands are likely to be cancer-associated. Such analysis can facilitate identification of CpG islands that show cancer-associated hypermethylation in a given patient (i.e., a personalized cancer-associated hypermethylation pattern).
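As a non-limiting illustration, the comparison of relative fragment counts between two time points can be sketched in Python. The cutoffs used here (a post/pre ratio of at most 0.5 and at least 5 pre-treatment fragments) are hypothetical values chosen for the sketch and do not appear in the disclosure, which describes only a substantial decrease.

```python
def relative_counts(counts):
    """Scale per-island fragment counts by the sample's total count."""
    total = sum(counts.values())
    if total == 0:
        return {island: 0.0 for island in counts}
    return {island: n / total for island, n in counts.items()}


def flag_decreasing_islands(pre, post, max_ratio=0.5, min_pre_count=5):
    """Return CpG islands whose relative fragment count drops after treatment.

    max_ratio and min_pre_count are illustrative cutoffs, not values from
    the disclosure.
    """
    pre_rel = relative_counts(pre)
    post_rel = relative_counts(post)
    flagged = []
    for island, n in pre.items():
        if n == 0 or n < min_pre_count:
            continue  # too few pre-treatment fragments to judge a change
        ratio = post_rel.get(island, 0.0) / pre_rel[island]
        if ratio <= max_ratio:
            flagged.append(island)
    return sorted(flagged)


# Hypothetical fragment counts per CpG island at the two time points.
pre_treatment = {"cgi_1": 20, "cgi_2": 20, "cgi_3": 10}
post_treatment = {"cgi_1": 22, "cgi_2": 4, "cgi_3": 11}
print(flag_decreasing_islands(pre_treatment, post_treatment))
```

Because counts are scaled to the sample total, a large drop at one island slightly raises the relative counts of the others; a fixed set of stable background islands could be used as the denominator instead.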
Example 6:
Capturing and sequencing densely methylated viral DNA.
This example shows that densely methylated viral DNA can be captured and sequenced using methods disclosed herein. In this example, densely methylated viral DNA fragments were captured from a patient’s plasma in parallel with densely methylated cell-free DNA fragments derived from the patient’s genome. A 1 mL plasma sample was obtained from a male patient with HIV who developed diffuse large B-cell lymphoma (DLBCL). It is known that DLBCL in the setting of HIV is often associated with latent Epstein-Barr Virus (EBV) infection of B-cells. The plasma sample was obtained prior to initiation of any therapy. Cell-free DNA (including viral DNA) was extracted from the plasma sample and was tested according to the methods described in Example 1, with a modification in the bioinformatic analysis to include alignment of DNA fragment sequences to viral genomes including Epstein-Barr Virus, HIV-1, Human Papilloma Virus, Kaposi’s Sarcoma Herpesvirus, Hepatitis B Virus, and Hepatitis C virus, in addition to the human genome reference (hg38). Figure 11 shows densely methylated cell-free DNA fragments mapping to the EBV genome in the plasma of this patient with HIV and DLBCL. Note that the periodicity of sequence coverage suggests phased nucleosomal protection of cfDNA fragments. Red bars in magnified views indicate methylated CpG sites; blue bars indicate unmethylated sites. Notably, no fragments were found to map to any of the other viral reference genomes that were included in the bioinformatic analysis (besides EBV), suggesting that other viral DNA was not present in the blood, or if present, may not have been methylated with sufficient density to be captured and sequenced.
Although the present invention has been described in terms of particular embodiments, it is not intended that the invention be limited to these embodiments.
Modifications within the spirit of the invention will be apparent to those skilled in the art. It is appreciated that the previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A method of DNA conversion and amplification with restoration of CpG methylation patterns, comprising the following steps: a. converting unmodified cytosine bases in template DNA molecules to uracil bases by deamination, resulting in converted template DNA molecules; b. performing a polymerase chain reaction (PCR) to generate DNA copies of the converted template DNA molecules; c. methylating cytosine bases at unconverted CpG sites in the DNA copies using an enzyme, thereby providing converted and amplified copies of DNA with CpG methylation patterns restored.
2. A method of enriching DNA molecules based on density of methylated CpG sites while minimizing loss of unique sequences derived from densely methylated DNA molecules, the method comprising: a. converting unmodified cytosine bases in template DNA molecules to uracil bases by deamination, resulting in converted template DNA molecules; b. performing a polymerase chain reaction (PCR) to generate DNA copies of the converted template DNA molecules; c. methylating cytosine bases at unconverted CpG sites in the DNA copies using an enzyme, resulting in DNA copies with restored methylation; d. enriching densely methylated members of the population of DNA copies with restored methylation via selective capture based on methylation density.
3. A method of selectively sequencing densely methylated DNA molecules from a population of DNA molecules, while minimizing the loss of unique sequences derived from said densely methylated DNA molecules, comprising the following steps: a. converting unmodified cytosine bases in template DNA molecules to uracil bases by deamination, resulting in converted template DNA molecules; b. performing a polymerase chain reaction (PCR) to generate DNA copies of the converted template DNA molecules;
c. methylating cytosine bases at unconverted CpG sites in the DNA copies using an enzyme, resulting in converted DNA copies with restored methylation; d. enriching densely methylated members of the population of converted DNA copies with restored methylation through selective capture based on methylation density; e. ligating sequencing-compatible adapters to DNA molecules either before or after any one of steps a, b, c, or d, ultimately resulting in the formation of a sequencing library; f. sequencing at least a portion of the sequencing library to obtain a plurality of sequencing reads.
4. The method of any one of claims 1, 2, or 3, wherein the template DNA molecules are derived from biological samples, including cells, tissues, or bodily fluids.
5. The method of any one of claims 1, 2, or 3, wherein the template DNA molecules are derived from biological samples, including cells, tissues, or bodily fluids of vertebrate species.
6. The method of any one of claims 1, 2, or 3, wherein the template DNA comprises double stranded DNA.
7. The method of any one of claims 1, 2, or 3, wherein the template DNA comprises single stranded DNA.
8. The method of any one of claims 1, 2, or 3, wherein the template DNA comprises cell-free DNA derived from a biofluid of a patient.
9. The method of any one of claims 1, 2, or 3, wherein the template DNA comprises cell-free DNA derived from blood.
10. The method of any one of claims 1, 2, or 3, wherein an unmodified cytosine base is a cytosine base lacking an epigenetic modification.
11. The method of any one of claims 1, 2, or 3, wherein 5-methylcytosine bases resist conversion, and are copied as cytosine bases in amplified DNA copies.
12. The method of any one of claims 1, 2, or 3, wherein 5-hydroxymethylcytosine bases resist conversion, and are copied as cytosine bases in amplified DNA copies.
13. The method of any one of claims 1, 2, or 3, wherein the conversion of unmodified cytosine bases to uracil bases is accomplished through bisulfite treatment.
14. The method of any one of claims 1, 2, or 3, wherein the conversion of unmodified cytosine bases to uracil bases is accomplished through enzymatic conversion.
15. The method of any one of claims 1, 2, or 3, wherein the conversion of unmodified cytosine bases to uracil bases is accomplished through a combination of chemical and enzymatic conversion.
16. The method of any one of claims 1, 2, or 3, wherein the polymerase chain reaction replaces uracil bases in the template DNA molecule with thymine bases in the DNA copies of the converted template DNA molecule.
17. The method of any one of claims 1, 2, or 3, wherein the PCR amplification is performed using primers specific to the converted template DNA molecules.
18. The method of any one of claims 1, 2, or 3, wherein the enzyme used to methylate cytosine bases at unconverted CpG sites in the DNA copies is a CpG methyltransferase.
19. The method of claim 1, further comprising a step of verifying the restored CpG methylation patterns through DNA sequencing or methylation-specific analysis.
20. The method of claim 1, wherein the DNA amplification with restored CpG methylation patterns is used for epigenetic analysis, studying gene regulation, identifying disease-associated methylation changes, or personalized medicine.
21. A kit for performing the method of claim 1, comprising reagents and materials necessary for the conversion of unmodified cytosine bases, PCR amplification, and CpG methylation restoration in DNA copies.
22. The kit of claim 21, further comprising user instructions and quality control materials to ensure accurate and reproducible results.
23. The method of claim 2 or 3, wherein selective capture based on methylation density comprises methods involving affinity purification based on binding of methylated CpG sites to methyl binding domain (MBD).
24. The method of claim 2 or 3, wherein selective capture based on methylation density comprises methods involving affinity purification based on binding of methylated 5-methylcytosine to an antibody with affinity for 5-methylcytosine.
25. The method of claim 2 or 3, wherein selective capture based on methylation density comprises methods involving competitive binding in the presence of a methylated competitor DNA whose methylation density can be adjusted to yield a desired methylation density profile of the enriched DNA.
26. The method of claim 2, further comprising a step of verifying the enrichment efficiency by analyzing the density of methylated CpG sites in the captured DNA sequences.
27. The method of claim 2, wherein the enriched DNA sequences are utilized for applications including but not limited to epigenetic analysis, studying gene regulation, identifying disease-associated methylation changes, or personalized medicine.
28. A kit for performing the method of claim 2, comprising necessary reagents and materials for converting unmodified cytosine bases, PCR amplification, CpG methylation, and the enrichment of densely methylated DNA copies.
29. The kit of claim 28, further comprising user instructions and quality control materials to ensure accurate and reproducible results.
30. The method of claim 3, further comprising a step of PCR-amplification of DNA molecules in the sequencing library prior to sequencing in step f.
31. The method of claim 3, further comprising a step of size selection of DNA molecules in the sequencing library prior to sequencing in step f.
32. The method of claim 3, wherein the sequencing in step f is performed using a next-generation sequencing platform.
33. The method of claim 3, further comprising a step of demultiplexing the plurality of sequencing reads based on barcodes or indexes.
34. The method of claim 3, wherein the sequencing reads obtained in step f are aligned to a reference genome for analysis.
35. The method of claim 3, wherein the sequencing reads obtained in step f are used for DNA methylation analysis, epigenetic profiling, studying gene regulation, identifying disease-associated methylation changes, or personalized medicine.
36. The method of claim 3, wherein the obtained sequence information is informative of a presence or absence of a cancer, a tissue of origin of a cancer, or a stage of cancer.
37. A kit for performing the method of claim 3, comprising necessary reagents and materials for converting unmodified cytosine bases, PCR amplification, CpG methylation, enrichment of densely methylated DNA copies, and adapter ligation to prepare a next-generation sequencing library.
38. The kit of claim 37, further comprising user instructions and quality control materials to ensure accurate and reproducible results.
39. A method for detecting residual, recurrent, or progressing cancer, the method comprising:
a. sequencing densely methylated DNA fragments from tumor tissue, blood, plasma, serum, or urine of a patient diagnosed with cancer to identify a plurality of aberrantly hypermethylated CpG island regions that are specific to that patient’s cancer; b. obtaining one or more longitudinal samples of blood, plasma, serum, or urine from the patient after the patient has received a cancer treatment; c. sequencing densely methylated DNA fragments from the post-treatment longitudinal sample; d. identifying aberrantly hypermethylated DNA sequences in the post-treatment longitudinal sample that map to the same patient-specific hypermethylated CpG island regions that were identified in the earlier sample, wherein detection of one or more such sequences matching patient-specific hypermethylation patterns in the post-treatment sample is indicative of residual, recurrent, or progressing cancer.
40. The method of claim 39, wherein the sequencing in step a and step c is performed using a next-generation sequencing platform.
41. The method of claim 39, wherein the cancer treatment received by the patient includes surgery, chemotherapy, radiation therapy, immunotherapy, targeted therapy, or a combination thereof.
42. The method of claim 39, wherein the longitudinal samples obtained in step b are collected at predetermined time intervals following the cancer treatment.
43. The method of claim 39, wherein the post-treatment longitudinal samples obtained in step b further comprise tumor tissue or cells collected from the patient.
44. The method of claim 39, further comprising comparing the abundance or frequency of the aberrantly hypermethylated DNA sequences in the post-treatment longitudinal sample with the abundance or frequency of the sequences mapping to the same CpG island regions in the earlier sample.
45. The method of claim 39, wherein the identification of aberrantly hypermethylated DNA sequences in step d is performed using bioinformatics analysis and comparison algorithms.
46. The method of claim 39, further comprising validating the presence of residual, recurrent, or progressing cancer by performing additional diagnostic tests such as imaging, biopsy, or histopathological analysis.
47. The method of claim 39, wherein the plurality of aberrantly hypermethylated CpG island regions specific to the patient's cancer are identified based on a comparison with a reference database or panel of known cancer-associated methylation patterns.
48. The method of claim 39, wherein the patient-specific hypermethylated CpG island regions are used to generate a personalized methylation signature or profile for the patient's cancer.
49. The method of claim 39, wherein the method is used for monitoring the effectiveness of the cancer treatment over time and adjusting the treatment regimen accordingly.
50. The method of claim 39, wherein the method is used for detection of residual cancer after therapy to guide decision-making regarding administration of additional therapy.
51. The method of claim 39, further comprising correlating the detection of residual, recurrent, or progressing cancer with clinical outcomes or prognosis of the patient.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363509353P | 2023-06-21 | 2023-06-21 | |
| US63/509,353 | 2023-06-21 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024264010A1 (en) | 2024-12-26 |
Family
ID=93936411
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2024/035148 (Pending) WO2024264010A1 (en) | 2023-06-21 | 2024-06-21 | Methods of enriching and analyzing methylated dna |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024264010A1 (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20050196792A1 (en) * | 2004-02-13 | 2005-09-08 | Affymetrix, Inc. | Analysis of methylation status using nucleic acid arrays |
| US20190032148A1 (en) * | 2016-01-29 | 2019-01-31 | Epigenomics Ag | Methods for detecting cpg methylation of tumor-derived dna in blood samples |
| CN110643702A (en) * | 2018-06-26 | 2020-01-03 | 深圳市圣必智科技开发有限公司 | Method for determining methylation level of DNA of specific site in biological sample and application thereof |
| WO2020243609A1 (en) * | 2019-05-31 | 2020-12-03 | Freenome Holdings, Inc. | Methods and systems for high-depth sequencing of methylated nucleic acid |
| WO2022255944A2 (en) * | 2021-06-02 | 2022-12-08 | Lucence Life Sciences Pte. Ltd. | Method for detection and quantification of methylated dna |
Non-Patent Citations (1)
| Title |
|---|
| THERMOFISHER SCIENTIFIC: "Methyl-Seq Direct workflow: a fast method for DNA methylation analysis", APPLIED BIOSYSTEMS / APPLICATION NOTE: SEQSTUDIO GENETIC ANALYZER, 1 January 2020 (2020-01-01), pages 1 - 12, XP093252907, Retrieved from the Internet <URL:https://assets.thermofisher.com/TFS-Assets/GSD/Application-Notes/methyl-seq-direct-workflow-application-note.pdf> * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24826797; Country of ref document: EP; Kind code of ref document: A1 |