
WO2025104430A1 - Profiling method for determining epigenetic modifications - Google Patents

Profiling method for determining epigenetic modifications

Info

Publication number
WO2025104430A1
WO2025104430A1 PCT/GB2024/052883 GB2024052883W WO2025104430A1 WO 2025104430 A1 WO2025104430 A1 WO 2025104430A1 GB 2024052883 W GB2024052883 W GB 2024052883W WO 2025104430 A1 WO2025104430 A1 WO 2025104430A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
dna
sequencing
methyltransferase
unmodified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/GB2024/052883
Other languages
English (en)
Inventor
Krystian UBYCH
Jack KENNEFICK
Calum MOULD
Robert Neely
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tagomics Ltd
Original Assignee
Tagomics Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB2317422.0A (GB2639522A)
Application filed by Tagomics Ltd
Publication of WO2025104430A1
Legal status: Pending

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay

Definitions

  • the present application relates to methods of determining polynucleotide modification.
  • Epigenetic modifications of polynucleotides, such as methylation and hydroxymethylation of cytosine, play an important role in determining the activity of a gene or a much more extended region of the genome.
  • methylation of DNA is critical in embryogenesis and early development, and is known to change predictably in correlation with biological ageing of an organism.
  • aberrant modification of DNA can be an important driver of tumourigenesis, and the broader dysregulation of genes is likely to play a key role in many diseases.
  • current methods for studying epigenetic modifications fundamentally limit the scope of current studies of the epigenome.
  • Methods comprising bisulfite conversion of cytosine to uracil may be used for epigenetic analysis, but the treatment of DNA with bisulfite can lead to DNA degradation. This limits the application of bisulfite in samples where the DNA quantity is low, as is typical for circulating cell-free DNA (cfDNA) in blood.
  • Enrichment-based approaches for epigenetic profiling allow cost-effective, whole genome profiling that is more suited to the challenges of delivering personalised medicine.
  • the amount of available cfDNA in a blood sample limited the application of either antibodies (“methyl-DNA Immunoprecipitation”, also referred to as “MeDIP-Seq”) or methyl-binding domain protein (referred to as “MBD-Seq”) in the analysis of cfDNA.
  • the inventors have developed an epigenetic profiling technique which meets these requirements and overcomes various limitations of existing methods, including those discussed above.
  • the disclosed method is an enzymatic “unmethylome” profiling approach, in which unmodified nucleotides, such as unmodified CpG dinucleotides, in a DNA sample are derivatised in such a way that they can then be isolated without bias and subsequently sequenced.
  • the disclosed technique is an advantageously simple process that can be performed in a one-pot approach, that can be integrated readily with standard high throughput sequencing platforms, and may be used with lower quantities of input DNA than has previously been possible, to generate genome-wide epigenetic profiles.
  • the disclosed method combines a procedure of DNA library preparation for next generation sequencing and a method for labelling unmodified, such as epigenetically unmodified, nucleotides. Using the label, the DNA library is subsequently fractionated into modified and unmodified fractions.
  • the method advantageously minimises the number of DNA purification steps required and is highly efficient.
  • the disclosed enzymatic platform can be applied at DNA concentrations that are compatible with single-cell analysis (picogram inputs).
  • the approach has been found to be highly reproducible and unbiased, and may be used as a platform for the diagnosis of disease and the identification of tissue of origin in a sample.
  • the underlying chemistry requires no a priori assumptions to be made about the sample, making the platform ideally suited for the discovery of novel biomarkers of disease.
  • a method of determining the modification status of nucleotide residues in a polynucleotide sample comprising the steps of:
  • a methyltransferase enzyme configured to modify a nucleotide residue in a target position to apply a tag to each unmodified nucleotide residue in a polynucleotide of the sample, wherein each unmodified nucleotide residue is unmodified in the target position;
  • the method comprises sequencing the polynucleotides of the first fraction only.
  • the method may comprise sequencing the polynucleotides of the first fraction, and using the sequencing information to determine the modification status of nucleotide residues in the polynucleotide sample.
  • the terms “fractionated”, “fractionating”, “fractionation”, and similar terms, in relation to the sequencing library refer to the separation of the polynucleotides of the sequencing library into different groups or
  • polynucleotides may be fractionated based on the presence or absence of an associated affinity label, thereby forming a first fraction that is enriched for polynucleotides comprising an affinity label and a second fraction that is enriched for polynucleotides that do not comprise an affinity label.
  • the fractionation of the polynucleotides may also be described and/or referred to as “enriching” and/or “enrichment” of the sequencing library for polynucleotides having or not having the one or more specific features.
  • a fraction of polynucleotides may be referred to as being “enriched” for labelled or unlabelled polynucleotides, as appropriate.
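
The fractionation described above can be sketched as a partition of library fragments by label presence. This is an idealised illustration only: the `Fragment` class, the fragment names and the perfect separation are assumptions made for the example, whereas the source describes fractions that are enriched for, rather than exclusively composed of, labelled or unlabelled polynucleotides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical model of one library fragment."""
    name: str
    labelled: bool  # True if an affinity label is bound to the fragment


def fractionate(library):
    """Split a library into (first, second) fractions by affinity-label presence.

    The first fraction holds labelled fragments, the second holds unlabelled ones.
    Real capture is imperfect; this sketch assumes ideal separation.
    """
    first = [f for f in library if f.labelled]
    second = [f for f in library if not f.labelled]
    return first, second


# Made-up example library
library = [
    Fragment("frag_a", True),
    Fragment("frag_b", False),
    Fragment("frag_c", True),
]
first_fraction, second_fraction = fractionate(library)
print([f.name for f in first_fraction])
print([f.name for f in second_fraction])
```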
  • the term “site-specific” refers to the application of a tag to a target atom within a nucleotide residue having a particular configuration.
  • the particular configuration may be the absence of a specific modification, such as a cytosine that is unmethylated in the C5 position, and the tag may be bound to an atom that would otherwise have been bound by the modification.
  • the tag is not a fluorescent tag.
  • the label is not a fluorescent label.
  • neither the tag nor the label is fluorescent.
  • the label and/or tag does not consist of or comprise a fluorophore or fluorophore derivative that is capable of emitting light when excited, such as re-emitting light upon light excitation.
  • the method preferably does not comprise the use of bisulfite, such as bisulfite conversion of cytosine to uracil.
  • the method preferably does not comprise pyridine borane base conversion or enzymatic deamination of unmethylated cytosine.
  • the method preferably does not comprise the use of methyl-binding antibodies or methyl-DNA Immunoprecipitation (also referred to as “MeDIP-Seq”).
  • the method preferably does not comprise the use of methyl-binding domain protein (referred to as “MBD-Seq”).
  • the sequencing library may be suitable for use with high throughput sequencing methods. Sequencing the polynucleotides preferably comprises the use of next generation sequencing, such as sequencing applications on the Illumina platform.
  • the method preferably does not comprise the use of a nanopore-based sequencing method.
  • the polynucleotide may be a DNA sample, or may be a mixed sample, comprising DNA and RNA. In some embodiments, the sample may be a DNA sample comprising an epigenome.
  • the polynucleotide sample may comprise or substantially consist of, and/or may have been fragmented to comprise or substantially consist of, polynucleotides having a length between 10 and 500 bp. In some embodiments, the polynucleotides may substantially or predominantly have a length of 50-475 nucleotides, such as 100-450 nucleotides, 125-425 nucleotides, or 150-400 nucleotides.
  • the polynucleotide sample may comprise or substantially consist of, and/or may have been fragmented to comprise or substantially consist of, polynucleotides having a length corresponding to the DNA sequencing read length.
  • the polynucleotides may substantially or predominantly have a length of 100-250 nucleotides, preferably 150-180 nucleotides.
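
As a rough illustration of the length ranges above, the share of fragments falling inside a given range can be computed directly. The function name and the fragment lengths below are made up for the example, and the source does not define a numeric threshold for "substantially or predominantly", so the sketch only reports the proportion.

```python
def fraction_in_range(lengths, low, high):
    """Return the proportion of fragment lengths within [low, high] inclusive."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    return sum(low <= n <= high for n in lengths) / len(lengths)


# Hypothetical fragment lengths in nucleotides
lengths = [120, 155, 160, 172, 178, 240, 410]

print(fraction_in_range(lengths, 100, 250))  # range given as 100-250 nucleotides
print(fraction_in_range(lengths, 150, 180))  # preferred range given as 150-180 nucleotides
```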
  • the method may further comprise a step of amplifying the polynucleotides.
  • the method may comprise the amplification of the polynucleotides after inactivation of the methyltransferase (i.e. after step 2 or step 3).
  • the method may comprise the amplification of the polynucleotides of the sequencing library after binding of the affinity label (i.e. after step 4).
  • the method may comprise the amplification of the polynucleotides of the first and/or second fraction (i.e. after step 5).
  • the polynucleotide may be DNA. Accordingly, the method may comprise determining the modification status of nucleotide residues in a DNA sample. As such, the method may comprise the steps of:
  • a methyltransferase enzyme configured to modify a nucleotide residue in a target position to apply a tag to each unmodified nucleotide residue in a DNA molecule of the sample wherein each unmodified nucleotide residue is unmodified in the target position;
  • the method may comprise sequencing the DNA of the first fraction only.
  • the method may further comprise a step of amplifying the DNA.
  • the method may comprise the amplification of the DNA after inactivation of the methyltransferase (i.e. after step 2 or step 3).
  • the method may comprise the amplification of the sequencing library, after binding of the affinity label (i.e. after step 4).
  • the method may comprise the amplification of the DNA of the first and/or second fraction (i.e. after step 5).
  • the term “enriched” refers to a polynucleotide concentration and/or proportion that is greater than the corresponding polynucleotide concentration and/or proportion in the initial (unenriched) sample.
  • References to “enriching the labelled polynucleotide library” and similar terms refer to fractionating the sequencing library/polynucleotide sample into first and second fractions, wherein the first fraction is enriched for polynucleotides comprising an affinity label, and wherein the second fraction is enriched for polynucleotides lacking an affinity label.
  • references to “enriching the labelled sequencing library” or “enriching the labelled DNA library” and similar terms refer to fractionating the sequencing library/polynucleotide/DNA sample into first and second fractions, wherein the first fraction is enriched for polynucleotides/DNA molecules comprising an affinity label, and wherein the second fraction is enriched for polynucleotides/DNA molecules lacking an affinity label.
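
The definition of "enriched" above is a comparison of proportions, which can be written out as a short check. This is a minimal sketch with invented counts; the function names are hypothetical, and "fold enrichment" is a common way of expressing the ratio rather than a term used in the source.

```python
from fractions import Fraction


def labelled_proportion(n_labelled, n_total):
    """Proportion of labelled polynucleotides, kept exact with Fraction."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return Fraction(n_labelled, n_total)


def fold_enrichment(fraction_proportion, initial_proportion):
    """Ratio of the proportion in a fraction to the proportion in the initial sample."""
    if initial_proportion == 0:
        raise ValueError("initial proportion must be non-zero")
    return fraction_proportion / initial_proportion


# Invented counts: 30 of 100 fragments labelled initially, 27 of 30 in the first fraction
initial = labelled_proportion(30, 100)
first = labelled_proportion(27, 30)

print(first > initial)                  # enriched if the proportion has increased
print(fold_enrichment(first, initial))
```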
  • Applying a tag to unmodified nucleotide residues (step 1) is performed before inactivation of the methyltransferase (step 2) and before binding an affinity label to each tag (step 4).
  • the library preparation step (step 3) is performed before the fractionation step (step 5).
  • the steps of the method are performed in the numerical sequence in ascending order (i.e. in sequence from step 1 to step 6).
  • the library preparation step (step 3) may be performed before the affinity labelling and fractionation steps (i.e. before steps 4 and 5).
  • step 3 may be performed after step 4, and before step 5.
  • steps 1-4 are performed before fractionation of the sequencing library (step 5), and sequencing the polynucleotides of the first and second fractions (step 6) is performed after fractionation (step 5).
  • one or more aspects of the library preparation step may be performed before step 1.
  • the remaining step(s) of library preparation may be performed subsequently, such as after the inactivation of the methyltransferase (step 2), or after affinity labelling (step 4).
  • references to “preparing the polynucleotide sample into a sequencing library” or to “preparing the DNA sample into a sequencing library” refer to completion of the preparation of the sequencing library.
  • the polynucleotide sample may be prepared into an affinity labelled sequencing library (steps 1-4) prior to fractionation.
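
The ordering statements above can be collected into a small validator. This is a sketch under stated assumptions: the step descriptions are paraphrased from the text, `MUST_PRECEDE` encodes only the precedence relations stated explicitly (step 1 before steps 2 and 4, step 3 before step 5, steps 1-4 before step 5, step 6 after step 5), and step 3 is taken to mean completion of library preparation. An order accepted here is not necessarily a disclosed embodiment.

```python
# Step numbers and short descriptions paraphrased from the text
STEPS = {
    1: "apply tag to unmodified residues",
    2: "inactivate methyltransferase",
    3: "prepare sequencing library",
    4: "bind affinity label to each tag",
    5: "fractionate sequencing library",
    6: "sequence fraction(s)",
}

# (a, b) means step a must be performed before step b
MUST_PRECEDE = [(1, 2), (1, 4), (3, 5), (1, 5), (2, 5), (4, 5), (5, 6)]


def order_is_permitted(order):
    """Check that `order` uses each step once and respects every stated precedence."""
    if sorted(order) != list(STEPS):
        return False
    position = {step: index for index, step in enumerate(order)}
    return all(position[a] < position[b] for a, b in MUST_PRECEDE)


print(order_is_permitted([1, 2, 3, 4, 5, 6]))  # ascending numerical order
print(order_is_permitted([1, 2, 4, 3, 5, 6]))  # step 3 after step 4, before step 5
print(order_is_permitted([1, 2, 3, 5, 4, 6]))  # fractionation before affinity labelling
```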
  • the method may be a “one pot” method, wherein all of the steps prior to fractionation (i.e. steps 1-4) are performed in a single container, thereby providing significant efficiencies in terms of time and reagents, and advantages in terms of automation. This approach has been found to maximise sensitivity, time, reagents and the overall yield of the polynucleotide enrichment.
  • the method may comprise at most one sample purification step prior to fractionation (i.e. prior to step 5).
  • the preparation of a labelled sequencing library (steps 1-4) may involve at most one sample purification step.
  • binding an affinity label to each tag may comprise adding an affinity label precursor directly into the sequencing library preparation mixture, without a washing step.
  • the term “label” refers to the targeted binding (covalent or otherwise, either directly or indirectly) of a compound that facilitates selective enrichment of the targeted polynucleotides.
  • the label may be suitable for use for enriching the labelled polynucleotides from a mixture comprising labelled and unlabelled polynucleotides.
  • the label may be suitable for use directly for enriching the labelled polynucleotides.
  • a secondary compound that specifically binds the label, preferably with a high affinity, may be used for enrichment of the labelled polynucleotides.
  • the affinity label may comprise biotin.
  • fractionating the sequencing library into first and second fractions may comprise fractionation using a capture agent comprising a biotin-binding protein.
  • the terms “method of” and “method for”, such as “method of determining” and “method for determining”, are intended to be interpreted interchangeably, to encompass methods “suitable for” the described purpose.
  • the inventors have found that the disclosed method provides significant advantages over previous methods of epigenetic analysis.
  • the non-destructive nature of the disclosed method means that epigenetic analysis of the sample may be performed in parallel with other analytical approaches, such as nucleotide sequencing.
  • the inventors have found that the disclosed method is highly efficient, allowing analysis of very low levels of input sample. Moreover, the inventors have identified that previous “unmethylome” profiling approaches introduce bias in relation to densely modified polynucleotides, and the disclosed method avoids this detrimental bias. As discussed herein, these advantages have been made possible by minimising the loss of sample, by performing various operations in specific sequences and combinations.
  • the method may be a “one pot” method, wherein all of the steps, or all of the steps prior to fractionation (i.e. steps 1-4), are performed in a single container, thereby providing significant efficiencies in terms of time and reagents, and advantages in terms of automation. This approach has been found to maximise sensitivity, time, reagents and the overall yield of the polynucleotide enrichment.
  • the terms “nucleotide modification status” and “modification status of nucleotide residues”, as used herein, unless otherwise stated, refer to the presence (modified) or absence (unmodified) of any chemical modification in a target position on a nucleotide residue that may be catalysed by a methyltransferase enzyme.
  • the disclosed method may be used to determine the modification status of any position within a nucleotide that may be chemically modified by a methyltransferase enzyme.
  • methyltransferase catalysed modifications include, for example, the modification of cytosine (at the C5 or N4 position) and adenine (at the N6 position).
  • the nucleotide residues may be cytosine residues and/or adenine residues.
  • the method may be a method for determining the modification status at target positions of cytosine and/or adenine residues in a polynucleotide.
  • the method may be a method for determining the modification status at the cytosine C5 position of each CpG dinucleotide of a DNA sample, comprising the steps of: 1. using a methyltransferase enzyme configured to modify the cytosine C5 position of a CpG dinucleotide motif to apply a tag to each unmodified cytosine residue of the sample, wherein each unmodified cytosine residue is the cytosine of a CpG dinucleotide motif that is unmodified in the C5 position;
  • the method comprises sequencing the DNA of the first fraction only. In some embodiments, the method may further comprise sequencing the DNA of the first fraction, and using the sequencing information to determine the modification status at the cytosine C5 position of each CpG dinucleotide in the DNA sample.
  • the method may further comprise a step of amplifying the DNA. In some embodiments, the method may comprise the amplification of the DNA after inactivation of the methyltransferase (i.e. after step 2 or step 3). In some embodiments, the method may comprise the amplification of the sequencing library, after binding of the affinity label (i.e. after step 4). In some embodiments, the method may comprise the amplification of the DNA of the first and/or second fraction (i.e. after step 5).
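
The selectivity of step 1 in the CpG embodiment, in which only cytosines of CpG motifs that are unmodified at C5 receive a tag, can be illustrated on a single strand. This is a toy model and not the enzymatic chemistry: the function name, the example sequence and the set of modified positions are all assumptions for illustration.

```python
def tag_unmodified_cpg_cytosines(sequence, modified_positions):
    """Return 0-based positions of CpG cytosines that would receive a tag.

    `sequence` is read 5' to 3'. `modified_positions` holds positions of
    cytosines already modified at C5, which are therefore left untagged.
    """
    seq = sequence.upper()
    cpg_cytosines = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]
    return [i for i in cpg_cytosines if i not in modified_positions]


# Made-up strand with CpG cytosines at positions 1, 5 and 9
sequence = "ACGTTCGACCGGA"
modified = {5}  # assume the cytosine at position 5 carries a C5 modification

print(tag_unmodified_cpg_cytosines(sequence, modified))
```

Cytosines outside a CpG context, such as position 8 in the example, are never tagged in this model regardless of modification status.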
  • the term “modified status of cytosine residues” refers to the presence (modified) or absence (unmodified) of any methyltransferase catalysed chemical modification of cytosine.
  • the modification may comprise modification at the C5 position of cytosine.
  • a cytosine residue may be understood to have the following structure:
  • in an unmodified cytosine residue, R1 may be understood to be H.
  • R1 may also be referred to herein as the “C5 position”.
  • the term “modified cytosine” refers to any cytosine residue that has been modified in any way at the C5 position, including, in particular, 5-methylcytosine (5-mC) and its oxidized products 5-hydroxymethylcytosine (5-hmC), 5-formylcytosine (5-fC) and 5-carboxylcytosine (5-caC). It may therefore be appreciated that in modified cytosine, R1 may be methyl, CH2OH, COH or COOH. Alternatively, the modification may comprise modification at the N4 position of cytosine. Accordingly, a modified cytosine may have the following structure: where R1 may be anything other than H. Thus, “modified cytosine” refers to any cytosine residue that has been modified in any way at the N4 position. It may therefore be appreciated that in modified cytosine, R1 may be methyl, CH2OH, COH or COOH.
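
For the C5 case, the relationship between the named cytosine forms and the R1 substituent can be written as a lookup table. The pairing of each abbreviation with its substituent follows standard chemistry; the dictionary and function names are hypothetical, and the formyl group is written "COH" only to match the notation used in the text.

```python
# R1 substituent at the C5 position for each cytosine form named in the text
C5_SUBSTITUENTS = {
    "C": "H",          # unmodified cytosine
    "5-mC": "methyl",  # 5-methylcytosine
    "5-hmC": "CH2OH",  # 5-hydroxymethylcytosine
    "5-fC": "COH",     # 5-formylcytosine, notation as in the text
    "5-caC": "COOH",   # 5-carboxylcytosine
}


def is_modified_at_c5(residue):
    """A residue counts as modified when R1 is anything other than H."""
    return C5_SUBSTITUENTS[residue] != "H"


for residue in C5_SUBSTITUENTS:
    print(residue, C5_SUBSTITUENTS[residue], is_modified_at_c5(residue))
```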
  • the method may be a method of determining the modification status of cytosine residues in a DNA sample, the method comprising the steps of:
  • the method comprises sequencing the DNA of the first fraction only. In some embodiments, the method may further comprise sequencing the DNA of the first fraction, and using the sequencing information to determine the modification status at the N4 position of each cytosine residue in the DNA sample.
  • the term “modified status of adenine residues” refers to the presence (modified) or absence (unmodified) of any chemical modification at the N6 position of adenine.
  • An adenine residue may be understood to have the following structure:
  • In an unmodified adenine residue, R1 may be understood to be H.
  • In a modified adenine residue, R1 may be anything other than H.
  • the term “modified adenine” refers to any adenine residue that has been modified in any way at the N6 position, including, in particular, 6-methyladenine (m6A). It may therefore be appreciated that in modified adenine, R1 may be methyl, CH2OH, COH or COOH.
  • the method may be a method of determining the modification status of adenine residues in a polynucleotide sample, the method comprising the steps of:
  • the method comprises sequencing the DNA of the first fraction only. In some embodiments, the method may further comprise sequencing the DNA of the first fraction, and using the sequencing information to determine the modification status at the N6 position of each adenine residue in the DNA sample.
  • the terms “unmodified” and “unmethylated” refer to all nucleotides that have not been modified in any way (such as by methylation, hydroxymethylation, carboxylation or acylation).
  • the method comprises the detection of unmodified nucleotide residues in a polynucleotide.
  • the “modified status” may also be referred to as the “profile”.
  • the disclosed method may be used to determine a profile of methyltransferase catalysed modifications within a polynucleotide sample.
  • the method may comprise the detection of unmodified cytosine residues in CpG dinucleotides of a DNA sample.
  • the terms “CpG”, “CpG site” and “CpG dinucleotide” are used interchangeably herein to refer to a cytosine-phosphate-guanine sequence in a 5’ to 3’ direction in the backbone of a nucleic acid.
  • the terms “CpG modification status” and “modification status”, as used interchangeably herein, unless otherwise stated, refer to the presence (modified) or absence
  • the “CpG modification status” and “modification status” may also be referred to as the “profile”.
  • the profile may be referred to as the “unmethylome profile” or “unmethylome”.
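
Because a CpG is defined by its 5’ to 3’ direction, a "CG" in a sequence string counts as a CpG site while a "GC" does not. The sketch below makes that distinction concrete; the function names and the example sequence are illustrative assumptions. It also checks the complementary strand, which, read 5’ to 3’, carries a CpG opposite each CpG because the CG dinucleotide is its own reverse complement.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def cpg_sites(sequence):
    """Return 0-based positions of the cytosine in each CpG, reading 5' to 3'."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def reverse_complement(sequence):
    """Return the complementary strand written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


strand = "GCATCGGC"  # contains one CG and two GC dinucleotides
print(cpg_sites(strand))                      # only the CG is reported
print(cpg_sites(reverse_complement(strand)))  # the matching CpG on the other strand
```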
  • the polynucleotide may be a DNA sample, or may be a mixed sample, comprising DNA and RNA.
  • the sample may be a DNA sample comprising an epigenome.
  • the terms “epigenome” and “epigenetic”, as used herein, unless otherwise specified, refer to the chemical modification of a polynucleotide or genome in such a way that gene expression is regulated.
  • the method may be a method of determining the epigenetic profile of a genomic DNA sample, the method further comprising determining the epigenetic profile based on the modification status of nucleotide residues in the sample.
  • the method may comprise determining the epigenetic profile based on the modification status of cytosine residues in CpG dinucleotides of the sample, i.e. the CpG modification status.
  • the method may be a method of analysing a polynucleotide sample, such as a DNA sample, from a subject.
  • the method may be a method for determining the modification status of one or more specific nucleotides in a polynucleotide sample from a subject.
  • the method may be a method for determining the modification status of the cytosine residue in one or more specific CpG dinucleotides of a biomarker in a sample from a subject.
  • the method may be a method for determining the modification status of the cytosine residues in one or more CpG dinucleotides of a plurality of genomic regions in a sample from a subject.
  • the method may be a method for determining the modification status of one or more specific adenine nucleotides of a biomarker in a sample from a subject.
  • the method may be a method for determining the modification status of one or more adenine residues in a plurality of genomic regions in a sample from a subject.
  • the method may be a method for determining the modification status of one or more specific adenine nucleotides of one or more biomarkers in a sample from a subject.
  • the method may be an in vitro method performed on a polynucleotide sample that has previously been obtained from a subject.
  • the term “subject” may refer to any type of organism, including for example, a mammalian species (such as a human or domesticated animal), other animal species, a plant such as a crop, or other type of organism, including single celled organisms, and viruses.
  • the subject may be a developing organism, such as an embryo or foetus.
  • the subject may be a healthy individual.
  • the subject may be an individual that has, or is suspected of having, a disease or predisposition to a disease.
  • the subject may be an individual in need of therapy or suspected of needing therapy.
  • the method may be a method of determining the disease status of a subject. Accordingly, the method may comprise determining the modification status of nucleotide residues in a polynucleotide in a sample from the subject using the disclosed method, and determining the disease status based on the modification status. For example, the method may comprise determining the modification status of cytosine residues in CpG dinucleotides of a polynucleotide in a sample from the subject using the disclosed method, and determining the disease status based on the CpG modification status. The method may be a method of diagnosing a disease in a subject.
  • the method may comprise determining the modification status of nucleotide residues in a polynucleotide in a sample from the subject using the disclosed method, and diagnosing the disease based on the modification status.
  • the method may comprise determining the modification status of cytosine residues in CpG dinucleotides of a polynucleotide in a sample from the subject using the disclosed method, and diagnosing the disease based on the CpG modification status.
  • the method may be a method of making a disease prognosis in a subject.
  • the method may comprise determining the modification status of nucleotide residues in a polynucleotide in a sample from the subject using the disclosed method, and making a disease prognosis based on the modification status.
  • the method may comprise determining the modification status of cytosine residues in CpG dinucleotides of a polynucleotide in a sample from the subject using the disclosed method, and making a disease prognosis based on the CpG modification status.
  • Methods comprising making a determination based on the nucleotide modification status may comprise comparing the modification status of specific nucleotide residues in a polynucleotide from the subject to the modification status of the corresponding residues in a reference sample. Accordingly, methods comprising making a determination based on the CpG modification status may comprise comparing the modification status of cytosine residues in CpG dinucleotides of a polynucleotide from the subject to the CpG modification status of the corresponding residues in a reference sample.
  • the reference sample may comprise a polynucleotide from a healthy subject.
  • the reference sample may comprise a polynucleotide from a diseased subject.
  • the reference sample may comprise a polynucleotide from the same subject as the test sample, taken at a different time point and/or from a different location in the body. Differences in the nucleotide modification status between the test and reference samples may be indicative of the presence or absence of a particular phenotype or clinical feature.
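
The comparison of a test sample with a reference sample can be sketched as a per-site lookup over the residues both samples cover. Everything concrete here is an assumption for illustration: the `(chromosome, position)` keys, the status strings and the example values are invented, and the source does not specify how differences are scored or interpreted.

```python
def differing_sites(test_status, reference_status):
    """Return sites covered by both samples whose modification status differs.

    Each argument maps a site key to "modified" or "unmodified".
    The result maps each differing site to a (test, reference) pair.
    """
    shared = test_status.keys() & reference_status.keys()
    return {
        site: (test_status[site], reference_status[site])
        for site in sorted(shared)
        if test_status[site] != reference_status[site]
    }


# Invented example data keyed by (chromosome, position)
test_sample = {
    ("chr1", 100): "unmodified",
    ("chr1", 250): "modified",
    ("chr2", 40): "unmodified",
}
reference_sample = {
    ("chr1", 100): "modified",
    ("chr1", 250): "modified",
    ("chr3", 7): "unmodified",
}

print(differing_sites(test_sample, reference_sample))
```

Sites present in only one of the two samples are ignored in this sketch, since no corresponding residue is available for comparison.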
  • the method may be a method of treating a subject. Accordingly, the method may comprise determining the modification status of nucleotide residues in a polynucleotide in a sample from the subject using the disclosed method, diagnosing a disease based on the nucleotide modification status, and providing a therapeutic composition to the subject to treat the disease based on the diagnosis. For example, the method may comprise determining the modification status of cytosine residues in CpG dinucleotides of a polynucleotide in a sample from the subject using the disclosed method, diagnosing a disease based on the CpG modification status, and providing a therapeutic composition to the subject to treat the disease based on the diagnosis.
  • the method may be a method of determining a personalised or precision method of treatment for a subject. Accordingly, the method may comprise determining the modification status of nucleotide residues in a polynucleotide in a sample from the subject using the disclosed method, determining the disease status of the subject based on the nucleotide modification status, and determining a personalised medical treatment for the subject based on the disease status.
  • the method may comprise determining the modification status of specific cytosine residues in CpG dinucleotides of a polynucleotide in a sample from the subject using the disclosed method, determining the disease status of the subject based on the CpG modification status, and determining a personalised medical treatment for the subject based on the disease status.
  • the method may be a personalised or precision method of treating a subject.
  • the method may comprise determining the modification status of nucleotide residues in a polynucleotide in a sample from the subject using the disclosed method, determining the disease status of the subject based on the nucleotide modification status, and providing a personalised medical treatment to the subject based on the disease status.
  • the method may comprise determining the modification status of specific cytosine residues in CpG dinucleotides of a polynucleotide in a sample from the subject using the disclosed method, determining the disease status of the subject based on the CpG modification status, and providing a personalised medical treatment to the subject based on the disease status.
  • the subject may be an individual that has been diagnosed with having a disease.
  • the subject may be an individual that has been identified as being predisposed to, or at risk of having, a disease.
  • the subject may be an individual that has not been diagnosed with having a disease.
  • the subject may be an individual that has been diagnosed with cancer.
  • the subject may be pending or undergoing treatment such as a cancer therapy.
  • the subject can be in remission of a cancer.
  • Cancer can be identified on the basis of epigenetic variations. Cancer may be associated with both DNA hypomethylation and hypermethylation, but these two types of epigenetic abnormalities may affect different DNA sequences, and occur at different stages of cancer progression.
  • genomic hypermethylation in cancer may be seen in CpG islands in gene regions, whereas hypomethylation may be observed in repeated DNA sequences in cancer, including heterochromatic DNA repeats, retrotransposons, and endogenous retroviral elements.
  • unique sequences, such as transcription control sequences, are often subject to cancer-associated hypomethylation.
  • the method may be a method of diagnosing cancer in a subject.
  • the method may comprise determining the modification status of nucleotide residues in a polynucleotide in a sample from the subject using the disclosed method, and making a cancer diagnosis based on the nucleotide modification status.
  • the method may comprise determining the modification status of cytosine residues in CpG dinucleotides of a polynucleotide in a sample from the subject using the disclosed method, and making a cancer diagnosis based on the CpG modification status.
  • the method may be a method of detecting cancer in a subject.
  • the method may comprise determining the modification status of nucleotide residues in a polynucleotide in a sample from the subject using the disclosed method, and determining the presence or absence of cancer based on the modification status of the nucleotide residues.
  • the method may comprise determining the modification status of cytosine residues in CpG dinucleotides of a polynucleotide in a sample from the subject using the disclosed method, and determining the presence or absence of cancer based on the modification status of the cytosine residues.
  • the method may comprise determining the modification status of adenine residues in a sample from the subject using the disclosed method, and determining the presence or absence of cancer based on the modification status of the adenine residues.
  • the method may comprise the analysis of a plurality of genomic regions, and detecting the presence or absence of cancer from the modification status of nucleotide residues in the plurality of genomic regions.
  • the method may be a method of detecting any type of cancer. Different types of cancer may be preferentially detected and/or analysed using different sampling approaches based on the disclosed method.
  • the method may be a method of treating cancer in a subject. Accordingly, the method may comprise determining the modification status of nucleotide residues in a polynucleotide in a sample from the subject using the disclosed method, diagnosing a cancer based on the nucleotide modification status, and providing a therapeutic composition to the subject to treat the cancer based on the diagnosis. For example, the method may comprise determining the modification status of cytosine residues in CpG dinucleotides of a polynucleotide in a sample from the subject using the disclosed method, diagnosing a cancer based on the CpG modification status, and providing a therapeutic composition to the subject to treat the cancer based on the diagnosis.
  • the method may be a method of determining a personalised or precision method of cancer treatment for a subject. Accordingly, the method may comprise determining the modification status of nucleotide residues in a polynucleotide in a sample from the subject using the disclosed method, determining the genetic profile of the cancer based on the nucleotide modification status, and determining a personalised medical treatment for the subject based on the genetic profile of the cancer.
  • the method may comprise determining the modification status of specific cytosine residues in CpG dinucleotides of a polynucleotide in a sample from the subject using the disclosed method, determining the genetic profile of the cancer based on the CpG modification status, and determining a personalised medical treatment for the subject based on the genetic profile of the cancer.
  • the method may be a personalised or precision method of treating cancer in a subject.
  • the method may comprise determining the modification status of nucleotide residues in a polynucleotide in a sample from the subject using the disclosed method, determining the genetic profile of the cancer based on the nucleotide modification status, and providing a personalised medical treatment to the subject based on the genetic profile of the cancer.
  • the method may comprise determining the modification status of specific cytosine residues in CpG dinucleotides of a polynucleotide in a sample from the subject using the disclosed method, determining the genetic profile of the cancer based on the CpG modification status, and providing a personalised medical treatment to the subject based on the genetic profile of the cancer.
  • the method may be an in vitro method performed on a DNA sample that has previously been obtained from a tissue biopsy.
  • Biopsy is a diagnostic procedure for cancers and other diseases.
  • tissue biopsy may provide material for cancer genotyping, which may assist in the design of targeted therapeutic approaches.
  • the method may be a method of genotyping a cancerous or otherwise diseased tissue. Accordingly, the method may comprise determining the modification status of nucleotide residues in a polynucleotide from a biopsy of the tissue using the disclosed method. For example, the method may comprise determining the modification status of cytosine residues in CpG dinucleotides of a polynucleotide from a biopsy of the tissue using the disclosed method. The method may further comprise designing a targeted therapeutic approach based on the nucleotide modification status.
  • the method may be performed on a DNA sample that has previously been obtained from a biopsy of any type of tissue from a subject.
  • Liquid biopsy, which has the advantage of minimal invasiveness, has shown potential in detecting cancers, including early-stage cancers and pre-cancerous lesions.
  • circulating cell-free DNA (cfDNA), which refers to DNA present at very low concentration in various bodily fluids, comprises extracellular nucleic acid fragments, for example, released by damaged cells during apoptosis, necrosis, or secretion.
  • cfDNA has been found to exhibit the genetic and epigenetic alterations of cancers, including mutations, copy number alterations, chromosomal rearrangements, hypermethylation, and hypomethylation.
  • the analysis of cfDNA has the potential to revolutionise the detection of early stage cancers and other diseases.
  • cfDNA may comprise circulating tumor DNA (“ctDNA”), which is cell free tumor-derived fragmented DNA in a bodily fluid.
  • cfDNA may comprise ctDNA.
  • the disclosed method has advantageously been found to be capable of providing high quality and consistently reproducible results from the very low concentrations of nucleic acid that are typically present in liquid biopsy (such as circulating cfDNA) samples.
  • the sample for use in the disclosed method may be a cfDNA sample.
  • the sample may consist of or comprise ctDNA.
  • liquid biopsy samples may be used for the analysis of cfDNA, including blood, plasma, urine, and spinal fluid.
  • the liquid biopsy sample may be a blood or plasma sample.
  • the method may be an in vitro method of diagnosing disease in a cfDNA sample from a subject.
  • the method may comprise determining the modification status of nucleotide residues in a cfDNA sample from the subject, and diagnosing the disease based on the nucleotide modification status.
  • the method may comprise determining the modification status of cytosine residues in CpG dinucleotides of a cfDNA sample from the subject, and diagnosing the disease based on the CpG modification status.
  • the method may also be used to determine the tissue of origin of the nucleic acid present in a liquid biopsy sample.
  • the method may be a method of identifying the cellular origin of cfDNA in a sample from a subject. Accordingly, the method may comprise determining the modification status of nucleotide residues in a polynucleotide in a sample from the subject using the disclosed method, and identifying the cellular origin of the DNA based on the modification status of nucleotide residues in the sample.
  • the method may be a method of diagnosing the recurrence of cancer in a subject. Accordingly, the method may comprise determining the modification status of nucleotide residues in a cfDNA sample from the subject, comparing the nucleotide modification status to the nucleotide modification status of a tumour sample from the subject that has previously been determined using the disclosed method, and diagnosing the recurrence of the cancer in the subject based on the comparison.
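The comparison between a cfDNA profile and a previously determined tumour profile could be sketched as follows. This is only an illustration: representing each profile as a set of genomic positions of unmodified CpG sites, and scoring the overlap with a Jaccard index, are assumptions made here and are not specified in the source. The positions are invented toy values, and no diagnostic threshold is implied.

```python
from typing import Set


def profile_similarity(tumour: Set[int], cfdna: Set[int]) -> float:
    """Jaccard index between two sets of unmodified-CpG positions.

    Hypothetical metric chosen for illustration only.
    """
    union = tumour | cfdna
    if not union:
        return 0.0  # two empty profiles: define similarity as zero
    return len(tumour & cfdna) / len(union)


# Toy positions (hypothetical, not real data).
tumour_profile = {101, 250, 377, 512}
cfdna_profile = {101, 250, 512, 900}

score = profile_similarity(tumour_profile, cfdna_profile)
print(f"shared unmodified CpG sites: {score:.2f}")
```

In practice a cfDNA sample is a mixture of DNA from many cell types, so a real comparison would need to account for the tumour-derived fraction rather than rely on a simple overlap score.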
  • the cfDNA sample from the subject may be a blood sample.
  • the tumour sample from the subject may be a sample of a solid tumour, for example previously obtained from the subject in a surgical procedure.
  • the method may be a method of sequencing polynucleotides, the method comprising sequencing the polynucleotides of the first and/or second fraction to generate a plurality of sequencing reads.
  • the method may further comprise comparing the sequences of the sequencing reads to a reference sequence or genome to determine the genomic location of the sequencing reads.
  • the method may be a method of preparing a profile of the modification status of nucleotide residues in a reference sequence, such as one or more regions of a genome.
  • step 6 of the method may comprise sequencing the polynucleotides of the first and/or second fraction to generate a plurality of sequencing reads, comparing the sequences of the sequencing reads to the reference sequence, such as a reference genome or genomic region, to determine the location, such as the genomic location, of the sequencing reads within the reference sequence and thereby the presence or otherwise of unmodified nucleotide residues at specific locations within the reference sequence, such as the one or more regions of the genome.
  • the method may be a method of preparing a profile of the modification status at the cytosine C5 position of each CpG dinucleotide in a reference sequence, such as one or more regions of a genome.
  • step 6 of the method may comprise sequencing the polynucleotides of the first and/or second fraction, to generate a plurality of sequencing reads, comparing the sequences of the sequencing reads to the reference sequence, such as a reference genome or genomic region, to determine the location, such as the genomic location, of the sequencing reads within the reference sequence and thereby the presence or otherwise of unmodified cytosine residues in specific CpG dinucleotides within the reference sequence, such as across the genome or across one or more regions of the genome.
  • the method may be a method of preparing a profile of the modification status of cytosine residues at the N4 position, in a reference sequence, such as one or more regions of a genome.
  • step 6 of the method may comprise sequencing the polynucleotides of the first and/or second fraction to generate a plurality of sequencing reads, comparing the sequences of the sequencing reads to the reference sequence, such as a reference genome or genomic region, to determine the location, such as the genomic location, of the sequencing reads within the reference sequence and thereby the presence or otherwise of cytosine residues unmodified at the N4 position, at specific locations within the reference sequence, such as across the genome or across one or more regions of the genome.
  • the method may be a method of preparing a profile of the modification status of adenine nucleotides at the N6 position, in a reference sequence, such as one or more regions of a genome.
  • step 6 of the method may comprise sequencing the polynucleotides of the first and/or second fraction to generate a plurality of sequencing reads, comparing the sequences of the sequencing reads to the reference sequence, such as a reference genome or genomic region, to determine the location, such as the genomic location, of the sequencing reads within the reference sequence and thereby the presence or otherwise of adenine residues unmodified at the N6 position, at specific locations within the reference sequence, such as across the genome or across one or more regions of the genome.
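The read-to-reference comparison described for step 6 can be illustrated with a minimal sketch. It assumes the reads have already been aligned by an external aligner and are given as 0-based, half-open (start, end) intervals on a single reference string. The function names and the toy reference are hypothetical. Counting reads that span a CpG is a simplification: a read from the enriched fraction shows that its fragment carried at least one tag, not which CpG within it was tagged.

```python
from typing import Dict, Iterable, List, Tuple


def cpg_positions(reference: str) -> List[int]:
    """Return the 0-based position of the C in every CpG dinucleotide."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def cpg_read_support(
    reference: str, alignments: Iterable[Tuple[int, int]]
) -> Dict[int, int]:
    """Count, for each CpG, the aligned reads that span both of its bases."""
    support = {pos: 0 for pos in cpg_positions(reference)}
    for start, end in alignments:
        for pos in support:
            # The C is at pos and the G at pos + 1; both must lie in [start, end).
            if start <= pos and pos + 1 < end:
                support[pos] += 1
    return support


# Toy reference with CpG sites at positions 2, 6 and 11 (hypothetical data).
reference = "ATCGGACGTTACGA"
enriched_reads = [(0, 6), (1, 9), (9, 14)]

profile = cpg_read_support(reference, enriched_reads)
print(profile)
```

A dictionary keyed by position keeps the sketch short; a genome-scale implementation would use an interval index and per-chromosome coordinates instead of the nested loop.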
  • the sample for use in the disclosed method may be obtained from any type of cell or tissue.
  • the sample may be obtained from tissue, blood, plasma, serum, urine, saliva, stool, cerebrospinal fluid, buccal swab, pleural tap, etc.
  • the sample may be obtained from tissue.
  • the sample may be obtained from blood.
  • the sample may be a DNA sample.
  • the DNA sample may be a cfDNA sample, which may comprise ctDNA.
  • the DNA sample may be a cfDNA sample from peripheral blood.
  • a “cell-free” sample, as used herein, refers to nucleic acids that are not contained within or otherwise bound to a cell, or that remain in a sample following the removal of intact cells.
  • Cell-free nucleic acids can include, for example, all non-encapsulated nucleic acids sourced from a bodily fluid (e.g., blood, plasma, serum, cerebrospinal fluid, etc.) from a subject.
  • the cfDNA may be released into bodily fluid through secretion or a cell death process.
  • the cfDNA may comprise DNA released into bodily fluid from cancer cells, and may be referred to as comprising circulating tumor DNA (ctDNA).
  • the cfDNA may be released from healthy cells.
  • the sample may comprise fragmented DNA. Fragmentation may be performed using any method used in the analysis of DNA, such as any fragmentation method used in the preparation of a DNA sample for genetic sequencing.
  • the DNA may be fragmented enzymatically, by acoustic shearing, mechanical shearing (for example, using French pressure cells), sonication, hydrodynamic shearing, or chemically (for example, using heat and a divalent metal cation).
  • fragmentation of the DNA sample is not required.
  • the method does not include fragmentation of the DNA sample.
  • the DNA sample may be used directly in the disclosed method.
  • cfDNA extracted from blood substantially comprises DNA fragment sizes of 50-300 base pairs (bp) in length.
  • the method may comprise the selection of polynucleotides, such as DNA fragments, of a desired length.
  • the method may comprise, before the use of a methyltransferase (step 1), a step of fragmenting the polynucleotide and/or selecting polynucleotides of a desired length.
  • the method may comprise the use of polynucleotides, such as DNA fragments, substantially or predominantly having a length between 10 and 500 bp, such as between 30 and 400 bp, and preferably between 50 and 300 bp in length.
  • the method may comprise the use of polynucleotides, such as DNA fragments, substantially or predominantly having a length in the region of between 100 and 250 bp, preferably between about 150 and 180 bp, to match the sample to the DNA sequencing read length.
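As a rough in-silico analogue of the length selection described above, fragments (or sequenced inserts) can be filtered by length. The sketch below assumes fragments are available as plain sequence strings. The function name is hypothetical, and the default 50 to 300 bp window is taken from the preferred range stated in the text.

```python
from typing import List


def select_by_length(
    fragments: List[str], min_bp: int = 50, max_bp: int = 300
) -> List[str]:
    """Keep only fragments whose length lies within [min_bp, max_bp]."""
    if min_bp > max_bp:
        raise ValueError("min_bp must not exceed max_bp")
    return [frag for frag in fragments if min_bp <= len(frag) <= max_bp]


# Toy fragments of 20, 50, 167, 300 and 450 bp (hypothetical data).
fragments = ["A" * 20, "C" * 50, "G" * 167, "T" * 300, "A" * 450]

selected = select_by_length(fragments)
selected_lengths = [len(frag) for frag in selected]
print(selected_lengths)
```

Calling `select_by_length(fragments, 150, 180)` would apply the narrower window that the text gives for matching fragments to the sequencing read length.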
  • the method may comprise the use of polynucleotides corresponding to an amount of DNA in the range of about 1 fg to about 1 µg, such as about 10 fg to about 100 ng, about 100 fg to about 10 ng, or about 1 pg to about 1 ng.
  • the sample may comprise a quantity of DNA in the picogram range.
  • the sample may comprise less than 1 µg of DNA, such as less than 500 ng, less than 100 ng, or less than 10 ng of DNA.
  • the sample comprises between 1 ng and 100 ng of DNA.
  • the polynucleotide may be derivatised using a methyltransferase enzyme to apply a tag to unmodified nucleotide residues.
  • the polynucleotide may be derivatised using an appropriate methyltransferase enzyme to apply a tag to specific nucleotide residues that are unmodified in target positions.
  • a methyltransferase enzyme may be used that is configured to apply a tag to the target nucleotide residue in the target position.
  • the same tag may be used to derivatise different unmodified nucleotides and/or different target positions.
  • different tags may be used to derivatise different unmodified nucleotides and/or different target positions.
  • the polynucleotide may be derivatised with different tags (e.g. on different unmodified nucleotides) sequentially, for example, with the inactivation of the first methyltransferase before the addition of a second, different methyltransferase.
  • the use of a plurality of different tags advantageously allows the tags to be independently functionalised, for example, to provide selective enrichment/fractionation.
  • references to the “target position” in which a nucleotide residue is unmodified refer to a specific position within the chemical structure of the nucleotide.
  • the tag may be applied in the cytosine C5 position. Accordingly, the fragmented DNA sample will comprise tagged residues.
  • a tagged cytosine residue may be understood to have the following structure: wherein R2 is the tag.
  • the tag may be applied in the N4 position in cytosine. Accordingly, the fragmented DNA sample will comprise tagged residues.
  • a tagged cytosine residue may be understood to have the following structure: wherein R2 is the tag.
  • a tag may be added at the N6 position of adenine, the N2 or N7 position of guanine or at the 2’-OH position of ribose.
  • a tag at the 2’-OH position of ribose is a tag at the 2’-OH position of a terminal ribose.
  • a tagged adenine residue may be understood to have the following structure: wherein R2 is the tag.
  • the term “tag”, which may also be referred to as a “linker”, a “functional linker” or “DNA tag”, as used herein, unless otherwise specified, refers to a reactive moiety that is applied site-specifically to the polynucleotide, such as fragmented DNA. Polynucleotides that have been tagged in this way may be referred to as “derivatised”.
  • the disclosed method may comprise the use of a methyltransferase cofactor analogue, such as a synthetic methyltransferase cofactor analogue, comprising the tag and a methyltransferase-binding moiety.
  • the method may comprise the use of a methyltransferase enzyme to catalyse the transfer of the tag from the methyltransferase cofactor analogue to an unmodified nucleotide residue, such as to the C5 position of a cytosine base of an unmodified CpG dinucleotide, in a polynucleotide sample.
  • a modification such as a methyl group or other chemical modification of the nucleotide residue, such as in the C5 position within a CpG dinucleotide, prevents the transfer of the tag.
  • only nucleotides, such as CpG dinucleotides, that are unmodified (such as unmethylated) in this position may be labelled with a tag.
  • the methyltransferase cofactor may be an ion of formula (I): wherein, X is S or Se;
  • L1 is -CH2- or -CH2CH2-;
  • R2 is the tag;
  • R3 and R4 are independently H, an optionally substituted C1-6 alkyl, an optionally substituted C2-6 alkenyl or an optionally substituted C2-6 alkynyl; or R3 and R4, together with the nitrogen to which they are attached, form an optionally substituted 5- or 6-membered heterocyclyl ring; and R5 is NH2, NHBOC or H; or a salt, solvate or tautomer thereof.
  • the ion of formula (I) may be provided together with a counterion.
  • the counterion may be an organic or inorganic anion carrying one or more negative charges.
  • the counterion may be formate or acetate.
  • R2 may be -CH2-U-[L3]m-[HM]n-[L2]p-[R6]q, wherein: m, n, p and q are each independently selected from 0 and 1; L2 is a linker;
  • HM is a hydrolysable moiety
  • L3 is a linker
  • U is an unsaturated group selected from an alkene, an alkyne, an aromatic group (e.g. aryl), a carbonyl group, SO and SO2;
  • R6 is a heavy atom or a heavy atom cluster suitable for phasing of X-ray diffraction data, a radioactive or stable rare isotope, a fluorophore, a fluorescence quencher, an affinity tag, a crosslinking agent, a nucleic acid cleaving reagent, a spin label, a chromophore, a protein, peptide or amino acid which may optionally be modified, a nucleotide, nucleoside or nucleic acid which may optionally be modified, a carbohydrate, a lipid, a transfection reagent, an intercalating agent, a nanoparticle or bead, or a functional group, wherein the functional group is selected from the group consisting of: an amino group (including a protected amino), a thiol group, a 1,2-diol group, a hydrazino group, a hydroxyamino group, a haloacetamide group, a male
  • a cycloalkyl group e.g. a C3-6 cycloalkyl
  • a halo group e.g. -F, -Cl, -Br, -I
  • an aldehyde group e.g. a ketone group
  • a 1,2-aminothiol group e.g. a 1,2-aminothiol group
  • an azido group e.g. an isothiocyanate or thiocyanate group
  • an alkene group such as a terminal alkene
  • an alkyne group such as a terminal alkyne group
  • a 1,3-diene function e.g.
  • R4 is an optionally substituted C1-4 alkyl, an optionally substituted C2-4 alkenyl or an optionally substituted C2-4 alkynyl
  • the alkyl, alkenyl or alkynyl may be unsubstituted or substituted with one or more substituents selected from the group consisting of: -NR7R8; -OH; -SH; -CN; -C(O)OR7; -C(O)R7; C(O)NR7R8; N3; and halo, wherein R7 and R8 are independently H or a C1-4 alkyl.
  • Halo may be F, Cl, Br or I.
  • the 5- or 6-membered heterocyclyl ring may be unsubstituted, or substituted with one or more substituents selected from the group consisting of: -NR7R8; -OH; -SH; -CN; -C(O)OR7; -C(O)R7; C(O)NR7R8; N3; and halo, wherein R7 and R8 are independently H or a C1-4 alkyl. Halo may be F, Cl, Br or I.
  • Synthetic methyltransferase cofactors are described in more detail in PCT/GB2022/052438, EP3186266B1 and US8008007B2. It may be appreciated that preferred embodiments of the X, L1, R2, R3 and R4 groups in the compound of formula (I) may be as defined for the equivalent groups in these applications.
  • X may be S.
  • L1 may be -CH2CH2-.
  • R3 may be H.
  • R3 may be an optionally substituted C1-4 alkyl, an optionally substituted C2-4 alkenyl or an optionally substituted C2-4 alkynyl, more preferably an optionally substituted methyl or an optionally substituted ethyl.
  • the alkyl, alkenyl or alkynyl may be unsubstituted or substituted with an OH. Accordingly, R3 may be -
  • R4 may be H.
  • R5 may be NH2.
  • q is 1. In some embodiments, R6 is -N3. p may be 1.
  • L2 may be a linker comprising a backbone of between 1 and 50 atoms, between 2 and 40 atoms, between 3 and 30 atoms, between 4 and 20 atoms or between 5 and 15 atoms.
  • the backbone may be made up of carbon, oxygen and/or nitrogen atoms.
  • the linker may be understood to consist of the atoms which define the shortest possible route between the two ends of the linker group.
  • the hydrocarbon may be an optionally substituted alkylene, preferably an optionally substituted C1-10 alkylene and more preferably a C1-5 alkylene.
  • the optionally substituted polyether chain may be an optionally substituted polyethylene glycol chain.
  • the polyethylene glycol chain may comprise up to 15 monomers, up to 10 monomers or up to 5 monomers of ethylene glycol. In some embodiments, the polyethylene glycol chain consists of between 1 and 5 or between 2 and 3 monomers of ethylene glycol.
  • the arylene moiety may be a C6H4 phenylene ring.
  • L2 may be: wherein w is an integer between 1 and 15, e.g. between 2 and 10 or between 3 and 5. In some embodiments, w is 2 or 3.
  • p is 0.
  • n is 1.
  • the hydrolysable moiety may be alkyl.
  • the C1-4 alkyl may be methyl.
  • the hydrolysable moiety may be a Schiff base, for example, an imine moiety, an oxime moiety and/or a hydrazone moiety.
  • the hydrolysable moiety comprises a disulphide (S-S) bond.
  • the hydrolysable moiety is N-(2-aminoethyl)-2-aminoethyl-N-(2-aminoethyl)-2-aminoethyl-N-(2-aminoethyl)-2-aminoethyl-N-(2-aminoethyl)-2-aminoethyl-N-(2-aminoethyl)-2-aminoethyl
  • L3 may be a linker comprising a linear chain of from 1 to 20, from 2 to 15, from 3 to 10 or from 4 to 9 atoms. The atoms may be carbon, oxygen and/or nitrogen atoms. The linker may be substituted or unsubstituted.
  • L3 comprises an optionally substituted hydrocarbon (e.g. an alkyl) chain.
  • L3 comprises an optionally substituted linear C1-10 alkyl chain, e.g. an optionally substituted C2-8 or an optionally substituted C4-6 alkyl chain.
  • the alkyl chain is unsubstituted.
  • the alkyl chain is substituted.
  • L3 is a linear, unsubstituted C2, C3 or C4 alkyl chain.
  • R2 is .
  • the synthetic methyltransferase cofactor may be:
  • the methyltransferase may be any methyltransferase that is capable of using S-adenosyl methionine as a cofactor.
  • the methyltransferase may be an S-adenosylmethionine-dependent methyltransferase, such as an S-adenosyl-L-methionine-dependent methyltransferase.
  • the methyltransferase may be a cytosine-5 (C5) methyltransferase, such as a bacterial cytosine C5 methyltransferase.
  • the methyltransferase may be an adenine methyltransferase, such as a bacterial adenine methyltransferase.
  • the methyltransferase may be M.TaqI, which is a DNA adenine methyltransferase.
  • the methyltransferase may be a methyltransferase from Mycoplasma.
  • the methyltransferase may be a constitutively active methyltransferase.
  • the methyltransferase may be one of the enzymes described in US 2017/0283453.
  • the methyltransferase may be M.MpeI, M.HhaI, M.SssI, M.AccII, M.MspI or M.TaqI.
  • the methyltransferase may be an active mutant, variant, and/or fragment of M.MpeI, M.HhaI, M.SssI, M.AccII, M.MspI or M.TaqI.
  • M.MpeI has been found to be particularly advantageous for use in the disclosed method, in part due to being particularly non-selective in terms of target locus.
  • the methyltransferase may be M.MpeI or an active mutant, variant, and/or fragment thereof.
  • although methyltransferase enzymes share a relatively low level of sequence similarity, they do share a highly conserved structural fold. This conserved fold is known as the Rossmann fold and comprises a series of beta strand and alpha helical segments, in which the beta strands are hydrogen bonded to form a beta-sheet.
  • the cofactor binding pocket of the methyltransferase enzyme may be modified within the Rossmann fold, for example, by substitution of one or more amino acids, to improve the suitability of the enzyme for use in the disclosed method, such as, for example, by improving cofactor compatibility.
  • one or more amino acids within the Rossmann fold of the methyltransferase enzyme may be substituted, for example, to reduce or relieve potential steric interaction with the cofactor analogue.
  • the methyltransferase may be modified such that an amino acid having a relatively large side chain, such as, for example, glutamine or asparagine, is replaced by an amino acid having a shorter side chain, such as, for example, alanine.
  • the use of a methyltransferase enzyme that has been modified in this way may be particularly desirable when larger cofactor analogues are used, such as cofactor analogues comprising transferable groups with longer alkyl chains.
  • the methyltransferase may be any bacterial cytosine C5 methyltransferase enzyme comprising one or more, such as 2, 3, 4, 5, 6, or 7, amino acid substitutions in the Rossmann fold.
  • the methyltransferase may be any bacterial cytosine C5 methyltransferase enzyme comprising an amino acid substitution in the position of the amino acid residue of the Rossmann fold corresponding to the residue that is Gln82, Tyr254, and/or Asn304 in the wild type sequence of the M.HhaI methyltransferase (i.e. the sequence having the NCBI accession number P05102).
  • the methyltransferase may be any bacterial cytosine C5 methyltransferase enzyme comprising an alanine residue in the position of the amino acid residue of the Rossmann fold corresponding to the residue that is Gln82, Tyr254, and/or Asn304 in the wild type sequence of the M.HhaI methyltransferase (i.e. the sequence having the NCBI accession number P05102).
  • the methyltransferase may be an M.MpeI methyltransferase.
  • the M.MpeI methyltransferase enzyme has been found to be particularly advantageous for use in the disclosed method due to non-selectively targeting any and all CpG dinucleotides for modification.
  • the methyltransferase may be, or may comprise, a variant, and/or fragment of the wild type M.MpeI sequence, which is defined as the sequence having the NCBI accession number BAC44284.
  • the methyltransferase may be, or may comprise, a variant, and/or fragment of the wild type M.MpeI sequence, comprising at least 80% sequence identity, such as at least 85%, 90%, or 95% sequence identity to the wild type M.MpeI sequence.
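The sequence-identity thresholds above can be illustrated with a small calculation. The source does not state how identity is computed, so the sketch assumes two sequences that have already been aligned to equal length with '-' as the gap character, and counts identical residues over all alignment columns. Other conventions (for example, dividing by the shorter sequence length) give different values. The sequences shown are invented toy strings, not the M.MpeI sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if a == b:
            matches += 1  # a gap opposite a residue never matches
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


def meets_identity(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if identity is at least the threshold (80% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy pre-aligned sequences with one substitution in ten residues (hypothetical).
wild_type = "MKTAYIAKQR"
variant = "MKTAYIAKAR"

identity = percent_identity(wild_type, variant)
print(identity, meets_identity(wild_type, variant, 95.0))
```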
  • the methyltransferase may comprise one or more, such as 2, 3, 4, 5, 6 or 7 amino acid substitutions relative to the wild type M.MpeI sequence having the NCBI accession number BAC44284.
  • the use of the methyltransferase enzyme to apply the tag to unmodified nucleotides, such as unmodified CpG dinucleotides, may be carried out under conditions which enable the methyltransferase to transfer the tag from the methyltransferase cofactor analogue to the target DNA.
  • the reaction mixture may be incubated at a temperature of from 10 to 60°C, from 20 to 50°C, or from 30 to 40°C. Preferably, the reaction mixture may be incubated at a temperature of about 37°C.
  • the incubation may be performed for a time sufficient to enable transfer of the tag to all of the available unmodified nucleotides, such as unmodified CpG dinucleotides, in the fragmented DNA sample.
  • the incubation may be performed for a period of 5 minutes to 5 hours, 10 minutes to 4 hours, 15 minutes to 3 hours, 30 minutes to 2 hours, or 40 to
  • the incubation is performed for a period of about 1 hour.
  • the incubation may be performed in a suitable buffer at a pH that is selected based on the methyltransferase that is being used.
  • the pH may be between 7.5 and 8.5, such as between 7.8 and 8.2, or about pH 8.
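The incubation parameters described above can be captured in a small record that checks a proposed protocol against the broadest ranges stated in the text (10 to 60°C, 5 minutes to 5 hours, pH 7.5 to 8.5). The class and field names are hypothetical. The defaults reflect the stated preferred values of about 37°C, about 1 hour and about pH 8. The text also notes that the pH is selected for the particular methyltransferase, so the pH bounds here are only one stated option.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaggingConditions:
    """Hypothetical container for the tagging (derivatisation) incubation."""

    temperature_c: float = 37.0  # preferred: about 37 degrees C
    duration_min: float = 60.0   # preferred: about 1 hour
    ph: float = 8.0              # preferred: about pH 8

    def problems(self) -> List[str]:
        """List every parameter that falls outside the broadest stated range."""
        issues = []
        if not 10.0 <= self.temperature_c <= 60.0:
            issues.append("temperature outside 10-60 C")
        if not 5.0 <= self.duration_min <= 300.0:
            issues.append("duration outside 5 min to 5 h")
        if not 7.5 <= self.ph <= 8.5:
            issues.append("pH outside 7.5-8.5")
        return issues


default_issues = TaggingConditions().problems()
hot_issues = TaggingConditions(temperature_c=75.0).problems()
print(default_issues, hot_issues)
```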
  • the method comprises the inactivation of the methyltransferase enzyme.
  • methyltransferase enzymes have been found to bind tightly to DNA, thereby inhibiting downstream processing of the DNA. Methods comprising removal of the methyltransferase or purification of the sample have been found to reduce the efficiency of the process due to the additional time and reagents required and due to the loss of sample.
  • the methyltransferase enzyme may be inactivated in the reaction mixture. The inactivation of the methyltransferase in this way has surprisingly been found to provide significant processing efficiencies in the disclosed method.
  • the terms “inactivated” and “inactivation” refer to any alteration in the structure and/or function of the methyltransferase that prevents further activity of the methyltransferase on the target polynucleotide.
  • the terms “inactive” and “inactivated” refer to an enzyme that has less than 10%, such as less than 5%, less than 2%, or preferably less than 1% of its maximum activity.
  • Alterations in the structure and/or function of the methyltransferase that prevent further activity of the methyltransferase on the target polynucleotide may include, for example, denaturation, modification, inhibition, and/or fragmentation of the methyltransferase.
  • inactivation of the methyltransferase may comprise denaturation of the methyltransferase.
  • inactivation of the methyltransferase may comprise modification of the methyltransferase.
  • inactivation of the methyltransferase may comprise inhibition of the methyltransferase.
  • inactivation of the methyltransferase may comprise fragmentation of the methyltransferase.
  • the methyltransferase may be inactivated by any suitable method. Suitable methods include changing the environmental conditions of the methyltransferase, and targeted inactivation of the methyltransferase.
  • Changing the environmental conditions may consist of or comprise, for example, changing the temperature and/or pH of the reaction mixture.
  • inactivation of the methyltransferase may comprise incubation of the reaction mixture at a temperature of from 55 to 85°C, from 60 to 80°C, or from 65 to 75°C.
  • inactivation of the methyltransferase may comprise incubation of the reaction mixture at a temperature of from 55 to 65°C, such as at or about 60°C, or from 75 to 85°C, such as at or about 80°C.
  • Inactivation of the methyltransferase may comprise incubation at an elevated temperature for a period of 5 minutes to 1 hour, or 10-30 minutes. Preferably inactivation of the methyltransferase may comprise incubation at an elevated temperature for about 15 minutes.
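The definition of an inactivated enzyme (less than 10% of maximum activity) and the stated heat-inactivation window (55 to 85°C for 5 minutes to 1 hour) can be expressed as two simple checks. The function names are hypothetical, and how activity is actually measured is outside the scope of this sketch.

```python
def is_inactivated(
    residual_activity: float, max_activity: float, threshold: float = 0.10
) -> bool:
    """True if residual activity is below the given fraction of maximum activity."""
    if max_activity <= 0:
        raise ValueError("max_activity must be positive")
    return residual_activity / max_activity < threshold


def heat_step_in_range(temperature_c: float, minutes: float) -> bool:
    """True if a heat step lies within 55-85 C and 5-60 minutes."""
    return 55.0 <= temperature_c <= 85.0 and 5.0 <= minutes <= 60.0


# Toy values (hypothetical): 0.5 units remaining out of 100 units maximum.
print(is_inactivated(0.5, 100.0))      # residual activity is 0.5% of maximum
print(heat_step_in_range(65.0, 15.0))  # the preferred duration of about 15 minutes
```

Passing `threshold=0.01` applies the stricter, preferred criterion of less than 1% of maximum activity.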
  • Targeted inactivation of the methyltransferase may comprise the addition of an agent to alter the structure and/or function of the methyltransferase.
  • an agent may comprise, for example, a methyltransferase inhibitor.
  • Any suitable methyltransferase inhibitor may be used, including, for example, 5-azacitidine, decitabine, clofarabine, arsenic trioxide, guadecitabine, RX-3117, 5-fluoro-2’-deoxycytidine, 5,6-dihydro-5- azacytidine, cladribine, fludarabine, camrabine, procaine, EGCG, hydralazine, genistein, equol, curcumin, disulfiram, resveratrol, and/or caffeic acid.
  • the methyltransferase inhibitor may be an S-adenosyl-L-methionine (SAM) analogue, such as sinefungin or
  • the methyltransferase may be inactivated in the reaction mixture after an appropriate incubation to label the unmodified nucleotides, such as unmodified CpG dinucleotides, with a tag, thereby terminating the derivatisation reaction.
  • the presence of the inactivated methyltransferase has not been found to be detrimental to subsequent processing.
  • the inactivation of the methyltransferase enzyme in the reaction mixture this way, rather than by removal or dilution, has been found to provide increased efficiencies and significantly improved yields in subsequent steps of the process.
  • the inactivation of the methyltransferase may, therefore, provide significant advantages by removing the requirement for purification of the polynucleotide at this stage, and permitting the efficient combination of the methyltransferase and library preparation processes in a single reaction mixture. These efficiency advantages are shown in the Examples.
  • the polynucleotide sample may be modified into a form that is compatible for high throughput sequencing. This process may be referred to as “preparing a sequencing library” or “library preparation”.
  • a “sequencing library” refers to a plurality of polynucleotides, each comprising a sequencing adaptor, such as a sequencing adaptor arranged for use in next generation sequencing. Accordingly, “preparing a polynucleotide sample into a sequencing library”, as used herein, unless otherwise indicated, refers to the addition of one or more sequencing adaptors to the polynucleotides of the sample.
  • the process of preparing a polynucleotide sample into a sequencing library may comprise the ligation of one or more sequencing adapters to the polynucleotides.
  • the process of preparing a polynucleotide sample into a sequencing library may comprise end repair and/or A-tailing of the polynucleotides.
  • the process of preparing a sequencing library does not comprise combining the polynucleotides together to form an extended ligated polynucleotide for use, for example, in a sequencing method comprising nanopore technology.
  • the reaction mixture comprises a buffer mixture such as the labelling buffer, together with inactive methyltransferase, excess cofactor analogue, and the polynucleotide sample.
  • library preparation is typically conducted with purified DNA. It has surprisingly been found by the inventors, however, that the library preparation process may be performed directly in the reaction mixture following methyltransferase inactivation and that the efficiency of the library preparation process is not compromised by the use of a different buffer, or the presence of unpurified sample, such as DNA, and/or residual enzyme and cofactor components in the mixture. This finding provides a significant advantage over previous methods, offering significant efficiencies in terms of savings of time and reagents.
  • the preparation of a sequencing library is preferably performed in a one-pot approach.
  • the method comprises the inactivation of the methyltransferase followed by library preparation without any intervening steps or clean-up process, for example, comprising removal of inactivated enzymes, exchange of reaction buffer, or isolation or purification of the polynucleotide sample.
  • library preparation at this stage for example, prior to any enrichment process, and without the requirement for any washing steps or clean-up of the sample, surprisingly provides significant processing efficiencies, including significantly reducing any loss of polynucleotide sample.
  • the sample may be subjected to a library preparation process comprising end repair of the polynucleotide sample.
  • the end repair process may comprise removal of 3' overhangs, for example using a Klenow fragment-based enzyme.
  • the end repair process may also comprise modifying 3’ ends as necessary to comprise a hydroxyl group.
  • the end repair process may additionally or alternatively fill 5' overhangs, for example, using a T4 DNA polymerase.
  • the end repair process may also comprise phosphorylation of 5' ends where necessary, for example, using a T4 polynucleotide kinase (PNK).
  • the polynucleotide sample may be subjected to a library preparation process comprising A-tailing.
  • the A-tailing process may comprise the addition of an adenosine residue to the 3' ends of the polynucleotide sample. This process may reduce the possibility of the polynucleotides in the sample ligating to each other.
  • the A-tailing process may also increase the rate of adapter ligation, particularly in embodiments in which the adapters comprise a thymine overhang.
  • the A-tailing process may comprise the use of an “exo-
  • the polynucleotide sample may be subjected to end repair and A-tailing processes simultaneously.
  • an end repair and A-tailing buffer comprising end repair and A-tailing enzymes may be used.
  • the polynucleotide sample may be subjected to a library preparation process comprising one or more “adapter ligation” processes comprising the ligation of sequencing adapters to the polynucleotides in the sample.
  • the adapters may include a primer binding site for amplification of the sample.
  • the adapters may include a primer binding site for sequencing applications, such as next-generation sequencing (NGS) applications.
  • the adapters may include a binding site for capture probes, such as an oligonucleotide attached to a flow cell support. A plurality of adapters of the same or different sequences may be attached to the polynucleotides in the sample.
  • the ligated adapters may include a nucleic acid tag.
  • the nucleic acid tag may be positioned relative to an amplification primer and/or sequencing primer binding site, such that the tag sequence is included in subsequent amplicons and sequence reads.
  • a plurality of adapters having the same sequence apart from different nucleic acid tags may be attached to the polynucleotides in the sample.
  • the ligated adapters may include a barcode that can be introduced at one or both ends of the sample DNA molecule.
  • a barcode may be a type of nucleic acid tag.
  • individual "barcode" sequences may be added to the polynucleotides in the sample for use in next-generation sequencing (NGS) so that the sequencing read can be identified and sorted before the final data analysis.
  • the adapter ligation process may comprise the ligation of sequencing adapters to the polynucleotides in the sample.
  • the adapter ligation process may comprise the use of a
  • any sequencing adapters may be used.
  • Sequencing adapters that have been found to be particularly suitable for use in the disclosed method include, for example, any sequencing adapters suitable for use with high throughput sequencing methods, such as sequencing applications on the Illumina platform.
  • the method may comprise the use of a double-stranded indexing and unique dual indexing (UDI) adapter that enables efficient ligation and identification of PCR amplification replicates in the sequencing dataset.
  • Other adapters may also be used, such as hairpin adapters.
  • the adapter ligation processes may comprise the ligation of adaptors that do not comprise indexing barcodes, and such adaptors may be referred to herein as “stubby adapters”.
  • Barcodes may be applied to one or both ends of the polynucleotides as part of the library preparation process. In addition, or alternatively, barcodes may be added to one or both ends of the polynucleotides in a separate amplification step.
  • the method comprises the labelling of derivatised polynucleotides.
  • a methyltransferase is used to bind a tag to unmodified target nucleotides in the polynucleotide sample, and tags on the polynucleotides in the sample may be modified by the addition of an affinity label.
  • the affinity label may be referred to as an “affinity label” or “label” when bound to the tag and an “affinity label precursor” beforehand.
  • an affinity label may be added to the tag by the addition of the affinity label precursor to the reaction mixture.
  • the finding that an affinity label may be applied to the tag in this technically simple and efficient manner is advantageous in view of the fact that the reaction mixture comprises various components including inactive methyltransferase, excess cofactor analogue, and the reagents and enzymes required for library preparation.
  • the finding that the affinity label may be added to the tag in this way provides a significant advantage over previous methods, by avoiding the requirement for a washing step, thereby providing efficiency savings in terms of time and reagents and avoiding any loss of sample.
  • the method comprises the addition of an affinity label to the tag after library preparation, without any intervening steps or clean-up process, for example, comprising removal of peptides or enzymes, exchange of reaction buffer, or isolation or purification of the sample.
  • binding an affinity label to each tag may comprise adding an affinity label precursor directly into the sequencing library preparation mixture, without a washing step.
  • the affinity label may comprise biotin.
  • the affinity label precursor may be a compound of formula (II): R9-L4-R10, wherein:
  • R9 is a reactive moiety configured to react with a group in the tag and to thereby form a bond therebetween;
  • L4 is a linker; and
  • R10 comprises or consists of biotin.
  • R9 may be an optionally substituted 5 to 30 membered heterocyclyl, an optionally substituted 5 to 30 membered heteroaryl, an optionally substituted C6-30 aryl or an optionally substituted C3-30 cycloalkyl.
  • a multicyclic group may be understood to be a group comprising two or more fused rings. Accordingly, a multicyclic group may have 2 or 3 fused rings.
  • a “heterocyclyl”, “heterocyclic” or “heterocycle” group includes nonaromatic saturated or partially saturated mono and multicyclic groups.
  • a heterocyclic ring contains 1 or more heteroatoms in the ring, which may be independently selected from nitrogen, oxygen or sulfur.
  • a multicyclic group may be understood to be a multicyclic heterocyclyl group if it contains at least one heteroatom and at least one ring which is a non-aromatic saturated or partially saturated ring.
  • a “cycloalkyl” group includes non-aromatic saturated or partially saturated mono and multicyclic groups.
  • a multicyclic group may be understood to be a multicyclic cycloalkyl group if it only contains carbon atoms in the rings and it contains at least one ring which is a non-aromatic saturated or partially saturated ring.
  • a “heteroaryl” group includes aromatic mono and multicyclic groups.
  • a heteroaryl ring contains 1 or more heteroatoms in the ring, which may be independently selected from nitrogen, oxygen or sulfur.
  • a multicyclic group may be understood to be a multicyclic heteroaryl group if it contains at least one heteroatom and every ring is aromatic.
  • R9 contains a triple bond.
  • R9 is an optionally substituted 10 to 20 membered multicyclic heterocyclyl, an optionally substituted 10 to 20 membered multicyclic heteroaryl or an optionally substituted C10-20 multicyclic cycloalkyl.
  • R9 may be a 14 to 18 membered multicyclic heterocyclyl, an optionally substituted 14 to 18 membered multicyclic heteroaryl or an optionally substituted C13-18 multicyclic cycloalkyl.
  • X2 is N or CH.
  • L4 may comprise between 1 and 12 groups, each group selected from an optionally substituted hydrocarbon, an optionally substituted polyether chain, NH, O, S or S-S.
  • the hydrocarbon may be an optionally substituted alkylene, preferably an optionally substituted C1-10 alkylene and more preferably a C1-5 alkylene.
  • the alkylene may be substituted with an OH or oxo group.
  • the alkylene is substituted with an oxo group.
  • the optionally substituted polyether chain may be an optionally substituted polyethylene glycol chain.
  • the polyethylene glycol chain may comprise up to 15 monomers, up to 10 monomers or up to 5 monomers of ethylene glycol.
  • L4 may have the structure
  • L5 to L8 are each independently absent or an optionally substituted hydrocarbon, an optionally substituted polyether chain, an NH, O, S or S-S; and an asterisk indicates a point of bonding to R10.
  • L5 is an optionally substituted hydrocarbon. Accordingly, L5 may be COCH2CH2. In some embodiments, L6 is NH.
  • L7 is an optionally substituted hydrocarbon. Accordingly, L7 may be COCH2CH2.
  • L8 is an optionally substituted polyether chain.
  • the optionally substituted polyether chain may be an optionally substituted polyethylene glycol chain.
  • the polyethylene glycol chain may comprise up to 15 monomers, up to 10 monomers or up to 5 monomers of ethylene glycol. Accordingly, L8 may be (OCH2CH2)r, where r is an integer between 1 and 15, more preferably between 2 and 10 or between 3 and 5. In some embodiments, r is 4.
  • L4 may have no charge.
  • Negatively charged linkers have been found to react poorly with the tag.
  • L4 is not negatively charged.
  • R10 may have the following formula:
  • L4 and R10 are not sulfonated.
  • the affinity label precursor is not DBCO-SS-biotin.
  • the affinity label precursor is not NHS-SS-biotin.
  • a modified tagged cytosine residue, which comprises the affinity label, may be understood to have the following structure: wherein L4 and R10 are as defined above; and L10 is a linker.
  • L10 may be understood to be -CH2-L1-[L2]m-[HM]n-[L3]p-L11-, wherein L1, L2, L3, HM, m, n and p are as defined above and L11 is a linker formed due to a reaction between the R6 and R9 groups.
  • L11 may be [structure not reproduced], wherein an asterisk indicates a point of bonding to L4 and X2 is as defined above.
  • the present inventors have surprisingly found that in previous methods, such as that described by Kriukiene et al. (Nature Communications 2013, 4:2190), DNA fragments having a significant density of CpG sites, such as, for example, 5 or more CpG sites per 100 bp, may be underrepresented in the sequencing reads, thereby introducing bias to the results.
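  • As an illustration of the density figure quoted above, the following minimal sketch counts CpG dinucleotides in a fragment and expresses the count per 100 bp. The function names and the default threshold argument are hypothetical and chosen only for this example; the threshold of 5 simply mirrors the "5 or more CpG sites per 100 bp" example in the text and is not a definition taken from the disclosed method.

```python
def cpg_sites_per_100bp(fragment: str) -> float:
    """Return the number of CpG dinucleotides per 100 bp of a DNA fragment.

    The sequence is read 5' to 3' on the given strand. Because CpG is
    palindromic, counting on one strand is sufficient.
    """
    seq = fragment.upper()
    if not seq:
        raise ValueError("fragment must not be empty")
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return 100.0 * n_cpg / len(seq)


def is_cpg_dense(fragment: str, threshold: float = 5.0) -> bool:
    """Flag fragments at or above a chosen CpG density (hypothetical cut-off)."""
    return cpg_sites_per_100bp(fragment) >= threshold


if __name__ == "__main__":
    example = "ACGT" * 25  # 100 bp containing 25 CpG sites
    print(cpg_sites_per_100bp(example), is_cpg_dense(example))
```

  • A check of this kind could be used, for example, to see whether dense fragments are represented in the sequencing reads in proportion to their abundance in a reference.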
  • An advantage of the disclosed method is that if a purification process, such as DNA isolation, is performed at this point, it is the only clean-up step for the entire process, and this has been found to dramatically improve the efficiency and sensitivity of the process.
  • the method may comprise, after the affinity labelling step (step 4), and before the fractionation step (step 5), a step of purifying the polynucleotide.
  • Preferably the method involves no more than one step of purifying the polynucleotide.
  • DNA may be purified using a DNA purification kit.
  • the DNA may be washed, for example, using ethanol, such as 80% ethanol, or other DNA washing buffer. After washing, the DNA may be eluted, for example, using a suitable elution buffer, such as phosphate buffer.
  • the method comprises, after the labelling of derivatised polynucleotides (step 4), fractionating the derivatised polynucleotides of the sequencing library into first and second fractions.
  • the first fraction is enriched for polynucleotides comprising an affinity label
  • the second fraction is enriched for polynucleotides lacking an affinity label.
  • fractionation may comprise the use of the affinity label, such that polynucleotides comprising an affinity label are separated, using the affinity label, from polynucleotides lacking an affinity label.
  • the first fraction is enriched for polynucleotides comprising an affinity label, in the sense that substantially or entirely all of the polynucleotides in the first fraction comprise an affinity label.
  • the second fraction is enriched for polynucleotides lacking an affinity label, in the sense that substantially or entirely all of the polynucleotides in the second fraction lack an affinity label.
  • the fractionation may comprise selectively isolating the labelled polynucleotides using the affinity label.
  • fractionation of the polynucleotides may comprise binding of the affinity label to a capture probe that specifically binds to the affinity label.
  • fractionation of the polynucleotides may comprise selectively isolating the labelled polynucleotides using a biotin-binding protein.
  • the biotin-binding protein may comprise, for example, streptavidin, avidin, and/or a biotin-specific antibody.
  • the biotin-binding protein may comprise streptavidin, or a functional analogue or derivative of streptavidin.
  • the biotin-binding protein may comprise a separation medium or substrate.
  • the biotin-binding protein may be conjugated to a surface.
  • the surface may comprise a plurality of microbeads, such as paramagnetic microbeads.
  • the method may comprise binding the labelled polynucleotides to the biotin-binding protein on the coated microbeads and then isolating the coated microbeads. Isolation of the microbeads may be performed by centrifugation. In embodiments comprising the use of paramagnetic microbeads, isolation of the microbeads may comprise the application of a magnetic field to the reaction mixture to separate the beads from the remainder of the suspension.
  • the biotin-binding protein may comprise streptavidin conjugated to the surface of microbeads.
  • the streptavidin-coated microbeads may be streptavidin-coated paramagnetic microbeads.
  • After an appropriate incubation to bind the labelled polynucleotides to the capture probe, the probe may be washed to remove unbound and non-specifically bound polynucleotides.
  • where the affinity label comprises biotin, the biotin-binding protein may be washed by any suitable method to remove unbound and non-specifically bound polynucleotides.
  • the polynucleotides are separated from the capture probe.
  • In previous methods, DNA fragments are released from a streptavidin capture agent using oxidative cleavage of a disulfide bond within the affinity label.
  • this method has been found by the present inventors to be inconsistently reproducible and to have poor efficiency.
  • the polynucleotides are preferably not separated from the capture probe by a method comprising oxidative cleavage of the tag or affinity label.
  • the method does not comprise the separation of the polynucleotides from the capture probe by cleavage, such as oxidative cleavage or hydrolysis, of the tag or affinity label.
  • the method may comprise the denaturation of the capture probe.
  • where the capture probe comprises a biotin-binding protein, the method may comprise the denaturation of the biotin-binding protein. This method has been found to be particularly advantageous due to the consistent release of DNA fragments regardless of the number of the attached affinity labels.
  • the ability of streptavidin to bind to biotin is dependent on both a sterically defined binding pocket and the highly polar residues within it. Any agent that induces a conformational change of streptavidin may, therefore, be used to release the labelled polynucleotides.
  • the labelled polynucleotides may be released from the streptavidin by any method that denatures streptavidin without damaging the polynucleotides.
  • the labelled polynucleotides may be released from the streptavidin by incubation in pure water at a temperature of about 70°C.
  • the labelled polynucleotides may be released from the streptavidin by incubation in 12-15% (v/v) phenol at room temperature.
  • streptavidin may be denatured using a denaturing reagent, such as 1% sodium dodecyl sulphate, and heating the sample to 90°C.
  • the first fraction may be further enriched for polynucleotides comprising an affinity label by repeating the selective isolation (affinity purification) step in one or more further cycles.
  • target unmodified nucleotides such as unmodified CpG dinucleotides
  • the first fraction may be used with extremely low sample quantities to provide a highly accurate epigenetic profile of the polynucleotide sample.
  • the polynucleotides of the first and/or second fraction may be amplified. Amplification may be used to simultaneously introduce indexing barcodes to the polynucleotides.
  • Sequencing the polynucleotides of the first and/or second fraction may comprise pooling the first and second fractions and sequencing them together.
  • the first and second fraction may be distinguished using indexing barcodes.
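  • The pooling approach described above relies on each read carrying an index that identifies its fraction of origin. The following is a minimal sketch of that sorting step, assuming reads are already available as (index barcode, sequence) pairs and that indexes are matched exactly. The function name, the barcode sequences and the fraction names are hypothetical and chosen only for illustration; real demultiplexing tools typically also tolerate sequencing errors in the index.

```python
from collections import defaultdict


def split_by_index(reads, index_to_fraction):
    """Group pooled reads by the fraction their indexing barcode belongs to.

    reads: iterable of (index_barcode, sequence) tuples.
    index_to_fraction: dict mapping each known barcode to a fraction name.
    Reads with an unknown barcode are collected under "unassigned".
    """
    fractions = defaultdict(list)
    for index, sequence in reads:
        fraction = index_to_fraction.get(index, "unassigned")
        fractions[fraction].append(sequence)
    return dict(fractions)


if __name__ == "__main__":
    # Hypothetical barcodes for the labelled (first) and unlabelled (second) fractions.
    barcodes = {"ACGTAC": "first_fraction", "TGCATG": "second_fraction"}
    pooled = [
        ("ACGTAC", "TTCGGA"),
        ("TGCATG", "AATTGG"),
        ("ACGTAC", "CCGCGA"),
        ("NNNNNN", "GGGAAA"),
    ]
    for name, seqs in split_by_index(pooled, barcodes).items():
        print(name, seqs)
```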
  • the inventors have surprisingly found that standard DNA polymerases are able to amplify densely modified DNA comprising the disclosed affinity labels, prior to sequencing. Moreover, this has advantageously been found to be possible under the conditions employed for DNA release. This negates the need for DNA purification prior to amplification.
  • the polynucleotides may be amplified by PCR or qPCR, for example, using primers designed to anneal within the ligated sequencing adapters, and an appropriate PCR program.
  • the amplification method may be arranged to introduce indexing barcodes into the polynucleotides.
  • the method may comprise amplification before fractionation to introduce indexing barcodes into the polynucleotides.
  • the method may comprise amplification after fractionation to introduce indexing barcodes into the polynucleotides of the first and/or second fraction.
  • the amplified polynucleotides may be purified prior to sequencing by any suitable method for cleaning up PCR products for use in a sequencing platform.
  • the amplified polynucleotides may be used directly for sequencing without further purification.
  • the term “sequencing” as used herein, unless otherwise indicated, refers to any method that may be used to determine the sequence (i.e. the order of nucleotides) in a nucleic acid such as DNA or RNA.
  • Any type of sequencing platform may be used to determine the sequences of the polynucleotides, in combination with the appropriately ligated sequencing adapter.
  • sequencing approaches that may be suitable for use in the disclosed method include, but are not limited to, Sanger sequencing, high-throughput sequencing, pyrosequencing, sequencing-by-synthesis, single-molecule sequencing, nanopore-based sequencing, semiconductor sequencing, sequencing-by-ligation, sequencing-by-hybridization, RNA-Seq (Illumina), Digital Gene Expression (Helicos), next generation sequencing (NGS), Single Molecule Sequencing by Synthesis (SMSS) (Helicos), massively-parallel sequencing, Clonal Single Molecule Array (Solexa), shotgun sequencing, Ion Torrent, Oxford Nanopore, Roche Genia, Maxam-Gilbert sequencing, primer walking, and sequencing using Singular Genomics, Ultima Genomics, Element Biosciences, PacBio, SOLiD, Ion Torrent, or Nanopore platforms.
  • Sequencing reactions can be performed in a variety of sample processing units, which may include multiple lanes, multiple channels, multiple wells, or other means of processing multiple sample sets substantially simultaneously.
  • the method may comprise a high throughput sequencing method.
  • the high throughput sequencing method may be capable of generating hundreds of thousands of sequence reads in parallel.
  • the method may comprise a multiplex sequencing technique.
  • the high throughput sequencing methods that may be used include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization.
  • the sequencing method may be capable of sequencing single molecules.
  • the method comprises sequencing the polynucleotides using the Illumina platform.
  • the method may comprise comparing the sequencing reads to a reference sequence.
  • the method may comprise comparing the sequencing reads to a reference genome to determine the genomic location of the sequencing reads.
  • the reference sequence may be a human genome, or may comprise one or more portions thereof, such as one or more chromosomes and/or chromosomal regions.
  • the method may comprise aligning the sequencing reads to the reference sequence. Any suitable alignment method compatible with high throughput sequencing data may be used, for example, using the Burrows-Wheeler Alignment algorithm.
  • Aligned reads may be normalised. Any method of normalisation used in the art may be used.
  • normalising the sequencing reads may comprise reporting the number of reads in an aligned region (bin) as a fraction of reads per million reads of the sequencing output.
  • the method may comprise comparing the normalised read counts from two or more sequencing experiments. Such an approach may be advantageous in methods further comprising the step of diagnosing a disease based on the modification status of the nucleotide residues in the sample.
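  • The reads-per-million normalisation described above amounts to dividing the read count in each bin by the total number of reads in the sequencing output and multiplying by one million. The sketch below shows this arithmetic under the assumption that per-bin counts are already available as a dictionary. The function name, the bin labels and the option of passing the total explicitly are hypothetical choices made for this example.

```python
def reads_per_million(bin_counts, total_reads=None):
    """Scale raw per-bin read counts to reads per million (RPM).

    bin_counts: dict mapping a bin identifier to its raw read count.
    total_reads: total reads in the sequencing output; if omitted, the sum
    of the supplied bin counts is used instead (an assumption that only
    holds when every read falls in one of the listed bins).
    """
    if total_reads is None:
        total_reads = sum(bin_counts.values())
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    scale = 1_000_000 / total_reads
    return {bin_id: count * scale for bin_id, count in bin_counts.items()}


if __name__ == "__main__":
    # Hypothetical counts from a run with two million reads in total.
    counts = {"chr1:0-1000": 50, "chr1:1000-2000": 150}
    print(reads_per_million(counts, total_reads=2_000_000))
```

  • Because the scaled values no longer depend on sequencing depth, values for the same bin from two or more experiments can be compared directly.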
  • the method may further comprise comparing the sequences of the sequencing reads obtained by sequencing the polynucleotides of the first and/or second fraction to a reference sequence to determine the location of the sequencing reads within the reference sequence and thereby the presence or otherwise of unmodified nucleotide residues at specific locations within the reference sequence, thereby generating a profile of the modification status of nucleotide residues in the polynucleotide sample.
  • a method for determining the modification status of nucleotide residues in a polynucleotide sample comprising the steps of:
  • a methyltransferase enzyme configured to modify a nucleotide residue in a target position to apply a tag to each unmodified nucleotide residue in a polynucleotide of the sample, wherein each unmodified nucleotide residue is unmodified in the target position;
  • the method may be an in vitro method for diagnosing disease in a subject, the method comprising diagnosing the disease based on a profile obtained by the disclosed method, using a polynucleotide sample obtained from the subject.
  • a profile of a region of a polynucleotide sample comprising the modification status of nucleotide residues in the region of the polynucleotide sample, wherein the profile is obtained or obtainable by the disclosed method.
  • the region may consist of or comprise one or more specific portions of the polynucleotide sample.
  • Methods comprising making a determination based on the modification status of nucleotide residues in a polynucleotide may comprise the production of a profile.
  • the profile may reflect the position of modified and/or unmodified residues within a polynucleotide such as a portion of a genome or an entire genome.
  • methods comprising making a determination based on the CpG modification status of a plurality of CpG dinucleotides in a polynucleotide may comprise the production of a profile, wherein the profile may reflect the position of modified and/or unmodified CpG dinucleotides within a polynucleotide such as a portion of a genome or an entire genome.
  • comparing the modification status of nucleotide residues in a polynucleotide from the subject to the modification status of the corresponding residues in a reference sample may comprise comparing the profile obtained from the sample with the profile of a reference sample.
  • comparing the modification status of cytosine residues in CpG dinucleotides of a polynucleotide from the subject to the CpG modification status of the corresponding residues in a reference sample may comprise comparing the profile obtained from the sample with the profile of a reference sample.
  • the profile from the reference sample may comprise a profile that is representative of a healthy individual.
  • the profile from the reference sample may comprise a profile that is obtained from, or indicative of a particular disease, such as, for example, a cancer.
  • the method may comprise comparing the profile obtained from the sample with a database of profiles.
  • the database of profiles may comprise a plurality of profiles relating to a single disease, wherein the disease may be diagnosed in the subject from which the test sample was derived based on similarities between the profile of the test sample and the database of profiles.
  • the database of profiles may comprise a plurality of profiles representative of different diseases, wherein a disease may be diagnosed in the subject from which the test sample was derived based on similarities between the profile of the test sample and one or more of the profiles within the database.
  • a comparison between profiles may be made using any suitable method. For example, a comparison may be made statistically, using an appropriate metric, for example, a p-value. A comparison may also be made using a machine-learning platform.
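  • The text leaves the choice of comparison method open. Purely as an illustration, the sketch below scores the similarity between a test profile and each profile in a small database using the Pearson correlation coefficient, and reports the closest match. Pearson correlation is an assumption made for this example and is not stated in the source; the function names, profile names and values are likewise hypothetical, and profiles are assumed to be equal-length lists of normalised read counts over the same bins.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def most_similar_profile(test_profile, database):
    """Return the name of the database profile most correlated with the test."""
    return max(database, key=lambda name: pearson(test_profile, database[name]))


if __name__ == "__main__":
    # Hypothetical normalised read counts over four bins.
    reference_profiles = {
        "healthy": [2.0, 4.0, 6.0, 8.0],
        "disease": [4.0, 3.0, 2.0, 1.0],
    }
    print(most_similar_profile([1.0, 2.0, 3.0, 4.0], reference_profiles))
```

  • A similarity score of this kind is only a starting point; any diagnostic use would need an appropriately validated statistical or machine-learning method, as the text indicates.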
  • modification status may also be referred to as the “profile”.
  • the disclosed method may be used to determine a profile of methyltransferase catalysed modifications within a polynucleotide sample.
  • the method may comprise the detection of unmodified cytosine residues in CpG dinucleotides of a DNA sample.
  • the terms “CpG”, “CpG site” and “CpG dinucleotide” are used interchangeably herein to refer to a cytosine-phosphate-guanine sequence in a 5’ to 3’ direction in the backbone of a nucleic acid.
  • “CpG modification status” and “modification status”, as used interchangeably herein, unless otherwise stated, refer to the presence (modified) or absence
  • the method may be a method for preparing a profile of the modification status at the cytosine C5 position of CpG dinucleotides, wherein the method further comprises comparing the sequences of the sequencing reads of the first fraction to a reference sequence to determine the location of the sequencing reads within the reference sequence and thereby the presence or otherwise of unmodified cytosine residues in specific CpG dinucleotides within the reference sequence.
  • the “CpG modification status” and “modification status” may also be referred to as the “profile”.
  • the profile may be referred to as the “unmethylome profile” or “unmethylome”.
  • kits for determining the modification status of nucleotide residues in a polynucleotide sample comprising:
  • a methyltransferase enzyme configured to modify a nucleotide residue in a target position to apply a tag from a cofactor analogue to each unmodified nucleotide residue in a polynucleotide of the sample, wherein each unmodified nucleotide residue is unmodified in the target position;
  • an affinity label precursor suitable for binding an affinity label to the tags, wherein the affinity label comprises biotin;
  • a capture agent comprising a biotin-binding protein for fractionating the sequencing library into first and second fractions, wherein the first fraction is enriched for polynucleotides comprising an affinity label, and wherein the second fraction is enriched for polynucleotides lacking an affinity label.
  • the kit may comprise:
  • a methyltransferase enzyme configured to modify a nucleotide residue in a target position to apply a tag from a cofactor analogue to each unmodified nucleotide residue in the DNA sample, wherein each unmodified nucleotide residue is unmodified in the target position;
  • the kit may be a kit for determining the modification status of cytosine residues in CpG dinucleotides of a DNA sample, the kit comprising:
  • a methyltransferase enzyme configured to modify the cytosine C5 position of a CpG dinucleotide to apply a tag from a cofactor analogue to each unmodified cytosine residue in the DNA sample, wherein each unmodified cytosine residue is the cytosine of a CpG dinucleotide that is unmodified in the C5 position;
  • an affinity label precursor suitable for binding an affinity label to the tags, wherein the affinity label comprises biotin;
  • a capture agent comprising a biotin-binding protein for fractionating the sequencing library into first and second fractions, wherein the first fraction is enriched for DNA molecules comprising an affinity label, and wherein the second fraction is enriched for DNA molecules lacking an affinity label.
  • the kit may further comprise a cofactor analogue comprising the tag precursor.
  • the kit may further comprise sequencing adaptors for preparing the DNA into a sequencing library, as described in accordance with the first aspect.
  • the kit may further comprise enzymes and reagents for reverse transcription, end repair, A-tailing, and/or adapter ligation, as described in accordance with the first aspect.
  • the kit may or may not comprise a capture agent comprising a biotin-binding protein for fractionating the sequencing library into first and second fractions, wherein the first fraction is enriched for polynucleotides comprising an affinity label, and wherein the second fraction is enriched for polynucleotides lacking an affinity label.
  • the kit may further comprise a releasing agent for releasing the affinity label from the capture agent by denaturation of the biotin-binding protein.
  • the kit may comprise:
  • a methyltransferase enzyme configured to modify a nucleotide residue in a target position to apply a tag from a cofactor analogue to each unmodified nucleotide residue in a polynucleotide of the sample, and a cofactor analogue comprising the tag precursor, wherein each unmodified nucleotide residue is unmodified in the target position;
  • an affinity label precursor suitable for binding an affinity label to the tags, wherein the affinity label comprises biotin;
  • a capture agent comprising a biotin-binding protein for fractionating the sequencing library into first and second fractions, wherein the first fraction is enriched for polynucleotides comprising an affinity label, and wherein the second fraction is enriched for polynucleotides lacking an affinity label;
  • a releasing agent for releasing the affinity label from the capture agent by denaturation of the biotin-binding protein.
  • the kit may comprise:
  • a methyltransferase enzyme configured to modify a nucleotide residue in a target position to apply a tag from a cofactor analogue to each unmodified nucleotide residue in the DNA sample, and a cofactor analogue comprising the tag precursor, wherein each unmodified nucleotide residue is unmodified in the target position;
  • an affinity label precursor suitable for binding an affinity label comprising a biotin to the tags;
  • a capture agent comprising a biotin-binding protein for fractionating the sequencing library into first and second fractions, wherein the first fraction is enriched for DNA molecules comprising an affinity label, and wherein the second fraction is enriched for DNA molecules lacking an affinity label;
  • a releasing agent for releasing the affinity label from the capture agent by denaturation of the biotin-binding protein.
  • the kit may be a kit for determining the modification status of cytosine residues in CpG dinucleotides of a DNA sample, the kit comprising:
  • a methyltransferase enzyme configured to modify the cytosine C5 position of a CpG dinucleotide to apply a tag from a cofactor analogue to each unmodified cytosine residue in the DNA sample, and a cofactor analogue comprising the tag precursor, wherein each unmodified cytosine residue is the cytosine of a CpG dinucleotide that is unmodified in the C5 position;
  • an affinity label precursor suitable for binding an affinity label to the tags, wherein the affinity label comprises biotin;
  • a capture agent comprising a biotin-binding protein for fractionating the sequencing library into first and second fractions, wherein the first fraction is enriched for DNA molecules comprising an affinity label, and wherein the second fraction is enriched for DNA molecules lacking an affinity label.
  • the kit may be suitable for use in or as a one pot method as described in accordance with the first aspect.
  • the methyltransferase enzyme may be a methyltransferase as described in accordance with the first aspect.
  • the cofactor analogue may be a cofactor analogue as described in accordance with the first aspect.
  • the cofactor analogue may be a synthetic methyltransferase cofactor analogue.
  • the cofactor analogue may comprise a compound of formula (I).
  • the sequencing adaptors and enzymes for preparing the polynucleotide into a sequencing library may comprise enzymes and reagents for reverse transcription, end repair, A-tailing, and/or adapter ligation, as described in accordance with the first aspect.
  • the affinity label precursor may be an affinity label precursor as described in accordance with the first aspect.
  • the affinity label precursor may comprise a compound of formula (II).
  • the capture agent comprising a biotin-binding protein for fractionating the sequencing library may be as described in accordance with the first aspect.
  • the releasing agent for releasing the affinity label from the capture agent by denaturation of the biotin-binding protein may be as described herein. All features described herein (including any accompanying claims and drawings), and/or all of the steps of any method or process so disclosed, may be combined with any of the above aspects in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.
  • the invention will now be illustrated by reference to specific Examples showing how embodiments may be carried into effect, which are not intended to be limiting. Data from the Examples is presented in the Figures, in which:
  • Figure 1A is a flow chart showing the disclosed method.
  • FIG. 1B is a diagram providing an overview of an embodiment of the disclosed method.
  • Purified fragmented DNA (cfDNA or fragmented genomic DNA) is derivatised by treatment with a CpG-targeting methyltransferase (in this case M.MpeI) and a synthetic cofactor analogue (in this case ETA-AdoHcy-N3), which results in the addition of tags at unmodified CpG sites.
  • the methyltransferase is inactivated and the DNA fragments are then end-repaired and ligated to sequencing adapters.
  • Fragments comprising a tag are subsequently labelled by attaching an affinity label (for example, biotin) to the tag, and isolated (for example, using streptavidin-coated magnetic beads).
  • Tagged fragments (unmodified CpG sites) and untagged fragments (predominantly 5mCpG and 5hmCpG sites) can be fractionated to form the first (labelled) and second (unlabelled) fractions, and separately amplified and sequenced.
  • Figure 2A is a graph showing sequencing adapter ligation efficiencies using unlabelled DNA (control; left-hand cluster), methyltransferase-labelled DNA without methyltransferase inactivation (centre cluster), and methyltransferase-labelled DNA with methyltransferase inactivation (right-hand cluster), prior to adapter ligation.
  • FIG. 2B shows the raw data for this experiment.
  • Figure 3A is a bar chart showing efficiencies for capture (light bars) and capture/release (dark bars) of target DNA from solution, as a function of target CpG site density.
  • Figures 3B and 3C are reproduced from Kriukiene et al. (Nature Communications 2013, 4:2190).
  • Figure 3D shows the enrichment efficiency of the present method for a (target) DNA molecule with a high density of CpG sites (10 sites per ~150 bp).
  • the target DNA is mixed with 24 ng of non-target DNA and is selectively purified with high efficiency at a range of concentrations. Final DNA concentration was quantified using fluorometry (Qubit).
  • Figure 3E shows enrichment of unmethylated DNA as a function of CpG density using three different enrichment chemistries.
  • the current approach using a single pot reaction and one purification step (dark grey bars) shows a more than threefold improvement in the retention of DNA throughout the Tag-Seq enrichment process, compared to the approach described by Kriukiene et al. (light grey) and an approach using two DNA purification steps (grey). Density of CpG sites is shown as number of sites per 300 bp genomic window.
  • Figure 4 is a graph of threshold cycle versus target DNA concentration (ng), showing a linear response from 1.25 ng target DNA down to 1.25 pg target in a background of 24 ng (over 19,000X excess) of DNA containing no target sites.
  • Figure 5 is a bar chart showing a comparison of sequencing coverage at CpG sites in the enriched (unmodified CpG) fraction (light grey) of a DNA sample, and the unenriched (modified CpG) fraction (dark grey) of the sample. Note that 1.4% and 44.7% of reads did not contain a CpG site for the enriched and unenriched fractions, respectively.
  • Figures 6A and 6B are bar charts showing sequencing coverage of enriched (light grey) and unenriched (dark grey) fractions at unmodified CpG sites (Figure 6A) and methylated CpG sites (Figure 6B). Note the different y-axis scales in the two plots.
  • Figure 7A is a bar chart showing enrichment using the disclosed method (NRPM) across a range of unmodified CpG densities (75 bp window) (groups 1-4, light grey bars) compared to similar enrichment using the known MeDIP-Seq method (group 5, hashed bars), with baseline for whole-genome sequencing provided for context (group 6, white bars).
  • Figure 7B is a bar chart corresponding to Figure 7A showing the corresponding enrichment profiles for methylated CpG sites.
  • Figure 8A shows example profiles obtained using the disclosed method (dark blue) compared to MeDIP-Seq (green) and WGBS (yellow) profiles for (top) the KRAS gene and (bottom) a megabase-scale region of chromosome 1.
  • Figure 8B shows the enrichment of read counts for two technical replicates of the profile obtained using the disclosed method (blue), MeDIP-Seq (yellow) and shallow whole genome sequencing (red) at gene transcription start sites (TSS).
  • Figure 8C shows the profile obtained using the disclosed method (blue) correlates inversely with the MeDIP-Seq profile (purple) and with chromatin domain organisation identified in Hi-C experiments (red) on the megabase scale.
  • Figure 9A shows enrichment of genomic DNA at regions corresponding to H3K4 monomethylation. Comparison of two experiments using the disclosed method (technical repeats, different users) (light and dark blue), MeDIP Seq (green) and shallow whole genome sequencing (control, no enrichment) (orange).
  • Figure 9B shows enrichment of genomic DNA at regions corresponding to H3K4 trimethylation. Comparison of two experiments using the disclosed method (technical repeats, different users) (light and dark blue), MeDIP Seq (green) and shallow whole genome sequencing (control, no enrichment) (orange).
  • Figure 9C shows enrichment of genomic DNA at regions corresponding to H3K27 acetylation. Comparison of two experiments using the disclosed method (technical repeats, different users) (light and dark blue), MeDIP Seq (green) and shallow whole genome sequencing (control, no enrichment) (orange).
  • Figure 10 shows a Spearman correlation analysis to assess the similarity of the profiles obtained using the disclosed method across six different cell lines and for three technical repeats of each sample. Dark blue indicates a high degree of similarity of the profiles. Notably, DNA from each cell line has a clearly distinct profile when compared to the sample technical repeats, consistent with the known utility of methylation profiles for the identification of tissues.
  • Figure 11 shows volcano plots showing the comparison of tumour and normal adjacent tissue profiles obtained using the disclosed method, across the genome for six patients with a range of different cancers.
  • Differentially methylated windows are defined as those with an adjusted p-value of less than 0.05 (5%) and an absolute log2 fold change in signal of greater than 0.58 (1.5X).
  • Red lines indicate the locations of these thresholds in the volcano plots.
  • Blue markers are windows that show hypermethylation in cancer, red markers are for windows that are hypomethylated in cancer.
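The two thresholds described for Figure 11 can be illustrated with a minimal sketch. The function name, the labels, and the sign convention are assumptions made for illustration only: the sketch assumes the fold change is computed as tumour over normal on the unmodified-CpG signal, so that a gain in signal corresponds to hypomethylation in the tumour. The source does not specify how the statistics were computed.

```python
# Thresholds quoted in the text; constant names are illustrative.
PADJ_MAX = 0.05      # adjusted p-value cut-off (5%)
LOG2FC_MIN = 0.58    # approximately log2(1.5), i.e. a 1.5X change

def classify_window(log2_fc, padj):
    """Label one genomic window.

    Assumption: log2_fc is log2(tumour signal / normal signal), where the
    signal reports unmodified CpG sites. Under that assumption a gain in
    signal means less methylation in the tumour.
    """
    if padj >= PADJ_MAX or abs(log2_fc) <= LOG2FC_MIN:
        return "not_significant"
    return "hypomethylated" if log2_fc > 0 else "hypermethylated"

# Hypothetical windows (log2 fold change, adjusted p-value), not real data.
example_windows = [(1.2, 0.001), (-0.9, 0.01), (0.3, 0.0001), (2.0, 0.2)]
labels = [classify_window(fc, p) for fc, p in example_windows]
```

Both conditions must hold for a window to be called, which is why the last two hypothetical windows are labelled as not significant.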
  • Figures 12A and 12B show DNA Agilent TapeStation (Cell-free DNA ScreenTape) traces showing profiles for the cell free DNA (cfDNA) that was input for the disclosed profiling method (top) and the output from the enriched, amplified libraries (bottom) of the disclosed method, for a healthy patient sample (Figure 12A) and for a sample from a patient with Stage 1 non-small cell lung cancer (Figure 12B).
  • the mono- and dinucleosomal pattern of the input cfDNA is maintained in the final libraries, the size of which corresponds to the duplicated original strand plus the Illumina P5/P7 sequencing adaptors.
  • Figure 13 shows example profiles using the disclosed method for a genomic region (SHOX2 gene, a known methylation biomarker for lung cancer) in the healthy (blue) and lung cancer (grey) patients, compared to genomic DNA, extracted from the healthy patient’s buffy coat (yellow).
  • Black traces show the duplicate profiles in both cases, dark blue tick marks show the known CpG site density across the gene. Traces are based on normalised read counts for all profiles and the cfDNA samples are displayed on the same scale for direct comparison.
  • Figure 14 shows a summary of data derived from triplicate repeat experiments using DNA isolated from FFPE (formalin-fixed, paraffin-embedded) samples.
  • the disclosed method may be used to provide a uniquely straightforward and robust approach to epigenetic profiling that can be readily adapted for concurrent or simultaneous readout of genetic features of the genome of interest.
  • the method is an enzymatic technology that enriches for unmodified nucleotide residues such as, in particular, CpG sites, across the polynucleotide sample, such as across the whole genome.
  • the method requires no de novo knowledge of the polynucleotide sequence for its application in epigenetic profiling. It does not damage the nucleic acid sample nor rely on base conversion and therefore allows concurrent analysis of other genetic features, such as mutations.
  • the method may also provide a profile whose signal correlates with markers of active genomic regions.
  • the enzymatic chemistry enables unbiased fractionation of modified and unmodified nucleic acids from a sample for subsequent analysis.
  • the Examples below show that the disclosed method is a uniquely sensitive approach, relative to other available methods for epigenetic analysis.
  • the method can be applied at polynucleotide (e.g. DNA) concentrations that are compatible with single-cell analysis (picogram inputs).
  • the workflow can be adapted to enable concurrent readout of a genome’s genetic and epigenetic features.
  • A flow chart of the disclosed method is shown in Figure 1A and an overview of an embodiment of the disclosed method is shown schematically in Figure 1B.
  • a bacterial DNA methyltransferase enzyme (M.MpeI) is used to target unmodified CpG sites for modification with an unnatural cofactor analogue of S-adenosyl-L-methionine referred to herein as ETA-AdoHcy-N3.
  • tags can be further modified (for example, biotinylated) to enable fractionation of modified and unmodified DNA (where ‘unmodified’ refers to all genomic DNA fragments containing one or more CpG dinucleotide that is unmodified (for example, that is not methylated, hydroxymethylated, carboxylated, or acylated) at the C5-position).
  • the inventors have developed a modified library preparation that integrates this labelling step, thereby minimising handling and purification steps, and as a result, improving robustness and maximising sensitivity.
  • a significant advantage of the disclosed method is that it employs a single clean-up step for the entire process, dramatically improving the efficiency of the fractionation. This is made possible by the use of, firstly, a step to inactivate the methyltransferase and, secondly, the surprising activity of the enzymes for library preparation in the resulting buffer mixtures.
  • Figure 2 shows that the efficiency of adapter ligation is significantly inhibited in the absence of inactivation of the methyltransferase after labelling and before adapter ligation. This surprising result is due to the high binding affinity of the methyltransferase enzyme to the DNA molecule, which has been found to limit the activity of DNA-targeting enzymes in subsequent steps of the procedure. This activity can be recovered by inactivating the methyltransferase enzyme.
  • Figure 3 shows the results of example experiments investigating the recovery of DNA samples with different affinity labels and comprising different CpG densities.
  • a mixture containing 100 ng of DNA carrying a known number of CpG sites (0, 1, 2, 4 or 10) was incubated with M.MpeI (0.0274 µg/µL) and ETA-AdoHcy-N3 (100 µM).
  • the reaction was incubated at 37 °C for 1 hour.
  • the DNA was purified using AMPure beads (Beckman Coulter), followed by conjugation of an affinity label comprising biotin using click chemistry. Finally, DNA was purified using a standard PCR clean-up kit (Zymo Clean and Concentrate).
  • Purified DNA was fractionated using DynaBeads MyOne Streptavidin-coated beads.
  • the beads were then washed twice with 150 µL of PBST. Finally, captured DNA was released.
  • Figure 3A shows the release efficiency of DNA in the current workflow (Active-Seq).
  • the disclosed method provides a significant improvement in capture efficiency relative to the method described in Kriukiene et al. (Nature Communications 2013, 4:2190).
  • Figure 3B (reproduced from Figure 2b of Kriukiene et al.)
  • Kriukiene et al. report capture efficiencies in the 20-30% range using a method comprising an azide-DBCO label. This is significantly lower than the capture efficiencies that may be obtained using the method disclosed herein, as shown, for example, in Figure 3A.
  • the method described by Kriukiene et al. shows (in Figure 2c, reproduced herein as Figure 3C) capture of around 30-40% of target DNA containing 2 CG sites using an azide-DBCO affinity label and streptavidin-coated magnetic beads.
  • the method disclosed herein is able to isolate DNA at similar input levels to those of the method described by Kriukiene et al.
  • the captured DNA may be recovered from the capture agent in much more significant proportions, at least in part due to the efficient release of the labelled DNA molecules (see Figure 3D).
  • Kriukiene et al. only includes data on the level of DNA capture, and there is no discussion or data in Kriukiene et al., on the efficiency of release of the sample from the magnetic beads.
  • the present inventors have found that using the method disclosed by Kriukiene et al., the release of enriched DNA fragments is highly inefficient and inconsistently reproducible.
  • Figure 3E shows a plot of mean normalised read count per million reads (NRPM) for samples prepared using the method described in Kriukiene et al. (left-hand bars) or the method disclosed herein (right-hand bars), as a function of the number of CpG sites in a given read. The plot is generated for CpG-rich regions of the genome (CG islands).
  • Figure 3E clearly shows higher read densities across CpG rich regions, demonstrating the significantly improved enrichment of CpG-rich DNA using the disclosed method.
  • the overall effect of the method disclosed herein is to enable efficient enrichment of unmodified DNA from as little as a few picograms of input DNA. This is particularly critical for samples where the DNA concentration is limited, such as liquid biopsy (blood, urine, saliva, spinal fluid) samples.
  • target DNA 153 bp, containing 10 unmodified CpG sites
  • non-target DNA 142 bp containing no CpG sites
  • the target DNA was tagged and thereby enriched using streptavidin coated beads for analysis by qPCR, the results of which are shown in Figure 4.
  • Tagged DNA is compatible with PCR and can be amplified using a standard polymerase, following enrichment.
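The linear qPCR response described for Figure 4 is conventionally assessed by fitting threshold cycle (Ct) against the log10 of the input amount. The sketch below shows that fit and the standard efficiency formula, 10^(-1/slope) - 1. The Ct values are synthetic, generated from an assumed slope and intercept purely to exercise the code; they are not data from the Examples, and the intermediate dilution points and all function names are assumptions.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def amplification_efficiency(slope):
    """Standard qPCR efficiency from a Ct-versus-log10(amount) slope."""
    return 10 ** (-1.0 / slope) - 1.0

def fold_excess(background_ng, target_pg):
    """Mass ratio of background DNA (ng) to target DNA (pg)."""
    return background_ng * 1000.0 / target_pg

# Assumed tenfold dilution series spanning 1.25 ng to 1.25 pg of target.
target_ng = [1.25, 0.125, 0.0125, 0.00125]
# Synthetic Ct values from an assumed line; NOT measurements from the source.
ASSUMED_SLOPE, ASSUMED_INTERCEPT = -3.32, 22.0
ct_values = [ASSUMED_INTERCEPT + ASSUMED_SLOPE * math.log10(a) for a in target_ng]

slope, intercept = fit_line([math.log10(a) for a in target_ng], ct_values)
efficiency = amplification_efficiency(slope)
excess = fold_excess(24, 1.25)  # 24 ng background versus 1.25 pg target
```

A slope close to -3.32 corresponds to an efficiency close to 1 (a doubling per cycle), and the mass ratio computed in the last line is consistent with the "over 19,000X excess" stated for Figure 4.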
  • the initial step of enrichment shows only a very minor dependence on the number of CpG sites available on a DNA molecule.
  • Light grey bars show capture efficiencies and dark grey bars show capture/release of target DNA from solution.
  • Example 4 Having demonstrated the performance of the biochemical approach on simple DNA fragments, the utility of the platform disclosed herein on genomic DNA was investigated by generating genome-wide epigenetic profiles from DNA extracted from a range of cell lines. Extracted DNA was fragmented by sonication (~150 bp) and subjected to enrichment of the DNA fragments lacking CpG modification.
  • Example 5 Successful enrichment at CpG sites was assessed by comparison of the enriched (unmodified CpG) and unenriched (modified CpG) fractions of the genome by sequencing. This was done by examining the fraction of reads containing a CpG site and the sequencing coverage at each CpG site. In the enriched fraction, 98.6% of the reads contain a CpG site. By contrast, in the unenriched fraction only 55.3% of reads contain a CpG site, indicating effective enrichment at CpG sites. Furthermore, in the enriched fraction, a majority of the CpG sites of the genome (54.0%) are covered by greater than 5 reads, whereas in the unenriched fraction, this figure is just 6.3% of the CpG sites.
  • In enriched DNA, typically between 1 and 5% of reads were found to contain no CpG sites.
  • the source of these reads is likely varied but will include non-specifically enriched DNA, as well as reads that do not cover a motif but that originate from a molecule that does (specifically enriched but CpG not sequenced). This read fraction is denoted as the ‘background’ for the enriched sample.
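The two metrics used in Example 5 (the fraction of reads containing a CpG site, and the fraction of CpG sites covered by more than 5 reads) can be sketched as follows. The function names and the toy inputs are hypothetical; in practice the reads and per-site coverages would be derived from aligned sequencing data rather than from short strings and lists.

```python
def cpg_read_fraction(reads):
    """Fraction of reads whose sequence contains at least one CG dinucleotide."""
    if not reads:
        return 0.0
    with_cpg = sum(1 for read in reads if "CG" in read.upper())
    return with_cpg / len(reads)

def background_fraction(reads):
    """Fraction of reads with no CpG site (the 'background' of a sample)."""
    if not reads:
        return 0.0
    return 1.0 - cpg_read_fraction(reads)

def fraction_sites_covered(site_coverages, min_reads=5):
    """Fraction of CpG sites covered by more than `min_reads` reads."""
    if not site_coverages:
        return 0.0
    covered = sum(1 for depth in site_coverages if depth > min_reads)
    return covered / len(site_coverages)

# Toy inputs for illustration only.
toy_reads = ["ACGT", "AATT", "ccgg", "TTAA"]
toy_coverages = [0, 3, 6, 12]

read_fraction = cpg_read_fraction(toy_reads)
site_fraction = fraction_sites_covered(toy_coverages)
```

Because CG is its own reverse complement, a read contains a CpG site on one strand exactly when its reverse complement does, so the read-level count does not depend on strand orientation.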
  • an ‘unmodified’ (e.g. ‘unmethylated’) site is a site having a modification (e.g. methylation) level (β-value) of less than 0.05 by whole genome bisulfite sequencing.
  • a ‘modified’ (e.g. ‘methylated’) site has a modification (e.g. methylation) level (β-value) of greater than 0.95 by whole genome bisulfite sequencing.
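The two methylation-level thresholds above (0.05 and 0.95) define three classes of site. A minimal sketch of that classification follows, assuming the level is computed in the usual way for bisulfite data as methylated calls divided by total calls; the function names and the "intermediate" label are illustrative and do not appear in the source.

```python
def methylation_level(methylated_calls, unmethylated_calls):
    """Methylation level of a site as methylated / (methylated + unmethylated)."""
    total = methylated_calls + unmethylated_calls
    if total == 0:
        raise ValueError("site has no coverage")
    return methylated_calls / total

def classify_site(level, low=0.05, high=0.95):
    """Apply the thresholds quoted in the text to one CpG site."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("methylation level must lie between 0 and 1")
    if level < low:
        return "unmodified"
    if level > high:
        return "modified"
    return "intermediate"  # label assumed here; neither definition applies

# Hypothetical call counts for three sites.
site_classes = [
    classify_site(methylation_level(0, 40)),   # no methylated calls
    classify_site(methylation_level(39, 1)),   # almost fully methylated
    classify_site(methylation_level(20, 20)),  # mixed
]
```

Sites with intermediate levels fall under neither definition, which keeps the two reference sets used for the coverage comparison unambiguous.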
  • An advantage of the disclosed method is provided by the enzymatic targeting of unmodified CpG sites.
  • the approach is well-suited to the enrichment of hypomethylated DNA from tumour cells in the blood.
  • key genomic features that were hypothesised to be prominent in the profiles produced by the disclosed method are extended regions of unmodified DNA, which are epigenetically stable and conserved in mammals, with consistently low modification levels on length scales of 5-20 kbp.
  • the term ‘non-modified island’ (NMI) is used herein since such unmodified regions are rather more island-like in the profiles produced by the disclosed method. Read count peaks in profiles of unmodified DNA produced by the disclosed method
  • NMIs play a central role in regulation of gene expression and their methylation levels are regulated by the TET (demethylating) enzymes, via the polycomb protein complex.
  • Example 8 For further validation, the disclosed method was compared to established sequencing approaches that are known to correlate with DNA methylation levels, as shown in Figure 8. Clear regions of highly unmethylated DNA are apparent, and are also evident in the MeDIP-Seq and WGBS profiles (Figure 8A). Enrichment in the profile produced with the disclosed method at transcription start sites and markers of active chromatin (H3K4me1, H3K4me3 and H3K27ac) anticorrelates with loss of MeDIP-Seq signal at these regions (Figure 8B and Figure 9).
  • the approach was applied to nine DNA samples derived from cultured cell lines for a range of cancers (breast (MCF7, HCC1937), colorectal (HT29, SW48, COLO201, RKO), liver (HepG2) and lung (SW1271, NCI-
  • Genome-wide correlation analysis of this dataset shows excellent correlation of a series of three technical repeats for each of the samples.
  • Each of the cell lines examined forms a distinct cluster of correlated data, with cell lines from similar tissues broadly clustering together, consistent with the expectation that the epigenetic profile can be employed for the identification of tissue of origin for a sample.
  • These distinct cell-line- specific profiles result in part from the robustness of the method and the remarkable consistency of the disclosed method across sequencing runs and operators.
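The genome-wide correlation analysis compares binned read counts between pairs of samples using the Spearman rank correlation. The sketch below implements that statistic from first principles (average ranks for ties, then the Pearson correlation of the ranks) so that it has no dependencies; it is an illustration of the statistic, not the tooling used in the Examples, and the bin counts shown are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical read counts per genomic bin for two samples.
sample_a = [12, 0, 48, 7, 150]
sample_b = [10, 1, 60, 5, 140]
rho = spearman(sample_a, sample_b)
```

Because only ranks are compared, the statistic is insensitive to differences in overall sequencing depth between samples, which suits comparisons of technical repeats.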
  • DNA shed from tumour cells can be isolated in the blood of cancer patients.
  • the technical challenge associated with its analysis is two-fold: it is typically present in healthy and early-stage cancer patients at less than 10 ng per millilitre of plasma, and cell-free DNA isolated from plasma can contain less than 1% tumour fraction (ctDNA).
  • the disclosed method is ideally suited to the analysis of ctDNA because it is performant with input DNA orders of magnitude less than one nanogram and provides genome-wide analysis.
  • Formalin-fixed, paraffin-embedded (FFPE) treatment typically leads to extensive damage (depurination, depyrimidination and deamination) of the genome.
  • the ability to generate a meaningful epigenomic profile from DNA preserved in these samples using the disclosed method was investigated.
  • Genome-wide profiles were generated using the disclosed method for three FFPE embedded samples in triplicate, sourced from the Welsh Cancer Bank, derived from patients with colorectal cancer. Consistent with other sample types, sequencing reached 90% saturation by 80M (150 bp, paired-end) reads for all samples. The resultant datasets show good overall coverage of the genome and excellent consistency for the technical repeats (Figure 14). For two of the three samples, high levels of relative enrichment of CpG dense regions of the genome were observed (Figure 14). However, comparison of the three FFPE datasets to similar data for the HT-29 cell line shows good consistency of the profile and the observed ‘background’ of the sequencing dataset (reads lacking CpG sites) is consistently below 5% for all samples.
  • For DNA samples requiring fragmentation, DNA was sheared to an average of 180 bp.
  • a mixture of DNA (⁇ 10 ng), M.MpeI and ETA-AdoHcy-N3 was prepared on ice. This solution was incubated at 37 °C for 1 hour. Following incubation, the methyltransferase enzyme was inactivated by heating.
  • the sample was cooled to 10 °C. End Repair & A-Tailing Master Mix was added (Kapa Biosystems). The mixture was mixed thoroughly by pipette aspiration and incubated at 20 °C for 30 mins followed by a 65 °C incubation for a further 30 mins.
  • the sample was cooled to 10 °C, and sequencing adapters were ligated.
  • biotin-PEG4-DBCO (Jena Biosciences) was added and the mixture was incubated at 37 °C for 1 hour with shaking at 500 rpm.
  • the DNA was subsequently purified from the reaction mixture.
  • 5 µL Dynabeads MyOne Streptavidin C1 beads (ThermoFisher) were washed with 150 µL of PBST.
  • the DNA was added to the beads and the mixture was further incubated at 23 °C for 15 minutes, with shaking at 1000 rpm. Once completed, the supernatant was removed (as the “second fraction”), and the beads were washed twice with 150 µL of PBST. Finally, the bound DNA was released from the beads (as the “first fraction”) by denaturation of streptavidin.
  • Amplified libraries were pooled together with 0.1% PhiX and sequenced on a S4 Flow Cell using an Illumina NovaSeq Sequencer (Source Biosciences).
  • qPCR was performed on the Azure Cielo 6 thermocycler (Azure Biosystems) with the following conditions: initial denaturation at 98 °C for 30 seconds, then 40 cycles at 95 °C for 10 seconds and 60 °C for 60 seconds with fluorescence detection. Analysis of the acquired fluorescence intensity and subsequent quantification of DNA in the samples was performed using Azure Cielo Manager Analysis Software (V1.0.4). After sequencing, adaptors were removed from the reads using BBTools, and the reads were then aligned to the human reference genome HG38 using BWA-MEM2.
  • Ambiguously aligned reads and those with low mapping scores were removed using SamTools. Duplicates were removed with Sambamba (PMID: 25697820) and reads hard-clipped using jvarkit (https://github.com/lindenb/jvarkit). Spearman correlation plots were generated from the processed BAM files with deepTools using a bin size of 1000 bp and RPGC normalisation. Saturation figures and CpG density plots were generated for Chr1-22 using the QSEA and Repitools R packages. To allow direct comparison of enriched and unenriched samples, BAM files were down-sampled to the same sequencing depth using SamTools.
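Two small calculations underlie the processing steps above: the fraction of reads to keep when down-sampling each file to a common depth, and the scale factor implied by RPGC (reads per genomic content, i.e. 1x coverage) normalisation. The sketch below shows both under stated assumptions: the sample names, read counts, fragment length and effective genome size are hypothetical placeholders, and the RPGC expression follows the general definition of 1x normalisation rather than any specific command line used by the authors.

```python
def downsample_fractions(read_counts):
    """Fraction of reads to keep per sample so all match the shallowest one.

    `read_counts` maps a sample name to its number of mapped reads.
    """
    target = min(read_counts.values())
    return {name: target / count for name, count in read_counts.items()}

def rpgc_scale_factor(mapped_reads, fragment_length, effective_genome_size):
    """Scale factor giving an average of 1x coverage across the genome.

    Average coverage is mapped_reads * fragment_length / genome size, so
    multiplying per-bin coverage by the inverse normalises it to 1x.
    """
    return effective_genome_size / (mapped_reads * fragment_length)

# Hypothetical depths for one enriched and one unenriched library.
depths = {"enriched": 80_000_000, "unenriched": 40_000_000}
fractions = downsample_fractions(depths)

# Hypothetical values chosen for round numbers, not taken from the source.
scale = rpgc_scale_factor(
    mapped_reads=1_000_000,
    fragment_length=150,
    effective_genome_size=3_000_000_000,
)
```

Down-sampling to the shallowest library, rather than scaling counts afterwards, keeps the sampling noise comparable between the enriched and unenriched fractions.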

Abstract

The present invention relates to a method for determining the modification status of nucleotide residues in a polynucleotide sample. The method involves the use of a methyltransferase to apply a tag to unmodified nucleotide residues in the sample.
PCT/GB2024/052883 2023-11-14 2024-11-13 Profiling method for determining epigenetic modifications Pending WO2025104430A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB2317422.0A GB2639522A (en) 2023-11-14 2023-11-14 Profiling Method
GB2317422.0 2023-11-14
GB2400723.9 2024-01-19
GB2400723.9A GB2637191A (en) 2023-11-14 2024-01-19 Profiling Method

Publications (1)

Publication Number Publication Date
WO2025104430A1 true WO2025104430A1 (fr) 2025-05-22

Family

ID=93648538

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/GB2024/052884 Pending WO2025104431A1 (fr) 2023-11-14 2024-11-13 Profiling method for determining epigenetic modifications
PCT/GB2024/052883 Pending WO2025104430A1 (fr) 2023-11-14 2024-11-13 Profiling method for determining epigenetic modifications

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/GB2024/052884 Pending WO2025104431A1 (fr) 2023-11-14 2024-11-13 Profiling method for determining epigenetic modifications

Country Status (1)

Country Link
WO (2) WO2025104431A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8008007B2 (en) 2005-04-14 2011-08-30 Rwth Aachen S-adenosyl-L-methionine analogs with extended activated groups for transfer by methyltransferases
EP2594651A1 (fr) * 2011-11-17 2013-05-22 Vilnius University Analysis of methylation sites
US20170283453A1 (en) 2014-08-29 2017-10-05 Katholieke Universiteit Leuven Cofactor analogues for methyltransferases
WO2021053346A1 (fr) * 2019-09-20 2021-03-25 The University Of Birmingham Epigenetic profiling method
WO2023047141A1 (fr) * 2021-09-27 2023-03-30 The University Of Birmingham S-adenosyl-L-methionine and Se-adenosyl-L-methionine analogues with activated groups for transfer by methyltransferases onto target biomolecules

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR1310894A (fr) 1961-12-26 1962-11-30 Joint body for filling the joints of construction works, in particular massive construction works such as concrete roadways, airfield runways, basin linings, or the like
JP2025502811A (ja) * 2021-12-29 2025-01-28 Foundation Medicine, Inc. Detection of genetic and epigenetic information in a single workflow

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8008007B2 (en) 2005-04-14 2011-08-30 Rwth Aachen S-adenosyl-L-methionine analogs with extended activated groups for transfer by methyltransferases
EP2594651A1 (fr) * 2011-11-17 2013-05-22 Vilnius University Analysis of methylation sites
US20170283453A1 (en) 2014-08-29 2017-10-05 Katholieke Universiteit Leuven Cofactor analogues for methyltransferases
EP3186266B1 (fr) 2014-08-29 2019-12-11 Katholieke Universiteit Leuven S-adenosyl-L-cysteine analogues as cofactors for methyltransferases
WO2021053346A1 (fr) * 2019-09-20 2021-03-25 The University Of Birmingham Epigenetic profiling method
WO2023047141A1 (fr) * 2021-09-27 2023-03-30 The University Of Birmingham S-adenosyl-L-methionine and Se-adenosyl-L-methionine analogues with activated groups for transfer by methyltransferases onto target biomolecules

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"NCBI", Database accession no. BAC44284
KRIUKIENE ET AL., NATURE COMMUNICATIONS, vol. 4, 2013, pages 2190
KRIUKIENE, COMMUNICATIONS, vol. 4, 2013, pages 2190
TOSTI LUCA ET AL: "Epigenomic profiling of active regulatory elements by enrichment of unmodified CpG dinucleotides", BIORXIV, 16 February 2024 (2024-02-16), XP093240667, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/2024.02.16.575381v1.full.pdf> DOI: 10.1101/2024.02.16.575381 *

Also Published As

Publication number Publication date
WO2025104431A1 (fr) 2025-05-22

Similar Documents

Publication Publication Date Title
JP7256748B2 (ja) Methods for targeted nucleic acid sequence enrichment with applications to error-corrected nucleic acid sequencing
US12188084B2 (en) Method for highly sensitive DNA methylation analysis
JP7514263B2 (ja) Method for attaching adapters to sample nucleic acids
CN103233072B (zh) High-throughput genome-wide DNA methylation detection technique
CA2815076C (fr) Varietal counting of nucleic acids for obtaining genomic copy number information
US11339435B2 (en) Methods for copy number determination
US20200063213A1 (en) Methods of Amplifying DNA to Maintain Methylation Status
AU2016297510A1 (en) Methods of amplifying nucleic acid sequences
CN112041459A (zh) Nucleic acid amplification method
US20190309352A1 (en) Multimodal assay for detecting nucleic acid aberrations
US20240076720A1 (en) Methods for analyzing nucleic acids
CN116445593A (zh) Method for determining a methylation profile of a biological sample
EP4172357B1 (fr) Methods and compositions for nucleic acid analysis
Tost Current and emerging technologies for the analysis of the genome-wide and locus-specific DNA methylation patterns
WO2011063210A2 (fr) Methods for mapping genomic methylation patterns
CN114787385A (zh) Methods and systems for detecting nucleic acid modifications
JP2024529674A (ja) Methods for simultaneous mutation detection and methylation analysis
EP4632077A1 (fr) Single-cell multi-body full-length sequencing analysis method using a multi-combination assembly reaction of DNA fragments
CN115348896A (zh) Nucleic acid molecules comprising cleavable or excisable moieties
CN103374759A (zh) Method for detecting signature SNPs of lung cancer metastasis and use thereof
WO2025104430A1 (fr) Profiling method for determining epigenetic modifications
Estécio et al. Tackling the methylome: recent methodological advances in genome-wide methylation profiling
CN112714796A (zh) Method for amplifying bisulfite-treated DNA
GB2639522A (en) Profiling Method

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24812542

Country of ref document: EP

Kind code of ref document: A1