
WO2024249466A1 - False positive reduction by translesion polymerase repair - Google Patents


Info

Publication number
WO2024249466A1
Authority
WO
WIPO (PCT)
Prior art keywords
altered
cytosine
subfamily
stranded dna
seq
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/031370
Other languages
French (fr)
Inventor
Kayla BUSBY
Gaetano SPECIALE
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Illumina Inc
Original Assignee
Illumina Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Illumina Inc
Publication of WO2024249466A1
Current legal status: Pending


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6827: Hybridisation assays for detection of mutation or polymorphism

Definitions

  • Embodiments of the present disclosure relate to the prevention of false positive detection of 5-methylcytosine (5mC) and/or 5-hydroxymethylcytosine (5hmC) due to the deamination of unmethylated cytosines in assays using cytosine deaminases to selectively deaminate methylated cytosines.
  • Embodiments of the methods, compositions, and kits provided herein utilize a dual enzymatic process to reduce the likelihood that such false positive conversions are detected in the final sequenced library.
  • UDG: uracil DNA glycosylase
  • The DNA is incubated with a high-fidelity polymerase supplemented with a deoxycytidyl transferase, such as Rev1, to repair the lesion with the installation of a cytidine.
  • Modified DNA cytosines including 5-methylcytosine (5mC)
  • 5mC: 5-methylcytosine
  • 5mC is a well-studied epigenetic modification that plays fundamental roles in human development and disease. Its genome-wide distribution differs between tissue types, and between healthy and diseased states.
  • 5mC has also gained prominence as a tool for clinical diagnostics. For example, its distribution in cell-free DNA (cfDNA) obtained from a liquid biopsy can be used for the tissue-specific prediction of early-stage cancer.
  • cfDNA: cell-free DNA
  • APOBEC3A is a cytidine deaminase that recognizes single-stranded DNA and catalyzes the deamination of cytosine (C) to uracil (U), 5-methylcytosine (5mC) to thymine (T), and 5-hydroxymethylcytosine (5hmC) to 5-hydroxymethyluracil (5hmU).
  • C: cytosine
  • U: uracil
  • T: thymine
  • Protein engineering of APOBEC3A has resulted in mutant APOBEC proteins that are selective for deamination of 5mC, with reduced activity towards deamination of C; however, residual activity for deamination of C remains. This undesirable deamination of unmethylated cytosines results in false positive detection of 5mC (and 5hmC), because the resulting uracil bases are read as thymine bases in the assay.
  • The present disclosure provides a method of reducing false positive detection of 5-methylcytosine (5mC) and/or 5-hydroxymethylcytosine (5hmC) due to deamination of unmethylated cytosines, the method comprising:
  • The deamination of unmethylated cytosines occurs in an assay using a cytosine deaminase to selectively deaminate methylated cytosines.
  • The sample comprising single-stranded DNA library fragments is contacted with a deaminase to selectively deaminate methylated cytosines.
  • The disclosure provides a method of replicating uracil residues as cytosine residues, the method comprising:
  • The deoxycytidyl transferase comprises the Rev1 enzyme.
  • The high-fidelity polymerase comprises T4 DNA polymerase or E. coli polymerase.
  • Treating the double-stranded DNA library fragments to digest the first strand comprising abasic sites comprises treating the double-stranded DNA library fragments with heat and/or NaOH.
  • The single-stranded DNA library fragments are about 100 bp to about 200 bp in length.
  • The method further comprises subjecting the complementary second strands, in which uracil residues are replicated as cytosine residues, to polymerase chain reaction (PCR) amplification.
  • PCR: polymerase chain reaction
  • The method further comprises sequencing the complementary second strands in which uracil residues are replicated as cytosine residues.
  • The method further comprises processing the complementary second strands in which uracil residues are replicated as cytosine residues to produce a sequencing library.
  • The method further comprises sequencing the sequencing library.
  • The cytosine deaminase comprises an altered cytosine deaminase.
  • The altered cytosine deaminase is a member of the AID subfamily, the APOBEC1 subfamily, the APOBEC2 subfamily, the APOBEC3A subfamily, the APOBEC3B subfamily, the APOBEC3C subfamily, the APOBEC3D subfamily, the APOBEC3F subfamily, the APOBEC3G subfamily, the APOBEC3H subfamily, or the APOBEC4 subfamily, or an alteration thereof.
  • The altered cytosine deaminase comprises an altered APOBEC3A.
  • The altered cytidine deaminase comprises an amino acid substitution mutation at a position functionally equivalent to (Tyr/Phe)130 in a wild-type APOBEC3A protein and/or an amino acid substitution mutation at a position functionally equivalent to Tyr132 in a wild-type APOBEC3A protein.
  • The altered cytidine deaminase comprises amino acid substitution mutations at positions functionally equivalent to (Tyr/Phe)130 and Tyr132 in a wild-type APOBEC3A protein.
  • The altered cytidine deaminase comprises an amino acid substitution mutation at a position functionally equivalent to (Tyr/Phe)130 in a wild-type APOBEC3A protein, wherein the substitution mutation is (Tyr/Phe)130Trp.
  • The (Tyr/Phe)130 is Tyr130.
  • The wild-type APOBEC3A protein is SEQ ID NO: 12.
  • The substitution mutation at the position functionally equivalent to Tyr130 comprises a mutation to alanine, glycine, phenylalanine, histidine, glutamine, methionine, asparagine, lysine, valine, aspartic acid, glutamic acid, serine, cysteine, proline, arginine, or threonine.
  • The substitution mutation at the position functionally equivalent to Tyr130 comprises a mutation to Ala, Val, or Trp.
  • The substitution mutation at the position functionally equivalent to Tyr132 comprises a mutation to His, Arg, Gln, or Lys.
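The substitution options above can be checked mechanically. The Python sketch below is a hypothetical illustration, not part of the disclosure: it applies substitutions written in one-letter shorthand (for example `Y130W` for Tyr130Trp, an assumed input format) to a protein sequence and rejects replacements outside the narrower Ala/Val/Trp (position 130) and His/Arg/Gln/Lys (position 132) lists. It assumes the input is already numbered like wild-type APOBEC3A, so mapping a "functionally equivalent" position in another family member (for example by alignment) happens beforehand. The function name and toy sequence are illustrative only.

```python
import re

# Narrower substitution options listed above (one-letter amino acid codes):
#   Tyr130 -> Ala (A), Val (V), or Trp (W)
#   Tyr132 -> His (H), Arg (R), Gln (Q), or Lys (K)
ALLOWED_SUBSTITUTIONS = {130: set("AVW"), 132: set("HRQK")}

# Hypothetical shorthand: "Y130W" means Tyr at position 130 replaced by Trp.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, substitutions: list[str]) -> str:
    """Apply substitutions to a protein sequence using 1-based positions.

    Assumes `sequence` already uses wild-type APOBEC3A numbering, i.e. any
    'functionally equivalent' position has been mapped beforehand.
    """
    residues = list(sequence)
    for sub in substitutions:
        match = SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognised substitution: {sub!r}")
        original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{sub}: position {position} is outside the sequence")
        if residues[position - 1] != original:
            raise ValueError(f"{sub}: sequence has {residues[position - 1]} at {position}")
        allowed = ALLOWED_SUBSTITUTIONS.get(position)
        if allowed is not None and replacement not in allowed:
            raise ValueError(f"{sub}: {replacement} is not a listed option at {position}")
        residues[position - 1] = replacement
    return "".join(residues)


# Toy sequence (not SEQ ID NO: 12) with Tyr at positions 130 and 132.
toy = "A" * 129 + "YNY" + "A" * 10
variant = apply_substitutions(toy, ["Y130W", "Y132H"])
print(variant[129], variant[131])  # W H
```

Because the sketch uses the narrower lists, a request such as `Y130G` raises `ValueError`; glycine does appear in the broader Tyr130 list above, so that set could be swapped in if the broader embodiment is the one of interest.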
  • The altered cytidine deaminase converts 5-methylcytosine (5mC) to thymidine (T) by deamination at a greater rate than it converts cytosine (C) to uracil (U) by deamination. In some aspects, the rate is at least 100-fold greater.
  • The altered cytidine deaminase converts cytosine (C) to uracil (U) and 5-methylcytosine (5mC) to thymidine (T) by deamination at a greater rate than it converts 5-hydroxymethylcytosine (5hmC) to 5-hydroxymethyluracil (5hmU) by deamination. In some aspects, conversion of 5hmC to 5hmU by deamination is undetectable.
  • The altered cytidine deaminase comprises a ZDD motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C (SEQ ID NO: 1).
  • The altered cytidine deaminase is a member of the APOBEC3A subfamily and comprises a ZDD motif HXEX24SW(S/T)PCX[2-4]CX6FX8LX5R(L/I)YX[8-11]LX2LX[10]M (SEQ ID NO: 2), wherein the amino acid substitution mutation at the position functionally equivalent to (Tyr/Phe)130 of the wild-type APOBEC3A protein is the Tyr (Y) amino acid of the ZDD motif.
  • The altered cytidine deaminase is a member of the APOBEC3A subfamily and comprises X[16-26]-GRXXTXLCYXV-X15-GXXXN-X12-HAEXXF-X14-YXXTWXXSWSPC-X[2-4]-CA-X5-FL-X7-LXIXXXR(L/I)Y-X8-GLXXLXXXG-X5-M-X4-FXXCWXXFV-X6-FXPW-X13-LXXI-X[2-6] (SEQ ID NO: 3).
  • The altered cytidine deaminase is a member of the APOBEC3A family and comprises SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, or SEQ ID NO: 11.
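The ZDD motif notation used in these bullets maps directly onto a regular expression: a bracketed list such as [P/A/V] is one residue from that set, and X[m-n] is m to n residues of any amino acid. The Python sketch below hand-translates SEQ ID NO: 1 into a pattern and reports match positions; the function name and toy sequence are hypothetical, and a pattern hit is only a sequence-level screen, not evidence that a protein folds into an active deaminase.

```python
import re

# Hand translation of the ZDD motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C (SEQ ID NO: 1):
#   [P/A/V] -> exactly one of P, A, or V
#   X[m-n]  -> between m and n residues of any amino acid (any uppercase letter here)
ZDD_MOTIF = re.compile(r"H[PAV]E[A-Z]{23,28}PC[A-Z]{2,4}C")


def find_zdd_motifs(protein: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) spans of non-overlapping ZDD motif matches."""
    return [(m.start() + 1, m.end()) for m in ZDD_MOTIF.finditer(protein.upper())]


# Toy sequence (not a disclosed SEQ ID): HAE, 24 filler residues, PC, 2 residues, C.
toy = "MM" + "HAE" + "G" * 24 + "PC" + "GG" + "C" + "KK"
print(find_zdd_motifs(toy))  # [(3, 34)]
```

`re.finditer` reports non-overlapping matches only, which is adequate for proteins carrying one or two well-separated ZDD copies; the longer APOBEC3A-specific pattern (SEQ ID NO: 2) could be translated the same way.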
  • The present disclosure provides a kit comprising: a cytosine deaminase; a uracil DNA glycosylase (UDG); a high-fidelity polymerase; and/or a deoxycytidyl transferase.
  • The kit further comprises dNTPs and a primer complementary to the 3' end library adapter, capable of binding to single-stranded DNA library fragments comprising 5' end and 3' end library adapters.
  • The kit further comprises an unnatural dCTP derivative.
  • The cytosine deaminase is an altered APOBEC.
  • The term “nucleic acid” is intended to be consistent with its use in the art and includes naturally occurring nucleic acids or functional analogs thereof. Particularly useful functional analogs are capable of hybridizing to a nucleic acid in a sequence-specific fashion or capable of being used as a template for replication of a particular nucleotide sequence.
  • Naturally occurring nucleic acids generally have a backbone containing phosphodiester bonds. An analog structure can have an alternate backbone linkage including any of a variety of those known in the art.
  • Naturally occurring nucleic acids generally have a deoxyribose sugar (for example, found in deoxyribonucleic acid (DNA)) or a ribose sugar (for example, found in ribonucleic acid (RNA)).
  • A nucleic acid can contain any of a variety of analogs of these sugar moieties that are known in the art.
  • A nucleic acid can include native or non-native bases.
  • A native deoxyribonucleic acid can have one or more bases selected from the group consisting of adenine, thymine, cytosine, or guanine, and a ribonucleic acid can have one or more bases selected from the group consisting of uracil, adenine, cytosine, or guanine.
  • Useful non-native bases that can be included in a nucleic acid are known in the art.
  • The terms “template” and “target,” when used in reference to a nucleic acid, are intended as semantic identifiers for the nucleic acid in the context of a method or composition set forth herein and do not necessarily limit the structure or function of the nucleic acid beyond what is otherwise explicitly indicated.
  • The term “target nucleic acid” is intended as a semantic identifier for the nucleic acid in the context of a method, composition, or kit set forth herein and does not necessarily limit the structure or function of the nucleic acid beyond what is otherwise explicitly indicated.
  • Reference to a nucleic acid such as a target nucleic acid includes both single-stranded and double-stranded nucleic acids, and both DNA and RNA, unless indicated otherwise.
  • The terms “polynucleotide” and “oligonucleotide” are used interchangeably herein to refer to a polymeric form of nucleotides of any length, and may include ribonucleotides, deoxyribonucleotides, analogs thereof, or mixtures thereof.
  • The terms should be understood to include, as equivalents, analogs of either DNA, RNA, cDNA, or antibody-oligo conjugates made from nucleotide analogs, and to be applicable to single-stranded (such as sense or antisense) and double-stranded polynucleotides.
  • The term as used herein also encompasses cDNA, that is, complementary or copy DNA produced from an RNA template, for example by the action of reverse transcriptase.
  • The term “primer” and its derivatives refer generally to any polynucleotide that can hybridize to a target sequence of interest.
  • The primer functions as a substrate onto which nucleotides can be polymerized by a polymerase; in some embodiments, however, the primer can become incorporated into the synthesized nucleic acid strand and provide a site to which another primer can hybridize to prime synthesis of a new strand that is complementary to the synthesized nucleic acid molecule.
  • The primer can include any combination of nucleotides or analogs thereof.
  • The primer is a single-stranded oligonucleotide or polynucleotide.
  • The terms “polynucleotide” and “oligonucleotide” are used interchangeably herein to refer to a polymeric form of nucleotides of any length, and may comprise ribonucleotides, deoxyribonucleotides, analogs thereof, or mixtures thereof.
  • The terms should be understood to include, as equivalents, analogs of either DNA or RNA made from nucleotide analogs, and to be applicable to single-stranded (such as sense or antisense) and double-stranded polynucleotides.
  • The term as used herein also encompasses cDNA, that is, complementary or copy DNA produced from an RNA template, for example by the action of reverse transcriptase. This term refers only to the primary structure of the molecule. Thus, the term includes triple-, double-, and single-stranded deoxyribonucleic acid (“DNA”), as well as triple-, double-, and single-stranded ribonucleic acid (“RNA”).
  • Sensitivity is equal to the number of true positives divided by the sum of true positives and false negatives.
  • “Providing,” in the context of a protein, sample of DNA or RNA, or composition, means making, purchasing, or otherwise obtaining the protein, sample of DNA or RNA, or composition.
  • The term “isolated” refers to material removed from its original environment (e.g., the natural environment if it is naturally occurring), and thus altered “by the hand of man” from its natural state.
  • The term “each,” when used in reference to a collection of items, is intended to identify an individual item in the collection but does not necessarily refer to every item in the collection unless the context clearly dictates otherwise.
  • The terms “a,” “an,” “the,” and “at least one” are used interchangeably and mean one or more than one.
  • Conditions that are “suitable” for an event to occur or “suitable” conditions are conditions that do not prevent such events from occurring. Thus, these conditions permit, enhance, facilitate, and/or are conducive to the event.
  • In any method disclosed herein that includes discrete steps, the steps may be conducted in any feasible order. As appropriate, any combination of two or more steps may be conducted simultaneously.
  • Various aspects of the disclosure can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range.
  • A range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6, etc., as well as individual numbers within that range, for example, 1, 2, 2.7, 3, 4, 4.5, 5, 5.3, and 6. This applies regardless of the breadth of the range.
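The sensitivity definition given among these terms reduces to a one-line ratio, TP / (TP + FN). The Python sketch below is a minimal illustration; the function name and the counts in the example are hypothetical, not data from the disclosure.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Sensitivity = TP / (TP + FN), per the definition above."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("sensitivity is undefined when TP + FN is zero")
    return true_positives / total


# Hypothetical counts: 45 methylated sites called correctly, 5 missed.
print(sensitivity(45, 5))  # 0.9
```

False positives, the error class this disclosure targets, do not enter this ratio; lowering them leaves sensitivity unchanged as long as true methylation calls are not lost.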
  • FIG. 1 is a schematic of a dual enzymatic process for false positive repair.
  • FIGS. 2A and 2B illustrate the interpretation of sequencing data from TraPR-treated libraries.
  • FIG. 2A presents how DNA sequences are modified through the workflow, expected sequencing results, and comparison to reference sequences.
  • FIG. 2A shows SEQ ID NOs: 13-20. After sequencing, false positive repair events are discriminated from true positive methylation signals by comparison of the sequencing data to the reference sequences shown in FIG. 2B.
  • FIG. 3 shows the structure of the Rev1 binary complex and the Rev1 ternary complex (Weaver et al., 2020, PNAS 117(41):25494-25504).
  • FIGS. 4A-4B show the amino acid sequences of various altered cytosine deaminases.
  • FIG. 4A shows the amino acid sequences of altered cytosine deaminases with SEQ ID NO:5 and SEQ ID NO:6.
  • FIG. 4B shows the amino acid sequences of altered cytosine deaminases with SEQ ID NO: 9, SEQ ID NO: 10, and SEQ ID NO: 11.
  • The problem of false positive conversions of cytosines to uracils in cytosine deaminase-based methylation detection assays is solved using a dual enzymatic process.
  • The DNA is then incubated with a high-fidelity polymerase supplemented with a deoxycytidyl transferase, such as Rev1, to repair the abasic lesion with the installation of a cytidine.
  • A schematic illustrating this translesion polymerase repair (TraPR) method is shown in FIG. 1. Briefly, a preparation of DNA fragments from an input sample that has been treated with a cytidine deaminase to deaminate 5-methylcytosine (5mC) and/or 5-hydroxymethylcytosine (5hmC) residues, and that may include one or more off-target conversions of a cytosine to a uracil, is first treated with a uracil DNA glycosylase (UDG). UDG catalyzes the hydrolysis of the N-glycosidic bond of deoxyuridine to release uracil, resulting in an abasic site.
  • Complementary, corrected DNA fragments are then generated using a high-fidelity polymerase supplemented with a deoxycytidyl transferase, such as Rev1.
  • Rev1: deoxycytidyl transferase
  • A sample including single-stranded DNA (ssDNA) fragments is a preparation of library fragments.
  • The library fragments may include 5' and/or 3' adapter sequences.
  • Single-stranded library preparation methods are well known in the art and include ligation-based approaches (Troll et al., 2019, BMC Genomics 20(1):1-14; Raine et al., 2017, Nucleic Acids Research 45(6):e36) and commercial kits (xGen Methyl-Sequencing DNA Library Prep Kit and Adaptase, Integrated DNA Technologies).
  • The preparation of library fragments is in solution.
  • The preparation of library fragments is on a surface, including, but not limited to, the surface of beads.
  • Bead-based preparation may involve the use of bead-linked transposomes (BLT), in which transposomes are conjugated directly to beads to bind DNA fragments (Bruinsma et al., 2018, BMC Genomics 19:722), using, for example, Illumina’s NEXTERA™ technologies.
  • BLT: bead-linked transposomes
  • The preparation of corrected DNA fragments is then sequenced.
  • 5m-dC sites will be recognized by the expected conversion to dT, while false positives arising from dC-to-dU conversion can be identified by a defined signature: a dC-to-dG conversion.
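The dual enzymatic process and the dC-to-dG signature described above can be traced on a toy sequence. The Python sketch below is a simplified, hypothetical model rather than the disclosed protocol. It assumes a deaminase that converts every 5mC to T plus one off-target C-to-U event, uses the made-up symbols `M` (5mC) and `_` (abasic site), writes the second strand position by position instead of antiparallel, and ignores original-strand degradation, PCR, and sequencing errors. With UDG and Rev1, the off-target site reads as G against a reference C; without repair, it reads as T and would be miscalled as 5mC.

```python
# Symbols in this toy model (hypothetical conventions, not from the disclosure):
#   "M" = 5-methylcytosine in the input strand, "U" = uracil, "_" = abasic (AP) site.
ABASIC = "_"
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}
CALL = {"C": "unmethylated C", "T": "5mC", "G": "repaired false positive"}


def deaminate(strand: str, off_target: set[int]) -> str:
    """Selective deaminase: every 5mC -> T; unmethylated C -> U only at off-target positions."""
    return "".join(
        "T" if base == "M" else "U" if base == "C" and i in off_target else base
        for i, base in enumerate(strand)
    )


def udg(strand: str) -> str:
    """UDG removes each uracil base, leaving an abasic site."""
    return strand.replace("U", ABASIC)


def second_strand(template: str) -> str:
    """Copy the template; Rev1 places dC opposite abasic sites.

    Written position by position (antiparallel orientation ignored for readability).
    """
    return "".join("C" if base == ABASIC else PAIR[base] for base in template)


def read_back(strand: str) -> str:
    """Express the second strand in the orientation of the original strand."""
    return "".join(PAIR[base] for base in strand)


def call_sites(reference: str, observed: str) -> dict[int, str]:
    """Classify each reference C by the base observed at that position."""
    return {
        i: CALL.get(obs, "other")
        for i, (ref, obs) in enumerate(zip(reference, observed))
        if ref == "C"
    }


original = "AMGTCACG"  # 5mC at position 1; unmethylated C at positions 4 and 6
reference = original.replace("M", "C")
deaminated = deaminate(original, off_target={6})  # off-target C -> U at position 6

with_trapr = read_back(second_strand(udg(deaminated)))
without_repair = read_back(second_strand(deaminated))  # U templates A, so it reads as T

print(call_sites(reference, with_trapr))
# {1: '5mC', 4: 'unmethylated C', 6: 'repaired false positive'}
print(call_sites(reference, without_repair))
# {1: '5mC', 4: 'unmethylated C', 6: '5mC'}
```

The design choice worth noting is that Rev1 inserts a base (C) that a normal polymerase would never place opposite a reference C, so the repaired site becomes distinguishable from both outcomes of genuine deamination chemistry.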
  • The preparation of DNA fragments may be treated to degrade or digest the original DNA strands.
  • The original DNA strands contain abasic sites, which can serve as targets for this digestion or degradation.
  • Because abasic sites are labile, treatment with, for example, heat or a base such as NaOH will result in the cleavage and degradation of original DNA strands that contain an abasic site.
  • Removal of original DNA strands containing abasic sites may be accomplished by an enzymatic process.
  • The 5-hydroxymethylcytosine (5hmC) binding, ESC-specific (HMCES) protein, or another SRAP (SOS-response-associated peptidase) domain protein such as Escherichia coli YedK, may be used to trap and remove ssDNA fragments with abasic sites.
  • HMCES and YedK preferentially bind ssDNA and efficiently form DNA-protein crosslinks (DPCs) to AP sites in ssDNA.
  • DPCs: DNA-protein crosslinks
  • HMCES bound to a surface, such as a bead, may be used to trap and remove ssDNA fragments containing abasic sites.
  • The covalent linkage of HMCES to ssDNA fragments containing abasic sites may serve to hinder or stop PCR replication of fragments containing abasic sites.
  • A preparation of corrected DNA fragments may be subjected to PCR amplification prior to sequencing.
  • A standard high-fidelity PCR polymerase may be used.
  • A preparation of corrected DNA fragments is not subjected to amplification prior to sequencing.
  • A sample including single-stranded DNA (ssDNA) fragments is contacted with a cytosine deaminase to deaminate methylated cytosines.
  • A sample including single-stranded DNA (ssDNA) fragments is a preparation of denatured library fragments.
  • A “cytidine deaminase enzyme” refers to an enzyme that deaminates cytosine and/or one or more cytosine derivatives. The deamination occurs at the amino group at the C4 position of the cytosine or cytosine derivative.
  • A cytidine deaminase enzyme may deaminate cytosine to form U, may deaminate mC to form T, and/or may deaminate hydroxymethylcytosine (hmC) to form hmU.
  • A nonlimiting example of a cytidine deaminase enzyme that may deaminate cytosine to form U, mC to form T, and/or hmC to form hmU is apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC).
  • APOBECs include APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3E, APOBEC3F, APOBEC3G, APOBEC3H, and APOBEC4.
  • The term “methylcytosine” (mC) refers to cytosine that includes a methyl group (-CH3 or -Me).
  • The methyl group may be located at the 5 position of the cytosine, in which case the mC may be referred to as 5mC.
  • A cytidine deaminase may include, but is not limited to, any known member of the APOBEC protein family.
  • The APOBEC protein family is a member of the large cytidine deaminase superfamily that contains a canonical zinc-dependent deaminase (ZDD) signature motif embedded within a core cytidine deaminase fold. This fold includes a five-stranded mixed beta (β) sheet surrounded by six alpha (α) helices in the order α1-β1-β2-α2-β3-α3-β4-α4-β5-α5-α6 (Salter et al., 2016, Trends Biochem Sci 41(7):578-594).
  • ZDD: zinc-dependent deaminase
  • Each cytidine deaminase domain core structure of APOBEC proteins contains a highly conserved spatial arrangement of the catalytic center residues of a zinc-binding motif (referred to as the ZDD motif) (Salter et al., 2016, Trends Biochem Sci 41(7):578-594).
  • The APOBEC protein family includes subfamilies AID (activation-induced cytidine deaminase), APOBEC1, APOBEC2, APOBEC3 (including 3A, 3B, 3C, 3D, 3F, 3G, 3H), and APOBEC4.
  • A cytidine deaminase may be a member of the APOBEC protein family from a vertebrate, such as a mammal. Examples of mammals include, but are not limited to, rodents, primates, rabbit, bovine (e.g., cow), porcine (e.g., pig), and equine (e.g., horse).
  • Examples of primates include humans and chimpanzees.
  • Some members of the APOBEC protein family include one copy of the ZDD motif.
  • Other members of the APOBEC protein family, e.g., the APOBEC3B subfamily, the APOBEC3D subfamily, the APOBEC3F subfamily, and the APOBEC3G subfamily, include two copies of the ZDD motif, but often only the C-terminal copy is active (Salter et al., 2016, Trends Biochem Sci 41(7):578-594).
  • The skilled person can easily identify members of each of the subfamilies by using a publicly available database such as the Protein database available at the National Center for Biotechnology Information (ncbi.nlm.nih.gov/protein).
  • The cytidine deaminase may be an altered cytidine deaminase, recombinantly engineered to include a substitution mutation at one or more residues when compared to a reference cytidine deaminase.
  • An altered cytidine deaminase can be based on a member of the AID subfamily, the APOBEC1 subfamily, the APOBEC2 subfamily, the APOBEC3 subfamily (e.g., the 3A subfamily, the 3B subfamily, the 3C subfamily, the 3D subfamily, the 3F subfamily, the 3G subfamily, or the 3H subfamily), or the APOBEC4 subfamily.
  • An altered cytidine deaminase may be one of three types of altered cytidine deaminases.
  • One type of altered cytidine deaminase preferentially deaminates 5mC instead of C (i.e., converts 5mC to T at a greater rate than it converts C to U) and is referred to herein as having “cytosine-defective deaminase activity.”
  • A second type of altered cytidine deaminase preferentially deaminates C instead of 5mC (i.e., converts C to U at a greater rate than it converts 5mC to T) and is referred to herein as having “5mC-defective deaminase activity.”
  • A third type of altered cytidine deaminase preferentially deaminates C and 5mC to U and T, respectively, and has significantly reduced deamination of 5hmC, 5fC, and 5caC.
  • The third type is referred to herein as having “5hmC-defective deaminase activity.”
  • Reference to an altered cytidine deaminase includes altered cytidine deaminases having cytosine-defective deaminase activity, altered cytidine deaminases having 5mC-defective deaminase activity, and altered cytidine deaminases having 5hmC-defective deaminase activity.
  • Altered cytidine deaminases include apolipoprotein B mRNA editing enzymes, catalytic polypeptide-like (APOBEC) and activation induced cytidine deaminase (AID). Wild-type APOBEC and AID cytidine deaminases have the activity of deaminating cytidine (C) of DNA and/or RNA to form uridine (U). An altered cytidine deaminase of the present disclosure has an altered rate of deamination of C, 5mC, and/or 5hmC when compared to the wild-type enzyme.
  • APOBEC: apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like
  • AID: activation-induced cytidine deaminase
  • A cytidine deaminase of the present disclosure can be referred to herein as an “altered cytidine deaminase,” “recombinant cytidine deaminase,” “mutant cytosine deaminase,” or “modified cytidine deaminase,” and refers to any of the altered cytosine deaminases described herein that comprise one or more changes from the reference (i.e., wild-type) amino acid sequence that provide the unexpected property of an altered deamination profile, e.g., an altered ability to preferentially deaminate one form of cytosine over another.
  • Whether a protein has cytidine deaminase activity may be determined by in vitro assays. One example of an in vitro assay is based on digestion with the restriction enzyme 5iral. A protein that can deaminate 5mC to thymidine has cytidine deaminase activity.
  • An altered cytidine deaminase that preferentially deaminates 5mC instead of C can have a catalytic efficiency that is at least 10-fold, at least 50-fold, or at least 100-fold higher on 5mC than C substrates.
  • An altered cytidine deaminase that preferentially deaminates 5mC instead of C can have a catalytic efficiency that is no greater than 1500-fold higher on 5mC than on C substrates.
  • An altered cytidine deaminase that preferentially deaminates C instead of 5mC can have a catalytic efficiency that is at least 10-fold, at least 50-fold, or at least 100-fold higher on C than on 5mC substrates.
  • An altered cytidine deaminase that preferentially deaminates C instead of 5mC can have a catalytic efficiency that is no greater than 1500-fold higher on C than on 5mC substrates.
  • The deamination of 5hmC by an altered cytidine deaminase disclosed herein is reduced by at least 80%, at least 90%, or at least 99% compared to the wild-type cytidine deaminase.
  • The deamination of 5hmC by an altered cytidine deaminase disclosed herein is undetectable using an assay such as the restriction enzyme digestion-based assay.
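The fold-preference and percent-reduction figures in these bullets are simple ratios. The Python sketch below assumes "catalytic efficiency" means kcat/KM (its standard meaning; the bullets do not say how it is measured) and compares hypothetical measurements against the listed 10/50/100-fold floors, the 1500-fold ceiling, and the 80/90/99% reduction levels. All numbers and function names are illustrative, not data from the disclosure.

```python
def fold_preference(efficiency_preferred: float, efficiency_other: float) -> float:
    """Ratio of catalytic efficiencies (kcat/KM) on the preferred vs. the other substrate."""
    return efficiency_preferred / efficiency_other


def percent_reduction(altered_rate: float, wild_type_rate: float) -> float:
    """Percent reduction of 5hmC deamination relative to the wild-type enzyme."""
    return 100.0 * (1.0 - altered_rate / wild_type_rate)


def meets_thresholds(value: float, thresholds: tuple[float, ...]) -> list[float]:
    """Return the listed thresholds that `value` meets or exceeds."""
    return [t for t in thresholds if value >= t]


# Hypothetical measurements (not data from the disclosure):
fold = fold_preference(2.0e4, 1.6e2)     # kcat/KM on 5mC vs. on C
reduction = percent_reduction(1.0, 8.0)  # relative 5hmC deamination, altered vs. wild type

print(fold, meets_thresholds(fold, (10, 50, 100)), fold <= 1500)  # 125.0 [10, 50, 100] True
print(reduction, meets_thresholds(reduction, (80, 90, 99)))        # 87.5 [80]
```

In this made-up case the enzyme would satisfy all three fold-preference floors and the ceiling, but only the 80% level for reduced 5hmC deamination.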
  • An altered cytidine deaminase of the present disclosure is based on a member of the APOBEC protein family.
  • An altered cytidine deaminase of the present disclosure that is "based on" a member of the APOBEC protein family means the altered cytidine deaminase is an APOBEC protein that includes one or more of the substitution mutations described herein as compared to a reference APOBEC sequence.
  • An altered cytidine deaminase of the present disclosure that is "based on" a member of the APOBEC protein family can also include conservative and/or nonconservative mutations as described herein.
  • The APOBEC protein family includes subfamilies AID, APOBEC1, APOBEC2, APOBEC3 (including 3A, 3B, 3C, 3D, 3F, 3G, 3H), and APOBEC4.
  • An altered cytidine deaminase of the present disclosure can be based on a member of the AID subfamily, the APOBEC1 subfamily, the APOBEC2 subfamily, the APOBEC3 subfamily (e.g., the 3A subfamily, the 3B subfamily, the 3C subfamily, the 3D subfamily, the 3F subfamily, the 3G subfamily, or the 3H subfamily), or the APOBEC4 subfamily.
  • An altered cytidine deaminase of the present disclosure can be based on a member of the APOBEC protein family from a vertebrate, such as a mammal.
  • Mammals include, but are not limited to, rodents, primates, rabbit, bovine (e.g., cow), porcine (e.g., pig), and equine (e.g., horse).
  • Examples of primates include humans and chimpanzees.
  • The APOBEC protein family is a member of the large cytidine deaminase superfamily that contains a canonical zinc-dependent deaminase (ZDD) signature motif embedded within a core cytidine deaminase fold.
  • This fold includes a five-stranded mixed beta (β) sheet surrounded by six alpha (α) helices in the order α1-β1-β2-α2-β3-α3-β4-α4-β5-α5-α6 (Salter et al., 2016, Trends Biochem Sci 41(7):578-594).
  • Each cytidine deaminase domain core structure of APOBEC proteins contains a highly conserved spatial arrangement of the catalytic center residues of a zinc-binding motif H-[P/A/V]-E-X
  • Some members of the APOBEC protein family include one copy of the ZDD motif.
  • Other members of the APOBEC protein family, e.g., the APOBEC3B subfamily, the APOBEC3D subfamily, the APOBEC3F subfamily, and the APOBEC3G subfamily, include two copies of the ZDD motif (Salter et al., 2016, Trends Biochem Sci 41(7):578-594, doi:10.1016/j.tibs.2016.05.001).
  • an altered cytidine deaminase disclosed herein includes one or two ZDD motifs.
  • an altered cytidine deaminase based on a member of the APOBEC3A subfamily includes the following ZDD motif: HXEX24SW(S/T)PCX[2-4]CX6FX8LX5R(L/I) ⁇ X[8-ii]LX 2 LX[io]M (SEQ ID NO:2) (where X is any amino acid, and the subscript number or range of numbers after X refers to the number of amino acids) (Salter et al., 2016, Trends Biochem Sci', 41(7):578-594).
  • an altered cytidine deaminase disclosed herein is a member of the following subfamilies, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, and APOBEC3G, and can include one or more highly conserved sites that are part of the active site and within the ZDD motif SEQ ID NO: 1.
  • the sites include tryptophan at position 98 and serine or threonine at position 99 (Kouno et al., 2017, Nat. Comm, 8: 15024).
  • a member of the APOBEC protein family also includes other highly conserved residues that are part of the active site but not present as part of the ZDD motif SEQ ID NO: 1.
  • a member of the APOBEC3A subfamily, APOBEC3B subfamily, APOBEC3C subfamily, APOBEC3D subfamily, APOBEC3F subfamily, and APOBEC3G subfamily typically includes one or more of the following highly conserved sites that are part of the active site: arginine at position 28; histidine, asparagine, or arginine at position 29; serine or threonine, preferably threonine, at position 31; asparagine or aspartic acid at position 57; tyrosine or phenylalanine at position 130; asparagine or tyrosine at position 131; asparagine, tyrosine, or phenylalanine, preferably tyrosine, at position 132; and argin
  • An altered cytidine deaminase of the present disclosure includes a substitution mutation at one or more residues when compared to a reference cytidine deaminase.
  • a substitution mutation can be at the same position or a functionally equivalent position compared to the reference cytidine deaminase.
  • Reference cytidine deaminases and functionally equivalent positions are described in detail herein. The skilled person will readily appreciate that an altered cytidine deaminase described herein is not naturally occurring.
  • a reference cytidine deaminase can be a member of the APOBEC protein family. Essentially any known member of the APOBEC protein family can be a reference cytidine deaminase.
  • the skilled person can easily identify members of each of the subfamilies by using a publicly available database such as the Protein database available at the National Center for Biotechnology Information (ncbi.nlm.nih.gov/protein) and searching for APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, or, when identifying members of the AID family, Activation-induced cytidine deaminase.
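  • The database search described above can also be scripted. The minimal sketch below only builds query URLs for NCBI's E-utilities ESearch endpoint (`esearch.fcgi`, with `db=protein`) and does not fetch them; the helper name `esearch_url`, the `retmax` value, and the use of the subfamily names as plain search terms are illustrative assumptions, and any real queries are subject to NCBI's usage policies.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search terms taken from the text; AID is searched by its full name.
FAMILY_TERMS = [
    "APOBEC1", "APOBEC2", "APOBEC3A", "APOBEC3B", "APOBEC3C",
    "APOBEC3D", "APOBEC3F", "APOBEC3G", "APOBEC3H", "APOBEC4",
    "Activation-induced cytidine deaminase",
]


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch query against the Protein database."""
    params = {"db": "protein", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    for term in FAMILY_TERMS:
        print(esearch_url(term))
```

Fetching each URL (for example with `urllib.request.urlopen`) returns an XML list of matching Protein record IDs that can then be retrieved and screened manually.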
  • a wild type reference cytidine deaminase has the activity of binding single-stranded DNA (ssDNA) and deaminating a cytosine present on the ssDNA to convert it to uracil.
  • a wild type reference cytidine deaminase has the activity of binding single-stranded RNA (ssRNA) and deaminating a cytosine present on the ssRNA to convert it to uracil.
  • an altered cytidine deaminase has an amino acid sequence that is based on a reference sequence that is a member of the APOBEC protein family, and includes a ZDD motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C (SEQ ID NO: 1) and at least one substitution mutation disclosed herein.
  • an altered cytidine deaminase includes other active site residues disclosed herein.
  • Non-limiting examples of reference cytidine deaminase proteins are shown in the following table.
  • GenBank collection of nucleotide sequences and their protein translations, available at ncbi.nlm.nih.gov/protein/.
  • an altered cytidine deaminase has an amino acid sequence that is based on a reference sequence that is a member of the APOBEC3A subfamily, and includes a ZDD motif HXEX24SW(S/T)PCX[2-4]CX6FX8LX5R(L/I)YX[8-11]LX2LX10M (SEQ ID NO:2) (where X is any amino acid, and the subscript number or range of numbers after X refers to the number of amino acids) and at least one substitution mutation disclosed herein.
  • the substitution mutation is a substitution mutation at the tyrosine (Y) of SEQ ID NO:2, such as a substitution mutation to alanine (A).
  • the altered cytidine deaminase includes other active site residues disclosed herein.
  • the amino acid sequence of an altered cytidine deaminase includes the amino acids of a member of the APOBEC3A subfamily: X[16-26]-GRXXTXLCYXV-X15-GXXXN-X12-HAEXXF-X14-YXXTWXXSWSPC-X[2-4]-CA-X5-FL-X7-LXIXXXR(L/I)Y-X8-GLXXLXXXG-X5-M-X4-FXXCWXXFV-X6-FXPW-X13-LXXI-X[2-6] (SEQ ID NO:3) (where X is any amino acid, and the subscript number or range of numbers after X refers to the number of amino acids), or a subset thereof, and at least one substitution mutation disclosed herein.
  • the substitution mutation is a substitution mutation at the underlined tyrosine, such as a substitution mutation to alanine (A) or to tryptophan (W).
  • the amino acid sequence of an altered cytidine deaminase includes the amino acids of a member of the APOBEC3A subfamily: X26-GRXXTXLCYXV-X15-G-X16-HAEXXF-X14-YXXTWXXSWSPC-X4-CA-X5-FL-X7-LXIFXXR(L/I)Y-X8-GLXXLXXXG-X5-M-X4-FXXCWXXFV-X6-FXPW-X13-LXXLX6 (SEQ ID NO:4) (where X is any amino acid, and the subscript number after X refers to the number of amino acids present), or a subset thereof, and at least one substitution mutation disclosed herein.
  • X is any amino acid.
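  • Motifs written in the notation above can be turned into regular expressions and used to scan candidate sequences. The sketch below is illustrative only: the token grammar (X[n-m], Xn, X, [A/B] or (A/B), literal residues, hyphens as separators) is inferred from how SEQ ID NO:1 and SEQ ID NO:2 are written here, the helper names `motif_to_regex` and `find_motif` are hypothetical, and the test sequence is synthetic rather than any SEQ ID NO of this disclosure.

```python
import re

# Token grammar assumed from the notation in this document:
#   X[n-m]            -> between n and m residues of any kind
#   Xn                -> exactly n residues of any kind
#   X                 -> one residue of any kind
#   [A/B/C] or (A/B)  -> one residue chosen from the listed alternatives
#   other capital     -> that literal residue
# Hyphens and spaces are treated as separators and skipped.
_TOKEN = re.compile(
    r"X\[(?P<lo>\d+)-(?P<hi>\d+)\]"
    r"|X(?P<n>\d+)"
    r"|[\[(](?P<alt>[A-Z](?:/[A-Z])*)[\])]"
    r"|(?P<lit>[A-Z])"
)


def motif_to_regex(motif: str) -> str:
    """Translate a motif in the patent-style notation into a Python regex."""
    parts = []
    pos = 0
    while pos < len(motif):
        if motif[pos] in "- ":
            pos += 1
            continue
        m = _TOKEN.match(motif, pos)
        if m is None:
            raise ValueError(f"cannot parse motif at {pos}: {motif[pos:]!r}")
        if m.group("lo") is not None:
            parts.append(f".{{{m.group('lo')},{m.group('hi')}}}")
        elif m.group("n") is not None:
            parts.append(f".{{{m.group('n')}}}")
        elif m.group("alt") is not None:
            parts.append("[" + m.group("alt").replace("/", "") + "]")
        elif m.group("lit") == "X":
            parts.append(".")
        else:
            parts.append(m.group("lit"))
        pos = m.end()
    return "".join(parts)


def find_motif(protein: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for each non-overlapping hit."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start() + 1, m.group()) for m in pattern.finditer(protein)]
```

For example, `motif_to_regex("H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C")` yields `H[PAV]E.{23,28}PC.{2,4}C`. A regex hit only shows that the sequence pattern is present; it says nothing about fold or catalytic activity.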
  • a substitution mutation can be at the same position or a functionally equivalent position compared to a reference cytidine deaminase.
  • by “functionally equivalent” it is meant that the altered cytidine deaminase has the amino acid substitution at the amino acid position of the reference cytidine deaminase that has the same functional role in both the reference cytidine deaminase and the altered cytidine deaminase.
  • the tyrosine at residue 130 of the APOBEC3A proteins of Homo sapiens, Pongo pygmaeus, Nomascus leucogenys, Pan troglodytes, and Gorilla and the tyrosine at residue 133 of the APOBEC3A protein from Macaca fascicularis are functionally equivalent and positionally equivalent.
  • the skilled person can easily identify functionally equivalent residues in cytidine deaminases.
  • an altered cytidine deaminase has an amino acid sequence that is structurally similar to a reference cytidine deaminase disclosed herein.
  • a reference cytidine deaminase is one that includes the amino acid sequence of a sequence listed in Table 1.
  • an altered cytidine deaminase may be "structurally similar" to a reference cytidine deaminase if the amino acid sequence of the altered cytidine deaminase possesses a specified amount of sequence similarity and/or sequence identity compared to the reference cytidine deaminase.
  • Structural similarity of two amino acid sequences can be determined by aligning the residues of the two sequences (for example, a candidate altered cytidine deaminase and a reference cytidine deaminase described herein) to optimize the number of identical amino acids along the lengths of their sequences; gaps in either or both sequences are permitted in making the alignment in order to optimize the number of identical amino acids, although the amino acids in each sequence must nonetheless remain in their proper order.
  • a candidate altered cytidine deaminase is the cytidine deaminase being compared to the reference cytidine deaminase.
  • a candidate altered cytidine deaminase that has structural similarity with a reference cytidine deaminase and cytidine deaminase activity is an altered cytidine deaminase.
  • a pair-wise comparison analysis of amino acid sequences can be conducted, for instance, by the local homology algorithm of Smith and Waterman, 1981, Adv Appl Math 2:482, by the homology alignment algorithm of Needleman and Wunsch, 1970, J Mol Biol 48:443, by the search for similarity method of Pearson and Lipman, 1988, Proc Natl Acad Sci USA 85:2444, by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by visual inspection (see generally Current Protocols in Molecular Biology, Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc.
  • the BLAST® algorithm is described in Altschul et al., 1990, J Mol Biol 215:403-410.
  • the BLAST® algorithm can be used to calculate percent sequence identity and percent sequence similarity between two sequences.
  • Software for performing BLAST® analyses is publicly available through the National Center for Biotechnology Information.
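  • To make the alignment-and-count idea concrete, the sketch below implements a basic Needleman-Wunsch global alignment and a percent-identity calculation. It is a teaching sketch under stated assumptions, not a replacement for GAP, BLAST®, or Clustal: it uses toy scores (match +1, mismatch -1, linear gap -1) instead of a substitution matrix with affine gap penalties, it defines percent identity as identical columns divided by alignment length (one of several conventions in use), and the example sequences are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[str, str]:
    """Global alignment with a linear gap penalty; returns two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical, non-gap aligned columns divided by the alignment length."""
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * identical / len(aln_a)
```

With these scores, `needleman_wunsch("MKYAYW", "MGKYAYW")` places a single gap opposite the extra G, and six of the seven alignment columns are identical.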
  • amino acid sequence of a cytidine deaminase protein having sequence similarity to a reference sequence may include conservative substitutions of amino acids present in that reference sequence.
  • a conservative substitution for an amino acid in a protein may be selected from other members of the class to which the amino acid belongs.
  • an amino acid belonging to a grouping of amino acids having a particular size or characteristic can be substituted for another amino acid without altering the activity of a protein, particularly in regions of the protein that are not directly associated with biological activity.
  • amino acids having a non-polar side chain include alanine, glycine, isoleucine, leucine, methionine, phenylalanine, proline, tryptophan, and valine; amino acids having a hydrophobic side chain include glycine, alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, and tryptophan; amino acids having a polar side chain include arginine, asparagine, aspartic acid, glutamine, glutamic acid, histidine, lysine, serine, cysteine, tyrosine, and threonine; and amino acids having an uncharged side chain include glycine, serine, cysteine, asparagine, glutamine, tyrosine, and threonine.
  • reference to a cytidine deaminase as described herein can include a protein with at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% amino acid sequence similarity to the reference cytidine deaminase.
  • examples of altered cytidine deaminases having similarity with a reference amino acid sequence include those having, for instance, at least 80%, at least 85%, at least 90%, or at least 95% similarity with SEQ ID NO:5 and having an alanine at amino acid 130.
  • Other examples of altered cytidine deaminases having similarity with a reference amino acid sequence include those having, for instance, at least 80%, at least 85%, at least 90%, or at least 95% similarity or identity with SEQ ID NO:6 and having an alanine at amino acid 130 and a histidine at amino acid 132.
  • reference to a cytidine deaminase as described herein can include a protein with at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% amino acid sequence identity to the reference cytidine deaminase.
  • examples of altered cytidine deaminases having identity with a reference amino acid sequence include those having, for instance, at least 80%, at least 85%, at least 90%, or at least 95% identity with SEQ ID NO:5 and having an alanine (A) at amino acid 130.
  • Other examples of altered cytidine deaminases having identity with a reference amino acid sequence include those having, for instance, at least 80%, at least 85%, at least 90%, or at least 95% similarity or identity with SEQ ID NO:6 and having an alanine (A) at amino acid 130 and a histidine (H) at amino acid 132.
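  • The difference between percent identity and percent similarity can be shown on an already-aligned pair. In the sketch below, a substitution counts as "similar" when both residues fall in one of the side-chain classes listed earlier in this section (non-polar, hydrophobic, polar, uncharged). That rule is an assumption made for illustration; BLAST® instead counts aligned pairs with a positive substitution-matrix score (e.g., BLOSUM62). The helper names and the aligned strings are hypothetical.

```python
# Side-chain classes as listed in the text (the classes overlap).
CLASSES = {
    "non-polar": set("AGILMFPWV"),
    "hydrophobic": set("GAVLIPFMW"),
    "polar": set("RNDQEHKSCYT"),
    "uncharged": set("GSCNQYT"),
}


def is_conservative(x: str, y: str) -> bool:
    """Identical residues, or residues sharing at least one listed class."""
    return x == y or any(x in members and y in members for members in CLASSES.values())


def identity_and_similarity(aln_a: str, aln_b: str) -> tuple[float, float]:
    """Percent identity and percent similarity over the full alignment length."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for x, y in zip(aln_a, aln_b):
        if x == "-" or y == "-":
            continue  # gap columns count toward length but never match
        if x == y:
            identical += 1
            similar += 1
        elif is_conservative(x, y):
            similar += 1
    n = len(aln_a)
    return 100.0 * identical / n, 100.0 * similar / n
```

Under this rule, a Y-to-H change counts as similar (both are in the polar class), whereas Y-to-A does not, so a Y130A variant lowers identity and similarity alike.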
  • An altered cytidine deaminase of the present disclosure may include a substitution mutation at a position functionally equivalent to tyrosine at position 130 (Y130) in a member of the APOBEC3A subfamily. Accordingly, an alignment can be produced using a member of the APOBEC3A subfamily and another candidate altered cytidine deaminase from the APOBEC3A subfamily or a different APOBEC subfamily.
  • the candidate is selected from the APOBEC1 or AID subfamilies.
  • An example of an algorithm that can be used to produce an alignment is Clustal O.
  • the wild type residue at a position functionally equivalent to Y130 is phenylalanine (F).
  • an altered cytidine deaminase of the present disclosure includes a substitution mutation at a position functionally equivalent to the tyrosine (Y) of ZDD motif HXEX24SW(S/T)PCX[2-4]CX6FX8LX5R(L/I)YX[8-11]LX2LX10M (SEQ ID NO:2) in a member of the APOBEC family, such as a member of the APOBEC3A subfamily.
  • the underlined tyrosine (Y) of SEQ ID NO:2 is the position functionally equivalent to the tyrosine amino acid 130 of the wild type APOBEC3A protein (SEQ ID NO: 12).
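  • Once an alignment has been produced (for example with Clustal O), finding the functionally equivalent position is a matter of walking the alignment columns while counting residues in each sequence. The helper below is a hypothetical sketch that assumes gaps are written as "-"; the toy alignment only mirrors, in miniature, the kind of offset described for Y130 in human APOBEC3A versus Y133 in the Macaca fascicularis protein and is not real sequence data.

```python
from typing import Optional


def map_position(aln_ref: str, aln_cand: str, ref_pos: int) -> Optional[int]:
    """Map a 1-based reference position to the aligned 1-based candidate position.

    Returns None when the candidate has a gap opposite that reference residue.
    """
    if len(aln_ref) != len(aln_cand):
        raise ValueError("aligned sequences must have equal length")
    i = j = 0  # residues seen so far in the reference and candidate
    for r, c in zip(aln_ref, aln_cand):
        if r != "-":
            i += 1
        if c != "-":
            j += 1
        if r != "-" and i == ref_pos:
            return j if c != "-" else None
    raise ValueError("ref_pos lies beyond the end of the reference sequence")
```

With the toy alignment `MK---Y` over `MKGAVY`, reference residue 3 (the Y) maps to candidate residue 6. The residue found there should still be checked, since an alignment can place a different amino acid at the equivalent site (the text notes, for example, a wild type phenylalanine).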
  • the substitution mutation at a position functionally equivalent to Y130 increases cytidine deaminase activity and preferentially acts on 5mC compared to cytosine (i.e., has cytosine-defective deaminase activity).
  • the substitution mutation can be a mutation to alanine (A), glycine (G), phenylalanine (F), histidine (H), glutamine (Q), methionine (M), asparagine (N), lysine (K), valine (V), aspartic acid (D), glutamic acid (E), serine (S), cysteine (C), proline (P), or threonine (T).
  • the altered cytidine deaminase can comprise SEQ ID NO:9, wherein X is selected from A, G, F, H, Q, M, N, K, V, D, E, S, C, P or T (and is not Y), or can comprise SEQ ID NO: 10, wherein Z is selected from A, G, F, H, Q, M, N, K, V, D, E, S, C, P or T (and is not Y), preferably, in one embodiment, X or Z is A or L.
  • the substitution mutation at a position functionally equivalent to Y130 is a mutation to alanine (A), (e.g., SEQ ID NO: 5).
  • altered cytidine deaminases having increased activity and preferentially acting on 5mC compared to cytosine include SEQ ID NO: 5 or a sequence having at least 90%, at least 95%, at least 98%, at least 99% sequence identity to SEQ ID NO:5 and comprising Y130A.
  • An altered cytidine deaminase of the present disclosure having cytosine-defective deaminase activity optionally includes a second substitution mutation at a position two, three, four, or five amino acids on the C-terminal side of the Y130 position, or functionally equivalent to the Y130 position.
  • the second mutation is a tyrosine (Y), tryptophan (W), cysteine (C), histidine (H), or phenylalanine (F) at a position two, three, four, or five amino acids on the C-terminal side of the Y130 position, or functionally equivalent to the Y130 position.
  • the second mutation is at a position functionally equivalent to tyrosine at position 132 (Y132) in a member of the APOBEC3A subfamily.
  • An APOBEC protein such as an APOBEC3A protein, containing substitution mutations at both the first site, a position functionally equivalent to Y130, and the second site, at a position two, three, four, or five amino acids on the C-terminal side of the Y130 position, increases the preferential activity to act on 5mC compared to the same APOBEC protein, such as an APOBEC3A protein, containing one substitution mutation at Y130.
  • the substitution mutation at the second position is to an amino acid having a positively charged side chain, selected from arginine (R), histidine (H), and lysine (K), or to an amino acid having a polar side chain, such as glutamine (Q).
  • the substitution mutation at the second position is histidine (H), such as Y132 to histidine.
  • the double mutant containing both first and second mutations can be any substitution mutation at a position functionally equivalent to Y130 described herein and any second substitution mutation at a position two, three, four, or five amino acids on the C-terminal side of the Y130 position described herein, in any combination.
  • the altered cytidine deaminase can be, for example, SEQ ID NO: 4 and have a substitution at Y130 and Y132, or the position functionally equivalent to Y130 and Y132 as described herein.
  • SEQ ID NO: 11 comprising Y130X and Y132Z, where X is selected from (A), (L), or (W) (preferably (A)), and Z is selected from (R), (H), (L), or (Q), preferably (H).
  • the double mutant includes substitution mutations Y130A and Y132R, Y130A and Y132H, Y130A and Y132L, Y130A and Y132Q, Y130L and Y132R, Y130L and Y132H, Y130L and Y132L, Y130L and Y132Q, Y130W and Y132R, Y130W and Y132H, Y130W and Y132L, Y130W and Y132Q, or any suitable combination thereof.
  • the double mutant includes substitution mutations Y130A and Y132H.
  • altered cytidine deaminases having both substitution mutations, and acting on 5mC more preferentially than the APOBEC protein having only the single substitution mutation at Y130, include SEQ ID NO:6 or a sequence having at least 90%, at least 95%, at least 98%, or at least 99% sequence identity to SEQ ID NO:6 and comprising Y130A and Y132H.
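  • The double-mutant combinations defined by SEQ ID NO: 11 (X in {A, L, W} at Y130, Z in {R, H, L, Q} at Y132) can be enumerated and applied to a sequence programmatically, with a check that the expected wild type residue is present before each substitution. This is a hypothetical helper sketch; the placeholder sequence is not SEQ ID NO: 12 or any other sequence of this disclosure, and mutation labels are assumed to follow the usual wild type, position, new residue format (e.g., "Y130A").

```python
import itertools
import re

_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# Residue choices at each site, as listed for SEQ ID NO: 11 in the text.
FIRST_SITE = "ALW"    # X in Y130X
SECOND_SITE = "RHLQ"  # Z in Y132Z


def apply_substitutions(seq: str, *mutations: str) -> str:
    """Apply labels such as 'Y130A', checking the wild type residue first."""
    residues = list(seq)
    for label in mutations:
        m = _MUTATION.match(label)
        if m is None:
            raise ValueError(f"unrecognised mutation label: {label!r}")
        wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"{label}: expected {wt}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


def double_mutant_labels() -> list[tuple[str, str]]:
    """All first-site x second-site combinations (3 x 4 = 12)."""
    return [(f"Y130{x}", f"Y132{z}") for x, z in itertools.product(FIRST_SITE, SECOND_SITE)]
```

Raising an error on a wild type mismatch is a deliberate choice: with numbering that shifts between orthologs, it catches mutations applied at the wrong (non-equivalent) position.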
  • double mutants can be constructed to create an altered cytidine deaminase having a first substitution mutation at a position functionally equivalent to Y130 and a second arginine, glutamine, histidine, or lysine substitution mutation at the tyrosine position two amino acids on the C-terminal side of the Y130 position, and then evaluated for deamination of C residues in one assay and deamination of 5mC residues in a second assay.
  • the ratio of 5mC deamination and C deamination can be compared to identify those double mutants that preferentially deaminate 5mC compared to C.
  • One of ordinary skill in the art could similarly test double mutants having a tyrosine three, four, or five positions C-terminal to the position functionally equivalent to Y130, and confirm whether a substitution mutation at that position to arginine, glutamine, histidine, or lysine, in combination with a mutation at the position functionally equivalent to Y130 (such as Y130A), yields double mutants that preferentially deaminate 5mC compared to C.
  • some substitution mutations result in 5mC-defective deaminase activity (i.e., conversion of C to U at a greater rate than conversion of 5mC to T).
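  • The screening logic described above (one assay for C deamination, a second for 5mC deamination, then comparing the ratio) can be sketched as follows. The field names, the use of fractions converted as the assay readout, and the small floor that prevents division by zero are all assumptions for illustration; the text does not specify how the two assays are quantified.

```python
from dataclasses import dataclass


@dataclass
class DeaminationAssay:
    """Readouts from the two separate assays (hypothetical field names)."""
    variant: str
    frac_5mc_to_t: float  # fraction of 5mC residues converted in the 5mC assay
    frac_c_to_u: float    # fraction of C residues converted in the C assay


def selectivity(result: DeaminationAssay, floor: float = 1e-6) -> float:
    """Ratio of 5mC deamination to C deamination; the floor avoids division by zero."""
    return result.frac_5mc_to_t / max(result.frac_c_to_u, floor)


def rank_variants(results: list[DeaminationAssay]) -> list[DeaminationAssay]:
    """Order variants from most to least 5mC-selective."""
    return sorted(results, key=selectivity, reverse=True)


def prefers_5mc(result: DeaminationAssay, min_ratio: float = 1.0) -> bool:
    """True when 5mC is deaminated more efficiently than C by the chosen margin."""
    return selectivity(result) > min_ratio
```

A ratio above 1 indicates preferential deamination of 5mC. Raising `min_ratio` is one way to require a larger margin before a double mutant is kept.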
  • the substitution mutation at a position functionally equivalent to Y130 increases cytidine deaminase activity and preferentially acts on cytosine compared to 5mC and is a mutation to an amino acid having a non-polar side chain or a hydrophobic side chain, such as leucine (L) or tryptophan (W).
  • the substitution mutation at a position functionally equivalent to Y130 is a mutation to leucine.
  • mutations that result in increased preferential deamination activity on cytosine compared to 5mC include a single mutant with Y132P, and double mutants with a substitution mutation at Y130V and Y132H, or Y130W and Y132H.
  • Specific examples of altered cytidine deaminases having increased cytidine deaminase activity and preferentially acting on cytosine compared to 5mC include SEQ ID NO:7 or a sequence having at least 90%, at least 95%, at least 98%, or at least 99% sequence identity to SEQ ID NO:7 and comprising Y130L.
  • the substitution mutation is at a position functionally equivalent to Y130 that results in 5hmC-defective deaminase activity (i.e., preferentially deaminates C and 5mC to U and T, respectively, and has significantly reduced deamination of 5hmC).
  • the substitution mutation at a position functionally equivalent to Y130 is a mutation to an amino acid having a non-polar side chain or a hydrophobic side chain, such as tryptophan (W).
  • altered cytidine deaminases having the ability to deaminate C and 5mC to U and T, respectively, but reduced ability to deaminate 5hmC, preferably no detectable ability to deaminate 5hmC include SEQ ID NO:8 or a sequence having at least 90%, at least 95%, at least 98%, at least 99% sequence identity to SEQ ID NO:8 and comprising Y130W.
  • an altered cytidine deaminase includes a substitution mutation at a position functionally equivalent to tyrosine at position 130 (Y130) and/or tyrosine position 132 (Y132) in a member of the APOBEC3A subfamily. In some embodiments, such an altered cytidine deaminase demonstrates selective deamination for mC.
  • an altered cytidine deaminase is an altered APOBEC3A cytidine deaminase, altered to include a substitution mutation at tyrosine at position 130 (Y130) and/or tyrosine position 132 (Y132). In some embodiments, such an altered APOBEC3A cytidine deaminase demonstrates selective deamination for mC.
  • an altered cytidine deaminase is a double mutant of APOBEC3A, with substitution mutations Y130A/Y132H. In some embodiments, such an altered APOBEC3A cytidine deaminase demonstrates selective deamination for mC.
  • an altered cytidine deaminase includes an altered cytidine deaminase having an amino acid sequence of SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO: 10, or SEQ ID NO: 11.
  • such an altered APOBEC3A cytidine deaminase demonstrates selective deamination for mC.
  • An altered cytidine deaminase described herein can include additional mutations. Typically, additional mutations do not unduly alter the activity of the altered cytidine deaminase. One or more additional mutations can be a conservative mutation.
  • An altered cytidine deaminase described herein can be a truncated protein.
  • a truncated protein is a fragment of an altered cytidine deaminase of the present disclosure that retains the ability to deaminate 5mC to thymidine.
  • a truncated altered cytidine deaminase can include a deletion of 1 to 13 amino acids on the N-terminal end of the protein, a deletion of 1 to 3 amino acids on the C-terminal end of the protein, or a combination thereof.
  • an altered cytidine deaminase includes any of those described in International Patent Application No. PCT/US2023/017846 (“Altered Cytidine Deaminases and Methods of Use”), filed April 7, 2023, which is hereby incorporated by reference in its entirety.
  • methods for using a cytidine deaminase include contacting target nucleic acids, e.g., DNA or RNA, with the enzyme, under conditions suitable for conversion of modified cytidines, such as 5mC, to thymidine, or for conversion of unmodified cytidine to uracil. Because amplification of DNA does not preserve the modification status of cytidine (e.g., the methylation status of 5mC is not retained), use of a cytidine deaminase typically occurs before amplification of target DNA.
  • Target nucleic acids can be contacted with cytidine deaminase at essentially any time.
  • target nucleic acids can be contacted with cytidine deaminase after isolation of genomic or cell free DNA or mRNA, before or after fragmentation, or before or after tagmentation.
  • target nucleic acids can be contacted with a cytidine deaminase after addition of a universal sequence and/or an adapter, provided the universal sequence and/or an adapter is not added by amplification.
  • Reaction conditions suitable for conversion of modified cytidines, such as 5mC, to thymidine by a cytidine deaminase include, but are not limited to, a substrate of target nucleic acid suspected of including at least one modified cytidine, with appropriate pH, temperature of the reaction, time of the reaction, and concentration of the cytidine deaminase and/or DNA or RNA substrate. It is expected that a cytidine deaminase can function in essentially any buffer. Examples of useful buffers include, but are not limited to, a citrate buffer, such as the citrate buffer available from Thermo Fisher Scientific (Cat. No.
  • a deamination reaction can occur at a temperature of about 25°C to about 60°C, including but not limited to, at about 37°C, at about 45°C, at about 50°C, and at about 60°C.
  • Some cytidine deaminases preferentially deaminate a modified cytosine to thymidine at a faster rate than deamination of cytosine to uracil.
  • the reaction time can be chosen to allow the reaction to run to completion, or to maximize the difference between deamination of modified cytosine and deamination of cytosine.
  • the reaction can proceed for at least 15 minutes, at least 30 minutes, at least 45 minutes, at least 60 minutes, at least 90 minutes, at least 120 minutes, or at least 150 minutes, and for no greater than 15 minutes, no greater than 30 minutes, no greater than 45 minutes, no greater than 60 minutes, no greater than 90 minutes, no greater than 120 minutes, no greater than 150 minutes, or no greater than 180 minutes. In some embodiments, the reaction can run overnight.
  • a deamination reaction can include a cytidine deaminase at a concentration from at least about 25 nanomolar (nM) to no greater than about 5 micromolar (µM).
  • concentration of the enzyme can be at least about 25 nM, at least about 0.5 µM, at least about 1 µM, at least about 2 µM, at least about 3 µM, at least about 4 µM, or at least about 5 µM, and/or no greater than 5 µM, no greater than 4 µM, no greater than 3 µM, no greater than 2 µM, no greater than 1 µM, or no greater than 0.5 µM.
  • a deamination reaction can include about 1 ng to about 1 µg input nucleic acid. In some embodiments, a deamination reaction can include nucleic acids at a concentration of about 10 pM to about 200 nM.
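  • Input amounts are quoted both as mass (ng to µg) and as molar concentration (pM to nM), so converting between the two is a routine step when setting up a reaction. The sketch below assumes single-stranded fragments of uniform length and uses the common rule-of-thumb average of about 330 g/mol per nucleotide. Real libraries have a length distribution and sequence-dependent mass, so the result is an estimate only. The function names are hypothetical.

```python
# Rule-of-thumb average molar mass of one nucleotide in ssDNA (g/mol).
AVG_NT_MASS_G_PER_MOL = 330.0


def ssdna_conc_nM(mass_ng: float, length_nt: int, volume_ul: float,
                  nt_mass: float = AVG_NT_MASS_G_PER_MOL) -> float:
    """Approximate molar concentration (nM) of uniform-length ssDNA."""
    moles = (mass_ng * 1e-9) / (length_nt * nt_mass)  # grams / (g/mol) -> mol
    molar = moles / (volume_ul * 1e-6)                # mol per litre -> M
    return molar * 1e9                                # M -> nM


def within_range(conc_nM: float, low_nM: float = 0.01, high_nM: float = 200.0) -> bool:
    """Check against the 10 pM (0.01 nM) to 200 nM range quoted in the text."""
    return low_nM <= conc_nM <= high_nM
```

For example, 33 ng of a 100-nt fragment in 10 µL works out to about 100 nM, which lies inside the quoted range.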
  • Uracil-DNA glycosylase (UDG), also known as uracil-N-glycosylase (UNG)
  • Uracil-DNA glycosylase is a highly conserved repair enzyme that catalyzes the excision of uracil from uracil-containing single- and double-stranded DNA but is inactive on RNA. It is a monomeric protein with relatively stable physicochemical properties and a small molecular weight of 25 kDa, and is widely present in various prokaryotic and eukaryotic organisms.
  • UDG excises uracil from DNA by hydrolyzing the N-glycosidic bond between the uracil base and the sugar-phosphate backbone in single- and double-stranded DNA (Bellamy et al., 2007, Nucleic Acids Res 35:1478-1487; Slupphaug et al., 1996, Nature 384:87-92; Stivers et al., 1999, Biochemistry 38:952-963; and Parikh et al., 2000, Mutat Res 460:183-199), resulting in the formation of an abasic site (AP-site) having a hemiacetal moiety.
  • AP-site: abasic site having a hemiacetal moiety
  • A schematic illustration of the UDG-mediated generation of single-nucleotide gaps within single-stranded DNA fragments is shown in FIG. 1. Because false-positive deamination of unmodified cytosine yields uracil, whereas true-positive deamination of methylcytosine yields thymine, UDG can be used to specifically recognize and remove uracil bases, thus removing the false-positive signal and preventing its propagation as a “T” in downstream amplification and sequencing. APOBEC enzymes require ssDNA for recognition, and thus the deaminated DNA will be single stranded.
  • the UDG is of commercial origin.
  • Reaction conditions suitable for the UDG-mediated excision of uracil from DNA include, but are not limited to, concentration of the single stranded or double stranded DNA substrate, pH, temperature of the reaction, time of the reaction, and concentration of the UDG enzyme.
  • UDG is active over a broad pH range with an optimum at pH 8.0, does not require a divalent cation, and is inhibited by high ionic strength (> 200 mM). It is expected that a UDG can function in any of a variety of buffers.
  • An example of a useful buffer includes, but is not limited to, 1X UDG Reaction Buffer (New England Biolabs, Catalog # B0280S, see the worldwide web at neb.com/products/m0280-uracil-dna-glycosylase-udg#Product%20Information), which is 20 mM Tris-HCl, 1 mM DTT, 1 mM EDTA (pH 8 at 25°C).
  • Uracil-DNA glycosylase is active at 25°C to 37°C, and in some embodiments the reaction proceeds within that range, e.g., at 37°C. In some embodiments, the reaction can proceed for about 5, about 10, about 15, about 20, about 30, about 45, about 60, about 90, or about 120 minutes, or any range thereof.
  • a reaction can include about 0.001 U/µL to about 1 U/µL UDG enzyme, wherein one unit is defined as the amount of enzyme that catalyzes the release of 60 pmol of uracil per minute from double-stranded, uracil-containing DNA.
  • a reaction can include about 0.05 U/µL UDG.
  • a reaction can include about 1 ng to about 1 µg of input nucleic acid.
  • a reaction can include nucleic acids at a concentration of about 10 pM to about 200 nM.
  • a reaction can include nucleic acids at a concentration of about 200 pM to about 20 nM.
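  • The unit definition above lets a UDG concentration be translated into total units and a nominal uracil-release capacity. The sketch below does only that arithmetic. The capacity is an upper bound implied by the unit definition, which is measured on double-stranded DNA under the supplier's assay conditions, so it does not predict how fast uracil is excised from single-stranded deamination products. The reaction volume and time in any example are hypothetical.

```python
# Unit definition quoted in the text: 1 U releases 60 pmol uracil per minute
# from double-stranded, uracil-containing DNA.
PMOL_PER_UNIT_PER_MIN = 60.0


def total_units(conc_u_per_ul: float, volume_ul: float) -> float:
    """Total UDG units in a reaction of the given volume."""
    return conc_u_per_ul * volume_ul


def nominal_capacity_pmol(conc_u_per_ul: float, volume_ul: float, minutes: float) -> float:
    """Upper-bound uracil release implied by the unit definition alone."""
    return total_units(conc_u_per_ul, volume_ul) * PMOL_PER_UNIT_PER_MIN * minutes
```

For example, 0.05 U/µL in a hypothetical 50 µL reaction corresponds to 2.5 U in total, or a nominal 4,500 pmol of uracil over 30 minutes. This far exceeds the uracil content expected from nanogram amounts of input DNA.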
  • dNTPs: deoxyribonucleoside 5'-triphosphates
  • Primers include, but are not limited to, a primer complementary to the 3' end library adapter, and random oligonucleotides of about 18 to 22 bases in length.
  • the two enzymatic steps of 1) UDG treatment to generate abasic sites where dC to dU conversions have occurred and 2) complementary strand synthesis with a high-fidelity polymerase supplemented with a deoxycytidyl transferase may be carried out separately and sequentially. In some embodiments, the two enzymatic steps may be carried out simultaneously in the same reaction mixture.
  • a high-fidelity polymerase is utilized for the synthesis of complementary strands.
  • the fidelity of a DNA polymerase refers to its ability to accurately replicate a template.
  • a critical aspect of this is the ability of the DNA polymerase to read a template strand, select the appropriate nucleoside triphosphate and insert the correct nucleotide at the 3' primer terminus, such that canonical Watson-Crick base pairing is maintained.
  • the rate of misincorporation (incorporating the incorrect nucleotide) is known as the polymerase's “error rate.”
  • some DNA polymerases possess a 3'→5' exonuclease activity.
  • a high-fidelity polymerase replicates DNA while introducing minimal errors. Examples include, but are not limited to, VENT® DNA Polymerase (New England BioLabs, Inc.), PHUSION® High-Fidelity DNA Polymerase (Thermo Scientific), Q5® High-Fidelity DNA Polymerase, T4 DNA polymerase, and E. coli DNA polymerase (Journal of Molecular Biology 336, no. 5 (2004): 1023-34).
  • a high fidelity polymerase is an archaeal polymerase.
  • the high-fidelity polymerase is supplemented with a deoxycytidyl transferase.
  • Deoxycytidyl transferases are Y family polymerases that are involved in DNA repair, complementing other polymerases to prevent their stalling at translesion sites, by transferring a dCMP residue from dCTP to the 3'-end of a DNA primer in a template-dependent reaction.
  • Deoxycytidyl transferases assist in the bypass of an abasic lesion by inserting a nucleotide opposite the lesion.
  • a deoxycytidyl transferase demonstrates a preferential and limited incorporation of dCMP in a template-directed manner regardless of the template nucleotide, always inserting a deoxycytidine (dC) across from a lesion. Whether G, A, T, C, or an abasic site, a deoxycytidyl transferase will always add a C.
  • Deoxycytidyl transferases may be produced recombinantly and are commercially available.
  • recombinant human REV1 protein (Catalog # REV1-1531H), recombinant mouse REV1 protein (Catalog # REV1-14090M), recombinant chicken REV1 (Catalog # REV1-2508C), recombinant zebrafish REV1 (Catalog # REV1-6683Z), and recombinant yeast Rev1 protein (Catalog # Rev1-1532Y) are commercially available from Creative Biomart Inc. (Shirley, NY), and human Rev1 (Catalog # PT-A04738) is available from Novatein Biosciences, Woburn, MA.
  • Reaction conditions may include any of those discussed in Brown et al. (Brown et al., 2010, Biochemistry, 49(26):5504-5510).
  • unnatural dCTP derivatives including, but not limited to, any of those discussed in Salem et al. (Salem et al., 2009, J Bacteriol 191(18):5657-68) may be used.
  • corrected DNA fragments may be sequenced.
  • Sequencing may be by any of a variety of known methodologies, including, but not limited to, any of a variety of high-throughput, next generation sequencing (NGS) platforms, including, but not limited to, sequencing by synthesis, sequencing by ligation, nanopore sequencing, Sanger sequencing, and the like.
  • NGS: next generation sequencing
  • sequencing is performed using the sequencing by synthesis methodologies commercialized by ILLUMINA® as described in U.S. Patent Application Publication No. 2007/0166705, U.S. Patent Application Publication No. 2006/0188901, U.S. Pat. No.
  • NGS refers to sequencing methods that allow for massively parallel sequencing of clonally amplified molecules and of single nucleic acid molecules.
  • Non-limiting examples of NGS include sequencing-by-synthesis using reversible dye terminators, and sequencing-by-ligation.
  • Sequencing-by-synthesis (SBS) techniques generally involve the enzymatic extension of a nascent nucleic acid strand through the iterative addition of nucleotides against a template strand.
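The iterative, one-nucleotide-per-cycle character of sequencing by synthesis can be sketched as a simple loop. This is a deliberately idealized model and an assumption, not any platform's chemistry: each cycle incorporates exactly one complementary nucleotide, the base is recorded, and phasing, errors, and deblocking chemistry are ignored. The template is assumed to be written 3' to 5', in the order the polymerase reads it.

```python
# Toy sequencing-by-synthesis loop (assumption): one blocked nucleotide is
# incorporated per cycle, its identity is recorded, and the block is removed
# before the next cycle. Phasing, errors, and real chemistry are ignored.
PAIRING = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs_read(template_3to5: str, cycles: int) -> str:
    """Return base calls for up to `cycles` cycles against a template written 3'->5'."""
    calls = []
    for cycle in range(cycles):
        if cycle >= len(template_3to5):
            break  # the nascent strand has reached the end of the template
        incorporated = PAIRING[template_3to5[cycle]]  # add one complementary nucleotide
        calls.append(incorporated)  # "image" the incorporated base, then deblock
    return "".join(calls)
```

For example, `sbs_read("TACG", 3)` returns `"ATG"`; the read length is bounded by the number of cycles run.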
  • repaired fragments are cloned, followed by Sanger sequencing of clones to assess methylation.
  • the readout may be obtained by the use of an array, using for example, procedures as described on the worldwide web illumina.com/techniques/microarrays/methylation-arrays.html.
  • array refers to a population of sites that can be differentiated from each other according to relative location. Different molecules that are at different sites of an array can be differentiated from each other according to the locations of the sites in the array.
  • An individual site of an array can include one or more molecules of a particular type.
  • a site can include a single target nucleic acid molecule having a particular sequence or a site can include several nucleic acid molecules having the same sequence (and/or complementary sequence, thereof).
  • the sites of an array can be different features located on the same substrate. Exemplary features include without limitation, droplets, wells in a substrate, beads (or other particles) in or on a substrate, projections from a substrate, ridges on a substrate or channels in a substrate.
  • the sites of an array can be separate substrates each bearing a different molecule. Different molecules attached to separate substrates can be identified according to the locations of the substrates on a surface to which the substrates are associated or according to the locations of the substrates in a liquid or gel. Exemplary arrays in which separate substrates are located on a surface include, without limitation, those having beads in wells.
  • the corrected DNA fragments may be amplified. It will be appreciated that any of the amplification methodologies described herein or generally known in the art may be used with universal or target-specific primers to amplify DNA fragments. Suitable methods for amplification include, but are not limited to, the polymerase chain reaction (PCR), strand displacement amplification (SDA), transcription mediated amplification (TMA) and nucleic acid sequence-based amplification (NASBA), as described in U.S. Pat. No. 8,003,354. The above amplification methods may be employed to amplify one or more nucleic acids of interest. For example, PCR, including multiplex PCR, SDA, TMA, NASBA and the like may be utilized to amplify DNA fragments. In some embodiments, primers directed specifically to the polynucleotide of interest are included in the amplification reaction.
  • the term “amplify” refers generally to any action or process whereby at least a portion of a nucleic acid molecule is replicated or copied into at least one additional nucleic acid molecule.
  • the additional nucleic acid molecule optionally includes sequence that is substantially identical or substantially complementary to at least some portion of the target nucleic acid molecule.
  • the target nucleic acid molecule can be single-stranded or double-stranded and the additional nucleic acid molecule can independently be single-stranded or double-stranded.
  • Amplification optionally includes linear or exponential replication of a nucleic acid molecule.
  • such amplification can be performed using isothermal conditions; in other embodiments, such amplification can include thermocycling.
  • the amplification is a multiplex amplification that includes the simultaneous amplification of a plurality of target sequences in a single amplification reaction.
  • “amplification” includes amplification of at least some portion of DNA and RNA based nucleic acids alone, or in combination.
  • the amplification reaction can include any of the amplification processes known to one of ordinary skill in the art.
  • amplification conditions generally refers to conditions suitable for amplifying one or more nucleic acid sequences. Such amplification can be linear or exponential.
  • the amplification conditions can include isothermal conditions or alternatively can include thermocycling conditions, or a combination of isothermal and thermocycling conditions.
  • the conditions suitable for amplifying one or more nucleic acid sequences include polymerase chain reaction (PCR) conditions.
  • the amplification conditions refer to a reaction mixture that is sufficient to amplify nucleic acids such as one or more target sequences, or to amplify an amplified target sequence ligated to one or more adapters, e.g., an adapter-ligated amplified target sequence.
  • the amplification conditions include a catalyst for amplification or for nucleic acid synthesis, for example a polymerase; a primer that possesses some degree of complementarity to the nucleic acid to be amplified; and nucleotides, such as deoxyribonucleotide triphosphates (dNTPs) to promote extension of the primer once hybridized to the nucleic acid.
  • the amplification conditions can require hybridization or annealing of a primer to a nucleic acid, extension of the primer and a denaturing step in which the extended primer is separated from the nucleic acid sequence undergoing amplification.
  • amplification conditions can include thermocycling; in some embodiments, amplification conditions include a plurality of cycles where the steps of annealing, extending, and separating are repeated.
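The difference between linear and exponential amplification can be made concrete with a short calculation. This sketch assumes an idealized, constant per-cycle efficiency `efficiency` (a hypothetical parameter; 1.0 means every eligible template is copied each cycle). In the exponential case every molecule present serves as a template in the next cycle, whereas in the linear case only the original input molecules are copied.

```python
def exponential_copies(n0: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential amplification: originals and copies are all templates."""
    return n0 * (1.0 + efficiency) ** cycles


def linear_copies(n0: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized linear amplification: only the original input molecules are copied."""
    return n0 + n0 * efficiency * cycles
```

With one starting molecule and 10 perfectly efficient cycles, the exponential model gives 1024 molecules and the linear model gives 11.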
  • the amplification conditions include cations such as Mg++ or Mn++ and can also include various modifiers of ionic strength.
  • PCR refers to the method of K. B. Mullis, described in U.S. Pat. Nos. 4,683,195 and 4,683,202, for increasing the concentration of a segment of a polynucleotide of interest in a mixture of genomic DNA without cloning or purification.
  • This process for amplifying the polynucleotide of interest consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired polynucleotide of interest, followed by a series of thermal cycles in the presence of a DNA polymerase.
  • the two primers are complementary to their respective strands of the double-stranded polynucleotide of interest.
  • the mixture is denatured at a higher temperature first and the primers are then annealed to complementary sequences within the polynucleotide of interest molecule.
  • the primers are extended with a polymerase to form a new pair of complementary strands.
  • the steps of denaturation, primer annealing, and polymerase extension can be repeated many times (referred to as thermocycling) to obtain a high concentration of an amplified segment of the desired polynucleotide of interest.
  • the length of the amplified segment of the desired polynucleotide of interest is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter.
  • the method is referred to as the “polymerase chain reaction” (hereinafter “PCR”).
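The statement that amplicon length is set by the relative positions of the two primers can be illustrated with a small in-silico PCR sketch. Assumptions: the template is a single top-strand string written 5' to 3', both primers must match exactly (no mismatches or degenerate bases), and the reverse primer is given 5' to 3' as synthesized, so the top-strand sequence it anneals to is its reverse complement. The function names are illustrative only.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def predict_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the segment from the forward-primer site through the reverse-primer
    site on `template` (5'->3'), or None if either primer has no exact match."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the top strand at its reverse complement.
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]
```

For the template `"TTTTACGTAAACCCGGGTTTGCATTTT"`, forward primer `"ACGT"`, and reverse primer `"ATGC"`, the predicted amplicon is `"ACGTAAACCCGGGTTTGCAT"` (20 nt); moving either primer site changes that length directly.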
  • the target nucleic acid molecules can be PCR amplified using a plurality of different primer pairs, in some cases, one or more primer pairs per target nucleic acid molecule of interest, thereby forming a multiplex PCR reaction.
  • the term “amplicon,” when used in reference to a nucleic acid, means the product of copying the nucleic acid, wherein the product has a nucleotide sequence that is the same as or complementary to at least a portion of the nucleotide sequence of the nucleic acid.
  • An amplicon can be produced by any of a variety of amplification methods that use the nucleic acid, or an amplicon thereof, as a template including, for example, PCR, rolling circle amplification (RCA), ligation extension, or ligation chain reaction.
  • An amplicon can be a nucleic acid molecule having a single copy of a particular nucleotide sequence (for example, a PCR product) or multiple copies of the nucleotide sequence (for example, a concatemeric product of RCA).
  • a first amplicon of a target nucleic acid is typically a complementary copy.
  • Subsequent amplicons are copies that are created, after generation of the first amplicon, from the target nucleic acid or from the first amplicon.
  • a subsequent amplicon can have a sequence that is substantially complementary to the target nucleic acid or substantially identical to the target nucleic acid.
  • multiplex amplification refers to selective and non-random amplification of two or more target sequences within a sample using at least one target-specific primer. In some embodiments, multiplex amplification is performed such that some or all of the target sequences are amplified within a single reaction vessel.
  • the “plexity” or “plex” of a given multiplex amplification refers generally to the number of different target-specific sequences that are amplified during that single multiplex amplification. In some embodiments, the plexity can be about 12-plex, 24-plex, 48-plex, 96-plex, 192-plex, 384-plex, 768-plex, 1536-plex, 3072-plex, 6144-plex or higher.
  • amplified target sequences can be detected by several different methodologies (e.g., gel electrophoresis followed by densitometry, quantitation with a bioanalyzer or quantitative PCR, hybridization with a labeled probe, incorporation of biotinylated primers followed by avidin-enzyme conjugate detection, or incorporation of ³²P-labeled deoxynucleotide triphosphates into the amplified target sequence).
  • amplification site refers to a site in or on an array where one or more amplicons can be generated.
  • An amplification site can be further configured to contain, hold, or attach at least one amplicon that is generated at the site.
  • the target nucleic acids may be essentially any nucleic acid of known or unknown sequence.
  • Such target nucleic acids are typically derived from primary nucleic acids present in a sample, such as a biological sample.
  • the primary nucleic acids may originate as DNA or RNA.
  • DNA primary nucleic acids may originate in double-stranded DNA (dsDNA) form (e.g., genomic DNA, genomic DNA fragments, cell-free DNA, and the like) from a sample or may originate in single-stranded form from a sample.
  • RNA primary nucleic acids may be mRNA or non-coding RNA, e.g., microRNA or small interfering RNA.
  • a preparation of DNA fragments from an input sample may be single or double stranded DNA.
  • the primary nucleic acid molecules may represent the entire genetic complement of an organism, e.g., genomic DNA molecules which include both intron and exon sequences, as well as non-coding regulatory sequences such as promoter and enhancer sequences.
  • the primary nucleic acid molecules may represent the entire genetic complement of specific cells of an organism, e.g., genomic DNA molecules from tumor cells, which include both intron and exon sequences, as well as non-coding regulatory sequences such as promoter and enhancer sequences.
  • particular subsets of genomic DNA can be used, such as, for example, particular chromosomes, DNA associated with open chromatin, DNA associated with closed chromatin, or one or more specific sequences such as a region of a specific gene (e.g., targeted sequencing).
  • the primary nucleic acid molecules may represent a particular subset of DNA, e.g., DNA having a specific sequence that anneals with a primer such as one used for targeted sequencing or target enrichment.
  • a particular subset of DNA can be used, such as cell-free DNA, which can include DNA of the subject including DNA from normal cells, DNA from diseased cells such as tumor cells, and/or DNA from fetal cells.
  • the primary nucleic acid molecules may represent the entire transcriptome of cells of an organism, e.g., mRNA molecules.
  • the primary nucleic acid molecules may represent the entire transcriptome of specific cells of an organism, e.g., from tumor cells or for instance the cells of a tissue.
  • the primary nucleic acid molecules may represent a particular subset of mRNA, e.g., mRNA having a specific sequence that anneals with a primer such as one used for targeted sequencing or target enrichment.
  • a sample such as a biological sample, can include nucleic acid molecules obtained from biopsies, tumors, scrapings, swabs, blood, mucus, urine, plasma, semen, hair, laser capture micro-dissections, surgical resections, and other clinical or laboratory obtained samples.
  • the sample can be an epidemiological, agricultural, forensic, or pathogenic sample.
  • the sample can include cultured cells.
  • the sample can include nucleic acid molecules obtained from an animal such as a human or mammalian source.
  • the sample can include nucleic acid molecules obtained from a non-mammalian source such as a plant, bacteria, virus, or fungus.
  • the source of the nucleic acid molecules may be an archived or extinct sample or species.
  • sources of biological samples can include whole organisms as well as a sample obtained from a subject or a patient.
  • the biological sample can be obtained from any biological fluid or tissue and can be in a variety of forms, including liquid fluid and tissue, solid tissue, and preserved forms such as dried, frozen, and fixed forms.
  • the sample may be of any biological tissue, cells, or fluid.
  • Such samples include, but are not limited to, sputum, blood, serum, plasma, blood cells (e.g., white cells), ascitic fluid, urine, saliva, tears, vaginal fluid (discharge), washings obtained during a medical procedure (e.g., pelvic or other washings obtained during biopsy, endoscopy or surgery), tissue, nipple aspirate, core or fine needle biopsy samples, cell-containing body fluids, peritoneal fluid, and pleural fluid, or cells therefrom, and free floating nucleic acids such as cell-free circulating DNA.
  • Biological samples may also include sections of tissues such as frozen or fixed sections taken for histological purposes or micro-dissected cells or extracellular parts thereof.
  • the sample can be a blood sample, such as, for example, a whole blood sample.
  • the sample is an unprocessed dried blood spot (DBS) sample.
  • the sample is a formalin-fixed paraffin-embedded (FFPE) sample.
  • the sample is a saliva sample.
  • the sample is a dried saliva spot (DSS) sample.
  • Exemplary biological samples from which target nucleic acids can be derived include, for example, those from a eukaryote, for instance a mammal, such as a rodent, mouse, rat, rabbit, guinea pig, ungulate, horse, sheep, pig, goat, cow, cat, dog, primate, human or non-human primate; a plant, such as Arabidopsis thaliana, corn, sorghum, oat, wheat, rice, canola, or soybean; an alga, such as Chlamydomonas reinhardtii; a nematode, such as Caenorhabditis elegans; an insect, such as Drosophila melanogaster, mosquito, fruit fly, honey bee or spider; a fish, such as zebrafish; a reptile; an amphibian, such as a frog or Xenopus laevis; a Dictyostelium discoideum; a
  • Target nucleic acids can also be derived from a prokaryote such as a bacterium, Escherichia coli, Staphylococcus or Mycoplasma pneumoniae; an archaeon; a virus such as Hepatitis C virus or human immunodeficiency virus; or a viroid.
  • Target nucleic acids can be derived from a homogeneous culture or population of organisms described herein or alternatively from a collection of several different organisms, for example, in a community or ecosystem.
  • a biological sample includes tissue that is processed to obtain the desired primary nucleic acids.
  • cells are used to obtain the desired primary nucleic acids.
  • nuclei are used to obtain the desired primary nucleic acids.
  • the method can further include dissociating cells, and/or isolating nuclei from cells. Methods for isolating cells and nuclei from tissue are available (WO 2019/236599).
  • nucleic acids present in tissue, in cells, or in isolated nuclei can be processed depending on the desired read-out.
  • nucleic acids can be fixed during processing, and useful fixation methods are available (WO 2019/236599).
  • Fixation can be useful to preserve a sample or maintain contiguity of analytes from a sample, a cell, or a nucleus.
  • Fixation methods preserve and stabilize tissue, cell, and nucleus morphology and architecture, inactivate proteolytic enzymes, strengthen samples, cells, and nuclei so they can withstand further processing and staining, and protect against contamination.
  • examples of methods in which fixation can be used include, but are not limited to, whole genome sequencing of isolated nuclei and chromosome conformation capture methods such as Hi-C. Common methods of fixation include perfusion, immersion, freezing, and drying (Srinivasan et al., Am J Pathol. 2002 Dec; 161(6): 1961-1971. doi: 10.1016/S0002-9440(10)64472-0). In some embodiments such as whole genome sequencing, isolated nuclei can be processed to dissociate nucleosomes from DNA while leaving the nuclei intact, and methods for generating nucleosome-free nuclei are available (WO 2018/018008).
  • primary nucleic acids in bulk can be used to produce a sequencing library as described herein.
  • individual cells or nuclei can be used as sources of primary nucleic acids to obtain sequence information from single cells and nuclei.
  • single cell library preparation methods are known in the art, including, but not limited to, Drop-seq, Seq-well, and single cell combinatorial indexing ("sci-") methods. Companies providing single cell products and related technologies include, but are not limited to, Illumina, 10X genomics, Takara Biosciences, BD biosciences, Biorad, Icellbio, isoplexis, CellSee, nanoselect, and Dolomite bio.
  • Sci-seq is a methodological framework that employs split-pool barcoding to uniquely label the nucleic acid contents of large numbers of single cells or nuclei.
  • the number of nuclei or cells can be at least two.
  • the upper limit is dependent on the practical limitations of equipment (e.g., multi-well plates, number of indexes) used in other steps of the methods as described herein.
  • the number of nuclei or cells that can be used is not intended to be limiting and can number in the billions.
  • the target nucleic acids used in the methods and compositions of the present disclosure can be derived by fragmentation.
  • Random fragmentation refers to the fragmentation of a polynucleotide molecule from a primary nucleic acid sample in a non-ordered fashion by enzymatic, chemical, or mechanical methods. Such fragmentation methods are known in the art and use standard methods (Sambrook and Russell, Molecular Cloning, A Laboratory Manual, third edition). Moreover, random fragmentation is designed to produce fragments irrespective of the sequence identity or position of nucleotides comprising and/or surrounding the break.
  • the random fragmentation is by mechanical means such as nebulization or sonication to produce fragments of about 50 base pairs in length to about 1500 base pairs in length, for example, about 50-700 base pairs in length, about 50-400 base pairs in length. In some preferred embodiments, fragments are about 100 to 300 base pairs in length or about 100 to 200 base pairs in length.
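Random fragmentation followed by size selection can be sketched as a small simulation. This is a simplification and an assumption: breakpoints are drawn uniformly at random and without sequence preference, which only approximates real nebulization or sonication size distributions, and the molecule length, number of breaks, and seed are arbitrary illustrative values. The 100-300 bp window matches one of the preferred ranges above.

```python
import random


def random_fragment(length: int, n_breaks: int, rng: random.Random) -> list[int]:
    """Break a molecule of `length` bp at `n_breaks` distinct, uniformly random
    positions (an idealized stand-in for mechanical shearing) and return the
    resulting fragment lengths."""
    breakpoints = sorted(rng.sample(range(1, length), n_breaks))
    edges = [0] + breakpoints + [length]
    return [end - start for start, end in zip(edges, edges[1:])]


def size_select(fragments: list[int], low: int = 100, high: int = 300) -> list[int]:
    """Keep fragments whose length lies within [low, high] bp."""
    return [f for f in fragments if low <= f <= high]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
fragments = random_fragment(10_000, 49, rng)
selected = size_select(fragments)
```

Because breakpoints are distinct, every fragment has a positive length and the fragment lengths always sum to the input length; only the subset falling in the chosen window survives size selection.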
  • the DNA fragments are DNA library fragments. Any of the many library preparation protocols available are compatible with the methods described herein.
  • a library may be a whole-genome library or a targeted library.
  • a library includes, but is not limited to, a sequencing library.
  • a multitude of sequencing library methods are known to a skilled person (see, for example, Sequencing Methods Review, available on the world wide web at illumina.com/content/dam/illumina-marketing/documents/products/research_reviews/sequencing-methods-review.pdf).
  • library preparation may be for use with any of a variety of next generation sequencing platforms, such as for example, the sequencing by synthesis platform of ILLUMINA® or the ion semiconductor sequencing platform of ION TORRENTTM.
  • DNA fragments, including DNA library fragments, may be prepared from input sample material such that adapter sequences are ligated to fragments to facilitate downstream workflow steps, such as for example, degradation of the second strand, amplification, and/or sequencing.
  • adapter sequences e.g., sequences present in a universal adaptor
  • Methods for attaching adapters to a nucleic acid are known to the person skilled in the art. For example, the attachment can be through tagmentation using transposase complexes (WO 2016/130704), or through standard library preparation techniques using ligation (U.S. Pat. Pub. No. 2018/0305753). Addition of an adapter can occur before or after treatment of the target nucleic acid with a cytidine deaminase and/or a uracil DNA glycosylase.
  • Adapter sequences may include 5' and/or 3' adapter sequences.
  • An adapter may be attached to just one end of the DNA fragment, for example, 5' and/or 3' ends, or to both ends.
  • the term “adapter” and its derivatives, e.g., universal adapter refers generally to any linear oligonucleotide which can be attached to a target nucleic acid.
  • An adapter can be single-stranded or double-stranded DNA or can include both double-stranded and single-stranded regions.
  • An adapter can include a universal sequence that is substantially identical, or substantially complementary, to at least a portion of a primer, for example a universal primer; an index (also referred to herein as a barcode or tag) to assist with downstream error correction, identification, or sequencing; and/or a unique molecular identifier.
  • the adapter is substantially non-complementary to the 3' end or the 5' end of any target sequence present in the sample.
  • adapter sequences may have one or more phosphorothioate bonds at the 5' end of the adapter sequences.
  • suitable adapter lengths are in the range of about 6-100 nucleotides, about 12-60 nucleotides, or about 15-50 nucleotides in length.
  • the terms “adaptor” and “adapter” are used interchangeably.
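Attaching adapters to both ends of a fragment, with a check against the broadest adapter length range given above (about 6-100 nucleotides), can be sketched as a string operation. Assumptions: the adapters are represented as plain strings joined to the fragment ends, ligation or tagmentation chemistry is not modeled, and the adapter sequences used here are hypothetical placeholders, not real P5/P7 or vendor adapters.

```python
def attach_adapters(fragment: str, adapter_5p: str, adapter_3p: str) -> str:
    """Join hypothetical 5' and 3' adapters to both ends of a fragment.

    Raises ValueError if an adapter falls outside the 6-100 nt range.
    """
    for end, adapter in (("5'", adapter_5p), ("3'", adapter_3p)):
        if not 6 <= len(adapter) <= 100:
            raise ValueError(f"{end} adapter is {len(adapter)} nt; expected 6-100 nt")
    return adapter_5p + fragment + adapter_3p


# Placeholder adapters (hypothetical); these are NOT real P5/P7 or vendor sequences.
ADAPTER_5P = "GATTACAGATTACA"
ADAPTER_3P = "TGTAATCTGTAATC"
library_fragment = attach_adapters("ACGTACGT", ADAPTER_5P, ADAPTER_3P)
```

Attaching an adapter to only one end, as the text also allows, would correspond to passing the fragment through with a single adapter; the sketch keeps the two-ended case for simplicity.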
  • the term “universal,” when used to describe a nucleotide sequence refers to a region of sequence that is common to two or more nucleic acid molecules where the molecules also have regions of sequence that differ from each other.
  • Non-limiting examples of universal capture sequences include sequences that are identical to or complementary to P5 and P7 primers.
  • the terms “P5” and “P7” may be used when referring to a universal capture sequence or a capture oligonucleotide.
  • the terms “P5′” (P5 prime) and “P7′” (P7 prime) refer to the reverse complement of P5 and P7, respectively.
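The "prime" relationship defined above (P5′ is the reverse complement of P5) can be expressed as a one-line function, together with a toy check of whether a library strand's 3' end carries the prime of a surface capture oligo. Assumptions: exact matching only, sequences written 5' to 3', and `P5_PLACEHOLDER` is a hypothetical stand-in, not the real P5 sequence.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def prime(seq: str) -> str:
    """Return the "prime" of a sequence, i.e. its reverse complement (P5 -> P5')."""
    return seq.translate(_COMPLEMENT)[::-1]


def can_anneal_to_capture(capture_oligo: str, strand: str) -> bool:
    """Toy check (exact match only): a library strand can anneal to a surface
    capture oligo via its 3' end if that end carries the oligo's prime sequence."""
    return strand.endswith(prime(capture_oligo))


# Hypothetical placeholder standing in for P5; it is NOT the real P5 sequence.
P5_PLACEHOLDER = "AACCGGTTCA"
```

Because taking the prime twice returns the original sequence, the same function converts P5′ back to P5.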
  • any suitable universal capture sequence or a capture oligonucleotide can be used in the methods presented herein, and that the use of P5 and P7 are exemplary embodiments only.
  • Uses of capture oligonucleotides such as P5 and P7 or their complements on flowcells are known in the art, as exemplified by the disclosures of WO 2007/010251, WO 2006/064199, WO 2005/065814, WO 2015/106941, WO 1998/044151, and WO 2000/018957.
  • any suitable forward amplification primer can be useful in the methods presented herein for hybridization to a complementary sequence and amplification of a sequence.
  • any suitable reverse amplification primer can be useful in the methods presented herein for hybridization to a complementary sequence and amplification of a sequence.
  • One of skill in the art will understand how to design and use primer sequences that are suitable for capture and/or amplification of nucleic acids as presented herein.
  • DNA fragments can have an average strand length that is desired or appropriate for a particular application of the methods, compositions, or kits set forth herein.
  • the average strand length can be less than about 100,000 nucleotides, 50,000 nucleotides, 10,000 nucleotides, 5,000 nucleotides, 1,000 nucleotides, 500 nucleotides, 300 nucleotides, 200 nucleotides, 100 nucleotides, or 50 nucleotides.
  • the average strand length can be greater than about 10 nucleotides, 50 nucleotides, 100 nucleotides, 200 nucleotides, 500 nucleotides, 1,000 nucleotides, 5,000 nucleotides, 10,000 nucleotides, 50,000 nucleotides, or 100,000 nucleotides.
  • the average strand length for a population of DNA fragments can be in a range between any maximum and minimum value set forth above.
  • DNA fragments may be of a shorter length, for example, about 50 nucleotides to about 500 nucleotides in length, about 50 nucleotides to about 300 nucleotides in length, about 50 nucleotides to about 250 nucleotides in length, about 50 nucleotides to about 200 nucleotides in length, about 50 nucleotides to about 100 nucleotides in length, about 100 nucleotides to about 200 nucleotides in length, about 100 nucleotides to about 250 nucleotides in length, about 100 nucleotides to about 300 nucleotides in length, or about 100 nucleotides to about 500 nucleotides in length.
  • Shorter fragment length can be employed to maximize the overall performance of the enzymatic error-correction, by minimizing the number of potential false-positive uracils that may be present in any one individual DNA fragment. By minimizing the number of false positives in each fragment, the probability of successfully repairing library fragments and mediating their downstream amplification may be increased, thus preserving library quality and diversity.
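The reasoning that shorter fragments carry fewer potential false-positive uracils can be quantified with a simple binomial model. This is an assumption for illustration, not data from the disclosure: each unmethylated cytosine in a fragment is taken to be deaminated off-target independently with a hypothetical probability `p_offtarget`, so the expected number of uracils grows linearly with the cytosine count while the chance of a lesion-free fragment falls geometrically.

```python
def false_positive_uracil_stats(n_unmethylated_c: int, p_offtarget: float) -> tuple[float, float]:
    """Return (expected off-target uracils, probability of none) for one fragment.

    Assumes each unmethylated cytosine is deaminated independently with
    probability p_offtarget (a hypothetical rate, not a measured value).
    """
    if not 0.0 <= p_offtarget <= 1.0:
        raise ValueError("p_offtarget must be a probability")
    expected = n_unmethylated_c * p_offtarget  # mean of a binomial distribution
    p_none = (1.0 - p_offtarget) ** n_unmethylated_c  # binomial P(X = 0)
    return expected, p_none
```

With an assumed rate of 0.01, a fragment containing 20 unmethylated cytosines has an expected 0.2 off-target uracils and about an 82% chance of none, whereas one containing 100 has an expected 1.0 and only about a 37% chance of none.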
  • kits for undertaking a TraPR method as described herein for the reduction of false positive uracil residues due to deamination of unmethylated cytosines in an assay using a cytosine deaminase to deaminate methylated cytosines.
  • the present disclosure also provides kits for directly sequencing DNA fragments to identify abasic sites and/or uracil residues, for example as in the method of Jiang et al., a UdgX cross-linking and polymerase stalling sequencing (“Ucaps-seq”) method that detects dU at single-nucleotide resolution (Jiang et al., 2022, J Am Chem Soc 144: 1323-1331).
  • a kit may include one or more of a cytosine deaminase, a uracil DNA glycosylase (UDG), a high-fidelity polymerase, a deoxycytidyl transferase, primers, and/or dNTPs in a suitable packaging material in an amount sufficient for at least one reaction.
  • the deoxycytidyl transferase is Rev1.
  • the primer is complementary to the 3' end library adapter and is capable of binding to single stranded DNA library fragments comprising 5' end and 3' end library adapters.
  • a kit may also include a dCTP derivative.
  • a cytosine deaminase may be an altered cytosine deaminase, including, but not limited to any of those described herein or as described in International Patent Application No. PCT/US2023/017846 (“Altered Cytidine Deaminases and Methods of Use”), filed April 7, 2023, which is hereby incorporated by reference in its entirety.
  • a kit may include one or more other components.
  • other components include, for example, a PCR polymerase, PCR master mix, a DNA denaturation solution (such as for example, NaOH, formamide, or DMSO), a cytosine deaminase buffer, a UDG reaction buffer, DNA purification beads for purification steps, a positive control polynucleotide, such as a double-stranded DNA including one or more known modified cytosines for use in measuring efficiency, or a negative control polynucleotide, such as a double-stranded DNA including unmodified cytosines.
  • other reagents such as buffers and solutions are also included. Instructions for use of the packaged components are also typically included.
  • the term "package” refers to a solid matrix or material such as glass, plastic, paper, foil, and the like, capable of holding within fixed limits the polypeptides.
  • "Instructions for use” typically include a tangible expression describing the reagent concentration or at least one assay method parameter, such as the relative amounts of reagent and sample to be admixed, maintenance time periods for reagent/sample admixtures, temperature, buffer conditions, and the like.
  • packaging material refers to one or more physical structures used to house the contents of the kit.
  • the packaging material is constructed by known methods, preferably to provide a sterile, contaminant-free environment.
  • the packaging material has a label which indicates that the components can be used for reducing uracil residues due to deamination of unmethylated cytosines in an assay using a cytosine deaminase to deaminate methylated cytosines.
  • Aspect 1 is a method of reducing false positive detection of 5-methylcytosine (5mC) and/or 5-hydroxymethylcytosine (5hmC) due to deamination of unmethylated cytosines, the method comprising:
  • Aspect 2 is a method of Aspect 1, wherein the deamination of unmethylated cytosines is due to deamination of unmethylated cytosines in an assay using a cytosine deaminase to selectively deaminate methylated cytosines.
  • Aspect 3 is a method of Aspect 1, wherein prior to step (a), the sample comprising single stranded DNA library fragments is contacted with a deaminase to selectively deaminate methylated cytosine.
  • Aspect 4 is a method of replicating uracil residues as cytosine residues, the method comprising:
  • Aspect 5 is a method of any one of Aspects 1 to 4, wherein the deoxycytidyl transferase comprises the Rev1 enzyme.
  • Aspect 6 is a method of any one of Aspects 1 to 5, wherein the high fidelity polymerase comprises T4 DNA polymerase or E. coli polymerase.
  • Aspect 7 is a method of any one of Aspects 1 to 6, wherein treating the double stranded DNA library fragments to digest the first library strand comprising abasic sites comprises treating the double stranded DNA library fragments with heat and/or NaOH.
  • Aspect 8 is a method of any one of Aspects 1 to 7, wherein the single stranded DNA library fragments are about 100 bp to about 200 bp in length.
  • Aspect 9 is a method of any one of Aspects 1 to 8, further comprising subjecting the complementary second strands in which uracil residues are replicated as cytosine residues to polymerase chain reaction (PCR) amplification.
  • Aspect 10 is a method of any one of Aspects 1 to 9, further comprising sequencing the complementary second strands in which uracil residues are replicated as cytosine residues.
  • Aspect 11 is a method of any one of Aspects 1 to 9 further comprising processing the complementary second strands in which uracil residues are replicated as cytosine residues to produce a sequencing library.
  • Aspect 12 is the method of Aspect 11, further comprising sequencing the sequencing library.
  • Aspect 13 is a method of any one of Aspects 1 to 12, wherein the cytosine deaminase comprises an altered cytosine deaminase.
  • Aspect 14 is the method of Aspect 13, wherein the altered cytosine deaminase is a member of the AID subfamily, the APOBEC1 subfamily, the APOBEC2 subfamily, the APOBEC3A subfamily, the APOBEC3B subfamily, the APOBEC3C subfamily, the APOBEC3D subfamily, the APOBEC3F subfamily, the APOBEC3G subfamily, the APOBEC3H subfamily, or the APOBEC4 subfamily, or an alteration thereof.
  • Aspect 15 is the method of Aspect 13, wherein the altered cytosine deaminase comprises an altered APOBEC3A.
  • Aspect 16 is a method of any one of Aspects 13 to 15, wherein the altered cytidine deaminase comprises an amino acid substitution mutation at a position functionally equivalent to (Tyr/Phe)130 in a wild-type APOBEC3A protein and/or an amino acid substitution mutation at a position functionally equivalent to Tyr132 in a wild-type APOBEC3A protein.
  • Aspect 17 is a method of any one of Aspects 13 to 16, wherein the altered cytidine deaminase comprises amino acid substitution mutations at positions functionally equivalent to (Tyr/Phe)130 and Tyr132 in a wild-type APOBEC3A protein.
  • Aspect 18 is a method of any one of Aspects 13 to 17, wherein the altered cytidine deaminase comprises an amino acid substitution mutation at a position functionally equivalent to (Tyr/Phe)130 in a wild-type APOBEC3A protein, wherein the substitution mutation is (Tyr/Phe)130Trp.
  • Aspect 19 is the method of Aspect 17 or 18, wherein the (Tyr/Phe)130 is Tyr130, and the wild-type APOBEC3A protein is SEQ ID NO: 12.
  • Aspect 20 is a method of any one of Aspects 16 to 19, wherein the substitution mutation at the position functionally equivalent to Tyr130 comprises a mutation to alanine, glycine, phenylalanine, histidine, glutamine, methionine, asparagine, lysine, valine, aspartic acid, glutamic acid, serine, cysteine, proline, arginine, or threonine.
  • Aspect 21 is a method of any one of Aspects 16 to 20, wherein the substitution mutation at the position functionally equivalent to Tyr130 comprises a mutation to Ala, Val, or Trp.
  • Aspect 22 is the method of any one of Aspects 16 to 21, wherein the substitution mutation at the position functionally equivalent to Tyr132 comprises a mutation to His, Arg, Gln, or Lys.
  • Aspect 23 is a method of any one of Aspects 13 to 22, wherein the altered cytidine deaminase converts 5-methyl cytosine (5mC) to thymidine (T) by deamination at a greater rate than conversion of cytosine (C) to uracil (U) by deamination.
  • Aspect 24 is the method of Aspect 23, wherein the rate is at least 100-fold greater.
  • Aspect 25 is a method of any one of Aspects 13 to 24, wherein the altered cytidine deaminase converts cytosine (C) to uracil (U) by deamination and 5-methyl cytosine (5mC) to thymidine (T) by deamination at a greater rate than conversion of 5-hydroxymethyl cytosine (5hmC) to 5-hydroxymethyl uracil (5hmU) by deamination.
  • Aspect 26 is the method of Aspect 25, wherein conversion of 5hmC to 5hmU by deamination is undetectable.
  • Aspect 27 is a method of any one of Aspects 13 to 26, wherein the altered cytidine deaminase comprises a ZDD motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C (SEQ ID NO: 1).
  • Aspect 28 is a method of any one of Aspects 13 to 27, wherein the altered cytidine deaminase is a member of the APOBEC3A subfamily and comprises a ZDD motif HXEX24SW(S/T)PCX[2-4]CX6FX8LX5R(L/I)YX[8-11]LX2LX[10]M (SEQ ID NO:2), wherein the amino acid substitution mutation at the position functionally equivalent to (Tyr/Phe)130 of the wild-type APOBEC3A protein is the Tyr (Y) amino acid of the ZDD motif.
  • Aspect 29 is a method of any one of Aspects 13 to 28, wherein the altered cytidine deaminase is a member of the APOBEC3A subfamily and comprises X[16-26]-GRXXTXLCYXV-X15-GXXXN-X12-HAEXXF-X14-YXXTWXXSWSPC-X[2-4]-CA-X5-FL-X7-LXIXXXR(L/I)Y-X8-GLXXLXXXG-X5-M-X4-FXXCWXXFV-X6-FXPW-X13-LXXI-X[2-6] (SEQ ID NO:3).
  • Aspect 30 is a method of any one of Aspects 13 to 29, wherein the altered cytidine deaminase is a member of the APOBEC3A family and comprises SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, or SEQ ID NO:11.
  • Aspect 31 is a kit comprising: a cytosine deaminase; a uracil DNA glycosylase (UDG); a high fidelity polymerase; and/or a deoxycytidyl transferase.
  • Aspect 32 is the kit of Aspect 31, further comprising: dNTPs; and a primer complementary to the 3' end library adapter capable of binding to single stranded DNA library fragments comprising 5' end and 3' end library adapters.
  • Aspect 33 is the kit of Aspect 31 or 32, further comprising an unnatural dCTP derivative.
  • Aspect 34 is the kit of any one of Aspects 31 to 33, wherein the cytosine deaminase is an altered APOBEC.
  • Enzymatic sequencing methods targeting 5-methylcytosine (5m-dC) suffer from cross-reactivity with cytosine (dC). This is due to the use of cytidine deaminases, which target both dCs and 5m-dCs to give uracil (dU) and thymine (dT), respectively.
  • Recently developed engineered cytidine deaminase variants with increased selectivity for 5m-dC over dC allow direct sequencing of methylated regions by 5m-dC to dT conversion. However, the false positive rate of these enzymes (dC to dU conversion) is still too high for deployment in a workflow.
  • The false positive rate associated with cytidine deaminases, including engineered variants with increased selectivity for 5m-dC over dC, is addressed by repairing the uracils with Rev1, effectively converting false positives into dC to dG conversions.
  • First, the DNA is incubated with Uracil DNA Glycosylase (UDG), which generates abasic sites where dC to dU conversions have occurred.
  • Then, the complement of the library is generated using a high-fidelity polymerase supplemented with a deoxycytidyl transferase, such as Rev1.
  • dC is selectively incorporated opposite the abasic site. After digestion of the original strand, the complement can be read to assign methylation calls.
  • 5m-dC sites will be recognized by the expected conversion to dT, while the false positives arising from dC to dU conversion can be identified by a defined signature, a dC to dG conversion.
  • One established method for 5m-dC sequencing is EM-seq.
  • This fully enzymatic technique uses a two-step enzymatic reaction: the DNA is treated first with TET and then with APOBEC. In the first step, TET oxidizes 5m-dC to a mixture of 5-hydroxymethylcytosine (5hm-dC), 5-formylcytosine (5f-dC), and 5-carboxycytosine (5ca-dC). The purpose of this step is to protect the methylated cytosines from cross-reactivity in the following step, as APOBEC is active on both 5m-dC and C.
  • APOBEC is also reactive on 5hm-dC, which is why 5hm-dC is additionally reacted with a glucosyltransferase to generate 5-glucosylhydroxymethylcytosine (5gm-dC).
  • In the second step, the DNA is treated with APOBEC, and all unprotected dCs are converted to dUs.
  • The overall result is that a site that was methylated before the dual treatment still sequences as C. Conversely, any site that was not methylated before treatment reads as T (because U is read as T during sequencing).
  • The main disadvantage of this method is the generation of a so-called 3-base genome, which is extremely burdensome because of high computational demands, difficulties in variant calling, and poor sequencing performance.
  • Cytidine deaminase variants with extraordinary selectivity for 5m-dC over dC have recently been developed.
  • Engineered cytidine deaminases include any of the engineered APOBEC3A cytidine deaminases described in
  • This example uses a tandem enzymatic treatment.
  • UDG is a monofunctional glycosylase that, upon recognizing dU in either ssDNA or dsDNA, cleaves the N-glycosidic bond to release uracil and yield an abasic (AP) site (Rusmintratip and Sowers, 2000, PNAS, 97(26):14183-14187).
  • A DNA repair enzyme is then used to take advantage of the dC to AP conversion.
  • AP sites are repaired via the base excision repair pathway (BER) (reviewed in Robertson et al., 2009, Cell Mol Life Sci, 66(6):981-93).
  • Another mechanism of repair is mutagenic replication by polymerases. Most polymerases stall when they encounter an AP site. A few, such as the highly mutagenic DNA polymerase θ, can read through the lesion by inserting dA according to what is known as the A-rule (Laverty et al., 2017, ACS Chem Biol, 12(6):1584-1592). Unfortunately, introducing dA opposite the AP site would still mean that dU is ultimately read as dT; thus, conventional polymerases would not produce the change of base needed for selective detection.
  • This example uses a specific class of polymerases (Family Y), often referred to as deoxycytidyl transferases, that are specific for the transfer of dC opposite AP sites.
  • The defining member of this polymerase family is Rev1 (Nair et al., 2005, Science, 309(5744):2219-2222). Rev1 is rarely seen replicating full strands; instead, its role is to complement other polymerases to prevent their stalling at translesion sites.
  • Rev1 can bypass the AP site by using an arginine residue (R324) as a template, binding the incoming dC nucleotide through nitrogens of its guanidinium group and helping transfer it onto the nascent DNA strand (Weaver et al., 2020, PNAS, 117(41):25494-25504).
  • Libraries are first prepared from the input sample material such that adapter sequences are ligated to library fragments to facilitate downstream workflow steps.
  • Many possible library preparation protocols are compatible with the described invention.
  • Libraries may be prepared targeting a shorter insert size, for example 100-200 bp, to minimize the number of potential false-positive uracils in any individual library fragment. Minimizing the number of false positives in each fragment may increase the probability of successfully repairing library fragments and mediating their downstream amplification, thus preserving library quality and diversity.
  • Heating DNA containing abasic sites may be detrimental to performance, as abasic sites are labile and heating may lead to strand cleavage. Therefore, following the generation of abasic sites, care should be taken with the DNA sample to minimize degradation.
  • The DNA sample may be mixed with a primer that binds to the 3’ library adapter sequence to facilitate second strand synthesis, along with a polymerase cocktail of a high fidelity polymerase mixed with Rev1 polymerase, and a mixture of dNTPs.
  • A second strand will be generated such that “C” residues are inserted preferentially opposite abasic sites.
  • Appropriate high fidelity polymerases may include T4 DNA polymerase or E. coli DNA polymerase, among others (Tanguy Le Gac et al., 2004, J Mol Biol, 336(5):1023-34).
  • The double-stranded DNA molecule may be subjected to PCR with a standard high fidelity PCR polymerase.
  • The DNA sample may first be treated with heat and/or a dilute solution of NaOH to cleave the original library fragments at the abasic sites, preventing their amplification.
  • Both 5m-dC and 5hm-dC are converted: 5m-dC is converted to dT, while 5hm-dC is converted to 5hm-dU.
  • A possible issue with the approach described here is that UDG could be active on 5hm-dU.
  • Although some uracil-DNA glycosylases from higher organisms have indeed shown activity on 5hm-dU, bacterial (E. coli) UDG has been shown to be active only on dU.
  • HMCES has been found to form a stable protein-DNA crosslink via a thiazolidine, ultimately protecting AP sites from hydrolysis (Thompson et al., 2019, Nat Struct Mol Biol, 26(7):613-618).
  • Other methods for stabilizing or trapping abasic sites may draw on the literature on AP site visualization.
  • Transformations that stabilize the AP site and prevent β-elimination include reduction of the aldehyde to an alcohol with a reducing agent, oxidation to deoxyribonolactone with an oxidizing agent, formation of a Schiff base with hydroxylamine, hydrazine, and their derivatives, and formation of a thiazolidine with cysteine and its derivatives.
  • SEQ ID NO: 1 zinc-binding motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C
  • SEQ ID NO: 3 altered cytidine deaminase includes the amino acids of a member of the APOBEC3A subfamily:
  • SEQ ID NO: 4 altered cytidine deaminase includes the amino acids of a member of the APOBEC3A subfamily:
  • FIG. 2A Reference Sequence
  • FIG. 2A ATCATCGACACGTACGACTAGCTATACTAGCTAGCTATATGATCGATAT
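The read-out logic described above (5m-dC read as T, repaired false positives showing a dC to dG signature) can be sketched as a simple per-position classifier. This is a hypothetical illustration: the function name `classify_cytosines`, the label strings, and the assumption of an indel-free alignment already expressed in the orientation of the original strand are illustrative assumptions, not part of the described method.

```python
def classify_cytosines(reference: str, read: str) -> dict[int, str]:
    """Label each reference C (0-based position) by the base observed in an aligned read.

    Assumes (hypothetically) that the read is aligned without indels and has been
    reverse-complemented back into the orientation of the original strand.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be the same length (no indels assumed)")
    labels: dict[int, str] = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue  # only cytosines carry methylation information here
        if read_base == "T":
            labels[i] = "5mC"  # expected 5m-dC -> dT conversion
        elif read_base == "G":
            # dC -> dU -> AP site -> C inserted opposite by Rev1, seen as G on this strand
            labels[i] = "repaired false positive"
        elif read_base == "C":
            labels[i] = "unmethylated"  # cytosine left untouched by the deaminase
        else:
            labels[i] = "unexplained"
    return labels
```

A genuine C>G variant in the sample would carry the same signature as a repaired false positive, so a practical caller would likely need to account for known variants; that refinement is outside this sketch.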

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Described herein are methods, compositions, and kits for removing false positive uracils due to the deamination of unmethylated cytosines in assays using engineered cytosine deaminases to deaminate methylated cytosines. The methods, compositions, and kits utilize a dual enzymatic process. After cytidine deaminase treatment, the DNA is first incubated with Uracil DNA Glycosylase (UDG), which generates abasic sites where dC to dU conversions have occurred. The DNA is then incubated with a high-fidelity polymerase supplemented with a deoxycytidyl transferase, such as Rev1, to repair the lesion with the installation of a cytidine.

Description

FALSE POSITIVE REDUCTION BY TRANSLESION POLYMERASE REPAIR
CONTINUING APPLICATION DATA
This application claims the benefit of U.S. Provisional Application Serial No. 63/469,860, filed May 31, 2023, which is incorporated by reference herein.
SEQUENCE LISTING
This application contains a Sequence Listing electronically submitted via EFS-Web to the United States Patent and Trademark Office as an XML file entitled "0531.002563W001.xml" having a size of 26,495 bytes and created on May 28, 2024. The information contained in the Sequence Listing is incorporated by reference herein.
FIELD OF INVENTION
Embodiments of the present disclosure relate to the prevention of false positive detection of 5-methylcytosine (5mC) and/or 5-hydroxymethylcytosine (5hmC) due to the deamination of unmethylated cytosines in assays using cytosine deaminases to selectively deaminate methylated cytosines. In particular, embodiments of the methods, compositions, and kits provided herein utilize a dual enzymatic process to reduce the likelihood that such false positive conversions are detected in the final sequenced library. After cytidine deaminase treatment, the DNA is first incubated with Uracil DNA Glycosylase (UDG), which generates abasic sites where dC to dU conversions have occurred. The DNA is then incubated with a high-fidelity polymerase supplemented with a deoxycytidyl transferase, such as Rev1, to repair the lesion by installing a cytidine.
BACKGROUND
Modified DNA cytosines, including 5-methylcytosine (5mC), are well-studied epigenetic modifications that play fundamental roles in human development and disease. The genome-wide distribution of 5mC differs between tissue types and between healthy and diseased states. In recent years, 5mC has also gained prominence as a tool for clinical diagnostics. For example, its distribution in cell-free DNA (cfDNA) obtained from a liquid biopsy can be used for the tissue-specific prediction of early-stage cancer. As a result, there has been an intense focus on developing methods for mapping 5mC at single-base resolution with minimal loss of sample DNA quantity, quality, and complexity.
Treating 5mC bases with a cytosine deaminase yields thymine bases, providing a signal for assessing the sequence-specific methylation state of cytosines when sequenced. APOBEC3A is a cytidine deaminase that recognizes single-stranded DNA and catalyzes the deamination of cytosine (C) to uracil (U), 5-methylcytosine (5mC) to thymine (T), and 5-hydroxymethylcytosine to 5-hydroxymethyluracil. Protein engineering of APOBEC3A has produced mutant APOBEC proteins that selectively deaminate 5mC with reduced activity towards C; however, residual activity for deamination of C remains. This undesirable deamination of unmethylated cytosines results in false positive detection of 5mC (and 5hmC), because the resulting uracil bases are read as thymine bases in the assay.
SUMMARY OF THE INVENTION
The present disclosure provides a method of reducing false positive detection of 5-methylcytosine (5mC) and/or 5-hydroxymethylcytosine (5hmC) due to deamination of unmethylated cytosines, the method comprising:
(a) providing a sample comprising single stranded DNA library fragments comprising deaminated methylated cytosines, wherein the single stranded DNA library fragments comprise 5' end and 3' end library adapters;
(b) contacting the sample with a uracil DNA glycosylase (UDG), wherein the UDG deglycosylates uracil residues to form abasic sites, resulting in single stranded DNA library fragments with abasic sites;
(c) contacting the sample comprising single stranded DNA library fragments with abasic sites with a mixture comprising: a high fidelity polymerase; a deoxycytidyl transferase; dNTPs; and a primer complementary to the 3' end library adapter under conditions to provide for second strand synthesis, wherein the deoxycytidyl transferase incorporates cytosines opposite abasic sites, resulting in double stranded DNA library fragments comprising a first strand comprising the single stranded DNA library fragment with abasic sites and a complementary second strand comprising cytosines opposite the abasic sites;
(d) treating the double stranded DNA library fragments to digest the first strand comprising the single stranded DNA library fragment with abasic sites resulting in a sample comprising complementary second strands in which uracil residues are replaced with cytosine residues.
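As a toy illustration of steps (a) to (d), the strand chemistry can be modelled on plain strings. This sketch assumes complete UDG excision, that Rev1 always inserts C opposite an abasic site, and that the high-fidelity polymerase copies every other base without error; the `_` placeholder for an AP site, the 0-based position sets, and all function names are hypothetical.

```python
AP = "_"  # hypothetical placeholder symbol for an abasic (AP) site
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def deaminate(ref: str, methylated: set[int], false_positive: set[int]) -> str:
    """Deaminase treatment preceding step (a): 5mC -> T, and C -> U where residual activity hits."""
    out = []
    for i, base in enumerate(ref):
        if base == "C" and i in methylated:
            out.append("T")
        elif base == "C" and i in false_positive:
            out.append("U")
        else:
            out.append(base)
    return "".join(out)


def udg(strand: str) -> str:
    """Step (b): UDG excises every uracil, leaving an abasic site."""
    return strand.replace("U", AP)


def second_strand(template: str) -> str:
    """Step (c): primer extension; returns the new strand written 5'->3'.

    Rev1 is modelled as inserting C opposite each AP site; all other bases are
    copied by Watson-Crick pairing.
    """
    return "".join("C" if b == AP else COMPLEMENT[b] for b in reversed(template))


def read_in_reference_orientation(strand: str) -> str:
    """After step (d) only the second strand survives; reverse-complement it to align with the reference."""
    return "".join(COMPLEMENT[b] for b in reversed(strand))


ref = "ACGTACGTCC"
first = udg(deaminate(ref, methylated={1}, false_positive={5}))
read = read_in_reference_orientation(second_strand(first))  # first strand is discarded (step d)
```

Reading the surviving second strand back in reference orientation turns the methylated position into T and the falsely deaminated position into G, while untouched cytosines stay C.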
In some aspects, the deamination of unmethylated cytosines is due to deamination of unmethylated cytosines in an assay using a cytosine deaminase to selectively deaminate methylated cytosines.
In some aspects, prior to step (a), the sample comprising single stranded DNA library fragments is contacted with a deaminase to selectively deaminate methylated cytosines.
The disclosure provides a method of replicating uracil residues as cytosine residues, the method comprising:
(a) providing a sample comprising single stranded DNA library fragments comprising 5' end and 3' end library adapters;
(b) contacting the sample with a uracil DNA glycosylase (UDG), wherein the UDG deglycosylates uracil residues to form abasic sites, resulting in single stranded DNA library fragments with abasic sites;
(c) contacting the sample comprising single stranded DNA library fragments with abasic sites with a mixture comprising: a high fidelity polymerase; a deoxycytidyl transferase; dNTPs; and a primer complementary to the 3' end library adapter under conditions to provide for second strand synthesis, wherein the deoxycytidyl transferase incorporates cytosines opposite abasic sites, resulting in double stranded DNA library fragments comprising a first strand comprising the single stranded DNA library fragment with abasic sites and a complementary second strand comprising cytosines opposite the abasic sites;
(d) treating the double stranded DNA library fragments to digest the first strand comprising the single stranded DNA library fragment with abasic sites resulting in a sample comprising complementary second strands in which uracil residues are replaced with cytosine residues.
With a method as described herein, the deoxycytidyl transferase comprises the Rev1 enzyme.
With a method as described herein, the high fidelity polymerase comprises T4 DNA polymerase or E. coli DNA polymerase.
With a method as described herein, treating the double stranded DNA library fragments to digest the first strand comprising abasic sites comprises treating the double stranded DNA library fragments with heat and/or NaOH.
With a method as described herein, the single stranded DNA library fragments are about 100 bp to about 200 bp in length.
With a method as described herein, the method further comprises subjecting the sample comprising the complementary second strands, in which uracil residues are replicated as cytosine residues, to polymerase chain reaction (PCR) amplification.
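The rationale for short inserts (fewer false-positive uracils per fragment, so more fragments are fully repaired and amplified) can be made concrete with a simple independence model. This is a hypothetical sketch: the per-cytosine false positive rate, the fraction of positions that are unmethylated C, and the per-site Rev1 bypass probability are made-up parameters, and a fragment is assumed to be lost if synthesis stalls at any abasic site.

```python
def expected_false_uracils(length_bp: int, c_fraction: float, fp_rate: float) -> float:
    """Mean number of falsely deaminated cytosines (later abasic sites) per fragment."""
    return length_bp * c_fraction * fp_rate


def p_fragment_fully_copied(length_bp: int, c_fraction: float,
                            fp_rate: float, bypass_prob: float) -> float:
    """Probability that second strand synthesis runs through the whole fragment.

    Each unmethylated C independently either escapes deamination (1 - fp_rate),
    or is deaminated, becomes an AP site, and is successfully bypassed
    (fp_rate * bypass_prob).
    """
    n_c = round(length_bp * c_fraction)
    per_c = (1.0 - fp_rate) + fp_rate * bypass_prob
    return per_c ** n_c


# Hypothetical parameters chosen only for illustration.
short_insert = p_fragment_fully_copied(150, c_fraction=0.2, fp_rate=0.01, bypass_prob=0.9)
long_insert = p_fragment_fully_copied(400, c_fraction=0.2, fp_rate=0.01, bypass_prob=0.9)
```

Under these assumptions the survival probability decays geometrically with the number of cytosines per fragment, which is why a shorter insert preserves more of the library.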
With a method as described herein, the method further comprises sequencing the complementary second strands in which uracil residues are replicated as cytosine residues.
With a method as described herein, the method further comprises processing the complementary second strands in which uracil residues are replicated as cytosine residues to produce a sequencing library. In some aspects, the method further comprises sequencing the sequencing library.
With a method as described herein, the cytosine deaminase comprises an altered cytosine deaminase. In some aspects, the altered cytosine deaminase is a member of the AID subfamily, the APOBEC1 subfamily, the APOBEC2 subfamily, the APOBEC3A subfamily, the APOBEC3B subfamily, the APOBEC3C subfamily, the APOBEC3D subfamily, the APOBEC3F subfamily, the APOBEC3G subfamily, the APOBEC3H subfamily, or the APOBEC4 subfamily, or an alteration thereof. In some aspects, the altered cytosine deaminase comprises an altered APOBEC3A.
In some aspects, the altered cytidine deaminase comprises an amino acid substitution mutation at a position functionally equivalent to (Tyr/Phe)130 in a wild-type APOBEC3A protein and/or an amino acid substitution mutation at a position functionally equivalent to Tyr132 in a wild-type APOBEC3A protein.
In some aspects, the altered cytidine deaminase comprises amino acid substitution mutations at positions functionally equivalent to (Tyr/Phe)130 and Tyr132 in a wild-type APOBEC3A protein.
In some aspects, the altered cytidine deaminase comprises an amino acid substitution mutation at a position functionally equivalent to (Tyr/Phe)130 in a wild-type APOBEC3A protein, wherein the substitution mutation is (Tyr/Phe)130Trp.
In some aspects, the (Tyr/Phe)130 is Tyr130, and the wild-type APOBEC3A protein is SEQ ID NO: 12.
In some aspects, the substitution mutation at the position functionally equivalent to Tyr130 comprises a mutation to alanine, glycine, phenylalanine, histidine, glutamine, methionine, asparagine, lysine, valine, aspartic acid, glutamic acid, serine, cysteine, proline, arginine, or threonine.
In some aspects, the substitution mutation at the position functionally equivalent to Tyr130 comprises a mutation to Ala, Val, or Trp.
In some aspects, the substitution mutation at the position functionally equivalent to Tyr132 comprises a mutation to His, Arg, Gln, or Lys.
In some aspects, the altered cytidine deaminase converts 5-methyl cytosine (5mC) to thymidine (T) by deamination at a greater rate than conversion of cytosine (C) to uracil (U) by deamination. In some aspects, the rate is at least 100-fold greater.
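A back-of-the-envelope sketch shows why even a 100-fold preference for 5mC can leave a noticeable share of false C-to-T calls when most cytosines are unmethylated. It assumes, hypothetically, that the fraction of each base converted scales with its relative rate (a reasonable approximation only far from reaction completion); the function name and input values are illustrative, not data from the source.

```python
def false_call_fraction(methylated_fraction: float, mc_conversion: float,
                        selectivity: float) -> float:
    """Fraction of C->T calls that originate from unmethylated C.

    Assumption: conversion of unmethylated C equals the 5mC conversion divided
    by the rate selectivity (linear-rate approximation).
    """
    c_conversion = mc_conversion / selectivity
    true_t = methylated_fraction * mc_conversion
    false_t = (1.0 - methylated_fraction) * c_conversion
    return false_t / (true_t + false_t)


# Hypothetical inputs: 5% of cytosines methylated, 90% of 5mC converted, 100-fold selectivity.
fraction = false_call_fraction(0.05, 0.9, 100)
```

With these illustrative inputs roughly 16% of C-to-T calls would be false positives, which motivates removing the residual dC to dU events rather than relying on selectivity alone.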
In some aspects, the altered cytidine deaminase converts cytosine (C) to uracil (U) by deamination and 5-methyl cytosine (5mC) to thymidine (T) by deamination at a greater rate than conversion of 5-hydroxymethyl cytosine (5hmC) to 5-hydroxymethyl uracil (5hmU) by deamination. In some aspects, conversion of 5hmC to 5hmU by deamination is undetectable.
In some aspects, the altered cytidine deaminase comprises a ZDD motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C (SEQ ID NO: 1).
In some aspects, the altered cytidine deaminase is a member of the APOBEC3A subfamily and comprises a ZDD motif HXEX24SW(S/T)PCX[2-4]CX6FX8LX5R(L/I)YX[8-11]LX2LX[10]M (SEQ ID NO:2), wherein the amino acid substitution mutation at the position functionally equivalent to (Tyr/Phe)130 of the wild-type APOBEC3A protein is the Tyr (Y) amino acid of the ZDD motif.
In some aspects, the altered cytidine deaminase is a member of the APOBEC3A subfamily and comprises X[16-26]-GRXXTXLCYXV-X15-GXXXN-X12-HAEXXF-X14-YXXTWXXSWSPC-X[2-4]-CA-X5-FL-X7-LXIXXXR(L/I)Y-X8-GLXXLXXXG-X5-M-X4-FXXCWXXFV-X6-FXPW-X13-LXXI-X[2-6] (SEQ ID NO:3).
In some aspects, the altered cytidine deaminase is a member of the APOBEC3A family and comprises SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, or SEQ ID NO:11.
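The motif notation used for SEQ ID NOs: 1 to 3 (X for any residue, X[a-b], X[n] or Xn for runs of arbitrary residues, and [P/A/V] or (L/I) for a choice among residues) maps naturally onto regular expressions. The converter below is a hypothetical sketch under that reading of the notation; it is not a tool from the source, and the toy sequence is made up rather than a real APOBEC3A sequence.

```python
import re

# One token of the motif notation per alternative (order matters: longest X forms first).
_TOKEN = re.compile(
    r"X\[(\d+)-(\d+)\]"                # X[a-b]: between a and b arbitrary residues
    r"|X\[(\d+)\]"                     # X[n]: exactly n arbitrary residues
    r"|X(\d+)"                         # Xn: exactly n arbitrary residues
    r"|X"                              # X: one arbitrary residue
    r"|[\[(]([A-Z](?:/[A-Z])+)[\])]"   # [P/A/V] or (L/I): one of the listed residues
    r"|([A-Z])"                        # a literal amino acid
)


def motif_to_regex(motif: str) -> str:
    """Translate the patent's motif notation into a Python regular expression string."""
    s = re.sub(r"\s+", "", motif)  # drop stray whitespace
    parts = []
    pos = 0
    while pos < len(s):
        if s[pos] == "-":  # separator between motif elements
            pos += 1
            continue
        m = _TOKEN.match(s, pos)
        if m is None:
            raise ValueError(f"cannot parse motif at {s[pos:]!r}")
        lo, hi, exact_bracket, exact_bare, choice, literal = m.groups()
        if lo is not None:
            parts.append(f".{{{lo},{hi}}}")
        elif exact_bracket is not None:
            parts.append(f".{{{exact_bracket}}}")
        elif exact_bare is not None:
            parts.append(f".{{{exact_bare}}}")
        elif choice is not None:
            parts.append("[" + choice.replace("/", "") + "]")
        elif literal is not None:
            parts.append(literal)
        else:  # bare X
            parts.append(".")
        pos = m.end()
    return "".join(parts)


# Usage on a made-up sequence (not a real protein):
zdd = re.compile(motif_to_regex("H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C"))
toy = "MK" + "HPE" + "A" * 25 + "PC" + "GG" + "C" + "LL"
hit = zdd.search(toy)
```

Compiling the motif once and using `search` locates the first match anywhere in a longer protein sequence; a motif scan of this kind only checks the pattern and says nothing about deaminase activity or selectivity.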
The present disclosure provides a kit comprising: a cytosine deaminase; a uracil DNA glycosylase (UDG); a high fidelity polymerase; and/or a deoxycytidyl transferase.
In some aspects, the kit further comprises dNTPs and a primer complementary to the 3' end library adapter capable of binding to single stranded DNA library fragments comprising 5' end and 3' end library adapters. In some aspects, the kit further comprises an unnatural dCTP derivative. In some aspects, the cytosine deaminase is an altered APOBEC.
Terms used herein will be understood to take on their ordinary meaning in the relevant art unless specified otherwise. Several terms used herein and their meanings are set forth below.
As used herein, the term “nucleic acid” is intended to be consistent with its use in the art and includes naturally occurring nucleic acids or functional analogs thereof. Particularly useful functional analogs are capable of hybridizing to a nucleic acid in a sequence specific fashion or capable of being used as a template for replication of a particular nucleotide sequence. Naturally occurring nucleic acids generally have a backbone containing phosphodiester bonds. An analog structure can have an alternate backbone linkage including any of a variety of those known in the art. Naturally occurring nucleic acids generally have a deoxyribose sugar (for example, found in deoxyribonucleic acid (DNA)) or a ribose sugar (for example, found in ribonucleic acid (RNA)). A nucleic acid can contain any of a variety of analogs of these sugar moieties that are known in the art. A nucleic acid can include native or non-native bases. In this regard, a native deoxyribonucleic acid can have one or more bases selected from the group consisting of adenine, thymine, cytosine, or guanine and a ribonucleic acid can have one or more bases selected from the group consisting of uracil, adenine, cytosine, or guanine. Useful non-native bases that can be included in a nucleic acid are known in the art. The terms “template” and “target,” when used in reference to a nucleic acid, are intended as semantic identifiers for the nucleic acid in the context of a method or composition set forth herein and do not necessarily limit the structure or function of the nucleic acid beyond what is otherwise explicitly indicated.
As used herein, the term “target nucleic acid” is intended as a semantic identifier for the nucleic acid in the context of a method or composition or kit set forth herein and does not necessarily limit the structure or function of the nucleic acid beyond what is otherwise explicitly indicated. Reference to a nucleic acid such as a target nucleic acid includes both single-stranded and double-stranded nucleic acids, and both DNA and RNA, unless indicated otherwise.
The terms “polynucleotide” and “oligonucleotide” are used interchangeably herein to refer to a polymeric form of nucleotides of any length, and may include ribonucleotides, deoxyribonucleotides, analogs thereof, or mixtures thereof. The terms should be understood to include, as equivalents, analogs of either DNA, RNA, cDNA, or antibody-oligo conjugates made from nucleotide analogs and to be applicable to single stranded (such as sense or antisense) and double stranded polynucleotides. The term as used herein also encompasses cDNA, that is complementary or copy DNA produced from an RNA template, for example by the action of reverse transcriptase.
As used herein, the term “primer” and its derivatives refer generally to any polynucleotide that can hybridize to a target sequence of interest. Typically, the primer functions as a substrate onto which nucleotides can be polymerized by a polymerase; in some embodiments, however, the primer can become incorporated into the synthesized nucleic acid strand and provide a site to which another primer can hybridize to prime synthesis of a new strand that is complementary to the synthesized nucleic acid molecule. The primer can include any combination of nucleotides or analogs thereof. In some embodiments, the primer is a single-stranded oligonucleotide or polynucleotide. The terms “polynucleotide” and “oligonucleotide” are used interchangeably herein to refer to a polymeric form of nucleotides of any length, and may comprise ribonucleotides, deoxyribonucleotides, analogs thereof, or mixtures thereof. The terms should be understood to include, as equivalents, analogs of either DNA or RNA made from nucleotide analogs and to be applicable to single stranded (such as sense or antisense) and double-stranded polynucleotides. The term as used herein also encompasses cDNA, that is complementary or copy DNA produced from an RNA template, for example by the action of reverse transcriptase. This term refers only to the primary structure of the molecule. Thus, the term includes triple-, double- and single-stranded deoxyribonucleic acid (“DNA”), as well as triple-, double- and single-stranded ribonucleic acid (“RNA”).
The term “sensitivity” as used herein is equal to the number of true positives divided by the sum of true positives and false negatives.
The term “specificity” as used herein is equal to the number of true negatives divided by the sum of true negatives and false positives.
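Applied to methylation calling, the two definitions above can be computed from paired per-site labels. In this hypothetical sketch a positive means a site is methylated (truth) or called methylated (call); the function names and toy labels are illustrative assumptions, not values from the source.

```python
def confusion_counts(truth: list[bool], called: list[bool]) -> tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP) for paired truth/call labels."""
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")
    tp = sum(t and c for t, c in zip(truth, called))
    fn = sum(t and not c for t, c in zip(truth, called))
    tn = sum(not t and not c for t, c in zip(truth, called))
    fp = sum(not t and c for t, c in zip(truth, called))
    return tp, fn, tn, fp


def sensitivity(tp: int, fn: int) -> float:
    """True positives divided by the sum of true positives and false negatives."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negatives divided by the sum of true negatives and false positives."""
    return tn / (tn + fp)


# Toy labels: two methylated sites and three unmethylated sites.
tp, fn, tn, fp = confusion_counts([True, True, False, False, False],
                                  [True, False, True, False, False])
```

In this framing, a falsely deaminated unmethylated cytosine that is read as T counts as a false positive and lowers specificity, which is the quantity the UDG and Rev1 treatment aims to improve.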
As used herein, “providing” in the context of a protein, sample of DNA or RNA, or composition means making the protein, sample of DNA or RNA, or composition, purchasing the protein, sample of DNA or RNA, or composition, or otherwise obtaining the protein, sample of DNA or RNA, or composition.
As used herein, “isolated” refers to material removed from its original environment (e.g., the natural environment if it is naturally occurring), and thus is altered “by the hand of man” from its natural state.
As used herein, the term “each,” when used in reference to a collection of items, is intended to identify an individual item in the collection but does not necessarily refer to every item in the collection unless the context clearly dictates otherwise.
As used in this specification and the appended claims, the term “or” is generally employed in its sense including “and/or” unless the content clearly dictates otherwise. The term “and/or” means one or all of the listed elements or a combination of any two or more of the listed elements. The use of “and/or” in some instances does not imply that the use of “or” in other instances may not mean “and/or.”
Unless otherwise specified, “a,” “an,” “the,” and “at least one” are used interchangeably and mean one or more than one.
The words “preferred” and “preferably” refer to embodiments of the disclosure that may afford certain benefits, under certain circumstances. However, other embodiments may also be preferred, under the same or other circumstances. Furthermore, the recitation of one or more preferred embodiments does not imply that other embodiments are not useful and is not intended to exclude other embodiments from the scope of the disclosure.
As used herein, “have,” “has,” “having,” “include,” “includes,” “including,” “comprise,” “comprises,” “comprising,” or the like are used in their open ended inclusive sense, and generally mean “include, but not limited to,” “includes, but not limited to,” or “including, but not limited to.”
It is understood that wherever embodiments are described herein with the language “have,” “has,” “having,” “include,” “includes,” “including,” “comprise,” “comprises,” “comprising,” and the like, otherwise analogous embodiments described in terms of “consisting of” and/or “consisting essentially of” are also provided. The term “consisting of” means including, and limited to, whatever follows the phrase “consisting of.” That is, “consisting of” indicates that the listed elements are required or mandatory, and that no other elements may be present. The term “consisting essentially of” indicates that any elements listed after the phrase are included, and that other elements than those listed may be included provided that those elements do not interfere with or contribute to the activity or action specified in the disclosure for the listed elements.
Conditions that are “suitable” for an event to occur or “suitable” conditions are conditions that do not prevent such events from occurring. Thus, these conditions permit, enhance, facilitate, and/or are conducive to the event.
Reference throughout this specification to “one embodiment,” “an embodiment,” “certain embodiments,” or “some embodiments,” etc., means that a particular feature, configuration, composition, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Thus, the appearances of such phrases in various places throughout this specification are not necessarily referring to the same embodiment of the disclosure. Furthermore, the particular features, configurations, compositions, or characteristics may be combined in any suitable manner in one or more embodiments.
In the description herein, particular embodiments may be described in isolation for clarity. Unless it is expressly specified that the features of a particular embodiment are incompatible with the features of another embodiment, certain embodiments can include a combination of compatible features described herein in connection with one or more embodiments.
For any method disclosed herein that includes discrete steps, the steps may be conducted in any feasible order. And, as appropriate, any combination of two or more steps may be conducted simultaneously. Throughout this disclosure, various aspects of the disclosure can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 2.7, 3, 4, 4.5, 5, 5.3, and 6. This applies regardless of the breadth of the range.
Unless otherwise indicated, all numbers expressing quantities of components, molecular weights, and so forth used in the specification and claims are to be understood as being modified in all instances by the term “about.” Accordingly, unless otherwise indicated to the contrary, the numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties sought to be obtained by the present invention. At the very least, and not as an attempt to limit the doctrine of equivalents to the scope of the claims, each numerical parameter should at least be construed in light of the number of reported significant digits and by applying ordinary rounding techniques.
Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. All numerical values, however, inherently contain a range necessarily resulting from the standard deviation found in their respective testing measurements.
In several places throughout the application, guidance is provided through lists of examples, which examples can be used in various combinations. In each instance, the recited list serves only as a representative group and should not be interpreted as an exclusive list. It is to be understood that the particular examples, materials, amounts, and procedures are to be interpreted broadly in accordance with the scope and spirit of the invention as set forth herein.
All headings throughout are for the convenience of the reader and should not be used to limit the meaning of the text that follows the heading, unless so specified.
The summary of the present disclosure provided above is not intended to describe each disclosed embodiment or every implementation of the present disclosure. The description that follows more particularly exemplifies illustrative embodiments.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1. FIG. 1 is a schematic of a dual enzymatic process for false positive repair.
FIGS. 2A and 2B. Interpretation of sequencing data with TraPR-treated libraries. FIG. 2A presents how DNA sequences are modified through the workflow, expected sequencing results, and comparison to reference sequences. FIG. 2A shows SEQ ID NOs: 13-20. After sequencing, false positive repair events are discriminated from true positive methylation signals by comparison of the sequencing data to the reference sequences shown in FIG. 2B.
FIG. 3. FIG. 3 shows the structure of the Rev1 binary complex and the Rev1 ternary complex (Weaver et al., 2020, PNAS 117(41):25494-25504).
FIGS. 4A-4B. FIGS. 4A and 4B show the amino acid sequences of various altered cytosine deaminases. FIG. 4A shows the amino acid sequences of altered cytosine deaminases with SEQ ID NO:5 and SEQ ID NO:6. FIG. 4B shows the amino acid sequences of altered cytosine deaminases with SEQ ID NO:9, SEQ ID NO:10, and SEQ ID NO:11.
The schematic drawings are not necessarily to scale. Like numbers used in the figures may refer to like components. However, it will be understood that the use of a number to refer to a component in a given figure is not intended to limit the component in another figure labeled with the same number. In addition, the use of different numbers to refer to components is not intended to indicate that the different numbered components cannot be the same or similar to other numbered components.
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
With the methods described herein, the problem of false-positive conversions of cytosines to uracils in cytosine deaminase-based methylation detection assays is solved using a dual enzymatic process. After cytidine deaminase treatment, the DNA is first incubated with uracil DNA glycosylase (UDG), which generates abasic sites where dC to dU conversions have occurred. The DNA is then incubated with a high-fidelity polymerase supplemented with a deoxycytidyl transferase, such as Rev1, to repair the abasic lesion by incorporating a deoxycytidine opposite it.
A schematic illustrating this translesion polymerase repair (TraPR) method is shown in FIG. 1. Briefly, a preparation of DNA fragments from an input sample is treated with a cytidine deaminase to deaminate 5-methylcytosine (5mC) and/or 5-hydroxymethylcytosine (5hmC) residues; this preparation may also include one or more off-target conversions of a cytosine to a uracil. The preparation is first treated with a uracil DNA glycosylase (UDG), which enzymatically catalyzes hydrolysis of the N-glycosidic bond of deoxyuridine to release uracil, leaving an abasic site. Complementary DNA strands are then synthesized by a high-fidelity polymerase supplemented with a deoxycytidyl transferase, such as Rev1, to generate corrected DNA fragments. Under these conditions, dC is selectively incorporated opposite the abasic site.
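The steps above can be traced with a toy string model. This is a minimal sketch, not the disclosed workflow: it assumes a hypothetical one-letter encoding in which `M` marks 5mC, `U` marks uracil, and `_` marks an abasic site, models an idealized 5mC-selective deaminase with explicitly chosen off-target positions, and assumes dC is always inserted opposite an abasic site. The copied strand is written base-by-base beneath the template (not 5' to 3') so that positions stay aligned.

```python
# Toy model of the TraPR steps: deamination -> UDG -> dC insertion opposite AP sites.
# Hypothetical encoding: "M" = 5mC, "U" = uracil, "_" = abasic (AP) site.

WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def deaminate(strand: str, off_target: set[int]) -> str:
    """Idealized 5mC-selective deaminase: every 5mC -> T; C -> U only at off-target indices."""
    out = []
    for i, base in enumerate(strand):
        if base == "M":
            out.append("T")
        elif base == "C" and i in off_target:
            out.append("U")
        else:
            out.append(base)
    return "".join(out)


def udg(strand: str) -> str:
    """UDG excises uracil, leaving an abasic site wherever a U was."""
    return strand.replace("U", "_")


def copy_with_rev1(template: str) -> str:
    """Complementary strand aligned position-by-position under the template.

    Canonical bases pair normally; dC is assumed to be inserted opposite every abasic site.
    """
    return "".join("C" if base == "_" else WATSON_CRICK[base] for base in template)


def read_in_template_orientation(copy: str) -> str:
    """Complement the copy back so it can be compared with the original reference."""
    return "".join(WATSON_CRICK[base] for base in copy)


if __name__ == "__main__":
    original = "TMGACGTCG"                          # index 1 is 5mC; Cs at 4 and 7 are unmethylated
    treated = deaminate(original, off_target={4})   # hypothetical off-target hit at index 4
    lesioned = udg(treated)
    copy = copy_with_rev1(lesioned)
    print(treated, lesioned, copy, read_in_template_orientation(copy))
```

Run as a script, this prints `TTGAUGTCG TTGA_GTCG AACTCCAGC TTGAGGTCG`. Against the unmodified reference `TCGACGTCG`, the methylated position reads T, the untouched unmethylated C reads C, and the off-target site reads G.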
In some embodiments, a sample including single-stranded DNA (ssDNA) fragments is a preparation of library fragments. In some embodiments, the library fragments may include 5' and/or 3' adapter sequences. Single-stranded library preparation methods are well known in the art and include ligation-based approaches (Troll et al., 2019, BMC Genomics 20(1):1-14; and Raine et al., 2017, Nucleic Acids Research 45(6):e36) and commercial kits (xGen Methyl-Sequencing DNA Library Prep Kit and Adaptase, Integrated DNA Technologies).
In some embodiments, the preparation of library fragments is in solution. In some embodiments, the preparation of library fragments is on a surface, including, but not limited to, the surface of beads. For example, bead-based preparation may involve the use of bead-linked transposomes (BLT), in which transposomes conjugated directly to beads bind a fixed amount of DNA (Bruinsma et al., 2018, BMC Genomics 19:722), using, for example, Illumina’s NEXTERA™ technologies.
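The preparation of corrected DNA fragments is then sequenced. The 5m-dC sites are recognized by the expected conversion to dT, while false positives arising from off-target dC to dU conversion can be identified by a defined signature, a dC to dG conversion. Because dC is incorporated opposite the abasic site, the copied strand carries a C where a G would otherwise have been templated, so the position reads as G rather than C when compared with the reference sequence of the original strand.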
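The read interpretation described above reduces to a per-position rule at reference cytosines. A minimal sketch, assuming reads are already aligned to the reference without indels and expressed in the orientation of the original strand; sequencing errors are ignored, and genuine C>G variants would be indistinguishable from repaired sites under this toy rule. The call labels are hypothetical.

```python
# Classify each reference cytosine from a TraPR read (toy, ungapped alignment assumed).


def classify_cytosines(reference: str, read: str) -> dict[int, str]:
    """Map each 0-based reference C position to a call based on the observed base."""
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length (ungapped)")
    calls: dict[int, str] = {}
    for i, (ref_base, observed) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if observed == "T":
            calls[i] = "methylated"               # 5mC deaminated to T
        elif observed == "G":
            calls[i] = "repaired_false_positive"  # off-target C -> U, dC inserted opposite AP site
        elif observed == "C":
            calls[i] = "unmethylated"             # cytosine left intact
        else:
            calls[i] = "unexpected"
    return calls


if __name__ == "__main__":
    print(classify_cytosines("TCGACGTCG", "TTGAGGTCG"))
```

For the example pair this yields `{1: 'methylated', 4: 'repaired_false_positive', 7: 'unmethylated'}`, so the off-target event is no longer counted as a methylation call.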
In some embodiments, after the complement of the DNA fragments is generated by a high-fidelity polymerase supplemented with a deoxycytidyl transferase and prior to sequencing, the preparation of DNA fragments may be treated to degrade or digest the original DNA strands. The original DNA strands contain abasic sites, which can serve as targets for this digestion or degradation. In some embodiments, because abasic sites are labile, treatment with, for example, heat or a base such as NaOH results in cleavage and degradation of original DNA strands that contain an abasic site.
In some embodiments, original DNA strands containing abasic sites may be removed by an enzymatic process. For example, the 5-hydroxymethylcytosine (5hmC) binding, ESC-specific (HMCES) protein or another SRAP (SOS-Response Associated Peptidase) domain protein, such as Escherichia coli YedK, may be used to trap and remove ssDNA fragments with abasic sites. Both HMCES and YedK preferentially bind ssDNA and efficiently form DNA-protein crosslinks (DPCs) to AP sites in ssDNA. These proteins form a covalent crosslink to abasic sites via a stable thiazolidine DNA-protein linkage formed between the N-terminal cysteine and the aldehyde form of the AP deoxyribose. See Mohni et al., 2019, Cell 176:144-153 and Thompson et al., 2019, Nat Struct Mol Biol 26(7):613-618. In some embodiments, HMCES bound to a surface, such as a bead, may be used to trap and remove ssDNA fragments containing abasic sites. In some embodiments, rather than serving as a handle to remove DNA fragments, the covalent linkage of HMCES to ssDNA fragments containing abasic sites may hinder or stop PCR replication of those fragments.
In some embodiments, a preparation of corrected DNA fragments may be subject to PCR amplification prior to sequencing. In some embodiments, a standard high fidelity PCR polymerase may be used. In some embodiments, a preparation of corrected DNA fragments is not subject to amplification prior to sequencing.
In addition to preventing false-positive detection of 5-methylcytosine (5mC) and/or 5-hydroxymethylcytosine (5hmC) due to the deamination of unmethylated cytosines in an assay using a cytosine deaminase to selectively deaminate methylated cytosines, the translesion polymerase repair (TraPR) methods described herein have more general applications as methods of replicating uracil residues with cytosine residues.
Cytosine Deaminase
With the methods described herein, a sample including single-stranded DNA (ssDNA) fragments is contacted with a cytosine deaminase to deaminate methylated cytosines. In some embodiments, a sample including single-stranded DNA (ssDNA) fragments is a preparation of denatured library fragments. In some embodiments, the library fragments may include 5' and/or 3' adapter sequences.
As used herein, a “cytidine deaminase enzyme” refers to an enzyme that deaminates cytosine and/or one or more cytosine derivatives. The deamination occurs at the amino group of the C4 position of the cytosine or cytosine derivative. For example, a cytidine deaminase enzyme may deaminate cytosine to form U, may deaminate mC to form T, and/or may deaminate hydroxymethylcytosine (hmC) to form hmU. A nonlimiting example of a cytidine deaminase enzyme that may deaminate cytosine to form U, may deaminate mC to form T, and/or may deaminate hmC to form hmU is apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC). Nonlimiting examples of such APOBECs include APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3E, APOBEC3F, APOBEC3G, APOBEC3H, and APOBEC4. As used herein, the term “methylcytosine” or “mC” refers to cytosine that includes a methyl group (-CH3 or -Me). The methyl group may be located at the 5 position of the cytosine, in which case the mC may be referred to as 5mC.
A cytidine deaminase may be, but is not limited to, any known member of the APOBEC protein family. The APOBEC protein family is a member of the large cytidine deaminase superfamily that contains a canonical zinc-dependent deaminase (ZDD) signature motif embedded within a core cytidine deaminase fold. This fold includes a five-stranded mixed beta (β)-sheet surrounded by six alpha (α)-helices with the order α1-β1-β2-α2-β3-α3-β4-α4-β5-α5-α6 (Salter et al., 2016, Trends Biochem Sci 41(7):578-594, doi:10.1016/j.tibs.2016.05.001; Salter et al., 2018, Trends Biochem Sci 43(8):606-622, doi:10.1016/j.tibs.2018.04.013). Each cytidine deaminase domain core structure of APOBEC proteins contains a highly conserved spatial arrangement of the catalytic center residues of a zinc-binding motif (referred to as the ZDD motif) (Salter et al., 2016, Trends Biochem Sci 41(7):578-594).
The APOBEC protein family includes subfamilies AID (activation-induced cytidine deaminase), APOBEC1, APOBEC2, APOBEC3 (including 3A, 3B, 3C, 3D, 3F, 3G, 3H), and APOBEC4. A cytidine deaminase may be a member of the APOBEC protein family from a vertebrate, such as a mammal. Examples of mammals include, but are not limited to, rodents, primates, rabbit, bovine (e.g., cow), porcine (e.g., pig), and equine (e.g., horse). Examples of primates include humans and chimpanzees. Some members of the APOBEC protein family, e.g., the AID subfamily, the APOBEC1 subfamily, the APOBEC2 subfamily, the APOBEC3A subfamily, the APOBEC3C subfamily, the APOBEC3H subfamily, and the APOBEC4 subfamily, include one copy of the ZDD motif. Other members of the APOBEC protein family, e.g., the APOBEC3B subfamily, the APOBEC3D subfamily, the APOBEC3F subfamily, and the APOBEC3G subfamily, include two copies of the ZDD motif, but often only the C-terminal copy is active (Salter et al., 2016, Trends Biochem Sci 41(7):578-594). The skilled person can easily identify members of each of the subfamilies by using a publicly available database such as the Protein database available at the National Center for Biotechnology Information (ncbi.nlm.nih.gov/protein).
In some embodiments, a cytidine deaminase is an altered cytidine deaminase, recombinantly engineered to include a substitution mutation at one or more residues when compared to a reference cytidine deaminase. An altered cytidine deaminase can be based on a member of the AID subfamily, the APOBEC1 subfamily, the APOBEC2 subfamily, the APOBEC3 subfamily (e.g., the 3A subfamily, the 3B subfamily, the 3C subfamily, the 3D subfamily, the 3F subfamily, the 3G subfamily, or the 3H subfamily), or the APOBEC4 subfamily. The skilled person will readily appreciate that such an altered or engineered cytidine deaminase described herein is not naturally occurring. In some embodiments, such an altered or engineered cytidine deaminase demonstrates selective deamination of mC.
An altered cytidine deaminase may be one of three types. One type of altered cytidine deaminase preferentially deaminates 5mC instead of C (i.e., converts 5mC to T at a greater rate than it converts C to U) and is referred to herein as having “cytosine-defective deaminase activity.” A second type preferentially deaminates C instead of 5mC (i.e., converts C to U at a greater rate than it converts 5mC to T) and is referred to herein as having “5mC-defective deaminase activity.” A third type preferentially deaminates C and 5mC to U and T, respectively, and has significantly reduced deamination of 5hmC, 5fC, and 5caC; it is referred to herein as having “5hmC-defective deaminase activity.” Unless the context indicates otherwise, reference to an altered cytidine deaminase includes altered cytidine deaminases having cytosine-defective deaminase activity, altered cytidine deaminases having 5mC-defective deaminase activity, and altered cytidine deaminases having 5hmC-defective deaminase activity.
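The three activity types can be summarized as a lookup table. This sketch deliberately simplifies “preferentially deaminates” into an all-or-nothing rule (real altered enzymes differ in rate, not as perfect switches), treats any substrate not listed for a type as unreactive, and uses the type names above as hypothetical dictionary keys. Products follow the deamination reactions given earlier: C to U, 5mC to T, and 5hmC to 5hmU.

```python
# Idealized substrate profiles for the three altered cytidine deaminase types.

DEAMINATION_PRODUCT = {"C": "U", "5mC": "T", "5hmC": "5hmU"}

# Substrates each type is modeled as deaminating (binary simplification of rate preferences).
PROFILES = {
    "cytosine-defective": {"5mC"},
    "5mC-defective": {"C"},
    "5hmC-defective": {"C", "5mC"},
}


def deaminate(base: str, profile: str) -> str:
    """Return the deamination product of `base` under `profile`, or `base` if unreactive."""
    if base in PROFILES[profile]:
        return DEAMINATION_PRODUCT[base]
    return base


if __name__ == "__main__":
    for profile in PROFILES:
        print(profile, [deaminate(b, profile) for b in ("C", "5mC", "5hmC")])
```

Only the cytosine-defective profile leaves unmethylated C intact while converting 5mC, which is the behavior a 5mC-selective methylation assay relies on.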
Altered cytidine deaminases include apolipoprotein B mRNA editing enzymes, catalytic polypeptide-like (APOBEC) and activation-induced cytidine deaminase (AID). Wild-type APOBEC and AID cytidine deaminases have the activity of deaminating cytidine (C) of DNA and/or RNA to form uridine (U). An altered cytidine deaminase of the present disclosure has an altered rate of deamination of C, 5mC, and/or 5hmC when compared to the wild-type enzyme. A cytidine deaminase of the present disclosure can be referred to herein as an “altered cytidine deaminase,” “recombinant cytidine deaminase,” “mutant cytosine deaminase,” or “modified cytidine deaminase,” and refers to any of the altered cytosine deaminases described herein that comprise one or more changes from the reference (i.e., wild-type) amino acid sequence that provide the unexpected property of an altered deamination profile, e.g., an altered ability to preferentially deaminate one form of cytosine over another.
Whether a protein has cytidine deaminase activity may be determined by in vitro assays. One example of an in vitro assay is based on digestion with the restriction enzyme 5iral. A protein that can deaminate 5mC to thymidine has cytidine deaminase activity.
An altered cytidine deaminase that preferentially deaminates 5mC instead of C (i.e., has cytosine-defective deaminase activity) can have a catalytic efficiency that is at least 10-fold, at least 50-fold, or at least 100-fold higher on 5mC than C substrates. In one embodiment, an altered cytidine deaminase that preferentially deaminates 5mC instead of C can have a catalytic efficiency that is no greater than 1500-fold higher on 5mC than C substrates.
An altered cytidine deaminase that preferentially deaminates C instead of 5mC (i.e., has 5mC-defective deaminase activity) can have a catalytic efficiency that is at least 10-fold, at least 50-fold, or at least 100-fold higher on C than 5mC substrates. In one embodiment, an altered cytidine deaminase that preferentially deaminates C instead of 5mC can have a catalytic efficiency that is no greater than 1500-fold higher on C than 5mC substrates.
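The fold preferences above compare catalytic efficiencies on two substrates. Assuming catalytic efficiency is taken as kcat/KM (a standard definition, not one stated in this section), the ratio and the tiers named in the text can be computed as below; the kinetic values in the example are made up for illustration.

```python
from typing import Optional

TIERS = (100, 50, 10)   # "at least 100-fold", "at least 50-fold", "at least 10-fold"
UPPER_BOUND = 1500      # "no greater than 1500-fold" in one embodiment


def catalytic_efficiency(kcat: float, km: float) -> float:
    """kcat/KM; units only need to be consistent between the two substrates."""
    return kcat / km


def fold_preference(eff_preferred: float, eff_other: float) -> float:
    """How many times more efficient the enzyme is on the preferred substrate."""
    return eff_preferred / eff_other


def highest_tier(fold: float) -> Optional[int]:
    """Largest threshold in TIERS that the fold preference meets, or None."""
    for tier in TIERS:
        if fold >= tier:
            return tier
    return None


if __name__ == "__main__":
    eff_5mc = catalytic_efficiency(kcat=8.0, km=0.5)   # hypothetical values
    eff_c = catalytic_efficiency(kcat=0.25, km=2.0)    # hypothetical values
    fold = fold_preference(eff_5mc, eff_c)
    print(fold, highest_tier(fold), fold <= UPPER_BOUND)
```

With these hypothetical numbers the enzyme is 128-fold more efficient on 5mC, which meets the “at least 100-fold” tier and stays under the 1500-fold cap. The same functions apply to the C-over-5mC case by swapping which substrate is treated as preferred.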
For an altered cytidine deaminase that deaminates C and 5mC to U and T, respectively, and has significantly reduced deamination of 5hmC (i.e., has 5hmC-defective deaminase activity), deamination of 5hmC is reduced by at least 80%, at least 90%, or at least 99% compared to the wild type cytidine deaminase. In one embodiment, deamination of 5hmC by an altered cytidine deaminase disclosed herein is undetectable using an assay such as the restriction enzyme digestion-based assay described above.
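The percent reductions above are relative to wild-type activity. A minimal sketch, assuming both activities are the same readout from the same assay under the same conditions; the rates in the example are hypothetical.

```python
def percent_reduction(altered_rate: float, wild_type_rate: float) -> float:
    """Percent decrease in 5hmC deamination by the altered enzyme relative to wild type."""
    if wild_type_rate <= 0:
        raise ValueError("wild-type rate must be positive")
    return 100.0 * (1.0 - altered_rate / wild_type_rate)


def thresholds_met(reduction: float, thresholds: tuple[int, ...] = (80, 90, 99)) -> list[int]:
    """Which of the stated reduction thresholds (in percent) are satisfied."""
    return [t for t in thresholds if reduction >= t]


if __name__ == "__main__":
    r = percent_reduction(altered_rate=2.0, wild_type_rate=64.0)  # hypothetical rates
    print(r, thresholds_met(r))
```

Here the altered enzyme retains 1/32 of wild-type activity, a 96.875% reduction that satisfies the 80% and 90% thresholds but not 99%.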
In certain embodiments, an altered cytidine deaminase of the present disclosure is based on a member of the APOBEC protein family. An altered cytidine deaminase of the present disclosure that is "based on" a member of the APOBEC protein family means the altered cytidine deaminase is an APOBEC protein that includes one or more of the substitution mutations described herein as compared to a reference APOBEC sequence. An altered cytidine deaminase of the present disclosure that is "based on" a member of the APOBEC protein family can also include conservative and/or nonconservative mutations as described herein.
The APOBEC protein family includes subfamilies AID, APOBEC1, APOBEC2, APOBEC3 (including 3A, 3B, 3C, 3D, 3F, 3G, 3H), and APOBEC4. An altered cytidine deaminase of the present disclosure can be based on a member of the AID subfamily, the APOBEC1 subfamily, the APOBEC2 subfamily, the APOBEC3 subfamily (e.g., the 3A subfamily, the 3B subfamily, the 3C subfamily, the 3D subfamily, the 3F subfamily, the 3G subfamily, or the 3H subfamily), or the APOBEC4 subfamily. An altered cytidine deaminase of the present disclosure can be based on a member of the APOBEC protein family from a vertebrate, such as a mammal. Examples of mammals include, but are not limited to, rodents, primates, rabbit, bovine (e.g., cow), porcine (e.g., pig), and equine (e.g., horse). Examples of primates include humans and chimpanzees.
The APOBEC protein family is a member of the large cytidine deaminase superfamily that contains a canonical zinc-dependent deaminase (ZDD) signature motif embedded within a core cytidine deaminase fold. This fold includes a five-stranded mixed beta (β)-sheet surrounded by six alpha (α)-helices with the order α1-β1-β2-α2-β3-α3-β4-α4-β5-α5-α6 (Salter et al., 2016, Trends Biochem Sci 41(7):578-594, doi:10.1016/j.tibs.2016.05.001; Salter et al., 2018, Trends Biochem Sci 43(8):606-622, doi:10.1016/j.tibs.2018.04.013). Each cytidine deaminase domain core structure of APOBEC proteins contains a highly conserved spatial arrangement of the catalytic center residues of a zinc-binding motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C (SEQ ID NO: 1) (referred to herein as the ZDD motif, where X is any amino acid, and the range of numbers after X refers to the number of amino acids) (Salter et al., 2016, Trends Biochem Sci 41(7):578-594, doi:10.1016/j.tibs.2016.05.001). Without intending to be limited by theory, the H and two C residues coordinate a Zn atom, and the E residue polarizes a water molecule near the Zn atom for catalysis (Chen et al., 2021, Viruses 13:497).
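The ZDD motif notation maps directly onto a regular expression, which is a convenient way to screen candidate sequences. A minimal sketch: `.` stands for any residue X, `{m,n}` encodes the X[m-n] length range, and the example sequence is synthetic, not a real APOBEC. Because regex quantifiers are greedy, when several P-C or C candidates exist the reported span is one valid match, not necessarily the structurally relevant one.

```python
import re

# SEQ ID NO:1, H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C, as a regular expression.
ZDD_MOTIF = re.compile(r"H[PAV]E.{23,28}PC.{2,4}C")


def find_zdd_motifs(protein: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) spans of non-overlapping ZDD motif matches."""
    return [(m.start() + 1, m.end()) for m in ZDD_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "MK" + "HAE" + "G" * 25 + "PC" + "GG" + "C" + "LL"   # synthetic test sequence
    print(find_zdd_motifs(toy))
```

The synthetic sequence yields a single span, `[(3, 35)]`. Proteins with two ZDD copies, such as the double-domain subfamilies, would be expected to return two spans.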
Some members of the APOBEC protein family, e.g., the AID subfamily, the APOBEC1 subfamily, the APOBEC2 subfamily, the APOBEC3A subfamily, the APOBEC3C subfamily, the APOBEC3H subfamily, and the APOBEC4 subfamily, include one copy of the ZDD motif. Other members of the APOBEC protein family, e.g., the APOBEC3B subfamily, the APOBEC3D subfamily, the APOBEC3F subfamily, and the APOBEC3G subfamily, include two copies of the ZDD motif, but often only the C-terminal copy is active (Salter et al., 2016, Trends Biochem Sci 41(7):578-594, doi:10.1016/j.tibs.2016.05.001). Thus, an altered cytidine deaminase disclosed herein includes one or two ZDD motifs. In one embodiment, an altered cytidine deaminase based on a member of the APOBEC3A subfamily includes the following ZDD motif: HXEX24SW(S/T)PCX[2-4]CX6FX8LX5R(L/I)YX[8-11]LX2LX[10]M (SEQ ID NO:2) (where X is any amino acid, and the number or range of numbers after X refers to the number of amino acids) (Salter et al., 2016, Trends Biochem Sci 41(7):578-594).
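The longer APOBEC3A motif (SEQ ID NO:2) can be encoded the same way. In this sketch a named group captures the tyrosine that follows R(L/I), since that residue is the one singled out for substitution in embodiments described in this section; the function name is hypothetical and the test sequence is synthetic.

```python
import re
from typing import Optional

# SEQ ID NO:2, HXEX24SW(S/T)PCX[2-4]CX6FX8LX5R(L/I)YX[8-11]LX2LX[10]M, as a regex.
# The named group "tyr" captures the tyrosine immediately after R(L/I).
A3A_MOTIF = re.compile(
    r"H.E.{24}SW[ST]PC.{2,4}C.{6}F.{8}L.{5}R[LI](?P<tyr>Y).{8,11}L.{2}L.{10}M"
)


def motif_tyrosine_position(protein: str) -> Optional[int]:
    """1-based position of the R(L/I)-following tyrosine in the first motif match, else None."""
    match = A3A_MOTIF.search(protein.upper())
    return match.start("tyr") + 1 if match else None


if __name__ == "__main__":
    toy = (                                     # synthetic sequence built to satisfy the motif
        "HAE" + "G" * 24 + "SWT" + "PC" + "GG" + "C" + "G" * 6 + "F" + "G" * 8
        + "L" + "G" * 5 + "RL" + "Y" + "G" * 8 + "L" + "GG" + "L" + "G" * 10 + "M"
    )
    print(motif_tyrosine_position(toy))
```

For the synthetic sequence the captured tyrosine sits at position 59; in a real APOBEC3A sequence the reported position would depend on that protein’s own numbering.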
In one embodiment, an altered cytidine deaminase disclosed herein is a member of one of the following subfamilies: APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, and APOBEC3G, and can include one or more highly conserved sites that are part of the active site and within the ZDD motif SEQ ID NO: 1. These sites include tryptophan at position 98 and serine or threonine at position 99 (Kouno et al., 2017, Nat Commun 8:15024).
In addition to the ZDD motif, a member of the APOBEC protein family also includes other highly conserved residues that are part of the active site but not part of the ZDD motif SEQ ID NO: 1. A member of the APOBEC3A subfamily, APOBEC3B subfamily, APOBEC3C subfamily, APOBEC3D subfamily, APOBEC3F subfamily, or APOBEC3G subfamily typically includes one or more of the following highly conserved sites that are part of the active site: arginine at position 28; histidine, asparagine, or arginine at position 29; serine or threonine, preferably threonine, at position 31; asparagine or aspartic acid at position 57; tyrosine or phenylalanine at position 130; asparagine or tyrosine at position 131; asparagine, tyrosine, or phenylalanine, preferably tyrosine, at position 132; and arginine or lysine at position 189 (Kouno et al., 2017, Nat Commun 8:15024, doi:10.1038/ncomms15024).
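The conserved active-site residues listed in the two preceding paragraphs can be checked position by position. A minimal sketch, assuming the candidate is already numbered like the reference used for these positions (no insertions or deletions); homologs with a different numbering would first need their functionally equivalent positions mapped. Because the text says members “typically” include these residues, a failed position flags a difference rather than excluding membership. The helper `toy_protein` is hypothetical and builds synthetic test sequences only.

```python
# Conserved active-site residues (Kouno et al., 2017), keyed by reference position.
# Positions 98 and 99 are the sites within the ZDD motif; the rest lie outside it.
CONSERVED_SITES = {
    28: "R",
    29: "HNR",
    31: "ST",
    57: "ND",
    98: "W",
    99: "ST",
    130: "YF",
    131: "NY",
    132: "NYF",
    189: "RK",
}


def check_conserved_sites(protein: str) -> dict[int, bool]:
    """For each conserved position, report whether the residue there is an allowed one."""
    return {
        pos: pos <= len(protein) and protein[pos - 1].upper() in allowed
        for pos, allowed in CONSERVED_SITES.items()
    }


def toy_protein(overrides: dict[int, str], length: int = 200) -> str:
    """Synthetic poly-G sequence with chosen residues at 1-based positions."""
    residues = ["G"] * length
    for pos, aa in overrides.items():
        residues[pos - 1] = aa
    return "".join(residues)


if __name__ == "__main__":
    wild_like = toy_protein({pos: allowed[0] for pos, allowed in CONSERVED_SITES.items()})
    y130a = wild_like[:129] + "A" + wild_like[130:]          # alanine at position 130
    print(all(check_conserved_sites(wild_like).values()))
    print([p for p, ok in check_conserved_sites(y130a).items() if not ok])
```

A variant carrying alanine at position 130, like the altered deaminases described later in this section, is flagged only at that position, which is expected because the substitution is deliberate.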
An altered cytidine deaminase of the present disclosure includes a substitution mutation at one or more residues when compared to a reference cytidine deaminase. A substitution mutation can be at the same position or a functionally equivalent position compared to the reference cytidine deaminase. Reference cytidine deaminases and functionally equivalent positions are described in detail herein. The skilled person will readily appreciate that an altered cytidine deaminase described herein is not naturally occurring.
A reference cytidine deaminase can be a member of the APOBEC protein family. Essentially any known member of the APOBEC protein family can be a reference cytidine deaminase. The skilled person can easily identify members of each of the subfamilies by using a publicly available database such as the Protein database available at the National Center for Biotechnology Information (ncbi.nlm.nih.gov/protein) and searching for APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, or, when identifying members of the AID family, Activation-induced cytidine deaminase. A wild type reference cytidine deaminase has the activity of binding single-stranded DNA (ssDNA) and deaminating a cytosine present on the ssDNA to convert it to uracil. In one embodiment, a wild type reference cytidine deaminase has the activity of binding single-stranded RNA (ssRNA) and deaminating a cytosine present on the ssRNA to convert it to uracil. Methods for determining whether a protein binds ssDNA or ssRNA and deaminates a cytosine present are known to the skilled person.
In one embodiment, an altered cytidine deaminase has an amino acid sequence that is based on a reference sequence from a member of the APOBEC protein family, includes a ZDD motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C (SEQ ID NO: 1), and includes at least one substitution mutation disclosed herein. Optionally, an altered cytidine deaminase includes other active site residues disclosed herein. Non-limiting examples of reference cytidine deaminase proteins are shown in Table 1.
Table 1. Examples of members of the APOBEC protein subfamilies.
UniProt, database of protein sequence and functional information, available at uniprot.org;
GenBank, collection of nucleotide sequences and their protein translations, available at ncbi.nlm.nih.gov/protein/.
In one embodiment, an altered cytidine deaminase has an amino acid sequence that is based on a reference sequence that is a member of the APOBEC3A subfamily, and includes a ZDD motif HXEX24SW(S/T)PCX[2-4]CX6FX8LX5R(L/I)YX[8-11]LX2LX[10]M (SEQ ID NO:2) (where X is any amino acid, and the number or range of numbers after X refers to the number of amino acids) and at least one substitution mutation disclosed herein. In one embodiment, the substitution mutation is a substitution mutation at the tyrosine (Y) immediately following R(L/I), such as a substitution mutation to alanine (A). Optionally, the altered cytidine deaminase includes other active site residues disclosed herein.
In one embodiment, the amino acid sequence of an altered cytidine deaminase includes the amino acids of a member of the APOBEC3A subfamily: X[16-26]-GRXXTXLCYXV-X15-GXXXN-X12-HAEXXF-X14-YXXTWXXSWSPC-X[2-4]-CA-X5-FL-X7-LXIXXXR(L/I)Y-X8-GLXXLXXXG-X5-M-X4-FXXCWXXFV-X6-FXPW-X13-LXXI-X[2-6] (SEQ ID NO:3) (where X is any amino acid, and the number or range of numbers after X refers to the number of amino acids), or a subset thereof, and at least one substitution mutation disclosed herein. In one embodiment, the substitution mutation is a substitution mutation at the underlined tyrosine, such as a substitution mutation to alanine (A) or to tryptophan (W). In one embodiment, the amino acid sequence of an altered cytidine deaminase includes the amino acids of a member of the APOBEC3A subfamily: X26-GRXXTXLCYXV-X15-G-X16-HAEXXF-X14-YXXTWXXSWSPC-X4-CA-X5-FL-X7-LXIFXXR(L/I)Y-X8-GLXXLXXXG-X5-M-X4-FXXCWXXFV-X6-FXPW-X13-LXXLX6 (SEQ ID NO:4) (where X is any amino acid, and the number after X refers to the number of amino acids present), or a subset thereof, and at least one substitution mutation disclosed herein. In one embodiment, the substitution mutation is a substitution mutation at the underlined tyrosine (Y), such as a substitution mutation to alanine (A) or to tryptophan (W).
A substitution mutation can be at the same position or a functionally equivalent position compared to a reference cytidine deaminase. By "functionally equivalent" it is meant that the altered cytidine deaminase has the amino acid substitution at the amino acid position in a reference cytidine deaminase that has the same functional role in both the reference cytidine deaminase and the altered cytidine deaminase.
In general, functionally equivalent substitution mutations in two or more different cytidine deaminases occur at homologous amino acid positions in the amino acid sequences of the cytidine deaminases. Hence, use herein of the term "functionally equivalent" also encompasses mutations that are "positionally equivalent" or "homologous" to a given mutation, regardless of whether or not the particular function of the mutated amino acid is known. It is possible to identify the locations of functionally equivalent and positionally equivalent amino acid residues in the amino acid sequences of two or more different cytidine deaminases on the basis of sequence alignment and/or molecular modelling. For example, the tyrosine at residue 130 of the APOBEC3A proteins of Homo sapiens, Pongo pygmaeus, Nomascus leucogenys, Pan troglodytes, and Gorilla and the tyrosine at residue 133 of the APOBEC3A protein from Macaca fascicularis are functionally equivalent and positionally equivalent. The skilled person can easily identify functionally equivalent residues in cytidine deaminases.
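Once two sequences have been aligned, mapping a residue number in one sequence to the positionally equivalent residue in the other is a matter of walking the alignment columns. A minimal sketch, assuming the alignment is supplied as two equal-length gapped strings with `-` for gaps; the toy rows are synthetic and merely mimic a three-residue offset like the residue 130 versus residue 133 example above.

```python
from typing import Optional


def map_position(aligned_a: str, aligned_b: str, pos_a: int) -> Optional[int]:
    """Map 1-based residue `pos_a` of sequence A to the aligned residue number in sequence B.

    Returns None if A's residue is aligned to a gap in B or if pos_a is out of range.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    count_a = count_b = 0
    for col_a, col_b in zip(aligned_a, aligned_b):
        if col_a != "-":
            count_a += 1
        if col_b != "-":
            count_b += 1
        if col_a != "-" and count_a == pos_a:
            return count_b if col_b != "-" else None
    return None


if __name__ == "__main__":
    # Synthetic rows: B has three extra N-terminal residues, so A's residue 4 maps to B's 7.
    print(map_position("---ACDY", "MKLACDY", 4))
```

Returning `None` for a residue opposite a gap makes it explicit that no positionally equivalent residue exists in the other sequence.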
In one embodiment, an altered cytidine deaminase has an amino acid sequence that is structurally similar to a reference cytidine deaminase disclosed herein. In one embodiment, a reference cytidine deaminase is one that includes the amino acid sequence of a sequence listed in Table 1.
As used herein, an altered cytidine deaminase may be "structurally similar" to a reference cytidine deaminase if the amino acid sequence of the altered cytidine deaminase possesses a specified amount of sequence similarity and/or sequence identity compared to the reference cytidine deaminase.
Structural similarity of two amino acid sequences can be determined by aligning the residues of the two sequences (for example, a candidate altered cytidine deaminase and a reference cytidine deaminase described herein) to optimize the number of identical amino acids along the lengths of their sequences; gaps in either or both sequences are permitted in making the alignment in order to optimize the number of identical amino acids, although the amino acids in each sequence must nonetheless remain in their proper order. A candidate altered cytidine deaminase is the cytidine deaminase being compared to the reference cytidine deaminase. A candidate altered cytidine deaminase that has structural similarity with a reference cytidine deaminase and cytidine deaminase activity is an altered cytidine deaminase.
Unless modified as otherwise described herein, a pair-wise comparison analysis of amino acid sequences can be conducted, for instance, by the local homology algorithm of Smith and Waterman, 1981, Adv Appl Math 2:482, by the homology alignment algorithm of Needleman and Wunsch, 1970, J Mol Biol 48:443, by the search for similarity method of Pearson and Lipman, 1988, Proc Nat'l Acad Sci USA 85:2444, by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by visual inspection (see generally Current Protocols in Molecular Biology, Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., supplemented through 2004). One example of an algorithm that is suitable for determining structural similarity is the BLAST® algorithm, which is described in Altschul et al., 1990, J Mol Biol 215:403-410. The BLAST® algorithm can be used to calculate percent sequence identity and percent sequence similarity between two sequences. Software for performing BLAST® analyses is publicly available through the National Center for Biotechnology Information.
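To illustrate what a global pairwise alignment does, here is a compact Needleman-Wunsch sketch followed by a percent-identity calculation. The scoring (match +1, mismatch -1, linear gap -1) is an illustrative assumption rather than a scheme specified in the disclosure, and percent identity is computed over all alignment columns including gaps, which is only one of several conventions in use. Production tools such as BLAST use substitution matrices and affine gap penalties and may report different values.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[str, str]:
    """Global alignment of a and b with a linear gap penalty; returns the two gapped rows."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    row_a: list[str] = []
    row_b: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b))


def percent_identity(row_a: str, row_b: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    identical = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    return 100.0 * identical / len(row_a)


if __name__ == "__main__":
    rows = needleman_wunsch("ACDEY", "ACDY")
    print(rows, percent_identity(*rows))
```

The toy pair aligns as `ACDEY` over `ACD-Y`, giving 4 identical columns out of 5, or 80% identity under this convention.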
In the comparison of two amino acid sequences, structural similarity may be referred to by percent “identity” or by percent “similarity.” “Identity” refers to the presence of identical amino acids. “Similarity” refers to the presence of not only identical amino acids but also conservative substitutions. Thus, in one embodiment, the amino acid sequence of a cytidine deaminase protein having sequence similarity to a reference sequence may include conservative substitutions of amino acids present in that reference sequence.
A conservative substitution for an amino acid in a protein may be selected from other members of the class to which the amino acid belongs. For example, it is well-known in the art of protein biochemistry that an amino acid belonging to a grouping of amino acids having a particular size or characteristic (such as charge, hydrophobicity, or hydrophilicity) can be substituted for another amino acid without altering the activity of a protein, particularly in regions of the protein that are not directly associated with biological activity. For example, amino acids having a non-polar side chain include alanine, glycine, isoleucine, leucine, methionine, phenylalanine, proline, tryptophan, and valine; amino acids having a hydrophobic side chain include glycine, alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, and tryptophan; amino acids having a polar side chain include arginine, asparagine, aspartic acid, glutamine, glutamic acid, histidine, lysine, serine, cysteine, tyrosine, and threonine; and amino acids having an uncharged side chain include glycine, serine, cysteine, asparagine, glutamine, tyrosine, and threonine.
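The Python sketch below, which assumes two sequences that have already been aligned (gaps written as "-"), counts a column as "similar" when the residues are identical or fall within one of the side-chain classes listed above. The class memberships are copied from the paragraph; the helper names are hypothetical, and similarity scores reported by real tools (for example BLAST® "positives") are based on substitution matrices rather than these classes.

```python
# Side-chain classes as listed in the text, using one-letter amino acid codes.
NON_POLAR = frozenset("AGILMFPWV")
HYDROPHOBIC = frozenset("GAVLIPFMW")
POLAR = frozenset("RNDQEHKSCYT")
UNCHARGED = frozenset("GSCNQYT")
CLASSES = (NON_POLAR, HYDROPHOBIC, POLAR, UNCHARGED)


def is_conservative(x: str, y: str) -> bool:
    """True if residues x and y share at least one side-chain class."""
    return any(x in cls and y in cls for cls in CLASSES)


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent of alignment columns holding identical (non-gap) residues."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


def percent_similarity(aln_a: str, aln_b: str) -> float:
    """Percent of columns holding identical residues or conservative substitutions."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    similar = sum(
        1
        for x, y in zip(aln_a, aln_b)
        if x != "-" and y != "-" and (x == y or is_conservative(x, y))
    )
    return 100.0 * similar / len(aln_a)
```

For the aligned pair `ILKY` and `VLRA`, identity is 25.0 (only L matches) while similarity is 75.0, because I/V and K/R share a class but Y/A do not.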
Thus, as used herein, reference to a cytidine deaminase as described herein, such as reference to the amino acid sequence of one or more SEQ ID NOs described herein, can include a protein with at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% amino acid sequence similarity to the reference cytidine deaminase. Examples of altered cytidine deaminases having similarity with a reference amino acid sequence include those having, for instance, at least 80%, at least 85%, at least 90%, or at least 95% similarity with SEQ ID NO:5 and having an alanine at amino acid 130. Other examples of altered cytidine deaminases having similarity with a reference amino acid sequence include those having, for instance, at least 80%, at least 85%, at least 90%, or at least 95% similarity or identity with SEQ ID NO:6 and having an alanine at amino acid 130 and a histidine at amino acid 132.
Alternatively, as used herein, reference to a cytidine deaminase as described herein, such as reference to the amino acid sequence of one or more SEQ ID NOs described herein, can include a protein with at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% amino acid sequence identity to the reference cytidine deaminase. Examples of altered cytidine deaminases having identity with a reference amino acid sequence include those having, for instance, at least 80%, at least 85%, at least 90%, or at least 95% identity with SEQ ID NO:5 and having an alanine (A) at amino acid 130. Other examples of altered cytidine deaminases having identity with a reference amino acid sequence include those having, for instance, at least 80%, at least 85%, at least 90%, or at least 95% identity with SEQ ID NO:6 and having an alanine (A) at amino acid 130 and a histidine (H) at amino acid 132.
An altered cytidine deaminase of the present disclosure may include a substitution mutation at a position functionally equivalent to tyrosine at position 130 (Y130) in a member of the APOBEC3A subfamily. Such a position can be identified by producing an alignment of a member of the APOBEC3A subfamily with a candidate altered cytidine deaminase from the APOBEC3A subfamily or from a different APOBEC subfamily. In one embodiment, the candidate is selected from the APOBEC subfamilies APOBEC1 or AID. An example of an algorithm that can be used to produce an alignment is Clustal O. In some APOBEC family proteins, the wild type residue at a position functionally equivalent to Y130 is phenylalanine (F).
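Once an alignment has been produced (for example with Clustal O) and written as two equal-length gapped strings, the functionally equivalent position can be read off by counting residues column by column. The Python sketch below assumes that representation; the function names and toy sequences are hypothetical and are not SEQ ID NO sequences from this disclosure.

```python
from typing import Optional


def equivalent_position(ref_aln: str, cand_aln: str, ref_pos: int) -> Optional[int]:
    """Map a 1-based reference residue number to the aligned 1-based candidate residue.

    Returns None when the reference residue is aligned to a gap in the candidate.
    """
    if len(ref_aln) != len(cand_aln):
        raise ValueError("aligned sequences must have equal length")
    r = c = 0  # residue counters that skip gap characters
    for x, y in zip(ref_aln, cand_aln):
        if x != "-":
            r += 1
        if y != "-":
            c += 1
        if x != "-" and r == ref_pos:
            return c if y != "-" else None
    raise ValueError("ref_pos lies beyond the end of the reference sequence")


def equivalent_residue(ref_aln: str, cand_aln: str, ref_pos: int) -> Optional[str]:
    """Return the candidate residue at the position equivalent to ref_pos, if any."""
    pos = equivalent_position(ref_aln, cand_aln, ref_pos)
    return None if pos is None else cand_aln.replace("-", "")[pos - 1]
```

With the toy alignment `MKY-AY` (reference) and `M-YHAF` (candidate), reference residue 5 (a Y) corresponds to candidate residue 5, an F, mirroring the case above in which the residue equivalent to Y130 is phenylalanine; reference residue 2 faces a gap and returns None.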
In another embodiment, an altered cytidine deaminase of the present disclosure includes a substitution mutation at a position functionally equivalent to the tyrosine (Y) of the ZDD motif H-X-E-X(24)-S-W-(S/T)-P-C-X(2-4)-C-X(6)-F-X(8)-L-X(5)-R-(L/I)-Y-X(8-11)-L-X(2)-L-X(10)-M (SEQ ID NO:2), where X is any amino acid and a number in parentheses gives the number of X residues, in a member of the APOBEC family, such as a member of the APOBEC3A subfamily. The tyrosine (Y) of SEQ ID NO:2, which is the only tyrosine in the motif and immediately follows R-(L/I), is the position functionally equivalent to tyrosine 130 of the wild type APOBEC3A protein (SEQ ID NO:12).
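Because SEQ ID NO:2 is written in motif notation, it can be translated into a regular expression to locate the motif tyrosine in a candidate sequence. The Python sketch below is an illustrative assumption rather than a method stated in this disclosure: it requires exact spacing, so natural variants with insertions or deletions inside the motif will not match and still call for a full alignment.

```python
import re
from typing import Optional

# SEQ ID NO:2 rewritten as a regular expression; each X (any residue) becomes ".".
# The capture group marks the motif tyrosine (the Y130-equivalent position).
ZDD_MOTIF = re.compile(
    r"H.E.{24}SW[ST]PC.{2,4}C.{6}F.{8}L.{5}R[LI](Y).{8,11}L.{2}L.{10}M"
)


def motif_tyrosine_position(protein: str) -> Optional[int]:
    """Return the 1-based position of the motif tyrosine, or None if there is no match."""
    match = ZDD_MOTIF.search(protein)
    return match.start(1) + 1 if match else None
```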
In one embodiment, the substitution mutation at a position functionally equivalent to Y130 yields an enzyme that has increased cytidine deaminase activity and preferentially acts on 5mC compared to cytosine (i.e., has cytosine-defective deaminase activity). The substitution mutation can be a mutation to alanine (A), glycine (G), phenylalanine (F), histidine (H), glutamine (Q), methionine (M), asparagine (N), lysine (K), valine (V), aspartic acid (D), glutamic acid (E), serine (S), cysteine (C), proline (P), or threonine (T). For example, the altered cytidine deaminase can comprise SEQ ID NO:9, wherein X is selected from A, G, F, H, Q, M, N, K, V, D, E, S, C, P, or T (and is not Y), or can comprise SEQ ID NO:10, wherein Z is selected from A, G, F, H, Q, M, N, K, V, D, E, S, C, P, or T (and is not Y). Preferably, in one embodiment, X or Z is A or L. In an exemplary aspect of this embodiment, the substitution mutation at a position functionally equivalent to Y130 is a mutation to alanine (A) (e.g., SEQ ID NO:5). Specific examples of altered cytidine deaminases having increased activity and preferentially acting on 5mC compared to cytosine include SEQ ID NO:5 or a sequence having at least 90%, at least 95%, at least 98%, or at least 99% sequence identity to SEQ ID NO:5 and comprising Y130A.
An altered cytidine deaminase of the present disclosure having cytosine-defective deaminase activity (i.e., converting 5mC to T at a greater rate than converting C to U) optionally includes a second substitution mutation at a position two, three, four, or five amino acids on the C-terminal side of the Y130 position, or of the position functionally equivalent to Y130. In one embodiment, the second mutation is a tyrosine (Y), tryptophan (W), cysteine (C), histidine (H), or phenylalanine (F) at a position two, three, four, or five amino acids on the C-terminal side of the Y130 position, or of the position functionally equivalent to Y130. In one embodiment, the second mutation is at a position functionally equivalent to tyrosine at position 132 (Y132) in a member of the APOBEC3A subfamily. An APOBEC protein, such as an APOBEC3A protein, containing substitution mutations at both the first site (a position functionally equivalent to Y130) and the second site (a position two, three, four, or five amino acids on the C-terminal side of the Y130 position) shows increased preferential activity on 5mC compared to the same APOBEC protein containing only the single substitution mutation at Y130. In one embodiment, the substitution mutation at the second position is to an amino acid having a positively charged side chain, selected from arginine (R), histidine (H), and lysine (K), or to an amino acid having a polar side chain, namely glutamine (Q). In one embodiment, the substitution mutation at the second position is histidine (H), such as Y132 to histidine. The double mutant can combine any substitution mutation at a position functionally equivalent to Y130 described herein with any second substitution mutation at a position two, three, four, or five amino acids on the C-terminal side of the Y130 position described herein, in any combination.
For example, the altered cytidine deaminase can be SEQ ID NO:4 having substitutions at Y130 and Y132, or at the positions functionally equivalent to Y130 and Y132 as described herein. One example of an altered cytidine deaminase is SEQ ID NO:11 comprising Y130X and Y132Z, where X is selected from A, L, or W (preferably A), and Z is selected from R, H, L, or Q (preferably H). This encompasses, without limitation, the combinations Y130A and Y132R, Y130A and Y132H, Y130A and Y132L, Y130A and Y132Q, Y130L and Y132R, Y130L and Y132H, Y130L and Y132L, Y130L and Y132Q, Y130W and Y132R, Y130W and Y132H, Y130W and Y132L, and Y130W and Y132Q, or any suitable combination thereof. In one embodiment, the double mutant includes substitution mutations Y130A and Y132H. Specific examples of altered cytidine deaminases that have both substitution mutations and act on 5mC more preferentially than the APOBEC protein having only the single substitution mutation at Y130 include SEQ ID NO:6 or a sequence having at least 90%, at least 95%, at least 98%, or at least 99% sequence identity to SEQ ID NO:6 and comprising Y130A and Y132H.
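The combinations listed for SEQ ID NO:11 can be enumerated programmatically, and a substitution can be applied to a sequence with a check that the targeted residue is the expected wild-type tyrosine. This Python sketch uses a hypothetical poly-alanine placeholder rather than SEQ ID NO:4 or SEQ ID NO:11, whose sequences are not reproduced in this section; the residue letters are copied from the text.

```python
from itertools import product
from typing import Dict, List

# Residue options listed in the text for SEQ ID NO:11 (letters reproduced as given).
FIRST_SITE = ("A", "L", "W")        # replaces Y130
SECOND_SITE = ("R", "H", "L", "Q")  # replaces Y132


def double_mutant_names() -> List[str]:
    """All Y130X/Y132Z combinations enumerated in the text."""
    return [f"Y130{x}/Y132{z}" for x, z in product(FIRST_SITE, SECOND_SITE)]


def apply_substitutions(protein: str, subs: Dict[int, str], expected_wt: str = "Y") -> str:
    """Return a copy of protein with 1-based substitutions applied.

    Raises ValueError if a targeted position does not hold the expected wild-type residue.
    """
    residues = list(protein)
    for pos, new in subs.items():
        if residues[pos - 1] != expected_wt:
            raise ValueError(f"position {pos} is {residues[pos - 1]}, not {expected_wt}")
        residues[pos - 1] = new
    return "".join(residues)
```

`double_mutant_names()` returns the twelve combinations listed above, including Y130W/Y132Q.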
The person of ordinary skill in the art can confirm the 5mC-preferential deaminase activity of the arginine, glutamine, histidine, and lysine substitution mutations at the second position in the double mutants described above. For example, double mutants can be constructed to create an altered cytidine deaminase having a first substitution mutation at a position functionally equivalent to Y130 and a second arginine, glutamine, histidine, or lysine substitution mutation at the tyrosine two amino acids on the C-terminal side of the Y130 position, and then evaluated for deamination of C residues in one assay and deamination of 5mC residues in a second assay. Using an assay such as the Y l-based assay described herein, the ratio of 5mC deamination to C deamination can be compared to identify those double mutants that preferentially deaminate 5mC compared to C. One of ordinary skill in the art could similarly test double mutants having a tyrosine three, four, or five positions C-terminal to the position functionally equivalent to Y130, and confirm whether a substitution mutation at that position to arginine, glutamine, histidine, or lysine, in combination with a mutation at the position functionally equivalent to Y130 (such as Y130A), yields double mutants that preferentially deaminate 5mC compared to C.
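The comparison step can be expressed as a simple screen over assay results. In this Python sketch each mutant is assumed to have a fraction of 5mC sites deaminated and a fraction of C sites deaminated from the two assays; the ratio threshold of 1.0 and the small denominator floor are assumptions, and any numbers passed in are placeholders rather than measured data.

```python
from typing import Dict, List, Tuple


def preference_ratio(frac_5mc: float, frac_c: float, floor: float = 1e-6) -> float:
    """Ratio of 5mC deamination to C deamination; the floor avoids division by zero."""
    return frac_5mc / max(frac_c, floor)


def prefers_5mc(results: Dict[str, Tuple[float, float]], min_ratio: float = 1.0) -> List[str]:
    """Names of mutants whose 5mC:C deamination ratio exceeds min_ratio, highest first.

    results maps a mutant name to (fraction of 5mC deaminated, fraction of C deaminated).
    """
    passing = {
        name: preference_ratio(f5mc, fc)
        for name, (f5mc, fc) in results.items()
        if preference_ratio(f5mc, fc) > min_ratio
    }
    return sorted(passing, key=passing.get, reverse=True)
```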
Some embodiments presented herein relate to substitution mutations that result in 5mC-defective deaminase activity (i.e., converting C to U at a greater rate than converting 5mC to T). In one embodiment, the substitution mutation at a position functionally equivalent to Y130 is a mutation to an amino acid having a non-polar side chain or a hydrophobic side chain, such as leucine (L) or tryptophan (W), and yields an enzyme that has increased cytidine deaminase activity and preferentially acts on cytosine compared to 5mC. In an exemplary aspect of this embodiment, the substitution mutation at a position functionally equivalent to Y130 is a mutation to leucine. Other examples of mutations that result in increased preferential deamination activity on cytosine compared to 5mC include the single mutant Y132P and the double mutants Y130V/Y132H and Y130W/Y132H. Specific examples of altered cytidine deaminases that have increased cytidine deaminase activity and preferentially act on cytosine compared to 5mC include SEQ ID NO:7 or a sequence having at least 90%, at least 95%, at least 98%, or at least 99% sequence identity to SEQ ID NO:7 and comprising Y130L.
In one embodiment, the substitution mutation at a position functionally equivalent to Y130 results in 5hmC-defective deaminase activity (i.e., the enzyme preferentially deaminates C and 5mC to U and T, respectively, and has significantly reduced deamination of 5hmC). In an exemplary aspect of this embodiment, the substitution mutation at a position functionally equivalent to Y130 is a mutation to an amino acid having a non-polar side chain or a hydrophobic side chain, such as tryptophan (W). Specific examples of altered cytidine deaminases that can deaminate C and 5mC to U and T, respectively, but have a reduced, and preferably no detectable, ability to deaminate 5hmC include SEQ ID NO:8 or a sequence having at least 90%, at least 95%, at least 98%, or at least 99% sequence identity to SEQ ID NO:8 and comprising Y130W.
In some embodiments, an altered cytidine deaminase includes a substitution mutation at a position functionally equivalent to tyrosine at position 130 (Y130) and/or tyrosine at position 132 (Y132) in a member of the APOBEC3A subfamily. In some embodiments, such an altered cytidine deaminase demonstrates selective deamination of 5mC.
In some embodiments, an altered cytidine deaminase is an altered APOBEC3A cytidine deaminase, altered to include a substitution mutation at tyrosine at position 130 (Y130) and/or tyrosine at position 132 (Y132). In some embodiments, such an altered APOBEC3A cytidine deaminase demonstrates selective deamination of 5mC.
In some embodiments, an altered cytidine deaminase is a double mutant of APOBEC3A with substitution mutations Y130A/Y132H. In some embodiments, such an altered APOBEC3A cytidine deaminase demonstrates selective deamination of 5mC.
In some embodiments, an altered cytidine deaminase includes an altered cytidine deaminase having an amino acid sequence of SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, or SEQ ID NO:11. In some embodiments, such an altered APOBEC3A cytidine deaminase demonstrates selective deamination of 5mC. An altered cytidine deaminase described herein can include additional mutations. Typically, additional mutations do not unduly alter the activity of the altered cytidine deaminase. One or more additional mutations can be a conservative mutation.
An altered cytidine deaminase described herein can be a truncated protein. A truncated protein is a fragment of an altered cytidine deaminase of the present disclosure that retains the ability to deaminate 5mC to thymidine. A truncated altered cytidine deaminase can include a deletion of 1 to 13 amino acids on the N-terminal end of the protein, a deletion of 1 to 3 amino acids on the C-terminal end of the protein, or a combination thereof.
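The truncation ranges above define a finite set of candidate fragments: 14 N-terminal options (including no deletion) times 4 C-terminal options, minus the full-length protein, gives 55 fragments. The Python sketch below only enumerates them; whether a given fragment retains 5mC deaminase activity must still be determined experimentally. The function name is hypothetical.

```python
from typing import Dict, Tuple


def truncation_variants(protein: str, max_n: int = 13, max_c: int = 3) -> Dict[Tuple[int, int], str]:
    """Enumerate fragments lacking 0..max_n N-terminal and 0..max_c C-terminal residues.

    Keys are (n_deleted, c_deleted); the untruncated protein (0, 0) is excluded.
    An N-terminal deletion of n residues shifts later residue numbers by -n
    (for example, Y130 becomes residue 130 - n of the fragment).
    """
    if len(protein) <= max_n + max_c:
        raise ValueError("protein too short for the requested truncations")
    variants: Dict[Tuple[int, int], str] = {}
    for n in range(max_n + 1):
        for c in range(max_c + 1):
            if (n, c) != (0, 0):
                variants[(n, c)] = protein[n:len(protein) - c]
    return variants
```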
In some embodiments, an altered cytidine deaminase includes any of those described in International Patent Application No. PCT/US2023/017846 (“Altered Cytidine Deaminases and Methods of Use”), filed April 7, 2023, which is hereby incorporated by reference in its entirety.
In general, methods for using a cytidine deaminase include contacting target nucleic acids, e.g., DNA or RNA, with the enzyme under conditions suitable for conversion of modified cytidines, such as 5mC, to thymidine, or for conversion of unmodified cytidine to uracil. Because amplification of DNA does not preserve the modification status of cytidine (e.g., the methylation status of 5mC is not retained), use of a cytidine deaminase typically occurs before amplification of target DNA. Target nucleic acids can be contacted with cytidine deaminase at essentially any time before amplification. For instance, target nucleic acids can be contacted with cytidine deaminase after isolation of genomic or cell free DNA or mRNA, before or after fragmentation, or before or after tagmentation. The skilled person will recognize that target nucleic acids can be contacted with a cytidine deaminase after addition of a universal sequence and/or an adapter, provided the universal sequence and/or adapter is not added by amplification.
Reaction conditions suitable for conversion of modified cytidines, such as 5mC, to thymidine by a cytidine deaminase include, but are not limited to, a substrate of target nucleic acid suspected of including at least one modified cytidine, an appropriate pH, reaction temperature, reaction time, and concentration of the cytidine deaminase and/or DNA or RNA substrate. It is expected that a cytidine deaminase can function in essentially any buffer. Examples of useful buffers include, but are not limited to, a citrate buffer, such as the citrate buffer available from Thermo Fisher Scientific (Cat. No. 005000); sodium acetate buffer; Bis-Tris propane-HCl; and Tris-HCl. Examples of other buffers include, but are not limited to, Bicine, DIPSO, glycylglycine, HEPES, imidazole, malonate, MES, MOPS, PB, phosphate, PIPES, SPG, succinate, TAPS, TAPSO, and tricine. Cytidine deaminases typically function at near-neutral pH, e.g., pH 7. In some embodiments, a reducing agent such as dithiothreitol (DTT) can be present. In some embodiments, a divalent cation is not included. A deamination reaction can occur at a temperature of about 25°C to about 60°C, including, but not limited to, about 37°C, about 45°C, about 50°C, and about 60°C.
Some cytidine deaminases deaminate a modified cytosine to thymidine at a faster rate than they deaminate cytosine to uracil. Thus, in some embodiments, the reaction time can be chosen to allow the reaction to run to completion, maximizing the difference between deamination of modified cytosine and deamination of cytosine. In some embodiments, the reaction can proceed for at least 15 minutes, at least 30 minutes, at least 45 minutes, at least 60 minutes, at least 90 minutes, at least 120 minutes, or at least 150 minutes, and for no greater than 15 minutes, no greater than 30 minutes, no greater than 45 minutes, no greater than 60 minutes, no greater than 90 minutes, no greater than 120 minutes, no greater than 150 minutes, or no greater than 180 minutes. In some embodiments, the reaction can run overnight.
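Why reaction time matters can be illustrated with an idealized model. The Python sketch below assumes, purely for illustration, that each deamination follows first-order kinetics with hypothetical rate constants k_mod (modified cytosine) and k_c (cytosine); real enzyme kinetics on ssDNA may not follow this form. Under the model, the conversion gap exp(-k_c t) - exp(-k_mod t) peaks at t* = ln(k_mod/k_c)/(k_mod - k_c).

```python
import math


def fraction_converted(k_per_min: float, t_min: float) -> float:
    """Fraction of sites deaminated after t_min minutes under an assumed first-order model."""
    return 1.0 - math.exp(-k_per_min * t_min)


def conversion_gap(k_mod: float, k_c: float, t_min: float) -> float:
    """Modified-cytosine conversion minus cytosine conversion at time t_min."""
    return fraction_converted(k_mod, t_min) - fraction_converted(k_c, t_min)


def time_of_max_gap(k_mod: float, k_c: float) -> float:
    """Time at which the gap is largest, from d/dt[exp(-k_c t) - exp(-k_mod t)] = 0."""
    if not k_mod > k_c > 0:
        raise ValueError("requires k_mod > k_c > 0")
    return math.log(k_mod / k_c) / (k_mod - k_c)
```

With hypothetical rate constants of 0.1 and 0.01 per minute, the gap peaks at about 25.6 minutes. As k_c approaches zero, the peak time grows without bound, which corresponds to letting the reaction run toward completion.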
In some embodiments, a deamination reaction can include a cytidine deaminase at a concentration from at least about 25 nanomolar (nM) to no greater than about 5 micromolar (μM). For instance, the concentration of the enzyme can be at least about 25 nM, at least about 0.5 μM, at least about 1 μM, at least about 2 μM, at least about 3 μM, at least about 4 μM, or at least about 5 μM, and/or no greater than 5 μM, no greater than 4 μM, no greater than 3 μM, no greater than 2 μM, no greater than 1 μM, or no greater than 0.5 μM. In some embodiments, a deamination reaction can include about 1 ng to about 1 μg input nucleic acid. In some embodiments, a deamination reaction can include nucleic acids at a concentration of about 10 pM to about 200 nM.
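Input mass and molar concentration can be related with a short calculation. The Python sketch below assumes an average of about 330 g/mol per nucleotide for ssDNA (an approximation; sequence composition and end groups change the exact value) and uses a hypothetical fragment length and reaction volume.

```python
# Approximate average mass of one nucleotide in single-stranded DNA (g/mol); an assumption.
SSDNA_G_PER_MOL_PER_NT = 330.0


def ssdna_nanomolar(mass_ng: float, length_nt: int, volume_ul: float,
                    g_per_mol_per_nt: float = SSDNA_G_PER_MOL_PER_NT) -> float:
    """Molar concentration (nM) of ssDNA fragments from input mass, length, and volume."""
    moles = (mass_ng * 1e-9) / (length_nt * g_per_mol_per_nt)
    return moles / (volume_ul * 1e-6) * 1e9


def enzyme_pmol(conc_nm: float, volume_ul: float) -> float:
    """Picomoles of enzyme in a reaction: nM multiplied by uL gives fmol; divide by 1000."""
    return conc_nm * volume_ul / 1000.0
```

For example, 10 ng of 300-nt fragments in 20 μl is about 5.05 nM, inside the 10 pM to 200 nM range above, and 1 μM (1000 nM) enzyme in 20 μl corresponds to 20 pmol.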
Uracil-DNA-glycosylase
With the methods described herein, after a preparation of single-stranded DNA (ssDNA) fragments has been treated with a cytidine deaminase, it is then contacted with a Uracil-DNA-glycosylase. Uracil-DNA-glycosylase (UDG), also known as Uracil-N-glycosylase (UNG), is a highly conserved repair enzyme that catalyzes the excision of uracil from uracil-containing single- and double-stranded DNA but is inactive on RNA. It is a monomeric protein with relatively stable physicochemical properties and a small molecular weight of 25 kDa, and it is widely present in various prokaryotic and eukaryotic organisms. See, for example, Holz et al., 2019, Scientific Reports; 9:17822; Schormann et al., 2014, Protein Sci; 23:1667-1685; Zharkov et al., 2010, Mutation Research 685, 11-20; Stivers et al., 2001, Arch Biochem Biophys; 396, 1-9; Parikh et al., 2000, Proc Natl Acad Sci USA; 97:5083; Pearl, 2000, Mutat Res 460, 165-181; Lindahl, 1982, Annu Rev Biochem; 51:61-87; and Lindahl et al., 1977, J Biol Chem; 252:3286-3294.
UDG excises uracil from DNA by hydrolyzing the N-glycosidic bond between the uracil base and the sugar-phosphate backbone in single- and double-stranded DNA (Bellamy et al., 2007, Nucleic Acids Res; 35:1478-1487; Slupphaug et al., 1996, Nature 384, 87-92; Stivers et al., 1999, Biochemistry; 38:952-963; and Parikh et al., 2000, Mutat Res 460:183-199), resulting in the formation of an abasic site (AP site) having a hemiacetal moiety.
A schematic illustration of the UDG-mediated generation of single nucleotide gaps within single-stranded DNA fragments is shown in FIG. 1. Because false positive (cytosine) deamination produces uracil bases, whereas true positive (methylcytosine) deamination produces thymine bases, UDG can be used to specifically recognize and remove uracil bases, thus removing the false positive signal and preventing its propagation as a “T” in downstream amplification and sequencing. APOBEC enzymes require ssDNA for recognition, and thus deaminated DNA will be single stranded.
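The logic of FIG. 1 can be mimicked with a toy string model in Python. The symbols M (5mC) and X (abasic site), the assumption that every 5mC is converted, and the explicit set of off-target C positions are all hypothetical simplifications for illustration.

```python
from typing import Set

METHYL_C = "M"  # hypothetical one-letter symbol for 5-methylcytosine
ABASIC = "X"    # hypothetical symbol for an abasic (AP) site


def deaminate(seq: str, off_target_c: Set[int]) -> str:
    """Toy deaminase: every 5mC becomes T (true positive); C at the given 0-based
    indices becomes U (false positive); all other bases are unchanged."""
    out = []
    for i, base in enumerate(seq):
        if base == METHYL_C:
            out.append("T")
        elif base == "C" and i in off_target_c:
            out.append("U")
        else:
            out.append(base)
    return "".join(out)


def udg_excise(seq: str) -> str:
    """Toy UDG: each uracil base is removed, leaving an abasic site."""
    return seq.replace("U", ABASIC)
```

For `ACMGC` with an off-target event at index 4, deamination gives `ACTGU` and UDG treatment gives `ACTGX`: the T derived from 5mC is retained, while the false positive U becomes an abasic site.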
A variety of UDG enzymes are commercially available, including, for example, E. coli Uracil-DNA Glycosylase (UDG) (New England Biolabs, Catalog # M0280S, see the worldwide web at neb.com/products/m0280-uracil-dna-glycosylase-udg#Product%20Information) and a heat-labile Uracil DNA Glycosylase (UDG/UNG) isolated from a psychrophilic marine bacterium (Yeasen Biotechnology (Shanghai) Co., Ltd., Catalog #10707ES, see the worldwide web at yeasenbiotech.com/solutiondetail/79?gclid=EAIaIQobChMI_Oie4unY-gIV3xCtBh0hRwGHEAAYASAAEgKsx_D_BwE). In some embodiments, the UDG is of commercial origin.
Reaction conditions suitable for the UDG-mediated excision of uracil from DNA include, but are not limited to, concentration of the single-stranded or double-stranded DNA substrate, pH, reaction temperature, reaction time, and concentration of the UDG enzyme. UDG is active over a broad pH range with an optimum at pH 8.0, does not require a divalent cation, and is inhibited by high ionic strength (> 200 mM). It is expected that a UDG can function in any of a variety of buffers. An example of a useful buffer includes, but is not limited to, 1X UDG Reaction Buffer (New England Biolabs, Catalog # B0280S, see the worldwide web at neb.com/products/m0280-uracil-dna-glycosylase-udg#Product%20Information), which is 20 mM Tris-HCl, 1 mM DTT, 1 mM EDTA (pH 8 at 25°C). Uracil-DNA Glycosylase is active at temperatures of 25°C to 37°C, and in some embodiments the reaction can proceed at a temperature of 25°C to 37°C. In some embodiments, the reaction can proceed at 37°C. In some embodiments, the reaction can proceed for about 5 minutes, about 10 minutes, about 15 minutes, about 20 minutes, about 30 minutes, about 45 minutes, about 60 minutes, about 90 minutes, about 120 minutes, or any range thereof. In some embodiments, a reaction can include about 0.001 U/μl to about 1 U/μl UDG enzyme, wherein one unit is defined as the amount of enzyme that catalyzes the release of 60 pmol of uracil per minute from double-stranded, uracil-containing DNA. Activity is measured by release of [³H]-uracil in a 50 μl reaction containing 0.2 μg DNA (10⁴-10⁵ cpm/μg) in 30 minutes at 37°C (see the worldwide web at neb.com/products/m0280-uracil-dna-glycosylase-udg#Product%20Information). In some embodiments, a reaction can include about 0.05 U/μl UDG.
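The unit definition above allows a quick estimate of the uracil-release capacity nominally present in a reaction. This Python sketch assumes the 60 pmol/min unit definition carries over to the reaction at hand, which it may not, since the unit is defined on double-stranded DNA in a specific assay; the reaction volume in the example is hypothetical.

```python
# Unit definition quoted above: 1 U releases 60 pmol uracil per minute
# (double-stranded, uracil-containing DNA, 37 °C assay).
PMOL_URACIL_PER_MIN_PER_UNIT = 60.0


def udg_units(conc_u_per_ul: float, volume_ul: float) -> float:
    """Total UDG units in a reaction of the given volume."""
    return conc_u_per_ul * volume_ul


def nominal_release_pmol_per_min(units: float) -> float:
    """Uracil release rate implied by the unit definition (nominal, assay-dependent)."""
    return units * PMOL_URACIL_PER_MIN_PER_UNIT


def in_stated_range(conc_u_per_ul: float) -> bool:
    """Check a concentration against the 0.001 to 1 U/ul range described in the text."""
    return 0.001 <= conc_u_per_ul <= 1.0
```

At 0.05 U/μl in a 50 μl reaction, the reaction contains 2.5 U, corresponding to a nominal release of 150 pmol of uracil per minute.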
In some embodiments, a reaction can include about 1 ng to about 1 μg of input nucleic acid. In some embodiments, a reaction can include nucleic acids at a concentration of about 10 pM to about 200 nM. In some embodiments, a reaction can include nucleic acids at a concentration of about 200 pM to about 20 nM.
Complementary strand synthesis
After the enzymatic treatment of single-stranded DNA fragments with Uracil DNA Glycosylase (UDG), which excises uracil bases to form abasic sites, the preparation of single-stranded DNA fragments is incubated with a high-fidelity polymerase supplemented with a deoxycytidyl transferase, such as Rev1, to generate a complementary DNA fragment in which the abasic site lesions are repaired by installation of a cytidine. This results in complementary DNA fragments in which false positive uracil residues have been corrected to cytosine. This is shown schematically in FIG. 1.
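The effect of deoxycytidyl transferase-supplemented synthesis can be sketched on toy strings. In this Python sketch the symbol X for an abasic site is hypothetical, templates and products are written 5' to 3', and the polymerase is assumed to pair template U with A, as it does with T.

```python
ABASIC = "X"  # hypothetical symbol for an abasic site left by UDG

# Watson-Crick partners; the polymerase reads template U like T and pairs it with A.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def synthesize_complement(template: str) -> str:
    """Return the new strand (5'->3') synthesized on a template written 5'->3'.

    Normal bases receive their Watson-Crick partner; at an abasic site the
    deoxycytidyl transferase (e.g., Rev1) inserts dC regardless of the missing base.
    """
    return "".join("C" if base == ABASIC else PAIR[base] for base in reversed(template))
```

Without UDG treatment, the template `ACTGU` yields `ACAGT`, with dA opposite the off-target uracil; after UDG, the template `ACTGX` yields `CCAGT`, with dC at that position instead, so the false positive is not carried forward as a T.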
Any of the many protocols available for the synthesis of a complementary second DNA strand are compatible with the methods described herein. In addition to the cocktail of a high-fidelity polymerase supplemented with a deoxycytidyl transferase, a mixture of all four deoxyribonucleoside 5'-triphosphates (dNTPs) and an appropriate primer are provided for the synthesis of the second complementary strand. The four dNTPs are dATP, dCTP, dGTP, and dTTP, carrying the bases adenine, cytosine, guanine, and thymine, respectively. Primers include, but are not limited to, a primer complementary to the 3' end library adapter, and random oligonucleotides of about 18 to 22 bases in length.
In some embodiments, the two enzymatic steps of 1) UDG treatment to generate abasic sites where dC to dU conversions have occurred and 2) complementary strand synthesis with a high-fidelity polymerase supplemented with a deoxycytidyl transferase may be carried out separately and sequentially. In some embodiments, the two enzymatic steps may be carried out simultaneously in the same reaction mixture.
High fidelity polymerase
With the methods described herein, a high-fidelity polymerase is utilized for the synthesis of complementary strands. The fidelity of a DNA polymerase refers to its ability to accurately replicate a template. A critical aspect of this is the ability of the DNA polymerase to read a template strand, select the appropriate nucleoside triphosphate, and insert the correct nucleotide at the 3' primer terminus, such that canonical Watson-Crick base pairing is maintained. The rate of misincorporation (incorporating the incorrect nucleotide) is known as the polymerase's “error rate.” In addition to effective discrimination for correct over incorrect nucleotide incorporation, some DNA polymerases possess a 3'→5' exonuclease activity. This activity, also termed “proofreading,” is used to excise incorrectly incorporated mononucleotides that are then replaced with the correct nucleotide. High-fidelity DNA polymerases demonstrate a low error rate and result in a high degree of accuracy in the replication of the DNA of interest (“Polymerase Fidelity: What is it, and what does it mean for your PCR?” available on the worldwide web at neb.com/tools-and-resources/feature-articles/polymerase-fidelity-what-is-it-and-what-does-it-mean-for-your-pcr).
A high-fidelity polymerase replicates DNA with the introduction of minimal errors. Examples include, but are not limited to, VENT® DNA Polymerase (New England BioLabs, Inc.), PHUSION® High-Fidelity DNA Polymerase (Thermo Scientific), Q5® High-Fidelity DNA Polymerase, T4 DNA polymerase, and E. coli DNA polymerase (Journal of Molecular Biology 336, no. 5 (2004): 1023-34). In some embodiments, a high-fidelity polymerase is an archaeal polymerase. The fidelity of various polymerases is presented in more detail in the chart available on the worldwide web at neb.com/tools-and-resources/selection-charts/dna-polymerase-selection-chart and reviewed by Johnson (Johnson, 2010, Biochim Biophys Acta; 1804(5):1041-1048) and by Cline et al. (Cline et al., 1996, Nucleic Acids Research; 24(18):3546-3551).
Deoxycytidyl transferase
With the methods described herein, for the synthesis of complementary strands, the high-fidelity polymerase is supplemented with a deoxycytidyl transferase. Deoxycytidyl transferases are Y family polymerases that are involved in DNA repair; they complement other polymerases and prevent their stalling at translesion sites by transferring a dCMP residue from dCTP to the 3'-end of a DNA primer in a template-dependent reaction. Deoxycytidyl transferases assist in the bypass of an abasic lesion by inserting a nucleotide opposite the lesion. Specifically, a deoxycytidyl transferase demonstrates a preferential and limited incorporation of dCMP in a template-directed manner regardless of the template nucleotide, always inserting a deoxycytidine (dC) across from a lesion. Whether the template is G, A, T, C, or an abasic site, a deoxycytidyl transferase will always add a C. One example of a deoxycytidyl transferase is Rev1. See, for example, Gibbs et al., 2000, PNAS; 97:4186-4191; Lin et al., 1999, Nucleic Acids Res; 27(22):4468-75; Masuda et al., 2001, J Biol Chem; 276:15051-15058; Murakumo et al., 2001, J Biol Chem; 276:35644-35651; Nair et al., 2005, Science; 309(5744):2219-22; Nelson et al., 1996, Nature; 382(6593):729-31; Prasad et al., 2016, Nucleic Acids Res; 44(22):10824-10833; and Weaver et al., 2020, PNAS; 117(41):25494-25504.
Deoxycytidyl transferases may be produced recombinantly and are commercially available. For example, recombinant human REV1 protein (Catalog # REV1-1531H), recombinant mouse REV1 protein (Catalog # REV1-14090M), recombinant chicken REV1 (Catalog # REV1-2508C), recombinant zebrafish REV1 (Catalog # REV1-6683Z), and recombinant yeast Rev1 protein (Catalog # Rev1-1532Y) are commercially available from Creative Biomart Inc. (Shirley, NY), and human Rev1 (Catalog # PT-A04738) is available from Novatein Biosciences (Woburn, MA). Reaction conditions may include any of those discussed in Brown et al. (Brown et al., 2010, Biochemistry; 49(26):5504-5510). In some embodiments, unnatural dCTP derivatives, including, but not limited to, any of those discussed in Salem et al. (Salem et al., 2009, J Bacteriol; 191(18):5657-68), may be used.
Sequencing
In some embodiments, corrected DNA fragments may be sequenced. Sequencing may be by any of a variety of known methodologies, including, but not limited to, any of a variety of high-throughput, next generation sequencing (NGS) platforms, including, but not limited to, sequencing by synthesis, sequencing by ligation, nanopore sequencing, Sanger sequencing, and the like. In some embodiments, sequencing is performed using the sequencing by synthesis methodologies commercialized by ILLUMINA® as described in U.S. Patent Application Publication No. 2007/0166705, U.S. Patent Application Publication No. 2006/0188901, and U.S. Pat. No. 7,057,026; by Beijing Genomics Institute (BGI) as described in Carnevali et al., 2012, J Comput Biol; 19(3):279-92; or the ion semiconductor sequencing methodologies of ION TORRENT™ as described in US 2009/0026082 A1; US 2009/0127589 A1; US 2010/0137143 A1; or US 2010/0282617 A1, each of which is incorporated herein by reference.
Next Generation Sequencing (NGS) refers to sequencing methods that allow for massively parallel sequencing of clonally amplified molecules and of single nucleic acid molecules. Non-limiting examples of NGS include sequencing-by-synthesis using reversible dye terminators, and sequencing-by-ligation.
Preferred embodiments include sequencing-by-synthesis (SBS) techniques. SBS techniques generally involve the enzymatic extension of a nascent nucleic acid strand through the iterative addition of nucleotides against a template strand.
In some embodiments, repaired fragments are cloned, followed by Sanger sequencing of clones to assess methylation.
In some embodiments, rather than sequencing, the readout may be obtained by the use of an array, using, for example, procedures as described on the worldwide web at illumina.com/techniques/microarrays/methylation-arrays.html. As used herein, the term “array” refers to a population of sites that can be differentiated from each other according to relative location. Different molecules that are at different sites of an array can be differentiated from each other according to the locations of the sites in the array. An individual site of an array can include one or more molecules of a particular type. For example, a site can include a single target nucleic acid molecule having a particular sequence, or a site can include several nucleic acid molecules having the same sequence (and/or complementary sequence thereof). The sites of an array can be different features located on the same substrate. Exemplary features include, without limitation, droplets, wells in a substrate, beads (or other particles) in or on a substrate, projections from a substrate, ridges on a substrate, or channels in a substrate. The sites of an array can be separate substrates each bearing a different molecule. Different molecules attached to separate substrates can be identified according to the locations of the substrates on a surface to which the substrates are associated or according to the locations of the substrates in a liquid or gel. Exemplary arrays in which separate substrates are located on a surface include, without limitation, those having beads in wells.
Amplification
In some embodiments, the corrected DNA fragments may be amplified. It will be appreciated that any of the amplification methodologies described herein or generally known in the art may be used with universal or target-specific primers to amplify DNA fragments. Suitable methods for amplification include, but are not limited to, the polymerase chain reaction (PCR), strand displacement amplification (SDA), transcription mediated amplification (TMA) and nucleic acid sequence-based amplification (NASBA), as described in U.S. Pat. No. 8,003,354. The above amplification methods may be employed to amplify one or more nucleic acids of interest. For example, PCR, including multiplex PCR, SDA, TMA, NASBA and the like may be utilized to amplify DNA fragments. In some embodiments, primers directed specifically to the polynucleotide of interest are included in the amplification reaction.
As used herein, “amplify,” “amplifying” or “amplification reaction” and their derivatives, refer generally to any action or process whereby at least a portion of a nucleic acid molecule is replicated or copied into at least one additional nucleic acid molecule. The additional nucleic acid molecule optionally includes sequence that is substantially identical or substantially complementary to at least some portion of the target nucleic acid molecule. The target nucleic acid molecule can be single-stranded or double-stranded and the additional nucleic acid molecule can independently be single-stranded or double-stranded. Amplification optionally includes linear or exponential replication of a nucleic acid molecule. In some embodiments, such amplification can be performed using isothermal conditions; in other embodiments, such amplification can include thermocycling. In some embodiments, the amplification is a multiplex amplification that includes the simultaneous amplification of a plurality of target sequences in a single amplification reaction. In some embodiments, “amplification” includes amplification of at least some portion of DNA and RNA based nucleic acids alone, or in combination. The amplification reaction can include any of the amplification processes known to one of ordinary skill in the art.
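The difference between linear and exponential replication can be made concrete with an idealized copy-number model. This is a sketch under stated assumptions: a constant per-cycle efficiency (the fraction of available templates copied each cycle) and no reagent depletion or plateau. The function name and parameters are hypothetical.

```python
def copies_after_cycles(initial: float, cycles: int, efficiency: float = 1.0,
                        exponential: bool = True) -> float:
    """Idealized copy number after a number of amplification cycles.

    efficiency is the fraction of available templates copied per cycle
    (1.0 corresponds to perfect doubling in the exponential case).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if exponential:
        # Copies made in one cycle also serve as templates in the next.
        return initial * (1.0 + efficiency) ** cycles
    # Linear: only the original templates are copied, once per cycle.
    return initial * (1.0 + efficiency * cycles)


print(copies_after_cycles(1, 10))                     # 1024.0
print(copies_after_cycles(1, 10, exponential=False))  # 11.0
```

After ten perfect cycles, a single template yields 1,024 molecules when copies are themselves copied, but only 11 when only the original template is copied.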
As used herein, “amplification conditions” and its derivatives generally refer to conditions suitable for amplifying one or more nucleic acid sequences. Such amplification can be linear or exponential. In some embodiments, the amplification conditions can include isothermal conditions or alternatively can include thermocycling conditions, or a combination of isothermal and thermocycling conditions. In some embodiments, the conditions suitable for amplifying one or more nucleic acid sequences include polymerase chain reaction (PCR) conditions. Typically, the amplification conditions refer to a reaction mixture that is sufficient to amplify nucleic acids such as one or more target sequences, or to amplify an amplified target sequence ligated to one or more adapters, e.g., an adapter-ligated amplified target sequence.
Generally, the amplification conditions include a catalyst for amplification or for nucleic acid synthesis, for example a polymerase; a primer that possesses some degree of complementarity to the nucleic acid to be amplified; and nucleotides, such as deoxyribonucleotide triphosphates (dNTPs) to promote extension of the primer once hybridized to the nucleic acid. The amplification conditions can require hybridization or annealing of a primer to a nucleic acid, extension of the primer and a denaturing step in which the extended primer is separated from the nucleic acid sequence undergoing amplification. Typically, but not necessarily, amplification conditions can include thermocycling; in some embodiments, amplification conditions include a plurality of cycles where the steps of annealing, extending, and separating are repeated. Typically, the amplification conditions include cations such as Mg++ or Mn++ and can also include various modifiers of ionic strength.
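The repeated separating, annealing, and extending steps are often written down as a cycling program. The sketch below represents one such program as data. The temperatures and hold times are illustrative assumptions, typical of a thermostable polymerase, and are not conditions taught by this disclosure. Ramp times are ignored, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Step:
    name: str
    temperature_c: float
    hold_s: int


# Illustrative three-step cycle; real values depend on polymerase, primers and amplicon.
THREE_STEP_CYCLE = (
    Step("denature", 98.0, 10),  # separate the extended primer from its template
    Step("anneal", 60.0, 30),    # hybridize primers to complementary sequence
    Step("extend", 72.0, 30),    # polymerase extends the annealed primers with dNTPs
)


def cycling_hold_time_s(cycle: Sequence[Step], n_cycles: int) -> int:
    """Total hold time over all repeated cycles, ignoring temperature ramps."""
    return n_cycles * sum(step.hold_s for step in cycle)


print(cycling_hold_time_s(THREE_STEP_CYCLE, 30))  # 2100
```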
As used herein, the term “polymerase chain reaction” (PCR) refers to the method of K. B. Mullis as described in U.S. Pat. Nos. 4,683,195 and 4,683,202, which describes a method for increasing the concentration of a segment of a polynucleotide of interest in a mixture of genomic DNA without cloning or purification. This process for amplifying the polynucleotide of interest consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired polynucleotide of interest, followed by a series of thermal cycling in the presence of a DNA polymerase. The two primers are complementary to their respective strands of the double-stranded polynucleotide of interest. The mixture is denatured at a higher temperature first and the primers are then annealed to complementary sequences within the polynucleotide of interest molecule. Following annealing, the primers are extended with a polymerase to form a new pair of complementary strands. The steps of denaturation, primer annealing, and polymerase extension can be repeated many times (referred to as thermocycling) to obtain a high concentration of an amplified segment of the desired polynucleotide of interest. The length of the amplified segment of the desired polynucleotide of interest (amplicon) is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of repeating the process, the method is referred to as the “polymerase chain reaction” (hereinafter “PCR”). 
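Because the amplicon is bounded by the two primer binding sites, its sequence and length can be predicted from a known template. The following is a simplified in silico PCR sketch. It assumes exact primer matches on a single template written 5'→3', ignores mismatches, melting temperature, and secondary structure, and uses hypothetical function names.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def predict_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the segment spanning both primers on `template`, or None.

    The forward primer matches the template directly. The reverse primer
    anneals to the opposite strand, so its site appears on `template` as
    the reverse complement of the primer.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


template = "TTATCCAAAGGGCCCGTT"
amplicon = predict_amplicon(template, forward="ATCC", reverse="CGGG")
print(amplicon, len(amplicon))  # ATCCAAAGGGCCCG 14
```

Moving either primer binding site changes the returned length, which is the sense in which amplicon length is a controllable parameter.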
Because the desired amplified segments of the polynucleotide of interest become the predominant nucleic acid sequences (in terms of concentration) in the mixture, they are said to be “PCR amplified.” In a modification to the method discussed above, the target nucleic acid molecules can be PCR amplified using a plurality of different primer pairs, in some cases, one or more primer pairs per target nucleic acid molecule of interest, thereby forming a multiplex PCR reaction.
As used herein, the term “amplicon,” when used in reference to a nucleic acid, means the product of copying the nucleic acid, wherein the product has a nucleotide sequence that is the same as or complementary to at least a portion of the nucleotide sequence of the nucleic acid. An amplicon can be produced by any of a variety of amplification methods that use the nucleic acid, or an amplicon thereof, as a template including, for example, PCR, rolling circle amplification (RCA), ligation extension, or ligation chain reaction. An amplicon can be a nucleic acid molecule having a single copy of a particular nucleotide sequence (for example, a PCR product) or multiple copies of the nucleotide sequence (for example, a concatemeric product of RCA). A first amplicon of a target nucleic acid is typically a complementary copy. Subsequent amplicons are copies that are created, after generation of the first amplicon, from the target nucleic acid or from the first amplicon. A subsequent amplicon can have a sequence that is substantially complementary to the target nucleic acid or substantially identical to the target nucleic acid. As defined herein, “multiplex amplification” refers to selective and non-random amplification of two or more target sequences within a sample using at least one target-specific primer. In some embodiments, multiplex amplification is performed such that some or all of the target sequences are amplified within a single reaction vessel. The “plexity” or “plex” of a given multiplex amplification refers generally to the number of different target-specific sequences that are amplified during that single multiplex amplification. In some embodiments, the plexity can be about 12-plex, 24-plex, 48-plex, 96-plex, 192-plex, 384-plex, 768-plex, 1536-plex, 3072-plex, 6144-plex or higher.
It is also possible to detect the amplified target sequences by several different methodologies (e.g., gel electrophoresis followed by densitometry; quantitation with a bioanalyzer or quantitative PCR; hybridization with a labeled probe; incorporation of biotinylated primers followed by avidin-enzyme conjugate detection; or incorporation of ³²P-labeled deoxynucleotide triphosphates into the amplified target sequence).
As used herein, the term “amplification site” refers to a site in or on an array where one or more amplicons can be generated. An amplification site can be further configured to contain, hold, or attach at least one amplicon that is generated at the site.
Samples
With the methods described herein, the target nucleic acids (also referred to herein as “DNA fragments” or “a preparation of DNA fragments from an input sample”) may be essentially any nucleic acid of known or unknown sequence.
Such target nucleic acids are typically derived from primary nucleic acids present in a sample, such as a biological sample. The primary nucleic acids may originate as DNA or RNA. DNA primary nucleic acids may originate in double-stranded DNA (dsDNA) form (e.g., genomic DNA, genomic DNA fragments, cell-free DNA, and the like) from a sample or may originate in single-stranded form from a sample. RNA primary nucleic acids may be mRNA or non-coding RNA, e.g., microRNA or small interfering RNA. A preparation of DNA fragments from an input sample may be single or double stranded DNA.
The primary nucleic acid molecules may represent the entire genetic complement of an organism, e.g., genomic DNA molecules which include both intron and exon sequences, as well as non-coding regulatory sequences such as promoter and enhancer sequences. The primary nucleic acid molecules may represent the entire genetic complement of specific cells of an organism, e.g., from tumor cells, in which the genomic DNA molecules include both intron and exon sequences, as well as non-coding regulatory sequences such as promoter and enhancer sequences. In one embodiment, particular subsets of genomic DNA can be used, such as, for example, particular chromosomes, DNA associated with open chromatin, DNA associated with closed chromatin, or one or more specific sequences such as a region of a specific gene (e.g., targeted sequencing). In one or more embodiments, the primary nucleic acid molecules may represent a particular subset of DNA, e.g., DNA having a specific sequence that anneals with a primer such as one used for targeted sequencing or target enrichment. In one embodiment, a particular subset of DNA can be used, such as cell-free DNA, which can include DNA of the subject including DNA from normal cells, DNA from diseased cells such as tumor cells, and/or DNA from fetal cells.
The primary nucleic acid molecules may represent the entire transcriptome of cells of an organism, e.g., mRNA molecules. The primary nucleic acid molecules may represent the entire transcriptome of specific cells of an organism, e.g., from tumor cells or for instance the cells of a tissue. In one embodiment, the primary nucleic acid molecules may represent a particular subset of mRNA, e.g., mRNA having a specific sequence that anneals with a primer such as one used for targeted sequencing or target enrichment.
A sample, such as a biological sample, can include nucleic acid molecules obtained from biopsies, tumors, scrapings, swabs, blood, mucus, urine, plasma, semen, hair, laser capture micro-dissections, surgical resections, and other clinical or laboratory obtained samples. In some embodiments, the sample can be an epidemiological, agricultural, forensic, or pathogenic sample. In some embodiments, the sample can include cultured cells. In some embodiments, the sample can include nucleic acid molecules obtained from an animal such as a human or mammalian source. In another embodiment, the sample can include nucleic acid molecules obtained from a non-mammalian source such as a plant, bacteria, virus, or fungus. In some embodiments, the source of the nucleic acid molecules may be an archived or extinct sample or species.
Additional non-limiting examples of sources of biological samples can include whole organisms as well as a sample obtained from a subject or a patient. The biological sample can be obtained from any biological fluid or tissue and can be in a variety of forms, including liquid fluid and tissue, solid tissue, and preserved forms such as dried, frozen, and fixed forms. The sample may be of any biological tissue, cells, or fluid. Such samples include, but are not limited to, sputum, blood, serum, plasma, blood cells (e.g., white cells), ascitic fluid, urine, saliva, tears, vaginal fluid (discharge), washings obtained during a medical procedure (e.g., pelvic or other washings obtained during biopsy, endoscopy or surgery), tissue, nipple aspirate, core or fine needle biopsy samples, cell-containing body fluids, peritoneal fluid, and pleural fluid, or cells therefrom, and free floating nucleic acids such as cell-free circulating DNA. Biological samples may also include sections of tissues such as frozen or fixed sections taken for histological purposes or micro-dissected cells or extracellular parts thereof. In some embodiments, the sample can be a blood sample, such as, for example, a whole blood sample. In another example, the sample is an unprocessed dried blood spot (DBS) sample. In yet another example, the sample is a formalin-fixed paraffin-embedded (FFPE) sample. In yet another example, the sample is a saliva sample. In yet another example, the sample is a dried saliva spot (DSS) sample.
Exemplary biological samples from which target nucleic acids can be derived include, for example, those from a eukaryote, for instance a mammal, such as a rodent, mouse, rat, rabbit, guinea pig, ungulate, horse, sheep, pig, goat, cow, cat, dog, primate, human or non-human primate; a plant, such as Arabidopsis thaliana, corn, sorghum, oat, wheat, rice, canola, or soybean; an alga, such as Chlamydomonas reinhardtii; a nematode, such as Caenorhabditis elegans; an insect, such as Drosophila melanogaster, mosquito, fruit fly, honey bee or spider; a fish, such as zebrafish; a reptile; an amphibian, such as a frog or Xenopus laevis; a Dictyostelium discoideum; a fungus, such as Pneumocystis carinii, Takifugu rubripes, yeast, Saccharomyces cerevisiae, or Schizosaccharomyces pombe; or a protozoan, such as Plasmodium falciparum. Target nucleic acids can also be derived from a prokaryote, such as a bacterium, Escherichia coli, Staphylococcus or Mycoplasma pneumoniae; an archaeon; a virus, such as Hepatitis C virus or human immunodeficiency virus; or a viroid. Target nucleic acids can be derived from a homogeneous culture or population of organisms described herein or alternatively from a collection of several different organisms, for example, in a community or ecosystem.
In some embodiments, a biological sample includes tissue that is processed to obtain the desired primary nucleic acids. In some embodiments, cells are used to obtain the desired primary nucleic acids. In some embodiments, nuclei are used to obtain the desired primary nucleic acids. The method can further include dissociating cells and/or isolating nuclei from cells. Methods for isolating cells and nuclei from tissue are available (WO 2019/236599).
In some embodiments, nucleic acids present in tissue, in cells, or in isolated nuclei can be processed depending on the desired read-out. For instance, nucleic acids can be fixed during processing, and useful fixation methods are available (WO 2019/236599). Fixation can be useful to preserve a sample or maintain contiguity of analytes from a sample, a cell, or a nucleus. Fixation methods preserve and stabilize tissue, cell, and nucleus morphology and architecture, inactivate proteolytic enzymes, strengthen samples, cells, and nuclei so they can withstand further processing and staining, and protect against contamination. Examples of methods where fixation can be useful include, but are not limited to, whole genome sequencing of isolated nuclei and chromosome conformation capture methods such as Hi-C. Common methods of fixation include perfusion, immersion, freezing, and drying (Srinivasan et al., Am J Pathol. 2002 Dec; 161(6): 1961-1971. doi: 10.1016/S0002-9440(10)64472-0). In some embodiments, such as whole genome sequencing, isolated nuclei can be processed to dissociate nucleosomes from DNA while leaving the nuclei intact, and methods for generating nucleosome-free nuclei are available (WO 2018/018008).
In some embodiments, primary nucleic acids in bulk, e.g., from a plurality of cells, can be used to produce a sequencing library as described herein. In other embodiments, individual cells or nuclei can be used as sources of primary nucleic acids to obtain sequence information from single cells and nuclei. Many different single cell library preparation methods are known in the art, including, but not limited to, Drop-seq, Seq-well, and single cell combinatorial indexing ("sci-") methods. Companies providing single cell products and related technologies include, but are not limited to, Illumina, 10X genomics, Takara Biosciences, BD biosciences, Biorad, Icellbio, isoplexis, CellSee, nanoselect, and Dolomite bio. Sci-seq is a methodological framework that employs split-pool barcoding to uniquely label the nucleic acid contents of large numbers of single cells or nuclei. Typically, the number of nuclei or cells can be at least two. The upper limit is dependent on the practical limitations of equipment (e.g., multi-well plates, number of indexes) used in other steps of the methods as described herein. The number of nuclei or cells that can be used is not intended to be limiting and can number in the billions.
The target nucleic acids used in the methods and compositions of the present disclosure can be derived by fragmentation. Random fragmentation refers to the fragmentation of a polynucleotide molecule from a primary nucleic acid sample in a non-ordered fashion by enzymatic, chemical, or mechanical methods. Such fragmentation methods are standard in the art (Sambrook and Russell, Molecular Cloning, A Laboratory Manual, third edition). Moreover, random fragmentation is designed to produce fragments irrespective of the sequence identity or position of nucleotides comprising and/or surrounding the break. In one or more embodiments, the random fragmentation is by mechanical means such as nebulization or sonication to produce fragments of about 50 base pairs to about 1500 base pairs in length, for example, about 50-700 base pairs or about 50-400 base pairs in length. In some preferred embodiments, fragments are about 100 to 300 base pairs in length or about 100 to 200 base pairs in length.
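Random fragmentation followed by selection of a preferred size window can be sketched numerically. This is a simplified model, not a description of any particular instrument. It assumes that break positions are uniform and independent of sequence, whereas real sonication and nebulization have their own biases. The 100 to 300 bp window is taken from the preferred range above. The function names and the molecule length and break count are hypothetical.

```python
import random
from typing import List


def random_fragment(length_bp: int, n_breaks: int, rng: random.Random) -> List[int]:
    """Break a molecule at n_breaks distinct, uniformly random positions.

    Returns the fragment lengths in order along the molecule.
    """
    if not 0 <= n_breaks < length_bp:
        raise ValueError("n_breaks must be between 0 and length_bp - 1")
    # Break positions are chosen without regard to the local sequence.
    breaks = sorted(rng.sample(range(1, length_bp), n_breaks))
    edges = [0, *breaks, length_bp]
    return [right - left for left, right in zip(edges, edges[1:])]


def size_select(fragment_lengths: List[int], low: int = 100, high: int = 300) -> List[int]:
    """Keep fragments whose length falls within [low, high] bp."""
    return [n for n in fragment_lengths if low <= n <= high]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
fragments = random_fragment(50_000, 250, rng)
selected = size_select(fragments)
print(len(fragments), len(selected))
```

Because break positions are distinct, every fragment has a positive length and the lengths always sum to the original molecule length.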
In some embodiments, the DNA fragments are DNA library fragments. Any of the many library preparation protocols available are compatible with the methods described herein. A library may be a whole-genome library or a targeted library. A library includes, but is not limited to, a sequencing library. A multitude of sequencing library methods are known to a skilled person (see, for example, Sequencing Methods Review, available on the world wide web at illumina.com/content/dam/illumina-marketing/documents/products/research_reviews/sequencing-methods-review.pdf). For example, library preparation may be for use with any of a variety of next generation sequencing platforms, such as for example, the sequencing by synthesis platform of ILLUMINA® or the ion semiconductor sequencing platform of ION TORRENT™. For example, established ligase-dependent methods or transposon-based methods may be used (see, for example, Head et al., 2014, Biotechniques; 56(2):61 and Bruinsma et al., 2019, BMC Genomics, 19:722) and numerous kits for making sequencing libraries by these methods are available commercially from a variety of vendors.
DNA fragments, including DNA library fragments, may be prepared from input sample material such that adapter sequences are ligated to fragments to facilitate downstream workflow steps, such as for example, degradation of the second strand, amplification, and/or sequencing. For example, universal amplification sequences, e.g., sequences present in a universal adaptor, may be placed at the ends of each nucleotide fragment to facilitate amplification. Methods for attaching adapters to a nucleic acid are known to the person skilled in the art. For example, the attachment can be through tagmentation using transposase complexes (WO 2016/130704), or through standard library preparation techniques using ligation (U.S. Pat. Pub. No. 2018/0305753). Addition of an adapter can occur before or after treatment of the target nucleic acid with a cytidine deaminase and/or a uracil DNA glycosylase.
Adapter sequences may include 5' and/or 3' adapter sequences. An adapter may be attached to just one end of the DNA fragment (the 5' end or the 3' end) or to both ends. As used herein, the term “adapter” and its derivatives, e.g., universal adapter, refer generally to any linear oligonucleotide that can be attached to a target nucleic acid. An adapter can be single-stranded or double-stranded DNA or can include both double-stranded and single-stranded regions. An adapter can include a universal sequence that is substantially identical, or substantially complementary, to at least a portion of a primer, for example a universal primer; an index (also referred to herein as a barcode or tag) to assist with downstream error correction, identification, or sequencing; and/or a unique molecular identifier. In some embodiments, the adapter is substantially non-complementary to the 3' end or the 5' end of any target sequence present in the sample. In some embodiments, adapter sequences may have one or more phosphorothioate bonds at the 5' end of the adapter sequences. In some embodiments, suitable adapter lengths are in the range of about 6-100 nucleotides, about 12-60 nucleotides, or about 15-50 nucleotides. The terms “adaptor” and “adapter” are used interchangeably. As used herein, the term “universal,” when used to describe a nucleotide sequence, refers to a region of sequence that is common to two or more nucleic acid molecules where the molecules also have regions of sequence that differ from each other. Non-limiting examples of universal capture sequences include sequences that are identical to or complementary to P5 and P7 primers. The terms “P5” and “P7” may be used when referring to a universal capture sequence or a capture oligonucleotide. The terms “P5'” (P5 prime) and “P7'” (P7 prime) refer to the reverse complement of P5 and P7, respectively.
It will be understood that any suitable universal capture sequence or a capture oligonucleotide can be used in the methods presented herein, and that the use of P5 and P7 are exemplary embodiments only. Uses of capture oligonucleotides such as P5 and P7 or their complements on flowcells are known in the art, as exemplified by the disclosures of WO 2007/010251, WO 2006/064199, WO 2005/065814, WO 2015/106941, WO 1998/044151, and WO 2000/018957. For example, any suitable forward amplification primer, whether immobilized or in solution, can be useful in the methods presented herein for hybridization to a complementary sequence and amplification of a sequence. Similarly, any suitable reverse amplification primer, whether immobilized or in solution, can be useful in the methods presented herein for hybridization to a complementary sequence and amplification of a sequence. One of skill in the art will understand how to design and use primer sequences that are suitable for capture and/or amplification of nucleic acids as presented herein.
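The relationship between an adapter-flanked fragment and the primed notation (P5', P7') can be sketched as simple string operations. The capture sequences below are short hypothetical placeholders, not the actual P5 or P7 sequences, and the helper names are illustrative. The sketch shows a strand carrying one capture sequence at its 5' end and the reverse complement ("prime") of another at its 3' end, so that an oligonucleotide identical to the second capture sequence can hybridize to the 3' end.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def add_adapters(insert: str, adapter_5p: str, adapter_3p: str) -> str:
    """Flank a fragment with a 5' adapter and a 3' adapter (all written 5'->3')."""
    return adapter_5p + insert + adapter_3p


# Hypothetical placeholder capture sequences (NOT the real P5/P7 sequences).
CAPTURE_A = "AAAGGGTC"
CAPTURE_B = "CCATTGAG"
CAPTURE_B_PRIME = reverse_complement(CAPTURE_B)  # analogous to the "prime" notation

library_strand = add_adapters("GATTACA", CAPTURE_A, CAPTURE_B_PRIME)
print(library_strand)  # AAAGGGTCGATTACACTCAATGG
# An oligo identical to CAPTURE_B is complementary to the 3' end of library_strand.
```

Taking the reverse complement twice returns the original sequence, which is why a capture sequence and its "prime" form can each serve as the hybridization partner of the other.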
DNA fragments, including DNA library fragments, can have an average strand length that is desired or appropriate for a particular application of the methods, compositions, or kits set forth herein. For example, the average strand length can be less than about 100,000 nucleotides, 50,000 nucleotides, 10,000 nucleotides, 5,000 nucleotides, 1,000 nucleotides, 500 nucleotides, 300 nucleotides, 200 nucleotides, 100 nucleotides, or 50 nucleotides. Alternatively, or additionally, the average strand length can be greater than about 10 nucleotides, 50 nucleotides, 100 nucleotides, 200 nucleotides, 500 nucleotides, 1,000 nucleotides, 5,000 nucleotides, 10,000 nucleotides, 50,000 nucleotides, or 100,000 nucleotides. The average strand length for a population of DNA fragments can be in a range between any maximum and minimum value set forth above.
In some embodiments, DNA fragments, including DNA library fragments, may be of a shorter length, for example, about 50 nucleotides to about 500 nucleotides in length, about 50 nucleotides to about 300 nucleotides in length, about 50 nucleotides to about 250 nucleotides in length, about 50 nucleotides to about 200 nucleotides in length, about 50 nucleotides to about 100 nucleotides in length, about 100 nucleotides to about 200 nucleotides in length, about 100 nucleotides to about 250 nucleotides in length, about 100 nucleotides to about 300 nucleotides in length, or about 100 nucleotides to about 500 nucleotides in length. Shorter fragment length can be employed to maximize the overall performance of the enzymatic error-correction, by minimizing the number of potential false-positive uracils that may be present in any one individual DNA fragment. By minimizing the number of false positives in each fragment, the probability of successfully repairing library fragments and mediating their downstream amplification may be increased, thus preserving library quality and diversity.
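The benefit of shorter fragments can be illustrated with a simple probability model. This is a sketch under stated assumptions, not measured behavior. It treats every cytosine in a fragment as unmethylated, assumes each one is deaminated off-target independently with a fixed probability, and assumes a fixed cytosine fraction. Both `c_fraction=0.2` and `off_target_rate=0.01` are hypothetical illustrative values, not data from this disclosure.

```python
def false_positive_profile(length_nt: int, c_fraction: float, off_target_rate: float):
    """Expected off-target uracils per fragment and probability of none.

    Assumes each unmethylated cytosine is deaminated independently with
    probability off_target_rate (a binomial model).
    """
    n_unmethylated_c = round(length_nt * c_fraction)
    expected = n_unmethylated_c * off_target_rate
    p_none = (1.0 - off_target_rate) ** n_unmethylated_c
    return expected, p_none


for length in (100, 200, 500):
    expected, p_none = false_positive_profile(length, c_fraction=0.2, off_target_rate=0.01)
    print(f"{length:>4} nt: {expected:.1f} expected, P(none) = {p_none:.2f}")
```

Under these assumed numbers, the chance that a fragment carries no false-positive uracil falls from roughly 0.82 at 100 nt to roughly 0.37 at 500 nt. The expected number of lesions to be repaired per fragment grows linearly with length.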
Kits
The present disclosure also provides kits for undertaking a TraPR method as described herein, for the reduction of false positive uracil residues due to deamination of unmethylated cytosines in an assay using a cytosine deaminase to deaminate methylated cytosines. The present disclosure also provides kits for directly sequencing DNA fragments to identify abasic sites and/or uracil residues, for example by the UdgX cross-linking and polymerase stalling sequencing (“Ucaps-seq”) method of Jiang et al., which detects dU at single-nucleotide resolution (Jiang et al., 2022, J Am Chem Soc 144: 1323-1331).
In some embodiments, a kit may include one or more of a cytosine deaminase, a uracil DNA glycosylase (UDG), a high fidelity polymerase, a deoxycytidyl transferase, primers, and/or dNTPs in suitable packaging material in an amount sufficient for at least one reaction. In some embodiments, the deoxycytidyl transferase is Rev1. In some embodiments, the primer is complementary to the 3' end library adapter and is capable of binding to single stranded DNA library fragments comprising 5' end and 3' end library adapters. In some embodiments, a kit may also include a dCTP derivative.
A cytosine deaminase may be an altered cytosine deaminase, including, but not limited to any of those described herein or as described in International Patent Application No. PCT/US2023/017846 (“Altered Cytidine Deaminases and Methods of Use”), filed April 7, 2023, which is hereby incorporated by reference in its entirety.
A kit may include one or more other components. Examples of other components include, for example, a PCR polymerase, PCR master mix, a DNA denaturation solution (such as for example, NaOH, formamide, or DMSO), a cytosine deaminase buffer, a UDG reaction buffer, DNA purification beads for purification steps, a positive control polynucleotide, such as a double-stranded DNA including one or more known modified cytosines for use in measuring efficiency, or a negative control polynucleotide, such as a double-stranded DNA including unmodified cytosines. Optionally, other reagents such as buffers and solutions are also included. Instructions for use of the packaged components are also typically included.
As used herein, the term "package" refers to a solid matrix or material such as glass, plastic, paper, foil, and the like, capable of holding within fixed limits the polypeptides. "Instructions for use" typically include a tangible expression describing the reagent concentration or at least one assay method parameter, such as the relative amounts of reagent and sample to be admixed, maintenance time periods for reagent/sample admixtures, temperature, buffer conditions, and the like.
As used herein, the phrase “packaging material” refers to one or more physical structures used to house the contents of the kit. The packaging material is constructed by known methods, preferably to provide a sterile, contaminant-free environment. The packaging material has a label which indicates that the components can be used for reducing uracil residues due to deamination of unmethylated cytosines in an assay using a cytosine deaminase to deaminate methylated cytosines.
Exemplary Aspects
The invention is defined in the claims. However, below there is provided a non-exhaustive listing of non-limiting exemplary aspects. Any one or more of the features of these aspects may be combined with any one or more features of another example, embodiment, or aspect described herein. Exemplary embodiments of the present invention include, but are not limited to, the following.
Aspect 1 is a method of reducing false positive detection of 5-methylcytosine (5mC) and/or 5-hydroxymethylcytosine (5hmC) due to deamination of unmethylated cytosines, the method comprising:
(a) providing a sample comprising single stranded DNA library fragments comprising deaminated methylated cytosines, wherein the single stranded DNA library fragments comprise 5' end and 3' end library adapters;
(b) contacting the sample with a uracil DNA glycosylase (UDG), wherein the UDG deglycosylates uracil residues to form abasic sites, resulting in single stranded DNA library fragments with abasic sites;
(c) contacting the sample comprising single stranded DNA library fragments with abasic sites with a mixture comprising: high fidelity polymerase; a deoxycytidyl transferase; dNTPs; and a primer complementary to the 3' end library adapter under conditions to provide for second strand synthesis, wherein the deoxycytidyl transferase incorporates cytosines opposite abasic sites, resulting in double stranded DNA library fragments comprising a first strand comprising the single stranded DNA library fragment with abasic sites and a complementary second strand comprising cytosines opposite the abasic sites;
(d) treating the double stranded DNA library fragments to digest the first strand comprising the single stranded DNA library fragment with abasic sites resulting in a sample comprising complementary second strands in which uracil residues are replaced with cytosine residues.
Aspect 2 is a method of Aspect 1, wherein the deamination of unmethylated cytosines is due to deamination of unmethylated cytosines in an assay using a cytosine deaminase to selectively deaminate methylated cytosines.
Aspect 3 is a method of Aspect 1, wherein prior to step (a), the sample comprising single stranded DNA library fragments is contacted with a deaminase to selectively deaminate methylated cytosines.
Aspect 4 is a method of replicating uracil residues as cytosine residues, the method comprising:
(a) providing a sample comprising single stranded DNA library fragments comprising 5' end and 3' end library adapters;
(b) contacting the sample with a uracil DNA glycosylase (UDG), wherein the UDG deglycosylates uracil residues to form abasic sites, resulting in single stranded DNA library fragments with abasic sites;
(c) contacting the sample comprising single stranded DNA library fragments with abasic sites with a mixture comprising: high fidelity polymerase; a deoxycytidyl transferase; dNTPs; and a primer complementary to the 3' end library adapter under conditions to provide for second strand synthesis, wherein the deoxycytidyl transferase incorporates cytosines opposite abasic sites, resulting in double stranded DNA library fragments comprising a first strand comprising the single stranded DNA library fragment with abasic sites and a complementary second strand comprising cytosines opposite the abasic sites;
(d) treating the double stranded DNA library fragments to digest the first strand comprising the single stranded DNA library fragment with abasic sites resulting in a sample comprising complementary second strands in which uracil residues are replaced with cytosine residues.
Aspect 5 is a method of any one of Aspects 1 to 4, wherein the deoxycytidyl transferase comprises the Rev1 enzyme.
Aspect 6 is a method of any one of Aspects 1 to 5, wherein the high fidelity polymerase comprises T4 DNA polymerase or E. coli polymerase.
Aspect 7 is a method of any one of Aspects 1 to 6, wherein treating the double stranded DNA library fragments to digest the first strand comprising abasic sites comprises treating the double stranded DNA library fragments with heat and/or NaOH.
Aspect 8 is a method of any one of Aspects 1 to 7, wherein the single stranded DNA library fragments are about 100 bp to about 200 bp in length.
Aspect 9 is a method of any one of Aspects 1 to 8, further comprising subjecting the sample comprising the complementary second strands in which uracil residues are replicated as cytosine residues to polymerase chain reaction (PCR) amplification.
Aspect 10 is a method of any one of Aspects 1 to 9, further comprising sequencing the complementary second strands in which uracil residues are replicated as cytosine residues.
Aspect 11 is a method of any one of Aspects 1 to 9, further comprising processing the complementary second strands in which uracil residues are replicated as cytosine residues to produce a sequencing library.
Aspect 12 is the method of Aspect 11, further comprising sequencing the sequencing library.
Aspect 13 is a method of any one of Aspects 1 to 12, wherein the cytosine deaminase comprises an altered cytosine deaminase.
Aspect 14 is the method of Aspect 13, wherein the altered cytosine deaminase is a member of the AID subfamily, the APOBEC1 subfamily, the APOBEC2 subfamily, the APOBEC3A subfamily, the APOBEC3B subfamily, the APOBEC3C subfamily, the APOBEC3D subfamily, the APOBEC3F subfamily, the APOBEC3G subfamily, the APOBEC3H subfamily, or the APOBEC4 subfamily, or an alteration thereof.
Aspect 15 is the method of Aspect 13, wherein the altered cytosine deaminase comprises an altered APOBEC3A.
Aspect 16 is a method of any one of Aspects 13 to 15, wherein the altered cytidine deaminase comprises an amino acid substitution mutation at a position functionally equivalent to (Tyr/Phe)130 in a wild-type APOBEC3A protein and/or an amino acid substitution mutation at a position functionally equivalent to Tyr132 in a wild-type APOBEC3A protein.
Aspect 17 is a method of any one of Aspects 13 to 16, wherein the altered cytidine deaminase comprises amino acid substitution mutations at positions functionally equivalent to (Tyr/Phe)130 and Tyr132 in a wild-type APOBEC3A protein.
Aspect 18 is a method of any one of Aspects 13 to 17, wherein the altered cytidine deaminase comprises an amino acid substitution mutation at a position functionally equivalent to (Tyr/Phe)130 in a wild-type APOBEC3A protein, wherein the substitution mutation is (Tyr/Phe)130Trp.
Aspect 19 is the method of Aspect 17 or 18, wherein the (Tyr/Phe)130 is Tyr130, and the wild-type APOBEC3A protein is SEQ ID NO: 12.
Aspect 20 is a method of any one of Aspects 16 to 19, wherein the substitution mutation at the position functionally equivalent to Tyr130 comprises a mutation to alanine, glycine, phenylalanine, histidine, glutamine, methionine, asparagine, lysine, valine, aspartic acid, glutamic acid, serine, cysteine, proline, arginine, or threonine.
Aspect 21 is a method of any one of Aspects 16 to 20, wherein the substitution mutation at the position functionally equivalent to Tyr130 comprises a mutation to Ala, Val, or Trp.
Aspect 22 is the method of any one of Aspects 16 to 21, wherein the substitution mutation at the position functionally equivalent to Tyr132 comprises a mutation to His, Arg, Gln, or Lys.
Aspect 23 is a method of any one of Aspects 13 to 22, wherein the altered cytidine deaminase converts 5-methyl cytosine (5mC) to thymidine (T) by deamination at a greater rate than conversion of cytosine (C) to uracil (U) by deamination.
Aspect 24 is the method of Aspect 23, wherein the rate is at least 100-fold greater.
Aspect 25 is a method of any one of Aspects 13 to 24, wherein the altered cytidine deaminase converts cytosine (C) to uracil (U) by deamination and 5-methyl cytosine (5mC) to thymidine (T) by deamination at a greater rate than conversion of 5-hydroxymethyl cytosine (5hmC) to 5-hydroxymethyl uracil (5hmU) by deamination.
Aspect 26 is the method of Aspect 25, wherein conversion of 5hmC to 5hmU by deamination is undetectable.
Aspect 27 is a method of any one of Aspects 13 to 26, wherein the altered cytidine deaminase comprises a ZDD motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C (SEQ ID NO: 1).
Aspect 28 is a method of any one of Aspects 13 to 27, wherein the altered cytidine deaminase is a member of the APOBEC3A subfamily and comprises a ZDD motif HXEX24SW(S/T)PCX[2-4]CX6FX8LX5R(L/I)YX[8-11]LX2LX[10]M (SEQ ID NO:2), wherein the amino acid substitution mutation at the position functionally equivalent to (Tyr/Phe)130 of the wild-type APOBEC3A protein is the Tyr (Y) amino acid of the ZDD motif.
Aspect 29 is a method of any one of Aspects 13 to 28, wherein the altered cytidine deaminase is a member of the APOBEC3A subfamily and comprises X[16-26]-GRXXTXLCYXV-X15-GXXXN-X12-HAEXXF-X14-YXXTWXXSWSPC-X[2-4]-CA-X5-FL-X7-LXIXXXR(L/I)Y-X8-GLXXLXXXG-X5-M-X4-FXXCWXXFV-X6-FXPW-X13-LXXI-X[2-6] (SEQ ID NO:3).
Aspect 30 is a method of any one of Aspects 13 to 29, wherein the altered cytidine deaminase is a member of the APOBEC3A family and comprises SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, or SEQ ID NO:11.
Aspect 31 is a kit comprising: a cytosine deaminase; a uracil DNA glycosylase (UDG); a high fidelity polymerase; and/or a deoxycytidyl transferase.
Aspect 32 is the kit of Aspect 31, further comprising: dNTPs; and a primer complementary to the 3' end library adapter capable of binding to single stranded DNA library fragments comprising 5' end and 3' end library adapters.
Aspect 33 is the kit of Aspect 31 or 32, further comprising an unnatural dCTP derivative.
Aspect 34 is the kit of any one of Aspects 31 to 33, wherein the cytosine deaminase is an altered APOBEC.
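The residue numbering used in Aspects 16 to 30 can be checked against the sequence listing. The Python sketch below is an illustration, not part of the disclosure: it copies the wild-type APOBEC3A (SEQ ID NO: 12) and the SG2 construct (SEQ ID NO: 6) from the listing, reports amino acid substitutions by 1-based position, and searches for the SEQ ID NO: 1 zinc-binding motif. The helper names and the translation of the motif into a regular expression are assumptions.

```python
import re

# Wild-type human APOBEC3A (SEQ ID NO: 12), split into its four listing lines.
LINE_1 = "MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQ"
LINE_2 = "AKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQEN"
LINE_4 = "WDGLDEHSQALSGRLRAILQNQGN"
WT_A3A = LINE_1 + LINE_2 + "THVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQP" + LINE_4

# SG2 (SEQ ID NO: 6) differs from the wild type only in its third listing line.
SG2 = LINE_1 + LINE_2 + "THVRLRIFAARIADHDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQP" + LINE_4

# SEQ ID NO: 1, H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C, written as a regex (X = any residue).
# This translation is an illustrative assumption.
ZDD_MOTIF = re.compile(r"H[PAV]E.{23,28}PC.{2,4}C")


def substitutions(reference, variant):
    """Return (1-based position, reference residue, variant residue) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("only equal-length sequences (substitutions) are compared")
    return [(i + 1, r, v) for i, (r, v) in enumerate(zip(reference, variant)) if r != v]


print(len(WT_A3A))                 # 199
print(substitutions(WT_A3A, SG2))  # [(130, 'Y', 'A'), (132, 'Y', 'H')]
match = ZDD_MOTIF.search(WT_A3A)
print(match.start() + 1, match.group())  # 1-based motif start and matched residues
```

The same `substitutions` call applied to SEQ ID NO: 7 or 8 would report the single Y130L or Y130W change named in their listing titles.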
The present invention is illustrated by the following examples. It is to be understood that the particular examples, materials, amounts, and procedures are to be interpreted broadly in accordance with the scope and spirit of the invention as set forth herein.
EXAMPLES
Example 1
False positive reduction by translesion polymerase repair (TraPR)
Enzymatic sequencing methods targeting 5-methylcytosine (5m-dC) suffer from cross-reactivity with cytosine (dC). This is due to the use of cytidine deaminases, which target dC and 5m-dC to give uracil (dU) and thymine (dT), respectively. Recently developed engineered cytidine deaminase variants with increased selectivity for 5m-dC over dC allow direct sequencing of methylated regions by 5m-dC to dT conversion. However, the false positive rate of these enzymes (dC to dU conversion) is still too high to allow their deployment in a workflow.
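A back-of-the-envelope calculation shows why residual dC activity matters. The sketch below assumes, hypothetically, that a fraction of cytosines are methylated, that each 5m-dC is converted with probability `p_true`, and that each unmethylated dC is converted with a probability lower by a given selectivity factor (100-fold is the figure recited in Aspect 24). The input values are illustrative, not measured data. Without repair, dU reads as T, so the function returns the fraction of C-to-T calls that are false positives.

```python
def false_call_fraction(methylated_fraction, p_true, selectivity):
    """Fraction of converted C sites that are false positives (unmethylated dC -> dU).

    Simplifying assumption: each 5m-dC converts independently with probability p_true,
    and each unmethylated dC with probability p_true / selectivity.
    """
    p_false = p_true / selectivity
    true_calls = methylated_fraction * p_true
    false_calls = (1 - methylated_fraction) * p_false
    return false_calls / (true_calls + false_calls)


# Hypothetical inputs: 4% of cytosines methylated, 90% conversion of 5m-dC, 100-fold selectivity.
print(f"{false_call_fraction(0.04, 0.9, 100):.2f}")  # 0.19
```

Because `p_true` cancels, the result depends only on the methylated fraction and the selectivity; under these assumptions roughly one call in five would be spurious even at 100-fold selectivity.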
In this example, the false positive rate associated with cytidine deaminases, including engineered cytidine deaminase variants with increased selectivity for 5m-dC over dC, is solved by repairing the uracils with Rev1, which effectively turns false positives into dC to dG conversions. After cytidine deaminase treatment, the DNA is incubated with uracil DNA glycosylase (UDG), which generates abasic sites wherever dC to dU conversions have occurred. The complement of the library is then generated with a high-fidelity polymerase supplemented with a deoxycytidyl transferase such as Rev1. Under these conditions, dC is selectively incorporated opposite the abasic site. After digestion of the original strand, the complement can be read to assign methylation calls. Thus, 5m-dC sites are recognized by the expected conversion to dT, while false positives arising from dC to dU conversion can be identified by a defined signature: a dC to dG conversion.
Besides bisulfite sequencing, which comes with heavy losses of DNA, the other common method for 5m-dC sequencing is EM-seq. This fully enzymatic technique uses a two-step reaction: the DNA is first treated with TET and then with APOBEC. In the first step, TET oxidizes 5m-dC to a mixture of 5-hydroxymethylcytosine (5hm-dC), 5-formylcytosine (5f-dC), and 5-carboxycytosine (5ca-dC). The purpose of this step is to protect the methylated cytosines in the following step, because APOBEC is active on both 5m-dC and dC. APOBEC is also reactive on 5hm-dC, which is why 5hm-dC is further reacted with a glucosyltransferase to generate 5-glucosylhydroxymethylcytosine (5gm-dC). After this first treatment, the DNA is treated with APOBEC, and all unmodified dCs are converted to dUs. The overall result is that a site that was methylated before the dual treatment still sequences as C, whereas any site that was not methylated reads as T (because U is read as T during sequencing). The main disadvantage of this method is the generation of a so-called 3-base genome, which is extremely burdensome because of high computational demands, difficulties in variant calling, and poor sequencing performance.
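A simplified model of the two readouts makes the 3-base versus 4-base distinction concrete. This Python sketch is an illustration only: it assumes complete, error-free conversion, ignores 5hm-dC and the glucosylation step, and marks methylated cytosines with a set of hypothetical 0-based positions.

```python
def em_seq_read(seq, methylated):
    """EM-seq model: protected (methylated) C reads as C; every other C reads as T."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def direct_deamination_read(seq, methylated):
    """Selective 5m-dC deaminase model: only methylated C reads as T; all else unchanged."""
    return "".join(
        "T" if base == "C" and i in methylated else base
        for i, base in enumerate(seq)
    )


seq = "ACCTGACGTTCCAGCGTCAA"
methylated = {6, 14}  # hypothetical: only the two CpG cytosines are methylated
print(em_seq_read(seq, methylated))              # ATTTGACGTTTTAGCGTTAA
print(direct_deamination_read(seq, methylated))  # ACCTGATGTTCCAGTGTCAA
```

In the EM-seq read, nearly every C disappears, so reads must be aligned against a reduced alphabet; the direct readout changes only the methylated positions and keeps all four bases.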
Cytidine deaminase variants with exquisite selectivity for 5m-dC over dC have recently been developed. Examples of such engineered cytidine deaminases include any of the engineered APOBEC3A cytidine deaminases described in International Patent Application No. PCT/US2023/017846 (“Altered Cytidine Deaminases and Methods of Use”), filed April 7, 2023. With these engineered enzymes, the DNA can be treated directly to selectively convert 5m-dC into dT, leaving every other base untouched and generating a convenient 4-base genome. However, false positives arising from the enzymes' low activity on dC remain a challenge that could slow the development of the technique.
To address this problem, this example uses a tandem enzymatic treatment: first with uracil DNA glycosylase (UDG) to excise the dU and introduce an abasic site, and second with a translesion polymerase to install a signature by repair. Interestingly, dU is a well-known DNA lesion caused by spontaneous dC deamination, and many enzymes have therefore evolved to recognize and correct it. One of these enzymes is UDG, a monofunctional glycosylase that, upon recognizing dU in either ssDNA or dsDNA, cleaves the N-glycosidic bond to release uracil and yield an abasic (AP) site (Rusmintratip and Sowers, 2000, PNAS, 97(26):14183-14187). Thus, UDG incubation of a cytosine deaminase-treated library results in 5m-dC -> dT and dC -> dU -> AP conversions. Abasic sites are generally unwanted in NGS protocols, because they readily form nicks upon basic treatment. In the method of this example, however, a DNA repair enzyme is used to make good use of the dC -> AP conversion. AP sites are typically repaired via the base excision repair (BER) pathway (reviewed in Robertson et al., 2009, Cell Mol Life Sci, 66(6):981-93). Another mechanism of repair, however, is mutagenic replication by polymerases. Most polymerases stall when they encounter an AP site. A few, like the highly mutagenic DNA polymerase θ, are able to read through the lesion by inserting dA, following what is known as the A-rule (Laverty et al., 2017, ACS Chem Biol, 12(6):1584-1592). Unfortunately, inserting dA opposite the AP site would still mean that the dU is ultimately read as dT; conventional polymerases therefore would not produce the change of base needed for selective detection.
This example uses a specific class of polymerases (Family Y), often referred to as deoxycytidyl transferases, that are specific for the transfer of dC across AP sites. The defining member of this polymerase family is Rev1 (Nair et al., 2005, Science, 309(5744):2219-2222). Rev1 is rarely seen replicating full strands; instead, its role is to complement other polymerases and prevent their stalling at translesion sites. As shown in FIG. 3, Rev1 bypasses the AP site by using an arginine residue (R324) as a template: the nitrogens of its guanidinium group bind the incoming dCMP and help transfer it onto the nascent DNA strand (Weaver et al., 2020, PNAS, 117(41):25494-25504).
Treatment with Rev1 provides an alternative read at the AP site introduced by UDG. As shown in FIG. 1, after UDG treatment the ssDNA is extended with a high-fidelity polymerase supplemented with Rev1. For an original 5m-dC site, converted to dT by a cytosine deaminase, this results in incorporation of dA and ultimately a dT read. By contrast, a dC site, upon conversion to dU and deglycosylation to an AP site by UDG, has a dC incorporated on the opposite strand, ultimately resulting in a dG read.
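The per-site logic of FIG. 1 can be traced with a small Python model. It is a sketch under stated assumptions, not the disclosed protocol: conversions are treated as all-or-nothing, the symbol `X` for an abasic site is a hypothetical placeholder, and the `rev1=False` branch assumes a polymerase that bypasses the AP site by the A-rule rather than stalling.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
AP_SITE = "X"  # hypothetical symbol for an abasic site left by UDG


def deaminate(base, methylated=False, false_positive=False):
    """Selective deaminase model: 5m-dC -> T; unmethylated C -> U only on a false positive."""
    if base == "C" and methylated:
        return "T"
    if base == "C" and false_positive:
        return "U"
    return base


def udg(base):
    """UDG excises uracil, leaving an abasic site."""
    return AP_SITE if base == "U" else base


def incorporate_opposite(template_base, rev1=True):
    """Base added to the second strand opposite a template base.

    Opposite an AP site, Rev1 inserts dC; without Rev1 we assume the A-rule (dA).
    """
    if template_base == AP_SITE:
        return "C" if rev1 else "A"
    return COMPLEMENT[template_base]


def trapr_read(base, methylated=False, false_positive=False, rev1=True):
    """Base reported at the original position once the second strand is copied back by PCR."""
    template = udg(deaminate(base, methylated, false_positive))
    return COMPLEMENT[incorporate_opposite(template, rev1)]


print(trapr_read("C", methylated=True))                  # T  (true 5m-dC signal)
print(trapr_read("C", false_positive=True))              # G  (repaired false positive)
print(trapr_read("C", false_positive=True, rev1=False))  # T  (indistinguishable without Rev1)
print(trapr_read("C"))                                   # C  (untouched cytosine)
```

Setting `rev1=False` reproduces the A-rule problem described above: the false positive then reads as T, exactly like a methylation call.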
Application of TraPR to methylation sequencing: To prepare libraries for methylation sequencing, DNA libraries are first prepared from the input sample material such that adapter sequences are ligated to library fragments to facilitate downstream workflow steps. Many library preparation protocols are compatible with the described invention. To maximize the overall performance of the enzymatic error correction, libraries may be prepared targeting a shorter insert size, for example 100-200 bp, to minimize the number of potential false-positive uracils present in any individual library fragment. Minimizing the number of false positives in each fragment may increase the probability of successfully repairing library fragments and mediating their downstream amplification, thus preserving library quality and diversity. Once library fragments with adapter sequences have been prepared, the libraries are denatured and subjected to an engineered cytidine deaminase variant selective for mC deamination. Because false positive (cytosine) deamination yields uracil bases, whereas true positive (methylcytosine) deamination yields thymine bases, uracil DNA glycosylase (UDG) can be used to specifically recognize and remove uracil bases, removing the false positive signal and preventing its propagation as a “T” in downstream amplification and sequencing. Cytidine deaminases such as APOBEC enzymes require ssDNA for recognition, so the deaminated DNA will be single stranded. Importantly, UDG recognizes both dsDNA and ssDNA, making it compatible with this application.
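The benefit of a shorter insert can be estimated with simple probability. This Python sketch assumes, hypothetically, that about 20% of the bases in a fragment are unmethylated C and that each is independently deaminated with a 1% false-positive rate; both numbers are illustrative, not measured. It returns the expected number of false-positive uracils per fragment and the probability that a fragment carries at least one.

```python
def fragment_false_positive_stats(insert_bp, c_fraction, fp_rate):
    """Expected false-positive uracils per fragment and P(at least one).

    Simplifying assumption: each of round(insert_bp * c_fraction) unmethylated
    cytosines is deaminated independently with probability fp_rate.
    """
    n_c = round(insert_bp * c_fraction)
    expected = n_c * fp_rate
    p_at_least_one = 1 - (1 - fp_rate) ** n_c
    return expected, p_at_least_one


# Hypothetical inputs: ~20% unmethylated C, 1% false-positive rate per C.
for insert_bp in (150, 400):
    expected, p_any = fragment_false_positive_stats(insert_bp, 0.20, 0.01)
    print(insert_bp, f"{expected:.2f}", f"{p_any:.2f}")
```

Under these assumptions, a 150 bp insert carries at least one abasic site far less often than a 400 bp insert, so fewer fragments depend on a successful Rev1 bypass.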
Heating DNA containing abasic sites may be detrimental to performance, as abasic sites are labile and heating may lead to strand cleavage. Therefore, following the generation of abasic sites, care should be taken with the DNA sample to minimize degradation.
Following the generation of the abasic sites, the DNA sample may be mixed with a primer that binds to the 3’ library adapter sequence to facilitate second strand synthesis, along with a polymerase cocktail of a high fidelity polymerase mixed with Rev1 polymerase and a mixture of dNTPs. After incubation of the sample under reaction conditions appropriate for polymerization, a second strand is generated in which “C” residues are inserted preferentially across from abasic sites. Appropriate high fidelity polymerases may include T4 DNA polymerase or E. coli DNA polymerase, among others (Tanguy Le Gac et al., 2004, J Mol Biol, 336(5):1023-34).
Subsequently, the double-stranded DNA molecule may be subjected to PCR with a standard high fidelity PCR polymerase. Optionally, the DNA sample may first be treated with heat and/or a dilute solution of NaOH to cleave the original library fragments at the abasic sites, preventing their amplification.
Interpretation of sequencing data with TraPR-treated libraries: After the resulting DNA libraries are sequenced, false positive repair events must be discriminated from true positive methylation signal. Ideally, the sequencing should also discriminate these methylation marks and false positive repair events from SNVs present in the sample. Because each possibility has a unique sequencing signature, they can be discriminated downstream by comparing the sequencing data with the reference and comparing the data from both strands, as shown in FIGS. 2A and 2B. Briefly, a C:G>T:G alignment suggests a methylation mark, C:G>T:A suggests an SNV, C:G>G:G suggests an FP repair event, and C:G>G:C suggests an SNV.
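The four alignment patterns above amount to a lookup table. The Python sketch below assumes that the top- and bottom-strand bases at a reference C:G pair have already been extracted from aligned reads (that step is not shown); the function name and the added "no conversion" entry are illustrative assumptions.

```python
# Two-strand observation at a reference C:G pair -> interpretation, following the
# patterns listed above. The ("C", "G") entry (no change) is an added assumption.
SIGNATURES = {
    ("T", "G"): "methylation mark",
    ("T", "A"): "SNV",
    ("G", "G"): "false positive repair event",
    ("G", "C"): "SNV",
    ("C", "G"): "no conversion",
}


def classify_cg_site(top_base, bottom_base):
    """Interpret the bases observed on the top and bottom strands at a reference C:G pair."""
    return SIGNATURES.get((top_base.upper(), bottom_base.upper()), "unassigned")


for observed in [("T", "G"), ("T", "A"), ("G", "G"), ("G", "C")]:
    print("C:G>%s:%s" % observed, classify_cg_site(*observed))
```

Looking at one strand alone is not enough: a T on the top strand is ambiguous until the bottom strand shows whether the paired base is still G (methylation) or has become A (SNV).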
With modified cytidine deaminases, both 5m-dC and 5hm-dC are converted: 5m-dC is converted to dT, and 5hm-dC is converted to 5hm-dU. A possible issue with the approach described here is that UDG could be active on 5hm-dU. However, while some uracil-DNA glycosylases from higher organisms have indeed shown activity on 5hm-dU, bacterial (E. coli) UDG has been shown to be active only on dU.
A possible challenge is the stability of AP sites in a workflow. Abasic sites can form nicks in DNA through a base-catalyzed mechanism called β-elimination. If the resulting AP sites are too unstable to allow efficient sequencing of the library, selective site trapping or masking approaches can be used. For example, HMCES has been found to form a stable protein-DNA crosslink via a thiazolidine, ultimately protecting AP sites from hydrolysis (Thompson et al., 2019, Nat Struct Mol Biol, 26(7):613-618). Other methods for stabilizing or trapping abasic sites can be drawn from the literature on AP site visualization. Transformations that stabilize the AP site and prevent β-elimination include reduction of the aldehyde to an alcohol with a reducing agent, oxidation to deoxyribonolactone with an oxidizing agent, formation of a Schiff base with hydroxylamine, hydrazine, and derivatives, and formation of a thiazolidine with cysteine and derivatives.
SEQUENCE INFORMATION
SEQ ID NO: 1 zinc-binding motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C
SEQ ID NO: 2
ZDD motif:
HXEX24SW(S/T)PCX[2-4]CX6FX8LX5R(L/I)YX[8-11]LX2LX[10]M
Note: (L/I)60 is represented by “J” in the sequence listing.
SEQ ID NO: 3 altered cytidine deaminase includes the amino acids of a member of the APOBEC3A subfamily:
X[16-26]-GRXXTXLCYXV-X15-GXXXN-X12-HAEXXF-X14-YXXTWXXSWSPC-X[2-4]-CA-
X5-FL-X7-LXIXXXR(L/I)Y-X8-GLXXLXXXG-X5-M-X4-FXXCWXXFV-X6-FXPW-X13-
LXXI-X[2-6]
Note: (L/I)129 is represented by “J” in the sequence listing.
SEQ ID NO: 4 altered cytidine deaminase includes the amino acids of a member of the APOBEC3A subfamily:
X26-GRXXTXLCYXV-X15-G-X16-HAEXXF-X14-YXXTWXXSWSPC-X4-CA-X5-FL-X7-
LXIFXXR(L/I)Y-X8-GLXXLXXXG-X5-M-X4-FXXCWXXFV-X6-FXPW-X13-LXXI-X6
Note: (L/I)129 is represented by “J” in the sequence listing.
SEQ ID NO: 5
Altered cytosine deaminase (SG1) - synthetic construct:
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQ
AKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQEN
THVRLRIFAARIADYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQP
WDGLDEHSQALSGRLRAILQNQGN
SEQ ID NO: 6
Altered cytosine deaminase (SG2) - synthetic construct:
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQ
AKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQEN
THVRLRIFAARIADHDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQP
WDGLDEHSQALSGRLRAILQNQGN
SEQ ID NO: 7
Altered cytosine deaminase- synthetic construct
APOBEC3A with (Y130L)
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQ
AKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQEN
THVRLRIFAARILDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQP
WDGLDEHSQALSGRLRAILQNQGN
SEQ ID NO: 8
Altered cytosine deaminase - synthetic construct APOBEC3A with (Y130W)
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQ AKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQEN THVRLRIFAARIWDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQP WDGLDEHSQALSGRLRAILQNQGN
SEQ ID NO: 9
Altered cytosine deaminase (SG1 with Y130X)
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQ AKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQEN THVRLRIFAARIXDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQP WDGLDEHSQALSGRLRAILQNQGN (wherein X can be A, G, F, H, Q, M, N, K, V, D, E, S, C, P, or T, preferably A)
SEQ ID NO: 10
Altered cytosine deaminase - synthetic construct
X26-GRXXTXLCYXV-X15-G-X16-HAEXXF-X14-YXXTWXXSWSPC-X4-CA-X5-FL-X7-
LXIFXXR(L/I)Z-X8-GLXXLXXXG-X5-M-X4-FXXCWXXFV-X6-FXPW-X13-LXXI-X6 (where Z is A, G, F, H, Q, M, N, K, V, D, E, S, C, P, or T, and the number after an X refers to the number of amino acids present)
SEQ ID NO: 11
Altered cytosine deaminase - synthetic construct
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQ AKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQEN THVRLRIFAARIXDZDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQP WDGLDEHSQALSGRLRAILQNQGN, (wherein X can be A, L, or W, preferably A; and Z is selected from R, H, L, or Q, preferably H).
SEQ ID NO: 12
Wild Type human APOBEC3A protein (UniProt: P31941)
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQ AKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQEN THVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQP WDGLDEHSQALSGRLRAILQNQGN
SEQ ID NO: 13
Reference Sequence FIG. 2A
ATATCGTACATATAGCTAGCTAGTATAGCTAGTCGTACGTGTCGATGAT
SEQ ID NO: 14
Sample Sequence in FIG. 2A
ATATCGTACATATAGCTAGTTAGTATAGGTAGTCGTACGTGTCGATGAT
SEQ ID NO: 15
Deamination Sequence in FIG. 2A
ATATCGTAUATATAGCTAGTTAGTATAGGTAGTTGTATGTGTTGATGAT
SEQ ID NO: 16
TRAPR + amplification Sequence in FIG. 2A
ATATCGTAGATATAGCTAGTTAGTATAGGTAGTTGTATGTGTTGATGAT
SEQ ID NO: 17
TRAPR + amplification (no change) Sequence in FIG. 2A
ATCATTGACATGTATGACTACCTATACTAACTAGCTATATGATCGATAT
SEQ ID NO: 18
Deamination (no change) Sequence in FIG. 2A
ATCATTGACATGTATGACTACCTATACTAACTAGCTATATGATCGATAT
SEQ ID NO: 19
Sample Sequence in FIG. 2A
ATCATCGACACGTACGACTACCTATACTAACTAGCTATATGATCGATAT
SEQ ID NO: 20
Reference Sequence FIG. 2A
ATCATCGACACGTACGACTAGCTATACTAGCTAGCTATATGATCGATAT
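SEQ ID NO: 14 (sample) and SEQ ID NO: 16 (TraPR + amplification) can be compared base by base. This Python sketch reads the two as the same strand before and after treatment. That is an assumption, because FIG. 2A itself is not reproduced here. The helper name and the labels are illustrative. The C>G change at position 9 falls where SEQ ID NO: 15 shows a U, which is consistent with a repaired false positive; the remaining changes are C>T.

```python
SAMPLE = "ATATCGTACATATAGCTAGTTAGTATAGGTAGTCGTACGTGTCGATGAT"           # SEQ ID NO: 14
TRAPR_AMPLIFIED = "ATATCGTAGATATAGCTAGTTAGTATAGGTAGTTGTATGTGTTGATGAT"  # SEQ ID NO: 16

# Illustrative labels for the two kinds of change expected after TraPR.
LABELS = {("C", "G"): "TraPR repair signature", ("C", "T"): "C-to-T conversion"}


def base_changes(before, after):
    """List (1-based position, before, after) for every mismatching base in equal-length reads."""
    if len(before) != len(after):
        raise ValueError("sequences must be the same length")
    return [(i + 1, b, a) for i, (b, a) in enumerate(zip(before, after)) if b != a]


for pos, b, a in base_changes(SAMPLE, TRAPR_AMPLIFIED):
    print(pos, f"{b}>{a}", LABELS.get((b, a), "other"))
```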
The complete disclosure of all patents, patent applications, and publications, and electronically available material (including, for instance, nucleotide sequence submissions in, e.g., GenBank and RefSeq, and amino acid sequence submissions in, e.g., SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq) cited herein are incorporated by reference in their entirety. Supplementary materials referenced in publications (such as supplementary tables, supplementary figures, supplementary materials, and methods, and/or supplementary experimental data) are likewise incorporated by reference in their entirety. In the event that any inconsistency exists between the disclosure of the present application and the disclosure(s) of any document incorporated herein by reference, the disclosure of the present application shall govern. The foregoing detailed description and examples have been given for clarity of understanding only. No unnecessary limitations are to be understood therefrom. The disclosure is not limited to the exact details shown and described, for variations obvious to one skilled in the art will be included within the disclosure defined by the claims.

Claims

What is claimed is:
1. A method of reducing false positive detection of 5-methylcytosine (5mC) and/or 5-hydroxymethylcytosine (5hmC) due to deamination of unmethylated cytosines, the method comprising:
(a) providing a sample comprising single stranded DNA library fragments comprising deaminated methylated cytosines, wherein the single stranded DNA library fragments comprise 5' end and 3' end library adapters;
(b) contacting the sample with a uracil DNA glycosylase (UDG), wherein the UDG deglycosylates uracil residues to form abasic sites, resulting in single stranded DNA library fragments with abasic sites;
(c) contacting the sample comprising single stranded DNA library fragments with abasic sites with a mixture comprising: high fidelity polymerase; a deoxycytidyl transferase; dNTPs; and a primer complementary to the 3' end library adapter under conditions to provide for second strand synthesis, wherein the deoxycytidyl transferase incorporates cytosines opposite abasic sites, resulting in double stranded DNA library fragments comprising a first strand comprising the single stranded DNA library fragment with abasic sites and a complementary second strand comprising cytosines opposite the abasic sites;
(d) treating the double stranded DNA library fragments to digest the first strand comprising the single stranded DNA library fragment with abasic sites resulting in a sample comprising complementary second strands in which uracil residues are replaced with cytosine residues.
2. The method of claim 1, wherein the deamination of unmethylated cytosines is due to deamination of unmethylated cytosines in an assay using a cytosine deaminase to selectively deaminate methylated cytosines.
3. The method of claim 1, wherein prior to step (a), the sample comprising single stranded DNA library fragments is contacted with a deaminase to selectively deaminate methylated cytosine.
4. A method of replicating uracil residues as cytosine residues, the method comprising:
(a) providing a sample comprising single stranded DNA library fragments comprising 5' end and 3' end library adapters;
(b) contacting the sample with a uracil DNA glycosylase (UDG) wherein the UDG deglycosylates uracil residues to form abasic sites, resulting in single stranded DNA library fragments with abasic sites;
(c) contacting the sample comprising single stranded DNA library fragments with abasic sites with a mixture comprising: high fidelity polymerase; a deoxycytidyl transferase; dNTPs; and a primer complementary to the 3' end library adapter under conditions to provide for second strand synthesis, wherein the deoxycytidyl transferase incorporates cytosines opposite abasic sites, resulting in double stranded DNA library fragments comprising a first strand comprising the single stranded DNA library fragment with abasic sites and a complementary second strand comprising cytosines opposite the abasic sites;
(d) treating the double stranded DNA library fragments to digest the first strand comprising the single stranded DNA library fragment with abasic sites resulting in a sample comprising complementary second strands in which uracil residues are replaced with cytosine residues.
5. The method of any one of claims 1 to 4, wherein the deoxycytidyl transferase comprises the Rev1 enzyme.
6. The method of any one of claims 1 to 5, wherein the high fidelity polymerase comprises T4 DNA polymerase or E. coli polymerase.
7. The method of any one of claims 1 to 6, wherein treating the double stranded DNA library fragments to digest the first strand comprising abasic sites comprises treating the double stranded DNA library fragments with heat and/or NaOH.
8. The method of any one of claims 1 to 7, wherein the single stranded DNA library fragments are about 100 bp to about 200 bp in length.
9. The method of any one of claims 1 to 8, further comprising subjecting the sample comprising the complementary second strands in which uracil residues are replicated as cytosine residues to polymerase chain reaction (PCR) amplification.
10. The method of any one of claims 1 to 9, further comprising sequencing the complementary second strands in which uracil residues are replicated as cytosine residues.
11. The method of any one of claims 1 to 9, further comprising processing the complementary second strands in which uracil residues are replicated as cytosine residues to produce a sequencing library.
12. The method of claim 11, further comprising sequencing the sequencing library.
13. The method of any one of claims 1 to 12, wherein the cytosine deaminase comprises an altered cytosine deaminase.
14. The method of claim 13, wherein the altered cytosine deaminase is a member of the AID subfamily, the APOBEC1 subfamily, the APOBEC2 subfamily, the APOBEC3A subfamily, the APOBEC3B subfamily, the APOBEC3C subfamily, the APOBEC3D subfamily, the APOBEC3F subfamily, the APOBEC3G subfamily, the APOBEC3H subfamily, or the APOBEC4 subfamily, or an alteration thereof.
15. The method of claim 13, wherein the altered cytosine deaminase comprises an altered APOBEC3A.
16. The method of any one of claims 13 to 15, wherein the altered cytidine deaminase comprises an amino acid substitution mutation at a position functionally equivalent to (Tyr/Phe)130 in a wild-type APOBEC3A protein and/or an amino acid substitution mutation at a position functionally equivalent to Tyr132 in a wild-type APOBEC3A protein.
17. The method of any one of claims 13 to 16, wherein the altered cytidine deaminase comprises amino acid substitution mutations at positions functionally equivalent to (Tyr/Phe)130 and Tyr132 in a wild-type APOBEC3A protein.
18. The method of any one of claims 13 to 17, wherein the altered cytidine deaminase comprises an amino acid substitution mutation at a position functionally equivalent to (Tyr/Phe)130 in a wild-type APOBEC3A protein, wherein the substitution mutation is (Tyr/Phe)130Trp.
19. The method of claim 17 or 18, wherein the (Tyr/Phe)130 is Tyr130, and the wild-type APOBEC3A protein is SEQ ID NO: 12.
20. The method of any one of claims 16 to 19, wherein the substitution mutation at the position functionally equivalent to Tyr130 comprises a mutation to alanine, glycine, phenylalanine, histidine, glutamine, methionine, asparagine, lysine, valine, aspartic acid, glutamic acid, serine, cysteine, proline, arginine, or threonine.
21. The method of any one of claims 16 to 20, wherein the substitution mutation at the position functionally equivalent to Tyr130 comprises a mutation to Ala, Val, or Trp.
22. The method of any one of claims 16 to 21, wherein the substitution mutation at the position functionally equivalent to Tyr132 comprises a mutation to His, Arg, Gln, or Lys.
23. The method of any one of claims 13 to 22, wherein the altered cytidine deaminase converts 5-methyl cytosine (5mC) to thymidine (T) by deamination at a greater rate than conversion of cytosine (C) to uracil (U) by deamination.
24. The method of claim 23, wherein the rate is at least 100-fold greater.
25. The method of any one of claims 13 to 24, wherein the altered cytidine deaminase converts cytosine (C) to uracil (U) by deamination and 5-methyl cytosine (5mC) to thymidine (T) by deamination at a greater rate than conversion of 5-hydroxymethyl cytosine (5hmC) to 5-hydroxymethyl uracil (5hmU) by deamination.
26. The method of claim 25, wherein conversion of 5hmC to 5hmU by deamination is undetectable.
27. The method of any one of claims 13 to 26, wherein the altered cytidine deaminase comprises a ZDD motif H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C (SEQ ID NO: 1).
28. The method of any one of claims 13 to 27, wherein the altered cytidine deaminase is a member of the APOBEC3A subfamily and comprises a ZDD motif HXEX24SW(S/T)PCX[2-4]CX6FX8LX5R(L/I)YX[8-11]LX2LX[10]M (SEQ ID NO:2), wherein the amino acid substitution mutation at the position functionally equivalent to (Tyr/Phe)130 of the wild-type APOBEC3A protein is the Tyr (Y) amino acid of the ZDD motif.
29. The method of any one of claims 13 to 28, wherein the altered cytidine deaminase is a member of the APOBEC3A subfamily and comprises X[16-26]-GRXXTXLCYXV-X15-GXXXN-X12-HAEXXF-X14-YXXTWXXSWSPC-X[2-4]-CA-X5-FL-X7-LXIXXXR(L/I)Y-X8-GLXXLXXXG-X5-M-X4-FXXCWXXFV-X6-FXPW-X13-LXXI-X[2-6] (SEQ ID NO:3).
30. The method of any one of claims 13 to 29, wherein the altered cytidine deaminase is a member of the APOBEC3A family and comprises SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, or SEQ ID NO:11.
31. A kit comprising: a cytosine deaminase; a uracil DNA glycosylase (UDG); a high fidelity polymerase; and/or a deoxycytidyl transferase.
32. The kit of claim 31, further comprising: dNTPs; and a primer complementary to the 3' end library adapter capable of binding to single stranded DNA library fragments comprising 5' end and 3' end library adapters.
33. The kit of claim 31 or 32, further comprising an unnatural dCTP derivative.
34. The kit of any one of claims 31 to 33, wherein the cytosine deaminase is an altered APOBEC.
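Claims 23 to 26 characterise the altered deaminase by relative deamination rates. Purely as an illustration, the sketch below assumes the limiting case of claims 25 and 26 (C and 5mC are completely deaminated, 5hmC is untouched) and that U and T are both read as T after amplification. Under those assumptions, a reference cytosine that is still read as C points to 5hmC. A small ratio helper expresses the claim-24 threshold (5mC-to-T rate at least 100-fold the C-to-U rate). `IDEAL_READOUT`, `deaminated_read`, `call_5hmc` and `fold_selectivity` are hypothetical names, not an API described in this application.

```python
import math

# Idealised outcome per template base (assumption: complete C and 5mC deamination,
# no 5hmC deamination, U copied as T during amplification).
IDEAL_READOUT = {"A": "A", "G": "G", "T": "T", "C": "T", "5mC": "T", "5hmC": "C"}


def deaminated_read(template: list[str]) -> str:
    """Sequence read from a template after idealised deamination and amplification."""
    return "".join(IDEAL_READOUT[base] for base in template)


def call_5hmc(reference: str, read: str) -> list[int]:
    """Reference C positions still read as C; under the assumptions above these are 5hmC."""
    return [i for i, (ref, obs) in enumerate(zip(reference, read))
            if ref == "C" and obs == "C"]


def fold_selectivity(rate_preferred: float, rate_other: float) -> float:
    """Ratio of two deamination rates, e.g. 5mC->T over C->U for the claim-24 test."""
    if rate_other == 0:
        # An undetectable competing reaction (cf. claim 26) gives unbounded selectivity.
        return math.inf if rate_preferred > 0 else math.nan
    return rate_preferred / rate_other


template = ["A", "C", "G", "5mC", "T", "5hmC", "C"]
read = deaminated_read(template)          # "ATGTTCT"
print(call_5hmc("ACGCTCC", read))          # [5]
print(fold_selectivity(300.0, 2.0) >= 100)  # True
```

Real reactions are incomplete and error-prone, so this table-driven readout only illustrates the direction of each conversion, not a calling method.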
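Claims 27 to 29 define the deaminase by sequence motifs written in a PROSITE-like notation: X is any residue, Xn or X[n] is exactly n residues, X[m-n] is m to n residues, and [A/B] or (A/B) lists alternatives. As an illustration only, the sketch below translates that notation into Python regular expressions and scans a protein sequence. `motif_to_regex`, `find_motifs` and the `MOTIFS` labels are hypothetical. The motif strings are transcribed from the claims with OCR spacing removed. A regex hit is not a determination that a protein falls within the claims.

```python
import re

# The 20 standard amino acids; "X" in the claim notation means any one of them.
AA = "ACDEFGHIKLMNPQRSTVWY"

# One token of the claim notation; alternatives are tried left to right.
_TOKEN = re.compile(
    r"X\[(\d+)-(\d+)\]"                  # X[m-n]: m to n arbitrary residues
    r"|X\[(\d+)\]"                       # X[n]: exactly n arbitrary residues
    r"|X(\d+)"                           # Xn: exactly n arbitrary residues
    r"|X"                                # X: one arbitrary residue
    r"|[\[(]([A-Z](?:/[A-Z])+)[\])]"     # [A/B] or (A/B): one of the listed residues
    r"|([ACDEFGHIKLMNPQRSTVWY])"         # a fixed residue
)

MOTIFS = {
    "claim 27": "H-[P/A/V]-E-X[23-28]-P-C-X[2-4]-C",
    "claim 28": "HXEX24SW(S/T)PCX[2-4]CX6FX8LX5R(L/I)YX[8-11]LX2LX[10]M",
    "claim 29": (
        "X[16-26]-GRXXTXLCYXV-X15-GXXXN-X12-HAEXXF-X14-YXXTWXXSWSPC-X[2-4]-CA-X5-FL-X7-"
        "LXIXXXR(L/I)Y-X8-GLXXLXXXG-X5-M-X4-FXXCWXXFV-X6-FXPW-X13-LXXI-X[2-6]"
    ),
}


def motif_to_regex(motif: str) -> str:
    """Translate claim-style motif notation into a Python regular expression."""
    parts, pos = [], 0
    while pos < len(motif):
        if motif[pos] in "- ":  # separators between elements (range dashes sit inside tokens)
            pos += 1
            continue
        m = _TOKEN.match(motif, pos)
        if m is None:
            raise ValueError(f"cannot parse motif at {motif[pos:]!r}")
        lo, hi, n_bracket, n_plain, alts, fixed = m.groups()
        if lo is not None:
            parts.append(f"[{AA}]{{{lo},{hi}}}")
        elif n_bracket is not None or n_plain is not None:
            parts.append(f"[{AA}]{{{n_bracket or n_plain}}}")
        elif alts is not None:
            parts.append("[" + alts.replace("/", "") + "]")
        elif fixed is not None:
            parts.append(fixed)
        else:  # bare X
            parts.append(f"[{AA}]")
        pos = m.end()
    return "".join(parts)


def find_motifs(protein: str) -> dict[str, list[tuple[int, int]]]:
    """Non-overlapping (start, end) spans of each motif in a protein sequence."""
    return {
        name: [(m.start(), m.end()) for m in re.finditer(motif_to_regex(motif), protein)]
        for name, motif in MOTIFS.items()
    }
```

Separators are skipped token by token rather than stripped up front, because stripping every "-" would corrupt ranges such as X[8-11]. Because `re.finditer` reports non-overlapping hits only, overlapping occurrences of a motif would need a lookahead-based search instead.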
PCT/US2024/031370 2023-05-31 2024-05-29 False positive reduction by translesion polymerase repair Pending WO2024249466A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363469860P 2023-05-31 2023-05-31
US63/469,860 2023-05-31

Publications (1)

Publication Number Publication Date
WO2024249466A1 (en) 2024-12-05

Family

ID=91585818

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/031370 Pending WO2024249466A1 (en) 2023-05-31 2024-05-29 False positive reduction by translesion polymerase repair

Country Status (1)

Country Link
WO (1) WO2024249466A1 (en)

Patent Citations (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4683202A (en) 1985-03-28 1987-07-28 Cetus Corporation Process for amplifying nucleic acid sequences
US4683202B1 (en) 1985-03-28 1990-11-27 Cetus Corp
US4683195B1 (en) 1986-01-30 1990-11-27 Cetus Corp
US4683195A (en) 1986-01-30 1987-07-28 Cetus Corporation Process for amplifying, detecting, and/or-cloning nucleic acid sequences
WO1998044151A1 (en) 1997-04-01 1998-10-08 Glaxo Group Limited Method of nucleic acid amplification
WO2000018957A1 (en) 1998-09-30 2000-04-06 Applied Research Systems Ars Holding N.V. Methods of nucleic acid amplification and sequencing
US8003354B2 (en) 2000-02-07 2011-08-23 Illumina, Inc. Multiplex nucleic acid reactions
US20060188901A1 (en) 2001-12-04 2006-08-24 Solexa Limited Labelled nucleotides
US7057026B2 (en) 2001-12-04 2006-06-06 Solexa Limited Labelled nucleotides
US20070166705A1 (en) 2002-08-23 2007-07-19 John Milton Modified nucleotides
WO2005065814A1 (en) 2004-01-07 2005-07-21 Solexa Limited Modified molecular arrays
WO2006064199A1 (en) 2004-12-13 2006-06-22 Solexa Limited Improved method of nucleotide detection
WO2007010251A2 (en) 2005-07-20 2007-01-25 Solexa Limited Preparation of templates for nucleic acid sequencing
US20090026082A1 (en) 2006-12-14 2009-01-29 Ion Torrent Systems Incorporated Methods and apparatus for measuring analytes using large scale FET arrays
US20100282617A1 (en) 2006-12-14 2010-11-11 Ion Torrent Systems Incorporated Methods and apparatus for detecting molecular interactions using fet arrays
US20090127589A1 (en) 2006-12-14 2009-05-21 Ion Torrent Systems Incorporated Methods and apparatus for measuring analytes using large scale FET arrays
US20100137143A1 (en) 2008-10-22 2010-06-03 Ion Torrent Systems Incorporated Methods and apparatus for measuring analytes
WO2015106941A1 (en) 2014-01-16 2015-07-23 Illumina Cambridge Limited Polynucleotide modification on solid support
WO2016130704A2 (en) 2015-02-10 2016-08-18 Illumina, Inc. Methods and compositions for analyzing cellular components
WO2018018008A1 (en) 2016-07-22 2018-01-25 Oregon Health & Science University Single cell whole genome libraries and combinatorial indexing methods of making thereof
WO2018057779A1 (en) * 2016-09-23 2018-03-29 Jianbiao Zheng Compositions of synthetic transposons and methods of use thereof
US20180305753A1 (en) 2017-04-23 2018-10-25 Illumina Cambridge Limited Compositions and methods for improving sample identification in indexed nucleic acid libraries
WO2019236599A2 (en) 2018-06-04 2019-12-12 Illumina, Inc. High-throughput single-cell transcriptome libraries and methods of making and of using
WO2022159715A1 (en) * 2021-01-22 2022-07-28 The Broad Institute, Inc. Tracking apobec mutational signatures in tumor cells
CN115261363A (en) * 2021-04-29 2022-11-01 中国科学院分子植物科学卓越创新中心 Method for determining RNA deaminase activity of APOBEC3A and APOBEC3A variant with high RNA activity
WO2024147904A1 (en) * 2023-01-06 2024-07-11 Illumina, Inc. Reducing uracils by polymerase

Non-Patent Citations (51)

* Cited by examiner, † Cited by third party
Title
"Genetics Computer Group", MADISON, WIS., article "Wisconsin Genetics Software Package"
ALTSCHUL ET AL., J MOL BIOL, vol. 215, 1990, pages 403 - 410
BELLAMY ET AL., NUCLEIC ACIDS RES, vol. 35, 2007, pages 1478 - 1487
BROWN ET AL., BIOCHEMISTRY, vol. 49, no. 26, 2010, pages 5504 - 5510
BRUINSMA ET AL., BMC GENOMICS, vol. 19, 2018, pages 722
BRUINSMA ET AL., BMC GENOMICS, vol. 19, no. 1, 2019, pages 722 - 14
CARNEVALI ET AL., J COMPUT BIOL, vol. 19, no. 3, 2012, pages 279 - 92
CHEN ET AL., VIRUSES, vol. 13, 2021, pages 497
CLIEN ET AL., NUCLEIC ACIDS RESEARCH, vol. 24, no. 18, 1996, pages 3546 - 3551
TANGUY LE GAC ET AL., J MOL BIOL, vol. 336, no. 5, 2004, pages 1023 - 34
HEAD ET AL., BIOTECHNIQUES, vol. 56, no. 2, 2014, pages 61
HOLZ ET AL., SCIENTIFIC REPORTS, vol. 9, 2019, pages 17822
JAVIER M. DI NOIA ET AL: "Molecular Mechanisms of Antibody Somatic Hypermutation", ANNUAL REVIEW OF BIOCHEMISTRY, vol. 76, no. 1, 7 June 2007 (2007-06-07), US, pages 1 - 22, XP055673257, ISSN: 0066-4154, DOI: 10.1146/annurev.biochem.76.061705.090740 *
JIANG ET AL., J AM CHEM SOC, vol. 144, 2022, pages 1323 - 1331
JOHNSON, BIOCHIM BIOPHYS ACTA, vol. 1804, no. 5, 2010, pages 1041 - 1048
JOURNAL OF MOLECULAR BIOLOGY, vol. 336, no. 5, 2004, pages 1023 - 34
KOUNO ET AL., NAT. COMM, vol. 8, 2017, pages 15024
LAVERTY ET AL., ACS CHEM BIOL, vol. 12, no. 6, 2017, pages 1584 - 1592
LIN ET AL., NUCLEIC ACIDS RES, vol. 27, no. 22, 1999, pages 4468 - 75
LINDAHL ET AL., J BIOL CHEM, vol. 252, 1977, pages 3286 - 3294
LINDAHL, ANNU REV BIOCHEM, vol. 51, 1982, pages 61 - 87
MASUDA ET AL., J BIOL CHEM, vol. 276, 2001, pages 15051 - 15058
MOHNI ET AL., CELL, vol. 176, 2019, pages 144 - 153
MURAKUMO ET AL., J BIOL CHEM, vol. 276, 2001, pages 35644 - 35651
NAIR ET AL., SCIENCE, vol. 309, no. 5744, 2005, pages 2219 - 2222
NEEDLEMAN AND WUNSCH, J MOL BIOL, vol. 48, 1970, pages 443
PARIKH ET AL., MUTAT RES, vol. 460, 2000, pages 183 - 199
PARIKH ET AL., PROC NATL ACAD SCI USA, vol. 97, 2000, pages 5083
PEARSON AND LIPMAN, PROC NAT'L ACAD SCI USA, vol. 85, 1988, pages 2444
PETER HL KRIJGER ET AL: "Rev1 is essential in generating G to C transversions downstream of the Ung2 pathway but not the Msh2+Ung2 hybrid pathway", EUROPEAN JOURNAL OF IMMUNOLOGY, WILEY-VCH, HOBOKEN, USA, vol. 43, no. 10, 5 August 2013 (2013-08-05), pages 2765 - 2770, XP071226395, ISSN: 0014-2980, DOI: 10.1002/EJI.201243191 *
PRASAD ET AL., NUCLEIC ACIDS RES, vol. 44, no. 22, 2016, pages 10824 - 10833
RAINE ET AL., NUCLEIC ACIDS RESEARCH, vol. 45, no. 6, 2017, pages e36
ROBERTSON ET AL., CELL MOL LIFE SCI, vol. 66, no. 6, 2009, pages 981 - 93
RUSMINTRATIP AND SOWERS, PNAS, vol. 97, no. 26, 2000, pages 14183 - 14187
SALEM ET AL., J BACTERIOL, vol. 191, no. 18, 2009, pages 5657 - 68
SALTER ET AL., TRENDS BIOCHEM SCI, vol. 41, no. 7, 2016, pages 578 - 594
SALTER ET AL., TRENDS BIOCHEM SCI, vol. 43, no. 8, 2018, pages 606 - 622
SCHORMANN ET AL., PROTEIN SCI, vol. 23, 2014, pages 1667 - 1685
SEQUENCING METHODS REVIEW, Retrieved from the Internet <URL:illumina.com/content/dam/illumina-marketing/documents/products/research_reviews/sequencing-methods-review.pdf>
SLUPPHAUG ET AL., NATURE, vol. 382, no. 6593, 1996, pages 729 - 31
SMITH AND WATERMAN, ADV. APPL. MATH, vol. 2, 1981, pages 482
SRINIVASAN ET AL., AM J PATHOL., vol. 161, no. 6, December 2002 (2002-12-01), pages 1961 - 1971
STIVERS ET AL., ARCH BIOCHEM BIOPHYS, vol. 396, 2001, pages 1 - 9
STIVERS ET AL., BIOCHEMISTRY, vol. 38, 1999, pages 952 - 963
THOMAS HELLEDAY ET AL: "Mechanisms underlying mutational signatures in human cancers", NATURE REVIEWS GENETICS, vol. 15, no. 9, 1 July 2014 (2014-07-01), GB, pages 585 - 598, XP055380188, ISSN: 1471-0056, DOI: 10.1038/nrg3729 *
THOMPSON ET AL., NAT STRUCT MOL BIOL, vol. 26, no. 7, 2019, pages 613 - 618
WEAVER ET AL., PNAS, vol. 117, no. 41, 2020, pages 25494 - 25504
ZHARKOV ET AL., MUTATION RESEARCH, vol. 685, 2010, pages 11 - 20

Similar Documents

Publication Publication Date Title
US10704091B2 (en) Genotyping by next-generation sequencing
US20240182881A1 (en) Altered cytidine deaminases and methods of use
EP2623613B1 (en) Increasing confidence of allele calls with molecular counting
KR102313470B1 (en) Error-free sequencing of DNA
US10100292B2 (en) Mutant endonuclease V enzymes and applications thereof
US20250327067A1 (en) Reducing uracils by polymerase
JP2022511207A (en) Methods and compositions for cluster formation by bridge amplification
EP4594343A1 (en) Methods of using cpg binding proteins in mapping modified cytosine nucleotides
EP4594482A1 (en) Cytidine deaminases and methods of use in mapping modified cytosine nucleotides
EP4594481A1 (en) Helicase-cytidine deaminase complexes and methods of use
WO2024249466A1 (en) False positive reduction by translesion polymerase repair
EP4627113A1 (en) Chemoenzymatic correction of false positive uracil transformations
US20250388894A1 (en) Methods of using cpg binding proteins in mapping modified cytosine nucleotides
WO2025081064A2 (en) Thermophilic deaminase and methods for identifying modified cytosine
Reiter et al. RT-PCR optimization strategies
WO2025072800A2 (en) Altered cytidine deaminases and methods of use
WO2025129195A1 (en) Dna polymerases and related methods
JP2008178338A (en) Nucleic acid amplification method for amplifying target nucleic acid in nucleic acid sample contaminated with fragmented nucleic acid, and kit thereof
Beaulne Nature and structure of DNA through the process of degradation
HK1204337B (en) Genotyping by next-generation sequencing

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24734742

Country of ref document: EP

Kind code of ref document: A1