
WO2024224324A1 - Engineered TET enzymes and use in epigenetics and next generation sequencing (NGS), such as TET-assisted pyridine borane sequencing (TAPS) - Google Patents

Engineered TET enzymes and use in epigenetics and next generation sequencing (NGS), such as TET-assisted pyridine borane sequencing (TAPS)

Info

Publication number
WO2024224324A1
Authority
WO
WIPO (PCT)
Prior art keywords
tet
engineered
enzyme
catalytic domain
tet enzyme
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/IB2024/054026
Other languages
English (en)
Inventor
Nicola A. BURGESS-BROWN
John HINKS
Marcello TORTORICI
Rosa D'AGOSTINO
Alejandra FERNANDEZ-CID
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Exact Sciences Innovation Ltd
Original Assignee
Exact Sciences Innovation Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Exact Sciences Innovation Ltd
Priority to AU2024264100A
Publication of WO2024224324A1
Anticipated expiration
Pending

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/0004: Oxidoreductases (1.)
    • C12N9/0071: Oxidoreductases (1.) acting on paired donors with incorporation of molecular oxygen (1.14)
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/52: Genes encoding for enzymes or proenzymes
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y: ENZYMES
    • C12Y114/00: Oxidoreductases acting on paired donors, with incorporation or reduction of molecular oxygen (1.14)
    • C12Y114/11: Oxidoreductases acting on paired donors, with incorporation or reduction of molecular oxygen (1.14) with 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors (1.14.11)

Definitions

  • the present invention relates to engineered TET enzymes that find use in epigenetics and Next Generation Sequencing (NGS), and more specifically to sequencing methods such as Tet-Assisted Pyridine Borane Sequencing (TAPS).
  • NGS Next Generation Sequencing
  • TAPS Tet-Assisted Pyridine Borane Sequencing
  • TAPS protocols utilize Ten-Eleven Translocase (TET) enzymes.
  • the TET enzyme family oxidizes 5-methylcytosine (5mC), promoting locus-specific reversal of DNA methylation. Methylation of cytosines is a frequent epigenetic modification in eukaryotes; 5-methylcytosine (5mC) is produced by DNA methyltransferase (Dnmt) activity and is located on CG dinucleotides (CpGs) of chromosomal DNA.
  • Dnmt DNA methyltransferase
  • CpGs CG dinucleotides
  • TET proteins oxidize 5mC to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) to modify DNA methylation patterns.
  • the present invention provides an engineered Ten-Eleven Translocase (TET) enzyme comprising a TET catalytic domain comprising at least one substitution mutation selected from the group consisting of: substitution of a non-cysteine amino acid in the catalytic domain with a cysteine; substitution of cysteine in the catalytic domain with a non-cysteine amino acid; and combinations thereof; wherein the engineered TET enzyme catalyzes the oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) and/or 5hmC to 5-formylcytosine (5fC) and/or 5-carboxycytosine (5caC).
  • TET Ten-Eleven Translocase
  • the at least one non-cysteine amino acid that is substituted with a cysteine occurs at a position in the catalytic domain of the TET enzyme that is occupied by a cysteine in an isoform or ortholog of the TET enzyme.
  • the TET catalytic domain is a mouse TET2 catalytic domain and the isoform is mouse TET1.
  • the at least one non-cysteine amino acid that is substituted with a cysteine is at a position selected from the group consisting of 1286, 1291, 1322, 1835, and 1837 and combinations thereof, wherein the position numbers are based on position numbers for full length mouse TET2.
  • the substitutions are selected from the group consisting of A1286C, S1291C, K1332C, M1835C and E1837C substitutions and combinations thereof, wherein the position numbers are based on position numbers for full length mouse TET2.
  • the at least one cysteine in the catalytic domain that is substituted with a non-cysteine amino acid occurs at a position in the catalytic domain of the TET enzyme that is occupied by a non-cysteine in an isoform of the TET enzyme.
  • the TET catalytic domain is a mouse TET2 catalytic domain and the isoform is a mouse TET1 enzyme.
  • the at least one cysteine that is substituted with a non-cysteine amino acid is at a position selected from the group consisting of 1168, 1171, 1185, 1272, 1772, 1792 and 1827 and combinations thereof, wherein the position numbers are based on position numbers for full length mouse TET2.
  • the substitutions are selected from the group consisting of C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G substitutions and combinations thereof, wherein the position numbers are based on position numbers for full length mouse TET2.
  • the catalytic domain of the engineered TET enzyme has an N terminal portion that shares at least 80% sequence identity with SEQ ID NO: 1 and a C terminal portion that shares at least 80% sequence identity to SEQ ID NO:2, with the proviso that the catalytic domain consists of at least one substitution selected from the group consisting of A1286C, S1291C, K1332C, M1835C, E1837C, C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G substitutions and combinations thereof, wherein the position numbers are based on position numbers for full length mouse TET2.
  • the N terminal portion of the catalytic domain corresponds to SEQ ID NO: 1 and the C terminal portion of the catalytic domain corresponds to SEQ ID NO:2.
  • the N terminal portion of the catalytic domain and the C terminal portion of the catalytic domain are connected by a heterologous linker sequence.
  • the TET catalytic domain is a mouse TET3 catalytic domain and the isoform is mouse TET1.
  • the at least one non-cysteine amino acid that is substituted with a cysteine is at a position selected from the group consisting of 1076, 1112, 1721, and 1726 and combinations thereof, wherein the position numbers are based on position numbers for full length mouse TET3.
  • the substitutions are selected from the group consisting of A1076C, I1112C, M1721C, and E1726C substitutions and combinations thereof, wherein the position numbers are based on position numbers for full length mouse TET3.
  • the engineered TET enzyme further comprises an amino acid substitution at a position selected from the group consisting of 975, 1658, and 1678 and combinations thereof, wherein the position numbers are based on position numbers for full length mouse TET3.
  • the substitutions are selected from the group consisting of A975T, N1658S, and R1678K substitutions and combinations thereof, wherein the position numbers are based on position numbers for full length mouse TET3.
  • the catalytic domain of the engineered TET enzyme has an N terminal portion that shares at least 80% sequence identity with SEQ ID NO:3 and a C terminal portion that shares at least 80% sequence identity to SEQ ID NO: 4, with the proviso that the catalytic domain consists of at least 1 substitution selected from the group consisting of A1076C, I1112C, M1721C, E1726C, A975T, N1658S, and R1678K substitutions and combinations thereof, wherein the position numbers are based on position numbers for full length mouse TET3.
  • the N terminal portion of the catalytic domain corresponds to SEQ ID NO: 3 and the C terminal portion of the catalytic domain corresponds to SEQ ID NO:4.
  • the N terminal portion of the catalytic domain and the C terminal portion of the catalytic domain are connected by a heterologous linker sequence.
  • the TET catalytic domain is a Lygus hesperus TET2 catalytic domain and the ortholog is mouse TET1.
  • the at least one non-cysteine amino acid that is substituted with a cysteine is at a position selected from the group consisting of 740, 773, and 1400 and combinations thereof, wherein the position numbers are based on position numbers for partial length Lygus hesperus TET2.
  • the substitutions are selected from the group consisting of A740C, K773C, and Q1400C substitutions and combinations thereof, wherein the position numbers are based on position numbers for partial length Lygus hesperus TET2.
  • the engineered TET enzyme further comprises an amino acid substitution at a position selected from the group consisting of 627, 641, 1337, and 1357 and combinations thereof, wherein the position numbers are based on position numbers for partial length Lygus hesperus TET2.
  • the substitutions are selected from the group consisting of V627P, A641T, N1337S, and H1357K substitutions and combinations thereof, wherein the position numbers are based on position numbers for full length Lygus hesperus TET2.
  • the catalytic domain of the engineered TET enzyme has an N terminal portion that shares at least 80% sequence identity with SEQ ID NO:5 and a C terminal portion that shares at least 80% sequence identity to SEQ ID NO:6, with the proviso that the catalytic domain consists of at least 1 substitution selected from the group consisting of A740C, K773C, Q1400C, V627P, A641T, N1337S, and H1357K substitutions and combinations thereof, wherein the position numbers are based on position numbers for partial length Lygus hesperus TET2.
  • the N terminal portion of the catalytic domain corresponds to SEQ ID NO:5 and the C terminal portion of the catalytic domain corresponds to SEQ ID NO:6.
  • the N terminal portion of the catalytic domain and the C terminal portion of the catalytic domain are connected by a heterologous linker sequence.
  • the TET catalytic domain is a human TET2 catalytic domain and the ortholog is mouse TET1.
  • the at least one non-cysteine amino acid that is substituted with a cysteine is at a position selected from the group consisting of 1373, 1409, 1921, and 1923 and combinations thereof, wherein the position numbers are based on position numbers for full length human TET2.
  • the substitutions are selected from the group consisting of A1373C, K1409C, M1921C, and E1923C substitutions and combinations thereof, wherein the position numbers are based on position numbers for full length human TET2.
  • the engineered TET enzyme further comprises an amino acid substitution at a position selected from the group consisting of 1258, 1272, 1858, and 1878 and combinations thereof, wherein the position numbers are based on position numbers for full length human TET2.
  • the substitutions are selected from the group consisting of L1258P, A1272T, D1858S, and R1878K substitutions and combinations thereof, wherein the position numbers are based on position numbers for full length human TET2.
  • the catalytic domain of the engineered TET enzyme has an N terminal portion that shares at least 80% sequence identity with SEQ ID NO:7 and a C terminal portion that shares at least 80% sequence identity to SEQ ID NO:8, with the proviso that the catalytic domain consists of at least 1 substitution selected from the group consisting of A1373C, K1409C, M1921C, E1923C, L1258P, A1272T, D1858S, and R1878K substitutions and combinations thereof, wherein the position numbers are based on position numbers for full length human TET2 and the engineered TET enzyme catalyzes the oxidation of 5-methylcytosine.
  • the N terminal portion of the catalytic domain corresponds to SEQ ID NO: 7 and the C terminal portion of the catalytic domain corresponds to SEQ ID NO: 8.
  • the N terminal portion of the catalytic domain and the C terminal portion of the catalytic domain are connected by a heterologous linker sequence.
  • the engineered TET enzymes have an N terminal truncation. In some preferred embodiments, the engineered TET enzymes have an N terminal truncation extending to the N-terminus of the catalytic domain.
  • the engineered TET enzymes have a C terminal truncation.
  • the C terminal truncation is up to and including 77 amino acids.
  • the C terminal truncation is up to and including 64 amino acids.
  • the C terminal truncation is up to and including 44 amino acids.
  • LCR Low Complexity Region
  • the engineered TET enzymes comprise at least one purification tag selected from the group consisting of a his tag and a FLAG tag and combinations thereof.
  • the present invention provides a nucleic acid encoding an engineered TET enzyme as described above.
  • the nucleic acid further comprises a promoter in operable association with the nucleic acid encoding the engineered TET enzyme.
  • the present invention provides a vector comprising the nucleic acid.
  • the present invention provides a host cell comprising the nucleic acid or vector.
  • the host cell is a prokaryotic host cell.
  • the host cell is a eukaryotic host cell.
  • the present invention provides methods of producing an engineered TET enzyme comprising: culturing the host cell as described above under conditions such that the engineered TET enzyme is expressed; and isolating the expressed engineered TET enzyme.
  • the present invention provides methods of producing an engineered TET enzyme comprising: introducing a baculovirus vector comprising a nucleic acid sequence encoding the engineered TET enzyme into an insect cell free expression system under conditions such that the engineered TET enzyme is expressed; and isolating the expressed engineered TET enzyme.
  • the present invention provides methods of producing a recombinant TET enzyme comprising: introducing a baculovirus vector comprising a nucleic acid sequence encoding a TET enzyme into an insect cell expression system under conditions such that the recombinant TET enzyme is expressed; and isolating the expressed recombinant TET enzyme.
  • the recombinant TET enzyme is a recombinant mTET1, mTET2, or mTET3 enzyme or an engineered TET enzyme or ortholog as described above.
  • the recombinant TET enzyme does not comprise an LCR.
  • the LCR is substituted with a heterologous linker sequence.
  • the recombinant TET enzyme has an N-terminal truncation. In some preferred embodiments, the N-terminal truncation extends to the N terminus of the catalytic domain of the recombinant TET enzyme.
  • the present invention provides a recombinant TET enzyme produced by the foregoing methods.
  • the present invention provides recombinant TET enzymes comprising a catalytic domain having at least 80%, 90%, 95%, 98% or 100% identity to the catalytic domain of a sequence selected from the group consisting of SEQ ID NOs: 19 to 33, wherein the recombinant TET enzyme catalyzes the oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) and/or 5hmC to 5-formylcytosine (5fC) and/or 5-carboxycytosine (5caC).
  • the enzyme is modified by deletion of the TET LCR.
  • the LCR is substituted with a heterologous linker sequence.
  • the recombinant TET enzyme has an N-terminal truncation. In some preferred embodiments, the N-terminal truncation extends to the N terminus of the catalytic domain of the recombinant TET enzyme.
  • the enzyme comprises at least one purification tag selected from the group consisting of a his tag and a FLAG tag and combinations thereof.
  • the present invention provides a nucleic acid encoding the recombinant TET enzyme.
  • the nucleic acid further comprises a promoter in operable association with the nucleic acid encoding the recombinant TET enzyme.
  • the present invention provides a vector comprising the nucleic acid.
  • the present invention provides a host cell comprising the nucleic acid or vector.
  • the host cell is a prokaryotic host cell.
  • the host cell is a eukaryotic host cell.
  • the present invention provides methods of modifying a nucleic acid in a nucleic acid sample comprising 5mC and/or 5hmC comprising: contacting the nucleic acid with an engineered or recombinant TET enzyme as described above so that 5mC and/or 5hmC is converted to 5fC and/or 5caC to provide an oxidized target nucleic acid.
  • the TET enzyme is an engineered TET enzyme as described above.
  • the methods further comprise the step of contacting the oxidized target nucleic acid with a reducing agent.
  • the reducing agent is a borane reducing agent and the 5fC and/or 5caC is reduced to dihydrouracil (DHU).
  • the borane reducing agent comprises an agent selected from the group consisting of 2-picoline borane (pic-BH3), borane, sodium borohydride, sodium cyanoborohydride, and sodium triacetoxyborohydride.
  • the methods further comprise contacting the oxidized target nucleic acid with sodium bisulfite so that the 5fC and/or 5caC is deaminated to uracil (U).
  • the methods further comprise contacting the oxidized target nucleic acid with an APOBEC (Apolipoprotein B mRNA Editing Catalytic Polypeptide-like) enzyme so that the 5fC and/or 5caC is deaminated to uracil (U).
  • APOBEC Apolipoprotein B mRNA Editing Catalytic Polypeptide-like
  • the methods further comprise adding a blocking group to one or more of the 5mC and/or 5hmC in the nucleic acid sample.
  • the blocking group is added prior to contacting with the oxidizing agent.
  • 5hmC is blocked.
  • the blocking group comprises a sugar or a uridine diphosphate (UDP)-linked sugar.
  • the blocking group is added after contacting with the oxidizing agent and prior to contacting with the borane reducing agent.
  • the one or more modified cytosines comprises 5caC or 5fC.
  • the blocking group comprises an aldehyde reactive compound.
  • the aldehyde reactive compound comprises a hydroxylamine derivative, a hydrazine derivative, or a hydrazide derivative.
  • adding the blocking group comprises contacting the nucleic acid sample with (i) a coupling agent and (ii) an amine, hydrazine, or hydroxylamine compound.
  • the methods further comprise sequencing the nucleic acid after contacting with the borane reducing agent to identify converted cytosine bases.
  • the present invention provides a system or kit for oxidizing a methylated nucleotide comprising a TET enzyme (e.g., engineered, or recombinant TET enzyme) as described above.
  • the systems or kits further comprise a borane reducing agent.
  • the borane reducing agent comprises an agent selected from the group consisting of 2-picoline borane (pic-BH3), borane, sodium borohydride, sodium cyanoborohydride, and sodium triacetoxyborohydride.
  • the systems or kits further comprise sodium bisulfite.
  • the systems or kits further comprise an APOBEC (Apolipoprotein B mRNA Editing Catalytic Polypeptide-like) enzyme.
  • APOBEC Apolipoprotein B mRNA Editing Catalytic Polypeptide-like
  • the systems or kits further comprise a blocking reagent.
  • the blocking reagent is selected from the group consisting of a sugar or a uridine diphosphate (UDP)-linked sugar and an aldehyde reactive compound.
  • the aldehyde reactive compound is selected from the group consisting of a hydroxylamine derivative, a hydrazine derivative, and a hydrazide derivative.
  • the blocking reagent is a sugar or a uridine diphosphate (UDP)-linked sugar and the system or kit further comprises a glucosyltransferase enzyme.
  • FIG. 1 Schematic diagram of mouse TET (also referred to herein as MmTET or mTET) enzyme domain organization, showing the catalytic domain of mouse TET isoforms including an N terminal portion including Cys-N, Cys-C and a conserved double-stranded β-helix (DSBH) sub-domain bisected by a Low Complexity Region (LCR).
  • MmTET or mTET enzyme domain organization
  • FIG. 2 Typical Size Exclusion Chromatography (SEC) profile for Nickel (Ni) IMAC purified E. coli expressed MmTET1 Full Length (FL) Catalytic Domain (CD).
  • SEC Size Exclusion Chromatography
  • FIG. 3 LCR removal and linker insertion strategies.
  • FIG. 4 Alignment of the HsTET2 CD sequence from the published Protein Data Bank (PDB) entry 5D9Y with MmTET1 CD and MmTET2 CD, and locations of various C-terminus truncations.
  • FIG. 5 Summary of domain exchange and construct design impacts tested. This figure illustrates the results from 80 TET enzyme CD constructs expressed at 50 ml scale in E. coli, purified by affinity chromatography, then assessed for purity by SDS-PAGE and for activity via an in-house assay. The following terms are used in the figure.
  • Arrows domain swaps between MmTET1 and MmTET2.
  • C-term tagging replacing the N-terminal tags with equivalent C-terminal tags.
  • GGS glycine, glycine, serine addition ahead of the first native CD residue.
  • Minor Loop an unstructured region reported in Rat and some depositions of MmTET1 isoforms only.
  • N-term extension inclusion of up to 20 amino acids from the TET1 sequence upstream of the Cys-N domain start.
  • FIG. 6 Detailed scheme of MmTET2 features relating to CD engineering.
  • FIG. 7A-B The cysteine swaps as applied to MmTET2 improved enzyme activity and developability.
  • A SDS-PAGE analysis of MmTET2 purified from insect cells.
  • FIG. 8A-B Comparison of SDS-PAGE analysis between MmTET2 deltaLCR vs MmTET2 N-His-Flag CD deltaLCR 1xGS CysSwap -44 C-term (referred to herein as TET v1.0 and represented by SEQ ID NO: 16).
  • A SDS-PAGE analysis of MmTET2 deltaLCR purified from E. coli (doublet band).
  • B SDS-PAGE analysis of TET v1.0 purified from E. coli (resolved band).
  • FIG. 9 Comparison of TAPS performance (represented by conversion %) between MmTET2 CD deltaLCR and TET v1.0 at 30 and 60 minutes, showing the difference between MmTET2 deltaLCR with Cys swap (TET v1.0) and without Cys swap (MmTET2 CD deltaLCR).
  • FIG. 10 Comparison of commercial MmTET2 to TET v1.0 by relative performance in TAPS.
  • FIG. 11 Relative activity of MmTET2 ΔLCR vs TET v1.0 over (A) a range of temperatures, (B) a range of pH, and (C) a range of NaCl concentrations.
  • FIG. 12 SPR sensorgram comparison of TET v1.0 with commercial MmTET2.
  • FIG. 13 Percent conversion of fully methylated Lambda generated from sequencing data of either TAPS converted or EM-seq converted DNA.
  • TET enzymes occur in nature across the eukaryotes (predominantly), with mammals having three isoforms (other organisms have varying numbers of isoforms), each exhibiting subtle differences in cellular distribution, activity and sequence.
  • Methylation state sequencing techniques such as TAPS can utilize a TET enzyme to convert methylated DNA into any of its various oxidation states (5hmC, 5fC and 5caC), which may subsequently either be read directly (e.g., by third generation, or “Long Read” NGS) or chemically converted into alternate bases such as uracil to be read by conventional NGS (e.g., by second generation, or “Short Read” NGS) techniques.
  • a number of TET enzymes are known in the art. Wild-type TET enzymes are large and comprise extensive regions of negligible structural integrity, making them prone to aggregation, degradation and low yields when expressed heterologously (e.g., for manufacturing purposes).
  • a Catalytic Domain (CD) at their C-terminal end comprises a conserved double-stranded β-helix (DSBH) sub-domain bisected by a Low Complexity Region (LCR), a cysteine-rich region, and cofactor binding sites for Fe(II) and 2-oxoglutarate (2-OG). Together these form the core catalytic region of the enzyme as illustrated for the mouse TET isoforms in FIG. 1.
  • TET enzymes are not ideal for high volume manufacture. In particular, these TET enzymes are typically low yield and prone to aggregation and degradation. Furthermore, TET enzymes that have so far been engineered to display improved manufacturability do not possess the requisite functional characteristics for use in stringent, high sensitivity assay formats (e.g., NGS applications within healthcare and specifically TAPS assays).
  • the present invention addresses this problem by providing engineered TET enzymes with improved manufacturing characteristics and performance in methylation assays such as TAPS.
  • the engineered TET enzymes comprise one or more amino acid substitutions in the catalytic domain so that cysteine distribution of one TET isoform (e.g., mouse TET1 or human TET1) is replicated in a different TET isoform or ortholog (e.g., mouse TET2, human TET2, mouse TET3, or Lygus hesperus TET2).
  • the substitutions can include 1) substitution of one or more non-cysteine amino acids in the catalytic domain of the target TET enzyme (e.g., mouse TET2, human TET2, mouse TET3 or Lygus hesperus TET2) with a cysteine at a position that is occupied by a cysteine in the catalytic domain of a TET isoform or ortholog (e.g., mouse TET1 or human TET1); and/or 2) substitution of one or more cysteines in the catalytic domain of the target TET enzyme (e.g., mouse TET2, human TET2, mouse TET3, or Lygus hesperus TET2) with a non-cysteine amino acid at a position that is occupied by a non-cysteine amino acid in the catalytic domain of a TET isoform or ortholog (e.g., mouse TET1 or human TET1).
  • engineered TET enzymes with these substitutions demonstrate improved solubility and stability with reduced aggregation as compared to nonmutated versions without compromising yield in both insect or bacterial expression systems. Furthermore, the engineered TET enzymes display improved performance in TAPS assays as compared to the wild-type enzymes.
  • it is highly desirable to develop TET enzymes that meet the stringent requirements of TAPS assays, which are superior to other methods for mapping DNA modifications (including 5mC and 5hmC).
  • these other methods include, for instance, bisulfite-based methods, methods that incorporate enzymatic modifications by TET alongside bisulfite treatment (such as TAB-seq), and more recent methods that avoid bisulfite use (such as EM-seq). These methods cannot provide full sequence and epigenetic information from a single sequencing operation, whereas TAPS can. More specifically, because TAPS provides greater sensitivity than other techniques, an enzyme able to provide maximum conversion rates and cause the least DNA loss is necessary if the full potential of TAPS is to be realized. Furthermore, that enzyme must be capable of commercial production and be robust enough to incorporate into kits, while retaining its high conversion rate activity. Section headings as used in this section and the entire disclosure herein are merely for organizational purposes and are not intended to be limiting.
  • TET enzyme refers to a member of the Ten-Eleven Translocation (TET) family that oxidizes 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) and 5hmC to 5-carboxylcytosine (5caC) and 5-formylcytosine (5fC).
  • engineered TET enzyme refers to a TET enzyme that has one or more changes in its amino acid sequence as compared to the wild-type version of the TET enzyme.
  • the changes in amino acid sequence may be one or more of an amino acid substitution, amino acid deletion, truncation of the TET enzyme, or addition of amino acids or functional sequences to the TET enzyme.
  • the percentage of identity of an amino acid sequence or nucleic acid sequence is defined herein as the percentage of residues of the full length of an amino acid sequence or nucleic acid sequence that is identical with the residues in a reference amino acid sequence or nucleic acid sequence after aligning the two sequences and introducing gaps, if necessary, to achieve the maximum percent identity.
  • the percentage homology of an amino acid sequence or the term “% homology to” is defined herein as the percentage of amino acid residues in a particular sequence that are homologous with the amino acid residues in a reference sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence homology. This method takes into account conservative amino acid substitutions.
  • amino acids can be similar in several characteristics, for example, size, shape, hydrophobicity, hydrophilicity, charge, isoelectric point, polarity, aromaticity, etc.
  • a conservative substitution is an exchange of one amino acid within a group for another amino acid within the same group, whereby the groups are the following: (1) alanine, valine, leucine, isoleucine, methionine, and phenylalanine; (2) histidine, arginine, lysine, glutamine, and asparagine; (3) aspartate and glutamate; (4) serine, threonine, alanine, tyrosine, phenylalanine, tryptophan, and cysteine; and (5) glycine, proline, and alanine.
  • Methods and computer programs for the alignment are well known in the art, for example "Align 2".
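As a hedged illustration of the homology definition and the five conservative-substitution groups listed above, the following Python sketch encodes the groups with one-letter amino acid codes and scores a pre-aligned pair. The names `CONSERVATIVE_GROUPS`, `is_conservative`, and `percent_homology` are hypothetical, and counting identical residues as homologous is an assumption of the sketch.

```python
# The five groups from the definition above, in one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    set("AVLIMF"),   # (1) Ala, Val, Leu, Ile, Met, Phe
    set("HRKQN"),    # (2) His, Arg, Lys, Gln, Asn
    set("DE"),       # (3) Asp, Glu
    set("STAYFWC"),  # (4) Ser, Thr, Ala, Tyr, Phe, Trp, Cys
    set("GPA"),      # (5) Gly, Pro, Ala
]


def is_conservative(a: str, b: str) -> bool:
    """True if a and b are different residues that share at least one group."""
    return a != b and any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def percent_homology(aligned_query: str, aligned_reference: str, gap: str = "-") -> float:
    """Percent of query residues that are identical to, or a conservative
    substitution of, the aligned reference residue (assumption of this sketch)."""
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    query_length = sum(1 for q in aligned_query if q != gap)
    if query_length == 0:
        raise ValueError("query sequence is empty")
    homologous = sum(
        1
        for q, r in zip(aligned_query, aligned_reference)
        if q != gap and r != gap and (q == r or is_conservative(q, r))
    )
    return 100.0 * homologous / query_length
```

Because alanine and phenylalanine each appear in more than one group, membership is tested per group rather than through a single residue-to-group lookup.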
  • protein and polypeptide refer to compounds comprising amino acids joined via peptide bonds and are used interchangeably.
  • a protein or polypeptide encoded by a gene is not limited to the amino acid sequence encoded by a gene, but may include post-translational modifications of one or more amino acids of the protein or polypeptide.
  • N-terminal and C-terminal refer to relative positions in the amino acid sequence of the protein or polypeptide toward the N-terminus and the C-terminus, respectively.
  • N-terminus and C-terminus refer to the extreme amino and carboxyl ends of the polypeptide, respectively.
  • methylation refers to cytosine methylation at positions C5 or N4 of cytosine, the N6 position of adenine, or other types of nucleic acid methylation.
  • In vitro amplified DNA is usually unmethylated because typical in vitro DNA amplification methods do not retain the methylation pattern of the amplification template.
  • “unmethylated DNA” or “methylated DNA” can also refer to amplified DNA whose original template was unmethylated or methylated, respectively.
  • a “methylated nucleotide” or a “methylated nucleotide base” refers to the presence of a methyl moiety on a nucleotide base, where the methyl moiety is not present in a recognized typical nucleotide base.
  • cytosine does not contain a methyl moiety on its pyrimidine ring, but 5-methylcytosine contains a methyl moiety at position 5 of its pyrimidine ring. Therefore, cytosine is not a methylated nucleotide and 5-methylcytosine is a methylated nucleotide.
  • a “methylated nucleic acid molecule” refers to a nucleic acid molecule that contains one or more methylated nucleotides.
  • a “methylation state”, “methylation profile”, “methylation status,” or “methylation signature” of a nucleic acid molecule refers to the presence or absence of one or more methylated nucleotide bases in the nucleic acid molecule.
  • a nucleic acid molecule containing a methylated cytosine is considered methylated (e.g., the methylation state of the nucleic acid molecule is methylated).
  • a nucleic acid molecule that does not contain any methylated nucleotides is considered unmethylated.
  • “methylation frequency” or “methylation percent (%)” refer to the number of instances in which a molecule or locus is methylated relative to the number of instances the molecule or locus is unmethylated.
  • Methylation state frequency can be used to describe a population of individuals or a sample from a single individual. For example, a nucleotide locus having a methylation state frequency of 50% is methylated in 50% of instances and unmethylated in 50% of instances. Such a frequency can be used, for example, to describe the degree to which a nucleotide locus or nucleic acid region is methylated in a population of individuals or a collection of nucleic acids.
  • the methylation state frequency of the first population or pool will be different from the methylation state frequency of the second population or pool.
  • a frequency also can be used, for example, to describe the degree to which a nucleotide locus or nucleic acid region is methylated in a single individual.
  • a frequency can be used to describe the degree to which a group of cells from a tissue sample are methylated or unmethylated at a nucleotide locus or nucleic acid region.
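A minimal Python sketch of the methylation state frequency follows. It takes the 50% example above as its guide and assumes the frequency is the share of methylated instances among all observed instances at a locus. The function name and the count-based inputs are illustrative assumptions.

```python
def methylation_state_frequency(methylated: int, unmethylated: int) -> float:
    """Percent of observed instances at a locus that are methylated.

    Assumes every observed instance is classified as either methylated or
    unmethylated, so the two counts sum to the total number of observations.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("counts must be non-negative")
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no observations at this locus")
    return 100.0 * methylated / total


if __name__ == "__main__":
    # A locus observed as methylated in 50 of 100 instances.
    print(methylation_state_frequency(50, 50))
```

The same calculation applies whether the instances are individuals in a population, nucleic acid molecules in a pool, or cells in a tissue sample.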
  • the terms “patient” or “subject” refer to organisms to be subject to various tests provided by the technology.
  • the term “subject” includes animals, preferably mammals, including humans.
  • the subject is a primate.
  • the subject is a human.
  • a preferred subject is a vertebrate subject.
  • a preferred vertebrate is warm-blooded; a preferred warm-blooded vertebrate is a mammal.
  • a preferred mammal is most preferably a human.
  • the term “subject” includes both human and animal subjects. Thus, veterinary therapeutic uses are provided herein.
  • the present technology provides for the diagnosis of mammals such as humans, as well as those mammals of importance due to being endangered; of economic importance, such as animals raised on farms for consumption by humans; and/or of social importance to humans, such as animals kept as pets or in zoos.
  • animals include but are not limited to: carnivores such as cats and dogs; swine, including pigs, hogs, and wild boars; ruminants and/or ungulates such as cattle, oxen, sheep, giraffes, deer, goats, bison, and camels; pinnipeds; and horses.
  • nucleic acid sample refers to nucleic acid obtained from an organism from the Monera (bacteria), Protista, Fungi, Plantae, and Animalia Kingdoms.
  • the nucleic acid may also be obtained from a virus.
  • Nucleic acid samples may be obtained from a patient or subject, from an environmental sample, or from an organism of interest (e.g., both cellular and circulating cell-free DNA (cfDNA) obtained from tissue, a cell, collection of cells, blood, plasma, serum, organ secretion, semen (seminal fluid), vaginal secretions, cerebral spinal fluid (CSF), saliva, mucus, urine, stool, sweat, pancreatic juice, gastric secretions, gastric fluid (gastric lavage), ascitic fluid, synovial fluid, pleural fluid (pleural lavage), pericardial fluid, peritoneal fluid, amniotic fluid, nasal fluid, optic fluid, breast milk, or any other bodily fluid comprising a desired nucleic acid or cfDNA), DNA obtained from biopsies, and DNA obtained from cells, secretions, or tissues from the lymph gland, breast, liver, bile ducts, pancreas, mouth, stomach, colon, rectum, esophagus, small intestine,
  • the target nucleic acid may be obtained from a sample that contains diseased tissue or cells, or is suspected of containing diseased tissue or cells (e.g., a sample that is cancerous, or contains cancerous tissue or cells, or is suspected of being cancerous or suspected of containing cancerous tissue or cells).
  • the nucleic acid sample is obtained from a subject that has a disease or disorder (e.g., cancer), is suspected of having the disease or disorder, or is being screened to determine the presence of the disease or disorder.
  • the nucleic acid sample is circulating cell-free DNA (cell-free DNA or cfDNA), for instance DNA that is found in the blood and is not present within a cell.
  • cfDNA can be isolated from a bodily fluid using methods known in the art.
  • Commercial kits are available for isolation of cfDNA including, for example, the Circulating Nucleic Acid Kit (Qiagen).
  • the nucleic acid sample may result from an enrichment step, including, but not limited to, antibody immunoprecipitation, chromatin immunoprecipitation, restriction enzyme digestion-based enrichment, hybridization-based enrichment, or chemical labeling-based enrichment.
  • Preferred methods and materials are described below, although methods and materials similar or equivalent to those described herein can be used in practice or testing of the present disclosure. All publications, patent applications, patents and other references mentioned herein are incorporated by reference in their entirety. The materials, methods, and examples disclosed herein are illustrative only and not intended to be limiting.
  • the present invention provides engineered TET enzymes comprising a TET catalytic domain comprising one or more changes to at least one amino acid.
  • the changes to at least one amino acid may comprise, consist essentially of, or consist of a substitution, deletion, addition, or truncation.
  • the changes to at least one amino acid may comprise, consist essentially of, or consist of one or more amino acid substitutions.
  • the substitutions comprise, consist essentially of, or consist of 1) substitution of one or more non-cysteine amino acids in the catalytic domain of the engineered TET enzyme with a cysteine; and/or 2) substitution of cysteine in the catalytic domain of the engineered TET enzyme with a non-cysteine amino acid.
  • substitution of the one or more non-cysteine amino acids in the catalytic domain of the engineered TET enzyme with a cysteine is at a position that is occupied by a cysteine in the catalytic domain of a TET isoform or ortholog of the engineered TET enzyme.
  • substitution of the one or more cysteines in the catalytic domain of the target TET enzyme with a non-cysteine amino acid is at a position that is occupied by a non-cysteine amino acid in the catalytic domain of a TET isoform or ortholog of the engineered TET enzyme.
  • through these substitutions, or “cysteine swaps,” the cysteine distribution in the catalytic domain of the engineered TET enzyme is made to replicate or more closely replicate the cysteine distribution in an isoform or ortholog of the engineered TET enzyme. Based on the guidance herein, the cysteine swaps may be performed between any orthologs from any species.
  • the swaps may be performed between TET1, TET2, or TET3 orthologs from the same or different species.
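The cysteine swap principle described above can be sketched in Python as follows. The sketch assumes the catalytic domains of the target enzyme and of the reference isoform or ortholog have already been aligned, with gaps written as "-". The function name `propose_cysteine_swaps`, the `target_start` parameter, and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
def propose_cysteine_swaps(
    aligned_target: str,
    aligned_reference: str,
    target_start: int = 1,
    gap: str = "-",
) -> list[str]:
    """List substitutions that make the target's cysteine pattern match the reference.

    target_start is the residue number of the first target residue in the
    alignment, so that names follow full-length numbering (e.g. "A1286C").
    """
    if len(aligned_target) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    swaps = []
    position = target_start - 1
    for t, r in zip(aligned_target, aligned_reference):
        if t == gap:
            continue  # no target residue in this column, numbering does not advance
        position += 1
        if r == gap:
            continue  # no reference residue to compare against
        if t != "C" and r == "C":
            swaps.append(f"{t}{position}C")   # introduce a cysteine
        elif t == "C" and r != "C":
            swaps.append(f"C{position}{r}")   # replace a cysteine with the reference residue
    return swaps


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, for illustration only).
    print(propose_cysteine_swaps("ACKS-M", "CYKCGC", target_start=100))
```

The output is only a candidate list. Which of the proposed swaps to make, and in which combination, remains a design decision outside the scope of the sketch.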
  • exemplary species include, but are not limited to, Mus musculus, Homo sapiens, Naegleria gruberi, Coprinopsis cinerea, Lygus hesperus, Drosophila melanogaster, Xenopus laevis, Aedes aegipty, Anopheles gambia, Drosophila ananassae, Cavia porcellus, Lama pacos, Macaca fascicularis, Mus adji, Oryctolagus cuniculus, Chanos chanos, Sus scrofa, Xenopus tropicalis, etc.
  • the engineered TET enzyme may be mouse TET2, human TET2, mouse TET3 or Lygus hesperus TET2 and the isoform or ortholog of the engineered TET enzyme may be mouse TET1 or human TET1.
  • the present invention is not limited to engineered TET enzymes from any particular species (e.g., the exemplary species listed in the preceding paragraph).
  • the engineered TET enzyme may be a mouse TET enzyme, a human TET enzyme, or a Lygus hesperus TET enzyme.
  • the cysteine substitution principles described herein can be applied to any TET enzyme with a known sequence.
  • the present invention is not limited to any particular TET isoform. Suitable TET isoforms include TET1, TET2 and TET3 isoforms.
  • the present invention is not limited to engineered TET enzymes with any particular number of catalytic domain amino acid substitutions.
  • the engineered TET enzymes may comprise from 1 to 12, or 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 substitutions in the catalytic domain.
  • the engineered TET enzymes comprise one or more deletions, truncations or additions as compared to the corresponding wild-type enzymes.
  • the engineered TET enzyme is truncated at its N terminus as compared to the corresponding wild-type enzyme.
  • the N terminal truncation extends to the N terminal portion of the catalytic domain of the TET enzyme.
  • the engineered TET enzyme is truncated at its C terminus as compared to the corresponding wild-type enzyme.
  • the C terminal truncation is from 10 to 77, 20 to 77, 30 to 77, 10 to 64, 20 to 64, 30 to 64, 10 to 44, 20 to 44, 30 to 44, 10 to 50, 20 to 50, 30 to 50, 10 to 55, 20 to 55, 30 to 55, 10 to 60, 20 to 60, 30 to 60, 10 to 70, 20 to 70, 30 to 70, 10 to 75, 20 to 74 or 30 to 75 amino acids as compared to the corresponding wild-type enzyme. In some embodiments, the C terminal truncation is about 44, 50, 55, 60, 64, 70, 75 or 77 amino acids as compared to the corresponding wild-type enzyme, wherein about is +/- five amino acids.
  • the C terminal truncation is 77 amino acids as compared to the corresponding wild-type enzyme. In some embodiments, the C terminal truncation is 64 amino acids as compared to the corresponding wild-type enzyme. In some embodiments, the C terminal truncation is 44 amino acids as compared to the corresponding wild-type enzyme.
  • all or a portion of the Low Complexity Region (LCR) in the catalytic domain of the engineered TET enzyme is deleted as compared to the wild-type enzyme.
  • at least from 50, 100, 150, 200, 250, 300, 350, or 375 amino acids up to the full length of the LCR are deleted.
  • about 375 amino acids of the LCR are deleted, where about is +/- 5 amino acids.
  • amino acids E1373 to A1754 or K1379 to A1745 may be deleted.
  • the deleted LCR is replaced with a linker sequence. Any suitable linker sequence as is known in the art may be utilized, such as flexible glycine/serine linker sequences.
  • the linker has or consists of the sequence GGGGSGGGGSGGGGS (SEQ ID NO: 13).
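The truncations and LCR replacement described above can be sketched as plain string operations in Python. The linker is the sequence given in the text (SEQ ID NO: 13). The function name, its parameters, and the toy input are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
GS_LINKER = "GGGGSGGGGSGGGGS"  # linker sequence given in the text (SEQ ID NO: 13)


def engineer_catalytic_domain(
    full_length: str,
    domain_start: int,
    lcr_start: int,
    lcr_end: int,
    c_truncation: int,
    linker: str = GS_LINKER,
) -> str:
    """Apply an N terminal truncation, an LCR deletion with linker, and a C terminal truncation.

    domain_start: first residue kept (start of the catalytic domain).
    lcr_start, lcr_end: first and last residue of the deleted LCR span.
    c_truncation: number of residues removed from the C terminus.
    """
    c_end = len(full_length) - c_truncation  # last residue kept
    if not (1 <= domain_start < lcr_start <= lcr_end < c_end <= len(full_length)):
        raise ValueError("positions are inconsistent with the sequence length")
    n_portion = full_length[domain_start - 1 : lcr_start - 1]
    c_portion = full_length[lcr_end:c_end]
    return n_portion + linker + c_portion


if __name__ == "__main__":
    # Toy sequence: 4 N terminal residues, 5-residue N portion, 6-residue LCR,
    # 5-residue C portion, 3 C terminal residues to remove (hypothetical).
    toy = "MMMM" + "NNNNN" + "LLLLLL" + "CCCCC" + "TTT"
    print(engineer_catalytic_domain(toy, domain_start=5, lcr_start=10, lcr_end=15, c_truncation=3))
```

For a real enzyme, the coordinates would come from the recited positions, for example an LCR span such as E1373 to A1754, checked against the corresponding full-length sequence.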
  • the engineered TET enzymes may further be modified to include one or more sequences that facilitate purification of the engineered TET enzyme following its expression in a suitable expression system.
  • the engineered TET enzyme comprises a His tag, such as a 6x His tag, preferably at the N terminus of the engineered TET enzyme.
  • the engineered TET enzyme comprises a Flag tag (e.g., DYKDDDDK (SEQ ID NO: 14)), preferably at the N terminus of the engineered TET enzyme.
  • the engineered TET enzyme comprises both a His tag and Flag tag (e.g., SEQ ID NO:35), which are preferably located at the N terminus of the engineered enzyme.
  • a GGS sequence is included between the tag(s) and the N terminal portion of the catalytic domain of the TET enzyme.
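A short Python sketch of the N terminal tag arrangement described above is given below. The Flag sequence is the one recited in the text (SEQ ID NO: 14), and the 6x His tag is written as six histidines. The order of the two tags and the function name are assumptions of the sketch, since the combined tag sequence (SEQ ID NO:35) is not reproduced here.

```python
HIS_TAG = "HHHHHH"      # 6x His tag
FLAG_TAG = "DYKDDDDK"   # Flag tag recited in the text (SEQ ID NO: 14)
SPACER = "GGS"          # spacer between the tag(s) and the catalytic domain


def add_purification_tags(catalytic_domain: str, his: bool = True, flag: bool = True) -> str:
    """Prepend the selected tags and the GGS spacer to the catalytic domain.

    Placing the His tag before the Flag tag is an assumption of this sketch.
    """
    tags = (HIS_TAG if his else "") + (FLAG_TAG if flag else "")
    if not tags:
        return catalytic_domain  # no tag requested, so no spacer is added
    return tags + SPACER + catalytic_domain


if __name__ == "__main__":
    print(add_purification_tags("MKT"))  # toy catalytic domain fragment
```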
  • the engineered TET enzyme comprises a mouse TET2 catalytic domain, mouse TET3 catalytic domain, human TET2 catalytic domain or Lygus hesperus TET2 catalytic domain with one or more substitutions, most preferably cysteine swaps, in the catalytic domain.
  • the catalytic domain of mouse TET isoforms includes an N terminal portion including Cys-N, Cys-C and DSBH regions and a C terminal portion including a DSBH region, separated by the LCR. As indicated above, all or a portion of the LCR is deleted and preferably replaced by a linker sequence in preferred embodiments of the invention.
  • the N terminus and/or the C terminus is truncated as compared to the wild-type enzyme.
  • the engineered enzymes comprise an engineered catalytic domain wherein the N terminal and C terminal portions of the catalytic domain (with the one or more specified substitutions) are joined by a linker sequence.
  • the engineered TET enzyme comprises a mouse TET2 catalytic domain comprising one or more substitutions, the one or more substitutions comprising, consisting essentially of, or consisting of one or more of the following substitutions: A1286C, S1291C, K1332C, M1835C, E1837C, C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G, wherein the positions are based on position numbering from full length mouse TET2 (SEQ ID NO:9).
  • the one or more substitutions of the mouse TET2 catalytic domain comprise, consist essentially of, or consist of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 of the substitutions in any combination.
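The substitution names used above follow the pattern wild-type residue, position, replacement residue, so that A1286C means alanine at position 1286 replaced by cysteine. The following Python sketch parses such names and applies them to a sequence fragment, checking that the stated wild-type residue is actually present. The function name, the `first_position` parameter, and the toy fragment are hypothetical. The fragment is not the real mouse TET2 sequence.

```python
import re

# Wild-type residue, 1-based position, replacement residue (e.g. "A1286C").
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, substitutions: list[str], first_position: int = 1) -> str:
    """Apply named substitutions to a sequence fragment.

    first_position is the full-length residue number of sequence[0], which lets
    a fragment be edited using full-length numbering.
    """
    residues = list(sequence)
    for name in substitutions:
        match = SUBSTITUTION.match(name)
        if match is None:
            raise ValueError(f"not a valid substitution name: {name!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        index = position - first_position
        if not 0 <= index < len(residues):
            raise ValueError(f"{name}: position lies outside the fragment")
        if sequence[index] != wild_type:
            raise ValueError(f"{name}: expected {wild_type}, found {sequence[index]}")
        residues[index] = replacement
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical 8-residue fragment numbered 1284 to 1291.
    print(apply_substitutions("TQAGHLLS", ["A1286C", "S1291C"], first_position=1284))
```

The wild-type check guards against numbering drift, for example when a truncated or linker-containing construct is edited using positions from the full-length protein.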
  • the catalytic domain of the engineered mouse TET2 enzyme has an N terminal portion that shares at least 90%, 95% or 98%, 99% or 100% sequence identity with SEQ ID NO: 1 and a C terminal portion that shares at least 90%, 95%, 98%, 99%, or 100% sequence identity to SEQ ID NO:2, with the proviso that the catalytic domain comprises at least one substitution selected from the group consisting of A1286C, S1291C, K1332C, M1835C, E1837C, C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G and combinations thereof.
  • the catalytic domain of the engineered mouse TET2 enzyme has an N terminal portion that shares at least 90%, 95% or 98%, 99% or 100% sequence identity with SEQ ID NO: 1 and a C terminal portion that shares at least 90%, 95%, 98%, 99%, or 100% sequence identity to SEQ ID NO:2, with the proviso that the catalytic domain comprises one or more substitutions consisting of at least one substitution selected from the group consisting of A1286C, S1291C, K1332C, M1835C, E1837C, C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G and combinations thereof.
  • the catalytic domain of the engineered mouse TET2 enzyme has an N terminal portion that has 100% sequence identity with SEQ ID NO: 1 and a C terminal portion that has 100% sequence identity to SEQ ID NO:2.
  • the engineered mouse TET2 enzyme shares at least 90%, 95%, 98%, 99% or 100% identity with SEQ ID NO: 15, with the proviso that the catalytic domain comprises at least one substitution selected from the group consisting of A1286C, S1291C, K1332C, M1835C, E1837C, C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G and combinations thereof.
  • the engineered mouse TET2 enzyme shares at least 90%, 95%, 98%, 99% or 100% identity with SEQ ID NO: 15, with the proviso that the catalytic domain comprises one or more substitutions consisting of at least one substitution selected from the group consisting of A1286C, S1291C, K1332C, M1835C, E1837C, C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G and combinations thereof.
  • the engineered mouse TET2 enzyme is set forth in SEQ ID NO: 15.
  • the engineered mouse TET2 enzyme of SEQ ID NO: 15 has an N terminal truncation extending to the N terminal portion of the catalytic domain, a C terminal truncation of 44 amino acids, and the LCR is replaced with a GS linker sequence.
  • the engineered mouse TET2 comprises a polypeptide sequence having at least 90%, 95%, 98%, 99% or 100% identity with SEQ ID NO: 15, with the proviso that the catalytic domain comprises the following substitutions: A1286C, S1291C, K1332C, M1835C, E1837C, C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G.
  • the engineered mouse TET2 comprises a polypeptide sequence having at least 90%, 95%, 98%, 99% or 100% identity with SEQ ID NO: 15, with the proviso that the catalytic domain comprises one or more substitutions consisting of the following substitutions: A1286C, S1291C, K1332C, M1835C, E1837C, C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G.
  • the engineered mouse TET2 enzyme shares at least 90%, 95%, 98%, 99% or 100% identity with SEQ ID NO: 16, with the proviso that the catalytic domain comprises at least one substitution selected from the group consisting of A1286C, S1291C, K1332C, M1835C, E1837C, C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G and combinations thereof.
  • the engineered mouse TET2 enzyme shares at least 90%, 95%, 98%, 99% or 100% identity with SEQ ID NO: 16, with the proviso that the catalytic domain comprises one or more substitutions consisting of at least one substitution selected from the group consisting of A1286C, S1291C, K1332C, M1835C, E1837C, C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G and combinations thereof.
  • the engineered mouse TET2 enzyme is set forth in SEQ ID NO: 16.
  • the engineered mouse TET2 enzyme of SEQ ID NO: 16 has His and Flag tags at the N terminus, an N terminal truncation extending to the N terminal portion of the catalytic domain, a C terminal truncation of 44 amino acids, and the LCR is replaced with a GS linker sequence.
  • the engineered mouse TET2 comprises a polypeptide sequence having at least 90%, 95%, 98%, 99% or 100% identity with SEQ ID NO: 16, with the proviso that the catalytic domain comprises at least one substitution selected from the group consisting of A1286C, S1291C, K1332C, M1835C, E1837C, C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G and combinations thereof.
  • the engineered mouse TET2 comprises a polypeptide sequence having at least 90%, 95%, 98%, 99% or 100% identity with SEQ ID NO: 16, with the proviso that the catalytic domain comprises one or more substitutions consisting of at least one substitution selected from the group consisting of A1286C, S1291C, K1332C, M1835C, E1837C, C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G and combinations thereof.
  • the engineered mouse TET2 enzyme shares at least 90%, 95%, 98%, 99% or 100% identity with SEQ ID NO: 16, with the proviso that the catalytic domain comprises the following substitutions: A1286C, S1291C, K1332C, M1835C, E1837C, C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G.
  • the engineered mouse TET2 enzyme shares at least 90%, 95%, 98%, 99% or 100% identity with SEQ ID NO: 16, with the proviso that the catalytic domain comprises one or more substitutions consisting of the following substitutions: A1286C, S1291C, K1332C, M1835C, E1837C, C1168Y, C1171P, C1185T, C1272R, C1772S, C1792K and C1827G.
  • the engineered TET enzyme comprises a mouse TET3 catalytic domain comprising one or more substitutions, the one or more substitutions comprising, consisting essentially of, or consisting of one or more of the following substitutions: A1076C, I1112C, M1721C, E1726C, A975T, N1658S, and R1678K, wherein the positions are based on position number from full length mouse TET3 (SEQ ID NO: 10).
  • the one or more substitutions of the mouse TET3 catalytic domain comprise, consist essentially of, or consist of 1, 2, 3, 4, 5, 6, or 7 of the substitutions in any combination.
  • the catalytic domain of the engineered mouse TET3 enzyme has an N terminal portion that shares at least 90%, 95% or 98%, 99% or 100% sequence identity with SEQ ID NO:3 and a C terminal portion that shares at least 90%, 95%, 98%, 99%, or 100% sequence identity to SEQ ID NO:4, with the proviso that the catalytic domain comprises at least one substitution selected from the group consisting of A1076C, I1112C, M1721C, E1726C, A975T, N1658S, and R1678K and combinations thereof.
  • the catalytic domain of the engineered mouse TET3 enzyme has an N terminal portion that shares at least 90%, 95% or 98%, 99% or 100% sequence identity with SEQ ID NO:3 and a C terminal portion that shares at least 90%, 95%, 98%, 99%, or 100% sequence identity to SEQ ID NO:4, with the proviso that the catalytic domain comprises one or more substitutions consisting of at least one substitution selected from the group consisting of A1076C, I1112C, M1721C, E1726C, A975T, N1658S, and R1678K and combinations thereof.
  • the catalytic domain of the engineered mouse TET3 enzyme has an N terminal portion that has 100% sequence identity with SEQ ID NO:3 and a C terminal portion that has 100% sequence identity to SEQ ID NO:4.
  • the engineered TET enzyme comprises a Lygus hesperus TET2 catalytic domain comprising one or more substitutions, the one or more substitutions comprising, consisting essentially of, or consisting of one or more of the following substitutions: A740C, K773C, Q1400C, V627P, A641T, N1337S, and H1357K, wherein the positions are based on position number from partial length Lygus hesperus TET2 (SEQ ID NO: 11).
  • the one or more substitutions of the Lygus hesperus TET2 catalytic domain comprise, consist essentially of, or consist of 1, 2, 3, 4, 5, 6 or 7 of the substitutions in any combination.
  • the catalytic domain of the engineered Lygus hesperus TET2 enzyme has an N terminal portion that shares at least 90%, 95% or 98%, 99% or 100% sequence identity with SEQ ID NO:5 and a C terminal portion that shares at least 90%, 95%, 98%, 99%, or 100% sequence identity to SEQ ID NO:6, with the proviso that the catalytic domain comprises at least one substitution selected from the group consisting of A740C, K773C, Q1400C, V627P, A641T, N1337S, and H1357K, and combinations thereof.
  • the catalytic domain of the engineered Lygus hesperus TET2 enzyme has an N terminal portion that shares at least 90%, 95% or 98%, 99% or 100% sequence identity with SEQ ID NO:5 and a C terminal portion that shares at least 90%, 95%, 98%, 99%, or 100% sequence identity to SEQ ID NO:6, with the proviso that the catalytic domain comprises one or more substitutions consisting of at least one substitution selected from the group consisting of A740C, K773C, Q1400C, V627P, A641T, N1337S, and H1357K, and combinations thereof.
  • the catalytic domain of the engineered Lygus hesperus TET2 enzyme has an N terminal portion that has 100% sequence identity with SEQ ID NO: 5 and a C terminal portion that has 100% sequence identity to SEQ ID NO:6.
  • the engineered TET enzyme comprises a human TET2 catalytic domain comprising one or more substitutions, the one or more substitutions comprising, consisting essentially of, or consisting of one or more of the following substitutions: A1373C, K1409C, M1921C, E1923C, L1258P, A1272T, D1858S, and R1878K, wherein the positions are based on position number from full length human TET2 (SEQ ID NO: 12).
  • the one or more substitutions of the human TET2 catalytic domain comprise, consist essentially of, or consist of 1, 2, 3, 4, 5, 6, 7, or 8 of the substitutions in any combination.
  • the catalytic domain of the engineered human TET2 enzyme has an N terminal portion that shares at least 90%, 95% or 98%, 99% or 100% sequence identity with SEQ ID NO:7 and a C terminal portion that shares at least 90%, 95%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 8, with the proviso that the catalytic domain comprises at least one substitution selected from the group consisting of A1373C, K1409C, M1921C, E1923C, L1258P, A1272T, D1858S, and R1878K and combinations thereof.
  • the catalytic domain of the engineered human TET2 enzyme has an N terminal portion that shares at least 90%, 95% or 98%, 99% or 100% sequence identity with SEQ ID NO:7 and a C terminal portion that shares at least 90%, 95%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 8, with the proviso that the catalytic domain comprises one or more substitutions consisting of at least one substitution selected from the group consisting of A1373C, K1409C, M1921C, E1923C, L1258P, A1272T, D1858S, and R1878K and combinations thereof.
  • the catalytic domain of the engineered human TET2 enzyme has an N terminal portion that has 100% sequence identity with SEQ ID NO:7 and a C terminal portion that has 100% sequence identity to SEQ ID NO:8.
  • the present invention provides TET orthologs which have unexpected properties related to expression and/or activity as shown in the Examples.
  • These TET orthologs may be selected from any TET enzyme from any species expressing a TET enzyme.
  • the TET orthologs may be selected from Aedes aegipty TET1, Anopheles gambia TET1, Drosophila ananassae TET1, Drosophila melanogaster TET1, Guinea pig TET1, Lama pacos TET1, Lygus hesperus TET2, Macaca fascicularis TET3, Mus accordingi TET2, Rabbit TET2, Xenopus laevis TET1, Chanos chanos TET3, Sus scrofa TET3, Goldfish TET2 and Xenopus tropicalis TET3.
  • the TET orthologs are recombinant TET orthologs.
  • the orthologs may be engineered to include one or more modifications as described above.
  • the orthologs are modified by truncation of the N terminus, most preferably up to the N terminal portion of the catalytic domain.
  • the orthologs are modified to delete all or a portion of the LCR, for example at least 80%, 90%, 95%, 98% or 100% of the LCR in the particular ortholog.
  • the LCR or portion thereof that has been deleted is replaced with a linker sequence.
  • the linker sequence is a GS linker sequence.
  • the orthologs are modified to include a C terminal truncation.
  • the C terminal truncation may be from 10 to 77, 10 to 64, 10 to 54, 10 to 44, 10 to 34, or 10 to 24 amino acids in length.
  • the orthologs may be modified to include one or more cysteine swaps.
  • the substitutions may comprise, consist essentially of, or consist of 1) substitution of one or more non-cysteine amino acids in the catalytic domain of the TET ortholog with a cysteine; and/or 2) substitution of cysteine in the catalytic domain of the TET ortholog with a non-cysteine amino acid.
  • substitution of the one or more non-cysteine amino acids in the catalytic domain of the TET ortholog with a cysteine is at a position that is occupied by a cysteine in the catalytic domain of a different TET isoform or ortholog.
  • substitution of the one or more cysteines in the catalytic domain of the TET ortholog with a non-cysteine amino acid is at a position that is occupied by a non-cysteine amino acid in the catalytic domain of a different TET isoform or ortholog.
  • the TET ortholog may be modified to include a His tag, Flag tag, or a combination thereof, preferably at the N terminus of the TET ortholog.
  • the present invention provides recombinant TET enzymes comprising a catalytic domain having at least 80%, 90%, 95%, 98%, 99% or 100% identity to the catalytic domain of a sequence selected from the group consisting of SEQ ID NO: 19 or 34 (Lygus hesperus TET2), SEQ ID NO:20 (Aedes aegipty TET1), SEQ ID NO:21 (Anopheles gambia TET1), SEQ ID NO:22 (Drosophila ananassae TET1), SEQ ID NO:23 (Drosophila melanogaster TET1), SEQ ID NO:24 (Guinea pig TET1), SEQ ID NO:25 (Lama pacos TET1), SEQ ID NO:26 (Macaca fascicularis TET3), SEQ ID NO:27 (Mus adji TET2), SEQ ID NO:28 (Rabbit TET2), SEQ ID NO:20 (Aedes aegi
  • the TET ortholog catalyzes the oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) and/or 5hmC to 5-formylcytosine (5fC) and/or 5-carboxycytosine (5caC).
  • the sequences provided include the N terminal and C terminal portions of catalytic domains of the orthologs linked by a linker sequence.
  • the exemplified sequences include a GS linker in place of all or a portion of the LCR. It will be understood that when the LCR is specified as being deleted (all or a portion thereof), the percent identities defined above will be understood to refer to the N and C terminal portions of the catalytic domain that flank the LCR (or linker sequence when present).
  • the present invention provides recombinant TET enzymes comprising a catalytic domain having at least 80%, 90%, 95%, 98%, 99% or 100% identity to the N and C terminal portions of a catalytic domain (e.g., the portion on either side of the linker sequence where present) of a sequence selected from the group consisting of SEQ ID NO: 19 or 34 (Lygus hesperus TET2), SEQ ID NO:20 (Aedes aegipty TET1), SEQ ID NO:21 (Anopheles gambia TET1), SEQ ID NO:22 (Drosophila ananassae TET1), SEQ ID NO:23 (Drosophila melanogaster TET1), SEQ ID NO:24 (Guinea pig TET1), SEQ ID NO:25 (Lama pacos TET1), SEQ ID NO:26 (Macaca fascicularis TET3), SEQ ID NO:27 (Mus accession hesperus TET
  • the TET ortholog catalyzes the oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) and/or 5hmC to 5-formylcytosine (5fC) and/or 5-carboxycytosine (5caC).
  • the present invention provides nucleic acids encoding the engineered TET enzymes and orthologs described above, vectors containing the nucleic acids, host cells containing the nucleic acids or vectors, and methods for the expression and production of the engineered TET enzymes and orthologs.
  • the nucleic acids encoding the engineered TET enzymes and orthologs may be employed for producing the engineered TET enzymes and orthologs by recombinant techniques.
  • the nucleic acids may be included in any one of a variety of expression vectors for expressing a polypeptide.
  • vectors include, but are not limited to, retroviral vectors, chromosomal, nonchromosomal and synthetic DNA sequences (e.g., derivatives of SV40, bacterial plasmids, phage DNA, baculovirus, yeast plasmids, vectors derived from combinations of plasmids and phage DNA, and viral DNA such as vaccinia, adenovirus, fowl pox virus, and pseudorabies). It is contemplated that any vector may be used as long as it is replicable and viable in the host.
  • some embodiments of the present invention provide recombinant constructs comprising a nucleic acid encoding an engineered TET enzyme or ortholog as described above.
  • the constructs comprise a vector, such as a plasmid or viral vector, into which a sequence of the invention has been inserted, in a forward or reverse orientation.
  • the nucleic acid sequence is assembled in appropriate phase with translation initiation and termination sequences.
  • the appropriate nucleic acid (e.g., DNA) sequence is inserted into the vector using any of a variety of procedures.
  • the nucleic acid sequence is inserted into an appropriate restriction endonuclease site(s) by procedures known in the art.
  • Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook, et al. (1989) Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N.Y.
  • vectors include, but are not limited to, the following vectors: 1) Bacterial: pQE70, pQE60, pQE-9 (Qiagen), pBS, pD10, phagescript, psiX174, pbluescript SK, pBSKS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene); ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); 2) Eukaryotic: pWLNEO, pSV2CAT, pOG44, PXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia); and 3) Baculovirus: pPbac and pMbac (Stratagene).
  • expression vectors comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation sites, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking non-transcribed sequences.
  • DNA sequences derived from the SV40 splice and polyadenylation sites may be used to provide the required non-transcribed genetic elements.
  • the nucleic acid sequence in the expression vector is operatively linked to an appropriate expression control sequence(s) (promoter) to direct mRNA synthesis.
  • Promoters useful in the present invention include, but are not limited to, viral long terminal repeats (LTR), the SV40 promoter, the E.
  • recombinant expression vectors include origins of replication and selectable markers permitting transformation of the host cell.
  • the present invention provides host cells containing the above-described constructs.
  • the host cell is a higher eukaryotic cell (e.g., a mammalian or insect cell).
  • the host cell is a lower eukaryotic cell (e.g., a yeast cell).
  • the host cell can be a prokaryotic cell (e.g., a bacterial cell).
  • host cells include, but are not limited to, Escherichia coli, Salmonella typhimurium, Bacillus subtilis, and various species within the genera Pseudomonas, Streptomyces, and Staphylococcus, as well as Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila S2 cells, Spodoptera Sf9 cells, Chinese hamster ovary (CHO) cells, COS-7 lines of monkey kidney fibroblasts, C127, 3T3, 293, 293T, HeLa and BHK cell lines.
  • the constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence.
  • introduction of the construct into the host cell can be accomplished by retroviral transduction, calcium phosphate transfection, DEAE-Dextran mediated transfection, or electroporation (see, e.g., Davis et al. [1986] Basic Methods in Molecular Biology).
  • the engineered TET polypeptides and orthologs are expressed in a baculovirus expression system (BVES).
  • BVES systems are commercially available, for example from Expression Systems (Davis, CA) and ThermoFisher Scientific.
  • the nucleic acids are inserted into a baculovirus vector.
  • the nucleic acid is cloned into a transfer plasmid, typically behind a promoter that can drive protein expression to high levels in insect cells.
  • the nucleic acid that is to be expressed is typically flanked by AcMNPV DNA, e.g., the polyhedrin promoter on one side and a portion of the essential gene ORF 1629 on the other.
  • Insect cells are then co-transfected with a mixture of the transfer plasmid and parental AcMNPV DNA that has been linearized such that the parental polyhedrin gene and portion of ORF 1629 are missing, rendering it non-infectious.
  • the plasmid and parental DNA undergo homologous recombination to generate de novo recombinant baculoviruses. These baculoviruses are plated and individual plaques purified to isolate a single, pure plaque of recombinant baculovirus. This plaque is subsequently passaged through multiple rounds of insect cell infection to generate a high-titer stock and establish a working virus bank (WVB) that can be utilized for protein production.
  • Once a high-titer WVB has been established, it is used to infect insect cells and stimulate protein production. Cells are seeded in culture flasks (for small-scale production) or bioreactors (for large-scale production) and the WVB is added to infect the insect cells when they are in their logarithmic growth phase. The baculoviruses reprogram the cellular machinery to produce the recombinant protein(s). Following protein expression (typically 48-96 hours post-infection), the cells and/or supernatant are harvested, depending on whether the product is intracellular or secreted, respectively.
  • protein is secreted and cells are cultured for an additional period.
  • cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification.
  • microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents.
  • the present invention also provides methods for recovering and purifying the engineered TET enzymes and orthologs from recombinant cell cultures including, but not limited to, ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxyapatite chromatography and lectin chromatography.
  • the present invention provides improved methods for the expression and subsequent purification of active engineered TET enzymes and orthologs.
  • 5-Methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) are the two major epigenetic marks found in the mammalian genome.
  • 5hmC is generated from 5mC by TET enzymes.
  • TET can further oxidize 5hmC to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), which exist in much lower abundance in the mammalian genome than 5mC and 5hmC (10-fold to 100-fold lower than that of 5hmC).
  • 5mC and 5hmC play crucial roles in a broad range of biological processes from gene regulation to normal development.
  • the ability of TET enzymes to oxidize methylated nucleotides is useful in assays designed to detect the presence of methylated nucleotides in a nucleic acid sample. Accordingly, in further embodiments, the present invention provides methods of assaying methylation of a target nucleic acid in a nucleic acid sample using the engineered TET enzymes and orthologs described above. The present invention is not limited to any particular methylation assay. The engineered TET enzymes and orthologs described above may be utilized in any assay where oxidation of methylated cytosines (e.g., 5mC or 5hmC) in a target nucleic acid is useful.
  • the engineered TET enzymes and orthologs described above are used in Tet-Assisted Pyridine Borane Sequencing (TAPS).
  • TAB-Seq Tet-Assisted Bisulfite Sequencing
  • Embodiments of the present disclosure provide a bisulfite-free, base-resolution method for detecting 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) in a sequence (e.g., TAPS and the associated methods TAPSβ and CAPS, referred to collectively as TAPS), including for use with DNA obtained from blood samples (cellular DNA as well as cfDNA) and biopsies.
  • TAPS comprises the use of mild enzymatic and chemical reactions to detect 5mC and 5hmC directly and quantitatively at base-resolution without affecting unmodified cytosine.
  • the present disclosure also provides methods to detect 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) at base resolution without affecting unmodified cytosine.
  • the methods provided herein provide mapping of 5mC, 5hmC, 5fC and 5caC and overcome the disadvantages of previous methods such as bisulfite sequencing.
  • the methods of the present disclosure include the step of converting the 5mC and 5hmC (or just the 5mC if the 5hmC is blocked) to 5caC and/or 5fC.
  • this step comprises contacting the DNA or RNA sample with an engineered TET enzyme or ortholog as described above.
  • the engineered TET enzyme catalyzes the transfer of an oxygen molecule to the C5 methyl group on 5mC resulting in the formation of 5-hydroxymethylcytosine (5hmC).
  • the engineered TET enzyme further catalyzes the oxidation of 5hmC to 5fC and the oxidation of 5fC to form 5caC.
  • Methods of the present disclosure can also include the step of converting the 5caC and/or 5fC in a nucleic acid sample to DHU.
  • this step comprises contacting the DNA or RNA sample with a reducing agent including, for example, a borane reducing agent such as pyridine borane, 2-picoline borane (pic-BH3), borane, sodium borohydride, sodium cyanoborohydride, sodium triacetoxyborohydride, triethylamine borane or tri(t-butyl)amine borane.
  • the methods of the present disclosure include identifying 5mC in a DNA sample (targeted DNA or whole-genome), and providing a quantitative measure for the frequency of the 5mC modification at each location where the modification was identified in the DNA.
  • the percentages of the T at each transition location provide a quantitative level of 5mC at each location in the DNA.
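  • The quantitative read-out described above can be sketched as a simple per-position calculation. This is a minimal illustration, assuming that per-position counts of T and C base calls are already available from aligned reads; the function name and its inputs are hypothetical and not part of the disclosed methods.

```python
def modification_level(t_reads: int, c_reads: int) -> float:
    """Return the fraction of reads showing a C-to-T transition at one position.

    t_reads: reads calling T at a reference cytosine (converted, i.e. modified).
    c_reads: reads calling C at the same position (unconverted, i.e. unmodified).
    Both inputs are assumed to come from an upstream alignment step.
    """
    total = t_reads + c_reads
    if total == 0:
        raise ValueError("no coverage at this position")
    return t_reads / total


# Example: 25 of 100 reads show T, so the estimated modification level is 25%.
print(modification_level(25, 75))
```

  • Raising an error on zero coverage is a design choice of this sketch; a pipeline might instead skip uncovered positions.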
  • methods for identifying 5mC can include the use of a blocking group. In other embodiments, methods for identifying 5mC do not require the use of a blocking group.
  • the 5hmC in the sample is blocked so that it is not subject to conversion to 5caC and/or 5fC.
  • the 5hmC in the sample DNA are rendered non-reactive to the subsequent steps by adding a blocking group to the 5hmC.
  • the blocking group is a sugar, including a modified sugar, for example glucose or 6-azide-glucose (6-azido-6-deoxy-D-glucose).
  • the sugar blocking group can be added to the hydroxymethyl group of 5hmC by contacting the DNA sample with uridine diphosphate (UDP)-sugar in the presence of one or more glucosyltransferase enzymes.
  • the glucosyltransferase is T4 bacteriophage β-glucosyltransferase (βGT), T4 bacteriophage α-glucosyltransferase (αGT), or derivatives and analogs thereof.
  • βGT is an enzyme that catalyzes a chemical reaction in which a beta-D-glucosyl (glucose) residue is transferred from UDP-glucose to a 5-hydroxymethylcytosine residue in a nucleic acid.
  • the methods of the present disclosure include identifying 5mC or 5hmC in a DNA sample (targeted DNA or whole-genome).
  • the method provides a quantitative measure for the frequency of the 5mC or 5hmC modifications at each location where the modifications were identified in the DNA.
  • the percentages of the T at each transition location provide a quantitative level of 5mC or 5hmC at each location in the DNA.
  • the method for identifying 5mC or 5hmC provides the location of 5mC and 5hmC, but does not distinguish between the two cytosine modifications. Rather, both 5mC and 5hmC are converted to DHU.
  • DHU can be detected directly, or the modified DNA can be replicated, for instance by methods of the present disclosure, where the DHU is converted to T.
  • methods for identifying 5hmC include the use of a blocking group. In other embodiments, methods for identifying 5hmC do not require the use of a blocking group.
  • the present disclosure provides a method for identifying 5mC and identifying 5hmC in a DNA by performing the method for identifying 5mC on a first DNA sample, and performing the method for identifying 5mC or 5hmC on a second DNA sample.
  • the first and second DNA samples are derived from the same DNA sample.
  • the first and second samples may be separate aliquots taken from a sample comprising DNA to be analyzed (e.g., cellular DNA or cfDNA).
  • any existing 5fC and 5caC in the DNA sample will be detected as 5mC and/or 5hmC.
  • the 5fC and 5caC signals can be eliminated by protecting the 5fC and 5caC from conversion to DHU by, for example, hydroxylamine conjugation and EDC coupling, respectively.
  • the method identifies the locations and percentages of 5hmC in the DNA through the comparison of 5mC locations and percentages with the locations and percentages of 5mC or 5hmC (together).
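  • The comparison described above amounts to a per-position subtraction between the two aliquots. The following is a minimal sketch, assuming each aliquot has already been reduced to a mapping from genomic position to modification level (a fraction between 0 and 1); the function name, the data layout, and the clamping of negative differences to zero are assumptions made for illustration.

```python
from typing import Dict


def estimate_5hmc(level_5mc: Dict[int, float],
                  level_5mc_plus_5hmc: Dict[int, float]) -> Dict[int, float]:
    """Estimate 5hmC per position as (5mC + 5hmC level) minus (5mC-only level).

    Only positions covered in both aliquots are reported. Negative differences,
    which can arise from sampling noise, are clamped to zero (an assumption).
    """
    shared = level_5mc.keys() & level_5mc_plus_5hmc.keys()
    return {
        pos: max(0.0, level_5mc_plus_5hmc[pos] - level_5mc[pos])
        for pos in sorted(shared)
    }


first_aliquot = {100: 0.5, 200: 0.25}               # 5mC only (5hmC blocked)
second_aliquot = {100: 0.75, 200: 0.25, 300: 0.9}   # 5mC and 5hmC together
print(estimate_5hmc(first_aliquot, second_aliquot))
```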
  • the location and frequency of 5hmC modifications in a DNA can be measured directly.
  • identifying 5fC and/or 5caC provides the location of 5fC and/or 5caC, but does not distinguish between these two cytosine modifications. Rather, both 5fC and 5caC are converted to DHU, which is detected by the methods described herein.
  • the method includes identifying 5caC in a DNA sample (targeted DNA or whole-genome), and provides a quantitative measure for the frequency of the 5caC modification at each location where the modification was identified in the DNA.
  • the percentages of the T at each transition location provide a quantitative level of 5caC at each location in the DNA.
  • methods for identifying 5caC can include the use of a blocking group. In other embodiments, methods for identifying 5caC do not require the use of a blocking group.
  • adding a blocking group to the 5fC in the DNA sample comprises contacting the DNA with an aldehyde reactive compound including, for example, hydroxylamine derivatives, hydrazine derivatives, and hydrazide derivatives.
  • Hydroxylamine derivatives include ashydroxylamine; hydroxylamine hydrochloride; hydroxylammonium acid sulfate; hydroxylamine phosphate; O-methylhydroxylamine; O-hexylhydroxylamine; O-pentylhydroxylamine; O-benzylhydroxylamine; and particularly, O-ethylhydroxylamine (EtONH2), O-alkylated or O-arylated hydroxylamine, acid or salts thereof.
  • Hydrazine derivatives include N-alkylhydrazine, N-arylhydrazine, N-benzylhydrazine, N,N-dialkylhydrazine, N,N-diarylhydrazine, N,N-dibenzylhydrazine, N,N-alkylbenzylhydrazine, N,N-arylbenzylhydrazine, and N,N-alkylarylhydrazine.
  • Hydrazide derivatives include p-toluenesulfonylhydrazide, N-acylhydrazide, N,N-alkylacylhydrazide, N,N-benzylacylhydrazide, N,N-arylacylhydrazide, N-sulfonylhydrazide, N,N-alkylsulfonylhydrazide, N,N-benzylsulfonylhydrazide, and N,N-arylsulfonylhydrazide.
  • the method includes identifying 5fC in a DNA sample (targeted DNA or whole-genome), and provides a quantitative measure for the frequency of the 5fC modification at each location where the modification was identified in the DNA.
  • the percentages of the T at each transition location provide a quantitative level of 5fC at each location in the DNA.
  • methods for identifying 5fC can include the use of a blocking group. In other embodiments, methods for identifying 5fC do not require the use of a blocking group.
  • adding a blocking group to the 5caC in the DNA sample can be accomplished by (i) contacting the DNA sample with a coupling agent, for example a carboxylic acid derivatization reagent like carbodiimide derivatives such as 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) or N,N'-dicyclohexylcarbodiimide (DCC), and (ii) contacting the DNA sample with an amine, hydrazine or hydroxylamine compound.
  • 5caC can be blocked by treating the DNA sample with EDC and then benzylamine, ethylamine, or another amine to form an amide that blocks 5caC from conversion to DHU (e.g., by borane reduction).
  • the engineered TET enzymes may be used in TAB-seq protocols.
  • 5hmC is preferably blocked with a sugar as described above for the TAPS protocol.
  • 5mC in the blocked target nucleic acid is then oxidized to 5caC using an engineered TET enzyme or ortholog of the present invention.
  • the oxidized nucleic acid is treated with sodium bisulfite to convert the 5caC to uracil, which is read as a T when sequenced.
  • the TAB-Seq protocol may preferably be combined with other sequencing protocols, for example, traditional bisulfite sequencing (methylC-Seq) to distinguish 5hmC from 5mC.
  • In methylC-Seq, a target nucleic acid is treated with sodium bisulfite so that unmethylated cytosines are deaminated to uracil and read as a T whereas 5mC and 5hmC are read as C. Combination of methylC-Seq with TAB-Seq allows 5mC and 5hmC residues to be distinguished.
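  • The logic of combining the two protocols can be illustrated with a small lookup. This sketch assumes idealized, complete conversion and a single base call per protocol at a given reference cytosine; it also assumes that sugar-blocked 5hmC is read as C in TAB-Seq, which is implied rather than stated above. The table and function names are hypothetical.

```python
# Expected base call for each cytosine state under each protocol
# (idealized: complete conversion, no sequencing error).
READOUT = {
    "C":    {"methylC-Seq": "T", "TAB-Seq": "T"},
    "5mC":  {"methylC-Seq": "C", "TAB-Seq": "T"},
    "5hmC": {"methylC-Seq": "C", "TAB-Seq": "C"},
}


def infer_state(methylc_base: str, tab_base: str) -> str:
    """Infer the cytosine state from the base called in each protocol."""
    for state, calls in READOUT.items():
        if calls["methylC-Seq"] == methylc_base and calls["TAB-Seq"] == tab_base:
            return state
    return "inconsistent"


print(infer_state("C", "T"))  # read as C by methylC-Seq, T by TAB-Seq
```

  • In practice the comparison would be made on per-position frequencies rather than single base calls, since a position can carry a mixture of states across molecules.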
  • the engineered TET enzymes may be used in Enzymatic Methyl Sequencing (EM-seq) protocols.
  • Enzymatic Methyl-seq (from New England Biolabs) is a two-step enzymatic conversion process to detect modified cytosines.
  • the first step uses a TET enzyme, which can be an engineered TET enzyme or ortholog of the present invention, and an oxidation enhancer to protect modified cytosines from downstream deamination.
  • the TET enzyme oxidizes 5mC and 5hmC through a cascade reaction into 5fC and 5caC as described above. In the EM-seq protocol, this step protects 5mC and 5hmC from deamination.
  • 5hmC can also be protected from deamination by glucosylation to form 5ghmC using the oxidation enhancer.
  • the second enzymatic step uses an APOBEC (Apolipoprotein B mRNA Editing Catalytic Polypeptide-like) enzyme to deaminate C to U but does not convert 5caC and 5ghmC.
  • the methods further comprise a sequencing step so that a methylation signature may be obtained.
  • the method includes isolating DNA (e.g., cellular or cfDNA) from a sample; preparing a sequencing library comprising the DNA; and performing TAPS or TAB-Seq on the sequencing library to obtain a methylation signature of the DNA.
  • the methylation signature is a whole-genome methylation signature.
  • preparing the sequencing library comprises ligating sequencing adapters to the isolated DNA to facilitate performing a sequencing reaction.
  • Suitable sequencing adapters for massively parallel sequencing technologies may be utilized.
  • the present invention is not limited to any particular sequencing technology.
  • sequencing technologies such as those provided by Illumina or Nanopore may be utilized.
  • suitable sequencing technologies for use in the present invention include, but are not limited to, those described in US Pat. Publ. 20100120098, US Pat. Publ. 20120208705, US Pat. Publ. 20120208724, International Pat. Publ. WO2012/061832, and US Pat. Publ. 2015/0368638, each of which is incorporated herein by reference in its entirety.
  • the adapter comprises one or more sites that can hybridize to a primer.
  • an adapter comprises at least a first primer site.
  • an adapter comprises at least a first primer site and a second primer site.
  • the orientation of the primer sites in such embodiments can be such that a primer hybridizing to the first primer site and a primer hybridizing to the second primer site are in the same orientation, or in different orientations.
  • the primer sequence in the linker can be complementary to a primer used for amplification. In another embodiment, the primer sequence is complementary to a primer used for sequencing.
  • a linker can include a first primer site and a second primer site having a non-amplifiable site disposed therebetween.
  • the non-amplifiable site is useful to block extension of a polynucleotide strand between the first and second primer sites, wherein the polynucleotide strand hybridizes to one of the primer sites.
  • the non-amplifiable site can also be useful to prevent concatamers. Examples of non-amplifiable sites include a nucleotide analogue, non-nucleotide chemical moiety, amino-acid, peptide, and polypeptide.
  • a non-amplifiable site comprises a nucleotide analogue that does not significantly basepair with A, C, G or T.
  • Some embodiments include a linker comprising a first primer site and a second primer site having a fragmentation site disposed therebetween.
  • Other embodiments can use a forked or Y-shaped adapter design useful for directional sequencing, as described in U.S. Pat. No. 7,741,463, which is incorporated herein by reference.
  • the adapter may comprise an index or barcode sequence.
  • the adapter may comprise a Unique Molecular Identifier (UMI).
  • carrier nucleic acids or a mix of carrier nucleic acids are added to the sequencing library prior to performing TAPS.
  • Carrier nucleic acids can be any specific or non-specific DNA molecules (or nucleic acid derivatives thereof) that enhance one or more aspects of DNA recovery from a sample.
  • DNA methylation signatures are useful for understanding basic biological processes and disease pathology as well as for disease detection.
  • methylation signatures/frequencies/markers etc. can be useful in understanding and studying gene regulation, genomic imprinting, differentiation, development, gene-environment interaction (e.g., smoking, nutrition), aging, numerous diseases and conditions (e.g., auto-immune diseases, cancer, cardiovascular diseases, CNS diseases, congenital diseases, infectious diseases, metabolic diseases and status, NIPT-related testing, etc.), for detecting and diagnosing cancer and other diseases and for monitoring transplants.
  • the method further comprises identifying at least one methylation biomarker from the DNA methylation signature (such as a whole-genome DNA methylation signature) and determining if the methylation biomarker differs from the methylation biomarker in a reference or control sequence.
  • the methylation biomarker comprises a differentially methylated region (DMR).
  • the method further comprises classifying the sample based on the DMR as compared to a reference DMR.
  • the reference DMR corresponds to a non-disease control, or a disease control.
  • the method further comprises identifying at least one methylation biomarker from the DNA methylation signature, and determining a tissue-of-origin corresponding to the methylation biomarker. In some embodiments, the method further comprises classifying the sample based on the tissue-of-origin biomarker.
  • the method further comprises identifying a DNA fragmentation profile, and determining whether the fragmentation profile is indicative of disease, such as cancer.
  • the DNA fragmentation profile can be determined from TAPS sequencing data (e.g., read pair alignment positions).
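  • One way to derive a fragmentation profile from read pair alignment positions is to tabulate the inferred fragment lengths. The following is a minimal sketch, assuming each aligned read pair has already been reduced to a (start, end) interval in 0-based, half-open coordinates; the function name and coordinate convention are assumptions, and real pipelines would read these values from alignment files.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def fragment_length_profile(pairs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Count fragments by length, where each fragment spans [start, end)."""
    lengths = Counter()
    for start, end in pairs:
        if end <= start:
            raise ValueError(f"invalid fragment interval: ({start}, {end})")
        lengths[end - start] += 1
    # Return a plain dict ordered by fragment length for readability.
    return dict(sorted(lengths.items()))


pairs = [(100, 267), (500, 667), (1000, 1150)]
print(fragment_length_profile(pairs))
```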
  • the method further comprises identifying at least one sequence variant in the DNA sample, and determining whether the sequence variant is indicative of disease, such as cancer.
  • TAPS can also differentiate methylation from C-to-T genetic variants or single nucleotide polymorphisms (SNPs), and therefore, can be used to detect genetic variants.
  • methylations and C-to-T SNPs can result in different patterns in TAPS.
  • methylations can result in T/G reads in an original top strand/original bottom strand, and A/C reads in strands complementary to these.
  • C-to-T SNPs can result in T/A reads in an original top strand/original bottom strand and strands complementary to these. This further increases the utility of TAPS in providing both methylation information and genetic variants, and therefore mutations, in one experiment and sequencing run.
  • This ability of the TAPS methods disclosed herein provides integration of genomic analysis with epigenetic analysis, and a substantial reduction of sequencing cost by eliminating the need to perform, for example, standard whole genome sequencing (WGS).
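  • The strand patterns described above can be expressed as a small decision rule. This sketch assumes a reference position with C on the top strand (and therefore G on the bottom strand), idealized conversion, and one consensus base call from reads of each original strand, each reported in that strand's own orientation; the function name and labels are hypothetical.

```python
def classify_reference_c(top_base: str, bottom_base: str) -> str:
    """Classify a reference C from base calls on the original top and bottom strands.

    top_base: base read from the original top strand at the position.
    bottom_base: base read from the original bottom strand opposite the position.
    """
    calls = {
        ("C", "G"): "unmodified C",
        ("T", "G"): "methylation",     # only the modified strand is converted
        ("T", "A"): "C-to-T variant",  # both strands carry the sequence change
    }
    return calls.get((top_base.upper(), bottom_base.upper()), "ambiguous")


print(classify_reference_c("T", "G"))
print(classify_reference_c("T", "A"))
```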
  • methods of the present disclosure include the use of TAPS or TAB-Seq to generate information pertaining to methylation signatures, methylation biomarkers, DNA fragment profiles, DNA sequence information (e.g., variants), and tissue-of-origin information in a single experiment to diagnose/detect a disease or other condition in a subject.
  • TAPS or TAB-Seq as disclosed herein can be used to generate any combination of methylation signatures, methylation biomarkers, DNA fragment profiles, DNA sequence information (e.g., variants), and tissue-of-origin information to diagnose/detect a disease or other condition in a subject.
  • a methylation signature can be obtained, and one or more of a methylation biomarker, a DNA fragment profile, DNA sequence information (e.g., variants), and tissue-of-origin information can also be obtained and used to diagnose/detect a disease or other condition in a subject.
  • the methylation status of a biomarker can be obtained, and one or more of a methylation signature, a DNA fragment profile, DNA sequence information (e.g., variants), and tissue-of-origin information can also be obtained and used to diagnose/detect a disease or other condition in a subject.
  • a DNA fragmentation profile can be obtained, and one or more of a methylation signature, a methylation biomarker, DNA sequence information (e.g., variants), and tissue-of-origin information can also be obtained and used to diagnose/detect a disease or other condition in a subject.
  • a DNA sequence variant can be identified, and one or more of a methylation signature, a methylation biomarker, a DNA fragment profile, and tissue-of-origin information can also be obtained and used to diagnose/detect a disease or other condition in a subject.
  • tissue-of-origin information can be obtained (e.g., from a whole genome DNA methylation signature), and one or more of the methylation signature, a methylation biomarker, a DNA fragment profile, and DNA sequence information (e.g., variants), can also be obtained and used to diagnose/detect a disease or other condition in a subject.
  • performing TAPS or TAB-Seq on the sequencing library to obtain the whole-genome methylation signature comprises identifying 5mC modifications in the DNA and providing a quantitative measure for frequency of the 5mC modifications.
  • performing TAPS on the sequencing library to obtain the whole-genome methylation signature comprises identifying 5hmC modifications in the DNA and providing a quantitative measure for frequency of the 5hmC modifications.
  • performing TAPS on the sequencing library to obtain the whole-genome methylation signature comprises identifying 5caC modifications in the DNA and providing a quantitative measure for frequency of the 5caC modifications.
  • performing TAPS on the sequencing library to obtain the whole-genome methylation signature comprises identifying 5fC modifications in the DNA and providing a quantitative measure for frequency of the 5fC modifications.
  • the methods described herein can be used to diagnose/detect any type of cancer.
  • Types of cancers that can be detected/diagnosed using the methods of the present disclosure include, but are not limited to, lung cancer, melanoma, colon cancer, colorectal cancer, neuroblastoma, breast cancer, prostate cancer, renal cell cancer, transitional cell carcinoma, cholangiocarcinoma, brain cancer, non-small cell lung cancer, pancreatic cancer, liver cancer, gastric carcinoma, bladder cancer, esophageal cancer, mesothelioma, thyroid cancer, head and neck cancer, osteosarcoma, hepatocellular carcinoma, carcinoma of unknown primary, ovarian carcinoma, endometrial carcinoma, glioblastoma, Hodgkin lymphoma and non-Hodgkin lymphomas.
  • types of cancers or metastasizing forms of cancers that can be detected/diagnosed by the methods of the present disclosure include, but are not limited to, carcinoma, sarcoma, lymphoma, germ cell tumor and blastoma.
  • the cancer is invasive and/or metastatic cancer (e.g., stage II cancer, stage III cancer or stage IV cancer).
  • the cancer is an early-stage cancer (e.g., stage 0 cancer, stage I cancer), and/or is not invasive and/or metastatic cancer.
  • the present disclosure provides methods for identifying the location of one or more of 5mC, 5hmC, 5caC and/or 5fC in a nucleic acid quantitatively with base-resolution without affecting the unmodified cytosine.
  • the nucleic acid is DNA.
  • the DNA is cfDNA (e.g., circulating cfDNA).
  • the nucleic acid is RNA.
  • a nucleic acid sample comprises a target nucleic acid that is DNA or a target nucleic acid that is RNA.
  • the methods are applied to a whole genome, and not limited to a specific target nucleic acid.
  • the nucleic acid may be any nucleic acid having cytosine modifications (i.e., 5mC, 5hmC, 5fC, and/or 5caC) including, but not limited to, DNA fragments and/or genomic DNA.
  • the nucleic acid can be a single nucleic acid molecule in the sample, or may be the entire population of nucleic acid molecules in a sample, or any portion thereof (whole genome or a subset thereof).
  • the nucleic acid can be the native nucleic acid from the source (e.g., cells, tissue samples, etc.) or can be pre-converted into a high-throughput sequencing-ready form, for example by fragmentation, repair and ligation with adapters for sequencing.
  • nucleic acids can comprise a plurality of nucleic acid sequences such that the methods described herein may be used to generate a library of target nucleic acid sequences that can be analyzed individually (e.g., by determining the sequence of individual targets) or in a group (e.g., by high-throughput or next generation sequencing methods).
  • the methods of the present disclosure can also include the step of amplifying the copy number of a modified nucleic acid by methods known in the art.
  • the modified nucleic acid is DNA
  • the copy number can be increased by, for example, PCR, cloning, and primer extension.
  • the copy number of individual target DNAs can be amplified by PCR using primers specific for a particular target DNA sequence.
  • a plurality of different modified target DNA sequences can be amplified by cloning into a DNA vector by standard techniques.
  • the copy number of a plurality of different modified target DNA sequences is increased by PCR to generate a library for next generation sequencing where, e.g., double-stranded adapter DNA has been previously ligated to the sample DNA (or to the modified sample DNA) and PCR is performed using primers complementary to the adapter DNA.
  • the method comprises the step of detecting the sequence of the modified nucleic acid.
  • the modified target DNA or RNA contains DHU at positions where one or more of 5mC, 5hmC, 5fC, and 5caC were present in the unmodified target DNA or RNA.
  • DHU acts as a T in DNA replication and sequencing methods.
  • the cytosine modifications can be detected by any direct or indirect method that identifies a C to T transition known in the art. Such methods include sequencing methods such as Sanger sequencing, microarray, and next generation sequencing methods.
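  • The read-out described above, in which DHU behaves as T, can be illustrated in silico. This is a minimal sketch, assuming complete conversion, a single strand, and known 0-based positions of modified cytosines; the function names and example sequence are hypothetical.

```python
from typing import Iterable, List


def taps_readout(reference: str, modified_positions: Iterable[int]) -> str:
    """Return the sequence as read after conversion: modified C is read as T."""
    modified = set(modified_positions)
    bases = []
    for i, base in enumerate(reference):
        if i in modified:
            if base != "C":
                raise ValueError(f"position {i} is {base}, not C")
            bases.append("T")  # DHU acts as T in replication and sequencing
        else:
            bases.append(base)  # unmodified bases, including C, are unchanged
    return "".join(bases)


def c_to_t_transitions(reference: str, read: str) -> List[int]:
    """List 0-based positions where the reference has C and the read has T."""
    return [i for i, (r, q) in enumerate(zip(reference, read)) if r == "C" and q == "T"]


reference = "ACGTCCGA"
read = taps_readout(reference, {1, 5})
print(read)
print(c_to_t_transitions(reference, read))
```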
  • the C to T transition can also be detected by restriction enzyme analysis where the C to T transition abolishes or introduces a restriction endonuclease recognition sequence.
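  • Whether a given C to T transition abolishes or introduces a recognition sequence can be checked by comparing the sequence before and after conversion. This sketch assumes a single-strand string search, which is sufficient for palindromic recognition sequences such as CCGG or GAATTC; the function name and example sequences are hypothetical.

```python
def site_change(before: str, after: str, site: str) -> str:
    """Report how a recognition sequence is affected by sequence conversion."""
    had_site = site in before
    has_site = site in after
    if had_site and not has_site:
        return "abolished"
    if has_site and not had_site:
        return "introduced"
    return "unchanged"


# A C-to-T transition inside CCGG removes the recognition sequence.
print(site_change("AACCGGTT", "AACTGGTT", "CCGG"))
# A C-to-T transition that turns GAACTC into GAATTC creates one.
print(site_change("TTGAACTCAA", "TTGAATTCAA", "GAATTC"))
```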
  • Embodiments of the present disclosure also provide systems or kits for oxidizing a methylated nucleotide (e.g., 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC)).
  • the systems or kits comprise an engineered TET enzyme or ortholog as described in detail above.
  • the systems or kits may further comprise a borane reducing agent.
  • the borane reducing agent is selected from: pyridine borane, 2-picoline borane (pic-BH3), borane, sodium borohydride, sodium cyanoborohydride, sodium triacetoxyborohydride, diborane, decaborane, borane tetrahydrofuran, borane-dimethyl sulfide, borane-N,N-diisopropylethylamine, borane-2-chloropyridine, borane-aniline, N,N-dimethylamine borane, tert-butylamine borane, sodium triacetoxyborohydride, boron hydride, hydrazine or dibutylamine borane, morpholine borane, borane-ammonia complex (BH3NH3), dicyclohexylamine borane, morpholine borane, 4-methyl
  • the systems or kits further comprise a blocking group and/or a glucosyltransferase enzyme.
  • the blocking group is a sugar.
  • the sugar is a naturally-occurring sugar or a modified sugar, for example glucose or a modified glucose.
  • the blocking group functions with UDP linked to a sugar, for example UDP-glucose or UDP linked to a modified glucose in the presence of a glucosyltransferase enzyme, for example, T4 bacteriophage β-glucosyltransferase (βGT) and T4 bacteriophage α-glucosyltransferase (αGT) and derivatives and analogs thereof.
  • Such systems or kits may also be used for and comprise additional components necessary for the detection and identification of the methylated nucleotide.
  • the systems or kits may further comprise sequencing reagents (e.g., primers, probes, nucleotides, buffers, control nucleic acid sequences, polymerases, etc.), restriction endonucleases, and the like for detecting the methylated nucleotide.
  • the systems or kits may include instructions for use in any of the methods described herein. Instructions included in the kit may be affixed to packaging material or may be included as a package insert. The instructions may be written or printed materials but are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated by this disclosure. Such media include, but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD ROM), etc. As used herein, the term “instructions” may include the address of an internet site that provides the instructions.
  • kits provided herein are in suitable packaging.
  • suitable packaging includes, but is not limited to, vials, bottles, jars, flexible packaging, and the like. Kits optionally may provide additional components such as buffers and interpretive information.
  • the kit comprises a container and a label or package insert(s) on or associated with the container.
  • the disclosure provides articles of manufacture comprising contents of the kits described above.
  • TAPS engineered TET enzymes amenable to high-volume, low-cost production that also exhibit the high performing functional characteristics essential for extremely sensitive assay applications such as TAPS.
  • Because TAPS is a new technology, others in the field have not faced the specific problem of identifying, engineering, or designing an enzyme optimized for TAPS.
  • TAB-Seq (Yu et al., “Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome,” Cell. 2012 Jun 8; 149(6):1368-80) and its incorporation in the NEB Next Enzymatic Methyl-seq Kit or EM-Seq.
  • the TAB-seq method employs the Mouse TET1 catalytic domain comprising amino acids 1367-2039 as expressed from the gene GU079948 including the LCR.
  • the NEB Next Enzymatic Methyl-seq Kit utilizes a Mouse TET2 enzyme and is advertised to provide a high-performance enzyme-based alternative to bisulfite conversion for methylome analysis using Illumina® sequencing.
  • the mouse TET2 enzyme from this kit was directly compared with the Mouse TET1 enzyme (as described above for the TAB-seq method)
  • the Mouse TET1 performed better in TAPS and had higher conversion rates than the Mouse TET2 version from the NEB kit. Therefore, while the Mouse TET2 enzyme may be more amenable to commercial production, its limited performance relative to the Mouse TET1 enzyme would curtail the sensitivity advantages otherwise offered by the TAPS process.
  • the NEB enzyme is essentially the wild-type Mouse TET2 sequence and does not perform to the tolerances required of TAPS (as discussed above), either in an RUO device or in a clinical diagnostic setting, where high stability and processivity are required to achieve the highest levels of sensitivity (necessary for low titer clinical samples).
  • Yield is expressed as mg of pure active protein per ml of cell culture harvested.
  • mouse TET1 (also referred to as mTET1 or mTET)
  • the enzyme was modified with a dual N-terminal tag of Hexa-Histidine (His-tag) followed by a Flag-tag.
  • His-tag was to act as the primary purification option via Immobilized Metal Affinity Chromatography (IMAC), while the Flag-tag gave the option of anti-Flag-tag ImmunoPrecipitation (IP) or “pull-down”.
  • the dual tags also provided two means to track the protein via, for example, Western blotting (i.e., anti-His or anti-Flag primary antibodies) and a means to remove the tag if desired, as the Flag-tag sequence was also an Enterokinase recognition site.
  • the His-Flag tagged mTET1 was cloned into an expression vector and expressed in E. coli.
  • the His-Flag tagged soluble TET was captured via Ni IMAC then applied to SEC (Size Exclusion Chromatography).
  • the SEC profile provided in FIG. 2 illustrates both the low expression level (the main peak only reaches 75 mAU) and the relative impurity of the prep following IMAC. This impurity is in part a result of the relative enrichment of non-specifically binding host proteins when the target protein levels are so low, but it is also due to aggregates and proteolysis products of the TET co-purifying with the intact enzyme.
  • a typical final yield for the mTET1 Full Length (FL) CD expressed and extracted from E. coli was ⁇ 1 mg/L of total cell culture, making economic production at large-scale unviable. See FIG. 2.
  • any unstructured regions may be removed or replaced with linkers.
  • Other groups have already shown that removal or replacement of the LCR is beneficial in terms of TET CD crystallization (Hu et al. 2013, supra); consequently, this was the first approach applied for the current Example.
  • Whole and partial deletion of the LCR was tested along with insertion of single and double GS linkers - flexible linkers with sequences consisting primarily of stretches of Gly and Ser residues (Chen et al., “Fusion protein linkers: property, design and functionality”. Adv Drug Deliv Rev. 2013 Oct;65(10):1357-69). The options tested are shown in FIG. 3. After testing these constructs, option C in FIG.
  • the extreme C-terminal region of the Catalytic Domain is reported to have specific functions in vivo (Hrit et al., supra) related to the associated O-GlcNAc transferase (OGT) protein - at least in the Mammalian TET1 isoform and probably in all three isoforms (Deplus et al. (2013) “TET2 and TET3 regulate GlcNAcylation and H3K4 methylation through OGT and SET1/COMPASS” EMBO J. 2013 Mar 6;32(5):645-55; Chen et al. “TET2 promotes histone O-GlcNAcylation during gene transcription” Nature.
  • Ng Naegleria gruberi
  • the C-terminus was also truncated at the final 44 residues (highlighted in FIG. 4) corresponding to removal of the OGT binding region of the protein (as described in Hrit et al, supra). Truncation at this position had the desired positive impact on integrity and stability while also generating the most active of the various C-terminal truncation options as shown in Table 4.
  • Table 4 provides data on the different MmTET2 enzymes, including yield, activity in a TAPS assay, and stability. It is possible that truncations removing still fewer residues (i.e., fewer than 44) may have diminishing benefits, as more of the proteolytically labile region would be retained.
  • C-44 C-terminal deletion of last 44 residues.
  • C-64 and C-77 refer to deletions of 64 and 77 residues, respectively.
  • Table 4 provides results for four MmTET2 CD constructs, each having a His-Flag N-terminal tag and the LCR replaced by a GS linker. They differed at the C-terminus, which was either Full Length (as wild-type) or truncated by the noted number of residues (i.e., -44, -64 and -77). The most active construct was observed to be the MmTET2 deltaLCR C-44. There appeared to be a minor sacrifice of thermal stability relative to the wild-type, but this was rewarded by the elimination of proteolytic degradation. Note that truncation by 77 residues halved activity, and further truncation is known to eliminate activity (Hu et al. 2013, supra).
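The two construct-level edits discussed above (replacing the LCR with a GS linker and truncating the C-terminus by a fixed number of residues) can be sketched as simple string operations on a protein sequence. The toy sequence, the LCR coordinates, and the linker string below are placeholders chosen for illustration only; the actual sequences and boundaries are those given in the sequence listing and figures, which are not reproduced here.

```python
# Minimal sketch of LCR replacement and C-terminal truncation.
# Coordinates are 1-based and inclusive, as in protein numbering.

def replace_region_with_linker(seq: str, start: int, end: int, linker: str) -> str:
    """Replace residues start..end (1-based, inclusive) with a linker."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("region must lie within the sequence")
    return seq[: start - 1] + linker + seq[end:]


def truncate_c_terminus(seq: str, n_residues: int) -> str:
    """Remove the last n_residues residues from the sequence."""
    if not (0 <= n_residues <= len(seq)):
        raise ValueError("cannot remove more residues than the sequence has")
    return seq[: len(seq) - n_residues]


# Toy sequence (assumption): residues 4-9 stand in for a low-complexity region.
toy = "MKT" + "QQQQQQ" + "DEFGHIKLWY"
delta_lcr = replace_region_with_linker(toy, 4, 9, linker="GS")  # placeholder linker
delta_lcr_c4 = truncate_c_terminus(delta_lcr, 4)                # analogous to "C-44"
print(delta_lcr, delta_lcr_c4)
```

For the constructs in Table 4, the same two operations would be applied with the real LCR boundaries and with truncation lengths of 44, 64, or 77 residues.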
  • type-2 TET (i.e., MmTET2) was modified such that its Cys distribution mimicked type-1 TET (i.e., MmTET1). As shown below, it was observed that type-1 TET enzymatic performance was achieved in type-2 TET with these modifications.
  • a preferred engineered MmTET2 was thus developed and is represented by the following sequence:
  • MmTET2_N-His-Flag_CD_deltaLCR_1xGS Cys Swap -44 C-term (also referred to herein as TET v1.0; SEQ ID NO: 16)
  • This engineered MmTET2 comprises the following features: (1) Mutations: C1168Y, C1171P, C1185T, C1272R, A1286C, S1291C, K1322C, C1772S, C1792K, C1827G, M1835C, E1837C
  • Truncations of the C-terminus from its extreme end (position) up to and including (position - e 77 residues from the end) are viable constructs with preferred embodiments being C-termini ending at: (position delta 44) or (position delta 65).
  • Table 5 below shows the positions in the catalytic domain of mouse TET2 at which cysteine residues were replaced with non-cysteine residues or non-cysteine residues were replaced with cysteine residues to replicate the cysteine distribution in the catalytic domain of mouse TET1.
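The cysteine swap can be summarized by parsing the mutation list recited above for TET v1.0 and separating the substitutions that remove a cysteine from those that introduce one. The sketch below uses only the twelve mutations listed in this description; the helper names are hypothetical, and the positions are interpreted in whichever numbering scheme the list itself uses.

```python
import re

# Mutations as listed for the preferred engineered enzyme (TET v1.0).
MUTATIONS = (
    "C1168Y, C1171P, C1185T, C1272R, A1286C, S1291C, "
    "K1322C, C1772S, C1792K, C1827G, M1835C, E1837C"
)


def parse_mutation(token: str) -> tuple[str, int, str]:
    """Parse 'C1168Y' into (original residue, position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token.strip())
    if match is None:
        raise ValueError(f"unrecognized mutation: {token!r}")
    return match.group(1), int(match.group(2)), match.group(3)


parsed = [parse_mutation(token) for token in MUTATIONS.split(",")]

# Cysteine replaced by another residue, and another residue replaced by cysteine.
cys_removed = [pos for old, pos, new in parsed if old == "C" and new != "C"]
cys_introduced = [pos for old, pos, new in parsed if old != "C" and new == "C"]

print("removed:", cys_removed)
print("introduced:", cys_introduced)
```

On this list, seven cysteines are removed and five are introduced, so the swap redistributes cysteines rather than simply eliminating them.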
  • This engineered MmTET2 was tested as described below.
  • Table 5 also provides positions at which cysteine swaps may be made in MmTET3 (see, e.g., SEQ ID NO: 17), HsTET2 (see, e.g., SEQ ID NO: 18), and Lygus hesperus TET2 (see, e.g., SEQ ID NO: 19).
  • Cysteine Number refers to the position in the CDs of either MmTET1 or MmTET2 isoforms where a Cysteine is present in one or the other of the isoforms.
  • Column 3 provides the absolute numbering of the positions of interest, the numbers being taken from the sp-Q4JK59 Uniprot deposition (SEQ ID NO: 9) for the entire MmTET2 protein (i.e., residues 1 - 1912 inclusive).
  • Column 4 lists the mutations made in the preferred embodiment using numbering from the sp-Q4JK59 Uniprot deposition.
  • Column 5 lists all the positions of interest and the mutations made using numbering from SEQ ID NO: 16.
  • the actual position in the engineered enzyme may differ due to the inclusion of features such as purification tags, a truncated N terminus, a truncated C terminus, a deleted LCR and substitution of a linker sequence for the LCR.
  • Table 5. Cysteine swaps in selected engineered TET enzymes
  • FIG. 6 provides a detailed scheme of MmTET2 features relating to Catalytic Domain (CD) engineering (see, e.g., SEQ ID NO: 16).
  • TET enzymes contain a Catalytic Domain (CD) comprising a conserved double-stranded β-helix (DSBH) domain, a cysteine-rich domain, and binding sites for the cofactors Fe(II) and 2-oxoglutarate (2-OG) that together form the core catalytic region in the C-terminus.
  • the Mouse TET2 CD sequence spans the equivalent of Q1042 to G1868 of Uniprot: Q4JK59.
  • FIG. 7 provides data showing that the cysteine swaps as applied to MmTET2 (i.e., SEQ ID NO: 16, also referred to herein as TET v1.0) expressed in insect cells improved yield and enzyme activity in TAPS.
  • FIG. 8 provides SDS-PAGE analysis of MmTET2 with a deleted LCR as compared to TET v1.0, both produced in E. coli.
  • the TET v1.0 demonstrated a more homogeneous preparation from E. coli as assayed by SDS-PAGE, which showed a doublet for MmTET2 with a deleted LCR and a single resolved band for TET v1.0.
  • TET v1.0 was also assayed for activity in TAPS as compared to various other TET enzymes.
  • TET v1.0 was compared to wild-type MmTET2 CD with a deleted LCR (MmTET2 CD ΔLCR). The difference between the two enzymes is that TET v1.0 contains cysteine swaps while the MmTET2 CD ΔLCR does not.
  • the oxidation step was conducted for either 30 or 60 minutes followed by borane reduction and sequencing.
  • the data, which is provided in FIG. 9, shows that TET v1.0 demonstrated increased % conversion in the TAPS assay as compared to MmTET2 CD ΔLCR. The % conversion shown in FIG.
  • FIG. 9 is the % of methylated Cytosines detected in a fully methylated Lambda sample.
  • three preparations of TET v1.0 were compared to a commercial MmTET2 (New England Biolabs) for activity in TAPS as measured by % conversion. The data is presented in FIG. 10, which shows that TET v1.0 had an increased % conversion (above 95%) as compared to the commercial enzyme (about 91%).
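The % conversion figure discussed above can be illustrated with a short calculation. This sketch assumes a fully methylated control (such as the methylated Lambda sample mentioned above), in which every cytosine examined is expected to convert, so that % conversion is the share of read bases at those positions observed as T rather than C. The read counts in the example are invented for illustration and are not data from this disclosure.

```python
# Minimal sketch: % conversion at positions expected to be fully methylated.

def percent_conversion(converted_reads: int, unconverted_reads: int) -> float:
    """Return 100 * converted / (converted + unconverted)."""
    if converted_reads < 0 or unconverted_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = converted_reads + unconverted_reads
    if total == 0:
        raise ValueError("no reads cover the positions of interest")
    return 100.0 * converted_reads / total


# Hypothetical counts: 955 reads show T and 45 still show C.
example = percent_conversion(converted_reads=955, unconverted_reads=45)
print(f"{example:.1f}% conversion")
```

Under this definition, the difference between an enzyme at about 91% and one above 95% corresponds to roughly halving the fraction of methylated cytosines that go undetected.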
  • a TET enzyme activity assay was used to compare the activity of TET v1.0 to MmTET2 CD ΔLCR over a range of temperature, pH, and NaCl concentrations. The data is presented in FIG.
  • FIG. 11 shows that the cysteine swap enhances the biochemical activity of the enzyme over wider temperatures (FIG. 11A), a range of pH conditions (FIG. 11B) and a range of NaCl concentrations (FIG. 11C), when compared to the wild-type protein.
  • Table 6 below provides a summary of conversion rates, activity and stability of TET v1.0 as compared to MmTET1 CD, MmTET2 ΔLCR with or without C terminal truncations, and a MmTET2 ΔLCR with cysteine swaps but without a C terminal truncation.
  • TET v1.0 demonstrated the best combination of yield, activity and stability.
  • the DNA binding affinity of TET v1.0 as compared to commercial MmTET2 was also examined.
  • the data is provided in FIG. 12.
  • the SPR sensorgram shows a direct comparison of TET v1.0 in light grey (higher signal sensorgram), with the commercially available MmTET2 in black (lower signal sensorgram).
  • DNA was immobilized on the SPR chip surface and equimolar solutions of the enzymes were flowed across it.
  • the degree of enzyme binding to DNA was measured in Response Units (RU) plotted on the Y axis.
  • TET v1.0 has greater DNA binding affinity than the commercial enzyme.
  • TET v1.0 was tested alongside the New England BioLabs (NEB) mTET2 in both the TAPS assay and the Enzymatic Methyl-seq (EM-seq) assay from NEB. The results are shown in FIG. 13. As demonstrated in FIG. 13, TET v1.0 significantly outperforms the NEB mTET2 in the TAPS assay when used with either 5X buffer (Exact Sciences buffer) or NEB buffer. TET v1.0 also performs equivalently or better in the EM-seq assay when used in Exact Sciences buffer or the NEB buffer.
  • the cysteine swap strategy described can be applied to any TET CD to improve activity and developability characteristics; in specific embodiments it may be applied to MmTET2, MmTET3, Lygus hesperus TET2, as well as other orthologs.
  • the data show that the distribution of non-conserved cysteines may influence distinct isoformal characteristics (e.g., improved performance in NGS assays such as TAPS) by means unknown - possibly via interactions with substrate DNA.
  • TET enzymes were selected for further testing based on their expression levels: SEQ ID NO: 19 (Lygus hesperus TET2), SEQ ID NO:20 (Aedes aegypti TET1), SEQ ID NO:21 (Anopheles gambiae TET1), SEQ ID NO:22 (Drosophila ananassae TET1), SEQ ID NO:23 (Drosophila melanogaster TET1), SEQ ID NO:24 (Guinea pig TET1), SEQ ID NO:25 (Lama pacos TET1), SEQ ID NO:26 (Macaca fascicularis TET3), SEQ ID NO:27 (Mus adji TET2), SEQ ID NO:28 (Rabbit TET2), SEQ ID NO:29 (Xenopus laevis TET1), SEQ ID NO:30 (Chanos chanos TET3), SEQ ID NO:31
  • Lygus hesperus TET2 had especially promising results and has low sequence identity (about 40%) to MmTET2. It should be noted that all orthologs tested contained a full length C-terminal region and it is expected that truncation of the C terminal region as described for MmTET2 will additionally improve expression, stability and activity. Additionally, no cysteine swaps were incorporated in the orthologs tested in the example. Cysteine swaps are also expected to improve expression, stability and activity.
  • Table 8. Yield (mg/L), activity (via an in-house assay), and TAPS % methylation in a baculovirus expression system for MmTET1, MmTET2, and MmTET3 constructs with and without LCR and cysteine swaps


Abstract

The present invention relates to engineered TET enzymes that find use in epigenetics and next generation sequencing (NGS), and more specifically in sequencing methods such as TET-assisted pyridine borane sequencing (TAPS).
PCT/IB2024/054026 2023-04-28 2024-04-24 Engineered TET enzymes and use in epigenetics and next generation sequencing (NGS), such as TET-assisted pyridine borane sequencing (TAPS) Pending WO2024224324A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2024264100A AU2024264100A1 (en) 2023-04-28 2024-04-24 Engineered tet enzymes and use in epigenetics and next generation sequencing (ngs), such as tet-assisted pyridine borane sequencing (taps)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363498874P 2023-04-28 2023-04-28
US63/498,874 2023-04-28

Publications (1)

Publication Number Publication Date
WO2024224324A1 true WO2024224324A1 (fr) 2024-10-31

Family

ID=91022680

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2024/054026 Pending WO2024224324A1 (fr) 2023-04-28 2024-04-24 Engineered TET enzymes and use in epigenetics and next generation sequencing (NGS), such as TET-assisted pyridine borane sequencing (TAPS)

Country Status (2)

Country Link
AU (1) AU2024264100A1 (fr)
WO (1) WO2024224324A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2025224162A1 (fr) * 2024-04-24 2025-10-30 Qugen Gmbh Novel recombinant TET3 enzymes

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100120098A1 (en) 2008-10-24 2010-05-13 Epicentre Technologies Corporation Transposon end compositions and methods for modifying nucleic acids
US7741463B2 (en) 2005-11-01 2010-06-22 Illumina Cambridge Limited Method of preparing libraries of template polynucleotides
WO2012061832A1 (fr) 2010-11-05 2012-05-10 Illumina, Inc. Linking sequence reads using paired code tags
US20120208724A1 (en) 2011-02-10 2012-08-16 Steemers Frank J Linking sequence reads using paired code tags
US20120208705A1 (en) 2011-02-10 2012-08-16 Steemers Frank J Linking sequence reads using paired code tags
US20150368638A1 (en) 2013-03-13 2015-12-24 Illumina, Inc. Methods and compositions for nucleic acid sequencing
US20180171397A1 (en) 2015-10-30 2018-06-21 New England Biolabs, Inc. Compositions and Methods for Analyzing Modified Nucleotides
WO2019136413A1 (fr) 2018-01-08 2019-07-11 Ludwig Institute For Cancer Research Ltd Bisulfite-free, base-resolution identification of cytosine modifications
WO2021005537A1 (fr) 2019-07-08 2021-01-14 The Chancellor, Masters And Scholars Of The University Of Oxford Bisulfite-free, whole genome methylation analysis
WO2021161192A1 (fr) 2020-02-11 2021-08-19 The Chancellor, Masters And Scholars Of The University Of Oxford Targeted, long-read nucleic acid sequencing for the determination of cytosine modifications
WO2022053872A1 (fr) 2020-09-14 2022-03-17 The Chancellor, Masters And Scholars Of The University Of Oxford Cytosine modification analysis
WO2023007241A2 (fr) 2021-07-27 2023-02-02 The Chancellor, Masters And Scholars Of The University Of Oxford Compositions and methods related to TET-assisted pyridine borane sequencing for cell-free DNA

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7741463B2 (en) 2005-11-01 2010-06-22 Illumina Cambridge Limited Method of preparing libraries of template polynucleotides
US20100120098A1 (en) 2008-10-24 2010-05-13 Epicentre Technologies Corporation Transposon end compositions and methods for modifying nucleic acids
WO2012061832A1 (fr) 2010-11-05 2012-05-10 Illumina, Inc. Linking sequence reads using paired code tags
US20120208724A1 (en) 2011-02-10 2012-08-16 Steemers Frank J Linking sequence reads using paired code tags
US20120208705A1 (en) 2011-02-10 2012-08-16 Steemers Frank J Linking sequence reads using paired code tags
US20150368638A1 (en) 2013-03-13 2015-12-24 Illumina, Inc. Methods and compositions for nucleic acid sequencing
US20180171397A1 (en) 2015-10-30 2018-06-21 New England Biolabs, Inc. Compositions and Methods for Analyzing Modified Nucleotides
WO2019136413A1 (fr) 2018-01-08 2019-07-11 Ludwig Institute For Cancer Research Ltd Bisulfite-free, base-resolution identification of cytosine modifications
US20200370114A1 (en) 2018-01-08 2020-11-26 Ludwig Institute For Cancer Research Ltd Bisulfite-free, base-resolution identification of cytosine modifications
US20210317519A1 (en) 2018-01-08 2021-10-14 Ludwig Institute For Cancer Research Ltd Bisulfite-free, base-resolution identification of cytosine modifications
WO2021005537A1 (fr) 2019-07-08 2021-01-14 The Chancellor, Masters And Scholars Of The University Of Oxford Bisulfite-free, whole genome methylation analysis
WO2021161192A1 (fr) 2020-02-11 2021-08-19 The Chancellor, Masters And Scholars Of The University Of Oxford Targeted, long-read nucleic acid sequencing for the determination of cytosine modifications
WO2022053872A1 (fr) 2020-09-14 2022-03-17 The Chancellor, Masters And Scholars Of The University Of Oxford Cytosine modification analysis
WO2023007241A2 (fr) 2021-07-27 2023-02-02 The Chancellor, Masters And Scholars Of The University Of Oxford Compositions and methods related to TET-assisted pyridine borane sequencing for cell-free DNA

Non-Patent Citations (18)

* Cited by examiner, † Cited by third party
Title
"Bioinformatics for Cancer Immunotherapy : Methods and Protocols", vol. 2198, 22 August 2020, SPRINGER, New York, NY, ISBN: 978-1-0716-0326-0, article HUANG ZHIJUN ET AL: "High-Resolution Analysis of 5-Hydroxymethylcytosine by TET-Assisted Bisulfite Sequencing : Methods and Protocols", pages: 321 - 331, XP055867417, DOI: 10.1007/978-1-0716-0876-0_25 *
CHEN ET AL.: "Fusion protein linkers: property, design and functionality", ADV DRUG DELIV REV., vol. 65, no. 10, October 2013 (2013-10-01), pages 1357 - 69, XP028737352, DOI: 10.1016/j.addr.2012.09.039
CHEN ET AL.: "TET2 promotes histone O-GlcNAcylation during gene transcription", NATURE., vol. 493, 2013, pages 561 - 564, XP055374112, DOI: 10.1038/nature11742
DATABASE UniProt [online] 10 May 2017 (2017-05-10), "RecName: Full=Methylcytosine dioxygenase TET {ECO:0000256|RuleBase:RU367064}; EC=1.14.11.80 {ECO:0000256|RuleBase:RU367064};", XP093182792, retrieved from EBI accession no. UNIPROT:A0A1U7R6L2 Database accession no. A0A1U7R6L2 *
DATABASE UniProt [online] 17 June 2020 (2020-06-17), "RecName: Full=Methylcytosine dioxygenase TET {ECO:0000256|RuleBase:RU367064}; EC=1.14.11.80 {ECO:0000256|RuleBase:RU367064};", XP093182840, retrieved from EBI accession no. UNIPROT:A0A671DWH2 Database accession no. A0A671DWH2 *
DAVIS ET AL., BASIC METHODS IN MOLECULAR BIOLOGY, 1986
DEPLUS ET AL.: "TET2 and TET3 regulate GlcNAcylation and H3K4 methylation through OGT and SET1/COMPASS", EMBO J., vol. 32, no. 5, 6 March 2013 (2013-03-06), pages 645 - 55
HRIT ET AL.: "OGT binds a conserved C-terminal domain of TET1 to regulate TET1 activity and function in development", ELIFE, vol. 7, 2018, pages e34870
HU ET AL.: "Crystal structure of TET2-DNA complex: insight into TET-mediated 5mc oxidation", CELL, vol. 155, 2013, pages 1545 - 1555, XP028806595, DOI: 10.1016/j.cell.2013.11.020
HU ET AL.: "Structural insight into substrate preference for TET-mediated oxidation", NATURE, vol. 527, 2015, pages 118 - 122
HU LULU ET AL: "Crystal Structure of TET2-DNA Complex: Insight into TET-Mediated 5mC Oxidation", CELL, vol. 155, no. 7, 19 December 2013 (2013-12-19), Amsterdam NL, pages 1545 - 1555, XP093183085, ISSN: 0092-8674, DOI: 10.1016/j.cell.2013.11.020 *
ITO ET AL.: "TET3-OGT interaction increases the stability and the presence of OGT in chromatin", GENES TO CELLS., vol. 19, 2014, pages 52 - 65, XP072055039, DOI: 10.1111/gtc.12107
MIER P ET AL.: "Disentangling the complexity of low complexity proteins", BRIEF BIOINFORM., vol. 21, no. 2, 23 March 2020 (2020-03-23), pages 458 - 472
RAIBER ET AL.: "Mapping and elucidating the function of modified bases in DNA", NAT. REV. CHEM., vol. 1, 2017, pages 0069
SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR
TAHILIANI ET AL.: "Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1", SCIENCE, vol. 324, no. 5929, 15 May 2009 (2009-05-15), pages 930 - 5
YIBIN LIU ET AL: "Bisulfite-free direct detection of 5-methylcytosine and 5-hydroxymethylcytosine at base resolution", NATURE BIOTECHNOLOGY, 25 February 2019 (2019-02-25), New York, XP055575332, ISSN: 1087-0156, DOI: 10.1038/s41587-019-0041-2 *
YU ET AL.: "Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome", CELL, vol. 149, no. 6, 8 June 2012 (2012-06-08), pages 1368 - 80, XP055064960, DOI: 10.1016/j.cell.2012.04.027


Also Published As

Publication number Publication date
AU2024264100A1 (en) 2025-11-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24723954

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: AU2024264100

Country of ref document: AU

WWE Wipo information: entry into national phase

Ref document number: P2025-03460

Country of ref document: AE

ENP Entry into the national phase

Ref document number: 2024264100

Country of ref document: AU

Date of ref document: 20240424

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2024723954

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2024723954

Country of ref document: EP

Effective date: 20251128
