
WO2025232923A1 - Programmable editing of DNA pyrimidine bases by engineered uracil-DNA glycosylase-based excision - Google Patents

Programmable editing of DNA pyrimidine bases by engineered uracil-DNA glycosylase-based excision

Info

Publication number
WO2025232923A1
Authority
WO
WIPO (PCT)
Prior art keywords
thymine
engineered
modifying polypeptide
ung
dna
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2025/094182
Inventor
Wensheng Wei
Zongyi YI
Xiaoxue ZHANG
Xiaoxu WEI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Beijing Changping Laboratory
Original Assignee
Peking University
Beijing Changping Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University and Beijing Changping Laboratory
Publication of WO2025232923A1


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y302/00 Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
    • C12Y302/02 Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2) hydrolysing N-glycosyl compounds (3.2.2)
    • C12Y302/02027 Uracil-DNA glycosylase (3.2.2.27)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/24 Hydrolases (3) acting on glycosyl compounds (3.2)
    • C12N9/2497 Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing N-glycosyl compounds (3.2.2)

Definitions

  • the present application is directed to compositions, methods, and systems for DNA pyrimidine-base editing.
  • the present application is directed to engineered thymine-modifying polypeptides or engineered cytosine-modifying polypeptides comprising a variant of a uracil-DNA glycosylase (UNG).
  • UNG: uracil-DNA glycosylase
  • the present application is directed to such engineered polypeptides and a DNA recognition domain, which are configured to target a nucleic acid for excision.
  • cytosine base editors and adenine base editors comprising deaminases.
  • the deaminase in these tools transforms cytosine (C) or adenine (A) into uracil (U) or inosine (I), which are then recognized as thymine (T) and guanine (G), respectively, by DNA repair mechanisms to complete C-to-T and A-to-G base editing.
  • AP sites: apurinic/apyrimidinic sites
  • TLS: translesion DNA synthesis
  • CGBE: C-to-G base editors
  • AYBE: A-to-Y base editors
  • an engineered thymine-modifying polypeptide comprising a uracil-DNA glycosylase (UNG) variant comprising an amino acid substitution selected from a group consisting of Y85A, Y85S, Y85N, Y85G, and Y85C; and one or more amino acid substitutions of L80V, K113E, K197E, S204A, S204G, or E237Q, wherein the positions of amino acid substitutions are in reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the engineered thymine-modifying polypeptide comprises the amino acid substitution of Y85A. In some embodiments, the engineered thymine-modifying polypeptide comprises the amino acid substitutions of L80V, K113E, K197E, S204A, and E237Q. In some embodiments, the engineered thymine-modifying polypeptide comprises the amino acid substitutions of K113E and S204G. In some embodiments, the engineered thymine-modifying polypeptide comprises the amino acid substitutions of K113E, S204A, and E237Q. In some embodiments, the engineered thymine-modifying polypeptide comprises the amino acid substitution of L80V.
  • the engineered thymine-modifying polypeptide comprises the amino acid substitutions of K197E, S204A, and E237Q. In some embodiments, the engineered thymine-modifying polypeptide comprises the amino acid substitutions of K113E, K197E, and E237Q.
  • the UNG is derived from an organism selected from a group consisting of Deinococcus radiodurans, Homo sapiens, Escherichia coli, Human Herpesvirus 1, and Vaccinia virus. In some embodiments, the UNG is derived from Deinococcus radiodurans.
  • an engineered thymine-modifying polypeptide comprising a uracil-DNA glycosylase (UNG) variant comprising an amino acid substitution comprising Y85A, wherein the UNG variant is derived from Deinococcus radiodurans.
  • the engineered thymine-modifying polypeptide further comprises one or more amino acid substitutions of L80V, K113E, K197E, S204A, S204G, or E237Q, wherein the positions of amino acid substitutions are in reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the UNG variant has at least about 70% sequence identity to SEQ ID NO: 7. In some embodiments, the UNG variant comprises an N-terminal deletion relative to the corresponding UNG from which the UNG variant was derived. In some embodiments, the N-terminal deletion comprises a deletion of up to the first 23 amino acids. In some embodiments, the N-terminal deletion comprises a deletion of the first two amino acids.
  • the engineered thymine-modifying polypeptide further comprises a DNA recognition domain.
  • the DNA recognition domain is fused to the UNG variant.
  • the DNA recognition domain is fused to the C-terminus of the UNG variant.
  • the DNA recognition domain is operably connected to the UNG variant by a linker domain.
  • the linker domain is a 32-amino acid linker or a 64-amino acid linker, and is selected from a list composed of GS, SGGS, PAPAP, XTEN, or repeats thereof.
  • the DNA recognition domain comprises a Cas nuclease.
  • the Cas nuclease is an nCas9.
  • the nCas9 comprises a D10A mutation.
  • the nCas9 is derived from Streptococcus pyogenes.
  • the Cas nuclease is a dCas9.
  • the engineered thymine-modifying polypeptide does not comprise a deaminase domain having deamination activity.
  • an engineered thymine-modifying polypeptide comprising SEQ ID NO: 9. In some aspects, provided herein is an engineered thymine-modifying polypeptide comprising SEQ ID NO: 10. In some aspects, provided herein is an engineered thymine-modifying polypeptide comprising SEQ ID NO: 12. In some aspects, provided herein is an engineered thymine-modifying polypeptide comprising SEQ ID NO: 13. In some aspects, provided herein is an engineered thymine-modifying polypeptide comprising SEQ ID NO: 14. In some aspects, provided herein is an engineered thymine-modifying polypeptide comprising SEQ ID NO: 15. In some embodiments, the engineered thymine-modifying polypeptide further comprises a DNA recognition domain.
  • nucleic acid encoding an engineered thymine-modifying polypeptide described herein.
  • a thymine editing system comprising an engineered thymine-modifying polypeptide described herein, or a nucleic acid encoding the same.
  • the thymine editing system further comprises an sgRNA.
  • the sgRNA targets a protospacer comprising a target thymine.
  • a method of modifying a target thymine in a nucleic acid sequence present in one or more nucleic acid molecules comprising contacting at least one nucleic acid molecule of the one or more nucleic acid molecules with an engineered thymine-modifying polypeptide described herein, wherein the DNA recognition domain is configured to associate with the at least one nucleic acid molecule such that the UNG variant is positioned to modify the target thymine.
  • the one or more nucleic acid molecules are in a cell. In some embodiments, the one or more nucleic acid molecules are in a mitochondrion. In some embodiments, the target thymine undergoes a T-to-C, T-to-G, or T-to-A modification. In some embodiments, about 15% to about 75% of the modified target thymine are modified to a cytosine. In some embodiments, about 1% to about 65% of the modified target thymine are modified to a guanine. In some embodiments, about 5% to about 50% of the modified target thymine are modified to an adenine.
  • an NGG PAM site is 14 to 19 base pairs away from the target thymine. In some embodiments, an NGG PAM site is 14 to 19 base pairs away from the target thymine, wherein counting starts at the first base outside of the NGG PAM in a protospacer, and wherein the NGG PAM site is associated with DNA recognized by the DNA recognition domain, such as a Cas nuclease. In some embodiments, an NG PAM site is 14 to 19 base pairs away from the target thymine.
  • an NG PAM site is 14 to 19 base pairs away from the target thymine, wherein counting starts at the first base outside of the NG PAM in a protospacer, and wherein the NG PAM site is associated with DNA recognized by the DNA recognition domain, such as a Cas nuclease.
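The PAM-relative counting described above can be illustrated with a minimal Python sketch. It assumes one reading of the text: distance 1 is the protospacer base immediately adjacent to the NGG PAM, and distances increase moving away from the PAM toward the 5' end of the protospacer. The function name `thymines_in_window` and the example protospacer are hypothetical and are not taken from the application.

```python
def thymines_in_window(protospacer, pam, lo=14, hi=19):
    """List thymines lying lo..hi bases away from an NGG PAM.

    Assumed convention (an interpretation, not the application's code):
    the protospacer is written 5'->3' and is followed directly by the
    PAM, so distance 1 is the last protospacer base and distance
    len(protospacer) is the first one.
    Returns (distance_from_pam, zero_based_index_from_5prime) tuples.
    """
    protospacer = protospacer.upper()
    pam = pam.upper()
    if len(pam) != 3 or pam[1:] != "GG":
        raise ValueError("expected an NGG PAM")
    n = len(protospacer)
    hits = []
    for distance in range(lo, hi + 1):
        index = n - distance  # convert PAM-relative distance to a 5' index
        if 0 <= index < n and protospacer[index] == "T":
            hits.append((distance, index))
    return hits


# Hypothetical 20-nt protospacer followed by a TGG PAM.
candidates = thymines_in_window("ATGCTTACGGATCCGATAGC", "TGG")
```

An NG PAM variant would only need a different PAM check; the distance arithmetic stays the same under this convention.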
  • the method exhibits an editing efficiency at least 2-fold greater than that of a method comprising contacting a nucleic acid molecule comprising a target thymine with a wild-type UNG. In some embodiments, the method exhibits an editing efficiency of about 6% to about 40%.
  • a method of treating a disease associated with a DNA mutation in an individual comprising contacting a nucleic acid comprising a target thymine associated with a disease with a thymine editing system described herein.
  • the thymine editing system targets a splicing site on a gene.
  • the disease is Duchenne muscular dystrophy.
  • the contacting reduces the amount of a premature stop codon. In some embodiments, the contacting results in the formation of a TAG-to-XAG mutation in exon 9 of an IDUA gene. In some embodiments, the administration of the thymine editing system results in restoration of IDUA catalytic activity. In some embodiments, the disease is Hurler syndrome.
  • provided herein is a method of treating Hurler syndrome in an individual in need thereof, comprising contacting a nucleic acid comprising a target thymine in a mutation associated with Hurler syndrome with a thymine editing system described herein.
  • FIG. 1A shows a schematic of the screen for hUNG variants that can specifically excise thymine to produce AP sites, using a reporter system (SEQ ID NO: 22 and SEQ ID NO: 23).
  • FIG. 1B shows a schematic diagram of the nCas9(D10A)-hUNG variant targeting the reporter system (top sequence (SEQ ID NO: 123); middle sequence (SEQ ID NO: 124); bottom sequence (SEQ ID NO: 125)).
  • FIG. 1C shows the eGFP+ ratio of hUNG key amino acid saturation mutations for thymine excision (SEQ ID NO: 126).
  • FIG. 1D shows the editing rate of the hUNG(Y147A)-nCas9(D10A) targeting the reporter system.
  • FIG. 2A shows a schematic of the screen for hUNG variants that can specifically excise cytosine to produce AP sites, using a reporter system (SEQ ID NO: 25 and SEQ ID NO: 26).
  • FIG. 2B shows the eGFP+ ratio of hUNG key amino acid saturation mutations for cytosine excision (SEQ ID NO: 126).
  • FIG. 2C shows the editing rate of the hUNG(N204D)-nCas9(D10A) targeting the reporter system.
  • FIG. 3A shows the evaluation and results of cytosine excision by native and engineered UNG variants from various species (from top to bottom: SEQ ID NO: 127, 128, 129, 30, 131, 132).
  • FIG. 3B shows the evaluation and results of thymine excision by native and engineered UNG variants from various species (from top to bottom: SEQ ID NO: 133, 134, 135, 136, 137, 138).
  • FIG. 4A shows a flowchart for screening DrUNG variants.
  • FIG. 4B shows the enrichment percentage of mutations at particular amino acids obtained from the screen.
  • FIG. 4C shows the eGFP+ ratio of different DrUNG variants.
  • FIG. 5A shows separate plots of the thymine base editing rate of DrUNG variants 2 and 6 and the indel percentage at 16 endogenous sites.
  • FIG. 5B shows the comparison of thymine base editing efficiency of DrUNG variants 2 and 6 at 16 endogenous sites.
  • FIG. 5C shows the distribution of thymine editing results at 16 endogenous sites of DrUNG mutant 6 (also referred to as “thymine base editor” [TBE]).
  • FIG. 5D shows a plot of the editing rate for various amino acid positions away from an NGG PAM.
  • FIG. 5E shows the editing efficiency of DrUNG mutant 6 comprising various linkers.
  • FIG. 5F shows the editing rate of DrUNG mutant 6 comprising various linkers at various endogenous sites.
  • FIG. 5G shows the editing efficiency of DrUNG mutant 6 in various cell lines.
  • FIG. 5H shows indel levels of DrUNG mutant 6 in various cell lines.
  • FIG. 6A shows the genome wide off-target effects of TBE at site 31. Sample transfected with eGFP-expressing plasmid as control.
  • FIG. 6B shows the transcriptome wide off-target effects of TBE at site 31. Sample transfected with eGFP-expressing plasmid as control.
  • FIG. 6C shows the editing efficiency of the top 10 off-target sites predicted by Cas-OFFinder at site 15 (from top to bottom: SEQ ID NO: 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149), 16 (from top to bottom: SEQ ID NO: 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160), 17 (from top to bottom: SEQ ID NO: 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171), and 18 (from top to bottom: SEQ ID NO: 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182).
  • FIG. 7A shows a diagram illustrating the human Hurler syndrome reporter system and sgRNA design for TBE (from top to bottom: SEQ ID NO: 183, 184, 185, 186, 187, 188).
  • FIG. 7B shows the editing efficiency of SpCas9, SpCas9-NG, and SpG-Cas9, each fused to DrUNG mutant 6 and co-transfected with the corresponding sgRNA in the IDUA reporter system.
  • FIG. 7C shows a schematic diagram of co-transfection of sgRNA and mRNA encoding the fusion protein of SpCas9 and DrUNG mutant 6 into GM06214 (IDUA W402X) cells, which are derived from a patient with Hurler syndrome and contain the IDUA W402X mutation.
  • FIG. 7D shows the on-target editing efficiency in GM06214 cells using TBE.
  • the DNA pyrimidine base modifying polypeptides taught herein are derived from a uracil-DNA glycosylase (UNG) , which natively excises uracil in DNA to prevent mutagenesis, and engineered to excise a DNA pyrimidine base such as a thymine or a cytosine.
  • the engineered DNA pyrimidine base modifying polypeptide is configured to excise a target thymine.
  • the engineered DNA pyrimidine base modifying polypeptide is configured to excise a target cytosine.
  • the engineered DNA pyrimidine base modifying polypeptide comprises a DNA recognition domain, such as to provide targeting specificity. Further described herein are methods of using the taught engineered pyrimidine base modifying polypeptides (e.g., in a method of treatment) , kits, and systems thereof.
  • UNG variants that transform the native function of a UNG and allow for highly efficient excision of DNA pyrimidine bases, such as thymine or cytosine bases (as noted above, UNGs natively excise uracil bases in DNA) .
  • the inventors engineered novel UNGs from species not previously explored, such as engineered Deinococcus radiodurans UNG, and surprisingly found mutant variants with increased DNA pyrimidine excision activity.
  • EcUNG N123D
  • HHV1_UNG N147D
  • DrUNG Y85A
  • the activity of DrUNG (Y85A) surpasses that of hUNG (Y147A) by a factor of five, and through further mutations, the inventors found that the activity of such engineered polypeptides could further be enhanced by about sixfold.
  • the engineered UNG variants provided herein represent a novel approach to editing DNA pyrimidine bases that does not involve a deaminase.
  • the inventors developed an effective thymine base editing tool by fusing a UNG variant to a DNA recognition domain, with the editing window situated at positions 14-18 of the spacer between the two. More specifically, the inventors also found that by using an engineered UNG variant provided herein, one can selectively generate an apurinic/apyrimidinic site (AP site) , induce nicking on the complementary strand with nCas9, and then achieve base editing of cytosine or thymine during translesion DNA synthesis via incorporation of alternative bases opposite the AP site. As demonstrated in the Examples, we found a preference for incorporating G opposite the AP site after thymine removal, resulting in T-to-C base editing.
  • In the context of human disease-related point mutations, the base editing tool provided herein holds the potential to correct up to 70% of these mutations. Notably, existing base editing tools relying on deaminases often exhibit significant off-target effects on RNA, posing a clear drawback for disease treatment. The base editing tool, circumventing the need for deaminases, provides a more precise approach to base editing for therapeutic applications.
  • an engineered uracil-DNA glycosylase (UNG) variant configured to excise a DNA pyrimidine such as a target thymine or a target cytosine.
  • the engineered polypeptide is a thymine-modifying polypeptide comprising a variant of a UNG.
  • the engineered polypeptide is a cytosine-modifying polypeptide comprising a variant of a UNG.
  • an engineered thymine-modifying polypeptide comprising a uracil-DNA glycosylase (UNG) variant comprising an amino acid substitution of Y85 (such as to an amino acid that is smaller than tyrosine); and one or more amino acid substitutions at L80, K113, K197, S204, or E237, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • an engineered thymine-modifying polypeptide comprising a variant of a uracil-DNA glycosylase (UNG) comprising: (a) an amino acid substitution selected from a group consisting of Y85A, Y85S, Y85N, Y85G, and Y85C; and (b) one or more amino acid substitutions of L80V, K113E, K197E, S204A, S204G, or E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the engineered thymine-modifying polypeptide comprises the amino acid substitution of Y85A.
  • the engineered thymine-modifying polypeptide comprises the amino acid substitutions of L80V, K113E, K197E, S204A, and E237Q. In some embodiments, the engineered thymine-modifying polypeptide comprises the amino acid substitutions of K113E and S204G. In some embodiments, the engineered thymine-modifying polypeptide comprises the amino acid substitutions of K113E, S204A, and E237Q. In some embodiments, the engineered thymine-modifying polypeptide comprises the amino acid substitution of L80V. In some embodiments, the engineered thymine-modifying polypeptide comprises the amino acid substitutions of K197E, S204A, and E237Q.
  • the engineered thymine-modifying polypeptide comprises the amino acid substitutions of K113E, K197E, and E237Q. In some embodiments, the engineered thymine-modifying polypeptide is derived from Deinococcus radiodurans. In some embodiments, the engineered thymine-modifying polypeptide comprises an N-terminal deletion (such as a deletion of one or two amino acids) .
  • the engineered thymine-modifying polypeptide described herein comprises a DNA recognition domain, such as a Cas nuclease (e.g., nCas9 or dCas9) or TAL effector.
  • composition comprising a nucleic acid encoding an engineered polypeptide for DNA pyrimidine base modification described herein such as an engineered thymine-modifying polypeptide described herein.
  • a method of excising a DNA pyrimidine, such as thymine, from a nucleic acid molecule comprising delivering an engineered polypeptide for DNA pyrimidine base modification described herein, such as an engineered thymine-modifying polypeptide, or a nucleic acid composition encoding such a polypeptide.
  • nucleic acid editing system comprising an engineered polypeptide for DNA pyrimidine base modification described herein, such as an engineered thymine-modifying polypeptide comprising a DNA recognition domain, and an sgRNA.
  • a method of treating a disease associated with a DNA mutation in an individual comprising administering an engineered polypeptide for DNA pyrimidine base modification described herein, such as an engineered thymine-modifying polypeptide, or a nucleic acid composition encoding such a polypeptide, such that the engineered thymine-modifying polypeptide, or the nucleic acid composition, contacts a target nucleic acid and excises a target DNA pyrimidine base.
  • kits for modifying nucleic acids comprising an engineered polypeptide for DNA pyrimidine base modification described herein, such as an engineered thymine-modifying polypeptide, or a nucleic acid composition encoding such a polypeptide.
  • polypeptide and “protein,” as used herein, may be used interchangeably to refer to a polymer comprising amino acid residues, and are not limited to a minimum length. Such polymers may be translational fusions of two or more proteins. Such polymers may contain natural or non-natural amino acid residues, or combinations thereof, and include, but are not limited to, peptides, polypeptides, oligopeptides, dimers, trimers, and multimers of amino acid residues. Full-length polypeptides or proteins, and fragments thereof, are encompassed by this definition. The terms also include modified species thereof, e.g., post-translational modifications of one or more residues, for example, methylation, phosphorylation, glycosylation, sialylation, or acetylation.
  • polynucleotide refers to a polymeric form of nucleotides of any length, and may be either ribonucleotides (RNA) or deoxyribonucleotides (DNA) .
  • RNA: ribonucleotides
  • DNA: deoxyribonucleotides
  • this term includes, but is not limited to unless specifically stated to be so limited, single-, double-, or multi-stranded DNA or RNA, genomic DNA, mitochondrial DNA (mtDNA), cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases, or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
  • the backbone of the polynucleotide can comprise sugars and phosphate groups (as may typically be found in RNA or DNA) , or modified or substituted sugar or phosphate groups.
  • the backbone of the polynucleotide can comprise a polymer of synthetic subunits such as phosphoramidates and phosphorothioates, and thus can be an oligodeoxynucleoside phosphoramidate (P-NH2) or a mixed phosphoramidate-phosphodiester oligomer.
  • a double-stranded polynucleotide can be obtained from the single stranded polynucleotide product of chemical synthesis either by synthesizing the complementary strand and annealing the strands under appropriate conditions, or by synthesizing the complementary strand de novo using a DNA polymerase with an appropriate primer.
  • treatment is an approach for obtaining beneficial or desired results, including clinical results.
  • beneficial or desired clinical results include, but are not limited to, alleviating one or more symptoms of a disease associated with a DNA mutation, e.g., a mitochondrial disease, reducing one or more symptoms of a disease, preventing one or more symptoms of a disease, treating one or more symptoms of a disease, ameliorating one or more symptoms of a disease, delaying onset of one or more symptoms associated with having a disease, diminishing the extent of one or more symptoms of a disease, stabilizing the disease (e.g., preventing or delaying the worsening of the disease) , delaying or slowing the progression of the disease, ameliorating one or more symptoms of a disease, decreasing the dose of one or more other medications and/or treatments required to treat the disease, increasing the quality of life of the individual, and/or prolonging survival of the individual.
  • treatment is a reduction of a pathological consequence of a
  • the term “individual” refers to a mammal and includes, but is not limited to, human, bovine, horse, feline, canine, rodent, or primate. In some embodiments, the individual is human.
  • references to “about” a value or parameter herein includes (and describes) variations that are directed to that value or parameter per se. For example, description referring to “about X” includes description of “X. ”
  • engineered DNA pyrimidine base-modifying polypeptides for editing DNA comprising an engineered variant of a uracil-DNA glycosylase (UNG) .
  • the UNG is mutated to enable modification (via excision) of a DNA pyrimidine base such as thymine or cytosine.
  • the engineered DNA pyrimidine base-modifying polypeptides taught herein further comprise a DNA recognition domain, such as a double-stranded (ds) DNA binding polypeptide.
  • the engineered DNA pyrimidine base-modifying polypeptide comprises a dsDNA binding polypeptide fused to a UNG variant via translational fusion.
  • the components of the polypeptide are configured such that the engineered UNG variant is brought into proximity of a target DNA pyrimidine to catalyze, at least in part, a desired nucleotide base edit.
  • the engineered DNA pyrimidine base-modifying polypeptide is configured as a thymine-modifying system, e.g., generating a T-to-C, T-to-G, or T-to-A conversion.
  • the engineered DNA pyrimidine base-modifying polypeptide is configured as a cytosine-modifying system, e.g., generating a C-to-G, C-to-T, or C-to-A conversion.
  • UNGs are enzymes that natively cleave N-glycosidic bonds in DNA for the purpose of performing uracil base excision repair when a uracil (which is a nucleobase typically found in RNA) is improperly incorporated into DNA thereby helping to prevent mutagenesis.
  • UNGs work by positioning the uracil base out of the DNA double helix and cleaving the N-glycosidic bond thereby excising the uracil while keeping the sugar-phosphate backbone intact.
  • the engineered DNA pyrimidine base-modifying polypeptides described herein comprise UNG variants modified to excise DNA pyrimidines, such as thymine or cytosine.
  • the UNG variant comprises at least the catalytic domain of the UNG from which it is derived.
  • a UNG variant comprising one or more amino acid substitutions which enlarge the active site pocket such that the UNG variant is capable of excising a target thymine or a target cytosine.
  • the position of the one or more amino acid substitutions provided herein may be based on UNGs of different species.
  • One of ordinary skill in the art will readily understand how to convert an amino acid numbering from one UNG to another, such as across species, e.g., by comparing sequences and/or looking for sequence and/or structural homology.
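As an illustration of converting residue numbering between two UNGs, the following Python sketch maps a 1-based position in one sequence to the corresponding position in another, given a pairwise alignment supplied as two equal-length gapped strings. The function name `map_position` and the toy aligned fragments are hypothetical; they are not the sequences of SEQ ID NO: 1 or SEQ ID NO: 7, and in practice the alignment would come from a sequence or structure alignment tool.

```python
def map_position(aligned_src, aligned_dst, src_pos):
    """Map a 1-based residue number in the source sequence to the target.

    aligned_src and aligned_dst are rows of a pairwise alignment, with
    '-' marking gaps. Returns the 1-based target position, or None when
    the source residue is aligned to a gap.
    """
    if len(aligned_src) != len(aligned_dst):
        raise ValueError("aligned rows must have equal length")
    if src_pos < 1:
        raise ValueError("positions are 1-based")
    src_count = 0
    dst_count = 0
    for src_char, dst_char in zip(aligned_src, aligned_dst):
        if src_char != "-":
            src_count += 1
        if dst_char != "-":
            dst_count += 1
        if src_char != "-" and src_count == src_pos:
            return dst_count if dst_char != "-" else None
    raise ValueError("src_pos is beyond the end of the source sequence")


# Toy alignment (hypothetical): the tyrosine at source position 5
# lines up with the tyrosine at target position 3.
toy_src = "MKTAYH-L"
toy_dst = "--TAYHQL"
mapped = map_position(toy_src, toy_dst, 5)
```

The same lookup, run against a real alignment of two UNG orthologs, is how a position such as Y147 in one reference could be related to Y85 in another.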
  • the position of amino acid substitution is based on reference to the amino acid sequence set forth in SEQ ID NO: 1, a human UNG.
  • the UNG variant comprises a mutation of Y147 (based on SEQ ID NO: 1) to an amino acid having a smaller residue (such as determined by occupied volume or surface area) than tyrosine, wherein the UNG variant is capable of excising thymine.
  • the UNG variant comprises an amino acid substitution selected from a group consisting of Y147A, Y147S, Y147N, Y147G, and Y147C (based on SEQ ID NO: 1) , wherein the UNG variant is capable of excising thymine.
  • the UNG variant comprises an amino acid substitution of Y147A (based on SEQ ID NO: 1) .
  • the position of amino acid substitution is based on reference to the amino acid sequence set forth in SEQ ID NO: 7, a Deinococcus radiodurans UNG.
  • the UNG variant comprises an amino acid substitution selected from a group consisting of Y85A, Y85S, Y85N, Y85G, and Y85C (based on SEQ ID NO: 7) , wherein the UNG variant is capable of excising thymine.
  • the amino acid substitution is Y85A (based on SEQ ID NO: 7) .
  • the UNG variant comprises an amino acid substitution of N204D (based on SEQ ID NO: 1) , wherein the UNG variant is capable of excising cytosine.
  • the UNG variant further comprises one or more additional amino acid substitutions.
  • the one or more additional amino acid substitutions comprise substitutions in conserved motifs of UNG. Examples of conserved UNG motifs are the catalytic water-activating loop (143-GQDPYH-148 in hUNG), the Pro-rich loop, which compresses the DNA backbone 5’ to the lesion (165-PPPPS-169 in hUNG), the Ura-binding motif (201-LLLN-204 in hUNG), the Gly-Ser loop, which compresses the DNA backbone 3’ to the lesion (246-GS-247 in hUNG), and the Leu-intercalation loop, which penetrates the minor groove (268-HPSPLS-273 in hUNG).
  • the one or more amino acid substitutions comprise substitutions of one or more of L80V, K113E, K197E, S204A, S204G, or E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the UNG variant comprises amino acid substitutions of L80V, K113E, K197E, S204A, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the UNG variant comprises amino acid substitutions of K113E and S204G, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the UNG variant comprises amino acid substitutions of K113E, S204A, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the UNG variant comprises an amino acid substitution of L80V, wherein the position of the amino acid substitution is based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the UNG variant comprises amino acid substitutions of K197E, S204A, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the UNG variant comprises amino acid substitutions of K113E, K197E, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the UNG variant comprises amino acid substitutions of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C) and one or more of L80V, K113E, K197E, S204A, S204G, or E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the UNG variant comprises amino acid substitutions of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C), L80V, K113E, K197E, S204A, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the UNG variant comprises amino acid substitutions of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C), K113E, and S204G, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the UNG variant comprises amino acid substitutions of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C), K113E, S204A, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the UNG variant comprises amino acid substitutions of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C) and L80V, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the UNG variant comprises amino acid substitutions of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C), K197E, S204A, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • the UNG variant comprises amino acid substitutions of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C), K113E, K197E, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
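The substitution notation used above (wild-type residue, 1-based position in the reference sequence, replacement residue) can be illustrated with a minimal sketch. The function name `apply_substitutions` and the toy reference sequence are hypothetical and chosen only for illustration; the toy sequence is not SEQ ID NO: 7, so the positions in the example differ from those recited above.

```python
import re


def apply_substitutions(sequence, substitutions):
    """Apply point substitutions written like 'Y85A' to a protein sequence.

    Positions are 1-based and refer to the reference sequence passed in.
    The wild-type residue named in each substitution is checked against
    the reference before the replacement is made.
    """
    residues = list(sequence)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        wild_type = match.group(1)
        position = int(match.group(2))
        replacement = match.group(3)
        if not 1 <= position <= len(sequence):
            raise ValueError(f"position out of range: {sub}")
        if sequence[position - 1] != wild_type:
            raise ValueError(f"reference residue mismatch: {sub}")
        residues[position - 1] = replacement
    return "".join(residues)


# Hypothetical 10-residue reference, used only to show the numbering.
toy_reference = "MTDLYKEAGH"
toy_variant = apply_substitutions(toy_reference, ["L4V", "Y5A"])
print(toy_variant)
```

Checking the wild-type residue before substituting guards against applying positions from one reference numbering (for example, a full-length sequence) to a differently numbered sequence (for example, an N-terminally truncated one).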
  • the UNG variants described herein can be engineered to further comprise an N-terminal deletion relative to the UNG (such as a full-length UNG) from which it is derived.
  • the N-terminal deletion is a deletion of up to the first 30 amino acids based on reference to the amino acid sequence set forth in SEQ ID NO: 7, such as deletion of any of 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 amino acids.
  • the N-terminal deletion comprises a deletion of the first 2 amino acids.
  • a deletion of the first two amino acids based on reference to the amino acid sequence set forth in SEQ ID NO: 7 would comprise deleting a methionine and threonine from the N-terminus.
  • the UNG variant is at least about 70% identical to the wildtype sequence of a corresponding truncated version of the UNG.
  • the variant of a UNG has at least about 65%, such as at least about any of 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%, sequence identity to a corresponding portion of a naturally occurring UNG, such as a UNG from Deinococcus radiodurans (e.g., SEQ ID NO: 7) .
  • the UNG variant has at least about 65%, such as at least about any of 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%, sequence identity to SEQ ID NO: 1, including a corresponding portion of SEQ ID NO: 1.
  • the UNG variant has at least about 65%, such as at least about any of 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%, sequence identity to SEQ ID NO: 16, including a corresponding portion of SEQ ID NO: 16. In some embodiments, the UNG variant has at least about 65%, such as at least about any of 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%, sequence identity to SEQ ID NO: 4, including a corresponding portion of SEQ ID NO: 4.
  • the UNG variant has at least about 65%, such as at least about any of 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%, sequence identity to SEQ ID NO: 19, including a corresponding portion of SEQ ID NO: 19.
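As a rough illustration of how a percent sequence identity might be checked against the thresholds above, the sketch below scores two sequences that have already been aligned to equal length. The identity convention used here (identical residues divided by aligned columns in which neither sequence has a gap) is an assumption; the text above does not specify an alignment tool or scoring convention, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences.

    '-' marks an alignment gap. Columns containing a gap in either
    sequence are skipped (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


# Toy sequences differing at 2 of 10 positions.
identity = percent_identity("MTDLYKEAGH", "MTDVAKEAGH")
print(identity, identity >= 65.0)
```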
  • engineered UNG variants described herein may be derived from a specific species, including but not limited to mammals (human, mouse, swine, bovine, monkey, horse, etc.), bacteria (Escherichia coli, Deinococcus radiodurans, etc.), viruses (Poxviridae, Herpesviridae, etc.), and yeast.
  • an engineered UNG variant described herein is derived from Homo sapiens (UniprotKB No. P13051-2) , Escherichia coli (UniprotKB No.
  • the engineered UNG variant described herein is derived from Deinococcus radiodurans.
  • an engineered polypeptide for DNA pyrimidine base modification wherein the engineered polypeptide comprises a variant of a UNG and a DNA recognition domain.
  • the DNA recognition domain is a double-stranded (ds) DNA recognition domain.
  • the DNA recognition domain comprises a polypeptide (and in some embodiments comprises a nucleic acid component, such as a guide) configured to bind to a dsDNA, such as at a specific location to enable action of the engineered UNG variant at a specific target site.
  • the DNA recognition domain can be configured (e.g., is programmable) to bind to a specific location, including a specific strand, of dsDNA relative to the target DNA pyrimidine to be edited.
  • the DNA recognition domain does not comprise a nucleic acid component, e.g., the DNA recognition domain is a TALE or zinc finger, or a portion thereof capable of binding to the dsDNA.
  • the DNA recognition domain comprises a domain of a nucleic acid programmable endonuclease such as Cas9, CasX, CasY, Cpf1, C2c1, C2c2, or C2c3, that binds DNA.
  • the DNA recognition domain is a catalytically inactive form of a nickase, such as an endonuclease.
  • the DNA recognition domain does not comprise nickase activity.
  • the DNA recognition domain is a Cas9 protein.
  • Cas9 recognizes a protospacer adjacent motif (PAM) of 5’-NGG-3’.
  • Cas9 comprises a RuvC domain for cleavage of the non-target strand and a HNH domain for cleavage of the target strand.
  • the DNA recognition domain is nicking Cas9, i.e., nCas9.
  • nCas9 is a mutated form of Cas9 wherein an amino acid substitution disables cleavage of one strand of dsDNA.
  • this mutation is D10A in the RuvC domain, enabling cleavage of only the target strand.
  • this mutation is H840A in the HNH domain, enabling cleavage of only the non-target strand.
  • the DNA recognition domain is dead Cas9, i.e., dCas9.
  • dCas9 is a mutated form of Cas9 with no endonuclease activity, comprising mutations of D10A in the RuvC domain and H840A in the HNH domain.
  • D10A is based on the Streptococcus pyogenes sequence.
  • the DNA recognition domain is derived from Streptococcus pyogenes.
  • the DNA recognition domain comprises a guide (such as a guide RNA) for targeting a specific DNA site.
  • the guide combines a CRISPR RNA (crRNA, for targeting) and a tracrRNA (scaffolding to the Cas protein) in a single RNA molecule known as a single guide RNA (sgRNA).
  • sgRNA is known in the field, e.g., US Pat. No. 11,015,193, which is hereby incorporated by reference herein in its entirety.
  • DNA pyrimidine base-editing systems comprising a UNG variant and a DNA recognition domain.
  • the DNA pyrimidine base-editing system further comprises a nucleic acid for targeted editing and such will be apparent based on the type of DNA recognition domain used.
  • the DNA pyrimidine base-editing systems encompassed herein can be formed from any combination or arrangement of a described UNG variant and a DNA recognition domain, and, optionally, in some embodiments a targeting nucleic acid such as a sgRNA.
  • the DNA pyrimidine base-editing systems are configured such that a UNG variant is brought into proximity of a target DNA pyrimidine to catalyze, at least in part, a target thymine or cytosine nucleotide base edit.
  • the DNA pyrimidine base-editing system targets a thymine or cytosine on a target DNA strand.
  • the DNA pyrimidine base-editing system targets a thymine or cytosine in a non-strand specific manner, i.e., can excise a thymine or cytosine on either strand of DNA.
  • the provided systems are configured to target a thymine that is not in close proximity to another thymine on the opposite DNA strand, such as to avoid formation of a double strand break. In such embodiments, the provided systems are configured to target a thymine that is not within 2 base pairs of another thymine on the opposite DNA strand. Thus, in some aspects provided herein, is a selection step of picking a suitable target thymine for editing. In some embodiments, the provided systems are configured to target a cytosine that is not in close proximity to another cytosine on the opposite DNA strand, such as to avoid formation of a double strand break.
  • the provided systems are configured to target a cytosine that is not within 2 base pairs of another cytosine on the opposite DNA strand.
  • thus, in some aspects provided herein is a selection step of picking a suitable target cytosine for editing.
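The selection step described above can be sketched as a simple sequence check. A thymine on the opposite strand pairs with an adenine on the strand being read, so the sketch looks for an adenine within the window around the candidate thymine (and, analogously, for a guanine around a candidate cytosine). The function name is hypothetical, and reading "within 2 base pairs" as positions up to two away on either side of the target is an assumption.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_suitable_target(strand, index, base="T", window=2):
    """Return True if no `base` lies on the opposite strand near the target.

    strand: one DNA strand, written 5'->3', upper case.
    index:  0-based position of the candidate target base on that strand.
    A `base` on the opposite strand appears on this strand as its
    complement, so the check scans this strand for the complement.
    """
    if strand[index] != base:
        raise ValueError("index does not point at the requested base")
    partner = COMPLEMENT[base]
    start = max(0, index - window)
    stop = min(len(strand), index + window + 1)
    return partner not in strand[start:stop]


toy_strand = "GGCTCCATGG"
print(is_suitable_target(toy_strand, 3))  # T with no A within 2 bp
print(is_suitable_target(toy_strand, 7))  # T directly next to an A
```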
  • the DNA pyrimidine base-editing system comprises a UNG variant and a DNA recognition domain, wherein the UNG variant and DNA recognition domain form a complex configured such that the DNA recognition domain, when associated with a dsDNA, positions the UNG variant such that it can act on the target DNA pyrimidine.
  • when the DNA recognition domain uses an sgRNA, the sgRNA can be selected/designed to be complementary to a sequence in proximity to the target site.
  • Proper sgRNA design, including construction and/or selection for a desired specificity to limit off-target editing, is well known in the art (for example, Naeem M, et al, Cells, 9, 2020).
  • the UNG variant and DNA recognition domain are guided to the editing region by a nucleic acid, e.g., an sgRNA.
  • the target DNA pyrimidine is at a specified position on a protospacer targeted by the sgRNA.
  • the target DNA pyrimidine is at positions 1 to 30 base pairs, 5 to 25 base pairs, 10 to 20 base pairs, or 14 to 18 base pairs on the protospacer.
  • the target DNA pyrimidine on the dsDNA is 0-24 base pairs in length, including any of 1-20 base pairs in length, 1-10 base pairs in length, 10-20 base pairs in length, 10-16 base pairs in length, 14-20 base pairs in length, or 14-16 base pairs in length.
  • directly preceding the target protospacer is a protospacer adjacent motif (PAM) , which is required for Cas endonuclease activity.
  • the PAM sequence is NGG.
  • the PAM sequence is a relaxed NG.
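To make the protospacer position language concrete, the following sketch scans one strand for NGG PAMs and lists the pyrimidines that fall at chosen protospacer positions. Several points are assumptions not stated above: a 20-nucleotide protospacer, the usual S. pyogenes Cas9 arrangement in which the NGG lies immediately 3' of the protospacer on the scanned strand, and position 1 being the PAM-distal (5') end of the protospacer. The function name and toy sequence are hypothetical, and only the given strand is scanned.

```python
import re


def find_candidate_pyrimidines(strand, protospacer_len=20, window=(14, 18)):
    """List (protospacer_start, position, base) for pyrimidines in the window.

    strand: DNA strand written 5'->3', upper case.
    position is 1-based within the protospacer, counted from its 5' end
    (an assumed numbering convention).
    """
    candidates = []
    # A lookahead lets overlapping NGG motifs be found as well.
    for match in re.finditer(r"(?=[ACGT]GG)", strand):
        pam_start = match.start()
        proto_start = pam_start - protospacer_len
        if proto_start < 0:
            continue  # not enough sequence 5' of this PAM
        protospacer = strand[proto_start:pam_start]
        for position in range(window[0], window[1] + 1):
            base = protospacer[position - 1]
            if base in "CT":
                candidates.append((proto_start, position, base))
    return candidates


# Toy 20-nt protospacer with T at position 14 and C at position 15,
# followed by a TGG PAM.
toy_site = "AAAAAAAAAAAAATCAAAAA" + "TGG"
print(find_candidate_pyrimidines(toy_site))
```

For a relaxed NG PAM, the regular expression could be changed to `(?=[ACGT]G)`, which would report more candidate sites.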
  • the complex of a UNG variant and a DNA recognition domain can be configured in various ways, such as via direct fusion (e.g., as a single expressed polypeptide) , non-covalent interaction, or covalent linkage (e.g., via a polypeptide or non-polypeptide linker) .
  • the UNG variant is fused to the DNA recognition domain.
  • the DNA recognition domain is fused to the C-terminus of a UNG variant.
  • the DNA recognition domain is fused to the N-terminus of a UNG variant.
  • the DNA pyrimidine base-editing system comprises a linker associating a DNA recognition domain and a UNG variant.
  • the linker is a polypeptide linker.
  • additional adjustments may be included for configuring the position of the UNG variant, such as by adjusting a linker length connecting a DNA recognition domain and a UNG variant.
  • the present disclosure encompasses such variations of the DNA pyrimidine base-editing systems described herein, e.g., many variations of a pyrimidine base-editing system may be designed based on the teachings provided herein that perform the same DNA edit.
  • the DNA recognition domain is configured to bind 0-25 base pairs away, such as any of 10-20, 12-20, or 14-20 base pairs away, from a target base of a UNG variant described herein.
  • the linker comprises a polypeptide linker, such as a GGGGS linker.
  • the polypeptide linker is from 1-100 amino acids in length, such as any of 1-90 amino acids in length, 1-80 amino acids in length, 1-70 amino acids in length, or 1-65 amino acids in length.
  • the polypeptide linker is any of the following amino acids in length: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70 amino acids.
  • the linker is selected from a list composed of GS, SGGS, PAPAP, XTEN, or repeats thereof.
  • the linker is at least about 30 amino acids in length, such as a 32-amino acid linker or a 64-amino acid linker. In some embodiments, the linker is at least about 30 amino acids in length, such as a 32-amino acid linker or a 64-amino acid linker, and is selected from a list composed of GS, SGGS, PAPAP, XTEN, or repeats thereof. In some embodiments, the linker is a 64-amino acid linker. In some embodiments, the linker is a 32-amino acid linker. In some embodiments, the linker is GGGGS (SEQ ID NO: 27) . In some embodiments, the linker is GGGGSGGGGSGGGGS (SEQ ID NO: 28) .
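The linker and fusion arrangements described above can be illustrated with a small sketch that builds a linker from repeats of a unit and joins the parts in either orientation. The function names and the placeholder domain strings are hypothetical. The eight-repeat SGGS example only shows how a repeat unit reaches a given length; it is not asserted to be the 32-amino acid linker referred to above.

```python
def build_linker(unit, repeats):
    """Return a linker made of `repeats` copies of `unit`."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return unit * repeats


def fuse(ung_variant, linker, recognition_domain, ung_at_n_terminus=True):
    """Join the parts as one polypeptide string, N-terminus first."""
    parts = [ung_variant, linker, recognition_domain]
    if not ung_at_n_terminus:
        parts.reverse()
    return "".join(parts)


# Three GGGGS repeats give the 15-residue sequence listed as SEQ ID NO: 28.
g4s_x3 = build_linker("GGGGS", 3)
print(g4s_x3)

# Placeholder strings stand in for real domain sequences.
fusion = fuse("<UNG>", build_linker("SGGS", 8), "<nCas9>")
print(fusion)
```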
  • the DNA pyrimidine base-modifying system can generate a specific thymine conversion at a target site.
  • the conversion is a T-to-C, T-to-G, or T-to-A conversion.
  • when an edit occurs, the DNA pyrimidine base-modifying system generates a T-to-C conversion at a rate of about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, or about 75%.
  • when an edit occurs, the DNA pyrimidine base-modifying system generates a T-to-G conversion at a rate of about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, or about 65%. In certain embodiments, when an edit occurs, the DNA pyrimidine base-modifying system generates a T-to-A conversion at a rate of about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, or about 50%. In some embodiments, the DNA pyrimidine base-editing system can generate a specific cytosine conversion at a target site. In some embodiments, the conversion is a C-to-G, C-to-T, or C-to-A conversion. In some embodiments, the conversion is C-to-G.
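One way to read the "when an edit occurs" rates above is as the share of each conversion among edited outcomes only. The sketch below computes that share from base calls at a single target position. This reading, the function name, and the read counts are all assumptions made for illustration; the counts are invented and are not results from the disclosure.

```python
from collections import Counter


def outcome_rates(observed_bases, reference_base="T"):
    """Percent of each conversion among edited reads at one position.

    observed_bases: iterable of base calls at the target position,
    one per sequencing read. Reads matching the reference are ignored.
    """
    edited = [base for base in observed_bases if base != reference_base]
    if not edited:
        return {}
    counts = Counter(edited)
    return {
        f"{reference_base}-to-{base}": 100.0 * count / len(edited)
        for base, count in sorted(counts.items())
    }


# Invented counts: 60 unedited reads and 40 edited reads.
toy_calls = "T" * 60 + "C" * 20 + "G" * 12 + "A" * 8
print(outcome_rates(toy_calls))
```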
  • the DNA recognition domains of a pyrimidine base-editing system may be configured to position an associated UNG variant relative to one or more target DNA pyrimidines.
  • the DNA recognition domain recognizes DNA that occurs at a single target site.
  • the DNA recognition domain recognizes DNA that occurs at different target sites (e.g., the target sequence recognized by the DNA recognition domain occurs at two or more different locations of the genome) .
  • the dsDNA targeted for editing by the DNA pyrimidine base-editing systems described herein can be any type of double-stranded DNA.
  • the dsDNA is a circularized dsDNA.
  • the dsDNA is a B-DNA conformation.
  • the dsDNA is an A-DNA conformation.
  • the dsDNA is a Z-DNA conformation.
  • Nucleotide editing can occur outside of the editing region, such as off-target editing.
  • the invention described herein exhibits significantly lower off-target editing compared to existing deaminase-based editing tools.
  • the systems provided herein have an off-target editing rate of 30% or less, such as any of about 15% or less, 10% or less, 9% or less, 8% or less, 7% or less, 6% or less, 5% or less, 4% or less, 3% or less, 2% or less, or 1% or less.
  • the systems provided herein have an off-target editing rate of about 1-30%, such as any of about 1-15%, about 1-10%, about 1-9%, about 1-8%, about 1-7%, about 1-6%, about 1-5%, about 1-4%, about 1-3%, about 1-2%, or about 1%.
  • the off-target editing is based on off-target DNA editing.
  • the off-target editing is based on off-target RNA editing.
  • the off-target editing is based on off-target DNA and RNA editing.
  • off-target editing rates are determined by high-throughput sequencing.

D. Additional components
  • the DNA pyrimidine base-editing systems provided herein may comprise one or more additional features, such as to aid in delivery and/or function of the engineered DNA pyrimidine base-modifying polypeptides.
  • Mitochondria are unique sub-cellular organelles that possess their own DNA and RNA and mechanisms for their translation, yet they express only 10% of the proteins that they contain. Instead, mitochondria rely in part on the translation products of nuclear genes. These products traverse the cytoplasm and are ‘imported’ into the mitochondria via a system of outer- and inner-membrane-bound protein complexes, where they are delivered to the appropriate mitochondrial compartment and rendered active.
  • This mitochondrial import process is regulated by an N-terminal pre-sequence, encoded in the nuclear gene of the protein, that tags the protein with a signal telling the import machinery where the protein should be delivered. These sequences are known as mitochondrial location signal (MLS) peptides, which can also be referred to as mitochondrial targeting signal (MTS) peptides.
  • the DNA pyrimidine base-editing system, or one or more components thereof, comprises a mitochondrial location signal (MLS), which can also be referred to as a mitochondrial targeting signal (MTS).
  • the MLS is fused to a UNG variant.
  • the MLS is fused to a DNA recognition domain.
  • any one or more, including all, units of the engineered DNA pyrimidine base-modifying polypeptide may comprise a MLS.
  • the MLS is about 10 to about 80 amino acids in length, such as about any of 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 40-45, 45-50, 50-55, 55-60, 60-65, 65-70, 70-75, or 75-80 amino acids in length.
  • the MLS comprises an amphipathic helix structural motif.
  • the MLS can be enriched in basic (e.g., Arg, Lys) , hydroxylated (e.g., Ser, Thr) and/or hydrophobic (e.g., Ala, Leu, Ile) residues.
  • the MLS comprising an amphipathic helix structural motif exhibits alternating hydrophobic and hydrophilic segments.
  • at least about 20% (such as at least about any of 30%, 40%, 50%, or 60%) of the amino acid residues in the MLS are basic amino acid residues.
  • At least about 20% (such as at least about any of 30%, 40%, 50%, or 60%) of the amino acid residues in the MLS are hydrophobic amino acid residues.
  • the MLS is amphipathic, for example forms an amphipathic helix.
  • the MLS comprises an alternating pattern of hydrophobic and basic residues.
  • the MLS is derived from a protein selected from the group consisting of ATP synthase, cytochrome C oxidase peptide VIII, Su9, and HSP60.
  • the MLS can selectively direct a compound to an outer membrane, an inner membrane, an inter-membrane space, or a mitochondrial matrix.
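The length and composition criteria given above for an MLS can be checked with a short sketch. The residue sets are taken from the examples listed above (Arg and Lys as basic; Ala, Leu, and Ile as hydrophobic) and are deliberately narrow; a broader hydrophobic set could equally be used. The function name and the toy peptide are hypothetical, and the toy peptide is not a known mitochondrial signal.

```python
BASIC = set("RK")          # Arg, Lys
HYDROPHOBIC = set("ALI")   # Ala, Leu, Ile


def mls_composition(peptide):
    """Summarize length and residue composition of a candidate MLS."""
    length = len(peptide)
    if length == 0:
        raise ValueError("empty peptide")
    basic = sum(residue in BASIC for residue in peptide)
    hydrophobic = sum(residue in HYDROPHOBIC for residue in peptide)
    return {
        "length": length,
        "length_ok": 10 <= length <= 80,            # about 10 to about 80 residues
        "basic_pct": 100.0 * basic / length,
        "hydrophobic_pct": 100.0 * hydrophobic / length,
    }


# Invented 15-residue peptide used only to exercise the function.
summary = mls_composition("MLRKALLRKALIRKA")
print(summary)
print(summary["basic_pct"] >= 20.0 and summary["hydrophobic_pct"] >= 20.0)
```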
  • the DNA pyrimidine base editing system does not comprise a deaminase domain having deamination activity.
  • the engineered DNA pyrimidine base-modifying polypeptides, or systems thereof do not comprise a deaminase, or a functional unit therefrom.
  • an engineered thymine-modifying polypeptide comprising a variant of a uracil-DNA glycosylase (UNG) comprising: (a) an amino acid substitution of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C) ; and (b) L80V, K113E, K197E, S204A, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • an engineered thymine-modifying polypeptide comprising a variant of a uracil-DNA glycosylase (UNG) comprising: (a) an amino acid substitution of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C) ; and (b) L80V, K113E, K197E, S204A, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7, and a DNA recognition domain.
  • the Y85 substitution is Y85A.
  • the DNA recognition domain is a Cas nuclease (such as nCas9 or dCas9) or a TAL effector.
  • the engineered thymine-modifying polypeptide comprises a linker linking the UNG variant and DNA recognition domain (such as a polypeptide linker, e.g., a 64-amino acid linker) .
  • the UNG variant comprises an N-terminal deletion relative to the full-length UNG from which it is derived, such as deletion of 1 or 2 amino acids.
  • the engineered thymine-modifying polypeptide comprises, including is or consists of, SEQ ID NO: 9.
  • the UNG is derived from Deinococcus radiodurans.
  • an engineered thymine-modifying polypeptide comprising a variant of a uracil-DNA glycosylase (UNG) comprising: (a) an amino acid substitution of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C) ; and (b) K113E and S204G, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • an engineered thymine-modifying polypeptide comprising a variant of a uracil-DNA glycosylase (UNG) comprising: (a) an amino acid substitution of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C) ; and (b) K113E and S204G, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7, and a DNA recognition domain.
  • the Y85 substitution is Y85A.
  • the DNA recognition domain is a Cas nuclease (such as nCas9 or dCas9) or a TAL effector.
  • the engineered thymine-modifying polypeptide comprises a linker linking the UNG variant and DNA recognition domain (such as a polypeptide linker, e.g., a 64-amino acid linker) .
  • the UNG variant comprises an N-terminal deletion relative to the full-length UNG from which it is derived, such as deletion of 1 or 2 amino acids.
  • the engineered thymine-modifying polypeptide comprises, including is or consists of, SEQ ID NO: 10.
  • the UNG is derived from Deinococcus radiodurans.
  • an engineered thymine-modifying polypeptide comprising a variant of a uracil-DNA glycosylase (UNG) comprising: (a) an amino acid substitution of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C) ; and (b) K113E, S204A, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • an engineered thymine-modifying polypeptide comprising a variant of a uracil-DNA glycosylase (UNG) comprising: (a) an amino acid substitution of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C) ; and (b) K113E, S204A, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7, and a DNA recognition domain.
  • the Y85 substitution is Y85A.
  • the DNA recognition domain is a Cas nuclease (such as nCas9 or dCas9) or a TAL effector.
  • the engineered thymine-modifying polypeptide comprises a linker linking the UNG variant and DNA recognition domain (such as a polypeptide linker, e.g., a 64-amino acid linker) .
  • the UNG variant comprises an N-terminal deletion relative to the full-length UNG from which it is derived, such as deletion of 1 or 2 amino acids.
  • the engineered thymine-modifying polypeptide comprises, including is or consists of, SEQ ID NO: 12.
  • the UNG is derived from Deinococcus radiodurans.
  • an engineered thymine-modifying polypeptide comprising a variant of a uracil-DNA glycosylase (UNG) comprising: (a) an amino acid substitution of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C) ; and (b) L80V, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • an engineered thymine-modifying polypeptide comprising a variant of a uracil-DNA glycosylase (UNG) comprising: (a) an amino acid substitution of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C) ; and (b) L80V, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7, and a DNA recognition domain.
  • the Y85 substitution is Y85A.
  • the DNA recognition domain is a Cas nuclease (such as nCas9 or dCas9) or a TAL effector.
  • the engineered thymine-modifying polypeptide comprises a linker linking the UNG variant and DNA recognition domain (such as a polypeptide linker, e.g., a 64-amino acid linker) .
  • the UNG variant comprises an N-terminal deletion relative to the full-length UNG from which it is derived, such as deletion of 1 or 2 amino acids.
  • the engineered thymine-modifying polypeptide comprises, including is or consists of, SEQ ID NO: 13.
  • the UNG is derived from Deinococcus radiodurans.
  • an engineered thymine-modifying polypeptide comprising a variant of a uracil-DNA glycosylase (UNG) comprising: (a) an amino acid substitution of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C) ; and (b) K197E, S204A, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • an engineered thymine-modifying polypeptide comprising a variant of a uracil-DNA glycosylase (UNG) comprising: (a) an amino acid substitution of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C); and (b) K197E, S204A, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7, and a DNA recognition domain.
  • the Y85 substitution is Y85A.
  • the DNA recognition domain is a Cas nuclease (such as nCas9 or dCas9) or a TAL effector.
  • the engineered thymine-modifying polypeptide comprises a linker linking the UNG variant and DNA recognition domain (such as a polypeptide linker, e.g., a 64-amino acid linker) .
  • the UNG variant comprises an N-terminal deletion relative to the full-length UNG from which it is derived, such as deletion of 1 or 2 amino acids.
  • the engineered thymine-modifying polypeptide comprises, including is or consists of, SEQ ID NO: 14.
  • the UNG is derived from Deinococcus radiodurans.
  • an engineered thymine-modifying polypeptide comprising a variant of a uracil-DNA glycosylase (UNG) comprising: (a) an amino acid substitution of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C) ; and (b) K113E, K197E, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7.
  • an engineered thymine-modifying polypeptide comprising a variant of a uracil-DNA glycosylase (UNG) comprising: (a) an amino acid substitution of Y85 (such as any of Y85A, Y85S, Y85N, Y85G, and Y85C) ; and (b) K113E, K197E, and E237Q, wherein the positions of amino acid substitutions are based on reference to the amino acid sequence set forth in SEQ ID NO: 7, and a DNA recognition domain.
  • the Y85 substitution is Y85A.
  • the DNA recognition domain is a Cas nuclease (such as nCas9 or dCas9) or a TAL effector.
  • the engineered thymine-modifying polypeptide comprises a linker linking the UNG variant and DNA recognition domain (such as a polypeptide linker, e.g., a 64-amino acid linker) .
  • the UNG variant comprises an N-terminal deletion relative to the full-length UNG from which it is derived, such as deletion of 1 or 2 amino acids.
  • the engineered thymine-modifying polypeptide comprises, including is or consists of, SEQ ID NO: 15.
  • the UNG is derived from Deinococcus radiodurans.
  • polynucleotide forms of engineered DNA pyrimidine base-modifying polypeptides described herein are provided herein.
  • the disclosure provided herein covers a multitude of formats capable of introducing functional engineered DNA pyrimidine base-modifying polypeptides described herein in a cell, including different types of polynucleotides (e.g., DNA or RNA, such as circular RNA) and different designs of polynucleotides.
  • “Introducing” or “introduction” used herein in reference to delivering engineered DNA pyrimidine base-modifying polypeptides means delivering one or more components of a DNA pyrimidine base-modifying polypeptide, or a precursor thereof (e.g., one or more polynucleotides encoding an engineered DNA pyrimidine base-modifying polypeptide or a component thereof) , to a cell.
  • the compositions and methods of the present application can employ many delivery systems, including but not limited to, viral, liposome, electroporation, nanoparticle, microinjection and conjugation, to achieve the introduction of the construct as described herein into a cell.
  • Non-viral vector delivery systems include DNA plasmids, RNA (e.g., a transcript of a construct described herein) , naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome.
  • Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes for delivery to the cell.
  • the polynucleotides taught herein encoding an engineered DNA pyrimidine base-modifying polypeptide may be one or more polynucleotides.
  • introducing the engineered DNA pyrimidine base-modifying polypeptide may comprise introducing two or more different polynucleotides into the cell, wherein said two or more different polynucleotides may be introduced simultaneously, sequentially, or concurrently.
  • Methods of non-viral delivery of one or more components of an engineered DNA pyrimidine base-modifying polypeptide, including nucleic acids, include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, electroporation, nanoparticles, exosomes, microvesicles, gene-gun delivery, naked DNA, and artificial virions.
  • the polynucleotides described herein comprise additional features useful for expression of a nucleobase editor system in a cell, such as a promoter sequence.
  • a promoter can be constitutively active, meaning that the promoter is always active in a given cellular context, or conditionally active, meaning that the promoter is only active in the presence of a specific condition.
  • a conditional promoter may only be active in the presence of a specific protein that connects a protein associated with a regulatory element in the promoter to the basic transcriptional machinery, or only in the absence of an inhibitory molecule.
  • a subclass of conditionally active promoters are inducible promoters that require the presence of a small molecule “inducer” for activity.
  • inducible promoters include, but are not limited to, arabinose-inducible promoters, Tet-on promoters, and tamoxifen-inducible promoters.
  • constitutive, conditional, and inducible promoters are well known to the skilled artisan, and the skilled artisan will be able to ascertain a variety of such promoters useful in carrying out the instant invention, which is not limited in this respect.
  • RNA or DNA viral based systems for the delivery of nucleic acids have high efficiency in targeting a virus to specific cells and trafficking the viral payload to the cellular nuclei.
  • delivery comprises introducing a viral vector (such as lentiviral vector) encoding the nucleic acid(s) to the cell.
  • the viral vector is an AAV, e.g., AAV8.
  • delivery comprises introducing a plasmid encoding one or more engineered DNA pyrimidine base-modifying polypeptide components to the cell.
  • delivery comprises introducing (e.g., by electroporation) one or more engineered DNA pyrimidine base-modifying polypeptide components into the cell.
  • delivery comprises transfection of one or more engineered DNA pyrimidine base-modifying polypeptide components into the cell.
  • the polynucleotide such as the polynucleotide introduced to a cell, is DNA or RNA.
  • the RNA is linear RNA.
  • the RNA is circular RNA.
  • the linear RNA is capable of forming a circular RNA.
  • the circularization can be performed, for example, by using the Tornado expression system ("Twister-optimized RNA for durable overexpression") as described in Litke, J.L. & Jaffrey, S.R. Highly efficient expression of circular RNA aptamers in cells using autocatalytic transcripts. Nat Biotechnol 37, 667-675 (2019), which is hereby incorporated herein by reference in its entirety.
  • Tornado-expressed transcripts contain an RNA of interest flanked by Twister ribozymes.
  • a twister ribozyme is a catalytic RNA sequence that is capable of self-cleavage.
  • the ribozymes rapidly undergo autocatalytic cleavage, leaving termini that are ligated by an RNA ligase.
  • Non-limiting examples of RNA ligase include: RtcB, T4 RNA Ligase 1, T4 RNA Ligase 2, Rnl3 and Trl1.
  • the RNA ligase is expressed endogenously in the cell.
  • the RNA ligase is RNA ligase RtcB.
  • the method further comprises introducing an RNA ligase (e.g., RtcB) into the cell.
  • the RNA is circularized before being introduced to the cell.
  • the RNA is chemically synthesized.
  • the RNA is circularized through in vitro enzymatic ligation (e.g., using RNA or DNA ligase) or chemical ligation (e.g., using cyanogen bromide or a similar condensing agent) .
  • the polynucleotides described herein comprise additional features useful for expression of an engineered DNA pyrimidine base-modifying polypeptide in a cell, such as a promoter sequence.
  • provided herein are methods of using the engineered DNA pyrimidine base-modifying polypeptides and systems described herein. In some embodiments, provided is a method of excising a target thymine from a nucleic acid molecule. In other aspects, provided herein are methods of excising a target cytosine from a nucleic acid molecule.
  • a target DNA pyrimidine such as a target thymine or a target cytosine
  • the method comprising contacting at least one nucleic acid molecule of the one or more nucleic acid molecules with a pyrimidine base-editing system described herein.
  • the DNA pyrimidine base-editing system, or at least a component thereof is in a polypeptide form.
  • the DNA pyrimidine base-editing system is in a polynucleotide form, such as one or more polynucleotides encoding the engineered DNA pyrimidine base-modifying polypeptide, or a component thereof.
  • the one or more polynucleotides are configured such that the associated polypeptide of the engineered DNA pyrimidine base-modifying polypeptides is expressed in the cell.
  • the method comprises delivering one or more nucleic acids encoding a DNA pyrimidine base-editing system described herein.
  • As relevant to certain methods provided herein, specificity of a Cas endonuclease is enabled by guide RNA.
  • CRISPR RNA (crRNA; targeting) and tracrRNA (scaffold for the Cas protein) can be synthetically produced as a single RNA molecule known as single guide RNA (sgRNA).
  • the methods comprise, or further comprise, delivering an sgRNA to a cell.
  • a method of modifying a target DNA pyrimidine (such as a target thymine or a target cytosine) in a nucleic acid sequence present in one or more nucleic acid molecules, the method comprising contacting at least one nucleic acid molecule of the one or more nucleic acid molecules with an engineered DNA pyrimidine-modifying polypeptide described herein comprising a DNA recognition domain, wherein the DNA recognition domain of said engineered DNA pyrimidine-modifying polypeptide is configured to associate with the at least one nucleic acid molecule such that the UNG variant of the engineered DNA pyrimidine-modifying polypeptide is positioned to modify the target DNA pyrimidine.
  • the method comprises a method of modifying a target thymine in a nucleic acid sequence present in one or more nucleic acid molecules, wherein the target thymines in the one or more nucleic acid molecules undergo a T-to-C, T-to-G, or T-to-A modification.
  • an NGG PAM site is 14 to 19 base pairs away from the target thymine.
  • an NGG PAM site is 14 to 19 base pairs away from the target thymine, wherein counting starts at the first base outside of the NGG PAM in a protospacer, and wherein the NGG PAM site is associated with DNA recognized by the DNA recognition domain, such as a Cas nuclease.
  • a relaxed NG PAM site is 14 to 19 base pairs away from the target thymine. In some embodiments, a relaxed NG PAM site is 14 to 19 base pairs away from the target thymine, wherein counting starts at the first base outside of the NG PAM in a protospacer, and wherein the relaxed NG PAM site is associated with DNA recognized by the DNA recognition domain, such as a Cas nuclease.
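  • The PAM-distance rule above can be illustrated with a minimal sketch. It assumes a forward-strand NGG PAM, a 20-nt protospacer, and that distance 1 is the protospacer base immediately 5' of the PAM; the function name `thymines_in_window` and its parameters are hypothetical and are not part of the described system.

```python
import re


def thymines_in_window(seq, protospacer_len=20, window=(14, 19)):
    """List (pam_start, thymine_index, distance) for forward-strand NGG PAMs.

    Assumption: distance 1 is the protospacer base directly 5' of the PAM,
    and distances grow moving away from the PAM.
    """
    seq = seq.upper()
    hits = []
    # A zero-width lookahead reports overlapping PAM candidates as well.
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        if pam_start < protospacer_len:
            continue  # not enough upstream sequence for a full protospacer
        for distance in range(window[0], window[1] + 1):
            idx = pam_start - distance
            if seq[idx] == "T":
                hits.append((pam_start, idx, distance))
    return hits


if __name__ == "__main__":
    # Toy sequence: a single T placed 18 bases upstream of an AGG PAM.
    print(thymines_in_window("AAT" + "A" * 17 + "AGG"))
```

A relaxed NG PAM could be handled by changing the pattern to `(?=[ACGT]G)`; reverse-strand PAMs would need the reverse complement to be scanned separately.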
  • the efficiency of editing of a target DNA nucleotide base is at least about 2%, such as at least about any of 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, or 65%.
  • the method exhibits an editing efficiency at least 2-fold greater than that of a method comprising delivering an engineered thymine-modifying polypeptide which does not comprise one or more amino acid substitutions of L80V, K113E, K197E, S204A, S204G, or E237Q, to a cell.
  • the efficiency of editing is determined by Sanger sequencing.
  • the efficiency of editing is determined by next-generation sequencing.
  • the method has a low off-target editing rate. In some embodiments, the method has lower than about 1% (e.g., no more than about any one of 0.5%, 0.1%, 0.05%, 0.01%, 0.001% or lower) editing efficiency on a non-target DNA nucleotide base as compared to the target DNA nucleotide base. In some embodiments, the method does not edit non-target DNA nucleotide bases.
  • the methods comprise delivering a pyrimidine base-editing system to a cellular location.
  • the cellular location is the cytoplasm.
  • the cellular location is a mitochondrion and the engineered DNA pyrimidine base-modifying polypeptide comprises a MLS.
  • the method can be practiced in a variety of cell lines.
  • the cell line is selected from a group consisting of human cell lines HEK293T and HCT116, mouse cell line N2A, and monkey cell lines NIH3T3 and Cos-7.
  • the cell line is HEK293T.
  • the method of modifying a target DNA pyrimidine in a nucleic acid sequence comprises selecting a target DNA pyrimidine that is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 base pairs away from an editable base on the opposite strand.
  • the method results in low cytotoxicity.
  • cytotoxicity can be determined by percent of viable cells 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 days (e.g., 3 days) after transfection with an editor system.
  • the methods of modifying a target DNA pyrimidine described herein result in less than about a 1.5-fold, less than about a 1.4-fold, less than about a 1.3-fold, less than about a 1.2-fold, or less than about a 1.1-fold decrease in viable cell count compared to untreated cells.
  • the methods of modifying a target DNA pyrimidine described herein result in about no decrease in viable cell count compared to untreated cells.
  • cytotoxicity can be determined by cell growth/proliferation, as measured by optical density, 72 hours after transfection with an editor system.
  • the methods of modifying a target DNA pyrimidine described herein result in less than about a 1.2-fold or less than about a 1.1-fold decrease in optical density at 72 hours compared to untreated cells.
  • the methods of modifying a target DNA pyrimidine described herein result in about no decrease in optical density at 72 hours compared to untreated cells.
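  • The fold-decrease thresholds above can be checked with a small helper. This is a sketch under the assumption that "fold decrease" means the untreated readout divided by the treated readout (viable cell count or optical density); the names `fold_decrease` and `within_limit` are illustrative only.

```python
def fold_decrease(untreated, treated):
    """Fold decrease of a viability readout relative to untreated cells."""
    if untreated <= 0 or treated <= 0:
        raise ValueError("readouts must be positive")
    return untreated / treated


def within_limit(untreated, treated, max_fold=1.2):
    """True if the decrease is smaller than the chosen fold threshold."""
    return fold_decrease(untreated, treated) < max_fold


if __name__ == "__main__":
    # Illustrative optical densities, not measured data.
    print(fold_decrease(1.10, 1.00), within_limit(1.10, 1.00))
```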
  • the methods of editing a target nucleotide in a cell using a pyrimidine base-editing system provided herein are performed to create a cell model.
  • the target nucleotide is the site of a known SNP, wherein the nucleobase editor system is configured to edit the target nucleotide to revert the SNP to the wild-type nucleotide, create the SNP, or adjust the SNP associated with a disease to another nucleotide base.
  • a method of treating a disease comprising administering to the individual a pyrimidine base-editing system described herein, or a precursor thereof.
  • the DNA pyrimidine base-editing system is configured to edit a base associated with the disease.
  • the DNA pyrimidine base-editing system is configured to edit a base associated with treatment, in whole or in part, of the disease.
  • the DNA pyrimidine base-editing system described herein, or a precursor thereof, administered to an individual is formulated for intracellular delivery.
  • An exemplary disease-relevant mutation that can be corrected by the provided fusion proteins in vitro or in vivo is a mutation resulting in Hurler syndrome.
  • Hurler syndrome is caused by the presence of a premature stop codon on exon 9 of IDUA gene, which prevents the production of functional IDUA protein.
  • the method of treating a disease comprises contacting a cell carrying a mutation to be corrected, e.g., a GM06214 (IDUA W402X) cell derived from a patient with Hurler syndrome, with a pyrimidine base-editing system described herein.
  • the method of treating a disease results in at least about 50%, at least about 45%, at least about 40%, at least about 35%, at least about 30%, at least about 25%, or at least about 20% editing efficiency of the targeted site.
  • the method of treating a disease results in restoration of IDUA catalytic activity in a GM06214 (IDUA W402X ) cell derived from patient with Hurler syndrome.
  • the restoration of IDUA catalytic activity is at least about 3-fold, at least about 2.5-fold, at least about 2-fold, or at least about 1.5-fold.
  • A3140G H1047R
  • the phosphoinositide-3-kinase, catalytic alpha subunit (PI3KCA) protein acts to phosphorylate the 3-OH group of the inositol ring of phosphatidylinositol.
  • the PI3KCA gene has been found to be mutated in many different carcinomas, and thus it is considered to be a potent oncogene.
  • the A3140G mutation is present in several NCI-60 cancer cell lines, such as, for example, the HCT116, SKOV3, and T47D cell lines, which are readily available from the American Type Culture Collection (ATCC).
  • a cell carrying a mutation to be corrected e.g., a cell carrying a point mutation, e.g., an A3140G point mutation in exon 20 of the PI3KCA gene, resulting in a H1047R substitution in the PI3KCA protein
  • an expression construct encoding a Cas9 DNA-editing fusion protein and an appropriately designed sgRNA targeting the fusion protein to the respective mutation site in the encoding PI3KCA gene.
  • Control experiments can be performed where the sgRNAs are designed to target the fusion enzymes to non-C residues that are within the PI3KCA gene.
  • Genomic DNA of the treated cells can be extracted, and the relevant sequence of the PI3KCA genes PCR amplified and sequenced to assess the activities of the fusion proteins in human cell culture.
  • the target nucleotide sequence may comprise a target base (e.g., a point mutation) associated with a disease, disorder, or condition, such as sickle cell anemia, Fanconi anemia, ectodermal dysplasia skin fragility syndrome, lattice corneal dystrophy Type III, or Noonan syndrome.
  • the target sequence may comprise a T to A point mutation associated with a disease, disorder, or condition, and wherein the methylation of the mutant A base results in mismatch repair-mediated correction to a sequence that is not associated with a disease, or disorder, or condition.
  • the target sequence may instead comprise an A to T point mutation associated with a disease, disorder, or condition, and wherein the methylation of the A base paired with the mutant T results in mismatch repair-mediated correction to a sequence that is not associated with a disease, or disorder, or condition.
  • the target sequence may encode a protein, and where the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to a wild-type codon.
  • the target sequence may also be at a splice site, and the point mutation results in a change in the splicing of an mRNA transcript as compared to a wild-type transcript.
  • the target may be at a non-coding sequence of a gene, such as a promoter, and the point mutation results in increased or decreased expression of the gene.
  • Exemplary target genes include HBB, in which an A to T point mutation at residue 334 results in a sickle cell anemia phenotype; and FANCC, in which an A to T point mutation at residue 456 results in a Fanconi anemia phenotype.
  • Additional target genes include TGFBI (associated with lattice corneal dystrophy type III), PKP1 (associated with ectodermal dysplasia skin fragility syndrome), KRAS and SOS1 (both associated with Noonan syndrome), for which the disease phenotype is frequently caused by T:A to A:T point mutations.
  • the edit results in alleviation of a premature stop codon.
  • the edit is a TAG-to-XAG mutation in exon 9 of an IDUA gene.
  • the method results in restoration of IDUA catalytic activity.
  • the disease is Hurler syndrome.
  • the engineered DNA pyrimidine modifying polypeptide targets a splicing site on a gene.
  • the disease is Duchenne Muscular Dystrophy.
  • the DNA pyrimidine base editing system has a first editing efficiency (such as measured by an editing percentage) in a first cell type and a second editing efficiency in a second cell type, wherein the first editing efficiency is different than the second editing efficiency.
  • the nucleobase editor system may be configured to have cell type or tissue specificity, wherein the editing efficiency of the nucleobase editor system is higher in a targeted cell type or tissue and lower or not substantially occurring (such as an editing percentage of about 5% or less) in a different cell type or tissue.
  • provided herein, in certain aspects, are kits, medicines, and compositions of the DNA pyrimidine base-editing system taught herein.
  • kits for a pyrimidine base-editing system comprising: a UNG variant described herein and, optionally, a DNA recognition domain, such as a UNG variant for thymine or cytosine editing.
  • kit for a pyrimidine base-editing system comprising one or more polynucleotides encoding a UNG variant described herein and, optionally, a DNA recognition domain, such as a UNG variant for thymine or cytosine editing.
  • Kits provided herein may include one or more containers, and instructions for use thereof according to the methods provided herein. Instructions supplied in the kits of the invention are typically written instructions on a label or package insert (e.g., a paper sheet included in the kit), but machine-readable instructions (e.g., instructions carried on a magnetic or optical storage disk) are also acceptable.
  • kits provided herein are in suitable packaging.
  • suitable packaging includes, but is not limited to, vials, bottles, jars, flexible packaging (e.g., sealed Mylar or plastic bags), and the like. Kits may optionally provide additional components such as buffers and interpretative information.
  • the present application thus also provides articles of manufacture, which include vials (such as sealed vials) , bottles, jars, flexible packaging, and the like.
  • the medicine, composition, or unit dosage form comprising a pyrimidine base-editing system described herein.
  • the DNA pyrimidine base-editing system comprises a UNG variant described herein and, optionally, a DNA recognition domain, such as a UNG variant for thymine or cytosine editing.
  • Protein sequences (*engineered amino acid):
  • SEQ ID NO: 1: UNG (Homo sapiens)
  • SEQ ID NO: 2: UNG Y147A (Homo sapiens; *engineered amino acid)
  • SEQ ID NO: 3: UNG N204D (Homo sapiens; *engineered amino acid)
  • SEQ ID NO: 4: UNG (Escherichia coli)
  • SEQ ID NO: 5: UNG Y66A (Escherichia coli; *engineered amino acid)
  • SEQ ID NO: 6: UNG N123D (Escherichia coli; *engineered amino acid)
  • SEQ ID NO: 7: UNG (Deinococcus radiodurans)
  • SEQ ID NO: 8: UNG Y85A (Deinococcus radiodurans; *engineered amino acid)
  • SEQ ID NO: 9: UNG L80V, Y85A, K113E, K197E, S204A, E237Q, N* (Deinococcus radiodurans; *engineered amino acid)
  • PCR was performed using PrimeSTAR GXL DNA polymerase (TaKaRa) or Q5 Hot Start High-Fidelity DNA Polymerase (NEB) .
  • Wild-type hUNG (SEQ ID NO: 1)
  • EcUNG (SEQ ID NO: 4)
  • DrUNG (SEQ ID NO: 7)
  • HHV1_UNG (SEQ ID NO: 16)
  • VACV_UNG (SEQ ID NO: 19)
  • Cas9 and other genes were synthesized as gene blocks and codon optimized for mammalian expression (Tsingke Biotechnology Co., Ltd. ) .
  • the mutant UNG fragment was inserted into the pCMV vector by Gibson assembly using Gibson Assembly Master Mix (NEB).
  • mRNAs were produced using the commercial HiScribe T7 High Yield RNA Synthesis Kit (New England Biolabs) according to the manufacturer's instructions with the linearized plasmids containing the T7 promoter, UNG mutant, nCas9 (D10A), and ~225-nt polyA elements.
  • Final IVT products were column purified and concentrated with the RNA Clean & Concentrator Kit (Zymo Research).
  • mCherry and eGFP coding sequences were PCR amplified and digested using BsmBI (Thermo Fisher Scientific, no. ER0452) before being subjected to T4 DNA ligase (NEB, no. M0202L)-mediated ligation with 3× GGGGS linkers (GGGGS, SEQ ID NO: 27; and GGGGSGGGGSGGGGS, SEQ ID NO: 28).
  • the ligation product was subsequently inserted into the pLenti-CMV-MCS-PURO backbone.
  • the reporter constructs (pLenti-CMV-MCS-PURO backbone) were cotransfected into HEK293T cells together with two viral packaging plasmids, pR8.74 and pVSVG. After 72 hours, viral supernatant was collected and stored at -80 °C.
  • HEK293T cells were infected with lentivirus, and then mCherry+ cells were sorted via fluorescence-activated cell sorting (FACS) and cultured to select a single clone cell line stably expressing a dual-fluorescence reporter system with no detectable eGFP background.
  • HEK293T, HCT116, NIH3T3, N2A and Cos-7 cell lines were maintained at Peking University. All the cells above and the dual-fluorescence reporter cells were cultured in Dulbecco's modified Eagle's medium (DMEM, Gibco) with 10% fetal bovine serum (Biological Industries) and penicillin/streptomycin (Sigma) at 37 °C with 5% CO2.
  • GM06214 cells were obtained from MEISEN CELL and cultured in Dulbecco's modified Eagle's medium (DMEM, Gibco) with 15% fetal bovine serum (Biological Industries), 1% Non-Essential Amino Acids (NEAA), and penicillin/streptomycin (Sigma) at 37 °C with 5% CO2.
  • HEK293T reporter cells were plated in 12-well cell culture plates. After 72 hours of transfection, mCherry, BFP and eGFP fluorescence were analyzed by flow cytometry. The mCherry signal served as a fluorescent selection marker for reporter-expressing cells, and the BFP signal served as a fluorescent selection marker for sgRNA-expressing cells. Percentages of eGFP+/mCherry+ cells were calculated as the readout for editing efficiency. FACS data were analyzed with FlowJo X (v. 10.0.7).
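  • The reporter readout described above reduces to a simple ratio. The sketch below assumes that event counts for the mCherry+ gate and for the eGFP+ subset of that gate have already been exported from the flow cytometry software; the function name `editing_efficiency` is hypothetical.

```python
def editing_efficiency(double_positive, mcherry_positive):
    """Percentage of mCherry+ reporter cells that are also eGFP+."""
    if mcherry_positive <= 0:
        raise ValueError("mCherry+ gate must contain events")
    if not 0 <= double_positive <= mcherry_positive:
        raise ValueError("eGFP+/mCherry+ count must lie within the mCherry+ gate")
    return 100.0 * double_positive / mcherry_positive


if __name__ == "__main__":
    # Illustrative counts only.
    print(editing_efficiency(300, 1000))
```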
  • the NKK-scanned mutant library was utilized as a background and error-prone PCR was performed (Agilent, GeneMorph II). Subsequently, the PCR products were cloned into the pLenti vector using Golden Gate assembly. After packaging into lentivirus, the virus was used to infect the reporter system cells, and FACS sorting was conducted two weeks post-infection.
  • Targeted deep sequencing
  • Genomic sites of interest were amplified into fragments of approximately 200 bp from genomic DNA samples using PrimeSTAR GXL DNA polymerase (TaKaRa) . See Table 2 for the list of primers used.
  • PCR products were purified using DNA Clean & Concentrator-25 (Zymo Research) for Sanger sequencing and targeted deep sequencing.
  • Targeted deep sequencing libraries were prepared using the VAHTS Universal DNA Library Prep Kit for Illumina V3 (Vazyme) . Briefly, the PCR fragments were sequentially subjected to end repair, adapter ligation, and then PCR amplification.
  • DNA purification in library preparation was performed using Agencourt Ampure XP beads (Beckman Coulter) , and library amplification was performed using Q5U Hot Start High-Fidelity DNA Polymerase (NEB) and VAHTS Multiplex Oligos Set 4/5 for Illumina (Vazyme) .
  • the final library was quantified using the Qubit dsDNA HS assay kit (Invitrogen) and sequenced using Illumina HiSeq X Ten.
  • Table 2
  • Analysis of high-throughput sequencing data for targeted amplicon sequencing
  • HEK293T cells were seeded in 96-well plates (Corning) at 2 × 10^4 cells per well in 200 μl of complete growth medium. 24 hours after seeding, cells were transfected with 1 μl PEI (ProteinTech), 250 ng sgRNA, and 250 ng editors. At 0 h, 24 h, 48 h, and 72 h after transfection, 20 μl CCK8 solution was added to each well and the absorbance at 450 nm was measured after 1.5 hours using a microplate reader.
  • Electroporation in primary cells
  • For mRNA electroporation in GM06214 cells, 1.5 μg sgRNA and 4.5 μg DrUNG-nCas9 mRNA were electroporated with the Nucleofector 2b Device (Lonza) and Basic Nucleofector Kit (Lonza), using electroporation program U-012. The cells were then transferred to warm culture medium for use in one or more assays.
  • IDUA catalytic activity assay
  • The gathered cell pellet was resuspended and lysed with 28 μl 0.5% Triton X-100 in 1× PBS buffer on ice for 30 min.
  • Cas-OFFinder (CRISPR RGEN Tools, rgenome.net) was used to predict potential off-target sites of Cas9 RNA-guided endonucleases, and the top 10 off-target sites were selected for validation. Ten off-target sites for each target site were amplified from the prepared genomic DNA and sequenced on the Hi-TOM NGS platform.
  • Genome-wide DNA off-target sequencing
  • RNAs were purified with Direct-zol RNA Miniprep Kits (Zymo Research) .
  • the mRNA was then purified using the NEBNext Poly(A) mRNA Magnetic Isolation Module (New England Biolabs), processed with the NEBNext Ultra II RNA Library Prep Kit for Illumina (New England Biolabs), followed by deep sequencing analysis using the Illumina HiSeq X Ten platform.
  • Analysis of nuclear genome off-target editing
  • Variant calls were filtered based on the FilterMutectCalls criteria, excluding positions annotated as position, slippage, weak evidence, or low mapping quality. Additionally, mutations with a frequency exceeding 1% in control experiments were excluded. Only mutations at positions where the reference genome contained a T and the mutated allele was A/C/G were retained. To identify potential off-target genome editing events, stringent criteria were applied to mitigate high noise levels. Additional requirements for base quality and mapping quality were imposed based on quality control criteria. Only mutations with a high median base quality (MBQ ≥ 32) and high mapping quality (MMQ ≥ 50) were considered potential off-target editing sites.
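  • The filtering logic described above can be sketched as a single predicate over already-parsed variant records. This is an illustration, not the pipeline actually used: the `Variant` record, the filter label strings in `EXCLUDED_FILTERS`, and the reading of the MBQ and MMQ thresholds as lower bounds (at least 32 and at least 50) are all assumptions.

```python
from dataclasses import dataclass, field

# Assumed labels for the excluded FilterMutectCalls annotations; adjust them
# to the exact strings that appear in the FILTER column of the VCF in use.
EXCLUDED_FILTERS = {"position", "slippage", "weak_evidence", "map_qual"}


@dataclass
class Variant:
    ref: str
    alt: str
    control_freq: float  # allele frequency in the control sample, 0 to 1
    mbq: int             # median base quality
    mmq: int             # median mapping quality
    filters: set = field(default_factory=set)


def is_candidate_off_target(v, max_control_freq=0.01, min_mbq=32, min_mmq=50):
    """Keep T-to-A/C/G calls that pass every criterion described above."""
    return (
        v.ref == "T"
        and v.alt in {"A", "C", "G"}
        and not (v.filters & EXCLUDED_FILTERS)
        and v.control_freq <= max_control_freq
        and v.mbq >= min_mbq
        and v.mmq >= min_mmq
    )


if __name__ == "__main__":
    calls = [Variant("T", "C", 0.0, 35, 60), Variant("C", "T", 0.0, 35, 60)]
    print([is_candidate_off_target(v) for v in calls])
```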
  • UNG engineered uracil-DNA glycosylases
  • mutations were introduced to amino acids of UNG which influence the native binding pocket size for uracil, enabling the entry of thymine into the active site pocket and facilitating programmable thymine base editing (FIG. 1A) .
  • Saturation mutagenesis was conducted on several amino acids, including G143-D145, Y147, and F158, which could affect the entry of thymine into the active site pocket.
  • Y147 was identified as hindering the entry of the methyl group at the 5th position of thymine.
  • UNG uracil-DNA glycosylases
  • mutations were introduced to amino acids which influence the binding pocket size of uracil, enabling the entry of cytosine into the active site pocket and facilitating programmable cytosine base editing (FIG. 2A) .
  • N204 affected the entry of cytosine into the active site pocket.
  • the proportion of eGFP-positive cells reached 30% only when Asn204 was mutated to Asp (FIG. 2B).
  • the N204D mutant (SEQ ID NO: 3) showed the highest editing efficiency at 30%, which predominantly resulted in C-to-G or C-to-T conversion (FIG. 2C).
  • UNG non-human engineered uracil-DNA glycosylases
  • UNGs were specifically selected from Escherichia coli (Ec) (SEQ ID NO: 4) , Deinococcus radiodurans (Dr) (SEQ ID NO: 7) , Human Herpesvirus 1 (HHV1) (SEQ ID NO: 16) , and Vaccinia virus (VACV) (SEQ ID NO: 19) .
  • EcUNG N123D (SEQ ID NO: 6) and HHV1_UNG (N147D) (SEQ ID NO: 18) achieved editing efficiency on par with hUNG (N204D) (SEQ ID NO: 3) (see Example 2) for cytidine (FIG. 3A) .
  • DrUNG Y85A (SEQ ID NO: 8) exhibited the highest efficiency, nearly five times greater than that of hUNG (Y147A) (SEQ ID NO: 2) (FIG. 3B) .
  • Example 4 Enhancement of DrUNG thymine editing efficiency
  • This example demonstrates the development and use of optimized UNG variants with high thymine-editing efficiency.
  • a library of DrUNG-Y85A (SEQ ID NO: 8) variants was generated. Subsequently, the targeting reporter system was introduced using nCas9 and sgRNA, the top 1% of eGFP+ cells were sorted through FACS, and then the sequences of DrUNG (Y85A) mutants were amplified (FIG. 4A). Of the resulting mutations, substitutions at S204, Y85, E237, and K113 were most common (FIG. 4B). In total, 25 mutants were identified.
  • the 6th mutant variant (SEQ ID NO: 9) exhibited the highest editing efficiency, followed by the 2nd mutant variant (SEQ ID NO: 10) (FIG. 4C) .
  • These two variants were designated DrUNG mutant 6 (SEQ ID NO: 9, also referred to as “thymine base editor” [TBE] ) and DrUNG mutant 2 (SEQ ID NO: 10) , respectively.
  • DrUNG mutant 6 (SEQ ID NO: 9) contains mutations L80V, Y85A, K113E, K197E, S204A, E237Q, and an N-terminal deletion of two amino acids.
  • DrUNG mutant 2 contains mutations Y85A, K113E, and S204G (SEQ ID NO: 10) .
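  • The substitution notation used above (for example L80V: leucine at position 80 replaced by valine) can be parsed and applied programmatically. The sketch below uses a short made-up peptide rather than any SEQ ID NO sequence, and the helper names `parse_substitution` and `apply_substitutions` are hypothetical; deletions such as the N-terminal truncation are not handled.

```python
import re


def parse_substitution(label):
    """Split a label such as 'L80V' into (wild_type, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(sequence, labels):
    """Apply 1-based substitutions, checking each wild-type residue first."""
    residues = list(sequence)
    for label in labels:
        wild_type, position, replacement = parse_substitution(label)
        if position < 1 or position > len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: sequence has {residues[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)


if __name__ == "__main__":
    # Toy peptide for illustration only.
    print(apply_substitutions("MKLYS", ["Y4A", "K2E"]))
```

Checking the wild-type residue before substituting guards against numbering offsets, which matter here because the variants are numbered relative to the untruncated protein.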
  • This example demonstrates the use of engineered UNG variants to edit thymine at endogenous sites in a host cell.
  • DrUNG mutant 6 (SEQ ID NO: 9) and DrUNG mutant 2 (SEQ ID NO: 10) were utilized to target 14 endogenous sites in HEK293T cells. Notably, DrUNG mutant 6 (SEQ ID NO: 9) exhibited effective editing at all 16 targeted sites, whereas DrUNG mutant 2 (SEQ ID NO: 10) exhibited effective editing at only 9 sites. The peak editing efficiency for DrUNG mutant 6 (SEQ ID NO: 9) reached 40%, and for DrUNG mutant 2 (SEQ ID NO: 10) it reached 33% (FIG. 5A). Concurrent with targeted editing, the thymine base editor induced indels at the targeted sites, with the highest indel rate being 10% (FIG. 5A). Of the indels caused by the thymine base editor based on DrUNG, 76% were deletions and 24% were insertions.
  • the average editing efficiency of the base editor mediated by DrUNG mutant 6 was 20%, marking a substantial 3-fold increase compared to the editing efficiency of DrUNG mutant 2 (SEQ ID NO: 10) (FIG. 5B) .
  • within the editing window of the thymine base editor, approximately 52% of thymines were converted to cytosine, around 30% to guanine, and approximately 18% to adenine (FIG. 5C).
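  • A product distribution of this kind can be computed from per-outcome read counts. The sketch below is illustrative: the function name `product_distribution` is hypothetical and the counts in the usage line are placeholders chosen to mirror the proportions reported above, not sequencing data.

```python
def product_distribution(counts):
    """Convert per-product edited read counts into percentages."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no edited reads")
    return {base: 100.0 * n / total for base, n in counts.items()}


if __name__ == "__main__":
    # Placeholder counts of edited reads by product base at one thymine.
    print(product_distribution({"C": 52, "G": 30, "A": 18}))
```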
  • the DrUNG mutant 6 base editor showed strong editing activity and low indel levels in a variety of cell lines on the reporter system, including human cell line HCT116, mouse cell line N2A, and monkey cell lines NIH3T3 and Cos-7, among which the N2A and NIH3T3 cell lines achieved approximately 70% reporter system lighting efficiency (FIG. 5G and 5H).
  • the thymine base editors we developed exhibit effective editing in multiple cell lines.
  • The editing specificity of DrUNG mutant 6 was evaluated on a genome-wide and transcriptome-wide scale. Through genome-wide high-throughput sequencing, it was determined that the DrUNG mutant 6 transfection group showed some off-target edits compared to the control group (FIG. 6A). The 50 bp upstream and downstream of these off-target sites were removed, and it was determined there were no potential sgRNA binding sites. Furthermore, Cas-OFFinder was used to predict the potential off-target sites of four sgRNAs on the genome, and the 10 highest-scoring off-target sites were selected for sequencing. None of these off-target sites had been edited (FIG. 6C). This suggests that the observed off-target edits may be random events caused by DrUNG mutant 6 (FIG. 6A).
  • DrUNG mutant 6 will cause some random off-target sites on the genome but will not cause off-target on the transcriptome, which shows that DrUNG mutant 6 is a relatively safe thymine base editor.
  • Example 7 Significant restoration of Hurler syndrome disease cell phenotypes using DrUNG mutant 6
  • DrUNG mutant 6 is a tool that can be used in the treatment of many mutation-related diseases, such as Hurler syndrome.
  • Hurler syndrome is caused by the presence of a premature stop codon (resulting from a TGG-to-TAG mutation) on exon 9 of IDUA gene, which prevents the production of functional IDUA protein.
  • performing a TAG-to-XAG mutation can potentially restore IDUA activity.
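  • The possible outcomes of editing the first base of the TAG stop codon can be enumerated with the standard genetic code. The sketch below carries only the five codons needed for this case; the table name `CODONS` and the function `first_base_edits` are illustrative.

```python
# Subset of the standard genetic code relevant to the IDUA W402X site.
CODONS = {"TGG": "Trp", "TAG": "Stop", "AAG": "Lys", "CAG": "Gln", "GAG": "Glu"}


def first_base_edits(codon):
    """Map each codon reachable by changing the first base to its amino acid."""
    outcomes = {}
    for base in "ACG":
        if base == codon[0]:
            continue
        edited = base + codon[1:]
        outcomes[edited] = CODONS[edited]
    return outcomes


if __name__ == "__main__":
    print(first_base_edits("TAG"))
```

Each of the three products (AAG, CAG, GAG) is a sense codon, so any of them removes the premature stop, although none restores the wild-type tryptophan codon TGG.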
  • Three different Cas9 proteins were selected and corresponding sgRNA were designed. By transfecting cells of the previously reported Hurler syndrome disease premature stop codon reporter system (FIG. 7A) , it was observed that all three Cas9 proteins fused to DrUNG mutant 6 can achieve thymine base editing, among which spCas9 protein has the highest editing efficiency (FIG.
  • the mRNA of the spCas9 and DrUNG mutant 6 fusion protein and the sgRNA were co-transfected into GM06214 (IDUA W402X) cells derived from a patient with Hurler syndrome, which contained the IDUA W402X mutation (FIG. 7C), achieving an editing efficiency of approximately 25% at the targeted site (FIG. 7D) and significantly restoring the IDUA catalytic activity of the cells (FIG. 7E).

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • General Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Medicinal Chemistry (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biotechnology (AREA)
  • Biomedical Technology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Enzymes And Modification Thereof (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)

Abstract

L'invention concerne des compositions, des procédés et des systèmes pour l'édition des bases pyrimidines de l'ADN. Selon certains modes de réalisation, la présente invention concerne des polypeptides ingénierisés modifiant la thymine ou des polypeptides ingénierisés modifiant la cytosine, contenant un variant d'un glycosylate d'uracile-ADN (UNG). Selon certains modes de réalisation, la présente invention concerne des polypeptides ingénierisés et un domaine de reconnaissance d'ADN, conçus pour cibler un acide nucléique en vue de son excision. La présente invention concerne également un acide nucléique codant pour les polypeptides selon la présente invention, des composants supplémentaires utiles pour l'édition, tels que des ARN guidés uniques (ARNsg), des kits, des médicaments, des compositions et des procédés d'utilisation associés.
PCT/CN2025/094182 2024-05-10 2025-05-12 Programmable editing of DNA pyrimidine bases by excision based on engineered uracil-DNA glycosylase Pending WO2025232923A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2024/092138 2024-05-10
CN2024092138 2024-05-10

Publications (1)

Publication Number Publication Date
WO2025232923A1 (fr) 2025-11-13

Family

ID=97674536

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2025/094182 Pending WO2025232923A1 (fr) Programmable editing of DNA pyrimidine bases by excision based on engineered uracil-DNA glycosylase 2024-05-10 2025-05-12

Country Status (1)

Country Link
WO (1) WO2025232923A1 (fr)
