WO2024222812A1 - Novel base editors and uses thereof - Google Patents

Novel base editors and uses thereof

Info

Publication number
WO2024222812A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
seq
fusion protein
amino acid
base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/089874
Other languages
English (en)
Inventor
Huawei TONG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huidagene Therapeutics Co Ltd
Huidagene Therapeutics Singapore Pte Ltd
Original Assignee
Huidagene Therapeutics Co Ltd
Huidagene Therapeutics Singapore Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huidagene Therapeutics Co Ltd, Huidagene Therapeutics Singapore Pte Ltd
Priority to CN202480020620.3A (CN120936712A)
Publication of WO2024222812A1
Legal status: Pending

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K19/00: Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113: Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)
    • C12N9/16: Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22: Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C12N9/222: Clustered regularly interspaced short palindromic repeats [CRISPR]-associated [CAS] enzymes
    • C12N9/226: Class 2 CAS enzyme complex, e.g. single CAS protein
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)
    • C12N9/24: Hydrolases (3) acting on glycosyl compounds (3.2)
    • C12N9/2497: Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing N-glycosyl compounds (3.2.2)
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K2319/00: Fusion polypeptide
    • C07K2319/80: Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00: Structure or type of the nucleic acid
    • C12N2310/10: Type of nucleic acid
    • C12N2310/20: Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2320/00: Applications; Uses
    • C12N2320/30: Special therapeutic applications
    • C12N2320/33: Alteration of splicing
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y: ENZYMES
    • C12Y302/00: Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
    • C12Y302/02: Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2) hydrolysing N-glycosyl compounds (3.2.2)
    • C12Y302/02021: DNA-3-methyladenine glycosylase II (3.2.2.21)
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y: ENZYMES
    • C12Y302/00: Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
    • C12Y302/02: Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2) hydrolysing N-glycosyl compounds (3.2.2)
    • C12Y302/02027: Uracil-DNA glycosylase (3.2.2.27)

Definitions

  • Base editing is a powerful technology for basic research and therapeutic applications [1, 2] .
  • Current base editors mainly contain a nucleic acid programmable DNA binding protein, such as a catalytically impaired CRISPR-associated (Cas) nuclease, fused with a single-stranded DNA deaminase enzyme and sometimes an additional protein that modulates the DNA repair machinery [3, 4].
  • Cas CRISPR-associated
  • C-to-G base editors [6-10] and adenine transversion base editor (AYBE) [11] were constructed by fusing existing CBE or ABE with a DNA glycosylase variant to generate new tools for achieving more versatile base editing outcomes, including C-to-G, A-to-C and A-to-T editing (FIG. 9) .
  • CRISPR-free CBEs (DdCBEs) were reported for performing C-to-T base editing in mitochondrial DNA, by fusing the two halves of a split double-stranded DNA cytidine deaminase (DddA) variant with two separate TALE (transcription activator-like effector) proteins [12-14].
  • The disclosure provides, at least in part, base editors and base editing methods capable of direct base editing of a target deoxyribonucleotide (e.g., dG, dT) in a target dsDNA.
  • The disclosure also provides, at least in part, base editors and base editing methods capable of base editing of a target deoxyribonucleotide (e.g., dC) in a target dsDNA in the absence of deamination.
  • the disclosure provides a fusion protein comprising:
  • napDNAbd nucleic acid programmable DNA binding domain capable of binding a target dsDNA comprising:
  • a first deoxyribonucleotide e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine)
  • dG deoxyguanosine
  • dT thymidine
  • dC deoxycytidine
  • a second deoxyribonucleotide e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine)
  • first deoxyribonucleotide e.g., dG, dT, dC
  • target sequence on the target strand (non-edited strand) of the target dsDNA, wherein the protospacer sequence is fully reverse complementary to the target sequence
  • a base excising domain capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide.
  • the fusion protein does not comprise a deaminase domain, e.g., an adenine or cytosine deaminase domain, e.g., TadA and variants thereof.
  • the first deoxyribonucleotide is deoxyguanosine (dG) , thymidine (dT) , or deoxycytidine (dC) .
  • the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide is dG-to-dA, dG-to-dT, dG-to-dC, dT-to-dA, dT-to-dC, dT-to-dG, dC-to-dA, dC-to-dT, or dC-to-dG.
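The nine conversions listed above are exactly the base changes available when the first deoxyribonucleotide is dG, dT, or dC and the outcome is any other canonical base. The following minimal sketch enumerates them; the names `EDITABLE` and `possible_conversions` are illustrative and do not come from the disclosure.

```python
from itertools import product

BASES = ("A", "C", "G", "T")
# First deoxyribonucleotides named in the text as editable targets.
EDITABLE = ("G", "T", "C")


def possible_conversions(editable=EDITABLE, bases=BASES):
    """Return every 'dX-to-dY' conversion in which the base actually changes."""
    return [
        f"d{src}-to-d{dst}"
        for src, dst in product(editable, bases)
        if src != dst
    ]


conversions = possible_conversions()
print(len(conversions))
print(conversions)
```

The enumeration order differs from the order in the text, but the set of conversions is the same.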
  • the base excising domain comprises a glycosylase.
  • the glycosylase is selected from the group consisting of N-methylpurine DNA glycosylase (MPG) , 8-oxoguanine DNA glycosylase (OGG1) , methyl-CpG binding domain 4, DNA glycosylase (MBD4) , thymine DNA glycosylase (TDG) , uracil DNA glycosylase (UNG) , single-strand-selective monofunctional uracil-DNA glycosylase 1 (SMUG1) , mutY DNA glycosylase (MUTYH) , nth like DNA glycosylase 1 (NTHL1) , nei like DNA glycosylase 1 (NEIL1) , nei like DNA glycosylase 2 (NEIL2) , nei like DNA glycosylase 3 (NEIL3) , and mutants thereof capable of recognizing and excising a base from a nucleotide of a nucleic acid.
  • MPG N-methylpurine DNA glycosylase
  • the base excising domain comprises an N-methylpurine DNA glycosylase (MPG) .
  • the MPG comprises an amino acid substitution relative to a reference MPG of SEQ ID NO: 7 at a position that is corresponding to a position or that is a position selected from the group consisting of N169, D175, C178, and/or Q294 of the reference MPG, wherein the position is numbered according to SEQ ID NO: 1.
  • the amino acid substitution is a substitution with R, A, N, or G.
  • the MPG comprises an amino acid substitution relative to a reference MPG of SEQ ID NO: 7 that is corresponding to a substitution or that is a substitution selected from the group consisting of N169G, D175R, C178N, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1.
  • the MPG comprises a combination substitution relative to a reference MPG of SEQ ID NO: 7 that is corresponding to a combination substitution of N169G, D175R, C178N, and Q294R, wherein the position is numbered according to SEQ ID NO: 1.
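Substitution names such as N169G read as "original residue, position, new residue", with the position counted on a stated reference sequence. As a hedged illustration of that notation only, the sketch below applies such substitutions to a short made-up peptide; `apply_substitutions` and the toy sequence are assumptions, and no real MPG sequence or SEQ ID NO is reproduced here.

```python
import re


def apply_substitutions(seq, substitutions, first_position=1):
    """Apply point substitutions such as 'N169G' to a protein sequence.

    Positions follow the reference numbering; `first_position` is the
    reference number carried by seq[0].
    """
    residues = list(seq)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        index = pos - first_position
        if not 0 <= index < len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[index] != old:
            raise ValueError(f"expected {old} at {pos}, found {residues[index]}")
        residues[index] = new
    return "".join(residues)


# Toy 10-residue peptide (hypothetical): N3, D4, C5, Q6 stand in for real positions.
toy = "MKNDCQLAGT"
print(apply_substitutions(toy, ["N3G", "D4R", "C5N", "Q6R"]))
```

Checking the original residue before replacing it catches numbering mistakes, which matter here because the text numbers positions on one sequence while naming another as the reference.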
  • the MPG comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 8, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, or 38.
  • the MPG is (substantially) capable of excising guanine of dG.
  • the base excising domain comprises a uracil-DNA glycosylase (UNG).
  • UNG uracil-DNA glycosylase
  • the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 135 or 137 at a position that is corresponding to a position or that is a position selected from the group consisting of K184, A214, Q259, and/or Y284 of the reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the amino acid substitution is a substitution with A, D, V, or T.
  • the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 135 or 137 that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133.
  • the UNG comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-65, 1-66, 1-67, 1-68, 1-69, 1-70, 1-71, 1-72, 1-73, 1-74, 1-75, 1-76, 1-77, 1-78, 1-79, 1-80, 1-81, 1-82, 1-83, 1-84, 1-85, 1-86, 1-87, 1-88, 1-89, 1-90, 1-91, 1-92, 1-93, 1-94, 1-95, 1-96, 1-97, 1-98, 1-99, or 1-100 of the reference UNG of SEQ ID NO: 135 or 137, wherein the position is numbered according to SEQ ID NO: 133.
  • the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 135 that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 137 that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
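Because the substitutions stay numbered on the full-length reference even after an N-terminal deletion, a position such as 184 no longer equals its index in the truncated protein. The sketch below shows the bookkeeping for a deletion of positions 1-88 on a made-up 300-residue sequence; the function names and the toy sequence are assumptions, and whether a truncated construct regains an initiator methionine is not modeled.

```python
def delete_n_terminus(reference_seq, last_deleted_position):
    """Remove reference positions 1..last_deleted_position (1-based, inclusive)."""
    if not 0 < last_deleted_position < len(reference_seq):
        raise ValueError("deletion must leave at least one residue")
    return reference_seq[last_deleted_position:]


def index_in_truncated(reference_position, last_deleted_position):
    """0-based index, in the truncated sequence, of a reference-numbered residue."""
    if reference_position <= last_deleted_position:
        raise ValueError("that residue was deleted")
    return reference_position - last_deleted_position - 1


# Hypothetical reference: 300 residues with K at reference position 184.
toy_reference = "G" * 183 + "K" + "G" * 116
truncated = delete_n_terminus(toy_reference, 88)
i = index_in_truncated(184, 88)
assert truncated[i] == "K"            # K184 is still found after the deletion
variant = truncated[:i] + "A" + truncated[i + 1:]   # K184A on the truncated form
print(len(truncated), i, variant[i])
```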
  • the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 56, 58, 60, 62, 135, 137, 139, 141, 144, 146, 148, 150, 152, 155, 157, and 159 or an N-terminal truncation thereof lacking the most N-terminal Methionine (M) (coded by start codon ATG) .
  • M N-terminal Methionine
  • the UNG is (substantially) capable of excising thymine of dT.
  • the UNG is (substantially) capable of excising cytosine of dC.
  • the napDNAbd is an RNA programmable DNA binding protein.
  • the napDNAbd is selected from the group consisting of CRISPR-associated (Cas) protein, IscB, IsrB, Argonaute, and TnpB.
  • the napDNAbd is a nickase, e.g., a Cas9 nickase, an IscB nickase.
  • the napDNAbd is nuclease-inactive, e.g., dead Cas9, dead Cas12i.
  • the napDNAbd comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 2, 48, 50, 52, or 163.
  • the fusion protein comprises, from N-terminal to C-terminal, (1) the napDNAbp and the base excising domain; or (2) the base excising domain and the napDNAbp.
  • the napDNAbd (e.g., Cas9) is a two-part napDNAbd, for example, a two-part split Cas9, comprising an N-terminal portion and a C-terminal portion, and wherein the fusion protein comprises, from N-terminal to C-terminal, (1) the N-terminal portion of the napDNAbd, the base excising domain, and the C-terminal portion of the napDNAbd; (2) the C-terminal portion of the napDNAbd, the base excising domain, and the N-terminal portion of the napDNAbd; or (3) the base excising domain, the C-terminal portion of the napDNAbd (e.g., amino acids at positions 1249-1368), and the N-terminal portion (e.g., amino acids at positions 1-1248) of the napDNAbd.
  • the napDNAbd is SpCas9 (e.g., a SpCas9 nickase) or a mutant thereof (e.g., a SpG Cas9 nickase) .
  • the N-terminal portion of the napDNAbd is the amino acids of the napDNAbp at positions 1 or 2 to 1012, 1028, 1041, 1046, 1047, 1248, 1249, or 1300.
  • the C-terminal portion of the napDNAbd is the amino acids of the napDNAbp at positions 1013, 1029, 1042, 1047, 1048, 1249, 1063, 1064, 1230, 1249, or 1301 to 1368.
  • the fusion protein comprises the base excising domain embedded between positions 2-1248 of nCas9 (SEQ ID NO: 2) and positions 1249-1368 of nCas9 (SEQ ID NO: 2) , wherein the first amino acid residue D of nCas9 (SEQ ID NO: 2) was designated as position 2; or embedded between positions 2-1047 of nCas9 (SEQ ID NO: 2) and positions 1064-1368 of nCas9 (SEQ ID NO: 2) , wherein the first amino acid residue D of nCas9 (SEQ ID NO: 2) was designated as position 2.
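The embedded architecture above depends on a numbering convention in which the first residue of the nCas9 sequence is designated position 2. The following sketch, under that convention, slices a sequence by position and places a domain between the two portions; `slice_by_position`, `embed_domain`, and the placeholder sequences are hypothetical, and linkers or other elements are not modeled.

```python
def slice_by_position(seq, start, end, first_position=2):
    """Return residues numbered start..end (inclusive) when seq[0] is `first_position`."""
    lo = start - first_position
    hi = end - first_position + 1
    if lo < 0 or hi > len(seq) or lo >= hi:
        raise ValueError(f"positions {start}-{end} are outside the sequence")
    return seq[lo:hi]


def embed_domain(ncas9, domain, n_range, c_range, first_position=2):
    """Concatenate N-terminal portion + domain + C-terminal portion."""
    n_part = slice_by_position(ncas9, n_range[0], n_range[1], first_position)
    c_part = slice_by_position(ncas9, c_range[0], c_range[1], first_position)
    return n_part + domain + c_part


# Placeholder nCas9: 1367 residues covering positions 2..1368 (first residue D).
toy_ncas9 = "D" + "A" * 1366
toy_domain = "GLY"   # stands in for a base excising domain

fusion_a = embed_domain(toy_ncas9, toy_domain, (2, 1248), (1249, 1368))
fusion_b = embed_domain(toy_ncas9, toy_domain, (2, 1047), (1064, 1368))
print(len(fusion_a), len(fusion_b))
```

In the second layout the stretch numbered 1048-1063 is absent from the fusion, which is why the result is shorter than the first.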
  • the fusion protein comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 12, 14, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 55, 57, 59, 61, 63, 136, 138, 140, 142, 143, 145, 147, 149, 151, 153, 154, 156, 158, 160, 161, 162, and 164.
  • the disclosure provides a system comprising:
  • a guide nucleic acid or a polynucleotide encoding the guide nucleic acid comprising:
  • the guide nucleic acid is a guide RNA (gRNA) .
  • gRNA guide RNA
  • the scaffold sequence has substantially the same secondary structure as the secondary structure of the sequence of SEQ ID NO: 40, 73, or 74.
  • the scaffold sequence comprises (1) a sequence of SEQ ID NO: 40, 73, or 74 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to SEQ ID NO: 40, 73, or 74 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to SEQ ID NO: 40, 73, or 74.
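Option (1) above describes a small, enumerable family: the scaffold with 1 to 6 nucleotides removed from either the 5’ end or the 3’ end. A minimal sketch of that enumeration follows, assuming truncation at one end at a time; `end_truncations` and the toy scaffold string are illustrative and are not SEQ ID NO: 40, 73, or 74.

```python
def end_truncations(scaffold, max_trim=6):
    """Map (end, n) to the scaffold with n nucleotides removed from that end."""
    variants = {}
    for n in range(1, max_trim + 1):
        if n >= len(scaffold):
            break  # never truncate the scaffold away entirely
        variants[("5'", n)] = scaffold[n:]
        variants[("3'", n)] = scaffold[:-n]
    return variants


toy_scaffold = "GUUUUAGAGCUA"   # hypothetical 12-nt stand-in
for (end, n), seq in end_truncations(toy_scaffold).items():
    print(end, n, seq)
```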
  • the fusion protein or system of the disclosure further comprises a translesion synthesis (TLS) polymerase or a recruiting domain or component capable of recruiting a TLS polymerase.
  • TLS translesion synthesis
  • the TLS polymerase is selected from the group consisting of Pol α (alpha), Pol β (beta), Pol δ (delta) (PCNA), Pol γ (gamma), Pol η (eta), Pol ι (iota), Pol κ (kappa), Pol λ (lambda), Pol μ (mu), Pol ν (nu), Pol θ (theta), and REV1.
  • the disclosure provides a polynucleotide encoding the fusion protein of the disclosure and optionally the guide nucleic acid of the disclosure.
  • the disclosure provides a delivery system comprising (1) the fusion protein of the disclosure, the polynucleotide of the disclosure, or the system of the disclosure; and (2) a delivery vehicle.
  • the disclosure provides a vector comprising the polynucleotide of the disclosure.
  • the disclosure provides a complex comprising the fusion protein or the polynucleotide (e.g., a mRNA) encoding the fusion protein of the disclosure and a guide nucleic acid (e.g., a gRNA) of the disclosure.
  • the disclosure provides a pharmaceutical composition comprising (1) the system of the disclosure, the vector of the disclosure, the ribonucleoprotein of the disclosure, the lipid nanoparticle of the disclosure, or the cell of the disclosure; and (2) a pharmaceutically acceptable excipient.
  • the disclosure provides a cell or a progeny thereof comprising the system of the disclosure.
  • the disclosure provides a cell or a progeny thereof modified by the system of the disclosure or the method of the disclosure.
  • the disclosure provides a method of modifying a target dsDNA, comprising contacting the target dsDNA with the system of the disclosure,
  • the target dsDNA comprising:
  • a first deoxyribonucleotide e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine)
  • dG deoxyguanosine
  • dT thymidine
  • dC deoxycytidine
  • a second deoxyribonucleotide e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine)
  • first deoxyribonucleotide e.g., dG, dT, dC
  • the protospacer sequence is fully reverse complementary to the target sequence.
  • the method does not include deamination of the base of the first deoxyribonucleotide before the excision of the base of the first deoxyribonucleotide.
  • the method does not include deamination of the base of the first deoxyribonucleotide.
  • the disclosure provides an MPG described herein, or of the disclosure.
  • the disclosure provides a UNG described herein, or of the disclosure.
  • A nucleic acid programmable binding protein (napBP), for example, a nucleic acid programmable DNA binding protein (napDNAbp) such as Cas9, Cas12, or IscB, or a nucleic acid programmable RNA binding protein (napRNAbp) such as Cas13, is capable of binding to a target nucleic acid (e.g., dsDNA, mRNA) as guided by a guide nucleic acid (e.g., a guide RNA) comprising a guide sequence targeting the target nucleic acid.
  • the target nucleic acid is eukaryotic.
  • the guide nucleic acid comprises a scaffold sequence responsible for forming a complex with the napBP, and a guide sequence that is intentionally designed to be responsible for hybridizing to a target sequence of the target nucleic acid, thereby guiding the complex comprising the napBP and the guide nucleic acid to the target nucleic acid.
  • an exemplary target dsDNA is depicted to comprise a 5’ to 3’ single DNA strand and a 3’ to 5’ single DNA strand.
  • An exemplary guide nucleic acid (e.g., a guide RNA) is depicted to comprise a guide sequence and a scaffold sequence.
  • the guide sequence is designed to hybridize to a part of the 3’ to 5’ single DNA strand, and so the guide sequence “targets” that part.
  • the 3’ to 5’ single DNA strand is referred to as a “target strand (TS)” of the target dsDNA
  • NTS nontarget strand
  • That part of the target strand based on which the guide sequence is designed and to which the guide sequence may hybridize is referred to as a “target sequence”
  • the opposite part on the nontarget strand corresponding to that part is referred to as the “protospacer sequence”, which is 100% (fully) reversely complementary to the target sequence and is said to be “corresponding to” the target sequence in the disclosure.
  • an exemplary target dsDNA is depicted to comprise a 5’ to 3’ single DNA strand and a 3’ to 5’ single DNA strand.
  • an exemplary target RNA (transcript, e.g., a pre-mRNA) may be transcribed using the 3’ to 5’ single DNA strand as a synthesis template, and thus the 3’ to 5’ single DNA strand is referred to as a “template strand” or an “antisense strand”.
  • the transcript so transcribed has the same primary sequence as the 5’ to 3’ single DNA strand except for the replacement of T with U, and thus the 5’ to 3’ single DNA strand is referred to as a “coding strand” or a “sense strand”.
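The relationship between the template strand, the coding strand, and the transcript can be checked with a few lines of code. This is a minimal sketch using the ATGC example from the disclosure; the function names are illustrative, and all sequences are written 5’ to 3’.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def template_strand(coding_5to3):
    """Template (antisense) strand, written 5'->3': reverse complement of the coding strand."""
    return coding_5to3.translate(DNA_COMPLEMENT)[::-1]


def transcribe(template_5to3):
    """Read the template 3'->5' and build the RNA 5'->3' by base pairing."""
    rna_pairing = {"A": "U", "C": "G", "G": "C", "T": "A"}
    return "".join(rna_pairing[base] for base in reversed(template_5to3))


coding = "ATGC"
template = template_strand(coding)
transcript = transcribe(template)
print(template, transcript)
# The transcript matches the coding strand with T replaced by U.
assert transcript == coding.replace("T", "U")
```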
  • An exemplary guide nucleic acid (e.g., a guide RNA) is depicted to comprise a guide sequence and a scaffold sequence.
  • the guide sequence is designed to hybridize to a part of the transcript (target RNA) , and so the guide sequence “targets” that part. And thus, that part of the target RNA based on which the guide sequence is designed and to which the guide sequence may hybridize is referred to as a “target sequence” .
  • the guide sequence is 100% (fully) reversely complementary to the target sequence.
  • the guide sequence is reversely complementary to the target sequence and contains a mismatch with the target sequence.
  • nucleic acid sequence e.g., a DNA sequence, an RNA sequence
  • a nucleic acid sequence is written in 5’ to 3’ direction /orientation unless explicitly indicated otherwise.
  • For example, a DNA sequence written as ATGC is usually understood as 5’-ATGC-3’ unless otherwise indicated. Its reverse sequence is 5’-CGTA-3’. Its fully complementary sequence is 5’-TACG-3’. Its fully reverse complementary sequence is 5’-GCAT-3’. Note that the fully complementary sequence usually does not have the ability to base-pair/hybridize with the original sequence.
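The three derived sequences in the ATGC example differ only in whether the order, the bases, or both are changed. A short sketch, with illustrative function names, reproduces the values given in the text:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse(seq):
    """Same bases, opposite order."""
    return seq[::-1]


def complement(seq):
    """Each base swapped for its Watson-Crick partner, order unchanged."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement, then reverse: the sequence of the opposite strand read 5'->3'."""
    return complement(seq)[::-1]


seq = "ATGC"
print(reverse(seq), complement(seq), reverse_complement(seq))
```

Only the reverse complement describes a strand that can pair with the original in the antiparallel orientation of a duplex.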
  • the double-strand sequence of a dsDNA may be represented with the sequence of its 5’ to 3’ single DNA strand conventionally written in 5’ to 3’ direction/orientation unless otherwise indicated.
  • the dsDNA may be simply represented as 5’-ATGC-3’.
  • either the 5’ to 3’ single DNA strand or the 3’ to 5’ single DNA strand of a dsDNA can be a nontarget strand from which a protospacer sequence is selected.
  • the 5’ to 3’ single DNA strand is the sense strand of the gene
  • the 3’ to 5’ single DNA strand is the antisense strand of the gene.
  • the sense strand or the antisense strand of a gene can be a nontarget strand from which a protospacer sequence is selected.
  • the transcript (target RNA) transcribed from the dsDNA then has a (target) sequence of 5’-AUGC-3’.
  • the guide sequence of a guide nucleic acid is designed to have a sequence of 5’-AUGC-3’ that is fully reversely complementary to the 3’ to 5’ strand of the target dsRNA, which would be set forth as ATGC in the electronic sequence listing but marked as an RNA sequence; and in another embodiment, the guide sequence of a guide nucleic acid is designed to have a sequence of 5’-GCAU-3’ that is fully reversely complementary to the 5’ to 3’ strand of the target dsRNA, which would be set forth as GCAT in the electronic sequence listing but marked as an RNA sequence.
  • the guide sequence of a guide nucleic acid is fully reversely complementary to the target sequence and the target sequence is fully reversely complementary to the protospacer sequence
  • the guide sequence is identical to the protospacer sequence except for the U in the guide sequence due to its RNA nature and correspondingly the T in the protospacer sequence due to its DNA nature.
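When the guide is fully reverse complementary to the target sequence, the guide can be derived from the protospacer simply by replacing T with U, as stated above. The sketch below illustrates that relationship and verifies the base pairing; the function names are assumptions introduced for illustration, and all sequences are written 5’ to 3’.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def guide_from_protospacer(protospacer_dna):
    """RNA guide with the same sequence as the protospacer, U in place of T."""
    return protospacer_dna.upper().replace("T", "U")


def target_from_protospacer(protospacer_dna):
    """Target sequence on the target strand: reverse complement of the protospacer."""
    return protospacer_dna.upper().translate(DNA_COMPLEMENT)[::-1]


def pairs_fully(guide_rna, target_dna):
    """True if every guide base pairs with the target read in antiparallel order."""
    rna_to_dna = {"A": "T", "U": "A", "G": "C", "C": "G"}
    if len(guide_rna) != len(target_dna):
        return False
    return all(rna_to_dna[g] == t for g, t in zip(guide_rna, reversed(target_dna)))


protospacer = "ATGC"
guide = guide_from_protospacer(protospacer)
target = target_from_protospacer(protospacer)
print(guide, target, pairs_fully(guide, target))
```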
  • symbol “t” is used to denote both T in DNA and U in RNA (See “Table 1: List of nucleotides symbols” , the definition of symbol “t” is “thymine in DNA/uracil in RNA (t/u) ” ) .
  • such a guide sequence could be set forth in the same sequence as a corresponding protospacer sequence.
  • a single SEQ ID NO in the electronic sequence listing can be used to denote both such guide sequence and protospacer sequence, regardless of whether such a single SEQ ID NO is marked as DNA or RNA in the electronic sequence listing.
  • when a reference is made to such a SEQ ID NO that sets forth a protospacer/guide sequence, it refers to either a protospacer sequence that is a DNA sequence or a guide sequence that is an RNA sequence depending on the context, no matter whether it is marked as a DNA or an RNA in the electronic sequence listing.
  • the guide sequence of a guide nucleic acid is designed to have a sequence of 5’-GCAU-3’ that is fully reversely complementary to the (target) sequence of the target RNA, which would be set forth as GCAT in the electronic sequence listing but marked as an RNA sequence.
  • As used herein, if a DNA sequence, for example, 5’-ATGC-3’, is transcribed to an RNA sequence, with each dT (deoxythymidine, or “T” for short) in the primary sequence replaced with a U (uridine) and other dA (deoxyadenosine, or “A” for short), dG (deoxyguanosine, or “G” for short), and dC (deoxycytidine, or “C” for short) replaced with A (adenosine), G (guanosine), and C (cytidine), respectively, for example, 5’-AUGC-3’, it is said in the disclosure that the DNA sequence “encodes” the RNA sequence.
  • the term “activity” refers to a biological activity.
  • the activity includes enzymatic activity, e.g., catalytic ability of an effector.
  • the activity can include nuclease activity, e.g., dsDNA endonuclease activity, RNA endonuclease activity.
  • nucleic acid programmable binding protein napBP
  • nucleic acid programmable binding domain napBD
  • a programmable nucleic acid e.g., DNA or RNA
  • gRNA guide nucleic acid
  • the napBP may be indirectly associated with (e.g., bound to) the target nucleic acid via the interaction (e.g., binding) between the napBP and the programmable nucleic acid (e.g., scaffold sequence of the programmable nucleic acid) and the interaction (e.g., hybridization) between the programmable nucleic acid (e.g., the guide sequence of the programmable nucleic acid) and the target nucleic acid (e.g., the target sequence of the target nucleic acid) .
  • the napBP is a nucleic acid programmable DNA binding protein (napDNAbp) .
  • the napBP is a nucleic acid programmable RNA binding protein (napRNAbp) .
  • the term “complex” refers to a grouping of two or more molecules.
  • the complex comprises a polypeptide and a nucleic acid interacting with (e.g., binding to, coming into contact with, adhering to) one another.
  • the term “complex” can refer to a grouping of a guide nucleic acid and a polypeptide (e.g., a napBP) .
  • the term “complex” can refer to a grouping of a guide nucleic acid, a polypeptide (e.g., a napBP) , and a target nucleic acid.
  • the term “protospacer adjacent motif” or “PAM” refers to a short DNA sequence (or a DNA motif) adjacent to a protospacer sequence on the nontarget strand of a dsDNA.
  • adjacent includes instances wherein there is no nucleotide between the protospacer sequence and the PAM and also instances wherein there are a small number (e.g., 1, 2, 3, 4, or 5) of nucleotides between the protospacer sequence and the PAM.
  • A “immediately adjacent (to)” B, A “immediately 5’ to” B, and A “immediately 3’ to” B mean that there is no nucleotide between A and B.
  • the PAM is immediately 5’ to a protospacer sequence. In some embodiments, the PAM is immediately 3’ to a protospacer sequence.
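To make the "immediately 3’ to" case concrete, the sketch below scans a nontarget strand for protospacers that are directly followed by a PAM. The 20-nucleotide protospacer length and the NGG motif are assumptions chosen for illustration (NGG is the motif commonly associated with SpCas9) and are not taken from this text; `find_protospacers` is a hypothetical helper, and the case with a few intervening nucleotides is not modeled.

```python
import re


def find_protospacers(nontarget_strand, pam_regex="[ACGT]GG", protospacer_len=20):
    """Return (start, protospacer, pam) for each protospacer with a PAM immediately 3' to it.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    pattern = re.compile(
        rf"(?=([ACGT]{{{protospacer_len}}})({pam_regex}))"
    )
    return [
        (m.start(), m.group(1), m.group(2))
        for m in pattern.finditer(nontarget_strand.upper())
    ]


strand = "A" * 20 + "TGG" + "CC"   # made-up sequence with one qualifying site
for start, protospacer, pam in find_protospacers(strand):
    print(start, protospacer, pam)
```

A PAM located immediately 5’ to the protospacer could be handled the same way by swapping the order of the two capture groups.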
  • the term “guide nucleic acid” refers to any nucleic acid that facilitates the targeting of a napBP to a target nucleic acid.
  • the guide nucleic acid may be designed to include a guide sequence capable of hybridizing to a specific sequence of a target nucleic acid, and the guide nucleic acid may also comprise a scaffold sequence facilitating the guiding of a napBP to the target nucleic acid.
  • the guide nucleic acid is a guide RNA.
  • the guide nucleic acid is a nucleic acid encoding a guide RNA.
  • As used herein, the terms “nucleic acid”, “polynucleotide”, and “nucleotide sequence” are used interchangeably to refer to a polymeric form of nucleotides of any length, including deoxyribonucleotides, ribonucleotides, combinations thereof, and analogs or modifications thereof.
  • the term “guide RNA” is used interchangeably with the term “CRISPR RNA (crRNA)”, “single guide RNA (sgRNA)”, or “RNA guide”
  • the term “guide sequence” is used interchangeably with the term “spacer sequence”
  • the term “scaffold sequence” is used interchangeably with the term “direct repeat sequence”.
  • the guide sequence is so designed to be capable of hybridizing to a target sequence.
  • the term “hybridize”, “hybridizing”, or “hybridization” refers to a reaction in which one or more polynucleotide sequences react to form a complex that is stabilized via hydrogen bonding between the bases of the polynucleotide sequences. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner.
  • a polynucleotide sequence capable of hybridizing to a given polynucleotide sequence is referred to as the “complement” of the given polynucleotide sequence.
  • the hybridization of a guide sequence and a target sequence is sufficiently stable to permit an effector polypeptide (e.g., a napBP) that is complexed with a nucleic acid comprising the guide sequence, or a functional domain associated (e.g., fused) with the effector polypeptide, to act (e.g., cleave, deaminate) on the target sequence, its complement, or a nearby sequence.
  • the guide sequence is reversely complementary to a target sequence.
  • reverse complementary refers to the ability of nucleobases of a first polynucleotide sequence, such as a guide sequence, to base pair with nucleobases of a second polynucleotide sequence, such as a target sequence, by traditional Watson-Crick base-pairing. Two reverse complementary polynucleotide sequences are able to non-covalently bind under appropriate temperature and solution ionic strength conditions.
  • a first polynucleotide sequence (e.g., a guide sequence) comprises 100% (fully) reverse complementarity to a second nucleic acid (e.g., a target sequence) .
  • a first polynucleotide sequence (e.g., a guide sequence) is reverse complementary to a second polynucleotide sequence (e.g., a target sequence) if the first polynucleotide sequence comprises at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementarity to the second nucleic acid (i.e., at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or
  • the term “substantially complementary” refers to a first polynucleotide sequence (e.g., a guide sequence) that has a certain level of complementarity to a second polynucleotide sequence (e.g., a target sequence) (e.g., at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the nucleotides of the first polynucleotide sequence can base-pair with the nucleotides of the second polynucleotide sequence, or at most 1, 2, 3, 4, or 5 contiguous or non-contiguous nucleotides of the first polynucleotide sequence mismatch the nucleotides of the second polynucleotide sequence).
  • the level of complementarity is such that the first polynucleotide sequence (e.g., a guide sequence) can hybridize to the second polynucleotide sequence (e.g., a target sequence) with sufficient affinity to permit an effector polypeptide (e.g., a napBP) that is complexed with a nucleic acid comprising the first polynucleotide sequence, or a functional domain associated (e.g., fused) with the effector polypeptide, to act (e.g., cleave, deaminate) on the target sequence, its complement, or a nearby sequence.
  • a guide sequence that is substantially complementary to a target sequence has less than 100% complementarity to the target sequence.
  • a guide sequence that is substantially complementary to a target sequence has at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementarity to the target sequence, and/or has at most 1, 2, 3, 4, or 5 contiguous or non-contiguous nucleotide mismatches from the target sequence.
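The complementarity thresholds above can be illustrated with a minimal Python sketch. It is not part of the disclosure: the function names, the default thresholds (80% and 5 mismatches, picked from the listed values), the restriction to equal-length sequences, and the example sequences are all assumptions made for illustration. RNA guides are handled by treating U as T.

```python
# Illustrative sketch only; names, defaults, and sequences are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence, written 5'->3' in the DNA alphabet."""
    dna = seq.upper().replace("U", "T")  # treat RNA as DNA for comparison
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def complementarity(guide: str, target: str) -> tuple[float, int]:
    """Return (percent complementarity, mismatch count) for equal-length sequences.

    The guide is compared position by position with the reverse complement
    of the target, i.e. with the sequence that would pair perfectly.
    """
    if len(guide) != len(target):
        raise ValueError("this sketch only handles sequences of equal length")
    perfect = reverse_complement(target)
    observed = guide.upper().replace("U", "T")
    mismatches = sum(1 for g, p in zip(observed, perfect) if g != p)
    percent = 100.0 * (len(observed) - mismatches) / len(observed)
    return percent, mismatches


def is_substantially_complementary(guide: str, target: str,
                                   min_percent: float = 80.0,
                                   max_mismatches: int = 5) -> bool:
    """Apply the 'at least X% or at most N mismatches' style of criterion."""
    percent, mismatches = complementarity(guide, target)
    return percent >= min_percent or mismatches <= max_mismatches
```

For example, `complementarity("ACCCGGGUUA", "AAACCCGGGT")` reports one mismatch out of ten positions, which falls within the illustrative thresholds.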
  • sequence identity is related to sequence homology. Homology comparisons may be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs may calculate percentage sequence identity (%) between two or more sequences (polypeptide or polynucleotide sequences) . Sequence homologies may be generated by any of a number of computer programs known in the art, for example, BLAST, FASTA. A suitable computer program for carrying out such an alignment is the GCG Wisconsin Bestfit package (University of Wisconsin, U.S.A; Devereux et al., 1984, Nucleic Acids Research 12: 387) .
  • Examples of other software that may perform sequence comparisons include, but are not limited to, the BLAST package (see Ausubel et al., 1999 ibid-Chapter 18), FASTA (Altschul et al., 1990, J. Mol. Biol., 403-410), and the GENEWORKS suite of comparison tools. Both BLAST and FASTA are available for offline and online searching (see Ausubel et al., 1999 ibid, pages 7-58 to 7-60).
  • a commonly used online tool to calculate percentage sequence identity between two or more sequences is available on the website of EMBL's European Bioinformatics Institute (www dot ebi dot ac dot uk slash jdispatcher slash) , allowing fast online calculation of percentage sequence identity by global alignment or local alignment.
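As a rough illustration of what such programs report, the sketch below computes percentage identity from two sequences that have already been aligned (gaps written as `-`). It is an assumption-laden simplification rather than the method of any named tool: the function name is hypothetical, and the denominator used here (number of alignment columns) is only one of several conventions in use.

```python
# Illustrative sketch only; real tools differ in how they count gaps and length.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)
```

With the toy alignment `ACGT-A` against `ACCTGA`, four of six columns match, giving roughly 66.7% identity.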
  • the terms “polypeptide” and “peptide” are used interchangeably herein to refer to polymers of amino acids of any length.
  • a protein may have one or more polypeptides.
  • An amino acid polymer can also be modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component.
  • a “variant” is interpreted to mean a polynucleotide or polypeptide that differs from a reference polynucleotide or polypeptide, respectively, but retains essential properties, e.g., binding property of a napBP.
  • a typical variant of a polynucleotide differs in nucleic acid sequence from another reference polynucleotide.
  • a change in the nucleic acid sequence of the polynucleotide variant may or may not alter the amino acid sequence of a polypeptide encoded by the reference polynucleotide.
  • a change in the nucleic acid sequence of the polynucleotide variant may result in an amino acid substitution, addition, and/or deletion in the polypeptide encoded by the reference polynucleotide.
  • a typical variant of a polypeptide differs in amino acid sequence from another reference polypeptide. Generally, the difference is limited so that the sequences of the reference polypeptide and the polypeptide variant are closely similar overall and, in many regions, identical.
  • the polypeptide variant and reference polypeptide may differ in amino acid sequence by one or more substitutions, additions, and/or deletions in any combination.
  • a variant of a polynucleotide or polypeptide may be naturally occurring, such as, an allelic variant, or it may be a variant that is not known to occur naturally. Non-naturally occurring variants of polynucleotides and polypeptides may be made by mutagenesis techniques, by direct synthesis, and by other recombinant methods known to skilled artisans.
  • the terms “upstream” and “downstream” refer to the relative positions of two or more elements within a nucleic acid in 5’ to 3’ direction.
  • a first sequence is upstream of a second sequence when the 3’ end of the first sequence is present at the left side of the 5’ end of the second sequence.
  • a first sequence is downstream of a second sequence when the 5’ end of the first sequence is present at the right side of the 3’ end of the second sequence.
  • the PAM is upstream of a napBP-induced indel, and a napBP-induced indel is downstream of the PAM.
  • the PAM is downstream of a napBP-induced indel, and a napBP-induced indel is upstream of the PAM.
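The upstream/downstream relationship can be expressed as a comparison of coordinates read in the 5’ to 3’ direction. The following sketch assumes 0-based, half-open intervals on a single strand; the `Element` class, its field names, and the example coordinates are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; coordinate convention and names are assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    start: int  # 0-based position of the 5' end (inclusive)
    end: int    # position just past the 3' end (exclusive)


def is_upstream(first: Element, second: Element) -> bool:
    """True when the 3' end of `first` lies to the left of the 5' end of `second`."""
    return first.end <= second.start


def is_downstream(first: Element, second: Element) -> bool:
    """True when the 5' end of `first` lies to the right of the 3' end of `second`."""
    return first.start >= second.end


pam = Element("PAM", 20, 23)
indel = Element("indel", 30, 32)
print(is_upstream(pam, indel), is_downstream(indel, pam))
```

Under this convention, overlapping elements are neither upstream nor downstream of one another.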
  • the term “wild type” has the meaning commonly understood by those skilled in the art and refers to a typical form of an organism, a strain, a gene, or a feature, as it exists in nature, that distinguishes it from a mutant or variant. It can be isolated from sources in nature and is not intentionally modified.
  • As used herein, the terms “non-naturally occurring” and “engineered” are used interchangeably and refer to artificial participation. When these terms are used to describe a nucleic acid or a polypeptide, it is meant that the nucleic acid or polypeptide is at least substantially freed from at least one other component of its association in nature or as found in nature.
  • the term “regulatory element” is intended to include promoters, enhancers, internal ribosome entry sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences).
  • Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cells and those that direct expression of a nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences) .
  • Regulatory elements may also direct expression in a time-dependent manner, e.g., in a cell cycle-dependent or developmental stage-dependent manner, which may or may not be tissue or cell type specific.
  • the term “cell” is understood to refer not only to a particular individual cell, but to the progeny or potential progeny of the cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term.
  • the term “in vivo” refers to inside the body of an organism.
  • the term “ex vivo” or “in vitro” means outside the body of an organism.
  • the term “treat” , “treatment” , or “treating” is an approach for obtaining beneficial or desired results including clinical results.
  • the beneficial or desired clinical results include, but are not limited to, one or more of the following: alleviating one or more symptoms resulting from a disease, diminishing the extent of a disease, stabilizing a disease (e.g., delaying the worsening of a disease) , delaying the spread (e.g., metastasis) of a disease, delaying the recurrence of a disease, reducing recurrence rate of a disease, delay or slowing the progression of a disease, ameliorating a disease state, providing a remission (partial or total) of a disease, decreasing the dose of one or more other medications required to treat a disease, delaying the progression of a disease, increasing the quality of life, and/or prolonging survival.
  • treatment is a reduction of a pathological consequence of a disease (such as cancer).
  • the term “disease” includes the terms “disorder” and “condition” and is not limited to those that have been specifically medically defined.
  • the term “transcript” includes any product of transcription from a DNA, including subgenomic RNA, mRNA, non-coding RNA, and any variants, derivatives, or ancestors thereof, for example, pre-mRNA, and any transcripts or isoforms produced from the DNA or the pre-mRNA by, e.g., alternative promoter usage, alternative splicing, alternative initiation, and any naturally occurring variants thereof or processed products therefrom.
  • reference to “not” a value or parameter generally means and describes “other than” a value or parameter.
  • for example, “the method is not used to treat cancer of type X” means the method may be used to treat cancer of types other than X.
  • the term “and/or” in a phrase such as “A and/or B” is intended to include both A and B; A or B; A (alone) ; and B (alone) .
  • the term “and/or” in a phrase such as “A, B, and/or C” is intended to encompass each of the following embodiments: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone) ; B (alone) ; and C (alone) .
  • FIG. 1 Design and mechanisms of base editing by conventional base editors and glycosylase-based base editors of the disclosure.
  • FIG. 1a Schematic diagrams of ABE (left) and CBE (middle) and the deaminase-free glycosylase-based guanine base editor (gGBE, right) of the disclosure.
  • a nCas9-sgRNA complex creates an R-loop at the target site in the DNA.
  • the evolved adenine deaminase (tRNA adenosine deaminase, TadA) and the AID/APOBEC-like cytidine deaminase convert the exposed adenine (A) into deoxyinosine (I) and cytosine (C) into deoxyuridine (U), respectively.
  • an additional linked protein, uracil glycosylase inhibitor (UGI) protects U from uracil DNA N-glycosylase (UNG) . After deamination, the resulting I is recognized as G, and U as T by DNA polymerase during DNA repair or replication.
  • gGBE glycosylase-based guanine base editor
  • PAM protospacer adjacent motif
  • AP apurinic/apyrimidinic sites.
  • FIG. 1b A screening reporter system for detecting G-to-T conversion by gGBE.
  • P2A 2A peptide from porcine teschovirus-1.
  • WT wild-type.
  • FIG. 2 Mutagenesis of the MPG moiety in gGBEs.
  • FIG. 2a Schematic diagram of mutagenesis and screening strategy for engineered gGBE. The EGFP reporter plasmids were transiently co-transfected into cultured cells along with the gGBE expression plasmids.
  • FIG. 2b Genotypes of a subset of engineered gGBEs, with percentage of EGFP + cells for each gGBE on the far-right column (more engineered gGBEs listed in FIG. 13) . Different steps of mutagenesis are marked by different shaded colors.
  • FIG. 3 Characterization of editing profiles of gGBE via target deep sequencing.
  • G# G position with highest on-target base editing frequencies across protospacer positions 1–20.
  • site # genomic site number.
  • FIG. 3b The ratio of G-to-C/T to G-to-A/C/T conversion frequency by gGBEv6.3 editing at the sites shown in FIG. 3a.
  • FIG. 3c Frequencies of G conversions by gGBEv6.3 across protospacer positions 1–20 at the edited sites in FIG. 3a (in which PAM was at positions 21–23).
  • Single dots represent individual data points from 3 independent replicates per site. Boxes span the interquartile range (25th to 75th percentile); the horizontal line in the box indicates the median (50th percentile); and small horizontal bars mark the minimal and maximal values.
  • OT off-target.
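The position numbering used in these legends (protospacer positions 1–20, PAM at positions 21–23) can be sketched as follows. The function name and the 23-nt example site are hypothetical; the sketch assumes the site is written 5’ to 3’ with the PAM immediately after the protospacer.

```python
# Illustrative sketch only; the example site is made up, not taken from the disclosure.
def base_positions(target_site: str, base: str = "G",
                   protospacer_len: int = 20) -> list[int]:
    """Return 1-based protospacer positions that carry `base`.

    `target_site` is the protospacer followed by the PAM, so for a 23-nt
    site the PAM occupies positions 21-23 and is excluded from the search.
    """
    protospacer = target_site[:protospacer_len].upper()
    return [i + 1 for i, b in enumerate(protospacer) if b == base.upper()]


site = "ACGTACGTACGTACGTACGT" + "TGG"   # 20-nt protospacer + 3-nt PAM
print(base_positions(site))             # candidate G positions within 1-20
print(site[20:23])                      # PAM at positions 21-23
```

The same helper can be called with `base="T"` or `base="C"` to list candidate positions for the thymine and cytosine editors described in later figures.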
  • FIG. 4 Gene editing applications of gGBE.
  • FIG. 4a Application of gGBEv6.3 for editing splicing sites, introduction of premature termination codons (PTCs) , as well as editing that bypasses PTCs.
  • FIG. 4b Schematic diagram illustrating gGBE-induced skipping of DMD exon 45.
  • FIG. 4d Schematic diagram illustrating the introduction of PTCs in the mouse Tyr gene by gGBE.
  • FIG. 4g Phenotype of F0 mice generated by gGBE editing in mouse zygotes. Image showing edited P6 mice. Red arrowhead, albino; blue arrowhead, mosaic pigmentation.
  • FIG. 4h Bar plots showing the on-target G editing frequencies for individual mouse pups, with gGBEv6.3 targeting Tyr site 3.
  • FIG. 4i Genotyping of representative F0 pups from (FIG. 4h) . The frequencies of mutant alleles were determined by high-throughput sequencing. Red arrowhead, albino pups.
  • FIG. 5 illustrates example nucleotide conversion by base excision and translesion synthesis (TLS) .
  • FIG. 6 illustrates an exemplary target dsDNA containing a first exemplary deoxyribonucleotide dG, an exemplary guide nucleic acid, and an exemplary napDNAbp before base editing.
  • FIG. 7 illustrates an exemplary target dsDNA containing a fourth exemplary deoxyribonucleotide dC, an exemplary guide nucleic acid, and an exemplary napDNAbp after base editing.
  • FIG. 8 illustrates an exemplary target dsDNA containing a fourth exemplary deoxyribonucleotide dT, an exemplary guide nucleic acid, and an exemplary napDNAbp after base editing.
  • FIG. 9a Schematic diagram of AYBE.
  • MPG N-methylpurine DNA glycosylase
  • BER base excision repair
  • FIG. 9b Schematic diagram of CGBE.
  • Uracil DNA N-glycosylase excises the uridine (U) resulting from deamination of cytosine (C) by the AID/APOBEC-like cytidine deaminase, triggering base excision repair (BER) pathway in cells, thus causing dominant C-to-G editing.
  • PAM Protospacer adjacent motif.
  • AP apurinic/apyrimidinic sites.
  • FIG. 10 Characterization of A-to-T and G-to-T editing with an intron-split EGFP reporter system.
  • FIG. 10a Design of the reporter for A-to-T or G-to-T editing detection. P2A, 2A peptide from the porcine teschovirus-1.
  • FIG. 10b Percentage of EGFP + cells for evaluation of A editing efficiency by gABE with various MPG.
  • FIG. 10c Percentage of EGFP + cells representing the efficiency of G-to-T conversion for gGBE containing various MPG and sgRNA.
  • FIG. 10d Representative flow cytometry scatter plots showing gating strategy and the percentages of EGFP + cells for gGBEv3.
  • FIG. 11 View of MPG structure and the first round of mutagenesis of MPG.
  • FIG. 11a Structures for aa 78-298 region (left) and 163-179 region (right) of human MPG protein (shown in gray), as predicted by AlphaFold (alphafold. com/entry/P29372) aligned with the crystal structure of MPG (PDB entry 1ewn, not shown), in which εA was mutated to G in the DNA.
  • FIG. 12a-d Percentage of EGFP + cells of gGBEs with various MPG mutants from sequential substitutions of glutamic acid (FIG. 12a) , valine (FIG. 12b) , glycine (FIG. 12c) , and tyrosine (FIG. 12d) (X-to-E, V, G, or Y) .
  • n = 3. All values are presented as mean ± s.e.m.
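For readers unfamiliar with the “mean ± s.e.m.” convention used throughout these legends, a minimal sketch of the calculation is given below. The function name and the three replicate values are hypothetical and are not data from the disclosure; the sketch assumes the s.e.m. is the sample standard deviation divided by the square root of n.

```python
# Illustrative sketch only; replicate values are invented for the example.
import math
import statistics


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Return (mean, standard error of the mean) for a list of replicates."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed for an s.e.m.")
    sem = statistics.stdev(values) / math.sqrt(n)  # sample SD / sqrt(n)
    return statistics.mean(values), sem


replicates = [10.0, 12.0, 14.0]  # e.g. percentages from n = 3 replicates
mean, sem = mean_sem(replicates)
print(f"{mean:.1f} ± {sem:.2f}")
```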
  • FIG. 13 Progressive engineering and G editing efficiency of gGBEs.
  • FIG. 13a Progressive mutations of gGBEs. Different rounds of mutations are marked with different color shades.
  • FIG. 14a-c Frequencies of C (FIG. 14a) , T (FIG. 14b) and A (FIG. 14c) conversions by gGBEv6.3 across the protospacer positions 1–20 (where PAM is at positions 21–23) from the edited sites in FIG. 3a.
  • FIG. 14d Frequencies of G-to-T and G-to-C editing by gGBEv6.3.
  • FIG. 14j The statistical analysis of on-target DNA base editing for each NG motif from the edited sites in (FIG. 14i) . Each dot represents the mean of three biological replicates for each edited position at various edited sites.
  • FIG. 15 The guide sequence-dependent and guide sequence-independent off-target analysis.
  • OT off-target.
  • Data for AYBE and ABE8e were adopted from Tong et al. [1]. All values are presented as mean ± s.e.m.
  • FIG. 16 The percentage of G-to-C and G-to-T among all G-to-C/T/A conversion events at DMD or Tyr sites targeted by gGBEv6.3.
  • FIG. 16a Percentages of G-to-C and G-to-T editing events in HEK293T cells with gGBEv6.3 at two DMD sites (corresponding to FIG. 4c).
  • n = 3.
  • SA splicing acceptor site.
  • PAM Protospacer adjacent motif.
  • FIG. 16b Percentages of G-to-C and G-to-T editing events in N2a cells with gGBEv6.3 at three Tyr sites (corresponding to FIG. 4e) .
  • n = 3. All values are presented as mean ± s.e.m.
  • FIG. 17 G editing in mouse embryos with gGBEv6.3.
  • FIG. 17e Bar plots showing the on-target G editing frequencies for individual mouse embryos, with gGBEv6.3 targeting Tyr site 1, Tyr site 2, and Tyr site 3.
  • FIG. 18 Phenotypes and genotyping of F0 mouse pups.
  • FIG. 18a-c Phenotypes of F0 mice generated by microinjection of gGBEv6.3 encoding mRNA and sgRNA for targeting Tyr site 1 (FIG. 18a) , site 2 (FIG. 18b) and site 3 (FIG. 18c) . The images were obtained for P6 mice. Arrowheads: red, albino mice; blue, mice with mosaic pigmentation.
  • FIG. 18d Bar plots showing the on-target G editing frequencies for individual mouse pups, with gGBEv6.3 targeting Tyr site 1 and Tyr site 2.
  • FIG. 19 Design and mechanisms of two orthogonal glycosylase-based base editors.
  • FIG. 19a Prototype versions of a deaminase-free glycosylase-based thymine base editor (gTBE) and a deaminase-free glycosylase-based cytosine base editor (gCBE) .
  • PAM Protospacer adjacent motif.
  • AP apurinic/apyrimidinic sites.
  • Star (*) in magenta indicates the nick generated by nCas9.
  • FIG. 19b Schematic diagram of potential pathway for T (or C) editing and outcomes.
  • a glycosylase mutant is designed to remove normal T or C, an nCas9-sgRNA complex creates an R-loop at the target site and nicks the non-edited strand, then the AP site generated is repaired by translesion synthesis (TLS) and/or DNA replication, leading to T or C editing.
  • TLS translesion synthesis
  • DSB double-strand break. indel, insertion and deletion.
  • FIG. 19c Schematic of various gTBE and gCBE candidate architectures. Note that Y156A and N213D of UNG2 are equivalent to Y147A and N204D of UNG1, respectively.
  • T target sgRNA.
  • FIG. 20 Protein engineering and evolution of gTBEs.
  • FIG. 20a Schematic diagram of mutagenesis and screening strategy for engineering gTBE.
  • the EGFP reporter plasmids were transiently co-transfected into cultured cells along with the gTBE plasmids, and the fluorescence intensity of EGFP was detected with flow cytometry.
  • FIG. 20b Left, the selected residues (shown as surface) for mutagenesis near the catalytic site pocket of the human UNG-DNA complex (PDB entry 1EMH [24]), in which dΨU was mutated to T in the DNA (dT). Right, location of the effective residues in gTBEv3, shown as red spheres on the three-dimensional structure.
  • WT, wild-type UNG2Δ88. dead, catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1) [60].
  • FIG. 21 Characterization of editing profiles of gTBE via target deep sequencing.
  • T# T position with highest on-target base editing frequencies across protospacer positions 1–20.
  • site # genomic site number.
  • FIG. 21b The ratio of T-to-C/G to T-to-A/C/G conversion frequency by gTBEv3 editing at the sites shown in FIG. 21a.
  • OT off-target.
  • FIG. 22 Enhancement of gCBE editing efficiency through protein engineering.
  • FIG. 22a Schematic diagram of mutagenesis and screening strategy for engineering gCBE.
  • WT, wild-type UNG2Δ88. dead, catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1) [60].
  • C# C position with highest on-target base editing frequencies across protospacer positions 1-20.
  • FIG. 22f gRNA-independent cumulative off-target editing frequencies detected by the orthogonal R-loop assay at each R-loop site.
  • FIG. 23 Gene editing applications of gTBE and gCBE.
  • FIG. 23a principle for exon skipping with base editors.
  • FIG. 23b Bar plots showing the numbers of sgRNA candidates targeting the splicing sites in 16 genes by different base editors.
  • the 16 genes are AGT, ANGPTL3, APOC3, B2M, CD33, DMD, DNMT3A, HPD, KLKB1, PCSK9, PDCD1, PRDM1, TGFBR2, TRAC, TTR, and VEGFA.
  • FIG. 23c Venn diagram showing the distribution of sgRNAs for 4 base editors in FIG. 23b.
  • FIG. 23d Schematic diagram illustrating sgRNA candidates specifically targeting SD or SA sites in human DMD with gTBEv3 (red lines) or gCBEv2 (black lines) , but not ABE or CBE.
  • FIG. 23e Schematic diagram illustrating the skipping of human DMD exon 45 induced by gTBE-induced disruption of the splicing donor site.
  • FIG. 23g DNA sequencing chromatograms from wild-type (WT) and representative embryos co-injected with gTBEv3 mRNA and sgRNA targeting the SD site of human DMD exon 45.
  • WT wild-type
  • FIG. 24 Comparison of different gTBEs.
  • FIG. 24a the strategies for protein engineering and screening used in three studies.
  • FIG. 24b Schematic of the basic architectures for various base editors. UNG2*, UNG2 mutant from the corresponding base editor. ΔNTD, deletion of the N-terminal domain.
  • FIG. 25 Characteristic sequences and motifs of human UNG1 and UNG2.
  • FIG. 25a UNG1-specific N-terminal residues (amino acid 1-35) are marked in grey.
  • UNG2-specific N-terminal residues (amino acid 1-44) are light blue.
  • the common RPA-binding site (yellow) and the globular catalytic domain (light green) are indicated.
  • RPA Replication protein A.
  • UNGs contain five conserved motifs numbered from UNG2 as follows: the catalytic water-activating loop (152-GQDPYH-157) ; the proline (Pro) -rich loop compressing the DNA backbone 5’ to the lesion (174-PPPPS-178) ; the uracil-binding motif (210-LLLN-213) ; the glycine-serine (Gly-Ser) loop that compresses the DNA backbone 3’ to the lesion (255-GS-256) ; and the leucine (Leu) -intercalation loop penetrating the minor groove (277-HPSPLS-282) .
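The UNG1/UNG2 residue equivalences quoted in the figure descriptions (for example, Y156A of UNG2 and Y147A of UNG1) are consistent with a constant offset of 9 residues, which also matches the difference between the isoform-specific N-terminal lengths stated above (44 versus 35 amino acids). The sketch below converts mutation labels on that assumption; the function name is hypothetical, and the constant offset is inferred from the numbers in this text rather than from a sequence alignment.

```python
# Illustrative sketch only; assumes a constant numbering offset in the shared region.
import re

UNG2_SPECIFIC_N_TERMINUS = 44   # UNG2-specific residues 1-44
UNG1_SPECIFIC_N_TERMINUS = 35   # UNG1-specific residues 1-35
OFFSET = UNG2_SPECIFIC_N_TERMINUS - UNG1_SPECIFIC_N_TERMINUS  # 9


def ung2_to_ung1(mutation: str) -> str:
    """Convert a substitution label such as 'Y156A' from UNG2 to UNG1 numbering."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if position <= UNG2_SPECIFIC_N_TERMINUS:
        raise ValueError("position lies in the UNG2-specific N-terminus")
    return f"{wild_type}{position - OFFSET}{mutant}"


for label in ("Y156A", "N213D", "D154N", "H277N"):
    print(label, "->", ung2_to_ung1(label))
```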
  • FIG. 26 Characterizations of T-to-G and C-to-G reporter system.
  • FIG. 26a Schematic construct designs of the reporter for T-to-G or C-to-G editing detection.
  • PAM Protospacer adjacent motif.
  • FIG. 26b Representative flow cytometry scatter plots showing gating strategy and the percentages of gated cells for the negative control (upper panel) and gCBEv0.3 (lower panel) .
  • FIG. 27 Editing efficiency of gTBE and gCBE candidates with various UNG-NTD truncations.
  • WT, wild-type UNG2Δ88. dead, catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1). All values are presented as mean ± s.e.m.
  • FIG. 28 Performance of UNG mutants in the background of gTBEv0.3.
  • FIG. 29 Performance of UNG mutants in the background of gTBEv2.
  • dead, catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1). All values are presented as mean ± s.e.m.
  • FIG. 30 Further characterization of editing profiles of gTBEv3.
  • FIG. 30b-e Frequencies of T (FIG. 30b) , G (FIG. 30c) , C (FIG. 30d) and A (FIG. 30e) conversions by gTBEv3 across the protospacer positions 1-20 (where PAM is at positions 21–23) from the edited sites in FIG. 21a.
  • FIG. 30f Frequencies of T-to-G and T-to-C editing by gTBEv3.
  • T# T position with highest on-target base editing frequencies across protospacer positions 1–20.
  • site # genomic site number.
  • FIG. 30j The ratio of T-to-S to total T editing (base conversions and indels) by gTBEv3 editing at the sites shown in FIG. 21a.
  • FIG. 31 Guide sequence-dependent off-target analysis for gTBEv3 at more sites.
  • the guide sequence-dependent off-target analysis for gTBEv3 editing at site 9 (a) and site 15 (b) (n = 3).
  • OT off-target. All values are presented as mean ± s.e.m.
  • FIG. 32 Performance of UNG mutants in the background of gCBEv0.3.
  • Replacement of alanine with valine (A-to-V) is intended to cover all the residues in the regions of interest.
  • dead, catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1). All values are presented as mean ± s.e.m.
  • FIG. 33 Further characterization of editing profiles of gCBEv2.
  • FIG. 33c-d Frequencies of C (FIG. 33c) and T (FIG.
  • FIG. 33d The ratio of C-to-G/T to C-to-A/G/T conversion frequency by gCBEv2 editing at the sites shown in FIG. 22c.
  • FIG. 33f-h Percentage of C-to-G (FIG. 33f) , C-to-T (FIG.
  • C# C position with highest on-target base editing frequencies across protospacer positions 1-20.
  • site # genomic site number.
  • FIG. 33j The statistical analysis of on-target DNA base editing for each NC motif from the 16 edited sites. Each dot represents the mean of three biological replicates for each edited position at various edited sites.
  • FIG. 34 Base editing at splicing sites with gTBEv3.
  • FIG. 34a The optimal editing windows for various base editors.
  • FIG. 34b Venn diagram showing the distribution of sgRNAs for CBE and gCBEv2 in FIG. 23b.
  • T# or C# The position of targeted T or C across protospacer positions 1–20.
  • FIG. 34d DNA sequencing chromatograms for targeting the SD site of human DMD exon 37 and exon 12 with gTBEv3. Sanger sequencing results were quantified by EditR.
  • FIG. 35 PTCs editing and introduction for various base editors.
  • FIG. 35a principle for bypassing premature termination codons (PTCs) with various base editors.
  • FIG. 35b the possible codon outcomes from stop codons (TAA, TAG or TGA) editing with different base editors.
  • FIG. 35c principle for introduction of PTCs with various base editors.
  • FIG. 35d the available codons for editing into stop codons (TAA, TAG or TGA) with different base editors.
  • FIG. 35e The 10 × 10 dot plot diagram showing the percentage of possible sgRNAs for introduction of premature termination codons (PTCs) by targeting different codons (with the number of available sgRNAs presented on the right) in 15 well-studied genes (AGT, ANGPTL3, APOC3, B2M, CD33, DNMT3A, HPD, KLKB1, PCSK9, PDCD1, PRDM1, TGFBR2, TRAC, TTR, VEGFA) for gene and cell therapy research with gGBEv6.3 and CBE.
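The idea of enumerating codons that can be edited into stop codons can be sketched in a few lines. This is a deliberately simplified illustration, not the analysis behind FIG. 35: it assumes a single substitution per codon, considers only G-to-T and G-to-C outcomes on the coding strand, and ignores edits to the opposite strand, the editing window, and PAM availability. The function name and parameters are hypothetical.

```python
# Illustrative sketch only; coding-strand, single-substitution simplification.
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_editable_to_stop(target_base: str = "G",
                            outcomes: tuple[str, ...] = ("T", "C")) -> set[tuple[str, str]]:
    """Return (original codon, stop codon) pairs reachable by one substitution."""
    hits = set()
    for codon in ("".join(bases) for bases in product("ACGT", repeat=3)):
        if codon in STOP_CODONS:
            continue  # already a stop codon, nothing to introduce
        for i, base in enumerate(codon):
            if base != target_base:
                continue
            for new_base in outcomes:
                edited = codon[:i] + new_base + codon[i + 1:]
                if edited in STOP_CODONS:
                    hits.add((codon, edited))
    return hits


for original, stop in sorted(codons_editable_to_stop()):
    print(f"{original} -> {stop}")
```

Passing a different `target_base` and `outcomes` gives the corresponding coding-strand list for another editor under the same simplifications.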
  • FIG. 36 Additional comparison of different gTBEs.
  • FIG. 36f-h The statistical analysis of T base editing (FIG.
  • FIG. 36a-h the graphs were derived from the data for various base editors shown in FIG. 24c. Dunnett’s multiple comparisons test after one-way ANOVA was used to compare the gTBEv3 or gTBEv5 with other base editors in FIG. 36f-h.
  • FIG. 37 T editing in the dsDNA upstream from the target site.
  • FIG. 38 Comparison of various glycosylase-based base editors for cytosine editing.
  • FIG. 38a Schematic of the basic architectures for various base editors. UNG2*, UNG2 mutant from the corresponding base editor. ΔNTD, deletion of the N-terminal domain.
  • FIG. 38b-c The frequencies of total C editing (base conversions and indels, FIG. 38b) or C base conversions (FIG. 38c) for various base editors at 19 endogenous loci.
  • the cytosines with editing frequencies >25% for any base editor are shown.
  • FIG. 39 Additional comparison of various glycosylase-based base editors for cytosine editing.
  • FIG. 39c Frequencies of C base conversions by various base editors across the protospacer positions 1-20 (where PAM is at positions 21–23) .
  • the graphs were derived from the data for various base editors shown in FIG. 38c.
  • FIG. 40 Off-target analysis of various glycosylase-based base editors.
  • OT off-target.
  • FIG. 41 Comparison between gTBEs or gCBEs and PEs.
  • OT off-target.
  • the PE6d was used together with epegRNA and nick sgRNA. For PE6d max, PE6d was co-expressed with the codon-optimized hMLH1dn, a dominant negative MMR protein. All values are presented as mean ± s.e.m.
  • FIG. 42 Characterization of editing profiles for gTBEs or gCBEs in HEK293T, HuH-7, and U2OS cells.
  • HuH-7 a cell line established from a human hepatocellular carcinoma
  • U2OS a cell line established from a human bone osteosarcoma. All values are presented as mean ± s.e.m.
  • ABE and CBE require a deaminase for the base editing of A and C.
  • the disclosure provides, at least in part, base editors and base editing methods capable of base editing a target deoxyribonucleotide (e.g., dG, dT, dC) in a target dsDNA in the absence of a deaminase.
  • the base editors and base editing methods of the disclosure rely on the base excision domain of the base editor.
  • the base excision domain is capable of directly excising the base of a target deoxyribonucleotide in a target dsDNA to generate an abasic site in situ, triggering a base excision repair (BER) pathway.
  • the target deoxyribonucleotide may be converted to another deoxyribonucleotide, leading to base editing of the target deoxyribonucleotide.
  • the base editors and base editing methods of the disclosure also rely on a nucleic acid programmable DNA binding domain (napDNAbd) to specifically direct the base editor to a target dsDNA via a guide nucleic acid capable of interacting with both the napDNAbd and the target dsDNA.
  • the napDNAbd may be associated (e.g., complexed) with a guide nucleic acid (e.g., a guide RNA).
  • the guide nucleic acid is designed to localize or target the napDNAbd to the target dsDNA, by relying on the hybridization between a target sequence of the target dsDNA and a corresponding guide sequence of the guide nucleic acid.
  • the guide nucleic acid comprises a guide sequence that is capable of hybridizing to a target sequence of the target dsDNA due to the substantial complementarity between the guide sequence and the target sequence.
  • the guide nucleic acid also comprises a scaffold sequence capable of forming a complex with the napDNAbd. In this way, the guide nucleic acid “programs” the napDNAbd such that the napDNAbd can specifically localize and (indirectly) bind to the region on and around the target sequence of the target dsDNA via the guide nucleic acid.
  • the binding of the napDNAbd to the target dsDNA enables the base excising domain associated with the napDNAbd to specifically access and act on the base of the target deoxyribonucleotide in the target sequence of the target dsDNA in a guide sequence-specific/dependent way.
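The strand relationships described above can be illustrated with a minimal Python sketch. It assumes full (rather than merely substantial) complementarity between the guide sequence and the target sequence, an RNA guide, and a hypothetical 20-nt protospacer; the function names are illustrative and are not taken from the disclosure.

```python
# Sketch: derive the target sequence (target strand) and a guide sequence
# from a protospacer (nontarget strand), all written 5'->3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def guide_rna_for(protospacer: str) -> str:
    """Guide RNA sequence that hybridizes to the target sequence.

    The target sequence is the reverse complement of the protospacer, so a
    fully complementary guide has the protospacer sequence with U replacing T.
    """
    target_sequence = reverse_complement(protospacer)
    return reverse_complement(target_sequence).replace("T", "U")


if __name__ == "__main__":
    protospacer = "GACCTGAAGTCCATGGACTA"  # hypothetical example sequence
    print("target sequence:", reverse_complement(protospacer))
    print("guide sequence: ", guide_rna_for(protospacer))
```

In this sketch the guide sequence is never compared with the protospacer directly; it is derived from the target sequence, mirroring the description that the guide hybridizes to the target strand.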
  • the base excision domain of the base editor of the disclosure directly excises the base of a target deoxyribonucleotide (the first deoxyribonucleotide in FIG. 6) and generates an abasic site where the base is removed, which is an apurinic site where a purine (e.g., guanine) is removed or an apyrimidinic site where a pyrimidine (e.g., thymine, cytosine) is removed.
  • a purine e.g., guanine
  • a pyrimidine e.g., thymine, cytosine
  • the abasic site may be repaired by translesion synthesis (TLS) (by, e.g., TLS polymerase) and/or DNA replication, leading to base editing, in which case nicking in the strand opposite to the abasic site may not be necessary.
  • TLS translesion synthesis
  • a nucleic acid programmable DNA nickase such as Cas9 nickase
  • it creates a nick in the strand (non-edited strand) opposite to the abasic site, and the apyrimidinic site may be removed by AP lyase to generate another nick on the edited strand; the two nicks trigger double-strand break (DSB) repair and the introduction of indel mutations, which is also highly likely to change the target deoxyribonucleotide.
  • DSB double-strand break
  • FIG. 6 shows, before base editing, a first deoxyribonucleotide of dG as target deoxyribonucleotide to be edited on the edited strand (nontarget strand) and a second deoxyribonucleotide of dC on the opposite strand (non-edited strand /target strand) base pairing with the dG.
  • the dG is located in a protospacer sequence on the nontarget strand of the target dsDNA
  • the dC is located in a target sequence on the target strand of the target dsDNA.
  • a guide nucleic acid is designed to comprise a guide sequence capable of hybridizing to the target sequence and comprises a scaffold sequence capable of forming a complex with a napDNAbd.
  • the napDNAbd is capable of nicking the target strand and is fused with a base excising domain capable of excising guanine of the dG.
  • FIG. 7 shows, after base editing, a fourth deoxyribonucleotide of dC as outcome deoxyribonucleotide on the edited strand (nontarget strand) and a third deoxyribonucleotide of dG on the opposite strand (non-edited strand /target strand) base pairing with the dC.
  • FIG. 6 and FIG. 7 together show direct dG-to-dC base editing.
  • FIG. 8 shows, after base editing, a fourth deoxyribonucleotide of dT as outcome deoxyribonucleotide on the edited strand (nontarget strand) and a third deoxyribonucleotide of dA on the opposite strand (non-edited strand /target strand) base pairing with the dT.
  • FIG. 6 and FIG. 8 together show direct dG-to-dT base editing.
  • the base editing approach of the disclosure allows direct base editing of a target deoxyribonucleotide (e.g., dG, dT, dC) in a target dsDNA, expanding the scope of target design and screening for the direct base editing. For example, if editing of a target dG to dA is desired, the traditional base editors incapable of directly editing dG would have to be applied to edit dC on the opposite strand to dT, thereby indirectly editing dG to dA.
  • a target deoxyribonucleotide e.g., dG, dT, dC
  • the traditional CBE might not be able to edit the dC to the desired outcome dT; the PAM limitation of the CBE might not allow designing a target/guide sequence targeting the dC to specifically direct the CBE to the dC; and even if such a guide sequence can be designed, the base editing efficiency of the CBE might not be sufficient.
  • the target dG can be directly base edited, and therefore, developers would have a much greater chance to design, screen, and obtain a suitable target/guide sequence targeting the dG to specifically direct the base editor of the disclosure to the dG.
  • the base editing approach of the disclosure may function in the absence of deamination of the base of the target deoxyribonucleotide before the excision of the base of the target deoxyribonucleotide, or in the absence of deamination at all.
  • traditional ABE needs to deaminate the base (adenine) of a target dA to a hypoxanthine, thereby converting the target dA to inosine (I), which reads as dG in DNA repair or replication
  • traditional CBE needs to deaminate the base (cytosine) of a target dC to a uracil, thereby converting the target dC to uridine (U), which reads as dT in DNA repair or replication.
  • Deamination is unlikely for G (due to spontaneous remediation) [18] and impossible for T (due to the absence of amine) , making the development of deaminase-based G and T base editors a challenging task.
  • the omission of a deaminase domain in the base editor of the disclosure opens the way to base editing of G and T, and may also avoid undesired effects caused by the deaminase domain and deamination of traditional deaminase-based base editors and reduce base editor size.
  • a deaminase-free, glycosylase-based guanine base editor (gGBE) was developed with G editing ability, by fusing a nucleic acid programmable DNA nickase such as Cas9 nickase with a human N-methylpurine DNA glycosylase (MPG) mutant capable of excising guanine of dG developed by several rounds of MPG mutagenesis via unbiased and rational screening. It was demonstrated that the gGBE has high G editing efficiency.
  • MPG human N-methylpurine DNA glycosylase
  • the gGBE exhibited high base editing efficiency (up to 81.2%) and high G-to-T or G-to-C (i.e., G-to-Y) conversion ratio (up to 0.95) in both cultured human cells and mouse embryos.
  • a nucleic acid programmable DNA nickase such as Cas9 nickase with a human uracil DNA glycosylase (UNG) mutant capable of excising thymine of dT or cytosine of dC separately developed by mutagenesis of UNG
  • UNG human uracil DNA glycosylase
  • two deaminase-free, glycosylase-based base editors for direct T editing (gTBE) and direct C editing (gCBE) were developed to achieve orthogonal base editing, that is gTBE for direct T editing and gCBE for direct C editing, respectively.
  • gTBE and gCBE were obtained with high activity of T-to-S (i.e., T-to-C or T-to-G) and C-to-G conversions, respectively. Furthermore, by embedding the UNG mutant into a nucleic acid programmable DNA nickase such as Cas9 nickase, more gTBE and gCBE were generated, showing enhanced average editing efficiency and alternative editing windows.
  • the editing profiles of gTBE and gCBE were characterized by targeting dozens of endogenous genomic loci in cultured mammalian cells as well as mouse embryos, demonstrating their high base editing efficiency.
  • the base editor of the disclosure may be provided in the form of a fusion protein.
  • the disclosure provides a fusion protein comprising:
  • napDNAbd nucleic acid programmable DNA binding domain capable of binding a target dsDNA comprising:
  • a first deoxyribonucleotide e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine)
  • dG deoxyguanosine
  • dT thymidine
  • dC deoxycytidine
  • a second deoxyribonucleotide e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine)
  • first deoxyribonucleotide e.g., dG, dT, dC
  • target sequence on the target strand (non-edited strand) of the target dsDNA, wherein the protospacer sequence is fully reverse complementary to the target sequence
  • a base excising domain capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide.
  • the fusion protein does not comprise a deaminase domain, e.g., an adenine or cytosine deaminase domain, e.g., TadA and variants thereof.
  • a deaminase domain e.g., an adenine or cytosine deaminase domain, e.g., TadA and variants thereof.
  • the disclosure provides a system comprising:
  • fusion protein or a polynucleotide encoding the fusion protein, the fusion protein comprising:
  • napDNAbd nucleic acid programmable DNA binding domain capable of binding a target dsDNA comprising:
  • a first deoxyribonucleotide e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine)
  • dG deoxyguanosine
  • dT thymidine
  • dC deoxycytidine
  • a second deoxyribonucleotide e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine)
  • first deoxyribonucleotide e.g., dG, dT, dC
  • target sequence on the target strand (non-edited strand) of the target dsDNA, wherein the protospacer sequence is fully reverse complementary to the target sequence
  • a base excising domain capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide
  • a guide nucleic acid or a polynucleotide encoding the guide nucleic acid comprising:
  • the system is a complex comprising the fusion protein complexed with the guide nucleic acid.
  • the complex further comprises the target dsDNA hybridized with the guide sequence.
  • the system is a composition comprising the component (i) and the component (ii) .
  • the guide nucleic acid as described herein is a guide RNA (gRNA) .
  • gRNA guide RNA
  • sgRNA single guide RNA
  • the disclosure provides a method of modifying a target dsDNA, comprising contacting the target dsDNA with a system,
  • the target dsDNA comprising:
  • a first deoxyribonucleotide e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine)
  • dG deoxyguanosine
  • dT thymidine
  • dC deoxycytidine
  • a second deoxyribonucleotide e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine)
  • first deoxyribonucleotide e.g., dG, dT, dC
  • target sequence on the target strand (non-edited strand) of the target dsDNA, wherein the protospacer sequence is fully reverse complementary to the target sequence
  • fusion protein or a polynucleotide encoding the fusion protein, the fusion protein comprising:
  • napDNAbd nucleic acid programmable DNA binding domain
  • a base excising domain capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide
  • a guide nucleic acid or a polynucleotide encoding the guide nucleic acid comprising:
  • the method does not include deamination of the base of the first deoxyribonucleotide before the excision of the base of the first deoxyribonucleotide.
  • the method does not include deamination of the base of the first deoxyribonucleotide.
  • the method comprises inducing strand separation of the target dsDNA.
  • the target deoxyribonucleotide to be edited in the target dsDNA may be termed as “first deoxyribonucleotide”
  • the outcome deoxyribonucleotide converted from the target deoxyribonucleotide by base editing may be termed as “fourth deoxyribonucleotide” .
  • the first deoxyribonucleotide is deoxyguanosine (dG) , thymidine (dT) , deoxyadenosine (dA) , or deoxycytidine (dC) .
  • the first deoxyribonucleotide is dG.
  • the first deoxyribonucleotide is dT.
  • the fourth deoxyribonucleotide is dA, dT, dC, or dG.
  • the second deoxyribonucleotide is dC, dA, dT, or dG.
  • the third deoxyribonucleotide is dA, dT, dC, or dG.
  • the first deoxyribonucleotide is converted to a fourth deoxyribonucleotide that is different from the first deoxyribonucleotide.
  • the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide is dG-to-dA, dG-to-dT, dG-to-dC, dT-to-dA, dT-to-dC, dT-to-dG, dC-to-dA, dC-to-dT, or dC-to-dG.
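As an illustration, the nine conversions listed above, together with the corresponding change on the opposite strand (second to third deoxyribonucleotide), can be enumerated with a short Python sketch. It assumes standard Watson-Crick pairing and restricts the first deoxyribonucleotide to dG, dT, or dC, as in that list; the function name is hypothetical.

```python
# Sketch: enumerate first->fourth conversions and the paired second->third
# change on the opposite strand, assuming Watson-Crick base pairing.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def conversions(first_bases=("G", "T", "C")):
    """Return tuples (first, fourth, second, third) for every conversion."""
    result = []
    for first in first_bases:
        for fourth in "ATCG":
            if fourth != first:  # the fourth must differ from the first
                result.append((first, fourth, PAIR[first], PAIR[fourth]))
    return result


if __name__ == "__main__":
    for first, fourth, second, third in conversions():
        print(f"d{first}-to-d{fourth} (opposite strand: d{second}-to-d{third})")
```

For example, the dG-to-dC entry pairs with a dC-to-dG change on the opposite strand, consistent with FIG. 6 and FIG. 7.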
  • the target dsDNA is wild type or naturally occurring. In some embodiments, the target dsDNA is not wild type or naturally occurring. In some embodiments, the target dsDNA is eukaryotic or prokaryotic. In some embodiments, the target dsDNA is from an animal (e.g., human, monkey, mouse) or plant. In some embodiments, the target dsDNA is a target gene. In some embodiments, the gene is an animal (e.g., human, monkey, mouse) or plant gene. In some embodiments, the dsDNA is in a target cell.
  • the first deoxyribonucleotide is native or nonnative to the target dsDNA. In some embodiments, the first deoxyribonucleotide is a mutation in the target dsDNA. In some embodiments, the first deoxyribonucleotide is a pathogenic mutation in the target dsDNA. In some embodiments, the first deoxyribonucleotide is a mutation resulting in a stop codon in the target dsDNA.
  • the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide directly or indirectly converts a stop codon to a non-stop codon or directly or indirectly converts a non-stop codon to a stop codon, either on the target strand or the nontarget strand.
  • the stop codon is on the sense strand of the dsDNA.
  • the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide occurs on the sense strand or the nonsense strand of the dsDNA. In some embodiments, the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide (e.g., dG-to-dT) occurs on the sense strand of the dsDNA, converting a stop codon on the sense strand to a non-stop codon or converting a non-stop codon (e.g., GAA) on the sense strand to a stop codon (e.g., TAA) .
  • dG-to-dT the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide
  • the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide occurs on the nonsense strand of the dsDNA, converting a stop codon on the sense strand to a non-stop codon or converting a non-stop codon (e.g., TCA) on the sense strand to a stop codon (e.g., TGA) .
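The two stop-codon examples above (GAA to TAA, and TCA to TGA) can be checked with a minimal Python sketch. It assumes the standard stop codons TAA, TAG, and TGA, and it assumes that the TCA-to-TGA change arises from a dG-to-dC conversion on the opposite (nonsense) strand, which is what base pairing implies; the function name and argument layout are illustrative only.

```python
# Sketch: apply a base conversion on the sense strand or on the opposite
# strand and report the resulting sense-strand codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def edit_codon(codon: str, index: int, first: str, fourth: str,
               on_sense_strand: bool = True) -> str:
    """Convert `first` to `fourth` at 0-based `index` of the sense codon.

    When on_sense_strand is False, the edit is made on the base-paired
    nucleotide of the opposite strand and the sense codon is recomputed.
    """
    strand = codon if on_sense_strand else reverse_complement(codon)
    pos = index if on_sense_strand else len(codon) - 1 - index
    if strand[pos] != first:
        raise ValueError(f"expected {first} at edited position, found {strand[pos]}")
    edited = strand[:pos] + fourth + strand[pos + 1:]
    return edited if on_sense_strand else reverse_complement(edited)


if __name__ == "__main__":
    sense_edit = edit_codon("GAA", 0, "G", "T", on_sense_strand=True)
    opposite_edit = edit_codon("TCA", 1, "G", "C", on_sense_strand=False)
    print(sense_edit, sense_edit in STOP_CODONS)
    print(opposite_edit, opposite_edit in STOP_CODONS)
```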
  • the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide occurs at the splicing site (e.g., splicing donor, splicing acceptor) of the target dsDNA.
  • the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide e.g., dG-to-dC
  • the splicing site e.g., splicing donor, splicing acceptor
  • the first deoxyribonucleotide is at a position of the protospacer sequence selected from the group consisting of position 1, position 2, position 3, position 4, position 5, position 6, position 7, position 8, position 9, position 10, position 11, position 12, position 13, position 14, position 15, position 16, position 17, position 18, position 19, position 20, and a combination thereof; or wherein the first deoxyribonucleotide is at a position of the protospacer sequence between position 1 and position 20, both inclusive; or wherein the first deoxyribonucleotide is at a position of the protospacer sequence between position 1 and position 14, both inclusive.
  • the first deoxyribonucleotide is at a position of the protospacer sequence selected from the group consisting of position 6, position 7, position 8, position 9, position 10, position 11, and a combination thereof; or wherein the first deoxyribonucleotide is at a position of the protospacer sequence between position 6 and position 11, both inclusive. In some embodiments, the first deoxyribonucleotide is at position 7 of the protospacer sequence.
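A minimal Python sketch of how candidate target bases inside such a position window might be listed is given below. It assumes that positions are counted 1-based from the first nucleotide of the protospacer string as written, which is a convention chosen for the example; the protospacer shown is hypothetical.

```python
# Sketch: list 1-based protospacer positions that carry a given base and
# fall inside an inclusive position window (default: positions 6 to 11).
def editable_positions(protospacer: str, base: str, window=(6, 11)):
    start, end = window
    return [
        position
        for position, nucleotide in enumerate(protospacer.upper(), start=1)
        if nucleotide == base and start <= position <= end
    ]


if __name__ == "__main__":
    protospacer = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt protospacer
    print(editable_positions(protospacer, "G"))           # window 6-11
    print(editable_positions(protospacer, "G", (1, 14)))  # window 1-14
```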
  • the first deoxyribonucleotide is the N1 or N2 nucleotide in a motif of N1N2, wherein N1 or N2 is A, T, G, or C. In some embodiments, the first deoxyribonucleotide is the N2 nucleotide in a motif of N1N2, wherein N1 is A or T, and N2 is C. In some embodiments, the first deoxyribonucleotide is the N1, N2, or N3 nucleotide in a motif of N1N2N3, wherein N1, N2, or N3 is A, T, G, or C.
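For the N1N2 motif in which N1 is A or T and N2 is C, the candidate target cytosines in a sequence can be located with a short Python sketch using a regular-expression lookbehind. The input sequence is hypothetical, and positions are reported 1-based from the start of the string as an assumed convention.

```python
import re


# Sketch: find every C that is immediately preceded by A or T (motif N1N2
# with N1 = A or T and N2 = C); returns 1-based positions of the C.
def motif_c_positions(sequence: str):
    return [match.start() + 1
            for match in re.finditer(r"(?<=[AT])C", sequence.upper())]


if __name__ == "__main__":
    print(motif_c_positions("GACCTCGC"))  # hypothetical sequence
```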
  • base excising domain (BED) is used interchangeably with “base excising protein (BEP) ” or “base excising enzyme (BEE) ” and refers to a protein capable of recognizing and excising a base (e.g., A, T, C, G, or U) of a nucleotide of a nucleic acid (e.g., DNA (ssDNA or dsDNA) or RNA) .
  • base is used interchangeably with “nucleobase” or “nitrogenous base” .
  • Base includes, for example, adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U), and they may be termed primary, normal, or canonical bases.
  • a deoxyribonucleotide is composed of a base, a deoxyribose, and a phosphate
  • a deoxyribonucleoside is composed of a base and a deoxyribose. Excising the base of a deoxyribonucleoside releases the base from the deoxyribonucleoside.
  • excising the base of the first deoxyribonucleotide comprises cleaving or hydrolyzing the glycosidic bond linking the base to the deoxyribose of the first deoxyribonucleotide, thereby releasing the base from the first deoxyribonucleotide.
  • the base excising domain is (substantially) capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide (e.g., dG, dT, dC) .
  • the first deoxyribonucleotide is dG, dT, dA, or dC.
  • the base excising domain is (substantially) capable of excising guanine of dG.
  • the base excising domain is (substantially) capable of excising thymine of dT.
  • the base excising domain is (substantially) capable of excising cytosine of dC.
  • the base excising domain is (substantially) capable of excising adenine of dA.
  • the base excising domain can excise only one type of base and not the other types of bases.
  • the base excising domain is (substantially) incapable of excising guanine of dG.
  • the base excising domain is (substantially) incapable of excising thymine of dT.
  • the base excising domain is (substantially) incapable of excising cytosine of dC.
  • the base excising domain is (substantially) incapable of excising adenine of dA.
  • the base excising domain can excise more than one type of base.
  • the base excising domain is (substantially) capable of excising any two, three, or four of guanine of dG, thymine of dT, cytosine of dC, and adenine of dA.
  • the base excising domain is (substantially) capable of excising both guanine of dG and thymine of dT.
  • the base excising domain is (substantially) capable of excising uracil. In some embodiments, the base excising domain is (substantially) incapable of excising uracil. In some embodiments, the base excising domain is (substantially) capable of excising hypoxanthine. In some embodiments, the base excising domain is (substantially) incapable of excising hypoxanthine.
  • the fusion protein of the disclosure does not comprise a base excising domain (substantially) capable of excising guanine of dG, thymine of dT, cytosine of dC, adenine of dA, uracil, and/or hypoxanthine.
  • the base excising domain is (substantially) incapable of excising bases on both strands of a target dsDNA. In some embodiments, the base excising domain is (substantially) incapable of excising both bases of a pair of base-paired deoxyribonucleotides on both strands of a dsDNA.
  • the base excision domain comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to a naturally-occurring base excision domain, such as a naturally-occurring base excision domain provided herein.
  • a naturally-occurring base excision domain provided herein.
  • the fusion protein of the disclosure comprises one, two, three, or more base excising domains. In some embodiments, the fusion protein comprises two, three, or more base excising domains, which are the same or different.
  • the base excising domain could be a glycosylase having the desired base excising ability.
  • the base excising domain comprises a glycosylase.
  • the glycosylase is selected from the group consisting of N-methylpurine DNA glycosylase (MPG) , 8-oxoguanine DNA glycosylase (OGG1) , methyl-CpG binding domain 4, DNA glycosylase (MBD4) , thymine DNA glycosylase (TDG) , uracil DNA glycosylase (UNG) , single-strand-selective monofunctional uracil-DNA glycosylase 1 (SMUG1) , mutY DNA glycosylase (MUTYH) , nth like DNA glycosylase 1 (NTHL1) , nei like DNA glycosylase 1 (NEIL1) , nei like DNA glycosylase 2 (NEIL2) , nei like DNA glycosylase 3 (NEIL3) , and mutants thereof capable
  • Exemplary glycosylases capable of excising a base include, without limitation, UDG-N204D and UDG-Y147A as described in Kavli, B. et al. Excision of cytosine and thymine from DNA by mutants of human uracil-DNA glycosylase. EMBO J 15, 3442-3447 (1996) ; the entire contents of which are hereby incorporated by reference.
  • the base excision domain is not wild type or naturally-occurring.
  • the base excising domain comprises an N-methylpurine DNA glycosylase (MPG) .
  • MPG comprises a motif GxxYxxxxYGxxxxxN, wherein x represents any amino acid.
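A motif written in this notation can be searched for in a protein sequence with a minimal Python sketch that converts each lowercase x into a single-residue wildcard. The protein string below is a made-up toy sequence, not a real MPG sequence, and the function names are illustrative.

```python
import re


# Sketch: convert a motif such as "GxxYxxxxYGxxxxxN" (x = any amino acid)
# into a regular expression and report 1-based match start positions.
def motif_to_regex(motif: str) -> re.Pattern:
    pattern = "".join("." if ch == "x" else re.escape(ch) for ch in motif)
    return re.compile(pattern)


def find_motif(motif: str, protein: str):
    return [match.start() + 1
            for match in motif_to_regex(motif).finditer(protein)]


if __name__ == "__main__":
    toy_protein = "MAGAAYAAAAYGAAAAANKL"  # toy sequence, not a real MPG
    print(find_motif("GxxYxxxxYGxxxxxN", toy_protein))
```

A motif without any x, such as a fixed six-residue motif, is matched literally by the same functions.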
  • the MPG is obtained from a species selected from Table A.
  • the MPG comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference MPG.
  • the wild type or reference MPG is human MPG (SEQ ID NO: 9) or an MPG obtained from a species selected from Table A or any MPG as set forth in Table D or a homolog or mutant (e.g., comprising an amino acid sequence of SEQ ID NO: 5, 6, or 7) thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG) (e.g., SEQ ID NO: 1).
  • the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
  • the amino acid mutation confers an ability to excise a base on the MPG.
  • the base is guanine.
  • the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control MPG without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
  • the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of G163, N169, D175, C178, S198, K202, G203, S206, K210, and/or Q294 of the wild type or reference MPG, wherein the position is numbered according to SEQ ID NO: 1.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of N169, D175, C178, and/or Q294 of the wild type or reference MPG, wherein the position is numbered according to SEQ ID NO: 1.
  • the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
  • the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference MPG.
  • the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as, Glycine (Gly/G) , Alanine (Ala/A) , Valine (Val/V) , Cysteine (Cys/C) , Proline (Pro/P) , Leucine (Leu/L) , Isoleucine (Ile/I) , Methionine (Met/M) , Tryptophan (Trp/W) , Phenylalanine (Phe/F) ; (2) a polar amino acid residue (such as, Serine (Ser/S) , Threonine (Thr/T) , Tyrosine (Tyr/Y) , Asparagine (Asn/N) , Glutamine (Gln/Q) ) ; (3) a positively charged amino acid residue (such as, Lysine (Lys/K) , Arginine (Arg/R) , Histidine (His/
  • the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1.
  • the MPG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference MPG.
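How substitutions written in the form "N169G" could be applied to a reference sequence, and how the resulting identity could be checked against the "at least about 60% and less than 100%" range, is sketched below in Python. The reference here is a ten-residue toy sequence rather than any SEQ ID NO of the disclosure, the substitutions are renumbered to fit it, and identity is computed position by position over equal-length sequences without alignment, which is a simplification adopted for this example only.

```python
import re


# Sketch: apply substitutions such as "N2G" (wild-type residue, 1-based
# position, new residue) to a reference sequence, checking the wild type.
def apply_substitutions(reference: str, substitutions):
    residues = list(reference)
    for substitution in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
        if match is None:
            raise ValueError(f"cannot parse substitution {substitution!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(reference):
            raise ValueError(f"position {position} is outside the reference")
        if reference[position - 1] != wild_type:
            raise ValueError(f"reference has {reference[position - 1]} at {position}")
        residues[position - 1] = new
    return "".join(residues)


def percent_identity(a: str, b: str) -> float:
    """Simplified identity for equal-length sequences (no alignment)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    toy_reference = "MNDCQKGSAL"  # toy sequence, not a real MPG
    mutant = apply_substitutions(toy_reference, ["N2G", "D3R", "C4N", "Q5R"])
    identity = percent_identity(toy_reference, mutant)
    print(mutant, identity, 60.0 <= identity < 100.0)
```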
  • 60% e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%
  • the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of N169G, D175R, C178N, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1.
  • the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
  • the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of N169G, D175R, C178N, and Q294R, wherein the position is numbered according to SEQ ID NO: 1.
  • the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
  • the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R, wherein the position is numbered according to SEQ ID NO: 1.
  • the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 1 or 9.
  • the MPG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 8, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, or 38.
  • the MPG is (substantially) capable of excising guanine of dG. In some embodiments, the MPG is (substantially) incapable of excising thymine of dT. In some embodiments, the MPG is (substantially) incapable of excising cytosine of dC. In some embodiments, the MPG is (substantially) incapable of excising adenine of dA.
  • the MPG is not wild type or naturally-occurring.
  • the base excising domain comprises a uracil-DNA glycosylase (UNG) .
  • UNG comprises a motif GQDPYH.
  • the UNG is obtained from a species selected from Table C.
  • the UNG comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference UNG.
  • the wild type or reference UNG is human UNG1 (SEQ ID NO: 54) or human UNG2 (SEQ ID NO: 133) or a UNG obtained from a species selected from Table C or any UNG as set forth in Table D or a homolog or mutant thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG) .
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 56, 58, 135, or 137.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
  • UNG1 SEQ ID NO: 54
  • UNG2 SEQ ID NO: 133
  • residues Y156 and N213 of UNG2 are corresponding to the residues Y147 and N204 of UNG1, respectively.
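The correspondence stated above implies an offset of 9 between UNG2 and UNG1 numbering for these two residues. A minimal Python sketch of converting mutation labels between the two numberings follows. It assumes that the same constant offset applies to any position being converted, which is inferred only from the two residue pairs given here and would not hold in regions where the two isoforms differ; the function name is hypothetical.

```python
import re

# Offset inferred from the stated pairs Y156/Y147 and N213/N204 (assumption:
# constant within the region shared by UNG1 and UNG2).
UNG2_MINUS_UNG1 = 156 - 147


def ung2_to_ung1(mutation: str) -> str:
    """Renumber a label such as 'Y156A' from UNG2 to UNG1 numbering."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z]?)", mutation)
    if match is None:
        raise ValueError(f"cannot parse {mutation!r}")
    residue, position, new = match.group(1), int(match.group(2)), match.group(3)
    converted = position - UNG2_MINUS_UNG1
    if converted < 1:
        raise ValueError("position precedes the region covered by the offset")
    return f"{residue}{converted}{new}"


if __name__ == "__main__":
    for label in ("Y156A", "N213D"):
        print(label, "->", ung2_to_ung1(label))
```

Under this assumption, Y156A and N213D in UNG2 numbering map to the Y147A and N204D labels used for the UDG mutants of Kavli et al.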
  • the amino acid mutation confers an ability to excise a base on the UNG.
  • the base is thymine.
  • the base is cytosine.
  • the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control UNG without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
  • the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of Y156, K184, N213, A214, Q259, and/or Y284 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of K184, A214, Q259, and/or Y284 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135 or 137.
  • the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference UNG.
  • the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as, Glycine (Gly/G) , Alanine (Ala/A) , Valine (Val/V) , Cysteine (Cys/C) , Proline (Pro/P) , Leucine (Leu/L) , Isoleucine (Ile/I) , Methionine (Met/M) , Tryptophan (Trp/W) , Phenylalanine (Phe/F) ; (2) a polar amino acid residue (such as, Serine (Ser/S) , Threonine (Thr/T) , Tyrosine (Tyr/Y) , Asparagine (Asn/N) , Glutamine (Gln/Q) ) ; (3) a positively charged amino acid residue (such as, Lysine (Lys/K) , Arginine (Arg/R) , Histidine (His/
  • the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of Y156A, K184A, N213D, A214V, A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133.
  • the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133.
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
  • the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, wherein the position is numbered according to SEQ ID NO: 133.
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
  • the amino acid mutation comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-65, 1-66, 1-67, 1-68, 1-69, 1-70, 1-71, 1-72, 1-73, 1-74, 1-75, 1-76, 1-77, 1-78, 1-79, 1-80, 1-81, 1-82, 1-83, 1-84, 1-85, 1-86, 1-87, 1-88, 1-89, 1-90, 1-91, 1-92, 1-93, 1-94, 1-95, 1-96, 1-97, 1-98, 1-99, or 1-100 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
  • the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
  • the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference UNG.
  • the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of A214T, Q259A, and Y284D, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
  • the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of K184A and A214V, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
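The substitution-plus-deletion variants described above (substitutions numbered according to a reference sequence, combined with deletion of an N-terminal stretch such as positions 1-88) can be illustrated with a minimal Python sketch. The function name `apply_mutations` and the ten-residue toy sequence are hypothetical and are not taken from the disclosure; the actual UNG sequences (e.g., SEQ ID NO: 133) are not reproduced here. The sketch assumes the input sequence is already numbered like the reference, so no alignment step is performed.

```python
import re


def apply_mutations(reference: str, substitutions: list[str], delete_n_term: int = 0) -> str:
    """Apply substitutions such as 'A214T' using 1-based reference numbering,
    then delete positions 1..delete_n_term from the N-terminus.

    Hypothetical helper for illustration only.
    """
    residues = list(reference)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the reference")
        if residues[position - 1] != wild_type:
            raise ValueError(f"reference has {residues[position - 1]} at {position}, not {wild_type}")
        residues[position - 1] = replacement
    # Substitutions are applied before truncation so that numbering stays
    # tied to the full-length reference.
    return "".join(residues[delete_n_term:])


# Toy reference (not a real UNG sequence): M1 K2 T3 A4 Y5 I6 A7 K8 Q9 R10
toy_reference = "MKTAYIAKQR"
variant = apply_mutations(toy_reference, ["A4T", "Q9A"], delete_n_term=2)
print(variant)
```

Applying the substitutions before the deletion keeps every position label tied to the full-length reference numbering, which mirrors the "numbered according to SEQ ID NO: 133" convention used above.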
  • the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 56, 58, 60, 62, 135, 137, 139, 141, 144, 146, 148, 150, 152, 155, 157, and 159 or an N-terminal truncation thereof lacking the most N-terminal Methionine (M) (coded by start codon ATG) .
  • the UNG is (substantially) capable of excising thymine of dT. In some embodiments, the UNG is (substantially) capable of excising cytosine of dC. In some embodiments, the UNG is (substantially) incapable of excising thymine of dT. In some embodiments, the UNG is (substantially) incapable of excising cytosine of dC. In some embodiments, the UNG is (substantially) incapable of excising adenine of dA. In some embodiments, the UNG is (substantially) incapable of excising guanine of dG.
  • the UNG is not wild type or naturally-occurring.
  • the base excising domain comprises TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1.
  • the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1.
  • the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is human TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 (SEQ ID NO: 64, 65, 66, 67, 68, 69, 70, 71, or 72, respectively) or a homolog or mutant thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG).
  • the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 64-72, respectively.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
  • the amino acid mutation confers an ability to excise a base on the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1.
  • the base is guanine, thymine, cytosine, adenine, uracil, or hypoxanthine.
  • the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
  • the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
  • the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1.
  • the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as, Glycine (Gly/G) , Alanine (Ala/A) , Valine (Val/V) , Cysteine (Cys/C) , Proline (Pro/P) , Leucine (Leu/L) , Isoleucine (Ile/I) , Methionine (Met/M) , Tryptophan (Trp/W) , Phenylalanine (Phe/F) ; (2) a polar amino acid residue (such as, Serine (Ser/S) , Threonine (Thr/T) , Tyrosine (Tyr/Y) , Asparagine (Asn/N) , Glutamine (Gln/Q) ) ; (3) a positively charged amino acid residue (such as, Lysine (Lys/K) , Arginine (Arg/R) , Histidine (His/
  • the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1, respectively.
  • the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising guanine of dG. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising thymine of dT. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising cytosine of dC. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising adenine of dA.
  • the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising thymine of dT. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising cytosine of dC. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising adenine of dA. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising guanine of dG.
  • the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is not wild type or naturally-occurring.
  • “napDNAbd” is used interchangeably with “nucleic acid programmable DNA binding protein (napDNAbp)”.
  • the napDNAbd is an RNA-programmable DNA binding protein.
  • Various napDNAbd are known in the art, including, for example, those listed in WO2020/181195, which is incorporated herein by reference in its entirety.
  • Representative napDNAbd include, for example, CRISPR-associated (Cas) proteins, IscB, IsrB, Argonaute, and TnpB.
  • the napDNAbd substantially lacks dsDNA cleavage activity (endonuclease activity) . In some embodiments, the napDNAbd substantially lacks dsDNA cleavage activity (endonuclease activity) and nickase activity. In some embodiments, the napDNAbd is nuclease-inactive, for example, a dead Cas. In some embodiments, the napDNAbd is endonuclease-inactive, for example, a dead Cas.
  • the napDNAbd is a nickase. In some embodiments, the napDNAbd has nickase activity. In some embodiments, the napDNAbd has nickase activity to nick the target strand. In some embodiments, the napDNAbd nicks the target strand. In some embodiments, the method comprises nicking the target strand. In some embodiments, the nick on the target strand or nicking the target strand incorporates an indel (insertion and/or deletion) into the target strand.
  • the napDNAbd is capable of inducing strand separation of the target dsDNA.
  • the napDNAbd comprises a Cas domain. In some embodiments, the napDNAbd comprises a Cas nickase (nCas) or a dead (nuclease-inactive) Cas (dCas) of a Cas protein.
  • the Cas protein is selected from a group consisting of a Cas9 protein (such as, SpCas9, SaCas9, GeoCas9, CjCas9, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago) , SmacCas9, Spy-macCas9, xCas9, SpCas9-NG, SpG Cas9) ; a Cas12 protein (such as, Cas12a (Cpf1) , AsCas12a, LbCas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f (Cas14) , Cas12g, Cas12h, Cas12i, xCas12i, Cas12Max, hfCas12Max, Cas12j, Cas12k, Cas12l, Cas12m, Cas12n, Cas12o, Cas
  • the Cas nickase is a Cas9 nickase (nCas9) , such as SpCas9 nickase (SpCas9-D10A) .
  • the dead Cas is a dead Cas9 (dCas9) , such as dead SpCas9 (SpCas9-D10A+H840A) .
  • the Cas nickase or dead Cas is a Cas12i nickase (nCas12i) or a dead Cas12i (dCas12i), such as a dead Cas12i of an xCas12i polypeptide.
  • the napDNAbd comprises an IscB nickase (nIscB) or a dead IscB (dIscB) of an IscB protein (e.g., OgeuIscB) or an IscB protein described in PCT/CN2023/129167, PCT/CN2023/142506, PCT/CN2024/071744, and PCT/CN2023/125069, which are incorporated herein by reference in their entireties.
  • the napDNAbd comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 2, 48, 50, 52, or 163.
  • the napDNAbd comprises a TnpB nickase or a dead TnpB of a TnpB protein.
  • the fusion protein comprises, from N-terminal to C-terminal, (1) the napDNAbp and the base excising domain; or (2) the base excising domain and the napDNAbp.
  • the napDNAbd (e.g., Cas9) is a two-part napDNAbd, for example, a two-part split Cas9, comprising an N-terminal portion and a C-terminal portion, and wherein the fusion protein comprises, from N-terminal to C-terminal, (1) the N-terminal portion of the napDNAbd, the base excising domain, and the C-terminal portion of the napDNAbd; (2) the C-terminal portion of the napDNAbd, the base excising domain, and the N-terminal portion of the napDNAbd; or (3) the base excising domain, the C-terminal portion of the napDNAbd (e.g., amino acids at positions 1249-1368), and the N-terminal portion (e.g., amino acids at positions 1-1248) of the napDNAbd.
  • the napDNAbd is SpCas9 (e.g., a SpCas9 nickase) or a mutant thereof (e.g., a SpG Cas9 nickase) .
  • the N-terminal portion of the napDNAbd is the amino acids of the napDNAbp at positions 1 or 2 to 1012, 1028, 1041, 1046, 1047, 1248, 1249, or 1300.
  • the C-terminal portion of the napDNAbd is the amino acids of the napDNAbp at positions 1013, 1029, 1042, 1047, 1048, 1249, 1063, 1064, 1230, 1249, or 1301 to 1368.
  • the fusion protein comprises the base excising domain embedded between positions 2-1248 of nCas9 (SEQ ID NO: 2) and positions 1249-1368 of nCas9 (SEQ ID NO: 2) , wherein the first amino acid residue D of nCas9 (SEQ ID NO: 2) was designated as position 2; or embedded between positions 2-1047 of nCas9 (SEQ ID NO: 2) and positions 1064-1368 of nCas9 (SEQ ID NO: 2) , wherein the first amino acid residue D of nCas9 (SEQ ID NO: 2) was designated as position 2.
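The embedded arrangement described above, in which the first residue of the Met-less nCas9 is designated position 2, can be sketched in Python as follows. The function `embed_domain`, its parameter names, and the ten-residue toy host are hypothetical illustrations and do not come from the disclosure; the real nCas9 (SEQ ID NO: 2) and base excising domain sequences are not reproduced here.

```python
def embed_domain(host: str, insert: str, n_end: int, c_start: int, first_position: int = 2) -> str:
    """Embed `insert` between host positions first_position..n_end and c_start..end.

    `host` is assumed to lack its N-terminal Met, so its first residue is
    numbered `first_position` (2 by default), as in the convention above.
    Hypothetical helper for illustration only.
    """
    if not first_position <= n_end < c_start:
        raise ValueError("expected first_position <= n_end < c_start")
    last_position = first_position + len(host) - 1
    if c_start > last_position:
        raise ValueError("c_start lies beyond the end of the host")
    # Convert position labels to 0-based string indices.
    n_portion = host[: n_end - first_position + 1]
    c_portion = host[c_start - first_position:]
    return n_portion + insert + c_portion


# Toy host whose residues are numbered 2..11 (not a real Cas9 sequence).
toy_host = "DKKYSIGLAI"
contiguous = embed_domain(toy_host, "ww", n_end=5, c_start=6)   # no host residues dropped
with_gap = embed_domain(toy_host, "ww", n_end=5, c_start=8)     # positions 6-7 dropped
print(contiguous, with_gap)
```

The second call mirrors the case in which the two host portions are not contiguous (e.g., positions 2-1047 and 1064-1368), so the intervening host residues are omitted from the fusion.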
  • a typical protein would usually have an N-terminal Met at its N-terminus (position 1), since it must be translated from a polynucleotide containing a start codon ATG (encoding Met) at its 5’ end.
  • when the protein is fused downstream of a second protein (e.g., an NLS, a napDNAbd), the start codon ATG may not be necessary for the protein, since there would typically be a start codon upstream of the second protein for the translation of the fusion protein as a whole, and thus the N-terminal Met of the protein could be removed.
  • Any protein described in the disclosure refers to both the protein per se and an N-terminal truncation thereof with its most N-terminal Met (if present) removed.
  • the fusion protein comprises an NLS at the N-terminal and/or C-terminal of the napDNAbp. In some embodiments, the fusion protein comprises an NLS at the N-terminal and/or C-terminal of the base excising domain. In some embodiments, the NLS is or comprises a SV40 NLS, a bpSV40 NLS (e.g., SEQ ID NO: 10 or 11) , or a NP NLS (Xenopus laevis Nucleoplasmin NLS, nucleoplasmin NLS) .
  • Additional NLSs suitable for the disclosure, and ways of linking an NLS to any of the components of the fusion protein of the disclosure (for example, via a linker of SGGS), include those listed in WO2020/181195, which is incorporated herein by reference in its entirety.
  • the components (e.g., the napDNAbp and the base excising domain, the NLS and the napDNAbp, or the NLS and the base excising domain) of the fusion protein are fused to each other with or without a linker.
  • Suitable linkers include, for example, SGGS, the linker of SEQ ID NO: 134, and those listed in WO2020/181195, which is incorporated herein by reference in its entirety.
  • the fusion protein comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 12, 14, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 55, 57, 59, 61, 63, 136, 138, 140, 142, 143, 145, 147, 149, 151, 153, 154, 156, 158, 160, 161, 162, and 164.
  • the fusion protein of the disclosure may be used in combination with a deaminase domain for various purposes, e.g., improved outcome purity.
  • by “purity”, it is meant the percentage/proportion of an outcome among all possible outcomes.
  • for example, purity of dT means the percentage/proportion of dT as an outcome among all possible outcomes including, for example, dA, dT, dG, and dC.
  • the introduction of a deaminase domain may contribute to further conversion of an undesired deoxyribonucleotide produced as a byproduct (e.g., dC) to a desired deoxyribonucleotide (e.g., dT) by C-to-T base editing.
  • for example, in dG-to-dT base editing, the target dG is converted, in part, to dC (a byproduct) by the base editing without deamination as described herein.
  • the dC is then converted to dT by the C-to-T base editing with deamination, thereby achieving high-purity dG-to-dT conversion.
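The notion of outcome purity used above (the proportion of one outcome among all observed outcomes at the target position) can be made concrete with a short Python sketch. The function `outcome_purity` and the read counts are hypothetical and are not data from the disclosure. The sketch follows the wording above literally and counts every observed base, including unedited dG, in the denominator; other conventions (e.g., counting edited reads only) would give different values.

```python
from collections import Counter


def outcome_purity(outcomes: list[str], desired: str) -> float:
    """Return the fraction of `desired` among all observed outcomes.

    Hypothetical helper: `outcomes` holds the base called at the target
    position in each sequencing read.
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no outcomes supplied")
    return counts[desired] / total


# Invented example: 10 reads at a target dG position.
reads = ["T"] * 6 + ["C"] * 3 + ["G"] * 1
purity_of_dt = outcome_purity(reads, "T")
print(f"purity of dT: {purity_of_dt:.0%}")
```

In this invented example, converting the three dC byproduct reads to dT would raise the purity of dT from 6/10 to 9/10, which is the effect attributed above to adding a deaminase domain.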
  • the fusion protein further comprises a deaminase domain.
  • the deaminase domain may be fused to a component of the fusion protein without or with a linker as described herein.
  • deaminases are known in the art, including, for example, those listed in WO2020/181195, which is incorporated herein by reference in its entirety.
  • Representative deaminases include, for example, TadA (an adenosine deaminase) and homologs and variants thereof, and APOBEC (a cytidine deaminase) and homologs and variants thereof.
  • the deaminase domain is a deaminase domain (substantially) capable of deaminating adenine, guanine, hypoxanthine, cytosine, thymine, and/or uracil. In some embodiments, the deaminase domain is an adenine deaminase domain or a cytosine deaminase domain.
  • the deaminase domain comprises a tRNA adenosine deaminase (TadA) or a functional variant or fragment thereof, e.g., TadA8e (SEQ ID NO: 3), TadA8.17, TadA8.20, TadA9, TadA8E-V106W, TadA8E-V106W+D108Q, TadA-CDa, TadA-CDb, TadA-CDc, TadA-CDd, TadA-CDe, TadA-dual, TADAC-1.2, TADAC-1.14, TADAC-1.17, TADAC-1.19, TADAC-2.5, TADAC-2.6, TADAC-2.9, TADAC-2.19, TADAC-2.23, TadA8e-N46L, or TadA8e-N46P.
  • the deaminase domain comprises an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase, an activation induced deaminase (AID) , a cytidine deaminase 1 from Petromyzon marinus (pmCDA1) , or a functional variant or fragment thereof, e.g., APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H.
  • the protospacer sequence comprises about, at least about, or at most about 14 contiguous nucleotides of the target dsDNA, e.g., about, at least about, or at most about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more contiguous nucleotides on the nontarget strand of the target dsDNA, or in a numerical range between any two of the preceding values, e.g., from about 16 to about 50, or from about 17 to about 22 contiguous nucleotides of the target dsDNA.
  • the protospacer sequence comprises about 20, 30, or 50 contiguous nucleotides of the target
  • the protospacer sequence is a stretch of about, at least about, or at most about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more contiguous nucleotides on the nontarget strand of the target dsDNA, or a stretch of contiguous nucleotides on the nontarget strand of the target dsDNA in a numerical range between any two of the preceding values, e.g., a stretch of from about 16 to about 50 contiguous nucleotides.
  • the protospacer sequence is a stretch of about 20, 30, or 50 contiguous nucleotides on the nontarget strand of
  • the protospacer sequence is immediately 5’ or 3’ to a protospacer adjacent motif (PAM) comprising the sequence 5’-NN-3’, 5’-NNN-3’, 5’-NNNN-3’, 5’-NNNNN-3’, or 5’-NNNNNN-3’, wherein N is A, T, G, or C.
  • the protospacer sequence is immediately 5’ to a protospacer adjacent motif (PAM) comprising the sequence 5’-NGG-3’ or 5’-NTN-3’, wherein N is A, T, G, or C.
  • the protospacer sequence is immediately 3’ to a protospacer adjacent motif (PAM) comprising the sequence 5’-TTN-3’, wherein N is A, T, G, or C.
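The relationship between a protospacer and its PAM can be illustrated with a minimal Python sketch that scans the nontarget strand for protospacers lying immediately 5’ to a 5’-NGG-3’ PAM. The function `find_protospacers` and the ten-nucleotide toy sequence are hypothetical; only the NGG case is handled, and the other PAMs listed above (e.g., 5’-TTN-3’ located 5’ of the protospacer) would need the window arithmetic adjusted accordingly.

```python
def find_protospacers(nontarget_strand: str, protospacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (start index, protospacer, PAM) for every protospacer that sits
    immediately 5' to an NGG PAM on the given strand (written 5' to 3').

    Hypothetical helper for illustration only.
    """
    pam_len = 3
    seq = nontarget_strand.upper()
    hits = []
    for start in range(len(seq) - protospacer_len - pam_len + 1):
        pam_start = start + protospacer_len
        pam = seq[pam_start : pam_start + pam_len]
        if pam[1:] == "GG":  # N can be any of A, T, G, or C
            hits.append((start, seq[start:pam_start], pam))
    return hits


# Toy sequence with a shortened 5-nt protospacer so the example stays readable.
hits = find_protospacers("ACGTACGGTT", protospacer_len=5)
print(hits)
```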
  • the guide sequence is in a length of about, at least about, or at most about 14 nucleotides, e.g., about, at least about, or at most about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more nucleotides, or in a length of nucleotides in a numerical range between any two of the preceding values, e.g., in a length of from about 16 to about 50 nucleotides. In some embodiments, the guide sequence is in a length of about 20, 30, or 50 nucleotides.
  • (1) the guide sequence is at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100% (fully), optionally about 100% (fully), reversely complementary to the target sequence; (2) the guide sequence contains no more than 5, 4, 3, 2, or 1 mismatch or contains no mismatch with the target sequence; or (3) the guide sequence comprises no mismatch with the target sequence in the first 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70 nucleotides at the 5’ end of the guide sequence when the PAM is immediately 5’ to the protospacer sequence or at the
  • the guide sequence contains 1 mismatch with the target sequence. In some embodiments, the guide sequence is about 98% reversely complementary to the target sequence. In some embodiments, the 1 mismatch in the guide sequence is at a position corresponding to the nucleotide of the target sequence that is intended to be substituted.
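The mismatch and reverse-complementarity criteria above can be sketched in Python as follows. The helper names and toy sequences are hypothetical; the sketch assumes the guide and target have equal length, compares them position by position without gaps, and treats U in the guide as T so that both sequences share a DNA alphabet.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def count_mismatches(guide: str, target: str) -> int:
    """Count positions where the guide differs from the reverse complement of
    the target sequence (ungapped, equal lengths assumed)."""
    expected = reverse_complement(target)
    observed = guide.upper().replace("U", "T")
    if len(observed) != len(expected):
        raise ValueError("guide and target must have the same length")
    return sum(g != e for g, e in zip(observed, expected))


def percent_complementarity(guide: str, target: str) -> float:
    """Percentage of guide positions that pair with the target sequence."""
    return 100 * (len(guide) - count_mismatches(guide, target)) / len(guide)


# Toy 50-nt example: a single mismatch gives 49/50 = 98% complementarity.
toy_target = "A" * 50
toy_guide = "T" * 49 + "C"
print(count_mismatches(toy_guide, toy_target), percent_complementarity(toy_guide, toy_target))
```

Under these assumptions, one mismatch corresponds to about 98% complementarity only for a guide of roughly 50 nucleotides; a 20-nucleotide guide with one mismatch would be 95% complementary.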
  • the guide sequence comprises (1) a sequence of any one of SEQ ID NOs: 75-103 and 165-3030 (excluding the PAM if present) or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to any one of SEQ ID NOs: 75-103 and 165-3030 (excluding the PAM if present) or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to any one of SEQ ID NOs: 75-103 and 165-3030 (excluding the PAM if present).
  • the guide sequence comprises a sequence of any one of SEQ ID NOs: 75-103 and 165-3030 (excluding the PAM if present).
  • the disclosure provides a guide nucleic acid comprising a guide sequence as described herein and a scaffold sequence capable of forming a complex with a napDNAbd.
  • the scaffold sequence and the napDNAbd may be as described herein.
  • the scaffold sequence is compatible with the napDNAbd of the disclosure and is capable of complexing with the napDNAbd.
  • the scaffold sequence may be a naturally occurring scaffold sequence identified along with the napDNAbd, or a variant thereof maintaining the ability to complex with the napDNAbd.
  • the ability to complex with the napDNAbd is maintained as long as the secondary structure of the variant is substantially identical to the secondary structure of the naturally occurring scaffold sequence.
  • a nucleotide deletion, insertion, or substitution in the primary sequence of the scaffold sequence may not necessarily change the secondary structure of the scaffold sequence (e.g., the relative locations and/or sizes of the stems, bulges, and loops of the scaffold sequence do not significantly deviate from that of the original stems, bulges, and loops) .
  • the nucleotide deletion, insertion, or substitution may be in a bulge or loop region of the scaffold sequence so that the overall symmetry of the bulge and hence the secondary structure remains largely the same.
  • the nucleotide deletion, insertion, or substitution may also be in the stems of the scaffold sequence so that the lengths of the stems do not significantly deviate from that of the original stems (e.g., adding or deleting one base pair in each of two stems corresponds to 4 total base changes).
  • the scaffold sequence is 5’ or 3’ to the guide sequence.
  • the scaffold sequence has substantially the same secondary structure as the secondary structure of the sequence of SEQ ID NO: 40, 73, or 74.
  • the scaffold sequence comprises (1) a sequence of SEQ ID NO: 40, 73, or 74 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to SEQ ID NO: 40, 73, or 74 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to SEQ ID NO: 40, 73, or 74.
  • the scaffold sequence comprises the sequence of SEQ ID NO: 40
  • TLS Translesion synthesis
  • the fusion protein of the disclosure may be used in combination with a translesion synthesis (TLS) polymerase for improved outcome purity.
  • by “purity”, it is meant the percentage/proportion of an outcome among all possible outcomes.
  • for example, purity of dT means the percentage/proportion of dT as an outcome among all possible outcomes including, for example, dA, dT, dG, and dC.
  • TLS polymerases may have their own inclination of incorporating various deoxyribonucleotide opposite an abasic site during polymerization, as listed in Table 5. By taking advantage of such inclination, the base editing outcome may be intentionally controlled to improve outcome purity.
  • human Pol ⁇ (SEQ ID NO: 118) is a TLS polymerase preferentially incorporating dA opposite an abasic site.
  • the base editing outcome may be adjusted toward dT, thereby increasing purity of dT product.
  • the fusion protein or system of the disclosure further comprises a translesion synthesis (TLS) polymerase or a recruiting domain or component capable of recruiting a TLS polymerase.
  • the TLS polymerase or the recruiting domain or component is fused to a component of the fusion protein without or with a linker as described herein.
  • Non-limiting examples of the TLS polymerase include Pol α (alpha), Pol β (beta), Pol δ (delta) (PCNA), Pol γ (gamma), Pol η (eta), Pol ι (iota), Pol κ (kappa), Pol λ (lambda), Pol μ (mu), Pol ν (nu), Pol θ (theta), and REV1.
  • the TLS polymerase comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of SEQ ID NO: 118.
  • the TLS polymerase comprises the amino acid sequence of SEQ ID NO: 118 (Pol ⁇ ) .
  • the fusion protein or system further comprising the translesion synthesis (TLS) polymerase or a recruiting domain capable of recruiting a TLS polymerase leads to conversion of the first deoxyribonucleotide to dG, dC, dT, or dA.
  • the disclosure provides an MPG as described herein.
  • the MPG comprises a motif GxxYxxxxYGxxxxxN, wherein x represents any amino acid.
  • the MPG is obtained from a species selected from Table A.
  • the MPG comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference MPG.
  • the wild type or reference MPG is human MPG (SEQ ID NO: 9) or an MPG obtained from a species selected from Table A or any MPG as set forth in Table D or a homolog or mutant (e.g., comprising an amino acid sequence of SEQ ID NO: 5, 6, or 7) thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG) (e.g., SEQ ID NO: 1).
  • the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
  • the amino acid mutation confers an ability to excise a base on the MPG.
  • the base is guanine.
  • the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control MPG without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
  • the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of G163, N169, D175, C178, S198, K202, G203, S206, K210, and/or Q294 of the wild type or reference MPG, wherein the position is numbered according to SEQ ID NO: 1.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of N169, D175, C178, and/or Q294 of the wild type or reference MPG, wherein the position is numbered according to SEQ ID NO:1.
  • the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
  • the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference MPG.
  • the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as Glycine (Gly/G), Alanine (Ala/A), Valine (Val/V), Cysteine (Cys/C), Proline (Pro/P), Leucine (Leu/L), Isoleucine (Ile/I), Methionine (Met/M), Tryptophan (Trp/W), Phenylalanine (Phe/F)); (2) a polar amino acid residue (such as Serine (Ser/S), Threonine (Thr/T), Tyrosine (Tyr/Y), Asparagine (Asn/N), Glutamine (Gln/Q)); (3) a positively charged amino acid residue (such as Lysine (Lys/K), Arginine (Arg/R), Histidine (His/
  • the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1.
  • the MPG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference MPG.
  • the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of N169G, D175R, C178N, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1.
  • the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
  • the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of N169G, D175R, C178N, and Q294R, wherein the position is numbered according to SEQ ID NO: 1.
  • the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
  • the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R, wherein the position is numbered according to SEQ ID NO: 1.
  • the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 1 or 9.
  • the MPG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 8, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, or 38.
  • the MPG is (substantially) capable of excising guanine of dG. In some embodiments, the MPG is (substantially) incapable of excising thymine of dT. In some embodiments, the MPG is (substantially) incapable of excising cytosine of dC. In some embodiments, the MPG is (substantially) incapable of excising adenine of dA.
  • the disclosure provides a fusion protein comprising the MPG described herein, or of the disclosure, and a functional domain, such as, a napDNAbd.
  • the disclosure provides use of the MPG described herein, or of the disclosure, for base editing as described herein.
  • the MPG is not wild type or naturally-occurring.
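As a non-limiting illustration of the motif and substitution notation used above, the following Python sketch scans a sequence for the motif GxxYxxxxYGxxxxxN and applies substitutions written in the form N169G. The helper names `find_motif` and `apply_substitutions` and the toy sequence are hypothetical and are not taken from the disclosure. The sketch assumes that the input sequence is already numbered as the reference is; mapping to a "corresponding" position in a different MPG would first require a sequence alignment, which is not shown.

```python
import re

# Motif from the text: G-x-x-Y-x-x-x-x-Y-G-x-x-x-x-x-N, where x is any amino acid.
# A lookahead is used so that overlapping occurrences are also reported.
MPG_MOTIF = re.compile(r"(?=G..Y....YG.....N)")

# Substitution notation such as "N169G":
# wild-type residue, 1-based position, replacement residue.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def find_motif(seq):
    """Return the 1-based start position of every motif occurrence in seq."""
    return [m.start() + 1 for m in MPG_MOTIF.finditer(seq)]


def apply_substitutions(seq, substitutions):
    """Apply substitutions to seq, checking the wild-type residue each time."""
    residues = list(seq)
    for sub in substitutions:
        match = SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognised substitution: {sub!r}")
        wild_type = match.group(1)
        position = int(match.group(2))
        replacement = match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if seq[position - 1] != wild_type:
            raise ValueError(
                f"{sub}: expected {wild_type} at position {position}, "
                f"found {seq[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# Hypothetical toy sequence for illustration only; NOT any SEQ ID NO herein.
toy = "MAGKLYAAAAYGSSSSSNQ"

print(find_motif(toy))                           # [3]
print(apply_substitutions(toy, ["K4R", "Q19A"]))  # MAGRLYAAAAYGSSSSSNA
```

Checking the wild-type residue before substituting is a design choice that surfaces numbering mismatches early, for example when a sequence lacks the N-terminal methionine and is therefore offset by one position.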
  • the disclosure provides an UNG described herein, or of the disclosure.
  • the UNG comprises a motif GQDPYH. In some embodiments, the UNG is obtained from a species selected from Table C. In some embodiments, the UNG comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference UNG. In some embodiments, the wild type or reference UNG is human UNG1 (SEQ ID NO: 54) or human UNG2 (SEQ ID NO: 133) or an UNG obtained from a species selected from Table C or any UNG as set forth in Table D or a homolog or mutant thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG). In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 56, 58, 135, or 137.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
  • residues Y156 and N213 of UNG2 are corresponding to the residues Y147 and N204 of UNG1, respectively.
  • the amino acid mutation confers an ability to excise a base on the UNG.
  • the base is thymine.
  • the base is cytosine.
  • the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control UNG without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
  • the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of Y156, K184, N213, A214, Q259, and/or Y284 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of K184, A214, Q259, and/or Y284 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO:133.
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135 or 137.
  • the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference UNG.
  • the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as Glycine (Gly/G), Alanine (Ala/A), Valine (Val/V), Cysteine (Cys/C), Proline (Pro/P), Leucine (Leu/L), Isoleucine (Ile/I), Methionine (Met/M), Tryptophan (Trp/W), Phenylalanine (Phe/F)); (2) a polar amino acid residue (such as Serine (Ser/S), Threonine (Thr/T), Tyrosine (Tyr/Y), Asparagine (Asn/N), Glutamine (Gln/Q)); (3) a positively charged amino acid residue (such as Lysine (Lys/K), Arginine (Arg/R), Histidine (His/
  • the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of Y156A, K184A, N213D, A214V, A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133.
  • the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133.
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
  • the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, wherein the position is numbered according to SEQ ID NO: 133.
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
  • the amino acid mutation comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-65, 1-66, 1-67, 1-68, 1-69, 1-70, 1-71, 1-72, 1-73, 1-74, 1-75, 1-76, 1-77, 1-78, 1-79, 1-80, 1-81, 1-82, 1-83, 1-84, 1-85, 1-86, 1-87, 1-88, 1-89, 1-90, 1-91, 1-92, 1-93, 1-94, 1-95, 1-96, 1-97, 1-98, 1-99, or 1-100 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
  • the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
  • the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference UNG.
  • the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of A214T, Q259A, and Y284D, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
  • the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of K184A and A214V, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
  • the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
  • the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 56, 58, 60, 62, 135, 137, 139, 141, 144, 146, 148, 150, 152, 155, 157, and 159 or an N-terminal truncation thereof lacking the most N-terminal Methionine (M) (coded by start codon ATG).
  • the UNG is (substantially) capable of excising thymine of dT. In some embodiments, the UNG is (substantially) capable of excising cytosine of dC. In some embodiments, the UNG is (substantially) incapable of excising thymine of dT. In some embodiments, the UNG is (substantially) incapable of excising cytosine of dC. In some embodiments, the UNG is (substantially) incapable of excising adenine of dA. In some embodiments, the UNG is (substantially) incapable of excising guanine of dG.
  • the disclosure provides a fusion protein comprising the UNG described herein, or of the disclosure, and a functional domain, such as, a napDNAbd.
  • the disclosure provides use of the UNG described herein, or of the disclosure, for base editing as described herein.
  • the UNG is not wild type or naturally-occurring.
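The position conventions used above for UNG can be sketched as follows, by way of illustration only. The text states that Y156 and N213 of UNG2 correspond to Y147 and N204 of UNG1, respectively, which implies an offset of 9 for those two residues; applying the same offset to other residues is an assumption of this sketch and would normally be confirmed by alignment. The function names are hypothetical. The second helper renumbers a residue after an N-terminal deletion such as positions 1-88, assuming that no residue (e.g., a start methionine) is added back at the N-terminus.

```python
# Offset implied by the two residue pairs given in the text
# (UNG2 Y156 -> UNG1 Y147, UNG2 N213 -> UNG1 N204).
# Using it for other positions is an ASSUMPTION of this sketch.
UNG2_TO_UNG1_OFFSET = 156 - 147  # 9


def ung2_to_ung1(position):
    """Convert a 1-based UNG2 position to the corresponding UNG1 position."""
    converted = position - UNG2_TO_UNG1_OFFSET
    if converted < 1:
        raise ValueError(f"UNG2 position {position} has no UNG1 counterpart here")
    return converted


def position_after_n_terminal_deletion(position, deleted_through):
    """Renumber a 1-based position after residues 1..deleted_through are deleted."""
    if position <= deleted_through:
        raise ValueError(f"position {position} lies within the deleted region")
    return position - deleted_through


print(ung2_to_ung1(156))                            # 147
print(ung2_to_ung1(213))                            # 204
print(position_after_n_terminal_deletion(214, 88))  # 126
```

Because the substitutions above are numbered according to SEQ ID NO: 133 even for truncated variants, a conversion of this kind is only needed when locating the residue within the truncated sequence itself.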
  • the disclosure provides a TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 described herein, or of the disclosure.
  • the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1.
  • the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is human TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 (SEQ ID NO: 64, 65, 66, 67, 68, 69, 70, 71, or 72, respectively) or a homolog or mutant thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG).
  • the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 64-72, respectively.
  • the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
  • the amino acid mutation confers an ability to excise a base on the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1.
  • the base is guanine, thymine, cytosine, adenine, uracil, or hypoxanthine.
  • the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
  • the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
  • the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1.
  • the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as Glycine (Gly/G), Alanine (Ala/A), Valine (Val/V), Cysteine (Cys/C), Proline (Pro/P), Leucine (Leu/L), Isoleucine (Ile/I), Methionine (Met/M), Tryptophan (Trp/W), Phenylalanine (Phe/F)); (2) a polar amino acid residue (such as Serine (Ser/S), Threonine (Thr/T), Tyrosine (Tyr/Y), Asparagine (Asn/N), Glutamine (Gln/Q)); (3) a positively charged amino acid residue (such as Lysine (Lys/K), Arginine (Arg/R), Histidine (His/
  • the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1, respectively.
  • the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising guanine of dG. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising thymine of dT. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising cytosine of dC. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising adenine of dA.
  • the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising thymine of dT. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising cytosine of dC. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising adenine of dA. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising guanine of dG.
  • the disclosure provides a fusion protein comprising the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 described herein, or of the disclosure, and a functional domain, such as, a napDNAbd.
  • the disclosure provides use of the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 described herein, or of the disclosure, for base editing as described herein.
  • the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is not wild type or naturally-occurring.
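The sequence identity thresholds recited above (at least about 60% and, for a mutant, less than 100% to its reference) can be illustrated with a minimal Python sketch. It assumes that the two sequences have already been aligned to equal length with "-" marking gaps, and it divides the number of identical columns by the number of aligned columns; other conventions for the denominator exist, and the disclosure does not tie the term to this particular one. The function names and the toy sequences are hypothetical, and the term "about" is not modeled.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns; '-' marks a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns in which both sequences carry a gap.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def is_mutant_within_threshold(identity, minimum=60.0):
    """True if identity is at least `minimum` and strictly below 100."""
    return minimum <= identity < 100.0


# Hypothetical toy sequences for illustration only.
reference = "MKTAYIAK"
variant = "MKTAYLAK"

identity = percent_identity(reference, variant)
print(identity)                              # 87.5
print(is_mutant_within_threshold(identity))  # True
```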
  • Also provided in the disclosure is a polynucleotide comprising or encoding the guide nucleic acid.
  • the polynucleotide comprising or encoding the guide nucleic acid is a DNA, a RNA, or a DNA/RNA mixture.
  • the term “DNA/RNA mixture” refers to a nucleic acid comprising both one or more modified or unmodified ribonucleotides and one or more modified or unmodified deoxyribonucleotides, whether consecutive or not.
  • the term “DNA” or “RNA” may also refer to a DNA containing one or more modified or unmodified ribonucleotides, whether consecutive or not, or an RNA containing one or more modified or unmodified deoxyribonucleotides, whether consecutive or not.
  • the guide nucleic acid is operably linked to or under the regulation of a promoter.
  • the promoter is a ubiquitous, tissue-specific, cell-type specific, constitutive, or inducible promoter.
  • Suitable promoters include, for example, a Cbh promoter, a Cba promoter, a pol I promoter, a pol II promoter, a pol III promoter, a T7 promoter, a U6 promoter, a H1 promoter, a retroviral Rous sarcoma virus LTR promoter, a cytomegalovirus (CMV) promoter, a SV40 promoter, a dihydrofolate reductase promoter, a β-actin promoter, an elongation factor 1α short (EFS) promoter, a β-glucuronidase (GUSB) promoter, a cytomegalovirus (CMV) immediate-early (Ie) enhancer and/or promoter, a chicken β-actin (CBA) promoter or derivative thereof such as a CAG promoter, CB promoter, a (human) elongation factor 1α-subunit (EF1α
  • the disclosure provides a polynucleotide encoding the fusion protein of the disclosure and optionally the guide nucleic acid of the disclosure.
  • the polynucleotide encoding the fusion protein is a DNA, a RNA, or a DNA/RNA mixture.
  • the term “DNA/RNA mixture” refers to a nucleic acid comprising both one or more modified or unmodified ribonucleotides and one or more modified or unmodified deoxyribonucleotides, whether consecutive or not.
  • the term “DNA” or “RNA” may also refer to a DNA containing one or more modified or unmodified ribonucleotides, whether consecutive or not, or an RNA containing one or more modified or unmodified deoxyribonucleotides, whether consecutive or not.
  • the polynucleotide encoding the napDNAbd is a mRNA.
  • the polynucleotide encoding the napDNAbd comprises a sequence encoding the napDNAbd and a promoter operably linked to the sequence encoding the napDNAbd.
  • the polynucleotide encoding the napDNAbd is operably linked to or under the regulation of a promoter.
  • the promoter is a ubiquitous, tissue-specific, cell-type specific, constitutive, or inducible promoter.
  • Suitable promoters include, for example, a Cbh promoter, a Cba promoter, a pol I promoter, a pol II promoter, a pol III promoter, a T7 promoter, a U6 promoter, a H1 promoter, a retroviral Rous sarcoma virus LTR promoter, a cytomegalovirus (CMV) promoter, a SV40 promoter, a dihydrofolate reductase promoter, a β-actin promoter, an elongation factor 1α short (EFS) promoter, a β-glucuronidase (GUSB) promoter, a cytomegalovirus (CMV) immediate-early (Ie) enhancer and/or promoter, a chicken β-actin (CBA) promoter or derivative thereof such as a CAG promoter, CB promoter, a (human) elongation factor 1α-subunit (EF1α
  • the disclosure provides a delivery system comprising (1) the fusion protein of the disclosure, the polynucleotide of the disclosure, or the system of the disclosure; and (2) a delivery vehicle.
  • the disclosure provides a vector comprising the polynucleotide of the disclosure.
  • the vector encodes a guide nucleic acid of the disclosure.
  • the vector is a plasmid vector, a recombinant AAV (rAAV) vector (vector genome), or a recombinant lentivirus vector.
  • the disclosure provides a recombinant AAV (rAAV) particle comprising the rAAV vector genome of the disclosure.
  • for a simple introduction to AAV for delivery, reference may be made to “Adeno-associated Virus (AAV) Guide” (addgene.org/guides/aav/).
  • Adeno-associated virus, when engineered to deliver, e.g., a protein-encoding sequence of interest, may be termed a (r)AAV vector, a (r)AAV vector particle, or a (r)AAV particle, where “r” stands for “recombinant”.
  • the genome packaged in AAV vectors for delivery may be termed as a (r) AAV vector genome, vector genome, or vg for short, while viral genome may refer to the original viral genome of natural AAVs.
  • the serotypes of the capsids of rAAV particles can be matched to the types of target cells.
  • Table 2 of WO2018002719A1 lists exemplary cell types that can be transduced by the indicated AAV serotypes (incorporated herein by reference) .
  • the rAAV particle comprises a capsid with a serotype suitable for delivery into a desired target cell.
  • the rAAV particle comprises a capsid with a serotype of AAV1, AAV2, AAV3A, AAV3B, AAV4, AAV5, AAV6, AAV7, AAVrh74, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV-DJ, or AAV.PHP.eB, a member of the Clade to which any of the AAV1-AAV13 belong, or a functional variant (e.g., a functional truncation) thereof, encapsidating the rAAV vector genome.
  • the serotype of the capsid is wild type serotype or a functional variant thereof.
  • rAAV particles may be produced using the triple transfection method (described in detail in U.S. Pat. No. 6,001,650) .
  • the vector titers are usually expressed as vector genomes per ml (vg/ml) .
  • the vector titer is above 1 × 10^9, above 5 × 10^10, above 1 × 10^11, above 5 × 10^11, above 1 × 10^12, above 5 × 10^12, or above 1 × 10^13 vg/ml.
  • systems and methods of packaging an RNA sequence as a vector genome into a rAAV particle have recently been developed and are applicable herein. See PCT/CN2022/075366, which is incorporated herein by reference in its entirety.
  • sequence elements described herein for DNA vector genomes when present in RNA vector genomes, should generally be considered to be applicable for the RNA vector genomes except that the deoxyribonucleotides in the DNA sequence are the corresponding ribonucleotides in the RNA sequence (e.g., dT is equivalent to U, and dA is equivalent to A) and/or the element in the DNA sequence is replaced with the corresponding element with a corresponding function in the RNA sequence or omitted because its function is unnecessary in the RNA sequence and/or an additional element necessary for the RNA vector genome is introduced.
  • a coding sequence e.g., as a sequence element of rAAV vector genomes herein, is construed, understood, and considered as covering and covers both a DNA coding sequence and an RNA coding sequence.
  • an RNA sequence can be transcribed from the DNA coding sequence, and optionally further a protein can be translated from the transcribed RNA sequence as necessary.
  • the RNA coding sequence per se can be a functional RNA sequence for use, or an RNA sequence can be produced from the RNA coding sequence, e.g., by RNA processing, or a protein can be translated from the RNA coding sequence.
  • a fusion protein coding sequence encoding a fusion protein covers either a fusion protein DNA coding sequence from which a fusion protein is expressed (indirectly via transcription and translation) or a fusion protein RNA coding sequence from which a fusion protein is translated (directly) .
  • a gRNA coding sequence encoding a gRNA covers either a gRNA DNA coding sequence from which a gRNA is transcribed or a gRNA RNA coding sequence (1) which per se is the functional gRNA for use, or (2) from which a gRNA is produced, e.g., by RNA processing.
  • 5’-ITR and/or 3’-ITR as DNA packaging signals may be unnecessary and can be omitted at least partly, while RNA packaging signals can be introduced.
  • a promoter to drive transcription of DNA sequences may be unnecessary and can be omitted at least partly.
  • a sequence encoding a polyA signal may be unnecessary and can be omitted at least partly, while a polyA tail can be introduced.
  • DNA elements of rAAV DNA vector genomes can be either omitted or replaced with corresponding RNA elements and/or additional RNA elements can be introduced, in order to adapt to the strategy of delivering an RNA vector genome by rAAV particles.
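The nucleotide correspondence noted above for RNA vector genomes (dT is equivalent to U, while dA, dC, and dG are equivalent to A, C, and G) can be sketched in Python as follows. This is an illustration only: the function name `dna_to_rna` and the example sequence are hypothetical, and the sketch covers only the base-by-base correspondence, not the omission or replacement of elements such as ITRs, promoters, or polyA signal sequences.

```python
def dna_to_rna(dna):
    """Return the RNA sequence corresponding base-by-base to a DNA sequence."""
    seq = dna.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # dT corresponds to U; A, C, and G are unchanged.
    return seq.replace("T", "U")


# Hypothetical example sequence for illustration only.
print(dna_to_rna("ATGGCCTAA"))  # AUGGCCUAA
```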
  • the disclosure provides a complex comprising the fusion protein or the polynucleotide (e.g., a mRNA) encoding the fusion protein of the disclosure and a guide nucleic acid (e.g., a gRNA) of the disclosure.
  • the disclosure provides a ribonucleoprotein (RNP) comprising the fusion protein of the disclosure and a guide nucleic acid of the disclosure.
  • the disclosure provides a lipid nanoparticle (LNP) comprising an RNA (e.g., mRNA) encoding the fusion protein of the disclosure and a guide nucleic acid of the disclosure.
  • the disclosure provides a pharmaceutical composition comprising (1) the system of the disclosure, the vector of the disclosure, the ribonucleoprotein of the disclosure, the lipid nanoparticle of the disclosure, or the cell of the disclosure; and (2) a pharmaceutically acceptable excipient.
  • the pharmaceutical composition comprises the rAAV particle in a concentration selected from the group consisting of about 1 × 10^10 vg/mL, 2 × 10^10 vg/mL, 3 × 10^10 vg/mL, 4 × 10^10 vg/mL, 5 × 10^10 vg/mL, 6 × 10^10 vg/mL, 7 × 10^10 vg/mL, 8 × 10^10 vg/mL, 9 × 10^10 vg/mL, 1 × 10^11 vg/mL, 2 × 10^11 vg/mL, 3 × 10^11 vg/mL, 4 × 10^11 vg/mL, 5 × 10^11 vg/mL, 6 × 10^11 vg/mL, 7 × 10^11 vg/mL, 8 × 10^11 vg/mL, 9 × 10^11 vg/mL, 1 × 10^12 vg/mL, 2 × 10^12 vg/mL, 3 × 10^12 vg/
  • the volume of the injection is selected from the group consisting of about 1 microliter, 10 microliters, 50 microliters, 100 microliters, 150 microliters, 200 microliters, 250 microliters, 300 microliters, 350 microliters, 400 microliters, 450 microliters, 500 microliters, 550 microliters, 600 microliters, 650 microliters, 700 microliters, 750 microliters, 800 microliters, 850 microliters, 900 microliters, 950 microliters, 1000 microliters, and a volume of a numerical range between any of two preceding values, e.g., in a volume of from about 10 microliters to about 750 microliters.
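For orientation, the total number of vector genomes delivered by an injection follows from the concentration and the injection volume recited above. The Python sketch below shows the unit conversion only; the function name and the example values (one concentration and one volume picked from the recited lists) are illustrative assumptions and do not represent a recommended dose.

```python
def total_vector_genomes(concentration_vg_per_ml, volume_microliters):
    """Total vector genomes (vg) in an injection of the given volume."""
    # 1 mL = 1000 microliters, so convert the volume to mL before multiplying.
    return concentration_vg_per_ml * volume_microliters / 1000.0


# Illustrative values only: 1 x 10^12 vg/mL injected in 100 microliters.
print(total_vector_genomes(1e12, 100))  # 100000000000.0, i.e. 1 x 10^11 vg
```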
  • the methods of the disclosure can be used to introduce the systems of the disclosure into a cell and cause the cell to alter the production of one or more cellular products, such as an antibody, starch, ethanol, or any other desired product. Such cells and progenies thereof are within the scope of the disclosure.
  • the disclosure provides a cell or a progeny thereof comprising the system of the disclosure.
  • the cell is a eukaryote.
  • the cell is a human cell.
  • the disclosure provides a cell or a progeny thereof modified by the system of the disclosure or the method of the disclosure.
  • the cell is a eukaryote.
  • the cell is a human cell.
  • the cell is modified in vitro, in vivo, or ex vivo.
  • the cell is a stem cell. In some embodiments, the cell is not a human embryonic stem cell. In some embodiments, the cell is not a human germ cell.
  • the cell is a prokaryotic cell.
  • the cell is a eukaryotic cell (e.g., an animal cell, a vertebrate cell, a mammalian cell, a non-human mammalian cell, a non-human primate cell, a rodent (e.g., mouse or rat) cell, a human cell, a plant cell, or a yeast cell) or a prokaryotic cell (e.g., a bacteria cell) .
  • the cell is from a plant or an animal.
  • the plant is a dicotyledon.
  • the dicotyledon is selected from the group consisting of soybean, cabbage (e.g., Chinese cabbage), rapeseed, brassica, watermelon, melon, potato, tomato, tobacco, eggplant, pepper, cucumber, cotton, alfalfa, and grape.
  • the plant is a monocotyledon.
  • the monocotyledon is selected from the group consisting of rice, corn, wheat, barley, oat, sorghum, millet, grasses, Poaceae, Zizania, Avena, Coix, Hordeum, Oryza, Panicum (e.g., Panicum miliaceum) , Secale, Setaria (e.g., Setaria italica) , Sorghum, Triticum, Zea, Cymbopogon, Saccharum (e.g., Saccharum officinarum) , Phyllostachys, Dendrocalamus, Bambusa, Yushania.
  • the animal is selected from the group consisting of pig, ox, sheep, goat, mouse, rat, alpaca, monkey, rabbit, chicken, duck, goose, fish (e.g., zebra fish) .
  • the cell is a eukaryotic cell, such as a mammalian cell, including a human cell (a primary human cell or an established human cell line) .
  • the cell is a non-human mammalian cell, such as a cell from a non-human primate (e.g., monkey) , a cow /bull /cattle, sheep, goat, pig, horse, dog, cat, rodent (such as rabbit, mouse, rat, hamster, etc. ) .
  • the cell is from fish (such as salmon), bird (such as poultry bird, including chicken, duck, goose), reptile, shellfish (e.g., oyster, clam, lobster, shrimp), insect, worm, yeast, etc.
  • the cell is from a plant, such as monocot or dicot.
  • the plant is a food crop such as barley, cassava, cotton, groundnuts or peanuts, maize, millet, oil palm fruit, potatoes, pulses, rapeseed or canola, rice, rye, sorghum, soybeans, sugar cane, sugar beets, sunflower, and wheat.
  • the plant is a cereal (barley, maize, millet, rice, rye, sorghum, and wheat) .
  • the plant is a tuber (cassava and potatoes) .
  • the plant is a sugar crop (sugar beets and sugar cane) .
  • the plant is an oil-bearing crop (soybeans, groundnuts or peanuts, rapeseed or canola, sunflower, and oil palm fruit) .
  • the plant is a fiber crop (cotton) .
  • the plant is a tree (such as a peach or a nectarine tree, an apple or pear tree, a nut tree such as almond or walnut or pistachio tree, or a citrus tree, e.g., orange, grapefruit or lemon tree) , a grass, a vegetable, a fruit, or an algae.
  • the plant is a nightshade plant; a plant of the genus Brassica; a plant of the genus Lactuca; a plant of the genus Spinacia; a plant of the genus Capsicum;
  • the cell is not within the body of an organism, such as, human or animal. In some embodiments, the cell is not a human embryonic stem cell. In some embodiments, the cell is not a human germ cell.
  • the disclosure provides a method for modifying a target dsDNA, comprising contacting the target DNA with the system of the disclosure, wherein the guide sequence is capable of hybridizing to a target sequence of the target dsDNA, wherein the target dsDNA is modified by the complex.
  • the disclosure provides a method for diagnosing, preventing, and/or treating a disease in a subject in need thereof, comprising administering to the subject (e.g., a therapeutically effective amount/dose of) the system of the disclosure, the vector of the disclosure, the ribonucleoprotein of the disclosure, the lipid nanoparticle of the disclosure, the cell of the disclosure, or the pharmaceutical composition of the disclosure, wherein the disease is associated with a target dsDNA, wherein the guide sequence is capable of hybridizing to a target sequence of the target dsDNA, wherein the target dsDNA is modified by the complex, and wherein the modification of the target dsDNA diagnoses, prevents, and/or treats the disease.
  • the disease is selected from the group consisting of Angelman syndrome (AS) , Alzheimer's disease (AD) , transthyretin amyloidosis (ATTR) , transthyretin amyloid cardiomyopathy (ATTR-CM) , cystic fibrosis (CF) , hereditary angioedema, diabetes, progressive pseudohypertrophic muscular dystrophy, Duchenne muscular dystrophy (DMD) , Becker muscular dystrophy (BMD) , spinal muscular atrophy (SMA) , alpha-1-antitrypsin deficiency, Pompe disease, myotonic dystrophy, Huntington’s disease (HTT) , fragile X syndrome, Friedreich ataxia, amyotrophic lateral sclerosis (ALS) , frontotemporal dementia, hereditary chronic kidney disease, hyperlipidemia, Leber congenital amaurosis (LCA) , sickle cell disease, thalassemia (e.g., ⁇ -thalassemia)
  • the target dsDNA encodes a mRNA, a tRNA, a ribosomal RNA (rRNA) , a microRNA (miRNA) , a non-coding RNA, a long non-coding (lnc) RNA, a nuclear RNA, an interfering RNA (iRNA) , a small interfering RNA (siRNA) , a ribozyme, a riboswitch, a satellite RNA, a microswitch, a microzyme, or a viral RNA.
  • the target dsDNA is a eukaryotic DNA.
  • the eukaryotic DNA is a mammal DNA, such as a non-human mammalian DNA, a non-human primate DNA, a human DNA, a plant DNA, an insect DNA, a bird DNA, a reptile DNA, a rodent (e.g., mouse, rat) DNA, a fish DNA, a nematode DNA, or a yeast DNA.
  • the target dsDNA is in a eukaryotic cell, for example, a human cell, a non-human primate cell, or a mouse cell.
  • the administering comprises local administration or systemic administration.
  • the administering comprises intrathecal administration, intramuscular administration, intravenous administration, transdermal administration, intranasal administration, oral administration, mucosal administration, intraperitoneal administration, intracranial administration, intracerebroventricular administration, or stereotaxic administration.
  • the administration is injection or infusion.
  • the subject is a human, a non-human primate, or a mouse.
  • the level of the transcript (e.g., mRNA) of the target dsDNA is decreased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the transcript (e.g., mRNA) of the target dsDNA in the subject prior to the administration.
  • the level of the transcript (e.g., mRNA) of the target dsDNA is increased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the transcript (e.g., mRNA) of the target dsDNA in the subject prior to the administration.
  • the level of the expression product (e.g., protein) of the target dsDNA is decreased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the expression product (e.g., protein) of the target dsDNA in the subject prior to the administration.
  • the level of the expression product (e.g., protein) of the target dsDNA is increased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the expression product (e.g., protein) of the target dsDNA in the subject prior to the administration.
  • the expression product is a functional mutant of the expression product of the target dsDNA.
  • the median survival of the subject suffering from the disease but receiving the administration is 5 days, 10 days, 20 days, 30 days, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, 12 months, 1.5 years, 2 years, 2.5 years, 3 years, 4 years, 5 years, 6 years, 7 years, 8 years, 9 years, 10 years, or more, longer than that of a subject or a population of subjects suffering from the disease and not receiving the administration.
  • the therapeutically effective dose may be administered either as a single dose or as multiple doses.
  • the actual dose may vary greatly depending upon a variety of factors, such as the vector choices, the target cells, organisms, tissues, the general conditions of the subject to be treated, the degrees of transformation/modification sought, the administration routes, the administration modes, the types of transformation/modification sought, etc.
  • the therapeutically effective dose of the rAAV particle may be about 1.0E+8, 2.0E+8, 3.0E+8, 4.0E+8, 6.0E+8, 8.0E+8, 1.0E+9, 2.0E+9, 3.0E+9, 4.0E+9, 6.0E+9, 8.0E+9, 1.0E+10, 2.0E+10, 3.0E+10, 4.0E+10, 6.0E+10, 8.0E+10, 1.0E+11, 2.0E+11, 3.0E+11, 4.0E+11, 6.0E+11, 8.0E+11, 1.0E+12, 2.0E+12, 3.0E+12, 4.0E+12, 6.0E+12, 8.0E+12, 1.0E+13, 2.0E+13, 3.0E+13, 4.0E+13, 6.0E+13, 8.0E+13, 1.0E+14, 2.0E+14, 3.0E+14, 4.0E+14, 6.0E+14, 8.0E+14, 1.0E+15, 2.0E+15, 2.0
  • the disclosure provides a method of detecting a target dsDNA, comprising contacting the target dsDNA with the system of the disclosure, wherein the target dsDNA is modified by the complex, and wherein the modification detects the target dsDNA.
  • the modification generates a detectable signal, e.g., a fluorescent signal.
  • the disclosure provides a kit comprising the fusion protein of the disclosure, the system of the disclosure, the polynucleotide of the disclosure, the vector of the disclosure, the RNP of the disclosure, the LNP of the disclosure, the delivery system of the disclosure, the cell of the disclosure, or the pharmaceutical composition of the disclosure, or any one, two, or all components of the same.
  • the kit further comprises an instruction to use the component (s) contained therein, and/or instructions for combining with additional component (s) that may be available or necessary elsewhere.
  • the kit further comprises one or more buffers that may be used to dissolve any of the component (s) contained therein, and/or to provide suitable reaction conditions for one or more of the component (s) .
  • buffers may include one or more of PBS, HEPES, Tris, MOPS, Na2CO3, NaHCO3, NaB, or combinations thereof.
  • the reaction condition includes a proper pH, such as a basic pH. In some embodiments, the pH is between 7 and 10.
  • any one or more of the kit components may be stored in a suitable container or at a suitable temperature, e.g., 4 degrees Celsius.
  • the G-to-T reporter was to be used for evaluating the guanine base editing efficiency of the gBE (for this purpose, termed as glycosylase-based guanine base editor (gGBE) ) , and the A-to-T reporter for evaluating the adenine base editing efficiency of the gBE (for this purpose, termed as glycosylase-based adenine base editor (gABE) ) .
  • gBE expression vectors with wild type human MPG (SEQ ID NO: 1) (without first N-terminal Methionine (M) as compared to the full length wild type human MPG (SEQ ID NO: 9) ) or a distinctive version of human MPG mutants (i.e., MPGv0.2 (SEQ ID NO: 4) , MPGv1 (SEQ ID NO: 5) , MPGv2 (SEQ ID NO: 6) , and MPGv3 (SEQ ID NO: 7) ) that were reported in a previous study [11] were constructed.
  • conversion activity refers to the activity of the gBE of the disclosure to convert a target deoxyribonucleotide to an outcome deoxyribonucleotide, and the outcome deoxyribonucleotide may be or may not be specified as a specific type of deoxyribonucleotide, e.g., G-to-T.
  • (base) editing efficiency refers to the activity of the gBE of the disclosure to convert a target deoxyribonucleotide to an outcome deoxyribonucleotide, and the outcome deoxyribonucleotide may be or may not be specified as a specific type of deoxyribonucleotide, e.g., G-to-T.
  • when both the outcome deoxyribonucleotides for conversion activity and (base) editing efficiency are not specified, or both the outcome deoxyribonucleotides for conversion activity and (base) editing efficiency are specified as the same one or more specific types of deoxyribonucleotide, the two terms may refer to the same performance of the gBE of the disclosure and can be used interchangeably.
  • the gBE (hereafter referred to as gGBEv3) (SEQ ID NO: 15) containing MPGv3 (SEQ ID NO: 7) exhibited the highest G-to-T base editing efficiency (4.33%) in cultured HEK293T cells (FIG. 1c) as compared to that of the gGBE (SEQ ID NO: 14) with WT hMPG (SEQ ID NO: 1) (0.03%) , showing a striking 144-fold enhancement in G base editing efficiency.
  • gGBEv3 was mutated with G174R or D175R, generating new base editors gGBEv3.1 (SEQ ID NO: 17) (containing MPGv3 carrying additional substitution G174R; termed as MPGv3.1 (SEQ ID NO: 16) ) and gGBEv3.2 (SEQ ID NO: 19) (containing MPGv3 carrying additional substitution D175R; termed as MPGv3.2 (SEQ ID NO: 18) ) (FIG. 2a) . It was found that the G-to-T conversion activity of gGBEv3.2 (10.22%) was about 1.78-fold of gGBEv3 (5.73%) (FIG. 2b and FIG. 11b) .
  • gGBEv4 (SEQ ID NO: 23) (containing MPGv3 carrying additional substitutions of both D175R and C178N; termed as MPGv4 (SEQ ID NO: 22)) (39.57%) achieved a synergistic enhancement of G-to-T editing efficiency of about 6.9-fold compared to gGBEv3 (5.73%) (FIG. 2b and FIG. 11b).
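  • The fold enhancements quoted above are ratios of the reported editing efficiencies. The following minimal sketch recomputes them from the percentages stated in the text; the function name and the dictionary layout are illustrative assumptions, and only the numeric values come from the description.

```python
def fold_change(edited_pct: float, baseline_pct: float) -> float:
    """Ratio of an editing efficiency to a baseline efficiency (both in %)."""
    if baseline_pct <= 0:
        raise ValueError("baseline efficiency must be positive")
    return edited_pct / baseline_pct


# (efficiency %, baseline %) pairs as reported in the description.
reported = {
    "gGBEv3 vs. gGBE with WT hMPG": (4.33, 0.03),
    "gGBEv3.2 vs. gGBEv3": (10.22, 5.73),
    "gGBEv4 vs. gGBEv3": (39.57, 5.73),
}

for label, (edited, baseline) in reported.items():
    print(f"{label}: {fold_change(edited, baseline):.2f}-fold")
```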
  • nCas9 and MPG were changed to see whether the editing efficiency would be associated with the positional relationship of nCas9 and MPG. It was found that gGBEv4 with MPG fused at the C-terminus of nCas9 had slightly higher editing efficiency than gGBEv4 with MPG fused at the N-terminus of nCas9 (34.6% vs. 25.9%, FIG. 11d).
  • the R163-V179 region of MPGv4 was mutated by sequential replacement with amino acids having distinct properties, including glutamic acid (with negative charged side chain) , valine (with small hydrophobic side chain) , glycine (with no side chain) , or tyrosine (with large hydrophobic side chain) (X-to-E, V, G, or Y) .
  • gGBEv4.1 (SEQ ID NO: 25) (containing MPGv4 carrying additional substitution I170V; termed as MPGv4.1 (SEQ ID NO: 24) )
  • gGBEv4.2 (SEQ ID NO: 27) (containing MPGv4 carrying additional substitution S169G (or N169G if compared with WT MPG) ; termed as MPGv4.2 (SEQ ID NO: 26) )
  • gGBEv4.3 SEQ ID NO: 29
  • MPGv4.3 (SEQ ID NO: 28)
  • amino acid sequences of gGBEv5.1, v5.2, v6.1, v6.2, and v6.4 are set forth in SEQ ID NOs: 31, 33, 35, 37, and 39, respectively, and the corresponding MPGv5.1, v5.2, v6.1, v6.2, and v6.4 are set forth in SEQ ID NOs: 30, 32, 34, 36, and 38, respectively.
  • the enhancement of G editing efficiency of the gGBEs (v3, v4, v4.2, v6.3, and gGBE with WT MPG) obtained above was next validated at two endogenous genomic sites in cultured HEK293T cells.
  • the cells were transfected with a construct encoding each gGBE, together with mCherry and sgRNA that targeted site 3 or site 10, and mCherry + cells were FACS-sorted for target deep sequencing analysis.
  • a gradual elevation of overall G editing efficiency was obtained at G7 from 6.4% to 78.5% for site 3, and from 7.5% to 80.3% for site 10, respectively (FIG. 2c and 2d), confirming that gGBEv6.3 was indeed the best version of gGBE.
  • the engineered gGBEv6.3 (SEQ ID NO: 12) (carrying G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R mutations on MPGv6.3 (SEQ ID NO: 8) ) had the highest G editing efficiency and was used in the following studies.
  • the guide sequence-dependent off-target base editing efficiency of gGBEv6.3 was analyzed at several previously reported [11, 35] and in silico-predicted [37] guide sequence-dependent off-target sites, and the ability of gGBEv6.3 to mediate guide sequence-independent off-target DNA editing was characterized using orthogonal R-loop assay in five dSaCas9 R-loops, as reported in previous studies [11, 35] . Similar or lower percentage of editing at the guide sequence-dependent off-target loci (FIG. 3d and FIG. 15a) was found, as compared with that of adenine base editors found previously [11, 35] .
  • gGBEv6.3 was also tested with A-to-T reporter, C-to-G reporter, and T-to-G reporter, and the editing efficiency was 0.68%, 0.58%, and 1.81%, respectively, demonstrating its specificity for G editing, which is desired for targeted base editing application and reduced unwanted off-target editing.
  • the G-to-Y conversion ability of gGBE allows for a variety of gene-editing applications, including editing splicing sites, introduction of premature termination codons (PTCs) , as well as editing that bypasses PTCs (FIG. 4a) .
  • the inactive splicing acceptor (SA) signal with disruptive point mutations exemplified by the intron-split EGFP reporter system used above, could be remediated with gGBE (FIG. 1b) .
  • gGBE could be used for disrupting the splicing signal by converting G within a splicing donor site ( “GT” ) or splicing acceptor site ( “AG” ) to other bases, resulting in exon skipping.
  • the splice acceptor site of DMD (Duchenne muscular dystrophy) exon 45 was edited with gGBEv6.3, and a high efficiency of G editing (up to 30.3%) was achieved with a high G-to-Y ratio (up to 0.88) when targeting DMD site 1 (FIG. 4b, c and FIG. 16a) .
  • Another application of gGBE is to introduce PTCs to disrupt gene expression by converting TCA, TAC, or GAA codon into a stop codon TGA, TAG, or TAA.
  • PTC by GAA to TAA conversion could be introduced only by using gGBEv6.3; no other current base editor could induce this type of PTC.
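  • The codon conversions named above can be enumerated programmatically. The sketch below is a simplified model, not the method of the disclosure: it assumes that a G-to-Y edit appears on the coding strand as G-to-C or G-to-T when the edited G lies on the coding strand, and as C-to-G or C-to-A when the edited G lies on the complementary strand. The function name is hypothetical, and the sketch may report more convertible codons than the three examples given in the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Coding-strand view of a G-to-Y edit (assumption of this sketch):
#   edited G on the coding strand        -> G becomes C or T
#   edited G on the complementary strand -> C becomes G or A
EDITS = {"G": "CT", "C": "GA"}


def ptc_edits(codon: str) -> list[tuple[int, str, str]]:
    """List single G-to-Y edits that turn a sense codon into a stop codon.

    Returns (1-based position in codon, coding-strand change, stop codon).
    """
    codon = codon.upper()
    if len(codon) != 3 or set(codon) - set("ACGT"):
        raise ValueError(f"not a DNA codon: {codon!r}")
    results = []
    if codon in STOP_CODONS:
        return results  # already a stop codon, nothing to introduce
    for i, base in enumerate(codon):
        for new_base in EDITS.get(base, ""):
            edited = codon[:i] + new_base + codon[i + 1:]
            if edited in STOP_CODONS:
                results.append((i + 1, f"{base}>{new_base}", edited))
    return results


for codon in ("TCA", "TAC", "GAA"):
    print(codon, ptc_edits(codon))
```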
  • By targeting three sites in the mouse Tyr (Tyrosinase, associated with coat color) gene with gGBEv6.3 to create PTCs (FIG. 4d) in cultured N2a cells, a high efficiency of G editing (up to 46.3%) was achieved with a high G-to-Y ratio (up to 0.95) (FIG. 4e and FIG. 16b).
  • gGBEv6.3-encoding mRNA and Tyr-targeting sgRNA were co-injected into mouse zygotes of C57BL6 background (black coat color) , with 20 mouse embryos used for each of the three Tyr-targeting sgRNAs.
  • Highly efficient G editing (FIG. 17a) was found for two of the three sgRNAs, with an average of 50.9% PTC introduction efficiency when targeting the Tyr site 3 (FIG. 17b-c).
  • gGBE induced very few indels in mouse embryos (FIG. 17d-e) .
  • Two major classes of deaminase-based base editors (dBE), ABE and CBE, as well as their derivatives (such as AYBE and CGBE), perform base editing with deamination of A or C as the first key step [3–11].
  • deaminase-free base editors were designed based on engineered MPG, and a gGBE editor that could achieve highly efficient G-to-C and G-to-T conversion in both cultured human cells and mouse embryos was generated.
  • the engineered MPG demonstrates that DNA glycosylases could be engineered into proteins that selectively excise a specific nucleotide base, such as, G.
  • the high editing efficiency of the gGBE of the disclosure could be attributed to the mutations in the MPG moiety that may facilitate its specific substrate selection or DNA-binding activity, or both.
  • Base editor constructs used in this study were cloned into a mammalian expression plasmid backbone under the control of an EF1α promoter by standard molecular cloning techniques.
  • Intron-split EGFP reporters were engineered as previously described [11] .
  • corresponding mutations at the splice acceptor site were made to construct A-to-T reporter or G-to-T reporter via site-directed mutagenesis by PCR, respectively. Mutations at the splice acceptor site led to inactive EGFP production by non-spliced EGFP transcripts.
  • BpiI-harboring MPG mutants (MPG-G174R/D175R/T199R/S230R/Q294R/D295R mutants or corresponding combinations) were constructed via site-directed mutagenesis by PCR. Sequential asparagine/glutamic acid/valine/glycine/tyrosine substitutions (X-to-N, E, V, G, or Y) were designed, with oligos coding for the mutants annealed and ligated into the corresponding BpiI-digested backbone vectors. The gRNA oligos were annealed and ligated into BpiI sites. Unless otherwise indicated, each and every mutation in MPG is numbered based on the full-length wild type human MPG (SEQ ID NO: 9) with the first N-terminal Met.
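  • Because mutations are numbered on the full-length sequence that includes the first Met, applying a substitution such as D175R to a sequence that lacks that Met requires shifting the index by one. The following minimal sketch illustrates this bookkeeping; the function name, the `has_initial_met` flag, and the five-residue toy sequence are hypothetical and do not represent any SEQ ID NO of the disclosure.

```python
import re

# Substitution notation: wild-type residue, 1-based position, new residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(seq: str, mutations: list[str],
                        has_initial_met: bool = True) -> str:
    """Apply substitutions numbered on the full-length (Met-containing) protein.

    If `seq` lacks the first Met, every position is shifted by one extra
    residue so that the numbering still refers to the full-length protein.
    """
    offset = 1 if has_initial_met else 2
    residues = list(seq)
    for mutation in mutations:
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        index = position - offset
        if not 0 <= index < len(residues):
            raise IndexError(f"{mutation}: position outside the sequence")
        if residues[index] != wild_type:
            raise ValueError(
                f"{mutation}: expected {wild_type}, found {residues[index]}")
        residues[index] = new
    return "".join(residues)


# Toy sequence only: D4R on a Met-containing and a Met-less version.
print(apply_substitutions("MKTDG", ["D4R"]))
print(apply_substitutions("KTDG", ["D4R"], has_initial_met=False))
```

  • Checking the expected wild-type residue before substituting catches off-by-one numbering errors early, which is the main risk when switching between sequences with and without the first Met.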
  • Target sequencing data analysis was described previously [11] .
  • the targeted amplicon sequencing reads were processed using fastp with default parameters [47] .
  • the cleaned pairs were then merged using FLASH v1.2.11.
  • the amplified sequences from individual targets were demultiplexed using fastx_barcode_splitter.pl from fastx_toolkit 0.0.14.
  • Further amplicon sequencing analysis was performed by CRISPResso2 [48] .
  • a 10-bp window was used to quantify modifications centered around the middle of the 20-bp gRNA. Otherwise, the default parameters were used for analysis.
  • G-to-C purity was calculated as G-to-C editing efficiency / (G-to-C editing efficiency + G-to-T editing efficiency + G-to-A editing efficiency) .
  • G-to-Y conversion ratio was calculated as (G-to-C editing efficiency + G-to-T editing efficiency) / (G-to-C editing efficiency + G-to-T editing efficiency + G-to-A editing efficiency)
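  • The two definitions above can be written as a short calculation. This is a minimal sketch; the function names and the example efficiencies are hypothetical, and only the formulas come from the text.

```python
def g_to_c_purity(g_to_c: float, g_to_t: float, g_to_a: float) -> float:
    """G-to-C efficiency divided by the sum of G-to-C, G-to-T, and G-to-A."""
    total = g_to_c + g_to_t + g_to_a
    if total <= 0:
        raise ValueError("no G editing detected; purity is undefined")
    return g_to_c / total


def g_to_y_ratio(g_to_c: float, g_to_t: float, g_to_a: float) -> float:
    """(G-to-C + G-to-T) divided by the sum of G-to-C, G-to-T, and G-to-A."""
    total = g_to_c + g_to_t + g_to_a
    if total <= 0:
        raise ValueError("no G editing detected; ratio is undefined")
    return (g_to_c + g_to_t) / total


# Hypothetical efficiencies in percent: G-to-C 20, G-to-T 10, G-to-A 2.
print(g_to_c_purity(20.0, 10.0, 2.0))
print(g_to_y_ratio(20.0, 10.0, 2.0))
```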
  • HEK293T cells were cultured with DMEM (Catalog#11995065, Gibco) supplemented with 10% fetal bovine serum (Catalog#04-001-1ACS, BI) and 0.1 mM non-essential amino acids (Catalog#11140-050, Gibco). Cells were grown in an incubator at 37 °C with 5% CO2. MPG mutant screening was conducted in 48-well plates. The day before transfection, 3 × 10^4 HEK293T cells per well were plated in 250 μL of complete growth medium in the 48-well plates.
  • Orthogonal R-loop assays were performed as described previously [1, 2] .
  • 1 μg of gGBE plasmid with sgRNA targeting site 3 and 1 μg of dSaCas9 plasmid with corresponding sgRNA targeting five off-target sites to generate R-loops were co-transfected into HEK293T cells in 12-well plates using PEI (DNA/PEI ratio of 1:2).
  • 48 h after transfection, expression of mCherry, BFP, and EGFP fluorescence was analyzed by BD FACS Aria III or Beckman CytoFLEX S. Flow cytometry results were analyzed with FlowJo V10.5.3.
  • the gating strategy in the identification of mCherry + , BFP + and EGFP + cells for on-target editing efficiency evaluation was supplied in FIG. 10d.
  • mice were approved by the Biomedical Research Ethics Committee of Center for HuidaGene Therapeutics Co. Ltd.
  • Superovulated C57BL/6 females (4 weeks old) were mated with C57BL/6 males (8 weeks old), and females from the ICR strain were used as foster mothers.
  • Mice were maintained in a specific pathogen-free facility under a 12-hour dark–light cycle, at constant temperature (20–26°C) and humidity (40–60%).
  • the gGBE plasmids were constructed by standard PCR amplification with Phanta Max Super-Fidelity DNA Polymerase (Vazyme Biotech Co., Ltd), assembly with Gibson Assembly Master Mix (NEB E2611L), and transformation into chemically competent DH5α cells.
  • the gGBE plasmids were linearized by the FastDigest KpnI restriction enzyme (Thermo Fisher) , purified using Gel Extraction Kit (Omega) , and used as the template for in vitro transcription (IVT) using the mMESSAGE mMACHINE T7 Ultra kit (Life Technologies) .
  • T7 promoter sequence was added to the sgRNA template by PCR amplification of px330 (Addgene, #42230) .
  • the T7-Tyr-sgRNA PCR product was purified using Gel Extraction Kit (Omega) and used as the template for IVT of sgRNAs using the MEGAshortscript T7 kit (Life Technologies) .
  • the gGBE mRNA and Tyr-sgRNAs were purified using the MEGAclear kit (Life Technologies) and eluted in RNase-free water. In vitro transcribed RNAs were aliquoted and stored at -80°C until use. Prior to microinjection, the mixture of gGBE mRNA and Tyr-sgRNA was centrifuged for 10 min at 14,000 rpm at 4°C, and the supernatant was transferred to fresh 0.2 mL PCR tubes for injection.
  • Genomic DNA was extracted by addition of 40 μL of lysis buffer and 1 μL Proteinase K (Catalog#PD101-01, Vazyme) directly into each tube of sorted cells.
  • the genomic DNA/lysis buffer mixture was incubated at 55 °C for 45 min, followed by a 95 °C enzyme inactivation step for 10 min.
  • the regions of interest for target sites were amplified by PCR using site-specific primers.
  • PCR was performed at 95 °C for 5 min, followed by 30 cycles of 95 °C for 15 s, 60 °C for 15 s, and 72 °C for 30 s, and a final extension at 72 °C for 5 min, using Max Super-Fidelity DNA Polymerase (Catalog#P505-d3, Vazyme).
  • PCR products were purified using universal DNA purification kit (TIANGEN) according to the manufacturer’s instructions, and analyzed by Sanger sequencing (Genewiz) .
  • the amplicons were ligated to adapters and sequencing was performed on the Illumina MiSeq platforms. Protospacer sequences used for each genomic locus are listed in Table 1.
  • Because T, C, and U are structurally similar, it was speculated that the excision of canonical T or C could be achieved by engineering certain uracil DNA glycosylase (UNG).
  • the excision of T or C would generate an apurinic/apyrimidinic (AP) site, then trigger base excision repair (BER) pathway and facilitate direct T editing or C editing (FIG. 19a-b) .
  • Two human UNG1 variants, UNG1-Y147A and UNG1-N204D, have been engineered to excise T and C in DNA, respectively 17.
  • the residues Y156 and N213 of UNG2 are corresponding to the residues Y147 and N204 of UNG1, respectively, as determined by sequence alignment of UNG1 and UNG2.
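  • The two residue pairs stated above differ by the same amount (156 − 147 = 213 − 204 = 9). The sketch below renumbers a UNG1 substitution into UNG2 numbering using that constant offset. This is a simplification assumed for illustration: the text determines correspondence by sequence alignment, and a constant offset is only meaningful within the region where the two sequences align without gaps. The function and constant names are hypothetical.

```python
import re

# Offset inferred from the two residue pairs given in the text
# (Y147 -> Y156 and N204 -> N213); assumed constant for this sketch.
UNG1_TO_UNG2_OFFSET = 156 - 147

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def ung1_to_ung2(mutation: str) -> str:
    """Renumber a UNG1 substitution (e.g. 'Y147A') into UNG2 numbering."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    return f"{wild_type}{position + UNG1_TO_UNG2_OFFSET}{new}"


for variant in ("Y147A", "N204D"):
    print(variant, "->", ung1_to_ung2(variant))
```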
  • gTBE deaminase-free glycosylase-based thymine base editor
  • gCBE deaminase-free glycosylase-based cytosine base editor
  • Two intron-split EGFP reporter systems similar to those reported previously 9, a T-to-G reporter and a C-to-G reporter, were established to evaluate the editing efficiency of gTBE and gCBE, respectively (FIG. 26a).
  • the AG-to-AT or AG-to-AC inactive splicing acceptor (SA) could only be remediated with T-to-G or C-to-G conversion, which leads to correct splicing of EGFP-coding sequence and EGFP expression and emission of EGFP signals (FIG. 26b) .
  • the gTBE or gCBE encoding vector was co-transfected with the T-to-G or C-to-G reporter vector containing a targeting single-guide RNA (targeting sgRNA) that targeted the corresponding mis-splicing mutations.
  • NTD N-terminal domain
  • UNG contains protein binding motifs and sites for post-translational modifications 18 , which might constrain targeted excision activity of the glycosylase domain in ssDNA 19, 20
  • gTBEv0.2 SEQ ID NO: 140
  • UNG2Δ88-Y156A SEQ ID NO: 139
  • gCBEv0.2 SEQ ID NO: 142
  • UNG2Δ88-N213D SEQ ID NO: 141 fused at the C-terminus of nCas9
  • gTBEv0.2 exhibited comparable T-to-G conversion activity with gTBEv0.1 (1.0% vs. 1.1%, FIG. 19d), while gCBEv0.2 exhibited significantly increased C-to-G conversion activity compared with gCBEv0.1 (13.3% vs. 1.0%, FIG. 19e).
  • both gTBEv0.3 (SEQ ID NO: 143) (with UNG2Δ88-Y156A fused at the N-terminus of nCas9) and gCBEv0.3 (SEQ ID NO: 154) (with UNG2Δ88-N213D fused at the N-terminus of nCas9) showed much higher editing efficiency than gTBEv0.2 and gCBEv0.2 with UNG2 mutant fused at the C-terminus of nCas9 (10.2% vs. 1.0% for gTBE and 51.4% vs. 13.3% for gCBE; FIG. 19c-e), about 10- and 3.9-fold enhancement in editing efficiency, respectively.
  • UNG contains five conserved motifs required for efficient glycosylase activity: the catalytic water-activating loop, the proline-rich loop, the uracil-binding motif, the glycine-serine motif, and the leucine loop 23-25 (FIG. 25b) .
  • gTBEv1.1 (SEQ ID NO: 145) (v0.3 plus A214V) with UNG2Δ88-Y156A+A214V (SEQ ID NO: 144) was obtained with largely elevated T-to-G conversion activity of about 2.68-fold as compared with gTBEv0.3 (FIG. 28a).
  • site-saturation mutagenesis focusing on the residue at position 214 was further performed.
  • gTBEv1.2 (SEQ ID NO: 147) (v0.3 plus A214T) with UNG2Δ88-Y156A+A214T (SEQ ID NO: 146) was obtained with elevated editing efficiency of about 1.06-fold in comparison with the T editing efficiency of gTBEv1.1 (FIG. 28b).
  • T editing efficiency across different gTBE was validated at one endogenous genomic site in cultured mammalian cells (HEK293T cells) .
  • FACS fluorescence-activated cell sorting
  • the editing profile of gTBEv3 was characterized by targeting 20 endogenous genomic loci, most of which were used in previous base editing studies 11, 12, 26, 27. It was found that gTBEv3 achieved efficient T base editing (ranging from 24.3% to 81.5%; FIG. 21a and FIG. 30a-b), and essentially no A, C, or G editing at any of the examined sites (FIG. 30c-e).
  • the T-to-C or T-to-G conversions were the predominant events (FIG. 30f-h), and only a low percentage of T-to-A conversion was detected (FIG. 21a and FIG. 30i), consistent with the previous findings for gGBE 3, AYBE 9 and CGBEs 11-15.
  • the ratios of T-to-S to T conversion ranged from 0.68 to 0.97 (without indels, FIG. 21b) and from 0.41 to 0.92 (with indels, FIG. 30j). It was found that gTBEv3 also induced indels with frequency ranging from 5.2% to 45.2% at the 20 edited sites (FIG. 21c). Furthermore, it was found that the editable range of gTBEv3 was positions 2 to 11, and the optimal editing window with high efficiency of T conversion covered protospacer positions 3 to 7, with the highest editing efficiency at position 5 (FIG. 30b). No obvious motif preference was found for T conversions with gTBEv3 by analyzing the on-target editing and sequences of all the tested sites (FIG. 30k).
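  • The editing window described above can be used to screen a protospacer for candidate target bases. The following minimal sketch labels each T by whether it falls in the optimal window (positions 3 to 7), the wider editable range (positions 2 to 11), or outside both. It assumes 1-based numbering from the first (PAM-distal) protospacer base; the function name and the 20-nt example sequence are hypothetical.

```python
def classify_t_positions(protospacer: str,
                         editable: tuple[int, int] = (2, 11),
                         optimal: tuple[int, int] = (3, 7)) -> list[tuple[int, str]]:
    """Label every T in a protospacer by its position relative to the window.

    Positions are 1-based and counted from the first protospacer base
    (assumed here to be the PAM-distal end).
    """
    labels = []
    for position, base in enumerate(protospacer.upper(), start=1):
        if base != "T":
            continue
        if optimal[0] <= position <= optimal[1]:
            label = "optimal"
        elif editable[0] <= position <= editable[1]:
            label = "editable"
        else:
            label = "outside"
        labels.append((position, label))
    return labels


# Hypothetical 20-nt protospacer used only to exercise the function.
for position, label in classify_t_positions("ATGCTAAGCTTGACCAGTCA"):
    print(f"T{position}: {label}")
```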
  • the off-target activity of gTBEv3 was analyzed at several in silico-predicted 28 guide sequence-dependent off-target sites, and the ability of gTBEv3 to mediate guide sequence-independent off-target DNA editing was characterized by using orthogonal R-loop assay in five previously reported dSaCas9 R-loops 9, 29. A very low percentage of editing was found at all the guide sequence-dependent off-target loci (FIG. 21d-e and FIG. 31), and very low frequencies (1.1% on average) were detected at all five guide sequence-independent off-target sites (FIG. 21f). Taken together, gTBEv3 represents a highly efficient T-to-S base editor with low off-target effects in mammalian cells.
  • gCBEv1.1 was generated by introducing A214V into gCBEv0.3 (FIG. 22a). gCBEv1.1 (SEQ ID NO: 156) with UNG2Δ88-N213D+A214V (SEQ ID NO: 155) showed C-to-G conversion activity elevated about 1.34-fold relative to gCBEv0.3 when evaluated using the C-to-G reporter (FIG. 32a).
  • Alanine-scanning mutagenesis was conducted on the D154-D189 region of UNG2 to examine its role in the regulation of base excision activity, and gCBEv1.2 (SEQ ID NO: 158) (v0.3 plus K184A) with UNG2Δ88-N213D+K184A (SEQ ID NO: 157) was obtained, with C-to-G conversion activity elevated about 1.55-fold relative to gCBEv0.3 (FIG. 32b).
  • A214V and K184A were then combined to generate gCBEv2 (SEQ ID NO: 160) with UNG2Δ88-N213D+K184A+A214V (SEQ ID NO: 159), achieving C-to-G editing efficiency of about 1.3-fold relative to gCBEv0.3 (FIG. 22b).
  • The improvement in C editing efficiency across the different gCBE versions was further validated by targeting an endogenous genomic site, and a gradual increase in overall C editing efficiency from 18.2% to 37.2% at C2 of site 28 was observed (FIG. 33a).
  • When compared to CGBE1 12, a C-to-G base editor, gCBEv2 showed higher editing efficiency at certain positions towards the distal end of the target sequence (FIG. 22d and FIG. 33c), indicating that the two editors have different optimal editing windows (positions 2 to 6 for gCBEv2 vs. positions 5 to 7 for CGBE1 12). gCBEv2 induced fewer indels than CGBE1 at site 36, and more indels at site 28 and site 29 (FIG. 33k).
  • the potential applications of gTBE and gCBE were further evaluated.
  • the gTBE could not only remediate inactive splicing signals in the intron-split EGFP reporter systems used above (FIG. 19-20 and FIG. 26) , but also be used for exon skipping by disrupting splicing signals at splicing donor (SD) or splicing acceptor (SA) sites (FIG. 23a) .
  • gTBE and gCBE, together with other existing base editors, provide 1904 sgRNA candidates (protospacer sequences/guide sequences shown in Table 3) with the SD or SA sites located in each optimal editing window (FIG. 23b and FIG. 34a).
  • 771 sgRNA candidates for ABE and CBE targeting (FIG. 23c)
  • 156 and 103 candidates overlapped with those for gGBE and gTBE, respectively
  • 232 and 223 sgRNA candidates could only be screened by gGBE or gTBE targeting, respectively (FIG. 23c) .
  • 851 sgRNA candidates (protospacer sequences/guide sequences shown in Table 4) targeting various codons for PTC introduction in 15 genes were analyzed with gGBE and CBE, with 191 TAC and 124 TCA codons for gGBE targeting (FIG. 35e).
  • sgRNAs specifically targeting SD or SA sites were designed and screened with gTBEv3 or gCBEv2 (FIG. 23d and FIG. 34c), including three sgRNAs targeting the SD sites of DMD exons 45 (FIG. 23e), 12, and 37 (FIG. 34d) that are uniquely targeted by gTBEv3. Disruption of the SD site of exon 45, thus leading to exon skipping, would be applicable to restoring dystrophin expression in 9% of DMD patients 33.
  • gTBEv3-encoding mRNA and sgRNA targeting the SD site of DMD exon 45 were co-injected into zygotes of humanized mice to explore the potential application of gTBE. All mouse embryos (100%, 20/20) harbored efficient base conversion (ranging from 35.0% to 97.0%) at the desired position T3 (FIG. 23f-g), indicating the great potential of gTBE for human disease modeling and gene therapy. Overall, gBEs, including gTBE, gCBE, and gGBE, provide more options for sites that deaminase-based base editors cannot target, largely expanding the targeting scope of base editors.
  • gTBEv4 (SEQ ID NO: 161) and gTBEv5 (SEQ ID NO: 162) were generated by inserting the UNG2 mutant (SEQ ID NO: 152) contained in gTBEv3 (SEQ ID NO: 153) into split nCas9 domains at different locations (FIG. 24b) .
  • the UNG2 mutant (SEQ ID NO: 152) was embedded between positions 2-1248 of nCas9 (SEQ ID NO: 2) and positions 1249-1368 of nCas9 (SEQ ID NO: 2) .
  • positions 1064-1368 of nCas9 SEQ ID NO: 2
  • the first amino acid residue D of nCas9 was numbered as position 2 instead of position 1.
  • gCBEv3 (SEQ ID NO: 164) (FIG. 38a) was generated by replacing the UNG mutant (SEQ ID NO: 152) in gTBEv5 (SEQ ID NO: 162) with the UNG mutant (SEQ ID NO: 159) in gCBEv2 (SEQ ID NO: 160) .
  • T editing efficiency of various thymine base editors was compared at 17 endogenous sites, including five sites from He’s study 34 and five sites from Ye’s study 35 (FIG. 24c and FIG. 36) .
  • gTBEv3 showed higher editing efficiency than DAF-TBE at the overwhelming majority of Ts (29 out of 35) of the tested sites (FIG. 24c, FIG. 36f), indicating that the UNG mutants generated herein by rational mutagenesis are superior to those generated by random mutagenesis.
  • gTBEv3 was also compared with gTBEv4 and gTBEv5, two base editors constructed using the embedding strategy.
  • gTBEv4 showed an editing window shifted from positions 3-7 to positions 7-13 (FIG. 24d), with no significant difference in average editing efficiency from gTBEv3 (23.2% vs. 23.1%, FIG. 36f).
  • For gTBEv5, the editing efficiency was largely increased compared to that of gTBEv3 (averaging 39.3% vs. 23.1%, FIG. 36f), gTBEv4, and the other editors, with the same predominant T-to-S conversions (FIG. 36a-d and g), and the optimal editing window covered protospacer positions 5 to 9 (FIG. 24d).
  • TSBE3 (carrying L83Q and G116E mutations, equivalent to L74Q and G107E in UNG1) is an nCas9-embedded base editor with almost the same insertion position as gTBEv5 (FIG. 24c) .
  • gTBEv5 showed higher editing efficiency than TSBE3 (39.3% vs. 22.5%, FIG. 36f) at the overwhelming majority of Ts (29 out of 35) of the tested sites (FIG. 24c), indicating that the UNG mutant generated herein by rational mutagenesis is superior to those generated by PLM-assisted mutagenesis.
  • the optimal editing window of TSBE3 covered protospacer positions 4 to 9 (FIG. 24d) .
  • the circularly permuted DAF-TBE2 showed low average editing efficiency and an editing window of positions 9-13, different from the editing window (positions 2-6) of DAF-TBE (FIG. 24d) .
  • gTBEv5 induced indel rates comparable to those of DAF-TBE (14.4% vs. 14.4%), DAF-TBE2 (14.4% vs. 10.3%), and TSBE3 (14.4% vs. 13.5%; FIG. 36e-g).
  • gTBEs induced far fewer unintended T edits than TSBE3 and DAF-TBEs in the proximal DNA sequence upstream from two sites (site 38 and site 44) harboring unintended edits (FIG. B13), consistent with the finding that the NTD of UNG could promote targeting the enzyme to ssDNA–dsDNA junctions 19.
  • gCBEv2 induced average indel rates comparable to those of other deaminase-free base editors, including DAF-CBE (16.8% vs. 16.9%), DAF-CBE2 (16.8% vs. 12.1%), and CGBE-CDG (16.8% vs. 13.6%; FIG. 38d-g).
  • The C-to-G editing frequency and purity of the different base editors showed respective advantages for CGBE1 and the various deaminase-free base editors at different cytosine positions across the protospacer (FIG. 39a-b).
  • Each base editor can edit its target base within a certain editable window, that is, positions 2 to 9 for gCBEv2, positions 2 to 11 for gCBEv3, positions 4 to 10 for CGBE1, positions 2 to 9 for CGBE-CDG, positions 2 to 9 for DAF-CBE, and positions 9 to 12 for DAF-CBE2 (FIG. 39c) .
  • The prime editing (PE) system could theoretically mediate all types of base substitution, including T-to-G conversion and C-to-G conversion 39.
  • gTBEv3 and gTBEv5 were compared with the recently evolved PE6d system 40 at six previously reported endogenous sites 35 in HEK293T cells.
  • gTBEv3 and gTBEv5 outperformed PE6d or PE6d max for T-to-G conversion at four tested sites (FIG. 41a) .
  • gCBEv2 and gCBEv3 outperformed PE6d or PE6d max for C-to-G conversion at five tested sites (FIG. 41b) .
  • base editing and prime editing offer complementary strengths, and base editors generally show more efficient editing if the target base is positioned optimally.
  • gTBEs and gCBEs exhibited efficient T and C editing across three different human cell lines (HEK293T, U2OS, and HuH-7 cells), with slight perturbations of the product purity for gTBEs and comparable substitution frequencies of certain bases for gCBEs in the different cell lines (FIG. 42).
  • the deaminase-based base editor (dBE) and derivatives thereof enable direct editing of adenine (A) and cytosine (C) , but not thymine (T) .
  • SNP pathogenic single nucleotide polymorphism
  • Two orthogonal base editors, gTBE and gCBE, that could achieve highly efficient T and C editing in both cultured human cells and mouse embryos were developed.
  • gTBE and gCBE could greatly broaden the targeting scope of base editors by breaking the limitations of PAM and narrow editing window, thus increasing the opportunity to obtain an efficient strategy for further research.
  • The T-to-S conversion ability of gTBE allows for a variety of gene editing applications, including editing splicing sites, as well as editing that bypasses PTCs.
  • Wild-type UNG proteins are highly specific against uracil in both ssDNA and dsDNA, with a preference for ssDNA 43 .
  • the NTD of UNG containing motifs and sites for undesired protein-protein interactions and post-translational modifications could promote targeting the enzyme to ssDNA–dsDNA junctions 19, 20 .
  • TSBE3, with full length UNG2, and DAF-TBEs induced more undesired edits than gTBEs in the proximal DNA sequence upstream from two sites harboring unintended edits (FIG. 37) .
  • Base editor constructs used in this study were cloned into a mammalian expression plasmid backbone under the control of an EF1α promoter by standard molecular cloning techniques, and the two intron-split EGFP reporters were constructed similarly to those described previously 9, except that the engineered sequence containing the last 86 base pairs (bp) intron of human RPS5 was inserted between the BFP and EGFP coding sequences. The corresponding mutations at the splice acceptor site were made via site-directed mutagenesis by PCR to construct the T-to-G reporter or the C-to-G reporter, respectively. Mutations at the splice acceptor site led to inactive EGFP production.
  • the corresponding mutations at the splice acceptor site were put at position 6 across the protospacer.
  • The wild-type UNG2 sequence (313 amino acids long) (SEQ ID NO: 133) was PCR-amplified from cDNA of HEK293T cells. UNG2-Y156A, UNG2-N213D, UNG-NTD-truncated mutants, and the corresponding combinations were constructed via site-directed mutagenesis by PCR.
  • The UNG mutants were fused in different orientations with respect to nCas9 via the Gibson Assembly method.
  • The PE6d architecture harbored a human codon-optimized, RNaseH-truncated, evolved and engineered M-MLV variant with R221K/N394K/H840A mutations in SpCas9.
  • The nick sgRNA and epegRNA with the tevoPreQ1 motif were cloned into the PE6d construct using Golden Gate assembly, resulting in an all-in-one plasmid.
  • For PE6d max, the codon-optimized hMLH1dn was co-expressed with PE6d.
  • UNG mutagenesis libraries were designed and generated as previously described 52 with some modifications.
  • The region of 98-313 aa in UNG2 was divided into 8-aa-long segments.
  • BpiI-harboring mutants containing Y156A or N213D were introduced via site-directed mutagenesis by PCR.
  • The regions I150-L179, A158-K261, L210-T217, and Q274-Y284 were selected for rounds of sequential alanine/arginine/aspartic acid/valine substitutions (X-to-A, R, D, or V).
  • Cas-OFFinder 28 was used to search for potential guide sequence-dependent off-target sites of Cas9 RNA-guided endonucleases with a maximum of 3 mismatches (with no bulges).
  • a PAM-flexible Cas9 variant SpG (SEQ ID NO: 163) was used in place of nCas9 (SEQ ID NO: 2) .
  • the sgRNA oligos were annealed and ligated into BpiI sites.
  • HEK293T, HuH-7, and U2OS cells were cultured in DMEM (Catalog#11995065, Gibco) supplemented with 10% fetal bovine serum (Catalog#04-001-1ACS, BI) and 0.1 mM non-essential amino acids (Catalog#11140-050, Gibco) in an incubator at 37 °C with 5% CO2.
  • Mutant screening was conducted in 48-well plates, with 3 × 10⁴ HEK293T cells per well plated in 250 µL of complete growth medium the day before transfection. Between 16 and 24 h after seeding, cells were co-transfected with 250 ng gTBE (or gCBE) plasmids, 250 ng T-to-G (or C-to-G) reporter plasmids, and 1 µg polyethylenimine (PEI) (DNA/PEI ratio of 1:2) per well.
  • Endogenous target sites of interest were amplified from genomic DNA as previously described 9. Briefly, 10,000 mCherry-positive cells were isolated by FACS 72 h after transfection, then genomic DNA was extracted and the regions of interest for target sites were amplified by PCR using site-specific primers. The purified PCR products were analyzed by Sanger sequencing (Genewiz).
  • Targeted sequencing data were analyzed as described previously 3.
  • The amplicons were ligated to adapters and sequenced on the Illumina MiSeq platform, then the targeted amplicon sequencing reads were processed using fastp with default parameters 53, and further amplicon sequencing analysis was performed with CRISPResso2 54.
  • T-to-G purity was calculated as T-to-G editing efficiency / (T-to-C editing efficiency + T-to-G editing efficiency + T-to-A editing efficiency) .
  • T-to-S conversion ratio was calculated as (T-to-C editing efficiency + T-to-G editing efficiency) / (T-to-C editing efficiency + T-to-G editing efficiency + T-to-A editing efficiency).
  • Protospacer sequences/guide sequences are shown in Table 2.
  • the mRNA and sgRNA preparations were performed as previously described 9 .
  • the gTBEv3 expression plasmid was linearized by the FastDigest KpnI restriction enzyme (Catalog#FD0524, Thermo Fisher) , purified using Gel Extraction Kit (Catalog#D2500-03, Omega) , and used as the template for in vitro transcription (IVT) using the mMESSAGE mMACHINE T7 Ultra kit (Catalog#AM1345, Thermo Ambion) .
  • T7 promoter sequence was added to the sgRNA template by PCR amplification.
  • The T7-DMD-sgRNA PCR product was purified using Gel Extraction Kit (Catalog#D2500-03, Omega) and used as the template for IVT of sgRNAs using the MEGAshortscript T7 kit (Catalog#AM1354, Invitrogen).
  • The gTBEv3-encoding mRNA and DMD-sgRNA were purified using the MEGAclear kit (Catalog#AM1908, Invitrogen), eluted in RNase-free water, and stored at -80 °C until use.
  • Animal manipulations were consistent with those reported previously 3. Experiments involving mice were approved by the Biomedical Research Ethics Committee of Center for HuidaGene Therapeutics Co. Ltd. Mice were maintained in a specific pathogen-free facility under a 12-hour dark–light cycle, with constant temperature (20–26 °C) and humidity (40–60%).
  • HEK293T cells were plated in 12-well plates as above and transfected with 2 µg of gTBEv5, gCBEv3, CGBE1, or mCherry plasmids using PEI (DNA/PEI ratio of 1:2). At 48 hours after transfection, around 5 × 10⁶ cells were collected. Total RNA was extracted with a TRIzol-based method, fragmented, and reverse transcribed to cDNAs with HiScript Q RT SuperMix according to the manufacturer’s instructions. Total RNA integrity was quantified using an Agilent 2100 Bioanalyzer. The RNA-seq library was qualified using the Illumina NovaSeq 6000 platform (performed by GENEWIZ). Trimmomatic (v.
  • RNA editing sites were calculated using REDItools2 57 with default parameters.
  • The dbSNP (v. 146) database downloaded from NCBI was used to filter out the sites overlapping with common single nucleotide variants (SNVs). The sites with fewer than five mutated or nonmutated reads were further filtered out.
  • StringTie 58 was used to calculate expression value.
  • DESeq2 59 was used to calculate differentially expressed genes with FDR < 0.05 and fold change > 1.
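The T-to-G purity and T-to-S conversion ratio defined in the sequencing analysis above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the example efficiencies are hypothetical and are not taken from the source, which reports only the two formulas.

```python
def t_to_g_purity(t_to_c: float, t_to_g: float, t_to_a: float) -> float:
    """T-to-G purity = T-to-G / (T-to-C + T-to-G + T-to-A)."""
    total = t_to_c + t_to_g + t_to_a
    if total == 0:
        raise ValueError("no T conversions observed")
    return t_to_g / total


def t_to_s_ratio(t_to_c: float, t_to_g: float, t_to_a: float) -> float:
    """T-to-S ratio = (T-to-C + T-to-G) / (T-to-C + T-to-G + T-to-A)."""
    total = t_to_c + t_to_g + t_to_a
    if total == 0:
        raise ValueError("no T conversions observed")
    return (t_to_c + t_to_g) / total


# Hypothetical editing efficiencies (in %) at a single T position.
example = {"t_to_c": 30.0, "t_to_g": 50.0, "t_to_a": 20.0}
print(t_to_g_purity(**example))  # 0.5
print(t_to_s_ratio(**example))   # 0.8
```

Both metrics share the same denominator, so the T-to-S ratio is always at least as large as the T-to-G purity for the same site.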
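The editable windows reported above for the C-to-G editors (FIG. 39c) lend themselves to a simple lookup when choosing an editor for a given protospacer position. The sketch below assumes only those reported position ranges; the dictionary and function names are hypothetical, and window membership says nothing about the editing efficiency actually achieved at that position.

```python
from typing import Dict, List, Tuple

# Editable windows (inclusive protospacer positions) as reported for FIG. 39c.
EDITABLE_WINDOWS: Dict[str, Tuple[int, int]] = {
    "gCBEv2": (2, 9),
    "gCBEv3": (2, 11),
    "CGBE1": (4, 10),
    "CGBE-CDG": (2, 9),
    "DAF-CBE": (2, 9),
    "DAF-CBE2": (9, 12),
}


def editors_covering(position: int) -> List[str]:
    """Return the editors whose editable window contains the given position."""
    return [
        name
        for name, (start, end) in EDITABLE_WINDOWS.items()
        if start <= position <= end
    ]


print(editors_covering(3))   # ['gCBEv2', 'gCBEv3', 'CGBE-CDG', 'DAF-CBE']
print(editors_covering(12))  # ['DAF-CBE2']
```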

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • Biomedical Technology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Medicinal Chemistry (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Plant Pathology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Enzymes And Modification Thereof (AREA)
  • Laminated Bodies (AREA)
  • Transition And Organic Metals Composition Catalysts For Addition Polymerization (AREA)

Abstract

The invention relates to novel base editors and uses thereof.
PCT/CN2024/089874 2023-04-25 2024-04-25 Novel base editors and uses thereof Pending WO2024222812A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202480020620.3A CN120936712A (zh) 2023-04-25 2024-04-25 Novel base editors and uses thereof

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
CNPCT/CN2023/090660 2023-04-25
CN2023090660 2023-04-25
CN2023091734 2023-04-28
CNPCT/CN2023/091734 2023-04-28
CN2023094565 2023-05-16
CNPCT/CN2023/094565 2023-05-16
CN2024070217 2024-01-02
CNPCT/CN2024/070217 2024-01-02
CNPCT/CN2024/084498 2024-03-28
CN2024084498 2024-03-28

Publications (1)

Publication Number Publication Date
WO2024222812A1 (fr) 2024-10-31

Family

ID=93255591

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/089874 Pending WO2024222812A1 (fr) 2023-04-25 2024-04-25 Novel base editors and uses thereof

Country Status (2)

Country Link
CN (1) CN120936712A (fr)
WO (1) WO2024222812A1 (fr)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170321210A1 (en) * 2014-11-04 2017-11-09 National University Corporation Kobe University Method for modifying genome sequence to introduce specific mutation to targeted dna sequence by base-removal reaction, and molecular complex used therein
US20180179503A1 (en) * 2016-12-23 2018-06-28 President And Fellows Of Harvard College Editing of ccr5 receptor gene to protect against hiv infection
US20210024906A1 (en) * 2017-11-22 2021-01-28 National University Corporation Kobe University Complex for genome editing having stability and few side-effects, and nucleic acid coding same
US20210403898A1 (en) * 2020-06-30 2021-12-30 Pairwise Plants Services, Inc. Compositions, systems, and methods for base diversification
WO2023237063A1 (fr) * 2022-06-08 2023-12-14 Huidagene Therapeutics Co., Ltd. Novel guide nucleic acids for RNA base editing systems and uses thereof
CN117126827A (zh) * 2023-06-20 2023-11-28 西湖大学 Fusion protein, base editing system mediated by a uracil-N-glycosylase mutant containing the same, and use thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TONG HUAWEI, LIU NANA, WEI YINGHUI, ZHOU YINGSI, LI YUN, WU DANNI, JIN MING, CUI SHUNA, LI HENGBIN, LI GUOLING, ZHOU JINGXING, YUA: "Programmable deaminase-free base editors for G-to-Y conversion by engineered glycosylase", NATIONAL SCIENCE REVIEW, vol. 10, no. 8, 28 June 2023 (2023-06-28), XP093230108, ISSN: 2095-5138, DOI: 10.1093/nsr/nwad143 *
TONG HUAWEI, WANG HAOQIANG, LIU NANA, LI GUOLING, ZHOU YINGSI, WU DANNI, LI YUN, JIN MING, WANG XUCHEN, LI HENGBIN, WEI YINGHUI, Y: "Development of deaminase-free T-to-S base editor and C-to-G base editor by engineered human uracil DNA glycosylase", BIORXIV, 1 January 2024 (2024-01-01), XP093230111, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/2024.01.01.573809v1.full.pdf> DOI: 10.1101/2024.01.01.573809 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119220593A (zh) * 2024-11-29 2024-12-31 三亚中国农业科学院国家南繁研究院 A base editor for C-to-K conversion in plants
CN119220593B (zh) * 2024-11-29 2025-04-01 三亚中国农业科学院国家南繁研究院 A base editor for C-to-K conversion in plants
CN119241729A (zh) * 2024-12-09 2025-01-03 三亚中国农业科学院国家南繁研究院 A base editor for T-to-G conversion in plants

Also Published As

Publication number Publication date
CN120936712A (zh) 2025-11-11

Similar Documents

Publication Publication Date Title
US20240141336A1 (en) Targeted rna editing
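JP7646554B2 (ja) Compositions and methods for treating alpha-1 antitrypsin deficiency
CN114072496A (zh) Adenosine deaminase base editors and methods of using same to modify a nucleobase in a target sequence
CN114072509A (zh) Nucleobase editors with reduced off-target deamination and methods of using same to modify a nucleobase target sequence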
AU2020344547A1 (en) Novel nucleobase editors and methods of using same
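CN116497067A (zh) Compositions and methods for treating hemoglobinopathies
CN114190093A (zh) Disruption of splice acceptor sites of disease-associated genes using adenosine deaminase base editors, including for the treatment of genetic diseases
CN114206395B (zh) Methods of editing a single nucleotide polymorphism using a programmable base editor system
WO2024222812A1 (fr) Novel base editors and uses thereof
CA3116739A1 (fr) Compositions and methods for treating alpha 1-antitrypsin deficiency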
US20250283063A1 (en) Novel crispr-cas12i systems and uses thereof
WO2023086953A1 (fr) Compositions and methods for the treatment of hereditary angioedema (HAE)
WO2023217280A1 (fr) Programmable adenine base editor and uses thereof
US20240132868A1 (en) Compositions and methods for the self-inactivation of base editors
AU2023261324A1 (en) Novel crispr-cas12f systems and uses thereof
WO2024026478A1 (fr) Compositions and methods for treating a congenital eye disease
WO2024229240A2 (fr) Compositions and methods for treating Stargardt disease
NZ732182B2 (en) Targeted rna editing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24796188

Country of ref document: EP

Kind code of ref document: A1