WO2024222812A1 - Novel base editors and uses thereof - Google Patents
- Publication number
- WO2024222812A1 (PCT/CN2024/089874)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- seq
- fusion protein
- amino acid
- base
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K19/00—Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
- C12N9/222—Clustered regularly interspaced short palindromic repeats [CRISPR]-associated [CAS] enzymes
- C12N9/226—Class 2 CAS enzyme complex, e.g. single CAS protein
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/24—Hydrolases (3) acting on glycosyl compounds (3.2)
- C12N9/2497—Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing N- glycosyl compounds (3.2.2)
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/80—Fusion polypeptide containing a DNA binding domain, e.g. Lacl or Tet-repressor
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2320/00—Applications; Uses
- C12N2320/30—Special therapeutic applications
- C12N2320/33—Alteration of splicing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y302/00—Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
- C12Y302/02—Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2) hydrolysing N-glycosyl compounds (3.2.2)
- C12Y302/02021—DNA-3-methyladenine glycosylase II (3.2.2.21)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y302/00—Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
- C12Y302/02—Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2) hydrolysing N-glycosyl compounds (3.2.2)
- C12Y302/02027—Uracil-DNA glycosylase (3.2.2.27)
Definitions
- Base editing is a powerful technology for basic research and therapeutic applications [1, 2] .
- Current base editors mainly contain a nucleic acid programmable DNA binding protein, such as a catalytically impaired CRISPR-associated (Cas) nuclease, fused with a single-stranded DNA deaminase enzyme and sometimes an additional protein that modulates the DNA repair machinery [3, 4] .
- Cas CRISPR-associated
- C-to-G base editors [6-10] and adenine transversion base editor (AYBE) [11] were constructed by fusing existing CBE or ABE with a DNA glycosylase variant to generate new tools for achieving more versatile base editing outcomes, including C-to-G, A-to-C and A-to-T editing (FIG. 9) .
- CRISPR-free CBEs (DdCBEs) were reported for performing C-to-T base editing in mitochondrial DNA, by fusing the two halves of a double-stranded DNA cytidine deaminase (DddA) variant with two separate TALE (transcription activator-like effector) proteins [12-14] .
- The disclosure provides, at least in part, base editors and base editing methods capable of direct base editing of a target deoxyribonucleotide (e.g., dG, dT) in a target dsDNA.
- The disclosure also provides, at least in part, base editors and base editing methods capable of base editing of a target deoxyribonucleotide (e.g., dC) in a target dsDNA in the absence of deamination.
- the disclosure provides a fusion protein comprising:
- napDNAbd nucleic acid programmable DNA binding domain capable of binding a target dsDNA comprising:
- a first deoxyribonucleotide e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine)
- dG deoxyguanosine
- dT thymidine
- dC deoxycytidine
- a second deoxyribonucleotide e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine)
- first deoxyribonucleotide e.g., dG, dT, dC
- target sequence on the target strand (non-edited strand) of the target dsDNA, wherein the protospacer sequence is fully reverse complementary to the target sequence
- a base excising domain capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide.
- the fusion protein does not comprise a deaminase domain, e.g., an adenine or cytosine deaminase domain, e.g., TadA and variants thereof.
- a deaminase domain e.g., an adenine or cytosine deaminase domain, e.g., TadA and variants thereof.
- the first deoxyribonucleotide is deoxyguanosine (dG) , thymidine (dT) , or deoxycytidine (dC) .
- the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide is dG-to-dA, dG-to-dT, dG-to-dC, dT-to-dA, dT-to-dC, dT-to-dG, dC-to-dA, dC-to-dT, or dC-to-dG.
- the base excising domain comprises a glycosylase.
- the glycosylase is selected from the group consisting of N-methylpurine DNA glycosylase (MPG) , 8-oxoguanine DNA glycosylase (OGG1) , methyl-CpG binding domain 4, DNA glycosylase (MBD4) , thymine DNA glycosylase (TDG) , uracil DNA glycosylase (UNG) , single-strand-selective monofunctional uracil-DNA glycosylase 1 (SMUG1) , mutY DNA glycosylase (MUTYH) , nth like DNA glycosylase 1 (NTHL1) , nei like DNA glycosylase 1 (NEIL1) , nei like DNA glycosylase 2 (NEIL2) , nei like DNA glycosylase 3 (NEIL3) , and mutants thereof capable of recognizing and excising a base from a nucleotide of a nucleic acid.
- MPG N-methylpurine DNA glycosylase
- the base excising domain comprises an N-methylpurine DNA glycosylase (MPG) .
- the MPG comprises an amino acid substitution relative to a reference MPG of SEQ ID NO: 7 at a position that is corresponding to a position or that is a position selected from the group consisting of N169, D175, C178, and/or Q294 of the reference MPG, wherein the position is numbered according to SEQ ID NO: 1.
- the amino acid substitution is a substitution with R, A, N, or G.
- the MPG comprises an amino acid substitution relative to a reference MPG of SEQ ID NO: 7 that is corresponding to a substitution or that is a substitution selected from the group consisting of N169G, D175R, C178N, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1.
- the MPG comprises a combination substitution relative to a reference MPG of SEQ ID NO: 7 that is corresponding to a combination substitution of N169G, D175R, C178N, and Q294R, wherein the position is numbered according to SEQ ID NO: 1.
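The substitution notation used above (for example N169G: reference residue N, position 169, replacement G) can be illustrated with a minimal Python sketch. The function name `apply_substitutions`, the `first_position` parameter, and the toy sequence are hypothetical and are not taken from the disclosure. The `first_position` parameter is only one assumed way of handling a numbering scheme that is defined on a different sequence than the reference being mutated.

```python
import re

def apply_substitutions(seq, substitutions, first_position=1):
    """Apply substitutions written like 'N169G' to an amino acid sequence.

    first_position is the number assigned to seq[0] under the numbering
    scheme in use (an illustrative assumption, not from the disclosure).
    """
    residues = list(seq)
    for sub in substitutions:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if m is None:
            raise ValueError(f"cannot parse substitution {sub!r}")
        ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
        idx = pos - first_position
        if not 0 <= idx < len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        # Refuse to edit if the numbering does not land on the stated residue.
        if residues[idx] != ref:
            raise ValueError(f"expected {ref} at position {pos}, found {residues[idx]}")
        residues[idx] = alt
    return "".join(residues)

# Toy 5-residue sequence, numbered from 1 (not a real MPG sequence).
print(apply_substitutions("ANDCQ", ["N2G", "D3R"]))  # AGRCQ
```

Checking the reference residue before editing catches an off-by-offset numbering error early, which matters when positions are numbered according to one sequence and applied to another.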
- the MPG comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 8, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, or 38.
- the MPG is (substantially) capable of excising guanine of dG.
- the base excising domain comprises a uracil-DNA glycosylase (UNG) .
- UNG uracil-DNA glycosylase
- the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 135 or 137 at a position that is corresponding to a position or that is a position selected from the group consisting of K184, A214, Q259, and/or Y284 of the reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the amino acid substitution is a substitution with A, D, V, or T.
- the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 135 or 137 that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133.
- the UNG comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-65, 1-66, 1-67, 1-68, 1-69, 1-70, 1-71, 1-72, 1-73, 1-74, 1-75, 1-76, 1-77, 1-78, 1-79, 1-80, 1-81, 1-82, 1-83, 1-84, 1-85, 1-86, 1-87, 1-88, 1-89, 1-90, 1-91, 1-92, 1-93, 1-94, 1-95, 1-96, 1-97, 1-98, 1-99, or 1-100 of the reference UNG of SEQ ID NO: 135 or 137, wherein the position is numbered according to SEQ ID NO: 133.
- the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 135 that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 137 that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 56, 58, 60, 62, 135, 137, 139, 141, 144, 146, 148, 150, 152, 155, 157, and 159 or an N-terminal truncation thereof lacking the most N-terminal Methionine (M) (coded by start codon ATG) .
- M N-terminal Methionine
- the UNG is (substantially) capable of excising thymine of dT.
- the UNG is (substantially) capable of excising cytosine of dC.
- the napDNAbd is an RNA programmable DNA binding protein.
- the napDNAbd is selected from the group consisting of CRISPR-associated (Cas) protein, IscB, IsrB, Argonaute, and TnpB.
- the napDNAbd is a nickase, e.g., a Cas9 nickase, an IscB nickase.
- the napDNAbd is nuclease-inactive, e.g., dead Cas9, dead Cas12i.
- the napDNAbd comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 2, 48, 50, 52, or 163.
- the fusion protein comprises, from N-terminal to C-terminal, (1) the napDNAbp and the base excising domain; or (2) the base excising domain and the napDNAbp.
- the napDNAbd (e.g., Cas9) is a two-part napDNAbd, for example, a two-part split Cas9, comprising an N-terminal portion and a C-terminal portion, and wherein the fusion protein comprises, from N-terminal to C-terminal, (1) the N-terminal portion of the napDNAbd, the base excising domain, and the C-terminal portion of the napDNAbd; (2) the C-terminal portion of the napDNAbd, the base excising domain, and the N-terminal portion of the napDNAbd; or (3) the base excising domain, the C-terminal portion of the napDNAbd (e.g., amino acids at positions 1249-1368) , and the N-terminal portion (e.g., amino acids at positions 1-1248) of the napDNAbd.
- the napDNAbd is SpCas9 (e.g., a SpCas9 nickase) or a mutant thereof (e.g., a SpG Cas9 nickase) .
- the N-terminal portion of the napDNAbd is the amino acids of the napDNAbp at positions 1 or 2 to 1012, 1028, 1041, 1046, 1047, 1248, 1249, or 1300.
- the C-terminal portion of the napDNAbd is the amino acids of the napDNAbp at positions 1013, 1029, 1042, 1047, 1048, 1249, 1063, 1064, 1230, 1249, or 1301 to 1368.
- the fusion protein comprises the base excising domain embedded between positions 2-1248 of nCas9 (SEQ ID NO: 2) and positions 1249-1368 of nCas9 (SEQ ID NO: 2) , wherein the first amino acid residue D of nCas9 (SEQ ID NO: 2) was designated as position 2; or embedded between positions 2-1047 of nCas9 (SEQ ID NO: 2) and positions 1064-1368 of nCas9 (SEQ ID NO: 2) , wherein the first amino acid residue D of nCas9 (SEQ ID NO: 2) was designated as position 2.
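The embedded arrangement described above depends on a numbering convention in which the first residue of nCas9 is designated position 2. The following Python sketch shows how such position ranges could be mapped onto string indices. The function name `embed_domain`, its parameters, and the toy sequences are hypothetical, and any linkers between the portions are omitted for simplicity.

```python
def embed_domain(cas9, domain, last_n_pos, first_c_pos, first_position=2):
    """Place `domain` between an N-terminal and a C-terminal portion of `cas9`.

    Positions follow a numbering in which cas9[0] is `first_position`
    (2 in the convention quoted above). Residues numbered between
    last_n_pos and first_c_pos, if any, are left out of the result.
    """
    if first_c_pos <= last_n_pos:
        raise ValueError("C-terminal portion must start after the N-terminal portion")
    n_part = cas9[: last_n_pos - first_position + 1]   # positions first_position..last_n_pos
    c_part = cas9[first_c_pos - first_position :]      # positions first_c_pos..end
    return n_part + domain + c_part

# Toy 7-residue "Cas9" numbered 2..8 (D=2, K=3, K=4, Y=5, S=6, I=7, G=8).
toy = "DKKYSIG"
print(embed_domain(toy, "xx", last_n_pos=4, first_c_pos=5))  # DKKxxYSIG
print(embed_domain(toy, "xx", last_n_pos=3, first_c_pos=6))  # DKxxSIG
```

With the real numbering, the first arrangement above would correspond to `last_n_pos=1248, first_c_pos=1249`, and the second to `last_n_pos=1047, first_c_pos=1064`.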
- the fusion protein comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 12, 14, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 55, 57, 59, 61, 63, 136, 138, 140, 142, 143, 145, 147, 149, 151, 153, 154, 156, 158, 160, 161, 162, and 164.
- the disclosure provides a system comprising:
- a guide nucleic acid or a polynucleotide encoding the guide nucleic acid comprising:
- the guide nucleic acid is a guide RNA (gRNA) .
- gRNA guide RNA
- the scaffold sequence has substantially the same secondary structure as the secondary structure of the sequence of SEQ ID NO: 40, 73, or 74.
- the scaffold sequence comprises (1) a sequence of SEQ ID NO: 40, 73, or 74 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to SEQ ID NO: 40, 73, or 74 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to SEQ ID NO: 40, 73, or 74.
- the fusion protein or system of the disclosure further comprises a translesion synthesis (TLS) polymerase or a recruiting domain or component capable of recruiting a TLS polymerase.
- TLS translesion synthesis
- the TLS polymerase is selected from the group consisting of Pol α (alpha) , Pol β (beta) , Pol δ (delta) (PCNA) , Pol γ (gamma) , Pol η (eta) , Pol ι (iota) , Pol κ (kappa) , Pol λ (lambda) , Pol μ (mu) , Pol ν (nu) , Pol θ (theta) , and REV1.
- the disclosure provides a polynucleotide encoding the fusion protein of the disclosure and optionally the guide nucleic acid of the disclosure.
- the disclosure provides a delivery system comprising (1) the fusion protein of the disclosure, the polynucleotide of the disclosure, or the system of the disclosure; and (2) a delivery vehicle.
- the disclosure provides a vector comprising the polynucleotide of the disclosure.
- the disclosure provides a complex comprising the fusion protein or the polynucleotide (e.g., a mRNA) encoding the fusion protein of the disclosure and a guide nucleic acid (e.g., a gRNA) of the disclosure.
- the disclosure provides a pharmaceutical composition comprising (1) the system of the disclosure, the vector of the disclosure, the ribonucleoprotein of the disclosure, the lipid nanoparticle of the disclosure, or the cell of the disclosure; and (2) a pharmaceutically acceptable excipient.
- the disclosure provides a cell or a progeny thereof comprising the system of the disclosure.
- the disclosure provides a cell or a progeny thereof modified by the system of the disclosure or the method of the disclosure.
- the disclosure provides a method of modifying a target dsDNA, comprising contacting the target dsDNA with the system of the disclosure,
- the target dsDNA comprising:
- a first deoxyribonucleotide e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine)
- dG deoxyguanosine
- dT thymidine
- dC deoxycytidine
- a second deoxyribonucleotide e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine)
- first deoxyribonucleotide e.g., dG, dT, dC
- the protospacer sequence is fully reverse complementary to the target sequence.
- the method does not include deamination of the base of the first deoxyribonucleotide before the excision of the base of the first deoxyribonucleotide.
- the method does not include deamination of the base of the first deoxyribonucleotide.
- the disclosure provides an MPG described herein, or of the disclosure.
- the disclosure provides a UNG described herein, or of the disclosure.
- A nucleic acid programmable binding protein, for example, a nucleic acid programmable DNA binding protein (napDNAbp) such as Cas9, Cas12, or IscB, or a nucleic acid programmable RNA binding protein (napRNAbp) such as Cas13, is capable of binding to a target nucleic acid (e.g., dsDNA, mRNA) as guided by a guide nucleic acid (e.g., a guide RNA) comprising a guide sequence targeting the target nucleic acid.
- a target nucleic acid e.g., dsDNA, mRNA
- a guide nucleic acid e.g., a guide RNA
- the target nucleic acid is eukaryotic.
- the guide nucleic acid comprises a scaffold sequence responsible for forming a complex with the napBP, and a guide sequence that is intentionally designed to be responsible for hybridizing to a target sequence of the target nucleic acid, thereby guiding the complex comprising the napBP and the guide nucleic acid to the target nucleic acid.
- an exemplary target dsDNA is depicted to comprise a 5’ to 3’ single DNA strand and a 3’ to 5’ single DNA strand.
- An exemplary guide nucleic acid (e.g., a guide RNA) is depicted to comprise a guide sequence and a scaffold sequence.
- the guide sequence is designed to hybridize to a part of the 3’ to 5’ single DNA strand, and so the guide sequence “targets” that part.
- the 3’ to 5’ single DNA strand is referred to as a “target strand (TS) ” of the target dsDNA
- NTS nontarget strand
- target sequence That part of the target strand based on which the guide sequence is designed and to which the guide sequence may hybridize is referred to as a “target sequence”
- protospacer sequence the opposite part on the nontarget strand corresponding to that part is referred to as the “protospacer sequence” , which is 100% (fully) reversely complementary to the target sequence and is said to be “corresponding to” the target sequence in the disclosure.
- an exemplary target dsDNA is depicted to comprise a 5’ to 3’ single DNA strand and a 3’ to 5’ single DNA strand.
- an exemplary target RNA (transcript, e.g., a pre-mRNA) may be transcribed using the 3’ to 5’ single DNA strand as a synthesis template, and thus the 3’ to 5’ single DNA strand is referred to as a “template strand” or an “antisense strand” .
- the transcript so transcribed has the same primary sequence as the 5’ to 3’ single DNA strand except for the replacement of T with U, and thus the 5’ to 3’ single DNA strand is referred to as a “coding strand” or a “sense strand” .
- An exemplary guide nucleic acid (e.g., a guide RNA) is depicted to comprise a guide sequence and a scaffold sequence.
- the guide sequence is designed to hybridize to a part of the transcript (target RNA) , and so the guide sequence “targets” that part. And thus, that part of the target RNA based on which the guide sequence is designed and to which the guide sequence may hybridize is referred to as a “target sequence” .
- the guide sequence is 100% (fully) reversely complementary to the target sequence.
- the guide sequence is reversely complementary to the target sequence and contains a mismatch with the target sequence.
- nucleic acid sequence e.g., a DNA sequence, an RNA sequence
- a nucleic acid sequence is written in 5’ to 3’ direction /orientation unless explicitly indicated otherwise.
- a DNA sequence of ATGC is usually understood as 5’-ATGC-3’ unless otherwise indicated. Its reverse sequence is 5’-CGTA-3’. Its fully complementary sequence is 5’-TACG-3’. Its fully reverse complementary sequence is 5’-GCAT-3’. Note that the fully complementary sequence usually does not have the ability to base-pair /hybridize with the original sequence, because base-pairing occurs between antiparallel strands.
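The relationships among the four sequences in the ATGC example can be checked with a short Python sketch. The function names are illustrative only and are not part of the disclosure.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse(seq):
    """Return the sequence read in the opposite order."""
    return seq[::-1]

def complement(seq):
    """Return the base-by-base complement, keeping the written order."""
    return "".join(COMPLEMENT[base] for base in seq)

def reverse_complement(seq):
    """Return the complement read in the opposite order."""
    return complement(seq)[::-1]

seq = "ATGC"                    # understood as 5'-ATGC-3'
print(reverse(seq))             # CGTA
print(complement(seq))          # TACG
print(reverse_complement(seq))  # GCAT
```

Only the reverse complement, written 5’ to 3’, describes a strand that can pair with the original sequence in antiparallel orientation.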
- the double-strand sequence of a dsDNA may be represented with the sequence of its 5’ to 3’ single DNA strand conventionally written in 5’ to 3’ direction /orientation unless otherwise indicated.
- the dsDNA may be simply represented as 5’-ATGC-3’.
- either the 5’ to 3’ single DNA strand or the 3’ to 5’ single DNA strand of a dsDNA can be a nontarget strand from which a protospacer sequence is selected.
- the 5’ to 3’ single DNA strand is the sense strand of the gene
- the 3’ to 5’ single DNA strand is the antisense strand of the gene.
- the sense strand or the antisense strand of a gene can be a nontarget strand from which a protospacer sequence is selected.
- the transcript (target RNA) transcribed from the dsDNA then has a (target) sequence of 5’-AUGC-3’.
- the guide sequence of a guide nucleic acid is designed to have a sequence of 5’-AUGC-3’ that is fully reversely complementary to the 3’ to 5’ strand of the target dsDNA, which would be set forth as ATGC in the electronic sequence listing but marked as an RNA sequence; and in another embodiment, the guide sequence of a guide nucleic acid is designed to have a sequence of 5’-GCAU-3’ that is fully reversely complementary to the 5’ to 3’ strand of the target dsDNA, which would be set forth as GCAT in the electronic sequence listing but marked as an RNA sequence.
- the guide sequence of a guide nucleic acid is fully reversely complementary to the target sequence and the target sequence is fully reversely complementary to the protospacer sequence
- the guide sequence is identical to the protospacer sequence except for the U in the guide sequence due to its RNA nature and correspondingly the T in the protospacer sequence due to its DNA nature.
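The relationship among the protospacer sequence, the target sequence, and the guide sequence can be sketched in Python as follows. The helper names are hypothetical, and the four-base sequence is the toy example used in the text rather than a real protospacer.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement_dna(seq):
    """Reverse complement of a DNA sequence, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq))

def guide_from_protospacer(protospacer):
    """RNA guide with the same 5'->3' sequence as the protospacer, T written as U."""
    return protospacer.replace("T", "U")

protospacer = "ATGC"                           # nontarget strand, 5'->3'
target = reverse_complement_dna(protospacer)   # target strand, 5'->3'
guide = guide_from_protospacer(protospacer)    # RNA, 5'->3'

print(target)  # GCAT
print(guide)   # AUGC

# The guide, read back as DNA letters, is the reverse complement of the target.
assert reverse_complement_dna(guide.replace("U", "T")) == target
```

Because the guide and the protospacer differ only in U versus T, a sequence listing that uses one symbol for both bases can set forth the two with the same letters.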
- symbol “t” is used to denote both T in DNA and U in RNA (See “Table 1: List of nucleotides symbols” , the definition of symbol “t” is “thymine in DNA/uracil in RNA (t/u) ” ) .
- such a guide sequence could be set forth in the same sequence as a corresponding protospacer sequence.
- a single SEQ ID NO in the electronic sequence listing can be used to denote both such guide sequence and protospacer sequence, regardless whether such a single SEQ ID NO is marked as DNA or RNA in the electronic sequence listing.
- when a reference is made to such a SEQ ID NO that sets forth a protospacer /guide sequence, it refers to either a protospacer sequence that is a DNA sequence or a guide sequence that is an RNA sequence depending on the context, no matter whether it is marked as a DNA or an RNA in the electronic sequence listing.
- the guide sequence of a guide nucleic acid is designed to have a sequence of 5’-GCAU-3’ that is fully reversely complementary to the (target) sequence of the target RNA, which would be set forth as GCAT in the electronic sequence listing but marked as an RNA sequence.
- RNA sequence As used herein, if a DNA sequence, for example, 5’-ATGC-3’ is transcribed to an RNA sequence, with each dT (deoxythymidine, or “T” for short) in the primary sequence replaced with a U (uridine) and other dA (deoxyadenosine, or “A” for short) , dG (deoxyguanosine, or “G” for short) , and dC (deoxycytidine, or “C” for short) replaced with A (adenosine) , G (guanosine) , and C (cytidine) , respectively, for example, 5’-AUGC-3’ , it is said in the disclosure that the DNA sequence “encodes” the RNA sequence.
- the term “activity” refers to a biological activity.
- the activity includes enzymatic activity, e.g., catalytic ability of an effector.
- the activity can include nuclease activity, e.g., dsDNA endonuclease activity, RNA endonuclease activity.
- nucleic acid programmable binding protein napBP
- nucleic acid programmable binding domain napBD
- a programmable nucleic acid e.g., DNA or RNA
- gRNA guide nucleic acid
- the napBP may be indirectly associated with (e.g., bound to) the target nucleic acid via the interaction (e.g., binding) between the napBP and the programmable nucleic acid (e.g., scaffold sequence of the programmable nucleic acid) and the interaction (e.g., hybridization) between the programmable nucleic acid (e.g., the guide sequence of the programmable nucleic acid) and the target nucleic acid (e.g., the target sequence of the target nucleic acid) .
- the napBP is a nucleic acid programmable DNA binding protein (napDNAbp) .
- the napBP is a nucleic acid programmable RNA binding protein (napRNAbp) .
- the term “complex” refers to a grouping of two or more molecules.
- the complex comprises a polypeptide and a nucleic acid interacting with (e.g., binding to, coming into contact with, adhering to) one another.
- the term “complex” can refer to a grouping of a guide nucleic acid and a polypeptide (e.g., a napBP) .
- the term “complex” can refer to a grouping of a guide nucleic acid, a polypeptide (e.g., a napBP) , and a target nucleic acid.
- the term “protospacer adjacent motif” or “PAM” refers to a short DNA sequence (or a DNA motif) adjacent to a protospacer sequence on the nontarget strand of a dsDNA.
- adjacent includes instances wherein there is no nucleotide between the protospacer sequence and the PAM and also instances wherein there are a small number (e.g., 1, 2, 3, 4, or 5) of nucleotides between the protospacer sequence and the PAM.
- A “immediately adjacent (to) ” B, A “immediately 5’ to” B, and A “immediately 3’ to” B mean that there is no nucleotide between A and B.
- the PAM is immediately 5’ to a protospacer sequence. In some embodiments, the PAM is immediately 3’ to a protospacer sequence.
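As an illustration of the case in which the PAM is immediately 3’ to the protospacer sequence, the following Python sketch scans one strand for candidate protospacers. The function name, the default protospacer length of 20, and the NGG motif are assumptions chosen for illustration (NGG is the motif commonly associated with SpCas9); the disclosure does not fix these values in the passage above.

```python
import re

def find_protospacers(strand, protospacer_len=20, pam_pattern="[ACGT]GG"):
    """Return (start, protospacer, pam) for each PAM immediately 3' of a protospacer.

    `strand` is the nontarget strand written 5'->3'. `pam_pattern` is a
    regular expression for the PAM (default NGG, an illustrative assumption).
    """
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(
        "(?=([ACGT]{" + str(protospacer_len) + "})(" + pam_pattern + "))"
    )
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(strand)]

# Toy strand with an 8-nt protospacer followed directly by TGG.
print(find_protospacers("ACGTACGTTGGA", protospacer_len=8))
# [(0, 'ACGTACGT', 'TGG')]
```

A PAM located 5’ of the protospacer, or separated from it by a few nucleotides, would need a different pattern; this sketch covers only the immediately-3’ case.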
- the term “guide nucleic acid” refers to any nucleic acid that facilitates the targeting of a napBP to a target nucleic acid.
- the guide nucleic acid may be designed to include a guide sequence capable of hybridizing to a specific sequence of a target nucleic acid, and the guide nucleic acid may also comprise a scaffold sequence facilitating the guiding of a napBP to the target nucleic acid.
- the guide nucleic acid is a guide RNA.
- the guide nucleic acid is a nucleic acid encoding a guide RNA.
- nucleic acid As used herein, the terms “nucleic acid” , “polynucleotide” , and “nucleotide sequence” are used interchangeably to refer to a polymeric form of nucleotides of any length, including deoxyribonucleotides, ribonucleotides, combinations thereof, and analogs or modifications thereof.
- guide RNA is used interchangeably with the term “CRISPR RNA (crRNA) ” , “single guide RNA (sgRNA) ” , or “RNA guide”
- guide sequence is used interchangeably with the term “spacer sequence”
- scaffold sequence is used interchangeably with the term “direct repeat sequence” .
- the guide sequence is so designed to be capable of hybridizing to a target sequence.
- the term “hybridize” , “hybridizing” , or “hybridization” refers to a reaction in which one or more polynucleotide sequences react to form a complex that is stabilized via hydrogen bonding between the bases of the polynucleotide sequences. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence specific manner.
- a polynucleotide sequence capable of hybridizing to a given polynucleotide sequence is referred to as the “complement” of the given polynucleotide sequence.
- the hybridization of a guide sequence and a target sequence is so stabilized to permit an effector polypeptide (e.g., a napBP) that is complexed with a nucleic acid comprising the guide sequence or a function domain associated (e.g., fused) with the effector polypeptide to act (e.g., cleave, deaminize) on the target sequence or its complement or nearby sequence.
- the guide sequence is reversely complementary to a target sequence.
- reverse complementary refers to the ability of nucleobases of a first polynucleotide sequence, such as a guide sequence, to base pair with nucleobases of a second polynucleotide sequence, such as a target sequence, by traditional Watson-Crick base-pairing. Two reverse complementary polynucleotide sequences are able to non-covalently bind under appropriate temperature and solution ionic strength conditions.
- a first polynucleotide sequence (e.g., a guide sequence) comprises 100% (fully) reverse complementarity to a second nucleic acid (e.g., a target sequence) .
- a first polynucleotide sequence (e.g., a guide sequence) is reverse complementary to a second polynucleotide sequence (e.g., a target sequence) if the first polynucleotide sequence comprises at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementarity to the second nucleic acid (i.e., at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or
- the term “substantially complementary” refers to a first polynucleotide sequence (e.g., a guide sequence) that has a certain level of complementarity to a second polynucleotide sequence (e.g., a target sequence) (e.g., at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the nucleotides of the first polynucleotide sequence can base-pair with the nucleotides of the second polynucleotide sequence, or at most 1, 2, 3, 4, or 5 contiguous or non-contiguous nucleotides of the first polynucleotide sequence mismatch the nucleotides of the second polynucleotide sequence) .
- the level of complementarity is such that the first polynucleotide sequence (e.g., a guide sequence) can hybridize to the second polynucleotide sequence (e.g., a target sequence) with sufficient affinity to permit an effector polypeptide (e.g., a napBP) that is complexed with a nucleic acid comprising the first polynucleotide sequence or a functional domain associated (e.g., fused) with the effector polypeptide to act (e.g., cleave, deaminate) on the target sequence or its complement or nearby sequence.
- a guide sequence that is substantially complementary to a target sequence has less than 100% complementarity to the target sequence.
- a guide sequence that is substantially complementary to a target sequence has at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementarity to the target sequence, and/or has at most 1, 2, 3, 4, or 5 contiguous or non-contiguous nucleotide mismatches from the target sequence.
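As an illustration of the percentages and mismatch counts above, the following Python sketch scores an ungapped, equal-length guide/target pair. The function names, the equal-length requirement, and the default thresholds (80% and 5 mismatches, taken from the ends of the ranges listed above) are illustrative assumptions rather than part of the disclosure; RNA U is treated as T.

```python
# Hypothetical helper names; thresholds are parameters, not fixed by the text.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _as_dna(seq):
    """Upper-case the sequence and treat RNA U as T."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Return the reverse complement of a DNA/RNA sequence (as DNA letters)."""
    return _as_dna(seq).translate(_COMPLEMENT)[::-1]


def complementarity(guide, target):
    """Return (percent complementarity, mismatch count) for an ungapped pair.

    Both sequences are written 5'->3' and must have equal length.
    """
    g = _as_dna(guide)
    if len(g) != len(target):
        raise ValueError("this sketch only handles equal-length sequences")
    perfect_guide = reverse_complement(target)
    mismatches = sum(1 for a, b in zip(g, perfect_guide) if a != b)
    percent = 100.0 * (len(g) - mismatches) / len(g)
    return percent, mismatches


def is_substantially_complementary(guide, target,
                                   min_percent=80.0, max_mismatches=5):
    """Apply the 'at least X% and/or at most N mismatches' style of test."""
    percent, mismatches = complementarity(guide, target)
    return percent >= min_percent or mismatches <= max_mismatches


if __name__ == "__main__":
    target = "ACGTACGTAC"
    print(complementarity("GTACGTACGT", target))  # fully reverse complementary
    print(complementarity("GTACGTACGA", target))  # one mismatch at the 3' end
```

A real guide design tool would also weigh mismatch position and allow gaps; this sketch only counts matched positions.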
- sequence identity is related to sequence homology. Homology comparisons may be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs may calculate percentage sequence identity (%) between two or more sequences (polypeptide or polynucleotide sequences). Sequence homologies may be generated by any of a number of computer programs known in the art, for example, BLAST, FASTA. A suitable computer program for carrying out such an alignment is the GCG Wisconsin Bestfit package (University of Wisconsin, U.S.A; Devereux et al., 1984, Nucleic Acids Research 12: 387).
- Examples of other software that may perform sequence comparisons include, but are not limited to, the BLAST package (see Ausubel et al., 1999 ibid-Chapter 18), FASTA (Altschul et al., 1990, J. Mol. Biol., 403-410), and the GENEWORKS suite of comparison tools. Both BLAST and FASTA are available for offline and online searching (see Ausubel et al., 1999 ibid, pages 7-58 to 7-60).
- a commonly used online tool to calculate percentage sequence identity between two or more sequences is available on the website of EMBL's European Bioinformatics Institute (www dot ebi dot ac dot uk slash jdispatcher slash) , allowing fast online calculation of percentage sequence identity by global alignment or local alignment.
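To make the notion of percentage sequence identity concrete, the sketch below computes it for two sequences that have already been aligned (gaps written as "-"). It is not the algorithm of BLAST, FASTA, or Bestfit; the function name is hypothetical, and the denominator (all alignment columns except columns that are gaps in both sequences) is one assumed convention among several in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: identity = identical non-gap columns / alignment columns,
    ignoring columns where both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Six columns, four identical: A/A, C/C, T/T, A/A.
    print(round(percent_identity("ACGT-A", "ACCTTA"), 2))
```

Tools differ on whether gap columns count toward the denominator, so percentages from different programs are not always directly comparable.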
- “polypeptide” and “peptide” are used interchangeably herein to refer to polymers of amino acids of any length.
- a protein may have one or more polypeptides.
- An amino acid polymer can also be modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component.
- a “variant” is interpreted to mean a polynucleotide or polypeptide that differs from a reference polynucleotide or polypeptide, respectively, but retains essential properties, e.g., binding property of a napBP.
- a typical variant of a polynucleotide differs in nucleic acid sequence from another reference polynucleotide.
- a change in the nucleic acid sequence of the polynucleotide variant may or may not alter the amino acid sequence of a polypeptide encoded by the reference polynucleotide.
- a change in the nucleic acid sequence of the polynucleotide variant may result in an amino acid substitution, addition, and/or deletion in the polypeptide encoded by the reference polynucleotide.
- a typical variant of a polypeptide differs in amino acid sequence from another reference polypeptide. Generally, the difference is limited so that the sequences of the reference polypeptide and the polypeptide variant are closely similar overall and, in many regions, identical.
- the polypeptide variant and reference polypeptide may differ in amino acid sequence by one or more substitutions, additions, and/or deletions in any combination.
- a variant of a polynucleotide or polypeptide may be naturally occurring, such as, an allelic variant, or it may be a variant that is not known to occur naturally. Non-naturally occurring variants of polynucleotides and polypeptides may be made by mutagenesis techniques, by direct synthesis, and by other recombinant methods known to skilled artisans.
- the terms “upstream” and “downstream” refer to the relative positions of two or more elements within a nucleic acid in the 5’ to 3’ direction.
- a first sequence is upstream of a second sequence when the 3’ end of the first sequence is present at the left side of the 5’ end of the second sequence.
- a first sequence is downstream of a second sequence when the 5’ end of the first sequence is present at the right side of the 3’ end of the second sequence.
- the PAM is upstream of a napBP-induced indel, and a napBP-induced indel is downstream of the PAM.
- the PAM is downstream of a napBP-induced indel, and a napBP-induced indel is upstream of the PAM.
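The upstream/downstream definitions above can be expressed with coordinates. The sketch below assumes both elements lie on the same strand, written 5’ to 3’ from left to right, and represents each element as a hypothetical 0-based, half-open `(start, end)` interval; none of these names come from the disclosure.

```python
from typing import Tuple

# (start, end) on a strand written 5'->3'; 0-based, end exclusive (assumed).
Interval = Tuple[int, int]


def is_upstream(first: Interval, second: Interval) -> bool:
    """True when the 3' end of `first` lies left of the 5' end of `second`."""
    return first[1] <= second[0]


def is_downstream(first: Interval, second: Interval) -> bool:
    """True when the 5' end of `first` lies right of the 3' end of `second`."""
    return first[0] >= second[1]


if __name__ == "__main__":
    protospacer = (0, 20)  # positions 1-20
    pam = (20, 23)         # positions 21-23
    print(is_upstream(protospacer, pam), is_downstream(pam, protospacer))
```

Overlapping elements are neither upstream nor downstream of each other under this convention.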
- wild type has the meaning commonly understood by those skilled in the art to mean a typical form of an organism, a strain, a gene, or a feature that distinguishes it from a mutant or variant when it exists in nature. It can be isolated from sources in nature and not intentionally modified.
- As used herein, the terms “non-naturally occurring” and “engineered” are used interchangeably and refer to artificial participation. When these terms are used to describe a nucleic acid or a polypeptide, it is meant that the nucleic acid or polypeptide is at least substantially freed from at least one other component of its association in nature or as found in nature.
- regulatory element is intended to include promoters, enhancers, internal ribosome entry sites (IRES) , and other expression control elements (e.g., transcription termination signals, such as, polyadenylation signals and poly-U sequences) .
- Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cells and those that direct expression of a nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences) .
- Regulatory elements may also direct expression in a time-dependent manner, e.g., in a cell cycle-dependent or developmental stage-dependent manner, which may or may not be tissue or cell type specific.
- the term “cell” is understood to refer not only to a particular individual cell, but to the progeny or potential progeny of the cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term.
- “in vivo” refers to inside the body of an organism.
- “ex vivo” or “in vitro” means outside the body of an organism.
- the term “treat” , “treatment” , or “treating” is an approach for obtaining beneficial or desired results including clinical results.
- the beneficial or desired clinical results include, but are not limited to, one or more of the following: alleviating one or more symptoms resulting from a disease, diminishing the extent of a disease, stabilizing a disease (e.g., delaying the worsening of a disease), delaying the spread (e.g., metastasis) of a disease, delaying the recurrence of a disease, reducing the recurrence rate of a disease, delaying or slowing the progression of a disease, ameliorating a disease state, providing a remission (partial or total) of a disease, decreasing the dose of one or more other medications required to treat a disease, delaying the progression of a disease, increasing the quality of life, and/or prolonging survival.
- treatment is a reduction of pathological consequences of a disease (such as cancer).
- disease includes the terms “disorder” and “condition” and is not limited to those that have been specifically medically defined.
- transcript includes any transcription product by transcription from a DNA, including subgenomic RNA, mRNA, non-coding RNA, and any variants, derivatives, or ancestors thereof, for example, pre-mRNA, and any transcripts or isoforms produced from the DNA or the pre-mRNA by, e.g., alternative promoter usage, alternative splicing, alternative initiation, and any naturally occurring variants thereof or processed products therefrom.
- reference to “not” a value or parameter generally means and describes “other than” a value or parameter.
- for example, “the method is not used to treat cancer of type X” means the method may be used to treat cancer of types other than X.
- the term “and/or” in a phrase such as “A and/or B” is intended to include both A and B; A or B; A (alone) ; and B (alone) .
- the term “and/or” in a phrase such as “A, B, and/or C” is intended to encompass each of the following embodiments: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone) ; B (alone) ; and C (alone) .
- FIG. 1 Design and mechanisms of base editing by conventional base editors and glycosylase-based base editors of the disclosure.
- FIG. 1a Schematic diagrams of ABE (left) and CBE (middle) and the deaminase-free glycosylase-based guanine base editor (gGBE, right) of the disclosure.
- an nCas9-sgRNA complex creates an R-loop at the target site in the DNA.
- the evolved adenine deaminase (tRNA adenosine deaminase, TadA) and AID/APOBEC-like cytidine deaminase convert the exposed adenine (A) into deoxyinosine (I) and cytosine (C) into deoxyuridine (U), respectively.
- an additional linked protein, uracil glycosylase inhibitor (UGI), protects U from uracil DNA N-glycosylase (UNG). After deamination, the resulting I is recognized as G, and U as T, by DNA polymerase during DNA repair or replication.
- gGBE glycosylase-based guanine base editor
- PAM protospacer adjacent motif
- AP apurinic/apyrimidinic sites.
- FIG. 1b A screening reporter system for detecting G-to-T conversion by gGBE.
- P2A 2A peptide from porcine teschovirus-1.
- WT wild-type.
- FIG. 2 Mutagenesis of the MPG moiety in gGBEs.
- FIG. 2a Schematic diagram of mutagenesis and screening strategy for engineered gGBE. The EGFP reporter plasmids were transiently co-transfected into cultured cells along with the gGBE expression plasmids.
- FIG. 2b Genotypes of a subset of engineered gGBEs, with percentage of EGFP+ cells for each gGBE in the far-right column (more engineered gGBEs listed in FIG. 13). Different steps of mutagenesis are marked by different shaded colors.
- FIG. 3 Characterization of editing profiles of gGBE via target deep sequencing.
- G# G position with highest on-target base editing frequencies across protospacer positions 1–20.
- site # genomic site number.
- FIG. 3b The ratio of G-to-C/T to G-to-A/C/T conversion frequency by gGBEv6.3 editing at the sites shown in FIG. 3a.
- FIG. 3c Frequencies of G conversions by gGBEv6.3 across protospacer positions 1–20 at the edited sites in FIG. 3a (in which PAM was at positions 21–23).
- Single dots represent individual data points from 3 independent replicates per site. Boxes span the interquartile range (25th to 75th percentile); the horizontal line in the box indicates the median (50th percentile); and small horizontal bars mark the minimal and maximal values.
- OT off-target.
- FIG. 4 Gene editing applications of gGBE.
- FIG. 4a Application of gGBEv6.3 for editing splicing sites, introduction of premature termination codons (PTCs) , as well as editing that bypasses PTCs.
- FIG. 4b Schematic diagram illustrating gGBE-induced skipping of DMD exon 45.
- FIG. 4d Schematic diagram illustrating the introduction of PTCs in the mouse Tyr gene by gGBE.
- FIG. 4g Phenotype of F0 mice generated by gGBE editing in mouse zygotes. The image shows edited P6 mice. Red arrowhead, albino; blue arrowhead, mosaic pigmentation.
- FIG. 4h Bar plots showing the on-target G editing frequencies for individual mouse pups, with gGBEv6.3 targeting Tyr site 3.
- FIG. 4i Genotyping of representative F0 pups from (FIG. 4h) . The frequencies of mutant alleles were determined by high-throughput sequencing. Red arrowhead, albino pups.
- FIG. 5 illustrates example nucleotide conversion by base excision and translesion synthesis (TLS) .
- FIG. 6 illustrates an exemplary target dsDNA containing a first exemplary deoxyribonucleotide dG, an exemplary guide nucleic acid, and an exemplary napDNAbp before base editing.
- FIG. 7 illustrates an exemplary target dsDNA containing a fourth exemplary deoxyribonucleotide dC, an exemplary guide nucleic acid, and an exemplary napDNAbp after base editing.
- FIG. 8 illustrates an exemplary target dsDNA containing a fourth exemplary deoxyribonucleotide dT, an exemplary guide nucleic acid, and an exemplary napDNAbp after base editing.
- FIG. 9a Schematic diagram of AYBE.
- MPG N-methylpurine DNA glycosylase
- BER base excision repair
- FIG. 9b Schematic diagram of CGBE.
- Uracil DNA N-glycosylase excises the uridine (U) resulting from deamination of cytosine (C) by the AID/APOBEC-like cytidine deaminase, triggering base excision repair (BER) pathway in cells, thus causing dominant C-to-G editing.
- PAM Protospacer adjacent motif.
- AP apurinic/apyrimidinic sites.
- FIG. 10 Characterization of A-to-T and G-to-T editing with an intron-split EGFP reporter system.
- FIG. 10a Design of the reporter for A-to-T or G-to-T editing detection. P2A, 2A peptide from the porcine teschovirus-1.
- FIG. 10b Percentage of EGFP+ cells for evaluation of A editing efficiency by gABE with various MPG.
- FIG. 10c Percentage of EGFP+ cells representing the efficiency of G-to-T conversion for gGBE containing various MPG and sgRNA.
- FIG. 10d Representative flow cytometry scatter plots showing gating strategy and the percentages of EGFP+ cells for gGBEv3.
- FIG. 11 View of MPG structure and the first round of mutagenesis of MPG.
- FIG. 11a Structures for aa 78-298 region (left) and 163-179 region (right) of human MPG protein (shown in gray), as predicted by AlphaFold (alphafold. com/entry/P29372) aligned with the crystal structure of MPG (PDB entry 1ewn, not shown), in which εA was mutated to G in the DNA.
- FIG. 12a-d Percentage of EGFP+ cells of gGBEs with various MPG mutants from sequential substitutions of glutamic acid (FIG. 12a), valine (FIG. 12b), glycine (FIG. 12c), and tyrosine (FIG. 12d) (X-to-E, V, G, or Y).
- n = 3. All values are presented as mean ± s.e.m.
- FIG. 13 Progressive engineering and G editing efficiency of gGBEs.
- FIG. 13a Progressive mutations of gGBEs. Different rounds of mutations are marked with different color shades.
- FIG. 14a-c Frequencies of C (FIG. 14a) , T (FIG. 14b) and A (FIG. 14c) conversions by gGBEv6.3 across the protospacer positions 1–20 (where PAM is at positions 21–23) from the edited sites in FIG. 3a.
- FIG. 14d Frequencies of G-to-T and G-to-C editing by gGBEv6.3.
- FIG. 14j The statistical analysis of on-target DNA base editing for each NG motif from the edited sites in (FIG. 14i) . Each dot represents the mean of three biological replicates for each edited position at various edited sites.
- FIG. 15 The guide sequence-dependent and guide sequence-independent off-target analysis.
- OT off-target.
- Data for AYBE and ABE8e were adopted from Tong et al. [1]. All values are presented as mean ± s.e.m.
- FIG. 16 The percentage of G-to-C and G-to-T among all G-to-C/T/A conversion events at DMD or Tyr sites targeted by gGBEv6.3.
- FIG. 16a Percentages of G-to-C and G-to-T editing events in HEK293T cells with gGBEv6.3 at two DMD sites (corresponding to FIG. 4c).
- n = 3.
- SA splicing acceptor site.
- PAM Protospacer adjacent motif.
- FIG. 16b Percentages of G-to-C and G-to-T editing events in N2a cells with gGBEv6.3 at three Tyr sites (corresponding to FIG. 4e) .
- n = 3. All values are presented as mean ± s.e.m.
- FIG. 17 G editing in mouse embryos with gGBEv6.3.
- FIG. 17e Bar plots showing the on-target G editing frequencies for individual mouse embryos, with gGBEv6.3 targeting Tyr site 1, Tyr site 2, and Tyr site 3.
- FIG. 18 Phenotypes and genotyping of F0 mouse pups.
- FIG. 18a-c Phenotypes of F0 mice generated by microinjection of gGBEv6.3 encoding mRNA and sgRNA for targeting Tyr site 1 (FIG. 18a) , site 2 (FIG. 18b) and site 3 (FIG. 18c) . The images were obtained for P6 mice. Arrowheads: red, albino mice; blue, mice with mosaic pigmentation.
- FIG. 18d Bar plots showing the on-target G editing frequencies for individual mouse pups, with gGBEv6.3 targeting Tyr site 1 and Tyr site 2.
- FIG. 19 Design and mechanisms of two orthogonal glycosylase-based base editors.
- FIG. 19a Prototype versions of a deaminase-free glycosylase-based thymine base editor (gTBE) and a deaminase-free glycosylase-based cytosine base editor (gCBE) .
- PAM Protospacer adjacent motif.
- AP apurinic/apyrimidinic sites.
- Star (*) in magenta indicates the nick generated by nCas9.
- FIG. 19b Schematic diagram of potential pathway for T (or C) editing and outcomes.
- a glycosylase mutant is designed to remove normal T or C; an nCas9-sgRNA complex creates an R-loop at the target site and nicks the non-edited strand; the AP site generated is then repaired by translesion synthesis (TLS) and/or DNA replication, leading to T or C editing.
- TLS translesion synthesis
- DSB double-strand break. indel, insertion and deletion.
- FIG. 19c Schematic of various gTBE and gCBE candidate architectures. Note that Y156A and N213D of UNG2 are equivalent to Y147A and N204D of UNG1, respectively.
- T target sgRNA.
- FIG. 20 Protein engineering and evolution of gTBEs.
- FIG. 20a Schematic diagram of mutagenesis and screening strategy for engineering gTBE.
- the EGFP reporter plasmids were transiently co-transfected into cultured cells along with the gTBE plasmids, and the fluorescence intensity of EGFP was detected with flow cytometry.
- FIG. 20b Left, the selected residues (shown as surface) for mutagenesis near the catalytic site pocket of human UNG-DNA complex (PDB entry 1EMH 24), in which dψU was mutated to T in the DNA (dT).
- Right, location of the effective residues in gTBEv3, shown as red spheres on the three-dimensional structure.
- WT wild-type UNG2Δ88. dead, catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1) 60.
- FIG. 21 Characterization of editing profiles of gTBE via target deep sequencing.
- T# T position with highest on-target base editing frequencies across protospacer positions 1–20.
- site # genomic site number.
- FIG. 21b The ratio of T-to-C/G to T-to-A/C/G conversion frequency by gTBEv3 editing at the sites shown in FIG. 21a.
- OT off-target.
- FIG. 22 Enhancement of gCBE editing efficiency through protein engineering.
- FIG. 22a Schematic diagram of mutagenesis and screening strategy for engineering gCBE.
- WT wild-type UNG2Δ88. dead, catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1) 60.
- C# C position with highest on-target base editing frequencies across protospacer positions 1-20.
- FIG. 22f gRNA-independent cumulative off-target editing frequencies detected by the orthogonal R-loop assay at each R-loop site.
- FIG. 23 Gene editing applications of gTBE and gCBE.
- FIG. 23a principle for exon skipping with base editors.
- FIG. 23b Bar plots showing the numbers of sgRNA candidates targeting the splicing sites in 16 genes by different base editors.
- the 16 genes are AGT, ANGPTL3, APOC3, B2M, CD33, DMD, DNMT3A, HPD, KLKB1, PCSK9, PDCD1, PRDM1, TGFBR2, TRAC, TTR, and VEGFA.
- FIG. 23c Venn diagram showing the distribution of sgRNAs for 4 base editors in FIG. 23b.
- FIG. 23d Schematic diagram illustrating sgRNA candidates specifically targeting SD or SA sites in human DMD with gTBEv3 (red lines) or gCBEv2 (black lines) , but not ABE or CBE.
- FIG. 23e Schematic diagram illustrating the skipping of human DMD exon 45 induced by gTBE-induced disruption of the splicing donor site.
- FIG. 23g DNA sequencing chromatograms from wild-type (WT) and representative embryos co-injected with gTBEv3 mRNA and sgRNA targeting the SD site of human DMD exon 45.
- WT wild-type
- FIG. 24 Comparison of different gTBEs.
- FIG. 24a the strategies for protein engineering and screening used in three studies.
- FIG. 24b Schematic of the basic architectures for various base editors. UNG2*, UNG2 mutant from the corresponding base editor. ΔNTD, deletion of the N-terminal domain.
- FIG. 25 Characteristic sequences and motifs of human UNG1 and UNG2.
- FIG. 25a UNG1-specific N-terminal residues (amino acid 1-35) are marked in grey.
- UNG2-specific N-terminal residues (amino acid 1-44) are light blue.
- the common RPA-binding site (yellow) and the globular catalytic domain (light green) are indicated.
- RPA Replication protein A.
- UNGs contain five conserved motifs numbered from UNG2 as follows: the catalytic water-activating loop (152-GQDPYH-157) ; the proline (Pro) -rich loop compressing the DNA backbone 5’ to the lesion (174-PPPPS-178) ; the uracil-binding motif (210-LLLN-213) ; the glycine-serine (Gly-Ser) loop that compresses the DNA backbone 3’ to the lesion (255-GS-256) ; and the leucine (Leu) -intercalation loop penetrating the minor groove (277-HPSPLS-282) .
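The legends above give several paired residue numbers (Y156A/N213D and D154N/H277N in UNG2 correspond to Y147A/N204D and D145N/H268N in UNG1), which implies a constant offset of 9 between the two numbering schemes in the region the isoforms share. The following Python sketch converts a UNG2-numbered substitution to UNG1 numbering under that inferred assumption; the function name and the guard on isoform-specific N-terminal residues (UNG2 amino acids 1-44) are illustrative and not taken from the disclosure.

```python
import re

# Offset inferred from the residue pairs quoted in the figure legends.
UNG2_MINUS_UNG1 = 9
# UNG2-specific N-terminal residues (amino acids 1-44) have no UNG1 equivalent.
UNG2_SPECIFIC_NTERM_END = 44

_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def ung2_to_ung1(substitution):
    """Convert e.g. 'Y156A' (UNG2 numbering) to 'Y147A' (UNG1 numbering)."""
    match = _SUBSTITUTION.fullmatch(substitution)
    if match is None:
        raise ValueError(f"not a simple substitution: {substitution!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if pos <= UNG2_SPECIFIC_NTERM_END:
        raise ValueError("position lies in the UNG2-specific N-terminus")
    return f"{ref}{pos - UNG2_MINUS_UNG1}{alt}"


if __name__ == "__main__":
    for sub in ("Y156A", "N213D", "D154N", "H277N"):
        print(sub, "->", ung2_to_ung1(sub))
```

The same offset would map the motif positions listed above (e.g., 152-157 in UNG2) onto UNG1 numbering, provided the shared-region assumption holds.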
- FIG. 26 Characterizations of T-to-G and C-to-G reporter system.
- FIG. 26a Schematic construct designs of the reporter for T-to-G or C-to-G editing detection.
- PAM Protospacer adjacent motif.
- FIG. 26b Representative flow cytometry scatter plots showing gating strategy and the percentages of gated cells for the negative control (upper panel) and gCBEv0.3 (lower panel) .
- FIG. 27 Editing efficiency of gTBE and gCBE candidates with various UNG-NTD truncations.
- WT wild-type UNG2Δ88. dead, catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1). All values are presented as mean ± s.e.m.
- FIG. 28 Performance of UNG mutants in the background of gTBEv0.3.
- FIG. 29 Performance of UNG mutants in the background of gTBEv2.
- dead catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1). All values are presented as mean ± s.e.m.
- FIG. 30 Further characterization of editing profiles of gTBEv3.
- FIG. 30b-e Frequencies of T (FIG. 30b) , G (FIG. 30c) , C (FIG. 30d) and A (FIG. 30e) conversions by gTBEv3 across the protospacer positions 1-20 (where PAM is at positions 21–23) from the edited sites in FIG. 21a.
- FIG. 30f Frequencies of T-to-G and T-to-C editing by gTBEv3.
- T# T position with highest on-target base editing frequencies across protospacer positions 1–20.
- site # genomic site number.
- FIG. 30j The ratio of T-to-S to total T editing (base conversions and indels) by gTBEv3 editing at the sites shown in FIG. 21a.
- FIG. 31 Guide sequence-dependent off-target analysis for gTBEv3 at more sites.
- the guide sequence-dependent off-target analysis for gTBEv3 editing at site 9 (a) and site 15 (b) (n = 3).
- OT off-target. All values are presented as mean ± s.e.m.
- FIG. 32 Performance of UNG mutants in the background of gCBEv0.3.
- Replacement of alanine with valine (A-to-V) is intended to cover all the residues in the regions of interest.
- dead catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1). All values are presented as mean ± s.e.m.
- FIG. 33 Further characterization of editing profiles of gCBEv2.
- FIG. 33c-d Frequencies of C (FIG. 33c) and T (FIG.
- FIG. 33d The ratio of C-to-G/T to C-to-A/G/T conversion frequency by gCBEv2 editing at the sites shown in FIG. 22c.
- FIG. 33f-h Percentage of C-to-G (FIG. 33f) , C-to-T (FIG.
- C# C position with highest on-target base editing frequencies across protospacer positions 1-20.
- site # genomic site number.
- FIG. 33j The statistical analysis of on-target DNA base editing for each NC motif from the 16 edited sites. Each dot represents the mean of three biological replicates for each edited position at various edited sites.
- FIG. 34 Base editing at splicing sites with gTBEv3.
- FIG. 34a The optimal editing windows for various base editors.
- FIG. 34b Venn diagram showing the distribution of sgRNAs for CBE and gCBEv2 in FIG. 23b.
- T# or C# The position of targeted T or C across protospacer positions 1–20.
- FIG. 34d DNA sequencing chromatograms for targeting the SD site of human DMD exon 37 and exon 12 with gTBEv3. Sanger sequencing results were quantified by EditR.
- FIG. 35 PTCs editing and introduction for various base editors.
- FIG. 35a principle for bypassing premature termination codons (PTCs) with various base editors.
- FIG. 35b the possible codon outcomes from stop codons (TAA, TAG or TGA) editing with different base editors.
- FIG. 35c principle for introduction of PTCs with various base editors.
- FIG. 35d the available codons for editing into stop codons (TAA, TAG or TGA) with different base editors.
- FIG. 35e The 10 × 10 dot plot diagram showing the percentage of possible sgRNAs for introduction of premature termination codons (PTCs) by targeting different codons (with the number of available sgRNAs presented on the right) in 15 well-studied genes (AGT, ANGPTL3, APOC3, B2M, CD33, DNMT3A, HPD, KLKB1, PCSK9, PDCD1, PRDM1, TGFBR2, TRAC, TTR, VEGFA) for gene and cell therapy research with gGBEv6.3 and CBE.
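The question of which codons can be edited into stop codons (TAA, TAG or TGA) by a given base conversion can be explored by brute-force enumeration. The sketch below is not a reproduction of FIG. 35; it assumes single-base edits only, assumes that an editor may act on either DNA strand (so each conversion is paired with its complement on the coding strand), and uses hypothetical function names.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def coding_strand_changes(conversions):
    """Expand editor conversions to coding-strand changes for either strand.

    Example: C-to-T on the opposite strand reads as G-to-A on the coding strand.
    """
    changes = set()
    for src, dst in conversions:
        changes.add((src, dst))
        changes.add((COMPLEMENT[src], COMPLEMENT[dst]))
    return changes


def codons_editable_to_stop(conversions):
    """Map each sense codon to the stop codons reachable by one base edit."""
    changes = coding_strand_changes(conversions)
    reachable = {}
    for codon in ("".join(bases) for bases in product("ACGT", repeat=3)):
        if codon in STOP_CODONS:
            continue
        for index, base in enumerate(codon):
            for src, dst in changes:
                if base != src:
                    continue
                edited = codon[:index] + dst + codon[index + 1:]
                if edited in STOP_CODONS:
                    reachable.setdefault(codon, set()).add(edited)
    return reachable


if __name__ == "__main__":
    # C-to-T (CBE-like) versus G-to-T (one of the gGBE outcomes named above).
    print(sorted(codons_editable_to_stop({("C", "T")})))
    print(sorted(codons_editable_to_stop({("G", "T")})))
```

Whether a listed codon is actually targetable at a given site still depends on PAM availability and the editing window, which this enumeration ignores.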
- FIG. 36 Additional comparison of different gTBEs.
- FIG. 36f-h The statistical analysis of T base editing (FIG.
- FIG. 36a-h the graphs were derived from the data for various base editors shown in FIG. 24c. Dunnett’s multiple comparisons test after one-way ANOVA was used to compare the gTBEv3 or gTBEv5 with other base editors in FIG. 36f-h.
- FIG. 37 T editing in the dsDNA upstream from the target site.
- FIG. 38 Comparison of various glycosylase-based base editors for cytosine editing.
- FIG. 38a Schematic of the basic architectures for various base editors. UNG2*, UNG2 mutant from the corresponding base editor. ΔNTD, deletion of the N-terminal domain.
- FIG. 38b-c The frequencies of total C editing (base conversions and indels, FIG. 38b) or C base conversions (FIG. 38c) for various base editors at 19 endogenous loci.
- the cytosines with editing frequencies >25% for any base editor are shown.
- FIG. 39 Additional comparison of various glycosylase-based base editors for cytosine editing.
- FIG. 39c Frequencies of C base conversions by various base editors across the protospacer positions 1-20 (where PAM is at positions 21–23) .
- the graphs were derived from the data for various base editors shown in FIG. 38c.
- FIG. 40 Off-target analysis of various glycosylase-based base editors.
- OT off-target.
- FIG. 41 Comparison between gTBEs or gCBEs and PEs.
- OT off-target.
- the PE6d was used together with epegRNA and nick sgRNA. For PE6d max, PE6d was co-expressed with the codon-optimized hMLH1dn, a dominant negative MMR protein. All values are presented as mean ± s.e.m.
- FIG. 42 Characterization of editing profiles for gTBEs or gCBEs in HEK293T, HuH-7, and U2OS cells.
- HuH-7 a cell line established from a human hepatocellular carcinoma
- U2OS a cell line established from a human bone osteosarcoma. All values are presented as mean ± s.e.m.
- dG, dT, dC target deoxyribonucleotide
- ABE and CBE require a deaminase for the base editing of A and C.
- the disclosure provides, at least in part, base editors and base editing methods capable of base editing a target deoxyribonucleotide (e.g., dG, dT, dC) in a target dsDNA in the absence of a deaminase.
- the base editors and base editing methods of the disclosure rely on the base excision domain of the base editor.
- the base excision domain is capable of directly excising the base of a target deoxyribonucleotide in a target dsDNA to generate an abasic site in situ, triggering a base excision repair (BER) pathway.
- BER base excision repair
- the target deoxyribonucleotide may be converted to another deoxyribonucleotide, leading to base editing of the target deoxyribonucleotide.
- the base editors and base editing methods of the disclosure also rely on a nucleic acid programmable DNA binding domain (napDNAbd) to specifically direct the base editor to a target dsDNA via a guide nucleic acid capable of interacting with both the napDNAbd and the target dsDNA.
- the napDNAbd may be associated (e.g., complexed) with a guide nucleic acid (e.g., a guide RNA).
- the guide nucleic acid is designed to localize or target the napDNAbd to the target dsDNA, by relying on the hybridization between a target sequence of the target dsDNA and a corresponding guide sequence of the guide nucleic acid.
- the guide nucleic acid comprises a guide sequence that is capable of hybridizing to a target sequence of the target dsDNA due to the substantial complementarity between the guide sequence and the target sequence.
- the guide nucleic acid also comprises a scaffold sequence capable of forming a complex with the napDNAbd. In this way, the guide nucleic acid “programs” the napDNAbd such that the napDNAbd can specifically localize and (indirectly) bind to the region on and around the target sequence of the target dsDNA via the guide nucleic acid.
- the binding of the napDNAbd to the target dsDNA enables the base excising domain associated with the napDNAbd to specifically access and act on the base of the target deoxyribonucleotide in the target sequence of the target dsDNA in a guide sequence-specific/dependent way.
- the base excision domain of the base editor of the disclosure directly excises the base of a target deoxyribonucleotide (the first deoxyribonucleotide in FIG. 6), generating an abasic site where the base is removed, which is an apurinic site where a purine (e.g., guanine) is removed or an apyrimidinic site where a pyrimidine (e.g., thymine, cytosine) is removed.
- the abasic site may be repaired by translesion synthesis (TLS) (by, e.g., TLS polymerase) and/or DNA replication, leading to base editing, in which case nicking in the strand opposite to the abasic site may not be necessary.
- TLS translesion synthesis
- when the napDNAbd is a nucleic acid programmable DNA nickase, such as a Cas9 nickase, it creates a nick in the strand (non-edited strand) opposite to the abasic site, and the apyrimidinic site may be removed by AP lyase to generate another nick on the edited strand; the two nicks trigger double-strand break (DSB) repair and the introduction of indel mutations, which is also highly likely to change the target deoxyribonucleotide.
- DSB double-strand break
- FIG. 6 shows, before base editing, a first deoxyribonucleotide of dG as target deoxyribonucleotide to be edited on the edited strand (nontarget strand) and a second deoxyribonucleotide of dC on the opposite strand (non-edited strand /target strand) base pairing with the dG.
- the dG is located in a protospacer sequence on the nontarget strand of the target dsDNA
- the dC is located in a target sequence on the target strand of the target dsDNA.
- a guide nucleic acid is designed to comprise a guide sequence capable of hybridizing to the target sequence and comprises a scaffold sequence capable of forming a complex with a napDNAbd.
- the napDNAbd is capable of nicking the target strand and is fused with a base excising domain capable of excising guanine of the dG.
- FIG. 7 shows, after base editing, a fourth deoxyribonucleotide of dC as outcome deoxyribonucleotide on the edited strand (nontarget strand) and a third deoxyribonucleotide of dG on the opposite strand (non-edited strand /target strand) base pairing with the dC.
- FIG. 6 and FIG. 7 together show direct dG-to-dC base editing.
- FIG. 8 shows, after base editing, a fourth deoxyribonucleotide of dT as outcome deoxyribonucleotide on the edited strand (nontarget strand) and a third deoxyribonucleotide of dA on the opposite strand (non-edited strand /target strand) base pairing with the dT.
- FIG. 6 and FIG. 8 together show direct dG-to-dT base editing.
- the base editing approach of the disclosure allows direct base editing of a target deoxyribonucleotide (e.g., dG, dT, dC) in a target dsDNA, expanding the scope of target design and screening for the direct base editing. For example, if editing of a target dG to dA is desired, the traditional base editors incapable of directly editing dG would have to be applied to edit dC on the opposite strand to dT, thereby indirectly editing dG to dA.
- the traditional CBE might not be able to edit the dC to the desired outcome dT; the PAM limitation of the CBE might not allow designing a target/guide sequence targeting the dC to specifically direct the CBE to the dC; and even if such a guide sequence can be designed, the base editing efficiency of the CBE might not be sufficient.
- the target dG can be directly base edited, and therefore, developers would have a much better chance to design, screen, and obtain a suitable target/guide sequence targeting the dG to specifically direct the base editor of the disclosure to the dG.
- the base editing approach of the disclosure may function in the absence of deamination of the base of the target deoxyribonucleotide before the excision of the base of the target deoxyribonucleotide, or in the absence of deamination at all.
- traditional ABE needs to deaminate the base (adenine) of a target dA to a hypoxanthine, thereby converting the target dA to inosine (I), which reads as dG in DNA repair or replication
- traditional CBE needs to deaminate the base (cytosine) of a target dC to a uracil, thereby converting the target dC to uridine (U), which reads as dT in DNA repair or replication.
- Deamination is unlikely for G (due to spontaneous remediation) [18] and impossible for T (due to the absence of amine) , making the development of deaminase-based G and T base editors a challenging task.
- the omission of a deaminase domain in the base editor of the disclosure opens the way to base editing of G and T, and may also avoid undesired effects caused by the deaminase domain and deamination of traditional deaminase-based base editors and reduce base editor size.
- a deaminase-free, glycosylase-based guanine base editor (gGBE) was developed with G editing ability, by fusing a nucleic acid programmable DNA nickase such as Cas9 nickase with a human N-methylpurine DNA glycosylase (MPG) mutant capable of excising guanine of dG developed by several rounds of MPG mutagenesis via unbiased and rational screening. It was demonstrated that the gGBE has high G editing efficiency.
- MPG human N-methylpurine DNA glycosylase
- the gGBE exhibited high base editing efficiency (up to 81.2%) and high G-to-T or G-to-C (i.e., G-to-Y) conversion ratio (up to 0.95) in both cultured human cells and mouse embryos.
- a nucleic acid programmable DNA nickase such as Cas9 nickase with a human uracil DNA glycosylase (UNG) mutant capable of excising thymine of dT or cytosine of dC separately developed by mutagenesis of UNG
- UNG human uracil DNA glycosylase
- two deaminase-free, glycosylase-based base editors, gTBE for direct T editing and gCBE for direct C editing, were developed to achieve orthogonal base editing.
- gTBE and gCBE were obtained with high activity of T-to-S (i.e., T-to-C or T-to-G) and C-to-G conversions, respectively. Furthermore, by embedding the UNG mutant into a nucleic acid programmable DNA nickase such as Cas9 nickase, more gTBE and gCBE were generated, showing enhanced average editing efficiency and alternative editing windows.
- the editing profiles of gTBE and gCBE were characterized by targeting dozens of endogenous genomic loci in cultured mammalian cells as well as mouse embryos, demonstrating their high base editing efficiency.
- the base editor of the disclosure may be provided in the form of a fusion protein.
- the disclosure provides a fusion protein comprising:
- napDNAbd nucleic acid programmable DNA binding domain capable of binding a target dsDNA comprising:
- a first deoxyribonucleotide e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine)
- dG deoxyguanosine
- dT thymidine
- dC deoxycytidine
- a second deoxyribonucleotide e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine)
- first deoxyribonucleotide e.g., dG, dT, dC
- target sequence on the target strand (non-edited strand) of the target dsDNA, wherein the protospacer sequence is fully reverse complementary to the target sequence
- a base excising domain capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide.
- the fusion protein does not comprise a deaminase domain, e.g., an adenine or cytosine deaminase domain, e.g., TadA and variants thereof.
- the disclosure provides a system comprising:
- fusion protein or a polynucleotide encoding the fusion protein, the fusion protein comprising:
- napDNAbd nucleic acid programmable DNA binding domain capable of binding a target dsDNA comprising:
- a first deoxyribonucleotide e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine)
- dG deoxyguanosine
- dT thymidine
- dC deoxycytidine
- a second deoxyribonucleotide e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine)
- first deoxyribonucleotide e.g., dG, dT, dC
- target sequence on the target strand (non-edited strand) of the target dsDNA, wherein the protospacer sequence is fully reverse complementary to the target sequence
- a base excising domain capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide
- a guide nucleic acid or a polynucleotide encoding the guide nucleic acid comprising:
- the system is a complex comprising the fusion protein complexed with the guide nucleic acid.
- the complex further comprises the target dsDNA hybridized with the guide sequence.
- the system is a composition comprising the component (i) and the component (ii) .
- the guide nucleic acid as described herein is a guide RNA (gRNA) .
- gRNA guide RNA
- sgRNA single guide RNA
- the disclosure provides a method of modifying a target dsDNA, comprising contacting the target dsDNA with a system,
- the target dsDNA comprising:
- a first deoxyribonucleotide e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine)
- dG deoxyguanosine
- dT thymidine
- dC deoxycytidine
- a second deoxyribonucleotide e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine)
- first deoxyribonucleotide e.g., dG, dT, dC
- target sequence on the target strand (non-edited strand) of the target dsDNA, wherein the protospacer sequence is fully reverse complementary to the target sequence
- fusion protein or a polynucleotide encoding the fusion protein, the fusion protein comprising:
- napDNAbd nucleic acid programmable DNA binding domain
- a base excising domain capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide
- a guide nucleic acid or a polynucleotide encoding the guide nucleic acid comprising:
- the method does not include deamination of the base of the first deoxyribonucleotide before the excision of the base of the first deoxyribonucleotide.
- the method does not include deamination of the base of the first deoxyribonucleotide.
- the method comprises inducing strand separation of the target dsDNA.
- the target deoxyribonucleotide to be edited in the target dsDNA may be termed as “first deoxyribonucleotide”
- the outcome deoxyribonucleotide converted from the target deoxyribonucleotide by base editing may be termed as “fourth deoxyribonucleotide” .
- the first deoxyribonucleotide is deoxyguanosine (dG) , thymidine (dT) , deoxyadenosine (dA) , or deoxycytidine (dC) .
- the first deoxyribonucleotide is dG.
- the first deoxyribonucleotide is dT.
- the fourth deoxyribonucleotide is dA, dT, dC, or dG.
- the second deoxyribonucleotide is dC, dA, dT, or dG.
- the third deoxyribonucleotide is dA, dT, dC, or dG.
- the first deoxyribonucleotide is converted to a fourth deoxyribonucleotide that is different from the first deoxyribonucleotide.
- the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide is dG-to-dA, dG-to-dT, dG-to-dC, dT-to-dA, dT-to-dC, dT-to-dG, dC-to-dA, dC-to-dT, or dC-to-dG.
- the target dsDNA is wild type or naturally occurring. In some embodiments, the target dsDNA is not wild type or naturally occurring. In some embodiments, the target dsDNA is eukaryotic or prokaryotic. In some embodiments, the target dsDNA is from an animal (e.g., human, monkey, mouse) or plant. In some embodiments, the target dsDNA is a target gene. In some embodiments, the gene is an animal (e.g., human, monkey, mouse) or plant gene. In some embodiments, the dsDNA is in a target cell.
- the first deoxyribonucleotide is native or nonnative to the target dsDNA. In some embodiments, the first deoxyribonucleotide is a mutation in the target dsDNA. In some embodiments, the first deoxyribonucleotide is a pathogenic mutation in the target dsDNA. In some embodiments, the first deoxyribonucleotide is a mutation resulting in a stop codon in the target dsDNA.
- the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide directly or indirectly converts a stop codon to a non-stop codon or directly or indirectly converts a non-stop codon to a stop codon, either on the target strand or the nontarget strand.
- the stop codon is on the sense strand of the dsDNA.
- the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide occurs on the sense strand or the nonsense strand of the dsDNA. In some embodiments, the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide (e.g., dG-to-dT) occurs on the sense strand of the dsDNA, converting a stop codon on the sense strand to a non-stop codon or converting a non-stop codon (e.g., GAA) on the sense strand to a stop codon (e.g., TAA) .
- the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide occurs on the nonsense strand of the dsDNA, converting a stop codon on the sense strand to a non-stop codon or converting a non-stop codon (e.g., TCA) on the sense strand to a stop codon (e.g., TGA) .
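The two stop-codon examples above (GAA to TAA by an edit on the sense strand, and TCA to TGA by an edit on the opposite strand) can be checked with a short Python sketch. The helper names are hypothetical, and the sketch assumes that an edit on the opposite strand is copied to the sense strand by repair or replication. The dG-to-dC conversion used for the second example is an assumption chosen because it reproduces the stated codons; the text does not name the conversion for that case.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def edit_sense(codon, index, new_base):
    """Replace the base at 0-based `index` of a sense-strand codon."""
    return codon[:index] + new_base + codon[index + 1:]


def edit_opposite(codon, index, new_base):
    """Edit the opposite-strand base paired with sense position `index`.

    Returns the resulting sense-strand codon, assuming the edit is
    propagated to the sense strand.
    """
    opposite = reverse_complement(codon)
    paired = len(codon) - 1 - index  # sense index i pairs with opposite index n-1-i
    edited = opposite[:paired] + new_base + opposite[paired + 1:]
    return reverse_complement(edited)


def is_stop(codon):
    return codon in STOP_CODONS


# dG-to-dT on the sense strand: GAA -> TAA
sense_result = edit_sense("GAA", 0, "T")
# dG-to-dC on the opposite strand (paired with the C of TCA): TCA -> TGA
opposite_result = edit_opposite("TCA", 1, "C")
print(sense_result, is_stop(sense_result))
print(opposite_result, is_stop(opposite_result))
```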
- the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide occurs at the splicing site (e.g., splicing donor, splicing acceptor) of the target dsDNA.
- the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide e.g., dG-to-dC
- the first deoxyribonucleotide is at a position of the protospacer sequence selected from the group consisting of position 1, position 2, position 3, position 4, position 5, position 6, position 7, position 8, position 9, position 10, position 11, position 12, position 13, position 14, position 15, position 16, position 17, position 18, position 19, position 20, and a combination thereof; or wherein the first deoxyribonucleotide is at a position of the protospacer sequence between position 1 and position 20, both inclusive; or wherein the first deoxyribonucleotide is at a position of the protospacer sequence between position 1 and position 14, both inclusive.
- the first deoxyribonucleotide is at a position of the protospacer sequence selected from the group consisting of position 6, position 7, position 8, position 9, position 10, position 11, and a combination thereof; or wherein the first deoxyribonucleotide is at a position of the protospacer sequence between position 6 and position 11, both inclusive. In some embodiments, the first deoxyribonucleotide is at position 7 of the protospacer sequence.
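As an illustration of the position ranges above, the following Python sketch lists the positions of a chosen base that fall inside an inclusive window of a protospacer. It assumes positions are counted from 1 starting at the first base of the protospacer as written; this section does not define the numbering convention, so that is an assumption. The function name and the 20-nt toy sequence are hypothetical.

```python
def candidate_positions(protospacer, base="G", window=(6, 11)):
    """Return 1-based positions of `base` inside the inclusive window."""
    start, end = window
    return [
        position
        for position, nucleotide in enumerate(protospacer.upper(), start=1)
        if nucleotide == base and start <= position <= end
    ]


toy_protospacer = "ACGTACGTACGTACGTACGT"  # toy 20-nt sequence, not from the disclosure
print(candidate_positions(toy_protospacer, "G", (6, 11)))  # positions 6-11
print(candidate_positions(toy_protospacer, "G", (1, 14)))  # positions 1-14
```

A design script could run such a filter over candidate protospacers to keep only those that place the target base inside the window of interest.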
- the first deoxyribonucleotide is the N1 or N2 nucleotide in a motif of N1N2, wherein N1 or N2 is A, T, G, or C. In some embodiments, the first deoxyribonucleotide is the N2 nucleotide in a motif of N1N2, wherein N1 is A or T, and N2 is C. In some embodiments, the first deoxyribonucleotide is the N1, N2, or N3 nucleotide in a motif of N1N2N3, wherein N1, N2, or N3 is A, T, G, or C.
- base excising domain (BED) is used interchangeably with “base excising protein (BEP) ” or “base excising enzyme (BEE) ” and refers to a protein capable of recognizing and excising a base (e.g., A, T, C, G, or U) of a nucleotide of a nucleic acid (e.g., DNA (ssDNA or dsDNA) or RNA) .
- base is used interchangeably with “nucleobase” or “nitrogenous base” .
- Bases include, for example, adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U), which may be termed primary, normal, or canonical bases.
- a deoxyribonucleotide is composed of a base, a deoxyribose, and a phosphate
- a deoxyribonucleoside is composed of a base and a deoxyribose. Excising the base of a deoxyribonucleoside releases the base from the deoxyribonucleoside.
- excising the base of a deoxyribonucleoside comprises cleaving or hydrolyzing the glycosidic bond linking the base to the deoxyribose of the first deoxyribonucleotide, thereby releasing the base from the first deoxyribonucleotide.
- the base excising domain is (substantially) capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide (e.g., dG, dT, dC) .
- the first deoxyribonucleotide is dG, dT, dA, or dC.
- the base excising domain is (substantially) capable of excising guanine of dG.
- the base excising domain is (substantially) capable of excising thymine of dT.
- the base excising domain is (substantially) capable of excising cytosine of dC.
- the base excising domain is (substantially) capable of excising adenine of dA.
- the base excising domain can excise only one type of base and cannot excise the other types of bases.
- the base excising domain is (substantially) incapable of excising guanine of dG.
- the base excising domain is (substantially) incapable of excising thymine of dT.
- the base excising domain is (substantially) incapable of excising cytosine of dC.
- the base excising domain is (substantially) incapable of excising adenine of dA.
- the base excising domain can excise more than one type of base.
- the base excising domain is (substantially) capable of excising any two, three, or four of guanine of dG, thymine of dT, cytosine of dC, and adenine of dA.
- the base excising domain is (substantially) capable of excising both guanine of dG and thymine of dT.
- the base excising domain is (substantially) capable of excising uracil. In some embodiments, the base excising domain is (substantially) incapable of excising uracil. In some embodiments, the base excising domain is (substantially) capable of excising hypoxanthine. In some embodiments, the base excising domain is (substantially) incapable of excising hypoxanthine.
- the fusion protein of the disclosure does not comprise a base excising domain (substantially) capable of excising guanine of dG, thymine of dT, cytosine of dC, adenine of dA, uracil, and/or hypoxanthine.
- the base excising domain is (substantially) incapable of excising bases on both strands of a target dsDNA. In some embodiments, the base excising domain is (substantially) incapable of excising both bases of a pair of base-paired deoxyribonucleotides on both strands of a dsDNA.
- the base excision domain comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to a naturally-occurring base excision domain, such as a naturally-occurring base excision domain provided herein.
- the fusion protein of the disclosure comprises one, two, three, or more base excising domains. In some embodiments, the fusion protein comprises two, three, or more base excising domains, which are the same or different.
- the base excising domain could be a glycosylase having the desired base excising ability.
- the base excising domain comprises a glycosylase.
- the glycosylase is selected from the group consisting of N-methylpurine DNA glycosylase (MPG) , 8-oxoguanine DNA glycosylase (OGG1) , methyl-CpG binding domain 4, DNA glycosylase (MBD4) , thymine DNA glycosylase (TDG) , uracil DNA glycosylase (UNG) , single-strand-selective monofunctional uracil-DNA glycosylase 1 (SMUG1) , mutY DNA glycosylase (MUTYH) , nth like DNA glycosylase 1 (NTHL1) , nei like DNA glycosylase 1 (NEIL1) , nei like DNA glycosylase 2 (NEIL2) , nei like DNA glycosylase 3 (NEIL3) , and mutants thereof capable
- Exemplary glycosylases capable of excising a base include, without limitation, UDG-N204D and UDG-Y147A as described in Kavli, B. et al. Excision of cytosine and thymine from DNA by mutants of human uracil-DNA glycosylase. EMBO J 15, 3442-3447 (1996) ; the entire contents of which are hereby incorporated by reference.
- the base excision domain is not wild type or naturally-occurring.
- the base excising domain comprises an N-methylpurine DNA glycosylase (MPG) .
- MPG comprises a motif GxxYxxxxYGxxxxxN, wherein x represents any amino acid.
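A motif written in this style, where x represents any amino acid, can be turned into a regular expression and searched for in a protein sequence. The Python sketch below does this with the standard `re` module. The function names and the toy sequence are hypothetical; the toy sequence is not an MPG sequence from the disclosure.

```python
import re


def motif_to_regex(motif):
    """Compile a motif such as 'GxxYxxxxYGxxxxxN'; 'x' matches any residue letter."""
    pattern = "".join("[A-Z]" if ch == "x" else re.escape(ch) for ch in motif)
    return re.compile(pattern)


def find_motif(sequence, motif):
    """Return the 1-based start of the first motif match, or None if absent."""
    match = motif_to_regex(motif).search(sequence.upper())
    return None if match is None else match.start() + 1


MPG_MOTIF = "GxxYxxxxYGxxxxxN"
toy_sequence = "MKGAAYAAAAYGAAAAANLL"  # toy sequence built to contain the motif
print(find_motif(toy_sequence, MPG_MOTIF))
print(find_motif("MKLL", MPG_MOTIF))
```

A motif with no wildcard positions can be passed to the same function unchanged.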
- the MPG is obtained from a species selected from Table A.
- the MPG comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference MPG.
- the wild type or reference MPG is human MPG (SEQ ID NO: 9) or an MPG obtained from a species selected from Table A or any MPG as set forth in Table D or a homolog or mutant (e.g., comprising an amino acid sequence of SEQ ID NO: 5, 6, or 7) thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG) (e.g., SEQ ID NO: 1).
- the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
- the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
- the amino acid mutation confers an ability to excise a base on the MPG.
- the base is guanine.
- the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control MPG without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
- the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
- the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of G163, N169, D175, C178, S198, K202, G203, S206, K210, and/or Q294 of the wild type or reference MPG, wherein the position is numbered according to SEQ ID NO: 1.
- the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of N169, D175, C178, and/or Q294 of the wild type or reference MPG, wherein the position is numbered according to SEQ ID NO: 1.
- the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
- the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference MPG.
- the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as, Glycine (Gly/G) , Alanine (Ala/A) , Valine (Val/V) , Cysteine (Cys/C) , Proline (Pro/P) , Leucine (Leu/L) , Isoleucine (Ile/I) , Methionine (Met/M) , Tryptophan (Trp/W) , Phenylalanine (Phe/F) ; (2) a polar amino acid residue (such as, Serine (Ser/S) , Threonine (Thr/T) , Tyrosine (Tyr/Y) , Asparagine (Asn/N) , Glutamine (Gln/Q) ) ; (3) a positively charged amino acid residue (such as, Lysine (Lys/K) , Arginine (Arg/R) , Histidine (His/
- the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1.
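Substitution labels of the form used above (wild type residue, 1-based position, new residue, e.g., N169G) can be parsed and applied to a reference sequence programmatically. The Python sketch below is a minimal illustration: the function names are hypothetical, and the five-residue reference is a toy sequence with toy positions, not SEQ ID NO: 1. The sketch checks that the reference actually carries the stated wild type residue before substituting.

```python
import re

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label such as 'N169G' into (wild type, position, new residue)."""
    match = SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(reference, labels):
    """Apply substitutions to a reference sequence using 1-based positions."""
    residues = list(reference)
    for label in labels:
        wild_type, position, new_residue = parse_substitution(label)
        if position < 1 or position > len(residues):
            raise ValueError(f"{label}: position outside the reference")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: reference has {residues[position - 1]}")
        residues[position - 1] = new_residue
    return "".join(residues)


print(parse_substitution("N169G"))
# Toy reference and toy positions, for illustration only.
print(apply_substitutions("MNDCQ", ["N2G", "D3R", "C4N", "Q5R"]))
```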
- the MPG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference MPG.
- the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of N169G, D175R, C178N, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1.
- the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
- the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of N169G, D175R, C178N, and Q294R, wherein the position is numbered according to SEQ ID NO: 1.
- the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
- the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R, wherein the position is numbered according to SEQ ID NO: 1.
- the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 1 or 9.
- the MPG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 8, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, or 38.
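The sequence identity thresholds used above (for example, at least about 60% and less than 100% relative to a reference) can be illustrated with a simple Python calculation. This sketch assumes the two sequences have already been aligned to equal length, with "-" marking gaps, and it divides the number of identical columns by the alignment length. Other denominator conventions exist, and this section does not fix one, so the convention here is an assumption. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned, equal-length sequences ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    # A column counts as identical only if both residues match and neither is a gap.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


reference = "MNDCQAAAAA"  # toy reference, not a sequence from the disclosure
mutant = "MGDCQAAAAA"     # same toy sequence with one substitution
identity = percent_identity(reference, mutant)
print(identity)
print(60.0 <= identity < 100.0)  # mutant differs from, yet is similar to, the reference
```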
- the MPG is (substantially) capable of excising guanine of dG. In some embodiments, the MPG is (substantially) incapable of excising thymine of dT. In some embodiments, the MPG is (substantially) incapable of excising cytosine of dC. In some embodiments, the MPG is (substantially) incapable of excising adenine of dA.
- the MPG is not wild type or naturally-occurring.
- the base excising domain comprises a uracil-DNA glycosylase (UNG).
- UNG comprises a motif GQDPYH.
- the UNG is obtained from a species selected from Table C.
- the UNG comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference UNG.
- the wild type or reference UNG is human UNG1 (SEQ ID NO: 54) or human UNG2 (SEQ ID NO: 133) or a UNG obtained from a species selected from Table C or any UNG as set forth in Table D or a homolog or mutant thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG).
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 56, 58, 135, or 137.
- the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
- UNG1 SEQ ID NO: 54
- UNG2 SEQ ID NO: 133
- residues Y156 and N213 of UNG2 are corresponding to the residues Y147 and N204 of UNG1, respectively.
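Both residue pairs stated above differ by the same amount (156 - 147 = 9 and 213 - 204 = 9). The Python sketch below converts residue labels between the two numbering schemes under the assumption that this offset of 9 is constant across the region the two isoforms share; the text gives only these two pairs, so a constant offset elsewhere is an assumption, and the function names are hypothetical.

```python
# Offset inferred from the two stated pairs (Y156/Y147 and N213/N204).
UNG2_MINUS_UNG1 = 156 - 147


def ung2_to_ung1(label):
    """Convert a UNG2 residue label such as 'Y156' to UNG1 numbering."""
    residue, position = label[0], int(label[1:])
    mapped = position - UNG2_MINUS_UNG1
    if mapped < 1:
        raise ValueError(f"{label} has no counterpart under a constant offset")
    return f"{residue}{mapped}"


def ung1_to_ung2(label):
    """Convert a UNG1 residue label such as 'Y147' to UNG2 numbering."""
    residue, position = label[0], int(label[1:])
    return f"{residue}{position + UNG2_MINUS_UNG1}"


print(ung2_to_ung1("Y156"), ung2_to_ung1("N213"))
print(ung1_to_ung2("Y147"), ung1_to_ung2("N204"))
```

Under the same assumption, the UDG-N204D and UDG-Y147A mutants cited earlier correspond to the N213D and Y156A substitutions in UNG2 numbering.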
- the amino acid mutation confers an ability to excise a base on the UNG.
- the base is thymine.
- the base is cytosine.
- the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control UNG without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
- the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
- the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of Y156, K184, N213, A214, Q259, and/or Y284 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of K184, A214, Q259, and/or Y284 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO:133.
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135 or 137.
- the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference UNG.
- the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as, Glycine (Gly/G) , Alanine (Ala/A) , Valine (Val/V) , Cysteine (Cys/C) , Proline (Pro/P) , Leucine (Leu/L) , Isoleucine (Ile/I) , Methionine (Met/M) , Tryptophan (Trp/W) , Phenylalanine (Phe/F) ; (2) a polar amino acid residue (such as, Serine (Ser/S) , Threonine (Thr/T) , Tyrosine (Tyr/Y) , Asparagine (Asn/N) , Glutamine (Gln/Q) ) ; (3) a positively charged amino acid residue (such as, Lysine (Lys/K) , Arginine (Arg/R) , Histidine (His/
- the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of Y156A, K184A, N213D, A214V, A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133.
- the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133.
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
- the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, wherein the position is numbered according to SEQ ID NO: 133.
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
- the amino acid mutation comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-65, 1-66, 1-67, 1-68, 1-69, 1-70, 1-71, 1-72, 1-73, 1-74, 1-75, 1-76, 1-77, 1-78, 1-79, 1-80, 1-81, 1-82, 1-83, 1-84, 1-85, 1-86, 1-87, 1-88, 1-89, 1-90, 1-91, 1-92, 1-93, 1-94, 1-95, 1-96, 1-97, 1-98, 1-99, or 1-100 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
- the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
- the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference UNG.
- the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of A214T, Q259A, and Y284D, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
- the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of K184A and A214V, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
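The substitution notation used above (for example, A214T: reference residue, position numbered according to a reference sequence, replacement residue) and the N-terminal deletions (for example, positions 1-88) can be illustrated with a minimal Python sketch. The function names and the six-residue toy sequence are hypothetical and are not taken from the disclosure; the sketch assumes the positions refer directly to the reference sequence, with no alignment step for "corresponding" positions.

```python
import re


def apply_substitutions(reference: str, substitutions: list[str]) -> str:
    """Apply substitutions such as 'A214T' using 1-based reference numbering."""
    residues = list(reference)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        old, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the reference")
        if residues[position - 1] != old:
            raise ValueError(f"reference has {residues[position - 1]} at {position}, not {old}")
        residues[position - 1] = new
    return "".join(residues)


def delete_n_terminal(sequence: str, last_deleted_position: int) -> str:
    """Delete positions 1..last_deleted_position (1-based, inclusive)."""
    return sequence[last_deleted_position:]


# Hypothetical toy reference; substitutions are applied under reference
# numbering first, and the N-terminal deletion is applied afterwards.
toy_reference = "MKTAYQ"
substituted = apply_substitutions(toy_reference, ["A4T", "Q6A"])
truncated = delete_n_terminal(substituted, 2)
```

Applying the substitutions before the deletion keeps the position numbers aligned with the reference sequence, which is how the combined embodiments above are worded.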
- the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 56, 58, 60, 62, 135, 137, 139, 141, 144, 146, 148, 150, 152, 155, 157, and 159 or an N-terminal truncation thereof lacking the most N-terminal Methionine (M) (coded by start codon ATG) .
- the UNG is (substantially) capable of excising thymine of dT. In some embodiments, the UNG is (substantially) capable of excising cytosine of dC. In some embodiments, the UNG is (substantially) incapable of excising thymine of dT. In some embodiments, the UNG is (substantially) incapable of excising cytosine of dC. In some embodiments, the UNG is (substantially) incapable of excising adenine of dA. In some embodiments, the UNG is (substantially) incapable of excising guanine of dG.
- the UNG is not wild type or naturally-occurring.
- the base excising domain comprises TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1.
- the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1.
- the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is human TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 (SEQ ID NO: 64, 65, 66, 67, 68, 69, 70, 71, or 72, respectively) or a homolog or mutant thereof, or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG).
- the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 64-72, respectively.
- the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
- the amino acid mutation confers an ability to excise a base on the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1.
- the base is guanine, thymine, cytosine, adenine, uracil, or hypoxanthine.
- the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
- the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
- the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1.
- the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as, Glycine (Gly/G) , Alanine (Ala/A) , Valine (Val/V) , Cysteine (Cys/C) , Proline (Pro/P) , Leucine (Leu/L) , Isoleucine (Ile/I) , Methionine (Met/M) , Tryptophan (Trp/W) , Phenylalanine (Phe/F) ; (2) a polar amino acid residue (such as, Serine (Ser/S) , Threonine (Thr/T) , Tyrosine (Tyr/Y) , Asparagine (Asn/N) , Glutamine (Gln/Q) ) ; (3) a positively charged amino acid residue (such as, Lysine (Lys/K) , Arginine (Arg/R) , Histidine (His/
- the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1, respectively.
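The "at least about 60% and less than 100%" identity window can be checked numerically. The sketch below is only one possible convention, stated as an assumption: the two sequences are already aligned to equal length (gaps written as "-"), and identity is the number of identical non-gap columns divided by the alignment length. The disclosure's own definition of sequence identity is not reproduced in this section, and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment (assumed convention)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def is_mutant_within_window(identity: float, lower: float = 60.0) -> bool:
    """True when identity is at least `lower` percent but below 100 percent."""
    return lower <= identity < 100.0


# Hypothetical toy alignment: 4 of 6 columns are identical.
toy_identity = percent_identity("MKTAYQ", "MKTTYA")
```

The upper bound below 100% simply encodes that a mutated glycosylase cannot be identical to its wild type or reference sequence.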
- the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising guanine of dG. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising thymine of dT. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising cytosine of dC. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising adenine of dA.
- the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising thymine of dT. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising cytosine of dC. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising adenine of dA. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising guanine of dG.
- the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is not wild type or naturally-occurring.
- napDNAbd is used interchangeably with “nucleic acid programmable DNA binding protein (napDNAbp) ” .
- the napDNAbd is an RNA-programmable DNA binding protein.
- Various napDNAbd are known in the art, including, for example, those listed in WO2020/181195, which is incorporated herein by reference in its entirety.
- Representative napDNAbd include, for example, CRISPR-associated (Cas) proteins, IscB, IsrB, Argonaute, and TnpB.
- the napDNAbd substantially lacks dsDNA cleavage activity (endonuclease activity) . In some embodiments, the napDNAbd substantially lacks dsDNA cleavage activity (endonuclease activity) and nickase activity. In some embodiments, the napDNAbd is nuclease-inactive, for example, a dead Cas. In some embodiments, the napDNAbd is endonuclease-inactive, for example, a dead Cas.
- the napDNAbd is a nickase. In some embodiments, the napDNAbd has nickase activity. In some embodiments, the napDNAbd has nickase activity to nick the target strand. In some embodiments, the napDNAbd nicks the target strand. In some embodiments, the method comprises nicking the target strand. In some embodiments, the nick on the target strand or nicking the target strand incorporates an indel (insertion and/or deletion) into the target strand.
- the napDNAbd is capable of inducing strand separation of the target dsDNA.
- the napDNAbd comprises a Cas domain. In some embodiments, the napDNAbd comprises a Cas nickase (nCas) or a dead (nuclease-inactive) Cas (dCas) of a Cas protein.
- the Cas protein is selected from a group consisting of a Cas9 protein (such as, SpCas9, SaCas9, GeoCas9, CjCas9, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago) , SmacCas9, Spy-macCas9, xCas9, SpCas9-NG, SpG Cas9) ; a Cas12 protein (such as, Cas12a (Cpf1) , AsCas12a, LbCas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f (Cas14) , Cas12g, Cas12h, Cas12i, xCas12i, Cas12Max, hfCas12Max, Cas12j, Cas12k, Cas12l, Cas12m, Cas12n, Cas12o, Cas
- the Cas nickase is a Cas9 nickase (nCas9) , such as SpCas9 nickase (SpCas9-D10A) .
- the dead Cas is a dead Cas9 (dCas9) , such as dead SpCas9 (SpCas9-D10A+H840A) .
- the Cas nickase or dead Cas is a Cas12i nickase (nCas12i) or a dead Cas12i (dCas12i), such as a dead Cas12i of an xCas12i polypeptide.
- the napDNAbd comprises an IscB nickase (nIscB) or a dead IscB (dIscB) of an IscB protein (e.g., OgeuIscB) or an IscB protein described in PCT/CN2023/129167, PCT/CN2023/142506, PCT/CN2024/071744, and PCT/CN2023/125069, which are incorporated herein by reference in their entireties.
- the napDNAbd comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 2, 48, 50, 52, or 163.
- the napDNAbd comprises a TnpB nickase or a dead TnpB of a TnpB protein.
- the fusion protein comprises, from N-terminal to C-terminal, (1) the napDNAbp and the base excising domain; or (2) the base excising domain and the napDNAbp.
- the napDNAbd (e.g., Cas9) is a two-part napDNAbd, for example, a two-part split Cas9, comprising an N-terminal portion and a C-terminal portion, and wherein the fusion protein comprises, from N-terminal to C-terminal, (1) the N-terminal portion of the napDNAbd, the base excising domain, and the C-terminal portion of the napDNAbd; (2) the C-terminal portion of the napDNAbd, the base excising domain, and the N-terminal portion of the napDNAbd; or (3) the base excising domain, the C-terminal portion of the napDNAbd (e.g., amino acids at positions 1249-1368), and the N-terminal portion (e.g., amino acids at positions 1-1248) of the napDNAbd.
- the napDNAbd is SpCas9 (e.g., a SpCas9 nickase) or a mutant thereof (e.g., a SpG Cas9 nickase) .
- the N-terminal portion of the napDNAbd is the amino acids of the napDNAbp at positions 1 or 2 to 1012, 1028, 1041, 1046, 1047, 1248, 1249, or 1300.
- the C-terminal portion of the napDNAbd is the amino acids of the napDNAbp at positions 1013, 1029, 1042, 1047, 1048, 1249, 1063, 1064, 1230, 1249, or 1301 to 1368.
- the fusion protein comprises the base excising domain embedded between positions 2-1248 of nCas9 (SEQ ID NO: 2) and positions 1249-1368 of nCas9 (SEQ ID NO: 2) , wherein the first amino acid residue D of nCas9 (SEQ ID NO: 2) was designated as position 2; or embedded between positions 2-1047 of nCas9 (SEQ ID NO: 2) and positions 1064-1368 of nCas9 (SEQ ID NO: 2) , wherein the first amino acid residue D of nCas9 (SEQ ID NO: 2) was designated as position 2.
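The embedded architectures above can be sketched as simple string slicing, with the numbering convention that the first residue of the host sequence is designated position 2 (its N-terminal Met having been removed). The helper names and the ten-residue toy host are hypothetical; the real host would be nCas9 (SEQ ID NO: 2), which is not reproduced here, and linkers are omitted for brevity.

```python
def portion(sequence: str, first: int, last: int, first_position: int = 2) -> str:
    """Return residues first..last (inclusive) when sequence[0] is numbered first_position."""
    if first < first_position or last < first:
        raise ValueError("invalid position range")
    if last - first_position >= len(sequence):
        raise ValueError("range extends past the end of the sequence")
    return sequence[first - first_position : last - first_position + 1]


def embed_domain(host: str, domain: str, n_last: int, c_first: int,
                 first_position: int = 2) -> str:
    """Place `domain` between host positions first_position..n_last and c_first..end."""
    host_last = first_position + len(host) - 1
    n_part = portion(host, first_position, n_last, first_position)
    c_part = portion(host, c_first, host_last, first_position)
    return n_part + domain + c_part


# Hypothetical toy host numbered 2..11 and a toy three-residue domain.
toy_host = "DEFGHIKLMN"
contiguous = embed_domain(toy_host, "WWW", n_last=6, c_first=7)   # analogous to 2-1248 / 1249-1368
with_gap = embed_domain(toy_host, "WWW", n_last=4, c_first=8)     # analogous to 2-1047 / 1064-1368
```

Passing `n_last` and `c_first` separately allows for the second embodiment, in which the residues between the two host portions are replaced by the embedded domain rather than retained.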
- a typical protein usually has an N-terminal Met at its most N-terminal position (position 1), since it is translated from a polynucleotide containing a start codon ATG (encoding Met) at its most 5’ end.
- when the protein is fused to the C-terminal of a second protein (e.g., an NLS or a napDNAbd), the start codon ATG may not be necessary for the protein, since there would typically be a start codon upstream of the second protein for the translation of the fusion protein as a whole, and thus the N-terminal Met of the protein could be removed.
- Any protein described in the disclosure refers to both the protein per se and a N-terminal truncation thereof with its most N-terminal Met (if present) removed.
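The N-terminal Met convention described above can be sketched in a few lines of Python. The function names and toy sequences are hypothetical; the only element taken from the disclosure is the SGGS linker, used here purely as an example.

```python
def without_initial_met(protein: str) -> str:
    """Return the N-terminal truncation lacking the most N-terminal Met, if present."""
    return protein[1:] if protein.startswith("M") else protein


def fuse(upstream: str, downstream: str, linker: str = "") -> str:
    """Fuse two components; only the upstream component keeps its initiator Met."""
    return upstream + linker + without_initial_met(downstream)


# Hypothetical toy components joined by an SGGS linker.
toy_fusion = fuse("MPKK", "MDEF", linker="SGGS")
```

Only the most upstream component needs the Met encoded by the start codon, which is why the downstream component is truncated before joining.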
- the fusion protein comprises an NLS at the N-terminal and/or C-terminal of the napDNAbp. In some embodiments, the fusion protein comprises an NLS at the N-terminal and/or C-terminal of the base excising domain. In some embodiments, the NLS is or comprises a SV40 NLS, a bpSV40 NLS (e.g., SEQ ID NO: 10 or 11) , or a NP NLS (Xenopus laevis Nucleoplasmin NLS, nucleoplasmin NLS) .
- Additional NLSs suitable for the disclosure, and ways of linking an NLS to any of the components of the fusion protein of the disclosure (for example, via a linker of SGGS), include those listed in WO2020/181195, which is incorporated herein by reference in its entirety.
- the components (e.g., the napDNAbp and the base excising domain, the NLS and the napDNAbp, or the NLS and the base excising domain) of the fusion protein are fused to each other with or without a linker.
- Suitable linkers include, for example, SGGS, the linker of SEQ ID NO: 134, and those listed in WO2020/181195, which is incorporated herein by reference in its entirety.
- the fusion protein comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 12, 14, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 55, 57, 59, 61, 63, 136, 138, 140, 142, 143, 145, 147, 149, 151, 153, 154, 156, 158, 160, 161, 162, and 164.
- the fusion protein of the disclosure may be used in combination with a deaminase domain for various purposes, e.g., improved outcome purity.
- by “purity” it is meant the percentage or proportion of an outcome among all possible outcomes.
- for example, purity of dT means the percentage or proportion of dT as an outcome among all possible outcomes including, for example, dA, dT, dG, and dC.
- the introduction of a deaminase domain may contribute to further conversion of an undesired deoxyribonucleotide as a byproduct (e.g., dC) to a desired deoxyribonucleotide (e.g., dT) by C-to-T base editing.
- for example, in dG-to-dT base editing, the target dG is converted, in part, to dC (a byproduct) by the base editing without deamination as described herein, and the dC is then converted to dT by the C-to-T base editing with deamination, thereby achieving high purity dG-to-dT.
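The purity definition above amounts to a simple proportion. The sketch below computes it from outcome counts at a target position. The function name and all counts are made up for illustration and are not data from the disclosure.

```python
def outcome_purity(counts: dict[str, int], desired: str) -> float:
    """Percentage of the desired outcome among all observed outcomes."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no outcomes observed")
    return 100.0 * counts.get(desired, 0) / total


# Made-up read counts at a target dG position, before and after a hypothetical
# deaminase step converts 12 of the 15 dC byproduct reads to dT.
before = {"dA": 5, "dT": 60, "dG": 20, "dC": 15}
after = {"dA": 5, "dT": 72, "dG": 20, "dC": 3}
purity_before = outcome_purity(before, "dT")
purity_after = outcome_purity(after, "dT")
```

In this toy case, converting the dC byproduct raises the dT purity without changing the total number of outcomes counted.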
- the fusion protein further comprises a deaminase domain.
- the deaminase domain may be fused to a component of the fusion protein without or with a linker as described herein.
- Various deaminases are known in the art, including, for example, those listed in WO2020/181195, which is incorporated herein by reference in its entirety.
- Representative deaminases include, for example, TadA and homologs and variants thereof, and APOBEC and homologs and variants thereof.
- the deaminase domain is a deaminase domain (substantially) capable of deaminating adenine, guanine, hypoxanthine, cytidine, thymine, and/or uracil. In some embodiments, the deaminase domain is an adenine deaminase domain or a cytosine deaminase domain.
- the deaminase domain comprises a tRNA adenosine deaminase (TadA) or a functional variant or fragment thereof, e.g., TadA8e (SEQ ID NO: 3), TadA8.17, TadA8.20, TadA9, TadA8E V106W, TadA8E V106W+D108Q, TadA-CDa, TadA-CDb, TadA-CDc, TadA-CDd, TadA-CDe, TadA-dual, TADAC-1.2, TADAC-1.14, TADAC-1.17, TADAC-1.19, TADAC-2.5, TADAC-2.6, TADAC-2.9, TADAC-2.19, TADAC-2.23, TadA8e-N46L, or TadA8e-N46P.
- the deaminase domain comprises an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase, an activation induced deaminase (AID) , a cytidine deaminase 1 from Petromyzon marinus (pmCDA1) , or a functional variant or fragment thereof, e.g., APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H.
- the protospacer sequence comprises about, at least about, or at most about 14 contiguous nucleotides of the target dsDNA, e.g., about, at least about, or at most about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more contiguous nucleotides on the nontarget strand of the target dsDNA, or in a numerical range between any two of the preceding values, e.g., from about 16 to about 50, or from about 17 to about 22 contiguous nucleotides of the target dsDNA.
- the protospacer sequence comprises about 20, 30, or 50 contiguous nucleotides of the target
- the protospacer sequence is a stretch of about, at least about, or at most about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more contiguous nucleotides on the nontarget strand of the target dsDNA, or a stretch of contiguous nucleotides on the nontarget strand of the target dsDNA in a numerical range between any two of the preceding values, e.g., a stretch of from about 16 to about 50 contiguous nucleotides.
- the protospacer sequence is a stretch of about 20, 30, or 50 contiguous nucleotides on the nontarget strand of
- the protospacer sequence is immediately 5’ or 3’ to a protospacer adjacent motif (PAM) comprising the sequence 5’-NN-3’, 5’-NNN-3’, 5’-NNNN-3’, 5’-NNNNN-3’, or 5’-NNNNNN-3’, wherein N is A, T, G, or C.
- the protospacer sequence is immediately 5’ to a protospacer adjacent motif (PAM) comprising the sequence 5’-NGG-3’ or 5’-NTN-3’, wherein N is A, T, G, or C.
- the protospacer sequence is immediately 3’ to a protospacer adjacent motif (PAM) comprising the sequence 5’-TTN-3’, wherein N is A, T, G, or C.
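The relationship between a protospacer and its PAM can be made concrete with a short scan of the nontarget strand. The sketch below covers only the case of a protospacer immediately 5’ to a 5’-NGG-3’ PAM; the function name, the default length of 20 nucleotides, and the toy sequence are illustrative assumptions.

```python
import re


def find_protospacers_5prime_of_pam(nontarget_strand: str, length: int = 20,
                                    pam_regex: str = "[ATGC]GG") -> list[tuple[int, str, str]]:
    """Return (0-based start, protospacer, PAM) for each PAM preceded by `length` nucleotides."""
    hits = []
    # A lookahead is used so that overlapping PAM occurrences are all reported.
    for match in re.finditer(f"(?=({pam_regex}))", nontarget_strand):
        pam_start = match.start()
        if pam_start >= length:
            protospacer = nontarget_strand[pam_start - length : pam_start]
            hits.append((pam_start - length, protospacer, match.group(1)))
    return hits


# Hypothetical toy nontarget strand: a 20-nt stretch followed by a TGG PAM.
toy_strand = "ACGTACGTACGTACGTACGT" + "TGG" + "AAA"
toy_hits = find_protospacers_5prime_of_pam(toy_strand)
```

A PAM located 5’ of the protospacer (for example, 5’-TTN-3’) would be handled symmetrically by taking the nucleotides that follow the PAM instead of those that precede it.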
- the guide sequence is in a length of about, at least about, or at most about 14 nucleotides, e.g., about, at least about, or at most about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more nucleotides, or in a length of nucleotides in a numerical range between any two of the preceding values, e.g., in a length of from about 16 to about 50 nucleotides. In some embodiments, the guide sequence is in a length of about 20, 30, or 50 nucleotides.
- the guide sequence is at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100% (fully) , optionally about 100% (fully) , reversely complementary to the target sequence; (2) the guide sequence contains no more than 5, 4, 3, 2, or 1 mismatch or contains no mismatch with the target sequence; or (3) the guide sequence comprises no mismatch with the target sequence in the first 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70 nucleotides at the 5’ end of the guide sequence when the PAM is immediately 5’ to the protospacer sequence or at the
- the guide sequence contains 1 mismatch with the target sequence. In some embodiments, the guide sequence is about 98% reversely complementary to the target sequence. In some embodiments, the 1 mismatch in the guide sequence is at a position corresponding to the nucleotide of the target sequence that is intended to be substituted.
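Reverse complementarity and mismatch counting between a guide sequence and a target sequence can be sketched as follows. The function names and the ten-nucleotide toy sequences are hypothetical, and the sketch assumes the guide and target have equal length and that U in the guide pairs like T.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def mismatch_positions(guide: str, target: str) -> list[int]:
    """1-based guide positions that are not complementary to the target."""
    perfect_guide = reverse_complement(target)
    guide_as_dna = guide.upper().replace("U", "T")
    if len(guide_as_dna) != len(perfect_guide):
        raise ValueError("guide and target must have equal length")
    return [i + 1 for i, (g, p) in enumerate(zip(guide_as_dna, perfect_guide)) if g != p]


def percent_complementary(guide: str, target: str) -> float:
    """Percentage of guide positions that pair with the target."""
    mismatches = len(mismatch_positions(guide, target))
    return 100.0 * (len(guide) - mismatches) / len(guide)


# Hypothetical toy target and a guide carrying one mismatch at position 3.
toy_target = "AACCGGTTAC"
toy_guide = "GUCACCGGUU"
```

The percentage obtained for a single mismatch depends on the guide length, so the same one-nucleotide difference yields a higher percentage for a longer guide.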
- the guide sequence comprises (1) a sequence of any one of SEQ ID NOs: 75-103 and 165-3030 (excluding the PAM if present) or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to any one of SEQ ID NOs: 75-103 and 165-3030 (excluding the PAM if present) or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to any one of SEQ ID NOs: 75-103 and 165-3030 (excluding the PAM if present).
- the guide sequence comprises a sequence of any one of SEQ ID NOs: 75-103 and 165-3030 (excluding the PAM if present).
- the disclosure provides a guide nucleic acid comprising a guide sequence as described herein and a scaffold sequence capable of forming a complex with a napDNAbd.
- the scaffold sequence and the napDNAbd may be as described herein.
- the scaffold sequence is compatible with the napDNAbd of the disclosure and is capable of complexing with the napDNAbd.
- the scaffold sequence may be a naturally occurring scaffold sequence identified along with the napDNAbd, or a variant thereof maintaining the ability to complex with the napDNAbd.
- the ability to complex with the napDNAbd is maintained as long as the secondary structure of the variant is substantially identical to the secondary structure of the naturally occurring scaffold sequence.
- a nucleotide deletion, insertion, or substitution in the primary sequence of the scaffold sequence may not necessarily change the secondary structure of the scaffold sequence (e.g., the relative locations and/or sizes of the stems, bulges, and loops of the scaffold sequence do not significantly deviate from that of the original stems, bulges, and loops) .
- the nucleotide deletion, insertion, or substitution may be in a bulge or loop region of the scaffold sequence so that the overall symmetry of the bulge and hence the secondary structure remains largely the same.
- the nucleotide deletion, insertion, or substitution may also be in the stems of the scaffold sequence so that the lengths of the stems do not significantly deviate from that of the original stems (e.g., adding or deleting one base pair in each of two stems corresponds to 4 total base changes).
- the scaffold sequence is 5’ or 3’ to the guide sequence.
- the scaffold sequence has substantially the same secondary structure as the secondary structure of the sequence of SEQ ID NO: 40, 73, or 74.
- the scaffold sequence comprises (1) a sequence of SEQ ID NO: 40, 73, or 74 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to SEQ ID NO: 40, 73, or 74 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to SEQ ID NO: 40, 73, or 74.
- the scaffold sequence comprises the sequence of SEQ ID NO: 40.
- the fusion protein of the disclosure may be used in combination with a translesion synthesis (TLS) polymerase for improved outcome purity.
- by “purity” it is meant the percentage or proportion of an outcome among all possible outcomes.
- for example, purity of dT means the percentage or proportion of dT as an outcome among all possible outcomes including, for example, dA, dT, dG, and dC.
- TLS polymerases may each have their own preference for incorporating a particular deoxyribonucleotide opposite an abasic site during polymerization, as listed in Table 5. By taking advantage of such preferences, the base editing outcome may be intentionally controlled to improve outcome purity.
- human Pol ⁇ (SEQ ID NO: 118) is a TLS polymerase preferentially incorporating dA opposite an abasic site.
- the base editing outcome may be adjusted toward dT, thereby increasing purity of dT product.
- the fusion protein or system of the disclosure further comprises a translesion synthesis (TLS) polymerase or a recruiting domain or component capable of recruiting a TLS polymerase.
- the TLS polymerase or the recruiting domain or component is fused to a component of the fusion protein without or with a linker as described herein.
- Non-limiting examples of the TLS polymerase include Pol α (alpha), Pol β (beta), Pol δ (delta) (PCNA), Pol γ (gamma), Pol η (eta), Pol ι (iota), Pol κ (kappa), Pol λ (lambda), Pol μ (mu), Pol ν (nu), Pol θ (theta), and REV1.
- the TLS polymerase comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of SEQ ID NO: 118.
- the TLS polymerase comprises the amino acid sequence of SEQ ID NO: 118 (Pol ⁇ ) .
- the fusion protein or system further comprising the translesion synthesis (TLS) polymerase or a recruiting domain capable of recruiting a TLS polymerase leads to conversion of the first deoxyribonucleotide to dG, dC, dT, or dA.
- the disclosure provides an MPG described herein, or of the disclosure.
- the MPG comprises a motif GxxYxxxxYGxxxxxN, wherein x represents any amino acid.
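The motif GxxYxxxxYGxxxxxN, where x represents any amino acid, maps directly onto a regular expression. The sketch below scans a protein sequence for it; the function name and the toy sequence are hypothetical and do not correspond to any MPG sequence of the disclosure.

```python
import re

# G, two residues, Y, four residues, Y, G, five residues, N (16 residues in total).
MPG_MOTIF = re.compile(r"G[A-Z]{2}Y[A-Z]{4}YG[A-Z]{5}N")


def find_motif(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for each motif occurrence."""
    return [(m.start() + 1, m.group(0)) for m in MPG_MOTIF.finditer(protein.upper())]


# Hypothetical toy sequence with the motif starting at position 3.
toy_protein = "AA" + "GAAYAAAAYGAAAAAN" + "AA"
toy_matches = find_motif(toy_protein)
```

A candidate sequence lacking the motif simply yields an empty list, which makes the function usable as a quick filter before any more detailed comparison.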
- the MPG is obtained from a species selected from Table A.
- the MPG comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference MPG.
- the wild type or reference MPG is human MPG (SEQ ID NO: 9) or an MPG obtained from a species selected from Table A or any MPG as set forth in Table D, or a homolog or mutant thereof (e.g., comprising an amino acid sequence of SEQ ID NO: 5, 6, or 7), or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG) (e.g., SEQ ID NO: 1).
- the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
- the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
- the amino acid mutation confers an ability to excise a base on the MPG.
- the base is guanine.
- the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control MPG without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
- the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
- the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of G163, N169, D175, C178, S198, K202, G203, S206, K210, and/or Q294 of the wild type or reference MPG, wherein the position is numbered according to SEQ ID NO: 1.
- the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of N169, D175, C178, and/or Q294 of the wild type or reference MPG, wherein the position is numbered according to SEQ ID NO:1.
- the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
- the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference MPG.
- the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as, Glycine (Gly/G) , Alanine (Ala/A) , Valine (Val/V) , Cysteine (Cys/C) , Proline (Pro/P) , Leucine (Leu/L) , Isoleucine (Ile/I) , Methionine (Met/M) , Tryptophan (Trp/W) , Phenylalanine (Phe/F) ; (2) a polar amino acid residue (such as, Serine (Ser/S) , Threonine (Thr/T) , Tyrosine (Tyr/Y) , Asparagine (Asn/N) , Glutamine (Gln/Q) ) ; (3) a positively charged amino acid residue (such as, Lysine (Lys/K) , Arginine (Arg/R) , Histidine (His/
- the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1.
- the MPG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference MPG.
- the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of N169G, D175R, C178N, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1.
- the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
- the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of N169G, D175R, C178N, and Q294R, wherein the position is numbered according to SEQ ID NO: 1.
- the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
- the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R, wherein the position is numbered according to SEQ ID NO: 1.
- the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 1 or 9.
- the MPG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 8, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, or 38.
- the MPG is (substantially) capable of excising guanine of dG. In some embodiments, the MPG is (substantially) incapable of excising thymine of dT. In some embodiments, the MPG is (substantially) incapable of excising cytosine of dC. In some embodiments, the MPG is (substantially) incapable of excising adenine of dA.
- the disclosure provides a fusion protein comprising the MPG described herein, or of the disclosure, and a functional domain, such as, a napDNAbd.
- the disclosure provides use of the MPG described herein, or of the disclosure, for base editing as described herein.
- the MPG is not wild type or naturally-occurring.
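The substitution notation (e.g., N169G) and the identity thresholds recited above can be illustrated with a minimal Python sketch. The helper names `apply_substitutions` and `percent_identity` and the 300-residue toy sequence are hypothetical and are not SEQ ID NO: 1 or any other sequence of the disclosure. Identity is computed here by a simple position-by-position comparison of equal-length sequences, which is only one of several ways sequence identity may be determined.

```python
import re

def apply_substitutions(reference, substitutions):
    """Apply substitutions such as 'N169G' (1-based positions) to a reference sequence."""
    residues = list(reference)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {sub}")
        if residues[position - 1] != original:
            raise ValueError(f"reference residue mismatch for {sub}")
        residues[position - 1] = replacement
    return "".join(residues)

def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Hypothetical toy reference: alanine filler with N, D, C, Q placed at the recited positions.
toy = list("A" * 300)
for pos, res in {169: "N", 175: "D", 178: "C", 294: "Q"}.items():
    toy[pos - 1] = res
toy_reference = "".join(toy)

variant = apply_substitutions(toy_reference, ["N169G", "D175R", "C178N", "Q294R"])
identity = percent_identity(toy_reference, variant)
print(f"{identity:.2f}% identity to the toy reference")
```

With four substitutions in a 300-residue toy sequence, the variant stays above the 60% threshold and below 100% identity to its reference.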
- the disclosure provides an UNG described herein, or of the disclosure.
- the UNG comprises a motif GQDPYH. In some embodiments, the UNG is obtained from a species selected from Table C. In some embodiments, the UNG comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference UNG. In some embodiments, the wild type or reference UNG is human UNG1 (SEQ ID NO: 54) or human UNG2 (SEQ ID NO: 133) or an UNG obtained from a species selected from Table C or any UNG as set forth in Table D or a homolog or mutant thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG). In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 56, 58, 135, or 137.
- the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
- residues Y156 and N213 of human UNG2 (SEQ ID NO: 133) correspond to residues Y147 and N204 of human UNG1 (SEQ ID NO: 54), respectively.
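The correspondence stated above implies an offset of 9 between UNG2 and UNG1 numbering for these two residues (156 - 147 = 213 - 204 = 9). A minimal Python sketch of that conversion follows. The function name is hypothetical, and applying the same offset to positions other than the two stated ones is an assumption made here for illustration only.

```python
# Offset derived from the two stated pairs: Y156 -> Y147 and N213 -> N204.
UNG2_TO_UNG1_OFFSET = 9

def ung2_to_ung1(residue_label):
    """Convert a UNG2 residue label such as 'Y156' to the corresponding UNG1 label."""
    residue, position = residue_label[0], int(residue_label[1:])
    mapped = position - UNG2_TO_UNG1_OFFSET
    if mapped < 1:
        raise ValueError(f"{residue_label} has no counterpart under this fixed offset")
    return f"{residue}{mapped}"

for label in ("Y156", "N213"):
    print(label, "->", ung2_to_ung1(label))
```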
- the amino acid mutation confers an ability to excise a base on the UNG.
- the base is thymine.
- the base is cytosine.
- the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control UNG without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
- the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
- the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of Y156, K184, N213, A214, Q259, and/or Y284 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of K184, A214, Q259, and/or Y284 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135 or 137.
- the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference UNG.
- the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as Glycine (Gly/G), Alanine (Ala/A), Valine (Val/V), Cysteine (Cys/C), Proline (Pro/P), Leucine (Leu/L), Isoleucine (Ile/I), Methionine (Met/M), Tryptophan (Trp/W), Phenylalanine (Phe/F)); (2) a polar amino acid residue (such as Serine (Ser/S), Threonine (Thr/T), Tyrosine (Tyr/Y), Asparagine (Asn/N), Glutamine (Gln/Q)); (3) a positively charged amino acid residue (such as Lysine (Lys/K), Arginine (Arg/R), Histidine (His/
- the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of Y156A, K184A, N213D, A214V, A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133.
- the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133.
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
- the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, wherein the position is numbered according to SEQ ID NO: 133.
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
- the amino acid mutation comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-65, 1-66, 1-67, 1-68, 1-69, 1-70, 1-71, 1-72, 1-73, 1-74, 1-75, 1-76, 1-77, 1-78, 1-79, 1-80, 1-81, 1-82, 1-83, 1-84, 1-85, 1-86, 1-87, 1-88, 1-89, 1-90, 1-91, 1-92, 1-93, 1-94, 1-95, 1-96, 1-97, 1-98, 1-99, or 1-100 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
- the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
- the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference UNG.
- the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of A214T, Q259A, and Y284D, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
- the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of K184A and A214V, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
- the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 56, 58, 60, 62, 135, 137, 139, 141, 144, 146, 148, 150, 152, 155, 157, and 159 or an N-terminal truncation thereof lacking the most N-terminal Methionine (M) (coded by start codon ATG).
- the UNG is (substantially) capable of excising thymine of dT. In some embodiments, the UNG is (substantially) capable of excising cytosine of dC. In some embodiments, the UNG is (substantially) incapable of excising thymine of dT. In some embodiments, the UNG is (substantially) incapable of excising cytosine of dC. In some embodiments, the UNG is (substantially) incapable of excising adenine of dA. In some embodiments, the UNG is (substantially) incapable of excising guanine of dG.
- the disclosure provides a fusion protein comprising the UNG described herein, or of the disclosure, and a functional domain, such as, a napDNAbd.
- the disclosure provides use of the UNG described herein, or of the disclosure, for base editing as described herein.
- the UNG is not wild type or naturally-occurring.
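The N-terminal deletions recited above (e.g., positions 1-88) shift the index of every remaining residue, while the substitutions remain numbered according to the full-length reference. A minimal Python sketch of that bookkeeping follows. The function names and the 300-residue placeholder sequence are hypothetical and do not represent SEQ ID NO: 133 or any other sequence of the disclosure.

```python
def delete_n_terminal(sequence, last_deleted_position):
    """Remove residues 1..last_deleted_position (1-based, inclusive) from the N-terminus."""
    if not 0 <= last_deleted_position < len(sequence):
        raise ValueError("deletion must leave at least one residue")
    return sequence[last_deleted_position:]

def renumber_after_deletion(position, last_deleted_position):
    """Map a full-length (reference) position to its 1-based index in the truncated protein."""
    if position <= last_deleted_position:
        raise ValueError("position lies within the deleted region")
    return position - last_deleted_position

# Hypothetical placeholder sequence, used only to show the length change.
toy_full_length = "M" + "G" * 299
truncated = delete_n_terminal(toy_full_length, 88)

for reference_position in (214, 259, 284):
    print(reference_position, "->", renumber_after_deletion(reference_position, 88))
```

In this sketch, a substitution written as A214T against the full-length numbering falls at index 126 of the protein lacking positions 1-88.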
- the disclosure provides a TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 described herein, or of the disclosure.
- the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1.
- the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is human TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 (SEQ ID NO: 64, 65, 66, 67, 68, 69, 70, 71, or 72, respectively) or a homolog or mutant thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG).
- the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 64-72, respectively.
- the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
- the amino acid mutation confers an ability to excise a base on the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1.
- the base is guanine, thymine, cytosine, adenine, uracil, or hypoxanthine.
- the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
- the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
- the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1.
- the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as Glycine (Gly/G), Alanine (Ala/A), Valine (Val/V), Cysteine (Cys/C), Proline (Pro/P), Leucine (Leu/L), Isoleucine (Ile/I), Methionine (Met/M), Tryptophan (Trp/W), Phenylalanine (Phe/F)); (2) a polar amino acid residue (such as Serine (Ser/S), Threonine (Thr/T), Tyrosine (Tyr/Y), Asparagine (Asn/N), Glutamine (Gln/Q)); (3) a positively charged amino acid residue (such as Lysine (Lys/K), Arginine (Arg/R), Histidine (His/
- the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1, respectively.
- the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising guanine of dG. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising thymine of dT. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising cytosine of dC. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising adenine of dA.
- the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising thymine of dT. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising cytosine of dC. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising adenine of dA. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising guanine of dG.
- the disclosure provides a fusion protein comprising the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 described herein, or of the disclosure, and a functional domain, such as, a napDNAbd.
- the disclosure provides use of the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 described herein, or of the disclosure, for base editing as described herein.
- the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is not wild type or naturally-occurring.
- Also provided in the disclosure is a polynucleotide comprising or encoding the guide nucleic acid.
- the polynucleotide comprising or encoding the guide nucleic acid is a DNA, an RNA, or a DNA/RNA mixture.
- a “DNA/RNA mixture” refers to a nucleic acid comprising both one or more modified or unmodified ribonucleotides and one or more modified or unmodified deoxyribonucleotides, whether consecutive or not.
- a “DNA” or an “RNA” may also refer to a DNA containing one or more modified or unmodified ribonucleotides, whether consecutive or not, or an RNA containing one or more modified or unmodified deoxyribonucleotides, whether consecutive or not.
- the guide nucleic acid is operably linked to or under the regulation of a promoter.
- the promoter is a ubiquitous, tissue-specific, cell-type specific, constitutive, or inducible promoter.
- Suitable promoters include, for example, a Cbh promoter, a Cba promoter, a pol I promoter, a pol II promoter, a pol III promoter, a T7 promoter, a U6 promoter, an H1 promoter, a retroviral Rous sarcoma virus LTR promoter, a cytomegalovirus (CMV) promoter, an SV40 promoter, a dihydrofolate reductase promoter, a β-actin promoter, an elongation factor 1α short (EFS) promoter, a β-glucuronidase (GUSB) promoter, a cytomegalovirus (CMV) immediate-early (Ie) enhancer and/or promoter, a chicken β-actin (CBA) promoter or derivative thereof such as a CAG promoter, CB promoter, a (human) elongation factor 1α-subunit (EF1α
- the disclosure provides a polynucleotide encoding the fusion protein of the disclosure and optionally the guide nucleic acid of the disclosure.
- the polynucleotide encoding the fusion protein is a DNA, an RNA, or a DNA/RNA mixture.
- a “DNA/RNA mixture” refers to a nucleic acid comprising both one or more modified or unmodified ribonucleotides and one or more modified or unmodified deoxyribonucleotides, whether consecutive or not.
- a “DNA” or an “RNA” may also refer to a DNA containing one or more modified or unmodified ribonucleotides, whether consecutive or not, or an RNA containing one or more modified or unmodified deoxyribonucleotides, whether consecutive or not.
- the polynucleotide encoding the napDNAbd is an mRNA.
- the polynucleotide encoding the napDNAbd comprises a sequence encoding the napDNAbd and a promoter operably linked to the sequence encoding the napDNAbd.
- the polynucleotide encoding the napDNAbd is operably linked to or under the regulation of a promoter.
- the promoter is a ubiquitous, tissue-specific, cell-type specific, constitutive, or inducible promoter.
- Suitable promoters include, for example, a Cbh promoter, a Cba promoter, a pol I promoter, a pol II promoter, a pol III promoter, a T7 promoter, a U6 promoter, an H1 promoter, a retroviral Rous sarcoma virus LTR promoter, a cytomegalovirus (CMV) promoter, an SV40 promoter, a dihydrofolate reductase promoter, a β-actin promoter, an elongation factor 1α short (EFS) promoter, a β-glucuronidase (GUSB) promoter, a cytomegalovirus (CMV) immediate-early (Ie) enhancer and/or promoter, a chicken β-actin (CBA) promoter or derivative thereof such as a CAG promoter, CB promoter, a (human) elongation factor 1α-subunit (EF1α
- the disclosure provides a delivery system comprising (1) the fusion protein of the disclosure, the polynucleotide of the disclosure, or the system of the disclosure; and (2) a delivery vehicle.
- the disclosure provides a vector comprising the polynucleotide of the disclosure.
- the vector encodes a guide nucleic acid of the disclosure.
- the vector is a plasmid vector, a recombinant AAV (rAAV) vector (vector genome), or a recombinant lentivirus vector.
- the disclosure provides a recombinant AAV (rAAV) particle comprising the rAAV vector genome of the disclosure.
- a simple introduction to AAV for delivery can be found in the “Adeno-associated Virus (AAV) Guide” (addgene.org/guides/aav/).
- Adeno-associated virus, when engineered to deliver, e.g., a protein-encoding sequence of interest, may be termed a (r)AAV vector, a (r)AAV vector particle, or a (r)AAV particle, where “r” stands for “recombinant”.
- the genome packaged in AAV vectors for delivery may be termed a (r)AAV vector genome, vector genome, or vg for short, while viral genome may refer to the original viral genome of natural AAVs.
- the serotypes of the capsids of rAAV particles can be matched to the types of target cells.
- Table 2 of WO2018002719A1 lists exemplary cell types that can be transduced by the indicated AAV serotypes (incorporated herein by reference).
- the rAAV particle comprises a capsid with a serotype suitable for delivery into a desired target cell.
- the rAAV particle comprises a capsid with a serotype of AAV1, AAV2, AAV3A, AAV3B, AAV4, AAV5, AAV6, AAV7, AAVrh74, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV-DJ, or AAV.PHP.eB, a member of the clade to which any of AAV1-AAV13 belongs, or a functional variant (e.g., a functional truncation) thereof, encapsidating the rAAV vector genome.
- the serotype of the capsid is wild type serotype or a functional variant thereof.
- rAAV particles may be produced using the triple transfection method (described in detail in U.S. Pat. No. 6,001,650).
- the vector titers are usually expressed as vector genomes per ml (vg/ml) .
- the vector titer is above 1 × 10^9, above 5 × 10^10, above 1 × 10^11, above 5 × 10^11, above 1 × 10^12, above 5 × 10^12, or above 1 × 10^13 vg/ml.
- systems and methods for packaging an RNA sequence as a vector genome into a rAAV particle have recently been developed and are applicable herein. See PCT/CN2022/075366, which is incorporated herein by reference in its entirety.
- sequence elements described herein for DNA vector genomes when present in RNA vector genomes, should generally be considered to be applicable for the RNA vector genomes except that the deoxyribonucleotides in the DNA sequence are the corresponding ribonucleotides in the RNA sequence (e.g., dT is equivalent to U, and dA is equivalent to A) and/or the element in the DNA sequence is replaced with the corresponding element with a corresponding function in the RNA sequence or omitted because its function is unnecessary in the RNA sequence and/or an additional element necessary for the RNA vector genome is introduced.
- a coding sequence, e.g., as a sequence element of rAAV vector genomes herein, is to be construed as covering both a DNA coding sequence and an RNA coding sequence.
- an RNA sequence can be transcribed from the DNA coding sequence, and optionally further a protein can be translated from the transcribed RNA sequence as necessary.
- the RNA coding sequence per se can be a functional RNA sequence for use, or an RNA sequence can be produced from the RNA coding sequence, e.g., by RNA processing, or a protein can be translated from the RNA coding sequence.
- a fusion protein coding sequence encoding a fusion protein covers either a fusion protein DNA coding sequence from which a fusion protein is expressed (indirectly via transcription and translation) or a fusion protein RNA coding sequence from which a fusion protein is translated (directly).
- a gRNA coding sequence encoding a gRNA covers either a gRNA DNA coding sequence from which a gRNA is transcribed or a gRNA RNA coding sequence (1) which per se is the functional gRNA for use, or (2) from which a gRNA is produced, e.g., by RNA processing.
- 5’-ITR and/or 3’-ITR as DNA packaging signals may be unnecessary and can be omitted at least partly, while RNA packaging signals can be introduced.
- a promoter to drive transcription of DNA sequences may be unnecessary and can be omitted at least partly.
- a sequence encoding a polyA signal may be unnecessary and can be omitted at least partly, while a polyA tail can be introduced.
- DNA elements of rAAV DNA vector genomes can be either omitted or replaced with corresponding RNA elements and/or additional RNA elements can be introduced, in order to adapt to the strategy of delivering an RNA vector genome by rAAV particles.
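The nucleotide-level equivalence described above for RNA vector genomes (dT is equivalent to U, dA is equivalent to A, and likewise for the other bases) can be sketched in Python as follows. The function name and the example sequence are hypothetical. The sketch covers only the letter-for-letter mapping and not the omission or replacement of elements such as ITRs, promoters, or polyA signals.

```python
def dna_to_rna(dna_sequence):
    """Return the RNA-letter equivalent of a DNA sequence (T -> U; A, C, G unchanged)."""
    sequence = dna_sequence.upper()
    unexpected = set(sequence) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return sequence.replace("T", "U")

# Hypothetical six-nucleotide example, not taken from the disclosure.
print(dna_to_rna("ATGCTT"))
```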
- the disclosure provides a complex comprising the fusion protein or the polynucleotide (e.g., an mRNA) encoding the fusion protein of the disclosure and a guide nucleic acid (e.g., a gRNA) of the disclosure.
- the disclosure provides a ribonucleoprotein (RNP) comprising the fusion protein of the disclosure and a guide nucleic acid of the disclosure.
- the disclosure provides a lipid nanoparticle (LNP) comprising an RNA (e.g., mRNA) encoding the fusion protein of the disclosure and a guide nucleic acid of the disclosure.
- the disclosure provides a pharmaceutical composition comprising (1) the system of the disclosure, the vector of the disclosure, the ribonucleoprotein of the disclosure, the lipid nanoparticle of the disclosure, or the cell of the disclosure; and (2) a pharmaceutically acceptable excipient.
- the pharmaceutical composition comprises the rAAV particle in a concentration selected from the group consisting of about 1 × 10^10 vg/mL, 2 × 10^10 vg/mL, 3 × 10^10 vg/mL, 4 × 10^10 vg/mL, 5 × 10^10 vg/mL, 6 × 10^10 vg/mL, 7 × 10^10 vg/mL, 8 × 10^10 vg/mL, 9 × 10^10 vg/mL, 1 × 10^11 vg/mL, 2 × 10^11 vg/mL, 3 × 10^11 vg/mL, 4 × 10^11 vg/mL, 5 × 10^11 vg/mL, 6 × 10^11 vg/mL, 7 × 10^11 vg/mL, 8 × 10^11 vg/mL, 9 × 10^11 vg/mL, 1 × 10^12 vg/mL, 2 × 10^12 vg/mL, 3 × 10^12 vg/
- the volume of the injection is selected from the group consisting of about 1 microliter, 10 microliters, 50 microliters, 100 microliters, 150 microliters, 200 microliters, 250 microliters, 300 microliters, 350 microliters, 400 microliters, 450 microliters, 500 microliters, 550 microliters, 600 microliters, 650 microliters, 700 microliters, 750 microliters, 800 microliters, 850 microliters, 900 microliters, 950 microliters, 1000 microliters, and a volume of a numerical range between any of two preceding values, e.g., a volume of from about 10 microliters to about 750 microliters.
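The concentration (vg/mL) and the injection volume (microliters) recited above together determine the total number of vector genomes delivered. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, and the example pairing of 1 × 10^12 vg/mL with 100 microliters is chosen for illustration only and is not a dose stated in the disclosure.

```python
def total_vector_genomes(titer_vg_per_ml, volume_microliters):
    """Total vector genomes = concentration (vg/mL) x volume (mL)."""
    if titer_vg_per_ml < 0 or volume_microliters < 0:
        raise ValueError("titer and volume must be non-negative")
    volume_ml = volume_microliters / 1000.0  # 1 mL = 1000 microliters
    return titer_vg_per_ml * volume_ml

# Illustrative pairing only: 1e12 vg/mL injected in 100 microliters.
dose = total_vector_genomes(1e12, 100)
print(f"{dose:.1e} vg")
```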
- the methods of the disclosure can be used to introduce the systems of the disclosure into a cell and cause the cell to alter the production of one or more cellular products, such as antibody, starch, ethanol, or any other desired products. Such cells and progenies thereof are within the scope of the disclosure.
- the disclosure provides a cell or a progeny thereof comprising the system of the disclosure.
- the cell is a eukaryote.
- the cell is a human cell.
- the disclosure provides a cell or a progeny thereof modified by the system of the disclosure or the method of the disclosure.
- the cell is a eukaryote.
- the cell is a human cell.
- the cell is modified in vitro, in vivo, or ex vivo.
- the cell is a stem cell. In some embodiments, the cell is not a human embryonic stem cell. In some embodiments, the cell is not a human germ cell.
- the cell is a prokaryotic cell.
- the cell is a eukaryotic cell (e.g., an animal cell, a vertebrate cell, a mammalian cell, a non-human mammalian cell, a non-human primate cell, a rodent (e.g., mouse or rat) cell, a human cell, a plant cell, or a yeast cell) or a prokaryotic cell (e.g., a bacterial cell).
- the cell is from a plant or an animal.
- the plant is a dicotyledon.
- the dicotyledon is selected from the group consisting of soybean, cabbage (e.g., Chinese cabbage), rapeseed, brassica, watermelon, melon, potato, tomato, tobacco, eggplant, pepper, cucumber, cotton, alfalfa, and grape.
- the plant is a monocotyledon.
- the monocotyledon is selected from the group consisting of rice, corn, wheat, barley, oat, sorghum, millet, grasses, Poaceae, Zizania, Avena, Coix, Hordeum, Oryza, Panicum (e.g., Panicum miliaceum), Secale, Setaria (e.g., Setaria italica), Sorghum, Triticum, Zea, Cymbopogon, Saccharum (e.g., Saccharum officinarum), Phyllostachys, Dendrocalamus, Bambusa, and Yushania.
- the animal is selected from the group consisting of pig, ox, sheep, goat, mouse, rat, alpaca, monkey, rabbit, chicken, duck, goose, and fish (e.g., zebra fish).
- the cell is a eukaryotic cell, such as a mammalian cell, including a human cell (a primary human cell or an established human cell line).
- the cell is a non-human mammalian cell, such as a cell from a non-human primate (e.g., monkey), a cow/bull/cattle, sheep, goat, pig, horse, dog, cat, rodent (such as rabbit, mouse, rat, hamster, etc.).
- the cell is from fish (such as salmon), bird (such as poultry bird, including chick, duck, goose), reptile, shellfish (e.g., oyster, clam, lobster, shrimp), insect, worm, yeast, etc.
- the cell is from a plant, such as monocot or dicot.
- the plant is a food crop such as barley, cassava, cotton, groundnuts or peanuts, maize, millet, oil palm fruit, potatoes, pulses, rapeseed or canola, rice, rye, sorghum, soybeans, sugar cane, sugar beets, sunflower, and wheat.
- the plant is a cereal (barley, maize, millet, rice, rye, sorghum, and wheat).
- the plant is a tuber (cassava and potatoes).
- the plant is a sugar crop (sugar beets and sugar cane).
- the plant is an oil-bearing crop (soybeans, groundnuts or peanuts, rapeseed or canola, sunflower, and oil palm fruit).
- the plant is a fiber crop (cotton).
- the plant is a tree (such as a peach or a nectarine tree, an apple or pear tree, a nut tree such as almond or walnut or pistachio tree, or a citrus tree, e.g., orange, grapefruit or lemon tree), a grass, a vegetable, a fruit, or an algae.
- the plant is a nightshade plant; a plant of the genus Brassica; a plant of the genus Lactuca; a plant of the genus Spinacia; a plant of the genus Capsicum;
- the cell is not within the body of an organism, such as, human or animal. In some embodiments, the cell is not a human embryonic stem cell. In some embodiments, the cell is not a human germ cell.
- the disclosure provides a method for modifying a target dsDNA, comprising contacting the target DNA with the system of the disclosure, wherein the guide sequence is capable of hybridizing to a target sequence of the target dsDNA, wherein the target dsDNA is modified by the complex.
- the disclosure provides a method for diagnosing, preventing, and/or treating a disease in a subject in need thereof, comprising administering to the subject (e.g., a therapeutically effective amount/dose of) the system of the disclosure, the vector of the disclosure, the ribonucleoprotein of the disclosure, the lipid nanoparticle of the disclosure, the cell of the disclosure, or the pharmaceutical composition of the disclosure, wherein the disease is associated with a target dsDNA, wherein the guide sequence is capable of hybridizing to a target sequence of the target dsDNA, wherein the target dsDNA is modified by the complex, and wherein the modification of the target dsDNA diagnoses, prevents, and/or treats the disease.
- the disease is selected from the group consisting of Angelman syndrome (AS) , Alzheimer's disease (AD) , transthyretin amyloidosis (ATTR) , transthyretin amyloid cardiomyopathy (ATTR-CM) , cystic fibrosis (CF) , hereditary angioedema, diabetes, progressive pseudohypertrophic muscular dystrophy, Duchenne muscular dystrophy (DMD) , Becker muscular dystrophy (BMD) , spinal muscular atrophy (SMA) , alpha-1-antitrypsin deficiency, Pompe disease, myotonic dystrophy, Huntington’s disease (HTT) , fragile X syndrome, Friedreich ataxia, amyotrophic lateral sclerosis (ALS) , frontotemporal dementia, hereditary chronic kidney disease, hyperlipidemia, Leber congenital amaurosis (LCA) , sickle cell disease, thalassemia (e.g., ⁇ -thalassemia)
- the target dsDNA encodes a mRNA, a tRNA, a ribosomal RNA (rRNA) , a microRNA (miRNA) , a non-coding RNA, a long non-coding (lnc) RNA, a nuclear RNA, an interfering RNA (iRNA) , a small interfering RNA (siRNA) , a ribozyme, a riboswitch, a satellite RNA, a microswitch, a microzyme, or a viral RNA.
- the target dsDNA is a eukaryotic DNA.
- the eukaryotic DNA is a mammal DNA, such as a non-human mammalian DNA, a non-human primate DNA, a human DNA, a plant DNA, an insect DNA, a bird DNA, a reptile DNA, a rodent (e.g., mouse, rat) DNA, a fish DNA, a nematode DNA, or a yeast DNA.
- the target dsDNA is in a eukaryotic cell, for example, a human cell, a non-human primate cell, or a mouse cell.
- the administrating comprises local administration or systemic administration.
- the administrating comprises intrathecal administration, intramuscular administration, intravenous administration, transdermal administration, intranasal administration, oral administration, mucosal administration, intraperitoneal administration, intracranial administration, intracerebroventricular administration, or stereotaxic administration.
- the administration is injection or infusion.
- the subject is a human, a non-human primate, or a mouse.
- the level of the transcript (e.g., mRNA) of the target dsDNA is decreased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the transcript (e.g., mRNA) of the target dsDNA in the subject prior to the administration.
- the level of the transcript (e.g., mRNA) of the target dsDNA is increased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the transcript (e.g., mRNA) of the target dsDNA in the subject prior to the administration.
- the level of the expression product (e.g., protein) of the target dsDNA is decreased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the expression product (e.g., protein) of the target dsDNA in the subject prior to the administration.
- the level of the expression product (e.g., protein) of the target dsDNA is increased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the expression product (e.g., protein) of the target dsDNA in the subject prior to the administration.
- the expression product is a functional mutant of the expression product of the target dsDNA.
- the median survival of the subject suffering from the disease but receiving the administration is 5 days, 10 days, 20 days, 30 days, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, 12 months, 1.5 year, 2 years, 2.5 years, 3 years, 4 years, 5 years, 6 years, 7 years, 8 years, 9 years, 10 years or more longer than that of a subject or a population of subjects suffering from the disease and not receiving the administration.
- the therapeutically effective dose may be either via a single dose, or multiple doses.
- the actual dose may vary greatly depending upon a variety of factors, such as the vector choices, the target cells, organisms, tissues, the general conditions of the subject to be treated, the degrees of transformation/modification sought, the administration routes, the administration modes, the types of transformation/modification sought, etc.
- the therapeutically effective dose of the rAAV particle may be about 1.0E+8, 2.0E+8, 3.0E+8, 4.0E+8, 6.0E+8, 8.0E+8, 1.0E+9, 2.0E+9, 3.0E+9, 4.0E+9, 6.0E+9, 8.0E+9, 1.0E+10, 2.0E+10, 3.0E+10, 4.0E+10, 6.0E+10, 8.0E+10, 1.0E+11, 2.0E+11, 3.0E+11, 4.0E+11, 6.0E+11, 8.0E+11, 1.0E+12, 2.0E+12, 3.0E+12, 4.0E+12, 6.0E+12, 8.0E+12, 1.0E+13, 2.0E+13, 3.0E+13, 4.0E+13, 6.0E+13, 8.0E+13, 1.0E+14, 2.0E+14, 3.0E+14, 4.0E+14, 6.0E+14, 8.0E+14, 1.0E+15, 2.0E+15, 2.0
- the disclosure provides a method of detecting a target dsDNA, comprising contacting the target dsDNA with the system of the disclosure, wherein the target dsDNA is modified by the complex, and wherein the modification detects the target dsDNA.
- the modification generates a detectable signal, e.g., a fluorescent signal.
- the disclosure provides a kit comprising the fusion protein of the disclosure, the system of the disclosure, the polynucleotide of the disclosure, the vector of the disclosure, the RNP of the disclosure, the LNP of the disclosure, the delivery system of the disclosure, the cell of the disclosure, or the pharmaceutical composition of the disclosure, or any one, two, or all components of the same.
- the kit further comprises an instruction to use the component (s) contained therein, and/or instructions for combining with additional component (s) that may be available or necessary elsewhere.
- the kit further comprises one or more buffers that may be used to dissolve any of the component (s) contained therein, and/or to provide suitable reaction conditions for one or more of the component (s) .
- buffers may include one or more of PBS, HEPES, Tris, MOPS, Na2CO3, NaHCO3, NaB, or combinations thereof.
- the reaction condition includes a proper pH, such as a basic pH. In some embodiments, the pH is between 7-10.
- any one or more of the kit components may be stored in a suitable container or at a suitable temperature, e.g., 4 degrees Celsius.
- the G-to-T reporter was to be used for evaluating the guanine base editing efficiency of the gBE (for this purpose, termed as glycosylase-based guanine base editor (gGBE) ) , and the A-to-T reporter for evaluating the adenine base editing efficiency of the gBE (for this purpose, termed as glycosylase-based adenine base editor (gABE) ) .
- gBE expression vectors with wild type human MPG (SEQ ID NO: 1) (without first N-terminal Methionine (M) as compared to the full length wild type human MPG (SEQ ID NO: 9) ) or a distinctive version of human MPG mutants (i.e., MPGv0.2 (SEQ ID NO: 4) , MPGv1 (SEQ ID NO: 5) , MPGv2 (SEQ ID NO: 6) , and MPGv3 (SEQ ID NO: 7) ) that were reported in a previous study [11] were constructed.
- conversion activity refers to the activity of the gBE of the disclosure to convert a target deoxyribonucleotide to an outcome deoxyribonucleotide, and the outcome deoxyribonucleotide may be or may not be specified as a specific type of deoxyribonucleotide, e.g., G-to-T.
- (base) editing efficiency refers to the activity of the gBE of the disclosure to convert a target deoxyribonucleotide to an outcome deoxyribonucleotide, and the outcome deoxyribonucleotide may be or may not be specified as a specific type of deoxyribonucleotide, e.g., G-to-T.
- when both the outcome deoxyribonucleotides for conversion activity and (base) editing efficiency are not specified, or when both are specified as the same one or more specific types of deoxyribonucleotide, the two terms may refer to the same performance of the gBE of the disclosure and can be used interchangeably.
- the gBE (hereafter referred to as gGBEv3) (SEQ ID NO: 15) containing MPGv3 (SEQ ID NO: 7) exhibited the highest G-to-T base editing efficiency (4.33%) in cultured HEK293T cells (FIG. 1c) as compared to that of the gGBE (SEQ ID NO: 14) with WT hMPG (SEQ ID NO: 1) (0.03%) , showing a striking 144-fold enhancement in G base editing efficiency.
- gGBEv3 was mutated with G174R or D175R, generating new base editors gGBEv3.1 (SEQ ID NO: 17) (containing MPGv3 carrying additional substitution G174R; termed as MPGv3.1 (SEQ ID NO: 16) ) and gGBEv3.2 (SEQ ID NO: 19) (containing MPGv3 carrying additional substitution D175R; termed as MPGv3.2 (SEQ ID NO: 18) ) (FIG. 2a) . It was found that the G-to-T conversion activity of gGBEv3.2 (10.22%) was about 1.78-fold of gGBEv3 (5.73%) (FIG. 2b and FIG. 11b) .
- gGBEv4 (SEQ ID NO: 23) (containing MPGv3 carrying additional substitutions of both D175R and C178N; termed as MPGv4 (SEQ ID NO: 22)) (39.57%) achieved a synergistic enhancement of G-to-T editing efficiency of about 6.9-fold compared to gGBEv3 (5.73%) (FIG. 2b and FIG. 11b).
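The fold changes quoted for the successive gGBE versions follow directly from the reported efficiencies. A minimal sketch of that arithmetic, using only percentages stated in the text; the function name `fold_change` and the dictionary labels are illustrative and not from the source.

```python
def fold_change(edited_pct, baseline_pct):
    """Ratio of two editing efficiencies, both given in percent."""
    if baseline_pct <= 0:
        raise ValueError("baseline efficiency must be positive")
    return edited_pct / baseline_pct


# (new efficiency %, reference efficiency %) as reported in the text.
reported = {
    "gGBEv3 vs. gGBE with WT hMPG": (4.33, 0.03),
    "gGBEv3.2 vs. gGBEv3": (10.22, 5.73),
    "gGBEv4 vs. gGBEv3": (39.57, 5.73),
}

for label, (new, old) in reported.items():
    print(f"{label}: {fold_change(new, old):.2f}-fold")
```

The ratios round to the 144-fold, 1.78-fold, and 6.9-fold figures given in the text.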
- the positions of nCas9 and MPG were changed to see whether the editing efficiency would be associated with the positional relationship of nCas9 and MPG. It was found that gGBEv4 with MPG fused at the C-terminus of nCas9 had slightly higher editing efficiency than gGBEv4 with MPG fused at the N-terminus of nCas9 (34.6% vs. 25.9%, FIG. 11d).
- the R163-V179 region of MPGv4 was mutated by sequential replacement with amino acids having distinct properties, including glutamic acid (with negative charged side chain) , valine (with small hydrophobic side chain) , glycine (with no side chain) , or tyrosine (with large hydrophobic side chain) (X-to-E, V, G, or Y) .
- gGBEv4.1 (SEQ ID NO: 25) (containing MPGv4 carrying additional substitution I170V; termed as MPGv4.1 (SEQ ID NO: 24) )
- gGBEv4.2 (SEQ ID NO: 27) (containing MPGv4 carrying additional substitution S169G (or N169G if compared with WT MPG) ; termed as MPGv4.2 (SEQ ID NO: 26) )
- gGBEv4.3 SEQ ID NO: 29
- MPGv4.3 (SEQ ID NO: 28)
- amino acid sequences of gGBEv5.1, v5.2, v6.1, v6.2, and v6.4 are set forth in SEQ ID NOs: 31, 33, 35, 37, and 39, respectively, and the corresponding MPGv5.1, v5.2, v6.1, v6.2, and v6.4 are set forth in SEQ ID NOs: 30, 32, 34, 36, and 38, respectively.
- the enhancement of G editing efficiency of the gGBEs (v3, v4, v4.2, v6.3, and gGBE with WT MPG) obtained above was next validated at two endogenous genomic sites in cultured HEK293T cells.
- the cells were transfected with a construct encoding each gGBE, together with mCherry and sgRNA that targeted site 3 or site 10, and mCherry + cells were FACS-sorted for target deep sequencing analysis.
- a gradual elevation of overall G editing efficiency was obtained at G7, from 6.4% to 78.5% for site 3 and from 7.5% to 80.3% for site 10 (FIG. 2c and 2d), confirming that gGBEv6.3 was indeed the best version of gGBE.
- the engineered gGBEv6.3 (SEQ ID NO: 12) (carrying G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R mutations on MPGv6.3 (SEQ ID NO: 8) ) had the highest G editing efficiency and was used in the following studies.
- the guide sequence-dependent off-target base editing efficiency of gGBEv6.3 was analyzed at several previously reported [11, 35] and in silico-predicted [37] guide sequence-dependent off-target sites, and the ability of gGBEv6.3 to mediate guide sequence-independent off-target DNA editing was characterized using orthogonal R-loop assay in five dSaCas9 R-loops, as reported in previous studies [11, 35] . Similar or lower percentage of editing at the guide sequence-dependent off-target loci (FIG. 3d and FIG. 15a) was found, as compared with that of adenine base editors found previously [11, 35] .
- gGBEv6.3 was also tested with A-to-T reporter, C-to-G reporter, and T-to-G reporter, and the editing efficiency was 0.68%, 0.58%, and 1.81%, respectively, demonstrating its specificity for G editing, which is desired for targeted base editing application and reduced unwanted off-target editing.
- the G-to-Y conversion ability of gGBE allows for a variety of gene-editing applications, including editing splicing sites, introduction of premature termination codons (PTCs) , as well as editing that bypasses PTCs (FIG. 4a) .
- the inactive splicing acceptor (SA) signal with disruptive point mutations, exemplified by the intron-split EGFP reporter system used above, could be remediated with gGBE (FIG. 1b).
- gGBE could be used for disrupting the splicing signal by converting G within a splicing donor site ( “GT” ) or splicing acceptor site ( “AG” ) to other bases, resulting in exon skipping.
- the splice acceptor site of DMD (Duchenne muscular dystrophy) exon 45 was edited with gGBEv6.3, and a high efficiency of G editing (up to 30.3%) was achieved with a high G-to-Y ratio (up to 0.88) when targeting DMD site 1 (FIG. 4b, c and FIG. 16a) .
- Another application of gGBE is to introduce PTCs to disrupt gene expression by converting TCA, TAC, or GAA codon into a stop codon TGA, TAG, or TAA.
- a PTC by GAA-to-TAA conversion could be introduced only by using gGBEv6.3; no other current base editor could induce this type of PTC.
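The codon conversions named above (TCA, TAC, or GAA into TGA, TAG, or TAA) can be checked by enumerating single G edits. A minimal sketch, assuming the edited G may sit on either DNA strand, so that a G-to-C or G-to-T edit on the opposite strand reads as C-to-G or C-to-A on the coding strand; the function name is hypothetical and the enumeration is an illustration, not the authors' design procedure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def g_edit_stop_outcomes(codon):
    """Stop codons reachable from `codon` by one G-to-C or G-to-T edit.

    A G on the coding strand is replaced by C or T. A C on the coding
    strand pairs with a G on the opposite strand; editing that G to C or T
    reads as C-to-G or C-to-A on the coding strand (assumption, see lead-in).
    """
    outcomes = set()
    for i, base in enumerate(codon.upper()):
        if base == "G":
            replacements = "CT"   # edited G is on the coding strand
        elif base == "C":
            replacements = "GA"   # edited G is on the opposite strand
        else:
            continue
        for new_base in replacements:
            edited = codon[:i].upper() + new_base + codon[i + 1:].upper()
            if edited in STOP_CODONS:
                outcomes.add(edited)
    return outcomes


for codon in ("TCA", "TAC", "GAA"):
    print(codon, "->", sorted(g_edit_stop_outcomes(codon)))
```

Under these assumptions GAA can only become TAA, and only through a G-to-T change on the coding strand, which matches the statement that this PTC type depends on direct G editing.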
- By targeting three sites in the mouse Tyr (Tyrosinase, associated with coat color) gene with gGBEv6.3 to create PTCs (FIG. 4d) in cultured N2a cells, a high efficiency of G editing (up to 46.3%) was achieved with a high G-to-Y ratio (up to 0.95) (FIG. 4e and FIG. 16b).
- gGBEv6.3-encoding mRNA and Tyr-targeting sgRNA were co-injected into mouse zygotes of C57BL6 background (black coat color) , with 20 mouse embryos used for each of the three Tyr-targeting sgRNAs.
- Highly efficient G editing (FIG. 17a) was found for two of the three sgRNAs, with an average of 50.9% PTC introduction efficiency when targeting the Tyr site 3 (FIG. 17b-c).
- gGBE induced very few indels in mouse embryos (FIG. 17d-e) .
- Two major classes of deaminase-based base editors (dBE), ABE and CBE, as well as their derivatives (such as AYBE and CGBE), perform base editing with deamination of A or C as the first key step [3–11].
- deaminase-free base editors were designed based on engineered MPG, and a gGBE editor that could achieve highly efficient G-to-C and G-to-T conversion in both cultured human cells and mouse embryos was generated.
- the engineered MPG demonstrates that DNA glycosylases could be engineered into proteins that selectively excise a specific nucleotide base, such as, G.
- the high editing efficiency of the gGBE of the disclosure could be attributed to the mutations in the MPG moiety that may facilitate its specific substrate selection or DNA-binding activity, or both.
- Base editor constructs used in this study were cloned into a mammalian expression plasmid backbone under the control of an EF1α promoter by standard molecular cloning techniques.
- Intron-split EGFP reporters were engineered as previously described [11] .
- corresponding mutations at the splice acceptor site were made to construct A-to-T reporter or G-to-T reporter via site-directed mutagenesis by PCR, respectively. Mutations at the splice acceptor site led to inactive EGFP production by non-spliced EGFP transcripts.
- BpiI-harboring MPG mutants MPG-G174R/D175R/T199R/S230R/Q294R/D295R mutants or corresponding combinations, were constructed via site-directed mutagenesis by PCR. Sequential asparagine /glutamic acid /valine /glycine /tyrosine substitutions (X-to-N, E, V, G, or Y) were designed, with oligos coding for the mutants annealed and ligated into corresponding BpiI-digested backbone vectors. The gRNA oligos were annealed and ligated into BpiI sites. Unless otherwise indicated, each and every mutation in MPG is numbered based on the full-length wild type human MPG (SEQ ID NO: 9) with the first N-terminal Met.
- Target sequencing data analysis was described previously [11] .
- the targeted amplicon sequencing reads were processed using fastp with default parameters [47] .
- the cleaned pairs were then merged using FLASH v1.2.11.
- the amplified sequences from individual targets were demultiplexed using fastx_barcode_splitter.pl from fastx_toolkit 0.0.14.
- Further amplicon sequencing analysis was performed by CRISPResso2 [48] .
- a 10-bp window was used to quantify modifications centered around the middle of the 20-bp gRNA. Otherwise, the default parameters were used for analysis.
- G-to-C purity was calculated as G-to-C editing efficiency / (G-to-C editing efficiency + G-to-T editing efficiency + G-to-A editing efficiency) .
- G-to-Y conversion ratio was calculated as (G-to-C editing efficiency + G-to-T editing efficiency) / (G-to-C editing efficiency + G-to-T editing efficiency + G-to-A editing efficiency)
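The two definitions above translate directly into code. A minimal sketch, assuming the three per-outcome editing efficiencies are already available as numbers on a common scale (for example, percent of reads); the function names and the example values are hypothetical and not taken from the source.

```python
def g_to_c_purity(g_to_c, g_to_t, g_to_a):
    """G-to-C efficiency divided by the sum of all G conversion efficiencies."""
    total = g_to_c + g_to_t + g_to_a
    if total == 0:
        raise ValueError("no G conversion detected; purity is undefined")
    return g_to_c / total


def g_to_y_ratio(g_to_c, g_to_t, g_to_a):
    """(G-to-C + G-to-T) divided by the sum of all G conversion efficiencies."""
    total = g_to_c + g_to_t + g_to_a
    if total == 0:
        raise ValueError("no G conversion detected; ratio is undefined")
    return (g_to_c + g_to_t) / total


# Hypothetical efficiencies (%), for illustration only.
example = {"g_to_c": 30.0, "g_to_t": 10.0, "g_to_a": 10.0}
print(g_to_c_purity(**example))
print(g_to_y_ratio(**example))
```

Raising an error when no conversion is detected is a design choice of this sketch; the source does not say how such sites were handled.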
- HEK293T cells were cultured with DMEM (Catalog#11995065, Gibco) supplemented with 10% fetal bovine serum (Catalog#04-001-1ACS, BI) and 0.1 mM non-essential amino acids (Catalog#11140-050, Gibco). Cells were grown in an incubator at 37 °C with 5% CO2. MPG mutant screening was conducted in 48-well plates. The day before transfection, 3 × 10^4 HEK293T cells per well were plated in 250 µL of complete growth medium in the 48-well plates.
- Orthogonal R-loop assays were performed as described previously [1, 2] .
- 1 µg of gGBE plasmid with sgRNA targeting site 3 and 1 µg of dSaCas9 plasmid with corresponding sgRNA targeting five off-target sites to generate R-loops were co-transfected into HEK293T cells in 12-well plates using PEI (DNA/PEI ratio of 1:2).
- 48 h after transfection, mCherry, BFP, and EGFP fluorescence was analyzed on a BD FACS Aria III or Beckman CytoFLEX S. Flow cytometry results were analyzed with FlowJo V10.5.3.
- the gating strategy in the identification of mCherry + , BFP + and EGFP + cells for on-target editing efficiency evaluation was supplied in FIG. 10d.
- mice were approved by the Biomedical Research Ethics Committee of Center for HuidaGene Therapeutics Co. Ltd.
- Superovulated C57BL/6 females (4 weeks old) were mated with C57BL/6 males (8 weeks old), and females from the ICR strain were used as foster mothers.
- Mice were maintained in a specific pathogen-free facility under a 12-hour dark–light cycle, with constant temperature (20–26 °C) and humidity (40–60%).
- the gGBE plasmids were constructed by standard PCR amplification with Phanta Max Super-Fidelity DNA Polymerase (Vazyme Biotech Co., Ltd), assembly with Gibson Assembly Master Mix (NEB E2611L), and transformation into chemically competent DH5α cells.
- the gGBE plasmids were linearized by the FastDigest KpnI restriction enzyme (Thermo Fisher) , purified using Gel Extraction Kit (Omega) , and used as the template for in vitro transcription (IVT) using the mMESSAGE mMACHINE T7 Ultra kit (Life Technologies) .
- T7 promoter sequence was added to the sgRNA template by PCR amplification of px330 (Addgene, #42230) .
- the T7-Tyr-sgRNA PCR product was purified using Gel Extraction Kit (Omega) and used as the template for IVT of sgRNAs using the MEGAshortscript T7 kit (Life Technologies) .
- the gGBE mRNA and Tyr-sgRNAs were purified using the MEGAclear kit (Life Technologies) and eluted in RNase-free water. In vitro transcribed RNAs were aliquoted and stored at -80 °C until use. Prior to microinjection, the mixture of gGBE mRNA and Tyr-sgRNA was centrifuged for 10 min at 14,000 rpm at 4 °C, and the supernatant was transferred to fresh 0.2 mL PCR tubes for injection.
- Genomic DNA was extracted by addition of 40 µL of lysis buffer and 1 µL of Proteinase K (Catalog#PD101-01, Vazyme) directly into each tube of sorted cells.
- the genomic DNA/lysis buffer mixture was incubated at 55 °C for 45 min, followed by a 95 °C enzyme inactivation step for 10 min.
- the regions of interest for target sites were amplified by PCR using site-specific primers.
- PCR was performed at 95 °C for 5 min, followed by 30 cycles of 95 °C for 15 s, 60 °C for 15 s, and 72 °C for 30 s, and a final extension at 72 °C for 5 min, using Max Super-Fidelity DNA Polymerase (Catalog#P505-d3, Vazyme).
- PCR products were purified using universal DNA purification kit (TIANGEN) according to the manufacturer’s instructions, and analyzed by Sanger sequencing (Genewiz) .
- the amplicons were ligated to adapters and sequencing was performed on the Illumina MiSeq platforms. Protospacer sequences used for each genomic locus are listed in Table 1.
- As T, C, and U are structurally similar, it was speculated that the excision of canonical T or C could be achieved by engineering certain uracil DNA glycosylases (UNG).
- the excision of T or C would generate an apurinic/apyrimidinic (AP) site, then trigger base excision repair (BER) pathway and facilitate direct T editing or C editing (FIG. 19a-b) .
- Two human UNG1 variants, UNG1-Y147A and UNG1-N204D, have been engineered to excise T and C in DNA, respectively [17].
- the residues Y156 and N213 of UNG2 correspond to the residues Y147 and N204 of UNG1, respectively, as determined by sequence alignment of UNG1 and UNG2.
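The stated correspondences (Y147 to Y156, N204 to N213) differ by nine positions, so a substitution written in UNG1 numbering can be renumbered for UNG2 with a fixed shift. A minimal sketch, assuming a constant offset of 9 holds for the residue being converted; the offset is inferred only from the residue pairs given in the text, a real conversion should rely on the sequence alignment, and the function name is hypothetical.

```python
# Inferred from the pairs Y147/Y156 and N204/N213; assumed constant here.
UNG1_TO_UNG2_OFFSET = 9


def ung1_to_ung2(mutation):
    """Renumber a substitution such as 'Y147A' from UNG1 to UNG2 numbering."""
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    return f"{wild_type}{position + UNG1_TO_UNG2_OFFSET}{mutant}"


for m in ("Y147A", "N204D"):
    print(m, "->", ung1_to_ung2(m))
```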
- gTBE deaminase-free glycosylase-based thymine base editor
- gCBE deaminase-free glycosylase-based cytosine base editor
- Two similar intron-split EGFP reporter systems, a T-to-G reporter and a C-to-G reporter, as reported previously [9], were established to evaluate the editing efficiency of gTBE and gCBE, respectively (FIG. 26a).
- the AG-to-AT or AG-to-AC inactive splicing acceptor (SA) could only be remediated with T-to-G or C-to-G conversion, which leads to correct splicing of EGFP-coding sequence and EGFP expression and emission of EGFP signals (FIG. 26b) .
- the gTBE or gCBE encoding vector was co-transfected with the T-to-G or C-to-G reporter vector containing a targeting single-guide RNA (targeting sgRNA) that targeted the corresponding mis-splicing mutations.
- NTD N-terminal domain
- UNG contains protein binding motifs and sites for post-translational modifications [18], which might constrain targeted excision activity of the glycosylase domain in ssDNA [19, 20].
- gTBEv0.2 SEQ ID NO: 140
- UNG2Δ88-Y156A SEQ ID NO: 139
- gCBEv0.2 SEQ ID NO: 142
- UNG2Δ88-N213D SEQ ID NO: 141 fused at the C-terminus of nCas9
- gTBEv0.2 exhibited comparable T-to-G conversion activity with gTBEv0.1 (1.0% vs. 1.1%, FIG. 19d), while gCBEv0.2 exhibited significantly increased C-to-G conversion activity compared with gCBEv0.1 (13.3% vs. 1.0%, FIG. 19e).
- both gTBEv0.3 (SEQ ID NO: 143) (with UNG2Δ88-Y156A fused at the N-terminus of nCas9) and gCBEv0.3 (SEQ ID NO: 154) (with UNG2Δ88-N213D fused at the N-terminus of nCas9) showed much higher editing efficiency than gTBEv0.2 and gCBEv0.2 with UNG2 mutant fused at the C-terminus of nCas9 (10.2% vs. 1.0% for gTBE and 51.4% vs. 13.3% for gCBE; FIG. 19c-e), about 10- and 3.9-fold enhancement in editing efficiency, respectively.
- UNG contains five conserved motifs required for efficient glycosylase activity: the catalytic water-activating loop, the proline-rich loop, the uracil-binding motif, the glycine-serine motif, and the leucine loop [23-25] (FIG. 25b).
- gTBEv1.1 (SEQ ID NO: 145) (v0.3 plus A214V) with UNG2Δ88-Y156A+A214V (SEQ ID NO: 144) was obtained with largely elevated T-to-G conversion activity of about 2.68-fold as compared with gTBEv0.3 (FIG. 28a).
- site-saturation mutagenesis focusing on the residue at position 214 was further performed.
- gTBEv1.2 (SEQ ID NO: 147) (v0.3 plus A214T) with UNG2Δ88-Y156A+A214T (SEQ ID NO: 146) was obtained with elevated editing efficiency of about 1.06-fold in comparison with the T editing efficiency of gTBEv1.1 (FIG. 28b).
- T editing efficiency across different gTBE was validated at one endogenous genomic site in cultured mammalian cells (HEK293T cells) .
- FACS fluorescence-activated cell sorting
- the editing profile of gTBEv3 was characterized by targeting 20 endogenous genomic loci, most of which were used in previous base editing studies [11, 12, 26, 27]. It was found that gTBEv3 achieved efficient T base editing (ranging from 24.3% to 81.5%; FIG. 21a and FIG. 30a-b), and essentially no A, C, or G editing at all the examined sites (FIG. 30c-e).
- the T-to-C or T-to-G conversions were the predominant events (FIG. 30f-h), and only a low percentage of T-to-A conversions was detected (FIG. 21a and FIG. 30i), consistent with the previous findings for gGBE [3], AYBE [9], and CGBEs [11-15].
- the ratios of T-to-S to T conversion ranged from 0.68 to 0.97 (without indels, FIG. 21b) and from 0.41 to 0.92 (with indels, FIG. 30j). It was found that gTBEv3 also induced indels with frequency ranging from 5.2% to 45.2% at the 20 edited sites (FIG. 21c). Furthermore, it was found that the editable range of gTBEv3 was positions 2 to 11, and the optimal editing window with high efficiency of T conversion covered protospacer positions 3 to 7, with the highest editing efficiency at position 5 (FIG. 30b). No obvious motif preference was found for T conversions with gTBEv3 by analyzing the on-target editing and sequences of all the tested sites (FIG. 30k).
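The window figures above can be applied to a candidate protospacer to list which T positions fall inside the optimal window (positions 3 to 7) or the wider editable range (positions 2 to 11). A minimal sketch, assuming protospacer positions are counted from 1 at the PAM-distal (5') end; the function name and the example sequence are hypothetical and not from the source.

```python
def bases_in_window(protospacer, base="T", window=(3, 7)):
    """Return 1-based positions of `base` inside the inclusive `window`."""
    start, end = window
    return [
        pos
        for pos, nt in enumerate(protospacer.upper(), start=1)
        if nt == base and start <= pos <= end
    ]


# Hypothetical 20-nt protospacer, for illustration only.
protospacer = "GATTCAGTACCTGAATGCCA"

print("optimal window (3-7):", bases_in_window(protospacer, "T", (3, 7)))
print("editable range (2-11):", bases_in_window(protospacer, "T", (2, 11)))
```

The same helper can be reused for other editors by changing `base` and `window`, for example the positions 2 to 6 reported for gCBEv2.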
- the off-target activity of gTBEv3 was analyzed at several in silico-predicted [28] guide sequence-dependent off-target sites, and the ability of gTBEv3 to mediate guide sequence-independent off-target DNA editing was characterized by using orthogonal R-loop assay in five previously reported dSaCas9 R-loops [9, 29]. A very low percentage of editing was found at all the guide sequence-dependent off-target loci (FIG. 21d-e and FIG. 31), and very low editing frequencies (1.1% on average) were detected at all five guide sequence-independent off-target sites (FIG. 21f). Taken together, gTBEv3 represents a highly efficient T-to-S base editor with low off-target effects in mammalian cells.
- gCBEv1.1 was generated by introducing A214V into gCBEv0.3 (FIG. 22a). It was found that gCBEv1.1 (SEQ ID NO: 156) with UNG2Δ88-N213D+A214V (SEQ ID NO: 155) had largely elevated C-to-G conversion activity of about 1.34-fold as compared to gCBEv0.3 when evaluated using the C-to-G reporter (FIG. 32a).
- alanine-scanning mutagenesis was conducted on the region of D154-D189 of UNG2 to examine its role in the regulation of base excision activity, and gCBEv1.2 (SEQ ID NO: 158) (v0.3 plus K184A) with UNG2Δ88-N213D+K184A (SEQ ID NO: 157) was obtained with largely elevated C-to-G conversion activity by about 1.55-fold as compared with gCBEv0.3 (FIG. 32b).
- A214V and K184A were then combined to generate gCBEv2 (SEQ ID NO: 160) with UNG2Δ88-N213D+K184A+A214V (SEQ ID NO: 159), achieving C-to-G editing efficiency of about 1.3-fold compared with gCBEv0.3 (FIG. 22b).
- the improvement of C editing efficiency across different gCBE was further validated by targeting an endogenous genomic site, and a gradual increase of overall C editing efficiency from 18.2% to 37.2% at C2 of site 28 was observed (FIG. 33a).
- When compared to CGBE1 [12], a C-to-G base editor, gCBEv2 showed higher editing efficiency at certain positions towards the distal end of the target sequence (FIG. 22d and FIG. 33c), indicating its positional preference within different optimal editing windows (positions 2 to 6 for gCBEv2 vs. positions 5 to 7 for CGBE1 [12]). gCBEv2 induced fewer indels at site 36, and more indels at site 28 and site 29, than CGBE1 (FIG. 33k).
- the potential applications of gTBE and gCBE were further evaluated.
- the gTBE could not only remediate inactive splicing signals in the intron-split EGFP reporter systems used above (FIG. 19-20 and FIG. 26) , but also be used for exon skipping by disrupting splicing signals at splicing donor (SD) or splicing acceptor (SA) sites (FIG. 23a) .
- gTBE and gCBE together with other existing base editors, provide 1904 sgRNA candidates (protospacer sequence /guide sequence shown in Table 3) with the SD or SA sites located in each optimal editing window (FIG. 23b and FIG. 34a) .
- Of the 771 sgRNA candidates for ABE and CBE targeting, 156 and 103 candidates overlapped with those for gGBE and gTBE, respectively (FIG. 23c).
- 232 and 223 sgRNA candidates could only be screened by gGBE or gTBE targeting, respectively (FIG. 23c) .
- 851 sgRNA candidates (protospacer sequences/guide sequences shown in Table 4) targeting various codons for PTC introduction in 15 genes were analyzed with gGBE and CBE, with 191 TAC and 124 TCA codons for gGBE targeting (FIG. 35e).
- sgRNAs specifically targeting SD or SA sites were designed and screened with gTBEv3 or gCBEv2 (FIG. 23d and FIG. 34c), including three sgRNAs targeting the SD sites of DMD exons 45 (FIG. 23e), 12, and 37 (FIG. 34d) that are uniquely targeted by gTBEv3. Disruption of the SD site of exon 45, thus leading to exon skipping, would be applicable to restore dystrophin expression in 9% of DMD patients [33].
- gTBEv3-encoding mRNA and sgRNA targeting the SD site of DMD exon 45 were co-injected into zygotes of humanized mice to explore the potential application of gTBE. It was found that 100% (20/20) of mouse embryos harbored efficient base conversion (ranging from 35.0% to 97.0%) at the desired position T3 (FIG. 23f-g), indicating the great potential of gTBE for human disease modeling and gene therapy. Overall, gBEs, including gTBE, gCBE, and gGBE, provide more options for the sites that deaminase-based base editors could not target, largely expanding the targeting scope of base editors.
- gTBEv4 (SEQ ID NO: 161) and gTBEv5 (SEQ ID NO: 162) were generated by inserting the UNG2 mutant (SEQ ID NO: 152) contained in gTBEv3 (SEQ ID NO: 153) into split nCas9 domains at different locations (FIG. 24b) .
- the UNG2 mutant (SEQ ID NO: 152) was embedded between positions 2-1248 of nCas9 (SEQ ID NO: 2) and positions 1249-1368 of nCas9 (SEQ ID NO: 2) .
- nCas9 SEQ ID NO: 2
- positions 1064-1368 of nCas9 SEQ ID NO: 2
- the first amino acid residue D of nCas9 was numbered as position 2 instead of position 1.
- gCBEv3 (SEQ ID NO: 164) (FIG. 38a) was generated by replacing the UNG mutant (SEQ ID NO: 152) in gTBEv5 (SEQ ID NO: 162) with the UNG mutant (SEQ ID NO: 159) in gCBEv2 (SEQ ID NO: 160) .
- T editing efficiency of various thymine base editors was compared at 17 endogenous sites, including five sites from He’s study 34 and five sites from Ye’s study 35 (FIG. 24c and FIG. 36) .
- gTBEv3 showed higher editing efficiency than DAF-TBE at the overwhelming majority of Ts (29 out of 35) of tested sites (FIG. 24c, FIG. 36f) , indicating that UNG mutants generated herein by rational mutagenesis are superior to those by random mutagenesis.
- gTBEv3 was also compared with gTBEv4 and gTBEv5, two base editors constructed using the embedding strategy.
- gTBEv4 showed a shifted editing window of positions 7-13 from positions 3-7 (FIG. 24d), with no significant difference in average editing efficiency from gTBEv3 (23.2% vs. 23.1%, FIG. 36f) .
- the editing efficiency of gTBEv5 was largely increased compared to that of gTBEv3 (averaging 39.3% vs. 23.1%, FIG. 36f) and gTBEv4 and others, with the same predominant T-to-S conversions (FIG. 36a-d and g), and the optimal editing window covered protospacer positions 5 to 9 (FIG. 24d) .
- TSBE3 (carrying L83Q and G116E mutations, equivalent to L74Q and G107E in UNG1) is an nCas9-embedded base editor with almost the same insertion position as gTBEv5 (FIG. 24c) .
- gTBEv5 showed higher editing efficiency than TSBE3 (39.3% vs. 22.5%, FIG. 36f) at the overwhelming majority of Ts (29 out of 35) of the tested sites (FIG. 24c), indicating that the UNG mutant generated herein by rational mutagenesis is superior to those generated by PLM-assisted mutagenesis.
- the optimal editing window of TSBE3 covered protospacer positions 4 to 9 (FIG. 24d) .
- the circularly permuted DAF-TBE2 showed low average editing efficiency and an editing window of positions 9-13, different from the editing window (positions 2-6) of DAF-TBE (FIG. 24d) .
- gTBEv5 induced indel rates comparable to those of DAF-TBE (14.4% vs. 14.4%), DAF-TBE2 (14.4% vs. 10.3%), and TSBE3 (14.4% vs. 13.5%, FIG. 36e-g) .
- gTBEs induced far fewer unintended T edits than TSBE3 and DAF-TBEs in the proximal DNA sequence upstream from two sites (site 38 and site 44) harboring unintended edits (FIG. B13), consistent with the finding that the NTD of UNG could promote targeting the enzyme to ssDNA–dsDNA junctions 19 .
- gCBEv2 induced average indel rates comparable to those of other deaminase-free base editors, including DAF-CBE (16.8% vs. 16.9%), DAF-CBE2 (16.8% vs. 12.1%), and CGBE-CDG (16.8% vs. 13.6%, FIG. 38d-g) .
- the C-to-G editing frequency and purity of different base editors showed respective advantages for CGBE1 and various deaminase-free base editors at different cytosine positions across the protospacer (FIG. 39a-b) .
- Each base editor can edit its target base within a certain editable window, that is, positions 2 to 9 for gCBEv2, positions 2 to 11 for gCBEv3, positions 4 to 10 for CGBE1, positions 2 to 9 for CGBE-CDG, positions 2 to 9 for DAF-CBE, and positions 9 to 12 for DAF-CBE2 (FIG. 39c) .
- Prime editing (PE) system could theoretically mediate all types of base substitution, including T-to-G conversion and C-to-G conversion 39 .
- gTBEv3 and gTBEv5 were compared with the recently evolved PE6d system 40 at six previously reported endogenous sites 35 in HEK293T cells.
- gTBEv3 and gTBEv5 outperformed PE6d or PE6d max for T-to-G conversion at four tested sites (FIG. 41a) .
- gCBEv2 and gCBEv3 outperformed PE6d or PE6d max for C-to-G conversion at five tested sites (FIG. 41b) .
- base editing and prime editing offer complementary strengths, and base editors generally show more efficient editing if the target base is positioned optimally.
- gTBEs and gCBEs exhibited efficient T and C editing across three different human cell lines (HEK293T, U2OS, and HuH-7 cells), with slight perturbations of the product purity for gTBEs and comparable substitution frequency of certain bases for gCBEs in different cell lines (FIG. 42) .
- the deaminase-based base editor (dBE) and derivatives thereof enable direct editing of adenine (A) and cytosine (C) , but not thymine (T) .
- A adenine
- C cytosine
- T thymine
- SNP pathogenic single nucleotide polymorphism
- two orthogonal base editors, gTBE and gCBE that could achieve highly efficient T and C editing in both cultured human cells and mouse embryos were developed.
- the gTBE and gCBE could greatly broaden the targeting scope of base editors by breaking the limitations of PAM and narrow editing window, thus increasing the opportunity to obtain an efficient strategy for further research.
- the T-to-S conversion ability of gTBE allows for a variety of gene editing applications, including editing splicing sites, as well as editing that bypasses PTCs.
- Wild-type UNG proteins are highly specific against uracil in both ssDNA and dsDNA, with a preference for ssDNA 43 .
- the NTD of UNG containing motifs and sites for undesired protein-protein interactions and post-translational modifications could promote targeting the enzyme to ssDNA–dsDNA junctions 19, 20 .
- TSBE3, with full length UNG2, and DAF-TBEs induced more undesired edits than gTBEs in the proximal DNA sequence upstream from two sites harboring unintended edits (FIG. 37) .
- Base editor constructs used in this study were cloned into a mammalian expression plasmid backbone under the control of an EF1α promoter by standard molecular cloning techniques, and the two intron-split EGFP reporters were constructed similar to those described previously 9 , except that the engineered sequence containing the last 86 base pairs (bp) intron of human RPS5 was inserted between BFP and EGFP coding sequences. The corresponding mutations at the splice acceptor site were made to construct the T-to-G reporter or the C-to-G reporter via site-directed mutagenesis by PCR, respectively. Mutations at the splice acceptor site led to inactive EGFP production.
- the corresponding mutations at the splice acceptor site were put at position 6 across the protospacer.
- the wild-type UNG2 sequence (313 amino acids long) (SEQ ID NO: 133) was PCR-amplified from cDNA of HEK293T cells; UNG2-Y156A, UNG2-N213D, UNG-NTD-truncated mutants, and corresponding combinations were constructed via site-directed mutagenesis by PCR.
- the UNG mutants were fused at different orientations with respect to nCas9 via Gibson Assembly method.
- PE6d architecture harbored a human codon-optimized RNaseH-truncated evolved and engineered M-MLV variant with R221K/N394K/H840A mutations in SpCas9.
- the nick sgRNA and epegRNA with tevoPreQ1 motif were cloned into the PE6d construct using Golden Gate assembly, resulting in an all-in-one plasmid.
- For PE6d max, the codon-optimized hMLH1dn was co-expressed with PE6d.
- UNG mutagenesis libraries were designed and generated as previously described 52 with some modification.
- the region of 98-313 aa in UNG2 was divided into 8 aa long segments.
- BpiI-harboring mutants containing Y156A or N213D were introduced via site-directed mutagenesis by PCR.
- the regions of I150-L179, A158-K261, L210-T217, and Q274-Y284 were selected for rounds of sequential alanine /arginine /aspartic acid /valine substitutions (X-to-A, R, D, or V) .
- the Cas-OFFinder 28 was used to search for potential guide sequence-dependent off-target sites of Cas9 RNA-guided endonucleases with a maximum of 3 mismatches (with no bulges) .
- a PAM-flexible Cas9 variant SpG (SEQ ID NO: 163) was used in place of nCas9 (SEQ ID NO: 2) .
- the sgRNA oligos were annealed and ligated into BpiI sites.
- HEK293T, HuH-7, and U2OS cells were cultured with DMEM (Catalog#11995065, Gibco) supplemented with 10% fetal bovine serum (Catalog#04-001-1ACS, BI) and 0.1 mM non-essential amino acids (Catalog# 11140-050, Gibco) in an incubator at 37 °C with 5% CO2.
- Mutant screening was conducted in 48-well plates, with 3 × 10⁴ HEK293T cells per well plated in 250 μL of complete growth medium the day before transfection. Between 16 and 24 h after seeding, cells were co-transfected with 250 ng gTBE (or gCBE) plasmids, 250 ng T-to-G (or C-to-G) reporter plasmids, and 1 μg polyethylenimine (PEI) (DNA/PEI ratio of 1:2) per well.
- PEI Polyethylenimine
- Endogenous target sites of interest were amplified from genomic DNA as previously described 9 . Briefly, 10,000 positive cells with mCherry were isolated by FACS after 72 h of transfection, then genomic DNA was extracted and the regions of interest for target sites were amplified by PCR using site-specific primers. The purified PCR products were analyzed by Sanger sequencing (Genewiz) .
- Target sequencing data analysis was described in the previous paper 3 .
- the amplicons were ligated to adapters and sequencing was performed on the Illumina MiSeq platform, then the targeted amplicon sequencing reads were processed using fastp with default parameters 53 , and further amplicon sequencing analysis was performed by CRISPResso2 54 .
- T-to-G purity was calculated as T-to-G editing efficiency / (T-to-C editing efficiency + T-to-G editing efficiency + T-to-A editing efficiency) .
- T-to-S conversion ratio was calculated as (T-to-C editing efficiency + T-to-G editing efficiency) / (T-to-C editing efficiency + T-to-G editing efficiency + T-to-A editing efficiency) .
- Protospacer sequences / guide sequences are shown in Table 2.
- the mRNA and sgRNA preparations were performed as previously described 9 .
- the gTBEv3 expression plasmid was linearized by the FastDigest KpnI restriction enzyme (Catalog#FD0524, Thermo Fisher) , purified using Gel Extraction Kit (Catalog#D2500-03, Omega) , and used as the template for in vitro transcription (IVT) using the mMESSAGE mMACHINE T7 Ultra kit (Catalog#AM1345, Thermo Ambion) .
- T7 promoter sequence was added to the sgRNA template by PCR amplification.
- the T7-DMD-sgRNA PCR product was purified using Gel Extraction Kit (Catalog#D2500-03, Omega) and used as the template for IVT of sgRNAs using the MEGAshortscript T7 kit (Catalog#AM1354, Invitrogen) .
- the gTBEv3-encoding mRNA and DMD-sgRNA were purified using the MEGAclear kit (Catalog#AM1908, Invitrogen), eluted in RNase-free water and stored at -80°C until use.
- Animal manipulations were consistent with those reported previously 3 . Experiments involving mice were approved by the Biomedical Research Ethics Committee of Center for HuidaGene Therapeutics Co. Ltd. Mice were maintained in a specific pathogen-free facility under a 12-hour dark–light cycle, with constant temperature (20–26°C) and humidity (40–60%) .
- HEK293T cells were plated in 12-well plates as above and transfected with 2 μg of gTBEv5, gCBEv3, CGBE1, or mCherry plasmids using PEI (DNA/PEI ratio of 1:2). At 48 hours after transfection, around 5 × 10⁶ cells were collected. Total RNA was extracted with a TRIzol-based method, fragmented, and reverse transcribed to cDNAs with HiScript Q RT SuperMix according to the manufacturer’s instructions. Total RNA integrity was quantified using an Agilent 2100 Bioanalyzer. The RNA-seq library was qualified using the Illumina NovaSeq 6000 platform (performed by GENEWIZ) . Trimmomatic (v.
- RNA editing sites were calculated using REDItools2 57 with default parameters.
- the dbSNP (v. 146) database downloaded from NCBI was used to filter the sites overlapped with common single nucleotide variants (SNVs) . The sites with less than five mutated or nonmutated reads were further filtered.
- StringTie 58 was used to calculate expression value.
- DESeq2 59 was used to calculate differentially expressed genes with FDR ⁇ 0.05 and Fold change>1.
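The T-to-G purity and the T-to-S conversion ratio defined in the list above may be computed as in the following minimal Python sketch. The function names and the example efficiencies are illustrative assumptions and are not data from the disclosure.

```python
def t_to_g_purity(t_to_c: float, t_to_g: float, t_to_a: float) -> float:
    """T-to-G purity = T-to-G / (T-to-C + T-to-G + T-to-A)."""
    total = t_to_c + t_to_g + t_to_a
    if total == 0:
        raise ValueError("no T edits observed; purity is undefined")
    return t_to_g / total


def t_to_s_ratio(t_to_c: float, t_to_g: float, t_to_a: float) -> float:
    """T-to-S ratio = (T-to-C + T-to-G) / (T-to-C + T-to-G + T-to-A)."""
    total = t_to_c + t_to_g + t_to_a
    if total == 0:
        raise ValueError("no T edits observed; ratio is undefined")
    return (t_to_c + t_to_g) / total


# Hypothetical editing efficiencies (in percent) at a single T position.
example = {"t_to_c": 10.0, "t_to_g": 25.0, "t_to_a": 5.0}
print(t_to_g_purity(**example))  # 0.625
print(t_to_s_ratio(**example))   # 0.875
```

Both quantities share the same denominator (the total T editing efficiency), so a position with no T edits has no defined purity; the sketch raises an error in that case rather than returning zero.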
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Organic Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Zoology (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Wood Science & Technology (AREA)
- Biomedical Technology (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biotechnology (AREA)
- Microbiology (AREA)
- Medicinal Chemistry (AREA)
- Biophysics (AREA)
- Physics & Mathematics (AREA)
- Plant Pathology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Enzymes And Modification Thereof (AREA)
- Laminated Bodies (AREA)
- Transition And Organic Metals Composition Catalysts For Addition Polymerization (AREA)
Abstract
Provided in the disclosure are novel base editors and uses thereof.
Description
REFERENCE TO RELATED APPLICATIONS
The instant application claims the priority to and the benefit of the filing date of PCT/CN2023/090660, filed on April 25, 2023; PCT/CN2023/091734, filed on April 28, 2023; PCT/CN2023/094565, filed on May 16, 2023; PCT/CN2024/070217, filed on January 2, 2024; and PCT/CN2024/084498, filed on March 28, 2024, the entire contents of which, including any drawings and sequence listing, are incorporated herein by reference.
REFERENCE TO AN ELECTRONIC SEQUENCE LISTING
The disclosure contains a Sequence Listing XML file which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. Said XML copy, created on April 25, 2024, by software “WIPO Sequence” according to WIPO Standard ST.26, is named HGP032PCT.xml, and is 2,772,603 bytes in size.
According to WIPO Standard ST.26, symbol “t” is used to denote both T in DNA and U in RNA. Thus, in the instant sequence listing prepared according to ST.26, wherever a sequence is an RNA, the T in the sequence shall be deemed as U.
Base editing is a powerful technology for basic research and therapeutic applications [1, 2]. Current base editors mainly contain a nucleic acid programmable DNA binding protein, such as a catalytically impaired CRISPR-associated (Cas) nuclease, that is fused with a single-stranded DNA deaminase enzyme and sometimes an additional protein that could modulate DNA repair machinery [3, 4]. In the past several years, two main classes of DNA base editors, adenine base editors (ABEs) [5] and cytosine base editors (CBEs) [4], have been developed and widely used for A-to-G and C-to-T conversions, respectively (FIG. 1a). Recently, C-to-G base editors (CGBEs) [6-10] and adenine transversion base editor (AYBE) [11] were constructed by fusing existing CBE or ABE with a DNA glycosylase variant to generate new tools for achieving more versatile base editing outcomes, including C-to-G, A-to-C and A-to-T editing (FIG. 9). CRISPR-free CBEs (DdCBEs) were reported for performing C-to-T base editing in mitochondrial DNA, by fusing two halves of a double-strand DNA cytidine deaminase (DddA) variant with two separate TALE (transcription activator-like effector) proteins [12-14]. So far, these base editing methods all begin with deamination of C or A as an essential step to produce uridine (U) or inosine (I) intermediate, respectively, which in turn is transformed into another base by endogenous DNA repair or replication mechanisms [4-11]. Although G and T in the non-edited strand could be modified along with the editing of C and A in the edited strand, respectively, no existing base editor is capable of directly editing G or T.
It would be desirable to develop novel base editors and base editing methods for base editing of a deoxyribonucleotide, such as dG or dT, beyond the current base editors.
Citation or identification of any document in the disclosure is not an admission that such a document is available as prior art to the disclosure. Each of the references mentioned or cited in the disclosure is incorporated by reference in its entirety.
The disclosure provides, at least in part, base editors and base editing methods capable of direct base editing of a target deoxyribonucleotide (e.g., dG, dT) in a target dsDNA. The disclosure also provides, at least in part, base editors and base editing methods capable of base editing of a target deoxyribonucleotide (e.g., dC) in a target dsDNA in the absence of deamination.
In an aspect, the disclosure provides a fusion protein comprising:
(1) a nucleic acid programmable DNA binding domain (napDNAbd) capable of binding a target dsDNA comprising:
(a) a first deoxyribonucleotide (e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine) ) in a protospacer sequence on a nontarget strand (edited strand) of the target dsDNA, and
(b) a second deoxyribonucleotide (e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine) ) base pairing with the first deoxyribonucleotide (e.g., dG, dT, dC) and in a target sequence on the target strand (non-edited strand) of the target dsDNA, wherein the protospacer sequence is fully reverse complementary to the target sequence; and
(2) a base excising domain capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide.
In some embodiments, the fusion protein does not comprise a deaminase domain, e.g., an adenine or cytosine deaminase domain, e.g., TadA and variants thereof.
In some embodiments, the first deoxyribonucleotide is deoxyguanosine (dG) , thymidine (dT) , or deoxycytidine (dC) .
In some embodiments, the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide is dG-to-dA, dG-to-dT, dG-to-dC, dT-to-dA, dT-to-dC, dT-to-dG, dC-to-dA, dC-to-dT, or dC-to-dG.
In some embodiments, the base excising domain comprises a glycosylase.
In some embodiments, the glycosylase is selected from the group consisting of N-methylpurine DNA glycosylase (MPG) , 8-oxoguanine DNA glycosylase (OGG1) , methyl-CpG binding domain 4, DNA glycosylase (MBD4) , thymine DNA glycosylase (TDG) , uracil DNA glycosylase (UNG) , single-strand-selective monofunctional uracil-DNA glycosylase 1 (SMUG1) , mutY DNA glycosylase (MUTYH) , nth like DNA glycosylase 1 (NTHL1) , nei like DNA glycosylase 1 (NEIL1) , nei like DNA glycosylase 2 (NEIL2) , nei like DNA glycosylase 3 (NEIL3) , and mutants thereof capable of recognizing and excising a base from a nucleotide of a nucleic acid.
In some embodiments, the base excising domain comprises an N-methylpurine DNA glycosylase (MPG) .
In some embodiments, the MPG comprises an amino acid substitution relative to a reference MPG of SEQ ID NO: 7 at a position that is corresponding to a position or that is a position selected from the group consisting of N169, D175, C178, and/or Q294 of the reference MPG, wherein the position is numbered according to SEQ ID NO: 1.
In some embodiments, the amino acid substitution is a substitution with R, A, N, or G.
In some embodiments, the MPG comprises an amino acid substitution relative to a reference MPG of SEQ ID NO: 7 that is corresponding to a substitution or that is a substitution selected from the group consisting of N169G, D175R, C178N, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1.
In some embodiments, the MPG comprises a combination substitution relative to a reference MPG of SEQ ID NO: 7 that is corresponding to a combination substitution of N169G, D175R, C178N, and Q294R, wherein the position is numbered according to SEQ ID NO: 1.
In some embodiments, the MPG comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 8, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, or 38.
In some embodiments, the MPG is (substantially) capable of excising guanine of dG.
In some embodiments, the base excising domain comprises an uracil-DNA glycosylase (UNG) .
In some embodiments, the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 135 or 137 at a position that is corresponding to a position or that is a position selected from the group consisting of K184, A214, Q259, and/or Y284 of the reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
In some embodiments, the amino acid substitution is a substitution with A, D, V, or T.
In some embodiments, the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 135 or 137 that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133.
In some embodiments, the UNG comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-65, 1-66, 1-67, 1-68, 1-69, 1-70, 1-71, 1-72, 1-73, 1-74, 1-75, 1-76, 1-77, 1-78, 1-79, 1-80, 1-81, 1-82, 1-83, 1-84, 1-85, 1-86, 1-87, 1-88, 1-89, 1-90, 1-91, 1-92, 1-93, 1-94, 1-95, 1-96, 1-97, 1-98, 1-99, or 1-100 of the reference UNG of SEQ ID NO: 135 or 137, wherein the position is numbered according to SEQ ID NO: 133.
In some embodiments, the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 135 that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
In some embodiments, the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 137 that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
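Because the substitutions above are numbered according to the full-length sequence of SEQ ID NO: 133 even when the UNG carries an N-terminal deletion, it can be useful to translate a substitution label into a residue index within the truncated polypeptide. The following Python sketch illustrates that bookkeeping for a deletion of positions 1-88. It assumes, purely for illustration, that the truncated polypeptide begins directly at residue 89 with no added start methionine or linker; the helper names are hypothetical.

```python
import re

# Labels such as "Y284D": original residue, full-length position, new residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> tuple[str, int, str]:
    """Split a label like 'Y284D' into ('Y', 284, 'D')."""
    match = _SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


def index_in_truncation(position: int, deleted_through: int = 88) -> int:
    """Map a 1-based full-length position to its 1-based index in a
    polypeptide lacking positions 1..deleted_through (assumption: nothing
    is added in front of the first retained residue)."""
    if position <= deleted_through:
        raise ValueError("position falls inside the deleted region")
    return position - deleted_through


for label in ("K184A", "A214V", "Q259A", "Y284D"):
    _, position, _ = parse_substitution(label)
    print(label, "->", index_in_truncation(position))
```

Under these assumptions, Y284 of the full-length sequence is residue 196 of a polypeptide lacking positions 1-88. If a start methionine is added in front of the truncation, each index shifts by one.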
In some embodiments, the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 56, 58, 60, 62, 135, 137, 139, 141, 144, 146, 148, 150, 152, 155, 157, and 159 or an N-terminal truncation thereof lacking the most N-terminal Methionine (M) (coded by start codon ATG) .
In some embodiments, the UNG is (substantially) capable of excising thymine of dT.
In some embodiments, the UNG is (substantially) capable of excising cytosine of dC.
In some embodiments, the napDNAbd is RNA programmable DNA binding protein.
In some embodiments, the napDNAbd is selected from the group consisting of CRISPR-associated (Cas) protein, IscB, IsrB, Argonaute, and TnpB.
In some embodiments, the napDNAbd is a nickase, e.g., a Cas9 nickase, an IscB nickase.
In some embodiments, the napDNAbd is nuclease-inactive, e.g., dead Cas9, dead Cas12i.
In some embodiments, the napDNAbd comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 2, 48, 50, 52, or 163.
In some embodiments, the fusion protein comprises, from N-terminal to C-terminal, (1) the napDNAbp and the base excising domain; or (2) the base excising domain and the napDNAbp.
In some embodiments, the napDNAbd (e.g., Cas9) is a two-part napDNAbd, for example, a two-part split Cas9, comprising an N-terminal portion and a C-terminal portion, and wherein the fusion protein comprises, from N-terminal to C-terminal, (1) the N-terminal portion of the napDNAbd, the base excising domain, and the C-terminal portion of the napDNAbd; (2) the C-terminal portion of the napDNAbd, the base excising domain, and the N-terminal portion of the napDNAbd; or (3) the base excising domain, the C-terminal portion of the napDNAbd (e.g., amino acids at positions 1249-1368), and the N-terminal portion (e.g., amino acids at positions 1-1248) of the napDNAbd.
In some embodiments, the napDNAbd is SpCas9 (e.g., a SpCas9 nickase) or a mutant thereof (e.g., a SpG Cas9 nickase) .
In some embodiments, the N-terminal portion of the napDNAbd is the amino acids of the napDNAbp at positions 1 or 2 to 1012, 1028, 1041, 1046, 1047, 1248, 1249, or 1300.
In some embodiments, the C-terminal portion of the napDNAbd is the amino acids of the napDNAbp at positions 1013, 1029, 1042, 1047, 1048, 1249, 1063, 1064, 1230, 1249, or 1301 to 1368.
In some embodiments, the fusion protein comprises the base excising domain embedded between positions 2-1248 of nCas9 (SEQ ID NO: 2) and positions 1249-1368 of nCas9 (SEQ ID NO: 2) , wherein the first amino acid residue D of nCas9 (SEQ ID NO: 2) was designated as position 2; or embedded between positions 2-1047 of nCas9 (SEQ ID NO: 2) and positions 1064-1368 of nCas9 (SEQ ID NO: 2) , wherein the first amino acid residue D of nCas9 (SEQ ID NO: 2) was designated as position 2.
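The position arithmetic of the embedding described above, in which the first residue D of nCas9 is designated as position 2, can be sketched in Python as follows. The sequences are placeholders rather than SEQ ID NO: 2 or any UNG sequence, the function names are hypothetical, linkers and other elements of an actual fusion protein are omitted, and the sketch assumes that the nCas9 sequence spans exactly positions 2 to 1368.

```python
def slice_by_position(seq: str, start: int, end: int,
                      first_position: int = 2) -> str:
    """Return residues numbered start..end (inclusive) when seq[0] carries
    the number first_position."""
    if start < first_position or end < start:
        raise ValueError("invalid position range")
    if end - first_position >= len(seq):
        raise ValueError("end position lies beyond the sequence")
    return seq[start - first_position:end - first_position + 1]


def embed_domain(ncas9: str, domain: str, n_end: int, c_start: int,
                 c_end: int = 1368) -> str:
    """Place `domain` between positions 2..n_end and c_start..c_end."""
    n_portion = slice_by_position(ncas9, 2, n_end)
    c_portion = slice_by_position(ncas9, c_start, c_end)
    return n_portion + domain + c_portion


# Placeholder sequences only: 1367 residues numbered 2..1368, and a
# 3-residue stand-in for the base excising domain.
toy_ncas9 = "D" + "A" * 1366
toy_domain = "XXX"

# Embedded between positions 2-1248 and 1249-1368.
design_a = embed_domain(toy_ncas9, toy_domain, n_end=1248, c_start=1249)
# Embedded between positions 2-1047 and 1064-1368 (1048-1063 are dropped).
design_b = embed_domain(toy_ncas9, toy_domain, n_end=1047, c_start=1064)
print(len(design_a), len(design_b))
```

In this sketch the first design retains every placeholder residue, whereas the second omits the 16 residues numbered 1048 to 1063, which is why the two products differ in length.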
In some embodiments, the fusion protein comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 12, 14, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 55, 57, 59, 61, 63, 136, 138, 140, 142, 143, 145, 147, 149, 151, 153, 154, 156, 158, 160, 161, 162, and 164.
In another aspect, the disclosure provides a system comprising:
(i) the fusion protein of the disclosure or a polynucleotide encoding the fusion protein; and
(ii) a guide nucleic acid or a polynucleotide encoding the guide nucleic acid, the guide nucleic acid comprising:
(1) a scaffold sequence capable of forming a complex with the napDNAbd; and
(2) a guide sequence capable of hybridizing to the target sequence on the target strand of the target dsDNA, thereby guiding the complex to the target dsDNA.
In some embodiments, the guide nucleic acid is a guide RNA (gRNA) .
In some embodiments, the scaffold sequence has substantially the same secondary structure as the secondary structure of the sequence of SEQ ID NO: 40, 73, or 74.
In some embodiments, the scaffold sequence comprises (1) a sequence of SEQ ID NO: 40, 73, or 74 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to SEQ ID NO: 40, 73, or 74 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to SEQ ID NO: 40, 73, or 74.
In some embodiments, the fusion protein or system of the disclosure further comprises a translesion synthesis (TLS) polymerase or a recruiting domain or component capable of recruiting a TLS polymerase.
In some embodiments, the TLS polymerase is selected from the group consisting of Polα (alpha), Polβ (beta), Polδ (delta) (PCNA), Polγ (gamma), Polη (eta), Polι (iota), Polκ (kappa), Polλ (lambda), Polμ (mu), Polν (nu), Polθ (theta), and REV1.
In yet another aspect, the disclosure provides a polynucleotide encoding the fusion protein of the disclosure and optionally the guide nucleic acid of the disclosure.
In yet another aspect, the disclosure provides a delivery system comprising (1) the fusion protein of the disclosure, the polynucleotide of the disclosure, or the system of the disclosure; and (2) a delivery vehicle.
In yet another aspect, the disclosure provides a vector comprising the polynucleotide of the disclosure.
In yet another aspect, the disclosure provides a complex comprising the fusion protein or the polynucleotide (e.g., a mRNA) encoding the fusion protein of the disclosure and a guide nucleic acid (e.g., a gRNA) of the disclosure.
In yet another aspect, the disclosure provides a pharmaceutical composition comprising (1) the system of the disclosure, the vector of the disclosure, the ribonucleoprotein of the disclosure, the lipid nanoparticle of the disclosure, or the cell of the disclosure; and (2) a pharmaceutically acceptable excipient. In yet another aspect, the disclosure provides a cell or a progeny thereof comprising the system of the disclosure. In yet another aspect, the disclosure provides a cell or a progeny thereof modified by the system of the disclosure or the method of the disclosure.
In yet another aspect, the disclosure provides a method of modifying a target dsDNA, comprising contacting the target dsDNA with the system of the disclosure,
the target dsDNA comprising:
(a) a first deoxyribonucleotide (e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine) ) in a protospacer sequence on a nontarget strand (edited strand) of the target dsDNA, and
(b) a second deoxyribonucleotide (e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine) ) base pairing with the first deoxyribonucleotide (e.g., dG, dT, dC) and in a target sequence on the target strand (non-edited strand) of the target dsDNA, wherein the protospacer sequence is fully reverse complementary to the target sequence.
In some embodiments, the method does not include deamination of the base of the first deoxyribonucleotide before the excision of the base of the first deoxyribonucleotide.
In some embodiments, the method does not include deamination of the base of the first deoxyribonucleotide.
In yet another aspect, the disclosure provides an MPG described herein, or of the disclosure.
In yet another aspect, the disclosure provides an UNG described herein, or of the disclosure.
The details of one or more embodiments of the disclosure are set forth in the description below. Other features or advantages of the disclosure will be apparent from the following drawings and detailed description of several embodiments, and also from the appended claims. It is understood that any aspect or embodiment of the disclosure can be combined with any other one or more aspects or embodiments of the disclosure, including aspects or embodiments only described in one sub-section, only in the examples, or only in the claims, to constitute another embodiment explicitly or implicitly disclosed herein unless otherwise indicated.
Definitions
The disclosure will be described with respect to particular embodiments, but the disclosure is not limited thereto in any respect. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which this disclosure belongs. Terms as set forth hereinafter are generally to be understood in their plain and ordinary meaning or common sense unless indicated otherwise.
Overview
Nucleic acid programmable binding protein (napBP) , for example, nucleic acid programmable DNA binding protein, (napDNAbp) , such as Cas9, Cas12, IscB, nucleic acid programmable RNA binding protein (napRNAbp) , such as, Cas13, is capable of binding to a target nucleic acid (e.g., dsDNA, mRNA) as guided by a guide nucleic acid (e.g., a guide RNA) comprising a guide sequence targeting the target nucleic acid. In some embodiments, the target nucleic acid is eukaryotic.
Without wishing to be bound by theory, in some embodiments, the guide nucleic acid comprises a scaffold sequence responsible for forming a complex with the napBP, and a guide sequence that is intentionally designed to be responsible for hybridizing to a target sequence of the target nucleic acid, thereby guiding the complex comprising the napBP and the guide nucleic acid to the target nucleic acid.
Referring to FIG. 6, an exemplary target dsDNA is depicted to comprise a 5’ to 3’ single DNA strand and a 3’ to 5’ single DNA strand.
An exemplary guide nucleic acid (e.g., a guide RNA) is depicted to comprise a guide sequence and a scaffold sequence. The guide sequence is designed to hybridize to a part of the 3’ to 5’ single DNA strand, and so the guide sequence “targets” that part. Thus, the 3’ to 5’ single DNA strand is referred to as a “target strand (TS)” of the target dsDNA, while the opposite 5’ to 3’ single DNA strand is referred to as a “nontarget strand (NTS)” of the target dsDNA. That part of the target strand based on which the guide sequence is designed and to which the guide sequence may hybridize is referred to as a “target sequence”, while the opposite part on the nontarget strand corresponding to that part is referred to as the “protospacer sequence”, which is 100% (fully) reversely complementary to the target sequence and is said to be “corresponding to” the target sequence in the disclosure.
Referring to FIG. 6, an exemplary target dsDNA is depicted to comprise a 5’ to 3’ single DNA strand and a 3’ to 5’ single DNA strand. According to the conventional transcription process, an exemplary target RNA (transcript, e.g., a pre-mRNA) may be transcribed using the 3’ to 5’ single DNA strand as a synthesis template, and thus the 3’ to 5’ single DNA strand is referred to as a “template strand” or an “antisense strand”. The transcript so transcribed has the same primary sequence as the 5’ to 3’ single DNA strand except for the replacement of T with U, and thus the 5’ to 3’ single DNA strand is referred to as a “coding strand” or a “sense strand”.
An exemplary guide nucleic acid (e.g., a guide RNA) is depicted to comprise a guide sequence and a scaffold sequence. The guide sequence is designed to hybridize to a part of the transcript (target RNA) , and so the guide sequence “targets” that part. And thus, that part of the target RNA based on which the guide sequence is designed and to which the guide sequence may hybridize is referred to as a “target sequence” . In some embodiments, the guide sequence is 100% (fully) reversely complementary to the target sequence. In some other embodiments, the guide sequence is reversely complementary to the target sequence and contains a mismatch with the target sequence.
Generally, as is conventional in the art, a nucleic acid sequence (e.g., a DNA sequence, an RNA sequence) is written in 5’ to 3’ direction /orientation unless explicitly indicated otherwise.
For example, for a DNA sequence of ATGC, it is usually understood as 5’-ATGC-3’ unless otherwise indicated. Its reverse sequence is 5’-CGTA-3’. Its fully complementary sequence is 5’-TACG-3’. Its fully reverse complementary sequence is 5’-GCAT-3’. Note that the fully complementary sequence usually does not have the ability to base-pair /hybridize with the original sequence.
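By way of illustration only, the three derived sequences named above can be reproduced with a few lines of Python. The function names are hypothetical and are not part of the disclosure; the sketch assumes an unambiguous DNA alphabet (A, T, G, C) and sequences written 5’ to 3’.

```python
# Illustrative helpers only; names are assumptions, not part of the disclosure.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse(seq: str) -> str:
    """Same bases, read in the opposite order."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Base-by-base complement, kept in the original order."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq: str) -> str:
    """Complement, then reverse: the sequence of the antiparallel partner strand."""
    return complement(seq)[::-1]


def can_hybridize(a: str, b: str) -> bool:
    """True if b (written 5'->3') fully pairs with a (written 5'->3') in antiparallel orientation."""
    return b == reverse_complement(a)


seq = "ATGC"
print(reverse(seq))             # CGTA
print(complement(seq))          # TACG
print(reverse_complement(seq))  # GCAT
print(can_hybridize(seq, complement(seq)))          # False
print(can_hybridize(seq, reverse_complement(seq)))  # True
```

The last two lines mirror the note in the text: when both sequences are read 5’ to 3’, only the fully reverse complementary sequence, not the fully complementary sequence, pairs with the original.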
Generally, the double-strand sequence of a dsDNA may be represented with the sequence of its 5’ to 3’ single DNA strand conventionally written in 5’ to 3’ direction/orientation unless otherwise indicated.
For example, for a dsDNA having a 5’ to 3’ single DNA strand of 5’-ATGC-3’ and a 3’ to 5’ single DNA strand of 3’-TACG-5’, the dsDNA may be simply represented as 5’-ATGC-3’.
5’-----ATGC -----3’
3’-----TACG -----5’
It should be noted that either the 5’ to 3’ single DNA strand or the 3’ to 5’ single DNA strand of a dsDNA can be a nontarget strand from which a protospacer sequence is selected.
Generally, for a gene as a dsDNA, the 5’ to 3’ single DNA strand is the sense strand of the gene, and the 3’ to 5’ single DNA strand is the antisense strand of the gene. It should be noted that either the sense strand or the antisense strand of a gene can be a nontarget strand from which a protospacer sequence is selected.
Normally, the transcript (target RNA) transcribed from the dsDNA then has a (target) sequence of 5’-AUGC-3’.
To hybridize to a target dsDNA, in one embodiment, the guide sequence of a guide nucleic acid is designed to have a sequence of 5’-AUGC-3’ that is fully reversely complementary to the 3’ to 5’ strand of the target dsDNA, which would be set forth as ATGC in the electronic sequence listing but marked as an RNA sequence; and in another embodiment, the guide sequence of a guide nucleic acid is designed to have a sequence of 5’-GCAU-3’ that is fully reversely complementary to the 5’ to 3’ strand of the target dsDNA, which would be set forth as GCAT in the electronic sequence listing but marked as an RNA sequence.
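As a minimal sketch of the two guide designs just described, the following Python derives an RNA guide sequence from whichever strand of the exemplary dsDNA is chosen as the target strand. The helper names are hypothetical; each strand is passed in written 5’ to 3’, so the 3’ to 5’ strand 3’-TACG-5’ is entered as GCAT.

```python
# Illustrative only; helper names are assumptions, not part of the disclosure.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement_dna(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq))


def guide_for_target_strand(target_strand_5to3: str) -> str:
    """RNA guide (5'->3') fully reverse complementary to the given DNA target strand."""
    return reverse_complement_dna(target_strand_5to3).replace("T", "U")


top = "ATGC"                          # the 5' to 3' strand of the exemplary dsDNA
bottom = reverse_complement_dna(top)  # the 3' to 5' strand, rewritten 5'->3' (GCAT)

print(guide_for_target_strand(bottom))  # AUGC: targets the 3' to 5' strand
print(guide_for_target_strand(top))     # GCAU: targets the 5' to 3' strand
```

In each case the guide has the same sequence as the opposite (nontarget) strand, with U in place of T.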
In the case that the guide sequence of a guide nucleic acid is fully reversely complementary to the target sequence and the target sequence is fully reversely complementary to the protospacer sequence, the guide sequence is identical to the protospacer sequence except for the U in the guide sequence due to its RNA nature and correspondingly the T in the protospacer sequence due to its DNA nature. According to WIPO standard ST.26, symbol “t” is used to denote both T in DNA and U in RNA (see “Table 1: List of nucleotides symbols”; the definition of symbol “t” is “thymine in DNA/uracil in RNA (t/u)”). Thus, in the electronic sequence listing of the disclosure prepared according to ST.26, such a guide sequence could be set forth in the same sequence as a corresponding protospacer sequence. For convenience, a single SEQ ID NO in the electronic sequence listing can be used to denote both such guide sequence and protospacer sequence, regardless of whether such a single SEQ ID NO is marked as DNA or RNA in the electronic sequence listing. When a reference is made to such a SEQ ID NO that sets forth a protospacer/guide sequence, it refers to either a protospacer sequence that is a DNA sequence or a guide sequence that is an RNA sequence depending on the context, no matter whether it is marked as a DNA or an RNA in the electronic sequence listing.
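The shared-entry convention can be sketched as follows, for illustration only. The function names are hypothetical, and upper-case letters are used to match the examples in this description rather than the typography of an actual sequence listing.

```python
# Illustrative only; names are assumptions, not part of the disclosure or of ST.26.

def listing_form(seq: str) -> str:
    """Write a sequence the way a listing entry would show it: U is written as T."""
    return seq.upper().replace("U", "T")


def read_entry(entry: str, molecule: str) -> str:
    """Interpret a listing entry as a DNA protospacer or an RNA guide, depending on context."""
    if molecule == "RNA":
        return entry.replace("T", "U")
    if molecule == "DNA":
        return entry
    raise ValueError("molecule must be 'DNA' or 'RNA'")


guide = "AUGC"        # RNA guide sequence
protospacer = "ATGC"  # corresponding DNA protospacer sequence

# Both sequences collapse to the same listing entry...
print(listing_form(guide) == listing_form(protospacer))  # True

# ...and the entry is read back according to context.
entry = listing_form(guide)
print(read_entry(entry, "DNA"))  # ATGC
print(read_entry(entry, "RNA"))  # AUGC
```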
To hybridize to the target RNA, in one embodiment, the guide sequence of a guide nucleic acid is designed to have a sequence of 5’-GCAU-3’ that is fully reversely complementary to the (target) sequence of the target RNA, which would be set forth as GCAT in the electronic sequence listing but marked as an RNA sequence.
Term
As used herein, if a DNA sequence, for example, 5’-ATGC-3’ is transcribed to an RNA sequence, with each dT (deoxythymidine, or “T” for short) in the primary sequence replaced with a U (uridine) and other dA (deoxyadenosine, or “A” for short) , dG (deoxyguanosine, or “G” for short) , and dC (deoxycytidine, or “C” for short) replaced with A (adenosine) , G (guanosine) , and C (cytidine) , respectively, for example, 5’-AUGC-3’ , it is said in the disclosure that the DNA sequence “encodes” the RNA sequence.
As used herein, the term “activity” refers to a biological activity. In some embodiments, the activity includes enzymatic activity, e.g., catalytic ability of an effector. For example, the activity can include nuclease activity, e.g., dsDNA endonuclease activity, RNA endonuclease activity.
As used herein, the term “nucleic acid programmable binding protein (napBP) ” may be used interchangeably with “nucleic acid programmable binding domain (napBD) ” to refer to a protein that can associate (e.g., bind) with a programmable nucleic acid (e.g., DNA or RNA) , such as a guide nucleic acid (e.g., gRNA) , that is able to be programmed to guide the protein to a specific sequence of a target nucleic acid via the interaction (e.g., hybridization) between the programmable nucleic acid (e.g., the guide sequence of the programmable nucleic acid) and the target nucleic acid (e.g., the target sequence of the target nucleic acid) . The napBP may be indirectly associated with (e.g., bound to) the target nucleic acid via the interaction (e.g., binding) between the napBP and the programmable nucleic acid (e.g., scaffold sequence of the programmable nucleic acid) and the interaction (e.g., hybridization) between the programmable nucleic acid (e.g., the guide sequence of the programmable nucleic acid) and the target nucleic acid (e.g., the target sequence of the target nucleic acid) . In some embodiments, the napBP is a nucleic acid programmable DNA binding protein (napDNAbp) . In some embodiments, the napBP is a nucleic acid programmable RNA binding protein (napRNAbp) .
As used herein, the term “complex” refers to a grouping of two or more molecules. In some embodiments, the complex comprises a polypeptide and a nucleic acid interacting with (e.g., binding to, coming into contact with, adhering to) one another. As used herein, the term “complex” can refer to a grouping of a guide nucleic acid and a polypeptide (e.g., a napBP) . As used herein, the term “complex” can refer to a grouping of a guide nucleic acid, a polypeptide (e.g., a napBP) , and a target nucleic acid.
As used herein, the term “protospacer adjacent motif” or “PAM” refers to a short DNA sequence (or a DNA motif) adjacent to a protospacer sequence on the nontarget strand of a dsDNA. As used herein, the term “adjacent” includes instances wherein there is no nucleotide between the protospacer sequence and the PAM and also instances wherein there are a small number (e.g., 1, 2, 3, 4, or 5) of nucleotides between the protospacer sequence and the PAM. As used herein, A “immediately adjacent (to)” B, A “immediately 5’ to” B, and A “immediately 3’ to” B mean that there is no nucleotide between A and B. In some embodiments, the PAM is immediately 5’ to a protospacer sequence. In some embodiments, the PAM is immediately 3’ to a protospacer sequence.
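For illustration, the following Python sketch scans a nontarget strand (written 5’ to 3’) for protospacer sequences that have a PAM immediately 3’ to them. The NGG pattern, the 20-nt protospacer length, and all names are assumptions chosen for the example; they are not requirements of the disclosure, and a gap of 1 to 5 nucleotides (also “adjacent” under the definition above) is not handled.

```python
import re

# Illustrative only; PAM pattern, protospacer length, and names are assumptions.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}


def find_protospacers(nontarget_strand: str, pam: str = "NGG", protospacer_len: int = 20):
    """Return (start, protospacer, pam) for every protospacer with a PAM immediately 3' to it."""
    pam_regex = "".join(IUPAC[symbol] for symbol in pam)
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(nontarget_strand)]


nts = "TT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
for start, protospacer, pam_found in find_protospacers(nts):
    print(start, protospacer, pam_found)  # 2 ACGTACGTACGTACGTACGT TGG
```

A PAM immediately 5’ to the protospacer could be handled the same way by placing the PAM pattern before the protospacer pattern.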
As used herein, the term “guide nucleic acid” refers to any nucleic acid that facilitates the targeting of a napBP to a target nucleic acid. For this purpose, the guide nucleic acid may be designed to include a guide sequence capable of hybridizing to a specific sequence of a target nucleic acid, and the guide nucleic acid may also comprise a scaffold sequence facilitating the guiding of a napBP to the target nucleic acid. In some embodiments, the guide nucleic acid is a guide RNA. In some embodiments, the guide nucleic acid is a nucleic acid encoding a guide RNA.
As used herein, the terms “nucleic acid” , “polynucleotide” , and “nucleotide sequence” are used interchangeably to refer to a polymeric form of nucleotides of any length, including deoxyribonucleotides, ribonucleotides, combinations thereof, and analogs or modifications thereof.
As used in the context of CRISPR-Cas techniques (e.g., CRISPR-Cas12 techniques) , the term “guide RNA” is used interchangeably with the term “CRISPR RNA (crRNA) ” , “single guide RNA (sgRNA) ” , or “RNA guide” , the term “guide sequence” is used interchangeably with the term “spacer sequence” , and the term “scaffold sequence” is used interchangeably with the term “direct repeat sequence” .
As described herein, the guide sequence is so designed to be capable of hybridizing to a target sequence. As used herein, the term “hybridize”, “hybridizing”, or “hybridization” refers to a reaction in which one or more polynucleotide sequences react to form a complex that is stabilized via hydrogen bonding between the bases of the polynucleotide sequences. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. A polynucleotide sequence capable of hybridizing to a given polynucleotide sequence is referred to as the “complement” of the given polynucleotide sequence. As used herein, the hybridization of a guide sequence and a target sequence is so stabilized to permit an effector polypeptide (e.g., a napBP) that is complexed with a nucleic acid comprising the guide sequence or a functional domain associated (e.g., fused) with the effector polypeptide to act (e.g., cleave, deaminate) on the target sequence or its complement or nearby sequence.
For the purpose of hybridization, in some embodiments, the guide sequence is reversely complementary to a target sequence. As used herein, the term “reverse complementary” refers to the ability of nucleobases of a first polynucleotide sequence, such as a guide sequence, to base pair with nucleobases of a second polynucleotide sequence, such as a target sequence, by traditional Watson-Crick base-pairing. Two reverse complementary polynucleotide sequences are able to non-covalently bind under appropriate temperature and solution ionic strength conditions. In some embodiments, a first polynucleotide sequence (e.g., a guide sequence) comprises 100% (fully) reverse complementarity to a second nucleic acid (e.g., a target sequence). In some embodiments, a first polynucleotide sequence (e.g., a guide sequence) is reverse complementary to a second polynucleotide sequence (e.g., a target sequence) if the first polynucleotide sequence comprises at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementarity to the second nucleic acid (i.e., at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the nucleotides of the first polynucleotide sequence can base-pair with the nucleotides of the second polynucleotide sequence).
As used herein, the term “substantially complementary” refers to a first polynucleotide sequence (e.g., a guide sequence) that has a certain level of complementarity to a second polynucleotide sequence (e.g., a target sequence) (e.g., at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the nucleotides of the first polynucleotide sequence can base-pair with the nucleotides of the second polynucleotide sequence, or at most 1, 2, 3, 4, or 5 contiguous or non-contiguous nucleotides of the first polynucleotide sequence mismatch the nucleotides of the second polynucleotide sequence). In some embodiments, the level of complementarity is such that the first polynucleotide sequence (e.g., a guide sequence) can hybridize to the second polynucleotide sequence (e.g., a target sequence) with sufficient affinity to permit an effector polypeptide (e.g., a napBP) that is complexed with a nucleic acid comprising the first polynucleotide sequence or a functional domain associated (e.g., fused) with the effector polypeptide to act (e.g., cleave, deaminate) on the target sequence or its complement or nearby sequence. In some embodiments, a guide sequence that is substantially complementary to a target sequence has less than 100% complementarity to the target sequence. In some embodiments, a guide sequence that is substantially complementary to a target sequence has at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementarity to the target sequence, and/or has at most 1, 2, 3, 4, or 5 contiguous or non-contiguous nucleotide mismatches from the target sequence.
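The percentage and mismatch counts referred to above can be computed, for example, as in the following sketch. It assumes two sequences of equal length, both written 5’ to 3’, paired in antiparallel orientation without gaps, and it counts only Watson-Crick pairs; the function name and the example sequences are hypothetical.

```python
# Illustrative only; names and example sequences are assumptions.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def complementarity(guide: str, target: str):
    """Return (percent of guide nucleotides that base-pair, number of mismatches).

    Both sequences are written 5'->3'; position i of the guide is paired with
    position i counted from the 3' end of the target (antiparallel pairing).
    """
    if len(guide) != len(target):
        raise ValueError("this sketch only handles equal-length, ungapped sequences")
    paired = sum((g, t) in WATSON_CRICK for g, t in zip(guide, reversed(target)))
    return 100.0 * paired / len(guide), len(guide) - paired


target = "ATGC" * 5            # 20-nt DNA target sequence
full_guide = "GCAU" * 5        # fully reverse complementary RNA guide
one_off = "A" + full_guide[1:]  # same guide with a single mismatch at its 5' end

print(complementarity(full_guide, target))  # (100.0, 0)
print(complementarity(one_off, target))     # (95.0, 1)
```

The returned values can then be compared against whichever percentage or mismatch threshold applies in a given embodiment.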
As used herein, the term “sequence identity” is related to sequence homology. Homology comparisons may be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs may calculate percentage sequence identity (%) between two or more sequences (polypeptide or polynucleotide sequences). Sequence homologies may be generated by any of a number of computer programs known in the art, for example, BLAST and FASTA. A suitable computer program for carrying out such an alignment is the GCG Wisconsin Bestfit package (University of Wisconsin, U.S.A.; Devereux et al., 1984, Nucleic Acids Research 12: 387). Examples of other software that may perform sequence comparisons include, but are not limited to, the BLAST package (see Ausubel et al., 1999 ibid-Chapter 18), FASTA (Altschul et al., 1990, J. Mol. Biol., 403-410), and the GENEWORKS suite of comparison tools. Both BLAST and FASTA are available for offline and online searching (see Ausubel et al., 1999 ibid, pages 7-58 to 7-60). A commonly used online tool to calculate percentage sequence identity between two or more sequences (polypeptide or polynucleotide sequences) is available on the website of EMBL's European Bioinformatics Institute (www dot ebi dot ac dot uk slash jdispatcher slash), allowing fast online calculation of percentage sequence identity by global alignment or local alignment.
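For illustration only, once two sequences have been aligned by any such program, a percentage sequence identity can be computed from the aligned strings as sketched below. The function name and example sequences are hypothetical, and the sketch assumes one particular convention (identical columns divided by the number of aligned columns, gaps included); alignment programs may use a different denominator, so values from different tools are not necessarily comparable.

```python
# Illustrative only; not the method of any particular alignment program.

def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    identical = sum(a == b for a, b in columns if a != gap)
    return 100.0 * identical / len(columns)


# Hypothetical aligned peptide fragments: 6 identical columns out of 8.
print(percent_identity("MKTAYIAK", "MKTA-IAR"))  # 75.0
```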
As used herein, the terms “polypeptide” and “peptide” are used interchangeably herein to refer to polymers of amino acids of any length. A protein may have one or more polypeptides. An amino acid polymer can also be modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component.
As used herein, a “variant” is interpreted to mean a polynucleotide or polypeptide that differs from a reference polynucleotide or polypeptide, respectively, but retains essential properties, e.g., binding property of a napBP. A typical variant of a polynucleotide differs in nucleic acid sequence from another reference polynucleotide. A change in the nucleic acid sequence of the polynucleotide variant may or may not alter the amino acid sequence of a polypeptide encoded by the reference polynucleotide. A change in the nucleic acid sequence of the polynucleotide variant may result in an amino acid substitution, addition, and/or deletion in the polypeptide encoded by the reference polynucleotide. A typical variant of a polypeptide differs in amino acid sequence from another reference polypeptide. Generally, the difference is limited so that the sequences of the reference polypeptide and the polypeptide variant are closely similar overall and, in many regions, identical. The polypeptide variant and reference polypeptide may differ in amino acid sequence by one or more substitutions, additions, and/or deletions in any combination. A variant of a polynucleotide or polypeptide may be naturally occurring, such as an allelic variant, or it may be a variant that is not known to occur naturally. Non-naturally occurring variants of polynucleotides and polypeptides may be made by mutagenesis techniques, by direct synthesis, and by other recombinant methods known to skilled artisans.
As used herein, the terms “upstream” and “downstream” refer to the relative positions of two or more elements within a nucleic acid in 5’ to 3’ direction. A first sequence is upstream of a second sequence when the 3’ end of the first sequence is present at the left side of the 5’ end of the second sequence. A first sequence is downstream of a second sequence when the 5’ end of the first sequence is present at the right side of the 3’ end of the second sequence. In some embodiments, the PAM is upstream of a napBP-induced indel, and a napBP-induced indel is downstream of the PAM. In some embodiments, the PAM is downstream of a napBP-induced indel, and a napBP-induced indel is upstream of the PAM.
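The positional definitions above can be expressed, purely as an illustrative sketch, in terms of coordinates along a single strand written 5’ to 3’ (left to right). The class name, function names, and coordinates are hypothetical; a 0-based, half-open interval convention is assumed.

```python
from typing import NamedTuple

# Illustrative only; names and coordinates are assumptions.


class Element(NamedTuple):
    name: str
    start: int  # 0-based index of the 5'-most nucleotide
    end: int    # index one past the 3'-most nucleotide (half-open interval)


def is_upstream(first: Element, second: Element) -> bool:
    """The 3' end of `first` lies to the left of the 5' end of `second`."""
    return first.end <= second.start


def is_downstream(first: Element, second: Element) -> bool:
    """The 5' end of `first` lies to the right of the 3' end of `second`."""
    return first.start >= second.end


def is_immediately_5prime(first: Element, second: Element) -> bool:
    """`first` is upstream of `second` with no nucleotide between them."""
    return first.end == second.start


protospacer = Element("protospacer", 0, 20)
pam = Element("PAM", 20, 23)

print(is_upstream(protospacer, pam))            # True
print(is_downstream(pam, protospacer))          # True
print(is_immediately_5prime(protospacer, pam))  # True
```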
As used herein, the term “wild type” has the meaning commonly understood by those skilled in the art to mean a typical form of an organism, a strain, a gene, or a feature that distinguishes it from a mutant or variant when it exists in nature. It can be isolated from sources in nature and not intentionally modified.
As used herein, the terms “non-naturally occurring” and “engineered” are used interchangeably and refer to artificial participation. When these terms are used to describe a nucleic acid or a polypeptide, it is meant that the nucleic acid or polypeptide is at least substantially freed from at least one other component of its association in nature or as found in nature.
As used herein, the term “regulatory element” is intended to include promoters, enhancers, internal ribosome entry sites (IRES) , and other expression control elements (e.g., transcription termination signals, such as, polyadenylation signals and poly-U sequences) . Such regulatory elements are described, for example, in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990) . Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cells and those that direct expression of a nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences) . Regulatory elements may also direct expression in a time-dependent manner, e.g., in a cell cycle-dependent or developmental stage-dependent manner, which may or may not be tissue or cell type specific.
As used herein, the term “cell” is understood to refer not only to a particular individual cell, but to the progeny or potential progeny of the cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term.
As used herein, the term “in vivo” refers to inside the body of an organism, and the terms “ex vivo” and “in vitro” refer to outside the body of an organism.
As used herein, the term “treat”, “treatment”, or “treating” is an approach for obtaining beneficial or desired results including clinical results. For purposes of the disclosure, the beneficial or desired clinical results include, but are not limited to, one or more of the following: alleviating one or more symptoms resulting from a disease, diminishing the extent of a disease, stabilizing a disease (e.g., delaying the worsening of a disease), delaying the spread (e.g., metastasis) of a disease, delaying the recurrence of a disease, reducing recurrence rate of a disease, delaying or slowing the progression of a disease, ameliorating a disease state, providing a remission (partial or total) of a disease, decreasing the dose of one or more other medications required to treat a disease, delaying the progression of a disease, increasing the quality of life, and/or prolonging survival. Also encompassed by “treatment” is a reduction of pathological consequence of a disease (such as cancer). The methods of the disclosure contemplate any one or more of these aspects of treatment.
As used herein, the term “disease” includes the terms “disorder” and “condition” and is not limited to those that have been specifically medically defined.
As used herein, the term “transcript” includes any transcription product by transcription from a DNA, including subgenomic RNA, mRNA, non-coding RNA, and any variants, derivatives, or ancestors thereof, for example, pre-mRNA, and any transcripts or isoforms produced from the DNA or the pre-mRNA by, e.g., alternative promoter usage, alternative splicing, alternative initiation, and any naturally occurring variants thereof or processed products therefrom.
As used herein, reference to “not” a value or parameter generally means and describes “other than” a value or parameter. For example, the method is not used to treat cancer of type X means the method may be used to treat cancer of types other than X.
As used herein, the singular forms “a” , “an” , and “the” include plural referents unless the context clearly dictates otherwise.
As used herein, the term “and/or” in a phrase such as “A and/or B” is intended to include both A and B; A or B; A (alone) ; and B (alone) . Likewise, the term “and/or” in a phrase such as “A, B, and/or C” is intended to encompass each of the following embodiments: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone) ; B (alone) ; and C (alone) .
As used herein, when the term “about” precedes a series of numbers (for example, about 1, 2, 3), it is understood that each of the series of numbers is modified by the term “about” (that is, about 1, about 2, about 3). The term “about X-Y” or “about X to Y” used herein has the same meaning as “about X to about Y.”
It is understood that embodiments of the disclosure described herein include “consisting” and/or “consisting essentially of” embodiments.
It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely” , “only” , and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
An understanding of the features and advantages of the disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure may be utilized, and the accompanying drawings of which:
FIG. 1. Design and mechanisms of base editing by conventional base editors and glycosylase-based base editors of the disclosure. FIG. 1a: Schematic diagrams of ABE (left) and CBE (middle) and the deaminase-free glycosylase-based guanine base editor (gGBE, right) of the disclosure. An nCas9-sgRNA complex creates an R-loop at the target site in the DNA. In ABE and CBE, the evolved adenine deaminase (tRNA adenosine deaminase, TadA) and AID/APOBEC-like cytidine deaminase convert the exposed adenine (A) into deoxyinosine (I) and cytosine (C) into deoxyuridine (U), respectively. In CBE, an additional linked protein, uracil glycosylase inhibitor (UGI), protects U from uracil DNA N-glycosylase (UNG). After deamination, the resulting I is recognized as G, and U as T by DNA polymerase during DNA repair or replication. In the gGBE of the disclosure, a prototype version of glycosylase-based guanine base editor (gGBE) is designed to remove G, and the AP site generated thereby is repaired by TLS and/or DNA replication, leading to G-to-C or G-to-T conversion. PAM, protospacer adjacent motif. AP, apurinic/apyrimidinic sites. FIG. 1b: A screening reporter system for detecting G-to-T conversion by gGBE. P2A, 2A peptide from porcine teschovirus-1. FIG. 1c: Percentage of EGFP+ cells for G editing efficiency evaluation of gGBE with various MPG (mean ± s.e.m., n = 3). WT: wild-type.
FIG. 2. Mutagenesis of the MPG moiety in gGBEs. FIG. 2a: Schematic diagram of mutagenesis and screening strategy for engineered gGBE. The EGFP reporter plasmids were transiently co-transfected into cultured cells along with the gGBE expression plasmids. FIG. 2b: Genotypes of a subset of engineered gGBEs, with percentage of EGFP+ cells for each gGBE on the far-right column (more engineered gGBEs listed in FIG. 13) . Different steps of mutagenesis are marked by different shaded colors. FIG. 2c-d: G base editing outcomes with different gGBE at the edited G7 position in site 3 (FIG. 2c) and site 10 (FIG. 2d) in transfected HEK293T cells by target deep sequencing (mean ± s.e.m., n = 3) .
FIG. 3. Characterization of editing profiles of gGBE via target deep sequencing. FIG. 3a: Bar plots showing the on-target DNA base editing at positions with the highest G conversion frequencies at each genomic site in HEK293T cells (mean ± s.e.m., n = 3) . G#: G position with highest on-target base editing frequencies across protospacer positions 1–20. site #: genomic site number. FIG. 3b: The ratio of G-to-C/T to G-to-A/C/T conversion frequency by gGBEv6.3 editing at the sites shown in FIG. 3a. FIG. 3c: Frequencies of G conversions by gGBEv6.3 across protospacer positions 1–20 at the edited sites in FIG. 3a (in which PAM was at positions 21–23) . Single dots represent individual data point from 3 independent replicates per site. Boxes span the interquartile range (25th to 75th percentile) ; horizontal line in the box indicates the median (50th percentile) ; and small horizontal bars mark the minimal and maximal values. FIG. 3d: The guide sequence-dependent off-target analysis for gGBEv6.3 editing efficiency at site 7 (mean ± s.e.m, n = 3) . OT: off-target. FIG. 3e: The guide sequence-independent off-target editing efficiency detected by the orthogonal R-loop assay at each R-loop site (mean ± s.e.m, n = 3) .
FIG. 4. Gene editing applications of gGBE. FIG. 4a: Application of gGBEv6.3 for editing splicing sites, introduction of premature termination codons (PTCs), as well as editing that bypasses PTCs. FIG. 4b: Schematic diagram illustrating gGBE-induced skipping of DMD exon 45. FIG. 4c: Bar plots showing the on-target DNA base editing frequencies of G editing and the ratio of G-to-C/T to G-to-A/C/T editing frequencies by gGBEv6.3 at two DMD sites in HEK293T cells (mean ± s.e.m., n = 3). SpG, a PAM-flexible Cas9 variant (SEQ ID NO: 163), was used for targeting DMD site 2. FIG. 4d: Schematic diagram illustrating the introduction of PTCs in the mouse Tyr gene by gGBE. FIG. 4e: Bar plots showing the on-target G editing frequencies and G-to-Y ratio by gGBEv6.3 at three Tyr sites in N2a cultured cells (mean ± s.e.m., n = 3). FIG. 4f: Percentages of G conversion-induced PTCs by gGBEv6.3 at two Tyr sites in mouse pups (mean ± s.e.m., n = 25 and 21 pups for site 2 and site 3, respectively). FIG. 4g: Phenotype of F0 mice generated by gGBE editing in mouse zygotes. Image showing the presence of edited P6 mice. Red arrowhead, albino; blue arrowhead, mosaic pigmentation. FIG. 4h: Bar plots showing the on-target G editing frequencies for individual mouse pups, with gGBEv6.3 targeting Tyr site 3. FIG. 4i: Genotyping of representative F0 pups from (FIG. 4h). The frequencies of mutant alleles were determined by high-throughput sequencing. Red arrowhead, albino pups.
FIG. 5 illustrates example nucleotide conversion by base excision and translesion synthesis (TLS) .
FIG. 6 illustrates an exemplary target dsDNA containing a first exemplary deoxyribonucleotide dG, an exemplary guide nucleic acid, and an exemplary napDNAbp before base editing.
FIG. 7 illustrates an exemplary target dsDNA containing a fourth exemplary deoxyribonucleotide dC, an exemplary guide nucleic acid, and an exemplary napDNAbp after base editing.
FIG. 8 illustrates an exemplary target dsDNA containing a fourth exemplary deoxyribonucleotide dT, an exemplary guide nucleic acid, and an exemplary napDNAbp after base editing.
FIG. 9. Overview of AYBE and CGBE. FIG. 9a: Schematic diagram of AYBE. N-methylpurine DNA glycosylase (MPG) excises the inosine (I) resulting from deamination of adenine (A) by the adenine deaminase (TadA) , triggering base excision repair (BER) pathway in cells, thus causing more versatile base editing outcomes, including A-to-C and A-to-T editing. FIG. 9b: Schematic diagram of CGBE. Uracil DNA N-glycosylase (UNG) excises the uridine (U) resulting from deamination of cytosine (C) by the AID/APOBEC-like cytidine deaminase, triggering base excision repair (BER) pathway in cells, thus causing dominant C-to-G editing. PAM, Protospacer adjacent motif. AP, apurinic/apyrimidinic sites.
FIG. 10. Characterization of A-to-T and G-to-T editing with an intron-split EGFP reporter system. FIG. 10a: Design of the reporter for A-to-T or G-to-T editing detection. P2A, 2A peptide from the porcine teschovirus-1. FIG. 10b: Percentage of EGFP+ cells for evaluation of A editing efficiency by gABE with various MPG. WT: wild-type; v0.2: MPG-N169S; v1: MPG with N169S, S198A, K202A, G203A, S206A and K210A mutations; v2: MPG with mutations N169S and G163R; v3: MPG with G163R, N169S, S198A, K202A, G203A, S206A and K210A mutations (mean ± s.e.m., n = 3). FIG. 10c: Percentage of EGFP+ cells representing the efficiency of G-to-T conversion for gGBE containing various MPG and sgRNA. dMPG, inactive dead MPG; T: targeting sgRNA; NT: nontargeting sgRNA (mean ± s.e.m., n = 3). FIG. 10d: Representative flow cytometry scatter plots showing gating strategy and the percentages of EGFP+ cells for gGBEv3.
FIG. 11. View of MPG structure and the first round of mutagenesis of MPG. FIG. 11a: Structures for aa 78-298 region (left) and 163-179 region (right) of human MPG protein (shown in gray) , as predicted by AlphaFold (alphafold. com/entry/P29372) aligned with the crystal structure of MPG (PDB entry 1ewn, not shown) , in which εA was mutated to G in the DNA. FIG. 11b: Percentage of EGFP+ cells for evaluating G editing efficiency with different engineered gGBEs with various MPG mutants (mean ± s.e.m., n = 3) . FIG. 11c: Performance of various engineered gGBEs measured by the percentage of EGFP+ cells in the first round of screening. Dotted line, mean value of the MPGv3 group. Fold changes were calculated relative to gGBEv3 (mean ± s.e.m., n = 3) . FIG. 11d: Percentage of EGFP+ cells for each gGBE (mean ± s.e.m., n = 3) .
FIG. 12. Performance of engineered gGBEs in the second round of screening. FIG. 12a-d: Percentage of EGFP+ cells of gGBEs with various MPG mutants from sequential substitutions of glutamic acid (FIG. 12a) , valine (FIG. 12b) , glycine (FIG. 12c) , and tyrosine (FIG. 12d) (X-to-E, V, G, or Y) . n = 3. All values are presented as mean ± s.e.m.
FIG. 13. Progressive engineering and G editing efficiency of gGBEs. FIG. 13a: Progressive mutations of gGBEs. Different rounds of mutations are marked with different color shades. FIG. 13b: Percentage of EGFP+ cells for each gGBE. n = 3. All values are presented as mean ± s.e.m.
FIG. 14. Further characterization of editing profiles for gGBEv6.3. FIG. 14a-c: Frequencies of C (FIG. 14a) , T (FIG. 14b) and A (FIG. 14c) conversions by gGBEv6.3 across the protospacer positions 1–20 (where PAM is at positions 21–23) from the edited sites in FIG. 3a. FIG. 14d: Frequencies of G-to-T and G-to-C editing by gGBEv6.3. In FIG. 14a-d, single dot represents individual replicate (n = 3 independent replicates per site) , and boxes span the interquartile range (25th to 75th percentile) ; horizontal lines within the boxes indicate the median (50%) ; and whiskers extend to the minimal and maximal values. FIG. 14e-g: Percentage of G-to-C (FIG. 14e) , G-to-T (FIG. 14f) or G-to-A (FIG. 14g) editing by gGBEv6.3 at various edited sites shown in FIG. 3a (mean ± s.e.m., n = 3) . FIG. 14h: Indel frequencies with gGBEv6.3 at 24 on-target sites (mean ± s.e.m., n = 3) . FIG. 14i: Bar plots showing the on-target DNA base editing at positions with G conversion frequencies >10% at each genomic site in HEK293T cells (mean ± s.e.m., n = 3) . FIG. 14j: The statistical analysis of on-target DNA base editing for each NG motif from the edited sites in (FIG. 14i) . Each dot represents the mean of three biological replicates for each edited position at various edited sites.
FIG. 15. The guide sequence-dependent and guide sequence-independent off-target analysis. FIG. 15a: The guide sequence-dependent off-target analysis for gGBEv6.3 editing at different sites (n = 3) . OT: off-target. FIG. 15b: The guide sequence-independent off-target editing detected by the orthogonal R-loop assay at each R-loop site for gGBE, AYBE, and ABE8e (n = 3) , respectively. Data for AYBE and ABE8e were adopted from Tong et al. [1] . All values are presented as mean ± s.e.m.
FIG. 16. The percentage of G-to-C and G-to-T among all G-to-C/T/A conversion events at DMD or Tyr sites targeted by gGBEv6.3. FIG. 16a: Percentages of G-to-C and G-to-T editing events in HEK293T cells with gGBEv6.3 at two DMD sites (corresponding to FIG. 4C) . n = 3. SA, splicing acceptor site. PAM, Protospacer adjacent motif. FIG. 16b: Percentages of G-to-C and G-to-T editing events in N2a cells with gGBEv6.3 at three Tyr sites (corresponding to FIG. 4e) . n = 3. All values are presented as mean ± s.e.m.
FIG. 17. G editing in mouse embryos with gGBEv6.3. FIG. 17a: On-target base editing efficiencies for gGBEv6.3 targeting three Tyr sites in mouse embryos (mean ± s.e.m., n = 20) . FIG. 17b-c: Percentages of G conversion-induced PTCs by gGBEv6.3 at three Tyr sites in mouse embryos (mean ± s.e.m., n = 20 embryos) . FIG. 17d: Indel frequencies induced by gGBEv6.3 targeting at three Tyr sites in mouse embryos shown in FIG. 17a (n = 20) . FIG. 17e: Bar plots showing the on-target G editing frequencies for individual mouse embryos, with gGBEv6.3 targeting Tyr site 1, Tyr site 2, and Tyr site 3.
FIG. 18. Phenotypes and genotyping of F0 mouse pups. FIG. 18a-c: Phenotypes of F0 mice generated by microinjection of gGBEv6.3 encoding mRNA and sgRNA for targeting Tyr site 1 (FIG. 18a) , site 2 (FIG. 18b) and site 3 (FIG. 18c) . The images were obtained for P6 mice. Arrowheads: red, albino mice; blue, mice with mosaic pigmentation. FIG. 18d: Bar plots showing the on-target G editing frequencies for individual mouse pups, with gGBEv6.3 targeting Tyr site 1 and Tyr site 2.
FIG. 19. Design and mechanisms of two orthogonal glycosylase-based base editors. FIG. 19a, Prototype versions of a deaminase-free glycosylase-based thymine base editor (gTBE) and a deaminase-free glycosylase-based cytosine base editor (gCBE) . PAM, Protospacer adjacent motif. AP, apurinic/apyrimidinic sites. Star (*) in magenta indicates the nick generated by nCas9. FIG. 19b, Schematic diagram of potential pathway for T (or C) editing and outcomes. A glycosylase mutant is designed to remove normal T or C, an nCas9-sgRNA complex creates an R-loop at the target site and nicks the non-edited strand, then the AP site generated is repaired by translesion synthesis (TLS) and/or DNA replication, leading to T or C editing. DSB, double-strand break. indel, insertion and deletion. FIG. 19c, Schematic of various gTBE and gCBE candidate architectures. Note that Y156A and N213D of UNG2 are equivalent to Y147A and N204D of UNG1, respectively. FIG. 19d, Percentage of EGFP+ cells for T editing efficiency evaluation of different gTBE using T-to-G reporter (n = 3) . NT, non-target sgRNA. T: target sgRNA. FIG. 19e, Percentage of EGFP+ cells for C editing efficiency evaluation of different gCBE using C-to-G reporter (n = 3) . NT, non-target sgRNA. T: target sgRNA. FIG. 19f, the orthogonality of gTBE and gCBE for base editing evaluated using two different reporters (n = 3) . All values are presented as mean ± s.e.m.
FIG. 20. Protein engineering and evolution of gTBEs. FIG. 20a, Schematic diagram of mutagenesis and screening strategy for engineering gTBE. The EGFP reporter plasmids were transiently co-transfected into cultured cells along with the gTBE plasmids, and the fluorescence intensity of EGFP was detected with flow cytometry. FIG. 20b, Left, the selected residues (shown as surface) for mutagenesis nearby the catalytic site pocket of human UNG-DNA complex (PDB entry 1EMH) , in which dΨU was mutated to T in the DNA (dT) . Right, location of the effective residues in gTBEv3 shown as spheres in red on the three-dimensional structure. FIG. 20c, Gradual improvement of EGFP activation for each gTBE (n = 3) . WT, wild-type UNG2Δ88. dead, catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1) . FIG. 20d, Frequencies of T base editing outcomes (left) and indels (right) with different gTBE at the edited T5 position in site 9 (CLYBL gene) in transfected HEK293T cells by target deep sequencing (n = 3) . All values are presented as mean ± s.e.m.
FIG. 21. Characterization of editing profiles of gTBE via target deep sequencing. FIG. 21a, Bar plots showing the on-target DNA base editing at positions with the highest T conversion frequencies at each genomic site in HEK293T cells (mean ± s.e.m., n = 3) . T#: T position with highest on-target base editing frequencies across protospacer positions 1–20. site #: genomic site number. FIG. 21b, The ratio of T-to-C/G to T-to-A/C/G conversion frequency by gTBEv3 editing at the sites shown in FIG. 21a. FIG. 21c, indels frequencies with gTBEv3 at 20 on-target sites (n = 3) . FIG. 21d-e, The guide sequence-dependent off-target analysis for gTBEv3 editing efficiency at site 1 and site 15 (n = 3) . OT: off-target. FIG. 21f, The guide sequence-independent off-target editing efficiency detected by the orthogonal R-loop assay at each R-loop site (n = 3) . All values are presented as mean ± s.e.m.
FIG. 22. Enhancement of gCBE editing efficiency through protein engineering. FIG. 22a, Schematic diagram of mutagenesis and screening strategy for engineering gCBE. FIG. 22b, Gradual improvement of EGFP activation for each gCBE (n = 3) . WT, wild-type UNG2Δ88. dead, catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1) . FIG. 22c, Bar plots showing the on-target DNA base editing at positions with the highest C conversion frequencies at each genomic site in HEK293T cells (n = 3) . C#: C position with highest on-target base editing frequencies across protospacer positions 1-20. site #: genomic site number. FIG. 22d, Bar plots showing the on-target DNA base editing of different positions at three loci with gCBEv2 or CGBE1 (n = 3) . FIG. 22e, On-target base editing frequencies for gCBEv2 at C6 of site 22 in HEK293T cells for the orthogonal R-loop assay (n = 3) . FIG. 22f, gRNA-independent cumulative off-target editing frequencies detected by the orthogonal R-loop assay at each R-loop site. Each R-loop was performed by co-transfection of each base editor, and an SpCas9 sgRNA targeting corresponding site with dSaCas9 and a SaCas9 sgRNA (n = 3) . All values are presented as mean ± s.e.m.
FIG. 23. Gene editing applications of gTBE and gCBE. FIG. 23a, principle for exon skipping with base editors. FIG. 23b, Bar plots showing the numbers of sgRNA candidates targeting the splicing sites in 16 genes by different base editors. gCBE, gCBEv2; gGBE, gGBEv6.3; gTBE, gTBEv3. The 16 genes are AGT, ANGPTL3, APOC3, B2M, CD33, DMD, DNMT3A, HPD, KLKB1, PCSK9, PDCD1, PRDM1, TGFBR2, TRAC, TTR, and VEGFA. FIG. 23c, Venn diagram showing the distribution of sgRNAs for 4 base editors in FIG. 23b. FIG. 23d, Schematic diagram illustrating sgRNA candidates specifically targeting SD or SA sites in human DMD with gTBEv3 (red lines) or gCBEv2 (black lines) , but not ABE or CBE. FIG. 23e, Schematic diagram illustrating the skipping of human DMD exon 45 induced by gTBE-induced disruption of the splicing donor site. FIG. 23f, On-target base editing efficiency for gTBEv3 targeting the splicing donor site of humanized DMD exon 45 in mouse embryos (mean ± s.e.m., n = 20) . FIG. 23g, DNA sequencing chromatograms from wild-type (WT) and representative embryos co-injected with gTBEv3 mRNA and sgRNA targeting the SD site of human DMD exon 45.
FIG. 24. Comparison of different gTBEs. FIG. 24a, the strategies for protein engineering and screening used in three studies. FIG. 24b, Schematic of the basic architectures for various base editors. UNG2*, UNG2 mutant from the corresponding base editor. ΔNTD, deletion of the N-terminal domain. FIG. 24c, The frequencies of T conversions at 17 endogenous loci. The thymines with editing frequencies >25% for any base editor are shown. The highest frequencies at corresponding positions are highlighted as a heat map (n = 3) . FIG. 24d, Frequencies of T conversions by various base editors across the protospacer positions 1-20 (where PAM is at positions 21–23) from the edited sites in FIG. 24c. Single dot represents individual replicate (n = 3 independent replicates per site) , and boxes span the interquartile range (25th to 75th percentile) ; horizontal lines within the boxes indicate the median (50%) ; and whiskers extend to the minimal and maximal values.
FIG. 25. Characteristic sequences and motifs of human UNG1 and UNG2. FIG. 25a, UNG1-specific N-terminal residues (amino acid 1-35) are marked in grey. UNG2-specific N-terminal residues (amino acid 1-44) are light blue. The common RPA-binding site (yellow) and the globular catalytic domain (light green) are indicated. RPA, Replication protein A. FIG. 25b, UNGs contain five conserved motifs numbered from UNG2 as follows: the catalytic water-activating loop (152-GQDPYH-157) ; the proline (Pro) -rich loop compressing the DNA backbone 5’ to the lesion (174-PPPPS-178) ; the uracil-binding motif (210-LLLN-213) ; the glycine-serine (Gly-Ser) loop that compresses the DNA backbone 3’ to the lesion (255-GS-256) ; and the leucine (Leu) -intercalation loop penetrating the minor groove (277-HPSPLS-282) .
FIG. 26. Characterizations of T-to-G and C-to-G reporter system. FIG. 26a, Schematic construct designs of the reporter for T-to-G or C-to-G editing detection. PAM, Protospacer adjacent motif. FIG. 26b, Representative flow cytometry scatter plots showing gating strategy and the percentages of gated cells for the negative control (upper panel) and gCBEv0.3 (lower panel) .
FIG. 27. Editing efficiency of gTBE and gCBE candidates with various UNG-NTD truncations. FIG. 27a, Percentage of EGFP+ cells for evaluating T editing efficiency of gTBE candidates with various UNG mutants (n = 3) . FIG. 27b, Percentage of EGFP+ cells for evaluating C editing efficiency of gCBE candidates with various UNG mutants (n = 3) . WT, wild-type UNG2Δ88. dead, catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1) . All values are presented as mean ± s.e.m.
FIG. 28. Performance of UNG mutants in the background of gTBEv0.3. FIG. 28a, Percentage of EGFP+ cells of gTBE from alanine-scanning mutagenesis of regions covering the catalytic water-activating loop, the Pro-rich loop, and the uracil-binding motif (n = 3) . Replacement of alanine with valine, A-to-V, is intended to cover all the residues in the interested regions. FIG. 28b, Percentage of EGFP+ cells of gTBE variants from site-saturation mutagenesis of the residue at position 214 (n = 3) . FIG. 28c, Percentage of EGFP+ cells of gTBE with mutations of selected spatial neighbors of residue T214 (n = 3) . dead, catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1) . All values are presented as mean ± s.e.m.
FIG. 29. Performance of UNG mutants in the background of gTBEv2. FIG. 29a-c, Percentage of EGFP+ cells of gTBE with different UNG mutants from sequential substitutions of arginine (FIG. 29a) , aspartic acid (FIG. 29b) , and valine (FIG. 29c) (X-to-R, D, or V) (n = 3) . dead, catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1) . All values are presented as mean ± s.e.m.
FIG. 30. Further characterization of editing profiles of gTBEv3. FIG. 30a, Bar plots showing the on-target DNA base editing at positions with T conversion frequencies >10% at each genomic site in HEK293T cells (mean ± s.e.m., n = 3) . FIG. 30b-e, Frequencies of T (FIG. 30b) , G (FIG. 30c) , C (FIG. 30d) and A (FIG. 30e) conversions by gTBEv3 across the protospacer positions 1-20 (where PAM is at positions 21–23) from the edited sites in FIG. 21a. FIG. 30f, Frequencies of T-to-G and T-to-C editing by gTBEv3. In FIG. 30b-f, single dot represents individual replicate (n = 3 independent replicates per site) , and boxes span the interquartile range (25th to 75th percentile) ; horizontal lines within the boxes indicate the median (50%) ; and whiskers extend to the minimal and maximal values. FIG. 30g-i, Percentage of T-to-G (FIG. 30g) , T-to-C (FIG. 30h) or T-to-A (FIG. 30i) editing by gTBEv3 at various edited sites shown in Figure 3a (mean ± s.e.m., n = 3) . T#: T position with highest on-target base editing frequencies across protospacer positions 1–20. site #: genomic site number. FIG. 30j, The ratio of T-to-S to total T editing (base conversions and indels) by gTBEv3 editing at the sites shown in FIG. 21a. FIG. 30k, The statistical analysis of on-target DNA base editing for each NT motif from the edited sites in (FIG. 30a) . Each dot represents the mean of three biological replicates for each edited position at various edited sites. n = 8, 14, 6, 6 for motif AT, CT, GT, TT.
FIG. 31. Guide sequence-dependent off-target analysis for gTBEv3 at more sites. The guide sequence-dependent off-target analysis for gTBEv3 editing at site 9 (a) and site 15 (b) (n = 3) . OT: off-target. All values are presented as mean ± s.e.m.
FIG. 32. Performance of UNG mutants in the background of gCBEv0.3. FIG. 32a, Percentage of EGFP+ cells of gCBE by introduction of the mutation A214V (n = 3) . FIG. 32b, Percentage of EGFP+ cells of gCBE from alanine-scanning mutagenesis of regions covering the catalytic water-activating loop and the Pro-rich loop (n = 3) . Replacement of alanine with valine (A-to-V) is intended to cover all the residues in the interested regions. dead, catalytically inactive UNG2Δ88 (carrying D154N and H277N mutations, equivalent to D145N and H268N of UNG1) . All values are presented as mean ± s.e.m.
FIG. 33. Further characterization of editing profiles of gCBEv2. FIG. 33a, Frequency of C base editing outcomes with different gCBE at the edited C2 position in site 28 in transfected HEK293T cells by target deep sequencing (mean ± s.e.m., n = 3) . FIG. 33b, Bar plots showing the on-target DNA base editing at two or more positions with C conversion frequencies >10% at each genomic site in HEK293T cells (mean ± s.e.m., n = 3) . FIG. 33c-d, Frequencies of C (FIG. 33c) and T (FIG. 33d) conversions by gCBEv2 across the protospacer positions 1-20 (where PAM is at positions 21–23) from the edited sites in Figure 4c. Single dot represents individual replicate (n = 3 independent replicates per site) , and boxes span the interquartile range (25th to 75th percentile) ; horizontal lines within the boxes indicate the median (50%) ; and whiskers extend to the minimal and maximal values. FIG. 33e, The ratio of C-to-G/T to C-to-A/G/T conversion frequency by gCBEv2 editing at the sites shown in FIG. 22c. FIG. 33f-h, Percentage of C-to-G (FIG. 33f) , C-to-T (FIG. 33g) or C-to-A (FIG. 33h) editing by gCBEv2 at various edited sites shown in Figure 3a (mean ± s.e.m., n = 3) . FIG. 33i, indels frequencies with gCBEv2 at 16 on-target sites (mean ± s.e.m., n = 3) . In FIG. 33e-i, C#: C position with highest on-target base editing frequencies across protospacer positions 1-20. site #: genomic site number. FIG. 33j, The statistical analysis of on-target DNA base editing for each NC motif from the 16 edited sites. Each dot represents the mean of three biological replicates for each edited position at various edited sites. n = 8, 9, 10, 13 for motif AC, CC, GC, TC. FIG. 33k, indels frequencies with gCBEv2 and CGBE1 at 3 on-target sites from Figure 4d (mean ± s.e.m., n = 3) . FIG. 33l, On-target base editing frequencies for CGBE1 at C6 of site 22 in HEK293T cells for the orthogonal R-loop assay (mean ± s.e.m., n = 3) .
FIG. 34. Base editing at splicing sites with gTBEv3. FIG. 34a, The optimal editing windows for various base editors. FIG. 34b, Venn diagram showing the distribution of sgRNAs for CBE and gCBEv2 in FIG. 23b. FIG. 34c, Bar plots showing the frequency of T or C conversions at several splicing acceptor (SA) or splicing donor (SD) sites of interest targeted by gTBEv3 or gCBEv2 (mean ± s.e.m., n = 3) . T# or C#: The position of targeted T or C across protospacer positions 1–20. FIG. 34d, DNA sequencing chromatograms for targeting the SD site of human DMD exon 37 and exon 12 with gTBEv3. Sanger sequencing results were quantified by EditR.
FIG. 35. PTCs editing and introduction for various base editors. FIG. 35a, principle for bypassing premature termination codons (PTCs) with various base editors. FIG. 35b, the possible codon outcomes from stop codons (TAA, TAG or TGA) editing with different base editors. FIG. 35c, principle for introduction of PTCs with various base editors. FIG. 35d, the available codons for editing into stop codons (TAA, TAG or TGA) with different base editors. FIG. 35e, The 10 × 10 dot plot diagram showing the percentage of possible sgRNAs for introduction of premature termination codons (PTCs) by targeting different codons (with the number of available sgRNAs presented in the right) in 15 well-studied genes (AGT, ANGPTL3, APOC3, B2M, CD33, DNMT3A, HPD, KLKB1, PCSK9, PDCD1, PRDM1, TGFBR2, TRAC, TTR, VEGFA) for gene and cell therapy research with gGBEv6.3 and CBE. In FIG. 35b and FIG. 35d, AYBE, AYBEv3; gCBE, gCBEv2; gGBE, gGBEv6.3; gTBE, gTBEv3.
FIG. 36. Additional comparison of different gTBEs. FIG. 36a-b, The frequencies of T-to-G (FIG. 36a) or T-to-G (FIG. 36b) conversions. The highest frequencies (>20%) of edited thymines at corresponding positions were highlighted as Heat map (n = 3) . FIG. 36c-d, Heat map showing the percentages of T-to-S to total base conversions (FIG. 36c) or T-to-S to total T editing (base conversions and indels; FIG. 36d) for various base editors (n = 3) . FIG. 36e, Heat map showing the indels frequencies for various base editors (n = 3) . FIG. 36f-h, The statistical analysis of T base editing (FIG. 36f, n = 35) , T-to-S percentages (FIG. 36g, n = 35) and indels (FIG. 36h, n = 17) . All values are presented as mean ± s.e.m. Each dot represents the mean of three biological replicates for each edited position at various edited sites. FIG. 36a-h, the graphs were derived from the data for various base editors shown in FIG. 24c. Dunnett’s multiple comparisons test after one-way ANOVA was used to compare the gTBEv3 or gTBEv5 with other base editors in FIG. 36f-h.
FIG. 37. T editing in the dsDNA upstream from the target site. FIG. 37a, Bar plots showing the frequency of T conversion at -5T, -3T, -2T and -1T in the dsDNA upstream from the target site 38 (VEGFA) (n = 3) . FIG. 37b, Bar plots showing the frequency of T conversion at -4T, -3T, -2T and -1T in the dsDNA upstream from the target site 44 (EMX1-siteH1F) (n = 3) . All values are presented as mean ± s.e.m., the graphs were derived from the data for various base editors shown in FIG. 24c.
FIG. 38. Comparison of various glycosylase-based base editors for cytosine editing. FIG. 38a, Schematic of the basic architectures for various base editors. UNG2*, UNG2 mutant from the corresponding base editor. ΔNTD, deletion of the N-terminal domain. FIG. 38b-c, The frequencies of total C editing (base conversions and indels, FIG. 38b) or C base conversions (FIG. 38c) for various base editors at 19 endogenous loci. The cytosines with editing frequencies >25% for any base editor are shown. The highest frequencies at corresponding positions are highlighted as a heat map (n = 3) . FIG. 38d, Heat map showing the indels frequencies for various base editors (n = 3) . FIG. 38e-g, The statistical analysis of total C editing (FIG. 38e, n = 56) , C base conversions (FIG. 38f, n = 49) and indels (FIG. 38g, n = 19) . All values are presented as mean ± s.e.m. Each dot represents the mean of three biological replicates for each edited position at various edited sites. Dunnett’s multiple comparisons test after one-way ANOVA was used to compare the gCBEv2 or gCBEv3 with other base editors.
FIG. 39. Additional comparison of various glycosylase-based base editors for cytosine editing. FIG. 39a-b, Heat map showing the C-to-G frequencies (FIG. 39a) or C-to-G purity (FIG. 39b) for various base editors (n = 3) . The cytosines with C-to-G editing frequencies >25% for any base editors were highlighted. FIG. 39c, Frequencies of C base conversions by various base editors across the protospacer positions 1-20 (where PAM is at positions 21–23) . Single dot represents individual replicate (n = 3 independent replicates per site) , and boxes span the interquartile range (25th to 75th percentile) ; horizontal lines within the boxes indicate the median (50%) ; and whiskers extend to the minimal and maximal values. The graphs were derived from the data for various base editors shown in FIG. 38c.
FIG. 40. Off-target analysis of various glycosylase-based base editors. FIG. 40a-b, The guide sequence-dependent off-target analysis of cumulative T editing (FIG. 40a) or C editing (FIG. 40b) frequencies for various base editors at corresponding sites (n = 3) . OT: off-target. FIG. 40c, The guide sequence-independent cumulative off-target T editing detected by the orthogonal R-loop assay at each R-loop site (n = 3) . FIG. 40d, The statistical analysis of guide sequence-independent off-target T editing (n = 5) . Each dot represents the mean of three biological replicates for each edited position at various edited sites. Dunnett’s multiple comparisons test after one-way ANOVA was used to compare the gTBEv3 or gTBEv5 with other base editors. FIG. 40e, RNA off-target analysis for various base editors (n = 3) . The mCherry was used as control. D = A or G or U; V = A or C or G. For multiple comparisons, the Dunnett’s multiple comparisons test after one-way ANOVA was used to compare the gCBEv3 or gTBEv5 with other groups. For comparison of RNA U-to-V SNVs for gTBEv5 and mCherry, two-tailed unpaired two-sample t test was used. All values are presented as mean ± s.e.m.
FIG. 41. Comparison between gTBEs or gCBEs and PEs. FIG. 41a, Bar plots showing the on-target T-to-G editing frequency for various editors at each genomic site in HEK293T cells (n = 3) . OT: off-target. FIG. 41b, Bar plots showing the on-target C-to-G editing frequency for various editors at each genomic site in HEK293T cells (n = 3) . The PE6d was used together with epegRNA and nick sgRNA. For PE6d max, PE6d was co-expressed with the codon-optimized hMLH1dn, a dominant negative MMR protein. All values are presented as mean ± s.e.m.
FIG. 42. Characterization of editing profiles for gTBEs or gCBEs in HEK293T, HuH-7, and U2OS cells. FIG. 42a, Bar plots showing the on-target T base editing frequency, T-to-G editing purity or T-to-C editing purity for gTBEv3 and gTBEv5 in different cell lines (n = 3) . FIG. 42b, Bar plots showing the on-target C base editing frequency, C-to-G editing purity or C-to-T editing purity for gCBEv2 and gCBEv3 in different cell lines (n = 3) . HuH-7, a cell line established from a human hepatocellular carcinoma; U2OS, a cell line established from a human bone osteosarcoma. All values are presented as mean ± s.e.m.
The figures herein are for illustrative purposes only and are not necessarily drawn to scale.
Overview
Traditional DNA base editors such as ABE and CBE can directly edit A and C with the help of deamination. However, G and T can only be edited indirectly, by directly editing the C or A on the strand opposite to the G or T. In contrast, the disclosure provides, at least in part, base editors and base editing methods capable of direct base editing of a target deoxyribonucleotide (e.g., dG, dT, dC) in a target dsDNA.
Traditional DNA base editors such as ABE and CBE require a deaminase for the base editing of A and C. In contrast, the disclosure provides, at least in part, base editors and base editing methods capable of base editing of a target deoxyribonucleotide (e.g., dG, dT, dC) in a target dsDNA in the absence of a deaminase.
The base editors and base editing methods of the disclosure rely on the base excision domain of the base editor. The base excision domain is capable of directly excising the base of a target deoxyribonucleotide in a target dsDNA to generate an abasic site in situ, triggering a base excision repair (BER) pathway. As a result of base excision and subsequent repair, the target deoxyribonucleotide may be converted to another deoxyribonucleotide, leading to base editing of the target deoxyribonucleotide.
The base editors and base editing methods of the disclosure also rely on a nucleic acid programmable DNA binding domain (napDNAbd) to specifically direct the base editor to a target dsDNA via a guide nucleic acid capable of interacting with both the napDNAbd and the target dsDNA. The napDNAbd may be associated (e.g., complexed) with a guide nucleic acid (e.g., a guide RNA). The guide nucleic acid is designed to localize or target the napDNAbd to the target dsDNA, by relying on the hybridization between a target sequence of the target dsDNA and a corresponding guide sequence of the guide nucleic acid. Specifically, the guide nucleic acid comprises a guide sequence that is capable of hybridizing to a target sequence of the target dsDNA due to the substantial complementarity between the guide sequence and the target sequence. The guide nucleic acid also comprises a scaffold sequence capable of forming a complex with the napDNAbd. In this way, the guide nucleic acid “programs” the napDNAbd such that the napDNAbd can specifically localize and (indirectly) bind to the region on and around the target sequence of the target dsDNA via the guide nucleic acid. The binding of the napDNAbd to the target dsDNA enables the base excision domain associated with the napDNAbd to specifically access and act on the base of the target deoxyribonucleotide in the target sequence of the target dsDNA in a guide sequence-specific/dependent way.
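The relationship between protospacer, target sequence, and guide sequence described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the disclosure: the function names and the 20-nt sequence are hypothetical, and the check requires perfect complementarity, whereas the text only requires substantial complementarity.

```python
# Illustrative sketch only; names and sequences are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(dna.upper()))

def guide_from_protospacer(protospacer: str) -> str:
    """Guide sequence (RNA, 5'->3') that matches the protospacer on the nontarget strand."""
    return protospacer.upper().replace("T", "U")

def is_complementary(guide_rna: str, target_sequence: str) -> bool:
    """True if the guide pairs with every base of the target sequence
    (target strand, written 5'->3'). Perfect pairing is assumed here."""
    guide_as_dna = guide_rna.upper().replace("U", "T")
    return reverse_complement(guide_as_dna) == target_sequence.upper()

protospacer = "GACGTTAGCCTAGGATCCAT"               # hypothetical, on the nontarget strand
target_sequence = reverse_complement(protospacer)  # on the target strand
guide = guide_from_protospacer(protospacer)
print(guide, is_complementary(guide, target_sequence))
```

The guide has the same sequence as the protospacer (with U in place of T) and therefore hybridizes to the target sequence on the opposite strand.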
Referring to FIG. 1a, 6-8, and 19a, the base excision domain of the base editor of the disclosure directly excises the base of a target deoxyribonucleotide (the first deoxyribonucleotide in FIG. 6), generating an abasic site where the base is removed, which is an apurinic site where a purine (e.g., guanine) is removed or an apyrimidinic site where a pyrimidine (e.g., thymine, cytosine) is removed. The abasic site may be repaired by translesion synthesis (TLS) (by, e.g., TLS polymerase) and/or DNA replication, leading to base editing, in which case nicking in the strand opposite to the abasic site may not be necessary.
Alternatively, if a nucleic acid programmable DNA nickase such as Cas9 nickase is used as the napDNAbd, it creates a nick in the strand (non-edited strand) opposite to the abasic site, and the apyrimidinic site may be removed by AP lyase to generate another nick on the edited strand. The two nicks trigger double-strand break (DSB) repair and the introduction of indel mutations, which is also highly likely to change the target deoxyribonucleotide.
As a specific example, FIG. 6 shows, before base editing, a first deoxyribonucleotide of dG as target deoxyribonucleotide to be edited on the edited strand (nontarget strand) and a second deoxyribonucleotide of dC on the opposite strand (non-edited strand /target strand) base pairing with the dG. The dG is located in a protospacer sequence on the nontarget strand of the target dsDNA, and the dC is located in a target sequence on the target strand of the target dsDNA. A guide nucleic acid is designed to comprise a guide sequence capable of hybridizing to the target sequence and comprises a scaffold sequence capable of forming a complex with a napDNAbd. The napDNAbd is capable of nicking the target strand and is fused with a base excision domain capable of excising guanine of the dG. As a specific example, FIG. 7 shows, after base editing, a fourth deoxyribonucleotide of dC as outcome deoxyribonucleotide on the edited strand (nontarget strand) and a third deoxyribonucleotide of dG on the opposite strand (non-edited strand /target strand) base pairing with the dC. FIG. 6 and FIG. 7 together show direct dG-to-dC base editing.
As a specific example, FIG. 8 shows, after base editing, a fourth deoxyribonucleotide of dT as outcome deoxyribonucleotide on the edited strand (nontarget strand) and a third deoxyribonucleotide of dA on the opposite strand (non-edited strand /target strand) base pairing with the dT. FIG. 6 and FIG. 8 together show direct dG-to-dT base editing.
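The before/after states of FIG. 6 to FIG. 8 can be mimicked with a toy model that rewrites one base on the edited (nontarget) strand and then derives the opposite strand by base pairing. This is only a sketch under stated assumptions: the 10-nt sequence and the function name `edit_dsdna` are hypothetical, and the model ignores the repair pathway itself and simply records the final outcome.

```python
# Toy model of the outcome of direct base editing; the sequence is hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def edit_dsdna(nontarget_strand: str, position: int, outcome: str):
    """Replace the base at a 1-based position on the edited (nontarget) strand
    and return both strands after repair, each written 5'->3'."""
    bases = list(nontarget_strand.upper())
    bases[position - 1] = outcome
    edited = "".join(bases)
    # The opposite (target) strand is re-synthesized to pair with the edited strand.
    target_strand = "".join(COMPLEMENT[b] for b in reversed(edited))
    return edited, target_strand

before = "ATCGATTACA"                   # dG at position 4 (first deoxyribonucleotide)
g_to_c = edit_dsdna(before, 4, "C")     # FIG. 6 -> FIG. 7: dG-to-dC
g_to_t = edit_dsdna(before, 4, "T")     # FIG. 6 -> FIG. 8: dG-to-dT
print(g_to_c, g_to_t)
```

In the dG-to-dC case the base opposite the edited position becomes dG, and in the dG-to-dT case it becomes dA, matching the third deoxyribonucleotide described for FIG. 7 and FIG. 8.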
The base editing approach of the disclosure allows direct base editing of a target deoxyribonucleotide (e.g., dG, dT, dC) in a target dsDNA, expanding the scope of target design and screening for the direct base editing. For example, if editing of a target dG to dA is desired, the traditional base editors incapable of directly editing dG would have to be applied to edit dC on the opposite strand to dT, thereby indirectly editing dG to dA. However, the editing ability of the traditional CBE might not be able to edit the dC with a desired outcome dT; the PAM limitation of the CBE might not allow designing a target /guide sequence targeting the dC to specifically direct the CBE to the dC; and even if such a guide sequence can be designed, the base editing efficiency of the CBE might not be sufficient. With the base editing approach of the disclosure, the target dG can be directly base edited, and therefore, developers would have a much greater chance to design, screen, and obtain a suitable target /guide sequence targeting the dG to specifically direct the base editor of the disclosure to the dG.
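The PAM limitation mentioned above can be made concrete with a small search for candidate protospacers. The sketch below rests on assumptions that are not taken from the disclosure: an SpCas9-style NGG PAM immediately 3' of a 20-nt protospacer, a hypothetical editing window of protospacer positions 4 to 8, and a made-up 25-nt sequence. It compares targeting a dG directly on one strand with targeting the paired dC on the opposite strand.

```python
# Sketch under stated assumptions: NGG PAM, 20-nt protospacer, window 4-8 (all hypothetical).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(dna):
    return "".join(COMPLEMENT[b] for b in reversed(dna))

def protospacers_covering(strand, index, window=(4, 8), spacer_len=20):
    """Return (start, protospacer, position) for each NGG-PAM protospacer on `strand`
    whose assumed editing window contains the base at 0-based `index`."""
    found = []
    for start in range(len(strand) - spacer_len - 2):
        pam = strand[start + spacer_len:start + spacer_len + 3]
        position = index - start + 1      # 1-based position within the protospacer
        if pam[1:] == "GG" and window[0] <= position <= window[1]:
            found.append((start, strand[start:start + spacer_len], position))
    return found

top = "T" * 5 + "G" + "T" * 14 + "AGG" + "TT"   # target dG at index 5 of the top strand
g_index = 5
bottom = reverse_complement(top)
c_index = len(top) - 1 - g_index                # paired dC on the bottom strand

direct = protospacers_covering(top, g_index)       # editor acting on dG itself
indirect = protospacers_covering(bottom, c_index)  # editor acting on the opposite dC
print(direct, indirect)
```

In this constructed case one protospacer places the dG at position 6, while the opposite strand offers no NGG PAM at all, so the paired dC cannot be reached. Real loci, PAM requirements, and editing windows vary by editor.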
The base editing approach of the disclosure may function in the absence of deamination of the base of the target deoxyribonucleotide before the excision of the base of the target deoxyribonucleotide, or in the absence of deamination at all. In contrast, traditional ABE needs to deaminate the base (adenine) of a target dA to a hypoxanthine, thereby converting the target dA to inosine (I), which reads as dG in DNA repair or replication; and traditional CBE needs to deaminate the base (cytosine) of a target dC to a uracil, thereby converting the target dC to uridine (U), which reads as dT in DNA repair or replication. Deamination is unlikely for G (due to spontaneous remediation) [18] and impossible for T (due to the absence of amine), making the development of deaminase-based G and T base editors a challenging task. The omission of a deaminase domain in the base editor of the disclosure opens the way to base editing of G and T, and may also avoid undesired effects caused by the deaminase domain and deamination of traditional deaminase-based base editors and reduce base editor size.
As a non-limiting example of the disclosure, a deaminase-free, glycosylase-based guanine base editor (gGBE) was developed with G editing ability, by fusing a nucleic acid programmable DNA nickase such as Cas9 nickase with a human N-methylpurine DNA glycosylase (MPG) mutant capable of excising guanine of dG developed by several rounds of MPG mutagenesis via unbiased and rational screening. It was demonstrated that the gGBE has high G editing efficiency. Furthermore, the gGBE exhibited high base editing efficiency (up to 81.2%) and high G-to-T or G-to-C (i.e., G-to-Y) conversion ratio (up to 0.95) in both cultured human cells and mouse embryos. The editing profile of the gGBE was characterized by targeting dozens of endogenous genomic loci in cultured mammalian cells as well as mouse embryos, demonstrating its high G-to-Y (Y = C or T) base editing efficiency.
As another non-limiting example of the disclosure, by fusing a nucleic acid programmable DNA nickase such as Cas9 nickase with a human uracil DNA glycosylase (UNG) mutant capable of excising thymine of dT or cytosine of dC, separately developed by mutagenesis of UNG, two deaminase-free, glycosylase-based base editors were developed to achieve orthogonal base editing, that is, gTBE for direct T editing and gCBE for direct C editing, respectively. By several rounds of structure-informed rational mutagenesis on UNG in cultured human cells, gTBE and gCBE were obtained with high activity of T-to-S (i.e., T-to-C or T-to-G) and C-to-G conversions, respectively. Furthermore, by embedding the UNG mutant into a nucleic acid programmable DNA nickase such as Cas9 nickase, more gTBE and gCBE variants were generated, showing enhanced average editing efficiency and alternative editing windows. The editing profiles of gTBE and gCBE were characterized by targeting dozens of endogenous genomic loci in cultured mammalian cells as well as mouse embryos, demonstrating their high base editing efficiency.
Base editor
The base editor of the disclosure may be provided in the form of a fusion protein. In an aspect, the disclosure provides a fusion protein comprising:
(1) a nucleic acid programmable DNA binding domain (napDNAbd) capable of binding a target dsDNA comprising:
(a) a first deoxyribonucleotide (e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine) ) in a protospacer sequence on a nontarget strand (edited strand) of the target dsDNA, and
(b) a second deoxyribonucleotide (e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine) ) base pairing with the first deoxyribonucleotide (e.g., dG, dT, dC) and in a target sequence on the target strand (non-edited strand) of the target dsDNA,
wherein the protospacer sequence is fully reverse complementary to the target sequence; and
(2) a base excising domain capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide.
In some embodiments, the fusion protein does not comprise a deaminase domain, e.g., an adenine or cytosine deaminase domain, e.g., TadA and variants thereof.
The components of the fusion protein are described more specifically in the other sub-sections herein.
Base editing system
In another aspect, the disclosure provides a system comprising:
(i) a fusion protein or a polynucleotide encoding the fusion protein, the fusion protein comprising:
(1) a nucleic acid programmable DNA binding domain (napDNAbd) capable of binding a target dsDNA comprising:
(a) a first deoxyribonucleotide (e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine) ) in a protospacer sequence on a nontarget strand (edited strand) of the target dsDNA, and
(b) a second deoxyribonucleotide (e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine) ) base pairing with the first deoxyribonucleotide (e.g., dG, dT, dC) and in a target sequence on the target strand (non-edited strand) of the target dsDNA, wherein the protospacer sequence is fully reverse complementary to the target sequence; and
(2) a base excising domain capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide; and
(ii) a guide nucleic acid or a polynucleotide encoding the guide nucleic acid, the guide nucleic acid comprising:
(1) a scaffold sequence capable of forming a complex with the napDNAbd; and
(2) a guide sequence capable of hybridizing to the target sequence on the target strand of the target dsDNA, thereby guiding the complex to the target dsDNA.
In some embodiments, the system is a complex comprising the fusion protein complexed with the guide nucleic acid. In some embodiments, the complex further comprises the target dsDNA hybridized with the guide sequence. In some embodiments, the system is a composition comprising the component (i) and the component (ii) .
In some embodiments, the guide nucleic acid as described herein is a guide RNA (gRNA) . As used herein, the term “gRNA” is used interchangeably with single guide RNA (sgRNA) .
The components of the system are described more specifically in the other sub-sections herein.
Base editing method
In yet another aspect, the disclosure provides a method of modifying a target dsDNA, comprising contacting the target dsDNA with a system,
the target dsDNA comprising:
(a) a first deoxyribonucleotide (e.g., dG (deoxyguanosine) , dT (thymidine) , dC (deoxycytidine) ) in a protospacer sequence on a nontarget strand (edited strand) of the target dsDNA, and
(b) a second deoxyribonucleotide (e.g., dC (deoxycytidine) , dA (deoxyadenosine) , dG (deoxyguanosine) ) base pairing with the first deoxyribonucleotide (e.g., dG, dT, dC) and in a target sequence on the target strand (non-edited strand) of the target dsDNA, wherein the protospacer sequence is fully reverse complementary to the target sequence; and
the system comprising:
(i) a fusion protein or a polynucleotide encoding the fusion protein, the fusion protein comprising:
(1) a nucleic acid programmable DNA binding domain (napDNAbd) capable of binding the target dsDNA; and
(2) a base excising domain capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide; and
(ii) a guide nucleic acid or a polynucleotide encoding the guide nucleic acid, the guide nucleic acid comprising:
(1) a scaffold sequence capable of forming a complex with the napDNAbd; and
(2) a guide sequence capable of hybridizing to the target sequence on the target strand of the target dsDNA, thereby guiding the complex to the target dsDNA.
In some embodiments, the method does not include deamination of the base of the first deoxyribonucleotide before the excision of the base of the first deoxyribonucleotide.
In some embodiments, the method does not include deamination of the base of the first deoxyribonucleotide.
In some embodiments, the method comprises inducing strand separation of the target dsDNA.
The components of the method are described more specifically in the other sub-sections herein.
Target deoxyribonucleotide and target dsDNA
Referring to FIGs. 6-8, the target deoxyribonucleotide to be edited in the target dsDNA may be termed the “first deoxyribonucleotide”, and the outcome deoxyribonucleotide converted from the target deoxyribonucleotide by base editing may be termed the “fourth deoxyribonucleotide”.
In some embodiments, the first deoxyribonucleotide is deoxyguanosine (dG) , thymidine (dT) , deoxyadenosine (dA) , or deoxycytidine (dC) . In some embodiments, the first deoxyribonucleotide is dG. In some embodiments, the first deoxyribonucleotide is dT. In some embodiments, the fourth deoxyribonucleotide is dA, dT, dC, or dG. In some embodiments, the second deoxyribonucleotide is dC, dA, dT, or dG. In some embodiments, the third deoxyribonucleotide is dA, dT, dC, or dG.
As would be desired, in some embodiments, the first deoxyribonucleotide is converted to a fourth deoxyribonucleotide that is different from the first deoxyribonucleotide. In some embodiments, the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide is dG-to-dA, dG-to-dT, dG-to-dC, dT-to-dA, dT-to-dC, dT-to-dG, dC-to-dA, dC-to-dT, or dC-to-dG.
In some embodiments, the target dsDNA is wild type or naturally occurring. In some embodiments, the target dsDNA is not wild type or naturally occurring. In some embodiments, the target dsDNA is eukaryotic or prokaryotic. In some embodiments, the target dsDNA is from an animal (e.g., human, monkey, mouse) or plant. In some embodiments, the target dsDNA is a target gene. In some embodiments, the gene is an animal (e.g., human, monkey, mouse) or plant gene. In some embodiments, the dsDNA is in a target cell.
In some embodiments, the first deoxyribonucleotide is native or nonnative to the target dsDNA. In some embodiments, the first deoxyribonucleotide is a mutation in the target dsDNA. In some embodiments, the first deoxyribonucleotide is a pathogenic mutation in the target dsDNA. In some embodiments, the first deoxyribonucleotide is a mutation resulting in a stop codon in the target dsDNA.
In some embodiments, the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide directly or indirectly converts a stop codon to a non-stop codon or directly or indirectly converts a non-stop codon to a stop codon, either on the target strand or the nontarget strand. In some embodiments, the stop codon is on the sense strand of the dsDNA.
In some embodiments, the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide occurs on the sense strand or the nonsense strand of the dsDNA. In some embodiments, the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide (e.g., dG-to-dT) occurs on the sense strand of the dsDNA, converting a stop codon on the sense strand to a non-stop codon or converting a non-stop codon (e.g., GAA) on the sense strand to a stop codon (e.g., TAA). In some embodiments, the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide (e.g., dG-to-dC) occurs on the nonsense strand of the dsDNA, converting a stop codon on the sense strand to a non-stop codon or converting a non-stop codon (e.g., TCA) on the sense strand to a stop codon (e.g., TGA).
In some embodiments, the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide (e.g., dG-to-dC) occurs at the splicing site (e.g., splicing donor, splicing acceptor) of the target dsDNA. In some embodiments, the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide (e.g., dG-to-dC) occurring at the splicing site (e.g., splicing donor, splicing acceptor) increases or decreases the translation of a transcript transcribed from the target dsDNA.
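The two codon examples above (GAA to TAA by dG-to-dT on the sense strand, and TCA to TGA by dG-to-dC on the nonsense strand) can be checked with a short sketch. The helper `edit_codon`, its 0-based `index` argument, and the strand labels are hypothetical names chosen for illustration; the sketch assumes the standard stop codons TAA, TAG, and TGA read on the sense strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def edit_codon(codon: str, index: int, old: str, new: str, strand: str) -> str:
    """Return the sense-strand codon after converting `old` to `new`.

    index  : 0-based position within the sense-strand codon
    strand : "sense" if the edited base lies on the sense strand,
             "nonsense" if it is the base paired with that position
    """
    sense_base = codon[index]
    if strand == "sense":
        if sense_base != old:
            raise ValueError("sense-strand base does not match the target base")
        new_sense = new
    elif strand == "nonsense":
        if COMPLEMENT[sense_base] != old:
            raise ValueError("nonsense-strand base does not match the target base")
        # Editing the paired base changes the sense base to its complement.
        new_sense = COMPLEMENT[new]
    else:
        raise ValueError("strand must be 'sense' or 'nonsense'")
    return codon[:index] + new_sense + codon[index + 1:]


# dG-to-dT on the sense strand: GAA becomes TAA (a stop codon).
a = edit_codon("GAA", 0, "G", "T", "sense")
# dG-to-dC on the nonsense strand: TCA becomes TGA (a stop codon).
b = edit_codon("TCA", 1, "G", "C", "nonsense")
print(a, a in STOP_CODONS)
print(b, b in STOP_CODONS)
```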
In some embodiments, the first deoxyribonucleotide is at a position of the protospacer sequence selected from the group consisting of position 1, position 2, position 3, position 4, position 5, position 6, position 7, position 8, position 9, position 10, position 11, position 12, position 13, position 14, position 15, position 16, position 17, position 18, position 19, position 20, and a combination thereof; or wherein the first deoxyribonucleotide is at a position of the protospacer sequence between position 1 and position 20, both inclusive; or wherein the first deoxyribonucleotide is at a position of the protospacer sequence between position 1 and position 14, both inclusive. In some embodiments, the first deoxyribonucleotide is at a position of the protospacer sequence selected from the group consisting of position 6, position 7, position 8, position 9, position 10, position 11, and a combination thereof; or wherein the first deoxyribonucleotide is at a position of the protospacer sequence between position 6 and position 11, both inclusive. In some embodiments, the first deoxyribonucleotide is at position 7 of the protospacer sequence.
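As a hedged illustration of the position ranges above, the following sketch lists the 1-based positions of a given base within a protospacer that fall inside a chosen window. It assumes that position 1 is the 5' end of the protospacer, which this passage does not itself state; the function name `target_positions` and the example sequence are hypothetical.

```python
def target_positions(protospacer: str, base: str = "G", window: tuple = (6, 11)) -> list:
    """Return 1-based positions of `base` lying within `window` (both inclusive).

    Assumption: position 1 is the first (5'-most) nucleotide of the protospacer.
    """
    low, high = window
    return [
        position
        for position, nucleotide in enumerate(protospacer.upper(), start=1)
        if nucleotide == base and low <= position <= high
    ]


# Hypothetical 20-nt protospacer, used only to exercise the function.
example = "ACGTACGTACGTACGTACGT"
print(target_positions(example))                  # window 6 to 11
print(target_positions(example, window=(1, 14)))  # window 1 to 14
```

With this example sequence, the dG nucleotides at positions 7 and 11 fall in the 6-11 window, and those at positions 3, 7, and 11 fall in the 1-14 window.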
In some embodiments, the first deoxyribonucleotide is the N1 or N2 nucleotide in a motif of N1N2, wherein N1 or N2 is A, T, G, or C. In some embodiments, the first deoxyribonucleotide is the N2 nucleotide in a motif of N1N2, wherein N1 is A or T, and N2 is C. In some embodiments, the first deoxyribonucleotide is the N1, N2, or N3 nucleotide in a motif of N1N2N3, wherein N1, N2, or N3 is A, T, G, or C.
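The N1N2 motif in which N1 is A or T and N2 is C can be located with a regular expression, as sketched below. The function name `motif_hits` and the example sequence are hypothetical; the sketch only reports where a dC is immediately preceded by dA or dT on the given strand.

```python
import re

# A C immediately preceded by A or T; the lookbehind keeps the match on the C itself.
N1N2_MOTIF = re.compile(r"(?<=[AT])C")


def motif_hits(sequence: str) -> list:
    """Return 1-based positions of each C that is the N2 of an (A/T)C motif."""
    return [match.start() + 1 for match in N1N2_MOTIF.finditer(sequence.upper())]


# Hypothetical sequence: the C at position 6 follows another C and is not reported.
print(motif_hits("ACGTCCAC"))
```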
Base excising domain
As used herein, the term “base excising domain (BED)” is used interchangeably with “base excising protein (BEP)” or “base excising enzyme (BEE)” and refers to a protein capable of recognizing and excising a base (e.g., A, T, C, G, or U) of a nucleotide of a nucleic acid (e.g., DNA (ssDNA or dsDNA) or RNA). As used herein, the term “base” is used interchangeably with “nucleobase” or “nitrogenous base”. Bases include, for example, adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U), and they may be termed primary, normal, or canonical bases. As is well known in the art, a deoxyribonucleotide is composed of a base, a deoxyribose, and a phosphate, and a deoxyribonucleoside is composed of a base and a deoxyribose. Excising the base of a deoxyribonucleoside releases the base from the deoxyribonucleoside. In some embodiments, excising the base of a deoxyribonucleoside comprises cleaving or hydrolyzing the glycosidic bond linking the base to the deoxyribose of the first deoxyribonucleotide, thereby releasing the base from the first deoxyribonucleotide.
In some embodiments, the base excising domain is (substantially) capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide (e.g., dG, dT, dC) . In some embodiments, the first deoxyribonucleotide is dG, dT, dA, or dC. In some embodiments, the base excising domain is (substantially) capable of excising guanine of dG. In some embodiments, the base excising domain is (substantially) capable of excising thymine of dT. In some embodiments, the base excising domain is (substantially) capable of excising cytosine of dC. In some embodiments, the base excising domain is (substantially) capable of excising adenine of dA.
In some cases, it would be desired to apply highly specific base editing, for example, in disease treatment. Therefore, it would be desired that the base excising domain can only excise one type of bases but not excise the other types of bases. In some embodiments, the base excising domain is (substantially) incapable of excising guanine of dG. In some embodiments, the base excising domain is (substantially) incapable of excising thymine of dT. In some embodiments, the base excising domain is (substantially) incapable of excising cytosine of dC. In some embodiments, the base excising domain is (substantially) incapable of excising adenine of dA.
In some cases, it would be desired to apply multiplexed base editing. Therefore, it would be desired that the base excising domain can excise more than one type of base. In some embodiments, the base excising domain is (substantially) capable of excising any two, three, or four of guanine of dG, thymine of dT, cytosine of dC, and adenine of dA. For example, in some embodiments, the base excising domain is (substantially) capable of excising both guanine of dG and thymine of dT.
In some embodiments, the base excising domain is (substantially) capable of excising uracil. In some embodiments, the base excising domain is (substantially) incapable of excising uracil. In some embodiments, the base excising domain is (substantially) capable of excising hypoxanthine. In some embodiments, the base excising domain is (substantially) incapable of excising hypoxanthine.
In some embodiments, the fusion protein of the disclosure does not comprise a base excising domain (substantially) capable of excising guanine of dG, thymine of dT, cytosine of dC, adenine of dA, uracil, and/or hypoxanthine.
In some embodiments, the base excising domain is (substantially) incapable of excising bases on both strands of a target dsDNA. In some embodiments, the base excising domain is (substantially) incapable of excising both bases of a pair of base-paired deoxyribonucleotides on both strands of a dsDNA.
In some embodiments, the base excision domain comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to a naturally-occurring base excision domain, such as a naturally-occurring base excision domain provided herein.
In some embodiments, the fusion protein of the disclosure comprises one, two, three, or more base excising domains. In some embodiments, the fusion protein comprises two, three, or more base excising domains, which are the same or different.
The base excising domain could be a glycosylase having the desired base excising ability. In some embodiments, the base excising domain comprises a glycosylase. In some embodiments, the glycosylase is selected from the group consisting of N-methylpurine DNA glycosylase (MPG), 8-oxoguanine DNA glycosylase (OGG1), methyl-CpG binding domain 4, DNA glycosylase (MBD4), thymine DNA glycosylase (TDG), uracil DNA glycosylase (UNG), single-strand-selective monofunctional uracil-DNA glycosylase 1 (SMUG1), mutY DNA glycosylase (MUTYH), nth like DNA glycosylase 1 (NTHL1), nei like DNA glycosylase 1 (NEIL1), nei like DNA glycosylase 2 (NEIL2), nei like DNA glycosylase 3 (NEIL3), and mutants thereof capable of recognizing and excising a base from a nucleotide of a nucleic acid.
Exemplary glycosylases capable of excising a base include, without limitation, UDG-N204D and UDG-Y147A as described in Kavli, B. et al. Excision of cytosine and thymine from DNA by mutants of human uracil-DNA glycosylase. EMBO J 15, 3442-3447 (1996) ; the entire contents of which are hereby incorporated by reference.
In some embodiments, the base excision domain is not wild type or naturally-occurring.
MPG
In some embodiments, the base excising domain comprises an N-methylpurine DNA glycosylase (MPG). In some embodiments, the MPG comprises a motif GxxYxxxxYGxxxxxN, wherein x represents any amino acid. In some embodiments, the MPG is obtained from a species selected from Table A. In some embodiments, the MPG comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference MPG. In some embodiments, the wild type or reference MPG is human MPG (SEQ ID NO: 9) or an MPG obtained from a species selected from Table A or any MPG as set forth in Table D or a homolog or mutant (e.g., comprising an amino acid sequence of SEQ ID NO: 5, 6, or 7) thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG) (e.g., SEQ ID NO: 1). In some embodiments, the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
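The motif GxxYxxxxYGxxxxxN, with x standing for any amino acid, can be expressed as a regular expression over one-letter amino acid codes. The sketch below is illustrative only: the function name `find_mpg_motif` is hypothetical, and the test peptide is an artificial string built to contain the motif, not an MPG sequence from the disclosure.

```python
import re
from typing import Optional

# G, 2 any, Y, 4 any, Y, G, 5 any, N (16 residues in total).
MPG_MOTIF = re.compile(r"G.{2}Y.{4}YG.{5}N")


def find_mpg_motif(protein: str) -> Optional[int]:
    """Return the 1-based start of the first motif occurrence, or None if absent."""
    match = MPG_MOTIF.search(protein.upper())
    return match.start() + 1 if match else None


# Artificial peptide for demonstration; it is not a real MPG sequence.
toy_peptide = "MK" + "GAAYAAAAYGAAAAAN" + "LL"
print(find_mpg_motif(toy_peptide))
print(find_mpg_motif("MKLLAAAA"))
```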
In some embodiments, the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, and/or 298 of the wild type or reference MPG, wherein the position is numbered according to SEQ ID NO: 1.
In some embodiments, the amino acid mutation confers an ability to excise a base on the MPG. In some embodiments, the base is guanine.
In some embodiments, the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control MPG without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
In some embodiments, the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
In some embodiments, the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of G163, N169, D175, C178, S198, K202, G203, S206, K210, and/or Q294 of the wild type or reference MPG, wherein the position is numbered according to SEQ ID NO: 1.
In some embodiments, the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of N169, D175, C178, and/or Q294 of the wild type or reference MPG, wherein the position is numbered according to SEQ ID NO: 1. In some embodiments, the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
In some embodiments, the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference MPG. In some embodiments, the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as Glycine (Gly/G), Alanine (Ala/A), Valine (Val/V), Cysteine (Cys/C), Proline (Pro/P), Leucine (Leu/L), Isoleucine (Ile/I), Methionine (Met/M), Tryptophan (Trp/W), Phenylalanine (Phe/F)); (2) a polar amino acid residue (such as Serine (Ser/S), Threonine (Thr/T), Tyrosine (Tyr/Y), Asparagine (Asn/N), Glutamine (Gln/Q)); (3) a positively charged amino acid residue (such as Lysine (Lys/K), Arginine (Arg/R), Histidine (His/H)); or (4) a negatively charged amino acid residue (such as Aspartic Acid (Asp/D), Glutamic Acid (Glu/E)). In some embodiments, the amino acid substitution is a substitution with R, A, N, or G.
In some embodiments, the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1.
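Substitution labels such as N169G read as wild type residue, 1-based position, and replacement residue. The following sketch parses such labels and applies them to a reference sequence, checking that the reference carries the expected wild type residue at each position. The function names are hypothetical, and the five-residue reference used in the demonstration is an artificial stand-in, not SEQ ID NO: 1.

```python
import re

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> tuple:
    """Split a label such as 'N169G' into (wild_type, position, replacement)."""
    match = SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(reference: str, labels: list) -> str:
    """Apply substitutions to `reference`, using 1-based position numbering."""
    residues = list(reference)
    for label in labels:
        wild_type, position, replacement = parse_substitution(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the reference sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: reference has {residues[position - 1]} at {position}")
        residues[position - 1] = replacement
    return "".join(residues)


print(parse_substitution("N169G"))
# Artificial 5-residue reference, for demonstration only.
print(apply_substitutions("MNDCQ", ["N2G", "D3R", "C4N", "Q5R"]))
```

Checking the wild type residue before substituting guards against applying a label to a reference that is numbered differently, for example a sequence that retains the N-terminal methionine.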
In some embodiments, the MPG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference MPG.
In some embodiments, the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of N169G, D175R, C178N, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1. In some embodiments, the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
In some embodiments, the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of N169G, D175R, C178N, and Q294R, wherein the position is numbered according to SEQ ID NO: 1. In some embodiments, the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
In some embodiments, the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R, wherein the position is numbered according to SEQ ID NO: 1. In some embodiments, the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 1 or 9.
In some embodiments, the MPG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 8, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, or 38.
In some embodiments, the MPG is (substantially) capable of excising guanine of dG. In some embodiments, the MPG is (substantially) incapable of excising thymine of dT. In some embodiments, the MPG is (substantially) incapable of excising cytosine of dC. In some embodiments, the MPG is (substantially) incapable of excising adenine of dA.
In some embodiments, the MPG is not wild type or naturally-occurring.
UNG
In some embodiments, the base excising domain comprises a uracil-DNA glycosylase (UNG). In some embodiments, the UNG comprises a motif GQDPYH. In some embodiments, the UNG is obtained from a species selected from Table C. In some embodiments, the UNG comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference UNG. In some embodiments, the wild type or reference UNG is human UNG1 (SEQ ID NO: 54) or human UNG2 (SEQ ID NO: 133) or a UNG obtained from a species selected from Table C or any UNG as set forth in Table D or a homolog or mutant thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG). In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 56, 58, 135, or 137.
In some embodiments, the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, and/or 313 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
Alternative splicing as well as transcription from two distinct start sites leads to two different human UNG isoforms, mitochondrial UNG1 (304 amino acids, aa) (SEQ ID NO: 54) and nuclear UNG2 (313 aa) (SEQ ID NO: 133), each possessing a unique N-terminus that mediates translocation to mitochondria or nucleus [16] (FIG. 25). The sequence alignment of human UNG1 and human UNG2 shows that the amino acid residues at positions 1-35 of UNG1 are different from the amino acid residues at positions 1-44 of UNG2, and the remaining parts of UNG1 and UNG2 are identical. The corresponding relationship between the positions of UNG1 (SEQ ID NO: 54) and UNG2 (SEQ ID NO: 133) can be determined by sequence alignment. For example, the residues Y156 and N213 of UNG2 correspond to the residues Y147 and N204 of UNG1, respectively.
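Because the two isoforms are identical after their unique N-termini (positions 1-35 of UNG1 and 1-44 of UNG2), positions in the shared region differ by a constant offset of 44 - 35 = 9. The sketch below applies that offset; the function names are hypothetical, and positions inside the unique N-termini are treated as having no counterpart.

```python
from typing import Optional

UNG1_LENGTH = 304          # mitochondrial isoform, SEQ ID NO: 54
UNG2_LENGTH = 313          # nuclear isoform, SEQ ID NO: 133
UNG1_UNIQUE_END = 35       # last position of the UNG1-specific N-terminus
UNG2_UNIQUE_END = 44       # last position of the UNG2-specific N-terminus
OFFSET = UNG2_UNIQUE_END - UNG1_UNIQUE_END  # 9


def ung2_to_ung1(position: int) -> Optional[int]:
    """Map a 1-based UNG2 position to UNG1, or None inside the unique N-terminus."""
    if not 1 <= position <= UNG2_LENGTH:
        raise ValueError("position outside UNG2")
    return position - OFFSET if position > UNG2_UNIQUE_END else None


def ung1_to_ung2(position: int) -> Optional[int]:
    """Map a 1-based UNG1 position to UNG2, or None inside the unique N-terminus."""
    if not 1 <= position <= UNG1_LENGTH:
        raise ValueError("position outside UNG1")
    return position + OFFSET if position > UNG1_UNIQUE_END else None


print(ung2_to_ung1(156), ung2_to_ung1(213))  # Y156 and N213 of UNG2
print(ung1_to_ung2(147), ung1_to_ung2(204))  # Y147 and N204 of UNG1
```

The sketch reproduces the correspondence stated above (Y156 and N213 of UNG2 with Y147 and N204 of UNG1) and maps the last residue of UNG2, position 313, to the last residue of UNG1, position 304.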
In some embodiments, the amino acid mutation confers an ability to excise a base on the UNG. In some embodiments, the base is thymine. In some embodiments, the base is cytosine.
In some embodiments, the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control UNG without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
In some embodiments, the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
In some embodiments, the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of Y156, K184, N213, A214, Q259, and/or Y284 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
In some embodiments, the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of K184, A214, Q259, and/or Y284 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO:133. In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135 or 137.
In some embodiments, the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference UNG. In some embodiments, the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as Glycine (Gly/G), Alanine (Ala/A), Valine (Val/V), Cysteine (Cys/C), Proline (Pro/P), Leucine (Leu/L), Isoleucine (Ile/I), Methionine (Met/M), Tryptophan (Trp/W), Phenylalanine (Phe/F)); (2) a polar amino acid residue (such as Serine (Ser/S), Threonine (Thr/T), Tyrosine (Tyr/Y), Asparagine (Asn/N), Glutamine (Gln/Q)); (3) a positively charged amino acid residue (such as Lysine (Lys/K), Arginine (Arg/R), Histidine (His/H)); or (4) a negatively charged amino acid residue (such as Aspartic Acid (Asp/D), Glutamic Acid (Glu/E)). In some embodiments, the amino acid substitution is a substitution with A, D, V, or T.
In some embodiments, the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of Y156A, K184A, N213D, A214V, A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133.
In some embodiments, the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133. In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
In some embodiments, the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, wherein the position is numbered according to SEQ ID NO: 133. In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
In some embodiments, the amino acid mutation comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-65, 1-66, 1-67, 1-68, 1-69, 1-70, 1-71, 1-72, 1-73, 1-74, 1-75, 1-76, 1-77, 1-78, 1-79, 1-80, 1-81, 1-82, 1-83, 1-84, 1-85, 1-86, 1-87, 1-88, 1-89, 1-90, 1-91, 1-92, 1-93, 1-94, 1-95, 1-96, 1-97, 1-98, 1-99, or 1-100 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
In some embodiments, the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133. In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
In some embodiments, the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133. In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
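Because the substitutions and the N-terminal deletion above are both numbered according to the full-length reference, it can help to apply the substitutions first and truncate afterwards, so that no index shifting is needed. The following is a minimal sketch of that bookkeeping. The toy sequence and the helper names are hypothetical and do not represent any SEQ ID NO of the disclosure.

```python
import re


def apply_substitutions(reference: str, substitutions: list[str]) -> str:
    """Apply labels such as 'A214T' using 1-based reference numbering."""
    residues = list(reference)
    for label in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
        if match is None:
            raise ValueError(f"unrecognized substitution label: {label}")
        wild_type = match.group(1)
        position = int(match.group(2))
        replacement = match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the reference")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: reference has {residues[position - 1]} here")
        residues[position - 1] = replacement
    return "".join(residues)


def delete_n_terminal(sequence: str, last_deleted_position: int) -> str:
    """Delete positions 1..last_deleted_position (1-based, inclusive)."""
    return sequence[last_deleted_position:]


# Hypothetical 8-residue toy reference, for illustration only.
TOY_REFERENCE = "MAYKNAQY"
variant = delete_n_terminal(apply_substitutions(TOY_REFERENCE, ["Y3A", "K4A"]), 2)
print(variant)
```

Checking the wild-type residue before substituting guards against applying a label to a reference with different numbering, for example an isoform or a sequence lacking the N-terminal Met.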
In some embodiments, the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference UNG.
In some embodiments, the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of A214T, Q259A, and Y284D, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133. In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
In some embodiments, the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of K184A and A214V, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133. In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
In some embodiments, the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 56, 58, 60, 62, 135, 137, 139, 141, 144, 146, 148, 150, 152, 155, 157, and 159 or an N-terminal truncation thereof lacking the most N-terminal Methionine (M) (coded by start codon ATG) .
In some embodiments, the UNG is (substantially) capable of excising thymine of dT. In some embodiments, the UNG is (substantially) capable of excising cytosine of dC. In some embodiments, the UNG is (substantially) incapable of excising thymine of dT. In some embodiments, the UNG is (substantially) incapable of excising cytosine of dC. In some embodiments, the UNG is (substantially) incapable of excising adenine of dA. In some embodiments, the UNG is (substantially) incapable of excising guanine of dG.
In some embodiments, the UNG is not wild type or naturally-occurring.
Additional BED
In some embodiments, the base excising domain comprises TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1. In some embodiments, the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is human TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 (SEQ ID NO: 64, 65, 66, 67, 68, 69, 70, 71, or 72, respectively) or a homolog or mutant thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG). In some embodiments, the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 64-72, respectively.
In some embodiments, the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 
384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, and/or 604 of the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1, wherein the position is numbered according to any one of SEQ ID NOs: 64-72.
In some embodiments, the amino acid mutation confers an ability to excise a base on the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1. In some embodiments, the base is guanine, thymine, cytosine, adenine, uracil, or hypoxanthine.
In some embodiments, the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
In some embodiments, the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
In some embodiments, the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1. In some embodiments, the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as Glycine (Gly/G), Alanine (Ala/A), Valine (Val/V), Cysteine (Cys/C), Proline (Pro/P), Leucine (Leu/L), Isoleucine (Ile/I), Methionine (Met/M), Tryptophan (Trp/W), Phenylalanine (Phe/F)); (2) a polar amino acid residue (such as Serine (Ser/S), Threonine (Thr/T), Tyrosine (Tyr/Y), Asparagine (Asn/N), Glutamine (Gln/Q)); (3) a positively charged amino acid residue (such as Lysine (Lys/K), Arginine (Arg/R), Histidine (His/H)); or (4) a negatively charged amino acid residue (such as Aspartic Acid (Asp/D), Glutamic Acid (Glu/E)).
In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1, respectively.
In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising guanine of dG. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising thymine of dT. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising cytosine of dC. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising adenine of dA.
In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising thymine of dT. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising cytosine of dC. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising adenine of dA. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising guanine of dG.
In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is not wild type or naturally-occurring.
napDNAbd
As used herein, the term “napDNAbd” is used interchangeably with “nucleic acid programmable DNA binding protein (napDNAbp)”. In some embodiments, the napDNAbd is an RNA-programmable DNA binding protein. Various napDNAbd are known in the art, including, for example, those listed in WO2020/181195, which is incorporated herein by reference in its entirety. Representative napDNAbd include, for example, CRISPR-associated (Cas) proteins, IscB, IsrB, Argonaute, and TnpB.
In some embodiments, the napDNAbd substantially lacks dsDNA cleavage activity (endonuclease activity) . In some embodiments, the napDNAbd substantially lacks dsDNA cleavage activity (endonuclease activity) and nickase activity. In some embodiments, the napDNAbd is nuclease-inactive, for example, a dead Cas. In some embodiments, the napDNAbd is endonuclease-inactive, for example, a dead Cas.
In some embodiments, the napDNAbd is a nickase. In some embodiments, the napDNAbd has nickase activity. In some embodiments, the napDNAbd has nickase activity to nick the target strand. In some embodiments, the napDNAbd nicks the target strand. In some embodiments, the method comprises nicking the target strand. In some embodiments, the nick on the target strand or nicking the target strand incorporates an indel (insertion and/or deletion) into the target strand.
In some embodiments, the napDNAbd is capable of inducing strand separation of the target dsDNA.
In some embodiments, the napDNAbd comprises a Cas domain. In some embodiments, the napDNAbd comprises a Cas nickase (nCas) or a dead (nuclease-inactive) Cas (dCas) of a Cas protein.
In some embodiments, the Cas protein is selected from a group consisting of a Cas9 protein (such as, SpCas9, SaCas9, GeoCas9, CjCas9, Cas9-KKH, circularly permuted Cas9, Argonaute (Ago) , SmacCas9, Spy-macCas9, xCas9, SpCas9-NG, SpG Cas9) ; a Cas12 protein (such as, Cas12a (Cpf1) , AsCas12a, LbCas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f (Cas14) , Cas12g, Cas12h, Cas12i, xCas12i, Cas12Max, hfCas12Max, Cas12j, Cas12k, Cas12l, Cas12m, Cas12n, Cas12o, Cas12p, Cas12q, Cas12r, Cas12s, Cas12t, Cas12u, Cas12v, Cas12w, Cas12x, Cas12y, Cas12z) ; a Cas13 protein (such as, Cas13a, Cas13b, Cas13c, Cas13d, Cas13e, Cas13f, Cas13x, Cas13y) ; Csn2; and a mutant thereof. xCas12i, Cas12Max, and hfCas12Max are listed in PCT/CN2022/129376, which is incorporated herein by reference in its entirety.
In some embodiments, the Cas nickase is a Cas9 nickase (nCas9) , such as SpCas9 nickase (SpCas9-D10A) .
In some embodiments, the dead Cas is a dead Cas9 (dCas9) , such as dead SpCas9 (SpCas9-D10A+H840A) .
In some embodiments, the Cas nickase or dead Cas is a Cas12i nickase (nCas12i) or a dead Cas12i (dCas12i), such as a dead Cas12i of an xCas12i polypeptide.
In some embodiments, the napDNAbd comprises an IscB nickase (nIscB) or a dead IscB (dIscB) of an IscB protein (e.g., OgeuIscB) or an IscB protein described in PCT/CN2023/129167, PCT/CN2023/142506, PCT/CN2024/071744, and PCT/CN2023/125069, which are incorporated herein by reference in their entireties.
In some embodiments, the napDNAbd comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 2, 48, 50, 52, or 163.
In some embodiments, the napDNAbd comprises a TnpB nickase or a dead TnpB of a TnpB protein.
BE configuration
In some embodiments, the fusion protein comprises, from N-terminal to C-terminal, (1) the napDNAbp and the base excising domain; or (2) the base excising domain and the napDNAbp.
In some embodiments, the napDNAbd (e.g., Cas9) is a two-part napDNAbd, for example, a two-part split Cas9, comprising an N-terminal portion and a C-terminal portion, and wherein the fusion protein comprises, from N-terminal to C-terminal, (1) the N-terminal portion of the napDNAbd, the base excising domain, and the C-terminal portion of the napDNAbd; (2) the C-terminal portion of the napDNAbd, the base excising domain, and the N-terminal portion of the napDNAbd; or (3) the base excising domain, the C-terminal portion of the napDNAbd (e.g., amino acids at positions 1249-1368), and the N-terminal portion (e.g., amino acids at positions 1-1248) of the napDNAbd.
In some embodiments, the napDNAbd is SpCas9 (e.g., a SpCas9 nickase) or a mutant thereof (e.g., a SpG Cas9 nickase) . In some embodiments, the N-terminal portion of the napDNAbd is the amino acids of the napDNAbp at positions 1 or 2 to 1012, 1028, 1041, 1046, 1047, 1248, 1249, or 1300. In some embodiments, the C-terminal portion of the napDNAbd is the amino acids of the napDNAbp at positions 1013, 1029, 1042, 1047, 1048, 1249, 1063, 1064, 1230, 1249, or 1301 to 1368.
For example, in some embodiments, the fusion protein comprises the base excising domain embedded between positions 2-1248 of nCas9 (SEQ ID NO: 2) and positions 1249-1368 of nCas9 (SEQ ID NO: 2) , wherein the first amino acid residue D of nCas9 (SEQ ID NO: 2) was designated as position 2; or embedded between positions 2-1047 of nCas9 (SEQ ID NO: 2) and positions 1064-1368 of nCas9 (SEQ ID NO: 2) , wherein the first amino acid residue D of nCas9 (SEQ ID NO: 2) was designated as position 2.
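The embedded configuration described above can be sketched as simple string slicing, with attention to the numbering convention in which the first residue of the host sequence is designated position 2 (the N-terminal Met being absent). The sketch below is an illustration under stated assumptions: the toy host and insert sequences and the helper names are hypothetical, and no linker or NLS is modeled.

```python
def segment(sequence: str, start: int, end: int, first_position: int = 2) -> str:
    """Return residues start..end (inclusive) of a sequence whose first
    residue is numbered first_position."""
    if start < first_position or end > first_position + len(sequence) - 1 or start > end:
        raise ValueError("requested positions fall outside the sequence")
    return sequence[start - first_position : end - first_position + 1]


def embed(host: str, insert: str, n_end: int, c_start: int, first_position: int = 2) -> str:
    """Place insert between host positions first_position..n_end and
    c_start..last; host residues between n_end and c_start are omitted."""
    last_position = first_position + len(host) - 1
    n_portion = segment(host, first_position, n_end, first_position)
    c_portion = segment(host, c_start, last_position, first_position)
    return n_portion + insert + c_portion


# Hypothetical 10-residue host numbered 2..11 and a 3-residue insert.
TOY_HOST = "DKKYSIGLAI"
contiguous = embed(TOY_HOST, "XXX", n_end=5, c_start=6)   # nothing dropped
with_gap = embed(TOY_HOST, "XXX", n_end=5, c_start=8)     # positions 6-7 dropped
print(contiguous, with_gap)
```

The second call mirrors the form of the 2-1047 / 1064-1368 example in the text, where the host residues between the two portions are not carried into the fusion protein.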
A typical protein usually has a Met at its N-terminus (position 1), since it must be translated from a polynucleotide containing a start codon ATG (encoding Met) at its 5’ end. However, if such a protein is fused to the C-terminus of a second protein (e.g., an NLS or a napDNAbd), the start codon ATG may not be necessary for the protein, since there would typically be a start codon upstream of the second protein for the translation of the fusion protein as a whole, and thus the N-terminal Met of the protein could be removed. Any protein described in the disclosure refers to both the protein per se and an N-terminal truncation thereof with its most N-terminal Met (if present) removed.
In some embodiments, the fusion protein comprises an NLS at the N-terminal and/or C-terminal of the napDNAbp. In some embodiments, the fusion protein comprises an NLS at the N-terminal and/or C-terminal of the base excising domain. In some embodiments, the NLS is or comprises a SV40 NLS, a bpSV40 NLS (e.g., SEQ ID NO: 10 or 11) , or a NP NLS (Xenopus laevis Nucleoplasmin NLS, nucleoplasmin NLS) . Additional NLS suitable for the disclosure or the way of linking an NLS to any of the components of the fusion protein of the disclosure include, for example, a linker of SGGS, or those listed in WO2020/181195, which is incorporated herein by reference in its entirety.
In some embodiments, the components (e.g., the napDNAbp and the base excising domain, the NLS and the napDNAbp, or the NLS and the base excising domain) of the fusion protein are fused to each other with or without a linker. Suitable linkers include, for example, SGGS, the linker of SEQ ID NO: 134, and those listed in WO2020/181195, which is incorporated herein by reference in its entirety.
In some embodiments, the fusion protein comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 12, 14, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 55, 57, 59, 61, 63, 136, 138, 140, 142, 143, 145, 147, 149, 151, 153, 154, 156, 158, 160, 161, 162, and 164.
Deaminase domain
Notwithstanding the advantage of omitting a deaminase domain in the base editor of the disclosure, deamination may be allowed after the excision of the base of the target deoxyribonucleotide. In some aspects, the fusion protein of the disclosure may be used in combination with a deaminase domain for various purposes, e.g., improved outcome purity. “Purity” means the percentage/proportion of an outcome among all possible outcomes. For example, purity of dT means the percentage/proportion of dT as an outcome among all possible outcomes including, for example, dA, dT, dG, and dC. As an example, the introduction of a deaminase domain may contribute to further conversion of an undesired deoxyribonucleotide as a byproduct (e.g., dC) to a desired deoxyribonucleotide (e.g., dT) by C-to-T base editing. In summary, there could be a two-stage conversion if dG-to-dT is desired: first, the target dG is converted, in part, to dC by the base editing without deamination as described herein; second, the dC is converted to dT by C-to-T base editing with deamination, thereby achieving high-purity dG-to-dT conversion. Therefore, in some embodiments, the fusion protein further comprises a deaminase domain. The deaminase domain may be fused to a component of the fusion protein without or with a linker as described herein.
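The definition of purity given above amounts to a simple proportion, which can be sketched as follows. The read counts are hypothetical numbers chosen for illustration and are not data from the disclosure. The second calculation assumes, purely for illustration, that every dC byproduct is subsequently converted to dT.

```python
def outcome_purity(counts: dict[str, int], desired: str) -> float:
    """Percentage of the desired outcome among all observed outcomes."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no outcomes were counted")
    return 100.0 * counts.get(desired, 0) / total


# Hypothetical outcome counts at one target position (illustration only).
stage_one = {"dA": 5, "dT": 60, "dG": 20, "dC": 15}
print(outcome_purity(stage_one, "dT"))

# Illustrative assumption: all dC byproduct is further converted to dT.
stage_two = dict(stage_one)
stage_two["dT"] += stage_two["dC"]
stage_two["dC"] = 0
print(outcome_purity(stage_two, "dT"))
```

Under these assumed counts, moving the dC byproduct into the dT pool raises the purity of dT without changing the total number of outcomes.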
Various deaminases are known in the art, including, for example, those listed in WO2020/181195, which is incorporated herein by reference in its entirety. Representative deaminases include, for example, TadA and homologs and variants thereof, and APOBEC and homologs and variants thereof.
In some embodiments, the deaminase domain is a deaminase domain (substantially) capable of deaminating adenine, guanine, hypoxanthine, cytidine, thymine, and/or uracil. In some embodiments, the deaminase domain is an adenine deaminase domain or a cytosine deaminase domain.
In some embodiments, the deaminase domain comprises a tRNA adenosine deaminase (TadA) or a functional variant or fragment thereof, e.g., TadA8e (SEQ ID NO: 3), TadA8.17, TadA8.20, TadA9, TadA8EV106W, TadA8EV106W+D108Q, TadA-CDa, TadA-CDb, TadA-CDc, TadA-CDd, TadA-CDe, TadA-dual, TADAC-1.2, TADAC-1.14, TADAC-1.17, TADAC-1.19, TADAC-2.5, TADAC-2.6, TADAC-2.9, TADAC-2.19, TADAC-2.23, TadA8e-N46L, or TadA8e-N46P.
In some embodiments, the deaminase domain comprises an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase, an activation induced deaminase (AID) , a cytidine deaminase 1 from Petromyzon marinus (pmCDA1) , or a functional variant or fragment thereof, e.g., APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H.
Protospacer sequence
In some embodiments, the protospacer sequence comprises about, at least about, or at most about 14 contiguous nucleotides of the target dsDNA, e.g., about, at least about, or at most about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more contiguous nucleotides on the nontarget strand of the target dsDNA, or in a numerical range between any two of the preceding values, e.g., from about 16 to about 50, or from about 17 to about 22 contiguous nucleotides of the target dsDNA. In some embodiments, the protospacer sequence comprises about 20, 30, or 50 contiguous nucleotides on the nontarget strand of the target dsDNA.
In some embodiments, the protospacer sequence is a stretch of about, at least about, or at most about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more contiguous nucleotides on the nontarget strand of the target dsDNA, or a stretch of contiguous nucleotides on the nontarget strand of the target dsDNA in a numerical range between any two of the preceding values, e.g., a stretch of from about 16 to about 50 contiguous nucleotides. In some embodiments, the protospacer sequence is a stretch of about 20, 30, or 50 contiguous nucleotides on the nontarget strand of the target dsDNA.
In some embodiments, the protospacer sequence is immediately 5’ or 3’ to a protospacer adjacent motif (PAM) comprising the sequence 5’-NN-3’, 5’-NNN-3’, 5’-NNNN-3’, 5’-NNNNN-3’, or 5’-NNNNNN-3’, wherein N is A, T, G, or C. For example, in some embodiments, the protospacer sequence is immediately 5’ to a protospacer adjacent motif (PAM) comprising the sequence 5’-NGG-3’ or 5’-NTN-3’, wherein N is A, T, G, or C. In some embodiments, the protospacer sequence is immediately 3’ to a protospacer adjacent motif (PAM) comprising the sequence 5’-TTN-3’, wherein N is A, T, G, or C.
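The relationship between a protospacer and its adjacent PAM can be illustrated with a short scan of the nontarget strand, read 5’ to 3’. This is a minimal sketch under stated assumptions: only the ambiguity code N is supported, the input contains only A, C, G, and T, and the function name, its parameters, and the example strand are hypothetical.

```python
import re


def find_protospacers(strand: str, pam: str = "NGG", length: int = 20,
                      pam_downstream: bool = True) -> list[tuple[int, str, str]]:
    """Return (0-based start, protospacer, PAM) for every site on the strand.

    pam_downstream=True: protospacer is immediately 5' to the PAM (e.g., NGG).
    pam_downstream=False: protospacer is immediately 3' to the PAM (e.g., TTN).
    """
    pam_regex = "".join("[ACGT]" if base == "N" else base for base in pam.upper())
    spacer_regex = f"[ACGT]{{{length}}}"
    # A lookahead lets overlapping candidate sites all be reported.
    if pam_downstream:
        pattern = re.compile(f"(?=({spacer_regex})({pam_regex}))")
    else:
        pattern = re.compile(f"(?=({pam_regex})({spacer_regex}))")
    hits = []
    for match in pattern.finditer(strand.upper()):
        if pam_downstream:
            hits.append((match.start(), match.group(1), match.group(2)))
        else:
            hits.append((match.start() + len(pam), match.group(2), match.group(1)))
    return hits


# Hypothetical nontarget strand, for illustration only.
strand = "TTTA" + "ACGTACGTACGTACGTACGT" + "TGG" + "CC"
print(find_protospacers(strand, pam="NGG", pam_downstream=True))
print(find_protospacers(strand, pam="TTN", pam_downstream=False))
```

A scan of this kind reports candidate sites only; whether a given napDNAbd accepts a particular PAM is a property of that protein and is not modeled here.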
Guide sequence
In some embodiments, the guide sequence is in a length of about, at least about, or at most about 14 nucleotides, e.g., about, at least about, or at most about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more nucleotides, or in a length of nucleotides in a numerical range between any two of the preceding values, e.g., in a length of from about 16 to about 50 nucleotides. In some embodiments, the guide sequence is in a length of about 20, 30, or 50 nucleotides.
In some embodiments, (1) the guide sequence is at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100% (fully), optionally about 100% (fully), reversely complementary to the target sequence; (2) the guide sequence contains no more than 5, 4, 3, 2, or 1 mismatch or contains no mismatch with the target sequence; or (3) the guide sequence comprises no mismatch with the target sequence in the first 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70 nucleotides at the 5’ end of the guide sequence when the PAM is immediately 5’ to the protospacer sequence or at the 3’ end of the guide sequence when the PAM is immediately 3’ to the protospacer sequence. In some embodiments, the guide sequence is about 100% (fully) reversely complementary to the target sequence.
In some embodiments, the guide sequence contains 1 mismatch with the target sequence. In some embodiments, the guide sequence is about 98% reversely complementary to the target sequence. In some embodiments, the 1 mismatch in the guide sequence is at a position corresponding to the nucleotide of the target sequence that is intended to be substituted.
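The reverse complementarity and mismatch criteria above can be checked with a short sketch. It assumes that the guide and the target sequence have equal length and are both written 5’ to 3’, and that a guide given as RNA can be compared by reading U as T. The function names and example sequences are hypothetical and are not sequences of the disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def mismatch_positions(guide: str, target: str) -> list[int]:
    """1-based guide positions that do not pair with the target sequence."""
    guide_dna = guide.upper().replace("U", "T")
    expected = reverse_complement(target)
    if len(guide_dna) != len(expected):
        raise ValueError("guide and target must have the same length")
    return [i + 1 for i, (g, e) in enumerate(zip(guide_dna, expected)) if g != e]


def percent_complementarity(guide: str, target: str) -> float:
    """Percentage of guide positions reversely complementary to the target."""
    return 100.0 * (len(guide) - len(mismatch_positions(guide, target))) / len(guide)


# Hypothetical 20-nt target sequence and guides, for illustration only.
target = "AAAACCCCGGGGTTTTACGT"
perfect_guide = "ACGUAAAACCCCGGGGUUUU"
one_mismatch_guide = "ACGUGAAACCCCGGGGUUUU"
print(mismatch_positions(perfect_guide, target))
print(mismatch_positions(one_mismatch_guide, target),
      percent_complementarity(one_mismatch_guide, target))
```

The list of mismatch positions can also be used to confirm that a deliberate single mismatch sits at the position corresponding to the nucleotide intended to be substituted.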
In some embodiments, the guide sequence comprises (1) a sequence of any one of SEQ ID NOs: 75-103 and 165-3030 (excluding the PAM if present) or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to any one of SEQ ID NOs: 75-103 and 165-3030 (excluding the PAM if present) or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to any one of SEQ ID NOs: 75-103 and 165-3030 (excluding the PAM if present).
In some embodiments, the guide sequence comprises a sequence of any one of SEQ ID NOs: 75-103 and 165-3030 (excluding the PAM if present).
In some aspects, the disclosure provides a guide nucleic acid comprising a guide sequence as described herein and a scaffold sequence capable of forming a complex with a napDNAbd. The scaffold sequence and the napDNAbd may be as described herein.
Scaffold sequence
For the purpose of the disclosure, the scaffold sequence is compatible with the napDNAbd of the disclosure and is capable of complexing with the napDNAbd. The scaffold sequence may be a naturally occurring scaffold sequence identified along with the napDNAbd, or a variant thereof maintaining the ability to complex with the napDNAbd. Generally, the ability to complex with the napDNAbd is maintained as long as the secondary structure of the variant is substantially identical to the secondary structure of the naturally occurring scaffold sequence. A nucleotide deletion, insertion, or substitution in the primary sequence of the scaffold sequence may not necessarily change the secondary structure of the scaffold sequence (e.g., the relative locations and/or sizes of the stems, bulges, and loops of the scaffold sequence do not significantly deviate from those of the original stems, bulges, and loops). For example, the nucleotide deletion, insertion, or substitution may be in a bulge or loop region of the scaffold sequence so that the overall symmetry of the bulge and hence the secondary structure remains largely the same. The nucleotide deletion, insertion, or substitution may also be in the stems of the scaffold sequence so that the lengths of the stems do not significantly deviate from those of the original stems (e.g., adding or deleting one base pair in each of two stems corresponds to 4 total base changes). In some embodiments, the scaffold sequence is 5’ or 3’ to the guide sequence.
In some embodiments, the scaffold sequence has substantially the same secondary structure as the secondary structure of the sequence of SEQ ID NO: 40, 73, or 74. In some embodiments, the scaffold sequence comprises (1) a sequence of SEQ ID NO: 40, 73, or 74 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to SEQ ID NO: 40, 73, or 74 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to SEQ ID NO: 40, 73, or 74. In some embodiments, the scaffold sequence comprises the sequence of SEQ ID NO: 40, 73, or 74.
Translesion synthesis (TLS) polymerase
In some aspects, the fusion protein of the disclosure may be used in combination with a translesion synthesis (TLS) polymerase for improved outcome purity. “Purity” means the percentage/proportion of an outcome among all possible outcomes. For example, purity of dT means the percentage/proportion of dT as an outcome among all possible outcomes including, for example, dA, dT, dG, and dC. Each TLS polymerase may have its own preference for which deoxyribonucleotide it incorporates opposite an abasic site during polymerization, as listed in Table 5. By taking advantage of such a preference, the base editing outcome may be intentionally controlled to improve outcome purity. For example, human Polη (SEQ ID NO: 118) is a TLS polymerase that preferentially incorporates dA opposite an abasic site. With combined use of human Polη, the base editing outcome may be adjusted toward dT, thereby increasing the purity of the dT product.
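The definition of purity given above reduces to a simple ratio. The following is a minimal sketch, assuming outcomes are tallied as read counts per resulting deoxyribonucleotide; the function name and the counts are hypothetical and are not data from the disclosure.

```python
def outcome_purity(counts: dict, outcome: str) -> float:
    """Percentage of one editing outcome among all observed outcomes."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no edited outcomes observed")
    return 100.0 * counts.get(outcome, 0) / total


# Hypothetical tallies of edited reads at one target position.
toy_counts = {"dA": 10, "dT": 70, "dG": 5, "dC": 15}
print(outcome_purity(toy_counts, "dT"))
```

Under this definition, shifting the outcome distribution toward a single product raises that product's purity even if the total number of edited reads is unchanged.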
Table 5. DNA polymerases for incorporating perfect base opposite abasic sites.
In some embodiments, the fusion protein or system of the disclosure further comprises a translesion synthesis (TLS) polymerase or a recruiting domain or component capable of recruiting a TLS polymerase. In some embodiments, the TLS polymerase or the recruiting domain or component is fused to a component of the fusion protein without or with a linker as described herein.
Non-limiting examples of the TLS polymerase include Polα (alpha), Polβ (beta), Polδ (delta) (PCNA), Polγ (gamma), Polη (eta), Polι (iota), Polκ (kappa), Polλ (lambda), Polμ (mu), Polν (nu), Polθ (theta), and REV1.
In some embodiments, the TLS polymerase comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of SEQ ID NO: 118. In some embodiments, the TLS polymerase comprises the amino acid sequence of SEQ ID NO: 118 (Polη) .
In some embodiments, the fusion protein or system further comprising the translesion synthesis (TLS) polymerase or a recruiting domain capable of recruiting a TLS polymerase leads to conversion of the first deoxyribonucleotide to dG, dC, dT, or dA.
MPG
In yet another aspect, the disclosure provides an MPG described herein, or of the disclosure.
In some embodiments, the MPG comprises a motif GxxYxxxxYGxxxxxN, wherein x represents any amino acid. In some embodiments, the MPG is obtained from a species selected from Table A. In some embodiments, the MPG comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference MPG. In some embodiments, the wild type or reference MPG is human MPG (SEQ ID NO: 9) or an MPG obtained from a species selected from Table A or any MPG as set forth in Table D or a homolog or mutant (e.g., comprising an amino acid sequence of SEQ ID NO: 5, 6, or 7) thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG) (e.g., SEQ ID NO: 1). In some embodiments, the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
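A motif written with x as a wildcard, such as GxxYxxxxYGxxxxxN, can be screened for with a regular expression. The sketch below is illustrative only: the helper names are hypothetical, the test string is a toy peptide rather than any MPG sequence, and overlapping matches are not reported.

```python
import re


def motif_to_regex(motif: str) -> "re.Pattern":
    """Convert a motif such as 'GxxYxxxxYGxxxxxN' (x = any residue) to a regex."""
    parts = ("[A-Z]" if ch == "x" else re.escape(ch) for ch in motif)
    return re.compile("".join(parts))


def find_motif(protein: str, motif: str) -> list:
    """Return (1-based start position, matched segment) for each non-overlapping hit."""
    pattern = motif_to_regex(motif)
    return [(m.start() + 1, m.group()) for m in pattern.finditer(protein)]


# Toy peptide containing one instance of the motif; not a real MPG sequence.
toy_protein = "MAGAAYAAAAYGAAAAANKL"
print(find_motif(toy_protein, "GxxYxxxxYGxxxxxN"))
```

The same helper accepts a motif without wildcards, in which case it behaves as a plain substring search.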
In some embodiments, the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, and/or 298 of the wild type or reference MPG, wherein the position is numbered according to SEQ ID NO: 1.
In some embodiments, the amino acid mutation confers an ability to excise a base on the MPG. In some embodiments, the base is guanine.
In some embodiments, the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control MPG without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
In some embodiments, the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
In some embodiments, the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of G163, N169, D175, C178, S198, K202, G203, S206, K210, and/or Q294 of the wild type or reference MPG, wherein the position is numbered according to SEQ ID NO: 1.
In some embodiments, the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of N169, D175, C178, and/or Q294 of the wild type or reference MPG, wherein the position is numbered according to SEQ ID NO: 1. In some embodiments, the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
In some embodiments, the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference MPG. In some embodiments, the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as Glycine (Gly/G), Alanine (Ala/A), Valine (Val/V), Cysteine (Cys/C), Proline (Pro/P), Leucine (Leu/L), Isoleucine (Ile/I), Methionine (Met/M), Tryptophan (Trp/W), or Phenylalanine (Phe/F)); (2) a polar amino acid residue (such as Serine (Ser/S), Threonine (Thr/T), Tyrosine (Tyr/Y), Asparagine (Asn/N), or Glutamine (Gln/Q)); (3) a positively charged amino acid residue (such as Lysine (Lys/K), Arginine (Arg/R), or Histidine (His/H)); or (4) a negatively charged amino acid residue (such as Aspartic Acid (Asp/D) or Glutamic Acid (Glu/E)). In some embodiments, the amino acid substitution is a substitution with R, A, N, or G.
In some embodiments, the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1.
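Substitution labels such as G163R encode the reference residue, the 1-based position, and the replacement residue. The following minimal sketch shows how such labels could be applied to a sequence while checking the stated reference residue. It assumes the input sequence is already numbered identically to the reference (for a different MPG, corresponding positions would first have to be found by alignment), and it uses a toy sequence with hypothetical labels rather than SEQ ID NO: 1.

```python
import re

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(seq: str, substitutions: list) -> str:
    """Apply labels like 'G163R' to seq, using 1-based reference numbering."""
    residues = list(seq)
    for label in substitutions:
        match = SUBSTITUTION.match(label)
        if match is None:
            raise ValueError(f"unrecognized substitution label: {label}")
        wild_type, position, replacement = (
            match.group(1), int(match.group(2)), match.group(3))
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{label}: expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)


# Toy 6-residue sequence and hypothetical labels, for illustration only.
print(apply_substitutions("MKGNDC", ["G3R", "N4G"]))
```

Checking the reference residue before substituting catches numbering mismatches, for example when a sequence lacking the N-terminal Met is numbered against a sequence that retains it.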
In some embodiments, the MPG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference MPG.
In some embodiments, the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of N169G, D175R, C178N, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1. In some embodiments, the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
In some embodiments, the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of N169G, D175R, C178N, and Q294R, wherein the position is numbered according to SEQ ID NO: 1. In some embodiments, the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 7.
In some embodiments, the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R, wherein the position is numbered according to SEQ ID NO: 1. In some embodiments, the wild type or reference MPG comprises the amino acid sequence of SEQ ID NO: 1 or 9.
In some embodiments, the MPG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 8, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, or 38.
In some embodiments, the MPG is (substantially) capable of excising guanine of dG. In some embodiments, the MPG is (substantially) incapable of excising thymine of dT. In some embodiments, the MPG is (substantially) incapable of excising cytosine of dC. In some embodiments, the MPG is (substantially) incapable of excising adenine of dA.
In some aspects, the disclosure provides a fusion protein comprising the MPG described herein, or of the disclosure, and a functional domain, such as, a napDNAbd.
In some aspects, the disclosure provides use of the MPG described herein, or of the disclosure, for base editing as described herein.
In some embodiments, the MPG is not wild type or naturally-occurring.
UNG
In yet another aspect, the disclosure provides an UNG described herein, or of the disclosure.
In some embodiments, the UNG comprises a motif GQDPYH. In some embodiments, the UNG is obtained from a species selected from Table C. In some embodiments, the UNG comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference UNG. In some embodiments, the wild type or reference UNG is human UNG1 (SEQ ID NO: 54) or human UNG2 (SEQ ID NO: 133) or an UNG obtained from a species selected from Table C or any UNG as set forth in Table D or a homolog or mutant thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG). In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 56, 58, 135, or 137.
In some embodiments, the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, and/or 313 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
Alternative splicing, together with transcription from two distinct start sites, leads to two different human UNG isoforms, mitochondrial UNG1 (304 amino acids, aa) (SEQ ID NO: 54) and nuclear UNG2 (313 aa) (SEQ ID NO: 133), each possessing a unique N-terminus that mediates translocation to mitochondria or nucleus 16 (FIG. 25). The sequence alignment of human UNG1 and human UNG2 shows that the amino acid residues at positions 1-35 of UNG1 are different from the amino acid residues at positions 1-44 of UNG2, and the remaining parts of UNG1 and UNG2 are identical. The correspondence between the positions of UNG1 (SEQ ID NO: 54) and UNG2 (SEQ ID NO: 133) can be determined by sequence alignment. For example, the residues Y156 and N213 of UNG2 correspond to the residues Y147 and N204 of UNG1, respectively.
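Because the two isoforms differ only in their N-termini (positions 1-35 of UNG1 versus positions 1-44 of UNG2) and are otherwise identical, the position correspondence in the shared region is a constant offset of 44 - 35 = 9. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the mapping assumes, as stated above, that the shared region aligns without gaps.

```python
UNG1_LENGTH = 304          # mitochondrial isoform, per the text
UNG2_LENGTH = 313          # nuclear isoform, per the text
UNG1_UNIQUE_N_TERM = 35    # positions 1-35 are unique to UNG1
UNG2_UNIQUE_N_TERM = 44    # positions 1-44 are unique to UNG2
OFFSET = UNG2_UNIQUE_N_TERM - UNG1_UNIQUE_N_TERM  # 9


def ung2_to_ung1(position: int):
    """Map a 1-based UNG2 position to UNG1, or None inside the unique N-terminus."""
    if not 1 <= position <= UNG2_LENGTH:
        raise ValueError("position is outside UNG2")
    if position <= UNG2_UNIQUE_N_TERM:
        return None  # no counterpart in UNG1
    return position - OFFSET


for label, pos in (("Y156", 156), ("N213", 213)):
    print(label, "->", ung2_to_ung1(pos))
```

The last UNG2 residue (313) maps to the last UNG1 residue (304), which is consistent with the isoform lengths given above.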
In some embodiments, the amino acid mutation confers an ability to excise a base on the UNG. In some embodiments, the base is thymine. In some embodiments, the base is cytosine.
In some embodiments, the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control UNG without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
In some embodiments, the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
In some embodiments, the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of Y156, K184, N213, A214, Q259, and/or Y284 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
In some embodiments, the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of K184, A214, Q259, and/or Y284 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133. In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135 or 137.
In some embodiments, the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference UNG. In some embodiments, the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as Glycine (Gly/G), Alanine (Ala/A), Valine (Val/V), Cysteine (Cys/C), Proline (Pro/P), Leucine (Leu/L), Isoleucine (Ile/I), Methionine (Met/M), Tryptophan (Trp/W), or Phenylalanine (Phe/F)); (2) a polar amino acid residue (such as Serine (Ser/S), Threonine (Thr/T), Tyrosine (Tyr/Y), Asparagine (Asn/N), or Glutamine (Gln/Q)); (3) a positively charged amino acid residue (such as Lysine (Lys/K), Arginine (Arg/R), or Histidine (His/H)); or (4) a negatively charged amino acid residue (such as Aspartic Acid (Asp/D) or Glutamic Acid (Glu/E)). In some embodiments, the amino acid substitution is a substitution with A, D, V, or T.
In some embodiments, the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of Y156A, K184A, N213D, A214V, A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133.
In some embodiments, the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133. In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
In some embodiments, the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, wherein the position is numbered according to SEQ ID NO: 133. In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
In some embodiments, the amino acid mutation comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-65, 1-66, 1-67, 1-68, 1-69, 1-70, 1-71, 1-72, 1-73, 1-74, 1-75, 1-76, 1-77, 1-78, 1-79, 1-80, 1-81, 1-82, 1-83, 1-84, 1-85, 1-86, 1-87, 1-88, 1-89, 1-90, 1-91, 1-92, 1-93, 1-94, 1-95, 1-96, 1-97, 1-98, 1-99, or 1-100 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
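When an N-terminal deletion is combined with substitutions that remain numbered according to the full-length reference, each retained residue simply shifts by the number of deleted residues. The sketch below illustrates this bookkeeping with hypothetical helper names and a toy sequence; it is not taken from the disclosure.

```python
def delete_n_terminus(seq: str, last_deleted: int) -> str:
    """Remove positions 1..last_deleted (1-based) from the N-terminus."""
    if not 0 <= last_deleted < len(seq):
        raise ValueError("deletion must leave at least one residue")
    return seq[last_deleted:]


def position_after_deletion(position: int, last_deleted: int):
    """Map a reference position to the truncated protein, or None if deleted."""
    if position <= last_deleted:
        return None
    return position - last_deleted


# Reference positions recited in the text, after an assumed 1-88 deletion.
for ref_pos in (184, 214, 259, 284):
    print(ref_pos, "->", position_after_deletion(ref_pos, 88))
```

Keeping the reference numbering in the substitution labels and converting only at the point of indexing avoids ambiguity between truncated and full-length coordinates.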
In some embodiments, the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133. In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
In some embodiments, the amino acid mutation comprises an amino acid substitution that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133. In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
In some embodiments, the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference UNG.
In some embodiments, the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of A214T, Q259A, and Y284D, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133. In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 135.
In some embodiments, the amino acid mutation comprises a combination substitution that is corresponding to a combination substitution of K184A and A214V, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the wild type or reference UNG, wherein the position is numbered according to SEQ ID NO: 133. In some embodiments, the wild type or reference UNG comprises the amino acid sequence of SEQ ID NO: 137.
In some embodiments, the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 56, 58, 60, 62, 135, 137, 139, 141, 144, 146, 148, 150, 152, 155, 157, and 159 or an N-terminal truncation thereof lacking the most N-terminal Methionine (M) (coded by start codon ATG) .
In some embodiments, the UNG is (substantially) capable of excising thymine of dT. In some embodiments, the UNG is (substantially) capable of excising cytosine of dC. In some embodiments, the UNG is (substantially) incapable of excising thymine of dT. In some embodiments, the UNG is (substantially) incapable of excising cytosine of dC. In some embodiments, the UNG is (substantially) incapable of excising adenine of dA. In some embodiments, the UNG is (substantially) incapable of excising guanine of dG.
In some aspects, the disclosure provides a fusion protein comprising the UNG described herein, or of the disclosure, and a functional domain, such as, a napDNAbd.
In some aspects, the disclosure provides use of the UNG described herein, or of the disclosure, for base editing as described herein.
In some embodiments, the UNG is not wild type or naturally-occurring.
Additional BEP
In yet another aspect, the disclosure provides a TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 described herein, or of the disclosure.
In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprises an amino acid mutation relative to (compared to; with reference to) a wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1. In some embodiments, the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is human TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 (SEQ ID NO: 64, 65, 66, 67, 68, 69, 70, 71, or 72, respectively) or a homolog or mutant thereof or an N-terminal truncation thereof lacking the most N-terminal Methionine (Met) (coded by start codon ATG). In some embodiments, the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 64-72, respectively.
In some embodiments, the amino acid mutation comprises an amino acid substitution at a position that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, and/or 604 of the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1, wherein the position is numbered according to any one of SEQ ID NOs: 64-72.
In some embodiments, the amino acid mutation confers an ability to excise a base on the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1. In some embodiments, the base is guanine, thymine, cytosine, adenine, uracil, or hypoxanthine.
In some embodiments, the amino acid mutation leads to increased base excising ability as compared to an otherwise identical control TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
In some embodiments, the amino acid mutation leads to increased guide sequence-specific base editing efficiency of the fusion protein as compared to an otherwise identical control fusion protein without said amino acid mutation, e.g., an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000%, 1100%, 1200%, 1300%, 1400%, 1500%, 1600%, 1700%, 1800%, 1900%, 2000%, or more.
In some embodiments, the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution. In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the amino acid residue at the position of the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1. In some embodiments, the amino acid substitution is an amino acid substitution with (1) a non-polar amino acid residue (such as Glycine (Gly/G), Alanine (Ala/A), Valine (Val/V), Cysteine (Cys/C), Proline (Pro/P), Leucine (Leu/L), Isoleucine (Ile/I), Methionine (Met/M), Tryptophan (Trp/W), or Phenylalanine (Phe/F)); (2) a polar amino acid residue (such as Serine (Ser/S), Threonine (Thr/T), Tyrosine (Tyr/Y), Asparagine (Asn/N), or Glutamine (Gln/Q)); (3) a positively charged amino acid residue (such as Lysine (Lys/K), Arginine (Arg/R), or Histidine (His/H)); or (4) a negatively charged amino acid residue (such as Aspartic Acid (Asp/D) or Glutamic Acid (Glu/E)).
In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.9%) and less than 100% to the amino acid sequence of the wild type or reference TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1, respectively.
In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising guanine of dG. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising thymine of dT. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising cytosine of dC. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) capable of excising adenine of dA.
In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising thymine of dT. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising cytosine of dC. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising adenine of dA. In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is (substantially) incapable of excising guanine of dG.
In some aspects, the disclosure provides a fusion protein comprising the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 described herein, or of the disclosure, and a functional domain, such as, a napDNAbd.
In some aspects, the disclosure provides use of the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 described herein, or of the disclosure, for base editing as described herein.
In some embodiments, the TDG, SMUG1, OGG1, MBD4, MUTYH, NEIL1, NEIL2, NEIL3, or NTHL1 is not wild type or naturally-occurring.
Regulation of guide nucleic acid
Also provided in the disclosure is a polynucleotide comprising or encoding the guide nucleic acid.
In some embodiments, the polynucleotide comprising or encoding the guide nucleic acid is a DNA, a RNA, or a DNA/RNA mixture. By “DNA/RNA mixture” it refers to a nucleic acid comprising both one or more modified or unmodified ribonucleotides and one or more modified or unmodified deoxyribonucleotides, whether consecutive or not. However, by “DNA” or “RNA” it may also refer to a DNA containing one or more modified or unmodified ribonucleotides, whether consecutive or not, or an RNA containing one or more modified or unmodified deoxyribonucleotides, whether consecutive or not.
In some embodiments, the guide nucleic acid is operably linked to or under the regulation of a promoter.
In some embodiments, the promoter is a ubiquitous, tissue-specific, cell-type specific, constitutive, or inducible promoter.
Suitable promoters are known in the art and include, for example, a Cbh promoter, a Cba promoter, a pol I promoter, a pol II promoter, a pol III promoter, a T7 promoter, a U6 promoter, a H1 promoter, a retroviral Rous sarcoma virus LTR promoter, a cytomegalovirus (CMV) promoter, a SV40 promoter, a dihydrofolate reductase promoter, a β-actin promoter, an elongation factor 1α short (EFS) promoter, a β glucuronidase (GUSB) promoter, a cytomegalovirus (CMV) immediate-early (Ie) enhancer and/or promoter, a chicken β-actin (CBA) promoter or derivative thereof such as a CAG promoter, CB promoter, a (human) elongation factor 1α-subunit (EF1α) promoter, a ubiquitin C (UBC) promoter, a prion promoter, a neuron-specific enolase (NSE) promoter, a neurofilament light (NFL) promoter, a neurofilament heavy (NFH) promoter, a platelet-derived growth factor (PDGF) promoter, a platelet-derived growth factor B-chain (PDGF-β) promoter, a synapsin (Syn) promoter, a synapsin 1 (Syn1) promoter, a methyl-CpG binding protein 2 (MeCP2) promoter, a Ca2+/calmodulin-dependent protein kinase II (CaMKII) promoter, a metabotropic glutamate receptor 2 (mGluR2) promoter, a neurofilament light (NFL) promoter, a neurofilament heavy (NFH) promoter, a β-globin minigene nβ2 promoter, a preproenkephalin (PPE) promoter, an enkephalin (Enk) promoter, an excitatory amino acid transporter 2 (EAAT2) promoter, a glial fibrillary acidic protein (GFAP) promoter, and a myelin basic protein (MBP) promoter.
Regulation of fusion protein
In yet another aspect, the disclosure provides a polynucleotide encoding the fusion protein of the disclosure and optionally the guide nucleic acid of the disclosure.
In some embodiments, the polynucleotide encoding the fusion protein is a DNA, a RNA, or a DNA/RNA mixture. By “DNA/RNA mixture” it refers to a nucleic acid comprising both one or more modified or unmodified ribonucleotides and one or more modified or unmodified deoxyribonucleotides, whether consecutive or not. However, by “DNA” or “RNA” it may also refer to a DNA containing one or more modified or unmodified ribonucleotides, whether consecutive or not, or an RNA containing one or more modified or unmodified deoxyribonucleotides, whether consecutive or not.
In some embodiments, the polynucleotide encoding the napDNAbd is a mRNA.
In some embodiments, the polynucleotide encoding the napDNAbd comprises a sequence encoding the napDNAbd and a promoter operably linked to the sequence encoding the napDNAbd.
In some embodiments, the polynucleotide encoding the napDNAbd is operably linked to or under the regulation of a promoter.
In some embodiments, the promoter is a ubiquitous, tissue-specific, cell-type specific, constitutive, or inducible promoter.
Suitable promoters are known in the art and include, for example, a Cbh promoter, a Cba promoter, a pol I promoter, a pol II promoter, a pol III promoter, a T7 promoter, a U6 promoter, a H1 promoter, a retroviral Rous sarcoma virus LTR promoter, a cytomegalovirus (CMV) promoter, a SV40 promoter, a dihydrofolate reductase promoter, a β-actin promoter, an elongation factor 1α short (EFS) promoter, a β glucuronidase (GUSB) promoter, a cytomegalovirus (CMV) immediate-early (Ie) enhancer and/or promoter, a chicken β-actin (CBA) promoter or derivative thereof such as a CAG promoter, CB promoter, a (human) elongation factor 1α-subunit (EF1α) promoter, a ubiquitin C (UBC) promoter, a prion promoter, a neuron-specific enolase (NSE) promoter, a neurofilament light (NFL) promoter, a neurofilament heavy (NFH) promoter, a platelet-derived growth factor (PDGF) promoter, a platelet-derived growth factor B-chain (PDGF-β) promoter, a synapsin (Syn) promoter, a human synapsin (hSyn) promoter, a synapsin 1 (Syn1) promoter, a methyl-CpG binding protein 2 (MeCP2) promoter, a Ca2+/calmodulin-dependent protein kinase II (CaMKII) promoter, a metabotropic glutamate receptor 2 (mGluR2) promoter, a neurofilament light (NFL) promoter, a neurofilament heavy (NFH) promoter, a β-globin minigene nβ2 promoter, a preproenkephalin (PPE) promoter, an enkephalin (Enk) promoter, an excitatory amino acid transporter 2 (EAAT2) promoter, a glial fibrillary acidic protein (GFAP) promoter, a myelin basic protein (MBP) promoter, a OTOF promoter, a GRK1 promoter, a CRX promoter, a NRL promoter, a MECP2 promoter, a mMECP2 promoter, a hMECP2 promoter, an APP promoter, and a RCVRN promoter.
Delivery
Various ways of delivery can be applied to the fusion protein of the disclosure or the system of the disclosure as needed in practice.
In yet another aspect, the disclosure provides a delivery system comprising (1) the fusion protein of the disclosure, the polynucleotide of the disclosure, or the system of the disclosure; and (2) a delivery vehicle.
In yet another aspect, the disclosure provides a vector comprising the polynucleotide of the disclosure. In some embodiments, the vector encodes a guide nucleic acid of the disclosure. In some embodiments, the vector is a plasmid vector, a recombinant AAV (rAAV) vector (vector genome) , or a recombinant lentivirus vector.
In yet another aspect, the disclosure provides a recombinant AAV (rAAV) particle comprising the rAAV vector genome of the disclosure. For a simple introduction to AAV for delivery, see “Adeno-associated Virus (AAV) Guide” (addgene.org/guides/aav/) .
Adeno-associated virus (AAV) , when engineered to deliver, e.g., a protein-encoding sequence of interest, may be termed a (r) AAV vector, a (r) AAV vector particle, or a (r) AAV particle, where “r” stands for “recombinant” . The genome packaged in AAV vectors for delivery may be termed a (r) AAV vector genome, vector genome, or vg for short, while viral genome may refer to the original viral genome of natural AAVs.
The serotypes of the capsids of rAAV particles can be matched to the types of target cells. For example, Table 2 of WO2018002719A1 lists exemplary cell types that can be transduced by the indicated AAV serotypes (incorporated herein by reference) .
In some embodiments, the rAAV particle comprises a capsid with a serotype suitable for delivery into a desired target cell. In some embodiments, the rAAV particle comprises a capsid with a serotype of AAV1, AAV2, AAV3A, AAV3B, AAV4, AAV5, AAV6, AAV7, AAVrh74, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV-DJ, or AAV.PHP.eB, a member of the Clade to which any of the AAV1-AAV13 belong, or a functional variant (e.g., a functional truncation) thereof, encapsidating the rAAV vector genome. In some embodiments, the serotype of the capsid is wild type serotype or a functional variant thereof.
General principles of rAAV particle production are known in the art. In some embodiments, rAAV particles may be produced using the triple transfection method (described in detail in U.S. Pat. No. 6,001,650) .
The vector titers are usually expressed as vector genomes per ml (vg/ml) . In some embodiments, the vector titer is above 1×10⁹, above 5×10¹⁰, above 1×10¹¹, above 5×10¹¹, above 1×10¹², above 5×10¹², or above 1×10¹³ vg/ml.
Instead of packaging a single strand (ss) DNA sequence as a vector genome of a rAAV particle, systems and methods of packaging an RNA sequence as a vector genome into a rAAV particle have recently been developed and are applicable herein. See PCT/CN2022/075366, which is incorporated herein by reference in its entirety.
When the vector genome is RNA as in, for example, PCT/CN2022/075366, for simplicity of description and claiming, sequence elements described herein for DNA vector genomes, when present in RNA vector genomes, should generally be considered to be applicable for the RNA vector genomes except that the deoxyribonucleotides in the DNA sequence are the corresponding ribonucleotides in the RNA sequence (e.g., dT is equivalent to U, and dA is equivalent to A) and/or the element in the DNA sequence is replaced with the corresponding element with a corresponding function in the RNA sequence or omitted because its function is unnecessary in the RNA sequence and/or an additional element necessary for the RNA vector genome is introduced.
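The nucleotide-level part of this correspondence (dT read as U, with A, C, and G unchanged) can be sketched as a simple string conversion. The function name `dna_to_rna` and the toy input are hypothetical, and the sketch deliberately ignores the element-level replacements, omissions, and additions that the paragraph also describes.

```python
DNA_ALPHABET = set("ACGT")


def dna_to_rna(dna: str) -> str:
    """Return the RNA sequence corresponding to a DNA sequence (T replaced by U)."""
    seq = dna.upper()
    unexpected = set(seq) - DNA_ALPHABET
    if unexpected:
        raise ValueError(f"unexpected symbols in DNA sequence: {sorted(unexpected)}")
    return seq.replace("T", "U")


if __name__ == "__main__":
    print(dna_to_rna("ATGCTT"))  # AUGCUU
```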
As used herein, a coding sequence, e.g., as a sequence element of rAAV vector genomes herein, is construed, understood, and considered as covering and covers both a DNA coding sequence and an RNA coding sequence. When it is a DNA coding sequence, an RNA sequence can be transcribed from the DNA coding sequence, and optionally further a protein can be translated from the transcribed RNA sequence as necessary. When it is an RNA coding sequence, the RNA coding sequence per se can be a functional RNA sequence for use, or an RNA sequence can be produced from the RNA coding sequence, e.g., by RNA processing, or a protein can be translated from the RNA coding sequence.
For example, a fusion protein coding sequence encoding a fusion protein covers either a fusion protein DNA coding sequence from which a fusion protein is expressed (indirectly via transcription and translation) or a fusion protein RNA coding sequence from which a fusion protein is translated (directly) .
For example, a gRNA coding sequence encoding a gRNA covers either a gRNA DNA coding sequence from which a gRNA is transcribed or a gRNA RNA coding sequence (1) which per se is the functional gRNA for use, or (2) from which a gRNA is produced, e.g., by RNA processing.
In some embodiments for rAAV RNA vector genomes, 5’-ITR and/or 3’-ITR as DNA packaging signals may be unnecessary and can be omitted at least partly, while RNA packaging signals can be introduced.
In some embodiments for rAAV RNA vector genomes, a promoter to drive transcription of DNA sequences may be unnecessary and can be omitted at least partly.
In some embodiments for rAAV RNA vector genomes, a sequence encoding a polyA signal may be unnecessary and can be omitted at least partly, while a polyA tail can be introduced.
Similarly, other DNA elements of rAAV DNA vector genomes can be either omitted or replaced with corresponding RNA elements and/or additional RNA elements can be introduced, in order to adapt to the strategy of delivering an RNA vector genome by rAAV particles.
In yet another aspect, the disclosure provides a complex comprising the fusion protein or the polynucleotide (e.g., a mRNA) encoding the fusion protein of the disclosure and a guide nucleic acid (e.g., a gRNA) of the disclosure.
In yet another aspect, the disclosure provides a ribonucleoprotein (RNP) comprising the fusion protein of the disclosure and a guide nucleic acid of the disclosure.
In yet another aspect, the disclosure provides a lipid nanoparticle (LNP) comprising an RNA (e.g., mRNA) encoding the fusion protein of the disclosure and a guide nucleic acid of the disclosure.
Pharmaceutical composition
In yet another aspect, the disclosure provides a pharmaceutical composition comprising (1) the system of the disclosure, the vector of the disclosure, the ribonucleoprotein of the disclosure, the lipid nanoparticle of the disclosure, or the cell of the disclosure; and (2) a pharmaceutically acceptable excipient.
In some embodiments, the pharmaceutical composition comprises the rAAV particle in a concentration selected from the group consisting of about 1×10¹⁰ vg/mL, 2×10¹⁰ vg/mL, 3×10¹⁰ vg/mL, 4×10¹⁰ vg/mL, 5×10¹⁰ vg/mL, 6×10¹⁰ vg/mL, 7×10¹⁰ vg/mL, 8×10¹⁰ vg/mL, 9×10¹⁰ vg/mL, 1×10¹¹ vg/mL, 2×10¹¹ vg/mL, 3×10¹¹ vg/mL, 4×10¹¹ vg/mL, 5×10¹¹ vg/mL, 6×10¹¹ vg/mL, 7×10¹¹ vg/mL, 8×10¹¹ vg/mL, 9×10¹¹ vg/mL, 1×10¹² vg/mL, 2×10¹² vg/mL, 3×10¹² vg/mL, 4×10¹² vg/mL, 5×10¹² vg/mL, 6×10¹² vg/mL, 7×10¹² vg/mL, 8×10¹² vg/mL, 9×10¹² vg/mL, 1×10¹³ vg/mL, or in a concentration of a numerical range between any two of the preceding values, e.g., in a concentration of from about 9×10¹⁰ vg/mL to about 8×10¹¹ vg/mL. In some embodiments, the pharmaceutical composition is an injection formulation.
In some embodiments, the volume of the injection is selected from the group consisting of about 1 microliter, 10 microliters, 50 microliters, 100 microliters, 150 microliters, 200 microliters, 250 microliters, 300 microliters, 350 microliters, 400 microliters, 450 microliters, 500 microliters, 550 microliters, 600 microliters, 650 microliters, 700 microliters, 750 microliters, 800 microliters, 850 microliters, 900 microliters, 950 microliters, 1000 microliters, and a volume of a numerical range between any two of the preceding values, e.g., a volume of from about 10 microliters to about 750 microliters.
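For orientation, the number of vector genomes contained in a single injection follows from multiplying a concentration in vg/mL by an injection volume converted from microliters to milliliters. The sketch below shows only this unit arithmetic. The function name and the example pairing of 1×10¹² vg/mL with 100 microliters are illustrative assumptions and not a dosing recommendation from the disclosure.

```python
import math


def vector_genomes_per_injection(concentration_vg_per_ml: float, volume_ul: float) -> float:
    """Vector genomes in one injection: concentration (vg/mL) times volume (mL)."""
    if concentration_vg_per_ml < 0 or volume_ul < 0:
        raise ValueError("concentration and volume must be non-negative")
    volume_ml = volume_ul / 1000.0  # 1 mL = 1000 microliters
    return concentration_vg_per_ml * volume_ml


if __name__ == "__main__":
    # Illustrative pairing of one listed concentration with one listed volume.
    total = vector_genomes_per_injection(1e12, 100)
    print(f"{total:.1E} vg")
    assert math.isclose(total, 1e11)
```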
Cells
The methods of the disclosure can be used to introduce the systems of the disclosure into a cell and cause the cell to alter the production of one or more cellular products, such as antibody, starch, ethanol, or any other desired products. Such cells and progenies thereof are within the scope of the disclosure.
In yet another aspect, the disclosure provides a cell or a progeny thereof comprising the system of the disclosure. In some embodiments, the cell is a eukaryote. In some embodiments, the cell is a human cell.
In yet another aspect, the disclosure provides a cell or a progeny thereof modified by the system of the disclosure or the method of the disclosure. In some embodiments, the cell is a eukaryote. In some embodiments, the cell is a human cell. In some embodiments, the cell is modified in vitro, in vivo, or ex vivo.
In some embodiments, the cell is a stem cell. In some embodiments, the cell is not a human embryonic stem cell. In some embodiments, the cell is not a human germ cell.
In some embodiments, the cell is a prokaryotic cell.
In some embodiments, the cell is a eukaryotic cell (e.g., an animal cell, a vertebrate cell, a mammalian cell, a non-human mammalian cell, a non-human primate cell, a rodent (e.g., mouse or rat) cell, a human cell, a plant cell, or a yeast cell) or a prokaryotic cell (e.g., a bacterial cell) .
In some embodiments, the cell is from a plant or an animal.
In some embodiments, the plant is a dicotyledon. In some embodiments, the dicotyledon is selected from the group consisting of soybean, cabbage (e.g., Chinese cabbage) , rapeseed, brassica, watermelon, melon, potato, tomato, tobacco, eggplant, pepper, cucumber, cotton, alfalfa, and grape.
In some embodiments, the plant is a monocotyledon. In some embodiments, the monocotyledon is selected from the group consisting of rice, corn, wheat, barley, oat, sorghum, millet, grasses, Poaceae, Zizania, Avena, Coix, Hordeum, Oryza, Panicum (e.g., Panicum miliaceum) , Secale, Setaria (e.g., Setaria italica) , Sorghum, Triticum, Zea, Cymbopogon, Saccharum (e.g., Saccharum officinarum) , Phyllostachys, Dendrocalamus, Bambusa, and Yushania.
In some embodiments, the animal is selected from the group consisting of pig, ox, sheep, goat, mouse, rat, alpaca, monkey, rabbit, chicken, duck, goose, and fish (e.g., zebra fish) .
In some embodiments, the cell is a eukaryotic cell, such as a mammalian cell, including a human cell (a primary human cell or an established human cell line) . In some embodiments, the cell is a non-human mammalian cell, such as a cell from a non-human primate (e.g., monkey) , a cow/bull/cattle, sheep, goat, pig, horse, dog, cat, rodent (such as rabbit, mouse, rat, hamster, etc. ) . In some embodiments, the cell is from fish (such as salmon) , bird (such as poultry bird, including chick, duck, goose) , reptile, shellfish (e.g., oyster, clam, lobster, shrimp) , insect, worm, yeast, etc. In some embodiments, the cell is from a plant, such as monocot or dicot. In certain embodiments, the plant is a food crop such as barley, cassava, cotton, groundnuts or peanuts, maize, millet, oil palm fruit, potatoes, pulses, rapeseed or canola, rice, rye, sorghum, soybeans, sugar cane, sugar beets, sunflower, and wheat. In certain embodiments, the plant is a cereal (barley, maize, millet, rice, rye, sorghum, and wheat) . In certain embodiments, the plant is a tuber (cassava and potatoes) . In certain embodiments, the plant is a sugar crop (sugar beets and sugar cane) . In certain embodiments, the plant is an oil-bearing crop (soybeans, groundnuts or peanuts, rapeseed or canola, sunflower, and oil palm fruit) . In certain embodiments, the plant is a fiber crop (cotton) . In certain embodiments, the plant is a tree (such as a peach or a nectarine tree, an apple or pear tree, a nut tree such as almond or walnut or pistachio tree, or a citrus tree, e.g., orange, grapefruit or lemon tree) , a grass, a vegetable, a fruit, or an alga. In certain embodiments, the plant is a nightshade plant; a plant of the genus Brassica; a plant of the genus Lactuca; a plant of the genus Spinacia; a plant of the genus Capsicum; or cotton, tobacco, asparagus, carrot, cabbage, broccoli, cauliflower, tomato, eggplant, pepper, lettuce, spinach, strawberry, blueberry, raspberry, blackberry, grape, coffee, cocoa, etc.
In some embodiments, the cell is not within the body of an organism, such as, human or animal. In some embodiments, the cell is not a human embryonic stem cell. In some embodiments, the cell is not a human germ cell.
Method
In yet another aspect, the disclosure provides a method for modifying a target dsDNA, comprising contacting the target dsDNA with the system of the disclosure, wherein the guide sequence is capable of hybridizing to a target sequence of the target dsDNA, wherein the target dsDNA is modified by the complex.
In yet another aspect, the disclosure provides a method for diagnosing, preventing, and/or treating a disease in a subject in need thereof, comprising administering to the subject (e.g., a therapeutically effective amount/dose of) the system of the disclosure, the vector of the disclosure, the ribonucleoprotein of the disclosure, the lipid nanoparticle of the disclosure, the cell of the disclosure, or the pharmaceutical composition of the disclosure, wherein the disease is associated with a target dsDNA, wherein the guide sequence is capable of hybridizing to a target sequence of the target dsDNA, wherein the target dsDNA is modified by the complex, and wherein the modification of the target dsDNA diagnoses, prevents, and/or treats the disease.
In some embodiments, the disease is selected from the group consisting of Angelman syndrome (AS) , Alzheimer's disease (AD) , transthyretin amyloidosis (ATTR) , transthyretin amyloid cardiomyopathy (ATTR-CM) , cystic fibrosis (CF) , hereditary angioedema, diabetes, progressive pseudohypertrophic muscular dystrophy, Duchenne muscular dystrophy (DMD) , Becker muscular dystrophy (BMD) , spinal muscular atrophy (SMA) , alpha-1-antitrypsin deficiency, Pompe disease, myotonic dystrophy, Huntington’s disease (HTT) , fragile X syndrome, Friedreich ataxia, amyotrophic lateral sclerosis (ALS) , frontotemporal dementia, hereditary chronic kidney disease, hyperlipidemia, Leber congenital amaurosis (LCA) , sickle cell disease, thalassemia (e.g., β-thalassemia) , Parkinson's disease (PD) , myelodysplastic syndrome (MDS) , retinitis pigmentosa (RP) , age-related macular degeneration (AMD) , Hepatitis B, nonalcoholic fatty liver disease (NAFLD) , Acquired Immune Deficiency Syndrome, corneal dystrophy (CD) , hypercholesterolemia, familial hypercholesterolemia (FH) , heart disease (e.g., hypertrophic cardiomyopathy (HCM) ) , and cancer.
In some embodiments, the target dsDNA encodes a mRNA, a tRNA, a ribosomal RNA (rRNA) , a microRNA (miRNA) , a non-coding RNA, a long non-coding (lnc) RNA, a nuclear RNA, an interfering RNA (iRNA) , a small interfering RNA (siRNA) , a ribozyme, a riboswitch, a satellite RNA, a microswitch, a microzyme, or a viral RNA.
In some embodiments, the target dsDNA is a eukaryotic DNA.
In some embodiments, the eukaryotic DNA is a mammal DNA, such as a non-human mammalian DNA, a non-human primate DNA, a human DNA, a plant DNA, an insect DNA, a bird DNA, a reptile DNA, a rodent (e.g., mouse, rat) DNA, a fish DNA, a nematode DNA, or a yeast DNA.
In some embodiments, the target dsDNA is in a eukaryotic cell, for example, a human cell, a non-human primate cell, or a mouse cell.
In some embodiments, the administering comprises local administration or systemic administration.
In some embodiments, the administering comprises intrathecal administration, intramuscular administration, intravenous administration, transdermal administration, intranasal administration, oral administration, mucosal administration, intraperitoneal administration, intracranial administration, intracerebroventricular administration, or stereotaxic administration.
In some embodiments, the administration is injection or infusion.
In some embodiments, the subject is a human, a non-human primate, or a mouse.
In some embodiments, the level of the transcript (e.g., mRNA) of the target dsDNA is decreased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the transcript (e.g., mRNA) of the target dsDNA in the subject prior to the administration.
In some embodiments, the level of the transcript (e.g., mRNA) of the target dsDNA is increased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the transcript (e.g., mRNA) of the target dsDNA in the subject prior to the administration.
In some embodiments, the level of the expression product (e.g., protein) of the target dsDNA is decreased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the expression product (e.g., protein) of the target dsDNA in the subject prior to the administration.
In some embodiments, the level of the expression product (e.g., protein) of the target dsDNA is increased in the subject by at least about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, or more compared to the level of the expression product (e.g., protein) of the target dsDNA in the subject prior to the administration. In some embodiments, the expression product is a functional mutant of the expression product of the target dsDNA.
In some embodiments, the median survival of the subject suffering from the disease but receiving the administration is 5 days, 10 days, 20 days, 30 days, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, 12 months, 1.5 years, 2 years, 2.5 years, 3 years, 4 years, 5 years, 6 years, 7 years, 8 years, 9 years, 10 years, or more, longer than that of a subject or a population of subjects suffering from the disease and not receiving the administration.
The therapeutically effective dose may be either via a single dose, or multiple doses. One skilled in the art understands that the actual dose may vary greatly depending upon a variety of factors, such as the vector choices, the target cells, organisms, tissues, the general conditions of the subject to be treated, the degrees of transformation/modification sought, the administration routes, the administration modes, the types of transformation/modification sought, etc.
For example, the therapeutically effective dose of the rAAV particle may be about 1.0E+8, 2.0E+8, 3.0E+8, 4.0E+8, 6.0E+8, 8.0E+8, 1.0E+9, 2.0E+9, 3.0E+9, 4.0E+9, 6.0E+9, 8.0E+9, 1.0E+10, 2.0E+10, 3.0E+10, 4.0E+10, 6.0E+10, 8.0E+10, 1.0E+11, 2.0E+11, 3.0E+11, 4.0E+11, 6.0E+11, 8.0E+11, 1.0E+12, 2.0E+12, 3.0E+12, 4.0E+12, 6.0E+12, 8.0E+12, 1.0E+13, 2.0E+13, 3.0E+13, 4.0E+13, 6.0E+13, 8.0E+13, 1.0E+14, 2.0E+14, 3.0E+14, 4.0E+14, 6.0E+14, 8.0E+14, 1.0E+15, 2.0E+15, 3.0E+15, 4.0E+15, 6.0E+15, 8.0E+15, 1.0E+16, 2.0E+16, 3.0E+16, 4.0E+16, 6.0E+16, 8.0E+16, or 1.0E+17 vg, or within a range of any two of those point values. vg stands for vector genomes of rAAV particles for administration.
In yet another aspect, the disclosure provides a method of detecting a target dsDNA, comprising contacting the target dsDNA with the system of the disclosure, wherein the target dsDNA is modified by the complex, and wherein the modification detects the target dsDNA. In some embodiments, the modification generates a detectable signal, e.g., a fluorescent signal.
Kits
In yet another aspect, the disclosure provides a kit comprising the fusion protein of the disclosure, the system of the disclosure, the polynucleotide of the disclosure, the vector of the disclosure, the RNP of the disclosure, the LNP of the disclosure, the delivery system of the disclosure, the cell of the disclosure, or the pharmaceutical composition of the disclosure, or any one, two, or all components of the same.
In some embodiments, the kit further comprises an instruction to use the component (s) contained therein, and/or instructions for combining with additional component (s) that may be available or necessary elsewhere.
In some embodiments, the kit further comprises one or more buffers that may be used to dissolve any of the component (s) contained therein, and/or to provide suitable reaction conditions for one or more of the component (s) . Such buffers may include one or more of PBS, HEPES, Tris, MOPS, Na2CO3, NaHCO3, NaB, or combinations thereof. In some embodiments, the reaction condition includes a proper pH, such as a basic pH. In some embodiments, the pH is between 7 and 10.
In some embodiments, any one or more of the kit components may be stored in a suitable container or at a suitable temperature, e.g., 4 degrees Celsius.
Further embodiments are illustrated in the following Examples which are given for illustrative purposes only and are not intended to limit the scope of the disclosure.
EXAMPLES
The following examples are provided to further illustrate some embodiments of the disclosure but are not intended to limit the scope of the invention; it will be understood by their exemplary nature that other procedures, methodologies, or techniques known to those skilled in the art may alternatively be used.
Example 1. Development of deaminase-free, glycosylase-based base editors
It was intended to develop a novel base editing system by harnessing the BER pathway triggered by depurination of G or A. By fusing human MPG at the C-terminus of Cas9-D10A nickase (nCas9) (SEQ ID NO: 2) , a prototype version of a deaminase-free, glycosylase-based base editor (gBE) was generated, in which the MPG was intended to remove normal G or A, two purines structurally similar to hypoxanthine (Hx) , to generate AP sites, with nCas9 nicking the opposite non-edited strand. This would allow the damaged DNA to preferentially use the edited strand as a template for DNA repair and/or DNA replication. Therefore, nucleotide incorporation opposite the AP site via TLS would lead to diverse base editing outcomes (FIG. 1a) .
To conveniently evaluate the editing efficiency of the gBE, a simple intron-split EGFP reporter system was developed as reported previously [11] . Disruptive point mutations (AG to CG or TG) were introduced in the intron boundary to generate inactive splicing acceptor (SA) signals, and G-to-T or A-to-T conversion (C-to-A or T-to-A conversion in the opposite strand) was required to correct the mutation for proper splicing of the EGFP coding sequence, thus activating EGFP expression and generating EGFP signals (FIG. 1b and FIG. 10a) . The EGFP fluorescence intensity could be detected with flow cytometry (FIG. 10d) . The G-to-T reporter was to be used for evaluating the guanine base editing efficiency of the gBE (for this purpose, termed as glycosylase-based guanine base editor (gGBE) ) , and the A-to-T reporter for evaluating the adenine base editing efficiency of the gBE (for this purpose, termed as glycosylase-based adenine base editor (gABE) ) .
gBE expression vectors with wild type human MPG (SEQ ID NO: 1) (without first N-terminal Methionine (M) as compared to the full length wild type human MPG (SEQ ID NO: 9) ) or a distinctive version of human MPG mutants (i.e., MPGv0.2 (SEQ ID NO: 4) , MPGv1 (SEQ ID NO: 5) , MPGv2 (SEQ ID NO: 6) , and MPGv3 (SEQ ID NO: 7) ) that were reported in a previous study [11] were constructed. After co-transfecting the G-to-T or A-to-T reporter vector with the gBE expression vector encoding a targeting single guide RNA (targeting sgRNA) that targeted the intronic mis-splicing mutation, it was found that all the gBE showed G-to-T but not A-to-T conversion activity (FIG. 1c and FIG. 10b) . The differences between the normal G and A substrate recognition capability of the tested MPGs might account for the failure of A editing in this system.
As used herein, the term “conversion activity” refers to the activity of the gBE of the disclosure to convert a target deoxyribonucleotide to an outcome deoxyribonucleotide, and the outcome deoxyribonucleotide may be or may not be specified as a specific type of deoxyribonucleotide, e.g., G-to-T. As used herein, the term “(base) editing efficiency” refers to the activity of the gBE of the disclosure to convert a target deoxyribonucleotide to an outcome deoxyribonucleotide, and the outcome deoxyribonucleotide may be or may not be specified as a specific type of deoxyribonucleotide, e.g., G-to-T. In the case that both the outcome deoxyribonucleotides for conversion activity and (base) editing efficiency are not specified, or both the outcome deoxyribonucleotides for conversion activity and (base) editing efficiency are specified as the same one or more specific types of deoxyribonucleotide, they may refer to the same performance of the gBE of the disclosure and can be used interchangeably.
For G base editing, the gBE (hereafter referred to as gGBEv3) (SEQ ID NO: 15) containing MPGv3 (SEQ ID NO: 7) exhibited the highest G-to-T base editing efficiency (4.33%) in cultured HEK293T cells (FIG. 1c) as compared to that of the gGBE (SEQ ID NO: 14) with WT hMPG (SEQ ID NO: 1) (0.03%) , showing a striking 144-fold enhancement in G base editing efficiency. As two negative controls, no conversion activity was found for the gBE (SEQ ID NO: 13) with dead MPG (SEQ ID NO: 3) (dMPG, carrying E125A, Y127A, and H136A mutations) or gGBEv3 together with a nontargeting sgRNA ( “NT” ) (FIG. 10c) . Thus, the G-to-T conversion by gGBEv3 was demonstrated to depend on the catalytic G excision domain of the MPG guided by the specific sgRNA.
Example 2. Further enhancement of G editing efficiency of gGBE
To further increase the G-to-T activity of gGBEv3, protein engineering was performed by several rounds of rational mutagenesis of MPGv3 contained in gGBEv3, and the G-to-T reporter in Example 1 was used to evaluate the G editing efficiency (FIG. 2a) . Based on structural analysis of MPG [32, 33] , the 17-aa-long region from R163 to V179 (R163-V179) of MPGv3 was selected, which forms a pocket around the targeted G base in a model of MPG-DNA complex (FIG. 11a) .
On one hand, gGBEv3 was mutated with G174R or D175R, generating new base editors gGBEv3.1 (SEQ ID NO: 17) (containing MPGv3 carrying additional substitution G174R; termed as MPGv3.1 (SEQ ID NO: 16) ) and gGBEv3.2 (SEQ ID NO: 19) (containing MPGv3 carrying additional substitution D175R; termed as MPGv3.2 (SEQ ID NO: 18) ) (FIG. 2a) . It was found that the G-to-T conversion activity of gGBEv3.2 (10.22%) was about 1.78-fold of gGBEv3 (5.73%) (FIG. 2b and FIG. 11b) .
On the other hand, the R163-V179 region was scanned with sequential substitution to asparagine (Asn/N) (X-to-N). Interestingly, gGBEv3.3 (SEQ ID NO: 21) (containing MPGv3 carrying additional substitution C178N; termed as MPGv3.3 (SEQ ID NO: 20)) (31.37%) was obtained with substantially elevated G-to-T conversion activity of about 5.5-fold compared to gGBEv3 (5.73%) (FIG. 2a and FIG. 11c).
Together, gGBEv4 (SEQ ID NO: 23) (containing MPGv3 carrying additional substitutions of both D175R and C178N; termed as MPGv4 (SEQ ID NO: 22)) (39.57%) achieved a synergistic enhancement of G-to-T editing efficiency of about 6.9-fold compared to gGBEv3 (5.73%) (FIG. 2b and FIG. 11b).
In addition, the orientation of nCas9 and MPG was changed to see whether the editing efficiency would be associated with the positional relationship of nCas9 and MPG. It was found that gGBEv4 with MPG fused at the C-terminus of nCas9 had slightly higher editing efficiency than gGBEv4 with MPG fused at the N-terminus of nCas9 (34.6% vs. 25.9%, FIG. 11d).
Furthermore, another round of mutagenesis and screening was performed based on gGBEv4 to elevate G editing efficiency. The R163-V179 region of MPGv4 was mutated by sequential replacement with amino acids having distinct properties, including glutamic acid (with negative charged side chain) , valine (with small hydrophobic side chain) , glycine (with no side chain) , or tyrosine (with large hydrophobic side chain) (X-to-E, V, G, or Y) . Three base editors, gGBEv4.1 (SEQ ID NO: 25) (containing MPGv4 carrying additional substitution I170V; termed as MPGv4.1 (SEQ ID NO: 24) ) , gGBEv4.2 (SEQ ID NO: 27) (containing MPGv4 carrying additional substitution S169G (or N169G if compared with WT MPG) ; termed as MPGv4.2 (SEQ ID NO: 26) ) , and gGBEv4.3 (SEQ ID NO: 29) (containing MPGv4 carrying additional substitution R163Y (or G163Y if compared with WT MPG) ; termed as MPGv4.3 (SEQ ID NO: 28) ) , were found to show elevated editing efficiency of 1.06-, 1.28-, and 1.09-fold, respectively (FIG. 12) , as compared with that of gGBEv4. Two additional base editors (gGBEv5.1 with S169G and I170V and gGBEv5.2 with R163Y, S169G, and I170Y) were generated by combining the above three effective mutations, but no further enhancement was found compared with gGBEv4.1 and gGBEv4.2 (FIG. 13) .
Encouraged by the above finding on gGBEv3.2, T199R, S230R, Q294R, or D295R was further introduced individually into gGBEv4.2. It was found that gGBEv6.3 (SEQ ID NO: 12) (containing MPGv4.2 plus additional substitution Q294R; termed as MPGv6.3 (SEQ ID NO: 8)) (50.83%) had further increased editing efficiency by about 1.02- and 1,694-fold, as compared with gGBEv4.2 (SEQ ID NO: 27) (49.93%) and gGBE (SEQ ID NO: 14) (0.03%) with WT MPG, respectively (FIG. 2b and FIG. 13). The amino acid sequences of gGBEv5.1, v5.2, v6.1, v6.2, and v6.4 are set forth in SEQ ID NOs: 31, 33, 35, 37, and 39, respectively, and the corresponding MPGv5.1, v5.2, v6.1, v6.2, and v6.4 are set forth in SEQ ID NOs: 30, 32, 34, 36, and 38, respectively.
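The fold changes quoted in this Example are ratios of reporter editing efficiencies. As a minimal sketch (the helper name `fold_change` and the dictionary layout are illustrative assumptions, not part of the disclosure), the percentages stated above can be checked as follows:

```python
def fold_change(efficiency, baseline):
    """Return efficiency / baseline; both are editing efficiencies in percent."""
    if baseline <= 0:
        raise ValueError("baseline efficiency must be positive")
    return efficiency / baseline


# Reporter editing efficiencies (%) as stated in the text of Examples 1 and 2.
EFFICIENCY = {
    "gGBE (WT MPG)": 0.03,
    "gGBEv3": 5.73,
    "gGBEv3.2": 10.22,
    "gGBEv3.3": 31.37,
    "gGBEv4": 39.57,
    "gGBEv4.2": 49.93,
    "gGBEv6.3": 50.83,
}

# (variant, baseline) pairs that the text compares.
COMPARISONS = [
    ("gGBEv3.2", "gGBEv3"),
    ("gGBEv3.3", "gGBEv3"),
    ("gGBEv4", "gGBEv3"),
    ("gGBEv6.3", "gGBEv4.2"),
    ("gGBEv6.3", "gGBE (WT MPG)"),
]

if __name__ == "__main__":
    for variant, baseline in COMPARISONS:
        ratio = fold_change(EFFICIENCY[variant], EFFICIENCY[baseline])
        print(f"{variant} vs. {baseline}: {ratio:.2f}-fold")
```

Because the WT baseline (0.03%) is close to zero, the ratio against it is very sensitive to small measurement differences and is best read as an order-of-magnitude figure.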
The enhancement of G editing efficiency of the gGBEs (v3, v4, v4.2, v6.3, and gGBE with WT MPG) obtained above was next validated at two endogenous genomic sites in cultured HEK293T cells. The cells were transfected with a construct encoding each gGBE, together with mCherry and sgRNA that targeted site 3 or site 10, and mCherry+ cells were FACS-sorted for target deep sequencing analysis. A gradual elevation of overall G editing efficiency was obtained at G7 from 6.4% to 78.5% for site 3, and from 7.5% to 80.3% for site 10, respectively (FIG. 2c and 2d), confirming that gGBEv6.3 was indeed the best version of gGBE. The G-to-C and G-to-T conversions were the predominant events at these two sites, and a small percentage of G-to-A conversion (4.6% for site 3; 5.0% for site 10) and a few insertions and deletions (indels; 8.5% for site 3, 4.3% for site 10) were detected for gGBEv6.3. These results indicate that the four rounds of mutagenesis described above had effectively optimized gGBE activity for G-to-C and G-to-T base editing. Taken together, the engineered gGBEv6.3 (SEQ ID NO: 12) (carrying G163R, N169G, D175R, C178N, S198A, K202A, G203A, S206A, K210A, Q294R mutations on MPGv6.3 (SEQ ID NO: 8)) had the highest G editing efficiency and was used in the following studies.
Example 3. Characterization of gGBEv6.3 at human genomic DNA sites
The editing profile of gGBEv6.3 was characterized by targeting 24 endogenous genomic loci, most of which were used in previous base editing studies [9, 35, 36]. It was found that gGBEv6.3 achieved efficient G base editing (efficiency ranging from 27.6% to 76.5%), with predominantly G-to-C and G-to-T conversions, and essentially no A, C, or T editing at all 24 sites examined (FIG. 3a and 3c and FIG. 14a-c). Among all the cells examined, the ratio of G-to-C/T (G-to-Y, Y = C and T) to G-to-A/T/C conversion was high (ranging from 0.72 to 0.94, FIG. 3b). A small percentage of G-to-A conversion was detected (FIG. 3a and FIG. 14e-g), which is consistent with previous results with AYBE [11] and CGBEs [6-10]. It was found that gGBEv6.3 also induced indels at the 24 edited sites with frequencies ranging from 4.7% to 30.0% (FIG. 14h). Furthermore, it was found that the editable range of gGBEv6.3 was positions 1 to 14, and the optimal editing window with high efficiency of G conversion covered protospacer positions 6 to 11 (FIG. 3c), with the highest editing efficiency at position 7 (FIG. 3c, FIG. 14d and S6i). The analysis of on-target editing and sequences of all the sites showed that gGBEv6.3 had no obvious NG motif preference for G conversions (FIG. 14j).
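The editing window described above can be used to pre-screen a candidate protospacer. The following is a minimal sketch only: the function name, the example sequence, and the assumption that position 1 is the PAM-distal (5') end of a 20-nt protospacer are illustrative and not taken from the disclosure.

```python
def classify_g_positions(protospacer, optimal=(6, 11), editable=(1, 14)):
    """Label each G in a protospacer by its position relative to the
    windows reported for gGBEv6.3 (optimal 6-11, editable 1-14).

    Positions are 1-based; position 1 is assumed to be the PAM-distal end.
    Returns a dict mapping position -> "optimal", "editable", or "outside".
    """
    labels = {}
    for pos, base in enumerate(protospacer.upper(), start=1):
        if base != "G":
            continue
        if optimal[0] <= pos <= optimal[1]:
            labels[pos] = "optimal"
        elif editable[0] <= pos <= editable[1]:
            labels[pos] = "editable"
        else:
            labels[pos] = "outside"
    return labels


if __name__ == "__main__":
    # Hypothetical 20-nt protospacer, not one of the loci in Table 1.
    example = "ACGTACGTACGTACGTACGT"
    for position, label in classify_g_positions(example).items():
        print(position, label)
```

A G labelled "optimal" is only a candidate; actual efficiency at a given site still depends on sequence context and must be measured.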
The guide sequence-dependent off-target base editing efficiency of gGBEv6.3 was analyzed at several previously reported [11, 35] and in silico-predicted [37] guide sequence-dependent off-target sites, and the ability of gGBEv6.3 to mediate guide sequence-independent off-target DNA editing was characterized using orthogonal R-loop assay in five dSaCas9 R-loops, as reported in previous studies [11, 35] . Similar or lower percentage of editing at the guide sequence-dependent off-target loci (FIG. 3d and FIG. 15a) was found, as compared with that of adenine base editors found previously [11, 35] . Moreover, among five guide sequence-independent off-target sites, very low frequencies (1.1%on average) were detected at four sites (FIG. 3e and FIG. 15b) , and a slightly higher frequency was detected at one site, which harbored 12 Gs across the entire protospacer. Taken together, it was demonstrated that gGBEv6.3 represents a highly efficient G-to-Y base editor with a low off-target effect in mammalian cells.
On the other hand, gGBEv6.3 was also tested with A-to-T reporter, C-to-G reporter, and T-to-G reporter, and the editing efficiency was 0.68%, 0.58%, and 1.81%, respectively, demonstrating its specificity for G editing, which is desired for targeted base editing application and reduced unwanted off-target editing.
Example 4. Potential gene-editing applications of gGBE
The G-to-Y conversion ability of gGBE allows for a variety of gene-editing applications, including editing splicing sites, introduction of premature termination codons (PTCs) , as well as editing that bypasses PTCs (FIG. 4a) . The inactive splicing acceptor (SA) signal with disruptive point mutations, exemplified by the intron-split EGFP reporter system used above, could be remediated with gGBE (FIG. 1b) . Conversely, gGBE could be used for disrupting the splicing signal by converting G within a splicing donor site ( “GT” ) or splicing acceptor site ( “AG” ) to other bases, resulting in exon skipping. To illustrate this application, the splice acceptor site of DMD (Duchenne muscular dystrophy) exon 45 was edited with gGBEv6.3, and a high efficiency of G editing (up to 30.3%) was achieved with a high G-to-Y ratio (up to 0.88) when targeting DMD site 1 (FIG. 4b, c and FIG. 16a) .
Another application of gGBE is to introduce PTCs to disrupt gene expression by converting a TCA, TAC, or GAA codon into a stop codon TGA, TAG, or TAA. Note that a PTC by GAA-to-TAA conversion could be introduced only by using gGBEv6.3; no other current base editor could induce this type of PTC. By targeting three sites in the mouse Tyr (Tyrosinase, associated with coat color) gene with gGBEv6.3 to create PTCs (FIG. 4d) in cultured N2a cells, a high efficiency of G editing (up to 46.3%) was achieved with a high G-to-Y ratio (up to 0.95) (FIG. 4e and FIG. 16b). To further illustrate the potential in vivo application, gGBEv6.3-encoding mRNA and Tyr-targeting sgRNA were co-injected into mouse zygotes of C57BL6 background (black coat color), with 20 mouse embryos used for each of the three Tyr-targeting sgRNAs. Highly efficient G editing (FIG. 17a) was found for two of the three sgRNAs, with an average of 50.9% PTC introduction efficiency when targeting the Tyr site 3 (FIG. 17b-c). Similar to the data obtained above in cultured cell lines (FIG. 4e), gGBE induced very few indels in mouse embryos (FIG. 17d-e). When targeting the Tyr site 3 with gGBEv6.3, all the 21 F0 pups had G conversions with an average of 59.4% efficiency (FIG. 4f-h). This gGBEv6.3-induced G-to-Y editing resulted in the albino or mosaic phenotype in the coat color of the F0 mice (FIG. 18), suggesting efficient disruption of the tyrosinase activity (FIG. 4i). Thus, gGBEv6.3 was demonstrated to be an efficient G base editor not only in cell lines but also in mouse embryos.
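The codon changes named above (TCA to TGA, TAC to TAG, GAA to TAA) follow from the fact that a G-to-Y edit can act on either DNA strand. As a minimal sketch under stated assumptions (single-base edits only, standard stop codons, no modelling of PAM availability or the editing window; all names are hypothetical), the sense codons that a single G-to-Y edit can turn into a stop codon can be enumerated:

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Coding-strand outcomes of a G-to-Y edit:
#   G on the coding strand edited directly   -> G>C or G>T
#   G on the complementary strand edited     -> coding-strand C>G or C>A
CODING_STRAND_CHANGES = {"G": ("C", "T"), "C": ("G", "A")}


def ptc_conversions():
    """Return (sense_codon, stop_codon) pairs reachable by one G-to-Y edit."""
    hits = []
    for codon in ("".join(bases) for bases in product("ACGT", repeat=3)):
        if codon in STOP_CODONS:
            continue  # only sense codons are of interest
        for i, base in enumerate(codon):
            for new_base in CODING_STRAND_CHANGES.get(base, ()):
                edited = codon[:i] + new_base + codon[i + 1:]
                if edited in STOP_CODONS:
                    hits.append((codon, edited))
    return hits


if __name__ == "__main__":
    for sense, stop in ptc_conversions():
        print(f"{sense} -> {stop}")
```

The list includes the three conversions named in the text; whether a particular codon in a gene is actually targetable depends on a suitable PAM and on the position of the G within the protospacer.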
DISCUSSION
Two major classes of deaminase-based base editors (dBE) , ABE and CBE, as well as their derivatives (such as AYBE and CGBE) , perform base editing with deamination of A or C as the first key step [3–11] . In the disclosure, deaminase-free base editors were designed based on engineered MPG, and a gGBE editor that could achieve highly efficient G-to-C and G-to-T conversion in both cultured human cells and mouse embryos was generated. The engineered MPG demonstrates that DNA glycosylases could be engineered into proteins that selectively excise a specific nucleotide base, such as, G. The high editing efficiency of the gGBE of the disclosure could be attributed to the mutations in the MPG moiety that may facilitate its specific substrate selection or DNA-binding activity, or both.
Among all human pathogenic SNPs (60,372 total), about 10% are C-to-G and about 5% are T-to-G SNPs [11]. Although C-to-G SNPs could be corrected by CGBE [6–10], an efficient sgRNA often could not be found due to the PAM limitation and narrow editing window. The gGBE of the disclosure could increase the opportunity to find an efficient sgRNA by targeting the opposite strand compared with CGBE. For T-to-G SNPs, no current base editor could efficiently induce G-to-T (or C-to-A in the opposite strand) conversion. Therefore, the gGBE of the disclosure greatly broadens the targeting scope of base editors.
Methods
Molecular cloning
Base editor constructs used in this study were cloned into a mammalian expression plasmid backbone under the control of an EF1α promoter by standard molecular cloning techniques. Intron-split EGFP reporters were engineered as previously described [11]. In brief, corresponding mutations at the splice acceptor site were made to construct the A-to-T reporter or G-to-T reporter, respectively, via site-directed mutagenesis by PCR. Mutations at the splice acceptor site led to inactive EGFP production by non-spliced EGFP transcripts. Encouraged by the findings from previous base editors [7, 10], modification of the 68th base (G-to-C) was made in the intron sequence for introducing an artificial PAM on the template strand, so that the corresponding mutations at the splice acceptor site were at position 6 across the protospacer. Transversion corrections in the A-to-T reporter or G-to-T reporter were required for proper splicing of the EGFP-coding sequence. MPG mutagenesis libraries were designed and generated as previously described [46]. The 17-aa-long region from R163 to V179 (R163-V179) of MPGv3 was selected for protein engineering. BpiI-harboring MPG mutants, MPG-G174R/D175R/T199R/S230R/Q294R/D295R mutants or corresponding combinations, were constructed via site-directed mutagenesis by PCR. Sequential asparagine/glutamic acid/valine/glycine/tyrosine substitutions (X-to-N, E, V, G, or Y) were designed, with oligos coding for the mutants annealed and ligated into corresponding BpiI-digested backbone vectors. The gRNA oligos were annealed and ligated into BpiI sites. Unless otherwise indicated, each and every mutation in MPG is numbered based on the full-length wild type human MPG (SEQ ID NO: 9) with the first N-terminal Met.
Analysis of target sequencing data
Target sequencing data analysis was performed as described previously [11]. In brief, the targeted amplicon sequencing reads were processed using fastp with default parameters [47]. The cleaned pairs were then merged using FLASH v1.2.11. The amplified sequences from individual targets were demultiplexed using fastx_barcode_splitter.pl from fastx_toolkit (0.0.14). Further amplicon sequencing analysis was performed by CRISPResso2 [48]. A 10-bp window was used to quantify modifications centered around the middle of the 20-bp gRNA. Otherwise, the default parameters were used for analysis. G-to-C purity was calculated as G-to-C editing efficiency / (G-to-C editing efficiency + G-to-T editing efficiency + G-to-A editing efficiency). G-to-Y conversion ratio was calculated as (G-to-C editing efficiency + G-to-T editing efficiency) / (G-to-C editing efficiency + G-to-T editing efficiency + G-to-A editing efficiency).
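The two summary metrics defined above can be expressed directly in code. This is a minimal sketch: the function names and the input values in the usage lines are hypothetical, and the handling of a zero denominator (returning None) is an assumption that the text does not address.

```python
def g_to_c_purity(g_to_c, g_to_t, g_to_a):
    """G-to-C purity = G>C / (G>C + G>T + G>A); inputs are efficiencies in %."""
    total = g_to_c + g_to_t + g_to_a
    if total == 0:
        return None  # no G conversion detected; purity is undefined
    return g_to_c / total


def g_to_y_ratio(g_to_c, g_to_t, g_to_a):
    """G-to-Y ratio = (G>C + G>T) / (G>C + G>T + G>A); inputs are efficiencies in %."""
    total = g_to_c + g_to_t + g_to_a
    if total == 0:
        return None  # no G conversion detected; ratio is undefined
    return (g_to_c + g_to_t) / total


if __name__ == "__main__":
    # Hypothetical per-site efficiencies (%), for illustration only.
    c, t, a = 40.0, 30.0, 5.0
    print(f"G-to-C purity: {g_to_c_purity(c, t, a):.3f}")
    print(f"G-to-Y ratio:  {g_to_y_ratio(c, t, a):.3f}")
```

Both metrics share the same denominator, so the G-to-Y ratio is never smaller than the G-to-C purity for the same site.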
Cell culture, Transfection, and flow cytometry analysis
HEK293T cells were cultured with DMEM (Catalog#11995065, Gibco) supplemented with 10% fetal bovine serum (Catalog#04-001-1ACS, BI) and 0.1 mM non-essential amino acids (Catalog#11140-050, Gibco). Cells were grown in an incubator at 37 ℃ with 5% CO2. MPG mutant screening was conducted in 48-well plates. The day before transfection, 3 × 10^4 HEK293T cells per well were plated in 250 μL of complete growth medium in the 48-well plates. After 12 h, 500 ng gGBE plasmids and 500 ng A-to-T or G-to-T reporter plasmids were co-transfected into cells with 2 μg polyethylenimine (PEI) (DNA/PEI ratio of 1:2) per well. For cell transfection of HEK293T for FACS, 5 × 10^5 cells per well were plated in 12-well plates with 1 ml complete growth medium the day before transfection. After 14-16 h, 2 μg gGBE-sgRNA plasmids were transfected into cells using PEI (DNA/PEI ratio of 1:2). For targeting DMD site 2 with an NG PAM, a PAM-flexible Cas9 variant SpG was used. Orthogonal R-loop assays were performed as described previously [1, 2]. In brief, 1 μg of gGBE plasmid with sgRNA targeting site 3 and 1 μg of dSaCas9 plasmid with corresponding sgRNA targeting five off-target sites to generate R-loops were co-transfected into HEK293T cells in 12-well plates using PEI (DNA/PEI ratio of 1:2). At 48 h after transfection, mCherry, BFP and EGFP fluorescence was analyzed by BD FACS Aria III or Beckman CytoFLEX S. Flow cytometry results were analyzed with FlowJo V10.5.3. The gating strategy for the identification of mCherry+, BFP+ and EGFP+ cells for on-target editing efficiency evaluation is shown in FIG. 10d.
Animals
Experiments involving mice were approved by the Biomedical Research Ethics Committee of Center for HuidaGene Therapeutics Co. Ltd. Super-ovulated C57BL/6 females (4 weeks old) were mated with C57BL/6 males (8 weeks old), and females from the ICR strain were used as foster mothers. Mice were maintained in a specific pathogen-free facility under a 12-hour dark–light cycle at constant temperature (20–26℃) and humidity (40–60%).
In vitro transcription of gGBE mRNA and Tyr-sgRNAs
The gGBE plasmids were constructed by standard PCR amplification with Phanta Max Super-Fidelity DNA Polymerase (Vazyme Biotech Co., Ltd), assembly with Gibson Assembly Master Mix (NEB E2611L), and transformation into chemically competent DH5α cells. The gGBE plasmids were linearized by the FastDigest KpnI restriction enzyme (Thermo Fisher), purified using Gel Extraction Kit (Omega), and used as the template for in vitro transcription (IVT) using the mMESSAGE mMACHINE T7 Ultra kit (Life Technologies). For Tyr-sgRNA preparation, the T7 promoter sequence was added to the sgRNA template by PCR amplification of px330 (Addgene, #42230). The T7-Tyr-sgRNA PCR product was purified using Gel Extraction Kit (Omega) and used as the template for IVT of sgRNAs using the MEGAshortscript T7 kit (Life Technologies). The gGBE mRNA and Tyr-sgRNAs were purified using the MEGAclear kit (Life Technologies) and eluted in RNase-free water. In vitro transcribed RNAs were aliquoted and stored at -80℃ until use. Prior to microinjection, the mixture of gGBE mRNA and Tyr-sgRNA was centrifuged for 10 min at 14,000 rpm at 4℃, and the supernatant was transferred to fresh 0.2 mL PCR tubes for injection.
Microinjection of mouse zygotes with gGBE mRNA and Tyr-sgRNA
Super-ovulated C57BL/6 females (4 weeks old) were mated with C57BL/6 males, and fertilized embryos were collected from oviducts 21 h post hCG injection. For zygote injection, the mixture of gGBE mRNA (100 ng/μL) and Tyr-sgRNAs (100 ng/μL) was injected into the cytoplasm of 1-cell embryos in a droplet of M2 medium using a FemtoJet microinjector (Eppendorf) with constant flow settings. The injected embryos were cultured in M16 medium with amino acids at 37℃ under 5% CO2 in air for 2 hours and then transferred into oviducts of pseudo-pregnant ICR foster mothers at 0.5 d.p.c.
Target sequencing of endogenous sites
At 72 h post-transfection, 10,000 mCherry-positive cells were isolated by FACS. Genomic DNA was extracted by addition of 40 μL of lysis buffer and 1 μL Proteinase K (Catalog#PD101-01, Vazyme) directly into each tube of sorted cells. The genomic DNA/lysis buffer mixture was incubated at 55 ℃ for 45 min, followed by a 95 ℃ enzyme inactivation step for 10 min. The regions of interest for target sites were amplified by PCR using site-specific primers. The PCR reaction was performed at 95 ℃ for 5 min, followed by 30 cycles of 95 ℃ for 15 s, 60 ℃ for 15 s, and 72 ℃ for 30 s, and a final extension at 72 ℃ for 5 min, using Phanta Max Super-Fidelity DNA Polymerase (Catalog#P505-d3, Vazyme). PCR products were purified using universal DNA purification kit (TIANGEN) according to the manufacturer’s instructions, and analyzed by Sanger sequencing (Genewiz). The amplicons were ligated to adapters and sequencing was performed on the Illumina MiSeq platforms. Protospacer sequences used for each genomic locus are listed in Table 1.
Statistical analysis
Statistical tests performed by Graphpad Prism 8 included the two-tailed unpaired two-sample t-test or Dunnett's multiple comparisons test after one-way ANOVA.
REFERENCES
1. Porto EM, Komor AC, Slaymaker IM, et al.; Base editing: advances and therapeutic opportunities. Nat Rev Drug Discov 2020; 19 (12) : 839-859. doi: 10.1038/s41573-020-0084-6.
2. Rees HA, Liu DR; Base editing: precision chemistry on the genome and transcriptome of living cells. Nat Rev Genet 2018; 19 (12) : 770-788. doi: 10.1038/s41576-018-0059-1.
3. Komor AC, Zhao KT, Packer MS, et al.; Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci Adv 2017; 3 (8) : eaao4774. doi: 10.1126/sciadv.aao4774.
4. Komor AC, Kim YB, Packer MS, et al.; Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 2016; 533 (7603) : 420-4. doi: 10.1038/nature17946.
5. Gaudelli NM, Komor AC, Rees HA, et al.; Programmable base editing of A*T to G*C in genomic DNA without DNA cleavage. Nature 2017; 551 (7681) : 464-471. doi: 10.1038/nature24644.
6. Zhao D, Li J, Li S, et al.; Glycosylase base editors enable C-to-A and C-to-G base changes. Nat Biotechnol 2021; 39 (1) : 35-40. doi: 10.1038/s41587-020-0592-2.
7. Kurt IC, Zhou R, Iyer S, et al.; CRISPR C-to-G base editors for inducing targeted DNA transversions in human cells. Nat Biotechnol 2021; 39 (1) : 41-46. doi: 10.1038/s41587-020-0609-x.
8. Koblan LW, Arbab M, Shen MW, et al.; Efficient C*G-to-G*C base editors developed using CRISPRi screens, target-library analysis, and machine learning. Nat Biotechnol 2021; 39 (11) : 1414-1425. doi: 10.1038/s41587-021-00938-z.
9. Chen L, Park JE, Paa P, et al.; Programmable C: G to G: C genome editing with CRISPR-Cas9-directed base excision repair proteins. Nat Commun 2021; 12 (1) : 1384. doi: 10.1038/s41467-021-21559-9.
10. Yuan T, Yan N, Fei T, et al.; Optimization of C-to-G base editors with sequence context preference predictable by machine learning methods. Nat Commun 2021; 12 (1) : 4902. doi: 10.1038/s41467-021-25217-y.
11. Tong H, Wang X, Liu Y, et al.; Programmable A-to-Y base editing by fusing an adenine base editor with an N-methylpurine DNA glycosylase. Nat Biotechnol 2023. doi: 10.1038/s41587-022-01595-6.
12. Mok BY, de Moraes MH, Zeng J, et al.; A bacterial cytidine deaminase toxin enables CRISPR-free mitochondrial base editing. Nature 2020; 583 (7817) : 631-637. doi: 10.1038/s41586-020-2477-4.
13. Lei Z, Meng H, Liu L, et al.; Mitochondrial base editor induces substantial nuclear off-target mutations. Nature 2022; 606 (7915) : 804-811. doi: 10.1038/s41586-022-04836-5.
14. Cho SI, Lee S, Mok YG, et al.; Targeted A-to-G base editing in human mitochondrial DNA with programmable deaminases. Cell 2022; 185 (10) : 1764-1776 e12. doi: 10.1016/j.cell.2022.03.039.
15. Ladstatter S, Tachibana-Konwalski K; A Surveillance Mechanism Ensures Repair of DNA Lesions during Zygotic Reprogramming. Cell 2016; 167 (7) : 1774-1787 e13. doi: 10.1016/j.cell.2016.11.009.
16. Marrone A, Ballantyne J; Hydrolysis of DNA and its molecular components in the dry state. Forensic Sci Int Genet 2010; 4 (3) : 168-77. doi: 10.1016/j.fsigen.2009.08.007.
17. Lindahl T; Instability and decay of the primary structure of DNA. Nature 1993; 362 (6422) : 709-15. doi: 10.1038/362709a0.
18. Lee SJ, Choi MY, Miller RE; Vibrational spectroscopy of xanthine in superfluid helium nanodroplets. Chemical Physics Letters 2009; 475 (1) : 24-29. doi: 10.1016/j.cplett.2009.05.016.
19. Hindi NN, Elsakrmy N, Ramotar D; The base excision repair process: comparison between higher and lower eukaryotes. Cell Mol Life Sci 2021; 78 (24) : 7943-7965. doi: 10.1007/s00018-021-03990-9.
20. Thompson PS, Cortez D; New insights into abasic site repair and tolerance. DNA Repair (Amst) 2020; 90: 102866. doi: 10.1016/j.dnarep.2020.102866.
21. Bauer NC, Corbett AH, Doetsch PW; The current state of eukaryotic DNA base damage and repair. Nucleic Acids Res 2015; 43 (21) : 10083-101. doi: 10.1093/nar/gkv1136.
22. Jacobs AL, Schar P; DNA glycosylases: in DNA repair and beyond. Chromosoma 2012; 121 (1) : 1-20. doi: 10.1007/s00412-011-0347-4.
23. Robertson AB, Klungland A, Rognes T, et al.; DNA repair in mammalian cells: Base excision repair: the long and short of it. Cell Mol Life Sci 2009; 66 (6) : 981-93. doi: 10.1007/s00018-009-8736-z.
24. Gallagher PE, Brent TP; Partial purification and characterization of 3-methyladenine-DNA glycosylase from human placenta. Biochemistry 1982; 21 (25) : 6404-9. doi: 10.1021/bi00268a013.
25. Asaeda A, Ide H, Asagoshi K, et al.; Substrate specificity of human methylpurine DNA N-glycosylase. Biochemistry 2000; 39 (8) : 1959-65. doi: 10.1021/bi9917075.
26. Lau AY, Wyatt MD, Glassner BJ, et al.; Molecular basis for discriminating between normal and damaged bases by the human alkyladenine glycosylase, AAG. Proc Natl Acad Sci U S A 2000; 97 (25) : 13573-8. doi: 10.1073/pnas.97.25.13573.
27. Wyatt MD, Samson LD; Influence of DNA structure on hypoxanthine and 1, N (6) -ethenoadenine removal by murine 3-methyladenine DNA glycosylase. Carcinogenesis 2000; 21 (5) : 901-8. doi: 10.1093/carcin/21.5.901.
28. Saparbaev M, Laval J; Excision of hypoxanthine from DNA containing dIMP residues by the Escherichia coli, yeast, rat, and human alkylpurine DNA glycosylases. Proc Natl Acad Sci U S A 1994; 91 (13) : 5873-7. doi: 10.1073/pnas.91.13.5873.
29. O'Brien PJ, Ellenberger T; Dissecting the broad substrate specificity of human 3-methyladenine-DNA glycosylase. J Biol Chem 2004; 279 (11) : 9750-7. doi: 10.1074/jbc.M312232200.
30. Berdal KG, Johansen RF, Seeberg E; Release of normal bases from intact DNA by a native DNA repair enzyme. EMBO J 1998; 17 (2) : 363-7. doi: 10.1093/emboj/17.2.363.
31. Kay JE, Corrigan JJ, Armijo AL, et al.; Excision of mutagenic replication-blocking lesions suppresses cancer but promotes cytotoxicity and lethality in nitrosamine-exposed mice. Cell Rep 2021; 34 (11) : 108864. doi: 10.1016/j.celrep.2021.108864.
32. Vallur AC, Maher RL, Bloom LB; The efficiency of hypoxanthine excision by alkyladenine DNA glycosylase is altered by changes in nearest neighbor bases. DNA Repair (Amst) 2005; 4 (10) : 1088-98. doi: 10.1016/j.dnarep.2005.05.008.
33. Lau AY, Scharer OD, Samson L, et al.; Crystal structure of a human alkylbase-DNA repair enzyme complexed to DNA: mechanisms for nucleotide flipping and base excision. Cell 1998; 95 (2) : 249-58. doi: 10.1016/s0092-8674(00)81755-9.
34. Connor EE, Wyatt MD; Active-site clashes prevent the human 3-methyladenine DNA glycosylase from improperly removing bases. Chem Biol 2002; 9 (9) : 1033-41. doi: 10.1016/s1074-5521(02)00215-6.
35. Richter MF, Zhao KT, Eton E, et al.; Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat Biotechnol 2020; 38 (7) : 883-891. doi: 10.1038/s41587-020-0453-z.
36. Doman JL, Raguram A, Newby GA, et al.; Evaluation and minimization of Cas9-independent off-target DNA editing by cytosine base editors. Nat Biotechnol 2020; 38 (5) : 620-628. doi: 10.1038/s41587-020-0414-6.
37. Bae S, Park J, Kim JS; Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 2014; 30 (10) : 1473-5. doi: 10.1093/bioinformatics/btu048.
38. Zuo E, Sun Y, Wei W, et al.; GOTI, a method to identify genome-wide off-target effects of genome editing in mouse embryos. Nat Protoc 2020; 15 (9) : 3009-3029. doi: 10.1038/s41596-020-0361-1.
39. Zuo E, Sun Y, Wei W, et al.; Cytosine base editor generates substantial off-target single-nucleotide variants in mouse embryos. Science 2019; 364 (6437) : 289-292. doi: 10.1126/science.aav9973.
40. Yan N, Feng H, Sun Y, et al.; Cytosine base editors induce off-target mutations and adverse phenotypic effects in transgenic mice. Nat Commun 2023; 14 (1) : 1784. doi: 10.1038/s41467-023-37508-7.
41. Lei Z, Meng H, Lv Z, et al.; Detect-seq reveals out-of-protospacer editing and target-strand editing by cytosine base editors. Nat Methods 2021; 18 (6) : 643-651. doi: 10.1038/s41592-021-01172-w.
42. Chen L, Zhang S, Xue N, et al.; Engineering a precise adenine base editor with minimal bystander editing. Nat Chem Biol 2023; 19 (1) : 101-110. doi: 10.1038/s41589-022-01163-8.
43. Kim YB, Komor AC, Levy JM, et al.; Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat Biotechnol 2017; 35 (4) : 371-376. doi: 10.1038/nbt.3803.
44. Wang Y, Zhao D, Sun L, et al.; Engineering of the Translesion DNA Synthesis Pathway Enables Controllable C-to-G and C-to-A Base Editing in Corynebacterium glutamicum. ACS Synth Biol 2022; 11 (10) : 3368-3378. doi: 10.1021/acssynbio.2c00265.
45. Sun N, Zhao D, Li S, et al.; Reconstructed glycosylase base editors GBE2.0 with enhanced C-to-G base editing efficiency and purity. Mol Ther 2022; 30 (7) : 2452-2463. doi: 10.1016/j.ymthe.2022.03.023.
46. Tong H, Huang J, Xiao Q, et al.; High-fidelity Cas13 variants for targeted RNA degradation with minimal collateral effects. Nat Biotechnol 2023; 41 (1) : 108-119. doi: 10.1038/s41587-022-01419-7.
47. Chen S, Zhou Y, Chen Y, et al.; fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 2018; 34 (17) : i884-i890. doi: 10.1093/bioinformatics/bty560.
48. Clement K, Rees H, Canver MC, et al.; CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat Biotechnol 2019; 37 (3) : 224-226. doi: 10.1038/s41587-019-0032-3.
Example 5. Development of orthogonal base editors based on engineered glycosylases
Encouraged by the development of gGBE in the above Examples, thymine and cytosine base editors were developed using the deaminase-free glycosylase-based strategy. Since the three pyrimidine bases (i.e., T, C, and U) are structurally similar, it was speculated that the excision of canonical T or C could be achieved by engineering certain uracil DNA glycosylase (UNG). The excision of T or C would generate an apurinic/apyrimidinic (AP) site, then trigger the base excision repair (BER) pathway and facilitate direct T editing or C editing (FIG. 19a-b).
Alternative splicing as well as transcription from two distinct start sites leads to two different human UNG isoforms, mitochondrial UNG1 (304 amino acids, aa) (SEQ ID NO: 54) and nuclear UNG2 (313 aa) (SEQ ID NO: 133), each possessing a unique N-terminus that mediates translocation to mitochondria or nucleus 16 (FIG. 25). The sequence alignment of human UNG1 and human UNG2 shows that the amino acid residues at positions 1-35 of UNG1 are different from the amino acid residues at positions 1-44 of UNG2, and the remaining parts of UNG1 and UNG2 are identical.
Two human UNG1 variants, UNG1-Y147A and UNG1-N204D, have been engineered to excise T and C in DNA, respectively17. The residues Y156 and N213 of UNG2 correspond to the residues Y147 and N204 of UNG1, respectively, as determined by sequence alignment of UNG1 and UNG2. To edit nuclear DNA, two prototype gBEs, a deaminase-free glycosylase-based thymine base editor (gTBE) (gTBEv0.1; SEQ ID NO: 136) and a deaminase-free glycosylase-based cytosine base editor (gCBE) (gCBEv0.1; SEQ ID NO: 138), were generated by fusing UNG2-Y156A (SEQ ID NO: 135) and UNG2-N213D (SEQ ID NO: 137) (wherein Y156A and N213D of human UNG2 are equivalent to Y147A and N204D of human UNG1, respectively) (with its N-terminal Met retained) at the C-terminus of Cas9-D10A nickase (nCas9) (SEQ ID NO: 2) via a linker (SEQ ID NO: 134), respectively (FIG. 19a-c).
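Because UNG1 and UNG2 differ only in their isoform-specific N-termini (35 and 44 residues, respectively), residue numbers in the shared region differ by a constant offset of 9. The following minimal sketch converts between the two numbering schemes; the function and constant names are hypothetical, and the offset is inferred from the alignment described above rather than stated as such in the disclosure.

```python
UNG1_UNIQUE_N_TERM = 35  # residues 1-35 are specific to UNG1
UNG2_UNIQUE_N_TERM = 44  # residues 1-44 are specific to UNG2
OFFSET = UNG2_UNIQUE_N_TERM - UNG1_UNIQUE_N_TERM  # 9 residues


def ung1_to_ung2(position):
    """Convert a UNG1 residue number to UNG2 numbering (shared region only)."""
    if position <= UNG1_UNIQUE_N_TERM:
        raise ValueError("position lies in the UNG1-specific N-terminus")
    return position + OFFSET


def ung2_to_ung1(position):
    """Convert a UNG2 residue number to UNG1 numbering (shared region only)."""
    if position <= UNG2_UNIQUE_N_TERM:
        raise ValueError("position lies in the UNG2-specific N-terminus")
    return position - OFFSET


if __name__ == "__main__":
    for label, pos in (("Y", 147), ("N", 204)):
        print(f"UNG1 {label}{pos} corresponds to UNG2 {label}{ung1_to_ung2(pos)}")
```

Positions inside either isoform-specific N-terminus have no counterpart in the other isoform, which is why the helpers raise an error there instead of returning a number.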
T-to-G reporter and C-to-G reporter, two similar intron-split EGFP reporter systems as reported previously9, were established to evaluate the editing efficiency of gTBE and gCBE, respectively (FIG. 26a) . In these reporters, the AG-to-AT or AG-to-AC inactive splicing acceptor (SA) could only be remediated with T-to-G or C-to-G conversion, which leads to correct splicing of EGFP-coding sequence and EGFP expression and emission of EGFP signals (FIG. 26b) . The gTBE or gCBE encoding vector was co-transfected with the T-to-G or C-to-G reporter vector containing a targeting single-guide RNA (targeting sgRNA) that targeted the corresponding mis-splicing mutations.
It was observed that as compared to a negative control, gTBEv0.1 showed T-to-G conversion activity, and gCBEv0.1 showed C-to-G conversion activity (FIG. 19c-e) .
Given that the disordered N-terminal domain (NTD) of UNG contains protein binding motifs and sites for post-translational modifications18, which might constrain targeted excision activity of the glycosylase domain in ssDNA19, 20, gTBEv0.2 (SEQ ID NO: 140) (with UNG2Δ88-Y156A (SEQ ID NO: 139) fused at the C-terminus of nCas9) and gCBEv0.2 (SEQ ID NO: 142) (with UNG2Δ88-N213D (SEQ ID NO: 141) fused at the C-terminus of nCas9) were constructed (FIG. 19c) by deleting 88 amino acid residues (denoted as “Δ88” ) at the N-terminus of UNG2 to eliminate potential undesired protein-protein interactions20-22. As noted, with the N-terminal deletion of 88 amino acid residues, the remaining part of human UNG2 would be identical to the remaining part of human UNG1 with the corresponding N-terminal deletion of 79 amino acid residues, and in such a case, both UNG2 and UNG1 may be termed UNG unless the context requires distinguishing UNG2 from UNG1. Unless otherwise indicated, amino acid mutations in UNG are numbered based on full-length human UNG2.
gTBEv0.2 exhibited T-to-G conversion activity comparable to that of gTBEv0.1 (1.0% vs. 1.1%, FIG. 19d) , while gCBEv0.2 exhibited significantly increased C-to-G conversion activity compared with gCBEv0.1 (13.3% vs. 1.0%, FIG. 19e) . Moreover, both gTBEv0.3 (SEQ ID NO: 143) (with UNG2Δ88-Y156A fused at the N-terminus of nCas9) and gCBEv0.3 (SEQ ID NO: 154) (with UNG2Δ88-N213D fused at the N-terminus of nCas9) showed much higher editing efficiency than gTBEv0.2 and gCBEv0.2 with the UNG2 mutant fused at the C-terminus of nCas9 (10.2% vs. 1.0% for gTBE and 51.4% vs. 13.3% for gCBE; FIG. 19c-e) , an enhancement in editing efficiency of about 10-fold and 3.9-fold, respectively. This indicated that the N’-UNG-nCas9-C’ configuration is preferable for both gTBE and gCBE. Unless otherwise indicated, this configuration was used in all the subsequent Examples.
For the negative control, no editing was detected for any of the above-mentioned gTBE and gCBE when combined with a nontargeting sgRNA (FIG. 19d-e) .
In addition, various versions of N-terminal deletion were applied to UNG2 of gTBE and gCBE. For gTBE, it was demonstrated that the deletion of 88 amino acid residues in gTBEv0.3 achieved significantly higher editing efficiency than the deletion of 44, 72, or 93 amino acid residues; and for gCBE, it was demonstrated that the deletion of 88 amino acid residues in gCBEv0.3 achieved editing efficiency comparable to the deletion of 93 amino acid residues and significantly higher than the deletion of 44 or 72 amino acid residues (FIG. 27) .
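The numbering relationship between the two isoforms can be sketched in a few lines. This is only an illustration of the alignment statement above (UNG1 positions 1-35 and UNG2 positions 1-44 differ, the rest is identical); the function names are hypothetical and not part of the disclosure.

```python
# Illustrative sketch: renumbering residues between human UNG1 and UNG2.
# Assumption taken from the text: UNG1 residues 1-35 and UNG2 residues 1-44
# are isoform-specific, and the remaining residues are identical.
UNG1_UNIQUE_NTERM = 35
UNG2_UNIQUE_NTERM = 44
OFFSET = UNG2_UNIQUE_NTERM - UNG1_UNIQUE_NTERM  # 9 residues


def ung1_to_ung2(position: int) -> int:
    """Convert a UNG1 residue number to UNG2 numbering (shared region only)."""
    if position <= UNG1_UNIQUE_NTERM:
        raise ValueError("position lies in the UNG1-specific N-terminus")
    return position + OFFSET


def ung2_to_ung1(position: int) -> int:
    """Convert a UNG2 residue number to UNG1 numbering (shared region only)."""
    if position <= UNG2_UNIQUE_NTERM:
        raise ValueError("position lies in the UNG2-specific N-terminus")
    return position - OFFSET


def convert_mutation(mutation: str, to: str = "UNG2") -> str:
    """Renumber a substitution written as <wild type><position><new>, e.g. 'Y147A'."""
    wild_type, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    converted = ung1_to_ung2(position) if to == "UNG2" else ung2_to_ung1(position)
    return f"{wild_type}{converted}{new}"


print(convert_mutation("Y147A"))            # UNG1 -> UNG2 numbering
print(convert_mutation("N204D"))            # UNG1 -> UNG2 numbering
print(ung2_to_ung1(88))                     # last residue removed by the UNG2 deletion, in UNG1 numbering
```

Under these assumptions the 9-residue offset also accounts for the length difference between the isoforms (304 aa vs. 313 aa) and for the equivalence of the 88-residue deletion in UNG2 and the 79-residue deletion in UNG1.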
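As a quick arithmetic check, the fold enhancements quoted above follow directly from the reported percentages. The helper name below is hypothetical and the snippet uses only the values stated in this paragraph.

```python
# Arithmetic check of the fold enhancements, using only the percentages
# reported above (N-terminal fusion v0.3 vs. C-terminal fusion v0.2).
def fold_change(new_pct: float, old_pct: float) -> float:
    """Ratio of two editing efficiencies given in percent."""
    return new_pct / old_pct


gtbe_fold = fold_change(10.2, 1.0)    # gTBEv0.3 vs. gTBEv0.2
gcbe_fold = fold_change(51.4, 13.3)   # gCBEv0.3 vs. gCBEv0.2

print(f"gTBE: {gtbe_fold:.1f}-fold, gCBE: {gcbe_fold:.1f}-fold")
```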
Furthermore, the orthogonality of gTBE and gCBE for base editing was examined. Although engineered from the same original DNA glycosylase UNG, no C editing efficiency was observed for gTBEv0.3 and no T editing efficiency was observed for gCBEv0.3 (FIG. 19f) . Thus, two orthogonal base editors were developed, gTBE for direct T editing and gCBE for C editing.
Example 6. Evolution of gTBE with enhanced editing efficiency
To further increase the T-to-G conversion activity of gTBEv0.3, rational mutagenesis was performed for engineering the UNG moiety, using the T-to-G reporter to evaluate the editing efficiency in cultured mammalian cells (HEK293T cells) (FIG. 20a) . Based on structural and functional analysis, WT UNG contains five conserved motifs required for efficient glycosylase activity: the catalytic water-activating loop, the proline-rich loop, the uracil-binding motif, the glycine-serine motif, and the leucine loop23-25 (FIG. 25b) . Since Y156 in the catalytic water-activating loop and N213 in the uracil-binding motif are critical for activity switch from U excision to T or C excision, sequential and spatial neighbors of these two residues were selected for examination of their roles in the regulation of base excision activity (FIG. 20a-b) . Alanine-scanning mutagenesis was conducted by replacing all non-alanine with alanine (X-to-A) and alanine with valine (A-to-V) to cover all the residues in the regions of I150-L179 and L210-T217.
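The scanning scheme described above (X-to-A for non-alanine residues, A-to-V for alanine) can be sketched as a simple enumeration. The peptide used here is a hypothetical placeholder, not the UNG sequence, and the function name is an assumption made for illustration.

```python
# Illustrative sketch of the alanine-scanning scheme: every non-alanine
# residue in a region is replaced by alanine, and every alanine by valine.
def alanine_scan(sequence: str, start: int, end: int, first_position: int = 1) -> list:
    """List substitutions (e.g. 'K2A') for residues start..end, inclusive.

    `first_position` is the residue number assigned to sequence[0].
    """
    mutations = []
    for position in range(start, end + 1):
        wild_type = sequence[position - first_position]
        replacement = "V" if wild_type == "A" else "A"
        mutations.append(f"{wild_type}{position}{replacement}")
    return mutations


# Hypothetical 8-residue peptide used only to demonstrate the enumeration.
toy_peptide = "MKTAYLAG"
scan = alanine_scan(toy_peptide, start=2, end=7)
print(scan)
```

Applied to a real region, `first_position` would be set so that the numbering matches full-length human UNG2, as used throughout this section.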
Interestingly, gTBEv1.1 (SEQ ID NO: 145) (v0.3 plus A214V) with UNG2Δ88-Y156A+A214V (SEQ ID NO: 144) was obtained, with T-to-G conversion activity elevated about 2.68-fold as compared with gTBEv0.3 (FIG. 28a) . To check whether any amino acid at position 214 would perform better than valine, site-saturation mutagenesis focusing on the residue at position 214 was further performed. gTBEv1.2 (SEQ ID NO: 147) (v0.3 plus A214T) with UNG2Δ88-Y156A+A214T (SEQ ID NO:146) was obtained, with editing efficiency of about 1.06-fold in comparison with the T editing efficiency of gTBEv1.1 (FIG. 28b) .
On the other hand, the spatial neighbors of A214, near the Gly-Ser loop that compresses the DNA backbone 3′ to the lesion (FIG. 20b) , were examined, and gTBEv1.3 (SEQ ID NO: 149) (v0.3 plus Q259A) with UNG2Δ88-Y156A+Q259A (SEQ ID NO: 148) was obtained, with editing efficiency increased about 1.46-fold as compared with gTBEv0.3 (FIG. 28c) .
Furthermore, a synergistic enhancement of T-to-G editing efficiency of about 2.7-fold in comparison with the T editing efficiency of gTBEv0.3 was found for gTBEv2 (SEQ ID NO: 151) (v0.3 plus combination of A214T and Q259A) with UNG2Δ88-Y156A+A214T+Q259A (SEQ ID NO: 150) (FIG. 20c) .
The residues in the region of Q274-Y284, in or near the Leu-intercalation loop, were also scanned by sequential replacement with amino acids of distinct properties, including arginine (with a positively charged side chain) , aspartic acid (with a negatively charged side chain) , or valine (with a small hydrophobic side chain) (X-to-R, D, or V) . gTBEv3 (SEQ ID NO: 153) (v2 plus Y284D) with UNG2Δ88-Y156A+A214T+Q259A+Y284D (SEQ ID NO: 152) was found to show elevated editing efficiency of about 1.22-fold as compared with gTBEv2 (FIG. 29b) and of about 3.09-fold as compared with gTBEv0.3 (FIG. 20c) .
The improvement of T editing efficiency across different gTBE versions was validated at one endogenous genomic site in cultured mammalian cells (HEK293T cells) . After transfection with all-in-one constructs encoding each gTBE, together with a targeting sgRNA that targeted site 9 in the CLYBL gene and mCherry for fluorescence-activated cell sorting (FACS) , mCherry-positive cells were FACS-sorted. Through targeted deep sequencing analysis, a gradual increase of overall T editing efficiency was obtained at T5 from 26.9% for gTBEv1.1 to 67.4% for gTBEv3, confirming that gTBEv3 was the best version for base editing at endogenous targets, with T-to-S (i.e., T-to-C or T-to-G; S = C or G base) conversions as the predominant events at this site (FIG. 20d) .
These results indicate that the mutagenesis described above effectively optimized the performance of gTBE for T-to-C and T-to-G base editing. gTBEv3 having the highest T editing efficiency was used in the subsequent Examples.
Example 7. Characterization of gTBEv3 at human genomic DNA sites
The editing profiles of gTBEv3 were characterized by targeting 20 endogenous genomic loci, most of which were used in previous base editing studies11, 12, 26, 27. It was found that gTBEv3 achieved efficient T base editing (ranging from 24.3% to 81.5%; FIG. 21a and FIG. 30a-b) , and essentially no A, C, or G editing at all the examined sites (FIG. 30c-e) . The T-to-C or T-to-G conversions were the predominant events (FIG. 30f-h) , and only a low percentage of T-to-A conversions was detected (FIG. 21a and FIG. 30i) , consistent with the previous findings for gGBE3, AYBE9 and CGBEs11-15. The ratios of T-to-S to T conversion ranged from 0.68 to 0.97 (without indels, FIG. 21b) and from 0.41 to 0.92 (with indels, FIG. 30j) . It was found that gTBEv3 also induced indels with frequency ranging from 5.2% to 45.2% at the 20 edited sites (FIG. 21c) . Furthermore, it was found that the editable range of gTBEv3 was positions 2 to 11, and the optimal editing window with high efficiency of T conversion covered protospacer positions 3 to 7, with the highest editing efficiency at position 5 (FIG. 30b) . No obvious motif preference was found for T conversions with gTBEv3 by analyzing the on-target editing and sequences of all the tested sites (FIG. 30k) .
The off-target activity of gTBEv3 was analyzed at several in silico-predicted28 guide sequence-dependent off-target sites, and the ability of gTBEv3 to mediate guide sequence-independent off-target DNA editing was characterized by using the orthogonal R-loop assay in five previously reported dSaCas9 R-loops9, 29. A very low percentage of editing was found at all the guide sequence-dependent off-target loci (FIG. 21d-e and FIG. 31) , and very low frequencies (1.1% on average) were detected at all five guide sequence-independent off-target sites (FIG. 21f) . Taken together, gTBEv3 represents a highly efficient T-to-S base editor with low off-target effects in mammalian cells.
Example 8. Enhancement of C editing efficiency of gCBE
To examine whether the mutations that emerged from the engineering of gTBE would also enhance gCBE activity, gCBEv1.1 was generated by introducing A214V into gCBEv0.3 (FIG. 22a) . It was found that gCBEv1.1 (SEQ ID NO: 156) with UNG2Δ88-N213D+A214V (SEQ ID NO: 155) had C-to-G conversion activity elevated about 1.34-fold as compared to gCBEv0.3 when evaluated using the C-to-G reporter (FIG. 32a) .
On the other hand, alanine-scanning mutagenesis was conducted on the region of D154-D189 of UNG2 to examine its role in the regulation of base excision activity, and gCBEv1.2 (SEQ ID NO: 158) (v0.3 plus K184A) with UNG2Δ88-N213D+K184A (SEQ ID NO: 157) was obtained with largely elevated C-to-G conversion activity by about 1.55-fold as compared with gCBEv0.3 (FIG. 32b) .
The combination of A214V and K184A was further investigated by combining these two mutations to generate gCBEv2 (SEQ ID NO: 160) with UNG2Δ88-N213D+K184A+A214V (SEQ ID NO: 159) , achieving C-to-G editing efficiency of about 1.3-fold compared with gCBEv0.3 (FIG. 22b) . The improvement of C editing efficiency across different gCBE versions was further validated by targeting an endogenous genomic site, and a gradual increase of overall C editing efficiency from 18.2% to 37.2% at C2 of site 28 was observed (FIG. 33a) .
By targeting 16 endogenous genomic loci, the editing profiles of gCBEv2 were characterized, showing efficient C base editing ranging from 31.8% to 77.7% (FIG. 22c and FIG. 33b-d) . It was found that gCBEv2 induced predominant C-to-G conversions as well as C-to-T conversions, with the ratios of C-to-G/T to C-to-A/G/T conversion reaching up to 0.97, and there were very few C-to-A conversions detected (FIG. 22c, FIG. 33e-h) . gCBEv2 induced indels with frequency ranging from 3.1% to 48.3% at the examined sites (FIG. 33i) . After analyzing the sequences of all the tested sites, it was found that the editable range of gCBEv2 was positions 2 to 9 (FIG. 33c) , and gCBEv2 showed preferences for editing at AC or TC motifs with a higher efficiency than other motifs (FIG. 33j) .
When compared to CGBE112, a C-to-G base editor, it was found that gCBEv2 showed higher editing efficiency at certain positions towards the distal end of the target sequence (FIG. 22d and FIG. 33c) , indicating its positional preference within different optimal editing windows (positions 2 to 6 for gCBEv2 vs. positions 5 to 7 for CGBE112) . gCBEv2 induced fewer indels at site 36, and more indels at site 28 and site 29 than CGBE1 (FIG. 33k) . Notably, using the orthogonal R-loop assay9, 29 mentioned above, it was found that gCBEv2 showed frequencies comparable to those of CGBE1 at two guide sequence-independent off-target sites (FIG. 22e-f and FIG. 33l) .
Moreover, it was found that gCBEv2 could solely facilitate C editing, and there was essentially no T editing at all the examined sites (FIG. 33c-d) . The editing specificity of gCBEv2, together with that of gTBEv3 (FIG. 30b-e) , consolidated the orthogonality of these two base editors for direct base editing.
Example 9. Applications of gTBE and gCBE
The potential applications of gTBE and gCBE were further evaluated. The gTBE could not only remediate inactive splicing signals in the intron-split EGFP reporter systems used above (FIG. 19-20 and FIG. 26) , but also be used for exon skipping by disrupting splicing signals at splicing donor (SD) or splicing acceptor (SA) sites (FIG. 23a) . After analyzing the splicing sites in 16 well-studied genes for gene and cell therapy research30-32, it was found that gTBE and gCBE, together with other existing base editors, provide 1904 sgRNA candidates (protospacer sequence /guide sequence shown in Table 3) with the SD or SA sites located in each optimal editing window (FIG. 23b and FIG. 34a) . Among the 771 sgRNA candidates for ABE and CBE targeting, 156 and 103 candidates overlapped with those for gGBE and gTBE, respectively (FIG. 23c) . Moreover, 232 and 223 sgRNA candidates could only be screened by gGBE or gTBE targeting, respectively (FIG. 23c) . For gCBE, apart from 205 sgRNA candidates overlapped with those for CBE, there were 148 unique sgRNA candidates (FIG. 34b) . The availability of these base editors could largely expand the scope of guide RNA screening for efficient editing at splicing sites (FIG. 34) . In addition, these newly developed base editors could be utilized for bypassing premature termination codons (PTCs) and for introducing PTCs (FIG. 35) . The gTBE and gCBE could provide more versatile codon outcomes from PTC editing (FIG. 35b) and introduce PTCs by editing more codons encoding various amino acids (FIG. 35d) . To potentially disrupt gene function by introduction of PTCs, 851 sgRNA candidates (protospacer sequence /guide sequences shown in Table 4) targeting various codons for PTC introduction in 15 genes were analyzed with gGBE and CBE, with 191 TAC and 124 TCA for gGBE targeting (FIG. 35e) .
To illustrate these applications, editing of splicing sites in the human DMD gene (Duchenne muscular dystrophy, encoding dystrophin) that cannot be targeted with ABE or CBE was tested. A series of sgRNAs specifically targeting SD or SA sites was designed and screened with gTBEv3 or gCBEv2 (FIG. 23d and FIG. 34c) , including three sgRNAs targeting the SD sites of DMD exon 45 (FIG. 23e) , 12, and 37 (FIG. 34d) uniquely targeted by gTBEv3. Disruption of the SD site of exon 45, thus leading to exon skipping, would be applicable to restore dystrophin expression in 9% of DMD patients33. Thus, gTBEv3-encoding mRNA and sgRNA targeting the SD site of DMD exon 45 were co-injected into zygotes of humanized mice to explore the potential application of gTBE. It was found that 100% (20/20) of mouse embryos harbored efficient base conversion (ranging from 35.0% to 97.0%) at the desired position T3 (FIG. 23f-g) , indicating the great potential of gTBE for human disease modeling and gene therapy. Overall, gBEs, including gTBE, gCBE, and gGBE, provide more options for the sites that deaminase-based base editors could not target, largely expanding the targeting scope of base editors.
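The idea of screening sgRNA candidates whose target base falls within an editing window can be sketched as follows. This is a simplified illustration, not the screening pipeline used to produce Tables 3 and 4: it assumes a 20-nt protospacer followed by an NGG PAM, counts protospacer positions from the PAM-distal end, scans only the given strand, and uses a made-up toy sequence. The function name and parameters are hypothetical.

```python
# Simplified sketch: enumerate protospacers that place a chosen target base
# (e.g. the T of a splice-donor GT) inside a given editing window.
# Assumptions: 20-nt protospacer, NGG PAM directly 3' of it, position 1 is
# the PAM-distal end, and only the supplied strand is scanned.
def candidate_protospacers(seq: str, target_index: int,
                           window: tuple = (3, 7), spacer_len: int = 20) -> list:
    """Return (protospacer, PAM, position of target within protospacer)."""
    seq = seq.upper()
    hits = []
    for position in range(window[0], window[1] + 1):
        start = target_index - (position - 1)      # 0-based protospacer start
        pam_start = start + spacer_len
        if start < 0 or pam_start + 3 > len(seq):
            continue                               # protospacer or PAM off the sequence
        pam = seq[pam_start:pam_start + 3]
        if pam[1:] == "GG":                        # NGG PAM
            hits.append((seq[start:pam_start], pam, position))
    return hits


# Toy sequence (hypothetical): a single T at index 4 and one TGG PAM.
toy = "AAAA" + "T" + "A" * 15 + "TGG" + "AAAAA"
candidates = candidate_protospacers(toy, target_index=4)
print(candidates)
```

The default window of positions 3 to 7 mirrors the optimal window reported above for gTBEv3; a different editor or a PAM-flexible Cas9 variant would need a different window or PAM test.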
Example 10. Comparison of different base editing systems
gTBEv4 (SEQ ID NO: 161) and gTBEv5 (SEQ ID NO: 162) were generated by inserting the UNG2 mutant (SEQ ID NO: 152) contained in gTBEv3 (SEQ ID NO: 153) into split nCas9 domains at different locations (FIG. 24b) . For gTBEv4 (SEQ ID NO: 161) , the UNG2 mutant (SEQ ID NO: 152) was embedded between positions 2-1248 of nCas9 (SEQ ID NO: 2) and positions 1249-1368 of nCas9 (SEQ ID NO: 2) . For gTBEv5 (SEQ ID NO: 162) , the UNG2 mutant (SEQ ID NO: 152) was embedded between positions 2-1047 of nCas9 (SEQ ID NO: 2) and positions 1064-1368 of nCas9 (SEQ ID NO: 2) . Note that the first amino acid residue D of nCas9 (SEQ ID NO: 2) was numbered as position 2 instead of position 1.
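The embedding coordinates above can be turned into string slicing once the numbering convention (first residue of SEQ ID NO: 2 is position 2) is fixed. The sketch below uses placeholder letters rather than real sequences, ignores any linkers (which are not described in this paragraph), and uses hypothetical function names.

```python
# Illustrative sketch of embedding an insert between two nCas9 segments.
# Assumptions: the nCas9 string starts at the residue numbered 2, segment
# boundaries are inclusive, and no linker residues are added.
def embed(ncas9: str, insert: str, left_end: int, right_start: int,
          first_position: int = 2) -> str:
    """Join nCas9[first_position..left_end] + insert + nCas9[right_start..end]."""
    left = ncas9[: left_end - first_position + 1]
    right = ncas9[right_start - first_position:]
    return left + insert + right


def omitted_residues(left_end: int, right_start: int) -> int:
    """Number of nCas9 residues absent between the two segments."""
    return right_start - left_end - 1


# Placeholder "protein" with residues numbered 2..11 (not a real sequence).
toy_ncas9 = "ABCDEFGHIJ"
print(embed(toy_ncas9, "xx", left_end=5, right_start=6))   # no residues dropped
print(embed(toy_ncas9, "xx", left_end=5, right_start=8))   # positions 6-7 dropped

print(omitted_residues(1248, 1249))   # gTBEv4 coordinates
print(omitted_residues(1047, 1064))   # gTBEv5 coordinates
```

By this arithmetic, the gTBEv4 coordinates retain every nCas9 residue, whereas the gTBEv5 coordinates leave out positions 1048 to 1063.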
Similarly, gCBEv3 (SEQ ID NO: 164) (FIG. 38a) was generated by replacing the UNG mutant (SEQ ID NO: 152) in gTBEv5 (SEQ ID NO: 162) with the UNG mutant (SEQ ID NO: 159) in gCBEv2 (SEQ ID NO: 160) .
To better characterize the performance of various deaminase-free base editors, a side-by-side comparison was made between the base editors in this study and those from two other studies: He et al., who developed TSBE3 for T-to-G/C substitutions using a protein language model (PLM) -assisted strategy34, and Ye et al., who conducted rounds of random mutagenesis by error-prone PCR for directed evolution in Escherichia coli and obtained several deaminase-free base editors (DAF-TBEs and DAF-CBEs) 35 (FIG. 24a) . The basic architectures of the above-mentioned base editors differ; for instance, TSBE3 was constructed using an embedding strategy and DAF-TBE2 using a circular permutation strategy (FIG. 24b) .
The T editing efficiency of various thymine base editors was compared at 17 endogenous sites, including five sites from He’s study34 and five sites from Ye’s study35 (FIG. 24c and FIG. 36) . For the base editors with UNG mutants fused at the N-terminus of nCas9, gTBEv3 showed higher editing efficiency than DAF-TBE at the overwhelming majority of Ts (29 out of 35) of tested sites (FIG. 24c, FIG. 36f) , indicating that UNG mutants generated herein by rational mutagenesis are superior to those by random mutagenesis.
gTBEv3 was also compared with gTBEv4 and gTBEv5, two base editors constructed using the embedding strategy. gTBEv4 showed an editing window shifted from positions 3-7 to positions 7-13 (FIG. 24d) , with no significant difference in average editing efficiency from gTBEv3 (23.2% vs. 23.1%, FIG. 36f) . For gTBEv5, the editing efficiency was largely increased compared to that of gTBEv3 (averaging 39.3% vs. 23.1%, FIG. 36f) and gTBEv4 and others, with the same predominant T-to-S conversions (FIG. 36a-d and g) , and the optimal editing window covered protospacer positions 5 to 9 (FIG. 24d) .
TSBE3 (carrying L83Q and G116E mutations, equivalent to L74Q and G107E in UNG1) is an nCas9-embedded base editor with almost the same insertion position as gTBEv5 (FIG. 24c) . gTBEv5 showed higher editing efficiency than TSBE3 (39.3% vs. 22.5%, FIG. 36f) at the overwhelming majority of Ts (29 out of 35) of the tested sites (FIG. 24c) , indicating that the UNG mutant generated herein by rational mutagenesis is superior to those generated by PLM-assisted mutagenesis. The optimal editing window of TSBE3 covered protospacer positions 4 to 9 (FIG. 24d) .
The circularly permuted DAF-TBE2 showed low average editing efficiency and an editing window of positions 9-13, different from the editing window (positions 2-6) of DAF-TBE (FIG. 24d) .
Despite showing the highest average editing efficiency, gTBEv5 induced indel rates comparable to those of DAF-TBE (14.4% vs. 14.4%) , DAF-TBE2 (14.4% vs. 10.3%) , and TSBE3 (14.4% vs. 13.5%, FIG. 36e-g) . Notably, gTBEs induced far fewer unintended T edits than TSBE3 and DAF-TBEs in the proximal DNA sequence upstream from two sites (site 38 and site 44) harboring unintended edits (FIG. B13) , consistent with the finding that the NTD of UNG could promote targeting the enzyme to ssDNA–dsDNA junctions19.
Similarly, the C editing efficiency of various base editors (FIG. 38a) was compared at 19 endogenous sites, including five sites from He’s study34 and five sites from Ye’s study35 (FIG. 38b-d) . It was found that both gCBEv2 and gCBEv3 showed higher overall average editing efficiency than all the other base editors (FIG. 38e-f) , especially gCBEv3. gCBEv2 outperformed DAF-CBE (30.1% vs. 21.3%) and CGBE-CDG (30.1% vs. 19.3%) for the average efficiency of base conversion (FIG. 38e-f) , indicating that the UNG mutant generated herein by rational mutagenesis is superior to those generated by random mutagenesis. gCBEv2 induced average indel rates comparable to those of other deaminase-free base editors, including DAF-CBE (16.8% vs. 16.9%) , DAF-CBE2 (16.8% vs. 12.1%) , and CGBE-CDG (16.8% vs. 13.6%, FIG. 38d-g) . The C-to-G editing frequency and purity of different base editors showed respective advantages for CGBE1 and various deaminase-free base editors at different cytosine positions across the protospacer (FIG. 39a-b) . Each base editor can edit its target base within a certain editable window, that is, positions 2 to 9 for gCBEv2, positions 2 to 11 for gCBEv3, positions 4 to 10 for CGBE1, positions 2 to 9 for CGBE-CDG, positions 2 to 9 for DAF-CBE, and positions 9 to 12 for DAF-CBE2 (FIG. 39c) .
By analyzing the off-target effects both at some guide sequence-dependent and guide sequence-independent off-target sites, it was found that gTBEs and gCBEs induced comparable low-level off-target edits similar to that of other base editors at most sites (FIG. 40a-c) . Moreover, by performing transcriptome-wide RNA analysis, it was found that gTBEv5 and gCBEv3 did not exhibit significant off-target RNA editing or impact the cell’s inherent DNA repair processes (FIG. 40d) , consistent with those of DAF-TBE, DAF-CBE, CGBE-CDG, and TSBE334, 35.
Prime editing (PE) system could theoretically mediate all types of base substitution, including T-to-G conversion and C-to-G conversion39. gTBEv3 and gTBEv5 were compared with the recently evolved PE6d system40 at six previously reported endogenous sites35 in HEK293T cells. gTBEv3 and gTBEv5 outperformed PE6d or PE6d max for T-to-G conversion at four tested sites (FIG. 41a) . gCBEv2 and gCBEv3 outperformed PE6d or PE6d max for C-to-G conversion at five tested sites (FIG. 41b) . These findings indicate that base editing and prime editing offer complementary strengths, and base editors generally show more efficient editing if the target base is positioned optimally. In addition, gTBEs and gCBEs exhibited efficient T and C editing efficiency across three different human cell lines (HEK293T, U2OS, and HuH-7 cells) , with slight perturbations of the product purity for gTBEs and comparable substitution frequency of certain base for gCBEs in different cell lines (FIG. 42) .
Taken together, it was found that gTBEs and gCBEs in this study outperformed other base editors, including DAF-TBEs, DAF-CBE, TSBE3, and CGBE-CDG from the other two studies. And the alternative editing windows of different base editors would provide more choices for proper base conversion.
DISCUSSION
The deaminase-based base editor (dBE) and derivatives thereof enable direct editing of adenine (A) and cytosine (C) , but not thymine (T) . In humans, about 19% of the pathogenic single nucleotide polymorphisms (SNPs) could be corrected through T-to-G conversion9. In this study, two orthogonal base editors, gTBE and gCBE, that could achieve highly efficient T and C editing in both cultured human cells and mouse embryos were developed. The gTBE and gCBE could greatly broaden the targeting scope of base editors by breaking the limitations of PAM and narrow editing window, thus increasing the opportunity to obtain an efficient strategy for further research. The T-to-S conversion ability of gTBE allows for a variety of gene editing applications, including editing splicing sites, as well as editing that bypasses PTCs.
It has been shown that the same original DNA glycosylase could be engineered into enzymes that selectively excise different specific nucleotide bases and harnessed to develop novel base editors using the deaminase-free glycosylase-based strategy of the disclosure. The enhanced editing efficiency could be attributed to the mutations in the UNG moiety that facilitate its specific substrate preference or ssDNA-binding activity, or both. The high editing efficiency of gTBEv5 indicates that insertion of the UNG mutant into split nCas9 might enhance target DNA accessibility by modulating the interaction between the UNG moiety and the target DNA.
In this study, the glycosylase-mediated base editors developed in different studies were systematically compared. Structure-informed rational design was used in the disclosure, successfully engineering gTBE and gCBE for highly efficient T and C editing, respectively. He et al. utilized PLM to assist the engineering of TSBE3, while Ye et al. obtained DAF-TBE and DAF-CBE by performing random mutagenesis (FIG. 24a) . It was found that gTBE/gCBE in the disclosure outperformed DAF-TBE, DAF-CBE, TSBE3, and CGBE-CDG, with higher average editing efficiency and alternative editing windows (FIG. 24c-d and FIG. 38-39) .
Wild-type UNG proteins are highly specific against uracil in both ssDNA and dsDNA, with a preference for ssDNA43. The NTD of UNG containing motifs and sites for undesired protein-protein interactions and post-translational modifications could promote targeting the enzyme to ssDNA–dsDNA junctions19, 20.
TSBE3, with full length UNG2, and DAF-TBEs induced more undesired edits than gTBEs in the proximal DNA sequence upstream from two sites harboring unintended edits (FIG. 37) .
In summary, two orthogonal base editors based on the same original DNA glycosylase have been engineered for direct T editing and direct C editing, and structure-informed rational design represents an efficient and efficacious protein engineering strategy, providing a reference and an approach for the subsequent evolution of other proteins.
Methods
Molecular cloning
Base editor constructs used in this study were cloned into a mammalian expression plasmid backbone under the control of an EF1α promoter by standard molecular cloning techniques, and the two intron-split EGFP reporters were constructed similarly to those described previously9, except that the engineered sequence containing the last 86 base pairs (bp) intron of human RPS5 was inserted between BFP and EGFP coding sequences. The corresponding mutations at the splice acceptor site were made to construct the T-to-G reporter or C-to-G reporter, respectively, via site-directed mutagenesis by PCR. Mutations at the splice acceptor site led to inactive EGFP production. Encouraged by the findings from previous base editors12, 15, the corresponding mutations at the splice acceptor site were placed at position 6 of the protospacer.
The wild-type UNG2 sequence (313 amino acids long) (SEQ ID NO: 133) was PCR-amplified from cDNA of HEK293T cells. UNG2-Y156A, UNG2-N213D, UNG-NTD-truncated mutants, and corresponding combinations were constructed via site-directed mutagenesis by PCR. The UNG mutants were fused at different orientations with respect to nCas9 via the Gibson Assembly method. The PE6d architecture harbored a human codon-optimized RNaseH-truncated evolved and engineered M-MLV variant with R221K/N394K/H840A mutations in SpCas9. The nick sgRNA and epegRNA with tevoPreQ1 motif were cloned into the PE6d construct using Golden Gate assembly, resulting in an all-in-one plasmid. For PE6d max, the codon-optimized hMLH1dn was co-expressed with PE6d.
UNG mutagenesis libraries were designed and generated as previously described52 with some modification. In brief, the region of 98-313 aa in UNG2 was divided into 8 aa long segments. BpiI-harboring mutants containing Y156A or N213D were introduced via site-directed mutagenesis by PCR. For evolution of gTBE, the regions of I150-L179, A158-K261, L210-T217, and Q274-Y284 were selected for rounds of sequential alanine /arginine /aspartic acid /valine substitutions (X-to-A, R, D, or V) . Site-saturation mutagenesis of residue 214 was conducted to check whether any amino acid at this position would perform better than valine. For evolution of gCBE, the region of D154-D189 was selected for sequential alanine substitutions (X-to-A) . To cover all the residues in the corresponding segments for sequential alanine substitutions, alanine was mutated to valine (A-to-V) . Oligos coding for the mutants were annealed and ligated into the corresponding BpiI-digested backbone vectors.
The Cas-OFFinder28 was used to search for potential guide sequence-dependent off-target sites of Cas9 RNA-guided endonucleases with a maximum of 3 mismatches (with no bulges) . For sgRNAs targeting DMD splicing sites with an NGN PAM, a PAM-flexible Cas9 variant SpG (SEQ ID NO: 163) was used in place of nCas9 (SEQ ID NO: 2) . The sgRNA oligos were annealed and ligated into BpiI sites.
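The search criterion stated above (at most 3 mismatches, no bulges) amounts to an ungapped comparison of the guide against each candidate site. The sketch below illustrates only that criterion; it is not Cas-OFFinder's algorithm or interface, it ignores PAM requirements and the reverse strand, and the sequences and function names are hypothetical.

```python
# Illustrative sketch of the "up to 3 mismatches, no bulges" criterion.
# Not Cas-OFFinder itself: no PAM check, no reverse strand, toy sequences.
def count_mismatches(guide: str, site: str) -> int:
    """Number of mismatched positions in an ungapped guide/site comparison."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have equal length (no bulges)")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)


def scan_off_targets(guide: str, sequence: str, max_mismatches: int = 3) -> list:
    """Return (start index, site, mismatches) for every window within the limit."""
    n = len(guide)
    hits = []
    for i in range(len(sequence) - n + 1):
        site = sequence[i:i + n]
        mismatches = count_mismatches(guide, site)
        if mismatches <= max_mismatches:
            hits.append((i, site, mismatches))
    return hits


# Toy example: a 10-nt "guide", its exact site, and a site with 2 mismatches.
toy_guide = "GATTACAGCT"
toy_sequence = "TT" + "GATTACAGCT" + "GG" + "GATAACAGGT" + "CC"
off_target_hits = scan_off_targets(toy_guide, toy_sequence)
print(off_target_hits)
```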
Cell culture, Transfection, and flow cytometry analysis
HEK293T, HuH-7, and U2OS cells were cultured with DMEM (Catalog#11995065, Gibco) supplemented with 10% fetal bovine serum (Catalog#04-001-1ACS, BI) and 0.1 mM non-essential amino acids (Catalog# 11140-050, Gibco) in an incubator at 37 ℃ with 5% CO2.
Mutant screening was conducted in 48-well plates, with 3 × 10^4 HEK293T cells per well plated in 250 μL of complete growth medium the day before transfection. Between 16 and 24 h after seeding, cells were co-transfected with 250 ng gTBE (or gCBE) plasmids, 250 ng T-to-G (or C-to-G) reporter plasmids, and 1 μg Polyethylenimine (PEI) (DNA/PEI ratio of 1: 2) per well. For cell transfection of HEK293T, HuH-7, and U2OS for FACS, 5 × 10^5 cells per well were plated in 12-well plates with 1 mL complete growth medium the day before transfection. After 14-16 h, 2 μg all-in-one plasmids expressing gTBE or gCBE and corresponding sgRNA were transfected into cells using PEI (DNA/PEI ratio of 1: 2) or FuGENE HD transfection reagent (DNA: FuGENE ratio of 1: 3; E2311, Promega) . Orthogonal R-loop assays were performed as described previously9, 29. In brief, 1 μg of gTBE or gCBE expression plasmid with sgRNA targeting the corresponding site (with mCherry as reporter) and 1 μg of dSaCas9 expression plasmid with corresponding sgRNA targeting five off-target sites to generate R-loops (with EGFP as reporter) were co-transfected into HEK293T cells using PEI (DNA/PEI ratio of 1: 2) . For prime editing, 2 μg all-in-one plasmids containing PE6d, nick sgRNA, and epegRNA, or 1 μg all-in-one plasmid and 1 μg of hMLH1dn plasmid were co-transfected into cells using PEI (DNA/PEI ratio of 1: 2) .
At 48 h post-transfection, expression of mCherry, BFP, and EGFP fluorescence was analyzed by BD FACS Aria III or Beckman CytoFLEX S. Flow cytometry results were analyzed with FlowJo V10.5.3. The gating strategy for the identification of mCherry+, BFP+ and EGFP+ cells for on-target editing efficiency evaluation is provided in FIG. 26b.
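The reagent amounts above are consistent with a mass-based DNA/PEI ratio of 1:2, which the following short calculation illustrates. Treating the ratio as mass to mass is an assumption inferred from the 48-well example (500 ng total DNA with 1 μg PEI); the function name is hypothetical.

```python
# Arithmetic sketch: PEI mass implied by a DNA/PEI mass ratio of 1:2.
# Assumption: the ratio is mass:mass, as suggested by 500 ng DNA + 1 ug PEI.
def pei_mass_ng(dna_ng: float, dna_to_pei: tuple = (1, 2)) -> float:
    """PEI mass (ng) for a given total DNA mass (ng)."""
    dna_part, pei_part = dna_to_pei
    return dna_ng * pei_part / dna_part


screen_dna_ng = 250 + 250                 # editor plasmid + reporter plasmid, 48-well format
print(pei_mass_ng(screen_dna_ng))         # matches the 1 ug PEI stated above
print(pei_mass_ng(2000))                  # PEI implied for 2 ug plasmid in 12-well format
```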
Target sequencing of endogenous sites and data analysis
Endogenous target sites of interest were amplified from genomic DNA as previously described9. Briefly, 10,000 mCherry-positive cells were isolated by FACS 72 h after transfection, then genomic DNA was extracted and the regions of interest for target sites were amplified by PCR using site-specific primers. The purified PCR products were analyzed by Sanger sequencing (Genewiz) .
Target sequencing data analysis was performed as described previously3. In brief, the amplicons were ligated to adapters and sequencing was performed on the Illumina MiSeq platform, then the targeted amplicon sequencing reads were processed using fastp with default parameters53, and further amplicon sequencing analysis was performed by CRISPResso254. T-to-G purity was calculated as T-to-G editing efficiency / (T-to-C editing efficiency + T-to-G editing efficiency + T-to-A editing efficiency) . T-to-S conversion ratio was calculated as (T-to-C editing efficiency + T-to-G editing efficiency) / (T-to-C editing efficiency + T-to-G editing efficiency + T-to-A editing efficiency) . Protospacer sequences /guide sequences are shown in Table 2.
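The two ratios defined above can be written directly as functions of the three per-outcome editing efficiencies. The numeric inputs below are made-up illustration values, not data from this study, and the function names are hypothetical.

```python
# Sketch of the two ratios defined above. Inputs are editing efficiencies
# (in percent) for each T conversion outcome at one position.
def t_to_g_purity(t_to_c: float, t_to_g: float, t_to_a: float):
    """T-to-G efficiency divided by the sum of all T conversion efficiencies."""
    total = t_to_c + t_to_g + t_to_a
    return t_to_g / total if total > 0 else None


def t_to_s_ratio(t_to_c: float, t_to_g: float, t_to_a: float):
    """(T-to-C + T-to-G) efficiency divided by the sum of all T conversions."""
    total = t_to_c + t_to_g + t_to_a
    return (t_to_c + t_to_g) / total if total > 0 else None


# Hypothetical efficiencies: 30% T-to-C, 15% T-to-G, 5% T-to-A.
purity = t_to_g_purity(30.0, 15.0, 5.0)
ratio = t_to_s_ratio(30.0, 15.0, 5.0)
print(purity, ratio)
```

Returning `None` when no T conversion is observed is a design choice of this sketch to avoid division by zero; the source does not state how such positions were handled.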
In vitro transcription of gTBEv3 mRNA and DMD sgRNA
The mRNA and sgRNA preparations were performed as previously described9. The gTBEv3 expression plasmid was linearized by the FastDigest KpnI restriction enzyme (Catalog#FD0524, Thermo Fisher) , purified using Gel Extraction Kit (Catalog#D2500-03, Omega) , and used as the template for in vitro transcription (IVT) using the mMESSAGE mMACHINE T7 Ultra kit (Catalog#AM1345, Thermo Ambion) . For DMD-sgRNA preparation, T7 promoter sequence was added to the sgRNA template by PCR amplification. The T7-DMD-sgRNA PCR product was purified using Gel Extraction Kit (Catalog#D2500-03, Omega) and used as the template for IVT of sgRNAs using the MEGAshortscript T7 kit (Catalog#AM1354, invitrogen) . The gTBEv3-encoding mRNA and DMD-sgRNA were purified using the MEGAclear kit (Catalog#AM1908, invitrogen) , eluted in RNase-free water and stored at -80℃ until use.
Animals and microinjection of mouse zygotes
Animal manipulations were consistent with those reported previously [3]. Experiments involving mice were approved by the Biomedical Research Ethics Committee of Center for HuidaGene Therapeutics Co. Ltd. Mice were maintained in a specific pathogen-free facility under a 12-hour dark–light cycle at constant temperature (20–26℃) and humidity (40–60%).
Superovulated humanized DMD females carrying human DMD exon 45 on a C57BL/6 background (4 weeks old) were mated with C57BL/6 males (8 weeks old), and females of the ICR strain were used as foster mothers. Fertilized embryos were collected from oviducts 21 h after hCG injection. For zygote injection, a mixture of gTBEv3-encoding mRNA (250 ng/μL) and DMD-sgRNA (100 ng/μL) was injected into the cytoplasm of 1-cell embryos in a droplet of M2 medium using a FemtoJet microinjector (Eppendorf) with constant flow settings. The injected embryos were cultured in M16 medium with amino acids for three days to the blastocyst stage (37℃, 5% CO2) before genomic DNA extraction and target amplification.
RNA sequencing experiments
HEK293T cells were plated in 12-well plates as above and transfected with 2 μg of gTBEv5, gCBEv3, CGBE1, or mCherry plasmid using PEI (DNA/PEI ratio of 1:2). At 48 hours after transfection, around 5 × 10^6 cells were collected. Total RNA was extracted with a TRIzol-based method, fragmented, and reverse transcribed to cDNA with HiScript Q RT SuperMix according to the manufacturer’s instructions. Total RNA integrity was quantified using an Agilent 2100 Bioanalyzer. The RNA-seq library was qualified using the Illumina NovaSeq 6000 platform (performed by GENEWIZ). Trimmomatic (v. 0.39-2) [55] was used to filter the RNA-seq raw data. The clean reads were aligned to the hg38 reference genome with HISAT2 (v. 2.2.1) [56]. RNA editing sites were calculated using REDItools2 [57] with default parameters. The dbSNP (v. 146) database downloaded from NCBI was used to filter out sites overlapping common single nucleotide variants (SNVs). Sites with fewer than five mutated or nonmutated reads were further filtered out.
StringTie [58] was used to calculate expression values. DESeq2 [59] was used to identify differentially expressed genes with FDR < 0.05 and fold change > 1.
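The two site-level filters described above (overlap with common SNVs, and minimum read support) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Site` record, its field names, and the `filter_editing_sites` function are hypothetical and do not reflect the REDItools2 output format or the dbSNP file format, and the read threshold is read here as "at least five mutated and at least five nonmutated reads".

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Site(NamedTuple):
    """Hypothetical record for one candidate RNA editing site."""
    chrom: str
    pos: int
    mutated_reads: int
    nonmutated_reads: int


def filter_editing_sites(
    sites: Iterable[Site],
    common_snvs: Set[Tuple[str, int]],
    min_reads: int = 5,
) -> List[Site]:
    """Drop sites that overlap a common SNV or lack sufficient read support."""
    kept = []
    for site in sites:
        if (site.chrom, site.pos) in common_snvs:
            continue  # overlaps a known common variant
        if site.mutated_reads < min_reads or site.nonmutated_reads < min_reads:
            continue  # too few mutated or nonmutated reads
        kept.append(site)
    return kept


if __name__ == "__main__":
    # Made-up example values, not data from the source.
    candidates = [
        Site("chr1", 100, 12, 40),  # passes both filters
        Site("chr1", 200, 3, 50),   # too few mutated reads
        Site("chr2", 300, 9, 2),    # too few nonmutated reads
        Site("chr3", 400, 20, 20),  # listed as a common SNV
    ]
    snvs = {("chr3", 400)}
    for site in filter_editing_sites(candidates, snvs):
        print(site)
```

Holding the common SNVs in a set of (chromosome, position) pairs keeps each lookup constant-time, which matters when the candidate list is large.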
Statistical analysis
Statistical tests were performed in GraphPad Prism 8 and included the two-tailed unpaired two-sample t-test and Dunnett's multiple comparisons test after one-way ANOVA.
REFERENCES
1. Porto, E.M., Komor, A.C., Slaymaker, I.M. & Yeo, G.W. Base editing: advances and therapeutic opportunities. Nat Rev Drug Discov 19, 839-859 (2020).
2. Rees, H.A. & Liu, D.R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat Rev Genet 19, 770-788 (2018).
3. Tong, H. et al. Programmable deaminase-free base editors for G-to-Y conversion by engineered glycosylase. Natl Sci Rev 10, nwad143 (2023).
4. Gaudelli, N.M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464-471 (2017).
5. Komor, A.C., Kim, Y.B., Packer, M.S., Zuris, J.A. & Liu, D.R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016).
6. Mok, B.Y. et al. A bacterial cytidine deaminase toxin enables CRISPR-free mitochondrial base editing. Nature 583, 631-637 (2020).
7. Lei, Z. et al. Mitochondrial base editor induces substantial nuclear off-target mutations. Nature 606, 804-811 (2022).
8. Zhang, X. et al. Dual base editor catalyzes both cytosine and adenine base conversions in human cells. Nat Biotechnol 38, 856-860 (2020).
9. Tong, H. et al. Programmable A-to-Y base editing by fusing an adenine base editor with an N-methylpurine DNA glycosylase. Nat Biotechnol 41, 1080-1084 (2023).
10. Chen, L. et al. Adenine transversion editors enable precise, efficient A•T-to-C•G base editing in mammalian cells and embryos. Nat Biotechnol (2023).
11. Zhao, D. et al. Glycosylase base editors enable C-to-A and C-to-G base changes. Nat Biotechnol 39, 35-40 (2021).
12. Kurt, I.C. et al. CRISPR C-to-G base editors for inducing targeted DNA transversions in human cells. Nat Biotechnol 39, 41-46 (2021).
13. Koblan, L.W. et al. Efficient C•G-to-G•C base editors developed using CRISPRi screens, target-library analysis, and machine learning. Nat Biotechnol 39, 1414-1425 (2021).
14. Chen, L. et al. Programmable C:G to G:C genome editing with CRISPR-Cas9-directed base excision repair proteins. Nat Commun 12, 1384 (2021).
15. Yuan, T. et al. Optimization of C-to-G base editors with sequence context preference predictable by machine learning methods. Nat Commun 12, 4902 (2021).
16. Nilsen, H. et al. Nuclear and mitochondrial uracil-DNA glycosylases are generated by alternative splicing and transcription from different positions in the UNG gene. Nucleic Acids Res 25, 750-755 (1997).
17. Kavli, B. et al. Excision of cytosine and thymine from DNA by mutants of human uracil-DNA glycosylase. EMBO J 15, 3442-3447 (1996).
18. Rodriguez, G. et al. Disordered N-Terminal Domain of Human Uracil DNA Glycosylase (hUNG2) Enhances DNA Translocation. ACS Chem Biol 12, 2260-2263 (2017).
19. Weiser, B.P., Rodriguez, G., Cole, P.A. & Stivers, J.T. N-terminal domain of human uracil DNA glycosylase (hUNG2) promotes targeting to uracil sites adjacent to ssDNA-dsDNA junctions. Nucleic Acids Res 46, 7169-7178 (2018).
20. Perkins, J.L. & Zhao, L. The N-terminal domain of uracil-DNA glycosylase: Roles for disordered regions. DNA Repair (Amst) 101, 103077 (2021).
21. Nagelhus, T.A. et al. A sequence in the N-terminal region of human uracil-DNA glycosylase with homology to XPA interacts with the C-terminal part of the 34-kDa subunit of replication protein A. J Biol Chem 272, 6561-6566 (1997).
22. Torseth, K. et al. The UNG2 Arg88Cys variant abrogates RPA-mediated recruitment of UNG2 to single-stranded DNA. DNA Repair (Amst) 11, 559-569 (2012).
23. Schormann, N., Ricciardi, R. & Chattopadhyay, D. Uracil-DNA glycosylases-structural and functional perspectives on an essential family of DNA repair enzymes. Protein Sci 23, 1667-1685 (2014).
24. Parikh, S.S. et al. Uracil-DNA glycosylase-DNA substrate and product structures: conformational strain promotes catalytic efficiency by coupled stereoelectronic effects. Proc Natl Acad Sci U S A 97, 5083-5088 (2000).
25. Parikh, S.S. et al. Base excision repair initiation revealed by crystal structures and binding kinetics of human uracil-DNA glycosylase with DNA. EMBO J 17, 5214-5226 (1998).
26. Chen, L. et al. Re-engineering the adenine deaminase TadA-8e for efficient and specific CRISPR-based cytosine base editing. Nat Biotechnol 41, 663-672 (2023).
27. Jeong, Y.K. et al. Adenine base editor engineering reduces editing of bystander cytosines. Nat Biotechnol 39, 1426-1433 (2021).
28. Bae, S., Park, J. & Kim, J.S. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473-1475 (2014).
29. Richter, M.F. et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat Biotechnol 38, 883-891 (2020).
30. Uddin, F., Rudin, C.M. & Sen, T. CRISPR Gene Therapy: Applications, Limitations, and Implications for the Future. Front Oncol 10, 1387 (2020).
31. Nordestgaard, B.G., Nicholls, S.J., Langsted, A., Ray, K.K. & Tybjaerg-Hansen, A. Advances in lipid-lowering therapy through gene-silencing technologies. Nat Rev Cardiol 15, 261-272 (2018).
32. Zhang, X. et al. Gene knockout in cellular immunotherapy: Application and limitations. Cancer Lett 540, 215736 (2022).
33. Bladen, C.L. et al. The TREAT-NMD DMD Global Database: analysis of more than 7,000 Duchenne muscular dystrophy mutations. Hum Mutat 36, 395-402 (2015).
34. He, Y. et al. Protein language models-assisted optimization of a uracil-N-glycosylase variant enables programmable T-to-G and T-to-C base editing. Mol Cell, Online ahead of print (2024).
35. Ye, L. et al. Glycosylase-based base editors for efficient T-to-G and C-to-G editing in mammalian cells. Nat Biotechnol, Online ahead of print (2024).
36. Li, S. et al. Docking sites inside Cas9 for adenine base editing diversification and RNA off-target elimination. Nat Commun 11, 5827 (2020).
37. Liu, Y. et al. A Cas-embedding strategy for minimizing off-target effects of DNA base editors. Nat Commun 11, 6073 (2020).
38. Nguyen Tran, M.T. et al. Engineering domain-inlaid SaCas9 adenine base editors with reduced RNA off-targets and increased on-target DNA editing. Nat Commun 11, 4871 (2020).
39. Anzalone, A.V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149-157 (2019).
40. Doman, J.L. et al. Phage-assisted evolution and protein engineering yield compact, efficient prime editors. Cell 186, 3983-4002 e3926 (2023).
41. Zuo, E. et al. Cytosine base editor generates substantial off-target single-nucleotide variants in mouse embryos. Science 364, 289-292 (2019).
42. Yan, N. et al. Cytosine base editors induce off-target mutations and adverse phenotypic effects in transgenic mice. Nat Commun 14, 1784 (2023).
43. Slupphaug, G. et al. Properties of a recombinant human uracil-DNA glycosylase from the UNG gene and evidence that UNG encodes the major uracil-DNA glycosylase. Biochemistry 34, 128-138 (1995).
44. Chen, L. et al. Engineering a precise adenine base editor with minimal bystander editing. Nat Chem Biol 19, 101-110 (2023).
45. Kim, Y.B. et al. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat Biotechnol 35, 371-376 (2017).
46. Huang, M.E. et al. C-to-G editing generates double-strand breaks causing deletion, transversion and translocation. Nat Cell Biol 26, 294-304 (2024).
47. Hindi, N.N., Elsakrmy, N. & Ramotar, D. The base excision repair process: comparison between higher and lower eukaryotes. Cell Mol Life Sci 78, 7943-7965 (2021).
48. Thompson, P.S. & Cortez, D. New insights into abasic site repair and tolerance. DNA Repair (Amst) 90, 102866 (2020).
49. Wang, Y. et al. Engineering of the Translesion DNA Synthesis Pathway Enables Controllable C-to-G and C-to-A Base Editing in Corynebacterium glutamicum. ACS Synth Biol 11, 3368-3378 (2022).
50. Sun, N. et al. Reconstructed glycosylase base editors GBE2.0 with enhanced C-to-G base editing efficiency and purity. Mol Ther 30, 2452-2463 (2022).
51. Komor, A.C. et al. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci Adv 3, eaao4774 (2017).
52. Tong, H. et al. High-fidelity Cas13 variants for targeted RNA degradation with minimal collateral effects. Nat Biotechnol 41, 108-119 (2023).
53. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884-i890 (2018).
54. Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat Biotechnol 37, 224-226 (2019).
55. Bolger, A.M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120 (2014).
56. Kim, D., Paggi, J.M., Park, C., Bennett, C. & Salzberg, S.L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol 37, 907-915 (2019).
57. Flati, T. et al. HPC-REDItools: a novel HPC-aware tool for improved large scale RNA-editing analysis. BMC Bioinformatics 21, 353 (2020).
58. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33, 290-295 (2015).
59. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol 11, R106 (2010).
60. Krusong, K., Carpenter, E.P., Bellamy, S.R., Savva, R. & Baldwin, G.S. A comparative study of uracil-DNA glycosylases from human and herpes simplex virus type 1. J Biol Chem 281, 4983-4992 (2006).
EXEMPLARY SEQUENCES
TABLE A
TABLE B
TABLE C
TABLE D
Claims (42)
- A fusion protein comprising: (1) a nucleic acid programmable DNA binding domain (napDNAbd) capable of binding a target dsDNA comprising: (a) a first deoxyribonucleotide (e.g., dG (deoxyguanosine), dT (thymidine), dC (deoxycytidine)) in a protospacer sequence on a nontarget strand (edited strand) of the target dsDNA, and (b) a second deoxyribonucleotide (e.g., dC (deoxycytidine), dA (deoxyadenosine), dG (deoxyguanosine)) base pairing with the first deoxyribonucleotide (e.g., dG, dT, dC) and in a target sequence on the target strand (non-edited strand) of the target dsDNA, wherein the protospacer sequence is fully reverse complementary to the target sequence; and (2) a base excising domain capable of excising the base (e.g., guanine, thymine, cytosine) of the first deoxyribonucleotide; wherein the fusion protein does not comprise a deaminase domain, e.g., an adenine or cytosine deaminase domain, e.g., TadA and variants thereof; and wherein the first deoxyribonucleotide is deoxyguanosine (dG), thymidine (dT), or deoxycytidine (dC).
- The fusion protein of any preceding claim, wherein the conversion of the first deoxyribonucleotide to the fourth deoxyribonucleotide is dG-to-dA, dG-to-dT, dG-to-dC, dT-to-dA, dT-to-dC, dT-to-dG, dC-to-dA, dC-to-dT, or dC-to-dG.
- The fusion protein of any preceding claim, wherein the base excising domain comprises a glycosylase.
- The fusion protein of any preceding claim, wherein the glycosylase is selected from the group consisting of N-methylpurine DNA glycosylase (MPG), 8-oxoguanine DNA glycosylase (OGG1), methyl-CpG binding domain 4, DNA glycosylase (MBD4), thymine DNA glycosylase (TDG), uracil DNA glycosylase (UNG), single-strand-selective monofunctional uracil-DNA glycosylase 1 (SMUG1), mutY DNA glycosylase (MUTYH), nth like DNA glycosylase 1 (NTHL1), nei like DNA glycosylase 1 (NEIL1), nei like DNA glycosylase 2 (NEIL2), nei like DNA glycosylase 3 (NEIL3), and mutants thereof capable of recognizing and excising a base from a nucleotide of a nucleic acid.
- The fusion protein of any preceding claim, wherein the base excising domain comprises an N-methylpurine DNA glycosylase (MPG) .
- The fusion protein of any preceding claim, wherein the MPG comprises an amino acid substitution relative to a reference MPG of SEQ ID NO: 7 at a position that is corresponding to a position or that is a position selected from the group consisting of N169, D175, C178, and/or Q294 of the reference MPG, wherein the position is numbered according to SEQ ID NO: 1.
- The fusion protein of any preceding claim, wherein the amino acid substitution is a substitution with R, A, N, or G.
- The fusion protein of any preceding claim, wherein the MPG comprises an amino acid substitution relative to a reference MPG of SEQ ID NO: 7 that is corresponding to a substitution or that is a substitution selected from the group consisting of N169G, D175R, C178N, Q294R, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 1.
- The fusion protein of any preceding claim, wherein the MPG comprises a combination substitution relative to a reference MPG of SEQ ID NO: 7 that is corresponding to a combination substitution of N169G, D175R, C178N, and Q294R, wherein the position is numbered according to SEQ ID NO: 1.
- The fusion protein of any preceding claim, wherein the MPG comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 8, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, or 38.
- The fusion protein of any preceding claim, wherein the MPG is (substantially) capable of excising guanine of dG.
- The fusion protein of any preceding claim, wherein the base excising domain comprises a uracil-DNA glycosylase (UNG).
- The fusion protein of any preceding claim, wherein the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 135 or 137 at a position that is corresponding to a position or that is a position selected from the group consisting of K184, A214, Q259, and/or Y284 of the reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- The fusion protein of any preceding claim, wherein the amino acid substitution is a substitution with A, D, V, or T.
- The fusion protein of any preceding claim, wherein the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 135 or 137 that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, wherein the position is numbered according to SEQ ID NO: 133.
- The fusion protein of any preceding claim, wherein the UNG comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-65, 1-66, 1-67, 1-68, 1-69, 1-70, 1-71, 1-72, 1-73, 1-74, 1-75, 1-76, 1-77, 1-78, 1-79, 1-80, 1-81, 1-82, 1-83, 1-84, 1-85, 1-86, 1-87, 1-88, 1-89, 1-90, 1-91, 1-92, 1-93, 1-94, 1-95, 1-96, 1-97, 1-98, 1-99, or 1-100 of the reference UNG of SEQ ID NO: 135 or 137, wherein the position is numbered according to SEQ ID NO: 133.
- The fusion protein of any preceding claim, wherein the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 135 that is corresponding to a substitution or that is a substitution selected from the group consisting of A214T, Q259A, Y284D, and a combination of any two or more substitutions thereof, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- The fusion protein of any preceding claim, wherein the UNG comprises an amino acid substitution relative to a reference UNG of SEQ ID NO: 137 that is corresponding to a substitution or that is a substitution selected from the group consisting of K184A, A214V, and a combination of the two substitutions, and comprises a deletion of amino acids at positions that are corresponding to positions or that are positions 1-88 of the reference UNG, wherein the position is numbered according to SEQ ID NO: 133.
- The fusion protein of any preceding claim, wherein the UNG comprising said amino acid mutation comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 56, 58, 60, 62, 135, 137, 139, 141, 144, 146, 148, 150, 152, 155, 157, and 159 or an N-terminal truncation thereof lacking the most N-terminal Methionine (M) (coded by start codon ATG).
- The fusion protein of any preceding claim, wherein the UNG is (substantially) capable of excising thymine of dT.
- The fusion protein of any preceding claim, wherein the UNG is (substantially) capable of excising cytosine of dC.
- The fusion protein of any preceding claim, wherein the napDNAbd is an RNA programmable DNA binding protein.
- The fusion protein of any preceding claim, wherein the napDNAbd is selected from the group consisting of CRISPR-associated (Cas) protein, IscB, IsrB, Argonaute, and TnpB.
- The fusion protein of any preceding claim, wherein the napDNAbd is a nickase, e.g., a Cas9 nickase, an IscB nickase.
- The fusion protein of any preceding claim, wherein the napDNAbd is nuclease-inactive, e.g., dead Cas9, dead Cas12i.
- The fusion protein of any preceding claim, wherein the napDNAbd comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 2, 48, 50, 52, or 163.
- The fusion protein of any preceding claim, wherein the fusion protein comprises, from N-terminal to C-terminal, (1) the napDNAbp and the base excising domain; or (2) the base excising domain and the napDNAbp.
- The fusion protein of any preceding claim, wherein the napDNAbd (e.g., Cas9) is a two-part napDNAbd, for example, a two-part split Cas9, comprising an N-terminal portion and a C-terminal portion, and wherein the fusion protein comprises, from N-terminal to C-terminal, (1) the N-terminal portion of the napDNAbd, the base excising domain, and the C-terminal portion of the napDNAbd; (2) the C-terminal portion of the napDNAbd, the base excising domain, and the N-terminal portion of the napDNAbd; or (3) the base excising domain, the C-terminal portion of the napDNAbd (e.g., amino acids at positions 1249-1368), and the N-terminal portion (e.g., amino acids at positions 1-1248) of the napDNAbd.
- The fusion protein of any preceding claim, wherein the napDNAbd is SpCas9 (e.g., a SpCas9 nickase) or a mutant thereof (e.g., a SpG Cas9 nickase).
- The fusion protein of any preceding claim, wherein the N-terminal portion of the napDNAbd is the amino acids of the napDNAbp at positions 1 or 2 to 1012, 1028, 1041, 1046, 1047, 1248, 1249, or 1300.
- The fusion protein of any preceding claim, wherein the C-terminal portion of the napDNAbd is the amino acids of the napDNAbp at positions 1013, 1029, 1042, 1047, 1048, 1249, 1063, 1064, 1230, 1249, or 1301 to 1368.
- The fusion protein of any preceding claim, wherein the fusion protein comprises the base excising domain embedded between positions 2-1248 of nCas9 (SEQ ID NO: 2) and positions 1249-1368 of nCas9 (SEQ ID NO: 2), wherein the first amino acid residue D of nCas9 (SEQ ID NO: 2) was designated as position 2; or embedded between positions 2-1047 of nCas9 (SEQ ID NO: 2) and positions 1064-1368 of nCas9 (SEQ ID NO: 2), wherein the first amino acid residue D of nCas9 (SEQ ID NO: 2) was designated as position 2.
- The fusion protein of any preceding claim, wherein the fusion protein comprises an amino acid sequence having a sequence identity of at least about 60% (e.g., at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 12, 14, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 55, 57, 59, 61, 63, 136, 138, 140, 142, 143, 145, 147, 149, 151, 153, 154, 156, 158, 160, 161, 162, and 164.
- A system comprising: (i) the fusion protein of any preceding claim or a polynucleotide encoding the fusion protein; and (ii) a guide nucleic acid or a polynucleotide encoding the guide nucleic acid, the guide nucleic acid comprising: (1) a scaffold sequence capable of forming a complex with the napDNAbd; and (2) a guide sequence capable of hybridizing to the target sequence on the target strand of the target dsDNA, thereby guiding the complex to the target dsDNA.
- The fusion protein of any preceding claim, wherein the guide nucleic acid is a guide RNA (gRNA) .
- The fusion protein of any preceding claim, wherein the scaffold sequence has substantially the same secondary structure as the secondary structure of the sequence of SEQ ID NO: 40, 73, or 74, or wherein the scaffold sequence comprises (1) a sequence of SEQ ID NO: 40, 73, or 74 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (2) a sequence having a sequence identity of at least about 70%, 75%, 80%, 85%, 90%, 95%, or 100% to SEQ ID NO: 40, 73, or 74 or a 5’ or 3’ end truncation thereof with 1, 2, 3, 4, 5, or 6 nucleotides truncated at the 5’ or 3’ end; or (3) a sequence having at most 1, 2, 3, 4, 5, or 6 nucleotide differences, whether consecutive or not, compared to SEQ ID NO: 40, 73, or 74.
- The fusion protein or system of any preceding claim, wherein the fusion protein or system further comprises a translesion synthesis (TLS) polymerase or a recruiting domain or component capable of recruiting a TLS polymerase.
- The fusion protein or system of any preceding claim, wherein the TLS polymerase is selected from the group consisting of Polα (alpha), Polβ (beta), Polδ (delta) (PCNA), Polγ (gamma), Polη (eta), Polι (iota), Polκ (kappa), Polλ (lambda), Polμ (mu), Polν (nu), Polθ (theta), and REV1.
- A method of modifying a target dsDNA, comprising contacting the target dsDNA with the system of any preceding claim, the target dsDNA comprising: (a) a first deoxyribonucleotide (e.g., dG (deoxyguanosine), dT (thymidine), dC (deoxycytidine)) in a protospacer sequence on a nontarget strand (edited strand) of the target dsDNA, and (b) a second deoxyribonucleotide (e.g., dC (deoxycytidine), dA (deoxyadenosine), dG (deoxyguanosine)) base pairing with the first deoxyribonucleotide (e.g., dG, dT, dC) and in a target sequence on the target strand (non-edited strand) of the target dsDNA, wherein the protospacer sequence is fully reverse complementary to the target sequence; wherein the method does not include deamination of the base of the first deoxyribonucleotide before the excision of the base of the first deoxyribonucleotide.
- The method of any preceding claim, wherein the method does not include deamination of the base of the first deoxyribonucleotide.
- An MPG as defined in any preceding claim.
- A UNG as defined in any preceding claim.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202480020620.3A CN120936712A (en) | 2023-04-25 | 2024-04-25 | Novel base editor and its applications |
Applications Claiming Priority (10)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2023090660 | 2023-04-25 | ||
| CNPCT/CN2023/090660 | 2023-04-25 | ||
| CNPCT/CN2023/091734 | 2023-04-28 | ||
| CN2023091734 | 2023-04-28 | ||
| CN2023094565 | 2023-05-16 | ||
| CNPCT/CN2023/094565 | 2023-05-16 | ||
| CN2024070217 | 2024-01-02 | ||
| CNPCT/CN2024/070217 | 2024-01-02 | ||
| CN2024084498 | 2024-03-28 | ||
| CNPCT/CN2024/084498 | 2024-03-28 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024222812A1 (en) | 2024-10-31 |
Family
ID=93255591
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2024/089874 Pending WO2024222812A1 (en) | 2023-04-25 | 2024-04-25 | Novel base editors and uses thereof |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN120936712A (en) |
| WO (1) | WO2024222812A1 (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170321210A1 (en) * | 2014-11-04 | 2017-11-09 | National University Corporation Kobe University | Method for modifying genome sequence to introduce specific mutation to targeted dna sequence by base-removal reaction, and molecular complex used therein |
| US20180179503A1 (en) * | 2016-12-23 | 2018-06-28 | President And Fellows Of Harvard College | Editing of ccr5 receptor gene to protect against hiv infection |
| US20210024906A1 (en) * | 2017-11-22 | 2021-01-28 | National University Corporation Kobe University | Complex for genome editing having stability and few side-effects, and nucleic acid coding same |
| US20210403898A1 (en) * | 2020-06-30 | 2021-12-30 | Pairwise Plants Services, Inc. | Compositions, systems, and methods for base diversification |
| CN117126827A (en) * | 2023-06-20 | 2023-11-28 | 西湖大学 | Fusion protein, base editing system containing uracil-N-glycosylase mutant mediation and application |
| WO2023237063A1 (en) * | 2022-06-08 | 2023-12-14 | Huidagene Therapeutics Co., Ltd. | Novel guide nucleic acids for rna base editing systems and uses thereof |
Non-Patent Citations (2)
| Title |
|---|
| TONG HUAWEI, LIU NANA, WEI YINGHUI, ZHOU YINGSI, LI YUN, WU DANNI, JIN MING, CUI SHUNA, LI HENGBIN, LI GUOLING, ZHOU JINGXING, YUA: "Programmable deaminase-free base editors for G-to-Y conversion by engineered glycosylase", NATIONAL SCIENCE REVIEW, vol. 10, no. 8, 28 June 2023 (2023-06-28), XP093230108, ISSN: 2095-5138, DOI: 10.1093/nsr/nwad143 * |
| TONG HUAWEI, WANG HAOQIANG, LIU NANA, LI GUOLING, ZHOU YINGSI, WU DANNI, LI YUN, JIN MING, WANG XUCHEN, LI HENGBIN, WEI YINGHUI, Y: "Development of deaminase-free T-to-S base editor and C-to-G base editor by engineered human uracil DNA glycosylase", BIORXIV, 1 January 2024 (2024-01-01), XP093230111, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/2024.01.01.573809v1.full.pdf> DOI: 10.1101/2024.01.01.573809 * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119220593A (en) * | 2024-11-29 | 2024-12-31 | 三亚中国农业科学院国家南繁研究院 | Base editor for plants C to K |
| CN119220593B (en) * | 2024-11-29 | 2025-04-01 | 三亚中国农业科学院国家南繁研究院 | A C-to-K base editor for plants |
| CN119241729A (en) * | 2024-12-09 | 2025-01-03 | 三亚中国农业科学院国家南繁研究院 | Base editor for plants T to G |
Also Published As
| Publication number | Publication date |
|---|---|
| CN120936712A (en) | 2025-11-11 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24796188; Country of ref document: EP; Kind code of ref document: A1 |