US20200172895A1 - Using split deaminases to limit unwanted off-target base editor deamination - Google Patents

Using split deaminases to limit unwanted off-target base editor deamination

Info

Publication number
US20200172895A1
Authority
US
United States
Prior art keywords
split
ncas9
cell
deaminase
fusion protein
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US16/615,538
Inventor
J. Keith Joung
James Angstman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
General Hospital Corp
Original Assignee
General Hospital Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by General Hospital Corp
Priority to US16/615,538
Publication of US20200172895A1
Assigned to THE GENERAL HOSPITAL CORPORATION. Assignment of assignors interest (see document for details). Assignors: ANGSTMAN, James; JOUNG, J. Keith
Legal status: Abandoned

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • C07K14/4701 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals not used
    • C07K14/4702 Regulators; Modulating activity
    • C07K14/4703 Inhibitors; Suppressors
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N5/00 Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
    • C12N5/06 Animal cells or tissues; Human cells or tissues
    • C12N5/0602 Vertebrate cells
    • C12N5/0634 Cells from the blood or the immune system
    • C12N5/0647 Haematopoietic stem cells; Uncommitted or multipotent progenitors
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/70 Fusion polypeptide containing domain for protein-protein interaction
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/80 Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/80 Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor
    • C07K2319/81 Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor containing a Zn-finger domain for DNA binding
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y305/00 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04001 Cytosine deaminase (3.5.4.1)

Definitions

  • Described herein are methods and compositions for improving the genome-wide specificities of targeted base editing technologies.
  • Base editing (BE) technologies use an engineered DNA binding domain (such as RNA-guided, catalytically inactive Cas9 (dead Cas9 or dCas9), a nickase version of Cas9 (nCas9), or zinc finger (ZF) arrays) to recruit a cytosine deaminase domain to a specific genomic location to effect site-specific cytosine→thymine transition substitutions (refs. 1, 2).
  • BEs are a particularly attractive tool for treating genetic diseases that manifest in cellular contexts where making precise mutations by homology directed repair (HDR) would be therapeutically beneficial but are difficult to create with traditional nuclease-based genome editing technology.
  • HDR homology directed repair
  • HDR high-length indel mutations
  • BE technology has the potential to allow practitioners to make highly controllable, highly precise mutations without the need for cell-type-variable DNA repair mechanisms.
  • Base editor platforms possess the unique capability to generate precise, user-defined genome-editing events without the need for a donor DNA molecule.
  • Base Editors that include a single-strand nicking CRISPR-Cas9 (nCas9) protein fused to a cytosine deaminase domain and uracil glycosylase inhibitor (UGI) domains (e.g., BE3) efficiently induce cytosine-to-thymine (C-to-T) base transitions in a site-specific manner as determined by the CRISPR guide RNA (gRNA) spacer sequence (ref. 1).
  • BEs that use split deaminases (sDA) that are functional when brought into close proximity to each other, one fused to a ZF and one to an nCas9-UGI protein comprising one or more UGIs, so as to limit the ability of the deaminase domain to deaminate at off-target ssDNA sites independent of nCas9 R-loop formation.
  • sDA split deaminases
  • fusion proteins comprising: (i) a first portion of a split deaminase (“sDA1”) enzyme fused to a programmable DNA-binding domain, preferably selected from the group consisting of a ZF, TALE, Cas9, catalytically inactive Cas9 (dCas9) or Cas9 ortholog (i.e., a homologous protein from another species such as dCpf1), nicking Cas9 (nCas9) or nicking Cas9 ortholog, wherein the sDA1 is an N-terminal truncated, catalytically inactive or deficient derivative of a parental deaminase selected from the group consisting of hAID, rAPOBEC1, mAPOBEC3, hAPOBEC3A, hAPOBEC3B, hAPOBEC3C, hAPOBEC3F, hAPOBEC3G, or hAPOBEC3H, and variants thereof,
  • the split deaminases are not full length proteins, but are fragments thereof, wherein the co-expression of a fusion protein of (i) with a fusion protein of (ii) comprising a sDA1 and sDA2 portion from the same parental deaminase in eukaryotic cells, and their subsequent co-localization at adjacent genomic target sites, provides a catalytically active base-editor.
  • sDA1 and sDA2 are used herein to refer to the first and second split deaminases generally, and do not refer specifically to the exemplary split deaminases described herein.
  • nucleic acids encoding the fusion proteins described herein, and compositions comprising one or more of those nucleic acids, e.g., wherein the nucleic acids encode a pair of the fusion proteins, e.g., comprising a sDA1 and sDA2 portion from the same parental deaminase.
  • vectors comprising the nucleic acids, and isolated host cells comprising and optionally expressing the nucleic acids.
  • the host cell is a stem cell, e.g., a hematopoietic stem cell.
  • fusion proteins described herein comprising a sDA1 and sDA2 portion from the same parental deaminase, as well as one or more gRNAs that interact with Cas9 domains in the fusion proteins.
  • one of the fusion proteins comprises nCas9
  • the other fusion protein comprises ZF or TALE
  • the ZF or TALE is targeted to a sequence of 9-24 bp adjacent to the target site of the gRNA for the nCas9, wherein the gRNA binds to the nucleic acid comprising the selected cytosine.
  • the nucleic acid is in a cell, e.g., a eukaryotic cell, and the method comprises contacting the cell with the fusion proteins or expressing the fusion proteins in the cell.
  • one of the fusion proteins comprises nCas9
  • the other fusion protein comprises ZF or TALE
  • the ZF or TALE is targeted to a sequence of 9-24 bp adjacent to the target site of the gRNA for the nCas9, wherein the gRNA binds to the nucleic acid comprising the selected cytosine.
  • the fusion protein is delivered as an RNP, mRNA, or plasmid.
  • compositions comprising a purified fusion protein or pair of fusion proteins described herein, preferably a pair of fusion proteins described herein comprising a sDA1 and sDA2 portion from the same parental deaminase, and optionally one or more gRNAs that interact with Cas9 domains in the fusion proteins.
  • the composition comprises one or more ribonucleoprotein (RNP) complexes.
  • RNP ribonucleoprotein
  • the fusion protein is delivered as an RNP, mRNA, or plasmid DNA.
  • Also provided herein are methods for deaminating a selected cytosine in a nucleic acid, the method comprising contacting the nucleic acid with a fusion protein or base editing system described herein.
  • compositions comprising a purified fusion protein or base editing system as described herein.
  • nucleic acids encoding a fusion protein or base editing system described herein, as well as vectors comprising the nucleic acids, and host cells comprising the nucleic acids, e.g., stem cells, e.g., hematopoietic stem cells.
  • FIG. 1 Diagram of an exemplary typical high efficiency base editing setup.
  • a nicking Cas9 bearing a catalytically inactivating mutation at one of its two nuclease domains binds to the target site dictated by the variable spacer sequence of the gRNA.
  • the formation of a stable R-loop creates a ssDNA editing window on the non-target strand.
  • the Cas9 creates a single strand break in the genomic DNA, prompting the host cell to repair the lesion using the deaminated strand as a template, thus biasing repair towards the cytosine→thymine transition substitution. See Komor et al., 2016.
  • FIGS. 2A-2G Schematic representation of: 2A. First-generation base editor targeting and deaminating at an on-target site, with a deaminase targeting an R-loop generated by an on-target nCas9.
  • 2B. First-generation base-editor binding to and deaminating an off-target genomic R-loop independent of its nCas9 targeting capabilities.
  • 2C. First-generation base-editor binding to and deaminating an off-target genomic transcription bubble independent of its nCas9 targeting capabilities.
  • FIGS. 3A-3B hAPOBEC3G with representative candidate split sites. Multiple rotational views of the hAPOBEC3G structure are shown. Magenta colored loop regions are candidate split sites selected on the basis of their lack of secondary structure and their distance from the catalytic center. PDB: 3E1U.
  • FIG. 4 C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.1-3AC3L-ZF-C and N-sDA2.1-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated.
  • FIG. 5 C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.1-3AC3L-ZF-C and N-sDA2.2-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. The sDA1.1 and sDA2.2 pair did not stimulate discernable C-to-T conversion in any orientation attempted. EGFP target sequence,
  • FIG. 6 C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.2-3AC3L-ZF-C and N-sDA2.1-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. Low-level C-to-T mutations are observed primarily when using gRNA2 with either ZF, with gRNA1 experiments yielding detectable but diminished levels of activity. EGFP target sequence,
  • FIG. 7 C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.2-3AC3L-ZF-C and N-sDA2.2-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. Low-level C-to-T mutations are observed primarily when using gRNA2 with either ZF, with gRNA1 experiments yielding detectable but diminished levels of activity. EGFP target sequence,
  • FIG. 8 C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.2-3AC3L-ZF-C and N-sDA2.3-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • FIG. 9 C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.3-3AC3L-ZF-C and N-sDA2.2-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • FIG. 10 C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.2-3AC3L-ZF-C and N-sDA2.3-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • FIG. 11 C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.3-3AC3L-ZF-C and N-sDA2.4-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • FIG. 12 C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.4-3AC3L-ZF-C and N-sDA2.3-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • FIG. 13 C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.4-3AC3L-ZF-C and N-sDA2.4-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • FIG. 14 C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.5-3AC3L-ZF-C and N-sDA2.4-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • FIG. 15 C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.5-3AC3L-ZF-C and N-sDA2.6-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • FIG. 16 C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.6-3AC3L-ZF-C and N-sDA2.6-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • FIG. 17 C-to-T conversion data with first-generation BE3 (described in reference 1) with both gRNAs used in this study. (Note that the coloration gradient of these samples is shaded lighter than in the graphs above and that direct comparison requires evaluation of relative numerical rates.) Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM). EGFP target sequence,
  • FIG. 18 C-to-T conversion rates of individual N-sDA1-ZF-C proteins without an adjacent sDA2-nCas9-UGI. No discernable editing observed. EGFP target sequence,
  • FIG. 19 C-to-T conversion rates of individual N-sDA2-nCas9-UGI-C proteins without an adjacent N-sDA1-ZF-C. No discernable editing was observed.
  • EGFP target sequence
  • FIG. 20 Evidence of C-to-T conversion when using adjacently-targeting N-sDA1.X-NLS-ZF-C and N-sDA2.X-nCas9-UGI-C human APOBEC3A (hA3A) split Base Editors in the indicated orientation. Pointed boxes representing the nCas9 gRNA binding site (gRNA2) and ZF binding site (ZF1) are shown, with the pointed ends indicating the PAM-proximal end of the gRNA and the N→C orientation of the ZF, respectively. Conversion rates at each position are indicated by shaded boxes.
  • Rates of deamination by split BE pairs are around 2.5% per cytosine using the sDA1.6+sDA2.6 configuration and around 1.7% per cytosine for the sDA1.1+sDA2.1 configuration, while a hAPOBEC3A-nCas9-UGI positive control possessed 3-4× the on-target activity of active hA3A halfase pairs.
  • gRNA target region
  • FIGS. 21A-21D Summary of C-to-T conversion rate of all rAPO1 halfase combination base editors as compared to a benchmark BE3 base editor at an integrated EGFP locus. The sum of total C-to-T editing percentages among three cytosines within or near the target gRNA's approximate editing window is shown, as averaged between two replicates.
  • 21A shows the ZF1+gRNA1 data
  • 21B shows the ZF1+gRNA2 data
  • 21C shows the ZF2+gRNA1 data
  • 21D shows the ZF2+gRNA2 data.
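The summary statistic described for FIGS. 21A-21D (sum the C-to-T percentages over the cytosines of interest within each replicate, then average the replicates) can be sketched as follows. This is a minimal illustration only: the function name, the position keys, and all percentage values are invented for the example and are not data from the figures.

```python
from statistics import mean

def summed_c_to_t(replicates, positions):
    """Sum C-to-T percentages over the chosen cytosine positions
    within each replicate, then average across replicates."""
    per_replicate = [sum(rep[p] for p in positions) for rep in replicates]
    return mean(per_replicate)

# Hypothetical per-position C-to-T percentages for two replicates
# (placeholder numbers, not taken from the patent figures).
rep1 = {4: 1.0, 6: 2.0, 8: 0.5}
rep2 = {4: 2.0, 6: 1.0, 8: 1.5}

total = summed_c_to_t([rep1, rep2], positions=[4, 6, 8])
print(total)
```

Summing before averaging keeps each replicate's total visible, which makes it easier to see whether the two replicates agree.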
  • FIG. 22 Representation of a portion of the EGFP reporter gene and the target sites used for the rAPO1 halfase combination experiments.
  • EGFP target region
  • a cytosine deaminase (DA) domain and uracil glycosylase inhibitor (UGI; a small bacteriophage protein that inhibits host cell uracil DNA glycosylase (UDG), the enzyme responsible for excising uracil from the genome (refs. 1, 4)) are both fused to nCas9 (derived from either Streptococcus pyogenes Cas9 (SpCas9) or Staphylococcus aureus Cas9 (SaCas9)).
  • the nCas9 forms an R-loop at a target site specified by its single guide RNA (gRNA) and recognition of an adjacent protospacer adjacent motif (PAM), leaving approximately 4-8 nucleotides of the non-target strand exposed as single stranded DNA (ssDNA) near the PAM-distal end of the R-loop ( FIG. 1 ).
  • This region of the ssDNA is the template that is able to be deaminated by the ssDNA-specific DA domain to produce a guanosine:uracil (G:U) mismatch and defines the editing window.
  • the nCas9 nicks the non-deaminated strand of DNA, biasing conversion of the G:U mismatch to an adenine:thymine (A:T) base pair by directing the cell to repair the nick lesion using the deaminated strand as a template.
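The editing-window idea can be illustrated with a short sketch. It assumes that positions are numbered from 1 at the PAM-distal end of the gRNA target site and that the window covers residues 4-8, as the figure captions state approximately; the example sequence and both function names are hypothetical. The second function models the idealized outcome in which every cytosine in the window is converted, whereas real conversion rates are partial and position dependent.

```python
def editable_cytosines(target_site, window=(4, 8)):
    """Return 1-based positions of cytosines inside the editing window.

    Positions are counted from the PAM-distal end of the target site
    (an assumption made for this illustration).
    """
    start, end = window
    return [i for i, base in enumerate(target_site.upper(), start=1)
            if base == "C" and start <= i <= end]

def apply_c_to_t(target_site, positions):
    """Return the sequence with the given cytosines replaced by thymines."""
    seq = list(target_site.upper())
    for p in positions:
        if seq[p - 1] != "C":
            raise ValueError(f"position {p} is not a cytosine")
        seq[p - 1] = "T"
    return "".join(seq)

# Hypothetical 20-nt target site, not a sequence from the patent.
site = "GACCTCAGGTACGTTAGCAA"
hits = editable_cytosines(site)
edited = apply_c_to_t(site, hits)
print(hits, edited)
```

In this example the cytosines at positions 3, 12, and 18 lie outside the assumed window and are left unchanged.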
  • deaminase domains described in these fusion proteins have been rat APOBEC1 (rAPO1), an activation-induced cytosine deaminase (AID) derived from lamprey termed CDA (PmCDA), human AID (hAID), a hyperactive form of hAID lacking a nuclear export signal, or an engineered variant of human APOBEC3A (hA3A) termed eA3A (refs. 1-2, 5-7, 16). Any of the deaminase domains from these BEs can be used as parental deaminases in the present fusion proteins.
  • nSpCas9 nCas9 domain
  • any Cas9-like nickase could be used based on any ortholog of the Cas9 protein (including the related Cpf1 enzyme class) to perform this function, unless specifically indicated.
  • a completely enzymatically dead dCas9 or Cas9-like enzyme can also be used as the targeting mechanism of a functional BE enzyme.
  • BE in therapeutic settings will be to assess its genome-wide capacity for off-target mutagenesis and to modify the technology to minimize or, ideally, to eliminate the risks of stimulating deleterious off-target mutations.
  • BEs that can be used to reduce or eliminate potential unwanted BE mutagenesis.
  • Because of AID/APOBEC enzymes' natural ability to bind and deaminate cytosines in genomic DNA and cytosines in RNA, non-specific spurious deamination events are a possibly important source of off-target mutagenesis in the genome and transcriptome from CRISPR Base Editor technology.
  • BE's nCas9 domain and any potential dCas9, TALE, and/or ZF domains
  • this might do nothing to prevent the natural RNA- and ssDNA-targeting ability of the APOBEC enzyme from non-specifically deaminating globally across the transcriptome or whichever regions of the genome are exposed as ssDNA, such as actively transcribed regions or DNA undergoing replication.
  • an E. coli-based assay examining deaminases showed that an actively transcribed region could be highly enriched (~7-530 fold) for C→T transition mutations when exposed to various overexpressed mammalian deaminases 4 .
  • one group has found that co-expression of PmCda1 and nCas9 as two separate, untethered proteins in yeast cells results in similar levels of deamination at the gRNA-specified target site as when the two components are expressed as direct fusion partners, demonstrating that these proteins are capable of deaminating ssDNA from solution without an affinity tether to the genomic location 5 .
  • sDA1 split deaminase
  • ZF any DNA targeting domain orthogonal to Cas9, such as Cpf1, TALE, ZF, or a dCas9 orthogonal to the nCas9 used to target sDA2, may be suitable
  • a reciprocal or somewhat overlapping C-terminal truncation of a deaminase fused to an nCas9-UGI fusion protein such that the N-terminal truncation and the C-terminal truncation together form a functional enzyme.
  • the exemplary BEs were made in a similar orientation to the first-generation BE3 enzyme (sDA2-nCas9-UGI) targeting an adjacent sequence with a ~17-24 bp target site 1 .
  • a yeast cytosine deaminase yCD
  • yCD yeast cytosine deaminase
  • we used APOBEC structural information to determine the unstructured linker regions as potential sites at which to split APOBEC enzymes ( FIGS. 3A-3B ), since those sites may be less likely to affect overall functionality or folding of the constituent subdomains.
  • This split deaminase strategy can be used with wild-type versions of deaminase enzymes, and also any engineered variants that may be described, with the split BE potentially retaining any special features of the engineered deaminases 16 .
  • a split BE should generally increase the specificity of editing compared to typical BEs by virtue of the fact that the split BE system requires the binding of a higher number of sequential/adjacent DNA bases, thereby decreasing the off-target effects conferred by off-target binding of either halfase on its own.
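The specificity argument above can be made concrete with a back-of-the-envelope calculation. The sketch below estimates how often a fully specified site of a given length would occur by chance in uniformly random sequence. Everything numeric here is an assumption for illustration: the search space (both strands of a roughly 3.1 Gb genome), the 22 specified bases for a single halfase (a 20-nt spacer plus the GG of an NGG PAM), and a hypothetical 9-bp zinc finger site for the second halfase. It ignores mismatch tolerance, spacing constraints, and the non-random composition of real genomes.

```python
def expected_random_matches(n_bases, search_space):
    """Expected number of exact matches to a site of n_bases fully
    specified positions among search_space positions of random sequence."""
    return search_space / 4 ** n_bases


SEARCH_SPACE = 2 * 3.1e9  # assumption: both strands of a ~3.1 Gb genome

single = expected_random_matches(22, SEARCH_SPACE)      # one halfase alone
paired = expected_random_matches(22 + 9, SEARCH_SPACE)  # both halfases adjacent
print(single, paired, single / paired)
```

Under these assumptions each additional specified base lowers the chance-match expectation fourfold, which is the intuition behind requiring two adjacent binding events.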
  • CRISPR BE architectures are known to induce C-to-T mutations in human cells at some genomic sites that are imperfect matches to their gRNAs 13 , and since ZFs are known to bind with some capacity to off-target sites, it stands to reason that a ZF-BE architecture would also induce off-target mutagenesis to some capacity 14 .
  • CRISPR/Cas-based targeting system including Cas9s from Streptococcus pyogenes or Staphylococcus aureus or Cpf1 proteins from various organisms could be used in place of the nCas9 portion of the sDA2-nCas9-UGI fusion protein, so long as the targeting mechanism results in specific DNA binding and the creation of an R-loop that exposes ssDNA to action by the reconstituted split deaminase.
  • Table 1 contains a list of representative CRISPR/Cas targeting systems and the residues/mutations therein known to be important for creating nickase and catalytically inactive (dead) mutants.
  • ZF domains are chosen as the DNA binding domain for sDA1 due to their small size, presumed lack of immunogenicity, and because, unlike CRISPR-based targeting systems, they do not create an R-loop upon binding and do not expose additional substrate ssDNA to the deaminase domain. In principle, however, use of any engineered DNA binding domain, such as a CRISPR-based targeting complex or a TALE DNA binding domain, could still result in functional sDA1 halfase. In the examples shown herein, ZF domains targeting an integrated EGFP gene were used for the sDA1 halfases 15 .
  • the present fusion proteins can include programmable DNA binding domains such as engineered C2H2 zinc-fingers, transcription activator-like effectors (TALEs), and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) Cas RNA-guided nucleases (RGNs) and their variants, including ssDNA nickases (nCas9) or their analogs and catalytically inactive dead Cas9 (dCas9) and its analogs, and any engineered protospacer-adjacent motif (PAM) variants.
  • a programmable DNA binding domain is one that can be engineered to bind to a selected target sequence.
  • nCas9 in general any Cas9-like nickase could be used based on any ortholog of the Cpf1 protein (including the related Cpf1 enzyme class), unless specifically indicated.
  • BV3L6 (AsCpf1); U2UMQ6; D908, 993E, Q1226, D1263; 22
  • L. bacterium N2006 (LbCpf1); A0A182DWE3; D832A; 24
  • *Predicted based on UniRule annotation on the UniProt database. **May be determinable based on sequence alignment with other Cpf1 orthologs.
  • These orthologs, and mutants and variants thereof as known in the art, can be used in any of the fusion proteins described herein.
  • the Cas9 nuclease from S. pyogenes can be guided via simple base pair complementarity between 17-20 nucleotides of an engineered guide RNA (gRNA), e.g., a single guide RNA or crRNA/tracrRNA pair, and the complementary strand of a target genomic DNA sequence of interest that lies next to a protospacer adjacent motif (PAM), e.g., a PAM matching the sequence NGG or NAG (Shen et al., Cell Res (2013); Dicarlo et al., Nucleic Acids Res (2013); Jiang et al., Nat Biotechnol 31, 233-239 (2013); Jinek et al., Elife 2, e00471 (2013); Hwang et al., Nat Biotechnol 31, 227-229 (2013); Cong et al., Science 339, 819-823 (2013); Mali et al., Science 339, 823-826 (2013c); Cho e
  • The engineered CRISPR from Prevotella and Francisella 1 (Cpf1) nuclease can also be used, e.g., as described in Zetsche et al., Cell 163, 759-771 (2015); Schunder et al., Int J Med Microbiol 303, 51-60 (2013); Makarova et al., Nat Rev Microbiol 13, 722-736 (2015); Fagerlund et al., Genome Biol 16, 251 (2015).
  • Cpf1 requires only a single 42-nt crRNA, which has 23 nt at its 3′ end that are complementary to the protospacer of the target DNA sequence (Zetsche et al., 2015).
  • SpCas9 recognizes an NGG PAM sequence that is 3′ of the protospacer
  • AsCpf1 and LbCpf1 recognize TTTN PAMs that are found 5′ of the protospacer (Id.).
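The difference in PAM placement described above (NGG located 3′ of the protospacer for SpCas9, TTTN located 5′ of the protospacer for AsCpf1 and LbCpf1) can be sketched as a simple site scan. The function names and test sequences below are hypothetical, the spacer lengths of 20 nt and 23 nt are taken from the ranges stated in the text, and only the strand supplied is scanned; a practical tool would also scan the reverse complement.

```python
def pam_matches(pattern, candidate):
    """True if candidate fits the PAM pattern, where N matches any base."""
    return len(pattern) == len(candidate) and all(
        p == "N" or p == b for p, b in zip(pattern, candidate)
    )


def find_sites(seq, pam, spacer_len, pam_side):
    """Return (protospacer, pam) pairs found on the given strand.

    pam_side is "3prime" (PAM follows the protospacer, as for SpCas9)
    or "5prime" (PAM precedes the protospacer, as for Cpf1).
    """
    if pam_side not in ("3prime", "5prime"):
        raise ValueError("pam_side must be '3prime' or '5prime'")
    seq = seq.upper()
    total = spacer_len + len(pam)
    hits = []
    for i in range(len(seq) - total + 1):
        if pam_side == "3prime":
            spacer = seq[i:i + spacer_len]
            pam_seq = seq[i + spacer_len:i + total]
        else:
            pam_seq = seq[i:i + len(pam)]
            spacer = seq[i + len(pam):i + total]
        if pam_matches(pam, pam_seq):
            hits.append((spacer, pam_seq))
    return hits


# Hypothetical sequences, for illustration only
print(find_sites("ACGTACGTACGTACGTACGTAGG", "NGG", 20, "3prime"))
print(find_sites("TTTC" + "A" * 23, "TTTN", 23, "5prime"))
```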
  • the present system utilizes a wild type or variant Cas9 protein from S. pyogenes or Staphylococcus aureus , or a wild type Cpf1 protein from Acidaminococcus sp. BV3L6 or Lachnospiraceae bacterium ND2006 either as encoded in bacteria or codon-optimized for expression in mammalian cells and/or modified in its PAM recognition specificity and/or its genome-wide specificity.
  • a number of variants have been described; see, e.g., WO 2016/141224, PCT/US2016/049147, Kleinstiver et al., Nat Biotechnol.
  • the guide RNA is expressed or present in the cell together with the Cas9 or Cpf1. Either the guide RNA or the nuclease, or both, can be expressed transiently or stably in the cell or introduced as a purified protein or nucleic acid.
  • the Cas9 also includes one of the following mutations, which reduce nuclease activity of the Cas9; e.g., for SpCas9, mutations at D10A or H840A (which creates a single-strand nickase).
  • the SpCas9 variants also include mutations at one of the following amino acid positions, which destroy the nuclease activity of the Cas9: D10, E762, D839, H983, or D986 and H840 or N863, e.g., D10A/D10N and H840A/H840N/H840Y, to render the nuclease portion of the protein catalytically inactive; substitutions at these positions could be alanine (as they are in Nishimasu et al., Cell 156, 935-949 (2014)), or other residues, e.g., glutamine, asparagine, tyrosine, serine, or aspartate, e.g., E762Q, H983N, H983Y, D986N, N863D, N863S, or N863H (see WO 2014/152432).
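The substitution notation used above (wild-type residue, 1-based position, replacement residue, as in D10A or H840A) can be applied programmatically. The sketch below is a minimal illustration; the function name is hypothetical and the short sequence is a toy placeholder rather than a full-length Cas9 sequence.

```python
import re


def apply_substitution(protein, mutation):
    """Apply a point substitution written as <wild-type><position><new>."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    if position < 1 or position > len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {protein[position - 1]}"
        )
    return protein[:position - 1] + new + protein[position:]


toy_sequence = "MDKKYSIGLDIG"  # toy placeholder, 12 residues
print(apply_substitution(toy_sequence, "D10A"))
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are common when positions refer to a different reference sequence.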
  • the Cas9 is fused to one or more Uracil glycosylase inhibitor (UGI) protein sequences;
  • UGI Uracil glycosylase inhibitor
  • UGIs are at the C-terminus of a BE fusion protein, but could conceivably be at the N-terminus, or between the DNA binding domain and the sDA domain. Linkers as known in the art can be used to separate domains.
  • Transcription activator like effectors of plant pathogenic bacteria in the genus Xanthomonas play important roles in disease, or trigger defense, by binding host DNA and activating effector-specific host genes. Specificity depends on an effector-variable number of imperfect, typically ~33-35 amino acid repeats. Polymorphisms are present primarily at repeat positions 12 and 13, which are referred to herein as the repeat variable-diresidue (RVD).
  • RVDs of TAL effectors correspond to the nucleotides in their target sites in a direct, linear fashion, one RVD to one nucleotide, with some degeneracy and no apparent context dependence.
  • the polymorphic region that grants nucleotide specificity may be expressed as a triresidue or triplet.
  • Each DNA binding repeat can include a RVD that determines recognition of a base pair in the target DNA sequence, wherein each DNA binding repeat is responsible for recognizing one base pair in the target DNA sequence.
  • the RVD can comprise one or more of: HA for recognizing C; ND for recognizing C; HI for recognizing C; HN for recognizing G; NA for recognizing G; SN for recognizing G or A; YG for recognizing T; and NK for recognizing G, and one or more of: HD for recognizing C; NG for recognizing T; NI for recognizing A; NN for recognizing G or A; NS for recognizing A or C or G or T; N* for recognizing C or T, wherein * represents a gap in the second position of the RVD; HG for recognizing T; H* for recognizing T, wherein * represents a gap in the second position of the RVD; and IG for recognizing T.
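The one-RVD-to-one-nucleotide correspondence listed above can be captured as a lookup table. The sketch below transcribes the RVD assignments from the preceding text and reports the predicted target as an IUPAC string (R for G or A, Y for C or T, N for any base). The dictionary and function names are hypothetical, and the example RVD array is invented for illustration.

```python
# RVD -> recognized bases, transcribed from the list in the text
RVD_TO_BASES = {
    "HA": "C", "ND": "C", "HI": "C", "HN": "G", "NA": "G", "SN": "AG",
    "YG": "T", "NK": "G", "HD": "C", "NG": "T", "NI": "A", "NN": "AG",
    "NS": "ACGT", "N*": "CT", "HG": "T", "H*": "T", "IG": "T",
}

# Sorted base sets -> IUPAC nucleotide codes
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "AG": "R", "CT": "Y", "ACGT": "N"}


def rvds_to_target(rvds):
    """Translate an ordered list of RVDs into an IUPAC target string."""
    return "".join(IUPAC[RVD_TO_BASES[rvd]] for rvd in rvds)


# Invented RVD array, for illustration only
print(rvds_to_target(["HD", "NG", "NI", "NN", "NS", "N*"]))
```

An unknown RVD raises a KeyError, which is a deliberate choice here so that typos in an array are not silently ignored.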
  • TALE proteins may be useful in research and biotechnology as targeted chimeric nucleases that can facilitate homologous recombination in genome engineering (e.g., to add or enhance traits useful for biofuels or biorenewables in plants). These proteins also may be useful as, for example, transcription factors, and especially for therapeutic applications requiring a very high level of specificity such as therapeutics against pathogens (e.g., viruses) as non-limiting examples.
  • MegaTALs are a fusion of a meganuclease with a TAL effector; see, e.g., Boissel et al., Nucl. Acids Res. 42(4):2591-2601 (2014); Boissel and Scharenberg, Methods Mol Biol. 2015; 1239:171-96.
  • Zinc finger (ZF) proteins are DNA-binding proteins that contain one or more zinc fingers, independently folded zinc-containing mini-domains, the structure of which is well known in the art and defined in, for example, Miller et al., 1985, EMBO J., 4:1609; Berg, 1988, Proc. Natl. Acad. Sci. USA, 85:99; Lee et al., 1989, Science. 245:635; and Klug, 1993, Gene, 135:83.
  • Crystal structures of the zinc finger protein Zif268 and its variants bound to DNA show a semi-conserved pattern of interactions, in which typically three amino acids from the alpha-helix of the zinc finger contact three adjacent base pairs or a “subsite” in the DNA (Pavletich et al., 1991, Science, 252:809; Elrod-Erickson et al., 1998, Structure, 6:451).
  • the crystal structure of Zif268 suggested that zinc finger DNA-binding domains might function in a modular manner with a one-to-one interaction between a zinc finger and a three-base-pair “subsite” in the DNA sequence.
  • multiple zinc fingers are typically linked together in a tandem array to achieve sequence-specific recognition of a contiguous DNA sequence (Klug, 1993, Gene 135:83).
  • Such recombinant zinc finger proteins can be fused to functional domains, such as transcriptional activators, transcriptional repressors, methylation domains, and nucleases to regulate gene expression, alter DNA methylation, and introduce targeted alterations into genomes of model organisms, plants, and human cells (Carroll, 2008, Gene Ther., 15:1463-68; Cathomen, 2008, Mol. Ther., 16:1200-07; Wu et al., 2007, Cell. Mol. Life Sci., 64:2933-44).
  • One existing method for engineering zinc finger arrays, known as “modular assembly,” advocates the simple joining together of pre-selected zinc finger modules into arrays (Segal et al., 2003, Biochemistry, 42:2137-48; Beerli et al., 2002, Nat. Biotechnol., 20:135-141; Mandell et al., 2006, Nucleic Acids Res., 34:W516-523; Carroll et al., 2006, Nat. Protoc. 1:1329-41; Liu et al., 2002, J. Biol. Chem., 277:3850-56; Bae et al., 2003, Nat. Biotechnol., 21:275-280; Wright et al., 2006, Nat.
  • the base editor is a deaminase that modifies cytosine DNA bases, e.g., a cytosine deaminase from the apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like (APOBEC) family of deaminases, including APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D/E, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4 (see, e.g., Yang et al., J Genet Genomics. 2017 Sep.
  • APOBEC catalytic polypeptide-like family of deaminases
  • activation-induced cytosine deaminase AID
  • AICDA activation induced cytosine deaminase
  • CDA1 cytosine deaminase 1
  • CDA2 cytosine deaminase acting on tRNA
  • Table 2 provides exemplary sequences; other sequences can also be used.
  • split deaminase regions are shown in Table 3.
  • Each split region listed in Table 3 represents a region of the enzyme either known to be a linker region devoid of secondary structure and positioned away from enzymatically important functions or predicted to be linker based on alignment with hAPOBEC3G where structural information is lacking (* indicates which proteins lack sufficient structural information).
  • Unstructured recognition loops were not included due to their importance in determining substrate binding and specificity. All protein sequences acquired from uniprot.org. All positional information refers to positions within the full-length protein sequences as described below. Candidate split regions described only indicate our best attempt at a priori prediction of which splits will be functional.
  • the split deaminase regions can include mutations that may enhance base editing, e.g., when made to the nCas9-UGI portion, e.g., mutations corresponding to W90, R126, or R132 of SEQ ID NO:46, e.g., corresponding to W90Y, R126E, R132E, of SEQ ID NO:46 (see, e.g., Kim et al. “Increasing the Genome-Targeting Scope and Precision of Base Editing with Engineered Cas9-Cytosine Deaminase Fusions.” Nature Biotechnology 35(4):371-376 (2017)).
  • the split deaminase regions can include mutations at positions corresponding to one or more of N57, Y130, or K60 of SEQ ID NO:49, e.g., mutations corresponding to N57G, N57A, N57Q, Y130F, K60D of SEQ ID NO:49 (see, e.g., reference 17).
  • the components of the fusion proteins are at least 80%, e.g., at least 85%, 90%, 95%, 97%, or 99% identical to the amino acid sequence of an exemplary sequence (e.g., as provided herein), e.g., have differences at up to 1%, 2%, 5%, 10%, 15%, or 20% of the residues of the exemplary sequence replaced, e.g., with conservative mutations, e.g., including or in addition to the mutations described herein.
  • the variant retains desired activity of the parent, e.g., nickase activity, and/or the ability to interact with a guide RNA and/or target DNA, optionally with improved specificity or altered substrate specificity.
  • the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes).
  • the length of a reference sequence aligned for comparison purposes is at least 80% of the length of the reference sequence, and in some embodiments is at least 90% or 100%.
  • the amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared.
  • nucleic acid “identity” is equivalent to nucleic acid “homology”.
  • the percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences. Percent identity between two polypeptides or nucleic acid sequences is determined in various ways that are within the skill in the art, for instance, using publicly available computer software such as Smith Waterman Alignment (Smith, T. F. and M. S.
  • the length of comparison can be any length, up to and including full length (e.g., 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 100%).
  • at least 80% of the full length of the sequence is aligned.
  • the comparison of sequences and determination of percent identity between two sequences can be accomplished using a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
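Once two sequences have been aligned, percent identity reduces to counting identical columns. The sketch below assumes the alignment has already been computed elsewhere (for example with the scoring parameters given above) and that gaps are written as "-". The function name is hypothetical, and the choice of denominator (all alignment columns, including gap columns) is one convention among several; alignment tools differ on this point.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a precomputed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


# Hypothetical aligned fragments, for illustration only
identity = percent_identity("MKVL-AGT", "MKVLSAGS")
print(identity, identity >= 80.0)
```

The final comparison shows how a threshold such as the "at least 80% identical" criterion in the text could be checked against the computed value.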
  • Conservative substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine.
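The substitution groups listed above lend themselves to a simple membership check. The sketch below encodes those six groups using one-letter amino acid codes; the names are hypothetical, and the grouping follows the text rather than any particular substitution matrix.

```python
# Groups transcribed from the text, in one-letter amino acid codes
CONSERVATIVE_GROUPS = [
    set("GA"),    # glycine, alanine
    set("VIL"),   # valine, isoleucine, leucine
    set("DENQ"),  # aspartic acid, glutamic acid, asparagine, glutamine
    set("ST"),    # serine, threonine
    set("KR"),    # lysine, arginine
    set("FY"),    # phenylalanine, tyrosine
]


def is_conservative(original, replacement):
    """True if the two different residues fall within the same group."""
    if original == replacement:
        return False  # not a substitution at all
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS
    )


print(is_conservative("D", "E"), is_conservative("G", "V"))
```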
  • isolated nucleic acids encoding the split deaminase fusion proteins
  • vectors comprising the isolated nucleic acids, optionally operably linked to one or more regulatory domains for expressing the variant proteins
  • host cells e.g., mammalian host cells, comprising the nucleic acids, and optionally expressing the variant proteins.
  • the host cells are stem cells, e.g., hematopoietic stem cells.
  • the fusion proteins include a linker between the DNA binding domain (e.g., ZFN, TALE, or nCas9) and the BE domains.
  • Linkers that can be used in these fusion proteins (or between fusion proteins in a concatenated structure) can include any sequence that does not interfere with the function of the fusion proteins.
  • the linkers are short, e.g., 2-20 amino acids, and are typically flexible (i.e., comprising amino acids with a high degree of freedom such as glycine, alanine, and serine).
  • the linker comprises one or more units consisting of GGGS (SEQ ID NO:5) or GGGGS (SEQ ID NO:6), e.g., two, three, four, or more repeats of the GGGS (SEQ ID NO:5) or GGGGS (SEQ ID NO:6) unit.
  • Other linker sequences can also be used.
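As a small illustration of the repeat-unit linkers described above, the following sketch builds a linker from GGGS or GGGGS units and places it between domain sequences. The function names and the placeholder domain strings are hypothetical; real constructs would use the actual domain sequences.

```python
def build_linker(unit="GGGGS", repeats=3):
    """Concatenate a linker unit the requested number of times."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return unit * repeats


def join_domains(domains, linker):
    """Join domain sequences N- to C-terminus with the linker between each pair."""
    return linker.join(domains)


linker = build_linker("GGGS", 2)
# Placeholder strings stand in for real domain sequences
print(join_domains(["DOMAINA", "DOMAINB"], linker), len(linker))
```

Keeping the linker length within the 2-20 amino acid range mentioned in the text is a matter of choosing the unit and repeat count accordingly.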
  • the split deaminase fusion protein includes a cell-penetrating peptide sequence that facilitates delivery to the intracellular space, e.g., HIV-derived TAT peptide, penetratins, transportans, or hCT derived cell-penetrating peptides, see, e.g., Caron et al., (2001) Mol Ther. 3(3):310-8; Langel, Cell-Penetrating Peptides: Processes and Applications (CRC Press, Boca Raton Fla. 2002); El-Andaloussi et al., (2005) Curr Pharm Des. 11(28):3597-611; and Deshayes et al., (2005) Cell Mol Life Sci. 62(16):1839-49.
  • CPPs Cell penetrating peptides
  • cytoplasm or other organelles e.g. the mitochondria and the nucleus.
  • molecules that can be delivered by CPPs include therapeutic drugs, plasmid DNA, oligonucleotides, siRNA, peptide-nucleic acid (PNA), proteins, peptides, nanoparticles, and liposomes.
  • CPPs are generally 30 amino acids or less, are derived from naturally or non-naturally occurring protein or chimeric sequences, and contain either a high relative abundance of positively charged amino acids, e.g.
  • CPPs that are commonly used in the art include Tat (Frankel et al., (1988) Cell. 55:1189-1193, Vives et al., (1997) J. Biol. Chem. 272:16010-16017), penetratin (Derossi et al., (1994) J. Biol. Chem. 269:10444-10450), polyarginine peptide sequences (Wender et al., (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008, Futaki et al., (2001) J. Biol. Chem. 276:5836-5840), and transportan (Pooga et al., (1998) Nat. Biotechnol. 16:857-861).
  • CPPs can be linked with their cargo through covalent or non-covalent strategies.
  • Methods for covalently joining a CPP and its cargo are known in the art, e.g. chemical cross-linking (Stetsenko et al., (2000) J. Org. Chem. 65:4900-4909, Gait et al. (2003) Cell. Mol. Life. Sci. 60:844-853) or cloning a fusion protein (Nagahara et al., (1998) Nat. Med. 4:1449-1453).
  • Non-covalent coupling between the cargo and short amphipathic CPPs comprising polar and non-polar domains is established through electrostatic and hydrophobic interactions.
  • CPPs have been utilized in the art to deliver potentially therapeutic biomolecules into cells. Examples include cyclosporine linked to polyarginine for immunosuppression (Rothbard et al., (2000) Nature Medicine 6(11):1253-1257), siRNA against cyclin B1 linked to a CPP called MPG for inhibiting tumorigenesis (Crombez et al., (2007) Biochem Soc. Trans. 35:44-46), tumor suppressor p53 peptides linked to CPPs to reduce cancer cell growth (Takenobu et al., (2002) Mol. Cancer Ther. 1(12):1043-1049, Snyder et al., (2004) PLoS Biol. 2:E36), and dominant negative forms of Ras or phosphoinositol 3 kinase (PI3K) fused to Tat to treat asthma (Myou et al., (2003) J. Immunol. 171:4399-4405).
  • CPPs have been utilized in the art to transport contrast agents into cells for imaging and biosensing applications.
  • green fluorescent protein (GFP) attached to Tat has been used to label cancer cells (Shokolenko et al., (2005) DNA Repair 4(4):511-518).
  • Tat conjugated to quantum dots has been used to successfully cross the blood-brain barrier for visualization of the rat brain (Santra et al., (2005) Chem. Commun. 3144-3146).
  • CPPs have also been combined with magnetic resonance imaging techniques for cell imaging (Liu et al., (2006) Biochem. and Biophys. Res. Comm. 347(1):133-140). See also Ramsey and Flynn, Pharmacol Ther. 2015 Jul. 22. pii: S0163-7258(15)00141-2.
  • the split deaminase fusion proteins can include a nuclear localization sequence, e.g., SV40 large T antigen NLS (PKKKRRV (SEQ ID NO:7)) and nucleoplasmin NLS (KRPAATKKAGQAKKKK (SEQ ID NO: 8)).
  • Other NLSs are known in the art; see, e.g., Cokol et al., EMBO Rep. 2000 Nov. 15; 1(5): 411-415; Freitas and Cunha, Curr Genomics. 2009 December; 10(8): 550-557.
  • the split deaminase fusion proteins include a moiety that has a high affinity for a ligand, for example GST, FLAG or hexahistidine sequences.
  • affinity tags can facilitate the purification of recombinant split deaminase fusion proteins.
  • the split deaminase fusion proteins described herein can be used for altering the genome of a cell.
  • the methods generally include expressing the split deaminase fusion proteins in the cells, or contacting the cells with the fusion proteins; in versions using one or two Cas9s, the methods include using a guide RNA having a region complementary to a selected portion of the genome of the cell.
  • Methods for selectively altering the genome of a cell are known in the art, see, e.g., U.S. Pat. No. 8,993,233; US 20140186958; U.S. Pat. No.
  • CRISPRs Clustered Regularly Interspaced Short Palindromic Repeats
  • the proteins can be produced using any method known in the art, e.g., by in vitro translation, or expression in a suitable host cell from nucleic acid encoding the split deaminase fusion protein; a number of methods are known in the art for producing proteins.
  • the proteins can be produced in and purified from yeast, E. coli , insect cell lines, plants, transgenic animals, or cultured mammalian cells; see, e.g., Palomares et al., “Production of Recombinant Proteins: Challenges and Solutions,” Methods Mol Biol. 2004; 267:15-52.
  • split deaminase fusion proteins can be linked to a moiety that facilitates transfer into a cell, e.g., a lipid nanoparticle, optionally with a linker that is cleaved once the protein is inside the cell. See, e.g., LaFountaine et al., Int J Pharm. 2015 Aug. 13; 494(1):180-194.
  • the nucleic acid encoding the split deaminase fusion can be cloned into an intermediate vector for transformation into prokaryotic or eukaryotic cells for replication and/or expression.
  • Intermediate vectors are typically prokaryote vectors, e.g., plasmids, or shuttle vectors, or insect vectors, for storage or manipulation of the nucleic acid encoding the split deaminase fusion for production of the split deaminase fusion protein.
  • the nucleic acid encoding the split deaminase fusion protein can also be cloned into an expression vector, for administration to a plant cell, animal cell, preferably a mammalian cell or a human cell, fungal cell, bacterial cell, or protozoan cell.
  • a sequence encoding a split deaminase fusion protein is typically subcloned into an expression vector that contains a promoter to direct transcription.
  • Suitable bacterial and eukaryotic promoters are well known in the art and described, e.g., in Sambrook et al., Molecular Cloning, A Laboratory Manual (3d ed. 2001); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 2010).
  • Bacterial expression systems for expressing the engineered protein are available in, e.g., E.
  • Kits for such expression systems are commercially available.
  • Eukaryotic expression systems for mammalian cells, yeast, and insect cells are well known in the art and are also commercially available.
  • the promoter used to direct expression of a nucleic acid depends on the particular application. For example, a strong constitutive promoter is typically used for expression and purification of fusion proteins. In contrast, when the split deaminase fusion protein is to be administered in vivo for gene regulation, either a constitutive or an inducible promoter can be used, depending on the particular use of the split deaminase fusion protein.
  • a preferred promoter for administration of the split deaminase fusion protein can be a weak promoter, such as HSV TK or a promoter having similar activity.
  • the promoter can also include elements that are responsive to transactivation, e.g., hypoxia response elements, Gal4 response elements, lac repressor response element, and small molecule control systems such as tetracycline-regulated systems and the RU-486 system (see, e.g., Gossen & Bujard, 1992, Proc. Natl. Acad. Sci. USA, 89:5547; Oligino et al., 1998, Gene Ther., 5:491-496; Wang et al., 1997, Gene Ther., 4:432-441; Neering et al., 1996, Blood, 88:1147-55; and Rendahl et al., 1998, Nat. Biotechnol., 16:757-761).
  • the expression vector typically contains a transcription unit or expression cassette that contains all the additional elements required for the expression of the nucleic acid in host cells, either prokaryotic or eukaryotic.
  • a typical expression cassette thus contains a promoter operably linked, e.g., to the nucleic acid sequence encoding the split deaminase fusion protein, and any signals required, e.g., for efficient polyadenylation of the transcript, transcriptional termination, ribosome binding sites, or translation termination.
  • Additional elements of the cassette may include, e.g., enhancers, and heterologous spliced intronic signals.
  • the particular expression vector used to transport the genetic information into the cell is selected with regard to the intended use of the split deaminase fusion protein, e.g., expression in plants, animals, bacteria, fungus, protozoa, etc.
  • Standard bacterial expression vectors include plasmids such as pBR322 based plasmids, pSKF, pET23D, and commercially available tag-fusion expression systems such as GST and LacZ.
  • Expression vectors containing regulatory elements from eukaryotic viruses are often used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus vectors, and vectors derived from Epstein-Barr virus.
  • eukaryotic vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector allowing expression of proteins under the direction of the SV40 early promoter, SV40 late promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.
  • the vectors for expressing the split deaminase fusion protein can include RNA Pol III promoters to drive expression of the guide RNAs, e.g., the H1, U6 or 7SK promoters. These human promoters allow for expression of split deaminase fusion protein in mammalian cells following plasmid transfection.
  • Some expression systems have markers for selection of stably transfected cell lines such as thymidine kinase, hygromycin B phosphotransferase, and dihydrofolate reductase.
  • High yield expression systems are also suitable, such as using a baculovirus vector in insect cells, with the gRNA encoding sequence under the direction of the polyhedrin promoter or other strong baculovirus promoters.
  • the elements that are typically included in expression vectors also include a replicon that functions in E. coli , a gene encoding antibiotic resistance to permit selection of bacteria that harbor recombinant plasmids, and unique restriction sites in nonessential regions of the plasmid to allow insertion of recombinant sequences.
  • Standard transfection methods are used to produce bacterial, mammalian, yeast or insect cell lines that express large quantities of protein, which are then purified using standard techniques (see, e.g., Colley et al., 1989, J. Biol. Chem., 264:17619-22; Guide to Protein Purification, in Methods in Enzymology, vol. 182 (Deutscher, ed., 1990)). Transformation of eukaryotic and prokaryotic cells is performed according to standard techniques (see, e.g., Morrison, 1977, J. Bacteriol. 132:349-351; Clark-Curtiss & Curtiss, Methods in Enzymology 101:347-362 (Wu et al., eds, 1983)).
  • Any of the known procedures for introducing foreign nucleotide sequences into host cells may be used. These include the use of calcium phosphate transfection, polybrene, protoplast fusion, electroporation, nucleofection, liposomes, microinjection, naked DNA, plasmid vectors, viral vectors, both episomal and integrative, and any of the other well-known methods for introducing cloned genomic DNA, cDNA, synthetic DNA or other foreign genetic material into a host cell (see, e.g., Sambrook et al., supra). It is only necessary that the particular genetic engineering procedure used be capable of successfully introducing at least one gene into the host cell capable of expressing the split deaminase fusion protein.
  • The methods also include delivering a gRNA that interacts with the Cas9.
  • The methods can include delivering the split deaminase fusion protein and guide RNA together, e.g., as a complex.
  • The split deaminase fusion protein and gRNA can be overexpressed in a host cell and purified, then complexed with the guide RNA (e.g., in a test tube) to form a ribonucleoprotein (RNP), and delivered to cells.
  • The split deaminase fusion protein can be expressed in and purified from bacteria through the use of bacterial expression plasmids.
  • His-tagged split deaminase fusion protein can be expressed in bacterial cells and then purified using nickel affinity chromatography.
  • The use of RNPs circumvents the necessity of delivering plasmid DNAs encoding the nuclease or the guide, or encoding the nuclease as an mRNA. RNP delivery may also improve specificity, presumably because the half-life of the RNP is shorter and there is no persistent expression of the nuclease and guide (as would occur with a plasmid).
  • The RNPs can be delivered to the cells in vivo or in vitro, e.g., using lipid-mediated transfection or electroporation. See, e.g., Liang et al.
  • The present invention also includes the vectors and cells comprising the vectors, as well as kits comprising the proteins and nucleic acids described herein, e.g., for use in a method described herein.
  • sDA1-containing expression plasmids were constructed by selectively amplifying desired regions of the rAPO1, hA3A, or BE3 genes, as well as DNA sequences encoding a 3AC3L-NLS or NLS only linker and desired EGFP-targeting ZFs, by the PCR method such that they had significant overlapping ends and using isothermal assembly (or “Gibson Assembly,” NEB) to assemble them in the desired order in a pCAG expression vector.
  • sDA2-containing expression plasmids were constructed by truncating a BE3 gene by PCR and using Gibson assembly to put the resulting pieces into a pCAG expression plasmid. PCR was conducted using Q5 or Phusion polymerases (NEB).
  • A HEK293 cell line carrying an integrated EGFP reporter gene was grown in culture using media consisting of Advanced Dulbecco's Modified Medium (Gibco) supplemented with 10% heat inactivated fetal bovine serum (Gibco), 1% 10,000 U/ml penicillin-streptomycin solution (Gibco), and 1% Glutamax (Gibco). Cells were passaged every 3-4 days to maintain an actively growing population and avoid anoxic conditions.
  • Transfections containing 1.0 microgram of transfection quality DNA were conducted by seeding 1.5 × 10^5 cells in 24-well TC-treated plates (Corning) and using TransIT-293 reagent according to manufacturer's protocol (Mirus Bio).
  • 400 nanograms contained the sDA1-encoding plasmid
  • 400 nanograms contained the sDA2-encoding plasmid
  • 200 nanograms contained an expression plasmid encoding the SpCas9 gRNA targeting the EGFP reporter gene.
  • 400 nanograms contained BE-expressing plasmid
  • 400 nanograms contained a pMax-GFP-encoding plasmid (Lonza)
  • 200 nanograms contained an expression plasmid encoding the SpCas9 gRNA targeting the EGFP reporter gene.
  • 400 nanograms contained the sDA-encoding plasmid
  • 400 nanograms contained a pMax-GFP-encoding plasmid (Lonza)
  • 200 nanograms contained an expression plasmid encoding the SpCas9 gRNA targeting the EGFP reporter gene.
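  • As a quick arithmetic check on the amounts listed above, the following Python sketch sums each three-plasmid mix and confirms that it reaches the stated 1.0 microgram of DNA per transfection. The grouping into three mixes and the mix labels (`split_BE`, `BE_control`, `single_sDA_control`) are assumptions made for illustration; only the nanogram amounts come from the text.

```python
# Nanogram amounts are taken from the lists above.
# The mix labels and the grouping into three mixes are hypothetical.
MIXES = {
    "split_BE": {"sDA1 plasmid": 400, "sDA2 plasmid": 400, "gRNA plasmid": 200},
    "BE_control": {"BE plasmid": 400, "pMax-GFP plasmid": 400, "gRNA plasmid": 200},
    "single_sDA_control": {"sDA plasmid": 400, "pMax-GFP plasmid": 400, "gRNA plasmid": 200},
}
TARGET_NG = 1000  # 1.0 microgram of DNA per transfection


def total_ng(mix):
    """Total DNA mass of one mix, in nanograms."""
    return sum(mix.values())


def mass_fractions(mix):
    """Fraction of the total DNA mass contributed by each plasmid."""
    total = total_ng(mix)
    return {name: ng / total for name, ng in mix.items()}


for label, mix in MIXES.items():
    # Every mix should add up to the stated 1.0 microgram.
    assert total_ng(mix) == TARGET_NG, label
```

  • Under these assumptions each mix totals 1000 nanograms, with the gRNA plasmid contributing one fifth of the DNA mass in every case.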
  • Genomic DNA was harvested 3 days post-transfection using the DNAdvance kit (Agencourt).
  • Rates of base editing at target loci were determined by deep sequencing of PCR amplicons generated from genomic DNA isolated from transfected cells.
  • Target site genomic DNA was amplified using EGFP-specific DNA primers flanking the sDA2 nCas9 binding sites.
  • Illumina TruSeq adapters were added to the ends of the amplicons either by PCR or NEBNext Ultra II kit (NEB) and molecularly indexed with NEBNext Dual Index Primers (NEB). Samples were combined into libraries and sequenced on the Illumina MiSeq machine using the MiSeq Reagent Micro Kit v2 (Illumina). Sequencing results were analyzed using a batch version of the software CRISPResso (crispresso.rocks).
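  • The sequencing analysis above was performed with CRISPResso. Purely to illustrate the quantity being reported, and not as a substitute for that software, the following Python sketch computes a per-position C-to-T conversion rate. It assumes reads that have already been trimmed and aligned to the amplicon reference without indels, so that each usable read has the same length as the reference. The function name `c_to_t_rates` and the toy reads are hypothetical.

```python
def c_to_t_rates(reference, reads):
    """Return {position: fraction of reads showing T} for every C in the reference.

    Positions are 0-based. Reads whose length differs from the reference are
    skipped, which is a simplification: a real pipeline would align them.
    """
    usable = [r for r in reads if len(r) == len(reference)]
    if not usable:
        raise ValueError("no reads match the reference length")
    rates = {}
    for i, base in enumerate(reference):
        if base == "C":
            t_count = sum(1 for r in usable if r[i] == "T")
            rates[i] = t_count / len(usable)
    return rates


# Toy data, invented for illustration only.
toy_reference = "ACCG"
toy_reads = ["ACCG", "ATCG", "ATTG", "ACCG"]
toy_rates = c_to_t_rates(toy_reference, toy_reads)
```

  • With the toy reads shown, half of the reads carry a T at position 1 and a quarter carry a T at position 2.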
  • ZF1 Binding site (SEQ ID NO: 9) aGAAGATGGTg ZF2 Binding Site: (SEQ ID NO: 10) gGTCGGGGTAg gRNA1 Binding Site (with PAM): (SEQ ID NO: 11) TTCAAGTCCGCCATGCCCGAAGG gRNA2 Binding Site (with PAM): (SEQ ID NO: 12) CATGCCCGAAGGCTACGTCCAGG
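  • The figure descriptions give an approximate editing window of residues 4-8 in the gRNA target site. The following Python sketch lists the cytosines that fall inside that window for the two gRNA binding sites above and locates each site within the EGFP target sequence (SEQ ID NO: 1). It assumes that positions are counted from the 5′ (PAM-distal) end of the listed site, starting at 1; that numbering convention and the helper names `window_cytosines` and `locate` are assumptions for illustration.

```python
EGFP_TARGET = "CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT"  # SEQ ID NO: 1
GRNA_SITES = {
    "gRNA1": "TTCAAGTCCGCCATGCCCGAAGG",  # SEQ ID NO: 11 (with PAM)
    "gRNA2": "CATGCCCGAAGGCTACGTCCAGG",  # SEQ ID NO: 12 (with PAM)
}


def window_cytosines(site, first=4, last=8):
    """1-based positions of C inside the window, counted from the 5' end of the site."""
    return [pos for pos in range(first, last + 1) if site[pos - 1] == "C"]


def locate(site, target=EGFP_TARGET):
    """0-based start of the site on the listed strand of the target, or -1 if absent."""
    return target.find(site)


# Collect the window cytosines and the site start for each gRNA.
summary = {
    name: {"start": locate(site), "window_C": window_cytosines(site)}
    for name, site in GRNA_SITES.items()
}
```

  • Under the stated numbering assumption, gRNA1 has a single cytosine in the window (position 8) and gRNA2 has three (positions 5, 6, and 7).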
  • X indicates an undetermined amino acid residue, indicating the variable regions of a ZF that are responsible for specific DNA binding.
  • aureus Cas9 (SEQ ID NO: 39) MKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRS KRGARRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLS EEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAEL QLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLL ETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLY NALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDI KGYRVTSTGKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQ EELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAIN
  • lavamentivorans Cas9 (SEQ ID NO: 41) MERIFGFDIGTTSIGFSVIDYSSTQSAGNIQRLGVRIFPEARDPDGTPLNQ QRRQKRMMRRQLRRRRIRRKALNETLHEAGFLPAYGSADWPVVMADEPYE LRRRGLEEGLSAYEFGRAIYHLAQHRHFKGRELEESDTPDPDVDDEKEAANE RAATLKALKNEQTTLGAWLARRPPSDRKRGIHAHRNVVAEEFERLWEVQSK FHPALKSEEMRARISDTIFAQRPVFWRKNTLGECRFMPGEPLCPKGSWLSQQR RMLEKLNNLAIAGGNARPLDAEERDAILSKLQQQASMSWPGVRSALKALYK QRGEPGAEKSLKFNLELGGESKLLGNALEAKLADMFGPDWPAHPRKQEIRH AVHERLWAADYGETPDKKRVIILSEKDRKAHREAAANSFVADFGITGEQAAQ LQALKLPT
  • Tables 4-6 show the exact truncation variants that we have created and evaluated.
  • split BEs in which the halfases shared overlapping peptide sequences. We reasoned that this “extra” overlap may enable proper folding of the constituent halfases so as to enable functional reconstitution of the deaminase, and also noted that the most functional split yCD pair included a significant overlap in peptide sequence12.
  • Each rAPOBEC1 pair was tested in two different orientations with regard to the ZF and gRNA binding sites, with two different ZF domains and two different gRNAs for 4 total orientation pairs. Only directly reciprocal hAPOBEC3A pairs were tested (e.g., sDA1.1 with sDA2.1).
  • Activity of each BE halfase pair when co-delivered by plasmid transfection with an approximate ratio of 1:1 for each halfase is shown in FIGS. 4-16 (FIG. 17 is a positive BE3 control for comparison) for each orientation of rAPO1 sDA pairs and FIG.
  • A summary of the cumulative editing efficiencies (the sum of the editing rates at the cytosines within the gRNA editing window) of all rAPO1 halfase pairs in each orientation is given in FIG. 21.
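  • The cumulative editing efficiency defined above is a simple sum. A minimal Python sketch follows, assuming per-position C-to-T rates keyed by 1-based position within the gRNA target site and the approximate 4-8 editing window given in the figure descriptions. The function name `cumulative_editing` and the example rates are hypothetical and are not measured values.

```python
def cumulative_editing(rates_by_position, site, first=4, last=8):
    """Sum the C-to-T rates at cytosines inside the editing window.

    rates_by_position maps 1-based positions in `site` to editing rates.
    Positions outside the window, or that are not C in `site`, are ignored.
    """
    total = 0.0
    for pos in range(first, last + 1):
        if site[pos - 1] == "C":
            total += rates_by_position.get(pos, 0.0)
    return total


# gRNA2 binding site (SEQ ID NO: 12); the rates below are invented for illustration.
example_site = "CATGCCCGAAGGCTACGTCCAGG"
example_rates = {1: 0.50, 5: 0.10, 6: 0.20, 7: 0.05}
example_total = cumulative_editing(example_rates, example_site)
```

  • In this example only the cytosines at positions 5, 6, and 7 contribute, because the cytosine at position 1 lies outside the window.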
  • The target site configurations and all DNA-targeting proteins used for rAPO1 experiments are shown in FIG. 22.
  • All rAPO1 split BEs shown include an sDA1 halfase with an sDA1-3AC3L-NLS-ZF configuration, while all hA3A split BEs include an sDA1 with an sDA1-NLS-ZF configuration.
  • rAPO1 sDA1.1+rAPO1 sDA2.1, rAPO1 sDA1.2+rAPO1 sDA2.1, rAPO1 sDA1.2+rAPO1 sDA2.2, hA3A sDA1.1+sDA2.1, and hA3A 1.6+hA3A 2.6 show significant activity compared to a positive BE3 control (FIGS. 4, 6, 7, 20, and 21).

Abstract

Described herein are methods and compositions for improving the genome-wide specificities of targeted base editing technologies. Herein, we describe dimeric base editing (BE) technologies that use split deaminases (sDA) that are functional when brought into close proximity to each other, one fused to a ZF and one to an nCas9-UGI protein comprising one or more UGIs, so as to limit the ability of the deaminase domain to deaminate off-target ssDNA sites independently of nCas9 R-loop formation. Thus, provided herein are fusion proteins comprising: (i) a first portion of a split deaminase (“sDA1”) enzyme fused to a programmable DNA-binding domain; or (ii) a second portion of a split deaminase (“sDA2”) fused to an nCas9 protein. The present invention also includes the vectors and cells comprising the vectors, as well as kits comprising the proteins and nucleic acids described herein.

Description

    CLAIM OF PRIORITY
  • This application claims the benefit of U.S. Provisional Patent Application Ser. No. 62/511,296, filed on May 25, 2017; Ser. No. 62/541,544, filed on Aug. 4, 2017; and Ser. No. 62/622,676, filed on Jan. 26, 2018. The entire contents of the foregoing are hereby incorporated by reference.
  • FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
  • This invention was made with Government support under Grant No. GM118158 awarded by the National Institutes of Health. The Government has certain rights in the invention.
  • TECHNICAL FIELD
  • Described herein are methods and compositions for improving the genome-wide specificities of targeted base editing technologies.
  • BACKGROUND
  • Base editing (BE) technologies use an engineered DNA binding domain (such as RNA-guided, catalytically inactive Cas9 (dead Cas9 or dCas9), a nickase version of Cas9 (nCas9), or zinc finger (ZF) arrays) to recruit a cytosine deaminase domain to a specific genomic location to effect site-specific cytosine→thymine transition substitutions1,2. BEs are a particularly attractive tool for treating genetic diseases that manifest in cellular contexts where making precise mutations by homology directed repair (HDR) would be therapeutically beneficial but are difficult to create with traditional nuclease-based genome editing technology. For example, it is challenging or impossible to achieve HDR outcomes in tissues composed primarily of slowly dividing or post-mitotic cell populations, since HDR pathways are restricted to the G2 and S phases of the cell cycle3. In addition, the efficiency of HDR can be substantially limited by the competing and more efficient induction of variable-length indel mutations caused by non-homologous end-joining-mediated repair of nuclease-induced breaks. By contrast, BE technology has the potential to allow practitioners to make highly controllable, highly precise mutations without the need for cell-type-variable DNA repair mechanisms.
  • SUMMARY
  • Base editor platforms (BE) possess the unique capability to generate precise, user-defined genome-editing events without the need for a donor DNA molecule. Base Editors (BEs) that include a single strand nicking CRISPR-Cas9 (nCas9) protein fused to a cytosine deaminase domain and uracil glycosylase inhibitor (UGI) domains (e.g., BE3) efficiently induce cytosine-to-thymine (C-to-T) base transitions in a site-specific manner as determined by the CRISPR guide RNA (gRNA) spacer sequence1. As with all genome editing reagents, it is critical to first determine and then mitigate BE's capacity for generating off-target mutations before it is used for therapeutics so as to limit its potential for creating deleterious and irreversible genetically-encoded side-effects. Herein, we describe dimeric BEs that use split deaminases (sDA) that are functional when brought into close proximity to each other, one fused to a ZF and one to an nCas9-UGI protein comprising one or more UGIs, so as to limit the ability of the deaminase domain to deaminate off-target ssDNA sites independently of nCas9 R-loop formation.
  • Thus, provided herein are fusion proteins comprising: (i) a first portion of a split deaminase (“sDA1”) enzyme fused to a programmable DNA-binding domain, preferably selected from the group consisting of a ZF, TALE, Cas9, catalytically inactive Cas9 (dCas9) or Cas9 ortholog (i.e., a homologous protein from another species such as dCpf1), nicking Cas9 (nCas9) or nicking Cas9 ortholog, wherein the sDA1 is an N-terminal truncated, catalytically inactive or deficient derivative of a parental deaminase selected from the group consisting of hAID, rAPOBEC1, mAPOBEC3, hAPOBEC3A, hAPOBEC3B, hAPOBEC3C, hAPOBEC3F, hAPOBEC3G, or hAPOBEC3H, and variants thereof, e.g., variants that have altered substrate specificities or activities such as eA3A; or (ii) a second portion of a split deaminase (“sDA2”) fused to an nCas9 protein, preferably an nCas9-UGI protein, e.g., in a manner similar to previously described base editor architectures, or to any DNA targeting domain orthogonal to the one used for its complementary sDA1 portion (e.g., dCpf1, TALE, ZF), wherein the sDA2 is a C-terminal truncated, catalytically inactive or deficient derivative of a parental deaminase selected from the group consisting of hAID, rAPOBEC1, mAPOBEC3, hAPOBEC3A, hAPOBEC3B, hAPOBEC3C, hAPOBEC3F, hAPOBEC3G, or hAPOBEC3H, and/or variants thereof, e.g., with altered substrate specificities or activities such as eA3A. In the present methods, the split deaminases are not full length proteins, but are fragments thereof, wherein the co-expression of a fusion protein of (i) with a fusion protein of (ii) comprising a sDA1 and sDA2 portion from the same parental deaminase in eukaryotic cells, and their subsequent co-localization at adjacent genomic target sites, provides a catalytically active base-editor. The terms “sDA1” and “sDA2” are used herein to refer to the first and second split deaminases generally, and do not refer specifically to the exemplary split deaminases described herein.
  • Also provided herein are nucleic acids encoding the fusion proteins described herein, and compositions comprising one or more of those nucleic acids, e.g., wherein the nucleic acids encode a pair of the fusion proteins, e.g., comprising a sDA1 and sDA2 portion from the same parental deaminase. Further, provided herein are vectors comprising the nucleic acids, and isolated host cells comprising and optionally expressing the nucleic acids. In some embodiments, the host cell is a stem cell, e.g., a hematopoietic stem cell.
  • In addition, provided herein are methods for targeted deamination of one or more selected cytosines in a nucleic acid. The methods include contacting the nucleic acid with a pair of fusion proteins described herein comprising a sDA1 and sDA2 portion from the same parental deaminase, as well as one or more gRNAs that interact with Cas9 domains in the fusion proteins. In some embodiments, one of the fusion proteins comprises nCas9, the other fusion protein comprises ZF or TALE, and the ZF or TALE is targeted to a sequence of 9-24 bp adjacent to the target site of the gRNA for the nCas9, wherein the gRNA binds to the nucleic acid comprising the selected cytosine.
  • In some embodiments, the nucleic acid is in a cell, e.g., a eukaryotic cell, and the method comprises contacting the cell with the fusion proteins or expressing the fusion proteins in the cell.
  • Also provided are methods for improving specificity of targeted deamination in a cell, e.g., a eukaryotic cell, by expressing in the cell, or contacting the cell with, a pair of fusion proteins described herein comprising a sDA1 and sDA2 portion from the same parental deaminase, as well as one or more gRNAs that interact with Cas9 domains in the fusion proteins. In some embodiments, one of the fusion proteins comprises nCas9, the other fusion protein comprises ZF or TALE, and the ZF or TALE is targeted to a sequence of 9-24 bp adjacent to the target site of the gRNA for the nCas9, wherein the gRNA binds to the nucleic acid comprising the selected cytosine.
  • In some embodiments, the fusion protein is delivered as an RNP, mRNA, or plasmid.
  • Also provided herein are methods for deaminating one or more selected cytosines in a nucleic acid, by contacting the nucleic acid with a pair of fusion proteins described herein comprising a sDA1 and sDA2 portion from the same parental deaminase, as well as one or more gRNAs that interact with Cas9 domains in the fusion proteins.
  • In addition, provided herein are compositions comprising a purified fusion protein or pair of fusion proteins described herein, preferably a pair of fusion proteins described herein comprising a sDA1 and sDA2 portion from the same parental deaminase, and optionally one or more gRNAs that interact with Cas9 domains in the fusion proteins. In some embodiments, the composition comprises one or more ribonucleoprotein (RNP) complexes.
  • Also provided herein are ribonucleoprotein (RNP) complexes that include a variant spCas9 protein as described herein and a guide RNA that targets a sequence having a PAM sequence targeted by the split deaminase fusion protein comprising Cas9 or Cas9 derivative.
  • Also provided herein are methods for targeted deamination, or improving specificity of targeted deamination, of a selected cytosine in a nucleic acid, comprising contacting the nucleic acid with one or more of the fusion proteins or base editing systems described herein.
  • In some embodiments, the fusion protein is delivered as an RNP, mRNA, or plasmid DNA.
  • Also provided herein are methods for deaminating a selected cytosine in a nucleic acid, the method comprising contacting the nucleic acid with a fusion protein or base editing system described herein.
  • Additionally, provided herein are compositions comprising a purified fusion protein or base editing system as described herein.
  • Further, provided herein are nucleic acids encoding a fusion protein or base editing system described herein, as well as vectors comprising the nucleic acids, and host cells comprising the nucleic acids, e.g., stem cells, e.g., hematopoietic stem cells.
  • Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Methods and materials are described herein for use in the present invention; other, suitable methods and materials known in the art can also be used. The materials, methods, and examples are illustrative only and not intended to be limiting. All publications, patent applications, patents, sequences, database entries, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control.
  • Other features and advantages of the invention will be apparent from the following detailed description and figures, and from the claims.
  • DESCRIPTION OF DRAWINGS
  • FIG. 1. Diagram of an exemplary typical high efficiency base editing setup. A nicking Cas9 bearing a catalytically inactivating mutation at one of its two nuclease domains binds to the target site dictated by the variable spacer sequence of the gRNA. The formation of a stable R-loop creates a ssDNA editing window on the non-deaminated strand. The Cas9 creates a single strand break in the genomic DNA, prompting the host cell to repair the lesion using the deaminated strand as a template, thus biasing repair towards the cytosine→thymine transition substitution. See Komor et al., 2016.
  • FIGS. 2A-2G. Schematic representation of: 2A.) First-generation base editor targeting and deaminating at an on-target site, with a deaminase targeting an R-loop generated by an on-target nCas9. 2B.) First-generation base-editor binding to and deaminating an off-target genomic R-loop independent of its nCas9 targeting capabilities. 2C.) First-generation base-editor binding to and deaminating an off-target genomic transcription bubble independent of its nCas9 targeting capabilities. (Note: 2B and 2C are exemplary cases of genomic ssDNA targets potentially available to BE deamination, but do not constitute an exhaustive list.) 2D.) A split-deaminase (sDA) BE targeting a genomic site with an nCas9-mediated R-loop and adjacent TALE or ZF binding. Note that the two split deaminase portions (sDA1 and sDA2) are brought into close proximity by the adjacent binding, reconstituting their catalytic activity and allowing on-target deamination. 2E and 2F.) Even if one half of a split BE could bind a non-target piece of ssDNA such as a genomic R-loop or transcription bubble through its sDA domain, it would not have enough machinery to reconstitute deaminase enzymatic activity (sDA2-nCas9-UGI half shown). 2G. Off-target binding of the ZF or nCas9 components of a sDA system (ZF-sDA1 shown) would not result in co-localization of enough machinery to reconstitute deaminase enzymatic activity.
  • FIGS. 3A-3B. hAPOBEC3G with representative candidate split sites. Multiple rotational views of the hAPOBEC3G structure are shown. Magenta colored loop regions are candidate split sites selected on the basis of their lack of secondary structure and their distance from the catalytic center. PDB: 3E1U.
  • FIG. 4. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.1-3AC3L-ZF-C and N-sDA2.1-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. The sDA1.1 and sDA2.1 pair resulted in significant C-to-T conversion when the ZF binding site was upstream of gRNA binding site with an in-series orientation with 31 bps in-between. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
  • FIG. 5. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.1-3AC3L-ZF-C and N-sDA2.2-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. The sDA1.1 and sDA2.2 pair did not stimulate discernable C-to-T conversion in any orientation attempted. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
  • FIG. 6. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.2-3AC3L-ZF-C and N-sDA2.1-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. Low-level C-to-T mutations are observed primarily when using gRNA2 with either ZF, with gRNA1 experiments yielding detectable but diminished levels of activity. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
  • FIG. 7. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.2-3AC3L-ZF-C and N-sDA2.2-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. Low-level C-to-T mutations are observed primarily when using gRNA2 with either ZF, with gRNA1 experiments yielding detectable but diminished levels of activity. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
  • FIG. 8. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.2-3AC3L-ZF-C and N-sDA2.3-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
  • FIG. 9. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.3-3AC3L-ZF-C and N-sDA2.2-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
  • FIG. 10. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.2-3AC3L-ZF-C and N-sDA2.3-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
  • FIG. 11. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.3-3AC3L-ZF-C and N-sDA2.4-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
  • FIG. 12. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.4-3AC3L-ZF-C and N-sDA2.3-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
  • FIG. 13. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.4-3AC3L-ZF-C and N-sDA2.4-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
  • FIG. 14. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.5-3AC3L-ZF-C and N-sDA2.4-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATC
    T.
  • FIG. 15. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.5-3AC3L-ZF-C and N-sDA2.6-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATC
    T.
  • FIG. 16. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.6-3AC3L-ZF-C and N-sDA2.6-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATC
    T.
  • FIG. 17. C-to-T conversion data with first-generation BE3 (described in reference 1) with both gRNAs used in this study. (Note that the coloration gradient of these samples is shaded lighter than in the graphs above and that direct comparison requires evaluation of relative numerical rates.) Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM). EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATC
    T.
  • FIG. 18. C-to-T conversion rates of individual N-sDA1-ZF-C proteins without an adjacent sDA2-nCas9-UGI. No discernable editing observed. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATC
    T.
  • FIG. 19. C-to-T conversion rates of individual N-sDA2-nCas9-UGI-C proteins without an adjacent N-sDA1-ZF-C. No discernable editing was observed. EGFP target sequence,
  • (SEQ ID NO: 1)
    CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATC
    T.
  • FIG. 20. Evidence of C-to-T conversion when using adjacently-targeting N-sDA1.X-NLS-ZF-C and N-sDA2.X-nCas9-UGI-C human APOBEC3a (hA3A) split Base Editors in the indicated orientation. Pointed boxes representing the nCas9 gRNA binding site (gRNA2) and ZF binding site (ZF1) are shown, with the pointed ends indicating the PAM-proximal end of the gRNA and indicating the N→C orientation of the ZF, respectively. Conversion rates at each position are indicated by shaded boxes. Rates of deamination by split BE pairs are around 2.5% per cytosine using the sDA1.6+sDA2.6 configuration and around 1.7% per cytosine for the sDA1.1+sDA2.1 configuration, while a hAPOBEC3A-nCas9-UGI positive control possessed 3-4× the amount of on-target activity as active hA3A halfase pairs. gRNA target region:
  • (SEQ ID NO: 2)
    CATGCCCGAAGGCTACGTCCAG.
  • FIGS. 21A-21D. Summary of C-to-T conversion rate of all rAPO1 halfase combination base editors as compared to a benchmark BE3 base editor at an integrated EGFP locus. The sum of total C-to-T editing percentages among three cytosines within or near the target gRNA's approximate editing window is shown, as averaged between two replicates. 21A shows the ZF1+gRNA1 data, 21B shows the ZF1+gRNA2 data, 21C shows the ZF2+gRNA1 data, 21D shows the ZF2+gRNA2 data.
  • FIG. 22. Representation of a portion of the EGFP reporter gene and the target sites used for the rAPO1 halfase combination experiments. EGFP target region:
  • (SEQ ID NO: 3)
    GCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCC
    GCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACG.
  • DETAILED DESCRIPTION
  • In the most efficient BE configuration described to date, a cytosine deaminase (DA) domain and uracil glycosylase inhibitor (UGI; a small bacteriophage protein that inhibits host cell uracil DNA glycosylase (UDG), the enzyme responsible for excising uracil from the genome (refs. 1, 4)) are both fused to nCas9 (derived from either Streptococcus pyogenes Cas9 (SpCas9) or Staphylococcus aureus Cas9 (SaCas9)). The nCas9 forms an R-loop at a target site specified by its single guide RNA (gRNA) and recognition of an adjacent protospacer adjacent motif (PAM), leaving approximately 4-8 nucleotides of the non-target strand exposed as single stranded DNA (ssDNA) near the PAM-distal end of the R-loop (FIG. 1). This region of the ssDNA is the template that is able to be deaminated by the ssDNA-specific DA domain to produce a guanosine:uracil (G:U) mismatch and defines the editing window. The nCas9 nicks the non-deaminated strand of DNA, biasing conversion of the G:U mismatch to an adenine:thymine (A:T) base pair by directing the cell to repair the nick lesion using the deaminated strand as a template. To date, the deaminase domains described in these fusion proteins have been rat APOBEC1 (rAPO1), an activation-induced cytosine deaminase (AID) derived from lamprey termed CDA (PmCDA), human AID (hAID), a hyperactive form of hAID lacking a nuclear export signal, or an engineered variant of human APOBEC3A (hA3A) termed eA3A (refs. 1-2, 5-7, 16). Any of the deaminase domains from these BEs can be used as parental deaminases in the present fusion proteins. BE technology was primarily established using the SpCas9 protein for its nCas9 domain (nSpCas9), but although herein we refer to nCas9, in general any Cas9-like nickase could be used based on any ortholog of the Cpf1 protein (including the related Cpf1 enzyme class) to perform this function, unless specifically indicated. 
In addition, a completely enzymatically dead dCas9 (or Cas9-like enzyme) can also be used as the targeting mechanism of a functional BE enzyme.
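  • As a non-limiting illustration of the editing window described above, the following Python sketch lists the cytosines that fall within positions 4-8 of a target site and shows an idealized C-to-T outcome. The function names are hypothetical, and the sketch assumes that position 1 is the PAM-distal end of the site and that every cytosine in the window is converted, which real base editors do not guarantee. SEQ ID NO: 2 is used purely as sample input.

```python
def cytosines_in_window(target_site, window=(4, 8)):
    """Return 1-based positions of cytosines inside the editing window.

    Assumption of this sketch: position 1 is the PAM-distal end of the site.
    """
    start, end = window
    return [i for i, base in enumerate(target_site.upper(), start=1)
            if start <= i <= end and base == "C"]


def simulate_c_to_t(target_site, window=(4, 8)):
    """Idealized outcome: every cytosine inside the window becomes thymine."""
    targets = set(cytosines_in_window(target_site, window))
    return "".join("T" if i in targets else base
                   for i, base in enumerate(target_site.upper(), start=1))


sample = "CATGCCCGAAGGCTACGTCCAG"  # SEQ ID NO: 2, sample input only
print(cytosines_in_window(sample))
print(simulate_c_to_t(sample))
```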
  • An important consideration for the use of BE in therapeutic settings will be to assess its genome-wide capacity for off-target mutagenesis and to modify the technology to minimize or, ideally, to eliminate the risks of stimulating deleterious off-target mutations. Herein, we describe technological improvements to BEs that can be used to reduce or eliminate potential unwanted BE mutagenesis.
  • Using Split Deaminases to Limit Unwanted Off-Target Base Editor Deamination
  • Because of AID/APOBEC enzymes' natural ability to bind and deaminate cytosines in genomic DNA and cytosines in RNA, non-specific spurious deamination events are a possibly important source of off-target mutagenesis in the genome and transcriptome from CRISPR Base Editor technology. In theory, even if the BE's nCas9 domain (and any potential dCas9, TALE, and/or ZF domains) are eminently specific, this might do nothing to prevent the natural RNA- and ssDNA-targeting ability of the APOBEC enzyme from non-specifically deaminating globally across the transcriptome or whichever regions of the genome are exposed as ssDNA, such as actively transcribed regions or DNA undergoing replication. In fact, an E. coli-based assay examining deaminases showed that an actively transcribed region could be highly enriched (~7-530 fold) for C→T transition mutations when exposed to various overexpressed mammalian deaminases (ref. 4). Further, one group has found that co-expression of PmCda1 and nCas9 as two separate, untethered proteins in yeast cells results in similar levels of deamination at the gRNA-specified target site as when the two components are expressed as direct fusion partners, demonstrating that these proteins are capable of deaminating ssDNA from solution without an affinity tether to the genomic location (ref. 5). This concern is especially relevant now that scientists are becoming increasingly aware that R-loops are a more common occurrence in the genomes of eukaryotic cells than previously thought, thus creating many potential steady-state off-target ssDNA substrates where an APOBEC could bind and deaminate (ref. 6). While it is as yet unproven whether BE overexpression itself can sufficiently stimulate spurious deamination and mutagenesis on a global genomic scale, aberrant and over-active APOBEC deaminase activity is a known driver of tumorigenic mutagenesis (ref. 7) and overexpression of at least hAPO3 (refs. 8-11) has been shown to stimulate genomic cytosine hypermutation. 
Thus, it stands to reason that limiting the naturally global deaminating activity of over-expressed deaminases like BE will be important for translating BE technologies into therapeutic applications. Of note, since most BEs include at least one UGI inhibitor to bias deamination events toward productive C→T mutations, it is possible that global off-target BE activity is even more mutagenic than the effects of aberrant deaminase activity alone during tumorigenesis.
  • To impose a stricter requirement for BEs to act on their intended target sequences rather than globally, we created a split BE architecture comprised of two separate proteins consisting of reciprocal deaminase truncation variants fused to adjacently-targeted DNA binding domains. These dimeric BE technologies make use of “split deaminases” (sDAs) that require co-localization (an “AND Gate”) of both sDA domains at adjacent DNA sites to function properly. In this scenario, spurious binding events of either “halfase” of the dimeric base-editor will be unlikely to result in productive deamination events, since each component on its own will not contain the full complement of enzymatic machinery necessary to catalyze cytosine deamination (FIGS. 2A-2G).
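  • The "AND Gate" requirement described above can be pictured with a minimal Python sketch. It is a toy model only: the function name, the coordinates, and the `max_gap` spacing limit are hypothetical values chosen for illustration and are not taken from the experiments described herein.

```python
def editable_sites(sda1_sites, sda2_sites, max_gap=20):
    """Return (sda1_pos, sda2_pos) pairs close enough to reconstitute a deaminase.

    sda1_sites / sda2_sites: genomic coordinates where each halfase is bound.
    max_gap: hypothetical maximum spacing (bp) that still permits reconstitution.
    """
    return [(a, b) for a in sda1_sites for b in sda2_sites
            if abs(a - b) <= max_gap]


# One shared on-target locus plus spurious binding events for each halfase.
zf_bound = [1_000, 52_300, 880_410]
cas9_bound = [1_012, 407_775]
print(editable_sites(zf_bound, cas9_bound))
```

  • In this toy model the spurious binding events of either halfase produce no editable site, because no partner halfase is bound nearby.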
  • To create dimeric BEs, we fused an N-terminal truncation of a split deaminase (sDA1) enzyme to a ZF (though any DNA targeting domain orthogonal to Cas9, such as Cpf1, TALE, ZF, or a dCas9 orthogonal to the nCas9 used to target sDA2, may be suitable) targeted to a ~9-24 bp sequence, and a reciprocal or somewhat overlapping C-terminal truncation of a deaminase fused to an nCas9-UGI fusion protein, such that the N-terminal truncation and the C-terminal truncation together form a functional enzyme. The exemplary BEs were made in a similar orientation to the first-generation BE3 enzyme (sDA2-nCas9-UGI) targeting an adjacent sequence with a ~17-24 bp target site (ref. 1). To the best of the inventors' knowledge, though there is no record of functional split APOBEC enzymes (or other mammalian deaminases), a yeast cytosine deaminase (yCD) has been shown to constitute at least a partially functional enzyme (on cytosine as a metabolite and the pro-drug 5-fluorocytosine, though it was not shown to be able to deaminate DNA) when split and reconstituted by protein dimerization (ref. 12) and serves as a useful template to inform how various APOBEC proteins may be effectively bifurcated; however, since the yCD shares little primary sequence homology to mammalian deaminases, and the split yCD was not reported to function on DNA and used protein scaffolds to bring its constituent pieces together, it is not obvious that yCD split deaminases will be directly comparable to those so far described for use in BEs. Therefore, we used APOBEC structural information to determine the unstructured linker regions as potential sites at which to split APOBEC enzymes (FIGS. 3A-3B), since those sites may be less likely to affect overall functionality or folding of the constituent subdomains. 
This split deaminase strategy can be used with wild-type versions of deaminase enzymes, and also any engineered variants that may be described, with the split BE potentially retaining any special features of the engineered deaminases (ref. 16).
  • This architecture should virtually eliminate the capacity for spurious deamination, since any other DNA binding event by either of the two constituent halfases will lack any enzymatic deaminase activity and will be therefore unable to perturb genomic DNA. In addition, a split BE should generally increase the specificity of editing compared to typical BEs by virtue of the fact that the split BE system requires the binding of a higher number of sequential/adjacent DNA bases, thereby decreasing the off-target effects conferred by off-target binding of either halfase on its own. CRISPR BE architectures are known to induce C-to-T mutations in human cells at some genomic sites that are imperfect matches to their gRNAs (ref. 13), and since ZFs are known to bind with some capacity to off-target sites it stands to reason that a ZF-BE architecture would also induce off-target mutagenesis to some capacity (ref. 14).
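  • The effect of requiring a higher number of adjacent DNA bases can be estimated with simple arithmetic, sketched here in Python. The calculation assumes a genome of uniformly random, independent bases and an approximate genome size of 3.2e9 bp, and it ignores mismatch tolerance, chromatin context, and spacing or orientation constraints. The site lengths are picked from the ~9-24 bp and ~17-24 bp ranges mentioned above, and the function name is hypothetical.

```python
def expected_random_sites(site_length, genome_size=3.2e9):
    """Expected number of perfect matches to one specific site of the given
    length in a random genome, counting both strands (idealized model)."""
    return 2 * genome_size / 4 ** site_length


for label, n in [("ZF alone (9 bp)", 9),
                 ("gRNA alone (20 bp)", 20),
                 ("ZF + gRNA (29 bp combined)", 29)]:
    print(f"{label}: {expected_random_sites(n):.3g} expected chance matches")
```

  • Under these simplifying assumptions, each added base pair of required recognition lowers the expected number of chance matches fourfold.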
  • It is conceivable and likely that any CRISPR/Cas-based targeting system, including Cas9s from Streptococcus pyogenes or Staphylococcus aureus or Cpf1 proteins from various organisms could be used in place of the nCas9 portion of the sDA2-nCas9-UGI fusion protein, so long as the targeting mechanism results in specific DNA binding and the creation of an R-loop that exposes ssDNA to action by the reconstituted split deaminase. Table 1 contains a list of representative CRISPR/Cas targeting systems and the residues/mutations therein known to be important for creating nickase and catalytically inactive (dead) mutants. Note that while Cpf1 nickases have yet to be described, catalytically null Cpf1 orthologs may replicate the targeting characteristics of nCas9 such that they could form the basis of a functional sDA2 halfase. In some embodiments, ZF domains are chosen as the DNA binding domain for sDA1 due to their small size, presumed lack of immunogenicity, and because, unlike CRISPR-based targeting systems, they do not create an R-loop upon binding and do not expose additional substrate ssDNA to the deaminase domain. In principle, however, use of any engineered DNA binding domain, such as a CRISPR-based targeting complex or a TALE DNA binding domain, could still result in a functional sDA1 halfase. In the examples shown herein, ZF domains targeting an integrated EGFP gene were used for the sDA1 halfases (ref. 15).
  • Programmable DNA Binding Domain
  • The present fusion proteins can include programmable DNA binding domains such as engineered C2H2 zinc-fingers, transcription activator-like effectors (TALEs), and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) Cas RNA-guided nucleases (RGNs) and their variants, including ssDNA nickases (nCas9) or their analogs and catalytically inactive dead Cas9 (dCas9) and its analogs, and any engineered protospacer-adjacent motif (PAM) variants. A programmable DNA binding domain is one that can be engineered to bind to a selected target sequence.
  • CRISPR-Cas Nucleases
  • Although herein we refer to nCas9, in general any Cas9-like nickase could be used based on any ortholog of the Cpf1 protein (including the related Cpf1 enzyme class), unless specifically indicated.
  • TABLE 1
    List of Exemplary Cas9 Orthologs
    Ortholog                            UniProt Accession Number   Nickase Mutations/Catalytic residues
    S. pyogenes Cas9 (SpCas9)           Q99ZW2                     D10A, E762A, H840A, N854A, N863A, D986A (ref. 17)
    S. aureus Cas9 (SaCas9)             J7RUA5                     D10A and N580 (ref. 18)
    S. thermophilus Cas9 (St1Cas9)      G3ECR1                     D31A and N891A (ref. 19)
    S. pasteurianus Cas9 (SpaCas9)      F5X275                     D10, H599*
    C. jejuni Cas9 (CjCas9)             Q0P897                     D8A, H559A (ref. 20)
    F. novicida Cas9 (FnCas9)           A0Q5Y3                     D11, N995 (ref. 21)
    P. lavamentivorans Cas9 (PlCas9)    A7HP89                     D8, H601*
    C. lari Cas9 (ClCas9)               G1UFN3                     D7, H567*
    F. novicida Cpf1 (FnCpf1)           A0Q7Q2                     D917, E1006, D1255 (ref. 21)
    M. bovoculi Cpf1 (MbCpf1)           Sequence given at end      N/A**
    A. sp. BV3L6 (AsCpf1)               U2UMQ6                     D908, 993E, Q1226, D1263 (ref. 22)
    L. bacterium ND2006 (LbCpf1)        A0A182DWE3                 D832A (ref. 24)
    *Predicted based on UniRule annotation on the UniProt database.
    **May be determinable based on sequence alignment with other Cpf1 orthologs.

    These orthologs, and mutants and variants thereof as known in the art, can be used in any of the fusion proteins described herein. See, e.g., WO 2017/040348 (which describes variants of SaCas9 and SpCas9 with increased specificity) and WO 2016/141224 (which describes variants of SaCas9 and SpCas9 with altered PAM specificity).
  • The Cas9 nuclease from S. pyogenes (hereafter simply Cas9) can be guided via simple base pair complementarity between 17-20 nucleotides of an engineered guide RNA (gRNA), e.g., a single guide RNA or crRNA/tracrRNA pair, and the complementary strand of a target genomic DNA sequence of interest that lies next to a protospacer adjacent motif (PAM), e.g., a PAM matching the sequence NGG or NAG (Shen et al., Cell Res (2013); Dicarlo et al., Nucleic Acids Res (2013); Jiang et al., Nat Biotechnol 31, 233-239 (2013); Jinek et al., Elife 2, e00471 (2013); Hwang et al., Nat Biotechnol 31, 227-229 (2013); Cong et al., Science 339, 819-823 (2013); Mali et al., Science 339, 823-826 (2013c); Cho et al., Nat Biotechnol 31, 230-232 (2013); Jinek et al., Science 337, 816-821 (2012)). The engineered CRISPR from Prevotella and Francisella 1 (Cpf1) nuclease can also be used, e.g., as described in Zetsche et al., Cell 163, 759-771 (2015); Schunder et al., Int J Med Microbiol 303, 51-60 (2013); Makarova et al., Nat Rev Microbiol 13, 722-736 (2015); Fagerlund et al., Genome Biol 16, 251 (2015). Unlike SpCas9, Cpf1 requires only a single 42-nt crRNA, which has 23 nt at its 3′ end that are complementary to the protospacer of the target DNA sequence (Zetsche et al., 2015). Furthermore, whereas SpCas9 recognizes an NGG PAM sequence that is 3′ of the protospacer, AsCpf1 and LbCpf1 recognize TTTN PAMs that are found 5′ of the protospacer (Id.).
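  • The PAM geometries described above (an NGG PAM 3′ of a SpCas9 protospacer; a TTTN PAM 5′ of a Cpf1 protospacer) can be illustrated with a short Python sketch that scans one strand of a sequence for candidate sites. The function names and the demo sequence are hypothetical; the default protospacer lengths of 20 nt and 23 nt follow the figures given above; the sketch ignores the reverse strand, the alternative NAG PAM, and any specificity considerations.

```python
import re


def find_spcas9_sites(seq, spacer_len=20):
    """Forward-strand protospacers followed on the 3' side by an NGG PAM.

    Returns (protospacer_start, protospacer, pam) tuples, 0-based start.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(seq.upper())]


def find_cpf1_sites(seq, spacer_len=23):
    """Forward-strand protospacers preceded on the 5' side by a TTTN PAM.

    Returns (protospacer_start, protospacer, pam) tuples, 0-based start.
    """
    pattern = re.compile(r"(?=(TTT[ACGT])([ACGT]{%d}))" % spacer_len)
    return [(m.start() + 4, m.group(2), m.group(1))
            for m in pattern.finditer(seq.upper())]


demo = "TTTCGATTACAGGCTAGCTAGGATCCATGCAAGTCCGTACGGAGG"  # made-up sequence
for site in find_spcas9_sites(demo):
    print("SpCas9 candidate:", site)
for site in find_cpf1_sites(demo):
    print("Cpf1 candidate:", site)
```

  • The zero-width lookahead is used so that overlapping candidate sites are all reported.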
  • In some embodiments, the present system utilizes a wild type or variant Cas9 protein from S. pyogenes or Staphylococcus aureus, or a wild type Cpf1 protein from Acidaminococcus sp. BV3L6 or Lachnospiraceae bacterium ND2006 either as encoded in bacteria or codon-optimized for expression in mammalian cells and/or modified in its PAM recognition specificity and/or its genome-wide specificity. A number of variants have been described; see, e.g., WO 2016/141224, PCT/US2016/049147, Kleinstiver et al., Nat Biotechnol. 2016 August; 34(8):869-74; Tsai and Joung, Nat Rev Genet. 2016 May; 17(5):300-12; Kleinstiver et al., Nature. 2016 Jan. 28; 529(7587):490-5; Shmakov et al., Mol Cell. 2015 Nov. 5; 60(3):385-97; Kleinstiver et al., Nat Biotechnol. 2015 December; 33(12):1293-1298; Dahlman et al., Nat Biotechnol. 2015 November; 33(11):1159-61; Kleinstiver et al., Nature. 2015 Jul. 23; 523(7561):481-5; Wyvekens et al., Hum Gene Ther. 2015 July; 26(7):425-31; Hwang et al., Methods Mol Biol. 2015; 1311:317-34; Osborn et al., Hum Gene Ther. 2015 February; 26(2):114-26; Konermann et al., Nature. 2015 Jan. 29; 517(7536):583-8; Fu et al., Methods Enzymol. 2014; 546:21-45; and Tsai et al., Nat Biotechnol. 2014 June; 32(6):569-76, inter alia.
  • The guide RNA is expressed or present in the cell together with the Cas9 or Cpf1. Either the guide RNA or the nuclease, or both, can be expressed transiently or stably in the cell or introduced as a purified protein or nucleic acid.
  • In some embodiments, the Cas9 also includes one of the following mutations, which reduce nuclease activity of the Cas9; e.g., for SpCas9, mutations at D10A or H840A (which creates a single-strand nickase).
  • In some embodiments, the SpCas9 variants also include mutations at one of the following amino acid positions, which destroy the nuclease activity of the Cas9: D10, E762, D839, H983, or D986 and H840 or N863, e.g., D10A/D10N and H840A/H840N/H840Y, to render the nuclease portion of the protein catalytically inactive; substitutions at these positions could be alanine (as they are in Nishimasu et al., Cell 156, 935-949 (2014)), or other residues, e.g., glutamine, asparagine, tyrosine, serine, or aspartate, e.g., E762Q, H983N, H983Y, D986N, N863D, N863S, or N863H (see WO 2014/152432).
  • In some embodiments, the Cas9 is fused to one or more Uracil glycosylase inhibitor (UGI) protein sequences; an exemplary UGI sequence is as follows:
  • TNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDEST
    DENVMLLTSDAPEYKPWALVIQDSNGENKIKML (SEQ ID NO: 4;
    Uniprot: P14739).

    Typically, the UGIs are at the C-terminus of a BE fusion protein, but could conceivably be at the N-terminus, or between the DNA binding domain and the sDA domain. Linkers as known in the art can be used to separate domains.
  • TAL Effector Repeat Arrays
  • Transcription activator-like effectors (TALEs) of plant pathogenic bacteria in the genus Xanthomonas play important roles in disease, or trigger defense, by binding host DNA and activating effector-specific host genes. Specificity depends on an effector-variable number of imperfect, typically ~33-35 amino acid repeats. Polymorphisms are present primarily at repeat positions 12 and 13, which are referred to herein as the repeat variable-diresidue (RVD). The RVDs of TAL effectors correspond to the nucleotides in their target sites in a direct, linear fashion, one RVD to one nucleotide, with some degeneracy and no apparent context dependence. In some embodiments, the polymorphic region that grants nucleotide specificity may be expressed as a triresidue or triplet.
  • Each DNA binding repeat can include a RVD that determines recognition of a base pair in the target DNA sequence, wherein each DNA binding repeat is responsible for recognizing one base pair in the target DNA sequence. In some embodiments, the RVD can comprise one or more of: HA for recognizing C; ND for recognizing C; HI for recognizing C; HN for recognizing G; NA for recognizing G; SN for recognizing G or A; YG for recognizing T; and NK for recognizing G, and one or more of: HD for recognizing C; NG for recognizing T; NI for recognizing A; NN for recognizing G or A; NS for recognizing A or C or G or T; N* for recognizing C or T, wherein * represents a gap in the second position of the RVD; HG for recognizing T; H* for recognizing T, wherein * represents a gap in the second position of the RVD; and IG for recognizing T.
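  • The one-RVD-to-one-nucleotide correspondence listed above lends itself to a simple lookup, sketched here in Python. The dictionary is transcribed from the preceding paragraph; the function names are hypothetical, and the sketch treats recognition as all-or-nothing, ignoring binding strength and any context effects.

```python
import re

# RVD -> recognized nucleotide(s), transcribed from the paragraph above.
RVD_CODE = {
    "HA": "C", "ND": "C", "HI": "C", "HN": "G", "NA": "G", "SN": "AG",
    "YG": "T", "NK": "G", "HD": "C", "NG": "T", "NI": "A", "NN": "AG",
    "NS": "ACGT", "N*": "CT", "HG": "T", "H*": "T", "IG": "T",
}


def rvds_to_regex(rvds):
    """Build a regular expression describing the sites an RVD array may bind."""
    parts = []
    for rvd in rvds:
        bases = RVD_CODE[rvd]  # raises KeyError for an RVD not listed above
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def tale_matches(rvds, site):
    """True if every position of the site is allowed by the matching RVD."""
    return re.fullmatch(rvds_to_regex(rvds), site.upper()) is not None


array = ["HD", "NG", "NI", "NN"]
print(rvds_to_regex(array))
print(tale_matches(array, "CTAG"), tale_matches(array, "CTAC"))
```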
  • TALE proteins may be useful in research and biotechnology as targeted chimeric nucleases that can facilitate homologous recombination in genome engineering (e.g., to add or enhance traits useful for biofuels or biorenewables in plants). These proteins also may be useful as, for example, transcription factors, and especially for therapeutic applications requiring a very high level of specificity such as therapeutics against pathogens (e.g., viruses) as non-limiting examples.
  • Methods for generating engineered TALE arrays are known in the art, see, e.g., the fast ligation-based automatable solid-phase high-throughput (FLASH) system described in U.S. Ser. No. 61/610,212, and Reyon et al., Nature Biotechnology 30, 460-465 (2012); as well as the methods described in Bogdanove & Voytas, Science 333, 1843-1846 (2011); Bogdanove et al., Curr Opin Plant Biol 13, 394-401 (2010); Scholze & Boch, Curr Opin Microbiol (2011); Boch et al., Science 326, 1509-1512 (2009); Moscou & Bogdanove, Science 326, 1501 (2009); Miller et al., Nat Biotechnol 29, 143-148 (2011); Morbitzer et al., Proc Natl Acad Sci USA 107, 21617-21622 (2010); Morbitzer et al., Nucleic Acids Res 39, 5790-5799 (2011); Zhang et al., Nat Biotechnol 29, 149-153 (2011); Geissler et al., PLoS ONE 6, e19509 (2011); Weber et al., PLoS ONE 6, e19722 (2011); Christian et al., Genetics 186, 757-761 (2010); Li et al., Nucleic Acids Res 39, 359-372 (2011); Mahfouz et al., Proc Natl Acad Sci USA 108, 2623-2628 (2011); Mussolino et al., Nucleic Acids Res (2011); Li et al., Nucleic Acids Res 39, 6315-6325 (2011); Cermak et al., Nucleic Acids Res 39, e82 (2011); Wood et al., Science 333, 307 (2011); Hockemeyer et al., Nat Biotechnol 29, 731-734 (2011); Tesson et al., Nat Biotechnol 29, 695-696 (2011); Sander et al., Nat Biotechnol 29, 697-698 (2011); and Huang et al., Nat Biotechnol 29, 699-700 (2011); all of which are incorporated herein by reference in their entirety.
  • Also suitable for use in the present methods are MegaTALs, which are a fusion of a meganuclease with a TAL effector; see, e.g., Boissel et al., Nucl. Acids Res. 42(4):2591-2601 (2014); Boissel and Scharenberg, Methods Mol Biol. 2015; 1239:171-96.
  • Zinc Fingers
  • Zinc finger (ZF) proteins are DNA-binding proteins that contain one or more zinc fingers, independently folded zinc-containing mini-domains, the structure of which is well known in the art and defined in, for example, Miller et al., 1985, EMBO J., 4:1609; Berg, 1988, Proc. Natl. Acad. Sci. USA, 85:99; Lee et al., 1989, Science. 245:635; and Klug, 1993, Gene, 135:83. Crystal structures of the zinc finger protein Zif268 and its variants bound to DNA show a semi-conserved pattern of interactions, in which typically three amino acids from the alpha-helix of the zinc finger contact three adjacent base pairs or a “subsite” in the DNA (Pavletich et al., 1991, Science, 252:809; Elrod-Erickson et al., 1998, Structure, 6:451). Thus, the crystal structure of Zif268 suggested that zinc finger DNA-binding domains might function in a modular manner with a one-to-one interaction between a zinc finger and a three-base-pair “subsite” in the DNA sequence. In naturally occurring zinc finger transcription factors, multiple zinc fingers are typically linked together in a tandem array to achieve sequence-specific recognition of a contiguous DNA sequence (Klug, 1993, Gene 135:83).
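  • The modular picture described above, one zinc finger per three-base-pair subsite, can be sketched in Python as follows. The function name and the 9-bp example site are illustrative only, and the sketch does not model which finger of an array contacts which subsite or any cross-subsite interactions.

```python
def zf_subsites(target_site):
    """Split a target site into consecutive 3-bp subsites, one per finger."""
    site = target_site.upper()
    if len(site) == 0 or len(site) % 3 != 0:
        raise ValueError("target site length must be a positive multiple of 3")
    return [site[i:i + 3] for i in range(0, len(site), 3)]


site = "GCGTGGGCG"  # arbitrary 9-bp example site
subsites = zf_subsites(site)
print(len(subsites), "fingers needed for subsites", subsites)
```

  • Under this simple model, the ~9-24 bp ZF target sites mentioned herein would correspond to arrays of roughly three to eight fingers.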
  • Multiple studies have shown that it is possible to artificially engineer the DNA binding characteristics of individual zinc fingers by randomizing the amino acids at the alpha-helical positions involved in DNA binding and using selection methodologies such as phage display to identify desired variants capable of binding to DNA target sites of interest (Rebar et al., 1994, Science, 263:671; Choo et al., 1994 Proc. Natl. Acad. Sci. USA, 91:11163; Jamieson et al., 1994, Biochemistry 33:5689; Wu et al., 1995 Proc. Natl. Acad. Sci. USA, 92: 344). Such recombinant zinc finger proteins can be fused to functional domains, such as transcriptional activators, transcriptional repressors, methylation domains, and nucleases to regulate gene expression, alter DNA methylation, and introduce targeted alterations into genomes of model organisms, plants, and human cells (Carroll, 2008, Gene Ther., 15:1463-68; Cathomen, 2008, Mol. Ther., 16:1200-07; Wu et al., 2007, Cell. Mol. Life Sci., 64:2933-44).
  • One existing method for engineering zinc finger arrays, known as “modular assembly,” advocates the simple joining together of pre-selected zinc finger modules into arrays (Segal et al., 2003, Biochemistry, 42:2137-48; Beerli et al., 2002, Nat. Biotechnol., 20:135-141; Mandell et al., 2006, Nucleic Acids Res., 34:W516-523; Carroll et al., 2006, Nat. Protoc. 1:1329-41; Liu et al., 2002, J. Biol. Chem., 277:3850-56; Bae et al., 2003, Nat. Biotechnol., 21:275-280; Wright et al., 2006, Nat. Protoc., 1:1637-52). Although straightforward enough to be practiced by any researcher, recent reports have demonstrated a high failure rate for this method, particularly in the context of zinc finger nucleases (Ramirez et al., 2008, Nat. Methods, 5:374-375; Kim et al., 2009, Genome Res. 19:1279-88), a limitation that typically necessitates the construction and cell-based testing of very large numbers of zinc finger proteins for any given target gene (Kim et al., 2009, Genome Res. 19:1279-88).
  • Combinatorial selection-based methods that identify zinc finger arrays from randomized libraries have been shown to have higher success rates than modular assembly (Maeder et al., 2008, Mol. Cell, 31:294-301; Joung et al., 2010, Nat. Methods, 7:91-92; Isalan et al., 2001, Nat. Biotechnol., 19:656-660). In preferred embodiments, the zinc finger arrays are described in, or are generated as described in, WO 2011/017293 and WO 2004/099366. Additional suitable zinc finger DBDs are described in U.S. Pat. Nos. 6,511,808, 6,013,453, 6,007,988, and 6,503,717 and U.S. patent application 2002/0160940.
  • Base Editors
  • In some embodiments, the base editor is a deaminase that modifies cytosine DNA bases, e.g., a cytosine deaminase from the apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like (APOBEC) family of deaminases, including APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D/E, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4 (see, e.g., Yang et al., J Genet Genomics. 2017 Sep. 20; 44(9):423-437); activation-induced cytosine deaminase (AID), e.g., activation induced cytosine deaminase (AICDA), cytosine deaminase 1 (CDA1), and CDA2, and cytosine deaminase acting on tRNA (CDAT). The following Table 2 provides exemplary sequences; other sequences can also be used.
  • TABLE 2
    GenBank Accession Nos.
    Deaminase      Nucleic Acid                 Amino Acid
    hAID/AICDA     NM_020661.3 isoform 1        NP_065712.1 variant 1
                   NM_020661.3 isoform 2        NP_065712.1 variant 2
    APOBEC1        NM_001644.4 isoform a        NP_001635.2 variant 1
                   NM_005889.3 isoform b        NP_005880.2 variant 3
    APOBEC2        NM_006789.3                  NP_006780.1
    APOBEC3A       NM_145699.3 isoform a        NP_663745.1 variant 1
                   NM_001270406.1 isoform b     NP_001257335.1 variant 2
    APOBEC3B       NM_004900.4 isoform a        NP_004891.4 variant 1
                   NM_001270411.1 isoform b     NP_001257340.1 variant 2
    APOBEC3C       NM_014508.2                  NP_055323.2
    APOBEC3D/E     NM_152426.3                  NP_689639.2
    APOBEC3F       NM_145298.5 isoform a        NP_660341.2 variant 1
                   NM_001006666.1 isoform b     NP_001006667.1 variant 2
    APOBEC3G       NM_021822.3 (isoform a)      NP_068594.1 (variant 1)
    APOBEC3H       NM_001166003.2               NP_001159475.2 (variant SV-200)
    APOBEC4        NM_203454.2                  NP_982279.1
    CDA1*          NM_127515.4                  NP_179547.1
    yCD (FCY1)*    NM_001184159.1               NP_015387.1
    *from Saccharomyces cerevisiae S288C
  • Exemplary split deaminase regions are shown in Table 3. Each split region listed in Table 3 represents a region of the enzyme either known to be a linker region devoid of secondary structure and positioned away from enzymatically important functions or predicted to be a linker based on alignment with hAPOBEC3G where structural information is lacking (* indicates which proteins lack sufficient structural information). Unstructured recognition loops were not included due to their importance in determining substrate binding and specificity. All protein sequences were acquired from uniprot.org. All positional information refers to positions within the full-length protein sequences as described below. The candidate split regions described indicate only our best attempt at a priori prediction of which splits will be functional.
  • TABLE 3
    Split Deaminase Regions
    Deaminase    Split Region 1   Split Region 2   Split Region 3   Split Region 4   Split Region 5   Split Region 6
    hAID         N51-H56          D69-C75          S85-P86          P102-N103        M129-T140        E153-E163
    rAPOBEC1*    H48-H61          Y75-R81          S91-P92          P108-H109        M144-T145        N158-W167
    mAPOBEC3*    N66-I70          V87-E93          S103-P104        H120-N121        M156-D157        D170-K180
    hAPOBEC3A    N57-H70          Q83-I89          S99-P100         T118-H119        M153-T154        D167-G178
    hAPOBEC3C    N57-H66          I79-K85          S95-P96          S112-N113        M148-D149        Y162-K172
    hAPOBEC3G    N244-H257        K270-D276        S286-P287        K303-H304        M338-T339        D352-D362
    hAPOBEC3H*   N49-H54          K67-C73          S83-P84          D100-H101        M136-G137        D150-Y160
    hAPOBEC3F    N240-H249        I262-N268        S278-P279        S295-N296        M331-G332        Y345-K355
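  • As a minimal sketch of how a candidate split position can be checked against Table 3, the following Python snippet confirms that the boundary residues named for a split region match the sequence, and then cuts the sequence into two fragments. The sequence used is the first 78 residues of rAPOBEC1 as they appear at the N-terminus of SEQ ID NO: 16. The helper names residue_matches and split_after are hypothetical and are used only for illustration; they are not part of the disclosure.

```python
# First 78 residues of rAPOBEC1, copied from the N-terminal portion of SEQ ID NO: 16.
RAPOBEC1_N78 = (
    "MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS"
    "IWRHTSQNTNKHVEVNFIEKFTTERYFCP"
)


def residue_matches(seq, label):
    """Return True if a label such as 'H48' names the residue at that 1-based position."""
    aa, pos = label[0], int(label[1:])
    return 1 <= pos <= len(seq) and seq[pos - 1] == aa


def split_after(seq, pos):
    """Cut seq after 1-based position pos; return (N-terminal, C-terminal) fragments."""
    if not 1 <= pos < len(seq):
        raise ValueError("split position must fall inside the sequence")
    return seq[:pos], seq[pos:]


# Split Region 1 of rAPOBEC1 is listed as H48-H61; check both boundary residues.
print(residue_matches(RAPOBEC1_N78, "H48"), residue_matches(RAPOBEC1_N78, "H61"))

# Illustrative cut after residue 50, which lies inside Split Region 1.
n_fragment, c_fragment = split_after(RAPOBEC1_N78, 50)
print(n_fragment[-6:], c_fragment[:6])
```

    A mismatch between a label and the sequence would suggest that the numbering refers to a different isoform or that the sequence has been truncated, so a check of this kind is a cheap guard before designing constructs.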
  • The split deaminase regions can include mutations that may enhance base editing, e.g., when made to the nCas9-UGI portion, e.g., mutations corresponding to W90, R126, or R132 of SEQ ID NO:46, e.g., corresponding to W90Y, R126E, or R132E of SEQ ID NO:46 (see, e.g., Kim et al. “Increasing the Genome-Targeting Scope and Precision of Base Editing with Engineered Cas9-Cytosine Deaminase Fusions.” Nature Biotechnology 35(4):371-376 (2017)). Alternatively or in addition, the split deaminase regions can include mutations at positions corresponding to one or more of N57, Y130, or K60 of SEQ ID NO:49, e.g., mutations corresponding to N57G, N57A, N57Q, Y130F, or K60D of SEQ ID NO:49 (see, e.g., reference 17).
  • Variants
  • In some embodiments, the components of the fusion proteins are at least 80%, e.g., at least 85%, 90%, 95%, 97%, or 99% identical to the amino acid sequence of an exemplary sequence (e.g., as provided herein), e.g., have up to 1%, 2%, 5%, 10%, 15%, or 20% of the residues of the exemplary sequence replaced, e.g., with conservative mutations, e.g., including or in addition to the mutations described herein. In preferred embodiments, the variant retains desired activity of the parent, e.g., nickase activity, and/or the ability to interact with a guide RNA and/or target DNA, optionally with improved specificity or altered substrate specificity.
  • To determine the percent identity of two nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). The length of a reference sequence aligned for comparison purposes is at least 80% of the length of the reference sequence, and in some embodiments is at least 90% or 100%. The nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position (as used herein nucleic acid “identity” is equivalent to nucleic acid “homology”). The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences. Percent identity between two polypeptides or nucleic acid sequences is determined in various ways that are within the skill in the art, for instance, using publicly available computer software such as Smith-Waterman alignment (Smith, T. F. and M. S. Waterman (1981) J Mol Biol 147:195-7); “BestFit” (Smith and Waterman, Advances in Applied Mathematics, 482-489 (1981)) as incorporated into GeneMatcher Plus™; Schwartz and Dayhoff (1979) Atlas of Protein Sequence and Structure, Dayhoff, M. O., Ed., pp 353-358; the BLAST program (Basic Local Alignment Search Tool; Altschul, S. F., W. Gish, et al. (1990) J Mol Biol 215:403-10), BLAST-2, BLAST-P, BLAST-N, BLAST-X, WU-BLAST-2, ALIGN, ALIGN-2, CLUSTAL, or Megalign (DNASTAR) software.
In addition, those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the length of the sequences being compared. In general, for proteins or nucleic acids, the length of comparison can be any length, up to and including full length (e.g., 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 100%). For purposes of the present compositions and methods, at least 80% of the full length of the sequence is aligned.
  • For purposes of the present disclosure, the comparison of sequences and determination of percent identity between two sequences can be accomplished using a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
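  • The identity and coverage calculations described above can be sketched in Python as follows. This is a minimal illustration only: it assumes the two sequences have already been aligned by one of the tools named above (gaps written as "-"), it counts identity over all alignment columns including gap columns (one possible way of "taking into account" gaps), and the function names are hypothetical. The reference string is the NLS linker of SEQ ID NO: 14; the query string is an invented variant used purely as an example.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns (gap columns included) holding identical residues."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def reference_coverage(aligned_ref, aligned_query):
    """Percent of reference residues paired with a query residue rather than a gap."""
    ref_len = sum(1 for x in aligned_ref if x != "-")
    if ref_len == 0:
        raise ValueError("reference contains no residues")
    paired = sum(1 for x, y in zip(aligned_ref, aligned_query) if x != "-" and y != "-")
    return 100.0 * paired / ref_len


ref = "GSPKKKRKVGS"    # SEQ ID NO: 14 (NLS linker)
query = "GSPKKKRRV-S"  # hypothetical variant: one substitution, one deletion

identity = percent_identity(ref, query)
coverage = reference_coverage(ref, query)
# The text requires at least 80% of the reference length to be aligned.
print(round(identity, 1), round(coverage, 1), coverage >= 80.0)
```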
  • Conservative substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine.
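  • As an illustrative sketch, the substitution groups listed above can be encoded as sets of one-letter amino acid codes and used to classify a given substitution. The names CONSERVATIVE_GROUPS and is_conservative are hypothetical.

```python
# One-letter codes for the groups listed in the text:
# Gly/Ala; Val/Ile/Leu; Asp/Glu/Asn/Gln; Ser/Thr; Lys/Arg; Phe/Tyr.
CONSERVATIVE_GROUPS = [
    set("GA"),
    set("VIL"),
    set("DENQ"),
    set("ST"),
    set("KR"),
    set("FY"),
]


def is_conservative(original, replacement):
    """True if two different residues fall within the same listed group."""
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


# Y130F stays within the Phe/Tyr group; R126E crosses from Lys/Arg to the acidic/amide group.
print(is_conservative("Y", "F"), is_conservative("R", "E"))
```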
  • Also provided herein are isolated nucleic acids encoding the split deaminase fusion proteins, vectors comprising the isolated nucleic acids, optionally operably linked to one or more regulatory domains for expressing the variant proteins, and host cells, e.g., mammalian host cells, comprising the nucleic acids, and optionally expressing the variant proteins. In some embodiments, the host cells are stem cells, e.g., hematopoietic stem cells.
  • In some embodiments, the fusion proteins include a linker between the DNA binding domain (e.g., ZFN, TALE, or nCas9) and the BE domains. Linkers that can be used in these fusion proteins (or between fusion proteins in a concatenated structure) can include any sequence that does not interfere with the function of the fusion proteins. In preferred embodiments, the linkers are short, e.g., 2-20 amino acids, and are typically flexible (i.e., comprising amino acids with a high degree of freedom such as glycine, alanine, and serine). In some embodiments, the linker comprises one or more units consisting of GGGS (SEQ ID NO:5) or GGGGS (SEQ ID NO:6), e.g., two, three, four, or more repeats of the GGGS (SEQ ID NO:5) or GGGGS (SEQ ID NO:6) unit. Other linker sequences can also be used.
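  • A repeat-unit linker of the kind described above can be generated with a few lines of Python. This is a sketch only; the function names are hypothetical, and the length check simply encodes the preferred 2-20 amino acid range stated in the text.

```python
ALLOWED_UNITS = ("GGGS", "GGGGS")  # SEQ ID NO:5 and SEQ ID NO:6


def flexible_linker(unit="GGGGS", repeats=3):
    """Concatenate `repeats` copies of a GGGS or GGGGS unit."""
    if unit not in ALLOWED_UNITS:
        raise ValueError("unit must be GGGS or GGGGS")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return unit * repeats


def in_preferred_length_range(linker):
    """True if the linker is within the preferred 2-20 amino acid range."""
    return 2 <= len(linker) <= 20


linker = flexible_linker("GGGS", 4)
print(linker, len(linker), in_preferred_length_range(linker))
```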
  • In some embodiments, the split deaminase fusion protein includes a cell-penetrating peptide sequence that facilitates delivery to the intracellular space, e.g., HIV-derived TAT peptide, penetratins, transportans, or hCT derived cell-penetrating peptides, see, e.g., Caron et al., (2001) Mol Ther. 3(3):310-8; Langel, Cell-Penetrating Peptides: Processes and Applications (CRC Press, Boca Raton Fla. 2002); El-Andaloussi et al., (2005) Curr Pharm Des. 11(28):3597-611; and Deshayes et al., (2005) Cell Mol Life Sci. 62(16):1839-49.
  • Cell penetrating peptides (CPPs) are short peptides that facilitate the movement of a wide range of biomolecules across the cell membrane into the cytoplasm or other organelles, e.g. the mitochondria and the nucleus. Examples of molecules that can be delivered by CPPs include therapeutic drugs, plasmid DNA, oligonucleotides, siRNA, peptide-nucleic acid (PNA), proteins, peptides, nanoparticles, and liposomes. CPPs are generally 30 amino acids or less, are derived from naturally or non-naturally occurring protein or chimeric sequences, and contain either a high relative abundance of positively charged amino acids, e.g. lysine or arginine, or an alternating pattern of polar and non-polar amino acids. CPPs that are commonly used in the art include Tat (Frankel et al., (1988) Cell. 55:1189-1193, Vives et al., (1997) J. Biol. Chem. 272:16010-16017), penetratin (Derossi et al., (1994) J. Biol. Chem. 269:10444-10450), polyarginine peptide sequences (Wender et al., (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008, Futaki et al., (2001) J. Biol. Chem. 276:5836-5840), and transportan (Pooga et al., (1998) Nat. Biotechnol. 16:857-861).
  • CPPs can be linked with their cargo through covalent or non-covalent strategies. Methods for covalently joining a CPP and its cargo are known in the art, e.g. chemical cross-linking (Stetsenko et al., (2000) J. Org. Chem. 65:4900-4909, Gait et al. (2003) Cell. Mol. Life. Sci. 60:844-853) or cloning a fusion protein (Nagahara et al., (1998) Nat. Med. 4:1449-1453). Non-covalent coupling between the cargo and short amphipathic CPPs comprising polar and non-polar domains is established through electrostatic and hydrophobic interactions.
  • CPPs have been utilized in the art to deliver potentially therapeutic biomolecules into cells. Examples include cyclosporine linked to polyarginine for immunosuppression (Rothbard et al., (2000) Nature Medicine 6(11):1253-1257), siRNA against cyclin B1 linked to a CPP called MPG for inhibiting tumorigenesis (Crombez et al., (2007) Biochem Soc. Trans. 35:44-46), tumor suppressor p53 peptides linked to CPPs to reduce cancer cell growth (Takenobu et al., (2002) Mol. Cancer Ther. 1(12):1043-1049, Snyder et al., (2004) PLoS Biol. 2:E36), and dominant negative forms of Ras or phosphoinositol 3 kinase (PI3K) fused to Tat to treat asthma (Myou et al., (2003) J. Immunol. 171:4399-4405).
  • CPPs have been utilized in the art to transport contrast agents into cells for imaging and biosensing applications. For example, green fluorescent protein (GFP) attached to Tat has been used to label cancer cells (Shokolenko et al., (2005) DNA Repair 4(4):511-518). Tat conjugated to quantum dots has been used to successfully cross the blood-brain barrier for visualization of the rat brain (Santra et al., (2005) Chem. Commun. 3144-3146). CPPs have also been combined with magnetic resonance imaging techniques for cell imaging (Liu et al., (2006) Biochem. and Biophys. Res. Comm. 347(1):133-140). See also Ramsey and Flynn, Pharmacol Ther. 2015 Jul. 22. pii: S0163-7258(15)00141-2.
  • Alternatively or in addition, the split deaminase fusion proteins can include a nuclear localization sequence, e.g., SV40 large T antigen NLS (PKKKRRV (SEQ ID NO:7)) and nucleoplasmin NLS (KRPAATKKAGQAKKKK (SEQ ID NO: 8)). Other NLSs are known in the art; see, e.g., Cokol et al., EMBO Rep. 2000 Nov. 15; 1(5): 411-415; Freitas and Cunha, Curr Genomics. 2009 December; 10(8): 550-557.
  • In some embodiments, the split deaminase fusion proteins include a moiety that has a high affinity for a ligand, for example GST, FLAG or hexahistidine sequences. Such affinity tags can facilitate the purification of recombinant split deaminase fusion proteins.
  • The split deaminase fusion proteins described herein can be used for altering the genome of a cell. The methods generally include expressing or contacting the split deaminase fusion proteins in the cells; in versions using one or two Cas9s, the methods include using a guide RNA having a region complementary to a selected portion of the genome of the cell. Methods for selectively altering the genome of a cell are known in the art, see, e.g., U.S. Pat. No. 8,993,233; US 20140186958; U.S. Pat. No. 9,023,649; WO/2014/099744; WO 2014/089290; WO2014/144592; WO144288; WO2014/204578; WO2014/152432; WO2115/099850; U.S. Pat. No. 8,697,359; US20160024529; US20160024524; US20160024523; US20160024510; US20160017366; US20160017301; US20150376652; US20150356239; US20150315576; US20150291965; US20150252358; US20150247150; US20150232883; US20150232882; US20150203872; US20150191744; US20150184139; US20150176064; US20150167000; US20150166969; US20150159175; US20150159174; US20150093473; US20150079681; US20150067922; US20150056629; US20150044772; US20150024500; US20150024499; US20150020223; US20140356867; US20140295557; US20140273235; US20140273226; US20140273037; US20140189896; US20140113376; US20140093941; US20130330778; US20130288251; US20120088676; US20110300538; US20110236530; US20110217739; US20110002889; US20100076057; US20110189776; US20110223638; US20130130248; US20150050699; US20150071899; US20150050699; US20150045546; US20150031134; US20150024500; US20140377868; US20140357530; US20140349400; US20140335620; US20140335063; US20140315985; US20140310830; US20140310828; US20140309487; US20140304853; US20140298547; US20140295556; US20140294773; US20140287938; US20140273234; US20140273232; US20140273231; US20140273230; US20140271987; US20140256046; US20140248702; US20140242702; US20140242700; US20140242699; US20140242664; US20140234972; US20140227787; US20140212869; US20140201857; US20140199767; US20140189896; US20140186958; US20140186919; US20140186843; US20140179770; 
US20140179006; US20140170753; WO/2008/108989; WO/2010/054108; WO/2012/164565; WO/2013/098244; WO/2013/176772; US 20150071899; Makarova et al., “Evolution and classification of the CRISPR-Cas systems” 9(6) Nature Reviews Microbiology 467-477 (1-23) (June 2011); Wiedenheft et al., “RNA-guided genetic silencing systems in bacteria and archaea” 482 Nature 331-338 (Feb. 16, 2012); Gasiunas et al., “Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria” 109(39) Proceedings of the National Academy of Sciences USA E2579-E2586 (Sep. 4, 2012); Jinek et al., “A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity” 337 Science 816-821 (Aug. 17, 2012); Carroll, “A CRISPR Approach to Gene Targeting” 20(9) Molecular Therapy 1658-1660 (September 2012); U.S. Appl. No. 61/652,086, filed May 25, 2012; Al-Attar et al., Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs): The Hallmark of an Ingenious Antiviral Defense Mechanism in Prokaryotes, Biol Chem. (2011) vol. 392, Issue 4, pp. 277-289; Hale et al., Essential Features and Rational Design of CRISPR RNAs That Function With the Cas RAMP Module Complex to Cleave RNAs, Molecular Cell, (2012) vol. 45, Issue 3, 292-302.
  • For methods in which the split deaminase fusion proteins are delivered to cells, the proteins can be produced using any method known in the art, e.g., by in vitro translation, or expression in a suitable host cell from nucleic acid encoding the split deaminase fusion protein; a number of methods are known in the art for producing proteins. For example, the proteins can be produced in and purified from yeast, E. coli, insect cell lines, plants, transgenic animals, or cultured mammalian cells; see, e.g., Palomares et al., “Production of Recombinant Proteins: Challenges and Solutions,” Methods Mol Biol. 2004; 267:15-52. In addition, the split deaminase fusion proteins can be linked to a moiety that facilitates transfer into a cell, e.g., a lipid nanoparticle, optionally with a linker that is cleaved once the protein is inside the cell. See, e.g., LaFountaine et al., Int J Pharm. 2015 Aug. 13; 494(1):180-194.
  • Expression Systems
  • To use the split deaminase fusion proteins described herein, it may be desirable to express them from a nucleic acid that encodes them. This can be performed in a variety of ways. For example, the nucleic acid encoding the split deaminase fusion can be cloned into an intermediate vector for transformation into prokaryotic or eukaryotic cells for replication and/or expression. Intermediate vectors are typically prokaryote vectors, e.g., plasmids, or shuttle vectors, or insect vectors, for storage or manipulation of the nucleic acid encoding the split deaminase fusion for production of the split deaminase fusion protein. The nucleic acid encoding the split deaminase fusion protein can also be cloned into an expression vector, for administration to a plant cell, animal cell, preferably a mammalian cell or a human cell, fungal cell, bacterial cell, or protozoan cell.
  • To obtain expression, a sequence encoding a split deaminase fusion protein is typically subcloned into an expression vector that contains a promoter to direct transcription. Suitable bacterial and eukaryotic promoters are well known in the art and described, e.g., in Sambrook et al., Molecular Cloning, A Laboratory Manual (3d ed. 2001); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 2010). Bacterial expression systems for expressing the engineered protein are available in, e.g., E. coli, Bacillus sp., and Salmonella (Palva et al., 1983, Gene 22:229-235). Kits for such expression systems are commercially available. Eukaryotic expression systems for mammalian cells, yeast, and insect cells are well known in the art and are also commercially available.
  • The promoter used to direct expression of a nucleic acid depends on the particular application. For example, a strong constitutive promoter is typically used for expression and purification of fusion proteins. In contrast, when the split deaminase fusion protein is to be administered in vivo for gene regulation, either a constitutive or an inducible promoter can be used, depending on the particular use of the split deaminase fusion protein. In addition, a preferred promoter for administration of the split deaminase fusion protein can be a weak promoter, such as HSV TK or a promoter having similar activity. The promoter can also include elements that are responsive to transactivation, e.g., hypoxia response elements, Gal4 response elements, lac repressor response element, and small molecule control systems such as tetracycline-regulated systems and the RU-486 system (see, e.g., Gossen & Bujard, 1992, Proc. Natl. Acad. Sci. USA, 89:5547; Oligino et al., 1998, Gene Ther., 5:491-496; Wang et al., 1997, Gene Ther., 4:432-441; Neering et al., 1996, Blood, 88:1147-55; and Rendahl et al., 1998, Nat. Biotechnol., 16:757-761).
  • In addition to the promoter, the expression vector typically contains a transcription unit or expression cassette that contains all the additional elements required for the expression of the nucleic acid in host cells, either prokaryotic or eukaryotic. A typical expression cassette thus contains a promoter operably linked, e.g., to the nucleic acid sequence encoding the split deaminase fusion protein, and any signals required, e.g., for efficient polyadenylation of the transcript, transcriptional termination, ribosome binding sites, or translation termination. Additional elements of the cassette may include, e.g., enhancers, and heterologous spliced intronic signals.
  • The particular expression vector used to transport the genetic information into the cell is selected with regard to the intended use of the split deaminase fusion protein, e.g., expression in plants, animals, bacteria, fungus, protozoa, etc. Standard bacterial expression vectors include plasmids such as pBR322 based plasmids, pSKF, pET23D, and commercially available tag-fusion expression systems such as GST and LacZ.
  • Expression vectors containing regulatory elements from eukaryotic viruses are often used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus vectors, and vectors derived from Epstein-Barr virus. Other exemplary eukaryotic vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector allowing expression of proteins under the direction of the SV40 early promoter, SV40 late promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.
  • The vectors for expressing the split deaminase fusion protein can include RNA Pol III promoters to drive expression of the guide RNAs, e.g., the H1, U6 or 7SK promoters. These human promoters allow for expression of the guide RNAs in mammalian cells following plasmid transfection.
  • Some expression systems have markers for selection of stably transfected cell lines such as thymidine kinase, hygromycin B phosphotransferase, and dihydrofolate reductase. High yield expression systems are also suitable, such as using a baculovirus vector in insect cells, with the gRNA encoding sequence under the direction of the polyhedrin promoter or other strong baculovirus promoters.
  • The elements that are typically included in expression vectors also include a replicon that functions in E. coli, a gene encoding antibiotic resistance to permit selection of bacteria that harbor recombinant plasmids, and unique restriction sites in nonessential regions of the plasmid to allow insertion of recombinant sequences.
  • Standard transfection methods are used to produce bacterial, mammalian, yeast or insect cell lines that express large quantities of protein, which are then purified using standard techniques (see, e.g., Colley et al., 1989, J. Biol. Chem., 264:17619-22; Guide to Protein Purification, in Methods in Enzymology, vol. 182 (Deutscher, ed., 1990)). Transformation of eukaryotic and prokaryotic cells is performed according to standard techniques (see, e.g., Morrison, 1977, J. Bacteriol. 132:349-351; Clark-Curtiss & Curtiss, Methods in Enzymology 101:347-362 (Wu et al., eds, 1983)).
  • Any of the known procedures for introducing foreign nucleotide sequences into host cells may be used. These include the use of calcium phosphate transfection, polybrene, protoplast fusion, electroporation, nucleofection, liposomes, microinjection, naked DNA, plasmid vectors, viral vectors, both episomal and integrative, and any of the other well-known methods for introducing cloned genomic DNA, cDNA, synthetic DNA or other foreign genetic material into a host cell (see, e.g., Sambrook et al., supra). It is only necessary that the particular genetic engineering procedure used be capable of successfully introducing at least one gene into the host cell capable of expressing the split deaminase fusion protein.
  • In methods wherein the fusion proteins include a Cas9 domain, the methods also include delivering a gRNA that interacts with the Cas9.
  • Alternatively, the methods can include delivering the split deaminase fusion protein and guide RNA together, e.g., as a complex. For example, the split deaminase fusion protein can be overexpressed in a host cell and purified, then complexed with the guide RNA (e.g., in a test tube) to form a ribonucleoprotein (RNP), and delivered to cells. In some embodiments, the split deaminase fusion protein can be expressed in and purified from bacteria through the use of bacterial expression plasmids. For example, His-tagged split deaminase fusion protein can be expressed in bacterial cells and then purified using nickel affinity chromatography. The use of RNPs circumvents the necessity of delivering plasmid DNAs encoding the nuclease or the guide, or encoding the nuclease as an mRNA. RNP delivery may also improve specificity, presumably because the half-life of the RNP is shorter and there is no persistent expression of the nuclease and guide (as would occur with a plasmid). The RNPs can be delivered to the cells in vivo or in vitro, e.g., using lipid-mediated transfection or electroporation. See, e.g., Liang et al. “Rapid and highly efficient mammalian cell engineering via Cas9 protein transfection.” Journal of biotechnology 208 (2015): 44-53; Zuris, John A., et al. “Cationic lipid-mediated delivery of proteins enables efficient protein-based genome editing in vitro and in vivo.” Nature biotechnology 33.1 (2015): 73-80; Kim et al. “Highly efficient RNA-guided genome editing in human cells via delivery of purified Cas9 ribonucleoproteins.” Genome research 24.6 (2014): 1012-1019.
  • The present invention also includes the vectors and cells comprising the vectors, as well as kits comprising the proteins and nucleic acids described herein, e.g., for use in a method described herein.
  • EXAMPLES
  • The invention is further described in the following examples, which do not limit the scope of the invention described in the claims.
  • Materials and Methods
  • The following materials and methods were used in the Examples below.
  • Molecular Cloning
  • sDA1-containing expression plasmids were constructed by selectively amplifying desired regions of the rAPO1, hA3A, or BE3 genes, as well as DNA sequences encoding a 3AC3L-NLS or NLS only linker and desired EGFP-targeting ZFs, by the PCR method such that they had significant overlapping ends and using isothermal assembly (or “Gibson Assembly,” NEB) to assemble them in the desired order in a pCAG expression vector. sDA2-containing expression plasmids were constructed by truncating a BE3 gene by PCR and using Gibson assembly to put the resulting pieces into a pCAG expression plasmid. PCR was conducted using Q5 or Phusion polymerases (NEB).
  • Cell Culture and Transfections
  • A HEK293 cell line carrying an integrated EGFP reporter gene (unpublished) was grown in culture using media consisting of Advanced Dulbecco's Modified Medium (Gibco) supplemented with 10% heat inactivated fetal bovine serum (Gibco), 1% 10,000 U/ml penicillin-streptomycin solution (Gibco), and 1% Glutamax (Gibco). Cells were passaged every 3-4 days to maintain an actively growing population and avoid anoxic conditions. Transfections containing 1.0 microgram of transfection quality DNA (Qiagen Maxi- or Miniprep) were conducted by seeding 1.5×10^5 cells in 24-well TC-treated plates (Corning) and using TransIT-293 reagent according to manufacturer's protocol (Mirus Bio). For split deaminase experiments: of the 1.0 microgram of DNA transfected, 400 nanograms contained the sDA1-encoding plasmid, 400 nanograms contained the sDA2-encoding plasmid, and 200 nanograms contained an expression plasmid encoding the SpCas9 gRNA targeting the EGFP reporter gene. For BE control experiments: 400 nanograms contained BE-expressing plasmid, 400 nanograms contained a pMax-GFP-encoding plasmid (Lonza), and 200 nanograms contained an expression plasmid encoding the SpCas9 gRNA targeting the EGFP reporter gene. For individual halfase controls: 400 nanograms contained the sDA-encoding plasmid, 400 nanograms contained a pMax-GFP-encoding plasmid (Lonza), and 200 nanograms contained an expression plasmid encoding the SpCas9 gRNA targeting the EGFP reporter gene. Genomic DNA was harvested 3 days post-transfection using the DNAdvance kit (Agencourt).
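  • The three transfection conditions can be tabulated and checked with a short Python sketch. The dictionary keys are descriptive labels chosen for illustration; the masses are those stated above, and the check confirms that each condition totals 1.0 microgram (1000 ng) of DNA.

```python
# Plasmid masses in nanograms for each transfection condition described in the text.
TRANSFECTION_MIXES_NG = {
    "split deaminase": {"sDA1 plasmid": 400, "sDA2 plasmid": 400, "gRNA plasmid": 200},
    "BE control": {"BE plasmid": 400, "pMax-GFP plasmid": 400, "gRNA plasmid": 200},
    "individual half control": {"sDA plasmid": 400, "pMax-GFP plasmid": 400, "gRNA plasmid": 200},
}

TARGET_TOTAL_NG = 1000  # 1.0 microgram per transfection


def total_ng(mix):
    """Sum the plasmid masses (ng) in one transfection mix."""
    return sum(mix.values())


for name, mix in TRANSFECTION_MIXES_NG.items():
    print(name, total_ng(mix), total_ng(mix) == TARGET_TOTAL_NG)
```

    In the control conditions the pMax-GFP plasmid takes the place of the omitted component, so the total DNA mass is the same in every well.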
  • High-Throughput Amplicon Sequencing
  • Rates of base editing at target loci were determined by deep sequencing of PCR amplicons amplified from genomic DNA isolated from transfected cells. Target site genomic DNA was amplified using EGFP-specific DNA primers flanking the sDA2 nCas9 binding sites. Illumina TruSeq adapters were added to the ends of the amplicons either by PCR or NEBNext Ultra II kit (NEB) and molecularly indexed with NEBNext Dual Index Primers (NEB). Samples were combined into libraries and sequenced on the Illumina MiSeq machine using the MiSeq Reagent Micro Kit v2 (Illumina). Sequencing results were analyzed using a batch version of the software CRISPResso (crispresso.rocks).
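  • To illustrate what a per-position base editing rate means, the following Python sketch computes the fraction of reads carrying a T at each reference cytosine. It is not the CRISPResso analysis used in the Examples; it is a simplified stand-in that assumes reads are already trimmed to the reference window and contain no insertions or deletions. The function name and the toy reads are hypothetical.

```python
def c_to_t_rates(reference, reads):
    """Map each 1-based C position in `reference` to the fraction of reads with T there.

    Reads whose length differs from the reference are skipped, since this sketch
    does not perform alignment.
    """
    usable = [r for r in reads if len(r) == len(reference)]
    if not usable:
        raise ValueError("no reads match the reference length")
    rates = {}
    for i, base in enumerate(reference):
        if base == "C":
            t_count = sum(1 for r in usable if r[i] == "T")
            rates[i + 1] = t_count / len(usable)
    return rates


reference = "TTCAAGTCCG"  # first 10 nt of the gRNA1 binding site
toy_reads = [             # invented reads, for illustration only
    "TTCAAGTCCG",
    "TTTAAGTCCG",
    "TTTAAGTTCG",
    "TTCAAGTCCG",
]
print(c_to_t_rates(reference, toy_reads))
```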
  • gRNA and ZF Target Sequences
  • ZF1 Binding site:
    (SEQ ID NO: 9)
    aGAAGATGGTg
    ZF2 Binding Site:
    (SEQ ID NO: 10)
    gGTCGGGGTAg
    gRNA1 Binding Site (with PAM):
    (SEQ ID NO: 11)
    TTCAAGTCCGCCATGCCCGAAGG
    gRNA2 Binding Site (with PAM):
    (SEQ ID NO: 12)
    CATGCCCGAAGGCTACGTCCAGG
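  • As a minimal sketch, the gRNA binding sites above can be separated into a 20-nt protospacer and a 3-nt PAM, checked for the NGG PAM recognized by SpCas9, and scanned for cytosines that are candidate deamination targets. The helper names are hypothetical, and numbering cytosines from the PAM-distal (5') end of the protospacer is a convention chosen here for illustration.

```python
GRNA_SITES = {
    "gRNA1": "TTCAAGTCCGCCATGCCCGAAGG",  # SEQ ID NO: 11
    "gRNA2": "CATGCCCGAAGGCTACGTCCAGG",  # SEQ ID NO: 12
}


def split_site(site):
    """Return (protospacer, PAM) for a 23-nt binding site written with its PAM."""
    if len(site) != 23:
        raise ValueError("expected a 20-nt protospacer followed by a 3-nt PAM")
    return site[:20], site[20:]


def has_ngg_pam(site):
    """True if the last three bases fit the SpCas9 NGG PAM pattern."""
    _, pam = split_site(site)
    return pam[1:] == "GG"


def cytosine_positions(protospacer):
    """1-based positions of C in the protospacer, counted from its 5' end."""
    return [i + 1 for i, base in enumerate(protospacer) if base == "C"]


for name, site in GRNA_SITES.items():
    protospacer, pam = split_site(site)
    print(name, pam, has_ngg_pam(site), cytosine_positions(protospacer))
```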
  • Relevant Protein Sequences
  • In the following sequences, “X” indicates an undetermined amino acid residue, indicating the variable regions of a ZF that are responsible for specific DNA binding.
  • 3AC3L-NLS Linker
    (SEQ ID NO: 13)
    SSGNSNANSRGPSFSSGLVPLSLRGSHGSPKKKRKVGS
    NLS Linker
    (SEQ ID NO: 14)
    GSPKKKRKVGS
    N-rAPOBEC1 sDA1.1-3AC3L-NLS-ZF-C
    (SEQ ID NO: 15)
    MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS
    ISSGNSNANSRGPSFSSGLVPLSLRGSHGSPKKKRKVGSSRPGERPFQCRICMR
    NFSXXXXLXXHTRTHTGEKPFQCRICMRNFSXXXXLXXHLRTHTGEKPFQCR
    ICMRNFSXXXXLXXHLKTHLRGSSAQ
    N-rAPOBEC1 sDA1.2-3AC3L-NLS-ZF-C
    (SEQ ID NO: 16)
    MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS
    IWRHTSQNTNKHVEVNFIEKFTTERYFCPSSGNSNANSRGPSFSSGLVPLSLRG
    SHGSPKKKRKVGSSRPGERPFQCRICMRNFSXXXXLXXHTRTHTGEKPFQCRI
    CMRNFSXXXXLXXHLRTHTGEKPFQCRICMRNFSXXXXLXXHLKTHLRGSS
    AQ
    N-rAPOBEC1 sDA1.3-3AC3L-NLS-ZF-C
    (SEQ ID NO: 17)
    MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS
    IWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSSSGNSNANSRG
    PSFSSGLVPLSLRGSHGSPKKKRKVGSSRPGERPFQCRICMRNFSXXXXLXXH
    TRTHTGEKPFQCRICMRNFSXXXXLXXHLRTHTGEKPFQCRICMRNFSXXXX
    LXXHLKTHLRGSSAQ
    N-rAPOBEC1 sDA1.4-3AC3L-NLS-ZF-C
    (SEQ ID NO: 18)
    MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS
    IWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEF
    LSRYPSSGNSNANSRGPSFSSGLVPLSLRGSHGSPKKKRKVGSSRPGERPFQCR
    ICMRNFSXXXXLXXHTRTHTGEKPFQCRICMRNFSXXXXLXXHLRTHTGEKP
    FQCRICMRNFSXXXXLXXHLKTHLRGSSAQ
    N-rAPOBEC1 sDA1.5-3AC3L-NLS-ZF-C
    (SEQ ID NO: 19)
    MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS
    IWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEF
    LSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMSSGNSNANSRGPS
    FSSGLVPLSLRGSHGSPKKKRKVGSSRPGERPFQCRICMRNFSXXXXLXXHTR
    THTGEKPFQCRICMRNFSXXXXLXXHLRTHTGEKPFQCRICMRNFSXXXXLX
    XHLKTHLRGSSAQ
    N-rAPOBEC1 sDA1.6-3AC3L-NLS-ZF-C
    (SEQ ID NO: 20)
    MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS
    IWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEF
    LSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNF
    VNYSSSGNSNANSRGPSFSSGLVPLSLRGSHGSPKKKRKVGSSRPGERPFQCRI
    CMRNFSXXXXLXXHTRTHTGEKPFQCRICMRNFSXXXXLXXHLRTHTGEKP
    FQCRICMRNFSXXXXLXXHLKTHLRGSSAQ
    N-rAPOBEC1 sDA2.1-nCas9-UGI-C
    (SEQ ID NO: 21)
    MWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECS
    RAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYC
    WRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTF
    FTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNSV
    GWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTA
    RRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNI
    VDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDL
    NPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQ
    LPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLA
    QIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTL
    LKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTE
    ELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE
    KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIER
    MTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQK
    KAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLL
    KIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQL
    KRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTF
    KEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKP
    ENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNE
    KLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSD
    KNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELD
    KAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSD
    FRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVY
    DVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETG
    EIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARK
    KDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFE
    KNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELAL
    PSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVIL
    ADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKR
    YTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQ
    ESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVI
    QDSNGENKIKMLSGGSPKKKRKV 
    N-rAPOBEC1 sDA2.2-nCas9-UGI-C
    (SEQ ID NO: 22)
    MNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPR
    NRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLY
    VLELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGSE
    TPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI
    KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDS
    FFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKA
    DLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINA
    SGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFD
    LAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVN
    TEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGY
    IDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHL
    GELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKS
    EETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYN
    ELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIEC
    FDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDR
    EMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILD
    FLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIK
    KGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEE
    GIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDV
    DHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA
    KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTK
    YDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVG
    TALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFF
    KTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTE
    VQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEK
    GKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFEL
    ENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQL
    FVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIH
    LFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQL
    GGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDE
    STDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV
    N-rAPOBEC1 sDA2.3-nCas9-UGI-C
    (SEQ ID NO: 23)
    MPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGV
    TIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPC
    LNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESD
    KKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDS
    GETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVE
    EDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH
    MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSA
    RLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS
    KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS
    MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF
    YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ
    EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE
    VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE
    GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED
    RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTY
    AHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR
    NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVV
    DELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQIL
    KEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFL
    KDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFD
    NLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIR
    EVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK
    LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGE
    IRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES
    ILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSV
    KELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRML
    ASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY
    LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGA
    PAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTN
    LSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLL
    TSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV
    N-rAPOBEC1 sDA2.4-nCas9-UGI-C
    (SEQ ID NO: 24)
    MHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRN
    FVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTFFTIA
    LQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNSVGW
    AVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRR
    YTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDE
    VAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDN
    SDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGE
    KKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGD
    QYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKA
    LVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELL
    VKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKIL
    TFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTN
    FDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIV
    DLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIK
    DKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRR
    YTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI
    QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIV
    IEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYL
    YYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRG
    KSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGF
    IKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKD
    FQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRK
    MIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVW
    DKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDW
    DPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPI
    DFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSK
    YVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILAD
    ANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYT
    STKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESI
    LMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDS
    NGENKIKMLSGGSPKKKRKV
    N-rAPOBEC1 sDA2.5-nCas9-UGI-C
    (SEQ ID NO: 25)
    MTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLP
    PCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPES
    DKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFD
    SGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLV
    EEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH
    MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSA
    RLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS
    KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS
    MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF
    YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ
    EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE
    VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE
    GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED
    RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTY
    AHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR
    NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVV
    DELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQIL
    KEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFL
    KDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFD
    NLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIR
    EVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK
    LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGE
    IRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES
    ILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSV
    KELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRML
    ASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY
    LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGA
    PAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTN
    LSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLL
    TSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV
    N-rAPOBEC1 sDA2.6-nCas9-UGI-C
    (SEQ ID NO: 26)
    MPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTFFTI
    ALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNSVG
    WAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTAR
    RRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIV
    DEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP
    DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLP
    GEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQI
    GDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLL
    KALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEE
    LLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEK
    ILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERM
    TNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKK
    AIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLK
    IIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKR
    RRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKE
    DIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPEN
    IVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKL
    YLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKN
    RGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKA
    GFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFR
    KDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDV
    RKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIV
    WDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKD
    WDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKN
    PIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPS
    KYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILA
    DANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRY
    TSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQE
    SILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQ
    DSNGENKIKMLSGGSPKKKRKV
    N-hAPOBEC3A sDA1.1-NLS-ZF-C
    (SEQ ID NO: 27)
    MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVK
    MDQHRGFLHNQAKNLLGSPKKKRKVGSSRPGERPFQCRICMRNFSXXXXLX
    XHTRTHTGEKPFQCRICMRNFSXXXXLXXHLRTHTGEKPFQCRICMRNFSXX
    XXLXXHLKTHLRGSSAQ
    N-hAPOBEC3A sDA1.2-NLS-ZF-C
    (SEQ ID NO: 28)
    MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVK
    MDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPGSPKKKRKVGSS
    RPGERPFQCRICMRNFSXXXXLXXHTRTHTGEKPFQCRICMRNFSXXXXLXX
    HLRTHTGEKPFQCRICMRNFSXXXXLXXHLKTHLRGSSAQ
    N-hAPOBEC3A sDA1.3-NLS-ZF-C
    (SEQ ID NO: 29)
    MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVK
    MDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWS
    GSPKKKRKVGSSRPGERPFQCRICMRNFSXXXXLXXHTRTHTGEKPFQCRIC
    MRNFSXXXXLXXHLRTHTGEKPFQCRICMRNFSXXXXLXXHLKTHLRGSSA
    Q
    N-hAPOBEC3A sDA1.4-NLS-ZF-C
    (SEQ ID NO: 30)
    MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVK
    MDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWS
    PCFSWGCAGEVRAFLQENTGSPKKKRKVGSSRPGERPFQCRICMRNFSXXXX
    LXXHTRTHTGEKPFQCRICMRNFSXXXXLXXHLRTHTGEKPFQCRICMRNFS
    XXXXLXXHLKTHLRGSSAQ
    N-hAPOBEC3A sDA1.5-NLS-ZF-C
    (SEQ ID NO: 31)
    MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVK
    MDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWS
    PCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQV
    SIMGSPKKKRKVGSSRPGERPFQCRICMRNFSXXXXLXXHTRTHTGEKPFQC
    RICMRNFSXXXXLXXHLRTHTGEKPFQCRICMRNFSXXXXLXXHLKTHLRGS
    SAQ
    N-hAPOBEC3A sDA1.6-NLS-ZF-C
    (SEQ ID NO: 32)
    MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVK
    MDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWS
    PCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQV
    SIMTYDEFKHCWDTFVDHQGCPGSPKKKRKVGSSRPGERPFQCRICMRNFSX
    XXXLXXHTRTHTGEKPFQCRICMRNFSXXXXLXXHLRTHTGEKPFQCRICMR
    NFSXXXXLXXHLKTHLRGSSAQ
    N-hAPOBEC3A sDA2.1-nCas9-UGI-C
    (SEQ ID NO: 33)
    MCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGE
    VRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHC
    WDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATP
    ESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALL
    FDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESF
    LVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLAL
    AHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAI
    LSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKL
    QLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPL
    SASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQ
    EEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAIL
    RRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPW
    NFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVK
    YVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEIS
    GVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEER
    LKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDG
    FANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQT
    VKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELG
    SQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQ
    SFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQR
    KFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDEND
    KLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIK
    KYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITL
    ANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGG
    FSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKK
    LKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRK
    RMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQH
    KHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTN
    LGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGG
    STNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENV
    MLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV
    N-hAPOBEC3A sDA2.2-nCas9-UGI-C
    (SEQ ID NO: 34)
    MAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYD
    PLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHS
    QALSGRLRAILQNQGNSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITD
    EYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRR
    KNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAY
    HEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDV
    DKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKN
    GLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYA
    DLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVR
    QQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKL
    NREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRI
    PYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDK
    NLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLL
    FKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKD
    FLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTG
    WGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKA
    QVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEM
    ARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYY
    LQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKS
    DNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIK
    RQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQ
    FYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMI
    AKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDK
    GRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDP
    KKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDF
    LEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYV
    NFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANL
    DKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK
    EVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILM
    LPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNG
    ENKIKMLSGGSPKKKRKV
    N-hAPOBEC3A sDA2.3-nCas9-UGI-C
    (SEQ ID NO: 35)
    MPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRD
    AGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQN
    QGNSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVL
    GNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNE
    MAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKL
    VDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQL
    FEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLT
    PNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAIL
    LSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQS
    KNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDN
    GSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRF
    AWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLL
    YEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLK
    EDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIV
    LTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRD
    KQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIA
    NLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNS
    RERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELD
    INRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKN
    YWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQI
    LDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHD
    AYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFF
    YSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQ
    VNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSV
    LVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIK
    LPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSP
    EDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPI
    REQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE
    TRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDIL
    VHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKR
    KV
    N-hAPOBEC3A sDA2.4-nCas9-UGI-C
    (SEQ ID NO: 36)
    MHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWD
    TFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESD
    KKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDS
    GETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVE
    EDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH
    MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSA
    RLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS
    KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS
    MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF
    YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ
    EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE
    VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE
    GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED
    RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTY
    AHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR
    NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVV
    DELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQIL
    KEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFL
    KDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFD
    NLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIR
    EVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK
    LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGE
    IRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES
    ILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSV
    KELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRML
    ASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY
    LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGA
    PAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTN
    LSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLL
    TSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV
    N-hAPOBEC3A sDA2.5-nCas9-UGI-C
    (SEQ ID NO: 37)
    MTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQG
    NSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGN
    TDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEM
    AKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLV
    DSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLF
    EENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTP
    NFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILL
    SDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSK
    NGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGS
    IPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAW
    MTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYE
    YFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKED
    YFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLT
    LTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQ
    SGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANL
    AGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRE
    RMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDIN
    RLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNY
    WRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL
    DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDA
    YLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFY
    SNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQV
    NIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVL
    VVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKL
    PKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPE
    DNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIR
    EQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYET
    RIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDIL
    VHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKR
    KV
    N-hAPOBEC3A sDA2.6-nCas9-UGI-C
    (SEQ ID NO: 38)
    MFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESDKKY
    SIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGET
    AEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDK
    KHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKF
    RGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLS
    KSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDT
    YDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIK
    RYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKF
    IKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDF
    YPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVV
    DKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGM
    RKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRF
    NASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAH
    LFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNF
    MQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDE
    LVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKE
    HPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKD
    DSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNL
    TKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREV
    KVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLE
    SEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIR
    KRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESIL
    PKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE
    LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLAS
    AGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLD
    EIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPA
    AFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLS
    DIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTS
    DAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV
    S. aureus Cas9
    (SEQ ID NO: 39)
    MKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRS
    KRGARRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLS
    EEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAEL
    QLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLL
    ETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLY
    NALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDI
    KGYRVTSTGKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQ
    EELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNR
    LKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIE
    LAREKNSKDAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQ
    EGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKG
    NRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQ
    KDFINRNLVDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKF
    KKERNKGYKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAES
    MPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRELINDTLYSTRKD
    DKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIME
    QYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDD
    YPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCY
    EEAKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITY
    REYLENMNDKRPPRIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG
    C. jejuni Cas9
    (SEQ ID NO: 40)
    MARILAFDIGISSIGWAFSENDELKDCGVRIFTKVENPKTGESLALPRRL
    ARSARKRLARRKARLNHLKHLIANEFKLNYEDYQSFDESLAKAYKGSLISPY
    ELRFRALNELLSKQDFARVILHIAKRRGYDDIKNSDDKEKGAILKAIKQNEEK
    LANYQSVGEYLYKEYFQKFKENSKEFTNVRNKKESYERCIAQSFLKDELKLIF
    KKQREFGFSFSKKFEEEVLSVAFYKRALKDFSHLVGNCSFFTDEKRAPKNSPL
    AFMFVALTRIINLLNNLKNTEGILYTKDDLNALLNEVLKNGTLTYKQTKKLL
    GLSDDYEFKGEKGTYFIEFKKYKEFIKALGEHNLSQDDLNEIAKDITLIKDEIK
    LKKALAKYDLNQNQIDSLSKLEFKDHLNISFKALKLVTPLMLEGKKYDEACN
    ELNLKVAINEDKKDFLPAFNETYYKDEVTNPVVLRAIKEYRKVLNALLKKYG
    KVHKINIELAREVGKNHSQRAKIEKEQNENYKAKKDAELECEKLGLKINSKNI
    LKLRLFKEQKEFCAYSGEKIKISDLQDEKMLEIDHIYPYSRSFDDSYMNKVLV
    FTKQNQEKLNQTPFEAFGNDSAKWQKIEVLAKNLPTKKQKRILDKNYKDKE
    QKNFKDRNLNDTRYIARLVLNYTKDYLDFLPLSDDENTKLNDTQKGSKVHV
    EAKSGMLTSALRHTWGFSAKDRNNHLHHAIDAVIIAYANNSIVKAFSDFKKE
    QESNSAELYAKKISELDYKNKRKFFEPFSGFRQKVLDKIDEIFVSKPERKKPSG
    ALHEETFRKEEEFYQSYGGKEGVLKALELGKIRKVNGKIVKNGDMFRVDIFK
    HKKTNKFYAVPIYTMDFALKVLPNKAVARSKKGEIKDWILMDENYEFCFSLY
    KDSLILIQTKDMQEPEFVYYNAFTSSTVSLIVSKHDNKFETLSKNQKILFKNAN
    EKEVIAKSIGIQNLKVFEKYIVSALGEVTKAEFRQREDFKK
    P. lavamentivorans Cas9
    (SEQ ID NO: 41)
    MERIFGFDIGTTSIGFSVIDYSSTQSAGNIQRLGVRIFPEARDPDGTPLNQ
    QRRQKRMMRRQLRRRRIRRKALNETLHEAGFLPAYGSADWPVVMADEPYE
    LRRRGLEEGLSAYEFGRAIYHLAQHRHFKGRELEESDTPDPDVDDEKEAANE
    RAATLKALKNEQTTLGAWLARRPPSDRKRGIHAHRNVVAEEFERLWEVQSK
    FHPALKSEEMRARISDTIFAQRPVFWRKNTLGECRFMPGEPLCPKGSWLSQQR
    RMLEKLNNLAIAGGNARPLDAEERDAILSKLQQQASMSWPGVRSALKALYK
    QRGEPGAEKSLKFNLELGGESKLLGNALEAKLADMFGPDWPAHPRKQEIRH
    AVHERLWAADYGETPDKKRVIILSEKDRKAHREAAANSFVADFGITGEQAAQ
    LQALKLPTGWEPYSIPALNLFLAELEKGERFGALVNGPDWEGWRRTNFPHRN
    QPTGEILDKLPSPASKEERERISQLRNPTVVRTQNELRKVVNNLIGLYGKPDRI
    RIEVGRDVGKSKREREEIQSGIRRNEKQRKKATEDLIKNGIANPSRDDVEKWI
    LWKEGQERCPYTGDQIGFNALFREGRYEVEHIWPRSRSFDNSPRNKTLCRKD
    VNIEKGNRMPFEAFGHDEDRWSAIQIRLQGMVSAKGGTGMSPGKVKRFLAK
    TMPEDFAARQLNDTRYAAKQILAQLKRLWPDMGPEAPVKVEAVTGQVTAQ
    LRKLWTLNNILADDGEKTRADHRHHAIDALTVACTHPGMTNKLSRYWQLRD
    DPRAEKPALTPPWDTIRADAEKAVSEIVVSHRVRKKVSGPLHKETTYGDTGT
    DIKTKSGTYRQFVTRKKIESLSKGELDEIRDPRIKEIVAAHVAGRGGDPKKAFP
    PYPCVSPGGPEIRKVRLTSKQQLNLMAQTGNGYADLGSNHHIAIYRLPDGKA
    DFEIVSLFDASRRLAQRNPIVQRTRADGASFVMSLAAGEAIMIPEGSKKGIWIV
    QGVWASGQVVLERDTDADHSTTTRPMPNPILKDDAKKVSIDPIGRVRPSND
    N. cinerea Cas9
    (SEQ ID NO: 42)
    MAAFKPNPMNYILGLDIGIASVGWAIVEIDEEENPIRLIDLGVRVFERA
    EVPKTGDSLAAARRLARSVRRLTRRRAHRLLRARRLLKREGVLQAADFDEN
    GLIKSLPNTPWQLRAAALDRKLTPLEWSAVLLHLIKHRGYLSQRKNEGETAD
    KELGALLKGVADNTHALQTGDFRTPAELALNKFEKESGHIRNQRGDYSHTFN
    RKDLQAELNLLFEKQKEFGNPHVSDGLKEGIETLLMTQRPALSGDAVQKML
    GHCTFEPTEPKAAKNTYTAERFVWLTKLNNLRILEQGSERPLTDTERATLMD
    EPYRKSKLTYAQARKLLDLDDTAFFKGLRYGKDNAEASTLMEMKAYHAISR
    ALEKEGLKDKKSPLNLSPELQDEIGTAFSLFKTDEDITGRLKDRVQPEILEALL
    KHISFDKFVQISLKALRRIVPLMEQGNRYDEACTEIYGDHYGKKNTEEKIYLP
    PIPADEIRNPVVLRALSQARKVINGVVRRYGSPARIHIETAREVGKSFKDRKEI
    EKRQEENRKDREKSAAKFREYFPNFVGEPKSKDILKLRLYEQQHGKCLYSGK
    EINLGRLNEKGYVEIDHALPFSRTWDDSFNNKVLALGSENQNKGNQTPYEYF
    NGKDNSREWQEFKARVETSRFPRSKKQRILLQKFDEDGFKERNLNDTRYINR
    FLCQFVADHMLLTGKGKRRVFASNGQITNLLRGFWGLRKVRAENDRHHALD
    AVVVACSTIAMQQKITRFVRYKEMNAFDGKTIDKETGEVLHQKAHFPQPWE
    FFAQEVMIRVFGKPDGKPEFEEADTPEKLRTLLAEKLSSRPEAVHKYVTPLFIS
    RAPNRKMSGQGHMETVKSAKRLDEGISVLRVPLTQLKLKDLEKMVNREREP
    KLYEALKARLEAHKDDPAKAFAEPFYKYDKAGNRTQQVKAVRVEQVQKTG
    VWVHNHNGIADNATIVRVDVFEKGGKYYLVPIYSWQVAKGILPDRAVVQGK
    DEEDWTVMDDSFEFKFVLYANDLIKLTAKKNEFLGYFVSLNRATGAIDIRTH
    DTDSTKGKNGIFQSVGVKTALSFQKYQIDELGKEIRPCRLKKRPPVR
    hAID
    (SEQ ID NO: 43)
    MDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDF
    GYLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADF
    LRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNT
    FVENHERTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRTLGL
    hAIDv solubility variant lacking N-terminal RNA-binding region
    (SEQ ID NO: 44)
    MDPHIFTSNFNNGIGRHKTYLCYEVERLDSATSFSLDFGYLRNKNGCH
    VELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRI
    FTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNTFVENHERTFK
    AWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRTLGL
    hAIDv solubility variant lacking N-terminal RNA-binding region and the C-terminal poorly structured region
    (SEQ ID NO: 45)
    MDPHIFTSNFNNGIGRHKTYLCYEVERLDSATSFSLDFGYLRNKNGCH
    VELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRI
    FTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNTFVENHERTFK
    AWEGLHENSVRLSRQLRRILLPL
    rAPOBEC1 (rAPO1)
    (SEQ ID NO: 46)
    MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS
    IWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEF
    LSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNF
    VNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTFFTIAL
    QSCHYQRLPPHILWATGLK
    mAPOBEC3
    (SEQ ID NO: 47)
    MGPFCLGCSHRKCYSPIRNLISQETFKFHFKNLGYAKGRKDTFLCYEV
    TRKDCDSPVSLHHGVFKNKDNIHAEICFLYWFHDKVLKVLSPREEFKITWYM
    SWSPCFECAEQIVRFLATHENLSLDIFSSRLYNVQDPETQQNLCRLVQEGAQV
    AAMDLYEFKKCWKKFVDNGGRRFRPWKRLLTNFRYQDSKLQEILRRMDPLS
    EEEFYSQFYNQRVKHLCYYHRMKPYLCYQLEQFNGQAPLKGCLLSEKGKQH
    AEILFLDKIRSMELSQVTITCYLTWSPCPNCAWQLAAFKRDRPDLILHIYTSRL
    YFHWKRPFQKGLCSLWQSGILVDVMDLPQFTDCWTNFVNPKRPFRPWKGLE
    IISRRTQRRLRRIKESWGLQDLVNDFGNLQLGPPMSN
    mAPOBEC3 catalytic domain
    (SEQ ID NO: 48)
    MGPFCLGCSHRKCYSPIRNLISQETFKFHFKNLGYAKGRKDTFLCYEV
    TRKDCDSPVSLHHGVFKNKDNIHAEICFLYWFHDKVLKVLSPREEFKITWYM
    SWSPCFECAEQIVRFLATHHNLSLDIFSSRLYNVQDPETQQNLCRLVQEGAQV
    AAMDLYEFKKCWKKFVDNGGRRFRPWKRLLTNFRYQDSKLQEILRR
    hAPOBEC3A (hA3A)
    (SEQ ID NO: 49)
    MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVK
    MDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWS
    PCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQV
    SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN
    hAPOBEC3G
    (SEQ ID NO: 50)
    MKPHFRNTVERMYRDTFSYNFYNRPILSRRNTVWLCYEVKTKGPSRP
    PLDAKIFRGQVYSELKYHPEMRFFHWFSKWRKLHRDQEYEVTWYISWSPCT
    KCTRDMATFLAEDPKVTLTIFVARLYYFWDPDYQEALRSLCQKRDGPRATM
    KIMNYDEFQHCWSKFVYSQRELFEPWNNLPKYYILLHIMLGEILRHSMDPPTF
    TFNFNNEPWVRGRHETYLCYEVERMHNDTWVLLNQRRGFLCNQAPHKHGF
    LEGRHAELCFLDVIPFWKLDLDQDYRVTCFTSWSPCFSCAQEMAKFISKNKH
    VSLCIFTARIYDDQGRCQEGLRTLAEAGAKISIMTYSEFKHCWDTFVDHQGCP
    FQPWDGLDEHSQDLSGRLRAILQNQEN
    hAPOBEC3G catalytic domain
    (SEQ ID NO: 51)
    PPTFTFNFNNEPWVRGRHETYLCYEVERMHNDTWVLLNQRRGFLCN
    QAPHKHGFLEGRHAELCFLDVIPFWKLDLDQDYRVTCFTSWSPCFSCAQEMA
    KFISKNKHVSLCIFTARIYDDQGRCQEGLRTLAEAGAKISIMTYSEFKHCWDT
    FVDHQGCPFQPWDGLDEHSQDLSGRLRAILQNQEN
    hAPOBEC3H
    (SEQ ID NO: 52)
    MALLTAETFRLQFNNKRRLRRPYYPRKALLCYQLTPQNGSTPTRGYFE
    NKKKCHAEICFINEIKSMGLDETQCYQVTCYLTWSPCSSCAWELVDFIKAHD
    HLNLGIFASRLYYHWCKPQQKGLRLLCGSQVPVEVMGFPKFADCWENFVDH
    EKPLSFNPYKMLEELDKNSRAIKRRLERIKIPGVRAQGRYMDILCDAEV
    hAPOBEC3F
    (SEQ ID NO: 53)
    MKPHFRNTVERMYRDTFSYNFYNRPILSRRNTVWLCYEVKTKGPSRP
    RLDAKIFRGQVYSQPEHHAEMCFLSWFCGNQLPAYKCFQITWFVSWTPCPDC
    VAKLAEFLAEHPNVTLTISAARLYYYWERDYRRALCRLSQAGARVKIMDDE
    EFAYCWENFVYSEGQPFMPWYKFDDNYAFLHRTLKEILRNPMEAMYPHIFYF
    HFKNLRKAYGRNESWLCFTMEVVKHESPVSWKRGVFRNQVDPETHCHAER
    CFLSWFCDDILSPNTNYEVTWYTSWSPCPECAGEVAEFLARHSNVNLTIFTAR
    LYYFWDTDYQEGLRSLSQEGASVEIMGYKDFKYCWENFVYNDDEPFKPWK
    GLKYNFLFLDSKLQEILE
    hAPOBEC3F catalytic domain
    (SEQ ID NO: 54)
    KEILRNPMEAMYPHIFYFHFKNLRKAYGRNESWLCFTMEVVKHEISPV
    SWKRGVFRNQVDPETHCHAERCFLSWFCDDILSPNTNYEVTWYTSWSPCPEC
    AGEVAEFLARHSNVNLTIFTARLYYFWDTDYQEGLRSLSQEGASVEIMGYKD
    FKYCWENFVYNDDEPFKPWKGLKYNFLFLDSKLQEILE
    C. lari Cas9
    (SEQ ID NO: 55)
    MRILGFDIGINSIGWAFVENDELKDCGVRIFTKAENPKNKESLALPRRN
    ARSSRRRLKRRKARLIAIKRILAKELKLNYKDYVAADGELPKAYEGSLASVY
    ELRYKALTQNLETKDLARVILHIAKHRGYMNKNEKKSNDAKKGKILSALKN
    NALKLENYQSVGEYFYKEFFQKYKKNTKNFIKIRNTKDNYNNCVLSSDLEKE
    LKLILEKQKEFGYNYSEDFINEILKVAFFQRPLKDFSHLVGACTFFEEEKRACK
    NSYSAWEFVALTKIINEIKSLEKISGEIVPTQTINEVLNLILDKGSITYKKFRSCI
    NLHESISFKSLKYDKENAENAKLIDFRKLVEFKKALGVHSLSRQELDQISTHIT
    LIKDNVKLKTVLEKYNLSNEQINNLLEIEFNDYINLSFKALGMILPLMREGKR
    YDEACEIANLKPKTVDEKKDFLPAFCDSIFAHELSNPVVNRAISEYRKVLNAL
    LKKYGKVHKIHLELARDVGLSKKAREKIEKEQKENQAVNAWALKECENIGL
    KASAKNILKLKLWKEQKEICIYSGNKISIEHLKDEKALEVDHIYPYSRSFDDSFI
    NKVLVFTKENQEKLNKTPFEAFGKNIEKWSKIQTLAQNLPYKKKNKILDENF
    KDKQQEDFISRNLNDTRYIATLIAKYTKEYLNFLLLSENENANLKSGEKGSKI
    HVQTISGMLTSVLRHTWGFDKKDRNNHLHHALDAIIVAYSTNSIIKAFSDFRK
    NQELLKARFYAKELTSDNYKHQVKFFEPFKSFREKILSKIDEIFVSKPPRKRAR
    RALHKDTFHSENKIIDKCSYNSKEGLQIALSCGRVRKIGTKYVENDTIVRVDIF
    KKQNKFYAIPIYAMDFALGILPNKIVITGKDKNNNPKQWQTIDESYEFCFSLY
    KNDLILLQKKNMQEPEFAYYNDFSISTSSICVEKHDNKFENLTSNQKLLFSNA
    KEGSVKVESLGIQNLKVFEKYIITPLGDKIKADFQPRENISLKTSKKYGLR
    MbCpf1
    (SEQ ID NO: 56)
    MLFQDFTHLYPLSKTVRFELKPIDRTLEHIHAKNFLSQDETMADMHQKVKVI
    LDDYHRDFIADMMGEVKLTKLAEFYDVYLKFRKNPKDDELQKQLKDLQAV
    LRKEIVKPIGNGGKYKAGYDRLFGAKLFKDGKELGDLAKFVIAQEGESSPKL
    AHLAHFEKFSTYFTGFHDNRKNMYSDEDKHTAIAYRLIHENLPRFIDNLQILT
    TIKQKHSALYDQIINELTASGLDVSLASHLDGYHKLLTQEGITAYNTLLGGISG
    EAGSPKIQGINELINSHHNQHCHKSERIAKLRPLHKQILSDGMSVSFLPSKFAD
    DSEMCQAVNEFYRHYADVFAKVQSLFDGFDDHQKDGIYVEHKNLNELSKQA
    FGDFALLGRVLDGYYVDVVNPEFNERFAKAKTDNAKAKLTKEKDKFIKGVH
    SLASLEQAIEHYTARHDDESVQAGKLGQYFKHGLAGVDNPIQKIHNNHSTIK
    GFLERERPAGERALPKIKSGKNPEMTQLRQLKELLDNALNVAHFAKLLTTKT
    TLDNQDGNFYGEFGVLYDELAKIPTLYNKVRDYLSQKPFSTEKYKLNFGNPT
    LLNGWDLNKEKDNFGVILQKDGCYYLALLDKAHKKVFDNAPNTGKSIYQK
    MIYKYLEVRKQFPKVFFSKEAIAINYHPSKELVEIKDKGRQRSDDERLKLYRFI
    LECLKIHPKYDKKFEGAIGDIQLFKKDKKGREVPISEKDLFDKINGIFSSKPKLE
    MEDFFIGEFKRYNPSQDLVDQYNIYKKIDSNDNRKKENFYNNHPKFKKDLVR
    YYYESMCKHEEWEESFEFSKKLQDIGCYVDVNELFTEIETRRLNYKISFCNIN
    ADYIDELVEQGQLYLFQIYNKDFSPKAHGKPNLHTLYFKALFSEDNLADPIYK
    LNGEAQIFYRKASLDMNETTIHRAGEVLENKNPDNPKKRQFVYDIIKDKRYT
    QDKFMLHVPITMNFGVQGMTIKEFNKKVNQSIQQYDEVNVIGIDRGERHLLY
    LTVINSKGEILEQCSLNDITTASANGTQMTTPYHKILDKREIERLNARVGWGEI
    ETIKELKSGYLSHVVHQISQLMLKYNAIVVLEDLNFGFKRGRFKVEKQIYQNF
    ENALIKKLNHLVLKDKADDEIGSYKNALQLTNNFTDLKSIGKQTGFLFYVPA
    WNTSKIDPETGFVDLLKPRYENIAQSQAFFGKFDKICYNADKDYFEFHIDYAK
    FTDKAKNSRQIWTICSHGDKRYVYDKTANQNKGAAKGINVNDELKSLFARH
    HINEKQPNLVMDICQNNDKEFHKSLMYLLKTLLALRYSNASSDEDFILSPVAN
    DEGVFFNSALADDTQPQNADANGAYHIALKGLWLLNELKNSDDLNKVKLAI
    DNQTWLNFAQNRKRPAATKKAGQAKKKKGSYPYDVPDYAYPYDVPDYAYP
    YDVPDYA
  • Example 1
  • Since multiple mammalian deaminases may constitute functional BE proteins, and multiple truncation points may result in functional split BEs, we sought to examine an extensive set of split BE candidate pairs (Table 2, above, shows a representative list of deaminases that may be suitable for BE applications). Deaminase truncation points were chosen by evaluating structural information and determining which amino acid residues within the deaminase domains were unlikely to contribute to meaningful secondary structural components and were thus unlikely to affect the functionality of an intact enzyme. We chose six potential truncation regions for six mammalian deaminases (a full table of predicted split regions is included in Table 3, above). Each split region corresponds to a homologous region in each of the listed deaminases, based on protein alignment. sDA1 halfases that contain a truncation variant from split region X are referred to as sDA1.X, and sDA2 halfases are named in the same way (sDA2.X).
  • Tables 4-6 show the exact truncation variants that we have created and evaluated. In addition to exactly reciprocal halfase pairs (in which the sDA1 and sDA2 portions of the BE contain truncation variants of a deaminase perfectly bisected by a defined split site), we also tested split BEs in which the halfases shared overlapping peptide sequences. We reasoned that this “extra” overlap may enable proper folding of the constituent halfases so as to enable functional reconstitution of the deaminase, and also noted that the most functional split yCD pair included a significant overlap in peptide sequence (ref. 12).
  • TABLE 4
    Exact sDA split sites chosen.
                 Split 1            Split 2            Split 3
                 sDA1.1   sDA2.1    sDA1.2   sDA2.2    sDA1.3   sDA2.3
    rAPOBEC1     1-50     51-229    1-78     79-229    1-91     92-229
    hAPOBEC3A    1-63     67-199    1-86     87-199    1-99     100-199

                 Split 4            Split 5            Split 6
                 sDA1.4   sDA2.4    sDA1.5   sDA2.5    sDA1.6   sDA2.6
    rAPOBEC1     1-108    109-229   1-144    145-229   1-160    161-229
    hAPOBEC3A    1-118    119-199   1-153    154-199   1-172    173-199

    Split sites chosen and examined in this study so far, by deaminase domain. Amino acid positions for each species are given according to the sequences provided at the end of this document.
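The coordinates in Table 4 can be applied mechanically to a parental deaminase sequence to obtain the two halfase peptides. The following is a minimal sketch of that bookkeeping, assuming the tabulated ranges are 1-based and inclusive; the dictionary layout and the function names `slice_residues` and `make_halfases` are illustrative and do not appear in the original disclosure. The input used in the usage lines is a placeholder string of the correct length, not a real deaminase sequence.

```python
# Split-site coordinates copied from Table 4 (assumed 1-based, inclusive).
# Keys are split numbers; values are (sDA1 range, sDA2 range).
SPLIT_SITES = {
    "rAPOBEC1": {
        1: ((1, 50), (51, 229)),
        2: ((1, 78), (79, 229)),
        3: ((1, 91), (92, 229)),
        4: ((1, 108), (109, 229)),
        5: ((1, 144), (145, 229)),
        6: ((1, 160), (161, 229)),
    },
    "hAPOBEC3A": {
        1: ((1, 63), (67, 199)),
        2: ((1, 86), (87, 199)),
        3: ((1, 99), (100, 199)),
        4: ((1, 118), (119, 199)),
        5: ((1, 153), (154, 199)),
        6: ((1, 172), (173, 199)),
    },
}


def slice_residues(sequence, start, end):
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"range {start}-{end} is outside a sequence of length {len(sequence)}")
    return sequence[start - 1:end]


def make_halfases(sequence, deaminase, split):
    """Return the (sDA1, sDA2) peptides for one deaminase and split number."""
    (s1, e1), (s2, e2) = SPLIT_SITES[deaminase][split]
    return slice_residues(sequence, s1, e1), slice_residues(sequence, s2, e2)


# Usage with a placeholder of the right length (NOT a real sequence).
placeholder = "A" * 229
sda1, sda2 = make_halfases(placeholder, "rAPOBEC1", 2)
print(len(sda1), len(sda2))  # 78 151
```

Keeping the coordinates in a single table-like structure makes it straightforward to combine an sDA1 from one split with an sDA2 from another, as is done for the non-reciprocal pairs examined below.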
  • TABLE 5
    rAPOBEC1 sDA BE combination pairs examined
    sDA1.1 sDA1.2 sDA1.3 sDA1.4 sDA1.5 sDA1.6
    sDA2.1 Yes Yes No No No No
    sDA2.2 Yes Yes Yes No No No
    sDA2.3 No Yes Yes Yes No No
    sDA2.4 No No Yes Yes Yes No
    sDA2.5 No No No No No No
    sDA2.6 No No No No Yes Yes
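For the non-reciprocal rAPOBEC1 combinations in Table 5, the number of residues shared by the two halfases follows directly from the Table 4 coordinates. The sketch below is an illustration of that arithmetic only, assuming 1-based inclusive ranges; the names `SDA1_END`, `SDA2_START`, `TESTED_PAIRS`, and `shared_residues` are hypothetical. A positive value is the length of the overlapping peptide, zero denotes an exactly reciprocal pair, and a negative value indicates that, by these coordinates, that many residues would be present in neither halfase.

```python
# rAPOBEC1 coordinates from Table 4: last residue of each sDA1.X and
# first residue of each sDA2.X (assumed 1-based, inclusive).
SDA1_END = {1: 50, 2: 78, 3: 91, 4: 108, 5: 144, 6: 160}
SDA2_START = {1: 51, 2: 79, 3: 92, 4: 109, 5: 145, 6: 161}

# (sDA1 split, sDA2 split) combinations marked "Yes" in Table 5.
TESTED_PAIRS = [
    (1, 1), (2, 1),
    (1, 2), (2, 2), (3, 2),
    (2, 3), (3, 3), (4, 3),
    (3, 4), (4, 4), (5, 4),
    (5, 6), (6, 6),
]


def shared_residues(sda1_split, sda2_split):
    """Residues present in both halfases (>0), none (0), or missing from both (<0)."""
    return SDA1_END[sda1_split] - SDA2_START[sda2_split] + 1


for sda1_split, sda2_split in TESTED_PAIRS:
    n = shared_residues(sda1_split, sda2_split)
    print(f"sDA1.{sda1_split} + sDA2.{sda2_split}: {n}")
```

For example, sDA1.2 (residues 1-78) paired with sDA2.1 (residues 51-229) shares residues 51-78, an overlap of 28 residues.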
  • TABLE 6
    hAPOBEC3A sDA BE combination pairs examined
    sDA1.1 sDA1.2 sDA1.3 sDA1.4 sDA1.5 sDA1.6
    sDA2.1 Yes No No No No No
    sDA2.2 No Yes No No No No
    sDA2.3 No No Yes No No No
    sDA2.4 No No No Yes No No
    sDA2.5 No No No No Yes No
    sDA2.6 No No No No No Yes
  • Several split BE halfase combinations showed activity when targeted to adjacent DNA sequences in a human HEK293 cell line in which the EGFP reporter gene has been integrated. Each rAPOBEC1 pair was tested in two different orientations with regard to the ZF and gRNA binding sites, with two different ZF domains and two different gRNAs, for 4 total orientation pairs. Only directly reciprocal hAPOBEC3A pairs were tested (e.g., sDA1.1 with sDA2.1). The activity of each BE halfase pair, when co-delivered by plasmid transfection at an approximate 1:1 ratio of the two halfases, is shown in FIGS. 4-16 for each orientation of rAPO1 sDA pairs (FIG. 17 is a positive BE3 control for comparison) and in FIG. 20 for hA3A pairs. A summary of the cumulative editing efficiencies (the sum of the editing rates at the cytosines within the gRNA editing window) of all rAPO1 halfase pairs in each orientation is given in FIG. 21. The target site configurations and all DNA-targeting proteins used for the rAPO1 experiments are shown in FIG. 22. All rAPO1 split BEs shown include an sDA1 halfase with an sDA1-3AC3L-NLS-ZF configuration, while all hA3A split BEs include an sDA1 with an sDA1-NLS-ZF configuration. In particular, and in various orientations, rAPO1 sDA1.1+rAPO1 sDA2.1, rAPO1 sDA1.2+rAPO1 sDA2.1, rAPO1 sDA1.2+rAPO1 sDA2.2, hA3A sDA1.1+hA3A sDA2.1, and hA3A sDA1.6+hA3A sDA2.6 show significant activity compared to a positive BE3 control (FIGS. 4, 6, 7, 20, and 21).
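The cumulative editing efficiency described above (the sum of the editing rates at the cytosines within the gRNA editing window) can be computed as sketched below. This is an illustrative calculation only: the function name, the protospacer string, the per-position rates, and the window boundaries are all hypothetical, and the window is passed in explicitly because the exact window positions are not stated in this passage.

```python
def cumulative_editing_efficiency(protospacer, rates, window):
    """Sum editing rates at cytosines inside an editing window.

    protospacer: target sequence, 5' to 3'
    rates: dict mapping 1-based protospacer position -> editing rate (%)
    window: (first, last) 1-based inclusive positions of the editing window
    """
    first, last = window
    if not 1 <= first <= last <= len(protospacer):
        raise ValueError("editing window lies outside the protospacer")
    total = 0.0
    for pos in range(first, last + 1):
        # Only cytosines count; positions without a measurement add nothing.
        if protospacer[pos - 1].upper() == "C":
            total += rates.get(pos, 0.0)
    return total


# Hypothetical example values (not data from this disclosure).
example_protospacer = "GACCTCAGTACGTTAGCCAT"
example_rates = {3: 10.0, 4: 20.0, 6: 5.0, 11: 1.0}
print(cumulative_editing_efficiency(example_protospacer, example_rates, (4, 8)))  # 25.0
```

In this example only the cytosines at positions 4 and 6 fall inside the chosen window, so the cytosines at positions 3 and 11 do not contribute to the sum.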
  • It is conceivable and likely that optimizations of several parameters (including the nature of the protein linker between the sDA components and the targeting domains, the spacing between the sDA1 and sDA2 binding sites, the exact sites at which the sDAs are split, the relative concentrations of the sDA components, the source deaminase, the source of the nCas9 targeting mechanism, and the type of targeting domain used for the sDA1 component) could influence, change, or enhance the nature of a split BE platform. Furthermore, it is likely that split base editor pairs that do not include an sDA2 fused to an nCas9-UGI domain, as in previously described base editors, will still retain some limited mutagenic capacity so long as their DNA-targeting proteins are brought to adjacent sequences of DNA, since the reconstituted deaminase domain will still be active around such target sites.
  • Importantly, none of the individual rAPO1 halfases are active on their own, suggesting a requirement for both halfases for genuine reconstitution of the functional deaminase domain (FIGS. 18 and 19). Furthermore, the fact that functional combinations seem to prefer certain orientations over others (for instance, rAPO1 sDA1.1+rAPO1 sDA2.1 is mostly functional in one of the orientations tested) suggests that the deaminase domains genuinely require adjacent binding of their DNA-targeting domains to function, making it unlikely that the deaminase domain can become reconstituted at other sites in the genome and thus unlikely to cause spurious genomic deamination.
  • REFERENCES
    • 1. Komor, Alexis C., Yongjoo B. Kim, Michael S. Packer, John A. Zuris, and David R. Liu. “Programmable Editing of a Target Base in Genomic DNA without Double-stranded DNA Cleavage.” Nature 533.7603 (2016): 420-24.
    • 2. Yang, Luhan, Adrian W. Briggs, Wei Leong Chew, Prashant Mali, Marc Guell, John Aach, Daniel Bryan Goodman, David Cox, Yinan Kan, Emal Lesha, Venkataramanan Soundararajan, Feng Zhang, and George Church. “Engineering and Optimising Deaminase Fusions for Genome Editing.” Nature Communications 7 (2016): 13330.
    • 3. Jasin, Maria, and Rodney Rothstein. “Repair of strand breaks by homologous recombination.” Cold Spring Harbor perspectives in biology 5.11 (2013): a012740.
    • 4. Harris, Reuben S., Svend K. Petersen-Mahrt, and Michael S. Neuberger. “RNA Editing Enzyme APOBEC1 and Some of Its Homologs Can Act as DNA Mutators.” Molecular Cell 10.5 (2002): 1247-253.
    • 5. Nishida, K., T. Arazoe, N. Yachie, S. Banno, M. Kakimoto, M. Tabata, M. Mochizuki, A. Miyabe, M. Araki, K. Y. Hara, Z. Shimatani, and A. Kondo. “Targeted Nucleotide Editing Using Hybrid Prokaryotic and Vertebrate Adaptive Immune Systems.” Science 353.6305 (2016).
    • 6. Santos-Pereira, José M., and Andres Aguilera. “R Loops: New Modulators of Genome Dynamics and Function.” Nature Reviews Genetics 16.10 (2015): 583-97.
    • 7. Rebhandl, Stefan, Michael Huemer, Richard Greil, and Roland Geisberger. “AID/APOBEC Deaminases and Cancer.” Oncoscience 2 (2015): 320.
    • 8. Suspene, Rodolphe, et al. “Recovery of APOBEC3-edited human immunodeficiency virus G→A hypermutants by differential DNA denaturation PCR.” Journal of general virology 86.1 (2005): 125-129.
    • 9. Aynaud, Marie-Ming, et al. “Human Tribbles 3 protects nuclear DNA from cytidine deamination by APOBEC3A.” Journal of Biological Chemistry 287.46 (2012): 39182-39192.
    • 10. Shinohara, Masanobu, et al. “APOBEC3B can impair genomic stability by inducing base substitutions in genomic DNA in human cells.” Scientific reports 2 (2012): 806.
    • 11. Holtz, Colleen M., Holly A. Sadler, and Louis M. Mansky. “APOBEC3G cytosine deamination hotspots are defined by both sequence context and single-stranded DNA secondary structure.” Nucleic acids research (2013): gkt246.
    • 12. Ear, Po Hien, and Stephen W. Michnick. “A General Life-death Selection Strategy for Dissecting Protein Functions.” Nature Methods 6.11 (2009): 813-16.
    • 13. Rees, Holly A., et al. “Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery.” Nature Communications 8 (2017): ncomms15790.
    • 14. Pattanayak, Vikram, et al. “Revealing off-Target Cleavage Specificities of Zinc-Finger Nucleases by in Vitro Selection.” Nature Methods, vol. 8, no. 9, July 2011, pp. 765-770., doi:10.1038/nmeth.1670.
    • 15. Maeder, Morgan L., et al. “Rapid ‘Open-Source’ Engineering of Customized Zinc-Finger Nucleases for Highly Efficient Gene Modification.” Molecular Cell, vol. 31, no. 2, 2008, pp. 294-301., doi:10.1016/j.molcel.2008.06.016.
    • 16. Jason M. Gehrke, Oliver R. Cervantes, M. Kendell Clement, Luca Pinello, J. Keith Joung, “High-precision CRISPR-Cas9 base editors with minimized bystander and off-target mutations,” bioRxiv 273938; doi:10.1101/273938.
    • 17. CA2915837A1
    • 18. Friedland, Ari E., et al. “Characterization of Staphylococcus Aureus Cas9: a Smaller Cas9 for All-in-One Adeno-Associated Virus Delivery and Paired Nickase Applications.” Genome Biology, vol. 16, no. 1, 2015, doi:10.1186/s13059-015-0817-8.
    • 19. Gasiunas, G., et al. “Cas9-CrRNA Ribonucleoprotein Complex Mediates Specific DNA Cleavage for Adaptive immunity in Bacteria.” Proceedings of the National Academy of Sciences, vol. 109, no. 39, April 2012, doi:10.1073/pnas.1208507109.
    • 20. Yamada, Mari, et al. “Crystal Structure of the Minimal Cas9 from Campylobacter Jejuni Reveals the Molecular Diversity in the CRISPR-Cas9 Systems.” Molecular Cell, vol. 65, no. 6, 2017, doi:10.1016/j.molcel.2017.02.007.
    • 21. Zetsche, Bernd, et al. “Cpf1 Is a Single RNA-Guided Endonuclease of a Class 2 CRISPR-Cas System.” Cell, vol. 163, no. 3, 2015, pp. 759-771., doi:10.1016/j.cell.2015.09.038.
    • 22. Hirano, H., et al. “Crystal Structure of Francisella Novicida Cas9 RHA in Complex with sgRNA and Target DNA (TGG PAM).” February, 2016, doi:10.2210/pdb5b2q/pdb.
    • 23. Yamano, T., et al. “Crystal Structure of Acidaminococcus Sp. Cpf1 in Complex with CrRNA and Target DNA,” April 2016, doi:10.2210/pd
    • 24. Tang, Xu, et al. “A CRISPR-Cpf1 System for Efficient Genome Editing and Transcriptional Repression in Plants.” Nature Plants, vol. 3, no. 7, 2017, p. 17103., doi:10.1038/nplants.2017.103.
    OTHER EMBODIMENTS
  • It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.

Claims (16)

1. A fusion protein comprising:
(i) a first portion of a split deaminase (sDA1) enzyme fused to a programmable DNA-binding domain, selected from the group consisting of zinc fingers (ZFs), transcription activator effector-like effectors (TALEs), and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) Cas RNA-guided nucleases (RGNs), and catalytically inactive Cas9 (dCas9) nicking Cas9 (nCas9), wherein the sDA1 is an N-terminal truncated, catalytically inactive or deficient derivative of a parental deaminase selected from the group consisting of hAID, rAPOBEC1, mAPOBEC3, hAPOBEC3A, hAPOBEC3B, hAPOBEC3C, hAPOBEC3F, hAPOBEC3G, hAPOBEC3H, and variants thereof, and optionally one or more uracil glycosylase inhibitor (UGI) sequences; or
(ii) a second portion of a split deaminase (sDA2) fused to nCas9, and one or more uracil glycosylase inhibitor (UGI) proteins, or any orthogonal DNA targeting domain as the one used for its complementary sDA1 portion, wherein the sDA2 is a C-terminal truncated, catalytically inactive or deficient derivative of hAID, rAPOBEC1*, mAPOBEC3, hAPOBEC3A, hAPOBEC3B, hAPOBEC3C, hAPOBEC3F, hAPOBEC3G, hAPOBEC3H, and variants thereof;
wherein the co-expression of the fusion protein of (i) with the fusion protein of (ii) in eukaryotic cells and their subsequent co-localization at adjacent genomic target sites provides a catalytically active base editor.
2. A pair of the fusion proteins of claim 1, comprising:
(i) a first fusion protein comprising a first portion of a split deaminase (sDA1) enzyme fused to one or more ZFs, wherein the sDA1 is an N-terminal truncated, catalytically inactive or deficient derivative of a parental deaminase selected from the group consisting of hAID, rAPOBEC1, mAPOBEC3, hAPOBEC3A, hAPOBEC3B, hAPOBEC3C, hAPOBEC3F, hAPOBEC3G, or hAPOBEC3H, and variants thereof that have altered substrate specificities or activities and optionally one or more UGI sequences; and
(ii) a second fusion protein comprising a second portion of a split deaminase (sDA2) fused to an nCas9 protein and one or more UGI proteins, wherein the sDA2 is a C-terminal truncated, catalytically inactive or deficient derivative of the same parental deaminase as sDA1,
wherein the co-expression of the fusion protein of (i) with the fusion protein of (ii) in eukaryotic cells and their subsequent co-localization at adjacent genomic target sites provides a catalytically active base-editor.
3. A nucleic acid encoding a fusion protein of claim 1.
4. A composition comprising one or more nucleic acids, wherein the nucleic acids encode the pair of fusion proteins of claim 2.
5. A method of targeted deamination of one or more selected cytosines in a nucleic acid, the method comprising contacting the nucleic acid with the pair of fusion proteins of claim 2, and one or more gRNAs that interact with Cas9 domains in the fusion proteins, preferably wherein one of the fusion proteins comprises nCas9, the other fusion protein comprises ZF or TALE, the ZF or TALE is targeted to a sequence of 9-24 bp adjacent to the target site of the gRNA for the nCas9, wherein the gRNA binds to the nucleic acid comprising the selected cytosine.
6. The method of claim 5, wherein the nucleic acid is in a cell, and the method comprises contacting the cell with the fusion proteins or expressing the fusion proteins in the cell.
7. The method of claim 6, wherein the cell is a eukaryotic cell.
8. A method of improving specificity of targeted deamination in a cell, the method comprising expressing in the cell, or contacting the cell with, the pair of fusion proteins of claim 2, and one or more gRNAs that interact with Cas9 domains in the fusion proteins, preferably wherein one of the fusion proteins comprises nCas9, the other fusion protein comprises ZF or TALE, the ZF or TALE is targeted to a sequence of 9-24 bp adjacent to the target site of the gRNA for the nCas9, wherein the gRNA binds to the nucleic acid comprising the selected cytosine.
9. The method of claim 5, wherein the fusion protein is delivered as a ribonucleoprotein (RNP) complex with one or more gRNAs that interact with Cas9 domains in the fusion proteins, mRNA, or plasmid.
10. A method of deaminating one or more selected cytosines in a nucleic acid, the method comprising contacting the nucleic acid with the pair of fusion proteins of claim 2.
11. A composition comprising a fusion protein of part (i) of claim 1; a fusion protein of part (ii) of claim 1; or a fusion protein of part (i) of claim 1 and a fusion protein of part (ii) of claim 1.
12. The composition of claim 11, comprising one or more ribonucleoprotein (RNP) complexes.
13. A vector comprising the nucleic acid of claim 3.
14. An isolated host cell comprising the nucleic acid of claim 13.
15. The host cell of claim 14, which is a stem cell.
16. The host cell of claim 15, wherein the stem cell is a hematopoietic stem cell.
US16/615,538 2017-05-25 2018-05-25 Using split deaminases to limit unwanted off-target base editor deamination Abandoned US20200172895A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/615,538 US20200172895A1 (en) 2017-05-25 2018-05-25 Using split deaminases to limit unwanted off-target base editor deamination

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201762511296P 2017-05-25 2017-05-25
US201762541544P 2017-08-04 2017-08-04
US201862622676P 2018-01-26 2018-01-26
US16/615,538 US20200172895A1 (en) 2017-05-25 2018-05-25 Using split deaminases to limit unwanted off-target base editor deamination
PCT/US2018/034687 WO2018218166A1 (en) 2017-05-25 2018-05-25 Using split deaminases to limit unwanted off-target base editor deamination

Publications (1)

Publication Number Publication Date
US20200172895A1 (en) 2020-06-04

Family

ID=64397083

Family Applications (4)

Application Number Title Priority Date Filing Date
US16/615,559 Active US11326157B2 (en) 2017-05-25 2018-05-25 Base editors with improved precision and specificity
US16/616,014 Abandoned US20200140842A1 (en) 2017-05-25 2018-05-25 Bipartite base editor (bbe) architectures and type-ii-c-cas9 zinc finger editing
US16/615,538 Abandoned US20200172895A1 (en) 2017-05-25 2018-05-25 Using split deaminases to limit unwanted off-target base editor deamination
US17/739,418 Pending US20220275356A1 (en) 2017-05-25 2022-05-09 Base editors with improved precision and specificity

Country Status (7)

Country Link
US (4) US11326157B2 (en)
EP (3) EP3630849A4 (en)
JP (5) JP2020521446A (en)
CN (3) CN111093714A (en)
AU (4) AU2018273986A1 (en)
CA (3) CA3064828A1 (en)
WO (3) WO2018218188A2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022067122A1 (en) * 2020-09-25 2022-03-31 Sangamo Therapeutics, Inc. Zinc finger fusion proteins for nucleobase editing
US11326157B2 (en) 2017-05-25 2022-05-10 The General Hospital Corporation Base editors with improved precision and specificity
US11834686B2 (en) * 2018-08-23 2023-12-05 Sangamo Therapeutics, Inc. Engineered target specific base editors
US11946040B2 (en) 2019-02-04 2024-04-02 The General Hospital Corporation Adenine DNA base editor variants with reduced off-target RNA editing

Families Citing this family (78)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013066438A2 (en) 2011-07-22 2013-05-10 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
US20150044192A1 (en) 2013-08-09 2015-02-12 President And Fellows Of Harvard College Methods for identifying a target site of a cas9 nuclease
US9359599B2 (en) 2013-08-22 2016-06-07 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US9388430B2 (en) 2013-09-06 2016-07-12 President And Fellows Of Harvard College Cas9-recombinase fusion proteins and uses thereof
US9737604B2 (en) 2013-09-06 2017-08-22 President And Fellows Of Harvard College Use of cationic lipids to deliver CAS9
US9340799B2 (en) 2013-09-06 2016-05-17 President And Fellows Of Harvard College MRNA-sensing switchable gRNAs
US11053481B2 (en) 2013-12-12 2021-07-06 President And Fellows Of Harvard College Fusions of Cas9 domains and nucleic acid-editing domains
IL286474B2 (en) 2014-06-23 2023-11-01 Massachusetts Gen Hospital Genomewide unbiased identification of dsbs evaluated by sequencing (guide-seq)
AU2015298571B2 (en) 2014-07-30 2020-09-03 President And Fellows Of Harvard College Cas9 proteins including ligand-dependent inteins
EP3322804B1 (en) 2015-07-15 2021-09-01 Rutgers, The State University of New Jersey Nuclease-independent targeted gene editing platform and uses thereof
EP3347467B1 (en) 2015-09-11 2021-06-23 The General Hospital Corporation Full interrogation of nuclease dsbs and sequencing (find-seq)
US12043852B2 (en) 2015-10-23 2024-07-23 President And Fellows Of Harvard College Evolved Cas9 proteins for gene editing
KR20250103795A (en) 2016-08-03 2025-07-07 프레지던트 앤드 펠로우즈 오브 하바드 칼리지 Adenosine nucleobase editors and uses thereof
CN109804066A (en) 2016-08-09 2019-05-24 哈佛大学的校长及成员们 Programmable CAS9- recombination enzyme fusion proteins and application thereof
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
AU2017342543B2 (en) 2016-10-14 2024-06-27 President And Fellows Of Harvard College AAV delivery of nucleobase editors
WO2018119359A1 (en) 2016-12-23 2018-06-28 President And Fellows Of Harvard College Editing of ccr5 receptor gene to protect against hiv infection
JP7219972B2 (en) 2017-01-05 2023-02-09 ラトガース,ザ ステート ユニバーシティ オブ ニュー ジャージー DNA double-strand break-independent targeted gene editing platform and its applications
EP3592381A1 (en) 2017-03-09 2020-01-15 President and Fellows of Harvard College Cancer vaccine
EP3592853A1 (en) 2017-03-09 2020-01-15 President and Fellows of Harvard College Suppression of pain by gene editing
KR20190127797A (en) 2017-03-10 2019-11-13 프레지던트 앤드 펠로우즈 오브 하바드 칼리지 Cytosine to Guanine Base Editing Agent
CA3057192A1 (en) 2017-03-23 2018-09-27 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable dna binding proteins
WO2018209320A1 (en) 2017-05-12 2018-11-15 President And Fellows Of Harvard College Aptazyme-embedded guide rnas for use with crispr-cas9 in genome editing and transcriptional activation
CN111801345A (en) 2017-07-28 2020-10-20 哈佛大学的校长及成员们 Methods and compositions for evolutionary base editors using phage-assisted sequential evolution (PACE)
EP3672612A4 (en) 2017-08-23 2021-09-29 The General Hospital Corporation MANIPULATED CRISPR-CAS9-NUCLEASES WITH CHANGED PAM SPECIFICITY
WO2019139645A2 (en) 2017-08-30 2019-07-18 President And Fellows Of Harvard College High efficiency base editors comprising gam
EP3694993A4 (en) 2017-10-11 2021-10-13 The General Hospital Corporation METHOD OF DETECTING A SITE-SPECIFIC AND UNDESIRED GENOMIC DESAMINATION INDUCED BY BASE EDITING TECHNOLOGIES
CA3082251A1 (en) 2017-10-16 2019-04-25 The Broad Institute, Inc. Uses of adenosine base editors
EP3724214A4 (en) 2017-12-15 2021-09-01 The Broad Institute Inc. SYSTEMS AND PROCEDURES FOR PREDICTING REPAIR RESULTS IN GENE ENGINEERING
WO2019161783A1 (en) * 2018-02-23 2019-08-29 Shanghaitech University Fusion proteins for base editing
EP3781585A4 (en) 2018-04-17 2022-01-26 The General Hospital Corporation SENSITIVE IN VITRO TESTS FOR SUBSTRATE PREFERENCES AND SITE FOR NUCLEIC ACID BINDERS, MODIFIERS AND CLEAVAGE AGENTS
US12133884B2 (en) 2018-05-11 2024-11-05 Beam Therapeutics Inc. Methods of substituting pathogenic amino acids using programmable base editor systems
WO2019226953A1 (en) 2018-05-23 2019-11-28 The Broad Institute, Inc. Base editors and uses thereof
KR20210049859A (en) 2018-08-28 2021-05-06 플래그쉽 파이어니어링 이노베이션스 브이아이, 엘엘씨 Methods and compositions for regulating the genome
WO2020051562A2 (en) 2018-09-07 2020-03-12 Beam Therapeutics Inc. Compositions and methods for improving base editing
US12281338B2 (en) 2018-10-29 2025-04-22 The Broad Institute, Inc. Nucleobase editors comprising GeoCas9 and uses thereof
US12351837B2 (en) 2019-01-23 2025-07-08 The Broad Institute, Inc. Supernegatively charged proteins and uses thereof
AU2020216484A1 (en) * 2019-01-31 2021-07-29 Beam Therapeutics Inc. Nucleobase editors having reduced off-target deamination and methods of using same to modify a nucleobase target sequence
CN113661248B (en) * 2019-02-02 2022-09-16 上海科技大学 Inhibition of unintended mutations in gene editing
CN110804628B (en) * 2019-02-28 2023-05-12 中国科学院脑科学与智能技术卓越创新中心 High-specificity off-target-free single-base gene editing tool
DE112020001306T5 (en) 2019-03-19 2022-01-27 Massachusetts Institute Of Technology METHODS AND COMPOSITIONS FOR EDITING NUCLEOTIDE SEQUENCES
US12473543B2 (en) 2019-04-17 2025-11-18 The Broad Institute, Inc. Adenine base editors with reduced off-target effects
CN112048497B (en) * 2019-06-06 2023-11-03 辉大(上海)生物科技有限公司 Novel single-base editing technology and application thereof
CN114258398A (en) 2019-06-13 2022-03-29 总医院公司 Engineered human endogenous virus-like particles and methods of delivery to cells using the same
WO2021046155A1 (en) 2019-09-03 2021-03-11 Voyager Therapeutics, Inc. Vectorized editing of nucleic acids to correct overt mutations
US20220290134A1 (en) * 2019-09-17 2022-09-15 Rutgers,The State University Of New Jersey Highly Efficient DNA Base Editors Mediated By RNA-Aptamer Recruitment For Targeted Genome Modification And Uses Thereof
CN110564752B (en) * 2019-09-30 2021-07-16 北京市农林科学院 Application of differential surrogate technology in enrichment of C·T base substitution cells
US12435330B2 (en) 2019-10-10 2025-10-07 The Broad Institute, Inc. Methods and compositions for prime editing RNA
CN112725348B (en) * 2019-10-28 2022-04-01 安徽省农业科学院水稻研究所 Gene and method for improving single-base editing efficiency of rice and application of gene
EP4069282A4 (en) * 2019-12-06 2023-11-08 The General Hospital Corporation FRACTIONATED DEAMINASE BASE EDITORS
CA3165802A1 (en) * 2020-01-25 2021-07-29 The Trustees Of The University Of Pennsylvania Compositions for small molecule control of precise base editing of target nucleic acids and methods of use thereof
CA3166153A1 (en) * 2020-01-28 2021-08-05 The Broad Institute, Inc. Base editors, compositions, and methods for modifying the mitochondrial genome
US20230049455A1 (en) * 2020-01-31 2023-02-16 University Of Massachusetts A cas9-pdbd base editor platform with improved targeting range and specificity
EP4103704A4 (en) * 2020-02-13 2024-10-16 Beam Therapeutics Inc. COMPOSITIONS AND METHODS FOR ENGRAFTING BASE EDITED CELLS
EP4103705A4 (en) * 2020-02-14 2024-02-28 Ohio State Innovation Foundation NUCLEOBASE EDITORS AND METHODS OF USE THEREOF
WO2021178717A2 (en) 2020-03-04 2021-09-10 Flagship Pioneering Innovations Vi, Llc Improved methods and compositions for modulating a genome
BR112022017732A2 (en) * 2020-03-04 2023-01-17 Suzhou Qi Biodesign Biotechnology Company Ltd IMPROVED CYTOSINE BASE EDIT SYSTEM
JP2023525304A (en) 2020-05-08 2023-06-15 ザ ブロード インスティテュート,インコーポレーテッド Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
GB202010348D0 (en) 2020-07-06 2020-08-19 Univ Wageningen Base editing tools
BR112023001272A2 (en) 2020-07-24 2023-04-04 Massachusetts Gen Hospital IMPROVED VIRUS-LIKE PARTICLES AND METHODS OF USING THEM FOR DELIVERY TO CELLS
BR112023003972A2 (en) * 2020-09-04 2023-04-18 Univ Kobe Nat Univ Corp COMPLEXES OF A NUCLEIC ACID SEQUENCE RECOGNITION MODULE LINKED TO A DEAMINASE AND AN AMINO-TERMINAL FRAGMENT OF A NUCLEIC ACID SEQUENCE RECOGNITION MODULE, A DEAMINASE AND A CARBOXY-TERMINAL FRAGMENT OF AN ACID SEQUENCE RECOGNITION MODULE LINKED NUCLEIC ACID, NUCLEIC ACID, VECTOR, AND METHOD FOR ALTERING A TARGET SITE OF A CELL DOUBLE-STRANDED DEOXYRIBONUCLEIC ACID
CN116157517A (en) * 2020-09-04 2023-05-23 国立大学法人广岛大学 Method for editing target DNA, method for preparing cells edited with target DNA, and DNA editing system for them
KR102399035B1 (en) * 2020-10-21 2022-05-17 성균관대학교산학협력단 Vector expressing cytosine base editor without off-target effect without reduction of on-target efficiency in industrial strains and uses thereof
US12123006B2 (en) * 2021-05-18 2024-10-22 Shanghaitech University Base editing tool and use thereof
WO2023279118A2 (en) * 2021-07-02 2023-01-05 University Of Maryland, College Park Cytidine deaminases and methods of genome editing using the same
CN113774085B (en) * 2021-08-20 2023-08-15 中国科学院广州生物医药与健康研究院 Single base editing tool TaC9-ABE and application thereof
WO2023050169A1 (en) * 2021-09-29 2023-04-06 深圳先进技术研究院 Method for achieving tag-to-taa conversion on genome with high throughput
IL313758A (en) * 2021-12-22 2024-08-01 Sangamo Therapeutics Inc Novel zinc finger fusion proteins for nucleobase editing
JP2023131616A (en) * 2022-03-09 2023-09-22 国立大学法人広島大学 DNA editing system, method for editing target DNA using the same, and method for producing cells with edited target DNA
CN114774399B (en) * 2022-03-25 2024-01-30 武汉大学 Method for artificially modifying single-base resolution positioning analysis of 5-hydroxymethylcytosine modification in deaminase-assisted DNA
CN114686456B (en) * 2022-05-10 2023-02-17 中山大学 Base editing system based on bimolecular deaminase complementation and application thereof
CN119630788A (en) * 2022-05-13 2025-03-14 辉大(上海)生物科技有限公司 Programmable adenine base editor and its use
WO2024125313A1 (en) * 2022-12-15 2024-06-20 中国科学院遗传与发育生物学研究所 Base editor and use thereof
EP4642791A1 (en) * 2022-12-30 2025-11-05 Peking University Nucleobase editor systems and methods of use thereof
WO2024226156A1 (en) * 2023-04-27 2024-10-31 University Of Massachusetts Cas-embedded cytidine deaminase ribonucleoprotein complexes having improved base editing specificity and efficiency
CN120795168A (en) * 2023-11-30 2025-10-17 华南农业大学 Recombinant fusion protein used as plant double-base editor and application thereof
CN119431603A (en) * 2024-10-09 2025-02-14 中山大学 Fusion protein and application thereof
CN119505015B (en) * 2024-11-15 2025-10-17 北京大学 Fusion protein and application method thereof

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016103233A2 (en) * 2014-12-24 2016-06-30 Dana-Farber Cancer Institute, Inc. Systems and methods for genome modification and regulation

Family Cites Families (99)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0781331B1 (en) 1994-08-20 2008-09-03 Gendaq Limited Improvements in or relating to binding proteins for recognition of dna
US6294330B1 (en) * 1997-01-31 2001-09-25 Odyssey Pharmaceuticals Inc. Protein fragment complementation assays for the detection of biological or drug interactions
US6534261B1 (en) 1999-01-12 2003-03-18 Sangamo Biosciences, Inc. Regulation of endogenous gene expression in cells using zinc finger proteins
AU776576B2 (en) 1999-12-06 2004-09-16 Sangamo Biosciences, Inc. Methods of using randomized libraries of zinc finger proteins for the identification of gene function
AU2001257331A1 (en) 2000-04-28 2001-11-12 Sangamo Biosciences, Inc. Methods for designing exogenous regulatory molecules
WO2004099366A2 (en) 2002-10-23 2004-11-18 The General Hospital Corporation Context sensitive parallel optimization of zinc finger dna binding domains
EP2325332B1 (en) 2005-08-26 2012-10-31 DuPont Nutrition Biosciences ApS Use of CRISPR associated genes (CAS)
US20100100976A1 (en) 2006-08-28 2010-04-22 University Of Rochester Methods and compositions related to apobec-1 expression
NZ579002A (en) 2007-03-02 2012-03-30 Danisco Cultures with improved phage resistance
FR2925918A1 (en) 2007-12-28 2009-07-03 Pasteur Institut Typing or subtyping Salmonella bacteria comprises determining the variable sequence composition of a nucleic acid fragment amplified from the CRISPR1 and/or CRISPR2 locus
WO2010011961A2 (en) 2008-07-25 2010-01-28 University Of Georgia Research Foundation, Inc. Prokaryotic rnai-like system and methods of use
US20100076057A1 (en) 2008-09-23 2010-03-25 Northwestern University TARGET DNA INTERFERENCE WITH crRNA
WO2010054108A2 (en) 2008-11-06 2010-05-14 University Of Georgia Research Foundation, Inc. Cas6 polypeptides and methods of use
MX337838B (en) 2008-11-07 2016-03-22 Dupont Nutrition Biosci Aps Bifidobacteria crispr sequences.
WO2010066907A1 (en) 2008-12-12 2010-06-17 Danisco A/S Genetic cluster of strains of streptococcus thermophilus having unique rheological properties for dairy fermentation
WO2010132092A2 (en) * 2009-05-12 2010-11-18 The Scripps Research Institute Cytidine deaminase fusions and related methods
US20120178647A1 (en) 2009-08-03 2012-07-12 The General Hospital Corporation Engineering of zinc finger arrays by context-dependent assembly
US20110104787A1 (en) * 2009-11-05 2011-05-05 President And Fellows Of Harvard College Fusion Peptides That Bind to and Modify Target Nucleic Acid Sequences
US10087431B2 (en) 2010-03-10 2018-10-02 The Regents Of The University Of California Methods of generating nucleic acid fragments
EA024121B9 (en) 2010-05-10 2017-01-30 Дзе Реджентс Ов Дзе Юниверсити Ов Калифорния Endoribonuclease compositions and methods of use thereof
EP2630156B1 (en) 2010-10-20 2018-08-22 DuPont Nutrition Biosciences ApS Lactococcus crispr-cas sequences
WO2012164565A1 (en) 2011-06-01 2012-12-06 Yeda Research And Development Co. Ltd. Compositions and methods for downregulating prokaryotic genes
GB201122458D0 (en) 2011-12-30 2012-02-08 Univ Wageningen Modified cascade ribonucleoproteins and uses thereof
CN104284669A (en) 2012-02-24 2015-01-14 Fred Hutchinson Cancer Research Center Compositions and methods for treating hemoglobinopathies
US9637739B2 (en) 2012-03-20 2017-05-02 Vilnius University RNA-directed DNA cleavage by the Cas9-crRNA complex
FI3597749T3 (en) 2012-05-25 2023-10-09 Univ California Methods and compositions for RNA-directed modification of target DNA and RNA-directed modulation of transcription
WO2013188037A2 (en) 2012-06-11 2013-12-19 Agilent Technologies, Inc Method of adaptor-dimer subtraction using a crispr cas6 protein
EP2674501A1 (en) 2012-06-14 2013-12-18 Agence nationale de sécurité sanitaire de l'alimentation,de l'environnement et du travail Method for detecting and identifying enterohemorrhagic Escherichia coli
US9258704B2 (en) 2012-06-27 2016-02-09 Advanced Messaging Technologies, Inc. Facilitating network login
WO2014071235A1 (en) 2012-11-01 2014-05-08 Massachusetts Institute Of Technology Genetic device for the controlled destruction of dna
PT3363902T (en) 2012-12-06 2019-12-19 Sigma Aldrich Co Llc Crispr-based genome modification and regulation
EP2931898B1 (en) 2012-12-12 2016-03-09 The Broad Institute, Inc. Engineering and optimization of systems, methods and compositions for sequence manipulation with functional domains
US8697359B1 (en) 2012-12-12 2014-04-15 The Broad Institute, Inc. CRISPR-Cas systems and methods for altering expression of gene products
EP3434776A1 (en) 2012-12-12 2019-01-30 The Broad Institute, Inc. Methods, models, systems, and apparatus for identifying target sequences for cas enzymes or crispr-cas systems for target sequences and conveying results thereof
KR20150105956A (en) 2012-12-12 2015-09-18 The Broad Institute, Inc. Delivery, engineering and optimization of systems, methods and compositions for sequence manipulation and therapeutic applications
EP2932421A1 (en) 2012-12-12 2015-10-21 The Broad Institute, Inc. Methods, systems, and apparatus for identifying target sequences for cas enzymes or crispr-cas systems for target sequences and conveying results thereof
US20140310830A1 (en) 2012-12-12 2014-10-16 Feng Zhang CRISPR-Cas Nickase Systems, Methods And Compositions For Sequence Manipulation in Eukaryotes
EP4234696A3 (en) 2012-12-12 2023-09-06 The Broad Institute Inc. Crispr-cas component systems, methods and compositions for sequence manipulation
EP2931899A1 (en) 2012-12-12 2015-10-21 The Broad Institute, Inc. Functional genomics using crispr-cas systems, compositions, methods, knock out libraries and applications thereof
ES2576126T3 (en) 2012-12-12 2016-07-05 The Broad Institute, Inc. Modification by genetic technology and optimization of improved enzyme systems, methods and compositions for sequence manipulation
PT2784162E (en) 2012-12-12 2015-08-27 Broad Inst Inc Engineering of systems, methods and optimized guide compositions for sequence manipulation
BR102013032129B1 (en) 2012-12-13 2022-06-07 Dow Agrosciences Llc Method to identify the presence of an exogenous donor DNA polynucleotide inserted within a single target eukaryotic genomic locus
EP4282970A3 (en) 2012-12-17 2024-01-17 President and Fellows of Harvard College Rna-guided human genome engineering
WO2014110552A1 (en) 2013-01-14 2014-07-17 Recombinetics, Inc. Hornless livestock
US20140212869A1 (en) 2013-01-25 2014-07-31 Agilent Technologies, Inc. Nucleic Acid Proximity Assay Involving the Formation of a Three-way junction
WO2014124226A1 (en) 2013-02-07 2014-08-14 The Rockefeller University Sequence specific antimicrobials
CA2901676C (en) 2013-02-25 2023-08-22 Sangamo Biosciences, Inc. Methods and compositions for enhancing nuclease-mediated gene disruption
JP2016507244A (en) 2013-02-27 2016-03-10 Helmholtz Zentrum Muenchen Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH) Gene editing in oocytes by Cas9 nuclease
US10612043B2 (en) 2013-03-09 2020-04-07 Agilent Technologies, Inc. Methods of in vivo engineering of large sequences using multiple CRISPR/cas selections of recombineering events
AU2014235794A1 (en) 2013-03-14 2015-10-22 Caribou Biosciences, Inc. Compositions and methods of nucleic acid-targeting nucleic acids
KR102874079B1 (en) 2013-03-15 2025-10-22 The General Hospital Corporation Using truncated guide RNAs (tru-gRNAs) to increase specificity for RNA-guided genome editing
US10760064B2 (en) 2013-03-15 2020-09-01 The General Hospital Corporation RNA-guided targeting of genetic and epigenomic regulatory proteins to specific genomic loci
US11332719B2 (en) 2013-03-15 2022-05-17 The Broad Institute, Inc. Recombinant virus and preparations thereof
US20140349400A1 (en) 2013-03-15 2014-11-27 Massachusetts Institute Of Technology Programmable Modification of DNA
US20140273230A1 (en) 2013-03-15 2014-09-18 Sigma-Aldrich Co., Llc Crispr-based genome modification and regulation
US9234213B2 (en) 2013-03-15 2016-01-12 System Biosciences, Llc Compositions and methods directed to CRISPR/Cas genomic engineering systems
US20140273235A1 (en) 2013-03-15 2014-09-18 Regents Of The University Of Minnesota ENGINEERING PLANT GENOMES USING CRISPR/Cas SYSTEMS
WO2014165612A2 (en) 2013-04-05 2014-10-09 Dow Agrosciences Llc Methods and compositions for integration of an exogenous sequence within the genome of plants
US20150056629A1 (en) 2013-04-14 2015-02-26 Katriona Guthrie-Honea Compositions, systems, and methods for detecting a DNA sequence
EP3456831B1 (en) 2013-04-16 2021-07-14 Regeneron Pharmaceuticals, Inc. Targeted modification of rat genome
EP2994531B1 (en) 2013-05-10 2018-03-28 Sangamo Therapeutics, Inc. Delivery methods and compositions for nuclease-mediated genome engineering
US9873907B2 (en) 2013-05-29 2018-01-23 Agilent Technologies, Inc. Method for fragmenting genomic DNA using CAS9
US20150067922A1 (en) 2013-05-30 2015-03-05 The Penn State Research Foundation Gene targeting and genetic modification of plants via rna-guided genome editing
CN105492611A (en) 2013-06-17 2016-04-13 The Broad Institute, Inc. Optimized CRISPR-CAS double nickase systems, methods and compositions for sequence manipulation
US10011850B2 (en) 2013-06-21 2018-07-03 The General Hospital Corporation Using RNA-guided FokI Nucleases (RFNs) to increase specificity for RNA-Guided Genome Editing
WO2015010114A1 (en) 2013-07-19 2015-01-22 Larix Bioscience, Llc Methods and compositions for producing double allele knock outs
WO2015021426A1 (en) 2013-08-09 2015-02-12 Sage Labs, Inc. A crispr/cas system-based novel fusion protein and its application in genome editing
AU2014312123A1 (en) 2013-08-29 2016-03-17 Temple University Of The Commonwealth System Of Higher Education Methods and compositions for RNA-guided treatment of HIV infection
US9737604B2 (en) * 2013-09-06 2017-08-22 President And Fellows Of Harvard College Use of cationic lipids to deliver CAS9
US9388430B2 (en) 2013-09-06 2016-07-12 President And Fellows Of Harvard College Cas9-recombinase fusion proteins and uses thereof
WO2015057976A1 (en) * 2013-10-17 2015-04-23 Sangamo Biosciences, Inc. Delivery methods and compositions for nuclease-mediated genome engineering in hematopoietic stem cells
MX388127B (en) 2013-12-11 2025-03-19 Regeneron Pharma Methods and compositions for the targeted modification of a genome
EP3079726B1 (en) 2013-12-12 2018-12-05 The Broad Institute, Inc. Delivery, use and therapeutic applications of the crispr-cas systems and compositions for targeting disorders and diseases using particle delivery components
US20150191744A1 (en) 2013-12-17 2015-07-09 University Of Massachusetts Cas9 effector-mediated regulation of transcription, differentiation and gene editing/labeling
CA2935032C (en) 2013-12-26 2024-01-23 The General Hospital Corporation Multiplex guide rnas
EP3553176A1 (en) 2014-03-10 2019-10-16 Editas Medicine, Inc. Crispr/cas-related methods and compositions for treating leber's congenital amaurosis 10 (lca10)
EP3180426B1 (en) * 2014-08-17 2019-12-25 The Broad Institute, Inc. Genome editing using cas9 nickases
US10190106B2 (en) * 2014-12-22 2019-01-29 University Of Massachusetts Cas9-DNA targeting unit chimeras
US20180155708A1 (en) * 2015-01-08 2018-06-07 President And Fellows Of Harvard College Split Cas9 Proteins
KR102598856B1 (en) 2015-03-03 2023-11-07 The General Hospital Corporation Engineered CRISPR-Cas9 nuclease with altered PAM specificity
US20180291372A1 (en) * 2015-05-14 2018-10-11 Massachusetts Institute Of Technology Self-targeting genome editing system
EP3322804B1 (en) * 2015-07-15 2021-09-01 Rutgers, The State University of New Jersey Nuclease-independent targeted gene editing platform and uses thereof
US9926546B2 (en) * 2015-08-28 2018-03-27 The General Hospital Corporation Engineered CRISPR-Cas9 nucleases
AU2016316845B2 (en) 2015-08-28 2022-03-10 The General Hospital Corporation Engineered CRISPR-Cas9 nucleases
US12043852B2 (en) * 2015-10-23 2024-07-23 President And Fellows Of Harvard College Evolved Cas9 proteins for gene editing
US20190093128A1 (en) * 2016-03-31 2019-03-28 The Regents Of The University Of California Methods for genome editing in zygotes
WO2017189308A1 (en) * 2016-04-19 2017-11-02 The Broad Institute Inc. Novel crispr enzymes and systems
KR20250103795A (en) 2016-08-03 2025-07-07 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
WO2018035387A1 (en) 2016-08-17 2018-02-22 The Broad Institute, Inc. Novel crispr enzymes and systems
KR20190127797A (en) 2017-03-10 2019-11-13 President And Fellows Of Harvard College Cytosine to Guanine Base Editing Agent
CA3057192A1 (en) 2017-03-23 2018-09-27 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable dna binding proteins
EP3612551B1 (en) 2017-04-21 2024-09-04 The General Hospital Corporation Variants of cpf1 (cas12a) with altered pam specificity
WO2018218188A2 (en) 2017-05-25 2018-11-29 The General Hospital Corporation Base editors with improved precision and specificity
EP3672612A4 (en) 2017-08-23 2021-09-29 The General Hospital Corporation Manipulated CRISPR-Cas9 nucleases with changed PAM specificity
WO2020077138A2 (en) 2018-10-10 2020-04-16 The General Hospital Corporation Selective curbing of unwanted rna editing (secure) dna base editor variants
WO2020163396A1 (en) 2019-02-04 2020-08-13 The General Hospital Corporation Adenine dna base editor variants with reduced off-target rna editing
US20220290121A1 (en) 2019-08-30 2022-09-15 The General Hospital Corporation Combinatorial Adenine and Cytosine DNA Base Editors
WO2021042047A1 (en) 2019-08-30 2021-03-04 The General Hospital Corporation C-to-g transversion dna base editors
EP4069282A4 (en) 2019-12-06 2023-11-08 The General Hospital Corporation Fractionated deaminase base editors

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016103233A2 (en) * 2014-12-24 2016-06-30 Dana-Farber Cancer Institute, Inc. Systems and methods for genome modification and regulation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Singh et al., Curr. Protein Pept. Sci. 18:1-11, 2017 (Year: 2017) *
Stier et al., PLoS ONE 8:e79003, 2013, 10 pages (Year: 2013) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11326157B2 (en) 2017-05-25 2022-05-10 The General Hospital Corporation Base editors with improved precision and specificity
US11834686B2 (en) * 2018-08-23 2023-12-05 Sangamo Therapeutics, Inc. Engineered target specific base editors
US11946040B2 (en) 2019-02-04 2024-04-02 The General Hospital Corporation Adenine DNA base editor variants with reduced off-target RNA editing
WO2022067122A1 (en) * 2020-09-25 2022-03-31 Sangamo Therapeutics, Inc. Zinc finger fusion proteins for nucleobase editing

Also Published As

Publication number Publication date
EP3630198A1 (en) 2020-04-08
AU2018272067B2 (en) 2024-07-18
JP7324713B2 (en) 2023-08-10
WO2018218188A2 (en) 2018-11-29
JP2020521446A (en) 2020-07-27
WO2018218206A1 (en) 2018-11-29
US20200140842A1 (en) 2020-05-07
EP3630849A4 (en) 2021-01-13
AU2018273986A1 (en) 2019-12-12
EP3630198A4 (en) 2021-04-21
CA3063733A1 (en) 2018-11-29
AU2024227270A1 (en) 2024-10-31
US20220275356A1 (en) 2022-09-01
US20200172885A1 (en) 2020-06-04
AU2018273968A1 (en) 2019-11-28
EP3630849A1 (en) 2020-04-08
JP2023113672A (en) 2023-08-16
CN110959040A (en) 2020-04-03
CN110959040B (en) 2024-11-12
JP2020521451A (en) 2020-07-27
JP2023126956A (en) 2023-09-12
JP2020521454A (en) 2020-07-27
CN111093714A (en) 2020-05-01
US11326157B2 (en) 2022-05-10
AU2018272067A1 (en) 2019-11-28
CA3064828A1 (en) 2018-11-29
WO2018218188A3 (en) 2019-01-03
WO2018218166A1 (en) 2018-11-29
CN110997728A (en) 2020-04-10
CA3063449A1 (en) 2018-11-29
EP3630970A2 (en) 2020-04-08
EP3630970A4 (en) 2020-12-30

Similar Documents

Publication Publication Date Title
US20200172895A1 (en) Using split deaminases to limit unwanted off-target base editor deamination
US11946040B2 (en) Adenine DNA base editor variants with reduced off-target RNA editing
AU2023208113B2 (en) Variants of CRISPR from Prevotella and Francisella 1 (Cpf1)
US10633642B2 (en) Engineered CRISPR-Cas9 nucleases
US11591589B2 (en) Variants of Cpf1 (Cas12a) with altered PAM specificity
EP4021945A2 (en) Combinatorial adenine and cytosine dna base editors
EP4022053A1 (en) C-to-g transversion dna base editors
US20170058271A1 (en) Engineered CRISPR-Cas9 Nucleases
WO2020077138A2 (en) Selective curbing of unwanted rna editing (secure) dna base editor variants
US20230024833A1 (en) Split deaminase base editors
WO2024086845A2 (en) Engineered casphi2 nucleases
BASE Adenine Dna Base Editor Variants With Reduced Off-target Rna Editing

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

AS Assignment

Owner name: THE GENERAL HOSPITAL CORPORATION, MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JOUNG, J. KEITH;ANGSTMAN, JAMES;SIGNING DATES FROM 20210722 TO 20211007;REEL/FRAME:059799/0371

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: ADVISORY ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION