
WO2025049563A1 - A programmable gene correction technology using cas-polymerase constructs - Google Patents

A programmable gene correction technology using cas-polymerase constructs

Info

Publication number
WO2025049563A1
Authority
WO
WIPO (PCT)
Prior art keywords
cas9
dna
dna polymerase
mutation
protein
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/044167
Other languages
French (fr)
Inventor
Piyush K. Jain
Long T. Nguyen
Noah RAKESTRAW
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Florida
University of Florida Research Foundation Inc
Original Assignee
University of Florida
University of Florida Research Foundation Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Florida and University of Florida Research Foundation Inc

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907 Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C12N9/12 Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C12N9/1241 Nucleotidyltransferases (2.7.7)
    • C12N9/1252 DNA-directed DNA polymerase (2.7.7.7), i.e. DNA replicase
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C12N9/222 Clustered regularly interspaced short palindromic repeats [CRISPR]-associated [CAS] enzymes
    • C12N9/226 Class 2 CAS enzyme complex, e.g. single CAS protein
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/80 Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]

Definitions

  • CRISPR/Cas clustered regularly interspaced short palindromic repeats/CRISPR-associated
  • CRISPR-based diagnostics have improved nucleic acid detection in terms of sensitivity, specificity, and speed.
  • CRISPR/Cas technology works by introducing a CRISPR-associated (Cas) nuclease and a short guide RNA that has a region complementary to a target sequence/site; the guide RNA binds Cas and directs the guide RNA/Cas complex to the target site.
  • This complex then acts as molecular scissors to cut the target sequence at a specific site, creating double-stranded cuts in the target DNA or a single-stranded cut in the target RNA.
  • This specific target recognition and cleavage is also referred to as “cis-cleavage”.
  • Some types of Cas proteins, once bound to a target sequence also become active for collateral, non-specific cleavage, called “trans-cleavage.” This trans-cleavage activity can be leveraged for detection technologies.
  • The disclosure, in one aspect, relates to a gene modification tool that fuses a Cas protein with a polymerase.
  • This double-strand break (DSB)-independent approach uses DNA-RNA chimeric prime-editor-like guide RNAs (chimeric pegRNAs or cpegRNAs) to enable programmable DNA sequence replacement or excision at endogenous human genomic sites.
  • The Cas9 and the RNA portion of the cpegRNA target a gene, while the DNA portion of the cpegRNA serves as the template the polymerase uses for repair.
  • the disclosed method can be used to introduce large DNA modifications in the genome.
  • FIGs. 1A-1C show electrophoretic separations of various Cas9-polymerase fusion proteins indicating successful construction of Cas9-reverse transcriptase (control; FIG. 1A); Cas9-T7 and Cas9-T4 polymerases (FIG. 1B); and Cas9-Klenow fragment DNA polymerase (FIG. 1C).
  • FIG. 2 shows a scheme for introducing eGFP and a mutated mCherry with a premature stop codon into a plasmid for testing the disclosed method.
  • FIG. 3 shows a scheme for transducing cells with the plasmid containing eGFP and mutated mCherry, selection of GFP-positive cells, and using the disclosed method to remove the stop codon, allowing simultaneous expression of eGFP and mCherry in the cells.
  • FIGs. 4A-4F show successful transduction and optimization of HEK293T cells with the disclosed plasmids containing eGFP and mutated mCherry.
  • FIG. 4A 500 µL medium
  • FIG. 4B 250 µL medium
  • FIG. 4C 125 µL medium
  • FIG. 4D 62.5 µL medium
  • FIG. 4E 31.25 µL medium
  • FIG. 4F 0 µL medium (control).
  • FIGs. 5A-5B show a comparison of a standard prime editing technique (FIG. 5A) and a disclosed genome editing technique (Cas9 nickase fused with T4 DNA polymerase; FIG. 5B).
  • FIG. 6 shows a comparison of a standard prime editing technique and a disclosed genome editing technique (Cas9 nickase fused with DNA polymerases). The percentage of gene modification is quantified by expression of mCherry.
  • FIGs. 7A-7E show an initial screening for activity of CODE candidates.
  • FIG. 7A Schematic of the development of CODEs. CODEs consist of a nCas9-DNAP fusion protein and a chimeric pegRNA (cpeg) containing a guide RNA and ssDNA template with intended edits and primer binding site (PBS).
  • FIG. 7B Architecture of bacterial expression plasmids of CODEs. The editor expression is driven by a T7 promoter, and a 6x histidine tag located at the C-terminus is used for purification.
  • FIG. 7C Construction of HEK293T reporter cell line supporting base conversion via prime editing or CODE.
  • FIG. 7D Schematic of the workflow for nucleofection of CODEs and cpegRNA into HEK293T reporter cell line.
  • FIGs. 8A-8G show engineering of T4 and Bst chimeric oligonucleotide-directed editors for improved editing.
  • FIG. 8A Architecture of engineered CODE-T4 editors with domain rearrangement strategies.
  • FIG. 8B Percentage of mCherry activation of the CODE-T4 variants in (FIG. 8A).
  • FIG. 8C Engineering attempts to alter the T4 DNAP processivity and fidelity to create improved CODE-T4 variants with beneficial mutations.
  • FIGs. 8D-8E Optimization of amino acid linker length between nCas9 and T4 DNAP in the fusion construct.
  • FIG. 8F Engineering attempts to alter the Bst-LF DNAP thermostability to create improved CODE-Bst variants with beneficial mutations.
  • FIGs. 9A-9G show in-house synthesis of cpegRNA by a ligation reaction.
  • FIG. 9A Schematic of the T4 RNA Ligase 1-mediated cpegRNA synthesis.
  • FIG. 9B Representative denaturing gel showing successful ligation of sgRNA and ssDNA oligo to generate a cpegRNA that targets the mCherry gene.
  • FIG. 9C Visualization of HEK293T cells by fluorescence microscopy showing the mCherry activation by PE2 and engineered CODE-Bst variants with ligated cpegRNA. Cells were transfected with prime editors and CODEs 72 hours prior to imaging.
  • FIG. 9D Quantification of mCherry activation in (FIG. 9C) via flow cytometry.
  • FIG. 9E Schematic of the workflow for transfection of plasmid encoding human codon-optimized CODE and synthetic or ligated cpegRNA.
  • FIGs. 10A-10H show efficient chimeric oligonucleotide-directed editing of endogenous gene loci with CODEMax and CODEMax(exo+).
  • FIG. 10A Architecture of plasmid encoding CODEMax and CODEMax(exo+).
  • FIG. 10B AlphaFold 3 predicted structure of the CODEMax(exo+) in complex with cpegRNA and target dsDNA (ref. 36).
  • FIGs. 11A-11F show general optimization of methods for chimeric oligonucleotide-directed editors.
  • FIGs. 11A-11B Effect of the volume of TransIT-X2 used during co-transfection of CODE-encoding plasmid and cpegRNA at two different loci.
  • FIG. 11C Effect of CODE-encoding plasmid amount on editing efficiency.
  • FIGs. 11D-11E Effect of the total RNP complex on editing efficiency via transfection. The ratio of cpegRNA to CODE was constant at 2:1.
  • FIGs. 12A-12B show architecture of chimeric oligonucleotide-directed editors with domain organization.
  • AlphaFold 3 predicted structure of (FIG. 12A) CODEMax and (FIG. 12B) CODEMax(exo+) in complex with cpegRNA and target dsDNA.
  • cpegRNA modeled incorporated a C to T substitution at position +1 downstream of the nick site at the HEK3 locus.
  • FIGs. 13A-13B show efficiency of prime editors and chimeric oligonucleotide-directed editors in different cell types.
  • FIGs. 14A-14D show additional target sites for head-to-head comparison between prime editors and chimeric oligonucleotide-directed editors.
  • FIG. 15 shows a representative allele plot for insertion of HindIII at the HEK3 locus. Allele plots with efficiencies and read counts were output by CRISPResso2.
  • FIGs. 16A-16C show a comparison of synthetic and pU6-promoter driven pegRNA delivery.
  • FIGs. 16A-16B Head-to-head comparison of synthetic pegRNA and pU6-promoter driven pegRNA expression for edits at two endogenous loci for +1 TCA insertion at DNMT1, +2 substitution at SRD5A3, and +2 G>T at SNCA, respectively.
  • FIGs. 17A-17C show quantification of scaffold insertion for prime editors and chimeric oligonucleotide-directed editors for select edits.
  • FIGs. 17A-17C Percent scaffold insertion quantified with CRISPResso2 at the HEK3 locus for +1 5 bp deletion and +1 HindIII insertion, and SNCA +2 G>C, respectively.
  • FIGs. 18A-18D show quantification of imprecise edits for prime editors and chimeric oligonucleotide-directed editors at select loci.
  • Imprecise edit percentage and imprecise edit type as quantified by CRISPResso2 plotted against base position relative to nick site.
  • Prime editing has gained prominence as a highly effective genome editing tool due to its precision and versatility. Pioneered by Anzalone et al., the technology can correct all 12 possible single-base substitutions and make small insertions or deletions without generating double-stranded breaks.
  • Prime editing typically uses a nickase Cas9-reverse transcriptase (nCas9-RT) fusion protein (prime editor) and a prime editing guide RNA (pegRNA), which has a 3’ extension containing a primer binding site (PBS) and reverse transcription template (RTT) encoding the desired edit.
  • PBS primer binding site
  • RTT reverse transcription template
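The PBS/RTT layout described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicants' design procedure: the function name `design_extension`, its parameters, the toy sequence, and the 6-nt PBS length are all hypothetical. Sequences are written with DNA letters for simplicity; in a conventional pegRNA the extension is RNA (U instead of T), whereas the cpegRNA described in this document carries a DNA template.

```python
# Minimal sketch: derive the 3' extension (RTT followed by PBS) for a simple edit.
# All names, coordinates, and the toy sequence are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_extension(nicked_strand: str, nick_index: int,
                     edited_downstream: str, pbs_len: int = 13) -> tuple[str, str]:
    """Return (rtt, pbs), each written 5'->3'.

    nicked_strand     -- the strand nicked by nCas9, written 5'->3'
    nick_index        -- number of bases on the 5' side of the nick
    edited_downstream -- desired sequence 3' of the nick, including the edit
    pbs_len           -- number of bases the PBS should anneal to
    """
    if not 0 < pbs_len <= nick_index <= len(nicked_strand):
        raise ValueError("PBS must fit between the strand start and the nick")
    # The PBS pairs with the bases immediately 5' of the nick (the free 3' end).
    pbs = reverse_complement(nicked_strand[nick_index - pbs_len:nick_index])
    # The template is the complement of what the polymerase should synthesize.
    rtt = reverse_complement(edited_downstream)
    return rtt, pbs


if __name__ == "__main__":
    rtt, pbs = design_extension("AAACCCGGGTTTACGTACGT", 12, "TCGTA", pbs_len=6)
    print("extension 5'->3':", rtt + pbs)
```

Because the extension is read 5'->3' as template then PBS, its reverse complement is the nicked-strand sequence next to the nick followed by the edited sequence, which gives a quick sanity check on any design.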
  • The 5’ spacer region of the pegRNA directs the prime editor to the target site, resulting in nicking of the non-target strand by nCas9. Hybridization of the 3’ PBS to the nicked strand then generates an initiation site for reverse transcription. Subsequently, the correct edits are installed into a 3’ edited
  • Although prime editing has been iteratively improved through many rounds at the level of both the prime editor and the pegRNA, the system still suffers from variable efficiency across genomic loci.
  • Prime editing has largely made use of Moloney Murine Leukemia Virus reverse transcriptase (MMLV RT), which suffers from several drawbacks that decrease the effectiveness of CRISPR/Cas-associated genome editing.
  • Editing efficiency is typically less than 20% in immortalized cell lines and even lower in other cell types. Editing efficiency can further vary a great deal across target sequences and cell types, making broad application difficult and/or unpredictable. Base substitution efficiency with MMLV RT is still quite low, and off-target mutations are relatively common. Multiple versions of prime editors have been engineered to improve this efficiency.
  • Prime editing guide RNA pegRNA
  • sgRNA single guide RNA
  • After the prime editor generates a 3’ edited flap, that flap competes with the 5’ unedited flap for incorporation into the genome, reducing the likelihood of a successful edit.
  • Because the pegRNA contains a PBS that is complementary to the spacer sequence, it forms a stable RNA-RNA duplex, which creates a barrier to ribonucleoprotein complexation between the pegRNA and the prime editor. Recent efforts have sought to circumvent this auto-inhibitory interaction within the pegRNA by optimizing the melting temperature of the PBS and by incorporating mismatches into the PBS to reduce misfolded pegRNA interactions, which leads to improved prime editing efficiency.
  • Although prime editing can make versatile edits without inducing double-strand breaks in the DNA, the accuracy of prime editors still has room for improvement.
  • One of the most common imprecise edits observed for prime editing is an overextension of the reverse transcriptase past the RTT into the scaffold region of the pegRNA. This readthrough becomes problematic because it results in the incorporation of undesired bases into the genome, although recent studies have shown methods to lower its occurrence by engineering highly structured regions within the pegRNA.
  • chimeric oligonucleotide-directed editing (CODE) systems consisting of a DNA-dependent DNA polymerase paired with a chimeric pegRNA (cpegRNA) containing a DNA primer binding site and a DNA polymerase template may address some of the limitations of current reverse transcriptase-based prime editors. It was hypothesized that a cpegRNA could reduce the auto-inhibitory effect observed for traditional pegRNAs, as the DNA-RNA duplex is inherently less stable than the RNA-RNA duplex.
  • DNA polymerases are an abundant family of proteins with diverse molecular properties, rendering them interesting candidates to achieve chimeric oligonucleotide-directed editing.
  • DNA polymerases with advantageous properties such as high processivity, proofreading capability, and minimal reverse transcriptase activity have the potential to improve editing efficiency, reduce unintended edits, and enable new types of edits to be performed.
  • a new class of 13 CODEs that consist of a cpegRNA and a nickase Cas9-DNA polymerase fusion protein.
  • This simple two-component system allows for delivery via plasmids or ribonucleoprotein (RNP) complexes for effective and accurate genome editing.
  • RNP ribonucleoprotein
  • CODE improved gene correction efficiency compared to conventional PE2 and PEMax at several genomic loci with low unintended scaffold incorporation.
  • engineered CODEs expand the gene editing toolbox and offer versatility as well as flexibility toward therapeutic applications.
  • the disclosed technology uses a polymerase such as, for example, T4 DNA polymerase, T7 DNA polymerase, or Klenow fragment DNA polymerase.
  • the disclosed technology overcomes various errors associated with the use of reverse transcriptase enzymes such as amplification of an entire guide RNA (pegRNA) and incorporation into the genome. As such, the present disclosure exhibits high efficiency without off-target effects.
  • a the present technology makes a peg
  • Cas9 is then fused with a DNA polymerase that works at a particular temperature such as, for example, 37 °C, although other optimum temperatures are contemplated and should be considered disclosed.
  • the present platform does not make use of the Cas9 endonuclease, thus avoiding the problems associated with Cas9 endonuclease promiscuity.
  • the disclosed platform technology can be further expanded to modify any genomic region, RNA sequence, or cell type.
  • DSBs double stranded breaks
  • NHEJ non-homologous end joining
  • HDR homology-directed repair
  • HDR has low efficiency, even when a repair template is provided in excess. Repair of DSBs often leads to undesired mutations including, but not limited to, chromosomal abnormalities. What is needed is an improved version of prime editing with higher efficiency and a lower tendency towards introducing errors in the genome.
  • a system for site-specific modification of a double-stranded target DNA sequence including at least the following:
  • a fusion protein including a DNA-binding nickase protein and a DNA polymerase
  • a chimeric guide nucleic acid sequence including at least one region of complementarity to a first strand of the double-stranded target DNA sequence, a core sequence that interacts with the DNA-binding nickase protein, a template encoding a modified sequence for insertion into a second strand of the double-stranded target DNA sequence, and a primer sequence.
  • the DNA-binding nickase protein can be a Cas9 nickase, a Cas12i1 nickase, or a Cas12a nickase, such as, for example, a DNA-binding nickase derived from a Streptococcus pyogenes Cas9, a Staphylococcus aureus Cas9, a Neisseria meningitidis Cas9, a Streptococcus thermophilus Cas9, an Actinobacillus minor Cas9, an Actinobacillus pleuropneumoniae Cas9, an Actinobacillus seminis Cas9, an Actinobacillus succinogenes Cas9, a Bergeriella denitrificans Cas9, a Conservatibacter flavescens Cas9, a Gallibacterium anatis Cas9, a Haemophilus
  • Cas12 systems act like nickases with shorter guides and cleave the NT strand, and can also be used in the systems and methods disclosed herein.
  • the present systems and methods make use of a much shorter crRNA (~36-42 nt), which can therefore be extended synthetically with DNA bases much more easily.
  • the present systems and methods can be or are configured to work in the inverse direction in terms of binding DNA. Further in this aspect, for example, the 3' end of the crRNA binds with the target instead of the 5' end.
  • this means the present system can be multiplexed with nCas9-DP to improve the prime editing efficiency on the other strand (in the same direction) or same strand (in the other direction).
  • the chimeric guide nucleic acid sequence includes both RNA and DNA.
  • the at least one region of complementarity is RNA and the core sequence is RNA.
  • the template incorporating a modified sequence for insertion is DNA.
  • the template is from about 1 to about 5000 nucleotides in length, about 2 to about 1000 nucleotides in length, or about 3 to about 500 nucleotides in length.
  • the primer sequence is RNA and forms a primer for the DNA polymerase.
  • the primer is from about 4 to about 20 nucleotides in length, or is from about 10 to about 20 nucleotides in length.
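The component list and length ranges above can be captured as a small validated record. The following is a sketch only: the class name `ChimericGuide`, its field names, and the example sequences are hypothetical, and treating the broadest "about" ranges (template 1 to 5000 nt, primer 4 to 20 nt) as hard bounds is an assumption made for illustration. The primer alphabet is deliberately left unchecked.

```python
from dataclasses import dataclass

RNA_BASES = set("ACGU")
DNA_BASES = set("ACGT")


@dataclass(frozen=True)
class ChimericGuide:
    """Hypothetical container for the four parts of a chimeric guide."""
    spacer: str    # RNA region complementary to the first strand of the target
    core: str      # RNA core sequence that interacts with the nickase
    template: str  # DNA template encoding the modified sequence
    primer: str    # primer sequence (alphabet not validated in this sketch)

    def __post_init__(self) -> None:
        if not set(self.spacer) <= RNA_BASES or not set(self.core) <= RNA_BASES:
            raise ValueError("spacer and core are expected to be RNA (A, C, G, U)")
        if not set(self.template) <= DNA_BASES:
            raise ValueError("template is expected to be DNA (A, C, G, T)")
        # Assumption: the broadest stated ranges are used as hard limits.
        if not 1 <= len(self.template) <= 5000:
            raise ValueError("template length outside 1-5000 nt")
        if not 4 <= len(self.primer) <= 20:
            raise ValueError("primer length outside 4-20 nt")


if __name__ == "__main__":
    guide = ChimericGuide(spacer="GGCCCAGACUGAGCACGUGA", core="GUUUUAGAGC",
                          template="TCGTA", primer="AAACCCGG")
    print(len(guide.template), len(guide.primer))
```

A frozen dataclass keeps the record immutable once it has passed validation, which is convenient when many candidate guides are generated and compared.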
  • the DNA polymerase can be selected from E. coli DNA polymerase I (Ecol), T4 DNA polymerase, T5 DNA polymerase, T7 DNA polymerase, thioredoxin binding domain of T7 DNA polymerase (T7/Trx), Klenow fragment DNA polymerase, Φ29 DNA polymerase, Bsu DNA polymerase, Bst DNA polymerase large fragment (LF), Bst DNA polymerase full length (F), Pfu DNA polymerase, Pwo DNA polymerase, Stoffel DNA polymerase, or any combination thereof.
  • Ecol E. coli DNA polymerase I
  • the T4 DNA polymerase can be located at a C-terminus or an N-terminus of the DNA-binding nickase protein and a 33 amino acid linker can connect the T4 DNA polymerase to the DNA-binding nickase protein.
  • the T4 DNA polymerase includes at least one mutation such as, for example, a Y320A mutation, an L412M mutation, an I50L mutation, a G255S mutation, or any combination thereof.
  • the at least one mutation can be addition of an sso7d DNA binding domain at a C-terminus of the T4 DNA polymerase, optionally wherein the sso7d DNA binding domain can be an E12L mutation, a K35L mutation, or both.
  • the Bst DNA polymerase LF includes at least one fusion domain.
  • the at least one fusion domain includes an actin-binding protein such as, for example, villin headpiece.
  • the fusion domain is fused to an N-terminus of the Bst DNA polymerase LF.
  • the Bst DNA polymerase LF comprises a polymerase domain mutation such as, for example, a T493N mutation, an A552G mutation, an S371D mutation, or any combination thereof.
  • the polymerase domain mutation imparts increased thermostability relative to a wild type Bst DNA polymerase LF.
  • the villin headpiece includes at least one additional mutation such as, for example, N31R, N39K, E43K, A20K, or any combination thereof.
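Point mutations such as Y320A or S371D follow the usual wild-type residue, position, new residue notation. The sketch below parses such labels and applies them to a protein sequence while checking that the expected wild-type residue is present. The helper names `parse_mutation` and `apply_mutations` are hypothetical, the sequence is a toy string rather than any polymerase, and 1-based numbering against the supplied sequence is an assumption, since the numbering reference is not stated here.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label such as 'Y320A' into ('Y', 320, 'A').

    Stray spaces (for example 'S371 D') are tolerated and removed.
    """
    match = _MUTATION.match(label.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutations(protein: str, labels: list[str]) -> str:
    """Apply substitutions in order, verifying each wild-type residue first."""
    residues = list(protein)
    for label in labels:
        wild_type, position, replacement = parse_mutation(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: found {residues[position - 1]} at that position")
        residues[position - 1] = replacement
    return "".join(residues)


if __name__ == "__main__":
    print(parse_mutation("Y320A"))
    print(apply_mutations("MKTAYIAK", ["Y5A"]))  # toy sequence, not a real enzyme
```

Checking the wild-type residue before substituting catches off-by-one numbering errors, which are a common source of mistakes when mutation labels are copied between constructs.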
  • the DNA polymerase can be optimized to operate at a temperature of from about 30 °C to about 40 °C, or at about 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or about 40 °C.
  • the modified sequence for insertion includes at least one mutation such as, for example, an insertion, a deletion, a base substitution, or any combination thereof.
  • a method for site-specific modification of a double-stranded target DNA sequence in a cell including contacting the double-stranded target DNA sequence with the disclosed system, wherein the core sequence binds the DNA-binding nickase protein; wherein the DNA-binding nickase protein nicks the second strand of the double-stranded target DNA sequence to form a free 3' end; wherein the primer sequence anneals with a complementary region on the second strand of the double-stranded target DNA; and wherein the DNA polymerase synthesizes a single strand of DNA encoded by the template from the free 3' end.
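The nick, anneal, and extend steps of the method can be mimicked with plain string operations. This is a deliberately simplified sketch: the function name `simulate_edit`, its arguments, and the toy sequence are hypothetical, and the cellular resolution of the newly synthesized flap is reduced to overwriting an equal-length stretch downstream of the nick, so insertions and deletions are not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def simulate_edit(second_strand: str, nick_index: int,
                  primer: str, template: str) -> str:
    """Return the second strand after a substitution-type edit.

    second_strand -- strand that is nicked, written 5'->3'
    nick_index    -- number of bases 5' of the nick
    primer        -- guide primer sequence, 5'->3' (DNA letters for simplicity)
    template      -- guide template, 5'->3'
    """
    # Step 1: the nick leaves a free 3' end at the end of `upstream`.
    upstream = second_strand[:nick_index]
    downstream = second_strand[nick_index:]
    # Step 2: the primer must pair with the bases just 5' of the nick.
    if not primer or not upstream.endswith(reverse_complement(primer)):
        raise ValueError("primer does not anneal next to the nick")
    # Step 3: the polymerase extends the 3' end, copying the template.
    flap = reverse_complement(template)
    # Assumed simplification: the flap replaces the same number of bases.
    return upstream + flap + downstream[len(flap):]


if __name__ == "__main__":
    print(simulate_edit("AAACCCGGGTTTACGTACGT", 12, "AAACCC", "TACGA"))
```

Running the example changes a single base immediately 3' of the nick in the toy sequence, which makes it easy to confirm by eye that the template was read in the intended orientation.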
  • in a further aspect, in the disclosed method, the cell can be a prokaryotic cell or a eukaryotic cell such as, for example, a mammalian cell, and can be a dividing cell or a non-dividing cell.
  • the system is introduced to the cell using nucleofection (electroporation) or another transfection method.
  • performing the method results in less than 10% off-target editing in the cell’s genome.
  • the fusion protein and the chimeric guide nucleic acid are present in amounts effective to modify genomic DNA in the cell.
  • the method does not introduce double strand breaks into cellular DNA.
  • performing the method results in incorporation of the modified sequence in at least 25% of a population of cells contacted with the system.
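The two numeric criteria above (less than 10% off-target editing, incorporation in at least 25% of cells) can be checked from raw counts. The sketch below is illustrative only: the function names, the choice of cell counts for incorporation and sequencing reads for off-target editing, and the example numbers are assumptions, since the measurement method is not specified here.

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage, rejecting impossible inputs."""
    if total <= 0 or not 0 <= part <= total:
        raise ValueError("counts must satisfy 0 <= part <= total and total > 0")
    return 100.0 * part / total


def meets_thresholds(edited_cells: int, total_cells: int,
                     off_target_reads: int, total_reads: int,
                     min_incorporation: float = 25.0,
                     max_off_target: float = 10.0) -> bool:
    """True if incorporation is at least 25% and off-target editing is below 10%."""
    incorporation = percent(edited_cells, total_cells)
    off_target = percent(off_target_reads, total_reads)
    return incorporation >= min_incorporation and off_target < max_off_target


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(meets_thresholds(edited_cells=30, total_cells=100,
                           off_target_reads=5, total_reads=100))
```

The comparison operators follow the wording of the criteria: "at least 25%" is inclusive, while "less than 10%" is strict.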
  • Also disclosed herein is a cell including at least one genomic modification introduced by the disclosed method.
  • fusion protein including at least the following:
  • the DNA-binding nickase protein can be a Cas9 nickase, a Cas12i1 nickase, a Cas12a nickase, an ortholog thereof, or any combination thereof, and can be selected from nickases such as, for example, a Streptococcus pyogenes Cas9, a Staphylococcus aureus Cas9, a Neisseria meningitidis Cas9, a Streptococcus thermophilus Cas9, an Actinobacillus minor Cas9, an Actinobacillus pleuropneumoniae Cas9, an Actinobacillus seminis Cas9, an Actinobacillus succinogenes Cas9, a Bergeriella denitrificans Cas9, a Conservatibacter flavescens Cas9, a Gallibacterium anatis Cas9, a Ha
  • the DNA polymerase can be selected from E. coli DNA polymerase I (Ecol), T4 DNA polymerase, T5 DNA polymerase, T7 DNA polymerase, thioredoxin binding domain of T7 DNA polymerase (T7/Trx), Klenow fragment DNA polymerase, Φ29 DNA polymerase, Bsu DNA polymerase, Bst DNA polymerase large fragment (LF), Bst DNA polymerase full length (F), Pfu DNA polymerase, Pwo DNA polymerase, Stoffel DNA polymerase, or any combination thereof.
  • the T4 DNA polymerase can be located at a C-terminus or an N-terminus of the DNA-binding nickase protein and a 33 amino acid linker can connect the T4 DNA polymerase to the DNA-binding nickase protein.
  • the T4 DNA polymerase includes at least one mutation such as, for example, a Y320A mutation, an L412M mutation, an I50L mutation, a G255S mutation, or any combination thereof.
  • the at least one mutation can be addition of an sso7d DNA binding domain at a C-terminus of the T4 DNA polymerase, optionally wherein the sso7d DNA binding domain can be an E12L mutation, a K35L mutation, or both.
  • the Bst DNA polymerase LF includes at least one fusion domain.
  • the at least one fusion domain includes an actin-binding protein such as, for example, villin headpiece.
  • the fusion domain is fused to an N-terminus of the Bst DNA polymerase LF.
  • the Bst DNA polymerase LF comprises a polymerase domain mutation such as, for example, a T493N mutation, an A552G mutation, an S371D mutation, or any combination thereof.
  • the polymerase domain mutation imparts increased thermostability relative to a wild type Bst DNA polymerase LF.
  • the villin headpiece includes at least one additional mutation such as, for example, N31R, N39K, E43K, A20K, or any combination thereof.
  • ratios, concentrations, amounts, and other numerical data can be expressed herein in a range format. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. It is also understood that there are a number of values disclosed herein, and that each value is also herein disclosed as “about” that particular value in addition to the value itself. For example, if the value “10” is disclosed, then “about 10” is also disclosed. Ranges can be expressed herein as from “about” one particular value, and/or to “about” another particular value. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms a further aspect. For example, if the value “about 10” is disclosed, then “10” is also disclosed.
  • a further aspect includes from the one particular value and/or to the other particular value.
  • ranges excluding either or both of those included limits are also included in the disclosure, e.g. the phrase “x to y” includes the range from ‘x’ to ‘y’ as well as the range greater than ‘x’ and less than ‘y’.
  • the range can also be expressed as an upper limit, e.g.
  • a numerical range of “about 0.1% to 5%” should be interpreted to include not only the explicitly recited values of about 0.1% to about 5%, but also include individual values (e.g., about 1%, about 2%, about 3%, and about 4%) and the sub-ranges (e.g., about 0.5% to about 1.1%; about 0.5% to about 2.4%; about 0.5% to about 3.2%, and about 0.5% to about 4.4%, and other possible sub-ranges) within the indicated range.
  • the terms “about,” “approximate,” “at or about,” and “substantially” mean that the amount or value in question can be the exact value or a value that provides equivalent results or effects as recited in the claims or taught herein. That is, it is understood that amounts, sizes, formulations, parameters, and other quantities and characteristics are not and need not be exact, but may be approximate and/or larger or smaller, as desired, reflecting tolerances, conversion factors, rounding off, measurement error and the like, and other factors known to those of skill in the art such that equivalent results or effects are obtained. In some circumstances, the value that provides equivalent results or effects cannot be reasonably determined.
  • nucleic acid can be used interchangeably herein and can generally refer to a string of at least two base-sugar-phosphate combinations and refers to, among others, single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single- and double-stranded RNA, and RNA that is a mixture of single- and double-stranded regions, and hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or a mixture of single- and double-stranded regions.
  • polynucleotide as used herein can refer to triple-stranded regions comprising RNA or DNA or both RNA and DNA.
  • the strands in such regions can be from the same molecule or from different molecules.
  • the regions may include all of one or more of the molecules, but more typically involve only a region of some of the molecules.
  • One of the molecules of a triple-helical region often is an oligonucleotide.
  • “Polynucleotide” and “nucleic acids” also encompass such chemically, enzymatically or metabolically modified forms of polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including simple and complex cells, inter alia.
  • polynucleotide as used herein can include DNAs or RNAs as described herein that contain one or more modified bases.
  • DNAs or RNAs including unusual bases, such as inosine, or modified bases, such as tritylated bases, to name just two examples are polynucleotides as the term is used herein.
  • “Polynucleotide”, “nucleotide sequences” and “nucleic acids” also include PNAs (peptide nucleic acids), phosphorothioates, and other variants of the phosphate backbone of native nucleic acids. Natural nucleic acids have a phosphate backbone; artificial nucleic acids can contain other types of backbones, but contain the same bases.
  • nucleic acids or RNAs with backbones modified for stability or for other reasons are “nucleic acids” or “polynucleotides” as that term is intended herein.
  • nucleic acid sequence and “oligonucleotide” also encompasses a nucleic acid and polynucleotide as defined elsewhere herein.
  • deoxyribonucleic acid (DNA) and “ribonucleic acid (RNA)” can generally refer to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA.
  • RNA can be in the form of non-coding RNA such as tRNA (transfer RNA), snRNA (small nuclear RNA), rRNA (ribosomal RNA), anti-sense RNA, RNAi (RNA interference construct), siRNA (short interfering RNA), microRNA (miRNA), or ribozymes, aptamers, guide RNA (gRNA), CRISPR RNA (crRNA), trans-activating crRNA (tracrRNA), or coding mRNA (messenger RNA).
  • cDNA refers to a DNA sequence that is complementary to an RNA transcript in a cell. It is a man-made molecule. Typically, cDNA is made in vitro by an enzyme called reverse-transcriptase using RNA transcripts as templates.
  • gene can refer to a hereditary unit corresponding to a sequence of DNA that occupies a specific location on a chromosome and that contains the genetic instruction for a characteristic(s) or trait(s) in an organism.
  • the term gene can refer to translated and/or untranslated regions of a genome.
  • Gene can refer to the specific sequence of DNA that is transcribed into an RNA transcript that can be translated into a polypeptide or be a catalytic RNA molecule, including but not limited to, tRNA, siRNA, piRNA, miRNA, long-non-coding RNA and shRNA.
  • corresponding to refers to the underlying biological relationship between these different molecules.
  • operatively “corresponding to” can direct them to determine the possible underlying and/or resulting sequences of other molecules given the sequence of any other molecule which has a similar biological relationship with these molecules. For example, from a DNA sequence an RNA sequence can be determined and from an RNA sequence a cDNA sequence can be determined.
  • exogenous DNA or “exogenous nucleic acid sequence” or “exogenous polynucleotide” refers to a nucleic acid sequence that was introduced into a cell, organism, or organelle via transfection.
  • Exogenous nucleic acids originate from an external source, for instance, the exogenous nucleic acid may be from another cell or organism and/or it may be synthetic and/or recombinant. While an exogenous nucleic acid sometimes originates from a different organism or species, it may also originate from the same species (e.g., an extra copy or recombinant form of a nucleic acid that is introduced into a cell or organism in addition to or as a replacement for the naturally occurring nucleic acid). Typically, the introduced exogenous sequence is a recombinant sequence.
  • isolated means separated from constituents, cellular and otherwise, in which the polynucleotide, peptide, polypeptide, protein, antibody, or fragments thereof, are normally associated with in nature.
  • variant can refer to a polynucleotide or polypeptide that differs from a reference polynucleotide or polypeptide, but retains essential and/or characteristic properties (structural and/or functional) of the reference polynucleotide or polypeptide.
  • a typical variant of a polypeptide differs in amino acid sequence from another, reference polypeptide. The differences can be limited so that the sequences of the reference polypeptide and the variant are closely similar overall and, in many regions, identical.
  • a variant and reference polypeptide may differ in nucleic or amino acid sequence by one or more modifications at the sequence level or post-transcriptional or post-translational modifications (e.g., substitutions, additions, deletions, methylation, glycosylations, etc.).
  • a substituted nucleic acid may or may not be an unmodified nucleic acid of adenine, thymine, guanine, cytosine, uracil, including any chemically, enzymatically or metabolically modified forms of these or other nucleotides.
  • a substituted amino acid residue may or may not be one encoded by the genetic code.
  • a variant of a polypeptide may be naturally occurring such as an allelic variant, or it may be a variant that is not known to occur naturally. “Variant” includes functional and structural variants.
  • gene refers to a hereditary unit corresponding to a sequence of DNA that occupies a specific location on a chromosome and that contains the genetic instruction for a characteristic(s) or trait(s) in an organism.
  • synthetic gene can refer to a recombinant gene comprising one or more coding sequences for a protein of interest, or a synthetically purified protein that is not naturally occurring in its purified state.
  • As used herein, the terms “guide polynucleotide,” “guide sequence,” or “guide RNA” (gRNA or sgRNA) can refer to any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a CRISPR complex to the target sequence.
  • the degree of complementarity between a guide polynucleotide and its corresponding target sequence when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
  • Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g. the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
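The degree of complementarity described above can be illustrated with a small sketch. This is not one of the alignment algorithms named in the text: it is a simplified, ungapped sliding-window comparison, and the function name and example sequences are hypothetical. It assumes the guide is written 5'→3' and the target is the 5'→3' DNA strand that the guide hybridizes to.

```python
# Simplified, ungapped estimate of guide/target complementarity.
# Assumption: a real workflow would use a proper aligner (e.g. Smith-Waterman);
# this sketch only slides the guide along the target without gaps.

# Watson-Crick partner expected on a DNA target for each guide base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def percent_complementarity(guide: str, target: str) -> float:
    """Return the best ungapped percent complementarity (0-100)."""
    guide = guide.upper()
    target = target.upper()
    # The guide pairs antiparallel, so the matching DNA stretch read 5'->3'
    # is the reverse complement of the guide.
    expected = "".join(COMPLEMENT[base] for base in reversed(guide))
    best = 0
    for offset in range(len(target) - len(expected) + 1):
        window = target[offset:offset + len(expected)]
        matches = sum(1 for a, b in zip(expected, window) if a == b)
        best = max(best, matches)
    return 100.0 * best / len(expected)


if __name__ == "__main__":
    # Hypothetical 4-nt guide against two hypothetical targets.
    print(percent_complementarity("GACU", "TTAGTCAA"))
    print(percent_complementarity("GACU", "TTAGTGAA"))
```

Because gaps are not considered, this value can underestimate the complementarity that an optimal gapped alignment would report.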
  • a guide polynucleotide (also referred to herein as a guide sequence and includes single guide sequences (sgRNA)) can be about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, 90, 100, 110, 112, 115, 120, 130, 140, or more nucleotides in length.
  • the guide polynucleotide can include a nucleotide sequence that is complementary to a target DNA sequence. This portion of the guide sequence can be referred to as the complementary region of the guide RNA or the CRISPR RNA (crRNA).
  • crRNA/tracrRNA can also work with the disclosed approach. Further in this aspect, since crRNA is shorter, it may be easier to incorporate the desired DNA modifications to the crRNAs by ligation or synthesis compared to incorporation into sgRNAs. In a further aspect, and without wishing to be bound by theory, tracrRNAs are generally universal and work with any sequence of crRNAs and so the crRNA/tracrRNA system may be more economical for use.
  • the guide sequence can also include one or more miRNA target sequences coupled to the 3’ end of the guide sequence.
  • the guide sequence can include one or more MS2 RNA aptamers incorporated within the portion of the guide strand that is not the complementary portion.
  • guide sequence can include any specially modified guide sequences, including but not limited to those configured for use in synergistic activation mediator (SAM) implemented CRISPR or suppression.
  • a guide polynucleotide can be less than about 150, 125, 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. The ability of a guide polynucleotide to direct sequence-specific binding of a CRISPR complex to a target sequence may be assessed by any suitable assay.
  • the components of a CRISPR system sufficient to form a CRISPR complex may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of the CRISPR sequence, followed by an assessment of preferential cleavage within the target sequence.
  • cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a CRISPR complex, including the guide polynucleotide to be tested and a control guide polynucleotide different from the test guide polynucleotide, and comparing binding or rate of cleavage at the target sequence between the test and control guide polynucleotide reactions.
  • Other assays are possible, and will occur to those skilled in the art.
  • polypeptides or “proteins” refers to amino acid residue sequences. Those sequences are written left to right in the direction from the amino to the carboxy terminus. In accordance with standard nomenclature, amino acid residue sequences are denominated by either a three letter or a single letter code as indicated as follows: Alanine (Ala, A), Arginine (Arg, R), Asparagine (Asn, N), Aspartic Acid (Asp, D), Cysteine (Cys, C), Glutamine (Gln, Q), Glutamic Acid (Glu, E), Glycine (Gly, G), Histidine (His, H), Isoleucine (Ile, I), Leucine (Leu, L), Lysine (Lys, K), Methionine (Met, M), Phenylalanine (Phe, F), Proline (Pro, P), Serine (Ser, S), Threonine (Thr, T), Tryptophan (Tr
  • Protein and “Polypeptide” can refer to a molecule composed of one or more chains of amino acids in a specific order.
  • the term protein is used interchangeably with “polypeptide.” The order is determined by the base sequence of nucleotides in the gene coding for the protein. Proteins can be involved in the structure, function, and regulation of various functions.
  • identity is a relationship between two or more polypeptide or polynucleotide sequences, as determined by comparing the sequences. In the art, “identity” also refers to the degree of sequence relatedness between polypeptide as determined by the match between strings of such sequences. “Identity” can be readily calculated by known methods, including, but not limited to, those described in Computational Molecular Biology, Lesk, A. M., Ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., Ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A. M., and Griffin, H.
  • heterologous refers to compounds, molecules, nucleotide sequences (including genes), and polypeptide sequences (including peptides and proteins) that are different in both activity (function) and sequence or chemical structure.
  • heterologous can also refer to a gene or gene product that is from a different organism. For example, a human GTP cyclohydrolase or a synthase can be said to be heterologous when expressed in yeast.
  • homolog refers to a polypeptide sequence that shares a threshold level of similarity and/or identity as determined by alignment of matching amino acids. Two or more polypeptides determined to be homologs are said to be homologs. Homology is a qualitative term that describes the relationship between polypeptide sequences that is based upon the quantitative similarity.
  • paralog refers to a homolog produced via gene duplication of a gene.
  • paralogs are homologs that result from divergent evolution from a common ancestral gene.
  • orthologs refers to homologs produced by speciation followed by divergence of sequence but not activity in separate species. When speciation follows duplication and one homolog sorts with one species and the other copy sorts with the other species, subsequent divergence of the duplicated sequence is associated with one or the other species. Such species specific homologs are referred to herein as orthologs.
  • similarity is a quantitative term that defines the degree of sequence match between two compared polypeptide sequences.
  • organism refers to any living entity comprised of at least one cell.
  • a living organism can be as simple as, for example, a single isolated eukaryotic cell or cultured cell or cell line, or as complex as a mammal, including a human being, and animals (e.g., vertebrates, amphibians, fish, mammals, e.g., cats, dogs, horses, pigs, cows, sheep, rodents, rabbits, squirrels, bears, primates (e.g., chimpanzees, gorillas, and humans).
  • the term “recombinant” or “engineered” can generally refer to a non-naturally occurring nucleic acid, nucleic acid construct, or polypeptide.
  • Such non-naturally occurring nucleic acids may include natural nucleic acids that have been modified, for example that have deletions, substitutions, inversions, insertions, etc., and/or combinations of nucleic acid sequences of different origin that are joined using molecular biology technologies (e.g., a nucleic acid sequence encoding a fusion protein (e.g., a protein or polypeptide formed from the combination of two different proteins or protein fragments), the combination of a nucleic acid encoding a polypeptide to a promoter sequence, where the coding sequence and promoter sequence are from different sources or otherwise do not typically occur together naturally (e.g., a nucleic acid and a constitutive promoter), etc.).
  • Recombinant or engineered can also refer to the polypeptide encoded by the recombinant nucle
  • cell As used herein, “cell,” “cell line,” and “cell culture” include progeny. It is also understood that all progeny may not be precisely identical in DNA content, due to deliberate or inadvertent mutations. Variant progeny that have the same function or biological property, as screened for in the originally transformed cell, are included.
  • culturing refers to maintaining cells under conditions in which they can proliferate and avoid senescence as a group of cells. “Culturing” can also include conditions in which the cells also or alternatively differentiate.
  • the term “specific binding” or “preferential binding” can refer to non-covalent physical association of a first and a second moiety wherein the association between the first and second moieties is at least 2 times as strong, at least 5 times as strong as, at least 10 times as strong as, at least 50 times as strong as, at least 100 times as strong as, or stronger than the association of either moiety with most or all other moieties present in the environment in which binding occurs.
  • Binding of two or more entities may be considered specific if the equilibrium dissociation constant, Kd, is 10⁻³ M or less, 10⁻⁴ M or less, 10⁻⁵ M or less, 10⁻⁶ M or less, 10⁻⁷ M or less, 10⁻⁸ M or less, 10⁻⁹ M or less, 10⁻¹⁰ M or less, 10⁻¹¹ M or less, or 10⁻¹² M or less under the conditions employed, e.g., under physiological conditions such as those inside a cell or consistent with cell survival.
  • specific binding can be accomplished by a plurality of weaker interactions (e.g., a plurality of individual interactions, wherein each individual interaction is characterized by a Kd of greater than 10⁻³ M).
  • specific binding which can be referred to as “molecular recognition,” is a saturable binding interaction between two entities that is dependent on complementary orientation of functional groups on each entity.
  • specific binding interactions include primer-polynucleotide interaction, aptamer-aptamer target interactions, antibody-antigen interactions, avidin-biotin interactions, ligand-receptor interactions, metal-chelate interactions, hybridization between complementary nucleic acids, etc.
  • atmospheres referred to herein are based on atmospheric pressure (i.e. one atmosphere) and temperatures are ambient.
  • Cas9-polymerase fusion proteins were constructed using techniques known in the art and purified using either Ni-NTA agarose or cation exchange resins, then separated by electrophoresis to determine the size of the constructs and successful fusion (see FIGs. 1A-1C). Molecular weights of selected fusion proteins are shown in Table 1:
  • FIGs. 5A-5B show a comparison of a standard prime editing technique (FIG. 5A) and a disclosed genome editing technique (Cas9 nickase fused with T4 DNA polymerase; FIG. 5B). Quantification of editing efficiencies for 6 different DNA polymerases fused to Cas9 are shown in FIG. 6.
  • nCas9 fusion proteins paired with a variety of wild-type polymerases from viral or bacterial origins (FIG. 7A) were first constructed. A total of 13 DNAPs with diverse properties such as thermostability, proofreading activity, processivity, and size were selected and screened (FIG. 7B). Each construct was expressed in E. coli and purified for ribonucleoprotein delivery.
  • HEK293T-based reporter cell line containing an open reading frame with a green fluorescence protein (GFP) upstream of a red fluorescence protein (mCherry) was generated.
  • a cpegRNA was designed to target the constitutively expressed GFP-mCherry reporter gene containing the premature termination codon (PTC).
  • the cpegRNA comprises a 20-nt RNA targeting sequence, a guide RNA scaffold, and a 3’-end DNA extension sequence containing a primer binding site (PBS) and a DNAP template encoding the desired corrections (FIG. 7C).
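The cpegRNA layout described above (RNA targeting sequence, RNA scaffold, then a 3’ DNA extension) can be sketched as a small data structure. This is an illustrative model only: the class name, the placeholder sequences, and the ordering of template before PBS within the DNA extension (assumed here by analogy with conventional pegRNAs) are assumptions, not details taken from the disclosure.

```python
from dataclasses import dataclass

RNA_BASES = set("ACGU")
DNA_BASES = set("ACGT")


@dataclass(frozen=True)
class CpegRNA:
    """Hypothetical model of a chimeric RNA/DNA prime-editing-like guide."""

    spacer_rna: str    # 20-nt RNA targeting sequence
    scaffold_rna: str  # guide RNA scaffold (RNA)
    template_dna: str  # DNAP template encoding the desired correction (DNA)
    pbs_dna: str       # primer binding site (DNA)

    def __post_init__(self):
        if len(self.spacer_rna) != 20:
            raise ValueError("spacer_rna must be 20 nt")
        for name, seq, alphabet in (
            ("spacer_rna", self.spacer_rna, RNA_BASES),
            ("scaffold_rna", self.scaffold_rna, RNA_BASES),
            ("template_dna", self.template_dna, DNA_BASES),
            ("pbs_dna", self.pbs_dna, DNA_BASES),
        ):
            if not seq or set(seq) - alphabet:
                raise ValueError(f"{name} contains bases outside its alphabet")

    def segments(self):
        """Return (chemistry, sequence) pairs in 5'->3' order.

        Assumption: template precedes PBS in the 3' DNA extension.
        """
        return [
            ("RNA", self.spacer_rna + self.scaffold_rna),
            ("DNA", self.template_dna + self.pbs_dna),
        ]

    def sequence(self) -> str:
        return "".join(seq for _, seq in self.segments())


if __name__ == "__main__":
    # All sequences below are placeholders, not sequences from the disclosure.
    guide = CpegRNA("ACGU" * 5, "GUUUUAGAGC", "TCTGCCATCA", "CGTGCTCAGTCTG")
    for chemistry, seq in guide.segments():
        print(chemistry, len(seq), seq)
```

Keeping the RNA and DNA portions as separate segments mirrors the fact that the two chemistries must be joined by chemical synthesis or ligation rather than transcribed as a single molecule.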
  • Each CODE candidate was complexed with the cpegRNA prior to delivery into HEK293T cells via nucleofection and edit efficiency was quantified by the percentage of mCherry positive cells (FIG. 7D).
  • T4 DNAP and Bst, large fragment DNAP were selected.
  • T4 DNAP is a mesophilic polymerase that possesses strong 3’-5’ exonuclease activity.
  • How the T4 DNAP location impacted the CODE-T4 editing efficiency was studied by positioning the T4 DNAP either on the C-terminus (CODE-T4v1) or the N-terminus (CODE-T4v2) of nCas9 connected by a 33-amino acid linker (FIG. 8A). Not much difference was observed in the percentage of mCherry activation between the N-terminal and C-terminal fusion constructs.
  • Next, how the 3’-5’ exonuclease activity of T4 DNAP affected the performance of CODE (FIG. 8B) was investigated.
  • a Y320A mutation, which has been shown to diminish the polymerase exonuclease activity by 50-fold, was installed on the T4 DNAP of the CODE-T4v1 editor (yielding CODE-T4v3). Notably, this single mutation increased the efficiency of mCherry activation 2.4-fold compared to the wild-type editor (FIG. 8B).
  • a T4 Gene 32 Protein (gp32) was installed on the N-terminus of T4 DNA polymerase to generate a CODE-T4v4 editor (FIGs. 8A- 8B).
  • T4 gp32 is a single-stranded binding protein (SSB) that is crucial for T4 replication and repair.
  • reduced activity was observed compared to CODE-T4v3, possibly due to gp32 SSB being inactive in a fusion format or being sterically hindered by nCas9 and T4 DNAP.
  • an Sso7d DNA binding domain was then inserted at the C-terminus of the T4 DNAP of CODE-T4v1 and CODE-T4v3 to create CODE-T4v5 and CODE-T4v6.
  • Sso7d, a DNA binding protein derived from Sulfolobus solfataricus, is known to greatly enhance the processivity of DNA polymerases. It was investigated whether this binding domain could improve the CODE-T4 editors. Interestingly, no significant increase in mCherry activation by these two CODEs compared to the original CODE-T4v1 was observed.
  • the CODE-T4v6 editor which possesses a Y320A mutation, exhibited a similar editing efficiency to the CODE-T4v1.
  • Bst-LF DNAP is a thermophilic DNA polymerase that has strong strand displacement activity; therefore, Bst-LF is often used in isothermal amplification technologies such as loop-mediated isothermal amplification (LAMP).
  • wild-type Bst-LF is optimally active at high temperatures in amplification reactions
  • moderate editing efficiency was observed by Bst-LF and full length Bst DNAP in HEK293T cells.
  • prime editor 2 (PE2), which utilizes an engineered M-MLV variant containing five point mutations that increase the thermostability and processivity, was shown to achieve a dramatic increase in prime editing efficiency compared to prime editor 1 (PE1), which contains a wild-type M-MLV reverse transcriptase.
  • Bst-LF DNAP is naturally a thermophilic enzyme; it was reasoned that enhancing its thermostability further might improve its overall performance inside cells. Multiple approaches were therefore explored to engineer Bst-LF mutants within the CODE systems to see if increasing the thermostability of Bst-LF DNAP would increase its activity in cells.
  • the Bst-LF DNAP variant referred to as Br512
  • the Bst-LF DNAP variant consists of a modified 47 amino acid actin-binding protein called villin headpiece fused to the N-terminus of the Bst-LF.
  • the fusion of the villin headpiece to Bst-LF DNAP is hypothesized to improve protein folding and increase processivity via stabilization of the DNA/protein complex.
  • the Br512 DNAP was employed in CODE (referred to as CODE-Bstv3), and a nearly 3-fold increase in mCherry-positive cells compared to wild-type Bst-LF was observed.
  • Br512g3.1 and Br512g3.2 variants differ from Br512 in the villin headpiece, where point mutations were rationally designed to supercharge and stabilize the domain (referred to as SC-vHP47).
  • the Br512g3.1 variant bears 3 mutations (N31R, N39K, E43K) on the SC-vHP47, whereas the Br512g3.2 bears 4 mutations (A20K, N31R, N39K, E43K).
  • CODE-Bstv5 and CODE-Bstv6 resulted in 23.6% and 33.4% efficiency installing the desired edits, approximately 5-fold and 7-fold increases in mCherry activation compared to the original CODE-Bstv1 system.
  • a pegRNA consists of 100% RNA bases and therefore can be synthesized either by enzymatic or chemical synthesis reactions. This advantage provides flexibility to deliver prime editing systems into cells as well as animal models. Oftentimes, it is convenient to co-deliver plasmids encoding prime editors and pegRNAs driven by a U6 promoter. Since the cpegRNA is a chimeric entity consisting of both RNA and DNA bases, it cannot be synthesized by naturally occurring enzymes. Instead, cpegRNAs must be chemically synthesized, which poses challenges for delivery approaches and associated synthesis costs.
  • In-house synthesized cpegRNAs (FIG. 9B) were tested with CODE-Bstv6, and improved performance with 43.7% mCherry activation was noted (FIGs. 9C-9D).
  • CODEMax is a combination of the engineered CODE-Bstv6 and a mutated nCas9 variant (R221K and N394K).
  • the initial screening of CODE candidates (FIG. 7E) showed better performance of a chimeric oligonucleotide-directed editor using full-length Bst DNAP compared to that with Bst-LF.
  • CODEMax and CODEMax(exo+) were tested by targeting multiple genomic regions with a variety of edit types such as base conversion and transversion, short insertion, and short deletion.
  • Enhanced editing efficiency of CODEMax and CODEMax(exo+) was observed compared to PE2 and PEMax at several loci such as EMX1, FANCF, SRD5A3 and DNMT1, with HEK3 and MECP2 being exceptions (FIGs. 10C-10H, 13A-15). It should be noted that these data were head-to-head comparisons of synthetic pegRNA/cpegRNA and protein-encoding plasmid via transfection.
  • ngRNA sequences are found in Tables 5-6:
  • the 5’-3’ exonuclease activity of the Bst DNAP supports the degradation of the 5’-flap which leverages the incorporation of the newly polymerase-mediated extension of the 3’-flap into the genome, favoring the edit-incorporating outcomes.
  • DNA polymerases are abundant and diverse across all three domains of life.
  • the specific properties of wild-type DNA polymerases that can be beneficial for prime editing-like systems include thermostability, processivity, proofreading ability, and 5’ to 3’ exonuclease activity.
  • further engineering of wild-type DNA polymerases can improve editing outcomes, but engineering efforts may also be directed towards specific applications. For example, a highly processive DNA polymerase may be required for longer insertions; however, for simpler edits, a polymerase with higher fidelity may be favored.
  • Doman et al. have shown that different reverse transcriptase proteins perform better depending on the edit type and location. Having a diverse toolbox of prime editors utilizing reverse transcriptase or DNA polymerase-based editors enables broad applications for genome engineering.
  • Example 9 Materials and Methods
  • CODE gene fragments were either obtained from Addgene or synthesized by Twist Biosciences.
  • Bacterial and mammalian expression plasmids for CODEs were cloned using InFusion® cloning (Takara Bio, Cat# 638948).
  • Q5 high-fidelity polymerase (New England Biolabs, Cat# M0491L) was used to amplify gene fragments and non-lentiviral backbone for cloning as well as genomic DNA for deep sequencing.
  • PrimeSTAR® GXL DNA Polymerase (Takara Bio, Cat# R050A) was used for amplification.
  • plasmids expressing CODEs were prepared using ZymoPure II Plasmid Midiprep kit (Zymo Research, Cat# D4201) and diluted down to 1 mg/µL prior to transfection.
  • SEQ ID NO. 72 provides an exemplary sequence for pCMV- CODEMax-P2A-eGFP and SEQ ID NO. 73 provides an exemplary sequence for pCMV- CODEMax(exo+)-P2A-eGFP.
  • pegRNAs, sgRNAs, and 5’ phosphorylated ssDNAs were purchased from IDT.
  • cpegRNAs were also purchased from IDT unless otherwise indicated that they were produced via ligation of 5’ phosphorylated ssDNA to sgRNAs.
  • T4 RNA Ligase I (New England Biolabs, Cat# M0204) as follows: 3.5 µL T4 RNA Ligase Buffer, 4.5 µL PEG8000, 2 µL T4 RNA Ligase I, 3 µL 10 mM ATP, 2 µL 100 µM sgRNA, 5 µL 100 mM 5’ phosphorylated ssDNA, 0.25 µL Murine RNase Inhibitor (New England Biolabs, Cat# M0314), and 14.75 µL water. Reaction volumes greater than 35 µL led to decreased ligation efficiencies.
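As a quick check of the ligation recipe above, the listed volumes can be summed and the final concentrations of ATP and sgRNA derived by simple dilution. The sketch below uses only the volumes and the two stock concentrations stated in the recipe; the variable and function names are illustrative. The ssDNA is left out of the concentration step because its stock concentration is reproduced here exactly as printed and is not interpreted further.

```python
# Volumes (in microliters) as listed in the ligation recipe.
COMPONENTS_UL = {
    "T4 RNA Ligase Buffer": 3.5,
    "PEG8000": 4.5,
    "T4 RNA Ligase I": 2.0,
    "10 mM ATP": 3.0,
    "100 uM sgRNA": 2.0,
    "5' phosphorylated ssDNA": 5.0,
    "Murine RNase Inhibitor": 0.25,
    "water": 14.75,
}


def total_volume_ul(components: dict) -> float:
    """Sum all component volumes in microliters."""
    return sum(components.values())


def final_concentration(stock: float, volume_ul: float, total_ul: float) -> float:
    """Dilution: C_final = C_stock * V_added / V_total (same unit as stock)."""
    return stock * volume_ul / total_ul


total = total_volume_ul(COMPONENTS_UL)
atp_mM = final_concentration(10.0, COMPONENTS_UL["10 mM ATP"], total)
sgrna_uM = final_concentration(100.0, COMPONENTS_UL["100 uM sgRNA"], total)

if __name__ == "__main__":
    print(f"total volume: {total} uL")
    print(f"final ATP: {atp_mM:.3f} mM")
    print(f"final sgRNA: {sgrna_uM:.3f} uM")
```

The summed volume matches the 35 µL ceiling mentioned in the text, above which ligation efficiency was reported to drop.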
  • the culture was then quickly cooled on ice for 10-15 minutes and induced with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) (Gold Biotechnology, Cat# I2481C100).
  • the culture was induced at 18°C for 5 hours followed by 26 °C for 14-18 hours.
  • the culture was induced at 18 °C for 16-18 hours.
  • the protein mixture was then passed through a 5 mL Hitrap Heparin HP column (Cytiva, Cat# 17040701) pre-equilibrated with Buffer A.
  • HEK293T and Lenti-X™ 293T were obtained from ATCC (CRL-3216) and Takara Bio (#632180), respectively.
  • U2OS cells were obtained from ATCC (HTB-96). The cells were tested for mycoplasma using MycoAlert® Mycoplasma Detection Kit (Lonza, Cat# LT07-118).
  • the cells were cultured and passaged in D10 medium containing DMEM high glucose with GlutaMAX™ supplement and pyruvate (Gibco, Cat# 10569044), 10% Fetal Bovine Serum (Gibco, Cat# A3160902), 1X Penicillin-Streptomycin (Gibco, Cat# 15140122), and 1X MEM non-essential amino acids (Gibco, Cat# 11140035). All cell lines were incubated at 37 °C and 5% CO2.
  • the medium was changed to D10 medium six hours later, and the cells were incubated for an additional 48-60 hours.
  • the cells were harvested and pelleted down via centrifugation at 4000 x g for 10 minutes at 4 °C.
  • the supernatant was passed through a 0.45 µm filter, aliquoted, and stored at -80 °C until use.
  • For lentiviral transduction, 5 × 10⁵ HEK293T cells were infected with multiple dilutions of viral supernatant via reverse transduction. Briefly, viral supernatant was added to a 6-well plate first. The cells were then counted and resuspended in D10 medium supplemented with 10 µg/mL of TransduceIT™ Transduction Reagent (Mirus Bio, Cat# MIR6620) and transferred to the wells pre-added with viral supernatant. The cells were incubated for 72 hours before flow cytometry sorting and/or antibiotic selection.
  • Ribonucleoprotein (RNP) complexes were delivered into HEK293T cells via a 4D-Nucleofector® X Unit (Lonza, Cat# AAF-1003X) in a strip format using SF Cell Line 4D-Nucleofector™ X Kit S (Lonza, Cat# V4XC-2032).
  • Purified PEs and CODEs were complexed with either pegRNA or cpegRNA to form RNP at a ratio of 50 pmol protein to 200 pmol peg/cpegRNA for 15 minutes at room temperature.
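The RNP complexing amounts above translate into pipetting volumes once stock concentrations are known. The following is a minimal sketch of that arithmetic; the stock concentrations (25 µM protein, 100 µM guide) are hypothetical values chosen for illustration and are not stated in the source. It relies on the identity that 1 µM equals 1 pmol/µL.

```python
PROTEIN_PMOL = 50.0   # from the text: 50 pmol editor protein
GUIDE_PMOL = 200.0    # from the text: 200 pmol pegRNA or cpegRNA


def volume_ul_for(amount_pmol: float, stock_uM: float) -> float:
    """Volume needed to deliver amount_pmol from a stock at stock_uM.

    1 uM = 1 pmol/uL, so volume (uL) = pmol / uM.
    """
    if stock_uM <= 0:
        raise ValueError("stock concentration must be positive")
    return amount_pmol / stock_uM


# Molar excess of guide over protein implied by the stated amounts.
guide_to_protein_ratio = GUIDE_PMOL / PROTEIN_PMOL

if __name__ == "__main__":
    # Hypothetical stock concentrations, for illustration only.
    print("protein volume (uL):", volume_ul_for(PROTEIN_PMOL, 25.0))
    print("guide volume (uL):", volume_ul_for(GUIDE_PMOL, 100.0))
    print("guide:protein molar ratio:", guide_to_protein_ratio)
```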
  • Around 2 × 10⁵ cells were resuspended in Lonza SF buffer, mixed with the RNP, and electroporated using the program ED-130. The mixture was then incubated at 37 °C and 5% CO2 for 10 minutes before adding to a 48-well plate pre-added with DMEM medium supplemented with 10% Fetal Bovine Serum (no antibiotic). Cells were harvested after 72 hours.
  • Plasmids encoding prime editors and chimeric editors and synthetic pegRNA/cpegRNA were delivered to HEK293T cells utilizing TransIT-X2® Dynamic Delivery System (Mirus Bio, Cat# MIR6000). 24 hours prior to transfection, cells were seeded at 2 × 10⁵ cells per well in 24-well plates. Immediately prior to transfection, the media was replaced with antibiotic-free D10 media.
  • Genomic DNA library preparation and targeted amplicon deep sequencing [0124] Genomic DNA was extracted from the HEK293T and U2OS cells using the QuickExtract™ DNA Extraction Solution (Biosearch Technologies) system according to the manufacturer’s instructions. The DNA was then amplified in the first round of PCR using Q5 DNA Polymerase (NEB), and Illumina barcodes were appended during a second PCR. The products were then gel extracted, pooled together, and loaded on an Illumina MiSeqDx using a MiSeq Reagent Nano Kit v2 (Illumina, Cat# MS-101-1001) according to the manufacturer’s protocol. CRISPResso2 was used to determine the percentage of precise editing and indels (ref. 35).
  • Quantification window was defined by the parameter “-qwc” spanning 10 base pairs upstream and downstream flanking the targeting sequence in cases where there was no nicking guide and 10 base pairs flanking the targeting sequence and the nicking guide in the case where a nicking guide was used.
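The quantification window described above can be sketched as a helper that builds a coordinate string for the “-qwc” parameter. This is an illustration under stated assumptions, not the authors' script: it assumes the CRISPResso2 convention of 0-based, inclusive "start-end" ranges joined by underscores, it treats the nicking-guide case as two separate windows (the text could also be read as one combined span), and the function names and example sequences are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def window(amplicon: str, site: str, flank: int = 10):
    """0-based inclusive (start, end) covering `site` plus `flank` bp each side.

    The site is searched on the given strand first, then as its reverse
    complement; the window is clipped to the amplicon boundaries.
    """
    idx = amplicon.find(site)
    if idx < 0:
        idx = amplicon.find(revcomp(site))
    if idx < 0:
        raise ValueError("site not found in amplicon")
    start = max(0, idx - flank)
    end = min(len(amplicon) - 1, idx + len(site) - 1 + flank)
    return start, end


def qwc_argument(amplicon: str, target: str, nicking_guide=None, flank: int = 10) -> str:
    """Build a coordinate string such as '5-44' or '5-44_55-79'."""
    ranges = [window(amplicon, target, flank)]
    if nicking_guide is not None:
        ranges.append(window(amplicon, nicking_guide, flank))
    return "_".join(f"{start}-{end}" for start, end in sorted(ranges))


if __name__ == "__main__":
    # Hypothetical amplicon and guide sequences, for illustration only.
    target = "GATTACAGATTACAGATTAC"
    nick = "TTGGTTGGTT"
    amplicon = "A" * 15 + target + "C" * 30 + nick + "A" * 5
    print(qwc_argument(amplicon, target))
    print(qwc_argument(amplicon, target, nicking_guide=nick))
```

The resulting string would be passed to CRISPResso2 as the value of “-qwc”; the coordinate convention should be confirmed against the CRISPResso2 version in use.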

Abstract

In one aspect, the disclosure relates to a gene modification tool that fuses a Cas protein with a polymerase. This DSB-independent approach utilizes DNA-RNA chimeric prime-editor-like guide RNAs (chimeric pegRNAs or cpegRNAs) to enable programmable DNA sequence replacement or excision at endogenous human genomic sites. The Cas9 and the RNA portion of the cpegRNA help with targeting a gene, while the DNA portion of the cpegRNA is used as a template by the polymerase for repair. Furthermore, the disclosed method can be used to introduce large DNA modifications in the genome.

Description

A PROGRAMMABLE GENE CORRECTION TECHNOLOGY USING CAS-POLYMERASE CONSTRUCTS
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 63/579,160 filed on August 28, 2023, and U.S. Provisional Application No. 63/600,216 filed on November 17, 2023, each of which is incorporated herein by reference in its entirety.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
[0002] This invention was made with government support under Grant No. R35 GM147788 awarded by the National Institutes of Health. The government has certain rights in the invention.
CROSS-REFERENCE TO SEQUENCE LISTING
[0003] This application contains a sequence listing filed in ST.26 format entitled “222112-2250_Sequence_Listing.xml” created on August 21, 2024, and having a file size of 100,401 bytes. The content of the sequence listing is incorporated herein in its entirety.
BACKGROUND
[0004] Genetic engineering has had a long and successful history in plant biology and crop breeding. Transgenic tools have been extensively used in model species to discover several genetic mechanisms, and have been extensively used in crops to create elite materials, exemplified by the development of virus-resistant papaya and insect-resistant maize plants, among many others. More recently, gene editing tools that make use of RNA-guided endonucleases (e.g., CRISPR/Cas9) were developed to increase the precision of the genetic transformation process. Genome-editing methods not only allow the opportunity to more precisely edit the genome, but also allow the genome to be edited without incorporating foreign DNA sequences, thus creating plants that are edited but non-transgenic.
[0005] The discovery of CRISPR/Cas (clustered regularly interspaced short palindromic repeats/CRISPR-associated) systems has provided new platforms and approaches to the field of genome engineering (i.e., editing), diagnostics, and development of other new advanced applications in biology, agriculture, biotechnology, diagnostics, and treatment of genetic disorders. CRISPR-based diagnostics have elevated nucleic acid detection in terms of sensitivity, specificity, and speed.
[0006] Originally derived from the adaptive immune systems of various bacterial species, the CRISPR/Cas technology works by introducing a CRISPR-associated (Cas) nuclease and a short guide RNA sequence that has a region complementary to a target sequence/site and acts as a guide by binding with Cas and directing the guide RNA/Cas complex to a target site. This complex then acts as molecular scissors to cut the target sequence at a specific site, creating double-stranded cuts in the target DNA or a single-stranded cut in the target RNA. This specific target recognition and cleavage is also referred to as “cis-cleavage”. Some types of Cas proteins, once bound to a target sequence, also become active for collateral, non-specific cleavage, called “trans-cleavage.” This trans-cleavage activity can be leveraged for detection technologies.
[0007] Current methods for targeted genomic modification, such as deletion, replacement, integration, or inversion of DNA sequences, often involve double-strand DNA breaks (DSBs), leading to unintended outcomes like indel mixtures and chromosomal abnormalities. Many researchers are developing methods to achieve genomic modifications without DSBs. The most popular method is Prime Editing (Liu and coworkers), which involves fusing a reverse transcriptase domain to Cas9 and using a longer pegRNA that contains a guide RNA and a repair template for the reverse transcriptase. Prime editing suffers from low efficiency and typically also introduces undesirable modifications to the genome. In some of those modifications, the entire pegRNA is inserted into the genome by the reverse transcriptase (RT).
[0008] Despite advances in genomic modification research, there is still a scarcity of methods that avoid the issues associated with double-stranded DNA breaks while achieving high efficiency incorporation of edits and avoiding undesired incorporations of RNA guides into the genome. These needs and other needs are satisfied by the present disclosure.
SUMMARY
[0009] In accordance with the purpose(s) of the present disclosure, as embodied and broadly described herein, the disclosure, in one aspect, relates to a gene modification tool that fuses a Cas protein with a polymerase. This double-strand break (DSB)-independent approach utilizes DNA-RNA chimeric prime-editor-like guide RNAs (chimeric pegRNAs or cpegRNAs) to enable programmable DNA sequence replacement or excision at endogenous human genomic sites. The Cas9 and the RNA portion of the cpegRNA help with targeting a gene, while the DNA portion of the cpegRNA is used as a template by the polymerase for repair. Furthermore, the disclosed method can be used to introduce large DNA modifications in the genome.
[0010] Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims. In addition, all optional and preferred features and modifications of the described embodiments are usable in all aspects of the disclosure taught herein. Furthermore, the individual features of the dependent claims, as well as all optional and preferred features and modifications of the described embodiments are combinable and interchangeable with one another.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] Many aspects of the present disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
[0012] FIGs. 1A-1C show electrophoretic separations of various Cas9-polymerase fusion proteins indicating successful construction of Cas9-reverse transcriptase (control; FIG. 1A); Cas9-T7 and Cas9-T4 polymerases (FIG. 1B); and Cas9-Klenow fragment DNA polymerase (FIG. 1C).
[0013] FIG. 2 shows a scheme for introducing eGFP and a mutated mCherry with a premature stop codon into a plasmid for testing the disclosed method.
[0014] FIG. 3 shows a scheme for transducing cells with the plasmid containing eGFP and mutated mCherry, selection of GFP positive cells, and using the disclosed method to remove the stop codon, allowing simultaneous expression of eGFP and mCherry in the cells.
[0015] FIGs. 4A-4F show successful transduction and optimization of HEK293T cells with the disclosed plasmids containing eGFP and mutated mCherry. FIG. 4A: 500 µL medium; FIG. 4B: 250 µL medium; FIG. 4C: 125 µL medium; FIG. 4D: 62.5 µL medium; FIG. 4E: 31.25 µL medium; FIG. 4F: 0 µL medium (control).
[0016] FIGs. 5A-5B show a comparison of a standard prime editing technique (FIG. 5A) and a disclosed genome editing technique (Cas9 nickase fused with T4 DNA polymerase; FIG. 5B).
[0017] FIG. 6 shows a comparison of a standard prime editing technique and a disclosed genome editing technique (Cas9 nickase fused with DNA polymerases). The percentage of gene modification is quantified by expression of mCherry.
[0018] FIGs. 7A-7E show an initial screening for activity of CODE candidates. (FIG. 7A) Schematic of the development of CODEs. CODEs consist of a nCas9-DNAP fusion protein and a chimeric pegRNA (cpeg) containing a guide RNA and ssDNA template with intended edits and primer binding site (PBS). (FIG. 7B) Architecture of bacterial expression plasmids of CODEs. The editor expression is driven by a T7 promoter, and a 6x histidine tag located at the C-terminus is employed for purification purposes. (FIG. 7C) Construction of HEK293T reporter cell line supporting base conversion via prime editing or CODE. (FIG. 7D) Schematic of the workflow for nucleofection of CODEs and cpegRNA into HEK293T reporter cell line. (FIG. 7E) Percentage of mCherry activation of CODE candidates and the control engineered PE2 system. Error bars represent ± SD, where n = 3 biological replicates.
[0019] FIGs. 8A-8G show engineering of T4 and Bst chimeric oligonucleotide-directed editors for improved editing. (FIG. 8A) Architecture of engineered CODE-T4 editors with domain rearrangement strategies. (FIG. 8B) Percentage of mCherry activation of the CODE-T4 variants in (FIG. 8A). (FIG. 8C) Engineering attempts to alter the T4 DNAP processivity and fidelity to create improved CODE-T4 variants with beneficial mutations. (FIGs. 8D-8E) Optimization of amino acid linker length between nCas9 and T4 DNAP in the fusion construct. (FIG. 8F) Engineering attempts to alter the Bst-LF DNAP thermostability to create improved CODE-Bst variants with beneficial mutations. (FIG. 8G) Percentage of mCherry activation of the CODE-T4 variants in (FIG. 8F). Error bars represent ± SD, where n = 3 biological replicates.
[0020] FIGs. 9A-9G show in-house synthesis of cpegRNA by ligation reaction. (FIG. 9A) Schematic of the T4 RNA Ligase I-mediated cpegRNA synthesis. (FIG. 9B) Representative denaturing gel showing successful ligation of sgRNA and ssDNA oligo to generate cpegRNA that targets the mCherry gene. (FIG. 9C) Visualization of HEK293T cells by fluorescence microscopy showing the mCherry activation by PE2 and engineered CODE-Bst variants with ligated cpegRNA. Cells were transfected with prime editors and CODEs 72 hours prior to imaging. (FIG. 9D) Quantification of mCherry activation in (FIG. 9C) via flow cytometry. (FIG. 9E) Schematic of the workflow for transfection of plasmid encoding human codon-optimized CODE and synthetic or ligated cpegRNA. (FIGs. 9F-9G) Efficiency of intended and unintended modifications of PE2 and CODE2 at MECP2 and DNMT1 loci, respectively. Error bars represent ± SD, where n = 3 biological replicates.
[0021] FIGs. 10A-10H show efficient chimeric oligonucleotide-directed editing of endogenous gene loci with CODEMax and CODEMax(exo+). (FIG. 10A) Architecture of plasmid encoding CODEMax and CODEMax(exo+). (FIG. 10B) AlphaFold 3 predicted structure of the CODEMax(exo+) in complex with cpegRNA and target dsDNA (ref. 36). (FIGs. 10C-10H) Endogenous targeting of CODEMax and CODEMax(exo+) at various gene loci in comparison with PE2 and PEMax. Error bars represent ± SD, where n = 3 technical replicates.
[0022] FIGs. 11A-11F show general optimization of methods for chimeric oligonucleotide-directed editors. (FIGs. 11A-11B) Effect of the volume of TransIT-X2 used during co-transfection of CODE encoding plasmid and cpegRNA at two different loci. (FIG. 11C) Effect of CODE encoding plasmid amount on editing efficiency. (FIGs. 11D-11E) Effect of the total RNP complex on editing efficiency via transfection. The ratio of cpegRNA to CODE was constant at 2:1. (FIG. 11F) Effect of the total RNP complex on editing efficiency via nucleofection. Error bars represent ± SD, where n = 3 technical replicates.
[0023] FIGs. 12A-12B show architecture of chimeric oligonucleotide-directed editors with domain organization. AlphaFold 3 predicted structure of (FIG. 12A) CODEMax and (FIG. 12B) CODEMax(exo+) in complex with cpegRNA and target dsDNA. The modeled cpegRNA incorporated a C to T substitution at position +1 downstream of the nick site at the HEK3 locus.
[0024] FIGs. 13A-13B show efficiency of prime editors and chimeric oligonucleotide-directed editors in different cell types. (FIGs. 13A-13B) Efficiency of +5 G>T at the EMX1 locus without ngRNA in HEK293T cells and U2OS cells, respectively. Error bars represent ± SD, where n = 3 technical replicates.
[0025] FIGs. 14A-14D show additional target sites for head-to-head comparison between prime editors and chimeric oligonucleotide-directed editors. (FIGs. 14A-14D) Efficiency of intended and unintended modifications at HEK3, FANCF, SRD5A3, and SNCA, respectively. The data is complementary to FIGs. 7A-7E. Error bars represent ± SD, where n = 3 technical replicates.
[0026] FIG. 15 shows a representative allele plot for insertion of HindIII at the HEK3 locus. Allele plots with efficiencies and read counts were output by CRISPResso2.
[0027] FIGs. 16A-16C show a comparison of synthetic and pU6-promoter driven pegRNA delivery. (FIGs. 16A-16B) Head-to-head comparison of synthetic pegRNA and pU6-promoter driven pegRNA expression for edits at two endogenous loci for +1 TCA insertion at DNMT1, +2 substitution at SRD5A3, and +2 G>T at SNCA, respectively.
[0028] FIGs. 17A-17C show quantification of scaffold insertion for prime editors and chimeric oligonucleotide-directed editors for select edits. (FIGs. 17A-17C) Percent scaffold insertion quantified with CRISPResso2 at the HEK3 locus for +1 5bp deletion and +1 HindIII insertion, and SNCA +2 G>C, respectively.
[0029] FIGs. 18A-18D show quantification of imprecise edits for prime editors and chimeric oligonucleotide-directed editors at select loci. (FIGs. 18A-18D) Imprecise edit percentage and imprecise edit type as quantified by CRISPResso2 plotted against base position relative to nick site.
[0030] Additional advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or can be learned by practice of the invention. The advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
DETAILED DESCRIPTION
[0031] Prime editing has gained prominence as a highly effective genome editing tool due to its precision and versatility. Pioneered by Anzalone et al., the technology can correct all 12 possible single-base substitutions and make small insertions or deletions without generating double-stranded breaks. Prime editing typically uses a nickase Cas9-reverse transcriptase (nCas9-RT) fusion protein (prime editor) and a prime editing guide RNA (pegRNA), which has a 3’ extension containing a primer binding site (PBS) and reverse transcription template (RTT) encoding the desired edit. The 5’ spacer region of the pegRNA directs the prime editor to the target site, resulting in nicking of the non-target strand by nCas9. Hybridization of the 3’ PBS to the nicked strand then generates an initiation site for reverse transcription to occur. Subsequently, the correct edits are installed into a 3’ edited DNA flap which can be incorporated into the genome.
[0032] Although prime editing has been iteratively improved through many rounds at the level of both the prime editor and pegRNA, the system still suffers from variable efficiency across genomic loci. To date, prime editing has largely made use of Moloney Murine Leukemia Virus reverse transcriptase (MMLV RT), which suffers from several drawbacks that decrease the effectiveness of CRISPR/Cas associated genome editing. Editing efficiency is typically less than 20% in immortalized cell lines and even lower in other cell types. Editing efficiency can further vary a great deal across target sequences and cell types, making broad application difficult and/or unpredictable. Base substitution efficiency with MMLV RT is still quite low, and off-target mutations are relatively common. Multiple versions of prime editors have been engineered to improve this efficiency. However, in most newer versions of prime editing, at least three separate components must be delivered to the cell: a prime editing guide RNA (pegRNA) encoding the desired genomic modification, a fusion protein containing a Cas9 nickase and MMLV RT, and a single guide RNA (sgRNA) that directs the fusion protein to nick the non-target DNA strand.
[0033] In another aspect, with MMLV RT based prime editing, after the prime editor generates a 3’ edited flap, it competes with the 5’ unedited flap for incorporation into the genome, reducing the likelihood of a successful edit. Furthermore, since the pegRNA contains a PBS that is complementary to the spacer sequence, it forms a stable RNA-RNA duplex, which creates a barrier for ribonucleoprotein complexation between the pegRNA and the prime editor. Recent efforts have been made to circumvent the auto-inhibitory interaction within the pegRNA by optimizing the melting temperature of the PBS and by incorporating mismatches into the PBS to reduce misfolded pegRNA interactions, which leads to improved prime editing efficiency. Although prime editing can make versatile edits without induction of double-strand breaks in the DNA, the accuracy of prime editors still has room for improvement. One of the most common imprecise edits observed for prime editing is an overextension of the reverse transcriptase past the RTT into the scaffold region of the pegRNA. This readthrough becomes problematic because it results in the incorporation of undesired bases into the genome, although recent studies have shown methods to lower its occurrence by engineering highly structured regions within the pegRNA.
[0034] Disclosed herein are chimeric oligonucleotide-directed editing (CODE) systems consisting of a DNA-dependent DNA polymerase paired with a chimeric pegRNA (cpegRNA) containing a DNA primer binding site and a DNA polymerase template, which may address some of the limitations of current reverse transcriptase-based prime editors. It was hypothesized that a cpegRNA could reduce the auto-inhibitory effect observed for traditional pegRNAs, as the DNA-RNA duplex is inherently less stable than the RNA-RNA duplex. In a further aspect, DNA polymerases are an abundant family of proteins with diverse molecular properties, rendering them intriguing candidates to achieve chimeric oligonucleotide-directed editing. In one aspect, the use of DNA polymerases with advantageous properties, such as high processivity, proofreading capability, and minimal reverse transcriptase activity, has the potential to improve editing efficiency, reduce unintended edits, and enable new types of edits to be performed.
[0035] Herein is disclosed a new class of 13 CODEs that consist of a cpegRNA and a nickase Cas9-DNA polymerase fusion protein. This simple two-component system allows for delivery via plasmids or ribonucleoprotein (RNP) complexes for effective and accurate genome editing. In an aspect, CODE improved gene correction efficiency compared to conventional PE2 and PEMax at several genomic loci with low unintended scaffold incorporation. In a further aspect, engineered CODEs expand the gene editing toolbox and offer versatility as well as flexibility toward therapeutic applications.
[0036] In one aspect, the disclosed technology uses a polymerase such as, for example, T4 DNA polymerase, T7 DNA polymerase, Klenow fragment DNA polymerase, φ29 DNA polymerase, Bsu DNA polymerase, or Bst DNA polymerase fused with a Cas9 protein rather than a reverse transcriptase for incorporating a desired edit into the cell. In a further aspect, the disclosed technology overcomes various errors associated with the use of reverse transcriptase enzymes such as amplification of an entire guide RNA (pegRNA) and incorporation into the genome. As such, the present disclosure exhibits high efficiency without off-target effects. In one aspect, the present technology makes a pegDNA by incorporating DNA into the ends of guide RNA. In a further aspect, Cas9 is then fused with a DNA polymerase that works at a particular temperature such as, for example, 37 °C, although other optimum temperatures are contemplated and should be considered disclosed. In one aspect, the present platform does not make use of the Cas9 endonuclease, thus avoiding the problems associated with Cas9 endonuclease promiscuity. In one aspect, the disclosed platform technology can be further expanded to modify any genomic region, RNA sequence, or cell type.
[0037] Many current methods for genome editing induce double-stranded breaks (DSBs), which are repaired by the cell about 90% of the time through non-homologous end joining (NHEJ); only about 10% of such repairs are homology-directed repairs (HDR), and then only in dividing cells. HDR has low efficiency, even when a repair template is provided in excess. Repair of DSBs often leads to undesired mutations including, but not limited to, chromosomal abnormalities. What is needed is an improved version of prime editing with higher efficiency and a lower tendency towards introducing errors in the genome.
System for Genomic Modification
[0038] In one aspect, disclosed herein is a system for site-specific modification of a double-stranded target DNA sequence, the system including at least the following:
(a) a fusion protein including a DNA-binding nickase protein and a DNA polymerase; and
(b) a chimeric guide nucleic acid sequence (cpegRNA) including at least one region of complementarity to a first strand of the double-stranded target DNA sequence, a core sequence that interacts with the DNA-binding nickase protein, a template encoding a modified sequence for insertion into a second strand of the double-stranded target DNA sequence, and a primer sequence.
[0039] In a further aspect, the DNA-binding nickase protein can be a Cas9 nickase, a Cas12i1 nickase, or a Cas12a nickase, such as, for example, a DNA-binding nickase derived from a Streptococcus pyogenes Cas9, a Staphylococcus aureus Cas9, a Neisseria meningitidis Cas9, a Streptococcus thermophilus Cas9, an Actinobacillus minor Cas9, an Actinobacillus pleuropneumoniae Cas9, an Actinobacillus seminis Cas9, an Actinobacillus succinogenes Cas9, a Bergeriella denitrificans Cas9, a Conservatibacter flavescens Cas9, a Gallibacterium anatis Cas9, a Haemophilus felis Cas9, a Haemophilus parainfluenzae Cas9, a Haemophilus pittmaniae Cas9, a Haemophilus sputorum Cas9, a Mannheimia granulomatis Cas9, a Neisseria animalis Cas9, a Neisseria animaloris Cas9, a Neisseria arctica Cas9, a Neisseria bacilliformis Cas9, a Neisseria dentiae Cas9, an Otariodibacter oris Cas9, a Pasteurella aerogenes Cas9, a Pasteurella langaaensis Cas9, a Pasteurella mairii Cas9, a Pasteurellaceae bacterium Cas9, a Phocoenobacter uteri Cas9, a Rodentibacter pneumotropicus Cas9, a Simonsiella muelleri Cas9, a Suttonella indologenes Cas9, a Treponema denticola Cas9, a Moraxella bovoculi Cas12a, or an engineered AsCas12a. In some aspects, the Cas9 protein is engineered to contain an R221K mutation, an N394K mutation, or both.
[0040] In an aspect, many Cas12 systems act like nickases with shorter guides and cleave the NT strand, and can also be used in the systems and methods disclosed herein. In one aspect, the present systems and methods make use of a much shorter crRNA (~36-42 nt) and therefore can be extended with DNA bases much more easily synthetically. In another aspect, the present systems and methods can be or are configured to work in the inverse direction in terms of binding DNA. Further in this aspect, for example, the 3' end of the crRNA binds with the target instead of the 5' end. In a still further aspect, this means the present system can be multiplexed with nCas9-DP to improve the prime editing efficiency on the other strand (in the same direction) or same strand (in the other direction).
[0041] In another aspect, the chimeric guide nucleic acid sequence includes both RNA and DNA. In an aspect, the at least one region of complementarity is RNA and the core sequence is RNA. In a further aspect, the template incorporating a modified sequence for insertion is DNA. Further in this aspect, the template is from about 1 to about 5000 nucleotides in length, about 2 to about 1000 nucleotides in length, or about 3 to about 500 nucleotides in length. In one aspect, the primer sequence is RNA and forms a primer for the DNA polymerase. In a further aspect, the primer is from about 4 to about 20 nucleotides in length, or is from about 10 to about 20 nucleotides in length.
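
By way of non-limiting illustration, the composition of the chimeric guide described in paragraph [0041] can be sketched as a small Python data structure with basic checks. The class name, field names, and example sequences are hypothetical, and the hard length cutoffs are a simplification of the broadest "about" ranges recited above.

```python
from dataclasses import dataclass
from typing import List

RNA_LETTERS = set("ACGU")
DNA_LETTERS = set("ACGT")


@dataclass(frozen=True)
class ChimericGuide:
    """Hypothetical container for the four cpegRNA parts of paragraph [0041]."""

    spacer: str    # region of complementarity to the target (RNA)
    core: str      # sequence that interacts with the nickase (RNA)
    template: str  # template encoding the modified sequence (DNA)
    primer: str    # primer for the DNA polymerase (RNA in this aspect)

    def problems(self) -> List[str]:
        """Return a list of human-readable problems; empty if none are found."""
        found = []
        for name, seq, letters in (
            ("spacer", self.spacer, RNA_LETTERS),
            ("core", self.core, RNA_LETTERS),
            ("template", self.template, DNA_LETTERS),
            ("primer", self.primer, RNA_LETTERS),
        ):
            if not seq or not set(seq) <= letters:
                found.append(f"{name}: unexpected or missing bases")
        # Broadest recited ranges, treated here as hard limits for simplicity.
        if not 1 <= len(self.template) <= 5000:
            found.append("template: length outside 1-5000 nt")
        if not 4 <= len(self.primer) <= 20:
            found.append("primer: length outside 4-20 nt")
        return found
```

A guide whose template contains uracil, or whose primer is only three nucleotides long, would be flagged by `problems()`, while a guide that satisfies the recited composition returns an empty list.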
[0042] In one aspect, the DNA polymerase can be selected from E. coli DNA polymerase I (Ecol), T4 DNA polymerase, T5 DNA polymerase, T7 DNA polymerase, thioredoxin binding domain of T7 DNA polymerase (T7/Trx), Klenow fragment DNA polymerase, φ29 DNA polymerase, Bsu DNA polymerase, Bst DNA polymerase large fragment (LF), Bst DNA polymerase full length (F), Pfu DNA polymerase, Pwo DNA polymerase, Stoffel DNA polymerase, or any combination thereof.
[0043] In a further aspect, the T4 DNA polymerase can be located at a C-terminus or an N-terminus of the DNA-binding nickase protein and a 33 amino acid linker can connect the T4 DNA polymerase to the DNA-binding nickase protein. In another aspect, the T4 DNA polymerase includes at least one mutation such as, for example, a Y320A mutation, an L412M mutation, an I50L mutation, a G255S mutation, or any combination thereof. In another aspect, the at least one mutation can be addition of an sso7d DNA binding domain at a C-terminus of the T4 DNA polymerase, optionally wherein the sso7d DNA binding domain can include an E12L mutation, a K35L mutation, or both. In an alternative aspect, the Bst DNA polymerase LF includes at least one fusion domain. In a further aspect, the at least one fusion domain includes an actin-binding protein such as, for example, villin headpiece. In another aspect, the fusion domain is fused to an N-terminus of the Bst DNA polymerase LF. In yet another aspect, the Bst DNA polymerase LF comprises a polymerase domain mutation such as, for example, a T493N mutation, an A552G mutation, an S371D mutation, or any combination thereof. In any of these aspects, the polymerase domain mutation imparts increased thermostability relative to a wild type Bst DNA polymerase LF. In still another aspect, the villin headpiece includes at least one additional mutation such as, for example, N31R, N39K, E43K, A20K, or any combination thereof.
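
By way of non-limiting illustration, the point-mutation labels recited above (for example, Y320A: tyrosine at position 320 replaced by alanine) follow the common wild-type residue, 1-based position, new residue convention. The Python sketch below parses such labels and applies them to a protein sequence. The function names are hypothetical and the short sequence in the usage note is a toy string, not any polymerase sequence from the sequence listing.

```python
import re
from typing import Iterable, Tuple

# One-letter codes of the 20 standard amino acids.
_AA = "ACDEFGHIKLMNPQRSTVWY"
_LABEL = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


def parse_mutation(label: str) -> Tuple[str, int, str]:
    """Split a label such as 'Y320A' into ('Y', 320, 'A')."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def apply_mutations(protein: str, labels: Iterable[str]) -> str:
    """Apply point mutations, checking the wild-type residue at each position."""
    residues = list(protein)
    for label in labels:
        wild_type, position, mutant = parse_mutation(label)
        if position < 1 or position > len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: sequence has {residues[position - 1]}")
        residues[position - 1] = mutant
    return "".join(residues)
```

On the toy sequence "MKYLA", applying the labels Y3A and L4M gives "MKAMA". The wild-type check is a design choice that catches numbering offsets, for instance when a tag or linker shifts residue positions in a fusion construct.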
[0044] In a further aspect, the DNA polymerase can be optimized to operate at a temperature of from about 30 °C to about 40 °C, or at about 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or about 40 °C.
[0045] In one aspect, in the disclosed system, the modified sequence for insertion includes at least one mutation such as, for example, an insertion, a deletion, a base substitution, or any combination thereof.
Method for Site-Specific Modification of a Double-Stranded Target DNA
[0046] In one aspect, disclosed herein is a method for site-specific modification of a double-stranded target DNA sequence in a cell, the method including contacting the double-stranded target DNA sequence with the disclosed system, wherein the core sequence binds the DNA-binding nickase protein; wherein the DNA-binding nickase protein nicks the second strand of the double-stranded target DNA sequence to form a free 3' end; wherein the primer sequence anneals with a complementary region on the second strand of the double-stranded target DNA; and wherein the DNA polymerase synthesizes a single strand of DNA encoded by the template from the free 3' end.
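
By way of non-limiting illustration, the nick, anneal, and extend steps recited above can be sketched as a toy string model in Python. The function names and the example sequences are hypothetical; the model assumes all sequences are written 5' to 3', that the primer lies 3' of the template within the guide so that it pairs with the bases immediately upstream of the nick, and it ignores flap resolution and all enzymology.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA or RNA sequence, returned as DNA."""
    return seq.translate(str.maketrans("ACGTU", "TGCAA"))[::-1]


def synthesize_flap(nicked_strand: str, nick_index: int,
                    primer: str, template: str) -> str:
    """Return the DNA flap extended from the free 3' end at the nick.

    nicked_strand: the strand that is nicked (the "second strand"), 5'->3'.
    nick_index:    the nick lies between nick_index - 1 and nick_index.
    primer:        primer sequence of the guide (RNA or DNA letters), 5'->3'.
    template:      DNA template of the guide, 5'->3'.
    """
    upstream = nicked_strand[:nick_index]  # ends at the free 3' end
    # The primer must base-pair with the bases just upstream of the nick.
    if not upstream.endswith(reverse_complement(primer)):
        raise ValueError("primer does not anneal next to the nick")
    # The polymerase copies the template, so the flap is its reverse complement.
    return reverse_complement(template)
```

For the toy strand "GGCCATGCTTAGGA" nicked after position 8, a primer "GCAU", and a template "TCCTTA", the sketch returns the flap "TAAGGA", which differs from the original downstream sequence "TTAGGA" by a single T-to-A substitution encoded in the template.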
[0047] In a further aspect, in the disclosed method, the cell can be a prokaryotic cell or a eukaryotic cell such as, for example, a mammalian cell, and can be a dividing cell or a non-dividing cell. In some aspects, the system is introduced to the cell using nucleofection (electroporation) or another transfection method. In any of these aspects, performing the method results in less than 10% off-target editing in the cell’s genome. In a further aspect, the fusion protein and the chimeric guide nucleic acid are present in amounts effective to modify genomic DNA in the cell. In one aspect, the method does not introduce double strand breaks into cellular DNA. In another aspect, performing the method results in incorporation of the modified sequence in at least 25% of a population of cells contacted with the system.
Cells and Fusion Proteins
[0048] Also disclosed herein is a cell including at least one genomic modification introduced by the disclosed method.
[0049] Furthermore, in one aspect, provided herein is a fusion protein including at least the following:
(a) a first part comprising a DNA-binding nickase protein; and
(b) a second part comprising a DNA polymerase.
[0050] In an aspect, the DNA-binding nickase protein can be a Cas9 nickase, a Cas12i1 nickase, a Cas12a nickase, an ortholog thereof, or any combination thereof, and can be selected from nickases such as, for example, a Streptococcus pyogenes Cas9, a Staphylococcus aureus Cas9, a Neisseria meningitidis Cas9, a Streptococcus thermophilus Cas9, an Actinobacillus minor Cas9, an Actinobacillus pleuropneumoniae Cas9, an Actinobacillus seminis Cas9, an Actinobacillus succinogenes Cas9, a Bergeriella denitrificans Cas9, a Conservatibacter flavescens Cas9, a Gallibacterium anatis Cas9, a Haemophilus felis Cas9, a Haemophilus parainfluenzae Cas9, a Haemophilus pittmaniae Cas9, a Haemophilus sputorum Cas9, a Mannheimia granulomatis Cas9, a Neisseria animalis Cas9, a Neisseria animaloris Cas9, a Neisseria arctica Cas9, a Neisseria bacilliformis Cas9, a Neisseria dentiae Cas9, an Otariodibacter oris Cas9, a Pasteurella aerogenes Cas9, a Pasteurella langaaensis Cas9, a Pasteurella mairii Cas9, a Pasteurellaceae bacterium Cas9, a Phocoenobacter uteri Cas9, a Rodentibacter pneumotropicus Cas9, a Simonsiella muelleri Cas9, a Suttonella indologenes Cas9, a Treponema denticola Cas9, a Moraxella bovoculi Cas12a, or an engineered AsCas12a. In some aspects, the Cas9 protein is engineered to contain an R221K mutation, an N394K mutation, or both.
[0051] In another aspect, the DNA polymerase can be selected from E. coli DNA polymerase I (Ecol), T4 DNA polymerase, T5 DNA polymerase, T7 DNA polymerase, thioredoxin binding domain of T7 DNA polymerase (T7/Trx), Klenow fragment DNA polymerase, φ29 DNA polymerase, Bsu DNA polymerase, Bst DNA polymerase large fragment (LF), Bst DNA polymerase full length (F), Pfu DNA polymerase, Pwo DNA polymerase, Stoffel DNA polymerase, or any combination thereof.
[0052] In a further aspect, the T4 DNA polymerase can be located at a C-terminus or an N-terminus of the DNA-binding nickase protein and a 33 amino acid linker can connect the T4 DNA polymerase to the DNA-binding nickase protein. In another aspect, the T4 DNA polymerase includes at least one mutation such as, for example, a Y320A mutation, an L412M mutation, an I50L mutation, a G255S mutation, or any combination thereof. In another aspect, the at least one mutation can be addition of an sso7d DNA binding domain at a C-terminus of the T4 DNA polymerase, optionally wherein the sso7d DNA binding domain can include an E12L mutation, a K35L mutation, or both. In an alternative aspect, the Bst DNA polymerase LF includes at least one fusion domain. In a further aspect, the at least one fusion domain includes an actin-binding protein such as, for example, villin headpiece. In another aspect, the fusion domain is fused to an N-terminus of the Bst DNA polymerase LF. In yet another aspect, the Bst DNA polymerase LF comprises a polymerase domain mutation such as, for example, a T493N mutation, an A552G mutation, an S371D mutation, or any combination thereof. In any of these aspects, the polymerase domain mutation imparts increased thermostability relative to a wild type Bst DNA polymerase LF. In still another aspect, the villin headpiece includes at least one additional mutation such as, for example, N31R, N39K, E43K, A20K, or any combination thereof.
[0053] Many modifications and other embodiments disclosed herein will come to mind to one skilled in the art to which the disclosed compositions and methods pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the disclosures are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. The skilled artisan will recognize many variants and adaptations of the aspects described herein. These variants and adaptations are intended to be included in the teachings of this disclosure and to be encompassed by the claims herein.
[0054] Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
[0055] As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present disclosure.
[0056] Any recited method can be carried out in the order of events recited or in any other order that is logically possible. That is, unless otherwise expressly stated, it is in no way intended that any method or aspect set forth herein be construed as requiring that its steps be performed in a specific order. Accordingly, where a method claim does not specifically state in the claims or descriptions that the steps are to be limited to a specific order, it is in no way intended that an order be inferred, in any respect. This holds for any possible non-express basis for interpretation, including matters of logic with respect to arrangement of steps or operational flow, plain meaning derived from grammatical organization or punctuation, or the number or type of aspects described in the specification.
[0057] All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided herein can be different from the actual publication dates, which can require independent confirmation.
[0058] While aspects of the present disclosure can be described and claimed in a particular statutory class, such as the system statutory class, this is for convenience only and one of skill in the art will understand that each aspect of the present disclosure can be described and claimed in any statutory class.
[0059] It is also to be understood that the terminology used herein is for the purpose of describing particular aspects only and is not intended to be limiting. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosed compositions and methods belong. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the specification and relevant art and should not be interpreted in an idealized or overly formal sense unless expressly defined herein.
[0060] Prior to describing the various aspects of the present disclosure, the following definitions are provided and should be used unless otherwise indicated. Additional terms may be defined elsewhere in the present disclosure.
Definitions
[0061] As used herein, “comprising” is to be interpreted as specifying the presence of the stated features, integers, steps, or components as referred to, but does not preclude the presence or addition of one or more features, integers, steps, or components, or groups thereof. Moreover, each of the terms “by”, “comprising,” “comprises”, “comprised of,” “including,” “includes,” “included,” “involving,” “involves,” “involved,” and “such as” are used in their open, non-limiting sense and may be used interchangeably. Further, the term “comprising” is intended to include examples and aspects encompassed by the terms “consisting essentially of” and “consisting of.” Similarly, the term “consisting essentially of” is intended to include examples encompassed by the term “consisting of.”
[0062] As used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a polymerase,” “a gene,” or “an RNA strand” includes, but is not limited to, mixtures or combinations of two or more such polymerases, genes, or RNA strands, and the like.
[0063] It should be noted that ratios, concentrations, amounts, and other numerical data can be expressed herein in a range format. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. It is also understood that there are a number of values disclosed herein, and that each value is also herein disclosed as “about” that particular value in addition to the value itself. For example, if the value “10” is disclosed, then “about 10” is also disclosed. Ranges can be expressed herein as from “about” one particular value, and/or to “about” another particular value. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms a further aspect. For example, if the value “about 10” is disclosed, then “10” is also disclosed.
[0064] When a range is expressed, a further aspect includes from the one particular value and/or to the other particular value. For example, where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure, e.g. the phrase “x to y” includes the range from ‘x’ to ‘y’ as well as the range greater than ‘x’ and less than ‘y’. The range can also be expressed as an upper limit, e.g. ‘about x, y, z, or less’ and should be interpreted to include the specific ranges of ‘about x’, ‘about y’, and ‘about z’ as well as the ranges of ‘less than x’, ‘less than y’, and ‘less than z’. Likewise, the phrase ‘about x, y, z, or greater’ should be interpreted to include the specific ranges of ‘about x’, ‘about y’, and ‘about z’ as well as the ranges of ‘greater than x’, ‘greater than y’, and ‘greater than z’. In addition, the phrase “about ‘x’ to ‘y’”, where ‘x’ and ‘y’ are numerical values, includes “about ‘x’ to about ‘y’”.
[0065] It is to be understood that such a range format is used for convenience and brevity, and thus, should be interpreted in a flexible manner to include not only the numerical values explicitly recited as the limits of the range, but also to include all the individual numerical values or sub-ranges encompassed within that range as if each numerical value and sub-range is explicitly recited. To illustrate, a numerical range of “about 0.1% to 5%” should be interpreted to include not only the explicitly recited values of about 0.1% to about 5%, but also include individual values (e.g., about 1%, about 2%, about 3%, and about 4%) and the sub-ranges (e.g., about 0.5% to about 1.1%; about 0.5% to about 2.4%; about 0.5% to about 3.2%, and about 0.5% to about 4.4%, and other possible sub-ranges) within the indicated range.
[0066] As used herein, the terms “about,” “approximate,” “at or about,” and “substantially” mean that the amount or value in question can be the exact value or a value that provides equivalent results or effects as recited in the claims or taught herein. That is, it is understood that amounts, sizes, formulations, parameters, and other quantities and characteristics are not and need not be exact, but may be approximate and/or larger or smaller, as desired, reflecting tolerances, conversion factors, rounding off, measurement error and the like, and other factors known to those of skill in the art such that equivalent results or effects are obtained. In some circumstances, the value that provides equivalent results or effects cannot be reasonably determined. In such cases, it is generally understood, as used herein, that “about” and “at or about” mean the nominal value indicated ±10% variation unless otherwise indicated or inferred. In general, an amount, size, formulation, parameter or other quantity or characteristic is “about,” “approximate,” or “at or about” whether or not expressly stated to be such. It is understood that where “about,” “approximate,” or “at or about” is used before a quantitative value, the parameter also includes the specific quantitative value itself, unless specifically stated otherwise.
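The default reading of “about” given above (the nominal value ±10% when equivalent results cannot otherwise be determined) can be illustrated with a short sketch. The function names `about_range` and `is_about` are hypothetical and are introduced only for illustration; the 10% default is taken from the paragraph above, and the sketch does not capture the broader “equivalent results or effects” reading.

```python
from typing import Tuple


def about_range(nominal: float, tolerance: float = 0.10) -> Tuple[float, float]:
    """Return the (low, high) bounds implied by 'about <nominal>'.

    The 10% default mirrors the fallback definition in the text;
    the function name and signature are illustrative assumptions.
    """
    delta = abs(nominal) * tolerance
    return nominal - delta, nominal + delta


def is_about(value: float, nominal: float, tolerance: float = 0.10) -> bool:
    """True if value lies within nominal +/- tolerance (bounds inclusive)."""
    low, high = about_range(nominal, tolerance)
    return low <= value <= high


if __name__ == "__main__":
    print(about_range(10))      # bounds for "about 10"
    print(is_about(10.5, 10))   # inside the +/-10% window
    print(is_about(11.5, 10))   # outside the +/-10% window
```

Because the nominal value always lies inside its own window, the sketch is consistent with the statement that “about 10” also discloses “10”.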
[0067] As used herein, the terms “optional” or “optionally” mean that the subsequently described event or circumstance can or cannot occur, and that the description includes instances where said event or circumstance occurs and instances where it does not.
[0068] As used herein, “nucleic acid,” “nucleotide sequence,” and “polynucleotide” can be used interchangeably herein and can generally refer to a string of at least two base-sugar-phosphate combinations and refers to, among others, single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single- and double-stranded RNA, and RNA that is a mixture of single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or a mixture of single- and double-stranded regions. In addition, polynucleotide as used herein can refer to triple-stranded regions comprising RNA or DNA or both RNA and DNA. The strands in such regions can be from the same molecule or from different molecules. The regions may include all of one or more of the molecules, but more typically involve only a region of some of the molecules. One of the molecules of a triple-helical region often is an oligonucleotide. “Polynucleotide” and “nucleic acids” also encompass such chemically, enzymatically or metabolically modified forms of polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including simple and complex cells, inter alia. For instance, the term polynucleotide as used herein can include DNAs or RNAs as described herein that contain one or more modified bases. Thus, DNAs or RNAs including unusual bases, such as inosine, or modified bases, such as tritylated bases, to name just two examples, are polynucleotides as the term is used herein. “Polynucleotide”, “nucleotide sequences” and “nucleic acids” also include PNAs (peptide nucleic acids), phosphorothioates, and other variants of the phosphate backbone of native nucleic acids. Natural nucleic acids have a phosphate backbone; artificial nucleic acids can contain other types of backbones but contain the same bases. 
Thus, DNAs or RNAs with backbones modified for stability or for other reasons are “nucleic acids” or "polynucleotides" as that term is intended herein. As used herein, “nucleic acid sequence” and “oligonucleotide” also encompasses a nucleic acid and polynucleotide as defined elsewhere herein.
[0069] As used herein, “deoxyribonucleic acid (DNA)” and “ribonucleic acid (RNA)” can generally refer to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA. RNA can be in the form of non-coding RNA such as tRNA (transfer RNA), snRNA (small nuclear RNA), rRNA (ribosomal RNA), anti-sense RNA, RNAi (RNA interference construct), siRNA (short interfering RNA), microRNA (miRNA), or ribozymes, aptamers, guide RNA (gRNA), CRISPR RNA (crRNA), Trans-activating crRNA (tracrRNA), or coding mRNA (messenger RNA).
[0070] As used herein, “cDNA” refers to a DNA sequence that is complementary to an RNA transcript in a cell. It is a man-made molecule. Typically, cDNA is made in vitro by an enzyme called reverse-transcriptase using RNA transcripts as templates.
[0071] As used herein, “gene” can refer to a hereditary unit corresponding to a sequence of DNA that occupies a specific location on a chromosome and that contains the genetic instruction for a characteristic(s) or trait(s) in an organism. The term gene can refer to translated and/or untranslated regions of a genome. “Gene” can refer to the specific sequence of DNA that is transcribed into an RNA transcript that can be translated into a polypeptide or be a catalytic RNA molecule, including but not limited to, tRNA, siRNA, piRNA, miRNA, long-non-coding RNA and shRNA.
[0072] As used herein with reference to the relationship between DNA, cDNA, cRNA, RNA, protein/peptides, and the like “corresponding to” or “encoding” (used interchangeably herein) refers to the underlying biological relationship between these different molecules. As such, one of skill in the art would understand that operatively “corresponding to” can direct them to determine the possible underlying and/or resulting sequences of other molecules given the sequence of any other molecule which has a similar biological relationship with these molecules. For example, from a DNA sequence an RNA sequence can be determined and from an RNA sequence a cDNA sequence can be determined.
[0073] As used herein, the term “exogenous DNA” or “exogenous nucleic acid sequence” or “exogenous polynucleotide” refers to a nucleic acid sequence that was introduced into a cell, organism, or organelle via transfection. Exogenous nucleic acids originate from an external source, for instance, the exogenous nucleic acid may be from another cell or organism and/or it may be synthetic and/or recombinant. While an exogenous nucleic acid sometimes originates from a different organism or species, it may also originate from the same species (e.g., an extra copy or recombinant form of a nucleic acid that is introduced into a cell or organism in addition to or as a replacement for the naturally occurring nucleic acid). Typically, the introduced exogenous sequence is a recombinant sequence.
[0074] As used herein, “isolated” means separated from constituents, cellular and otherwise, with which the polynucleotide, peptide, polypeptide, protein, antibody, or fragments thereof, are normally associated in nature. A non-naturally occurring polynucleotide, peptide, polypeptide, protein, antibody, or fragment thereof, does not require “isolation” to distinguish it from its naturally occurring counterpart.
[0075] As used herein, “variant” can refer to a polynucleotide or polypeptide that differs from a reference polynucleotide or polypeptide, but retains essential and/or characteristic properties (structural and/or functional) of the reference polynucleotide or polypeptide. A typical variant of a polypeptide differs in amino acid sequence from another, reference polypeptide. The differences can be limited so that the sequences of the reference polypeptide and the variant are closely similar overall and, in many regions, identical. A variant and reference polypeptide may differ in nucleic or amino acid sequence by one or more modifications at the sequence level or post-transcriptional or post-translational modifications (e.g., substitutions, additions, deletions, methylation, glycosylations, etc.). A substituted nucleic acid may or may not be an unmodified nucleic acid of adenine, thymine, guanine, cytosine, uracil, including any chemically, enzymatically or metabolically modified forms of these or other nucleotides. A substituted amino acid residue may or may not be one encoded by the genetic code. A variant of a polypeptide may be naturally occurring such as an allelic variant, or it may be a variant that is not known to occur naturally. “Variant” includes functional and structural variants.
[0076] As used herein, “gene” refers to a hereditary unit corresponding to a sequence of DNA that occupies a specific location on a chromosome and that contains the genetic instruction for a characteristic(s) or trait(s) in an organism. As used herein, “synthetic gene” can refer to a recombinant gene comprising one or more coding sequences for a protein of interest, or a synthetically purified protein that is not naturally occurring in its purified state.
[0077] As used herein, the terms “guide polynucleotide,” “guide sequence,” or “guide RNA” (gRNA or sgRNA) can refer to any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a CRISPR complex to the target sequence. The degree of complementarity between a guide polynucleotide and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g. the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). A guide polynucleotide (also referred to herein as a guide sequence and includes single guide sequences (sgRNA)) can be about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, 90, 100, 110, 112, 115, 120, 130, 140, or more nucleotides in length. The guide polynucleotide (gRNA or sgRNA) can include a nucleotide sequence that is complementary to a target DNA sequence. This portion of the guide sequence can be referred to as the complementary region of the guide RNA or the CRISPR RNA (crRNA). Another portion of the guide sequence serves as a binding scaffold for the CRISPR-associated (Cas) nuclease. This portion of the guide sequence can be referred to as the tracrRNA. In one aspect, crRNA/tracrRNA can also work with the disclosed approach. Further in this aspect, since crRNA is shorter, it may be easier to incorporate the desired DNA modifications to the crRNAs by ligation or synthesis compared to incorporation into sgRNAs. In a further aspect, and without wishing to be bound by theory, tracrRNAs are generally universal and work with any sequence of crRNAs and so the crRNA/tracrRNA system may be more economical for use. The guide sequence can also include one or more miRNA target sequences coupled to the 3’ end of the guide sequence. The guide sequence can include one or more MS2 RNA aptamers incorporated within the portion of the guide strand that is not the complementary portion. As used herein the term guide sequence can include any specially modified guide sequences, including but not limited to those configured for use in synergistic activation mediator (SAM) implemented CRISPR or suppression.
[0078] A guide polynucleotide can be less than about 150, 125, 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. The ability of a guide polynucleotide to direct sequence-specific binding of a CRISPR complex to a target sequence may be assessed by any suitable assay. For example, the components of a CRISPR system sufficient to form a CRISPR complex, including the guide polynucleotide to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of the CRISPR sequence, followed by an assessment of preferential cleavage within the target sequence. Similarly, cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a CRISPR complex, including the guide polynucleotide to be tested and a control guide polynucleotide different from the test guide polynucleotide, and comparing binding or rate of cleavage at the target sequence between the test and control guide polynucleotide reactions. Other assays are possible, and will occur to those skilled in the art.
[0079] As used herein, “polypeptides” or “proteins” refers to amino acid residue sequences. Those sequences are written left to right in the direction from the amino to the carboxy terminus. In accordance with standard nomenclature, amino acid residue sequences are denominated by either a three letter or a single letter code as indicated as follows: Alanine (Ala, A), Arginine (Arg, R), Asparagine (Asn, N), Aspartic Acid (Asp, D), Cysteine (Cys, C), Glutamine (Gln, Q), Glutamic Acid (Glu, E), Glycine (Gly, G), Histidine (His, H), Isoleucine (Ile, I), Leucine (Leu, L), Lysine (Lys, K), Methionine (Met, M), Phenylalanine (Phe, F), Proline (Pro, P), Serine (Ser, S), Threonine (Thr, T), Tryptophan (Trp, W), Tyrosine (Tyr, Y), and Valine (Val, V). “Protein” and “Polypeptide” can refer to a molecule composed of one or more chains of amino acids in a specific order. The term protein is used interchangeably with “polypeptide.” The order is determined by the base sequence of nucleotides in the gene coding for the protein. Proteins can be involved in the structure, function, and regulation of various functions.
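The three-letter and single-letter codes listed above can be captured in a lookup table. The following is a minimal sketch that also expands substitution labels of the kind used elsewhere in this disclosure (e.g., Y320A). It assumes the conventional reading of such labels as wild-type residue, position, and replacement residue, which this passage does not itself spell out; the names `THREE_TO_ONE`, `ONE_TO_THREE`, and `describe_substitution` are hypothetical.

```python
import re

# Three-letter -> single-letter codes, exactly as enumerated in the text.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}

# Assumed label format: <wild-type residue><position><replacement residue>.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def describe_substitution(label: str) -> str:
    """Expand a single-letter substitution label into three-letter form."""
    match = _SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, replacement = match.groups()
    if wild_type not in ONE_TO_THREE or replacement not in ONE_TO_THREE:
        raise ValueError(f"unknown residue code in {label!r}")
    return f"{ONE_TO_THREE[wild_type]}{position}{ONE_TO_THREE[replacement]}"


if __name__ == "__main__":
    for label in ("Y320A", "L412M", "T493N"):
        print(label, "->", describe_substitution(label))
```

Under that assumed reading, Y320A denotes a tyrosine at position 320 replaced by alanine.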
[0080] As used herein, “identity,” is a relationship between two or more polypeptide or polynucleotide sequences, as determined by comparing the sequences. In the art, “identity” also refers to the degree of sequence relatedness between polypeptides as determined by the match between strings of such sequences. “Identity” can be readily calculated by known methods, including, but not limited to, those described in Computational Molecular Biology, Lesk, A. M., Ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., Ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A. M., and Griffin, H. G., Eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., Eds., M Stockton Press, New York, 1991; and Carillo, H., and Lipman, D., SIAM J. Applied Math. 1988, 48: 1073. Preferred methods to determine identity are designed to give the largest match between the sequences tested. Methods to determine identity are codified in publicly available computer programs. The percent identity between two sequences can be determined by using analysis software (e.g., Sequence Analysis Software Package of the Genetics Computer Group, Madison Wis.) that incorporates the Needleman and Wunsch (J. Mol. Biol., 1970, 48: 443-453) algorithm (e.g., NBLAST, and XBLAST). The default parameters are used to determine the identity for the polypeptides or polynucleotides of the present disclosure.
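To make the notion of percent identity concrete, the following is a minimal sketch of a Needleman-Wunsch global alignment followed by an identity calculation. It is not the software package referenced above. The scoring values (match +1, mismatch -1, gap -1) and the choice to divide matches by the alignment length are assumptions made for illustration; published tools use substitution matrices, affine gap penalties, and differing denominator conventions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the dynamic-programming matrix.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of alignment length."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
    print(percent_identity("ACGT", "ACGA"))
```

The same routine works for amino acid strings, since it compares characters only; a realistic protein comparison would substitute a scoring matrix for the fixed match and mismatch values.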
[0081] As used herein, “heterologous” refers to compounds, molecules, nucleotide sequences (including genes), and polypeptide sequences (including peptides and proteins) that are different in both activity (function) and sequence or chemical structure. As used herein, “heterologous” can also refer to a gene or gene product that is from a different organism. For example, a human GTP cyclohydrolase or a synthase can be said to be heterologous when expressed in yeast.
[0082] As used herein, “homolog” refers to a polypeptide sequence that shares a threshold level of similarity and/or identity as determined by alignment of matching amino acids. Two or more polypeptides determined to share this threshold level are said to be homologs. Homology is a qualitative term that describes the relationship between polypeptide sequences that is based upon the quantitative similarity.
[0083] As used herein, “paralog” refers to a homolog produced via gene duplication of a gene. In other words, paralogs are homologs that result from divergent evolution from a common ancestral gene.
[0084] As used herein, “orthologs” refers to homologs produced by speciation followed by divergence of sequence but not activity in separate species. When speciation follows duplication and one homolog sorts with one species and the other copy sorts with the other species, subsequent divergence of the duplicated sequence is associated with one or the other species. Such species specific homologs are referred to herein as orthologs.
[0085] As used herein, “similarity” is a quantitative term that defines the degree of sequence match between two compared polypeptide sequences.
[0086] As used herein, "organism", "host", and "subject" refer to any living entity comprised of at least one cell. A living organism can be as simple as, for example, a single isolated eukaryotic cell or cultured cell or cell line, or as complex as a mammal, including a human being, and animals (e.g., vertebrates, amphibians, fish, mammals, e.g., cats, dogs, horses, pigs, cows, sheep, rodents, rabbits, squirrels, bears, primates (e.g., chimpanzees, gorillas, and humans)).
[0087] As used herein, the term “recombinant” or “engineered” can generally refer to a non-naturally occurring nucleic acid, nucleic acid construct, or polypeptide. Such non-naturally occurring nucleic acids may include natural nucleic acids that have been modified, for example that have deletions, substitutions, inversions, insertions, etc., and/or combinations of nucleic acid sequences of different origin that are joined using molecular biology technologies (e.g., a nucleic acid sequence encoding a fusion protein (e.g., a protein or polypeptide formed from the combination of two different proteins or protein fragments), the combination of a nucleic acid encoding a polypeptide to a promoter sequence, where the coding sequence and promoter sequence are from different sources or otherwise do not typically occur together naturally (e.g., a nucleic acid and a constitutive promoter), etc.). Recombinant or engineered can also refer to the polypeptide encoded by the recombinant nucleic acid. Non-naturally occurring nucleic acids or polypeptides include nucleic acids and polypeptides modified by man.
[0088] As used herein, “cell,” "cell line," and "cell culture" include progeny. It is also understood that all progeny may not be precisely identical in DNA content, due to deliberate or inadvertent mutations. Variant progeny that have the same function or biological property, as screened for in the originally transformed cell, are included.
[0089] As used herein, “culturing” refers to maintaining cells under conditions in which they can proliferate and avoid senescence as a group of cells. “Culturing” can also include conditions in which the cells also or alternatively differentiate.
[0090] As used herein, the term “specific binding” or “preferential binding” can refer to non-covalent physical association of a first and a second moiety wherein the association between the first and second moieties is at least 2 times as strong as, at least 5 times as strong as, at least 10 times as strong as, at least 50 times as strong as, at least 100 times as strong as, or stronger than the association of either moiety with most or all other moieties present in the environment in which binding occurs. Binding of two or more entities may be considered specific if the equilibrium dissociation constant, Kd, is 10⁻³ M or less, 10⁻⁴ M or less, 10⁻⁵ M or less, 10⁻⁶ M or less, 10⁻⁷ M or less, 10⁻⁸ M or less, 10⁻⁹ M or less, 10⁻¹⁰ M or less, 10⁻¹¹ M or less, or 10⁻¹² M or less under the conditions employed, e.g., under physiological conditions such as those inside a cell or consistent with cell survival. In some embodiments, specific binding can be accomplished by a plurality of weaker interactions (e.g., a plurality of individual interactions, wherein each individual interaction is characterized by a Kd of greater than 10⁻³ M). In some embodiments, specific binding, which can be referred to as “molecular recognition,” is a saturable binding interaction between two entities that is dependent on complementary orientation of functional groups on each entity. Examples of specific binding interactions include primer-polynucleotide interaction, aptamer-aptamer target interactions, antibody-antigen interactions, avidin-biotin interactions, ligand-receptor interactions, metal-chelate interactions, hybridization between complementary nucleic acids, etc.
[0091] Unless otherwise specified, atmospheres referred to herein are based on atmospheric pressure (i.e. one atmosphere) and temperatures are ambient.
[0092] Now having described the aspects of the present disclosure, in general, the following Examples describe some additional aspects of the present disclosure. While aspects of the present disclosure are described in connection with the following examples and the corresponding text and figures, there is no intent to limit aspects of the present disclosure to this description. On the contrary, the intent is to cover all alternatives, modifications, and equivalents included within the spirit and scope of the present disclosure.
EXAMPLES
[0093] The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how the compounds, compositions, articles, devices and/or methods claimed herein are made and evaluated, and are intended to be purely exemplary of the disclosure and are not intended to limit the scope of what the inventors regard as their disclosure. Efforts have been made to ensure accuracy with respect to numbers (e.g., amounts, temperature, etc.), but some errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, temperature is in °C or is at ambient temperature, and pressure is at or near atmospheric.
Example 1: Construction of Cas9-Polymerase Fusion Proteins
[0094] Cas9-polymerase fusion proteins were constructed using techniques known in the art and purified using either Ni-NTA agarose or cation exchange resins, then separated by electrophoresis to determine size of the constructs and successful fusion (see FIGs. 1A-1C). Molecular weights of selected fusion proteins are shown in Table 1:
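[Table 1: molecular weights of selected Cas9-polymerase fusion proteins; provided as an image in the original publication]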
Example 2: Testing Cas9-Polymerase Fusion Proteins and Chimeric Guide Sequence
[0095] An assay was designed to test the disclosed system as seen in FIGs. 2-3. A plasmid was designed containing the red fluorescent reporter mCherry with a premature stop codon and eGFP, and was transduced into HEK293T cells using a lentiviral transduction system. GFP positive cells were selected and the disclosed method was conducted to remove the stop codon in mCherry. Removal of the stop codon allowed simultaneous expression of eGFP and mCherry in the cells (see FIGs. 4A-4F and 5A-5B). FIGs. 5A-5B show a comparison of a standard prime editing technique (FIG. 5A) and a disclosed genome editing technique (Cas9 nickase fused with T4 DNA polymerase; FIG. 5B). Quantification of editing efficiencies for 6 different DNA polymerases fused to Cas9 is shown in FIG. 6.
Example 3: A Survey of 13 DNA Polymerase-Mediated Editors for Targeted Genome Modification
[0096] To determine whether a nickase Cas9 (nCas9) fused to a DNA polymerase (DNAP) could perform precise genome editing when delivered as a ribonucleoprotein complex, nCas9 fusion proteins paired with a variety of wild-type polymerases from viral or bacterial origins (FIG. 7A) were first constructed. A total of 13 DNAPs with diverse properties such as thermostability, proofreading activity, processivity, and size were selected and screened (FIG. 7B). Each construct was expressed in E. coli and purified for ribonucleoprotein delivery. To evaluate the efficiency of CODE candidates, a HEK293T-based reporter cell line containing an open reading frame with a green fluorescent protein (GFP) upstream of a red fluorescent protein (mCherry) was generated. A premature termination codon (PTC) was also installed into the mCherry gene so that the cells display only green fluorescence. Upon successful A-to-G substitution within the PTC, the stop codon is converted to a CAA codon, which allows for mCherry translational readthrough.
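The codon arithmetic behind the reporter can be checked with a short sketch. The disclosure does not name the stop codon used; the code below assumes the standard genetic code and a single-base substitution, and it asks which stop codons lie one substitution away from CAA. It also shows how a coding-strand change reads on the complementary strand. The function names are hypothetical, and the conclusion is an inference rather than a statement from the disclosure.

```python
# Stop codons of the standard genetic code, written as coding-strand DNA.
STOP_CODONS = ("TAA", "TAG", "TGA")
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def stop_codons_one_edit_from(target="CAA"):
    """Map each stop codon that differs from target at exactly one
    position to (index, original_base, edited_base)."""
    hits = {}
    for stop in STOP_CODONS:
        diffs = [
            (i, s, t) for i, (s, t) in enumerate(zip(stop, target)) if s != t
        ]
        if len(diffs) == 1:
            hits[stop] = diffs[0]
    return hits


def opposite_strand_change(original_base, edited_base):
    """Express a coding-strand substitution as seen on the other strand."""
    return COMPLEMENT[original_base], COMPLEMENT[edited_base]


if __name__ == "__main__":
    for stop, (index, old, new) in stop_codons_one_edit_from("CAA").items():
        other_old, other_new = opposite_strand_change(old, new)
        print(
            f"{stop} -> CAA: position {index + 1}, {old}-to-{new} on the "
            f"coding strand, {other_old}-to-{other_new} on the opposite strand"
        )
```

Under these assumptions only TAA qualifies, and its T-to-C change on the coding strand corresponds to an A-to-G change on the complementary strand, which is one way to reconcile the A-to-G wording with a CAA product.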
[0097] Next, a cpegRNA was designed to target the constitutively expressed GFP-mCherry reporter gene containing the PTC. The cpegRNA comprises a 20-nt RNA targeting sequence, a guide RNA scaffold, and a 3’-end DNA extension sequence containing a primer binding site (PBS) and a DNAP template encoding the desired corrections (FIG. 7C). Each CODE candidate was complexed with the cpegRNA prior to delivery into HEK293T cells via nucleofection, and edit efficiency was quantified by the percentage of mCherry-positive cells (FIG. 7D).
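The layout of the cpegRNA described above can be sketched as a small data structure. This is only an illustrative sketch: the class name, field names, and all sequences are hypothetical placeholders rather than sequences from this disclosure, and the order of the template and PBS within the DNA extension (template first, PBS at the 3’ end, as in conventional pegRNAs) is an assumption.

```python
from dataclasses import dataclass

RNA_BASES = frozenset("ACGU")
DNA_BASES = frozenset("ACGT")


def _check(name, seq, alphabet):
    """Raise ValueError if seq is empty or uses bases outside the alphabet."""
    if not seq or not set(seq) <= alphabet:
        allowed = "".join(sorted(alphabet))
        raise ValueError(f"{name} must be a non-empty sequence over {allowed}")


@dataclass(frozen=True)
class ChimericPegRNA:
    """Hypothetical container: RNA spacer + RNA scaffold + 3' DNA extension."""
    spacer_rna: str    # 20-nt RNA targeting sequence
    scaffold_rna: str  # guide RNA scaffold
    template_dna: str  # DNAP template encoding the desired correction
    pbs_dna: str       # primer binding site (assumed to sit at the 3' end)

    def __post_init__(self):
        _check("spacer_rna", self.spacer_rna, RNA_BASES)
        _check("scaffold_rna", self.scaffold_rna, RNA_BASES)
        _check("template_dna", self.template_dna, DNA_BASES)
        _check("pbs_dna", self.pbs_dna, DNA_BASES)
        if len(self.spacer_rna) != 20:
            raise ValueError("spacer_rna must be 20 nt long")

    def segments(self):
        """Return (chemistry, sequence) pairs in 5'->3' order."""
        return [
            ("RNA", self.spacer_rna + self.scaffold_rna),
            ("DNA", self.template_dna + self.pbs_dna),
        ]

    def __len__(self):
        return sum(len(seq) for _, seq in self.segments())


# Placeholder sequences for illustration only.
example = ChimericPegRNA(
    spacer_rna="GACUGACUGACUGACUGACU",
    scaffold_rna="GUUUUAGAGCUA",
    template_dna="ACGTTGCA",
    pbs_dna="TTGACCGTAAGC",
)
```

Keeping the RNA and DNA portions as separate fields makes it possible to reject a uracil in the DNA extension (or a thymine in the RNA portion) at construction time.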
[0098] It was observed that many CODE candidates were able to induce mCherry activation, albeit with low efficiencies compared to the engineered prime editor PE2. Among the most efficient wild-type CODEs were those that employed polymerases from T4 and T5 bacteriophages, reaching 4.1% and 10.2% mCherry activation, respectively. Additionally, the large fragment Bst DNAP derived from Geobacillus stearothermophilus achieved 4.7%, while full-length Bst DNAP was even more effective at 11.1% mCherry activation. Prime editing-adjacent systems were recently reported that utilize DNA-dependent DNA polymerases such as Phi29 DNAP and Klenow fragment. Within the disclosed editing system, it was observed that CODE candidates employing DNA Polymerase I and the Klenow fragment exhibited moderate mCherry activation (3.8% and 4.5%, respectively) while Phi29 DNAP showed low editing activity (2.3%) (FIG. 7E). Although the chimeric editing system differs from these studies, the data presented herein support the reported abilities of Phi29 and the Klenow fragment to extend primed templates at a nick in the genomic DNA. Together, these data demonstrate that multiple DNA polymerases can utilize a DNA template within a chimeric pegRNA to perform precise edits in mammalian cells.
Example 4: Engineering T4 DNA Polymerase for Improved Editing Efficiency
[0099] For further engineering of CODEs for greater functionality and utility, T4 DNAP and Bst, large fragment DNAP (Bst-LF) were selected. T4 DNAP is a mesophilic polymerase that possesses strong 3’-5’ exonuclease activity. The impact of T4 DNAP location on CODE-T4 editing efficiency was studied by positioning the T4 DNAP either on the C-terminus (CODE-T4v1) or the N-terminus (CODE-T4v2) of nCas9, connected by a 33-amino acid linker (FIG. 8A). Little difference was observed in the percentage of mCherry activation between the N-terminal and C-terminal fusion constructs. Next, the effect of the 3’-5’ exonuclease activity of T4 DNAP on the performance of CODE was investigated (FIG. 8B). A Y320A mutation, which has been shown to diminish the polymerase exonuclease activity by 50-fold, was installed on the T4 DNAP of the CODE-T4v1 editor (CODE-T4v3). Notably, this single mutation increased the efficiency of mCherry activation 2.4-fold compared to the wild-type editor (FIG. 8B). A T4 Gene 32 Protein (gp32) was installed on the N-terminus of T4 DNA polymerase to generate a CODE-T4v4 editor (FIGs. 8A-8B). T4 gp32 is a single-stranded DNA-binding protein (SSB) that is crucial for T4 replication and repair. However, reduced activity was observed compared to CODE-T4v3, possibly due to gp32 SSB being inactive in a fusion format or being sterically hindered by nCas9 and T4 DNAP.
[0100] An sso7d DNA binding domain was then inserted at the C-terminus of the T4 DNAP of CODE-T4v1 and CODE-T4v3 to create CODE-T4v5 and CODE-T4v6. Sso7d, a DNA binding protein derived from Sulfolobus solfataricus, is known to greatly enhance the processivity of DNA polymerases. It was investigated whether this binding domain could improve the CODE-T4 editors. Interestingly, no significant improvement in mCherry activation was observed for these two CODEs compared to the original CODE-T4v1. The CODE-T4v6 editor, which possesses a Y320A mutation, exhibited a similar editing efficiency to CODE-T4v1. Since the sso7d DNA binding domain possesses some ribonuclease activity, it was reasoned that it could potentially degrade the cpegRNA and have a negative impact on CODE. This ribonuclease activity was therefore deactivated by introducing two mutations, E12L and K35L, into the sso7d domain to generate CODE-T4v7. However, this version of CODE performed poorly compared to CODE-T4v3, with a 3.8-fold decrease in efficiency (FIG. 8B).
[0101] Engineering the T4 DNAP itself was another focus. Mutations such as L412M and I50L have been shown to increase the processivity, although these substitutions cause a slight increase in replication errors. Combinations of these CODE-T4 mutants were therefore generated and tested in HEK293T cells. Notably, up to 20.7% mCherry activation was observed for the CODE-T4Y320A/L412M mutants, which is a 3.8-fold increase compared to CODE-T4v1. A boost in efficiency was also noted for the CODE-T4I50L/Y320A and CODE-T4G255S/Y320A mutants compared to the original CODE-T4v1 (FIG. 8C).
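Variant labels such as Y320A/L412M follow the usual wild-type residue, position, mutant residue convention. As a hedged illustration of how such labels could be handled programmatically, the sketch below parses a label and applies it to a protein sequence. The function names and the toy four-residue sequence are hypothetical, and 1-based residue numbering is assumed.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str
    position: int  # 1-based residue number (assumed convention)
    mutant: str


_TOKEN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitutions(label):
    """Parse a label such as 'Y320A/L412M' into Substitution records."""
    subs = []
    for token in label.split("/"):
        match = _TOKEN.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        subs.append(Substitution(match.group(1), int(match.group(2)), match.group(3)))
    return subs


def apply_substitutions(sequence, subs):
    """Return a mutated copy of sequence after checking each wild-type residue."""
    residues = list(sequence)
    for sub in subs:
        if not 1 <= sub.position <= len(residues):
            raise ValueError(f"position {sub.position} is outside the sequence")
        if residues[sub.position - 1] != sub.wild_type:
            raise ValueError(f"expected {sub.wild_type} at position {sub.position}")
        residues[sub.position - 1] = sub.mutant
    return "".join(residues)


double_mutant = parse_substitutions("Y320A/L412M")
toy_variant = apply_substitutions("MKYL", parse_substitutions("Y3A"))  # toy sequence
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors when a construct carries an N-terminal tag or fusion partner.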
[0102] It was hypothesized that the 3’-5’ exonuclease activity of T4 DNA polymerase might have a negative impact on the editing efficiency as it could remove bases at the newly synthesized 3’-flap. To test this hypothesis, CODE-T4D219A and CODE-T5D138A editors with deficient 3’-5’ exonuclease activity were generated and compared against the corresponding wild-type CODE-T4 and CODE-T5. Interestingly, no difference in mCherry activation was observed in HEK293T reporter cells, indicating that this 3’-5’ exonuclease activity might have minimal involvement in the prime editing process (FIG. 8D). Finally, the length of the amino acid linker between the nCas9 and T4 DNAP was optimized, and a 45-amino acid linker was found to exhibit the best mCherry activation efficiency (FIG. 8E).
Example 5: Engineering Bst DNA Polymerase for Improved Editing Efficiency
[0103] Bst-LF DNAP is a thermophilic DNA polymerase that has strong strand displacement activity; therefore, Bst-LF is often used in isothermal amplification technologies such as loop-mediated isothermal amplification (LAMP). Although wild-type Bst-LF is optimally active at high temperatures in amplification reactions, moderate editing efficiency was observed for Bst-LF and full-length Bst DNAP in HEK293T cells. Additionally, prime editor 2 (PE2), which utilizes an engineered M-MLV variant containing five point mutations that increase the thermostability and processivity, was shown to achieve a dramatic increase in prime editing efficiency compared to prime editor 1 (PE1), which contains a wild-type M-MLV reverse transcriptase. While Bst-LF DNAP is naturally a thermophilic enzyme, it was reasoned that enhancing its thermostability further might improve its overall performance inside cells. Multiple approaches were therefore explored to engineer Bst-LF mutants within the CODE systems to see if increasing the thermostability of Bst-LF DNAP would increase its activity in cells.
[0104] Paik et al. used a machine learning approach to generate a Bst DNAP variant with increased thermostability. The Bst-LF DNAP variant, referred to as Br512, consists of a modified 47 amino acid actin-binding protein called villin headpiece fused to the N-terminus of the Bst-LF. The fusion of the villin headpiece to Bst-LF DNAP is hypothesized to improve protein folding and increase processivity via stabilization of the DNA/protein complex. When the Br512 DNAP was employed in CODE (referred to as CODE-Bstv3), a nearly 3-fold increase in mCherry-positive cells was observed compared to wild-type Bst-LF. In an attempt to further increase efficiency, additional variants of Br512 engineered to have even stronger thermostability were tested. Br512g3.1 and Br512g3.2 variants differ from Br512 in the villin headpiece, where point mutations were rationally designed to supercharge and stabilize the domain (referred to as SC-vHP47). The Br512g3.1 variant bears 3 mutations (N31R, N39K, E43K) on the SC-vHP47, whereas the Br512g3.2 bears 4 mutations (A20K, N31R, N39K, E43K). Furthermore, these two Bst-LF DNAP variants bear additional mutations within the polymerase domain (T493N, A552G, S371D), resulting in a significant increase in thermostability over Br512 while maintaining high functionality in LAMP reactions up to 74 °C. By incorporating these Br512g3.1 and Br512g3.2 DNAP variants into the chimeric oligonucleotide-directed editors, the CODE-Bstv5 and CODE-Bstv6 systems were generated. Notably, CODE-Bstv5 and CODE-Bstv6 resulted in 23.6% and 33.4% efficiency in installing the desired edits, approximately 5-fold and 7-fold increases in mCherry activation compared to the original CODE-Bstv1 system.
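The fold increases quoted above can be reproduced with simple arithmetic. The sketch below assumes that the CODE-Bstv1 baseline corresponds to the 4.7% mCherry activation reported for wild-type Bst-LF in Example 3; that mapping is an assumption made for illustration, and the function and variable names are hypothetical.

```python
def fold_change(edited_pct, baseline_pct):
    """Ratio of two percentages of mCherry-positive cells."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return edited_pct / baseline_pct


# Assumed baseline: wild-type Bst-LF editor (taken here as CODE-Bstv1), in percent.
BASELINE_BST_LF_PCT = 4.7

# Percent mCherry activation reported for the engineered editors.
variants_pct = {"CODE-Bstv5": 23.6, "CODE-Bstv6": 33.4}

folds = {
    name: round(fold_change(pct, BASELINE_BST_LF_PCT), 1)
    for name, pct in variants_pct.items()
}
```

Under this assumption the ratios come out at about 5.0 and 7.1, consistent with the approximately 5-fold and 7-fold increases stated in the text.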
Example 6: In-House Synthesis of Chimeric pegRNA Improves Editing Efficiency
[0105] Traditionally, a pegRNA consists of 100% RNA bases and therefore can be synthesized either by enzymatic or chemical synthesis reactions. This advantage provides flexibility to deliver prime editing systems into cells as well as animal models. Oftentimes, it is convenient to co-deliver plasmids encoding prime editors and pegRNAs driven by a U6 promoter. Since the cpegRNA is a chimeric entity consisting of both RNA and DNA bases, it cannot be synthesized by naturally occurring enzymes. Instead, cpegRNAs must be chemically synthesized, which poses challenges for delivery approaches and associated synthesis costs.
[0106] To address this hurdle, an inexpensive ligation-based method was developed to synthesize cpegRNAs. Taking advantage of T4 RNA ligase I, which can ligate a 3’-hydroxyl RNA to a 5’-phosphorylated DNA, a synthesis protocol was established for generating full-length cpegRNA (FIG. 9A). A benefit of this ligation reaction is that one can easily make a multitude of edit types at the same genomic locus by modifying the single-stranded DNA oligos, each of which utilizes the same single-guide RNA. This approach drastically decreases the synthesis cost compared to synthetic cpegRNA. Additionally, four modified 2'-O-methylated uracils were placed at the end of the single guide RNA to improve stability and reduce sgRNA scaffold incorporation in the cells (FIG. 9B). In-house synthesized cpegRNAs were tested with CODE-Bstv6, and an improved performance with 43.7% mCherry activation was noted (FIGs. 9C-9D).
Example 7: Efficient Editing of Endogenous Gene Loci with CODEMax and CODEMax(exo+)
[0107] How efficiently CODE performs when targeting endogenous genes was next investigated. Using nucleofection strategies, a significant reduction in editing efficiency was observed for CODE-Bstv6, possibly because the chimeric nature of the cpegRNA initiates a cellular response that leads to its degradation. Alternative methods to deliver CODEs were then explored by comparing RNP, mRNA, and plasmid approaches. Herein it is proposed that encapsulating the cpegRNA with transfection reagents could provide it with some protection from the cellular environment. After several rounds of optimization, it was shown that plasmids encoding CODEs and synthetic cpegRNAs can be co-delivered into mammalian cells via transfection (FIGs. 9E-9G, 11A-11F).
[0108] To enhance editing efficiency, two additional nCas9 mutations (R221K and N394K) that were developed by Chen et al. to convert PE2 into PEMax were adopted for this work. This version of chimeric oligonucleotide-directed editing was termed CODEMax, which is a combination of the engineered CODE-Bstv6 and a mutated nCas9 variant (R221K and N394K). Furthermore, the initial screening of CODE candidates (FIG. 7E) showed better performance of a chimeric oligonucleotide-directed editor using full-length Bst DNAP compared to that with Bst-LF. Interestingly, the difference between the truncated and full-length polymerase is the absence of a 5’-3’ exonuclease domain in the truncated form. It was hypothesized that a polymerase that possesses strong 5’-3’ exonuclease activity may improve the editing efficiency (FIGs. 10A-10B, 12A-12B). In prime editing-like systems, the 3’ flap generated after extension by the polymerase enters competition for incorporation into the genome with the 5’ unedited flap. Therefore, having a polymerase with 5’-3’ exonuclease activity that can displace and degrade the 5’ unedited strand during extension may be beneficial (FIG. 10B). In support of this hypothesis, Liang et al. have demonstrated that the fusion of a T5 exonuclease at the N-terminus of the M-MLV reverse transcriptase increased efficiency of the PE2 system in plants. Therefore, the 5’-3’ exonuclease domain was incorporated back into CODEMax, hereafter referred to as CODEMax(exo+).
[0109] CODEMax and CODEMax(exo+) were tested by targeting multiple genomic regions with a variety of edit types such as base conversion and transversion, short insertion, and short deletion. Enhanced editing efficiency of CODEMax and CODEMax(exo+) was observed compared to PE2 and PEMax at several loci such as EMX1, FANCF, SRD5A3 and DNMT1, with HEK3 and MECP2 being exceptions (FIGs. 10C-10H, 13A-15). It should be noted that these data were head-to-head comparisons of synthetic pegRNA/cpegRNA and protein-encoding plasmid delivered via transfection. However, expression of pegRNA under a U6 promoter resulted in higher efficiency for PE2 and PEMax compared to delivery of synthetic pegRNA (FIGs. 16A-16C). With the addition of a nicking guide, which significantly increases edit efficiency for traditional prime editors by evading mismatch repair (MMR), improved editing was also observed for CODE systems, suggesting a conserved presence of heteroduplex intermediates (FIGs. 10E-10H, 14A-14D). Additionally, CODEMax(exo+) outperformed CODEMax at a majority of edit sites. Lastly, it was demonstrated that CODEMax and CODEMax(exo+) exhibited low unintended scaffold incorporation like that seen for PEMax at certain edit sites (FIGs. 17A-17C). On the other hand, comparable amounts of total unintended edits were observed between PE and CODE systems (FIGs. 18A-18D). Useful forward and reverse primers are found in Table 1: Reverse Primers
Figure imgf000031_0001
Figure imgf000031_0002
Figure imgf000032_0001
[0110] Useful pegRNA/cpegRNA sequences are found in Tables 2-4:
Figure imgf000032_0002
Figure imgf000033_0002
Figure imgf000033_0003
Figure imgf000033_0001
Figure imgf000034_0001
[0111] Useful Nicking guide RNA (ngRNA) sequences are found in Tables 5-6:
Figure imgf000034_0002
Figure imgf000034_0003
Figure imgf000035_0001
[0112] Exemplary amino acid sequences are found in Table 7:
Figure imgf000035_0002
Figure imgf000036_0002
Figure imgf000036_0001
Figure imgf000037_0001
Figure imgf000038_0001
Example 8: Discussion
[0113] This study demonstrates that chimeric oligonucleotide-directed editors are effective genome editors but with a distinctly new approach in protein engineering and cpegRNA design, establishing a new class of Cas9-based editing tools. The two-component CODE system was developed by screening 13 diverse polymerases and selecting the best candidates for further engineering, resulting in a thermophilic Bst DNA polymerase with a robust strand displacement capability and 5’-3’ exonuclease activity. It was hypothesized that the strand displacement property promotes genome strand invasion during R-loop formation of nickase Cas9 at the target site, allowing for enhanced polymerization. Additionally, the 5’-3’ exonuclease activity of the Bst DNAP supports the degradation of the 5’-flap which leverages the incorporation of the newly polymerase-mediated extension of the 3’-flap into the genome, favoring the edit-incorporating outcomes.
[0114] The engineered CODE systems allow for further development of a whole new class of prime editing systems utilizing DNA-dependent DNA polymerases. DNA polymerases are abundant and diverse across all three domains of life. The specific properties of wild-type DNA polymerases that can be beneficial for prime editing-like systems include thermostability, processivity, proofreading ability, and 5’ to 3’ exonuclease activity. As shown herein, further engineering of wild-type DNA polymerases can improve editing outcomes, but engineering efforts may also be directed towards specific applications. For example, a highly processive DNA polymerase may be required for longer insertions; however, for simpler edits, a polymerase with higher fidelity may be favored. Doman et al. have shown that different reverse transcriptase proteins perform better depending on the edit type and location. Having a diverse toolbox of prime editors utilizing reverse transcriptase or DNA polymerase-based editors enables broad applications for genome engineering.
Example 9: Materials and Methods
General cloning methods and plasmid construction
[0115] CODE gene fragments were either obtained from Addgene or synthesized by Twist Biosciences. Bacterial and mammalian expression plasmids for CODEs were cloned using In-Fusion® cloning (Takara Bio, Cat# 638948). Q5 high-fidelity polymerase (New England Biolabs, Cat# M0491L) was used to amplify gene fragments and non-lentiviral backbone for cloning as well as genomic DNA for deep sequencing. For In-Fusion® cloning involving the assembly of lentiviral backbones, PrimeSTAR® GXL DNA Polymerase (Takara Bio, Cat# R050A) was used for amplification. For transfection into mammalian cells, plasmids expressing CODEs were prepared using ZymoPure II Plasmid Midiprep kit (Zymo Research, Cat# D4201) and diluted down to 1 mg/µL prior to transfection. SEQ ID NO. 72 provides an exemplary sequence for pCMV-CODEMax-P2A-eGFP and SEQ ID NO. 73 provides an exemplary sequence for pCMV-CODEMax(exo+)-P2A-eGFP.
Chimeric pegRNA synthesis via T4 RNA ligase-mediated ligation
[0116] All pegRNAs, sgRNAs, and 5’ phosphorylated ssDNAs were purchased from IDT. cpegRNAs were also purchased from IDT unless otherwise indicated that they were produced via ligation of 5’ phosphorylated ssDNA to sgRNAs. Ligations were prepared using T4 RNA Ligase I (New England Biolabs, Cat# M0204) as follows: 3.5 µL T4 RNA Ligase Buffer, 4.5 µL PEG8000, 2 µL T4 RNA Ligase I, 3 µL 10 mM ATP, 2 µL 100 µM sgRNA, 5 µL 100 mM 5’ phosphorylated ssDNA, 0.25 µL Murine RNase Inhibitor (New England Biolabs, Cat# M0314), and 14.75 µL water. Reaction volumes greater than 35 µL led to decreased ligation efficiencies. Reactions were incubated at 16 °C for 16 hours and purified with Monarch® RNA Cleanup Kits (New England Biolabs, Cat# T2030) according to manufacturer instructions. Products were analyzed via 10% TBE-Urea PAGE electrophoresis (Biorad, Cat# 3450088).
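As a quick consistency check on the ligation recipe above, the component volumes can be summed and a few final concentrations derived. This is a minimal sketch that takes the listed volumes as microliters; the dictionary keys, helper names, and the choice of which final concentrations to compute are illustrative assumptions.

```python
# Component volumes in microliters, as listed for one ligation reaction.
LIGATION_MIX_UL = {
    "T4 RNA Ligase Buffer": 3.5,
    "PEG8000": 4.5,
    "T4 RNA Ligase I": 2.0,
    "10 mM ATP": 3.0,
    "100 uM sgRNA": 2.0,
    "5' phosphorylated ssDNA": 5.0,
    "Murine RNase Inhibitor": 0.25,
    "water": 14.75,
}

MAX_REACTION_VOLUME_UL = 35.0  # larger volumes reportedly reduce ligation efficiency


def total_volume_ul(mix):
    """Sum of all component volumes in microliters."""
    return sum(mix.values())


def final_concentration(stock, volume_ul, total_ul):
    """Dilution formula C_final = C_stock * V_added / V_total (units follow the stock)."""
    return stock * volume_ul / total_ul


total_ul = total_volume_ul(LIGATION_MIX_UL)
atp_final_mM = final_concentration(10.0, LIGATION_MIX_UL["10 mM ATP"], total_ul)
sgrna_final_uM = final_concentration(100.0, LIGATION_MIX_UL["100 uM sgRNA"], total_ul)
sgrna_pmol = 100.0 * LIGATION_MIX_UL["100 uM sgRNA"]  # uM x uL = pmol
```

The components add up to exactly 35 µL, which matches the stated upper limit for the reaction volume, and each reaction contains 200 pmol of sgRNA at roughly 5.7 µM.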
CODE protein expression and purification
[0117] Bacterial expression plasmids carrying CODEs were transformed into homemade competent cells propagated from Rosetta™ 2(DE3)pLysS Singles™ Competent Cells (Millipore Sigma, Cat# 71401). Individual colonies were picked and inoculated in 50 mL Luria Broth (Fisher Scientific, Cat# BP9723-2) overnight at 37 °C. The culture was then scaled up to 4-12 liters of Terrific Broth (RPI, T15000-10000.0) and grown until OD = 0.8-1.0. The culture was then quickly cooled on ice for 10-15 minutes and induced with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) (Gold Biotechnology, Cat# I2481C100). For CODE constructs that were built based on the pET-PE2-His backbone (Addgene, #170103), the culture was induced at 18 °C for 5 hours followed by 26 °C for 14-18 hours. For CODE constructs that were built based on the PE-Max-pET21a backbone (Addgene, #204471), the culture was induced at 18 °C for 16-18 hours.
[0118] Cell pellets were collected the next day by centrifugation (4000 x g for 10 minutes), suspended in 100-150 mL lysis buffer (500 mM NaCl, 50 mM Tris-HCl pH = 7.5, 1 mM TCEP-HCl, 20 mM imidazole, and 5% glycerol) followed by sonication. The lysate was centrifuged at 40,000 x g for 45 minutes before passing through a 0.45 µm filter. The clarified lysate was then injected into a prepacked Ni-NTA affinity column (EconoFit Nuvia IMAC Column, Biorad #12009287) in a FPLC (NGC Quest Plus, Biorad) pre-equilibrated with lysis buffer. Proteins were eluted from the column with 40 mL of elution buffer (500 mM NaCl, 50 mM Tris-HCl pH = 7.5, 1 mM TCEP-HCl, 300 mM imidazole, and 5% glycerol). The eluted solution was concentrated in an Amicon® Ultra Centrifugal Filter, 50 kDa MWCO (Millipore Sigma, UFC905024) down to 10-15 mL and equilibrated with 40 mL of Buffer A (200 mM NaCl, 50 mM Tris-HCl pH = 7.5, 1 mM TCEP-HCl, and 5% glycerol). The protein mixture was then passed through a 5 mL HiTrap Heparin HP column (Cytiva, Cat# 17040701) pre-equilibrated with Buffer A. The column then underwent gradient elution from Buffer A to Buffer B (2000 mM NaCl, 50 mM Tris-HCl pH = 7.5, 1 mM TCEP-HCl, and 5% glycerol). The purest fractions of the protein were pooled together and concentrated in an Amicon® Ultra Centrifugal Filter (50 kDa MWCO) in final buffer C (500 mM NaCl, 50 mM Tris-HCl pH = 7.5, 1 mM TCEP-HCl, and 5% glycerol) before storing at -80 °C. When in use, the protein was diluted in storage buffer (300 mM NaCl, 10 mM Tris-HCl, 0.1 mM EDTA, 1 mM DTT, 50% glycerol, 0.1% Triton® X-100, final pH = 7.4 at 25 °C) down to 50 µM and stored at -20 °C.
Mammalian cell culture
[0119] HEK293T and Lenti-X™ 293T were obtained from ATCC (CRL-3216) and Takara Bio (#632180), respectively. U2OS cells were obtained from ATCC (HTB-96). The cells were tested for mycoplasma using MycoAlert® Mycoplasma Detection Kit (Lonza, Cat# LT07-118). The cells were cultured and passaged in D10 medium containing DMEM high glucose with GlutaMAX™ supplement and pyruvate (Gibco, Cat# 10569044), 10% Fetal Bovine Serum (Gibco, Cat# A3160902), 1X Penicillin-Streptomycin (Gibco, Cat# 15140122), and 1X MEM non-essential amino acids (Gibco, 11140035). All cell lines were incubated at 37 °C and 5% CO2.
Reporter cell line and stably expressed CODE cell line generation
[0120] For lentiviral packaging in a T75 flask, 10 µg of transfer plasmid was co-transfected with 5 µg of pMD2.G (Addgene, #12259) and 7.5 µg of psPAX2 (Addgene, #12260) into Lenti-X™ 293T cells with 50 µL of Lipofectamine 3000 and 42 µL P3000 Enhancer reagents (ThermoFisher, Cat# L3000008), which were diluted in Opti-MEM™ I Reduced Serum Medium (Gibco, 31985062) following the manufacturer instructions. The medium was changed to D10 medium six hours later, and the cells were incubated for an additional 48-60 hours. The cells were harvested and pelleted down via centrifugation at 4000 x g for 10 minutes at 4 °C. The supernatant was passed through a 0.45 µm filter, aliquoted, and stored at -80 °C until use.
[0121] For lentiviral transduction, 5 x 10^5 HEK293T cells were infected with multiple dilutions of viral supernatant via reverse transduction. Briefly, viral supernatant was added to a 6-well plate first. The cells were then counted and resuspended in D10 medium supplemented with 10 µg/mL of TransduceIT™ Transduction Reagent (Mirus Bio, Cat# MIR6620) and transferred to the wells pre-added with viral supernatant. The cells were incubated for 72 hours before flow cytometry sorting and/or antibiotic selection.
Mammalian cell RNP nucleofection and plasmid transfection
[0122] Ribonucleoprotein (RNP) complexes were delivered into HEK293T cells via a 4D-Nucleofector® X Unit (Lonza, Cat# AAF-1003X) in a strip format using SF Cell Line 4D-Nucleofector™ X Kit S (Lonza, Cat# V4XC-2032). Purified PEs and CODEs were complexed with either pegRNA or cpegRNA to form RNP at a ratio of 50 pmol protein to 200 pmol peg/cpegRNA for 15 minutes at room temperature. Around 2x10^5 cells were resuspended in Lonza SF buffer, mixed with the RNP, and electroporated using the program ED-130. The mixture was then incubated at 37 °C and 5% CO2 for 10 minutes before adding to a 48-well plate pre-added with DMEM medium supplemented with 10% Fetal Bovine Serum (no antibiotic). Cells were harvested after 72 hours.
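The RNP amounts above translate into pipetting volumes once stock concentrations are fixed. The sketch below assumes the protein working stock is at 50 µM, as described in the purification section, and assumes a 100 µM guide stock purely for illustration; the guide stock concentration and all names are hypothetical.

```python
def volume_ul(amount_pmol, stock_uM):
    """Volume needed for a given amount; 1 uM equals 1 pmol per microliter."""
    if stock_uM <= 0:
        raise ValueError("stock concentration must be positive")
    return amount_pmol / stock_uM


PROTEIN_PMOL = 50.0   # per nucleofection, from the protocol
GUIDE_PMOL = 200.0    # pegRNA or cpegRNA per nucleofection, from the protocol

PROTEIN_STOCK_UM = 50.0   # assumed working stock concentration
GUIDE_STOCK_UM = 100.0    # hypothetical stock concentration, for illustration only

protein_ul = volume_ul(PROTEIN_PMOL, PROTEIN_STOCK_UM)
guide_ul = volume_ul(GUIDE_PMOL, GUIDE_STOCK_UM)
guide_to_protein = GUIDE_PMOL / PROTEIN_PMOL  # molar excess of guide over protein
```

With these assumptions each reaction uses 1 µL of protein and 2 µL of guide, at a 4:1 molar excess of guide over protein.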
[0123] Plasmids encoding prime editors and chimeric editors and synthetic pegRNA/cpegRNA were delivered to HEK293T cells utilizing the TransIT-X2® Dynamic Delivery System (Mirus Bio, Cat# MIR6000). Twenty-four hours prior to transfection, cells were seeded at a density of 2x10^5 cells per well in 24-well plates. Immediately prior to transfection, the media was replaced with antibiotic-free D10 media. 1500 ng of chimeric/prime editor plasmid, 750 ng of synthetic cpegRNA/pegRNA, and 500 ng of ngRNA plasmid (if applicable) were complexed for 20 minutes with 6 µL of TransIT-X2 in 50 µL of Opti-MEM™ Reduced Serum Medium (Gibco, Cat# 31985062). After complexing, reactions were added dropwise to each well and harvested after 72 hours.
Genomic DNA library preparation and targeted amplicon deep sequencing
[0124] Genomic DNA was extracted from the HEK293T and U2OS cells using the QuickExtract™ DNA Extraction Solution (Biosearch Technologies) system according to the manufacturer’s instructions. The DNA was then amplified in the first round of PCR using Q5 DNA Polymerase (NEB), and Illumina barcodes were appended during a second PCR. The products were then gel extracted, pooled together, and loaded on an Illumina MiSeqDx using a MiSeq Reagent Nano Kit v2 (Illumina, Cat# MS-101-1001) according to the manufacturer’s protocol. CRISPResso2 was used to determine the percentage of precise editing and indels. The quantification window was defined by the parameter “-qwc” spanning 10 base pairs upstream and downstream flanking the targeting sequence in cases where there was no nicking guide, and 10 base pairs flanking the targeting sequence and the nicking guide in cases where a nicking guide was used.
[0125] It should be emphasized that the above-described embodiments of the present disclosure are merely possible examples of implementations set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
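A quantification window of this kind can be derived from the amplicon and the targeting sequence. The following is a rough sketch for the case without a nicking guide. The helper names are hypothetical, the sequences are placeholders, and the 0-based inclusive "start-end" string is an assumed coordinate convention that should be checked against the CRISPResso2 documentation before use.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def quantification_window(amplicon, target, flank=10):
    """Return (start, end) covering the target plus `flank` bp on each side.

    Coordinates are 0-based and inclusive (assumed convention) and are
    clipped to the amplicon boundaries.
    """
    amplicon, target = amplicon.upper(), target.upper()
    pos = amplicon.find(target)
    if pos < 0:
        # The targeting sequence may lie on the opposite strand.
        pos = amplicon.find(reverse_complement(target))
    if pos < 0:
        raise ValueError("targeting sequence not found in amplicon")
    start = max(0, pos - flank)
    end = min(len(amplicon) - 1, pos + len(target) - 1 + flank)
    return start, end


def as_window_string(window):
    """Format a window as 'start-end'."""
    return f"{window[0]}-{window[1]}"


# Placeholder amplicon: 15 nt of padding on each side of a 20-nt target.
target = "GACTGACTGACTGACTGACT"
amplicon = "A" * 15 + target + "C" * 15
window = as_window_string(quantification_window(amplicon, target))
```

For a 20-nt targeting sequence the resulting window spans 40 positions unless it is clipped by the amplicon ends.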
REFERENCES
1. Abdus Sattar, A. K. et al. Functional consequences and exonuclease kinetic parameters of point mutations in bacteriophage T4 DNA polymerase. Biochemistry 35, 16621-16629 (1996).
2. Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature (2024).
3. Anzalone, A. V. et al. Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nat Biotechnol 40, 731-740 (2022).
4. Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149-157 (2019).
5. Chen, P. J. et al. Prime editing for precise and highly versatile genome manipulation. Nat Rev Genet 24, 161-177 (2023).
6. Chen, P. J. et al. Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell 184, 5635-5652 e5629 (2021).
7. Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat Biotechnol 37, 224-226 (2019).
8. da Silva, J. F. et al. Click editing enables programmable genome writing using DNA polymerases and HUH endonucleases. bioRxiv (2023). doi:10.1101/2023.09.12.557440
9. Doman, J. L. et al. Phage-assisted evolution and protein engineering yield compact, efficient prime editors. Cell 186, 3983-4002 e3926 (2023).
10. Frey, M. W. et al. Construction and characterization of a bacteriophage T4 DNA polymerase deficient in 3'→5' exonuclease activity. Proc Natl Acad Sci U S A 90, 2579-2583 (1993).
11. Gould, S. I. Prime editing sensors enable multiplexed genome editing. Nat Rev Genet 25, 454 (2024).
12. Halperin, S. O. et al. CRISPR-guided DNA polymerases enable diversification of all nucleotides in a tunable window. Nature 560, 248-252 (2018).
13. Jordan, C. S. et al. Regulation of the bacteriophage T4 Dda helicase by Gp32 single-stranded DNA-binding protein. DNA Repair (Amst) 25, 41-53 (2015).
14. Leavitt, M. C. et al. T5 DNA polymerase: structural-functional relationships to other DNA polymerases. Proc Natl Acad Sci U S A 86, 4465-4469 (1989).
15. Lesnik, E. A. et al. Relative thermodynamic stability of DNA, RNA, and DNA:RNA hybrid duplexes: relationship with base composition and structure. Biochemistry 34, 10807-10815 (1995).
16. Li, V. et al. Identification of a new motif in family B DNA polymerases by mutational analyses of the bacteriophage T4 DNA polymerase. J Mol Biol 400, 295-308 (2010).
17. Liang, Z. et al. Addition of the T5 exonuclease increases the prime editing efficiency in plants. J Genet Genomics 50, 582-588 (2023).
18. Liu, B. et al. Targeted genome editing with a DNA-dependent DNA polymerase and exogenous DNA-containing templates. Nat Biotechnol 42, 1039-1045 (2023).
19. Liu, P. et al. Improved prime editors enable pathogenic allele correction and cancer modelling in adult mice. Nat Commun 12, 2121 (2021).
20. Nelson, J. W. et al. Engineered pegRNAs improve prime editing efficiency. Nat Biotechnol 40, 402-410 (2022).
21. Notomi, T. et al. Loop-mediated isothermal amplification of DNA. Nucleic Acids Res 28, E63 (2000).
22. Oscorbin, I. et al. Bst polymerase - a humble relative of Taq polymerase. Comput Struct Biotechnol J 21, 4519-4535 (2023).
23. Paik, I. et al. Charge Engineering Improves the Performance of Bst DNA Polymerase Fusions. ACS Synth Biol 11, 1488-1496 (2022).
24. Pant, K. et al. The role of the C-domain of bacteriophage T4 gene 32 protein in ssDNA binding and dsDNA helix-destabilization: Kinetic, single-molecule, and cross-linking studies. PLoS One 13, e0194357 (2018).
25. Petri, K. et al. CRISPR prime editing with ribonucleoprotein complexes in zebrafish and primary human cells. Nat Biotechnol 40, 189-193 (2022).
26. Ponnienselvan, K. et al. Reducing the inherent auto-inhibitory interaction within the pegRNA enhances prime editing efficiency. Nucleic Acids Res 51, 6966-6980 (2023).
27. Qi, R. et al. Mutant T4 DNA polymerase for easy cloning and mutagenesis. PLoS One 14, e0211065 (2019).
28. Reha-Krantz, L. J. et al. Motif A of bacteriophage T4 DNA polymerase: role in primer extension and DNA replication fidelity. Isolation of new antimutator and mutator DNA polymerases. J Biol Chem 269, 5635-5643 (1994).
29. Reha-Krantz, L. J. et al. Engineering processive DNA polymerases with maximum benefit at minimum cost. Front Microbiol 5, 380 (2014).
30. Shehi, E. et al. The Sso7d DNA-binding protein from Sulfolobus solfataricus has ribonuclease activity. FEBS Lett 497, 131-136 (2001).
31. Vats, S. et al. Prime Editing in Plants: Prospects and Challenges. J Exp Bot, erae053 (2024).
32. Wang, Y. et al. A novel strategy to engineer DNA polymerases for enhanced processivity and improved performance in vitro. Nucleic Acids Res 32, 1197-1207 (2004).
33. Yarnall, M. T. N. et al. Drag-and-drop genome insertion of large sequences without double-strand DNA cleavage using CRISPR-directed integrases. Nat Biotechnol 41, 500-512 (2023).
34. Zeng, H. et al. Recent advances in prime editing technologies and their promises for therapeutic applications. Curr Opin Biotechnol 86, 103071 (2024).
35. Zhang, W. et al. Enhancing CRISPR prime editing by reducing misfolded pegRNA interactions. bioRxiv (2023). doi:10.1101/2023.08.14.553324
36. Zhao, Z. et al. Prime editing: advances and therapeutic applications. Trends Biotechnol 41, 1000-1012 (2023).

Claims

What is claimed is:
1. A system for site-specific modification of a double-stranded target DNA sequence, the system comprising:
(a) a fusion protein comprising a DNA-binding nickase protein and a DNA polymerase,
(b) a chimeric guide nucleic acid sequence (cpegRNA) comprising at least one region of complementarity to a first strand of the double-stranded target DNA sequence, a core sequence that interacts with the DNA-binding nickase protein, a template encoding a modified sequence for insertion into a second strand of the double-stranded target DNA sequence, and a primer sequence.
2. The system of claim 1, wherein the DNA-binding nickase protein comprises a Cas9 nickase, a Cas12i1 nickase, a Cas12a nickase, an ortholog thereof, or any combination thereof.
3. The system of claim 2, wherein the DNA-binding nickase protein is derived from a Streptococcus pyogenes Cas9, a Staphylococcus aureus Cas9, a Neisseria meningitidis Cas9, a Streptococcus thermophilus Cas9, an Actinobacillus minor Cas9, an Actinobacillus pleuropneumoniae Cas9, an Actinobacillus seminis Cas9, an Actinobacillus succinogenes Cas9, a Bergeriella denitrificans Cas9, a Conservatibacter flavescens Cas9, a Gallibacterium anatis Cas9, a Haemophilus felis Cas9, a Haemophilus parainfluenzae Cas9, a Haemophilus pittmaniae Cas9, a Haemophilus sputorum Cas9, a Mannheimia granulomatis Cas9, a Neisseria animalis Cas9, a Neisseria animaloris Cas9, a Neisseria arctica Cas9, a Neisseria bacilliformis Cas9, a Neisseria dentiae Cas9, an Otariodibacter oris Cas9, a Pasteurella aerogenes Cas9, a Pasteurella langaaensis Cas9, a Pasteurella mairii Cas9, a Pasteurellaceae bacterium Cas9, a Phocoenobacter uteri Cas9, a Rodentibacter pneumotropicus Cas9, a Simonsiella muelleri Cas9, a Suttonella indologenes Cas9, a Treponema denticola Cas9, a Moraxella bovoculi Cas12a, or an engineered AsCas12a.
4. The system of claim 2, wherein the Cas9 protein is engineered to contain an R221K mutation, an N394K mutation, or both.
5. The system of claim 1, wherein the chimeric guide nucleic acid sequence comprises both RNA and DNA.
6. The system of claim 1, wherein in the chimeric guide nucleic acid sequence, the core sequence comprises RNA.
7. The system of claim 1, wherein in the chimeric guide nucleic acid sequence, the template incorporating a modified sequence for insertion comprises DNA.
8. The system of claim 1, wherein the template is from about 1 to about 5000 nucleotides in length.
9. The system of claim 1, wherein the template is from about 3 to about 500 nucleotides in length.
10. The system of claim 1, wherein the primer sequence comprises a primer for the DNA polymerase.
11. The system of claim 1, wherein the primer sequence comprises RNA.
12. The system of claim 1, wherein the primer is from about 4 to about 20 nucleotides in length.
13. The system of claim 1, wherein the primer is from about 10 to about 20 nucleotides in length.
14. The system of claim 1, wherein the DNA polymerase comprises E. coli DNA polymerase I (Ecol), T4 DNA polymerase, T5 DNA polymerase, T7 DNA polymerase, thioredoxin binding domain of T7 DNA polymerase (T7/Trx), Klenow fragment DNA polymerase, Φ29 DNA polymerase, Bsu DNA polymerase, Bst DNA polymerase large fragment (LF), Bst DNA polymerase full length (F), Pfu DNA polymerase, Pwo DNA polymerase, Stoffel DNA polymerase, or any combination thereof.
15. The system of claim 14, wherein in the fusion protein, the T4 DNA polymerase is located at a C-terminus or an N-terminus of the DNA-binding nickase protein, wherein a 33 amino acid linker connects the T4 DNA polymerase to the DNA-binding nickase protein.
16. The system of claim 15, wherein the T4 DNA polymerase comprises at least one mutation.
17. The system of claim 16, wherein the at least one mutation is a Y320A mutation, an L412M mutation, an I50L mutation, a G255S mutation, or any combination thereof.
18. The system of claim 16, wherein the at least one mutation comprises addition of an sso7d DNA binding domain at a C-terminus of the T4 DNA polymerase, optionally wherein the sso7d DNA binding domain comprises an E12L mutation, a K35L mutation, or both.
19. The system of claim 14, wherein the Bst DNA polymerase LF comprises at least one fusion domain.
20. The system of claim 19, wherein the at least one fusion domain comprises an actin-binding protein.
21. The system of claim 20, wherein the actin-binding protein comprises villin headpiece.
22. The system of claim 19, wherein the fusion domain is fused to an N-terminus of the Bst DNA polymerase LF.
23. The system of claim 14, wherein the Bst DNA polymerase LF comprises a polymerase domain mutation.
24. The system of claim 23, wherein the polymerase domain mutation comprises a T493N mutation, an A552G mutation, an S371D mutation, or any combination thereof.
25. The system of claim 23, wherein the polymerase domain mutation comprises increased thermostability relative to a wild type Bst DNA polymerase LF.
26. The system of claim 21, wherein the villin headpiece comprises at least one additional mutation.
27. The system of claim 26, wherein the at least one additional mutation comprises N31R, N39K, E43K, A20K, or any combination thereof.
28. The system of claim 1, wherein the DNA polymerase is optimized to operate at a temperature of from about 30 °C to about 40 °C.
29. The system of claim 28, wherein the DNA polymerase is optimized to operate at a temperature of about 37 °C.
30. The system of claim 1, wherein the modified sequence for insertion comprises at least one mutation.
31. The system of claim 30, wherein the at least one mutation comprises an insertion, a deletion, a base substitution, or any combination thereof.
32. A method for site-specific modification of a double-stranded target DNA sequence in a cell, the method comprising contacting the double-stranded target DNA sequence with the system of any one of claims 1-17; wherein the core sequence binds the DNA-binding nickase protein; wherein the DNA-binding nickase protein nicks the second strand of the double-stranded target DNA sequence to form a free 3' end; wherein the primer sequence anneals with a complementary region on the second strand of the double-stranded target DNA; and wherein the DNA polymerase synthesizes a single strand of DNA encoded by the template from the free 3' end.
33. The method of claim 32, wherein the cell comprises a prokaryotic cell or a eukaryotic cell.
34. The method of claim 33, wherein the eukaryotic cell is a mammalian cell.
35. The method of claim 32, wherein the cell is a dividing cell or a non-dividing cell.
36. The method of claim 32, wherein the system is introduced to the cell using nucleofection.
37. The method of claim 32, wherein performing the method results in less than 10% off-target editing in a genome of the cell.
38. The method of claim 32, wherein the fusion protein and the chimeric guide nucleic acid sequence are present in amounts effective to modify genomic DNA in the cell.
39. The method of claim 32, wherein performing the method results in incorporation of the modified sequence in at least 25% of a population of cells contacted with the system.
40. The method of claim 32, wherein the method does not introduce double strand breaks into cellular DNA.
41. A cell comprising at least one genomic modification introduced by the method of claim 32.
42. A fusion protein comprising:
(a) a first part comprising a DNA-binding nickase protein; and
(b) a second part comprising a DNA polymerase.
43. The fusion protein of claim 42, wherein the DNA-binding nickase protein comprises a Cas9 nickase, a Cas12i1 nickase, a Cas12a nickase, an ortholog thereof, or any combination thereof.
44. The fusion protein of claim 43, wherein the DNA-binding nickase protein is derived from a Streptococcus pyogenes Cas9, a Staphylococcus aureus Cas9, a Neisseria meningitidis Cas9, a Streptococcus thermophilus Cas9, an Actinobacillus minor Cas9, an Actinobacillus pleuropneumoniae Cas9, an Actinobacillus seminis Cas9, an Actinobacillus succinogenes Cas9, a Bergeriella denitrificans Cas9, a Conservatibacter flavescens Cas9, a Gallibacterium anatis Cas9, a Haemophilus felis Cas9, a Haemophilus parainfluenzae Cas9, a Haemophilus pittmaniae Cas9, a Haemophilus sputorum Cas9, a Mannheimia granulomatis Cas9, a Neisseria animalis Cas9, a Neisseria animaloris Cas9, a Neisseria arctica Cas9, a Neisseria bacilliformis Cas9, a Neisseria dentiae Cas9, an Otariodibacter oris Cas9, a Pasteurella aerogenes Cas9, a Pasteurella langaaensis Cas9, a Pasteurella mairii Cas9, a Pasteurellaceae bacterium Cas9, a Phocoenobacter uteri Cas9, a Rodentibacter pneumotropicus Cas9, a Simonsiella muelleri Cas9, a Suttonella indologenes Cas9, a Treponema denticola Cas9, a Moraxella bovoculi Cas12a, or an engineered AsCas12a.
45. The fusion protein of claim 43, wherein the Cas9 protein is engineered to contain an R221K mutation, an N394K mutation, or both.
46. The fusion protein of claim 42, wherein the DNA polymerase comprises E. coli DNA polymerase I (Ecol), T4 DNA polymerase, T5 DNA polymerase, T7 DNA polymerase, thioredoxin binding domain of T7 DNA polymerase (T7/Trx), Klenow fragment DNA polymerase, Φ29 DNA polymerase, Bsu DNA polymerase, Bst DNA polymerase large fragment (LF), Bst DNA polymerase full length (F), Pfu DNA polymerase, Pwo DNA polymerase, Stoffel DNA polymerase, or any combination thereof.
47. The fusion protein of claim 46, wherein the T4 DNA polymerase is located at a C-terminus or an N-terminus of the DNA-binding nickase protein, wherein a 33 amino acid linker connects the T4 DNA polymerase to the DNA-binding nickase protein.
48. The fusion protein of claim 47, wherein the T4 DNA polymerase comprises at least one mutation.
49. The fusion protein of claim 48, wherein the at least one mutation is a Y320A mutation, an L412M mutation, an I50L mutation, a G255S mutation, or any combination thereof.
50. The fusion protein of claim 48, wherein the at least one mutation comprises addition of an sso7d DNA binding domain at a C-terminus of the T4 DNA polymerase, optionally wherein the sso7d DNA binding domain comprises an E12L mutation, a K35L mutation, or both.
51. The fusion protein of claim 46, wherein the Bst DNA polymerase LF comprises at least one fusion domain.
52. The fusion protein of claim 51, wherein the at least one fusion domain comprises an actin-binding protein.
53. The fusion protein of claim 52, wherein the actin-binding protein comprises villin headpiece.
54. The fusion protein of claim 51, wherein the fusion domain is fused to an N-terminus of the Bst DNA polymerase LF.
55. The fusion protein of claim 46, wherein the Bst DNA polymerase LF comprises a polymerase domain mutation.
56. The fusion protein of claim 55, wherein the polymerase domain mutation comprises a T493N mutation, an A552G mutation, an S371D mutation, or any combination thereof.
57. The fusion protein of claim 55, wherein the polymerase domain mutation comprises increased thermostability relative to a wild type Bst DNA polymerase LF.
58. The fusion protein of claim 53, wherein the villin headpiece comprises at least one additional mutation.
59. The fusion protein of claim 58, wherein the at least one additional mutation comprises N31R, N39K, E43K, A20K, or any combination thereof.
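
As an informal illustration of the length limits recited in claims 8, 9, 12, and 13, the following Python sketch checks a candidate guide design against those ranges. It is not part of the claims or the specification. The class `CpegRnaDesign`, the function `check_lengths`, and all sequences are hypothetical names and placeholders introduced here, and the sketch assumes the "about" ranges can be treated as hard inclusive bounds, which the claim language does not require.

```python
from dataclasses import dataclass

# Length ranges recited in the claims, treated here as hard inclusive bounds.
# Assumption: the claims say "about", so the real limits are fuzzier than this.
TEMPLATE_RANGE_BROAD = (1, 5000)   # claim 8
TEMPLATE_RANGE_NARROW = (3, 500)   # claim 9
PRIMER_RANGE_BROAD = (4, 20)       # claim 12
PRIMER_RANGE_NARROW = (10, 20)     # claim 13


@dataclass
class CpegRnaDesign:
    """Hypothetical container for the four guide parts named in claim 1."""
    spacer: str    # region complementary to the first strand of the target
    core: str      # core sequence that interacts with the nickase
    template: str  # template encoding the modified sequence
    primer: str    # primer sequence for the DNA polymerase


def in_range(length: int, bounds: tuple[int, int]) -> bool:
    """Return True if length lies within the inclusive bounds."""
    low, high = bounds
    return low <= length <= high


def check_lengths(design: CpegRnaDesign) -> dict[str, bool]:
    """Report which claimed length ranges the template and primer fall in."""
    template_len = len(design.template)
    primer_len = len(design.primer)
    return {
        "template_claim_8": in_range(template_len, TEMPLATE_RANGE_BROAD),
        "template_claim_9": in_range(template_len, TEMPLATE_RANGE_NARROW),
        "primer_claim_12": in_range(primer_len, PRIMER_RANGE_BROAD),
        "primer_claim_13": in_range(primer_len, PRIMER_RANGE_NARROW),
    }
```

Calling `check_lengths` on a design with a 2-nucleotide template and a 12-nucleotide primer, for example, would report the template as inside the claim 8 range but outside the narrower claim 9 range, and the primer as inside both primer ranges.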
PCT/US2024/044167 2023-08-28 2024-08-28 A programmable gene correction technology using cas-polymerase constructs Pending WO2025049563A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202363579160P 2023-08-28 2023-08-28
US63/579,160 2023-08-28
US202363600216P 2023-11-17 2023-11-17
US63/600,216 2023-11-17

Publications (1)

Publication Number Publication Date
WO2025049563A1 2025-03-06

Family

ID=94820350

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/044167 Pending WO2025049563A1 (en) 2023-08-28 2024-08-28 A programmable gene correction technology using cas-polymerase constructs

Country Status (1)

Country Link
WO (1) WO2025049563A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022067130A2 (en) * 2020-09-24 2022-03-31 The Broad Institute, Inc. Prime editing guide rnas, compositions thereof, and methods of using the same
WO2022150790A2 (en) * 2021-01-11 2022-07-14 The Broad Institute, Inc. Prime editor variants, constructs, and methods for enhancing prime editing efficiency and precision
US20220356469A1 (en) * 2019-03-19 2022-11-10 The Broad Institute, Inc. Methods and compositions for editing nucleotide sequences
WO2023279106A1 (en) * 2021-07-01 2023-01-05 The Board Of Regents Of The University Of Texas System Compositions and methods for myosin heavy chain base editing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PANDEY VIRENDRA NATH, KAUSHIK NEERJA A, PRADHAN D S, MODAK MUKUND J: "Template Primer-dependent Binding of 5'-Fluorosulfonylbenzoyldeoxyadenosine by Escherichia coli DNA Polymerase I", THE JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 265, no. 7, 5 March 1990 (1990-03-05), pages 3679 - 3684, XP093290645 *

Similar Documents

Publication Publication Date Title
US11225649B2 (en) Engineered nucleic-acid targeting nucleic acids
US11001843B2 (en) Engineered nucleic acid-targeting nucleic acids
EP2947146B1 (en) Methods and compositions for targeted cleavage and recombination
WO2022253185A1 (en) Cas12 protein, gene editing system containing cas12 protein, and application
JP7138712B2 (en) Systems and methods for genome editing
JP2023522848A (en) Compositions and methods for improved site-specific modification
EP3612630B1 (en) Site-specific dna modification using a donor dna repair template having tandem repeat sequences
CA3009727A1 (en) Compositions and methods for the treatment of hemoglobinopathies
CN116801913A (en) Compositions and methods for targeting BCL11A
WO2020180699A1 (en) Novel crispr dna targeting enzymes and systems
US20230091242A1 (en) Rna-guided genome recombineering at kilobase scale
WO2023086953A1 (en) Compositions and methods for the treatment of hereditary angioedema (hae)
JP2024540337A (en) New CRISPR-Cas12i system and its uses
WO2019189147A1 (en) Method for modifying target site in double-stranded dna in cell
WO2019173248A1 (en) Engineered nucleic acid-targeting nucleic acids
US20220162648A1 (en) Compositions and methods for improved gene editing
CN119923466A (en) Compositions and methods for reducing complement activation
JP2022533842A (en) SINGLE-BASE-SUBSTITUTED PROTEINS AND COMPOSITIONS CONTAINING THE SAME
KR102151064B1 (en) Gene editing composition comprising sgRNAs with matched 5' nucleotide and gene editing method using the same
US20240110163A1 (en) Crispr-associated based-editing of the complementary strand
KR20240012377A (en) Compositions and methods for self-inactivation of base editors
WO2017046594A1 (en) Compositions and methods for polynucleotide assembly
WO2025049563A1 (en) A programmable gene correction technology using cas-polymerase constructs
Nguyen et al. Efficient Genome Editing with Chimeric Oligonucleotide-Directed Editing
CN119162157B (en) Deaminases and their variants for base editing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24860922

Country of ref document: EP

Kind code of ref document: A1