WO2025240795A1 - End-modified gRNAs for improved base editing - Google Patents
- Publication number: WO2025240795A1 (international application PCT/US2025/029650)
- Authority: WO, WIPO (PCT)
- Prior art keywords: agRNA, nucleic acid, editing, sequence, target nucleic
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
- C12N9/222—Clustered regularly interspaced short palindromic repeats [CRISPR]-associated [CAS] enzymes
- C12N9/226—Class 2 CAS enzyme complex, e.g. single CAS protein
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
Definitions
- Base editing has expanded the genome editing toolkit by offering high editing efficiencies, both in vivo and in vitro, without inducing double-strand breaks.
- Adenine base editors (ABEs) catalyze the deamination of adenosine residues in a sequence-dependent manner and the conversion of A•T-to-G•C base pairs, while cytidine base editors (CBEs) catalyze the deamination of cytidine residues in a sequence-dependent manner and the conversion of C•G-to-T•A base pairs.
- because nucleobase editor (“NBE”) efficiency and editing pattern are influenced by the complex interaction between nucleobase editors, gRNAs, and target sequences (Arbab 2020), modifications to nucleobase editors and/or to their components that increase editing efficiency and/or specificity would significantly advance the art.
- the present disclosure provides modified guide RNAs (gRNAs) comprising 3'-nucleic acid extensions (referred to herein as anchor guide RNAs (agRNAs)), wherein the agRNA has improved properties, including, but not limited to, improved editing efficiency and/or reduced bystander editing in a context-dependent manner when used for base editing in conjunction with a nucleobase editor, such as a fusion protein comprising a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp).
- agRNA: anchor guide RNA, a modified gRNA comprising a 3'-nucleic acid extension
- napDNAbp: nucleic acid programmable DNA binding protein
- these agRNAs improve editing efficiency and/or reduce bystander editing by a nucleobase editor by stabilizing the target nucleic acid sequence (e.g., genomic DNA) within the active site of the nucleobase editor, where stabilizing means restricting movement of the DNA within the active site, resulting in a smaller editing window and/or fewer deaminated bases.
- the present disclosure further provides methods for engineering and/or evolving nucleobase editors to be used in conjunction with a given agRNA.
- the present disclosure further provides compositions, methods, uses, and kits for base editing comprising an agRNA and an optionally engineered and/or evolved nucleobase editor disclosed herein.
- the nucleobase editor is an engineered and/or evolved nucleobase editor.
- the agRNA comprises a gRNA and a 3'-nucleic acid extension (FIG. 1B and FIG. 5A).
- the gRNA comprises a spacer sequence and a scaffold sequence.
- the 3'-nucleic acid extension is attached to the 3'-end of the gRNA.
- the 3'-nucleic acid extension is attached to the 3'-end of the gRNA by a nucleotide linker.
- the nucleotide linker ranges from 1-50 nucleotides in length.
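The component order described above (spacer, scaffold, optional linker, 3'-nucleic acid extension, read 5' to 3') can be illustrated with a minimal Python sketch. The helper name `assemble_agrna` and the short sequences in the usage line are hypothetical placeholders, not sequences from this disclosure; the only rule taken from the text is the 1-50-nucleotide linker length.

```python
# Hypothetical helper: joins agRNA parts 5'->3'. Sequences passed in are
# placeholders, not constructs from this disclosure.

RNA_ALPHABET = set("ACGU")


def assemble_agrna(spacer: str, scaffold: str, extension: str, linker: str = "") -> str:
    """Return spacer + scaffold + optional linker + 3'-nucleic acid extension (5'->3')."""
    # The optional nucleotide linker is stated to be 1-50 nt long.
    if linker and not 1 <= len(linker) <= 50:
        raise ValueError(f"linker must be 1-50 nt, got {len(linker)}")
    agrna = "".join(part.upper() for part in (spacer, scaffold, linker, extension))
    # Reject DNA input (T) or other symbols so RNA and DNA templates are not mixed up.
    invalid = set(agrna) - RNA_ALPHABET
    if invalid:
        raise ValueError(f"non-RNA symbols: {sorted(invalid)}")
    return agrna


if __name__ == "__main__":
    print(assemble_agrna("GGAC", "GUUUUA", "CCAU", linker="AA"))
```

With those placeholder parts the script prints `GGACGUUUUAAACCAU`; dropping the linker simply joins the extension directly to the scaffold.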
- the agRNA is capable of binding to a napDNAbp by the scaffold sequence of the gRNA and directing the napDNAbp to a target nucleic acid sequence (e.g., genomic DNA).
- the target nucleic acid sequence comprises (i) a target strand and (ii) a complementary non-target strand.
- the target strand comprises a protospacer sequence that binds to the spacer sequence of the gRNA forming an RNA-DNA hybrid.
- the non-target strand comprises the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor).
- the non-target strand binds to the 3'-nucleic acid extension of the agRNA.
- the 3'-nucleic acid extension binds to the non-target strand using (a) an upstream binding sequence (UBS) that is complementary to the non-target strand and binds downstream of the target nucleic acid on the non-target strand, and/or (b) a downstream binding sequence (DBS) that is complementary to the non-target strand and binds upstream of the target nucleic acid on the non-target strand (FIG. 1B and FIG. 5A).
- UBS: upstream binding sequence
- DBS: downstream binding sequence
- an agRNA for nucleobase editing comprises a 3'-nucleic acid extension comprising nucleic acids encoding the upstream binding sequence (UBS).
- the UBS is complementary to the non-target strand and binds downstream of the target nucleic acid on the non-target strand.
- an agRNA for nucleobase editing comprises a 3'-nucleic acid extension comprising nucleic acids encoding the downstream binding sequence (DBS).
- the DBS is complementary to the non-target strand and binds upstream of the target nucleic acid on the non-target strand (FIG. 1B and FIG. 5A).
- the UBS is about or more than about 0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or more nucleotides in length.
- the DBS is about or more than about 0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or more nucleotides in length.
- the UBS and/or the DBS are at least 85% homologous, at least 90% homologous, at least 95% homologous, at least 97% homologous, at least 99% homologous, at least 99.7% homologous, or 100% homologous to the non-target strand.
- the UBS and/or DBS comprises at least 1, at least 2, at least 3, at least 4, or at least 5 mismatches.
- the 3'-nucleic acid extension further comprises a counterloop sequence (CLS).
- the CLS is about or more than about 0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or more nucleotides in length.
- the CLS forms a secondary structural feature.
- the CLS is a hairpin.
- the CLS is flanked by the UBS and the DBS.
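One way to see how a single extension can carry a UBS that binds downstream and a DBS that binds upstream of the target is to derive both from the non-target strand sequence. The Python sketch below is an illustration under stated assumptions, not the disclosure's design procedure: the non-target strand is given 5'->3' in the DNA alphabet, `gap` (a hypothetical parameter) nucleotides on each side of the target are left unpaired, and the CLS is supplied as-is. Because the RNA pairs antiparallel to the non-target strand, its 5' part (the UBS) is the reverse complement of the segment 3' of the target, which is consistent with the UBS/DBS binding positions stated above.

```python
# Sketch only: assumptions (orientation, `gap`, placeholder CLS) are in the lead-in.
_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def rna_revcomp(dna: str) -> str:
    """Reverse complement of a DNA segment, written in the RNA alphabet."""
    return dna.upper().translate(_RNA_COMPLEMENT)[::-1]


def design_extension(nts: str, target: int, ubs_len: int, dbs_len: int,
                     cls: str = "", gap: int = 3) -> str:
    """Build a 3'-extension ordered UBS + CLS + DBS (5'->3').

    nts: non-target strand, 5'->3' (DNA alphabet).
    target: 0-based index of the base to be edited.
    gap: hypothetical number of unpaired bases kept on each side of the target.
    """
    down_start = target + gap + 1  # first paired NTS base 3' of the target
    up_end = target - gap          # NTS bases paired to the DBS end just before this index
    if target < 0 or up_end - dbs_len < 0 or down_start + ubs_len > len(nts):
        raise ValueError("binding sequence would run past the end of nts")
    # Antiparallel pairing: the RNA's 5' part (UBS) pairs with NTS sequence
    # downstream (3') of the target; its 3' part (DBS) pairs upstream (5').
    ubs = rna_revcomp(nts[down_start:down_start + ubs_len])
    dbs = rna_revcomp(nts[up_end - dbs_len:up_end])
    return ubs + cls.upper() + dbs
```

For a toy non-target strand `"GGGGCCCACCCTTTT"` with the target A at index 7, `design_extension(nts, 7, 4, 4)` returns `"AAAACCCC"`, and passing the placeholder loop `cls="GAAA"` returns `"AAAAGAAACCCC"`.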
- the 3'-nucleic acid extension further comprises a secondary structural element.
- the secondary structural element is a tevopreQ1 motif (FIG. 1F).
- the location of the editing window can change when a different Cas domain is used, or when the deaminase domain changes.
- SaCas9 typically supports a broader editing window (approximately protospacer positions 3-12 for CBEs and 4-12 for ABEs) than SpCas9 (approximately protospacer positions 4-8 for CBEs and 4-7 for ABEs).
- a broader editing window increases the frequency of bystander editing by a nucleobase editor.
- agRNAs may be modified in one or more ways to restrict the movement of the non-target strand (e.g., the genomic DNA strand not bound by the spacer of the gRNA) within the active site of the nucleobase editor, resulting in a smaller editing window that minimizes bystander editing and/or improves editing efficiency at the target nucleic acid.
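To make the bystander argument concrete, the sketch below lists the editable bases that fall inside an editing window other than the intended target. The window coordinates are an assumption: they read the ranges above as approximately protospacer positions 4-7 (ABE) and 4-8 (CBE) for SpCas9 and 4-12 (ABE) and 3-12 (CBE) for SaCas9, numbered 1-based from the PAM-distal end. The sketch ignores SaCas9's longer spacer and any locus-specific effects, and the function name is hypothetical.

```python
from typing import List

# Assumed windows: (first, last) protospacer position, 1-based from the PAM-distal end.
EDITING_WINDOWS = {
    ("SpCas9", "ABE"): (4, 7),
    ("SpCas9", "CBE"): (4, 8),
    ("SaCas9", "ABE"): (4, 12),
    ("SaCas9", "CBE"): (3, 12),
}
EDITED_BASE = {"ABE": "A", "CBE": "C"}


def bystander_positions(protospacer: str, target_pos: int, cas: str, editor: str) -> List[int]:
    """Positions of editable bases inside the assumed window, excluding the target."""
    first, last = EDITING_WINDOWS[(cas, editor)]
    base = EDITED_BASE[editor]
    in_window = [
        pos for pos in range(first, min(last, len(protospacer)) + 1)
        if protospacer[pos - 1].upper() == base
    ]
    return [pos for pos in in_window if pos != target_pos]
```

With the hypothetical protospacer `"GGG" + "A" * 9 + "G" * 8` and a target at position 5, the SpCas9 ABE window yields bystanders at positions 4, 6, and 7, whereas the SaCas9 ABE window yields eight (positions 4 and 6-12).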
- the agRNA is modified to include a 3'-nucleic acid extension comprising, but not limited to, a UBS; a UBS and a CLS; a CLS; a CLS and a DBS; a UBS and a DBS; or a UBS, a CLS, and a DBS.
- the 3'-nucleic acid extension stabilizes the non-target strand (e.g., the genomic DNA strand not bound by the spacer of the gRNA) comprising the target nucleic acid (e.g., the nucleobase to be edited by a nucleobase editor) within the active site of the nucleobase editor (FIG. 1B).
- the agRNA improves the editing efficiency of a target nucleic acid by a nucleobase editor relative to the editing efficiency of the target nucleic acid by the nucleobase editor using a gRNA lacking the 3'-nucleic acid extension of the agRNA.
- the agRNA reduces bystander editing of bystander nucleic acids within an editing window of a target nucleic acid for a nucleobase editor relative to the bystander editing within the editing window of the target nucleic acid by the nucleobase editor using a gRNA lacking the 3'-nucleic acid extension of the agRNA.
- the target nucleic acid sequence comprises a target nucleic acid (also referred to as “the target nucleobase”), wherein the target nucleic acid falls within a double-stranded DNA molecule such as a gene, a transcriptional regulatory region, an intron splice site, an exonic splicing enhancer site, or a nucleosome binding site.
- the target nucleic acid falls within a gene, a gene that is associated with a disease or disorder, or a gene that is associated with a disease or disorder caused by a pathogenic single nucleotide polymorphism (SNP).
- SNPs are the most common genetic variations for various complex human diseases and disorders, including inflammatory disorders, autoimmune disorders, and cancers. Treatment of any disease or disorder caused by SNPs is contemplated by the methods and uses described herein.
- the target nucleic acid sequence comprises one or more target nucleic acids (also referred to as “the target nucleobase”), wherein the one or more target nucleic acids fall within a double-stranded DNA molecule such as a gene, a transcriptional regulatory region, an intron splice site, an exonic splicing enhancer site, or a nucleosome binding site.
- the target nucleic acid falls within a gene (e.g., DNMT1), a gene that is associated with a disease or disorder, or a gene that is associated with a disease or disorder caused by a pathogenic single nucleotide polymorphism (SNP).
- SNPs are the most common genetic variations for various complex human diseases and disorders, including, but not limited to, inflammatory disorders, autoimmune disorders, and cancers. Treatment of any disease or disorder caused by SNPs is contemplated by the methods and uses described herein.
- the present disclosure provides an agRNA library comprising a plurality of agRNAs described herein.
- Each agRNA library comprises agRNAs that bind upstream (e.g., utilizing UBSs) and/or downstream (e.g., utilizing DBSs) of a specific target DNA strand that is later present in the active site of the nucleobase editor (FIG. 1B and FIG. 5A). Therefore, each agRNA library is specific to the intended target of the nucleobase editor.
- the agRNA library comprises between 2,000 and 75,000 different agRNAs, wherein the agRNAs comprise different 3'-nucleic acid extensions.
- the agRNA library varies the 3'-nucleic acid extensions by length (e.g., the length of the 3'-nucleic acid extension is varied in the library). In some embodiments, the agRNA library varies the 3'-nucleic acid extensions by UBS, DBS, or CLS (e.g., the UBS and DBS are kept constant, while the CLS is varied).
- the agRNA library varies the 3'-nucleic acid extensions by structure such that the 3'-nucleic acid extensions comprise different combinations of UBSs, CLSs, and DBSs (e.g., an agRNA library comprising agRNAs with a CLS in the 3'-nucleic acid extension versus an agRNA library comprising agRNAs without a CLS in the 3'-nucleic acid extension), or further comprise other secondary structural elements (e.g., a tevopreQ1 motif).
- the agRNA library is used to screen which agRNAs improve editing efficiency and/or reduce bystander editing of a nucleobase editor.
- the disclosure provides polynucleotides, vectors, and cells comprising an agRNA described herein for screening the editing pattern of each nucleobase editor combined with a particular agRNA.
- the present disclosure describes a polynucleotide comprising an agRNA.
- the polynucleotide may further comprise a target nucleic acid sequence (e.g., a gene of interest) comprising the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor) downstream of the agRNA sequence.
- the present disclosure provides a vector comprising a polynucleotide described herein.
- the vector comprises a polynucleotide encoding an agRNA described herein.
- the polynucleotide can be under the control of a promoter.
- the polynucleotide can be under the control of multiple promoters.
- the promoter can be any promoter recognized by a skilled artisan (e.g., a constitutive promoter, a tissue- specific promoter, or an inducible promoter).
- the promoter can be a U6 promoter.
- the promoter can also be a U6, U6v4, U6v7, or U6v9 promoter or a fragment thereof.
- the vector further comprises a polynucleotide sequence comprising an agRNA described herein and a target nucleic acid sequence (e.g., a gene of interest) that includes the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor) (FIG. 5A).
- the target nucleic acid sequence is located downstream of the agRNA sequence.
- the agRNA and the target nucleic acid sequence are within a 50-600-nucleotide window (e.g., a 100-nucleotide window, a 300-nucleotide window, a 450-nucleotide window, etc.).
- the vector further comprises at least one primer binding site. In certain embodiments, the vector further comprises at least two primer binding sites.
- the vector comprising the one or more primer binding sites is subjected to next-generation sequencing (NGS) to sequence the agRNA and the target nucleic acid after editing in order to analyze the editing pattern (e.g., editing efficiency and bystander editing) of a nucleobase editor with a given agRNA.
- NGS: next-generation sequencing
- a first primer binding site is located upstream or within the agRNA, while a second primer binding site is located downstream of a target nucleic acid sequence.
- the distance between the first and second primer sites is less than 600, less than 500, less than 300, less than 200, less than 100, or less than 50 nucleotides. In certain embodiments, the distance between the first and second primer sites is less than 300 nucleotides.
- the present disclosure provides an agRNA screening library comprising a plurality of vectors described above and provided in this disclosure.
- next-generation sequencing (NGS) is used to sequence the plurality of vectors of the agRNA screening library to analyze the editing pattern (e.g., editing efficiency and bystander editing) of a nucleobase editor and a given agRNA.
- the agRNA and the target nucleic acid sequence are within a 300-nucleotide window.
- the target sequence falls within the human DNMT1 gene.
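The editing pattern read out by NGS can be summarized per position. The following Python sketch is a simplified illustration, not the analysis pipeline used in this disclosure: it assumes reads have already been aligned to the amplicon without insertions or deletions, uses 0-based positions, and defaults to A-to-G conversion (ABE). It reports per-site conversion frequencies and the fraction of reads carrying only the intended edit; the function names are hypothetical.

```python
from typing import Dict, List


def _usable_reads(reference: str, reads: List[str]) -> List[str]:
    # Assumption: reads are pre-aligned to the amplicon with no indels, so
    # index i in a read corresponds to index i in the reference.
    usable = [read.upper() for read in reads if len(read) == len(reference)]
    if not usable:
        raise ValueError("no reads match the reference length")
    return usable


def editing_profile(reference: str, reads: List[str],
                    from_base: str = "A", to_base: str = "G") -> Dict[int, float]:
    """Fraction of reads converted from_base -> to_base at each candidate site (0-based)."""
    ref = reference.upper()
    usable = _usable_reads(ref, reads)
    sites = [i for i, base in enumerate(ref) if base == from_base]
    return {i: sum(read[i] == to_base for read in usable) / len(usable) for i in sites}


def perfect_edit_fraction(reference: str, reads: List[str], target: int,
                          from_base: str = "A", to_base: str = "G") -> float:
    """Fraction of reads with the target converted and every other candidate site unchanged."""
    ref = reference.upper()
    usable = _usable_reads(ref, reads)
    others = [i for i, base in enumerate(ref) if base == from_base and i != target]
    perfect = sum(
        read[target] == to_base and all(read[i] == from_base for i in others)
        for read in usable
    )
    return perfect / len(usable)
```

For a toy reference `"CAAC"` with the target at index 1 and reads `["CGAC", "CGGC", "CAAC", "CAGC"]`, `editing_profile` returns `{1: 0.5, 2: 0.5}` and `perfect_edit_fraction` returns `0.25`: bystander conversion at index 2 lowers the perfect-edit fraction even though target conversion is 50%.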
- the present disclosure describes a composition comprising (a) an agRNA and (b) a nucleobase editor (e.g., ABEx1, ABEx2, ABEx3, or ABEx4) to carry out nucleobase editing.
- the composition further comprises (c) a target nucleic acid.
- the nucleobase editor is a fusion protein capable of base editing.
- the fusion protein comprises a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp).
- the composition comprises (a) an agRNA, (b) an N-terminal portion of a split nucleobase editor fused at its C-terminus to an intein-N, and (c) a C-terminal portion of a split nucleobase editor fused at its N-terminus to an intein-C, such that the N-terminal portion and the C-terminal portion of the split nucleobase editor are joined to form a fusion protein of a deaminase and a napDNAbp.
- the composition further comprises (d) a target nucleic acid.
- the napDNAbp is selected from the group consisting of Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, Sp-Cas9, SpRY, SpG-Cas9, NG-Cas9, NRRH-Cas9, spCas9, geoCas9, saCas9, Nme2Cas9, Cas12, and variants thereof.
- the deaminase is a cytidine deaminase or an adenosine deaminase.
- the cytidine deaminase is selected from the group consisting of CBE6, CGBE, BE4max, TadCBE, and variants thereof.
- the adenosine deaminase is selected from the group consisting of TadA-8e, ABE8e, AYBE, ABE9e, and variants thereof.
- the deaminase is an adenosine deaminase comprising an amino acid sequence that is at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of SEQ ID NO: 1 or a variant thereof.
- the adenosine deaminase has an amino acid sequence that includes one or more amino acid substitutions at positions 28, 34, and 151 relative to the corresponding position in the sequence of SEQ ID NO: 1 or a variant thereof.
- the adenosine deaminase has an amino acid sequence that comprises one or more amino acid substitutions selected from V28C, L34W, and M151E relative to the corresponding position in SEQ ID NO: 1 or a variant thereof.
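Substitutions such as V28C, L34W, and M151E follow the wild-type residue, position, mutant residue convention. A small Python helper can apply such substitutions while checking that the stated wild-type residue is present, which catches numbering errors. The function name is hypothetical, and any sequence passed to it in examples is a toy placeholder, not SEQ ID NO: 1.

```python
import re
from typing import Iterable

_AA = "ACDEFGHIKLMNPQRSTVWY"
# Wild-type residue, 1-based position, mutant residue (e.g., "V28C").
_SUBSTITUTION = re.compile(rf"([{_AA}])(\d+)([{_AA}])")


def apply_substitutions(sequence: str, substitutions: Iterable[str]) -> str:
    """Apply point substitutions, verifying the wild-type residue at each position."""
    residues = list(sequence.upper())
    for sub in substitutions:
        match = _SUBSTITUTION.fullmatch(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{sub}: position {pos} is outside the sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"{sub}: residue {pos} is {residues[pos - 1]}, not {wt}")
        residues[pos - 1] = mut
    return "".join(residues)
```

Applying `["V28C", "L34W", "M151E"]` to a reference whose positions 28, 34, and 151 hold V, L, and M returns the combined variant; a mismatch such as `E34W` against that reference raises `ValueError`.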
- the present disclosure describes a complex comprising any of the agRNAs described herein and a nucleobase editor described herein or elsewhere.
- the present disclosure describes one or more polynucleotides encoding a complex of an agRNA and a nucleobase editor described herein or elsewhere.
- the present disclosure describes one or more vectors comprising one or more polynucleotides encoding the components of a complex of an agRNA and a nucleobase editor.
- the vector includes one or more promoters that drive the expression of the agRNA and the nucleobase editor or split nucleobase editor of the complex.
- the disclosure provides cells (e.g., transformed cell lines) that comprise the agRNA described herein.
- the cells can also comprise the nucleobase editing complexes described herein (e.g., wherein the cell comprises both an agRNA and a nucleobase editor).
- the cells can also comprise any of the polynucleotides described above, which express the agRNA, and optionally which express the nucleobase editors.
- the cells can comprise any of the vectors described above, which express the agRNA, and optionally which express the nucleobase editor.
- the disclosure provides a pharmaceutical composition comprising: (i) an agRNA described above, or a nucleobase editing complex described above, a polynucleotide described above, or a vector described above, or any of the cells described above, and (ii) a pharmaceutically acceptable excipient.
- the disclosure describes a computational method, which may be embodied in software, for designing a library of 3'-nucleic acid extensions.
- the method involves evaluating a target nucleic acid sequence and generating UBSs, DBSs, and CLSs of varying lengths (e.g., 0-50 nucleotides), and 3'-nucleic acid extensions comprising different combinations of the various UBSs, DBSs, and/or CLSs.
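The combinatorial step can be sketched as an enumeration of design specifications before any sequence is generated. In this hypothetical Python example, each specification is a (UBS length, CLS label, DBS length) triple, the CLS label `"hairpin"` is a placeholder, and the all-empty combination is skipped because it is just the unmodified gRNA. It is an illustration, not the disclosure's software.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# (UBS length, CLS label or None, DBS length); labels are hypothetical.
Spec = Tuple[int, Optional[str], int]


def enumerate_specs(ubs_lengths: Sequence[int], dbs_lengths: Sequence[int],
                    cls_names: Sequence[str]) -> List[Spec]:
    """All UBS/CLS/DBS combinations, with None meaning 'no CLS'."""
    specs: List[Spec] = []
    for ubs, cls, dbs in product(ubs_lengths, [None, *cls_names], dbs_lengths):
        if ubs == 0 and dbs == 0 and cls is None:
            continue  # nothing appended: identical to the unmodified gRNA
        specs.append((ubs, cls, dbs))
    return specs
```

Sweeping UBS and DBS lengths from 0 to 50 nucleotides with one CLS option (present or absent) gives 51 × 2 × 51 − 1 = 5,201 specifications, which falls within the 2,000-75,000-member library range described above. Each specification would then be turned into a sequence from the target's non-target strand.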
- the disclosure describes a method of selecting agRNAs, wherein the method involves transfecting the agRNA screening libraries described herein into cells and using next-generation sequencing (NGS) to select agRNA vectors that show reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or improved editing efficiency of a target nucleic acid by a nucleobase editor.
- the reduced bystander editing and/or improved editing efficiency of a nucleobase editor is measured relative to a gRNA lacking the 3'-nucleic acid extension of the agRNA, or relative to a second agRNA.
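A selection rule relative to the control gRNA can be written as a Pareto-style filter: keep an agRNA if it is no worse than the control on both target editing and bystander editing and strictly better on at least one. This is one possible criterion, assumed here for illustration; the disclosure does not specify thresholds, and the `EditStats` fields are hypothetical summaries (e.g., produced by a per-position analysis of NGS reads).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EditStats:
    target: float     # fraction of reads with the intended base edited
    bystander: float  # summary of bystander editing, e.g., mean over window sites


def select_agrnas(stats: Dict[str, EditStats], control: EditStats) -> List[str]:
    """Return agRNAs that Pareto-improve on the control, best target editing first."""
    hits = []
    for name, s in stats.items():
        no_worse = s.target >= control.target and s.bystander <= control.bystander
        strictly_better = s.target > control.target or s.bystander < control.bystander
        if no_worse and strictly_better:
            hits.append(name)
    # Rank by higher target editing, then by lower bystander editing.
    return sorted(hits, key=lambda n: (-stats[n].target, stats[n].bystander))
```

For a control at 0.5 target and 0.4 bystander editing, a candidate at (0.6, 0.1) passes, one at (0.4, 0.05) is rejected despite its low bystander editing because target editing drops, and one matching the control exactly is rejected because it is not strictly better.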
- PANCE: phage-assisted non-continuous evolution
- the selection plasmid comprises (i) a pIII nucleotide sequence (encoding the phage coat protein pIII) that has been modified to contain at least one single nucleotide variant (SNV) and (ii) an agRNA nucleotide sequence encoding the corresponding agRNA that targets the modified pIII nucleotide sequence to correct (edit) the SNV to the wild-type sequence.
- the SNV results in a mutated pIII protein that lowers phage infectivity. Correction of the SNV to the wild-type sequence by a complex of the agRNA and nucleobase editor increases phage infectivity. If the perfect edit occurs, the pIII sequence is reverted to the wild-type sequence and phage propagation occurs.
- the selection plasmid comprises a sequence such that bystander edits upstream and downstream of the target nucleic acid in the pIII nucleotide sequence introduce mutations that inhibit phage propagation.
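The selection logic can be modeled at the sequence level: a phage propagates only when the SNV is reverted and no other position in the pIII region has changed. The Python sketch below assumes, for simplicity, that every off-site change is deleterious (the selection plasmid is designed so that bystander edits inhibit propagation, but in practice only some changes would), and the function names and sequences in the usage note are toys, not the SP1-SP3 constructs.

```python
def classify_outcome(wild_type: str, mutant: str, edited: str) -> str:
    """Classify an edited pIII region as 'corrected', 'uncorrected', or 'bystander'."""
    if not len(wild_type) == len(mutant) == len(edited):
        raise ValueError("sequences must be aligned and of equal length")
    snv_sites = [i for i, (w, m) in enumerate(zip(wild_type, mutant)) if w != m]
    if len(snv_sites) != 1:
        raise ValueError("mutant must differ from wild type at exactly one position")
    site = snv_sites[0]
    # Simplifying assumption: any change away from wild type outside the SNV
    # site counts as a bystander edit that blocks propagation.
    off_site = [i for i, (w, e) in enumerate(zip(wild_type, edited)) if i != site and w != e]
    if off_site:
        return "bystander"
    return "corrected" if edited[site] == wild_type[site] else "uncorrected"


def propagates(wild_type: str, mutant: str, edited: str) -> bool:
    """Propagation requires the perfect edit: SNV reverted, nothing else changed."""
    return classify_outcome(wild_type, mutant, edited) == "corrected"
```

With a toy wild type `"ACGG"` and mutant `"ACAG"`, the edited outcome `"ACGG"` is classified as `"corrected"`, `"ACAG"` as `"uncorrected"`, and `"GCGG"` as `"bystander"`, so only the first would propagate.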
- host cells further comprise a helper plasmid and/or a mutagenesis plasmid.
- a mutagenesis plasmid comprises an arabinose-inducible promoter.
- Some aspects of this disclosure provide methods of selecting nucleobase editors that show reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or improved editing efficiency of a target nucleic acid utilizing machine learning (ML) language models.
- the ML language models are able to predict evolutionarily adaptive mutations, resulting in nucleobase editor variants with improved fitness.
- the disclosure provides a method of nucleobase editing (e.g., “base editing”) comprising contacting a target nucleic acid sequence with an agRNA described above and a nucleobase editor comprising a fusion protein comprising a deaminase and a napDNAbp or a split napDNAbp, wherein the editing efficiency is increased and/or the bystander editing is decreased as compared to the same method using a gRNA not including the 3'-nucleic acid extension.
- the present disclosure contemplates the use of the agRNAs described herein for base editing a target nucleic acid within a target nucleic acid sequence, wherein editing the target nucleic acid produces a single nucleotide variant (SNV) for engineering a cell (e.g., prokaryotic or eukaryotic cell), a virus, a fungus, a plant, an insect, and/or an animal.
- the present disclosure contemplates the use of the methods described herein for base editing a target nucleic acid within a target nucleic acid sequence, wherein editing the target nucleic acid produces a single nucleotide variant (SNV) for engineering a cell (e.g., prokaryotic or eukaryotic cell), a virus, a fungus, a plant, an insect, and/or an animal.
- the methods described herein may be used for modifying a target nucleic acid sequence for research purposes (e.g., to edit or introduce a nonpathogenic SNP that may enhance or abolish a function, a process, or a phenotype).
- the disclosure provides a kit comprising: (i) an agRNA described above, a nucleobase editing complex described above, a nucleic acid molecule described above, a vector described above, or any of the cells described above, and (ii) a set of instructions for conducting nucleobase editing.
- FIGs. 1A-1I depict the design and testing of agRNA library.
- Figure 1A is a schematic workflow of the dual nucleobase editor system evolution starting with sequencing the patient specific mutation, testing existing base editing enzymes for that context, and identifying the editing pattern. If bystander mutations exist, a personalized agRNA for the specific context in combination with an optionally evolved nucleobase editor can generate a “bystander-less” and active base editing system that is personalized for the patient.
- FIG. 1B is a schematic of the library design.
- FIG. 1C is a dot plot of the ~60K-clone agRNA library after NGS evaluation. agRNA candidates with high efficiency and low bystander editing in the cloned DNMT1 context are shown as squares.
- FIG. 1D shows the editing pattern of ABE8e at the human DNMT1 locus in combination with selected agRNAs.
- FIG. 1E shows the editing pattern of ABE8e in combination with agRNA56114-tevopreQ1 at the human DNMT1 locus in HEK293T cells.
- FIG. 1G shows the editing pattern of ABE8e and the different guide RNA combinations shown in FIG. 1F at the human DNMT1 locus.
- FIG. 1H shows the influence of agRNA56114 with and without the tevopreQ1 motif on the editing pattern in ~12K different pathogenic contexts.
- FIG. 1I shows the editing pattern of ABE8e in combination with agRNA56114-tevopreQ1 at the human DNMT1 locus in HeLa and HepG2 cells. agRNA testing data at the native DNMT1 locus were obtained from n>3 independent experiments.
- FIGs. 2A-2G show phage-assisted non-continuous evolution of adenine base editors.
- FIG. 2A depicts a schematic representation of the PANCE.
- the three different selection plasmids encode different pIII sequences, each carrying a single nucleotide variant (SNV), together with the corresponding agRNA to correct the SNV to the wild-type sequence.
- FIG. 2B shows that once the selection phage and the selection plasmid meet in the same cell, the SNVs of SP1-3 can be corrected. If the perfect edit occurs, the sequence is reverted to the wild-type sequence (e.g., Leu, Arg, or Val). The SNVs introduce an amino acid exchange that lowers phage infectivity (Weiss). Bystander edits can introduce mutations that decrease phage replication (e.g., Pro or Gly).
- FIG. 2D is the editing pattern of the selected mutants generated in the PANCE experiment in the human DNMT1 locus in HEK293T cells (n>3 independent experiments).
- FIG. 2E shows the fold change of the amino acid exchanges V28C and L34W representing the two mutants evolved in the PANCE experiment. Dots represent the fold change in the three different replicates of the PANCE evolution.
- FIGs. 2F-2H depict the computational modeling of the structural change resulting from the amino acid exchanges of the V28C and L34W mutants.
- FIGs. 3A-3F depict machine learning guided base editor evolution.
- FIG. 3A is a schematic workflow of the machine learning approach to identify evolutionary plausible mutations.
- FIG. 3C shows the fold change of the amino acid exchange methionine (M) to glutamic acid (E) at position 151 in the PANCE experiment.
- FIG. 3F depicts the computational modeling of the structural change resulting from the amino acid exchanges of the M151E mutant.
- FIGs. 4A-4D show bystander abolishment by ABEx-agRNA combinations.
- FIG. 4A shows the ABEx variants and corresponding mutations. ABEx1 and ABEx3 were generated by PANCE, ABEx2 by machine learning, and ABEx4 by a combination of both techniques.
- FIG. 4B is the editing pattern at the human DNMT1 locus caused by ABEx spCas9-SpRY variants combined
- FIG. 4D shows the fold change in editing efficiency of the ABEx-SpRY variants, normalized to ABE8e and analyzed in ~12,000 different pathogenic contexts.
- FIGs. 5A-5E show Sanger traces and NGS analysis of ABE8e-spCas9-WT with sgRNACtrl and agRNA56114.
- FIG. 5A is a schematic representation of the agRNA library design and workflow. The TadA-8e domain engages with the exposed single-stranded region of the PAM-distal non-target strand (NTS), fostering deamination. Based on this, a 3'-extended agRNA library was designed and cloned into (plasmid). Next, 20 million HEK293T cells were transfected, and the editing pattern was analyzed by Illumina sequencing (MiSeq v3, 300 cycles).
- FIGs. 5B-5C are Sanger sequencing chromatograms of the DNMT1 genome locus after editing using ABE8e-spCas9-WT with sgRNACtrl and agRNA56114-tevopreQ1.
- FIGs. 5D-5E show the editing frequencies at the DNMT1 genome locus after editing using ABE8e-spCas9-WT with sgRNACtrl and agRNA56114-tevopreQ1, analyzed by NGS.
- FIGs. 6A-6E show the PANCE constructs and titers.
- FIG. 6A depicts a selection phage design containing the TadA fused with the Npu N-terminal intein.
- FIG. 6B shows schematics of the selection plasmids containing gene III, the agRNA, and Cas9 fused to the Npu C-terminal intein.
- FIG. 6C depicts the mapping of agRNA binding sites on the gene III.
- FIG. 6D is a depiction of the PANCE workflow schematics.
- FIG. 6E shows the phage titer across the ten rounds of evolution.
- FIGs. 7A-7C show PANCE variant testing at the DNMT1 site.
- FIG. 7A shows RNP editing efficiencies using ABE8e-SpRY and ABE9-WT. ABE9 showed low editing efficiency with both plasmid and RNP editing strategies.
- FIGs. 7B-7C show the editing efficiency of 50 PANCE-evolved clones with sgRNACtrl (FIG. 7B) and agRNA56114-tevopreQ1 (FIG. 7C).
- FIGs. 8A-8B show a structural comparison of ABE8e and ABEx variants.
- FIG. 8A depicts the structural model of ABE8e-WT. The snapshot is at the editing interface, showing the deaminase (pink), the wild-type residues at the mutated positions (grey), and the DNA (yellow).
- FIG. 8B shows the number of H bonds formed between amino acids in positions mutated (columns) and neighbor amino acids (rows). The table in FIG. 8B shows H bonds for those positions with wild-type amino acid (WT), mutated amino acid (mut), and the difference between mutated and wild type (dif).
- Both wild-type V28 and mutated C28 are predicted to establish the same interactions with surrounding residues, i.e., a hydrogen bond with V30. It is possible that a mutation at residue 28 induces a conformational change that allows it to interact with nucleotide 6 of the gRNA, given its proximity (FIG. 2F). Residues L34 and W34 are also predicted to establish the same interactions with surrounding amino acids, i.e., one and two hydrogen bonds with I41 and G42, respectively. Residue 34 is far from the gRNA, but the substitution from L to W, which introduces a bulky aromatic side chain, could alter the orientation of the alpha-helix arm (orange) where residue H57 lies.
- FIGs. 9A-9B show ML variant testing at the DNMT1 site.
- FIGs. 9A-9B show the editing efficiencies of the 21 ML-derived variants with the sgRNACtrl (FIG. 9A) and agRNA56114 (FIG. 9B).
- FIGs. 10A-10E show the ABE8e and ABEx1 editing and structure comparison.
- FIGs. 10A-10B are Sanger sequencing chromatograms of the DNMT1 genome locus after editing using ABE8e-SpRY with sgRNACtrl and agRNA56114.
- FIGs. 11A-11D show the variant testing in HEK-Site3.
- FIG. 11A shows the editing efficiencies of the HEK site 3 locus using both PANCE and ML variants, together with the double mutants in HEK293T cells.
- FIGs. 11B-11C are the editing efficiencies of the HEK site 3 locus using agRNA56114-tevopreQ1 in HepG2 (FIG. 11B) and HeLa (FIG. 11C) cell lines.
- FIG. 12 shows the movement of the editing window at the DNMT1 site: mean editing efficiencies at the DNMT1 locus using different sgRNAs and agRNAs targeting the same site. The guides are named -1, -2, or -3 if the editing window is shifted upstream, and +1 if it is shifted downstream, relative to the agRNACtrl used in this work (centered).
- FIGs. 13A-13H show the Cas-dependent and independent off-targets.
- FIG. 13A is the on-target editing of ABE8e-SpRY and ABEx1-SpRY at the DNMT1 locus.
- FIGs. 13B-13E show the Cas-dependent off-target analysis of ABE8e-SpRY and ABEx1-SpRY at four different loci.
- FIG. 13F shows schematics of the orthogonal R-loop assay.
- FIGs. 13G-13H show the Cas-independent off-target editing of ABE8e-SpRY and ABEx1-SpRY at two different sites.
- FIG. 14 shows the Path_Var library analysis: the editing profile of the different SpRY variants using sgRNACtrl and agRNA56114-tevopreQ1, compared with ABE8e and ABE9.
- FIGs. 15A-15H show the Path_Var library analysis.
- FIG. 15A shows read counts of each sample after Illumina sequencing.
- FIG. 15B shows an example of the quality scores of read 1 and read 2.
- FIG. 15C shows the editing profile of the different SpRY variants using sgRNACtrl, compared with ABE8e and ABE9.
- FIGs. 15D-15G are the editing profiles of the different SpRY variants using sgRNACtrl, compared with ABE8e and ABE9, when more than 2 As are present in the editing window.
- FIG. 15H shows the C-to-A, C-to-G, and C-to-T editing profiles using sgRNACtrl for the different SpRY variants, compared with ABE8e and ABE9.
- FIGs. 16A-16D show the effect of 3' extended gRNAs on the editing pattern at the human DNMT1 locus.
- FIG. 16A Scheme of the agRNA (“Anchor”, yellow) in the base editing system.
- FIG. 16B Experimental workflow and design of the agRNA library. More than 60k different agRNAs were cloned downstream of the scaffold within a 300 nt DNA sequence that also contains the target DNA site of the guide.
- FIG. 16C Editing pattern and PreS for the top five performing agRNAs within the human DNMT1 locus.
- FIGs. 17A-17I show phage-assisted non-continuous evolution of precise adenine base editors.
- FIG. 17A Schematic representation of the PANCE.
- The selection phage encoded the TadA-8e adenine deaminase with a C-terminal intein, while the selection plasmids (SP1-3) encoded the nCas9-SpRY with an N-terminal intein.
- The three selection plasmids encode different pIII sequences, each carrying a single nucleotide variant (SNV), together with the corresponding agRNA that corrects the SNV to the wild-type sequence.
- FIG. 17D NGS analysis of the editing efficiency of the ABE8e control and the two best PANCE-generated variants (based on PreS), V28C and L34W, at the human DNMT1 locus with and without agRNA56114.
- FIG. 17E Precise A-to-G editing (A13) within edited reads.
- FIG. 17F Fold change in precision over ABE8e_SpRY with sgRNACtrl.
- FIG. 17G Fold change of the ratio of the V28C and L34W mutation across 10 rounds of evolution.
- FIG. 17H Editing of Cas9-dependent off-targets of the V28C and L34W in 4 different predicted sites across the human genome.
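The precision read-outs in FIGs. 17E-17F (precise A-to-G editing within edited reads, and its fold change over ABE8e_SpRY with sgRNACtrl) can be illustrated with a minimal Python sketch. It assumes that "precision" means the fraction of edited reads carrying only the intended edit; the read-count format, function names, and toy counts are hypothetical, and the PreS score used in this disclosure may be defined differently.

```python
from typing import Dict


def precise_fraction(read_counts: Dict[str, int], reference: str,
                     intended: Dict[int, str]) -> float:
    """Fraction of edited reads that carry only the intended edit(s).

    read_counts: aligned read sequence -> number of reads (hypothetical format)
    reference:   unedited amplicon sequence
    intended:    0-based position -> desired base, e.g. {1: "G"}
    """
    desired = list(reference)
    for pos, base in intended.items():
        desired[pos] = base
    desired_seq = "".join(desired)
    # Any read that differs from the reference counts as edited.
    edited = sum(n for seq, n in read_counts.items() if seq != reference)
    if edited == 0:
        return 0.0
    return read_counts.get(desired_seq, 0) / edited


def fold_change(variant: float, control: float) -> float:
    """Fold change of a variant's metric over the control editor's metric."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return variant / control


# Toy counts, not data from this disclosure; index 1 holds the target A.
control_reads = {"CAAC": 50, "CGAC": 20, "CGGC": 20, "CAGC": 10}
variant_reads = {"CAAC": 60, "CGAC": 30, "CGGC": 10}
p_ctrl = precise_fraction(control_reads, "CAAC", {1: "G"})  # 20 / 50
p_var = precise_fraction(variant_reads, "CAAC", {1: "G"})   # 30 / 40
print(round(fold_change(p_var, p_ctrl), 3))  # prints 1.875
```

Reads with indels or sequencing errors would simply count as "edited but imprecise" here; a real pipeline would align and filter reads first.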
- FIGs. 18A-18I show machine learning-guided design of adenine base editor variants with reduced bystander editing.
- FIG. 18A Schematic workflow of the machine learning approach to identify evolutionary plausible mutations.
- FIG. 18B (left) Editing efficiencies at the human DNMT1 locus for the single amino acid exchange ABE mutants predicted by the machine learning algorithm (with sgRNACtrl and agRNA56114) in HEK293T cells; (right) PreS score of the tested variants.
- the underlined mutations are mutations that revert to the wildtype TadA amino acid at that position.
- FIG. 18C NGS abundance analysis of the control and the best machine learning variant, M151E, at the human DNMT1 locus with and without agRNA56114.
- FIG. 18D Fold change of precise mutations in A13 (top) or A15 (bottom).
- FIG. 18E V28C variant predicted mutations. The numbers indicate how many of the 6 models identified the mutation as plausible. The underlined mutations are mutations that revert to the wildtype TadA amino acid at that position.
- FIG. 18F Editing efficiencies at the endogenous DNMT1 (top) and Site9 (bottom) of the PANCE and machine learning generated double mutant V28C-M151E
- FIG. 18G Amino acid abundance at position 151 after the PANCE experiment.
- FIG. 18H Fold change of ratio of the M151D mutation throughout the PANCE experiment.
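The enrichment curves in FIGs. 17G and 18H track how the share of a mutation changes over the PANCE rounds. A minimal sketch, assuming the fold change is taken relative to the first sampled round and that a pseudocount is added to avoid division by zero (both assumptions, as is the function name):

```python
from typing import List, Sequence


def mutation_fold_change(mutant: Sequence[int], total: Sequence[int],
                         pseudocount: float = 1.0) -> List[float]:
    """Fold change of a mutation's frequency in each round relative to round 1.

    mutant[i]: clones (or reads) carrying the mutation in round i
    total[i]:  all clones (or reads) sequenced in round i
    The pseudocount keeps a zero round-1 frequency from dividing by zero.
    """
    if len(mutant) != len(total) or not mutant:
        raise ValueError("mutant and total must be non-empty and equally long")
    freqs = [(m + pseudocount) / (t + pseudocount) for m, t in zip(mutant, total)]
    return [f / freqs[0] for f in freqs]


# Toy counts for three rounds, not data from this disclosure.
print([round(x, 2) for x in mutation_fold_change([0, 4, 19], [99, 99, 99])])
# prints [1.0, 5.0, 20.0]
```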
- FIGs. 19A-19P show editing pattern of the base editors.
- FIG. 19A Normalized fold-change histograms of PANCE and ML evolved variants taking ABE8e_SpRY as a control across thousands of human pathogenic variants.
- FIG. 19B Fold change in precision correction of pathogenic mutations when more than 2 adenines are present in the target site.
- FIG. 19C Fold change in precision correction of pathogenic mutations when more than 3 adenines are present in the target site.
- FIG. 19D Normalized fold-change histograms of V28C variant in YA or RA contexts.
- FIGs. 19E-19N Editing pattern for twelve individual DNA sites in bulk and at the single-read level after editing with the V28C variant and ABE8e_SpRY.
- FIG. 19E Site 2.
- FIG. 19F Site 4.
- FIG. 19G Site 7.
- FIG. 19H Site 8.
- FIG. 19I Site 9.
- FIG. 19J Site 10.
- FIG. 19K Site 12.
- FIG. 19L Site 13.
- FIG. 19M Site 14.
- FIG. 19N Site 18.
- FIGs. 20A-20J show the performance of V28C as an editing tool for correcting human pathogenic mutations.
- FIG. 20A In silico model of the effect of the V28C mutation on the interaction with the target DNA.
- FIG. 20B Scheme of PCSK9 editing strategy using adenine base editors to disrupt a splicing site
- FIG. 20D Quantification of A-to-G editing at the target site.
- FIG. 20E Scheme of the SNCA E46K mutation.
- FIG. 20F Sequence similarity of the guide targeting the human DNMT1 locus and the guide targeting the SNCA E46K mutation.
- FIG. 20G NGS abundance analysis of the editing of the SNCA E46K mutation by ABE8e_SpRY and the V28C variant with agRNACtrl and agRNA56114. The sequence highlighted in green represents the perfect edit.
- FIG. 20H Editing pattern of the SNCA E46K mutation by ABE8e and the V28C variant with agRNACtrl and agRNA56114.
- FIG. 20I Precise A-to-G editing (A15) within edited reads.
- FIGs. 21A-21G show agRNA library design and testing.
- FIGs. 21A-21B Schematic representation of the agRNA library design and workflow.
- FIG. 21A The TadA-8e domain engages with the exposed single-stranded region of the PAM-distal nontarget strand (NTS), fostering deamination.
- FIG. 21B Based on this, a 3'-extended agRNA library was designed and cloned into (plasmid). Next, 20 million HEK293T cells were transfected, and the editing pattern was analyzed by Illumina sequencing (MiSeq v3, 300 cycles).
- FIG. 21C Dot plot representation of the ~60K-clone agRNA library after NGS evaluation.
- FIG. 21D Bulk editing efficiencies at the DNMT1 genome locus after editing using ABE8e-spCas9-WT with the sgRNACtrl and agRNA56114, analyzed by NGS.
- FIG. 21E Fold-change precision quantification of A13A15 and A13 edited species (based on NGS abundance analysis) using agRNA56114 over gRNACtrl.
- FIGs. 21F-21G A-to-G editing efficiencies in (FIG. 21F) HeLa and (FIG. 21G) HepG2 cells.
- FIGs. 22A-22F show PANCE constructs and their evaluation.
- FIG. 22A Mapping of agRNA binding sites on the gene III in the Selection Plasmid.
- FIG. 22B Schematic representation of the PANCE workflow.
- FIG. 22C Phage titer across the ten rounds of evolution.
- FIG. 22F Frequency of A-to-I editing.
- FIGs. 23A-23B show base editing variant testing in HEK-Site3.
- FIG. 23A Editing efficiencies of the HEK Site9 locus using both PANCE and ML variants, together with the double mutants in HEK293T cells.
- FIG. 23B Editing efficiencies of the HEK Site9 locus using agRNA56114
- FIGs. 24A-24I show PathVar library analysis.
- FIG. 24A Schematic representation of the PathVar library workflow using Tol2 mediated transposition.
- FIG. 24B Editing profile for the different SpRY variants using sgRNACtrl compared with ABE8e and ABE9.
- FIGs. 24C-24F Editing profile for the different SpRY variants using sgRNACtrl compared with ABE8e and ABE9 when more than 2 As are present in the editing window.
- FIG. 24G C to A, C to G and C to T editing profiles using sgRNACtrl for the different SpRY variants compared with ABE8e and ABE9.
- FIGs. 24H-24I NGS abundance analysis and editing pattern of ABE8e_SpRY and V28C_SpRY using sgRNACtrl at (FIG. 24H) Site3 and (FIG. 24I) Site16.
- FIGs. 25A-25D show V28C-M151E variant performance in correcting the E46K mutation in human iPSCs.
- FIG. 25A Editing efficiencies of V28C-M151E with sgRNACtrl and agRNA56114 compared to ABE8e and V28C variants.
- FIG. 25B NGS abundance analysis of V28C-M151E editing at the target site.
- FIG. 25C Precise A-to-G editing (A15) within edited reads.
- administer refers to implanting, absorbing, ingesting, injecting, inhaling, or otherwise introducing a treatment or therapeutic agent, or a composition of treatments or therapeutic agents, in or on a subject.
- biomolecule refers to any substance produced by cells or living organisms and includes carbohydrates, lipids, nucleic acids, proteins, and vitamins.
- cDNA refers to DNA that is derived from (e.g., by reverse transcription) and complementary to an RNA template (e.g., an mRNA template or an rRNA template).
- a “cell,” as used herein, may be present in a population of cells (e.g., in a tissue, a sample, a biopsy, an organ, or an organoid).
- a population of cells is composed of a plurality of different cell types.
- Cells for use in the methods and systems of the present disclosure can be present within an organism, a single cell type derived from an organism, or a mixture of cell types. Included are naturally occurring cells and cell populations, genetically engineered cell lines, cells derived from transgenic animals, cells from a subject, etc. Virtually any cell type and size can be accommodated in the methods and systems described herein.
- the cells are mammalian cells (e.g., complex cell populations such as naturally occurring tissues).
- the cells are from a human.
- the cells are collected from a subject (e.g., a human) through a medical procedure, such as a biopsy.
- the cells may be a cultured population (e.g., a culture derived from a complex population or a culture derived from a single cell type where the cells have differentiated into multiple lineages).
- the cells may also be provided in situ in a tissue sample.
- base editor or equivalently “nucleobase editor (NBE)” refers to an agent comprising a polypeptide that is capable of making a modification to a base (e.g., A, T, C, G, or U) within a nucleic acid sequence (e.g., DNA or RNA).
- the nucleobase editor is capable of deaminating a base within a nucleic acid.
- the nucleobase editor is capable of deaminating a base within a DNA molecule.
- the nucleobase editor is capable of deaminating a cytosine (C) in DNA.
- the nucleobase editor is capable of deaminating an adenine (A) in DNA. In some embodiments, the nucleobase editor is capable of excising a base within a DNA molecule. In some embodiments, the nucleobase editor is capable of excising an adenine, guanine, cytosine, thymine or uracil within a nucleic acid (e.g., DNA or RNA) molecule. In some embodiments, the nucleobase editor is a protein (e.g., a fusion protein) comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to a cytidine deaminase.
- the nucleobase editor is a protein (e.g., a fusion protein) comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to an adenosine deaminase.
- Cas9 or “Cas9 domain” refers to an RNA-guided nuclease comprising a Cas9 protein, or a fragment thereof (e.g., a protein comprising an active, inactive, or partially active DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9).
- a Cas9 nuclease is also sometimes referred to as a Csn1 nuclease or a CRISPR (clustered regularly interspaced short palindromic repeat)-associated nuclease.
- CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements and conjugative plasmids).
- CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc) and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the spacer.
- Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self-versus non-self.
- Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar E.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc.
- Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference.
- a Cas9 nuclease has an inactive (e.g., an inactivated) DNA cleavage domain, that is, the Cas9 is a nickase.
- a Cas9 nuclease lacks an active (e.g., an inactivated) DNA cleavage domain, that is, the Cas9 is a dead Cas9 (dCas9).
- nucleic acid programmable DNA binding protein refers to a protein that associates with a nucleic acid (e.g., DNA or RNA), such as a guide nucleic acid, that guides the napDNAbp to a specific nucleic acid sequence.
- a Cas9 protein can associate with a guide RNA that guides the Cas9 protein to a specific DNA sequence that is complementary to the guide RNA.
- the napDNAbp is a class 2 microbial CRISPR-Cas effector.
- the napDNAbp is a Cas9 domain, for example a nuclease active Cas9, a Cas9 nickase (nCas9), or a nuclease inactive Cas9 (dCas9).
- nucleic acid programmable DNA binding proteins include, without limitation, Cas9 (e.g., dCas9 and nCas9), CasX, CasY, Cpf1, C2c1, C2c2, C2c3, SpCas9, SpRY, SpG-Cas9, NG-Cas9, NRRH-Cas9, geoCas9, saCas9, Nme2Cas9, Cas12, and variants thereof.
- nucleic acid programmable DNA binding proteins also include nucleic acid programmable proteins that bind RNA.
- the napDNAbp may be associated with a nucleic acid that guides the napDNAbp to an RNA.
- Other nucleic acid programmable DNA binding proteins are also within the scope of this disclosure, though they may not be specifically listed in this disclosure.
- Cas9 fusion proteins as provided herein comprise the full-length amino acid sequence of a Cas9 protein, e.g., one of the Cas9 sequences provided herein. In other embodiments, however, fusion proteins as provided herein do not comprise a full-length Cas9 sequence, but only a fragment thereof.
- a Cas9 fusion protein provided herein comprises a Cas9 fragment, wherein the fragment binds crRNA and tracrRNA or sgRNA, but does not comprise a functional nuclease domain, e.g., in that it comprises only a truncated version of a nuclease domain or no nuclease domain at all.
- Cas9 refers to Cas9 from: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1), Listeria innocua (NCBI Ref: NP_472073.1), Camp
- dCas9 refers to a nuclease-inactive Cas9 or nuclease-dead Cas9, or a variant thereof, and embraces any naturally occurring dCas9 from any organism, any naturally-occurring dCas9 equivalent or functional fragment thereof, any dCas9 homolog, ortholog, or paralog from any organism, and any variant of a dCas9, naturally-occurring or engineered.
- dCas9 is not meant to be particularly limiting and may be referred to as a “dCas9 or variant thereof.”
- Exemplary dCas9 proteins and method for making dCas9 proteins are further described herein and/or are described in the art and are incorporated herein by reference.
- Any suitable mutation which inactivates both Cas9 endonucleases such as D10A and H840A mutations in the wild-type S. pyogenes Cas9 amino acid sequence (SEQ ID NO: 77), or D10A and N580A mutations in the wild-type S. aureus Cas9 amino acid sequence, may be used to form the dCas9.
- wild-type Cas9 corresponds to Cas9 from Streptococcus pyogenes (UniProt Reference Sequence: Q99ZW2; SEQ ID NO: 77 (amino acid)).
- nCas9 or “Cas9 nickase” refers to a Cas9 or a variant thereof, which cleaves or nicks only one of the strands of a target cut site thereby introducing a nick in a double strand DNA molecule rather than creating a double strand break.
- This can be achieved by introducing appropriate mutations in a wild-type Cas9 which inactivates one of the two endonuclease activities of the Cas9.
- Any suitable mutation which inactivates one Cas9 endonuclease activity but leaves the other intact is contemplated, such as one of the D10A or H840A mutations in the wild-type S.
- the napDNAbp comprises a Cas9 nickase, wherein the Cas9 nickase is S. aureus Cas9 comprising a D10A mutation.
- Cas9 proteins (e.g., a nuclease dead Cas9 (dCas9), a Cas9 nickase (nCas9), or a nuclease active Cas9), including variants and homologs thereof, are within the scope of this disclosure.
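The relationship between the catalytic mutations named above and the resulting Cas9 type can be summarized in a small lookup. In SpCas9, D10A inactivates the RuvC domain and H840A the HNH domain; in SaCas9, N580A is the HNH-inactivating counterpart. The sketch below only knows these substitutions (the disclosure allows "any suitable mutation"), and the table and function names are hypothetical.

```python
from typing import Dict, Iterable

# Catalytic-residue substitutions named in the text, mapped to the domain they inactivate.
SPCAS9_INACTIVATING: Dict[str, str] = {"D10A": "RuvC", "H840A": "HNH"}
SACAS9_INACTIVATING: Dict[str, str] = {"D10A": "RuvC", "N580A": "HNH"}


def classify_cas9(mutations: Iterable[str],
                  inactivating: Dict[str, str] = SPCAS9_INACTIVATING) -> str:
    """Classify a Cas9 variant by how many nuclease domains its mutations inactivate.

    Substitutions not in the table (e.g. PAM-relaxing mutations) are ignored.
    """
    dead_domains = {inactivating[m] for m in mutations if m in inactivating}
    if not dead_domains:
        return "nuclease-active Cas9"
    if len(dead_domains) == 1:
        return "nCas9 (nickase)"
    return "dCas9 (nuclease-dead)"


print(classify_cas9(["D10A"]))                               # nCas9 (nickase)
print(classify_cas9(["D10A", "H840A"]))                      # dCas9 (nuclease-dead)
print(classify_cas9(["D10A", "N580A"], SACAS9_INACTIVATING))  # dCas9 (nuclease-dead)
```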
- deaminase or “deaminase domain,” as used herein, refers to a protein or enzyme that catalyzes a deamination reaction.
- the deaminase or deaminase domain is a cytidine deaminase, catalyzing the hydrolytic deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively.
- the deaminase or deaminase domain is a cytidine deaminase domain, catalyzing the hydrolytic deamination of cytosine to uracil.
- the deaminase or deaminase domain is a naturally-occurring deaminase from an organism, such as a human, chimpanzee, gorilla, monkey, cow, dog, rat, or mouse. In some embodiments, the deaminase or deaminase domain is a variant of a naturally-occurring deaminase from an organism that does not occur in nature.
- the deaminase or deaminase domain is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally occurring deaminase from an organism.
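The percent-identity thresholds above can be made concrete with a short sketch. It assumes the two sequences are already aligned without gaps and have equal length; real identity calculations normally start from a pairwise alignment (e.g., BLAST or Needleman-Wunsch), which this sketch does not perform. Function names and sequences are illustrative.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences (no gaps)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def at_least(a: str, b: str, threshold: float) -> bool:
    """True if a and b meet a percent-identity threshold such as 95.0."""
    return percent_identity(a, b) >= threshold


# Toy peptides differing at one of ten positions.
print(percent_identity("MSEVEFSHEY", "MSEVEFSHEW"))  # 90.0
```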
- the napDNAbp of the nucleobase editor is a Cas9 domain.
- the nucleobase editor comprises a Cas9 protein fused to a cytidine deaminase.
- the nucleobase editor comprises a Cas9 nickase (nCas9) fused to a cytidine deaminase.
- the Cas9 nickase comprises a D10A mutation and comprises a histidine at residue 840 of SEQ ID NO: 77, or a corresponding mutation in any Cas9 provided herein, which renders Cas9 capable of cleaving only one strand of a nucleic acid duplex.
- the nucleobase editor comprises a nuclease-inactive Cas9 (dCas9) fused to a cytidine deaminase.
- the dCas9 domain comprises a D10A and a H840A mutation of SEQ ID NO: 77, or a corresponding mutation in any Cas9 provided herein, which inactivates the nuclease activity of the Cas9 protein.
- the cytidine deaminases may be enzymes that convert cytidine (C) to uracil (U) in DNA. If DNA replication occurs before uracil repair, the replication machinery may treat the uracil as thymine (T), leading to a C:G to T:A base pair conversion.
- the cytidine deaminases utilized in the nucleobase editor are apolipoprotein B mRNA-editing complex 1 (APOBEC1) deaminases, e.g. rat APOBEC1 deaminases.
- the adenosine deaminases may be enzymes that convert adenine (A) to guanine (G) in DNA, leading to an A:T to G:C base pair conversion.
- the adenosine deaminase is derived from a bacterium, such as E. coli, S. aureus, S. typhi, S. putrefaciens, H. influenzae, or C. crescentus.
- the adenosine deaminase is a TadA deaminase.
- the TadA deaminase is an E. coli TadA deaminase (ecTadA). In some embodiments, the TadA deaminase is a truncated E. coli TadA deaminase.
- the truncated ecTadA may be missing one or more N-terminal amino acids relative to a full-length ecTadA. In some embodiments, the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 N-terminal amino acid residues relative to the full length ecTadA.
- the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 C-terminal amino acid residues relative to the full length ecTadA.
- the ecTadA deaminase does not comprise an N-terminal methionine.
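Because a TadA-based adenine base editor can deaminate any A that falls inside its editing window, a single target A can yield several outcomes, of which only one is the precise edit; the others carry bystander edits. The sketch below enumerates those outcomes for a protospacer. It assumes 1-based positions counted from the PAM-distal end and a default window of positions 4-8; both the window and the function names are illustrative assumptions, not values from this disclosure.

```python
from itertools import product
from typing import List, Tuple


def abe_outcomes(protospacer: str,
                 window: Tuple[int, int] = (4, 8)) -> Tuple[List[int], List[str]]:
    """Enumerate every A-to-G outcome an ABE could produce inside the window.

    Returns the 0-based indices of editable As and all outcome sequences,
    starting with the unedited one.
    """
    start, end = window
    stop = min(end, len(protospacer))
    editable = [i for i in range(start - 1, stop) if protospacer[i] == "A"]
    outcomes = []
    for choice in product((False, True), repeat=len(editable)):
        seq = list(protospacer)
        for idx, edit in zip(editable, choice):
            if edit:
                seq[idx] = "G"
        outcomes.append("".join(seq))
    return editable, outcomes


def classify_outcome(outcome: str, protospacer: str, target_idx: int) -> str:
    """Label an outcome 'unedited', 'precise' (only the target A) or 'bystander'."""
    changed = {i for i, (a, b) in enumerate(zip(protospacer, outcome)) if a != b}
    if not changed:
        return "unedited"
    if changed == {target_idx}:
        return "precise"
    return "bystander"


spacer = "GGGAAAGGGGGGGGGGGGGG"  # toy 20-nt protospacer with As at positions 4-6
editable, outcomes = abe_outcomes(spacer)
labels = [classify_outcome(o, spacer, target_idx=4) for o in outcomes]
print(editable, labels.count("precise"), labels.count("bystander"))  # [3, 4, 5] 1 6
```

With three editable As there are 2^3 = 8 possible alleles, only one of which is the precise edit; this is the combinatorial problem that the precision metrics in the figures quantify.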
- guide RNA is a particular type of guide nucleic acid which is commonly associated with a Cas protein (e.g., a Cas9 protein), directing the Cas protein to a specific sequence in a DNA molecule that includes complementarity to the protospacer sequence of the guide RNA.
- a gRNA as disclosed herein, may refer to a sgRNA or anchor guide RNA, herein referred to as “agRNA” (e.g., for base editing).
- agRNA may be naturally occurring, recombinant, synthetic, or any combination of these.
- a gRNA may direct a Cas protein (e.g., as part of a nucleobase editor) to a target site in the target gene.
- the Cas protein equivalents may include other napDNAbps from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system), and C2c3 (a type V CRISPR-Cas system).
- “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299), which is incorporated herein by reference. Exemplary sequences and structures of guide RNAs are provided herein.
- guide RNAs associate with a Cas protein, directing (or programming) the Cas protein to a specific sequence in a DNA molecule that includes a sequence complementary to the protospacer sequence for the guide RNA.
- a gRNA is a component of the CRISPR/Cas system.
- the sequence specificity of a Cas DNA-binding protein is determined by gRNAs, which have nucleotide base-pairing complementarity to target DNA sequences.
- the native gRNA comprises a 20 nucleotide (nt) Specificity Determining Sequence (SDS), or spacer, which specifies the DNA sequence to be targeted, and is immediately followed by an 80 nt scaffold sequence, which associates the gRNA with the Cas protein.
- an SDS of the present disclosure has a length of 15 to 100 nucleotides, or more.
- an SDS may have a length of 15 to 90, 15 to 85, 15 to 80, 15 to 75, 15 to 70, 15 to 65, 15 to 60, 15 to 55, 15 to 50, 15 to 45, 15 to 40, 15 to 35, 15 to 30, or 15 to 20 nucleotides.
- the SDS is 20 nucleotides long.
- the SDS may be 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides long. At least a portion of the target DNA sequence is complementary to the SDS of the gRNA.
- a region of the target sequence is complementary to the SDS of the gRNA sequence and is immediately followed by the correct protospacer adjacent motif (PAM) sequence.
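Locating candidate target regions that are immediately followed by a PAM can be sketched for wild-type SpCas9, whose PAM is NGG. SpRY, used throughout this disclosure, has largely relaxed PAM requirements, so this filter illustrates the principle rather than the SpRY setting. The sketch scans only the input strand, and the function name is hypothetical.

```python
import re
from typing import List, Tuple


def find_ngg_sites(dna: str, spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for every NGG PAM on the given strand.

    A zero-width lookahead reports overlapping PAMs too (e.g. 'AGGG' holds
    both AGG and GGG). Sites too close to the 5' end for a full spacer are skipped.
    """
    dna = dna.upper()
    sites = []
    for m in re.finditer(r"(?=[ACGT]GG)", dna):
        pam_start = m.start()
        if pam_start >= spacer_len:
            sites.append((pam_start - spacer_len,
                          dna[pam_start - spacer_len:pam_start],
                          dna[pam_start:pam_start + 3]))
    return sites


for site in find_ngg_sites("C" * 21 + "AGGG"):
    print(site)
```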
- an SDS is 100% complementary to its target sequence.
- the SDS sequence is less than 100% complementary to its target sequence and is, thus, considered to be partially complementary to its target sequence.
- a targeting sequence may be 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, or 90% complementary to its target sequence.
- the SDS of template DNA or target DNA may differ from a complementary region of a gRNA by 1, 2, 3, 4, or 5 nucleotides.
- the guide RNA is about 15-120 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence (e.g., a target sequence in DNMT1).
- the guide RNA is 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
- the guide RNA comprises a sequence of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more contiguous nucleotides that is complementary to a target sequence. Sequence complementarity refers to distinct interactions between adenine and thymine (DNA) or uracil (RNA), and between guanine and cytosine.
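The complementarity rules above (A pairs with T in DNA or U in RNA, G pairs with C) and the allowance for 1 to 5 differing nucleotides can be checked with a short sketch. Because guide and target strand are antiparallel, the target strand (given 5' to 3') is reversed before position-by-position comparison. Names and sequences are illustrative.

```python
from typing import Tuple

_COMPLEMENT = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}
_PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def reverse_complement_dna(seq: str) -> str:
    """DNA reverse complement of a DNA or RNA sequence (U is treated like T)."""
    return "".join(_COMPLEMENT[b] for b in reversed(seq.upper()))


def complementarity(guide: str, target_strand: str) -> Tuple[float, int]:
    """Percent complementarity and mismatch count between a guide (5'->3')
    and the target DNA strand (5'->3'), compared antiparallel."""
    g, t = guide.upper(), target_strand.upper()[::-1]
    if len(g) != len(t) or not g:
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((a, b) in _PAIRS for a, b in zip(g, t))
    return 100.0 * paired / len(g), len(g) - paired


guide = "GACGCAUAAAGAUGAGACGC"  # toy 20-nt spacer
target = reverse_complement_dna(guide)
print(complementarity(guide, target))       # (100.0, 0)
mismatched = ("C" if target[0] != "C" else "A") + target[1:]
print(complementarity(guide, mismatched))   # (95.0, 1)
```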
- anchor gRNA refers to a gRNA comprising a 3'-nucleic acid extension attached at the 3'-end of the gRNA.
- the 3'-nucleic acid extension is about 1-120 nucleotides long and comprises a sequence of at least 3 contiguous nucleotides that is complementary to a target sequence.
- the 3'-nucleic acid extension is attached to the 3'-end of the gRNA by a nucleotide linker.
- the 3'-nucleic acid extension is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long.
- the 3'-nucleic acid extension comprises an upstream binding sequence (UBS) and a downstream binding sequence (DBS) that are complementary to a target sequence of a nucleobase editor.
- the UBS and/or the DBS is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long.
- the 3'-nucleic acid extension comprises a counterloop sequence (CLS).
- the CLS is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long.
- the counterloop sequence is a hairpin.
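The parts of an anchor gRNA described above (spacer, scaffold, optional linker, and a 3' extension with UBS, optional CLS, and DBS) can be captured in a small data structure that checks the length ranges given for one set of embodiments. This is a hypothetical container: the UBS-CLS-DBS order within the extension, the class name, and all sequences in the example are assumptions, and the scaffold string is a truncated placeholder, not a functional scaffold.

```python
from dataclasses import dataclass


@dataclass
class AnchorGuide:
    """Hypothetical container for an anchor gRNA (agRNA)."""
    spacer: str        # specificity determining sequence (SDS)
    scaffold: str      # Cas9-binding scaffold
    ubs: str           # upstream binding sequence
    dbs: str           # downstream binding sequence
    cls: str = ""      # optional counterloop sequence (e.g. a hairpin)
    linker: str = ""   # optional nucleotide linker before the extension

    def extension(self) -> str:
        # Assumed 5'->3' order of the extension parts.
        return self.ubs + self.cls + self.dbs

    def sequence(self) -> str:
        return self.spacer + self.scaffold + self.linker + self.extension()

    def validate(self) -> None:
        """Check length ranges from one set of embodiments and the RNA alphabet."""
        if not 1 <= len(self.extension()) <= 120:
            raise ValueError("3' extension must be 1-120 nt")
        for name, part in (("UBS", self.ubs), ("DBS", self.dbs)):
            if not 1 <= len(part) <= 50:
                raise ValueError(f"{name} must be 1-50 nt")
        if len(self.cls) > 50:
            raise ValueError("CLS must be at most 50 nt")
        if set(self.sequence().upper()) - set("ACGU"):
            raise ValueError("sequence must contain only A, C, G, U")


ag = AnchorGuide(spacer="GACGCAUAAAGAUGAGACGC",
                 scaffold="GUUUUAGAGCUA",  # truncated placeholder
                 ubs="ACGUAC", cls="GCGAAAGC", dbs="UUCAGG")
ag.validate()
print(len(ag.extension()), len(ag.sequence()))  # 20 52
```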
- inhibitor of base repair (IBR) refers to a protein that is capable of inhibiting the activity of a nucleic acid repair enzyme, for example a base excision repair enzyme.
- the IBR is an inhibitor of inosine base excision repair.
- Exemplary inhibitors of base repair include inhibitors of APE1, Endo III, Endo IV, Endo V, Endo VIII, Fpg, hOGG1, hNEIL1, T7 Endo I, T4PDG, UDG, hSMUG1, and hAAG.
- the IBR is an inhibitor of Endo V or hAAG.
- the IBR is a catalytically inactive EndoV or a catalytically inactive hAAG.
- uracil glycosylase inhibitor (UGI) refers to a protein that is capable of inhibiting a uracil-DNA glycosylase base-excision repair enzyme.
- a UGI domain comprises a wild-type UGI or a UGI as set forth in SEQ ID NO: 12.
- the UGI proteins provided herein include fragments of UGI and proteins homologous to a UGI or a UGI fragment. A UGI variant shares homology to UGI, or a fragment thereof.
- nuclear localization sequence (NLS) refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport.
- Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., international PCT application, PCT/EP2000/011690, filed November 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences.
- mutation refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)).
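The substitution notation described above (original residue, position, new residue, e.g. V28C) and the hyphen-joined double mutant used in this disclosure (V28C-M151E) can be parsed and applied with a short sketch. It handles single-residue substitutions only, not deletions or insertions, and checks that each original residue matches the sequence; the function names and toy sequence are illustrative.

```python
import re
from typing import Tuple

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code: str) -> Tuple[str, int, str]:
    """Split a substitution such as 'V28C' into ('V', 28, 'C')."""
    m = _SUBSTITUTION.match(code)
    if m is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def apply_substitutions(sequence: str, combined: str) -> str:
    """Apply hyphen-joined substitutions (e.g. 'V28C-M151E', 1-based positions),
    checking that each original residue matches the sequence."""
    residues = list(sequence)
    for code in combined.split("-"):
        original, pos, new = parse_substitution(code)
        if not 1 <= pos <= len(residues) or residues[pos - 1] != original:
            raise ValueError(f"{code}: residue {pos} is not {original}")
        residues[pos - 1] = new
    return "".join(residues)


print(parse_substitution("V28C"))              # ('V', 28, 'C')
print(apply_substitutions("MSEV", "S2A-V4L"))  # MAEL
```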
- prevent refers to a prophylactic treatment of a subject who is not and was not with a disease but is at risk of developing the disease or who was with a disease, is not with the disease, but is at risk of regression of the disease.
- the subject is at a higher risk of developing the disease or at a higher risk of regression of the disease than an average healthy member of a population.
- polynucleotide refers to a series of nucleotide bases (also called “nucleotides”) in DNA and RNA and means any chain of two or more nucleotides.
- the polynucleotides can be chimeric mixtures or derivatives or modified versions thereof, and single-stranded or double-stranded.
- the oligonucleotide can be modified at the base moiety, sugar moiety, or phosphate backbone, for example, to improve stability of the molecule, its hybridization parameters, etc.
- nucleic acid refers to a polymer of nucleotides.
- the polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyladenosine, 5-methylcytidine, C5 bromouridine, C5 fluorouridine, C5 iodouridine, C5 propynyl uridine, C5 propynyl cytidine, C5 methylcytidine, 7 deazaadenosine, 7 de
- a “protein,” “peptide,” or “polypeptide” comprises a polymer of amino acid residues linked together by peptide bonds.
- the term refers to proteins, polypeptides, and peptides of any size, structure, or function. Typically, a protein will be at least three amino acids long.
- a protein may refer to an individual protein or a collection of proteins. Proteins may contain only natural amino acids, although non-natural amino acids (i.e., compounds that do not occur in nature but that can be incorporated into a polypeptide chain) and/or amino acid analogs as are known in the art may alternatively be employed.
- amino acids in a protein may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation or functionalization, or other modification.
- a protein may also be a single molecule or may be a multi-molecular complex.
- a protein may be a fragment of a naturally occurring protein or peptide.
- a protein may be naturally occurring, recombinant, synthetic, or any combination of these.
- a protein may also be a therapeutic protein administered as a treatment for a disease or disorder (e.g., one that is associated with a change in the RNA expression and/or translation profile of a cell taken from a subject).
- the protein is an antibody, or an antibody variant (including antibody fragments).
- the term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively.
- a protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic-acid editing protein.
- a protein comprises a proteinaceous part, e.g., an amino acid sequence constituting a nucleic acid binding domain, and an organic compound, e.g., a compound that can act as a nucleic acid cleavage agent.
- a protein is in a complex with, or is in association with, a nucleic acid, e.g., RNA.
- any of the proteins provided herein may be produced by any method known in the art.
- the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker.
- Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.
- promoter is art-recognized and refers to a nucleic acid molecule with a sequence recognized by the cellular transcription machinery and able to initiate transcription of a downstream gene.
- a promoter can be constitutively active, meaning that the promoter is always active in a given cellular context, or conditionally active, meaning that the promoter is only active in the presence of a specific condition.
- a conditional promoter may only be active in the presence of a specific protein that connects a protein associated with a regulatory element in the promoter to the basic transcriptional machinery, or only in the absence of an inhibitory molecule.
- a subclass of conditionally active promoters are inducible promoters that require the presence of a small molecule “inducer” for activity.
- inducible promoters include, but are not limited to, arabinose-inducible promoters, Tet-on promoters, and tamoxifen-inducible promoters.
- constitutive, conditional, and inducible promoters are well known to the skilled artisan, and the skilled artisan will be able to ascertain a variety of such promoters useful in carrying out the instant invention, which is not limited in this respect.
- the disclosure provides vectors with appropriate promoters for driving expression of the nucleic acid sequences encoding the fusion proteins (or one or more individual components thereof).
- “RNA-programmable nuclease” and “RNA-guided nuclease” are used interchangeably herein and refer to a nuclease that forms a complex with (e.g., binds or associates with) one or more RNAs that are not a target for cleavage.
- an RNA-programmable nuclease, when in a complex with an RNA, may be referred to as a nuclease:RNA complex.
- the bound RNA(s) is referred to as a guide RNA (gRNA).
- gRNAs can exist as a complex of two or more RNAs, or as a single RNA molecule.
- gRNAs that exist as a single RNA molecule may be referred to as single-guide RNAs (sgRNAs), though “gRNA” is used interchangeably to refer to guide RNAs that exist as either single molecules or as a complex of two or more molecules.
- gRNAs that exist as single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (e.g., and directs binding of a Cas9 complex to the target); and (2) a domain that binds a Cas9 protein.
- domain (2) corresponds to a sequence known as a tracrRNA, and comprises a stem-loop structure.
- domain (2) is identical or homologous to a tracrRNA as provided in Jinek et al., Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference.
- a gRNA comprises two or more of domains (1) and (2), and may be referred to as an “extended gRNA.”
- an extended gRNA will, e.g., bind two or more Cas9 proteins and bind a target nucleic acid at two or more distinct regions, as described herein.
- the gRNA comprises a nucleotide sequence that complements a target site, which mediates binding of the nuclease/RNA complex to said target site, providing the sequence specificity of the nuclease:RNA complex.
- the RNA-programmable nuclease is the (CRISPR-associated system) Cas9 endonuclease, for example, Cas9 (Csn1) from Streptococcus pyogenes (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc.
- RNA-programmable nucleases (e.g., Cas9) use RNA:DNA hybridization to target DNA cleavage sites.
- Methods of using RNA-programmable nucleases, such as Cas9, for site-specific cleavage (e.g., to modify a genome) are known in the art (see e.g., Cong, L. et al., Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013); Mali, P. et al., RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013); Hwang, W.Y.
- recombinant protein or nucleic acid molecule comprises an amino acid or nucleotide sequence that comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations as compared to any naturally occurring sequence.
- target nucleic acid refers to a nucleotide in a “target sequence” within a nucleic acid molecule that is modified by a nucleobase editor, such as a fusion protein comprising an adenosine deaminase (e.g., a dCas9-adenosine deaminase fusion protein provided herein).
- RNA transcript is the product resulting from RNA polymerase-catalyzed transcription of a DNA sequence.
- primary transcript: when the RNA transcript is a complementary copy of a DNA sequence, it is referred to as the primary transcript; an RNA sequence derived from post-transcriptional processing of the primary transcript is referred to as the mature RNA.
- messenger RNA (mRNA) refers to RNA that is without introns and can be translated into a polypeptide by the cell.
- upstream and downstream are terms of relativity that define the linear position of at least two elements located in a nucleic acid molecule (whether single or double-stranded) that is orientated in a 5'-to-3' direction.
- a first element is upstream of a second element in a nucleic acid molecule where the first element is positioned somewhere that is 5' to the second element.
- a first element is downstream of a second element in a nucleic acid molecule where the first element is positioned somewhere that is 3' to the second element.
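The upstream/downstream definitions above can be expressed as a comparison of start and end positions along a single strand read 5' to 3'. This is a minimal sketch with a hypothetical helper name. It assumes both elements are given as substrings of the same strand and uses the first occurrence of each.

```python
def relative_position(sequence: str, first: str, second: str) -> str:
    """Classify `first` relative to `second` within one strand written 5'->3'.

    Returns 'upstream' when `first` ends at or before the start of `second`
    (i.e., lies 5' of it), 'downstream' when it starts at or after the end of
    `second` (lies 3' of it), and 'overlapping' otherwise.
    """
    i = sequence.find(first)
    j = sequence.find(second)
    if i == -1 or j == -1:
        raise ValueError("both elements must occur in the sequence")
    if i + len(first) <= j:
        return "upstream"
    if j + len(second) <= i:
        return "downstream"
    return "overlapping"


print(relative_position("AAAGGGTTT", "AAA", "TTT"))  # prints upstream
```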
- a “subject” to which administration is contemplated refers to a human (i.e., male or female of any age group, e.g., pediatric subject (e.g., infant, child, or adolescent) or adult subject (e.g., young adult, middle-aged adult, or senior adult)) or non-human animal.
- the non-human animal is a mammal (e.g., primate (e.g., cynomolgus monkey or rhesus monkey) or mouse).
- the term “patient” refers to a subject in need of treatment of a disease.
- the subject is human.
- the patient is human.
- the human may be a male or female at any stage of development.
- a subject or patient “in need” of treatment of a disease or disorder includes, without limitation, those who exhibit any risk factors or symptoms of a disease or disorder.
- a subject is a non-human experimental animal (e.g., a mouse, rat, dog, pig, or non-human primate).
- An “effective amount” of a compound described herein refers to an amount sufficient to elicit the desired biological response.
- An effective amount of a compound described herein may vary depending on such factors as the desired biological endpoint, the pharmacokinetics of the compound, the condition being treated, the mode of administration, and the age and health of the subject. In certain embodiments, an effective amount is a therapeutically effective amount.
- an effective amount is a prophylactically effective amount. In certain embodiments, an effective amount is the amount of a compound described herein in a single dose. In certain embodiments, an effective amount is the combined amounts of a compound described herein in multiple doses.
- an effective amount of a nucleobase editor may refer to the amount of the nucleobase editor that is sufficient to induce a mutation of a target site specifically bound by the nucleobase editor.
- an effective amount of a fusion protein provided herein e.g., of a fusion protein comprising a nucleic acid programmable DNA binding protein and a deaminase domain (e.g., a cytidine deaminase domain) may refer to the amount of the fusion protein that is sufficient to induce editing of a target site specifically bound and edited by the fusion protein.
- the effective amount of an agent, e.g., a fusion protein, a nucleobase editor, a deaminase, a hybrid protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide, may vary depending on various factors, such as the desired biological response, e.g., on the specific allele, genome, or target site to be edited, on the cell or tissue being targeted, and on the agent being used.
- a “therapeutically effective amount” of a treatment or therapeutic agent is an amount sufficient to provide a therapeutic benefit in the treatment of a condition or to delay or minimize one or more symptoms associated with the condition.
- a therapeutically effective amount of a treatment or therapeutic agent means an amount of the therapy, alone or in combination with other therapies, that provides a therapeutic benefit in the treatment of the condition.
- the term “therapeutically effective amount” can encompass an amount that improves overall therapy, reduces or avoids symptoms, signs, or causes of the condition, and/or enhances the therapeutic efficacy of another therapeutic agent.
- treatment refers to reversing, alleviating, delaying the onset of, or inhibiting the progress of a disease described herein.
- treatment may be administered after one or more signs or symptoms of the disease have developed or have been observed (e.g., prophylactically or upon suspicion or risk of disease).
- treatment may be administered in the absence of signs or symptoms of the disease.
- treatment may be administered to a susceptible subject prior to the onset of symptoms (e.g., in light of a history of symptoms in the subject, or family members of the subject). Treatment may also be continued after symptoms have resolved, for example, to delay or prevent recurrence.
- treatment may be administered after using the methods disclosed herein and observing a change in the RNA expression or translation profile in a cell or tissue in comparison to a healthy cell or tissue.
- variants should be taken to mean the exhibition of qualities that have a pattern that deviates from what occurs in nature, e.g., a variant Cas9 is a Cas9 comprising one or more changes in amino acid residues (i.e., “substitutions”) as compared to a wild type Cas9 amino acid sequence.
- variants encompasses homologous proteins having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with a reference sequence and having the same or substantially the same functional activity or activities as the reference sequence.
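The identity thresholds above (80% to 99%) presuppose a way to compute percent identity. The sketch below shows one common convention under stated assumptions: the two sequences are already aligned to equal length, gap characters ("-") count as non-identical, and the denominator is the alignment length. Other conventions exist, the helper names are hypothetical, and the check says nothing about the functional-activity requirement, which must be assessed experimentally.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns where both sequences carry the same residue."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("expected two non-empty aligned sequences of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(candidate: str, reference: str, threshold: float = 80.0) -> bool:
    """True when the candidate reaches the chosen identity threshold (default 80%)."""
    return percent_identity(candidate, reference) >= threshold


print(percent_identity("ACDEFGHIKL", "ACDEFGHIKA"))  # prints 90.0
```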
- vector refers to a nucleic acid that can be modified to encode a gene of interest and that is able to enter a host cell, mutate, and replicate within the host cell, and then transfer a replicated form of the vector into another host cell.
- exemplary suitable vectors include viral vectors, such as retroviral vectors or bacteriophages and filamentous phage, and conjugative plasmids. Additional suitable vectors will be apparent to those of skill in the art based on the instant disclosure.
- wild type is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms.
- the present disclosure provides modified guide RNAs (gRNAs) comprising 3'-nucleic acid extensions (referred to herein as anchor guide RNAs (agRNAs)), wherein the use of an agRNA results in improved editing efficiency and/or reduced bystander editing of a nucleobase editor.
- the nucleobase editor is a fusion protein comprising a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp).
- napDNAbp nucleic acid programmable DNA binding protein
- these agRNAs improved editing efficiency and/or reduced bystander editing of a nucleobase editor by stabilizing the target nucleic acid sequence (e.g., genomic DNA) within the active site of the nucleobase editor.
- the present disclosure further provides methods for evolving nucleobase editors to be used in conjunction with a given agRNA.
- the agRNA improves the editing efficiency of a target nucleic acid by a nucleobase editor relative to the editing efficiency of the target nucleic acid by the nucleobase editor using a gRNA lacking the 3'-nucleic acid extension of the agRNA.
- the agRNA reduces bystander editing of bystander nucleic acids within an editing window of a target nucleic acid for a nucleobase editor relative to the bystander editing of the target nucleic acid by the nucleobase editor using a gRNA lacking the 3'-nucleic acid extension of the agRNA.
- compositions, methods, uses, and kits for base editing comprising an agRNA and an optionally engineered and/or evolved nucleobase editor disclosed herein.
- the agRNAs are modified versions of a guide RNA.
- Guide RNAs may be naturally occurring, expressed from an encoding nucleic acid, or synthesized chemically. Methods are well known in the art for obtaining or otherwise synthesizing guide RNAs and for determining the appropriate sequence of the guide RNA, including the protospacer sequence which interacts and hybridizes with the target strand of a genomic target site of interest.
- the guide RNA sequence will depend upon the nucleotide sequence of a genomic target site of interest (i.e., the target site to be edited) and the type of napDNAbp (e.g., Cas9 protein) present in the base editing methods described herein, among other factors, such as PAM sequence locations, percent G/C content in the target sequence, the degree of microhomology regions, secondary structures, etc.
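Percent G/C content is one of the design factors listed above. The sketch below computes it for a DNA or RNA sequence; the helper name is hypothetical, and the source does not give any target G/C range, so none is assumed here.

```python
def gc_content(sequence: str) -> float:
    """Percent of G and C bases in a DNA or RNA sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


# Usage: the 3'-extension listed as SEQ ID NO: 4 has 15 G/C bases out of 19.
print(round(gc_content("CTGGCGCGTCGCGCTCTGG"), 1))  # prints 78.9
```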
- a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a napDNAbp (e.g., a Cas9, Cas9 homolog, Cas9 nickase, dead Cas9 domain, or Cas9 variant) to the target sequence.
- the degree of complementarity between a guide sequence and its corresponding target sequence when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
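For an ungapped, equal-length guide and target, the degree of complementarity above reduces to counting Watson-Crick pairs under antiparallel pairing. This is a minimal sketch under that assumption; the helper name is hypothetical, and G·U wobble pairs are deliberately not counted. Gapped cases call for an alignment algorithm such as those listed below.

```python
# Watson-Crick partner (as a DNA base) for each guide base; U pairs with A.
_PARTNER = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def percent_complementarity(guide: str, target: str) -> float:
    """Percent of guide positions that Watson-Crick pair with the target.

    Both sequences are written 5'->3'. Pairing is antiparallel, so guide
    position i is compared with target position len(target) - 1 - i.
    """
    guide = guide.upper()
    target = target.upper().replace("U", "T")
    if not guide or len(guide) != len(target):
        raise ValueError("expected two non-empty sequences of equal length")
    paired = sum(1 for g, t in zip(guide, reversed(target)) if _PARTNER.get(g) == t)
    return 100.0 * paired / len(guide)


print(percent_complementarity("GACU", "AGTA"))  # prints 75.0
```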
- Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
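Of the algorithms listed above, Needleman-Wunsch global alignment is compact enough to sketch. The version below returns only the optimal score, using a linear gap penalty and two rows of the dynamic-programming table. The scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, not parameters from the source, and the helper name is hypothetical.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of `a` and `b` with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diagonal, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))  # prints 7
```

Keeping only two rows keeps memory linear in the length of `b`; recovering the alignment itself (not just the score) would require storing the full table or traceback pointers.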
- a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length.
- BE base editor
- the agRNA comprises a gRNA and a 3'-nucleic acid extension (Figure 1B and Figure 5A).
- the gRNA comprises a spacer sequence and a scaffold sequence.
- the 3'-nucleic acid extension is attached to the 3'-end of the gRNA.
- the 3'-nucleic acid extension is attached to the 3'-end of the gRNA by a nucleotide linker.
- the nucleotide linker ranges from 1 to 50 nucleotides in length.
- the agRNA is capable of binding to a napDNAbp by the scaffold sequence of the gRNA and directing the napDNAbp to a target nucleic acid sequence (e.g., genomic DNA).
- the target nucleic acid sequence comprises (i) a target strand and (ii) a complementary non-target strand.
- the target strand comprises a protospacer sequence that binds to the spacer sequence of the gRNA forming an RNA-DNA hybrid.
- the non-target strand comprises the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor).
- the non-target strand binds to 3'-nucleic acid extension of the agRNA.
- the 3'-nucleic acid extension binds to the non-target strand using (a) an upstream binding sequence (UBS) that is complementary to the non-target strand and binds downstream of the target nucleic acid on the non-target strand and/or (b) a downstream binding sequence (DBS) that is complementary to the non-target strand and binds upstream of the target nucleic acid on the non-target strand (Figure 1B and Figure 5A).
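The binding geometry above follows from antiparallel pairing. The UBS sits at the 5' side of the extension, so it pairs with sequence 3' of (downstream of) the target nucleotide on the non-target strand. The DBS sits at the 3' side and pairs with sequence 5' of (upstream of) it. The sketch below derives a 5'-[UBS]-[CLS]-[DBS]-3' extension from a non-target strand under these assumptions:

- both binding sites abut the target nucleotide with no offset;
- the UBS and DBS are perfectly complementary;
- sequences use the DNA alphabet, as in the SEQ ID listings.

All helper names and the example strand are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters pass through)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_extension(non_target: str, target_index: int, ubs_len: int, dbs_len: int, cls: str) -> str:
    """Return 5'-[UBS]-[CLS]-[DBS]-3' for a non-target strand written 5'->3'.

    UBS = reverse complement of the ubs_len bases just 3' of the target;
    DBS = reverse complement of the dbs_len bases just 5' of the target.
    """
    if not 0 <= target_index < len(non_target):
        raise IndexError("target_index is outside the non-target strand")
    start = target_index - dbs_len
    end = target_index + 1 + ubs_len
    if start < 0 or end > len(non_target):
        raise ValueError("binding sites would run past the ends of the strand")
    upstream = non_target[start:target_index]
    downstream = non_target[target_index + 1:end]
    return reverse_complement(downstream) + cls + reverse_complement(upstream)


print(design_extension("AAACCCGTTTGGG", 6, 3, 3, "TTCG"))  # prints AAATTCGGGG
```

As a sanity check, setting the CLS to the complement of the target base makes the result equal the reverse complement of the whole window, because the CLS then occupies the position opposite the target nucleotide.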
- UBS upstream binding sequence
- DBS downstream binding sequence
- the UBS and/or the DBS can be of any suitable length. In certain embodiments, the UBS and/or the DBS is 0 nucleotides in length. In certain embodiments, the UBS and/or DBS is at least 1 nucleotide in length, or at least any whole number of nucleotides from 2 through 50 (i.e., at least 2, at least 3, ..., at least 49, or at least 50 nucleotides) in length.
- the UBS and/or the DBS are at least 85% homologous, at least 90% homologous, at least 95% homologous, at least 97% homologous, at least 99% homologous, at least 99.7% homologous, or 100% homologous to the non-target strand.
- the UBS and/or DBS comprises at least 1, at least 2, at least 3, at least 4, or at least 5 mismatches.
- the 3'-nucleic acid extension further comprises a counterloop sequence (CLS).
- the CLS can be of any suitable length.
- the CLS is 0 nucleotides in length.
- the CLS is at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides
- the CLS forms a secondary structural feature. In some embodiments, the CLS forms a hairpin. In some embodiments, the CLS is flanked by the UBS and the DBS.
- the 3'-nucleic acid extension further comprises a secondary structural element.
- the secondary structural element is a tevopreQ1 motif.
- the 3'-nucleic acid extension comprises any structure selected from:
- each instance of “]-[” comprises an optional linker, e.g., a peptide linker.
- the agRNA comprises any structure selected from: 5'-[gRNA]-[UBS]-[CLS]-[DBS]-3';
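The structure 5'-[gRNA]-[UBS]-[CLS]-[DBS]-3' amounts to concatenating the parts in order, optionally with a linker at each junction. This is a minimal sketch under two assumptions: one linker sequence is reused at every junction, and zero-length parts (such as a 0-nt CLS) are skipped so they do not leave a doubled linker. The helper name and the short example sequences are hypothetical placeholders, not real gRNA sequences.

```python
def assemble_agrna(grna: str, ubs: str, cls: str, dbs: str, linker: str = "") -> str:
    """Join 5'-[gRNA]-[UBS]-[CLS]-[DBS]-3', inserting `linker` at each junction.

    Empty parts are dropped first, so a 0-nt CLS yields a single linker
    between the UBS and the DBS rather than two.
    """
    parts = [part for part in (grna, ubs, cls, dbs) if part]
    return linker.join(parts)


print(assemble_agrna("GG", "AA", "", "CC", linker="T"))  # prints GGTAATCC
```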
- the 3'-nucleic acid extension has a nucleotide sequence of SEQ ID NO: 2-11, or a nucleotide sequence having at least 80% sequence identity therewith. In certain embodiments, the 3'-nucleic acid extension comprises a sequence selected from the group consisting of:
- CTGGCGCGTCGCGCTCTGG (SEQ ID NO: 4 - 61531 agRNA);
- CTCGCGGCTTCGCGTGGCAC (SEQ ID NO: 6 - 62809 agRNA);
- CACGCGGCTTCGCGGGCACCA (SEQ ID NO: 7 - 41197 agRNA);
- ACCGCGCTTCGCGTGGCACCA (SEQ ID NO: 8 - 48214 agRNA);
- CACCCCTCGCGTTCGCGTTCTGGCA (SEQ ID NO: 9 - 35622 agRNA);
- CCCTGGCGCGTTCGCGCGCGGCAC (SEQ ID NO: 10 - 56984 agRNA); and
- TGGCGCGGCTCGCTGGCACCA (SEQ ID NO: 11 - 63661 agRNA).
- the “perfect edit” is a single nucleotide substitution of the target nucleic acid within a target nucleic acid sequence.
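The "perfect edit" definition can be turned into a simple classifier for sequenced reads. This is a sketch under stated assumptions:

- each read is already aligned to the reference window, so equal length means no indel;
- a read counts as a perfect edit only when the intended substitution at the target position is its sole difference from the reference.

The category labels and helper name are hypothetical.

```python
def classify_read(reference: str, read: str, target_index: int, edited_base: str) -> str:
    """Label a read as 'perfect_edit', 'unedited', 'imperfect', or 'indel'."""
    if len(read) != len(reference):
        return "indel"
    differences = [i for i, (ref, obs) in enumerate(zip(reference, read)) if ref != obs]
    if not differences:
        return "unedited"
    if differences == [target_index] and read[target_index] == edited_base:
        return "perfect_edit"
    # Bystander substitutions, a wrong base at the target, or both.
    return "imperfect"


print(classify_read("ACAGTA", "ACGGTA", 2, "G"))  # prints perfect_edit
```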
- the target nucleic acid can be referred to as the “target site” of a nucleobase editor.
- the present disclosure contemplates the use of an agRNA described herein for multiplex editing with a nucleobase editor described herein or elsewhere.
- the target nucleic acid sequence comprises one or more target nucleic acids (also referred to as “the target nucleobase”), wherein the one or more target nucleic acids fall within a double-stranded DNA molecule such as a gene, a transcriptional regulatory region, an intron splice site, an exonic splicing enhancer site, or a nucleosome binding site.
- the target nucleic acid falls within a gene (e.g., DNMT1), a gene that is associated with a disease or disorder, or a gene that is associated with a disease or disorder caused by pathogenic Single Nucleotide Polymorphisms (SNPs).
- SNPs are the most common genetic variations for various complex human diseases and disorders, including inflammatory disorders, autoimmune disorders, and cancers. Treatment of any disease or disorder caused by SNPs is contemplated by the methods described herein.
- the present disclosure provides an agRNA library comprising a plurality of agRNAs described herein.
- each agRNA library comprises agRNAs that bind upstream (i.e., utilizing UBSs) and/or downstream (i.e., utilizing DBSs) of a specific target DNA strand that is later present in the active site of the nucleobase editor (Figure 1B and Figure 5A). Therefore, each agRNA library is specific to the intended target of the nucleobase editor.
- the agRNA library comprises between 2,000 and 75,000 different agRNAs. In some embodiments, the agRNA library comprises 2,000, 10,000, 20,000, 40,000, 55,000, 60,000, or 75,000 different agRNAs.
- the agRNA library comprises a plurality of agRNAs with different 3'-nucleic acid extensions. In some embodiments, the agRNA library varies the 3'-nucleic acid extensions by length. In some embodiments, the agRNA library varies the 3'-nucleic acid extensions by UBS, DBS, or CLS (e.g., the UBS and DBS are kept constant, while the CLS is varied).
- the agRNA library varies the 3'-nucleic acid extensions by structure such that the 3'-nucleic acid extensions comprise different combinations of UBSs, CLSs, and DBSs (e.g., an agRNA library comprising agRNAs with a CLS in the 3'-nucleic acid extension versus an agRNA library comprising agRNAs without a CLS in the 3'-nucleic acid extension), or further comprise other secondary structural elements (e.g., a tevopreQ1 motif).
- the agRNA library is used to screen which agRNAs improve editing efficiency and/or reduce bystander editing of a nucleobase editor.
- the agRNA library consists of combinations of an array of upstream binding sequences (UBSs), counter-loop sequences (CLSs), and downstream binding sequences (DBSs) for a particular target nucleic acid sequence.
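A library built from arrays of UBSs, CLSs, and DBSs is a Cartesian product of those arrays. The sketch below enumerates every 5'-[UBS]-[CLS]-[DBS]-3' combination with `itertools.product` and drops any duplicate sequences while preserving order. The helper name and the toy arrays are hypothetical.

```python
from itertools import product


def build_extension_library(ubss, clss, dbss):
    """All UBS x CLS x DBS combinations as 5'-[UBS]-[CLS]-[DBS]-3' strings.

    Different combinations that spell the same sequence are kept once, in
    the order they are first produced.
    """
    combos = (ubs + cls + dbs for ubs, cls, dbs in product(ubss, clss, dbss))
    return list(dict.fromkeys(combos))


library = build_extension_library(["A", "C"], ["TTCG", ""], ["G", "T", "A"])
print(len(library))  # prints 12
```

The library size grows multiplicatively. For instance, hypothetical arrays of 30 UBSs, 50 CLSs, and 40 DBSs would give 30 × 50 × 40 = 60,000 combinations before deduplication, which falls within the 2,000 to 75,000 range described above.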
- UBSs upstream binding sequences
- CLSs counter-loop sequences
- DBSs downstream binding sequences
- the UBSs and the DBSs bind to the target nucleic acid (e.g., genomic DNA) surrounding the targeted edit (also referred to herein as the “perfect edit” or “target nucleic acid”).
- the disclosure provides polynucleotides, vectors, and cells, comprising an agRNA described herein for screening the editing pattern for each nucleobase editor combined with a particular agRNA.
- the disclosure provides a polynucleotide encoding an agRNA described herein.
- the present disclosure provides a vector comprising any polynucleotide described herein.
- the vector comprises a polynucleotide encoding an agRNA described herein.
- the polynucleotide can be under the control of a promoter.
- the polynucleotide can be under the control of multiple promoters.
- the promoter can be any promoter recognized by a skilled artisan (e.g., a constitutive promoter, a tissue- specific promoter, or an inducible promoter).
- the promoter can be a U6 promoter.
- the promoter can also be a U6, U6v4, U6v7, or U6v9 promoter or a fragment thereof.
- the vector further comprises a polynucleotide sequence comprising an agRNA described herein and a target nucleic acid sequence (e.g., a gene of interest) that includes the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor) (Figure 5A).
- the target nucleic acid sequence is located downstream of the agRNA sequence.
- the agRNA and the target nucleic acid sequence are within a 50-600-nucleotide window (e.g., a 100-nucleotide window, a 300-nucleotide window, a 450-nucleotide window, etc.).
- the vector further comprises at least one primer binding site.
- the vector further comprises at least two primer binding sites.
- the vector comprising the one or more primer binding sites is subjected to next-generation sequencing (NGS) to sequence the agRNA and the target nucleic acid after the editing process in order to analyze the editing pattern (e.g., editing efficiency and bystander editing) of a nucleobase editor with a given agRNA.
- NGS next-generation sequencing
- a first primer binding site is located upstream or within the agRNA, while a second primer binding site is located downstream of a target nucleic acid sequence.
- the distance between the first and second primer binding sites is less than 600, less than 500, less than 300, less than 200, less than 100, or less than 50 nucleotides. In certain embodiments, the distance between the first and second primer binding sites is less than 300 nucleotides.
- the present disclosure provides an agRNA screening library comprising a plurality of vectors described above and provided in this disclosure.
- next-generation sequencing (NGS) is used to sequence the plurality of vectors of the agRNA screening library to analyze the editing pattern for each library clone within the 300-nucleotide window.
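Once reads from a library clone have been sequenced, editing efficiency and bystander editing can be summarized as per-position conversion frequencies. This is a sketch under stated assumptions:

- reads are already aligned to the reference window on the strand that carries the target base;
- reads whose length differs from the reference (e.g., indels) are discarded;
- the default A-to-G conversion reflects an adenine base editor.

The helper names are hypothetical.

```python
def editing_frequencies(reference, reads, from_base="A", to_base="G"):
    """Percent of usable reads with from_base -> to_base at each eligible position."""
    usable = [read for read in reads if len(read) == len(reference)]
    if not usable:
        raise ValueError("no reads of the reference length")
    frequencies = {}
    for i, base in enumerate(reference):
        if base != from_base:
            continue  # only positions carrying from_base can be edited
        edited = sum(1 for read in usable if read[i] == to_base)
        frequencies[i] = 100.0 * edited / len(usable)
    return frequencies


def split_target_and_bystanders(reference, reads, target_index, from_base="A", to_base="G"):
    """Return (target editing efficiency, {bystander position: frequency})."""
    frequencies = editing_frequencies(reference, reads, from_base, to_base)
    if target_index not in frequencies:
        raise ValueError("the target position does not carry from_base in the reference")
    target_efficiency = frequencies.pop(target_index)
    return target_efficiency, frequencies


reads = ["GAA", "GGA", "AAA", "GAA", "AA"]  # the last read is discarded
print(split_target_and_bystanders("AAA", reads, 0))  # prints (75.0, {1: 25.0, 2: 0.0})
```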
- the target sequence is the human DNMT1 gene.
- the present disclosure provides a method of selecting agRNAs comprising the steps of:
- the bystander editing of the nucleobase editor at one or more sites is reduced by at least 2%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, at least 99%, or at least 99.5% relative to a reference agRNA.
- the editing efficiency of the nucleobase editor at one or more sites is increased by at least 2%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, at least 99%, or at least 99.5% relative to a reference agRNA.
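The source does not say whether a bystander reduction "by at least X% relative to a reference agRNA" means a relative change or a difference in percentage points. The sketch below assumes the relative reading. Under that reading, bystander editing that falls from 40% to 10% is a 75% reduction, although the absolute drop is 30 percentage points. The helper name is hypothetical.

```python
def percent_reduction(reference_value: float, new_value: float) -> float:
    """Relative reduction of new_value versus reference_value, in percent.

    Negative results mean the value increased relative to the reference.
    """
    if reference_value <= 0:
        raise ValueError("reference value must be positive")
    return 100.0 * (reference_value - new_value) / reference_value


print(percent_reduction(40.0, 10.0))  # prints 75.0
```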
- compositions comprising any of the fusion proteins provided herein and an agRNA optionally bound to the napDNAbp of the fusion protein.
- compositions comprising any of the fusion proteins provided herein and an agRNA optionally bound to a Cas9 domain (e.g., a dCas9, a nuclease active Cas9, or a Cas9 nickase) of the fusion protein.
- Some aspects of this disclosure provide complexes comprising any of the fusion proteins provided herein and an agRNA bound to the napDNAbp of the fusion protein. Some aspects of this disclosure provide complexes comprising any of the fusion proteins provided herein and an agRNA bound to a Cas9 domain (e.g., a dCas9, a nuclease active Cas9, or a Cas9 nickase) of the fusion protein.
- a composition comprising (a) an agRNA and (b) a nucleobase editor (e.g., ABEx1, ABEx2, ABEx3, or ABEx4) to carry out nucleobase editing.
- the composition further comprises (c) a target nucleic acid.
- the nucleobase editor is a fusion protein capable of base editing.
- the fusion protein comprises a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp).
- the composition comprises (a) an agRNA, (b) an N-terminal portion of a split nucleobase editor fused at its C-terminus to an intein-N, and (c) a C-terminal portion of the split nucleobase editor fused at its N-terminus to an intein-C, such that the N-terminal and C-terminal portions of the split nucleobase editor are joined to form a fusion protein of a deaminase and a napDNAbp.
- the composition further comprises (d) a target nucleic acid.
- the present disclosure describes a complex comprising any of the agRNAs described herein and a nucleobase editor described herein or elsewhere.
- the present disclosure describes one or more polynucleotides encoding a complex of an agRNA and a nucleobase editor described herein or elsewhere.
- the present disclosure describes one or more vectors comprising one or more polynucleotides encoding a complex of an agRNA and a nucleobase editor.
- the vector includes one or more promoters that drive the expression of the agRNA and the nucleobase editor or split nucleobase editor of the complex.
- the nucleobase editors described herein may comprise a nucleic acid programmable DNA binding protein (napDNAbp).
- a napDNAbp can be associated with or complexed with at least one guide nucleic acid (e.g., a guide RNA or an agRNA), which localizes the napDNAbp to a DNA sequence that comprises a DNA strand (i.e., a target nucleic acid sequence) that is complementary to the guide nucleic acid, or a portion thereof (e.g., the spacer of a guide RNA, which anneals to the protospacer of the DNA target).
- the guide nucleic acid “programs” the napDNAbp (e.g., Cas9 or an equivalent) to localize to and bind the complementary sequence of the protospacer in the DNA.
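The spacer-protospacer relationship can be illustrated with a small exact-match search. This sketch assumes an SpCas9-style 5'-NGG-3' PAM immediately 3' of the protospacer; PAM-altered variants such as SpRY would need a different `pam` pattern, and mismatch-tolerant searching is out of scope. The protospacer on the non-target strand has the same sequence as the spacer (T in place of U), while the spacer anneals to the complementary target strand. Function names and sequences are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_protospacers(dna: str, spacer: str, pam: str = "[ACGT]GG") -> list[tuple[str, int]]:
    """Locate sites matching the spacer and followed 3' by a PAM, on both strands.

    The spacer may be RNA (U) or DNA (T). Returns (strand, start) pairs, where
    start is the 0-based leftmost protospacer coordinate on the forward strand.
    """
    dna = dna.upper()
    spacer = spacer.upper().replace("U", "T")
    # Lookahead so that overlapping hits are all reported.
    pattern = re.compile("(?=" + re.escape(spacer) + pam + ")")
    hits = [("+", m.start()) for m in pattern.finditer(dna)]
    rc = reverse_complement(dna)
    n, k = len(dna), len(spacer)
    # Convert reverse-strand hits back to forward-strand coordinates.
    hits += [("-", n - m.start() - k) for m in pattern.finditer(rc)]
    return hits


spacer = "GACGUUACCGGAUCAAGCUU"  # hypothetical 20-nt spacer, written as RNA
target = "TTT" + "GACGTTACCGGATCAAGCTT" + "AGG" + "CCC"
print(find_protospacers(target, spacer))                      # [('+', 3)]
print(find_protospacers(reverse_complement(target), spacer))  # [('-', 6)]
```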
- any suitable napDNAbp may be used in the nucleobase editors described herein.
- the napDNAbp may be any Class 2 CRISPR-Cas system, including any type II, type V, or type VI CRISPR-Cas enzyme.
- the napDNAbp is selected from the group consisting of Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, Sp-Cas9, SpRY, SpG-Cas9, NG-Cas9, NRRH-Cas9, spCas9, geoCas9, saCas9, Nme2Cas9, Cas12(a-i), Cas14, Argonaute, and variants thereof.
- the nucleic acid programmable DNA binding proteins include, without limitation, Cas9 (e.g., dCas9 and nCas9), Cas12e (CasX), Cas12d (CasY), Cas12a (Cpf1), Cas12b1 (C2c1), and Cas12c (C2c3).
- nucleobase editors may comprise the canonical SpCas9, or any ortholog Cas9 protein, or any variant Cas9 protein — including any naturally occurring variant, mutant, or otherwise engineered version of Cas9 — that is known or which can be made or evolved through a directed evolutionary or otherwise mutagenic process.
- the Cas9 or Cas9 variants have nickase activity, i.e., they cleave only one strand of the target DNA sequence.
- the Cas9 or Cas9 variants have inactive nucleases, i.e., are “dead” Cas9 proteins.
- Other variant Cas9 proteins that may be used are those having a smaller molecular weight than the canonical SpCas9 (e.g., for easier delivery) or having modified or rearranged primary amino acid structure (e.g., the circular permutant formats).
- the nucleobase editors described herein may also comprise Cas9 equivalents, including Cas12a/Cpf1 and Cas12b proteins, which are the result of convergent evolution.
- the napDNAbps used herein may also contain various modifications that alter/enhance their PAM specificities (e.g., SpRY).
- the application contemplates any Cas9, Cas9 variant, or Cas9 equivalent which has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence, such as a reference canonical SpCas9 sequence or a reference Cas9 equivalent (e.g., Cas12a/Cpf1).
- the nucleobase editors described herein may be delivered to cells as two or more fragments which become assembled inside the cell (either by passive assembly, or by active assembly, such as using split intein sequences) into a reconstituted nucleobase editor.
- the self-assembly may be passive whereby the two or more nucleobase editor fragments associate inside the cell covalently or non-covalently to reconstitute the nucleobase editor.
- the self-assembly may be catalyzed by dimerization domains installed on each of the fragments.
- the self-assembly may be catalyzed by split intein sequences installed on each of the nucleobase editor fragments.
- split nucleobase editors analogous to those described herein are further described in, for example, International Patent Application Publication No. WO 2017/197238, published November 16, 2017, which is incorporated herein by reference.
- the nucleobase editor (BE) is divided at a split site within the napDNAbp.
- Fusion proteins useful for the methods disclosed herein include cytidine base editors (CBEs), in which the deaminase domain is a cytidine deaminase.
- the deaminase domain is an apolipoprotein B mRNA-editing complex 1 (APOBEC1) deaminase domain.
- a rat APOBEC1 (rAPOBEC1) is used.
- a human APOBEC1 is used.
- cytidine deaminases include APOBEC2 deaminase, APOBEC3A deaminase, APOBEC3B deaminase, APOBEC3C deaminase, APOBEC3D deaminase, APOBEC3F deaminase, APOBEC3G deaminase, APOBEC3H deaminase, an activation-induced deaminase (AID), a cytidine deaminase 1 from Petromyzon marinus (pmCDA1), an ACF1/ASE deaminase, CBE6, CGBE, TadCBE, and variants thereof.
- the cytidine base editors utilized in the disclosed methods may further comprise an inhibitor of base excision repair ("iBER") domain.
- the iBER domain may comprise a uracil glycosylase inhibitor (UGI) domain.
- the uracil glycosylase inhibitor domain prevents a U:G mismatch (or G:T mismatch) from being repaired back to the original C:G (or A:T) base pair.
- the fusion protein comprises a catalytically inactive inosine-specific nuclease domain, such as a UGI domain.
- a UGI domain comprises an amino acid sequence that is at least 80%, 85%, 90%, 95%, 98%, 99%, 99.5%, or 99.9% identical to the amino acid sequence:
- Configurations of the cytidine base editors utilized in the methods disclosed herein may comprise fusion proteins that include dCas9 and/or UGI domains and have the general structure NH2-[dCas9]-[cytidine deaminase domain]-COOH, NH2-[cytidine deaminase domain]-[dCas9]-COOH, NH2-[dCas9]-[cytidine deaminase domain]-[uracil glycosylase inhibitor]-COOH, or NH2-[cytidine deaminase domain]-[dCas9]-[uracil glycosylase inhibitor]-COOH, wherein each instance of “]-[” comprises an optional linker, e.g., a peptide linker.
- Configurations of the cytidine base editors utilized in the methods disclosed herein may comprise fusion proteins that include nCas9 and/or UGI domains and have the general structure NH2-[nCas9]-[cytidine deaminase domain]-COOH, NH2-[cytidine deaminase domain]-[nCas9]-COOH, NH2-[nCas9]-[cytidine deaminase domain]-[uracil glycosylase inhibitor]-COOH, or NH2-[cytidine deaminase domain]-[nCas9]-[uracil glycosylase inhibitor]-COOH, wherein each instance of “]-[” comprises an optional linker, e.g., a peptide linker.
- the cytidine base editors (CBE) utilized in the disclosed methods may further comprise one, two, or more than two nuclear localization sequences (NLS).
- Configurations of such base editors may comprise fusion proteins having the structure NH2-[dCas9]-[cytidine deaminase domain]-[NLS]-COOH, NH2-[dCas9]-[cytidine deaminase domain]-[NLS]-[NLS]-COOH, NH2-[cytidine deaminase domain]-[dCas9]-[NLS]-COOH, NH2-[cytidine deaminase domain]-[dCas9]-[NLS]-[NLS]-COOH, NH2-[dCas9]-[cytidine deaminase domain]-[uracil glycosylase inhibitor]-[NLS]
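The listed layouts follow a combinatorial pattern: the Cas domain and the deaminase in either order, an optional C-terminal UGI, and zero to two C-terminal NLSs. The minimal sketch below enumerates that pattern in the same bracket notation. It only illustrates the combinations, does not cover N-terminal NLS placements, and `cbe_architectures` is a hypothetical name.

```python
from itertools import permutations


def cbe_architectures(cas_domain: str = "dCas9", max_nls: int = 2) -> list[str]:
    """Enumerate N-to-C layouts in the bracket notation used above.

    Covers the Cas domain and cytidine deaminase in either order, an optional
    C-terminal UGI, and 0..max_nls C-terminal NLSs. Optional linkers at each
    "]-[" junction are not shown.
    """
    layouts = []
    for order in permutations([cas_domain, "cytidine deaminase domain"]):
        for with_ugi in (False, True):
            for n_nls in range(max_nls + 1):
                domains = list(order)
                if with_ugi:
                    domains.append("uracil glycosylase inhibitor")
                domains += ["NLS"] * n_nls
                layouts.append("NH2-" + "-".join(f"[{d}]" for d in domains) + "-COOH")
    return layouts


for layout in cbe_architectures()[:3]:
    print(layout)
# NH2-[dCas9]-[cytidine deaminase domain]-COOH
# NH2-[dCas9]-[cytidine deaminase domain]-[NLS]-COOH
# NH2-[dCas9]-[cytidine deaminase domain]-[NLS]-[NLS]-COOH
```

Passing `"nCas9"` as `cas_domain` yields the corresponding nickase-based layouts.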
- Fusion proteins useful for the methods disclosed herein include adenine base editors (ABEs), in which the deaminase domain is an adenosine deaminase.
- the adenosine deaminase domain comprises the amino acid sequence of any one of SEQ ID NOs: 1 and 13-18.
- the adenosine deaminase is derived from a bacterium, such as, E. coli, S. aureus, S. typhi, S. putrefaciens, H. influenzae, or C. crescentus.
- the adenosine deaminase is a TadA deaminase.
- the TadA deaminase is an E. coli TadA deaminase (ecTadA).
- the TadA deaminase is a truncated E. coli TadA deaminase.
- the truncated ecTadA may be missing one or more N-terminal amino acids relative to a full-length ecTadA.
- the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 N-terminal amino acid residues relative to the full-length ecTadA.
- the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 C-terminal amino acid residues relative to the full length ecTadA.
- the ecTadA deaminase does not comprise an N-terminal methionine.
- the adenosine deaminase is an N-terminal truncated E. coli TadA (ecTadA).
- the adenosine deaminase comprises a sequence that has at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the following amino acid sequence: MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 13).
- the adenosine deaminase is a full-length E. coli TadA (“ecTadA(wt)”) deaminase.
- the adenosine deaminase comprises a sequence that has at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the following amino acid sequence: MRRAFITGVFFLSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 14).
- the adenosine deaminase comprises a D108N mutation in SEQ ID NO: 13, or a corresponding mutation in a homologous or orthologous adenosine deaminase.
- the adenosine deaminase further comprises an A106V mutation in SEQ ID NO: 13, or a corresponding mutation in a homologous or orthologous adenosine deaminase.
- the fusion proteins disclosed herein have the general structure ecTadA*-XTEN-dCas9 (e.g. “ecTadA*(7.10)”), where ecTadA* represents an ecTadA variant comprising A106V and D108N mutations in the amino acid sequence of SEQ ID NO: 1.
- the adenosine deaminase comprises the amino acid sequence:
- Configurations of the adenine base editors utilized in the methods disclosed herein may comprise a dCas9 domain, and may comprise fusion proteins having the structure NH2-[dCas9]-[adenine deaminase domain]-COOH, NH2-[adenine deaminase domain]-[dCas9]-COOH, NH2-[dCas9]-[adenine deaminase domain]-[NLS]-COOH, NH2-[dCas9]-[adenine deaminase domain]-[NLS]-[NLS]-COOH, NH2-[adenine deaminase domain]-[dCas9]-[NLS]-COOH, NH2-[adenine deaminase domain]-[dCas9]-[NLS]-[NLS]-COOH, NH2-[NLS]-[dCas9]-[d
- Configurations of the adenine base editors utilized in the methods disclosed herein may comprise an nCas9 domain, and may comprise fusion proteins having the structure NH2-[nCas9]-[adenine deaminase domain]-COOH, NH2-[adenine deaminase domain]-[nCas9]-COOH, NH2-[nCas9]-[adenine deaminase domain]-[NLS]-COOH, NH2-[nCas9]-[adenine deaminase domain]-[NLS]-[NLS]-COOH, NH2-[adenine deaminase domain]-[nCas9]-[NLS]-COOH, NH2-[adenine deaminase domain]-[nCas9]-[NLS]-[NLS]-COOH, NH2-[NLS]-[nCas9]-[nC
- the adenosine deaminase is selected from the group consisting of TadA-8e, ABE8e, AYBE, ABE9, and variants thereof.
- the deaminase is a TadA-8e adenosine deaminase.
- the deaminase is a TadA-8e adenosine deaminase variant.
- the deaminase is an adenosine deaminase comprising an amino acid sequence that is at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of SEQ ID NO: 1.
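Percent-identity thresholds like those above are normally computed from a sequence alignment (for example, BLAST or a Needleman-Wunsch global alignment). For equal-length, substitution-only variants, a simple ungapped position-by-position count is a reasonable approximation, as in this minimal sketch; the function names are hypothetical and the sketch does not handle insertions or deletions.

```python
def percent_identity_ungapped(a: str, b: str) -> float:
    """Percent identical residues between two equal-length sequences.

    Compares position by position with no gaps, so it is only meaningful for
    substitution variants of the same length.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_identity(a: str, b: str, threshold_pct: float) -> bool:
    """True if the ungapped identity is at least threshold_pct percent."""
    return percent_identity_ungapped(a, b) >= threshold_pct


print(percent_identity_ungapped("MSEVEFSHEY", "MSEVEFSHEW"))  # 90.0
print(meets_identity("MSEVEFSHEY", "MSEVEFSHEW", 95))        # False
```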
- the adenosine deaminase has an amino acid sequence that includes one or more amino acid substitutions at positions 8, 25, 26, 27, 28, 29, 30, 31, 33, 34, 37, 38, 39, 41, 42, 43, 44, 45, 48, 49, 50, 54, 56, 58, 78, 79, 80, 82, 84, 85, 86, 88, 90, 91, 92, 93, 94, 95, 96, 97, 99, 100, 101, 102, 106, 107, 109, 111, 123, 146, 149, 151, 152, 155, 156, and 157 of SEQ ID NO: 1.
- the adenosine deaminase has an amino acid sequence that includes one or more amino acid substitutions at positions 28, 34, and 151 of SEQ ID NO: 1.
- the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitutions H8D, R25E, R26G, E27R, V28C, P29C, V30W, G31E, V33C, V33T, L34W, N37T, N37H, N38I, R39E, I41S, G42A, E43R, G44A, W45G, A48P, I49S, G50A, D54T, A56P, A58S, A78R, T79H, L80P, V82R, F84I, F84L, E85R, P86A, V88R, C90V, A91R, G92R, A93R, M94H, I95D, H96P, S97L, I99D,
- the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitutions V28C, L34W, and M151E. In certain embodiments, the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitution V28C (ABEx1). In certain embodiments, the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitution L34W (ABEx2). In certain embodiments, the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitution M151E (ABEx3). In certain embodiments, the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitutions V28C and M151E (ABEx4).
- the adenosine deaminase comprises a V28C, L34W, and/or M151E mutation in SEQ ID NO: 1, or a corresponding mutation in a homologous or orthologous adenosine deaminase. In some embodiments, the adenosine deaminase comprises a V28C mutation in SEQ ID NO: 1, or a corresponding mutation in a homologous or orthologous adenosine deaminase.
- the adenosine deaminase comprises a M151E mutation in SEQ ID NO: 1, or a corresponding mutation in a homologous or orthologous adenosine deaminase.
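The variant names above map to substitution sets (ABEx1: V28C; ABEx2: L34W; ABEx3: M151E; ABEx4: V28C and M151E). The sketch below applies and calls substitutions in that notation, checking the wild-type residue at each position so that a numbering mismatch fails loudly. The demo fragment is taken from the TadA portion of the ABEx3 sequence (SEQ ID NO: 17) after the BPNLS and is numbered on the assumption that it starts at S2; that numbering is consistent with V28, L34, and M151 in that sequence but is an assumption here. Function names are hypothetical.

```python
import re

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(seq: str, subs: list[str], first_pos: int = 1) -> str:
    """Apply substitutions written like 'V28C' (wild-type, position, new residue).

    first_pos is the residue number of seq[0]. Each wild-type residue is
    checked before it is replaced; a mismatch raises ValueError.
    """
    residues = list(seq)
    for sub in subs:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        wild_type, pos, new = match.group(1), int(match.group(2)), match.group(3)
        idx = pos - first_pos
        if not 0 <= idx < len(residues) or residues[idx] != wild_type:
            raise ValueError(f"{sub}: residue {pos} is not {wild_type}")
        residues[idx] = new
    return "".join(residues)


def call_substitutions(reference: str, variant: str, first_pos: int = 1) -> list[str]:
    """List differences between equal-length sequences in 'V28C' notation."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        f"{ref}{i + first_pos}{var}"
        for i, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]


# Residues 2-38 of the TadA portion of SEQ ID NO: 17 (numbering assumed, see above).
fragment = "SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNN"
double = apply_substitutions(fragment, ["V28C", "L34W"], first_pos=2)
print(call_substitutions(fragment, double, first_pos=2))  # ['V28C', 'L34W']
```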
- Methods for determining homologous or orthologous adenosine deaminases to SEQ ID NO: 1 would be apparent to the skilled artisan.
- nucleobase editor fusion proteins comprising a TadA-8e deaminase variant and a napDNAbp.
- Exemplary fusion proteins include, without limitation, the following TadA-8e deaminase variants:
- ABEx2 (L34W mutation is bolded and underlined, BPNLS sequence is underlined) MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVWVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSIN (SEQ ID NO: 16).
- ABEx3 (M151E mutation is bolded and underlined, BPNLS sequence is underlined) MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYREPRQVFNAQKKAQSSIN (SEQ ID NO: 17).
- the fusion proteins disclosed herein further comprise one or more, preferably at least two, nuclear localization signals (NLSs).
- the fusion proteins comprise at least two NLSs.
- the NLSs can be the same NLSs or they can be different NLSs.
- the NLSs may be expressed as part of a fusion protein with the remaining portions of the fusion proteins.
- the location of the NLS fusion can be at the N-terminus, the C-terminus, or within a sequence of a fusion protein (e.g. inserted between the encoded Cas9 and a DNA effector moiety (e.g. a deaminase)).
- the NLSs may be any known NLS sequence in the art.
- the NLSs may also be any future-discovered NLSs for nuclear localization.
- the NLSs also may be any naturally occurring NLS or any non-naturally occurring NLS (e.g., an NLS with one or more desired mutations).
- a nuclear localization signal or sequence is an amino acid sequence that tags, designates, or otherwise marks a protein for import into the cell nucleus by nuclear transport. Typically, this signal consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface. Different nuclear localized proteins may share the same NLS. An NLS has the opposite function of a nuclear export signal (NES), which targets proteins out of the nucleus.
- a nuclear localization signal can also target the exterior surface of a cell. Thus, a single nuclear localization signal can direct the entity with which it is associated to the exterior of a cell and to the nucleus of a cell.
- Such sequences can be of any size and composition, for example more than 25, 25, 15, 12, 10, 8, 7, 6, 5, or 4 amino acids, but will preferably comprise a sequence of at least four to eight amino acids known to function as a nuclear localization signal (NLS).
- nuclear localization sequence refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport.
- Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., international PCT application, PCT/EP2000/011690, filed Nov. 23, 2000, published as WO 2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences.
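Because NLSs typically consist of short runs of positively charged lysines or arginines, a crude scan for the consensus often cited for classical monopartite NLSs, K-(K/R)-X-(K/R), can flag candidate motifs. This is a heuristic sketch only: it does not detect bipartite signals (two basic clusters separated by a spacer) as such, can report false positives, and is no substitute for experimental validation. The function name is hypothetical.

```python
import re

# Consensus often cited for classical monopartite NLSs: K-(K/R)-X-(K/R).
# The lookahead lets overlapping matches be reported.
_MONOPARTITE_NLS = re.compile(r"(?=(K[KR].[KR]))")


def find_monopartite_nls(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, motif) for every consensus match in the sequence."""
    return [(m.start() + 1, m.group(1)) for m in _MONOPARTITE_NLS.finditer(protein.upper())]


# N-terminal BPNLS segment shared by the ABEx2 and ABEx3 sequences (SEQ ID NOs: 16 and 17).
print(find_monopartite_nls("MKRTADGSEFESPKKKRKV"))  # [(14, 'KKKR'), (15, 'KKRK')]
```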
- linkers may be used to link any of the peptides or peptide domains of the disclosure.
- the term “linker,” as used herein, refers to a chemical group or a molecule linking two molecules or moieties, e.g., a binding domain and a cleavage domain of a nuclease.
- a linker joins a dCas9 and deaminase domain (e.g. a cytidine or adenosine deaminase).
- the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two.
- the linker is an amino acid or a plurality of amino acids (e.g. a peptide or protein).
- the linker is an organic molecule, group, polymer, or chemical moiety.
- the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
- the linker is a peptide linker, such as an XTEN linker (a 16-amino-acid linker).
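Fusion layouts with an optional linker at each junction can be assembled as plain sequence strings, as in this sketch. It assumes the 16-residue XTEN linker SGSETPGTSESATPES that is commonly used in base editors; this section does not state the exact linker sequence used in the fusions described here. Apart from PKKKRKV (the SV40 NLS), the domain strings in the example are placeholders rather than real domain sequences, and the function name is hypothetical.

```python
from typing import Optional

XTEN16 = "SGSETPGTSESATPES"  # 16-residue XTEN linker commonly used in base editors


def assemble_fusion(domains: list[str], linkers: Optional[list[str]] = None) -> str:
    """Join domain sequences from N- to C-terminus.

    linkers[i] is placed between domains[i] and domains[i + 1]; an empty string
    means a direct fusion. By default every junction gets an XTEN16 linker.
    """
    if not domains:
        raise ValueError("at least one domain is required")
    if linkers is None:
        linkers = [XTEN16] * (len(domains) - 1)
    if len(linkers) != len(domains) - 1:
        raise ValueError("need exactly one linker entry per junction")
    parts = [domains[0]]
    for linker, domain in zip(linkers, domains[1:]):
        parts.extend([linker, domain])
    return "".join(parts)


# Placeholder deaminase ("DDDD") and Cas ("CCCC") domains, then a direct fusion to an NLS.
print(assemble_fusion(["DDDD", "CCCC", "PKKKRKV"], [XTEN16, ""]))
# DDDDSGSETPGTSESATPESCCCCPKKKRKV
```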
- the fusion protein described herein may comprise one or more heterologous protein domains, e.g. epitope tags and reporter gene sequences.
- the heterologous protein domain comprises a reporter sequence comprising a p2A-GFP insert (Addgene plasmid #65562; RRID:Addgene_65562; see Li J, et al., Intron targeting-mediated and endogenous gene integrity-maintaining knockin in zebrafish using the CRISPR/Cas9 system. Cell Res. (2015)).
- Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags.
- reporter genes include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP).
- a fusion protein may be fused to a gene sequence encoding a protein or a fragment of a protein that binds DNA molecules or binds other cellular molecules, including, but not limited to, maltose binding protein (MBP), S-tag, LexA DNA binding domain (DBD) fusions, GAL4 DNA binding domain fusions, and herpes simplex virus (HSV) VP16 protein fusions. Additional domains that may form part of a fusion protein are described in US Patent Publication No. 2011/0059502, published Mar. 10, 2011 and incorporated herein by reference in its entirety.
- the invention provides methods comprising delivering to a host cell one or more nucleobase editor-encoding polynucleotides, such as one or more vectors as described herein encoding one or more components of the base editing system described herein, one or more transcripts thereof, and/or one or more proteins translated therefrom.
- the invention further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells.
- a nucleobase editor as described herein in combination with (and optionally complexed with) an anchor guide sequence is delivered to a cell.
- Non-viral vector delivery systems include DNA plasmids, RNA (e.g. a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome.
- Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell.
- Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA.
- Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™).
- Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424; WO 91/16024. Delivery can be to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration).
- the preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to one of skill in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem.
- RNA or DNA viral based systems for the delivery of nucleic acids take advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus.
- Viral vectors can be administered directly to patients (in vivo) or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo).
- Conventional viral based systems could include retroviral, lentivirus, adenoviral, adeno-associated and herpes simplex virus vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.
- the disclosure provides cells (e.g., transformed cell lines) that comprise the agRNA described herein.
- the cells can also comprise the nucleobase editing complexes described herein (e.g., wherein the cell comprises both an agRNA and a nucleobase editor).
- the cells can also comprise any of the polynucleotides described above, which express the agRNA, and optionally which express the nucleobase editors.
- the cells can comprise any of the vectors described above, which express the agRNA, and optionally which express the nucleobase editors.
- the disclosure describes a method of selecting agRNAs, wherein the method involves transfecting the agRNA screening libraries described above and in this disclosure into host cells and using NGS to select agRNA vectors that show reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or improved editing efficiency of a target nucleic acid by a nucleobase editor.
- the reduction in bystander editing and/or the editing efficiency of a nucleobase editor is measured relative to a gRNA lacking the 3' nucleic acid extension of the agRNA, or relative to a second agRNA.
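A selection step like the one described above could be sketched as a filter over per-agRNA screening summaries. The thresholds (a 50% relative drop in bystander editing with no loss of on-target efficiency), the use of a summed bystander fraction, and the ranking by bystander-to-efficiency ratio are all assumptions for illustration; the names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenResult:
    name: str
    efficiency: float  # edited fraction at the intended base (0-1)
    bystander: float   # edited fraction summed over bystander bases (0-1)


def select_agrnas(candidates, reference, min_bystander_reduction=0.5, min_efficiency_ratio=1.0):
    """Names of candidates, best first, that cut bystander editing by at least
    min_bystander_reduction (relative to the reference) while keeping on-target
    efficiency at least min_efficiency_ratio times the reference.
    Ranked by bystander-to-efficiency ratio (lower is better)."""
    if reference.bystander <= 0 or reference.efficiency <= 0:
        raise ValueError("reference must show on-target and bystander editing")
    kept = []
    for c in candidates:
        reduction = (reference.bystander - c.bystander) / reference.bystander
        ratio = c.efficiency / reference.efficiency
        if c.efficiency > 0 and reduction >= min_bystander_reduction and ratio >= min_efficiency_ratio:
            kept.append((c.bystander / c.efficiency, c.name))
    return [name for _, name in sorted(kept)]


reference = ScreenResult("unmodified gRNA", efficiency=0.50, bystander=0.40)
candidates = [
    ScreenResult("agRNA-A", 0.55, 0.10),
    ScreenResult("agRNA-B", 0.60, 0.05),
    ScreenResult("agRNA-C", 0.30, 0.02),  # bystander drops, but efficiency falls too
    ScreenResult("agRNA-D", 0.52, 0.30),  # bystander reduction too small
]
print(select_agrnas(candidates, reference))  # ['agRNA-B', 'agRNA-A']
```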
- a host cell is transiently or non-transiently transfected with one or more vectors described herein.
- a cell is transfected as it naturally occurs in a subject.
- the cell that is transfected is derived from cells taken from a non-human subject, such as a mammal (e.g., primate (e.g., cynomolgus monkey or rhesus monkey), elephant, or mouse) or an avian (e.g., a bird).
- the cell that is transfected is derived from cells taken from plants, fungi, bacteria, and archaea.
- a cell that is transfected is taken from a subject.
- the cell is derived from cells taken from a subject, such as a cell line.
- a wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7,
- a cell transfected with one or more vectors described herein is used to establish a new cell line comprising one or more vector-derived sequences.
- a cell transiently transfected with the components of a CRISPR system as described herein (such as by transient transfection of one or more vectors, or transfection with RNA), and modified through the activity of a CRISPR complex, is used to establish a new cell line comprising cells containing the modification but lacking any other exogenous sequence.
- cells transiently or non-transiently transfected with one or more vectors described herein, or cell lines derived from such cells are used in assessing one or more test compounds.
- a pharmaceutical composition comprising: (i) an agRNA described above, or a nucleobase editing complex described above, a polynucleotide described above, or a vector described above, or any of the cells described above, and (ii) a pharmaceutically acceptable excipient.
- compositions comprising any of the various components of the base editing system described herein (including, but not limited to, the napDNAbps, deaminases, fusion proteins (e.g., comprising napDNAbps and deaminases), pegRNAs, and complexes comprising fusion proteins and agRNAs), as well as accessory elements.
- the term “pharmaceutical composition” refers to a composition formulated for pharmaceutical use.
- the pharmaceutical composition further comprises a pharmaceutically acceptable carrier.
- the pharmaceutical composition comprises additional agents (e.g. for specific delivery, increasing half-life, or other therapeutic compounds).
- the term “pharmaceutically-acceptable carrier” means a pharmaceutically-acceptable material, composition, or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the compound from one site (e.g., the delivery site) of the body to another site (e.g., organ, tissue, or portion of the body).
- a pharmaceutically acceptable carrier is “acceptable” in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, physiologic pH, etc.).
- materials which can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols
- wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservative and antioxidants can also be present in the formulation.
- the terms “excipient,” “pharmaceutically acceptable carrier,” and the like are used interchangeably herein.
- the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing.
- Suitable routes of administering the pharmaceutical composition described herein include, without limitation: topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, and intracerebroventricular administration.
- the pharmaceutical composition described herein is administered locally to a diseased site (e.g., tumor site).
- the pharmaceutical composition described herein is administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gelatinous material, including a membrane, such as a silastic membrane, or a fiber.
- the pharmaceutical composition described herein is delivered in a controlled release system.
- a pump may be used (see, e.g., Langer, 1990, Science 249:1527-1533; Sefton, 1989, CRC Crit. Ref. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; Saudek et al., 1989, N. Engl. J. Med. 321:574).
- polymeric materials can be used.
- the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject, e.g., a human.
- pharmaceutical compositions for administration by injection are solutions in sterile isotonic aqueous buffer.
- the pharmaceutical composition can also include a solubilizing agent and a local anesthetic such as lignocaine to ease pain at the site of the injection.
- the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water-free concentrate in a hermetically sealed container such as an ampoule or sachet indicating the quantity of active agent.
- the pharmaceutical is to be administered by infusion
- it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline.
- an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.
- a pharmaceutical composition for systemic administration may be a liquid, e.g., sterile saline, lactated Ringer’s or Hank’s solution.
- the pharmaceutical composition can be in solid forms and re-dissolved or suspended immediately prior to use. Lyophilized forms are also contemplated.
- the pharmaceutical composition can be contained within a lipid particle or vesicle, such as a liposome or microcrystal, which is also suitable for parenteral administration.
- the particles can be of any suitable structure, such as unilamellar or plurilamellar, so long as compositions are contained therein.
- Compounds can be entrapped in “stabilized plasmid-lipid particles” (SPLP) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE), low levels (5-10 mol%) of cationic lipid, and stabilized by a polyethyleneglycol (PEG) coating (Zhang Y. P. et al., Gene Ther. 1999, 6:1438-47).
- lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methylsulfate, or “DOTAP,” are particularly preferred for such particles and vesicles.
- the preparation of such lipid particles is well known. See, e.g., U.S. Patent Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; and 4,921,757; each of which is incorporated herein by reference.
- the pharmaceutical composition described herein may be administered or packaged as a unit dose, for example.
- unit dose, when used in reference to a pharmaceutical composition of the present disclosure, refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent, i.e., carrier or vehicle.
- the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing a compound of the invention in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection.
- the pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized compound of the invention.
- Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.
- an article of manufacture containing materials useful for the treatment of the diseases described above is included.
- the article of manufacture comprises a container and a label.
- Suitable containers include, for example, bottles, vials, syringes, and test tubes.
- the containers may be formed from a variety of materials such as glass or plastic.
- the container holds a composition that is effective for treating a disease described herein and may have a sterile access port.
- the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle.
- the active agent in the composition is a compound of the invention.
- the label on or associated with the container indicates that the composition is used for treating the disease of choice.
- the article of manufacture may further comprise a second container comprising a pharmaceutically-acceptable buffer, such as phosphate-buffered saline, Ringer's solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.
- the disclosure describes a computational method, which may be embodied in software, for designing a library of 3'-nucleic acid extensions.
- the method involves efficiently evaluating a nucleic acid target and generating one or more combinations of a UBS, DBS, and/or CLS for nucleobase editing.
- the terms “program” or “software” are used herein in a generic sense to refer to any type of computer code or set of processor-executable instructions that may be employed to program a computer or other processor to implement various aspects of embodiments as described above.
- one or more computer programs that when executed perform methods of the disclosure provided herein need not reside on a single computer or processor but may be distributed in a modular fashion among different computers or processors to implement various aspects of the disclosure provided herein.
- Some aspects of this disclosure provide methods of selecting nucleobase editors that show reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or improved editing efficiency of a target nucleic acid utilizing machine learning (ML) language models.
- the ML language models are able to predict evolutionarily adaptive mutations, resulting in nucleobase editor variants with improved fitness.
- the present disclosure describes a method of selecting nucleobase editors having (i) reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or (ii) improved editing efficiency of a target nucleic acid by a nucleobase editor comprising:
- the machine learning model is an ESM-1b language model and/or an ESM-1v language model, wherein said language models (i) learn natural amino acid patterns from millions of naturally occurring protein sequences, (ii) consider mutations observed in sequences of natural proteins as plausible mutations, and (iii) assume that plausible mutations with high likelihood scores correlate with improved protein fitness.
- Evolution of nucleobase editor
- Some aspects of this disclosure provide methods of phage-assisted, non-continuous evolution (PANCE) of a nucleobase editor.
- the selection plasmid comprises (i) a pIII nucleotide sequence (encoding the phage coat protein pIII) that has been modified to contain at least one single nucleotide variant (SNV) and (ii) an agRNA nucleotide sequence encoding the corresponding agRNA that targets the modified pIII nucleotide sequence to correct (edit) the SNV to the wild-type sequence.
- the SNV results in a mutated pIII protein that lowers phage infectivity. Correction of the SNV to the WT sequence by a complex of the agRNA and the nucleobase editor increases phage infectivity: if the perfect edit occurs, the pIII sequence reverts to the wild-type sequence and phage propagation occurs.
- the selection plasmid comprises a sequence such that bystander edits upstream and downstream of the target nucleic acid in the pIII nucleotide sequence introduce mutations that inhibit phage propagation.
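The selection logic described above (only a clean correction of the SNV restores pIII, while bystander edits block propagation) can be expressed as a small classifier over aligned pIII reads. This is a minimal, idealized sketch assuming pre-aligned, gap-free reads and a known 0-based SNV index; `classify_piii_read` and `supports_propagation` are hypothetical helpers, not part of any published PANCE pipeline, and real pIII activity is graded rather than all-or-nothing.

```python
def classify_piii_read(observed: str, wild_type: str, snv_index: int) -> str:
    """Label one aligned pIII read against the wild-type pIII segment.

    Hypothetical helper. Assumes both strings are pre-aligned, gap-free and of
    equal length, and that snv_index (0-based) marks the engineered SNV.
    """
    if len(observed) != len(wild_type):
        raise ValueError("reads must be aligned to the wild-type segment")
    obs, wt = observed.upper(), wild_type.upper()
    target_fixed = obs[snv_index] == wt[snv_index]
    # Any other mismatch against wild type is treated as a bystander edit.
    bystanders = [i for i, (o, w) in enumerate(zip(obs, wt))
                  if i != snv_index and o != w]
    if target_fixed and not bystanders:
        return "perfect"      # wild-type pIII restored
    if target_fixed:
        return "bystander"    # designed to carry a propagation-inhibiting change
    return "uncorrected"      # SNV remains; infectivity stays low


def supports_propagation(label: str) -> bool:
    """In this idealized model, only a perfect edit lets the phage propagate."""
    return label == "perfect"
```

For example, with a wild-type segment `"GATACA"` and the SNV at index 2, the read `"GATGCA"` is labelled `"bystander"` and would not support propagation.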
- host cells further comprise a helper plasmid and/or a mutagenesis plasmid.
- a mutagenesis plasmid comprises an arabinose-inducible promoter.
- the present disclosure describes a method of selecting nucleobase editors having (i) reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or (ii) improved editing efficiency of a target nucleic acid by a nucleobase editor, the method using an agRNA described herein as part of a PANCE system.
- the method of selecting nucleobase editors having (i) reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or (ii) improved editing efficiency of a target nucleic acid by a nucleobase editor comprises the steps:
- selection plasmids for PANCE encode a pIII gene further comprising mutations that diminish pIII activity, wherein (i) pIII activity is restored by a nucleobase editor if the nucleobase editor edits the target nucleic acid or (ii) pIII activity is not restored by a nucleobase editor if the nucleobase editor edits bystander nucleic acids;
- the selection plasmids generated for PANCE comprise a sequence selected from any one of SEQ ID NOs: 19-21.
- the selection plasmids generated for PANCE comprise the sequence:
- a person skilled in the art would appreciate the use of other evolution systems (e.g., phage-assisted, continuous evolution (PACE)) for generating evolved nucleobase editors with reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or improved editing efficiency of a target nucleic acid in conjunction with the agRNAs described herein.
- the present invention relates to an improved version of “base editing” that utilizes modified (or, equivalently, engineered) agRNAs comprising one or more structural modifications that improve one or more of their characteristics, including stability, cellular lifespan, affinity for Cas9 (or, more broadly, for a napDNAbp), or interaction with a target DNA, thereby increasing base editing efficiency and reducing bystander editing within the base editing window of a nucleobase editor.
- Some aspects of this disclosure provide methods of using any of the fusion proteins (e.g., a Cas9 domain fused to an adenosine deaminase) provided herein, or complexes comprising an agRNA and a fusion protein (e.g., a Cas9 domain fused to an adenosine deaminase) provided herein.
- some aspects of this disclosure provide methods comprising contacting a DNA or RNA molecule with any of the fusion proteins or nucleobase editors provided herein and with at least one agRNA, wherein the agRNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence.
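The two numeric constraints above (overall length of about 15-100 nucleotides and at least 10 contiguous complementary nucleotides) can be checked mechanically. A minimal sketch, assuming strict bounds of 15 and 100 in place of "about", and assuming `target_dna` is given 5'→3' on the strand the guide base-pairs with, so a complementary guide stretch reads as its reverse complement; `meets_guide_criteria` and its helpers are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(_COMPLEMENT)[::-1]


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest common substring (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def meets_guide_criteria(guide_rna: str, target_dna: str,
                         min_len: int = 15, max_len: int = 100,
                         min_contiguous: int = 10) -> bool:
    """Hypothetical check of the length and contiguous-complementarity limits."""
    guide = guide_rna.upper().replace("U", "T")  # compare in DNA alphabet
    if not min_len <= len(guide) <= max_len:
        return False
    # A guide stretch that pairs antiparallel with target_dna reads, 5'->3',
    # as the reverse complement of target_dna.
    return longest_common_run(guide, reverse_complement(target_dna)) >= min_contiguous
```

The check ignores mismatches, bulges, and G·U wobble pairs; it only tests for an uninterrupted Watson-Crick stretch.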
- the disclosure provides a method of nucleobase editing (e.g., “base editing”) comprising contacting a target nucleic acid sequence with an agRNA described above and a nucleobase editor comprising a fusion protein comprising a deaminase and a napDNAbp or a split napDNAbp, wherein the editing efficiency is increased and/or the bystander editing is decreased as compared to the same method using a gRNA not comprising the 3'-nucleic acid extension.
- the present disclosure contemplates the use of the agRNAs described herein for base editing a target nucleic acid within a target nucleic acid sequence, wherein editing the target nucleic acid produces a single nucleotide variant (SNV) for engineering a cell (e.g., prokaryotic or eukaryotic cell), a virus, a fungus, a plant, an insect, and/or an animal.
- the present disclosure contemplates the use of the methods described herein for base editing a target nucleic acid within a target nucleic acid sequence, wherein editing the target nucleic acid produces a single nucleotide variant (SNV) for engineering a cell, a virus, a fungus, a plant, an insect, and/or an animal.
- the methods described herein may be used for modifying a target nucleic acid sequence for research purposes.
- the present disclosure contemplates the use of the base editing methods described herein for targeted modifications in the genomes of plants for improved crop varieties.
- the target DNA sequence comprises a sequence associated with a disease or disorder. In some embodiments, the target DNA sequence comprises a point mutation associated with a disease or disorder.
- the activity of the fusion protein results in a correction of the point mutation.
- the methods provided herein are used to introduce a deactivating point mutation into a gene or allele that encodes a gene product that is associated with a disease or disorder.
- methods are provided herein that employ a DNA editing fusion protein to introduce a deactivating point mutation into an oncogene (e.g., in the treatment of a proliferative disease).
- a deactivating mutation may, in some embodiments, generate a premature stop codon in a coding sequence, which results in the expression of a truncated gene product, e.g., a truncated protein lacking the function of the full-length protein.
- the purpose of the methods provided herein is to restore the function of a dysfunctional gene via genome editing.
- the nucleobase editing proteins provided herein can be validated for gene editing-based human therapeutics in vitro, e.g., by correcting a disease-associated mutation in human cell culture.
- the instant disclosure provides methods for the treatment of a subject diagnosed with a disease associated with or caused by a point mutation that can be corrected by a DNA editing fusion protein provided herein.
- a method comprises administering to a subject having such a disease, e.g., a cancer associated with a point mutation as described above, an effective amount of a nucleobase editor fusion protein that corrects the point mutation or introduces a deactivating mutation into a disease-associated gene.
- the disease is a proliferative disease.
- the disease is a genetic disease.
- the disease is a neoplastic disease.
- the disease is a metabolic disease.
- the disease is a lysosomal storage disease.
- Other diseases or disorders that can be treated by correcting a point mutation or introducing a deactivating mutation into a disease-associated gene will be known to those of skill in the art, and the disclosure is not limited in this respect.
- compositions of the present disclosure may be assembled into kits.
- the kit comprises nucleic acid vectors for the expression of the nucleobase editors described herein.
- the kit further comprises appropriate guide nucleotide sequences (e.g., agRNAs) or nucleic acid vectors for the expression of such guide nucleotide sequences, to target the Cas9 protein or nucleobase editor to the desired target sequence.
- the kit described herein may include one or more containers housing components for performing the methods described herein and, optionally, instructions for use. Any of the kits described herein may further comprise components needed for performing the assay methods.
- Each component of the kits may be provided in liquid form (e.g., in solution) or in solid form, (e.g., a dry powder). In certain cases, some of the components may be reconstitutable or otherwise processible (e.g., to an active form), for example, by the addition of a suitable solvent or other species (for example, water), which may or may not be provided with the kit.
- kits may optionally include instructions and/or promotion for use of the components provided.
- “instructions” can define a component of instruction and/or promotion, and typically involve written instructions on or associated with packaging of the disclosure. Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), Internet, and/or web-based communications, etc.
- the written instructions may be in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which can also reflect approval by the agency of manufacture, use or sale for animal administration.
- “promoted” includes all methods of doing business, including methods of education, hospital and other clinical instruction, scientific inquiry, drug discovery or development, academic research, pharmaceutical industry activity including pharmaceutical sales, and any advertising or other promotional activity including written, oral and electronic communication of any form, associated with the disclosure. Additionally, the kits may include other components depending on the specific application, as described herein.
- kits may contain any one or more of the components described herein in one or more containers.
- the components may be prepared sterilely, packaged in a syringe, and shipped refrigerated. Alternatively, they may be housed in a vial or other container for storage. A second container may hold other components prepared sterilely.
- the kits may include the active agents premixed and shipped in a vial, tube, or other container.
- kits may have a variety of forms, such as a blister pouch, a shrink wrapped pouch, a vacuum sealable pouch, a sealable thermoformed tray, or a similar pouch or tray form, with the accessories loosely packed within the pouch, one or more tubes, containers, a box or a bag.
- the kits may be sterilized after the accessories are added, thereby allowing the individual accessories in the container to be otherwise unwrapped.
- the kits can be sterilized using any appropriate sterilization techniques, such as radiation sterilization, heat sterilization, or other sterilization methods known in the art.
- kits may also include other components, depending on the specific application, for example, containers, cell media, salts, buffers, reagents, syringes, needles, a fabric, such as gauze, for applying or removing a disinfecting agent, disposable gloves, a support for the agents prior to administration, etc.
- kits comprising a nucleic acid construct comprising a nucleotide sequence encoding the various components of the nucleobase editing system described herein (e.g., including, but not limited to, the napDNAbps, deaminases, fusion proteins (e.g., comprising napDNAbps and deaminases), agRNAs, and complexes comprising fusion proteins and agRNAs), as well as accessory elements.
- the nucleotide sequence(s) comprises a heterologous promoter (or more than a single promoter) that drives expression of the nucleobase editing system components.
- kits comprising one or more nucleic acid constructs encoding the various components of the nucleobase editing system described herein, e.g., comprising a nucleotide sequence encoding the components of the nucleobase editing system capable of modifying a target DNA sequence.
- the nucleotide sequence comprises a heterologous promoter that drives expression of the nucleobase editing system components.
- kits comprising a nucleic acid construct, comprising (a) a nucleotide sequence encoding a napDNAbp (e.g., a Cas9 domain) fused to a deaminase and (b) a heterologous promoter that drives expression of the sequence of (a).
- a nucleic acid construct comprising (a) a nucleotide sequence encoding a napDNAbp (e.g., a Cas9 domain) fused to a deaminase and (b) a heterologous promoter that drives expression of the sequence of (a).
- the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements.
- This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified.
- “at least one of A and B” can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
- a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
- Example 1: 3'-extended gRNAs (agRNAs) for targeted manipulation of bystander edits, activity, and editing window
- the adenosine deamination mechanism involves the TadA-8e domain engaging the exposed single-stranded region of the PAM-distal nontarget strand (NTS) (Lapinaite) (FIG. 5A).
- when attached to the Cas9 protein, the TadA-8e deaminase induces specific editing patterns within a narrow DNA region spanning several base pairs. This tethering restricts the enzyme to certain nucleotides, defining what is known as the editing window. Because different DNA contexts present relatively diverse substrates, the enzyme must accommodate structural variation within its active site, so the DNA strand in the active site retains a degree of freedom to move.
- a system that stabilizes the DNA strand within the active site was therefore established, on the premise that restricting this movement could narrow the editing window and thus minimize bystander editing. One option explored was adding nucleotides to the 3’ end of the gRNA scaffold.
- These anchor guide RNAs were designed to bind the DNA strand up- and downstream of the DNA region that is later present in the active site of the TadA-8e, thereby stabilizing the loop structure resulting in fewer bases being deaminated (FIGs. 1A and 5A).
- the agRNA library consisted of combinations of an array of upstream binding sequences, counterloops, and downstream binding sequences (FIGs. 5A-5B). Both the upstream and downstream sequences bound the DNA strand surrounding the targeted edit. Sequences ranging from 1 to 11 base pairs (bp) in length were tested, with all possible starting points in an 11 bp window. The counterloop sequences ranged from 1 to 33 bp, with the longer ones forming guanine-cytosine (GC)-rich hairpins. This design process yielded an agRNA library containing ~60K candidates.
- a plasmid with the editing target downstream of the agRNA was constructed.
- the tested library was designed to target a site in the human DNMT1 locus, an optimal candidate for screening owing to both its high accessibility for editing in HEK293T cells and the multiple adenines within its editing window.
- the ABE8e-spCas9-WT nucleobase editor in combination with a non-modified guide (sgRNACtrl) showed a high editing efficiency for the four adenines in the editing window (A13, A14, A15 and A17).
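Per-adenine editing efficiencies such as those reported for A13, A14, A15, and A17 are typically read out from amplicon sequencing. A minimal sketch, assuming reads are already aligned, gap-free, and share the reference's coordinates, with positions numbered from 1 across the protospacer as in the A13-style labels; `a_to_g_frequencies` and `perfect_edit_fraction` are hypothetical helpers, not the analysis pipeline used here.

```python
def a_to_g_frequencies(reads, reference, window):
    """Per-adenine A-to-G frequency at 1-based positions in `window`."""
    if not reads:
        raise ValueError("no reads supplied")
    freqs = {}
    for pos in window:
        idx = pos - 1
        if reference[idx].upper() != "A":
            continue  # only adenines are ABE substrates
        edited = sum(1 for read in reads if read[idx].upper() == "G")
        freqs[f"A{pos}"] = edited / len(reads)
    return freqs


def perfect_edit_fraction(reads, reference, target_pos, window):
    """Fraction of reads edited at target_pos with every other window A unchanged."""
    if not reads:
        raise ValueError("no reads supplied")
    target = target_pos - 1
    others = [p - 1 for p in window
              if p != target_pos and reference[p - 1].upper() == "A"]
    clean = sum(1 for read in reads
                if read[target].upper() == "G"
                and all(read[i].upper() == "A" for i in others))
    return clean / len(reads)
```

The second function separates "precise" editing from total editing: a read edited at the target and at a bystander counts toward the target's A-to-G frequency but not toward the perfect-edit fraction.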
- anchors that showed higher efficiency and lower bystander editing in the DNMT1 sensor library were selected.
- Five candidates were selected for testing in the native context in HEK293T cells, and all of them showed a decrease in bystander editing (FIG. 1D).
- Clone 56114 showed the highest precision in A13 editing, with significant reductions of 44% at A17 and 34% at A16 (FIGs. 1D and 5C-5E).
- the agRNA56114-tevopreQ1 was tested in HeLa and HepG2 cells, yielding similar bystander reduction patterns (FIG. 1I).
- sgRNA and agRNA libraries targeting ~12,000 pathogenic single nucleotide variants that can be targeted by base editing based on proximity to a PAM (Arbab) were constructed.
- the libraries sgRNACtrl-tevopreQ1 and agRNA56114-tevopreQ1 decreased the editing efficiency of ABE8e-spCas9-WT when compared to sgRNACtrl (FIG. 1H).
- the library with agRNA56114 slightly decreased the editing efficiency.
- Phage-assisted non-continuous evolution has been used in the past to increase the activity of ABEs (Richter).
- a selection method was designed to evolve variants with decreased bystander edits that can also work with the agRNAs disclosed herein.
- the activity of the nucleobase editor encoded on the M13 phage genome was linked to the expression of gene III (encoding the pIII protein), which is required for phage replication (Esvelt).
- the selection plasmids also encoded the corresponding agRNA targeting the SNV, as well as a C-terminal Intein-nCas9-SpRY.
- a functional nucleobase editor was expressed, and was capable of performing the editing (FIGs. 2A-2B and 6C).
- the initial batch culture was infected with the selection phage, and after 12 hours of cultivation, different volumes of the supernatant were used to infect the next batch culture. Phage DNA was isolated from each selection round, and NGS data were obtained after the last round of evolution (FIG. 6D).
- the TadA mutational landscape was determined, and the 50 most enriched mutations were selected and individually tested at the DNMT1 site (FIGs. 7A and 7C). Variants with both higher efficiency and a reduced bystander editing pattern at position A13 were detected when compared with the ABE8e-SpRY base editor (FIG. 2E).
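The text does not state how enrichment was scored. One common way to rank mutations from NGS counts is the log2 ratio of post-selection to pre-selection frequency with a pseudocount; the sketch below assumes that scheme, and `top_enriched`, its count dictionaries, and the pseudocount default are hypothetical.

```python
import math


def top_enriched(pre_counts, post_counts, k=50, pseudocount=1.0):
    """Rank mutations by log2(post-selection / pre-selection frequency).

    pre_counts / post_counts map a mutation label (e.g. "V28C") to its read
    count before and after selection. Ties are broken alphabetically.
    """
    mutations = set(pre_counts) | set(post_counts)
    pre_total = sum(pre_counts.values()) + pseudocount * len(mutations)
    post_total = sum(post_counts.values()) + pseudocount * len(mutations)
    scores = {}
    for m in mutations:
        pre = (pre_counts.get(m, 0) + pseudocount) / pre_total
        post = (post_counts.get(m, 0) + pseudocount) / post_total
        scores[m] = math.log2(post / pre)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]
```

The pseudocount keeps mutations absent from one pool from producing infinite or undefined scores; if no pre-selection pool was sequenced, ranking by post-selection frequency alone would be the simpler alternative.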
- the PANCE-evolved TadA variants were also benchmarked against the ABE9 nucleobase editor (both WT and SpRY), which showed low editing efficiency and no relative bystander reduction at the DNMT1 site (FIGs. 2E and 7A-7C). Variants carrying V28C, L34W, D54T, and I95D showed potential to generate a perfect edit at position A13 (FIG. 2E).
- The crystal structure of ABE8e was computationally analyzed to better understand the impact of these mutations on ABE8e. Variants V28C and L34W were each generated in silico, and their interactions with surrounding amino acids and nucleotides were compared to wild type, but no changes in interactions were predicted (Methods) (FIG. 8A). It is possible that these mutations induce a conformational change in ABE8e that alters the interactions of ABE8e residues H57 and C87 with nucleotide 8-Az(26) of the gRNA. Based on the wild-type crystal structure, H57 and C87 were predicted to establish three van der Waals interactions and two hydrogen bonds with 8-Az(26), respectively.
- Non-stop codon based PANCE selection proved to be a powerful tool to evolve base editing mutants that showed decreased bystander editing without losing on-target activity.
- Example 3: Machine-learning-guided identification of additional ABE candidates.
- protein language models trained on massive, non-redundant protein sequence datasets can learn these general, evolutionarily plausible mutational patterns (Hie et al., Meier et al., Hie et al. 2). This knowledge can be leveraged to predict mutations likely to be beneficial, guiding protein evolution more efficiently.
- these models can be used to predict the probability distribution of each amino acid at any given position along a protein sequence, where the probability distribution reflects the knowledge acquired by the models on their training dataset. Positions where the model assigns a higher probability to an amino acid than the wild-type residue are considered more likely than a random pick to yield a positive effect on the protein fitness.
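The scoring rule described above (flag any amino acid the model rates as more probable than the wild-type residue at that position) can be sketched as follows. The per-position probabilities are assumed to come from a protein language model such as ESM-1b or ESM-1v; no model is called here, and `lm_candidate_mutations` is a hypothetical helper operating on a precomputed probability table.

```python
import math


def lm_candidate_mutations(wt_seq, position_probs, min_log_ratio=0.0):
    """List substitutions the model prefers over the wild-type residue.

    position_probs[i] maps amino-acid letters to the model probability at
    0-based position i. Mutation names use 1-based numbering, e.g. "M151E".
    """
    hits = []
    for i, wt in enumerate(wt_seq):
        probs = position_probs[i]
        p_wt = probs[wt]
        if p_wt <= 0:
            raise ValueError(f"zero probability for wild-type residue at {i + 1}")
        for aa, p in probs.items():
            if aa == wt or p <= 0:
                continue
            score = math.log(p) - math.log(p_wt)  # log-likelihood ratio vs. WT
            if score > min_log_ratio:
                hits.append((f"{wt}{i + 1}{aa}", score))
    hits.sort(key=lambda h: -h[1])  # most favoured substitutions first
    return hits
```

With the default threshold of 0, a substitution is kept only when the model assigns it strictly higher probability than the wild-type residue, matching the criterion in the text; raising `min_log_ratio` yields a shorter, more conservative list.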
- the wild-type residue M151 formed two hydrogen bonds with C146 and Q154.
- the mutation M151E allowed an additional hydrogen bond to form between the carboxyl group of glutamate (acceptor) and the amino group of Q154 (donor) (FIG. 8B).
- Glutamate also introduced a negative charge compared to methionine, potentially changing the local conformation, distances (the measured distance is 5.214 Å), and interactions between E151 and nucleotide C(25) of the gRNA (FIG. 3F).
- Example 4: agRNA, PANCE, and ML variants outperform current base editing variants
- agRNA56114-tevopreQ1 was combined with ABE variants (FIGs. 7C and 9B), herein referred to as ABEx1, ABEx2, ABEx3, and ABEx4.
- ABEx1 and ABEx2 were generated in the PANCE experiment, ABEx3 using ML, and ABEx4 as the combination of both techniques (FIG. 4A). All the ABEx-spCas9-SpRY variants showed improved efficiency and reduced bystander editing when combined with the agRNA at the target site.
- ABEx1 demonstrated higher editing efficiency at position A13 and reduced bystander editing compared to the control, using both sgRNA and agRNA (FIGs. 4B and 10A-10D).
- ABEx2 (L34W) achieved precise editing and exhibited the same efficiency as ABE8e-SpRY at position A13, while also minimizing bystander editing at positions A15, A16, and A17 (FIG. 4B).
- ABEx3 (M151E) also reduced bystander editing when combined with agRNA56114-tevopreQ1 and increased efficiency at position A15 (FIG. 4B).
- a previous strategy to remove bystander editing using the PAMless SpRY variant of spCas9 relied on designing guides that shifted the editing window to isolate the target nucleobase (Alves).
- the ability of ABE8e-spCas9-SpRY to isolate A13 after shifting the editing window downstream (-1, -2, -3 bp) and upstream (+1 bp) of the control sgRNA was tested, and ABE8e-spCas9-SpRY was found unable to isolate A13 with either sgRNACtrl or agRNA56114-tevopreQ1.
- variant ABEx1 showed increased A13 editing with both the centered and +1 sgRNAs and with agRNA56114-tevopreQ1 (FIG. 12). This result highlights the importance of targeted guide RNA design and its combination with evolved nucleobase editors (e.g., ABEx1) to find the best combination (e.g., a nucleobase editor and agRNA pair) for fixing a particular mutation with maximal efficiency and safety.
- LB and 2xYT media were prepared using MP Biomedicals™ media capsules according to the manufacturer’s protocol.
- 16 g/L agar was added for standard, and 7 g/L agar was added for soft agar. All media was sterilized by autoclaving.
- agRNA library generation
- the agRNA library consisted of an upstream binding sequence (UBS) that was the reverse complement of the downstream sequence of the target sequence (also referred to as “the target” or “target”) of the nucleobase editor, a counterloop, and a downstream binding sequence (DBS) that bound the upstream sequence of the target.
- the upstream and downstream binding sequences were of different lengths and had different binding regions in the 1 to 11 bp region upstream and downstream of the target.
- the counterloop library consisted of 33 different DNA sequences, the longer of which form GC-rich hairpins.
- the final library contained every possible combination of a UBS, counterloop, and DBS.
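The combinatorial design described above (UBS pairing downstream of the target, a counterloop, and DBS pairing upstream, combined exhaustively) can be sketched as a generator. This is a minimal illustration, assuming the flanking regions are supplied 5'→3' on the strand the extension should pair with, that each binding sequence is the reverse complement of one window of length 1 to 11 inside its region, and that duplicate windows are collapsed; `agrna_extensions` and its helpers are hypothetical, and the output is written in the DNA alphabet used for oligo ordering.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(_COMPLEMENT)[::-1]


def binding_sequences(region: str, max_len: int = 11):
    """Reverse complements of every window (length 1..max_len) in region."""
    seqs = []
    for length in range(1, min(max_len, len(region)) + 1):
        for start in range(len(region) - length + 1):
            seqs.append(reverse_complement(region[start:start + length]))
    return list(dict.fromkeys(seqs))  # drop duplicates, keep first-seen order


def agrna_extensions(upstream_region: str, downstream_region: str, counterloops):
    """Yield UBS + counterloop + DBS for every combination."""
    ubs_options = binding_sequences(downstream_region)  # UBS pairs downstream
    dbs_options = binding_sequences(upstream_region)    # DBS pairs upstream
    for ubs, loop, dbs in product(ubs_options, counterloops, dbs_options):
        yield ubs + loop + dbs
```

For two 11-bp regions whose windows are all distinct, each region contributes 66 binding sequences, so 33 counterloops give 66 × 33 × 66 = 143,748 candidates. That exceeds the ~60K reported above, so the published library presumably applied further constraints that this sketch does not model.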
- the agRNA library for DNMT1 was ordered as an Agilent DNA Oligo Pool.
- the oligos for the DNMT1 library contained a Gibson overhang, the gRNA, the gRNA scaffold, the agRNA library sequence, and a terminator followed by a short DNA sequence used as a primer binding site.
- the target for the DNMT1 library was already cloned on the plasmid used as backbone.
- the DNA Oligo Pool library was amplified with the oligos Lib_F and Lib_R via PCR.
- the backbone pU6-tevopreq1-GG-acceptor (Addgene #174038) was PCR amplified using the oligos SplitF and SplitR.
- the PCR product of the backbone was DpnI-digested overnight at 37 °C, and both PCR products were purified using the New England Biolabs Monarch PCR & DNA Cleanup Kit according to the manufacturer’s protocol.
- the fragments were assembled using the New England Biolabs Gibson Assembly® Master Mix at a 10:1 library-to-backbone ratio with 150 ng of backbone DNA according to the manufacturer’s protocol.
- 2 µL of the Gibson assembly mix was directly transformed into Lucigen's Endura Competent Cells and, after recovery in 1 mL SOC media, plated on carbenicillin/agar plates poured in Nunc™ Square BioAssay Dishes (Cole-Parmer #EW-01929-00).
- the library was amplified from the pU6-tevopreq1-GG-acceptor.
- the backbone was digested using PspXI and Esp3I and cloned by Gibson assembly following the previously described protocol.
- HEK293T were purchased from ATCC and maintained in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies) and kept at 37 °C and 5% CO2. 20 million cells were seeded in a 225 mm³ dish and co-transfected the following day using Lipofectamine 3000 (Thermo Fisher) with an amount of library plasmid corresponding to one plasmid per cell and 20 µg of the nucleobase editor pCMV-T7-ABE8e-nSpCas9-P2A-EGFP (KAC978) (Addgene #185910). Genomic DNA was collected from cells 5 days after transfection.
- the generation of these context libraries differed from that of the agRNA library, because extensive recombination occurred when the gRNA, gRNA scaffold, agRNA and target were introduced as a single oligo.
- the gRNA and target, with 11 bp upstream and 25 bp downstream of native genomic context, were cloned as an oligo lacking the gRNA scaffold and hairpin. In their place, the oligo carried two outward-facing BsaI sites with 10 randomized base pairs at that position.
- the DNA Oligo Pool libraries were amplified with the oligos Lib_F and Lib_R via PCR.
- the backbone sgBbsI (p2Tol-U6-2xBbsI-sgRNA-HygR) (Addgene #71485) was PCR amplified using the oligos BB_R and BB_F.
- the PCR product of the backbone was DpnI-digested overnight at 37 °C, and both PCR products were purified using the New England Biolabs Monarch PCR & DNA Cleanup Kit according to the manufacturer’s protocol.
- the fragments were assembled using the New England Biolabs Gibson Assembly® Master Mix at a 10:1 library-to-backbone ratio with 150 ng of backbone DNA according to the manufacturer’s protocol.
- the library was then digested with BsaI according to the manufacturer’s protocol and gel purified using the New England Biolabs Monarch Gel Extraction Kit according to the manufacturer’s protocol.
- the gRNA scaffold, hairpin and terminator, flanked up- and downstream by inward-facing BsaI sites, were ordered as a cloned gene synthesis from IDT.
- the plasmid was also BsaI-digested, and the fragment was purified using the New England Biolabs Monarch Gel Extraction Kit according to the manufacturer’s protocol.
- the insert and library were ligated using New England Biolabs T4 DNA Ligase according to the manufacturer’s protocol with 150 ng of backbone and a 10:1 insert-to-backbone ratio.
- HEK293T were purchased from ATCC and maintained in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies) and kept at 37 °C and 5% CO2.
- Tol2 transposon-mediated library integration
- 5 million cells (~400× coverage) were seeded in 175 mm³ dishes. The following day, cells were co-transfected using Lipofectamine 3000 (Thermo Fisher) with 10 µg of Tol2 transposase plasmid (pCMV-Tol2, Addgene #31823) and 10 µg of the Path_Var library. To generate stable library cell lines, cells were selected with hygromycin (25 µg/mL) starting the day after transfection and continuing for at least 2-3 weeks. Subsequently, 2.5 million cells (~200× coverage) were seeded in a 100 mm³ dish and transfected the following day with 10 µg of nucleobase editor using Lipofectamine 3000. Genomic DNA was collected from cells 5 days after transfection.
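The coverage figures quoted above (~400× for 5 million cells, ~200× for 2.5 million cells) are consistent with a library of roughly 12,000 members, the approximate spacer count given for the context library. A minimal Python sketch of that arithmetic follows; it assumes one integrated library copy per cell and ignores losses during transfection and selection, both of which are simplifying assumptions rather than statements from the protocol.

```python
import math

def coverage(n_cells: int, library_size: int) -> float:
    """Average number of cells per library member (assumes one copy per cell)."""
    return n_cells / library_size

def cells_for_coverage(target_coverage: float, library_size: int) -> int:
    """Cells to seed so each library member is represented target_coverage times."""
    return math.ceil(target_coverage * library_size)

# Library size taken as ~12,000 spacers (approximate figure from the text).
LIBRARY_SIZE = 12_000
integration_cov = coverage(5_000_000, LIBRARY_SIZE)  # integration step, quoted as ~400x
editing_cov = coverage(2_500_000, LIBRARY_SIZE)      # editing step, quoted as ~200x
seed_for_400x = cells_for_coverage(400, LIBRARY_SIZE)
```

Planning in the other direction (cells needed for a target coverage) is usually the more practical use when scaling to a differently sized library.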
- the resulting PCR products were amplified in a second PCR reaction using a compatible combination of the New England Biolabs NEBNext® Multiplex Oligos for Illumina® (96 Unique Dual Index Primer Pairs).
- the PCR products were purified using the New England Biolabs NEB Monarch® Gel Extraction Kit and quantified via InvitrogenTM QubitTM.
- a 4 nM Pool of the different libraries was generated and 10% 4 nM PhiX was added.
- the Pool was sequenced using the Illumina MiSeq Reagent Kit v2 (300-cycles) according to the manufacturer’s protocol.
- Genome editing of genomic loci
- HEK293T, HeLa and HepG2 were purchased from ATCC and maintained in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies) and kept at 37 °C and 5% CO2. 20,000 cells were seeded in 96-well plates (Corning) and transfected the following day using jetOPTIMUS® (Polyplus) according to the manufacturer’s instructions. 50 ng of sgRNA or agRNA (both cloned in pU6-tevopreq1-GG-acceptor) was co-transfected with 150 ng of nucleobase editor, and cells were harvested after three days for Sanger sequencing (Genewiz) or high-throughput sequencing (Quintara Biosciences or in-house Illumina MiSeq).
- the evaluation of the context library involved analyzing gRNA and agRNA libraries, which comprised approximately 12,000 spacer sequences and their respective contextual sequences.
- the efficiency and editing profiles for each gRNA and agRNA were established using custom scripts developed in R. First, the target sites (where each spacer binds within the context) were extracted from the NGS reads. Subsequently, for each spacer in the library, all combinations of adenine-to-guanine conversions were aligned against these extracted sequences. Spacers with fewer than 25 total reads were excluded from the analysis. To quantify overall editing efficiency for the different nucleobase editors, the mean A-to-G conversion rate was calculated by averaging the editing frequencies at each targeted position.
- Generation of the selection phages for PANCE
- the PANCE selection phages carry the CDS for the ABE8e adenine deaminase instead of the pIII CDS.
- the ABE8e adenine deaminase CDS is followed by part of the peptide linker sequence and a C-terminally fused intein CDS, so that the phage encodes only the relatively small deaminase rather than the whole nucleobase editor.
- the phages were generated by PCR amplifying the ABE8e adenine deaminase including the partial sequence of the peptide linker using the oligonucleotides ABE_M13_F and ABE_M13_R.
- the N-terminal Npu DnaE intein was ordered as gBlock and amplified using the oligonucleotides Npu_ABE_F and Npu_M13_R.
- the phage backbone was amplified using the oligonucleotides GOI_M13_F and GOI_M13_R with wild-type M13 phage genomic DNA as template. All PCRs were performed using NEB Q5® High-Fidelity 2X Master Mix according to the manufacturer’s protocol with 1 ng of template DNA and an annealing temperature of 60 °C.
- the cells were immediately mixed with 3 mL soft LB agar (0.7%) and plated on LB bottom-agar plates containing 100 µg/mL carbenicillin. The plates were incubated at 37 °C overnight. Plaques were picked into 50 µL 2xYT media, and 1 µL was used as template for colony PCR with the oligonucleotides ABE_M13_F and Npu_M13_R. Positive phages were amplified by adding the remaining 2xYT media to a freshly grown S2060 pJC175e culture at an OD600 of 0.4 and cultivating for 16 hours at 37 °C. The cultures were spun down to remove the E. coli cells, and the phages were precipitated by adding a 20% polyethylene glycol (8000) and 2.5 M sodium chloride solution to the culture supernatant at a 1:4 ratio.
- the mixture was incubated for at least 3 hours at 4 °C, and the phage pellet was resuspended in PBS. The phage titer was quantified using the Progen Phage Titration ELISA kit, and the phages were stored at 4 °C until use.
- 3 mL of the culture supernatant was used for phage DNA isolation using the Omega Bio-tek E.Z.N.A.® M13 DNA Mini Kit. The isolated DNA was sent to Plasmidsaurus for whole phage DNA sequencing.
- Generation of the selection cells for PANCE
- the selection plasmids were designed based on pJC175e by adding mutations such that, when edited by the ABE, only perfect edits restore pIII activity, while bystander edits lower pIII activity.
- the pJC175e backbone was amplified using the oligonucleotides pIII_gBlock_R and pJC175e_Cas_F.
- the part of the pIII CDS containing the mutation, followed downstream by the corresponding guide that corrects the introduced mutation, as well as the C-terminal DnaE intein needed to fuse the ABE8e adenine deaminase encoded by the phage to the Cas9 encoded by the selection plasmid, was ordered as a gBlock.
- the three gBlocks for the three selection plasmids, each encoding a different pIII mutation, were amplified via PCR using the oligonucleotides gBlock_R and gBlock_pIII_F.
- the base Cas9 CDS was amplified from ABE8e plasmid (Addgene #138489) using the oligonucleotides BE_Npu_F and BE_pJC175e_R. All PCRs were performed using NEB Q5® High-Fidelity 2X Master Mix according to the manufacturer’s protocol using 1 ng of template DNA and an annealing temperature of 60 °C.
- the clones with verified sequence were used to generate electrocompetent cells that were then transformed with the mutation plasmid MP4 (Badran).
- the cells were recovered in 1 mL SOC media and plated on 2xYT agar plates containing 1% glucose, 100 µg/mL carbenicillin and 25 µg/mL chloramphenicol. Five colonies were used to start 50 mL shake-flask cultivations in 2xYT with 1% glucose, 100 µg/mL carbenicillin and 25 µg/mL chloramphenicol. After 16 hours, the cultivations were used to freeze 20% glycerol stocks in 1 mL aliquots. Each culture was also used to isolate plasmid DNA for whole-plasmid sequencing by Plasmidsaurus, to select the glycerol stocks with no mutations in MP4 or the selection plasmid.
- the evolution was performed as 10 consecutive batch cultivations in triplicate, using a mix of the three different selection plasmids in each evolution.
- the day prior to cultivation, three 3 mL overnight cultures were prepared in 2xYT media with 1% glucose, 100 µg/mL carbenicillin and 25 µg/mL chloramphenicol, each inoculated from the glycerol stock of one of the selection plasmids.
- 3-4 hours prior to phage infection, 50 mL shake flasks were inoculated with the pooled overnight cultures carrying the different selection plasmids to a combined OD600 of 0.1.
- the cells were cultivated in 2xYT with 100 µg/mL carbenicillin and 25 µg/mL chloramphenicol. 30 minutes before reaching an OD600 of 0.4, the cells were induced with 0.5% arabinose, and at an OD600 of 0.4 they were infected with the selection phages at an MOI of 1 for the first selection round.
- each evolution round was run for 12 hours at 37 °C, after which the entire culture was spun down and the supernatant was filtered through 0.2 µm filters. For selection rounds 2-4, 500 µL; for rounds 5-6, 100 µL; and for the remaining rounds, 5 µL of the supernatant was used to infect the next round.
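The supernatant transfer schedule above can be encoded as a small lookup, which is handy when planning or logging the 10 PANCE rounds; shrinking the transfer volume is a common way to raise selection stringency in later rounds. This Python sketch interprets "round r" as the round being infected (an assumption), and the function name is hypothetical.

```python
def transfer_volume_ul(round_number: int) -> float:
    """Filtered supernatant (uL) used to infect a given round (hypothetical helper).

    Interpretation (assumed): round 1 is seeded with selection phage at MOI 1;
    rounds 2-4 receive 500 uL, rounds 5-6 receive 100 uL, later rounds 5 uL.
    """
    if round_number < 2:
        raise ValueError("round 1 is infected with selection phage at MOI 1")
    if round_number <= 4:
        return 500.0
    if round_number <= 6:
        return 100.0
    return 5.0

# Schedule for the 10 consecutive batch cultivations described in the text.
schedule = {r: transfer_volume_ul(r) for r in range(2, 11)}
```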
- the phage titer after each selection round was determined using the Progen Phage Titration ELISA kit. 3 mL of each culture supernatant was used for phage DNA isolation using the Omega Bio-tek E.Z.N.A.® M13 DNA Mini Kit.
- the resulting PCR products were amplified in a second PCR reaction using a compatible combination of the New England Biolabs NEBNext® Multiplex Oligos for Illumina® (96 Unique Dual Index Primer Pairs).
- the PCR products were purified using the New England Biolabs NEB Monarch® Gel Extraction Kit and quantified via InvitrogenTM QubitTM.
- a 4 nM Pool of the different libraries was generated and 10 % 4 nM PhiX was added.
- the Pool was sequenced using the Illumina MiSeq Reagent Kit v3 (600-cycles) according to the manufacturer’s protocol.
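Pooling at 4 nM with a 10% PhiX spike-in requires converting each Qubit mass concentration into molarity and then diluting. The Python sketch below uses the common approximation of 660 g/mol per dsDNA base pair and the C1V1 = C2V2 relation; the amplicon length and concentrations in the example are hypothetical, and treating "10% PhiX" as a volume fraction of equimolar 4 nM stocks is an assumption.

```python
AVG_G_PER_MOL_PER_BP = 660.0  # approximate average mass of one dsDNA base pair

def dsdna_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a Qubit concentration (ng/uL) to molarity (nM) for a dsDNA amplicon."""
    return conc_ng_per_ul * 1e6 / (AVG_G_PER_MOL_PER_BP * length_bp)

def dilution(stock_nM: float, target_nM: float, final_ul: float) -> tuple[float, float]:
    """Volumes (stock, diluent) in uL to reach target_nM in final_ul (C1V1 = C2V2)."""
    if stock_nM < target_nM:
        raise ValueError("stock is more dilute than the target")
    stock_ul = target_nM * final_ul / stock_nM
    return stock_ul, final_ul - stock_ul

def phix_spike(final_ul: float, phix_fraction: float = 0.10) -> tuple[float, float]:
    """Volumes of 4 nM library pool and 4 nM PhiX so PhiX makes up phix_fraction."""
    phix_ul = final_ul * phix_fraction
    return final_ul - phix_ul, phix_ul

# Hypothetical amplicon: 10 ng/uL, 400 bp, diluted to 4 nM in 20 uL.
stock = dsdna_nM(10.0, 400)
lib_ul, buffer_ul = dilution(stock, 4.0, 20.0)
pool_ul, phix_ul = phix_spike(20.0)
```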
- the number of models in the ensemble (e.g., 4 out of 6, versus 2 out of 6) that agreed on a given prediction (i.e., a specific amino acid substitution) determined the score that was given to a predicted substitution, and a higher score was more likely to yield a positive result (e.g., an evolved protein that retained function/activity).
- the ensemble of protein language models was applied to the TadA-8e sequence, which yielded the following predictions (score in parenthesis): R26G (6), F84L (6), N108D (6), Y149F (6), F156K (6), V106A (5), P152R (5), H8D (5), N157K (5), R111T (4), C146S (3), R111A (2), C146Q (2), M151E (2), C146K (1), Y123H (1), M151Q (1), A48P (1), V155E (1), S109P (1), P152Q (1).
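The vote-based scoring described above (a substitution's score is the number of ensemble members that predicted it, e.g., 6 for R26G) can be sketched in a few lines of Python. The per-model prediction sets and model names below are hypothetical; the sketch only illustrates the counting and ranking, not the language models themselves.

```python
from collections import Counter

def ensemble_scores(predictions: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Score each substitution by how many models predicted it; highest first."""
    votes: Counter = Counter()
    for substitutions in predictions.values():
        votes.update(substitutions)  # each model gives at most one vote per substitution
    # Sort by descending vote count, then alphabetically for a stable order.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical per-model outputs (model names invented for illustration).
example = {
    "model_a": {"R26G", "F84L", "M151E"},
    "model_b": {"R26G", "F84L"},
    "model_c": {"R26G", "A48P"},
}
ranked = ensemble_scores(example)
```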
- For gRNA engineering, a library of 3'-extended sgRNAs, or anchor guide RNAs (agRNAs), was designed and tested to improve the precision of ABEs. agRNA candidates from this library screen were then used as part of a Phage-Assisted Non-Continuous Evolution (PANCE) system to evolve a more precise TadA-8e enzyme. Using a dual selection pressure (favoring precise editing at the target site while penalizing bystander edits), several variants with narrowed editing windows were identified. Notably, the PANCE-evolved V28C variant exhibited enhanced on-target efficiency while reducing bystander editing. Editing patterns across ~12K pathogenic mutations demonstrated that V28C is ~2-3-fold more precise and ~20% more efficient than ABE8e.
- Machine learning was applied to computationally design TadA-8e mutants with improved precision and efficiency.
- the M151E mutation narrowed the editing window while increasing on-target editing.
- Example 7 3'-gRNA extensions restrict the editing window and reduce bystander editing
- the adenosine deamination mechanism involves the TadA-8e domain engaging the exposed single-stranded region of the PAM-distal non-target strand (18) (FIG. 21A).
- the TadA-8e deaminase, when attached to the Cas9 protein, produces specific editing patterns within a narrow DNA region spanning several base pairs. This tethering restricts the enzyme to certain nucleotides, defining what is known as the editing window.
- the present disclosure describes anchor guide RNAs (agRNAs) that stabilized the DNA strand within the active site.
- the agRNAs were designed by adding nucleotides to the 3' end of the gRNA scaffold.
- the agRNAs were designed as a library of short sequences and entire hairpin structures (counter-loops) on the side opposite the editing loop, introducing structures that may sterically restrict the movement of the DNA and the TadA enzyme even further (Fig. 1A, Fig. S1B). This design process yielded an agRNA library containing ~60K candidates.
- a lentiviral vector was constructed with the editing target downstream of the agRNA.
- the tested library was designed to target a site in the human DNMT1 locus, an optimal candidate for screening because of its high accessibility for editing in HEK293T cells and its multiple adenines within the editing window.
- the ABE8e-spCas9-WT base editor in combination with a non-3'-extended guide (sgRNACtrl) showed high editing efficiency for the four adenines in the editing window (A13, A14, A15 and A17).
- Next, a library was designed to edit A13.
- agRNAs (termed “anchors”) that showed higher efficiency and lower bystander editing in the DNMT1 agRNA library were selected (FIG. 21C). Five candidates were tested in the native context in HEK293T cells, and all of them showed decreased bystander editing (FIG. 16C).
- the enzyme TadA-8e was evolved to further decrease bystander edits and to evaluate how the combination of new variants and agRNA could impact the editing pattern.
- Phage-assisted non-continuous evolution has been used in the past to increase the activity of ABEs (19).
- a selection method was designed to evolve variants with decreased bystander edits and that worked with agRNAs. Specifically, a selection pressure was designed that decreases the phage titer in response to bystander edits, but also increases phage titer upon “perfect editing” (e.g., editing the “target” base), to prevent the evolution of an inactive TadA enzyme.
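The selection above, and the NGS abundance analyses that follow, rely on separating reads with "perfect editing" (only the target A converted) from reads carrying bystander edits. A minimal Python sketch of such a read classifier is shown below; it assumes reads are already aligned to the reference target site at equal length, and it counts any read with a non-target A-to-G conversion (with or without the target edit) as bystander, which is an assumed convention.

```python
from collections import Counter

def classify_read(reference: str, read: str, target: int, window: range) -> str:
    """Label an aligned read as 'perfect', 'bystander' or 'unedited'.

    'perfect': the target A is converted to G and no other A in the window is.
    'bystander': any A-to-G conversion at a non-target position (assumed convention).
    """
    if len(read) != len(reference):
        raise ValueError("read and reference must be aligned to equal length")
    converted = {i for i in window if reference[i] == "A" and read[i] == "G"}
    if not converted:
        return "unedited"
    if converted == {target}:
        return "perfect"
    return "bystander"

def outcome_fractions(reference: str, reads: list[str], target: int, window: range) -> dict[str, float]:
    """Fraction of reads in each outcome class."""
    counts = Counter(classify_read(reference, r, target, window) for r in reads)
    return {label: counts[label] / len(reads) for label in ("perfect", "bystander", "unedited")}
```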
- the accessory plasmid encoded the corresponding agRNA targeting the Single Nucleotide Variant (SNV), as well as a C-terminal Intein-nCas9-SpRY.
- Phage DNA was isolated from each PANCE round and sequenced using NGS (FIG. 22B). An increase in phage titer was observed after round 4, which suggested an increased activity of the enzyme mutants encoded by the phage (FIG. 22C).
- the top 50 most enriched mutations were selected, and their editing patterns were individually tested at the human DNMT1 locus in HEK293 cells (FIG. 17C). Variants with both higher efficiency and reduced bystander editing at position A13 were detected when compared with the ABE8e-SpRY base editor (FIG. 17C).
- the PANCE-evolved TadA variants were also benchmarked against the ABE9 base editor (both WT and SpRY), which showed low editing efficiency and no relative reduction in bystander editing at the DNMT1 site (FIG. 17C).
- Variants carrying V28C (SEQ ID NO: 15) and L34W (SEQ ID NO: 16) showed the highest PreS with both sgRNACtrl and agRNA56114 (FIG. 17B and 17C).
- NGS abundance analysis demonstrated that ABE8e rarely achieved perfect editing (FIG. 17D). Unlike ABE8e-WT, the SpRY version did not reduce bystander editing with agRNA56114, but it did increase editing efficiency (FIG. 17D).
- the primary outcome for both V28C (SEQ ID NO: 15) and L34W (SEQ ID NO: 16) was perfect editing, with an average of 9.3% and 10.9%, respectively, when used with the sgRNACtrl (FIG. 17D).
- V28C in combination with agRNA56114 not only reduced bystander editing but also increased perfect editing to 24.4%.
- L34W increased on-target editing to 18% and kept all bystander editing below 5% (FIG. 17D).
- 80.8% (±0.5) of the reads for L34W in combination with agRNA56114 showed perfect editing (Fig. 2E).
- V28C (SEQ ID NO: 15) with agRNA56114 (SEQ ID NO: 2) showed 53.4% (±4.6) of reads with perfect edits and the highest fold-change improvement compared with ABE8e with sgRNACtrl (47.24 ± 1.95) (Fig. 2F).
- Both variants exhibited enrichment across all rounds of PANCE evolution (FIG.17G).
- Cas9-WT base editors with these mutations also showed increased precision (FIG. 22E).
- V28C’s (SEQ ID NO: 15) increased on-target editing was accompanied by DNA Cas9-dependent off-target editing, similar to ABE8e-SpRY, across four sites (FIG.17H).
- an orthogonal R-loop assay was performed to assess Cas9-independent off-target editing, and a substantial decrease was observed for both variants across 5 different sites (FIG. 17I).
- RNA off-target editing was analyzed by RNA-sequencing. The use of the anchor reduced the A-to-I deamination by 3.6-fold (FIG.22F). When combined with the evolved variants, the reduction was even higher with a 14.2 and 22.7-fold for V28C (sgRNACtrl and agRNA56114) and 18.33 and 21.9-fold for L34W (FIG.22F).
- Example 9 Machine Learning-guided design reveals overengineering constraints in ABE8e and unveils novel precise mutations
- NGS abundance analysis of M151E editing showed increased precision at position A15, with the highest fold-change in combination with agRNA56114 (25.7-fold ± 2.4), and a 15.4-fold (± 1.3) increase at A13 (FIG. 18D and 18E).
- the PANCE-derived variant V28C (SEQ ID NO: 15; highest PreS) was “machine-learning evolved” (ML-evolved) using the same approach.
- the ML-derived mutation with the highest PreS was combined with the PANCE-derived V28C mutation to determine if the ML approach could be additive to PANCE.
- the V28C-M151E variant (SEQ ID NO: 18) showed reduced editing efficiency at the DNMT1 site, but precise editing in other contexts such as Site9 (FIG. 18G, FIGs. 23A and 23B).
- the M151E mutation was then cross-referenced with the amino acid exchanges observed during the PANCE.
- when amino acid substitutions at position 151 were evaluated, an enrichment in aspartic acid was observed across the different rounds of evolution (FIG. 18H and 18G). Glutamic acid and aspartic acid are both negatively charged and delivered similar editing patterns at the DNMT1 site (FIG. 18J).
- Example 10 ABE8e-V28C achieves superior precision and efficiency across diverse genomic sites
- V28C variant (SEQ ID NO: 15)
- no significant C-to-N changes were detected at any of the sites when compared with ABE8e (FIG. 19D).
- V28C was compared with ABE8e across 12 different sites in the human genome using HEK293T cells.
- V28C variant refined the deaminase’s editing window, improving precision at every tested site (FIG. 19E-19N).
- V28C produced a constrained 4-A editing window, yielding an average 27.1% increase in on-target efficiency (FIG. 19O and 19P).
- Example 11 V28C drastically improves precise correction of pathogenic variants in iPSCs
- NGS abundance analysis revealed that the V28C mutation significantly enriched single-base deamination across multiple loci. Structural modeling of the wild-type enzyme predicts that H57 and C87 establish three van der Waals interactions and two hydrogen bonds with 8-Az(26) (FIG. 20A). Without being bound by theory, in the V28C mutant, C28 may shift closer to 8-Az(26), reducing the measured distance to below 5.101 Å. In line with this, a 0.77 Å decrease in the distance between residue 28 and 8-Az(26) was detected (FIG. 20A).
- V28C narrows the editing window and minimizes bystander editing without compromising catalytic efficiency.
- PCSK9, a therapeutic target for lowering LDL levels and reducing the risk of coronary heart disease
- gRNA targeting the exon 1-intron 1 splice donor site
- a loss-of-function mutation was introduced (FIG. 20B).
- the SNCA E46K mutation, which causes early-onset Parkinson’s disease by promoting α-synuclein aggregation and neuronal toxicity, was targeted (FIG. 20E).
- the target sequence shares 45% identity with the DNMT1 site used to identify agRNA56114 (FIG. 20F).
- the V28C variant (SEQ ID NO: 15) demonstrated superior precision in reverting the pathogenic mutation, achieving 11.6% (±0.8) perfect edits (% of edited reads) compared with just 0.65% (±0.14) with ABE8e, a 17.6-fold increase in precision (FIG. 20J).
- V28C with agRNA56114 (SEQ ID NO: 2) further improved precision, yielding 17.5% (±1.06) perfect edits, a 26.6-fold enhancement over ABE8e alone (FIG. 20I-20J).
- introducing the PLM-predicted M151E mutation into V28C further increased perfect editing to 47.4% with sgRNACtrl and 53.0% with agRNA56114 (calculated from edited reads).
- LB and 2xYT media were generated using MP Biomedicals™ media capsules according to the manufacturer’s protocol.
- 16 g/L agar was added for standard agar, and 7 g/L agar for soft agar. All media were sterilized by autoclaving. Oligos/primers and plasmids used in the study can be found in Table A. All gRNAs were cloned using KLD (NEB) cloning according to the manufacturer’s protocol.
- agRNA library generation
- the agRNA library consists of an upstream sequence (US) that is the reverse complement of the downstream sequence of the target, a counter-loop and a downstream binding sequence (DS) that binds the upstream sequence of the target.
- the upstream and downstream binding sequences are of different length and have different binding regions in the 1 to 11 bp region upstream and downstream of the target.
- the counter-loop library consists of 33 different DNA sequences, the longer of which form GC-rich hairpins.
- the final library contains every possible combination of UBS, hairpin, and DBS. A script to generate the hairpin library for a novel context can be found in supplementary code 1.
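The combinatorial design above (every UBS × counter-loop × DBS) can be illustrated with a short Python sketch. It is not supplementary code 1, which is not reproduced here; the helper names, the arm lengths, and the choice to pass in the 11-bp flanking regions directly are hypothetical. It assumes the UBS is the reverse complement of a window in the downstream flank and the DBS is the reverse complement of a window in the upstream flank, matching the binding described in the text.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]

def binding_arms(region: str, lengths: range) -> list[str]:
    """Reverse-complement arms of each length tiled across a flanking region."""
    arms = []
    for k in lengths:
        for start in range(len(region) - k + 1):
            arms.append(reverse_complement(region[start:start + k]))
    return list(dict.fromkeys(arms))  # drop duplicates, keep order

def agrna_extensions(downstream_region: str, upstream_region: str,
                     counterloops: list[str], lengths: range) -> list[str]:
    """Every UBS + counter-loop + DBS combination (5'->3' order of the 3' extension)."""
    ubs_arms = binding_arms(downstream_region, lengths)  # UBS pairs downstream of the target
    dbs_arms = binding_arms(upstream_region, lengths)    # DBS pairs upstream of the target
    return [u + loop + d for u, loop, d in product(ubs_arms, counterloops, dbs_arms)]

# Toy inputs (hypothetical): 4-bp flanks, two counter-loops, 2-bp arms.
library = agrna_extensions("ACGT", "TTGA", ["GGGG", "CCCC"], range(2, 3))
```

With real inputs the library size is simply the product of the number of UBS arms, counter-loops (33 in the text) and DBS arms.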
- the agRNA for DNMT1 was ordered as Agilent DNA Oligo Pool (64610 oligos).
- the oligos for the DNMT1 library contained a Gibson overhang, gRNA, gRNA scaffold, agRNA library and a terminator followed by a short DNA sequence used as primer binding site.
- the target for the DNMT1 library was already cloned on the plasmid used as backbone.
- the DNA Oligo Pool library was amplified with the oligos Lib_F and Lib_R via PCR.
- the backbone pU6-tevopreq1-GG-acceptor (Addgene #174038) was PCR amplified using the oligos SplitF and SplitR.
- the PCR product of the backbone was DpnI-digested overnight at 37 °C, and both PCR products were purified using the New England Biolabs Monarch PCR & DNA Cleanup Kit according to the manufacturer’s protocol.
- the fragments were assembled using the New England Biolabs Gibson Assembly® Master Mix at a 10:1 library-to-backbone ratio with 150 ng of backbone DNA according to the manufacturer’s protocol.
- 2 µL of the Gibson assembly mix was directly transformed into Lucigen's Endura Competent Cells and, after recovery in 1 mL SOC media, plated on carbenicillin/agar plates poured in Nunc™ Square BioAssay Dishes (Cole-Parmer #EW-01929-00).
- the library was amplified from the pU6-tevopreq1-GG-acceptor.
- the backbone was digested using PspXI and Esp3I and cloned by Gibson assembly following the previously described protocol.
- HEK293T were purchased from ATCC and maintained in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies) and kept at 37 °C and 5% CO2. Cells were seeded at a density of 5 × 10⁶ cells per 10 cm plate in DMEM supplemented with 10% FBS and antibiotics (Thermo #15240062). The following day, cells were transfected using Transporter 5® Transfection Reagent with a plasmid mix containing pVSV-G (3.86 µg), pPax2 (8.57 µg), and the lentiviral transfer vector (9.23 µg) in Opti-MEM (Thermo Fisher).
- the DNA-transporter complexes were incubated at room temperature for 20 minutes before being added to the culture media. After 24 hours, the media was replaced with fresh DMEM, and viral supernatants were collected at 48- and 72-hours post-transfection.
- the harvested media was filtered through a 0.45 µm vacuum filter system and concentrated using Lenti-X Concentrator (Takara Bio) at a 1:3 ratio (concentrator:media) by incubation at 4 °C for at least 30 minutes, followed by centrifugation at 1,500 × g for 45 minutes at 4 °C.
- the viral pellet was resuspended in phosphate-buffered saline (PBS), aliquoted, and stored at -80°C until further use.
- HEK293T cells were transduced with the lentiviral library at an MOI of 0.2. After 24-48 h, the media was replaced with fresh media containing 2 µg/mL puromycin, and selection continued for ~14 days. To test the editing pattern across the library, 20 million cells (300× coverage) were seeded in a 225 mm³ dish and transfected the following day using Lipofectamine 3000 (Thermo Fisher) with 20 µg of the base editor pCMV-T7-ABE8e-nSpCas9-P2A-EGFP (KAC978) (Addgene #185910). Genomic DNA was collected from cells 5 days after transfection.
- Score = (% Perfect Edit) / (% Perfect Edit + % Bystander)²
- Anchors achieving the highest scores and demonstrating at least 20% overall editing efficiency were further characterized experimentally.
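The anchor ranking above can be sketched in Python as follows. The score implements the formula as reconstructed from the extracted text, with the squared sum of perfect and bystander percentages in the denominator; that reconstruction is an assumption. The sketch also assumes "overall editing efficiency" means % perfect + % bystander reads, and the anchor names and numbers are hypothetical.

```python
def anchor_score(perfect_pct: float, bystander_pct: float) -> float:
    """Score = %perfect / (%perfect + %bystander)**2 (reconstructed formula, assumed)."""
    total = perfect_pct + bystander_pct
    return perfect_pct / total ** 2 if total > 0 else 0.0

def shortlist(anchors: dict[str, tuple[float, float]], min_efficiency: float = 20.0) -> list[str]:
    """Rank anchors by score, keeping only those with >= min_efficiency % overall editing."""
    eligible = {
        name: anchor_score(perfect, bystander)
        for name, (perfect, bystander) in anchors.items()
        if perfect + bystander >= min_efficiency  # assumption: overall = perfect + bystander
    }
    return sorted(eligible, key=eligible.get, reverse=True)

# Hypothetical anchors: (% perfect edit, % bystander)
candidates = {"anchor_1": (20.0, 5.0), "anchor_2": (10.0, 2.0), "anchor_3": (30.0, 30.0)}
ranked_anchors = shortlist(candidates)
```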
- the backbone sgBbsI (p2Tol-U6-2xBbsI-sgRNA-HygR) (Addgene #71485) was PCR amplified using the oligos BB_R and BB_F.
- the PCR product of the backbone was DpnI-digested overnight at 37 °C, and both PCR products were purified using the New England Biolabs Monarch PCR & DNA Cleanup Kit according to the manufacturer’s protocol.
- the fragments were assembled using the New England Biolabs Gibson Assembly® Master Mix at a 10:1 library-to-backbone ratio with 150 ng of backbone DNA according to the manufacturer’s protocol.
- the library was then digested with BsaI according to the manufacturer’s protocol and gel purified using the New England Biolabs Monarch Gel Extraction Kit according to the manufacturer’s protocol.
- the gRNA scaffold, hairpin and terminator, flanked up- and downstream by inward-facing BsaI sites, were ordered as a cloned gene synthesis from IDT.
- the plasmid was also BsaI-digested, and the fragment was purified using the New England Biolabs Monarch Gel Extraction Kit according to the manufacturer’s protocol.
- the insert and library were ligated using New England Biolabs T4 DNA Ligase according to the manufacturer’s protocol with 150 ng of backbone and a 10:1 insert-to-backbone ratio.
- HEK293T were purchased from ATCC and maintained in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies) and kept at 37°C and 5% CO2.
- Tol2 transposon-mediated library integration
- 5 million cells (~400× coverage) were seeded in 175 mm³ dishes. The following day, cells were co-transfected using Lipofectamine 3000 (Thermo Fisher) with 10 µg of Tol2 transposase plasmid (pCMV-Tol2, Addgene #31823) and 10 µg of the Path_Var library. To generate stable library cell lines, cells were selected with hygromycin (25 µg/mL) starting the day after transfection and continuing for at least 2-3 weeks. Subsequently, 2.5 million cells (~200× coverage) were seeded in a 100 mm³ dish and transfected the following day with 10 µg of base editor using Lipofectamine 3000. Genomic DNA was collected from cells 5 days after transfection.
- the resulting PCR products were amplified in a second PCR reaction using a compatible combination of the New England Biolabs NEBNext® Multiplex Oligos for Illumina® (96 Unique Dual Index Primer Pairs).
- the PCR products were purified using the New England Biolabs NEB Monarch® Gel Extraction Kit and quantified via InvitrogenTM QubitTM.
- a 4 nM Pool of the different libraries was generated and 10 % 4 nM PhiX was added.
- the Pool was sequenced using the Illumina MiSeq Reagent Kit v2 (300-cycles) according to the manufacturer’s protocol.
- the evaluation of the context library involved analyzing gRNA libraries, which comprised approximately 12,000 spacer sequences and their respective contextual sequences.
- the efficiency and editing profiles for each gRNA were established using custom scripts developed in R. First, the target sites — where each spacer binds within the context — were extracted from the NGS reads. Subsequently, for each spacer in the library, all combinations of adenine to guanine conversions were aligned against these extracted sequences. Spacers with fewer than 25 total reads were excluded from the analysis. To quantify overall editing efficiency for the different base editors, the mean A to G conversion rate was calculated by averaging the editing frequencies at each targeted position.
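The per-spacer quantification described above (discard spacers with fewer than 25 reads, then average the A-to-G conversion frequency over targeted positions) can be sketched as follows. The original analysis used custom R scripts; this Python version is an illustration only. It assumes the target sites have already been extracted and aligned to equal length, and it treats every A in the extracted reference site as a targeted position, which is a simplifying assumption.

```python
from statistics import mean

MIN_READS = 25  # spacers with fewer total reads are excluded (threshold from the text)

def a_to_g_by_position(reference: str, reads: list[str]) -> dict[int, float]:
    """Fraction of reads with G at each position where the reference has A."""
    return {
        i: sum(read[i] == "G" for read in reads) / len(reads)
        for i, base in enumerate(reference)
        if base == "A"
    }

def mean_a_to_g(library: dict[str, tuple[str, list[str]]]) -> dict[str, float]:
    """Mean A-to-G conversion per spacer; maps spacer -> (reference site, extracted reads)."""
    summary = {}
    for spacer, (reference, reads) in library.items():
        if len(reads) < MIN_READS:
            continue  # too few reads for a reliable estimate
        per_position = a_to_g_by_position(reference, reads)
        if per_position:
            summary[spacer] = mean(per_position.values())
    return summary
```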
- HEK293T, HeLa and HepG2 were purchased from ATCC and maintained in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies) and kept at 37 °C and 5% CO2. 20,000 cells were seeded in 96-well plates (Corning) and transfected the following day using jetOPTIMUS® (Polyplus) according to the manufacturer’s instructions. 50 ng of sgRNA or agRNA (both cloned in BPK1520 (Plasmid #65777)) was co-transfected with 150 ng of base editor, and cells were harvested after three days for Sanger sequencing (Genewiz) or HTS (Quintara Biosciences or in-house Illumina MiSeq). HTS data were analyzed using CRISPResso and BE-analyzer (CRISPR RGEN tools) (26).
- the PANCE selection phages carry the CDS for the ABE8e adenine deaminase instead of the pIII CDS.
- the ABE8e adenine deaminase CDS is followed by part of the peptide linker sequence and a C-terminally fused intein CDS, so that the phage encodes only the relatively small deaminase rather than the whole base editor.
- the phages were generated by PCR amplifying the ABE8e adenine deaminase including the partial sequence of the peptide linker using the oligonucleotides ABE_M13_F and ABE_M13_R.
- the N-terminal Npu DnaE intein was ordered as a gBlock and amplified using the oligonucleotides Npu_ABE_F and Npu_M13_R.
- the phage backbone was amplified using the oligonucleotides GOI_M13_F and GOI_M13_R with wild-type M13 phage genomic DNA as the template. All PCRs were performed using NEB Q5® High-Fidelity 2X Master Mix according to the manufacturer's protocol, with 1 ng of template DNA and an annealing temperature of 60 °C.
- the cells were immediately mixed with 3 mL soft LB agar (0.7%) and plated on LB bottom-agar plates containing 100 µg/mL carbenicillin. The plates were incubated at 37 °C overnight. Plaques were picked into 50 µL 2xYT medium, and 1 µL was used as the template for colony PCR with the oligonucleotides ABE_M13_F and Npu_M13_R. Positive phages were amplified by adding the remaining 2xYT medium to a freshly grown S2060 pJC175e culture at an OD600 of 0.4 and cultivating for 16 h at 37 °C. The cultures were spun down to remove the E. coli cells, and the phages were precipitated by adding a solution of 20% polyethylene glycol (8000) and 2.5 M sodium chloride to the culture supernatant in a 1:4 ratio.
- the mixture was incubated for at least 3 h at 4 °C, and the phage pellet was resuspended in PBS. The phage titer was quantified using the Progen Phage Titration ELISA kit, and the phages were stored at 4 °C until use.
- 3 mL of the culture supernatant were used for phage DNA isolation using the Omega Bio-tek E.Z.N.A.® M13 DNA Mini Kit. The isolated DNA was sent to Plasmidsaurus for whole phage DNA sequencing.
- the selection plasmids were designed on the basis of pJC175e (Addgene #79219) (27) by adding mutations chosen so that, when edited by the ABE, only perfect edits restore pIII activity, whereas bystander edits lower pIII activity.
- the pJC175e backbone was amplified using the oligonucleotides pIII_gBlock_R and pJC175e_Cas_F.
- the part of the pIII CDS containing the mutation, followed downstream by the corresponding guide that corrects the introduced mutation, as well as the C-terminal DnaE intein needed to fuse the phage-encoded ABE8e adenine deaminase to the Cas9 encoded by the selection plasmid, were ordered as a gBlock.
- Each selection plasmid also encodes the agRNA that fixes its pIII mutation (SP1 (F366L); SP2 (K360R); SP3 (I411V)).
- the three different gBlocks for the three different selection plasmids each encoding a different pill mutation were amplified via PCR using the oligonucleotides gBlock_R and gBlock_pIII_F.
- the base Cas9 CDS was amplified from ABE8e plasmid (Addgene #138489) using the oligonucleotides BE_Npu_F and BE_pJC175e_R.
- PCRs were performed using NEB Q5® High-Fidelity 2X Master Mix according to the manufacturer's protocol, with 1 ng of template DNA and an annealing temperature of 60 °C. All fragments were digested with DpnI (NEB) overnight at 37 °C in the PCR buffer and purified the next day using the NEB Monarch® PCR & DNA Cleanup Kit. The fragments were assembled using the NEB Gibson Assembly® Master Mix according to the manufacturer's protocol and transformed into electrocompetent S2060 cells. The cells were recovered in 500 µL SOC medium for 1 h, then plated on LB agar plates with 100 µg/mL carbenicillin and incubated overnight at 37 °C.
- Colonies were screened by colony PCR, and positive clones were sent for whole-plasmid sequencing. Sequence-verified clones were used to prepare electrocompetent cells, which were then transformed with the mutagenesis plasmid MP4 (Addgene #69652) (28). The cells were recovered in 1 mL SOC medium and plated on 2xYT agar plates containing 1% glucose, 100 µg/mL carbenicillin and 25 µg/mL chloramphenicol. Five colonies were used to start 50 mL shake-flask cultivations in 2xYT with 1% glucose, 100 µg/mL carbenicillin and 25 µg/mL chloramphenicol.
- the cultivations were used to freeze 20% glycerol stocks in 1 mL aliquots after 16 h. Each culture was also used to isolate plasmid DNA for whole-plasmid sequencing by Plasmidsaurus, in order to select the glycerol stocks with no mutations in MP4 or the selection plasmid.
PANCE
- the evolution was performed as 10 consecutive batch cultivations in triplicates using a mix of three different selection plasmids in each evolution.
- the day before the cultivation, three 3 mL overnight cultures were prepared in 2xYT medium with 1% glucose, 100 µg/mL carbenicillin and 25 µg/mL chloramphenicol. Each culture was inoculated with the glycerol stock of one of the selection plasmids.
- 3-4 h before phage infection, 50 mL shake flasks are inoculated with the pooled overnight cultures of the different selection plasmids to a combined OD600 of 0.1.
- the cells are cultivated in 2xYT with 100 µg/mL carbenicillin and 25 µg/mL chloramphenicol. 30 min before reaching an OD600 of 0.4, the cells are induced with 0.5% arabinose, and when they reach an OD600 of 0.4 they are infected with the selection phages at an MOI of 1 for the first selection round.
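The inoculation and infection steps above involve two small calculations. The sketch below assumes that each selection-plasmid strain contributes an equal share of the combined OD600 of 0.1, and it converts OD600 to cell number with a hypothetical factor of 8 × 10^8 cells/mL per OD unit; that factor is strain- and instrument-dependent and is not given in the source.

```python
# Hypothetical OD600-to-cell conversion; calibrate for your strain and spectrophotometer.
CELLS_PER_ML_PER_OD600 = 8e8


def pooled_inoculum_ml(overnight_ods: list[float], target_od: float = 0.1,
                       flask_ml: float = 50.0) -> list[float]:
    """mL of each overnight culture so the pooled flask starts at target_od.

    Assumes each strain contributes an equal share of the combined OD600.
    """
    share = target_od / len(overnight_ods)
    return [share * flask_ml / od for od in overnight_ods]


def phage_volume_ul(culture_od: float, culture_ml: float,
                    phage_titer_per_ml: float, moi: float = 1.0) -> float:
    """µL of phage stock needed to infect the culture at the given MOI (phages per cell)."""
    cells = culture_od * CELLS_PER_ML_PER_OD600 * culture_ml
    return moi * cells / phage_titer_per_ml * 1000.0
```

Under this assumed conversion, a 50 mL culture at an OD600 of 0.4 and a stock of 10^11 phages/mL would need about 160 µL of phage for an MOI of 1.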
- the evolution is performed for 12 h at 37 °C, after which the entire culture is spun down and the supernatant is filtered through 0.2 µm filters. For selection rounds 2-4, 500 µL, for rounds 5-6, 100 µL, and for the remaining rounds, 5 µL of the supernatant was used to infect the following evolution round.
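The transfer schedule above can be written as a simple lookup. This sketch reads "for selection rounds 2-4, 500 µL" as the supernatant volume used to seed rounds 2-4 and assumes the 10 rounds stated earlier; stepping the volume down presumably raises selection stringency, since fewer phages are carried into each new round.

```python
from typing import Optional

TOTAL_ROUNDS = 10  # the evolution is run as 10 consecutive batch cultivations


def transfer_volume_ul(round_number: int) -> Optional[int]:
    """Filtered supernatant (µL) used to seed a given PANCE round; None for round 1."""
    if not 1 <= round_number <= TOTAL_ROUNDS:
        raise ValueError(f"round must be between 1 and {TOTAL_ROUNDS}")
    if round_number == 1:
        return None  # round 1 is infected with selection phage at an MOI of 1 instead
    if round_number <= 4:
        return 500
    if round_number <= 6:
        return 100
    return 5
```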
- V28C, L34W and V28C&M151E correspond to the amino acid sequences set forth in SEQ ID NOs: 15, 16, and 18, respectively.
- the resulting PCR products were amplified in a second PCR reaction using a compatible combination of the New England Biolabs NEBNext® Multiplex Oligos for Illumina® (96 Unique Dual Index Primer Pairs).
- the PCR products were purified using the New England Biolabs NEB Monarch® Gel Extraction Kit and quantified with an Invitrogen™ Qubit™.
- a 4 nM pool of the different libraries was generated, and 10% of 4 nM PhiX was added.
- the pool was sequenced using the Illumina MiSeq Reagent Kit v3 (600-cycle) according to the manufacturer's protocol.
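Building the 4 nM pool from Qubit readings requires converting mass concentration to molarity. A minimal sketch, assuming the common approximation of 660 g/mol per base pair of double-stranded DNA and interpreting "10% PhiX" as PhiX making up 10% of the final mix by volume (equivalent to 10% of molecules when both are at 4 nM); the concentrations, lengths, and volumes in the test are hypothetical.

```python
BP_MOLAR_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def ng_per_ul_to_nm(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/µL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (BP_MOLAR_MASS * length_bp)


def dilute_to(conc_ng_per_ul: float, length_bp: int, target_nm: float = 4.0,
              final_ul: float = 20.0) -> tuple[float, float]:
    """(library µL, diluent µL) that bring one library to target_nm in final_ul."""
    stock_nm = ng_per_ul_to_nm(conc_ng_per_ul, length_bp)
    if stock_nm < target_nm:
        raise ValueError("library is below the target concentration")
    library_ul = target_nm * final_ul / stock_nm
    return library_ul, final_ul - library_ul


def phix_volume_ul(pool_ul: float, phix_fraction: float = 0.10) -> float:
    """µL of 4 nM PhiX so that PhiX makes up phix_fraction of the final 4 nM mix."""
    return pool_ul * phix_fraction / (1.0 - phix_fraction)
```

Libraries that have each been diluted to 4 nM can then be combined in equal volumes to give an equimolar 4 nM pool before the PhiX spike-in.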
- Variant testing at the DNMT1 site followed the same conditions described in the "Genome editing of genomic loci" section.
- the ChimeraX command line was used to visualize, mutate, and analyze interactions of target residues. Interactions of nucleotides 5-8 in the gRNA with residues of ABE8e were analyzed. Hydrogen bonds and non-polar (van der Waals) interactions were sought between carbon atoms in the gRNA and protein at a maximum distance of 3.8 Å, and cation-π interactions were sought between nitrogen atoms within 5 Å of an aromatic carbon, involving our target residues and nucleotides.
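The distance criteria above can be illustrated outside ChimeraX. The plain-Python sketch below applies only the two cutoffs stated in the text (3.8 Å for carbon-carbon contacts, 5 Å between a nitrogen and an aromatic carbon) to atoms the caller has already extracted; it is not the ChimeraX API, and hydrogen-bond geometry is left to dedicated tools such as ChimeraX's own `hbonds` command. The `Atom` type is a hypothetical container.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    name: str                          # atom name, e.g. "C2" or "NZ"
    element: str                       # "C", "N", "O", ...
    coord: tuple[float, float, float]  # Cartesian coordinates in Å
    aromatic: bool = False             # caller marks aromatic ring carbons


CONTACT_CUTOFF = 3.8    # Å, carbon-carbon contacts (from the text)
CATION_PI_CUTOFF = 5.0  # Å, nitrogen to aromatic carbon (from the text)


def _is_cation_pi_pair(a: Atom, b: Atom) -> bool:
    """True if one atom is a nitrogen and the other an aromatic carbon."""
    return (a.element == "N" and b.element == "C" and b.aromatic) or (
        b.element == "N" and a.element == "C" and a.aromatic
    )


def classify_pairs(rna_atoms: list[Atom], protein_atoms: list[Atom]) -> list[tuple[str, str, str, float]]:
    """List (kind, rna_atom, protein_atom, distance) for pairs meeting either cutoff."""
    hits = []
    for a in rna_atoms:
        for b in protein_atoms:
            d = math.dist(a.coord, b.coord)
            if a.element == "C" and b.element == "C" and d <= CONTACT_CUTOFF:
                hits.append(("contact", a.name, b.name, round(d, 2)))
            elif _is_cation_pi_pair(a, b) and d <= CATION_PI_CUTOFF:
                hits.append(("cation-pi", a.name, b.name, round(d, 2)))
    return hits
```

Many cation-π analyses measure to the aromatic ring centroid rather than to individual carbons; the per-atom criterion here simply mirrors the wording of the text.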
- KOLF2.1J SNCA E46K-/- cells were purchased from The Jackson Laboratory and maintained in StemFlex medium (Life Technologies #A3349401) supplemented with 10% FBS (Life Technologies) at 37 °C and 5% CO2. Cells were cultured on plates coated with 1 mg/mL Synthemax stock solution (Synthemax II-SC, Cat# 3535, Corning) according to the manufacturer's instructions.
- Nucleofection was performed using the Neon electroporation system (Thermo Fisher) 10 µL kit. 200,000 cells were resuspended in ~10 µL of buffer R and mixed with 200 ng of gRNA vector and 200 µg of base editor. Nucleofection parameters were: voltage, 1400 V; width, 20 ms; 2 pulses. After nucleofection, cells were plated in 12-well plates with 400 µL of StemFlex without antibiotics and a 1:100 dilution of RevitaCell™ Supplement (Cat# A26445-01, Gibco Life Technologies). After 24 h, the medium was replaced, and editing was evaluated after 48 h by NGS (Quintara Biosciences).
- the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims is introduced into another claim.
- any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim.
- where elements are presented as lists, e.g., in Markush group format, each subgroup of the elements is also disclosed, and any element(s) can be removed from the group.
- where the invention, or aspects of the invention, is/are referred to as comprising particular elements and/or features, certain embodiments of the invention or aspects of the invention consist of, or consist essentially of, such elements and/or features. For purposes of simplicity, those embodiments have not been specifically set forth in haec verba herein.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Chemical & Material Sciences (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Wood Science & Technology (AREA)
- Organic Chemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Zoology (AREA)
- Biotechnology (AREA)
- General Engineering & Computer Science (AREA)
- Biochemistry (AREA)
- Microbiology (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Plant Pathology (AREA)
- Medicinal Chemistry (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
Abstract
Provided herein are compositions and methods for improving the editing efficiency and/or reducing the bystander editing of nucleobase editors, such as an adenine base editor (ABE). Some aspects of this disclosure describe sequence-specific anchor guide RNAs (agRNAs) with extended 3'-nucleic acid extensions that improve the editing efficiency and/or reduce bystander editing of a nucleobase editor in a context-dependent manner. Some aspects of the disclosure provide evolved nucleobase editors with narrowed editing windows and increased activity.
Description
END-MODIFIED GRNAS FOR IMPROVED BASE EDITING
RELATED APPLICATIONS
[0001] This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application, U.S.S.N. 63/648,151, filed May 15, 2024, which is herein incorporated by reference in its entirety.
GOVERNMENT SUPPORT
[0002] This invention was made with government support under DE-FG02-02ER63445 awarded by the Department of Energy (DOE). The government has certain rights in this invention.
REFERENCE TO AN ELECTRONIC SEQUENCE LISTING
[0003] The content of the electronic sequence listing (H082470440WO00-SEQ-GJM.xml; Size: 123,568 bytes; and Date of Creation: May 13, 2025) is herein incorporated by reference in its entirety.
BACKGROUND
[0004] Base editing has expanded the genome editing toolkit by offering high editing efficiencies, both in vivo and in vitro, without inducing double-strand breaks. Adenine base editors (ABEs) catalyze the deamination of adenosine residues in a sequence-dependent manner and the conversion of A•T-to-G•C base pairs, while cytidine base editors (CBEs) catalyze the deamination of cytidine residues in a sequence-dependent manner and the conversion of C•G-to-T•A base pairs. When combined with PAMless Cas enzymes, such as SpRY (Walton, Chen), these enzymes have the capacity to induce all four transition mutations and to correct the majority of identified human pathogenic single nucleotide polymorphisms (SNPs) (Sternberg, Nishimasu, Epinat). However, the promiscuous activity of the fused deaminases allows conversion not only of the target nucleotide but also of non-target bases within the editing window. This bystander effect can result in missense or nonsense mutations, potentially limiting the use of base editing in clinical applications.
[0005] While engineering nucleobase editors to have narrower editing windows and less bystander editing may seem a straightforward solution, previous efforts have shown that precision is often achieved at the expense of editing efficiency, which limits the target sites that can be edited. Because nucleobase editor ("NBE") efficiency and editing patterns are influenced by complex interactions among nucleobase editors, gRNAs, and target sequences (Arbab 2020), modifications to nucleobase editors and/or their components that increase editing efficiency and/or specificity would significantly advance the art.
SUMMARY
[0006] The present disclosure provides modified guide RNAs (gRNAs) comprising 3'-nucleic acid extensions (referred to herein as anchor guide RNAs (agRNAs)), wherein the agRNAs have improved properties, including, but not limited to, improved editing efficiency and/or reduced bystander editing in a context-dependent manner when used in conjunction with a nucleobase editor, such as a fusion protein comprising a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp). Without being bound by theory, these agRNAs improve editing efficiency and/or reduce bystander editing by a nucleobase editor by stabilizing the target nucleic acid sequence (e.g., genomic DNA) within the active site of the nucleobase editor, where stabilizing means restricting the movement of the DNA within the active site of the nucleobase editor so as to produce a smaller editing window and/or deamination of fewer bases. The present disclosure further provides methods for engineering and/or evolving nucleobase editors for use in conjunction with a given agRNA.
[0007] In various aspects, the present disclosure provides compositions, methods, uses, and kits for base editing comprising an agRNA and an optionally engineered and/or evolved nucleobase editor disclosed herein. In some embodiments, the nucleobase editor is an engineered and/or evolved nucleobase editor.
[0008] In all aspects, the agRNA comprises a gRNA and a 3'-nucleic acid extension (FIG. 1B and FIG. 5A). The gRNA comprises a spacer sequence and a scaffold sequence. In some embodiments, the 3'-nucleic acid extension is attached to the 3'-end of the gRNA. In some embodiments, the 3'-nucleic acid extension is attached to the 3'-end of the gRNA by a nucleotide linker. In some embodiments, the nucleotide linker ranges from 1-50 nucleotides in length.
[0009] In various embodiments, the agRNA is capable of binding to a napDNAbp via the scaffold sequence of the gRNA and directing the napDNAbp to a target nucleic acid sequence (e.g., genomic DNA). The target nucleic acid sequence comprises (i) a target strand and (ii) a complementary non-target strand. The target strand comprises a protospacer sequence that binds to the spacer sequence of the gRNA, forming an RNA-DNA hybrid. The non-target strand comprises the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor). The non-target strand binds to the 3'-nucleic acid extension of the agRNA. In some embodiments, the 3'-nucleic acid extension binds to the non-target strand using (a) an upstream binding sequence (UBS) that is complementary to the non-target strand and binds downstream of the target nucleic acid on the non-target strand, and/or (b) a downstream binding sequence (DBS) that is complementary to the non-target strand and binds upstream of the target nucleic acid on the non-target strand (FIG. 1B and FIG. 5A).
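Because the UBS and DBS are complementary to the non-target strand, their sequences follow from the flanks of the target nucleobase. The sketch below assumes the non-target strand is given 5' to 3' as DNA, that binding sequences are written 5' to 3' as RNA pairing antiparallel with it, and that each flank begins immediately next to the target base; the function names and the flank placement are illustrative assumptions, not the disclosed design rules.

```python
# DNA base -> complementary RNA base
RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def rna_reverse_complement(dna: str) -> str:
    """RNA sequence (5' to 3') that pairs antiparallel with `dna` (5' to 3')."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def binding_sequences(non_target: str, target_index: int,
                      ubs_len: int, dbs_len: int) -> tuple[str, str]:
    """Return (UBS, DBS) for the base at 0-based `target_index` of the non-target strand.

    UBS pairs with the ubs_len nucleotides just downstream (3') of the target base;
    DBS pairs with the dbs_len nucleotides just upstream (5') of it.
    """
    downstream = non_target[target_index + 1: target_index + 1 + ubs_len]
    upstream = non_target[max(0, target_index - dbs_len): target_index]
    if len(downstream) != ubs_len or len(upstream) != dbs_len:
        raise ValueError("requested length runs past the supplied non-target strand")
    return rna_reverse_complement(downstream), rna_reverse_complement(upstream)
```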
[0010] In one aspect, an agRNA for nucleobase editing comprises a 3'-nucleic acid extension comprising nucleic acids encoding the upstream binding sequence (UBS). In some embodiments, the UBS is complementary to the non-target strand and binds downstream of the target nucleic acid on the non-target strand.
[0011] In another aspect, an agRNA for nucleobase editing comprises a 3'-nucleic acid extension comprising nucleic acids encoding the downstream binding sequence (DBS). In some embodiments, the DBS is complementary to the non-target strand and binds upstream of the target nucleic acid on the non-target strand (FIG. 1B and FIG. 5A).
[0012] In some embodiments, the UBS is about or more than about 0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or more nucleotides in length. In some embodiments, the DBS is about or more than about 0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or more nucleotides in length.
[0013] In some embodiments, the UBS and/or the DBS are at least 85% homologous, at least 90% homologous, at least 95% homologous, at least 97% homologous, at least 99% homologous, at least 99.7% homologous, or 100% homologous to the non-target strand. In some embodiments, the UBS and/or DBS comprises at least 1, at least 2, at least 3, at least 4, or at least 5 mismatches.
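For a binding sequence, the degree of match to the non-target strand is usually easiest to express as mismatches in the antiparallel RNA:DNA duplex. This sketch counts Watson-Crick pairs only, so wobble pairs such as rG:dT are scored as mismatches; that scoring rule, and reading "homologous" as percent complementarity, are assumptions.

```python
# (RNA base, DNA base) pairs counted as complementary; wobble pairs count as mismatches here.
WATSON_CRICK = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def count_mismatches(binding_rna: str, dna_region: str) -> int:
    """Mismatches when binding_rna (5' to 3') pairs antiparallel with dna_region (5' to 3')."""
    if len(binding_rna) != len(dna_region):
        raise ValueError("binding sequence and DNA region must have the same length")
    pairs = zip(binding_rna.upper(), reversed(dna_region.upper()))
    return sum((rna, dna) not in WATSON_CRICK for rna, dna in pairs)


def percent_complementary(binding_rna: str, dna_region: str) -> float:
    """Percentage of positions forming Watson-Crick pairs."""
    if not binding_rna:
        raise ValueError("empty binding sequence")
    n = len(binding_rna)
    return 100.0 * (n - count_mismatches(binding_rna, dna_region)) / n
```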
[0014] In some embodiments, the 3'-nucleic acid extension further comprises a counterloop sequence (CLS). In some embodiments, the CLS is about or more than about 0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or more nucleotides in length. In some embodiments, the CLS forms a secondary structural feature. In some embodiments, the CLS is a hairpin. In some embodiments, the CLS is flanked by the UBS and the DBS.
[0015] In certain embodiments, the 3'-nucleic acid extension further comprises a secondary structural element. In certain embodiments, the secondary structural element is a tevopreQ1 motif (FIG. 1F).
[0016] A person skilled in the art would appreciate that the location of the editing window can change when a different Cas domain is used, or when the deaminase domain changes. For example, SaCas9 typically supports a broader editing window (protospacer positions ~3-12 for CBEs and ~4-12 for ABEs) than SpCas9 (protospacer positions ~4-8 for CBEs and ~4-7 for ABEs). A person skilled in the art would also appreciate that a broader editing window increases the frequency of bystander editing by a nucleobase editor. Without being bound by theory, this issue arises because the non-target strand (e.g., the genomic DNA strand not bound by the spacer of the gRNA) has a certain degree of freedom to move within the active site of the nucleobase editor. To overcome this, the present inventors have discovered that agRNAs may be modified in one or more ways to restrict the movement of the non-target strand (e.g., the genomic DNA strand not bound by the spacer of the gRNA) within the active site of the nucleobase editor, thus resulting in a smaller editing window, minimizing the bystander effect, and/or improving the editing efficiency of the target nucleic acid by the nucleobase editor. In certain embodiments, the agRNA is modified to include a 3'-nucleic acid extension comprising, but not limited to, a UBS; a UBS and a CLS; a CLS; a CLS and a DBS; a UBS and a DBS; or a UBS, a CLS, and a DBS. In certain embodiments, the 3'-nucleic acid extension stabilizes the non-target strand (e.g., the genomic DNA strand not bound by the spacer of the gRNA) comprising the target nucleic acid (e.g., the nucleobase to be edited by a nucleobase editor) within the active site of the nucleobase editor (FIG. 1B). Thus, in certain embodiments, the agRNA improves the editing efficiency of a target nucleic acid by a nucleobase editor relative to the editing efficiency of the target nucleic acid by the nucleobase editor using a gRNA lacking the 3'-nucleic acid extension of the agRNA.
In certain embodiments, the agRNA reduces bystander editing of bystander nucleic acids within an editing window of a target nucleic acid for a nucleobase editor relative to the bystander editing within the editing window of the target nucleic acid by the nucleobase editor using a gRNA lacking the 3'-nucleic acid extension of the agRNA. In certain embodiments, the target nucleic acid sequence comprises a target nucleic acid (also referred to as “the target nucleobase”), wherein the target nucleic acid falls within a double-stranded DNA molecule such as a gene, a transcriptional regulatory region, an intron splice site, an exonic splicing enhancer site, or a nucleosome binding site. In certain embodiments, the target nucleic acid falls within a gene, a gene that is associated with a disease or disorder, or a gene that is associated with a disease or disorder caused by a pathogenic single nucleotide polymorphism (SNP). SNPs are the most common genetic variations associated with various complex human diseases and disorders, including inflammatory disorders, autoimmune disorders, and cancers. Treatment of any disease or disorder caused by SNPs is contemplated by the methods and uses described herein.
[0017] The present disclosure contemplates the use of an agRNA described herein for multiplex editing with a nucleobase editor described herein or elsewhere. In certain embodiments, the target nucleic acid sequence comprises one or more target nucleic acids (also referred to as “the target nucleobase”), wherein the one or more target nucleic acids fall within a double-stranded DNA molecule such as a gene, a transcriptional regulatory region, an intron splice site, an exonic splicing enhancer site, or a nucleosome binding site. In certain embodiments, the target nucleic acid falls within a gene (e.g., DNMT1), a gene that is associated with a disease or disorder, or a gene that is associated with a disease or disorder caused by a pathogenic single nucleotide polymorphism (SNP). SNPs are the most common genetic variations associated with various complex human diseases and disorders, including, but not limited to, inflammatory disorders, autoimmune disorders, and cancers. Treatment of any disease or disorder caused by SNPs is contemplated by the methods and uses described herein.
[0018] In another aspect, the present disclosure provides an agRNA library comprising a plurality of agRNAs described herein. Each agRNA library comprises agRNAs that bind upstream (e.g., utilizing UBSs) and/or downstream (e.g., utilizing DBSs) of a specific target DNA strand that is later present in the active site of the nucleobase editor (FIG. 1B and FIG. 5A). Therefore, each agRNA library is specific to the intended target of the nucleobase editor. In some embodiments, the agRNA library comprises between 2,000 and 75,000 different agRNAs, wherein the agRNAs comprise different 3'-nucleic acid extensions. In some embodiments, the agRNA library varies the 3'-nucleic acid extensions by length (e.g., the length of the 3'-nucleic acid extension is varied in the library). In some embodiments, the agRNA library varies the 3'-nucleic acid extensions by UBS, DBS, or CLS (e.g., the UBS and DBS are kept constant, while the CLS is varied). That is, in some embodiments, the 3'-nucleic acid extension is varied in the library by UBS, DBS, or CLS (e.g., the UBS and DBS are kept constant, while the CLS is varied). In some embodiments, the agRNA library varies the 3'-nucleic acid extensions by structure such that the 3'-nucleic acid extensions comprise different combinations of UBSs, CLSs, and DBSs (e.g., an agRNA library comprising agRNAs with a CLS in the 3'-nucleic acid extension versus an agRNA library comprising agRNAs without a CLS in the 3'-nucleic acid extension), or further comprise other secondary structural elements (e.g., a tevopreQ1 motif). That is, in some embodiments, the 3'-nucleic acid extension is varied in the library by structure such that the 3'-nucleic acid extensions comprise different combinations of UBSs, CLSs, and DBSs (e.g., an agRNA library comprising agRNAs with a CLS in the 3'-nucleic acid extension versus an agRNA library comprising agRNAs without a CLS in the 3'-nucleic acid extension), or further comprise other secondary structural elements (e.g., a tevopreQ1 motif). In some embodiments, the agRNA library is used to screen which agRNAs improve editing efficiency and/or reduce bystander editing of a nucleobase editor.
[0019] To measure the effect of an agRNA on the editing efficiency and/or bystander editing of a nucleobase editor, the inventors describe a screening method. Accordingly, in some embodiments, the disclosure provides polynucleotides, vectors, and cells comprising an agRNA described herein for screening the editing pattern of each nucleobase editor combined with a particular agRNA.
[0020] In some embodiments, the present disclosure describes a polynucleotide comprising an agRNA. The polynucleotide may further comprise a target nucleic acid sequence (e.g., a gene of interest) comprising the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor) downstream of the agRNA sequence.
[0021] In another aspect, the present disclosure provides a vector comprising a polynucleotide described herein. In certain embodiments, the vector comprises a polynucleotide encoding an agRNA described herein. In certain embodiments, the polynucleotide can be under the control of a promoter. In certain embodiments, the polynucleotide can be under the control of multiple promoters. The promoter can be any promoter recognized by a skilled artisan (e.g., a constitutive promoter, a tissue-specific promoter, or an inducible promoter). The promoter can be a U6 promoter. The promoter can also be a U6, U6v4, U6v7, or U6v9 promoter or a fragment thereof.
[0022] In certain embodiments, the vector further comprises a polynucleotide sequence comprising an agRNA described herein and a target nucleic acid sequence (e.g., a gene of interest) that includes the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor) (FIG. 5A). In certain embodiments, the target nucleic acid sequence is located downstream of the agRNA sequence. In certain embodiments, the agRNA and the target nucleic acid sequence are within a 50-600-nucleotide window (e.g., a 100-nucleotide window, a 300-nucleotide window, a 450-nucleotide window, etc.).
[0023] In certain embodiments, the vector further comprises at least one primer binding site. In certain embodiments, the vector further comprises at least two primer binding sites. In certain embodiments, the vector comprising the one or more primer binding sites is subjected to next-generation sequencing (NGS) to sequence the agRNA and the target nucleic acid after the editing process in order to analyze the editing pattern (e.g., editing efficiency and bystander editing) of a nucleobase editor with a given agRNA.
[0024] In some embodiments, a first primer binding site is located upstream or within the agRNA, while a second primer binding site is located downstream of a target nucleic acid sequence. In some embodiments, the distance between the first and second primer sites is less than 600, less than 500, less than 300, less than 200, less than 100, or less than 50 nucleotides. In certain embodiments, the distance between the first and second primer sites is less than 300 nucleotides.
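The layout constraints in [0024] can be checked programmatically before a library is ordered. In the sketch below, the `Feature` type and all coordinates are hypothetical, and the "distance between the primer sites" is taken as the span from the start of the first site to the end of the second; a different definition would change only the `span` line.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    start: int  # 0-based, inclusive position on the library vector
    end: int    # 0-based, exclusive


def layout_problems(fwd_site: Feature, rev_site: Feature, agrna: Feature,
                    target: Feature, max_span: int = 300) -> list[str]:
    """Return the violated layout rules; an empty list means the layout passes."""
    problems = []
    if fwd_site.start >= agrna.end:
        problems.append("first primer site must lie upstream of or within the agRNA")
    if rev_site.start < target.end:
        problems.append("second primer site must lie downstream of the target sequence")
    span = rev_site.end - fwd_site.start
    if span >= max_span:
        problems.append(f"primer-to-primer span of {span} nt is not below {max_span} nt")
    return problems
```

An amplicon below 300 nt also fits within a single 300-nt read of the 600-cycle MiSeq v3 kit used in the sequencing step.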
[0025] In another aspect, the present disclosure provides an agRNA screening library comprising a plurality of the vectors described above and provided in this disclosure. In certain embodiments, next-generation sequencing (NGS) is used to sequence the plurality of vectors of the agRNA screening library to analyze the editing pattern (e.g., editing efficiency and bystander editing) of a nucleobase editor with a given agRNA. In certain embodiments, the agRNA and the target nucleic acid sequence are within a 300-nucleotide window. In some embodiments, the target sequence falls within the human DNMT1 gene.
[0026] In another aspect, the present disclosure describes a composition comprising (a) an agRNA and (b) a nucleobase editor (e.g., ABEx1, ABEx2, ABEx3, or ABEx4) to carry out nucleobase editing. In certain embodiments, the composition further comprises (c) a target nucleic acid. In certain embodiments, the nucleobase editor is a fusion protein capable of base editing. In some embodiments, the fusion protein comprises a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp).
[0027] In some embodiments, the composition comprises (a) an agRNA, (b) an N-terminal portion of a split nucleobase editor fused at its C-terminus to an intein-N, and (c) a C-terminal portion of a split nucleobase editor fused at its N-terminus to an intein-C, such that the N-terminal portion and the C-terminal portion of the split nucleobase editor are joined to form a fusion protein of a deaminase and a napDNAbp. In certain embodiments, the composition further comprises (d) a target nucleic acid.
[0028] In certain embodiments, the napDNAbp is selected from the group consisting of Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, Sp-Cas9, SpRY, SpG-Cas9, NG-Cas9, NRRH-Cas9, spCas9, geoCas9, saCas9, Nme2Cas9, Cas12, and variants thereof.
[0029] In certain embodiments, the deaminase is a cytidine deaminase or an adenosine deaminase. In certain embodiments, the cytidine deaminase is selected from the group consisting of CBE6, CGBE, BE4max, TadCBE, and variants thereof. In certain embodiments, the adenosine deaminase is selected from the group consisting of TadA-8e, ABE8e, AYBE, ABE9e, and variants thereof.
[0030] In certain embodiments, the deaminase is an adenosine deaminase comprising an amino acid sequence that is at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of SEQ ID NO: 1 or a variant thereof. In certain embodiments, the adenosine deaminase has an amino acid sequence that includes one or more amino acid substitutions at positions 28, 34, and 151 relative to the corresponding position in the sequence of SEQ ID NO: 1 or a variant thereof. In certain embodiments, the adenosine deaminase has an amino acid sequence that comprises one or more amino acid substitutions selected from V28C, L34W, and M151E relative to the corresponding position in SEQ ID NO: 1 or a variant thereof.
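Substitutions such as V28C, L34W, and M151E use the standard wild-type residue, position, replacement residue notation relative to SEQ ID NO: 1. The sketch below applies such substitutions (including combined variants written with "&", as in V28C&M151E) and checks each expected wild-type residue first. SEQ ID NO: 1 is not reproduced in this section, so the test uses a placeholder sequence rather than the real deaminase.

```python
import re

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(protein: str, variant: str) -> str:
    """Apply substitutions such as "V28C&M151E" to a protein sequence numbered from 1."""
    residues = list(protein)
    for token in variant.split("&"):
        match = SUBSTITUTION.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside a {len(residues)}-residue sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at {position}, found {residues[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)
```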
[0031] In another aspect, the present disclosure describes a complex comprising any of the agRNAs described herein and a nucleobase editor described herein or elsewhere.
[0032] In another aspect, the present disclosure describes one or more polynucleotides encoding a complex of an agRNA and a nucleobase editor described herein or elsewhere.
[0033] In another aspect, the present disclosure describes one or more vectors comprising one or more polynucleotides encoding the components of a complex of an agRNA and a nucleobase editor. In certain embodiments, the vector includes one or more promoters that drive the expression of the agRNA and the nucleobase editor or split nucleobase editor of the complex.
[0034] In another aspect, the disclosure provides cells (e.g., transformed cell lines) that comprise the agRNA described herein. The cells can also comprise the nucleobase editing complexes described herein (e.g., wherein the cell comprises both an agRNA and a nucleobase editor). The cells can also comprise any of the polynucleotides described above, which express the agRNA, and optionally which express the nucleobase editors. In addition, the cells can comprise any of the vectors described above, which express the agRNA, and optionally which express the nucleobase editor.
[0035] In another aspect, the disclosure provides a pharmaceutical composition comprising: (i) an agRNA described above, or a nucleobase editing complex described above, a polynucleotide described above, or a vector described above, or any of the cells described above, and (ii) a pharmaceutically acceptable excipient.
[0036] In another aspect, the disclosure describes a computational method, which may be embodied in software, for designing a library of 3'-nucleic acid extensions. The method involves evaluating a target nucleic acid sequence and generating UBSs, DBSs, and CLSs of varying lengths (e.g., 0-50 nucleotides), and 3'-nucleic acid extensions comprising different combinations of the various UBSs, DBSs, and/or CLSs.
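A minimal sketch of the library-design step in [0036], under several assumptions that the source does not state: the non-target strand is given 5' to 3' as DNA; the region left unpaired around the target is supplied as a half-open interval; binding sequences are reverse complements of the flanking DNA written as RNA; and the extension is laid out 5'-UBS-CLS-DBS-3', which keeps the pairing antiparallel. The CLS sequences are caller-supplied placeholders, and the function names are hypothetical.

```python
from itertools import product

RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def rna_reverse_complement(dna: str) -> str:
    """RNA (5' to 3') pairing antiparallel with `dna` (5' to 3')."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def design_extensions(non_target, unbound, ubs_lengths, dbs_lengths, cls_options):
    """Yield (ubs_len, dbs_len, cls, extension) for every feasible combination.

    non_target: non-target strand, 5' to 3' (DNA)
    unbound: half-open (start, end) interval left unpaired, e.g. around the target base
    Assumed (hypothetical) layout: 5'-UBS-CLS-DBS-3'.
    """
    start, end = unbound
    for ubs_len, dbs_len, cls in product(ubs_lengths, dbs_lengths, cls_options):
        downstream = non_target[end:end + ubs_len]
        upstream = non_target[max(0, start - dbs_len):start]
        if len(downstream) < ubs_len or len(upstream) < dbs_len:
            continue  # not enough flanking sequence for this length
        extension = rna_reverse_complement(downstream) + cls + rna_reverse_complement(upstream)
        if extension:  # skip the empty extension (no UBS, CLS, or DBS)
            yield ubs_len, dbs_len, cls, extension
```

The number of designs is the product of the option counts; for example, UBS and DBS lengths of 0-50 nt (`range(51)` each) give up to 2,601 designs per CLS option.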
[0037] In another aspect, the disclosure describes a method of selecting agRNAs, wherein the method involves transfecting the agRNA screening libraries described herein into cells and using next-generation sequencing (NGS) to select agRNA vectors that show reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or improved editing efficiency of a target nucleic acid by a nucleobase editor. In certain embodiments, the reduced bystander editing and/or editing efficiency of a nucleobase editor is measured relative to the gRNA lacking the 3'-nucleic acid extension of the agRNA or relative to a second agRNA.
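The selection criterion in [0037] compares each agRNA against a reference guide. A minimal sketch, assuming each candidate has already been summarized from the NGS data into an intended-edit rate and a bystander rate; the `EditingProfile` type, its field names, and the optional margin are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EditingProfile:
    target_rate: float     # fraction of reads carrying the intended edit
    bystander_rate: float  # fraction of reads carrying any bystander edit


def select_agrnas(profiles: dict[str, EditingProfile], reference: EditingProfile,
                  margin: float = 0.0) -> list[str]:
    """Names of agRNAs with higher efficiency and/or lower bystander editing than the reference.

    `reference` would be the gRNA lacking the 3'-extension (or a second agRNA);
    `margin` is a hypothetical minimum improvement required on either metric.
    """
    selected = []
    for name, profile in profiles.items():
        more_efficient = profile.target_rate > reference.target_rate + margin
        fewer_bystanders = profile.bystander_rate < reference.bystander_rate - margin
        if more_efficient or fewer_bystanders:
            selected.append(name)
    return selected
```

A stricter filter could also require that the other metric not worsen; the "and/or" in the text leaves that choice open.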
[0038] Some aspects of this disclosure provide methods of phage-assisted non-continuous evolution (PANCE) of a nucleobase editor. In PANCE, selection phages are generated that encode an adenine deaminase with a C-terminal intein, as well as selection plasmids (e.g., SP1-3) that encode an nCas9-SpRY with an N-terminal intein. The selection plasmid comprises (i) a pIII nucleotide sequence (encoding the phage coat protein pIII) that has been modified to contain at least one single nucleotide variant (SNV) and (ii) an agRNA nucleotide sequence encoding the corresponding agRNA that targets the modified pIII nucleotide sequence to correct (edit) the SNV to the wildtype sequence. The SNV results in a mutated pIII protein that lowers phage infectivity. Correction of the SNV to the wildtype sequence by a complex of the agRNA and nucleobase editor increases phage infectivity: if the perfect edit occurs, the pIII sequence is reverted to the wildtype sequence and phage propagation occurs. In addition, the selection plasmid comprises a sequence designed such that bystander edits upstream and downstream of the target nucleic acid in the pIII nucleotide sequence introduce mutations that inhibit phage propagation.
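The selection logic of [0038] can be summarized as a three-way outcome. The toy sketch below is qualitative and hypothetical: it compares an edited pIII region with the wildtype and SNV-bearing sequences, treats any change outside the SNV sites as a propagation-blocking bystander edit, and ignores strand, codon context, and the quantitative nature of phage fitness.

```python
def pance_outcome(wild_type: str, snv_variant: str, edited: str) -> str:
    """Toy, qualitative model of the PANCE selection logic (hypothetical helper).

    wild_type   -- functional pIII region
    snv_variant -- the same region carrying the SNV(s) on the selection plasmid
    edited      -- the region after base editing
    """
    if not (len(wild_type) == len(snv_variant) == len(edited)):
        raise ValueError("sequences must be aligned and of equal length")
    snv_sites = {i for i, (w, m) in enumerate(zip(wild_type, snv_variant)) if w != m}
    changed = {i for i, (m, e) in enumerate(zip(snv_variant, edited)) if m != e}
    if changed - snv_sites:
        return "inhibited"        # any bystander edit is assumed to disrupt pIII
    if edited == wild_type:
        return "propagates"       # perfect correction restores functional pIII
    return "low_infectivity"      # SNV left uncorrected (or mis-corrected)


wt, snv = "TTGGAC", "TTGAAC"       # SNV: G->A at index 3, so an A-to-G edit reverts it
for outcome in ("TTGGAC", "TTGGGC", "TTGAAC"):
    print(outcome, pance_outcome(wt, snv, outcome))
```

Checking for bystander changes before checking for a perfect revert mirrors the design intent that an editor which corrects the SNV but also edits neighboring adenines should not be rewarded.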
[0039] In some embodiments, host cells further comprise a helper plasmid and/or a mutagenesis plasmid. In some embodiments, a mutagenesis plasmid comprises an arabinose-inducible promoter.
[0040] Some aspects of this disclosure provide methods of selecting nucleobase editors that show reduced bystander editing within an editing window of a target nucleic acid and/or improved editing efficiency at a target nucleic acid using machine learning (ML) language models. In certain embodiments, the ML language models are able to predict evolutionarily adaptive mutations that yield nucleobase editor variants with improved fitness.
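The disclosure does not state here how the language models score mutations, but FIG. 18E reports how many of 6 models flagged each mutation as plausible. A minimal consensus-voting sketch, assuming each model supplies per-position amino acid log-likelihoods and that a substitution counts as "plausible" when it outscores the wild-type residue. Both the scoring rule and all names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Mutation = Tuple[str, int, str]  # (wild-type residue, 1-based position, mutant residue)


def plausible_mutations(wt_seq: str, scores: Dict[int, Dict[str, float]]) -> List[Mutation]:
    """Substitutions that one model scores above the wild-type residue.

    scores[pos][aa] is a hypothetical per-residue log-likelihood from a single
    model; positions without a wild-type score are skipped."""
    calls = []
    for pos, aa_scores in scores.items():
        wt = wt_seq[pos - 1]
        wt_score = aa_scores.get(wt)
        if wt_score is None:
            continue
        calls.extend((wt, pos, aa) for aa, s in aa_scores.items() if aa != wt and s > wt_score)
    return calls


def consensus(wt_seq: str, per_model_scores: Sequence[Dict[int, Dict[str, float]]],
              min_votes: int) -> List[Tuple[Mutation, int]]:
    """Count how many models call each mutation plausible; keep those with >= min_votes."""
    votes = Counter()
    for scores in per_model_scores:
        votes.update(set(plausible_mutations(wt_seq, scores)))
    kept = [(mut, n) for mut, n in votes.items() if n >= min_votes]
    return sorted(kept, key=lambda item: (-item[1], item[0][1], item[0][2]))


models = [
    {3: {"V": -1.0, "C": -0.5, "A": -2.0}},
    {3: {"V": -1.0, "C": -0.8}, 1: {"M": -0.1, "L": -0.05}},
    {3: {"V": -0.2, "C": -0.9}},
]
print(consensus("MSV", models, min_votes=1))
```

Raising `min_votes` trades the number of candidates to test for agreement across models; the vote count plays the role of the per-mutation numbers shown in FIG. 18E.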
[0041] In another aspect, the disclosure provides a method of nucleobase editing (e.g., “base editing”) comprising contacting a target nucleic acid sequence with an agRNA described above and a nucleobase editor comprising a fusion protein comprising a deaminase and a napDNAbp or a split napDNAbp, wherein the editing efficiency is increased and/or the bystander editing is decreased as compared to the same method using a gRNA not including the 3'-nucleic acid extension.
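One way to express the comparison in [0041] is precision: the share of edited reads that carry only the intended edit (the figures report "precise A-to-G editing within edited reads" and a fold change in precision over the ABE8e_SpRY control). The exact PreS formula is not given in this section, so the sketch below is not PreS. It assumes aligned, gap-free reads, counts only A-to-G changes at reference adenines in the window, and uses hypothetical function names.

```python
from typing import List


def precise_fraction(reference: str, reads: List[str], window: range, target_pos: int) -> float:
    """Among reads with any A-to-G change in `window`, the fraction whose only
    A-to-G change is at `target_pos` (0-based; reads assumed aligned and gap-free)."""
    a_sites = [p for p in window if reference[p] == "A"]
    edited = precise = 0
    for read in reads:
        hits = {p for p in a_sites if read[p] == "G"}
        if not hits:
            continue                  # unedited reads are excluded from the denominator
        edited += 1
        if hits == {target_pos}:
            precise += 1
    return precise / edited if edited else 0.0


def fold_change(candidate: float, control: float) -> float:
    """Candidate precision divided by control precision."""
    if control == 0:
        raise ValueError("control precision is zero")
    return candidate / control


ref = "CAAAC"  # toy window: index 2 is the intended edit
ctrl = precise_fraction(ref, ["CGGGC", "CAGAC", "CAAAC", "CGGAC"], range(5), 2)
cand = precise_fraction(ref, ["CAGAC", "CAGAC", "CAAAC", "CGGAC"], range(5), 2)
print(round(ctrl, 3), round(cand, 3), round(fold_change(cand, ctrl), 2))
```

Because the denominator is edited reads only, precision can rise even when overall editing efficiency is unchanged, so it complements rather than replaces an efficiency measure.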
[0042] The present disclosure contemplates the use of the agRNAs and methods described herein for base editing a target nucleic acid within a target nucleic acid sequence, wherein editing the target nucleic acid produces a single nucleotide variant (SNV), for engineering a cell (e.g., a prokaryotic or eukaryotic cell), a virus, a fungus, a plant, an insect, and/or an animal. In other aspects, the methods described herein may be used to modify a target nucleic acid sequence for research purposes (e.g., to edit or introduce a nonpathogenic SNP that may enhance or abolish a function, a process, or a phenotype).
[0043] In yet another aspect, the disclosure provides a kit comprising: (i) an agRNA described above, a nucleobase editing complex described above, a nucleic acid molecule described above, a vector described above, or any of the cells described above, and (ii) a set of instructions for conducting nucleobase editing.
[0044] The foregoing concepts, and additional concepts discussed below, may be arranged in any suitable combination, as the present disclosure is not limited in this respect. Further, other advantages and novel features of the present disclosure will become apparent from the following detailed description of various non-limiting embodiments when considered in conjunction with the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0045] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several embodiments of the disclosure and together with the description, provide non-limiting examples.
[0046] FIGs. 1A-1I depict the design and testing of an agRNA library. FIG. 1A is a schematic workflow of the dual nucleobase editor system evolution, starting with sequencing the patient-specific mutation, testing existing base editing enzymes in that context, and identifying the editing pattern. If bystander mutations occur, a personalized agRNA for the specific context, in combination with an optionally evolved nucleobase editor, can generate a “bystander-less” and active base editing system that is personalized for the patient. FIG. 1B is a schematic of the library design. Library candidates consist of combinations of an array of sequences that bind upstream of the target, hairpin loops, and sequences that bind downstream of the target. FIG. 1C is a dot plot representation of the ~60K agRNA clone library after NGS evaluation. Shown as squares are agRNA candidates with high efficiency and low bystander editing in the cloned DNMT1 context. FIG. 1D shows the editing pattern of ABE8e at the human DNMT1 locus in combination with selected agRNAs. FIG. 1E shows the editing pattern of ABE8e in combination with agRNA56114-tevopreQ1 at the human DNMT1 locus in HEK293T cells. FIG. 1F is a schematic representation of different agRNA compositions containing different combinations of the anchor and the tevopreQ1 motif. FIG. 1G shows the editing pattern of ABE8e with the different guide RNA combinations shown in FIG. 1F at the human DNMT1 locus. FIG. 1H shows the influence of agRNA56114 with and without the tevopreQ1 motif on the editing pattern in ~12K different pathogenic contexts. FIG. 1I is the editing pattern of ABE8e in combination with agRNA56114-tevopreQ1 at the human DNMT1 locus in HeLa and HepG2 cells. agRNA testing data in the native DNMT1 locus were obtained from n>3 independent experiments. Data are shown as mean values; error bars, S.D. P values were determined by a two-way ANOVA test (*P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001).
[0047] FIGs. 2A-2G show phage-assisted non-continuous evolution of adenine base editors. FIG. 2A depicts a schematic representation of the PANCE. The selection phage encoded the TadA-8e adenine deaminase with a C-terminal intein, while the selection plasmids (SP1-3) encoded the nCas9-SpRY with an N-terminal intein. The three different selection plasmids encode different pIII sequences, each carrying a single nucleotide variant (SNV), together with the corresponding agRNA to correct the SNV to the wild type sequence. FIG. 2B shows that once the selection phage and the selection plasmid meet in the same cell, the SNVs of SP1-3 can be corrected. If the perfect edit occurs, the sequence is reverted to the wild type sequence (e.g., Leu, Arg, or Val). The SNVs introduce an amino acid exchange that lowers phage infectivity (Weiss). Bystander edits can introduce mutations that decrease phage replication (e.g., Pro or Gly). FIG. 2C shows the mutational landscape after round 10 (R10) of evolution, as the mean percentage of amino acid exchanges in R10 minus the amino acid frequencies on the non-evolved phage (n=3 independent experiments). FIG. 2D is the editing pattern of the selected mutants generated in the PANCE experiment in the human DNMT1 locus in HEK293T cells (n>3 independent experiments). FIG. 2E shows the fold change of the amino acid exchanges V28C and L34W, representing the two mutants evolved in the PANCE experiment. Dots represent the fold change in the three different replicates of the PANCE evolution. FIGs. 2F-2H depict the computational modeling of the structural change resulting from the amino acid exchanges of the V28C and L34W mutants.
[0048] FIGs. 3A-3F depict machine learning-guided base editor evolution. FIG. 3A is a schematic workflow of the machine learning approach to identify evolutionarily plausible mutations. FIG. 3B shows the editing pattern in the human DNMT1 locus caused by ABE single amino acid exchange mutants identified by a machine learning algorithm in HEK293T cells (n=3 independent experiments). FIG. 3C shows the fold change of the amino acid exchange from methionine (M) to aspartic acid (D) at position 151 in the PANCE experiment. FIG. 3D is the mean percentage of amino acid exchanges at position 151 in R10 minus the amino acid frequencies on the non-evolved phage (n=3 independent experiments). FIG. 3E shows the editing pattern at the human DNMT1 locus of the machine learning-identified mutant (M151E) and the PANCE-identified mutant (M151D) in HEK293T cells (n=3 independent experiments). FIG. 3F depicts the computational modeling of the structural change resulting from the amino acid exchange of the M151E mutant.
[0049] FIGs. 4A-4D show bystander abolishment by ABEx-agRNA combinations. FIG. 4A shows the ABEx variants and corresponding mutations. ABEx1 and ABEx3 were generated by PANCE, ABEx2 by machine learning, and ABEx4 by a combination of both techniques. FIG. 4B is the editing pattern in the human DNMT1 locus caused by ABEx-spCas9-SpRY variants combined with agRNA56114-tevopreQ1 in HEK293T cells (n=3 independent experiments). FIG. 4C shows the editing pattern in the human DNMT1 locus caused by ABEx-spCas9-WT variants combined with agRNA56114-tevopreQ1 in HEK293T cells (n=3 independent experiments). FIG. 4D shows the fold change in editing efficiency of the ABEx-SpRY variants normalized to ABE8e, analyzed in ~12,000 different pathogenic contexts.
[0050] FIGs. 5A-5E show Sanger traces and NGS analysis of ABE8e-spCas9-WT with sgRNACtrl and agRNA56114. FIG. 5A is a schematic representation of the agRNA library design and workflow. The TadA-8e domain engages with the exposed single-stranded region of the PAM-distal nontarget strand (NTS), fostering deamination. Based on this, a 3'-extended agRNA library was designed and cloned into (plasmid). Next, 20 million HEK293T cells were transfected, and the editing pattern was analyzed by Illumina sequencing (MiSeq v3, 300 cycles). FIGs. 5B-5C are Sanger sequencing chromatograms of the DNMT1 genomic locus after editing using ABE8e-spCas9-WT with sgRNACtrl and agRNA56114-tevopreQ1. FIGs. 5D-5E show the editing frequencies at the DNMT1 genomic locus after editing using ABE8e-spCas9-WT with sgRNACtrl and agRNA56114-tevopreQ1, analyzed by NGS.
[0051] FIGs. 6A-6E show the PANCE constructs and titers. FIG. 6A depicts a selection phage design containing TadA fused to the Npu N-terminal intein. FIG. 6B shows schematics of the selection plasmids containing gene III, the agRNA, and Cas9 fused to the Npu C-terminal intein. FIG. 6C depicts the mapping of agRNA binding sites on gene III. FIG. 6D is a schematic of the PANCE workflow. FIG. 6E shows the phage titer across the ten rounds of evolution.
[0052] FIGs. 7A-7C show PANCE variant testing at the DNMT1 site. FIG. 7A shows RNP editing efficiencies using ABE8e-SpRY and ABE9-WT. ABE9 showed low editing efficiency with both plasmid and RNP editing strategies. FIGs. 7B-7C show the editing efficiency of 50 PANCE-evolved clones with sgRNACtrl (FIG. 7B) and agRNA56114-tevopreQ1 (FIG. 7C).
[0053] FIGs. 8A-8B are a structural comparison of ABE8e and ABEx variants. FIG. 8A depicts the structure modeling of ABE8e-WT. The snapshot is at the editing interface, showing the deaminase (pink), the WT amino acids at the mutated positions (grey), and the DNA (yellow). FIG. 8B shows the number of H bonds formed between amino acids at the mutated positions (columns) and neighboring amino acids (rows). The table in FIG. 8B shows H bonds for those positions with the wild-type amino acid (WT), the mutated amino acid (mut), and the difference between mutated and wild type (dif). Both wild-type V28 and mutated C28 are predicted to establish the same interactions with surrounding residues, i.e., a hydrogen bond with V30. It is possible that the mutation at residue 28 induces a conformational change that allows this residue to interact with nucleotide 6 of the gRNA, given its proximity (FIG. 2F). Residues L34 and W34 are also predicted to establish the same interactions with surrounding amino acids, i.e., one and two hydrogen bonds with I41 and G42, respectively. Residue 34 is far from the gRNA, but the substitution from L to W, which introduces a bulkier aromatic side chain, could alter the orientation of the alpha-helix arm (orange) where residue H57 lies.
[0054] FIGs. 9A-9B show testing of the ML variants at the DNMT1 site. FIGs. 9A-9B show the editing efficiencies of the 21 ML-derived variants with sgRNACtrl (FIG. 9A) and agRNA56114 (FIG. 9B).
[0055] FIGs. 10A-10E show the ABE8e and ABEx1 editing and structure comparison. FIGs. 10A-10B are Sanger sequencing chromatograms of the DNMT1 genomic locus after editing using ABE8e-SpRY with sgRNACtrl and agRNA56114. FIG. 10B is the mean editing efficiency for WT variants in the DNMT1 locus (n=3 independent experiments).
[0056] FIGs. 11A-11D show the variant testing at HEK-Site3. FIG. 11A shows the editing efficiencies at the HEK site 3 locus using both PANCE and ML variants, together with the double mutants, in HEK293T cells. FIGs. 11B-11C are the editing efficiencies at the HEK site 3 locus using agRNA56114-tevopreQ1 in HepG2 (FIG. 11B) and HeLa (FIG. 11C) cell lines.
[0057] FIG. 12 shows the movement of the editing window at the DNMT1 site: mean editing efficiencies in the DNMT1 locus using different sgRNAs and agRNAs targeting the same site. The different guides are named -1, -2, or -3 if the editing window is moved upstream, and +1 if it is moved downstream, relative to the agRNACtrl used in this work (centered).
[0058] FIGs. 13A-13H show the Cas-dependent and Cas-independent off-targets. FIG. 13A is the on-target editing of ABE8e-SpRY and ABEx1-SpRY in the DNMT1 locus. FIGs. 13B-13E show the Cas-dependent off-target analysis of ABE8e-SpRY and ABEx1-SpRY at four different loci. FIG. 13F shows schematics of the orthogonal R-loop assay. FIGs. 13G-13H show the Cas-independent off-target editing of ABE8e-SpRY and ABEx1-SpRY at two different sites.
[0059] FIG. 14 shows the Path_Var library analysis: the editing profile for the different SpRY variants using sgRNACtrl and agRNA56114-tevopreQ1 compared with ABE8e and ABE9.
[0060] FIGs. 15A-15H show the Path_Var library analysis. FIG. 15A shows read counts of each sample after Illumina sequencing. FIG. 15B shows an example of the quality scores of read 1 and read 2. FIG. 15C shows the editing profile for the different SpRY variants using sgRNACtrl compared with ABE8e and ABE9. FIGs. 15D-15G are the editing profiles for the different SpRY variants using sgRNACtrl compared with ABE8e and ABE9 when more than 2 As are present in the editing window. FIG. 15H shows the C-to-A, C-to-G, and C-to-T editing profiles using sgRNACtrl for the different SpRY variants compared with ABE8e and ABE9.
[0061] FIGs. 16A-16D show the effect of 3'-extended gRNAs on the editing pattern at the human DNMT1 locus. (FIG. 16A) Scheme of the agRNA (“Anchor”, yellow) in the base editing system. (FIG. 16B) Experimental workflow and design of the agRNA library. More than 60k different agRNAs were cloned downstream of the scaffold within a 300 nt DNA sequence that also contains the target DNA site of the guide. (FIG. 16C) Editing pattern and PreS for the top five performing agRNAs within the human DNMT1 locus. (FIG. 16D) NGS abundance analysis of ABE8e-spCas9-WT in combination with sgRNACtrl (left) or agRNA56114 (right) within the human DNMT1 locus. Editing was measured by targeted sequencing and shown as mean ± s.e.m. of n = 3 individual data points. P values were determined with a two-way ANOVA test.
[0062] FIGs. 17A-17I show phage-assisted non-continuous evolution of precise adenine base editors. (FIG. 17A) Schematic representation of the PANCE. The selection phage encoded the TadA-8e adenine deaminase with a C-terminal intein, while the selection plasmids (SP1-3) encoded the nCas9-SpRY with an N-terminal intein. The three different selection plasmids encode different pIII sequences, each carrying a single nucleotide variant (SNV), together with the corresponding agRNA to correct the SNV to the wildtype sequence. (FIG. 17B) If the selection phage and the selection plasmid are in the same cell, the SNVs of SP1-3 can be corrected. If the perfect edit occurs, the sequence is reverted to the wildtype sequence. The SNVs introduce an amino acid exchange that lowers phage infectivity (21). Bystander edits can introduce mutations that decrease phage replication. (FIG. 17C) Editing pattern of the top enriched adenine base editor mutants during the PANCE experiment. PreS is the precision score, which rates on-target editing efficiency relative to bystander editing efficiency. (FIG. 17D) NGS analysis of the editing efficiency of the ABE8e control and the two best PANCE-generated variants (based on PreS), V28C and L34W, in the human DNMT1 locus with and without agRNA56114. (FIG. 17E) Precise A-to-G editing (A13) within edited reads. (FIG. 17F) Fold change in precision over ABE8e_SpRY with sgRNACtrl. (FIG. 17G) Fold change of the ratio of the V28C and L34W mutations across 10 rounds of evolution. (FIG. 17H) Editing of Cas9-dependent off-targets of V28C and L34W at 4 different predicted sites across the human genome. (FIG. 17I) Editing of Cas9-independent off-targets of the V28C and L34W variants. Editing was measured by targeted sequencing and shown as mean ± s.e.m. of n = 3 individual data points. P values were determined with a one-way ANOVA test.
[0063] FIGs. 18A-18I show machine learning-guided design of adenine base editor variants with reduced bystander editing. (FIG. 18A) Schematic workflow of the machine learning approach to identify evolutionarily plausible mutations. (FIG. 18B) (left) Editing efficiencies in the human DNMT1 locus caused by predicted ABE single amino acid exchange mutants (with sgRNACtrl and agRNA56114) identified by the machine learning algorithm in HEK293T cells; (right) PreS score of the tested variants. The underlined mutations are mutations that revert to the wildtype TadA amino acid at that position. (FIG. 18C) NGS abundance analysis of the control and the best machine learning variant, M151E, in the human DNMT1 locus with and without agRNA56114. (FIG. 18D) Fold change of precise mutations at A13 (top) or A15 (bottom). (FIG. 18E) Predicted mutations for the V28C variant. The numbers indicate how many of the 6 models indicated the mutation as a plausible mutation. The underlined mutations are mutations that revert to the wildtype TadA amino acid at that position. (FIG. 18F) Editing efficiencies at the endogenous DNMT1 (top) and Site9 (bottom) loci of the PANCE and machine learning-generated double mutant V28C-M151E. (FIG. 18G) Amino acid abundance at position 151 after the PANCE experiment. (FIG. 18H) Fold change of the ratio of the M151D mutation throughout the PANCE experiment. (FIG. 18I) Comparison of the editing pattern at the human DNMT1 locus of the amino acid exchange variants at position 151 generated by either PANCE (M151D) or machine learning (M151E). Editing was measured by targeted sequencing and shown as mean ± s.e.m. of n = 3 individual data points.
[0064] FIGs. 19A-19P show editing patterns of the base editors. (FIG. 19A) Normalized fold-change histograms of PANCE- and ML-evolved variants, with ABE8e_SpRY as a control, across thousands of human pathogenic variants. (FIG. 19B) Fold change in precise correction of pathogenic mutations when more than 2 adenines are present in the target site. (FIG. 19C) Fold change in precise correction of pathogenic mutations when more than 3 adenines are present in the target site. (FIG. 19D) Normalized fold-change histograms of the V28C variant in YA or RA contexts. (FIGs. 19E-19N) Editing pattern for twelve individual DNA sites in bulk and at the single-read level after editing with the V28C variant and ABE8e_SpRY. (FIG. 19E) Site 2. (FIG. 19F) Site 4. (FIG. 19G) Site 7. (FIG. 19H) Site 8. (FIG. 19I) Site 9. (FIG. 19J) Site 10. (FIG. 19K) Site 12. (FIG. 19L) Site 13. (FIG. 19M) Site 14. (FIG. 19N) Site 18. (FIGs. 19O-19P) Mean editing efficiencies across the twelve individual contexts for the ABE8e control and the V28C variant. Editing was measured by targeted sequencing and shown as mean ± s.e.m. of n = 3 individual data points.
[0065] FIGs. 20A-20J show the performance of V28C as an editing tool for correcting human pathogenic mutations. (FIG. 20A) In silico model of the effect of the V28C mutation on the interaction with the target DNA. (FIG. 20B) Scheme of the PCSK9 editing strategy using adenine base editors to disrupt a splice site. (FIG. 20C) (FIG. 20D) Quantification of A-to-G editing at the target site. (FIG. 20E) Scheme of the SNCA E46K mutation. (FIG. 20F) Sequence similarity of the guide targeting the human DNMT1 locus and the guide targeting the SNCA E46K mutation. (FIG. 20G) NGS abundance analysis results of the editing of the SNCA E46K mutation by ABE8e_SpRY and the V28C variant with agRNACtrl and agRNA56114. The sequence highlighted in green represents the perfect edit. (FIG. 20H) Editing pattern of the SNCA E46K mutation by ABE8e and the V28C variant with agRNACtrl and agRNA56114. (FIG. 20I) Precise A-to-G editing (A15) within edited reads. (FIG. 20J) Fold change in precision over ABE8e_SpRY with sgRNACtrl. Editing was measured by targeted sequencing and shown as mean ± s.e.m. of n = 3 individual data points. P values were determined with a one-way ANOVA test.
[0066] FIGs. 21A-21G show agRNA library design and testing. (FIGs. 21A-21B) Schematic representation of the agRNA library design and workflow. (FIG. 21A) The TadA-8e domain engages with the exposed single-stranded region of the PAM-distal nontarget strand (NTS), fostering deamination. (FIG. 21B) Based on this, a 3'-extended agRNA library was designed and cloned into (plasmid). Next, 20 million HEK293T cells were transfected, and the editing pattern was analyzed by Illumina sequencing (MiSeq v3, 300 cycles). (FIG. 21C) Dot plot representation of the ~60K agRNA clone library after NGS evaluation. Shown as squares are agRNA candidates with high efficiency and low bystander editing in the cloned DNMT1 context. (FIG. 21D) Bulk editing efficiencies at the DNMT1 genomic locus after editing using ABE8e-spCas9-WT with sgRNACtrl and agRNA56114, analyzed by NGS. (FIG. 21E) Fold-change precision quantification of A13A15 and A13 edited species (based on NGS abundance analysis) using agRNA56114 over gRNACtrl. (FIGs. 21F-21G) A-to-G editing efficiencies in (FIG. 21F) HeLa and (FIG. 21G) HepG2 cells.
[0067] FIGs. 22A-22F show PANCE constructs and their evaluation. (FIG. 22A) Mapping of agRNA binding sites on gene III in the selection plasmid. (FIG. 22B) Schematic representation of the PANCE workflow. (FIG. 22C) Phage titer across the ten rounds of evolution. (FIG. 22D) Mutational landscape after round 10 (R10) of evolution. The heatmap represents the mean percentage of amino acid exchanges in R10 minus the amino acid frequencies on the non-evolved phage (n=3 independent experiments). (FIG. 22E) Editing pattern in the human DNMT1 locus caused by ABE-spCas9-WT variants combined with sgRNACtrl or agRNA56114 in HEK293T cells (n=3 independent experiments). (FIG. 22F) Frequency of A-to-I editing.
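Each heatmap cell described for FIG. 22D is a difference of percentages: the mean over replicates of an amino acid's share at a position in R10, minus its share in the non-evolved phage. A minimal sketch of one heatmap column, assuming the phage-encoded proteins have already been translated and aligned so that positions are directly comparable; the function names and toy data are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Sequence


def aa_percentages(proteins: Sequence[str], position: int) -> Dict[str, float]:
    """Percentage of each amino acid at a 1-based position across aligned sequences."""
    counts = Counter(p[position - 1] for p in proteins)
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def landscape_column(r10_replicates: List[Sequence[str]], non_evolved: Sequence[str],
                     position: int) -> Dict[str, float]:
    """Mean R10 percentage across replicates minus the non-evolved percentage."""
    baseline = aa_percentages(non_evolved, position)
    per_rep = [aa_percentages(rep, position) for rep in r10_replicates]
    residues = set(baseline).union(*per_rep)
    # Residues absent from a replicate or from the baseline count as 0 %.
    return {aa: mean(rep.get(aa, 0.0) for rep in per_rep) - baseline.get(aa, 0.0)
            for aa in sorted(residues)}


reps = [["MV", "MC"], ["MC", "MC"], ["MV", "MV"]]   # three toy R10 replicates
col = landscape_column(reps, ["MV", "MV"], position=2)
print(col)
```

Positive values mark residues enriched during evolution (here a V-to-C change at the toy position), and the wild-type residue goes negative by the same total, so each column sums to zero when every replicate is fully accounted for.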
[0068] FIGs. 23A-23B show base editing variant testing in HEK-Site3. (FIG. 23A) Editing efficiencies of the HEK Site9 locus using both PANCE and ML variants, together with the double mutants, in HEK293T cells. (FIG. 23B) Editing efficiencies of the HEK Site9 locus using agRNA56114.
[0069] FIGs. 24A-24I show PathVar library analysis. (FIG. 24A) Schematic representation of the PathVar library workflow using Tol2-mediated transposition. (FIG. 24B) Editing profile for the different SpRY variants using sgRNACtrl compared with ABE8e and ABE9. (FIGs. 24C-24F) Editing profiles for the different SpRY variants using sgRNACtrl compared with ABE8e and ABE9 when more than 2 As are present in the editing window. (FIG. 24G) C-to-A, C-to-G, and C-to-T editing profiles using sgRNACtrl for the different SpRY variants compared with ABE8e and ABE9. (FIGs. 24H and 24I) NGS abundance analysis and editing pattern of ABE8e_SpRY and V28C_SpRY using sgRNACtrl at (FIG. 24H) Site3 and (FIG. 24I) Site16.
[0070] FIGs. 25A-25D show V28C-M151E variant performance in correcting the E46K mutation in human iPSCs. (FIG. 25A) Editing efficiencies of V28C-M151E with sgRNACtrl and agRNA56114 compared to ABE8e and V28C variants. (FIG. 25B) NGS abundance analysis of V28C-M151E editing at the target site. (FIG. 25C) Precise A-to-G editing (A15) within edited reads. (FIG. 25D) Fold change in precision over ABE8e_SpRY with sgRNACtrl (calculated from edited reads). Editing was measured by targeted sequencing and shown as mean ± s.e.m. of n = 3 individual data points.
DEFINITIONS
[0071] Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Margham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
[0072] The terms “administer,” “administering,” and “administration” refer to implanting, absorbing, ingesting, injecting, inhaling, or otherwise introducing a treatment or therapeutic agent, or a composition of treatments or therapeutic agents, in or on a subject.
[0073] The term “biomolecule” or “biological molecule” refers to any substance produced by cells or living organisms and includes carbohydrates, lipids, nucleic acids, proteins, and vitamins.
[0074] The term “cDNA” refers to DNA that is derived from (e.g., by reverse transcription) and complementary to an RNA template (e.g., an mRNA template or an rRNA template).
[0075] The terms “condition,” “disease,” and “disorder” are used interchangeably.
[0076] A “cell,” as used herein, may be present in a population of cells (e.g., in a tissue, a sample, a biopsy, an organ, or an organoid). In some embodiments, a population of cells is composed of a plurality of different cell types. Cells for use in the methods and systems of the present disclosure can be present within an organism, a single cell type derived from an organism, or a mixture of cell types. Included are naturally occurring cells and cell populations, genetically engineered cell lines, cells derived from transgenic animals, cells from a subject, etc. Virtually any cell type and size can be accommodated in the methods and systems described herein. In some embodiments, the cells are mammalian cells (e.g., complex cell populations such as naturally occurring tissues). In some embodiments, the cells are from a human. In certain embodiments, the cells are collected from a subject (e.g., a human) through a medical procedure, such as a biopsy. Alternatively, the cells may be a cultured population (e.g., a culture derived from a complex population or a culture derived from a single cell type where the cells have differentiated into multiple lineages). The cells may also be provided in situ in a tissue sample.
[0077] The term “base editor (BE),” or equivalently “nucleobase editor (NBE),” refers to an agent comprising a polypeptide that is capable of making a modification to a base (e.g., A, T, C, G, or U) within a nucleic acid sequence (e.g., DNA or RNA). In some embodiments, the nucleobase editor is capable of deaminating a base within a nucleic acid. In some embodiments, the nucleobase editor is capable of deaminating a base within a DNA molecule. In some embodiments, the nucleobase editor is capable of deaminating a cytosine (C) in DNA. In some embodiments, the nucleobase editor is capable of deaminating an adenine (A) in DNA. In some embodiments, the nucleobase editor is capable of excising a base within a DNA molecule. In some embodiments, the nucleobase editor is capable of excising an adenine, guanine, cytosine, thymine, or uracil within a nucleic acid (e.g., DNA or RNA) molecule. In some embodiments, the nucleobase editor is a protein (e.g., a fusion protein) comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to a cytidine deaminase. In some embodiments, the nucleobase editor is a protein (e.g., a fusion protein) comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to an adenosine deaminase.
[0078] The term “Cas9” or “Cas9 domain” refers to an RNA-guided nuclease comprising a Cas9 protein, or a fragment thereof (e.g., a protein comprising an active, inactive, or partially active DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9). A Cas9 nuclease is also sometimes referred to as a casn1 nuclease or a CRISPR (clustered regularly interspaced short palindromic repeat)-associated nuclease. CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves a linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3'-5' exonucleolytically. In nature, DNA binding and cleavage typically require the protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of which are hereby incorporated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM, or protospacer adjacent motif) to help distinguish self versus non-self. Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737, the entire contents of which are incorporated herein by reference. In some embodiments, a Cas9 nuclease has one inactive (e.g., an inactivated) DNA cleavage domain, that is, the Cas9 is a nickase. In some embodiments, a Cas9 nuclease lacks an active DNA cleavage domain (e.g., both DNA cleavage domains are inactivated), that is, the Cas9 is a dead Cas9 (dCas9).
[0079] The term “nucleic acid programmable DNA binding protein” or “napDNAbp” refers to a protein that associates with a nucleic acid (e.g., DNA or RNA), such as a guide nucleic acid, that guides the napDNAbp to a specific nucleic acid sequence. For example, a Cas9 protein can associate with a guide RNA that guides the Cas9 protein to a specific DNA sequence that is complementary to the guide RNA. In some embodiments, the napDNAbp is a class 2 microbial CRISPR-Cas effector. In some embodiments, the napDNAbp is a Cas9 domain, for example a nuclease active Cas9, a Cas9 nickase (nCas9), or a nuclease inactive Cas9 (dCas9). Examples of nucleic acid programmable DNA binding proteins include, without limitation, Cas9 (e.g., dCas9 and nCas9), CasX, CasY, Cpf1, C2c1, C2c2, C2c3, Sp-Cas9, SpRY, SpG-Cas9, NG-Cas9, NRRH-Cas9, spCas9, geoCas9, saCas9, Nme2Cas9, Cas12, and variants thereof. It should be appreciated, however, that nucleic acid programmable DNA-binding proteins also include nucleic acid programmable proteins that bind RNA. For example, the napDNAbp may be associated with a nucleic acid that guides the napDNAbp to an RNA. Other nucleic acid programmable DNA binding proteins are also within the scope of this disclosure, though they may not be specifically listed in this disclosure.
[0080] In some embodiments, Cas9 fusion proteins as provided herein comprise the full-length amino acid sequence of a Cas9 protein, e.g., one of the Cas9 sequences provided herein. In other embodiments, however, fusion proteins as provided herein do not comprise a full-length Cas9 sequence, but only a fragment thereof. For example, in some embodiments, a Cas9 fusion protein provided herein comprises a Cas9 fragment, wherein the fragment binds crRNA and tracrRNA or sgRNA, but does not comprise a functional nuclease domain, e.g., in that it comprises only a truncated version of a nuclease domain or no nuclease domain at all.
[0081] In some embodiments, Cas9 refers to Cas9 from: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1); Listeria innocua (NCBI Ref: NP_472073.1); Campylobacter jejuni (NCBI Ref: YP_002344900.1); or Neisseria meningitidis (NCBI Ref: YP_002342100.1); or to a Cas9 from any other organism.
[0082] As used herein, the term “dCas9” refers to a nuclease-inactive Cas9 or nuclease-dead Cas9, or a variant thereof, and embraces any naturally occurring dCas9 from any organism, any naturally-occurring dCas9 equivalent or functional fragment thereof, any dCas9 homolog, ortholog, or paralog from any organism, and any variant of a dCas9, naturally-occurring or engineered. The term dCas9 is not meant to be particularly limiting and may be referred to as a “dCas9 or variant thereof.” Exemplary dCas9 proteins and methods for making dCas9 proteins are further described herein and/or are described in the art and are incorporated herein by reference. Any suitable mutations that inactivate both Cas9 endonuclease activities, such as D10A and H840A mutations in the wild-type S. pyogenes Cas9 amino acid sequence (SEQ ID NO: 77), or D10A and N580A mutations in the wild-type S. aureus Cas9 amino acid sequence, may be used to form the dCas9.
[0083] In some embodiments, wild type Cas9 corresponds to Cas9 from Streptococcus pyogenes (UniProt Reference Sequence: Q99ZW2; SEQ ID NO: 77 (amino acid)):
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGE TAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHE RHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEG DLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLP GEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYA DLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPE
KYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQR TFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFA WMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTV YNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFD SVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERL KTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRN FMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVK VMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQL QNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDK NRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIK RQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKV REINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGK ATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSM PQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVV
AKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFE LENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQ HKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGA PAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD (SEQ ID NO: 77) (single underline: HNH domain; double underline: RuvC domain)
[0084] As used herein, the term “nCas9” or “Cas9 nickase” refers to a Cas9 or a variant thereof that cleaves (nicks) only one of the strands at a target cut site, thereby introducing a nick in a double-stranded DNA molecule rather than creating a double-strand break. This can be achieved by introducing appropriate mutations in a wild-type Cas9 that inactivate one of the two endonuclease activities of the Cas9. Any suitable mutation that inactivates one Cas9 endonuclease activity but leaves the other intact is contemplated and may be used to form the nCas9, such as either a D10A or an H840A mutation in the wild-type S. pyogenes Cas9 amino acid sequence, or a D10A mutation in the wild-type S. aureus Cas9 amino acid sequence. In certain embodiments, the napDNAbp comprises a Cas9 nickase, wherein the Cas9 nickase is S. aureus Cas9 comprising a D10A mutation.
[0085] It should be appreciated that additional Cas9 proteins (e.g., a nuclease dead Cas9 (dCas9), a Cas9 nickase (nCas9), or a nuclease active Cas9), including variants and homologs thereof, are within the scope of this disclosure.
[0086] The term “deaminase” or “deaminase domain,” as used herein, refers to a protein or enzyme that catalyzes a deamination reaction. In some embodiments, the deaminase or deaminase domain is a cytidine deaminase, catalyzing the hydrolytic deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively. In some embodiments, the deaminase or deaminase domain is a cytidine deaminase domain, catalyzing the hydrolytic deamination of cytosine to uracil. In some embodiments, the deaminase or deaminase domain is a naturally-occurring deaminase from an organism, such as a human, chimpanzee, gorilla, monkey, cow, dog, rat, or mouse. In some embodiments, the deaminase or deaminase domain is a variant, not occurring in nature, of a naturally-occurring deaminase from an organism. For example, in some embodiments, the deaminase or deaminase domain is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally occurring deaminase from an organism.
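By way of non-limiting illustration, the percent-identity thresholds above can be checked once a variant has been aligned to a reference deaminase. The sketch below assumes the alignment has already been produced by a separate tool, and it divides by the number of aligned columns; that denominator is an assumption, since other conventions divide by the reference length. The function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two sequences that have ALREADY been aligned.

    Both strings must have equal length; '-' marks a gap. Columns where both
    sequences have a gap are skipped; a gap opposite a residue counts as a
    mismatch. The denominator is the number of remaining aligned columns
    (an assumption; other conventions divide by the reference length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no informative columns")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_variant: str, aligned_reference: str, threshold: float) -> bool:
    """True if the variant is at least `threshold` percent identical to the reference."""
    return percent_identity(aligned_variant, aligned_reference) >= threshold


# Toy, hypothetical sequences (not real deaminases): 9 of 10 columns match.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # 90.0
```

Under this convention, the toy variant satisfies an "at least 90%" embodiment but not an "at least 95%" embodiment.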
[0087] In some embodiments, the napDNAbp of the nucleobase editor is a Cas9 domain. In some embodiments, the nucleobase editor comprises a Cas9 protein fused to a cytidine deaminase. In some embodiments, the nucleobase editor comprises a Cas9 nickase (nCas9)
fused to a cytidine deaminase. In some embodiments, the Cas9 nickase comprises a D10A mutation and comprises a histidine at residue 840 of SEQ ID NO: 77, or a corresponding mutation in any Cas9 provided herein, which renders Cas9 capable of cleaving only one strand of a nucleic acid duplex. In some embodiments, the nucleobase editor comprises a nuclease-inactive Cas9 (dCas9) fused to a cytidine deaminase. In some embodiments, the dCas9 domain comprises a D10A and a H840A mutation of SEQ ID NO: 77, or a corresponding mutation in any Cas9 provided herein, which inactivates the nuclease activity of the Cas9 protein.
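As a non-limiting illustration of how residues 10 and 840 of SEQ ID NO: 77 distinguish the nuclease-active, nickase, and nuclease-dead forms described above, the hypothetical helper below reads those two positions (1-based) from an S. pyogenes Cas9 sequence. It assumes, for simplicity, that any residue other than the wild-type D10 or H840 inactivates the corresponding nuclease domain, and it does not account for other inactivating mutations.

```python
def classify_spcas9(protein_seq: str) -> str:
    """Classify an S. pyogenes Cas9 sequence by its two catalytic residues.

    Uses 1-based numbering of SEQ ID NO: 77. D10 (RuvC) and H840 (HNH) are the
    wild-type residues; any other residue at either position is treated as
    inactivating that nuclease domain (a simplifying assumption).
    """
    if len(protein_seq) < 840:
        raise ValueError("sequence is too short to contain residue 840")
    ruvc_active = protein_seq[9].upper() == "D"   # residue 10
    hnh_active = protein_seq[839].upper() == "H"  # residue 840
    if ruvc_active and hnh_active:
        return "nuclease-active Cas9"
    if ruvc_active or hnh_active:
        return "Cas9 nickase (nCas9)"
    return "nuclease-dead Cas9 (dCas9)"
```

For example, a sequence carrying D10A while retaining a histidine at residue 840, as in the nickase embodiment above, is reported as "Cas9 nickase (nCas9)", and a sequence carrying both D10A and H840A is reported as dCas9.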
[0088] The cytidine deaminases (e.g. engineered cytidine deaminases, evolved cytidine deaminases) described herein may be enzymes that convert cytidine (C) to uracil (U) in DNA. If DNA replication occurs before uracil repair, the replication machinery may treat the uracil as thymine (T), leading to a C:G to T:A base pair conversion. In some embodiments, the cytidine deaminases utilized in the nucleobase editor are apolipoprotein B mRNA-editing complex 1 (APOBEC1) deaminases, e.g. rat APOBEC1 deaminases.
[0089] The adenosine deaminases (e.g., engineered adenosine deaminases, evolved adenosine deaminases) provided herein may be enzymes that deaminate adenine (A) in DNA to inosine (I), which is read as guanine (G) by polymerases, leading to an A:T to G:C base pair conversion. In some embodiments, the adenosine deaminase is derived from a bacterium, such as E. coli, S. aureus, S. typhi, S. putrefaciens, H. influenzae, or C. crescentus. In some embodiments, the adenosine deaminase is a TadA deaminase. In some embodiments, the TadA deaminase is an E. coli TadA deaminase (ecTadA). In some embodiments, the TadA deaminase is a truncated E. coli TadA deaminase. For example, the truncated ecTadA may be missing one or more N-terminal amino acids relative to a full-length ecTadA. In some embodiments, the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 N-terminal amino acid residues relative to the full-length ecTadA. In some embodiments, the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 C-terminal amino acid residues relative to the full-length ecTadA. In some embodiments, the ecTadA deaminase does not comprise an N-terminal methionine.
Reference is made to U.S. Patent Publication No. 2018/0073012, published March 15, 2018, which is incorporated herein by reference.
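As a non-limiting illustration of the C:G to T:A and A:T to G:C conversions described in paragraphs [0088] and [0089], the sketch below applies idealized, complete editing to a protospacer. It assumes the protospacer is written 5' to 3' on the PAM-containing strand with position 1 at the PAM-distal end, and it assumes an editable window of positions 4 to 8; that window is a hypothetical default, since real activity windows vary by editor and sequence context. The "CBE"/"ABE" labels and function name are illustrative only.

```python
# Idealized product of each deaminase class after replication or repair:
#   cytidine deaminase (CBE): C -> U, read as T  (C:G -> T:A)
#   adenosine deaminase (ABE): A -> I, read as G (A:T -> G:C)
_CONVERSIONS = {"CBE": ("C", "T"), "ABE": ("A", "G")}


def simulate_base_edit(protospacer: str, editor: str, window: tuple[int, int] = (4, 8)) -> str:
    """Return the protospacer after idealized, complete editing inside `window`.

    protospacer: 5'->3' DNA of the PAM-containing (non-target) strand; position 1
        is the PAM-distal 5' end.
    editor: "CBE" or "ABE" (hypothetical labels).
    window: 1-based, inclusive positions assumed to be editable. The default
        (4, 8) is an assumption; real windows vary by editor and context.
    """
    if editor not in _CONVERSIONS:
        raise ValueError(f"unknown editor {editor!r}; expected 'CBE' or 'ABE'")
    source, product = _CONVERSIONS[editor]
    start, end = window
    return "".join(
        product if start <= pos <= end and base == source else base
        for pos, base in enumerate(protospacer.upper(), start=1)
    )


print(simulate_base_edit("ACACACACACACACACACAC", "CBE"))  # ACATATATACACACACACAC
print(simulate_base_edit("ACACACACACACACACACAC", "ABE"))  # ACACGCGCACACACACACAC
```

Only the three cytosines (or two adenines) inside the assumed window change; the same bases outside the window are left untouched.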
[0090] As used herein, the term “guide RNA” is a particular type of guide nucleic acid which is commonly associated with a Cas protein (e.g., a Cas9 protein), directing the Cas protein to a specific sequence in a DNA molecule that includes complementarity to the protospacer sequence of the guide RNA. A gRNA, as disclosed herein, may refer to a sgRNA or anchor
guide RNA, herein referred to as “agRNA” (e.g., for base editing). A gRNA may be naturally occurring, recombinant, synthetic, or any combination of these. A gRNA may direct a Cas protein (e.g., as part of a nucleobase editor) to a target site in the target gene. However, this term also embraces the equivalent guide nucleic acid molecules that associate with Cas protein equivalents, homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and which otherwise program the Cas protein equivalent to localize to a specific target nucleotide sequence. The Cas protein equivalents may include other napDNAbps from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system), and C2c3 (a type V CRISPR-Cas system). Further Cas equivalents are described in Makarova et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299), which is incorporated herein by reference. Exemplary sequences and structures of guide RNAs are provided herein.
[0091] Functionally, guide RNAs associate with a Cas protein, directing (or programming) the Cas protein to a specific sequence in a DNA molecule that includes a sequence complementary to the protospacer sequence of the guide RNA. A gRNA is a component of the CRISPR/Cas system. The sequence specificity of a Cas DNA-binding protein is determined by gRNAs, which have nucleotide base-pairing complementarity to target DNA sequences. The native gRNA comprises a 20-nucleotide (nt) Specificity Determining Sequence (SDS), or spacer, which specifies the DNA sequence to be targeted, and is immediately followed by an 80 nt scaffold sequence, which associates the gRNA with the Cas protein. In some embodiments, an SDS of the present disclosure has a length of 15 to 100 nucleotides, or more. For example, an SDS may have a length of 15 to 90, 15 to 85, 15 to 80, 15 to 75, 15 to 70, 15 to 65, 15 to 60, 15 to 55, 15 to 50, 15 to 45, 15 to 40, 15 to 35, 15 to 30, or 15 to 20 nucleotides. In some embodiments, the SDS is 20 nucleotides long. For example, the SDS may be 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides long. At least a portion of the target DNA sequence is complementary to the SDS of the gRNA. For a Cas protein to successfully bind to the DNA target sequence, a region of the target sequence must be complementary to the SDS of the gRNA and must be immediately followed by the correct protospacer adjacent motif (PAM) sequence. In some embodiments, an SDS is 100% complementary to its target sequence. In some embodiments, the SDS is less than 100% complementary to its target sequence and is, thus, considered to be partially complementary to its target sequence. For example, an SDS may be 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, or 90% complementary to its target sequence. In some embodiments, the SDS contains 1, 2, 3, 4, or 5 nucleotides that are not complementary to the corresponding region of the template DNA or target DNA.
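As a non-limiting illustration of the targeting rules above, the sketch below (i) scans one DNA strand for 20-nt protospacers immediately followed by an NGG PAM and (ii) computes the percent complementarity between an SDS and the strand it hybridizes to. The NGG PAM is an assumption corresponding to S. pyogenes Cas9; other Cas proteins recognize other PAMs. All function names are hypothetical.

```python
# Hypothetical helpers; the NGG PAM assumes S. pyogenes Cas9.
_DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Guide (RNA) base -> DNA base it pairs with on the target strand.
_GUIDE_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    return "".join(_DNA_COMPLEMENT[b] for b in reversed(dna.upper()))


def find_protospacers(dna: str, spacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (0-based start, protospacer, PAM) for every spacer_len-nt window
    on this strand (5'->3') that is immediately followed by an NGG PAM."""
    dna = dna.upper()
    hits = []
    for start in range(len(dna) - spacer_len - 2):
        pam = dna[start + spacer_len : start + spacer_len + 3]
        if pam[1:] == "GG":
            hits.append((start, dna[start : start + spacer_len], pam))
    return hits


def percent_complementarity(sds: str, target_strand: str) -> float:
    """Percent of SDS positions that base-pair with the target strand.

    sds: guide spacer written 5'->3' (T is accepted and treated as U).
    target_strand: the DNA strand the SDS hybridizes to, written 5'->3'.
    The strands pair antiparallel, so the target is read 3'->5'.
    """
    sds = sds.upper().replace("T", "U")
    target_3to5 = target_strand.upper()[::-1]
    if len(sds) != len(target_3to5):
        raise ValueError("SDS and target region must have the same length")
    paired = sum(_GUIDE_PAIR[g] == t for g, t in zip(sds, target_3to5))
    return 100.0 * paired / len(sds)


print(find_protospacers("GACGTTACCGGATCAAGCTTAGGTTT"))
# [(0, 'GACGTTACCGGATCAAGCTT', 'AGG')]
```

A single mismatched position in a 20-nt SDS gives 95% complementarity, which corresponds to the "differs by 1 nucleotide" embodiment above.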
[0092] In some embodiments, the guide RNA is about 15-120 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence (e.g., a target sequence in DNMT1). In some embodiments, the guide RNA is 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
112, 113, 114, 115, 116, 117, 118, 119, or 120 nucleotides long. In some embodiments, the guide RNA comprises a sequence of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more contiguous nucleotides that is complementary to a target sequence. Sequence complementarity refers to specific base-pairing interactions between adenine and thymine (DNA) or uracil (RNA), and between guanine and cytosine.
[0093] The term “anchor gRNA,” as used herein, refers to a gRNA comprising a 3'-nucleic acid extension attached at the 3'-end of the gRNA. The 3'-nucleic acid extension is about 1-120 nucleotides long and comprises a sequence of at least 3 contiguous nucleotides that is complementary to a target sequence. In some embodiments, the 3'-nucleic acid extension is attached to the 3'-end of the gRNA by a nucleotide linker. In some embodiments, the 3'-nucleic acid extension is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long. In some embodiments, the 3'-nucleic acid extension comprises an upstream binding sequence (UBS) and a downstream binding sequence (DBS) that are complementary to a target sequence of a nucleobase editor. In some embodiments, the UBS and/or the DBS is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long. In some embodiments, the 3'-nucleic acid extension comprises a counterloop sequence (CLS). In some embodiments, the CLS is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long. In some embodiments, the counterloop sequence is a hairpin.
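As a non-limiting illustration of the agRNA components defined above, the sketch below assembles a gRNA, an optional nucleotide linker, and a 3'-extension, and checks the length ranges recited in this paragraph. The order of elements within the extension (UBS, then CLS, then DBS, read 5' to 3') is an assumption made only for this sketch, as are the class and field names; the gRNA is a placeholder string rather than a real spacer and scaffold. A simple hairpin test is included for the counterloop embodiment; it checks only perfect Watson-Crick stem pairing.

```python
from dataclasses import dataclass

_RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def is_simple_hairpin(seq: str, stem_len: int, min_loop: int = 3) -> bool:
    """True if the first stem_len nt pair perfectly (antiparallel) with the last
    stem_len nt, leaving a loop of at least min_loop nt. G-U wobble is ignored."""
    seq = seq.upper().replace("T", "U")
    if stem_len < 1 or len(seq) < 2 * stem_len + min_loop:
        return False
    five_arm, three_arm = seq[:stem_len], seq[-stem_len:]
    return all(_RNA_PAIR.get(a) == b for a, b in zip(five_arm, reversed(three_arm)))


@dataclass
class AnchorGuideSketch:
    """Hypothetical container for an agRNA; field names are illustrative only."""
    grna: str         # spacer + scaffold, 5'->3' (placeholder in the example)
    ubs: str          # upstream binding sequence
    dbs: str          # downstream binding sequence
    cls: str = ""     # optional counterloop sequence
    linker: str = ""  # optional nucleotide linker between gRNA and extension

    def extension(self) -> str:
        # ASSUMPTION: UBS, then CLS, then DBS (5'->3'); the text does not fix this order.
        return self.ubs + self.cls + self.dbs

    def validate(self) -> None:
        # Length ranges taken from the embodiments in this paragraph.
        for name, part in (("UBS", self.ubs), ("DBS", self.dbs)):
            if not 1 <= len(part) <= 50:
                raise ValueError(f"{name} must be 1-50 nt, got {len(part)}")
        if len(self.cls) > 50:
            raise ValueError(f"CLS must be at most 50 nt, got {len(self.cls)}")
        if not 1 <= len(self.extension()) <= 120:
            raise ValueError("3'-extension must be about 1-120 nt")

    def sequence(self) -> str:
        self.validate()
        return self.grna + self.linker + self.extension()


ag = AnchorGuideSketch(grna="NNNN", ubs="ACG", dbs="UUC", cls="GGGAAAACCC", linker="AA")
print(ag.sequence())  # NNNNAAACGGGGAAAACCCUUC
```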
[0094] The term “inhibitor of base repair” or “IBR” refers to a protein that is capable of inhibiting the activity of a nucleic acid repair enzyme, for example a base excision repair enzyme. In some embodiments, the IBR is an inhibitor of inosine base excision repair. Exemplary inhibitors of base repair include inhibitors of APE1, Endo III, Endo IV, Endo V, Endo VIII, Fpg, hOGG1, hNEIL1, T7 EndoI, T4PDG, UDG, hSMUG1, and hAAG. In some embodiments, the IBR is an inhibitor of Endo V or hAAG. In some embodiments, the IBR is a catalytically inactive Endo V or a catalytically inactive hAAG.
[0095] The term “uracil glycosylase inhibitor” or “UGI,” as used herein, refers to a protein that is capable of inhibiting a uracil-DNA glycosylase base-excision repair enzyme. In some embodiments, a UGI domain comprises a wild-type UGI or a UGI as set forth in SEQ ID NO: 12. In some embodiments, the UGI proteins provided herein include fragments of UGI and proteins homologous to a UGI or a UGI fragment. A UGI variant shares homology to UGI, or a fragment thereof.
[0096] The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., international PCT application, PCT/EP2000/011690, filed November 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences.
[0097] The term “mutation,” as used herein, refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)).
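As a non-limiting illustration of the notation described above (original residue, position, new residue), the hypothetical helper below applies a single substitution such as "D10A" to a sequence using 1-based positions, and refuses to apply it if the stated original residue does not match. It handles substitutions only; the insertions and deletions also encompassed by the term "mutation" are outside this sketch.

```python
import re

# <original residue><1-based position><new residue>, e.g. "D10A" or "H840A".
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitution(seq: str, mutation: str) -> str:
    """Apply one substitution written as <original><position><new>.

    Raises ValueError if the notation is malformed, the position is outside
    the sequence, or the stated original residue does not match `seq`.
    Insertions and deletions are not handled by this sketch.
    """
    match = _SUBSTITUTION.fullmatch(mutation.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized substitution: {mutation!r}")
    original, position, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    if seq[position - 1].upper() != original:
        raise ValueError(
            f"expected {original} at position {position}, found {seq[position - 1]}"
        )
    return seq[: position - 1] + new + seq[position:]


# First 20 residues of SEQ ID NO: 77; residue 10 is aspartate (D).
print(apply_substitution("MDKKYSIGLDIGTNSVGWAV", "D10A"))  # MDKKYSIGLAIGTNSVGWAV
```

Checking the original residue before substituting guards against numbering errors, for example applying a mutation named for S. pyogenes Cas9 to an ortholog with different numbering.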
[0098] The term “prevent,” “preventing,” or “prevention” refers to a prophylactic treatment of a subject who does not have and has not had a disease but is at risk of developing the disease, or who had the disease, no longer has it, but is at risk of its recurrence. In certain embodiments, the subject is at a higher risk of developing the disease, or at a higher risk of recurrence of the disease, than an average healthy member of a population.
[0099] The terms “polynucleotide,” “nucleotide sequence,” “nucleic acid,” “nucleic acid molecule,” “nucleic acid sequence,” and “oligonucleotide” refer to a series of nucleotide bases (also called “nucleotides”) in DNA and RNA and mean any chain of two or more
nucleotides. The polynucleotides can be chimeric mixtures or derivatives or modified versions thereof, and single-stranded or double-stranded. The oligonucleotide can be modified at the base moiety, sugar moiety, or phosphate backbone, for example, to improve stability of the molecule, its hybridization parameters, etc.
[0100] The term "nucleic acid," as used herein (also referred to as a "polynucleotide"), refers to a polymer of nucleotides. The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyladenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyluridine, C5-propynylcytidine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, 4-acetylcytidine, 5-(carboxyhydroxymethyl)uridine, dihydrouridine, methylpseudouridine, 1-methyladenosine, 1-methylguanosine, N6-methyladenosine, and 2-thiocytidine), chemically modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, 2'-O-methylcytidine, arabinose, and hexose), or modified phosphate groups (e.g., phosphorothioates and 5'-phosphoramidite linkages).
[0101] A “protein,” “peptide,” or “polypeptide” comprises a polymer of amino acid residues linked together by peptide bonds. The term refers to proteins, polypeptides, and peptides of any size, structure, or function. Typically, a protein will be at least three amino acids long. A protein may refer to an individual protein or a collection of proteins. Proteins may contain only natural amino acids, although non-natural amino acids (i.e., compounds that do not occur in nature but that can be incorporated into a polypeptide chain) and/or amino acid analogs as are known in the art may alternatively be employed. Also, one or more of the amino acids in a protein may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation or functionalization, or other modification. A protein may also be a single molecule or may be a multi-molecular complex. A protein may be a fragment of a naturally occurring protein or peptide. A protein may be naturally occurring, recombinant, synthetic, or any combination of these. A protein may also be a therapeutic protein administered as a treatment for a disease or disorder (e.g., one that is associated with a change in the RNA expression and/or translation profile of a cell taken from a subject). In certain embodiments, the protein is an antibody, or an antibody variant (including antibody fragments).
[0102] The term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic-acid editing protein. In some embodiments, a protein comprises a proteinaceous part, e.g., an amino acid sequence constituting a nucleic acid binding domain, and an organic compound, e.g., a compound that can act as a nucleic acid cleavage agent. In some embodiments, a protein is in a complex with, or is in association with, a nucleic acid, e.g., RNA. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.
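As a non-limiting illustration of N-terminal and C-terminal fusion, the hypothetical helper below concatenates protein domains from N- to C-terminus, optionally inserting a peptide linker between neighbouring domains, and records the 1-based coordinates of each domain in the fused sequence. The domain names and short sequences in the example are placeholders, not sequences disclosed herein.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    name: str
    sequence: str  # one-letter amino acid codes, N- to C-terminus


def assemble_fusion(domains: list[Domain], linker: str = "") -> tuple[str, list[tuple[str, int, int]]]:
    """Concatenate domains N- to C-terminally, inserting `linker` between neighbours.

    Returns the fused sequence and 1-based inclusive (name, start, end)
    coordinates for each domain; the first domain is the N-terminal one.
    """
    if not domains:
        raise ValueError("a fusion protein needs at least one domain")
    parts: list[str] = []
    annotations: list[tuple[str, int, int]] = []
    position = 1
    for index, domain in enumerate(domains):
        if index > 0 and linker:
            parts.append(linker)
            position += len(linker)
        end = position + len(domain.sequence) - 1
        annotations.append((domain.name, position, end))
        parts.append(domain.sequence)
        position = end + 1
    return "".join(parts), annotations


# Placeholder domains joined by a placeholder "GS" linker.
seq, coords = assemble_fusion([Domain("A", "MKK"), Domain("B", "PEPT")], linker="GS")
print(seq, coords)  # MKKGSPEPT [('A', 1, 3), ('B', 6, 9)]
```

Reordering the `domains` list is all that distinguishes an amino-terminal fusion from a carboxy-terminal one in this representation.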
[0103] The term “promoter” is art-recognized and refers to a nucleic acid molecule with a sequence recognized by the cellular transcription machinery and able to initiate transcription of a downstream gene. A promoter can be constitutively active, meaning that the promoter is always active in a given cellular context, or conditionally active, meaning that the promoter is only active in the presence of a specific condition. For example, a conditional promoter may only be active in the presence of a specific protein that connects a protein associated with a regulatory element in the promoter to the basic transcriptional machinery, or only in the absence of an inhibitory molecule. Inducible promoters, which require the presence of a small molecule “inducer” for activity, are a subclass of conditionally active promoters. Examples of inducible promoters include, but are not limited to, arabinose-inducible promoters, Tet-on promoters, and tamoxifen-inducible promoters. A variety of constitutive, conditional, and inducible promoters are well known to the skilled artisan, and the skilled artisan will be able to ascertain a variety of such promoters useful in carrying out the instant invention, which is not limited in this respect. In various embodiments, the disclosure provides vectors with appropriate promoters for driving expression of the nucleic acid sequences encoding the fusion proteins (or one or more individual components thereof).
[0104] The terms “RNA-programmable nuclease” and “RNA-guided nuclease” are used interchangeably herein and refer to a nuclease that forms a complex with (e.g., binds or associates with) one or more RNA(s) that is not a target for cleavage. In some embodiments, an RNA-programmable nuclease, when in a complex with an RNA, may be referred to as a nuclease:RNA complex. Typically, the bound RNA(s) is referred to as a guide RNA (gRNA). gRNAs can exist as a complex of two or more RNAs, or as a single RNA molecule. gRNAs that exist as a single RNA molecule may be referred to as single-guide RNAs (sgRNAs), though “gRNA” is used interchangeably to refer to guide RNAs that exist as either single molecules or as a complex of two or more molecules. Typically, gRNAs that exist as single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (e.g., and that directs binding of a Cas9 complex to the target); and (2) a domain that binds a Cas9 protein. In some embodiments, domain (2) corresponds to a sequence known as a tracrRNA, and comprises a stem-loop structure. For example, in some embodiments, domain (2) is identical or homologous to a tracrRNA as provided in Jinek et al., Science 337:816-821(2012), the entire contents of which are incorporated herein by reference. Other examples of gRNAs (e.g., those including domain 2) can be found in U.S. Provisional Patent Application, U.S.S.N. 61/874,682, filed September 6, 2013, entitled “Switchable Cas9 Nucleases And Uses Thereof,” and U.S. Provisional Patent Application, U.S.S.N. 61/874,746, filed September 6, 2013, entitled “Delivery System For Functional Nucleases,” the entire contents of each of which are hereby incorporated by reference in their entirety.
In some embodiments, a gRNA comprises two or more of domains (1) and (2), and may be referred to as an “extended gRNA.” For example, an extended gRNA will bind two or more Cas9 proteins and bind a target nucleic acid at two or more distinct regions, as described herein. The gRNA comprises a nucleotide sequence that complements a target site, which mediates binding of the nuclease:RNA complex to said target site, providing the sequence specificity of the nuclease:RNA complex. In some embodiments, the RNA-programmable nuclease is the (CRISPR-associated system) Cas9 endonuclease, for example, Cas9 (Csn1) from Streptococcus pyogenes (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E.,
Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference.
[0105] Because RNA-programmable nucleases (e.g., Cas9) use RNA:DNA hybridization to target DNA cleavage sites, these proteins are able to be targeted, in principle, to any sequence specified by the guide RNA. Methods of using RNA-programmable nucleases, such as Cas9, for site-specific cleavage (e.g., to modify a genome) are known in the art (see e.g., Cong, L. et al., Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013); Mali, P. et al., RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013); Hwang, W.Y. et al., Efficient genome editing in zebrafish using a CRISPR-Cas system. Nature Biotechnology 31, 227-229 (2013); Jinek, M. et al., RNA-programmed genome editing in human cells. eLife 2, e00471 (2013); DiCarlo, J.E. et al., Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic acids research (2013); Jiang, W. et al. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nature Biotechnology 31, 233-239 (2013); the entire contents of each of which are incorporated herein by reference).
[0106] The term “recombinant” as used herein in the context of proteins or nucleic acids refers to proteins or nucleic acids that do not occur in nature, but are the product of human engineering. For example, in some embodiments, a recombinant protein or nucleic acid molecule comprises an amino acid or nucleotide sequence that comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations as compared to any naturally occurring sequence.
[0107] The term “target nucleic acid” refers to a nucleotide in a “target sequence” within a nucleic acid molecule that is modified by a nucleobase editor, such as a fusion protein comprising an adenosine deaminase (e.g., a dCas9-adenosine deaminase fusion protein provided herein).
[0108] A “transcript” or “RNA transcript” is the product resulting from RNA polymerase-catalyzed transcription of a DNA sequence. When the RNA transcript is a complementary copy of the DNA sequence, it is referred to as the primary transcript; an RNA sequence derived from post-transcriptional processing of the primary transcript is referred to as the mature RNA. “Messenger RNA (mRNA)” refers to RNA that is without introns and can be translated into a polypeptide by the cell.
[0109] As used herein, the terms “upstream” and “downstream” are terms of relativity that define the linear position of at least two elements located in a nucleic acid molecule (whether single or double-stranded) that is orientated in a 5'-to-3' direction. In particular, a first element is upstream of a second element in a nucleic acid molecule where the first element is positioned somewhere that is 5' to the second element. Conversely, a first element is downstream of a second element in a nucleic acid molecule where the first element is positioned somewhere that is 3' to the second element.
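As a non-limiting illustration of the 5'-to-3' convention above, the hypothetical helper below reports whether one element lies upstream or downstream of another in a single strand written 5' to 3'. It compares the start positions of the first occurrence of each element; using first occurrences and start positions is an assumption of this sketch, and overlapping or repeated elements would need a more careful rule.

```python
def relative_position(sequence: str, first: str, second: str) -> str:
    """Return "upstream" if `first` starts 5' of `second`, "downstream" if it
    starts 3' of `second`, or "same start" if both begin at the same position.

    `sequence` is a single strand written 5'->3'; the first occurrence of each
    element is used (an assumption of this sketch).
    """
    seq = sequence.upper()
    i = seq.find(first.upper())
    j = seq.find(second.upper())
    if i < 0 or j < 0:
        raise ValueError("element not found in sequence")
    if i < j:
        return "upstream"
    if i > j:
        return "downstream"
    return "same start"


print(relative_position("TATAAAGCCACCATGG", "TATAAA", "ATG"))  # upstream
```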
[0110] A “subject” to which administration is contemplated refers to a human (i.e., male or female of any age group, e.g., pediatric subject (e.g., infant, child, or adolescent) or adult subject (e.g., young adult, middle-aged adult, or senior adult)) or non-human animal. In some embodiments, the non-human animal is a mammal (e.g., primate (e.g., cynomolgus monkey or rhesus monkey) or mouse). The term “patient” refers to a subject in need of treatment of a disease. In some embodiments, the subject is human. In some embodiments, the patient is human. The human may be a male or female at any stage of development. A subject or patient “in need” of treatment of a disease or disorder includes, without limitation, those who exhibit any risk factors or symptoms of a disease or disorder. In some embodiments, a subject is a non-human experimental animal (e.g., a mouse, rat, dog, pig, or non-human primate).
[0111] An “effective amount” of a compound described herein refers to an amount sufficient to elicit the desired biological response. An effective amount of a compound described herein may vary depending on such factors as the desired biological endpoint, the pharmacokinetics of the compound, the condition being treated, the mode of administration, and the age and health of the subject. In certain embodiments, an effective amount is a therapeutically effective amount. In certain embodiments, an effective amount is a prophylactically effective amount. In certain embodiments, an effective amount is the amount of a compound described herein in a single dose. In certain embodiments, an effective amount is the combined amounts of a compound described herein in multiple doses. For example, in some embodiments, an effective amount of a nucleobase editor may refer to the amount of the nucleobase editor that is sufficient to induce a mutation of a target site specifically bound by the nucleobase editor.
In some embodiments, an effective amount of a fusion protein provided herein, e.g., of a fusion protein comprising a nucleic acid programmable DNA binding protein and a deaminase domain (e.g., a cytidine deaminase domain), may refer to the amount of the fusion protein that is sufficient to induce editing of a target site specifically bound and edited by the fusion protein. As will be appreciated by the skilled artisan, the effective amount of an agent, e.g., a fusion protein, a nucleobase editor, a deaminase, a hybrid protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide, may vary depending on various factors, for example, on the desired biological response (e.g., on the specific allele, genome, or target site to be edited), on the cell or tissue being targeted, and on the agent being used.
[0112] A “therapeutically effective amount” of a treatment or therapeutic agent is an amount sufficient to provide a therapeutic benefit in the treatment of a condition or to delay or minimize one or more symptoms associated with the condition. A therapeutically effective amount of a treatment or therapeutic agent means an amount of the therapy, alone or in combination with other therapies, that provides a therapeutic benefit in the treatment of the condition. The term “therapeutically effective amount” can encompass an amount that improves overall therapy, reduces or avoids symptoms, signs, or causes of the condition, and/or enhances the therapeutic efficacy of another therapeutic agent.
[0113] The terms “treatment,” “treat,” and “treating” refer to reversing, alleviating, delaying the onset of, or inhibiting the progress of a disease described herein. In some embodiments, treatment may be administered after one or more signs or symptoms of the disease have developed or have been observed (e.g., prophylactically or upon suspicion or risk of disease). In other embodiments, treatment may be administered in the absence of signs or symptoms of the disease. For example, treatment may be administered to a susceptible subject prior to the onset of symptoms (e.g., in light of a history of symptoms in the subject, or family members of the subject). Treatment may also be continued after symptoms have resolved, for example, to delay or prevent recurrence. In some embodiments, treatment may be administered after using the methods disclosed herein and observing a change in the RNA expression or translation profile in a cell or tissue in comparison to a healthy cell or tissue.
[0114] As used herein, the term “variant” should be taken to mean the exhibition of qualities that have a pattern that deviates from what occurs in nature, e.g., a variant Cas9 is a Cas9 comprising one or more changes in amino acid residues (i.e., “substitutions”) as compared to a wild type Cas9 amino acid sequence. The term “variant” encompasses homologous proteins having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with a reference sequence and having the same or substantially the same functional activity or activities as the reference sequence. The term also encompasses mutants, truncations, or domains of a reference sequence that display the same or substantially the same functional activity or activities as the reference sequence.
[0115] The term “vector,” as used herein, refers to a nucleic acid that can be modified to encode a gene of interest and that is able to enter a host cell, mutate, and replicate within the host cell, and then transfer a replicated form of the vector into another host cell. Exemplary suitable vectors include viral vectors, such as retroviral vectors or bacteriophages and filamentous phage, and conjugative plasmids. Additional suitable vectors will be apparent to those of skill in the art based on the instant disclosure.
[0116] As used herein the term “wild type” is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms.
[0117] Throughout the present disclosure, when a range of values is listed, it is intended to encompass each value and sub-range within the range. Where ranges are given, endpoints are included.
[0118] The details of certain embodiments of the invention are set forth in the Detailed Description of Certain Embodiments, as described below. Other features, objects, and advantages of the invention will be apparent from the Definitions, Examples, and Claims.
DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS
[0119] The aspects described herein are not limited to specific embodiments, methods, uses, or configurations, and as such can, of course, vary. The terminology used herein is for the purpose of describing particular aspects only and, unless specifically defined herein, is not intended to be limiting.
[0120] The present disclosure provides modified guide RNAs (gRNAs) comprising 3'-nucleic acid extensions (referred to herein as anchor guide RNAs (agRNAs)), wherein the use of an agRNA results in improved editing efficiency and/or reduced bystander editing of a nucleobase editor. In certain embodiments, the nucleobase editor is a fusion protein comprising a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp). Without being bound by theory, these agRNAs are thought to improve editing efficiency and/or reduce bystander editing of a nucleobase editor by stabilizing the target nucleic acid sequence (e.g., genomic DNA) within the active site of the nucleobase editor. The present disclosure further provides methods for evolving nucleobase editors to be used in conjunction with a given agRNA.
[0121] In certain embodiments, the agRNA improves the editing efficiency of a target nucleic acid by a nucleobase editor relative to the editing efficiency of the target nucleic acid by the nucleobase editor using a gRNA lacking the 3'-nucleic acid extension of the agRNA. In certain embodiments, the agRNA reduces bystander editing of bystander nucleic acids within an editing window of a target nucleic acid for a nucleobase editor relative to the bystander editing of the target nucleic acid by the nucleobase editor using a gRNA lacking the 3'-nucleic acid extension of the agRNA.
[0122] In various aspects, the present disclosure provides compositions, methods, uses, and kits for base editing comprising an agRNA and an optionally engineered and/or evolved nucleobase editor disclosed herein.
agRNAs
[0123] In various aspects, the agRNAs are modified versions of a guide RNA. Guide RNAs may be naturally occurring, expressed from an encoding nucleic acid, or synthesized chemically. Methods are well known in the art for obtaining or otherwise synthesizing guide RNAs and for determining the appropriate sequence of the guide RNA, including the protospacer sequence which interacts and hybridizes with the target strand of a genomic target site of interest.
[0124] In various embodiments, the particular design aspects of a guide RNA sequence will depend upon the nucleotide sequence of a genomic target site of interest (i.e., the target site to be edited) and the type of napDNAbp (e.g., Cas9 protein) present in the base editing methods described herein, among other factors, such as PAM sequence locations, percent G/C content in the target sequence, the degree of microhomology regions, secondary structures, etc.
[0125] In general, a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a napDNAbp (e.g., a Cas9, Cas9 homolog, Cas9 nickase, dead Cas9 domain, or Cas9 variant) to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length.
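As an illustration of scoring complementarity "when optimally aligned," the following is a minimal Python sketch of a textbook Needleman-Wunsch global alignment. The scoring scheme (match +1, mismatch -1, linear gap -1) and the helper names `needleman_wunsch` and `percent_complementarity` are illustrative assumptions. They are not parameters taken from this disclosure or from the aligners named above. Because a guide pairs antiparallel with its target strand, the sketch aligns the guide (with U read as T) against the reverse complement of the target strand.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment of a and b.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    Scoring values are illustrative defaults, not a standard.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def percent_complementarity(guide_rna, target_strand_dna):
    """Hypothetical helper: percent of alignment columns at which the guide
    (5'->3', RNA letters) pairs with the target strand (5'->3', DNA letters).
    The guide pairs antiparallel, so it is compared with the reverse complement."""
    guide = guide_rna.upper().replace("U", "T")
    rc_target = target_strand_dna.upper().translate(_DNA_COMPLEMENT)[::-1]
    _, aln_g, aln_t = needleman_wunsch(guide, rc_target)
    matches = sum(1 for x, y in zip(aln_g, aln_t) if x == y and x != "-")
    return 100.0 * matches / len(aln_g)
```

For a 20-nucleotide guide with a single mismatch against its target strand, the sketch reports 95% complementarity. Real aligners use substitution matrices and affine gap penalties, so their percentages can differ.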
[0126] In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. The ability of a guide sequence to direct sequence-specific binding of a base editor (BE) to a target sequence may be assessed by any suitable assay. Suitable assays will occur to those skilled in the art.
[0127] In all aspects, the agRNA comprises a gRNA and a 3'-nucleic acid extension (Figure 1B and Figure 5A). The gRNA comprises a spacer sequence and a scaffold sequence. In some embodiments, the 3'-nucleic acid extension is attached to the 3'-end of the gRNA. In some embodiments, the 3'-nucleic acid extension is attached to the 3'-end of the gRNA by a nucleotide linker. In some embodiments, the nucleotide linker ranges from 1-50 nucleotides in length. In various embodiments, the agRNA is capable of binding to a napDNAbp by the scaffold sequence of the gRNA and directing the napDNAbp to a target nucleic acid sequence (e.g., genomic DNA). The target nucleic acid sequence comprises (i) a target strand and (ii) a complementary non-target strand. The target strand comprises a protospacer sequence that binds to the spacer sequence of the gRNA, forming an RNA-DNA hybrid. The non-target strand comprises the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor). The non-target strand binds to the 3'-nucleic acid extension of the agRNA. In some embodiments, the 3'-nucleic acid extension binds to the non-target strand using (a) an upstream binding sequence (UBS) that is complementary to the non-target strand and binds downstream of the target nucleic acid on the non-target strand and/or (b) a downstream binding sequence (DBS) that is complementary to the non-target strand and binds upstream of the target nucleic acid on the non-target strand (Figure 1B and Figure 5A). Because the extension pairs with the non-target strand in an antiparallel fashion, a sequence positioned toward the 5' end of the extension pairs with DNA lying 3' of (downstream of) the target nucleic acid, and a sequence positioned toward the 3' end of the extension pairs with DNA lying 5' of (upstream of) it.
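The antiparallel geometry described above, together with the definitions of "upstream" and "downstream" in [0109], can be made concrete with a short Python sketch. It derives a UBS and a DBS as RNA reverse complements of the DNA flanking a target nucleobase on the non-target strand. The function names are hypothetical, and the sketch assumes, for illustration only, that both binding sites directly abut the target nucleobase. The disclosure does not state that they must.

```python
# Pairing partner (RNA letter) for each DNA base.
DNA_TO_PAIRING_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def rna_reverse_complement(dna):
    """RNA (5'->3') that pairs antiparallel with the given DNA segment (5'->3')."""
    return "".join(DNA_TO_PAIRING_RNA[base] for base in reversed(dna.upper()))


def design_binding_sequences(non_target, target_idx, ubs_len, dbs_len):
    """Hypothetical helper: return (UBS, DBS) as RNA strings written 5'->3'.

    non_target: non-target strand, 5'->3', DNA letters.
    target_idx: 0-based index of the nucleobase to be edited.
    Assumption: both binding sites directly abut the target nucleobase.
    """
    if not 0 <= target_idx < len(non_target):
        raise ValueError("target_idx lies outside the non-target strand")
    if target_idx - dbs_len < 0 or target_idx + 1 + ubs_len > len(non_target):
        raise ValueError("a binding site would run past the end of the sequence")
    downstream = non_target[target_idx + 1 : target_idx + 1 + ubs_len]  # 3' of target
    upstream = non_target[target_idx - dbs_len : target_idx]            # 5' of target
    # Antiparallel pairing: the 5'-proximal UBS pairs with the downstream DNA,
    # and the 3'-proximal DBS pairs with the upstream DNA.
    return rna_reverse_complement(downstream), rna_reverse_complement(upstream)


# Toy non-target strand with the target adenine at index 4 (hypothetical).
ubs, dbs = design_binding_sequences("GGACAATCCG", target_idx=4, ubs_len=3, dbs_len=3)
cls = "UUCG"  # hypothetical placeholder counter-loop sequence
extension = ubs + cls + dbs  # 5'-[UBS]-[CLS]-[DBS]-3'
```

In this toy example, the UBS `GAU` pairs with the `ATC` immediately 3' of the target adenine, and the DBS `GUC` pairs with the `GAC` immediately 5' of it.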
[0128] The UBS and/or the DBS can be of any suitable length. In certain embodiments, the UBS and/or the DBS is 0 nucleotides in length. In certain embodiments, the UBS and/or DBS is at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, at least 41 nucleotides, at least 42 nucleotides, at least 43 nucleotides, at least 44 nucleotides, at least 45 nucleotides, at least 46 nucleotides, at least 47 nucleotides, at least 48 nucleotides, at least 49 nucleotides, or at least 50 nucleotides in length.
[0129] In some embodiments, the UBS and/or the DBS are at least 85% homologous, at least 90% homologous, at least 95% homologous, at least 97% homologous, at least 99% homologous, at least 99.7% homologous, or 100% homologous to the non-target strand. In some embodiments, the UBS and/or DBS comprises at least 1, at least 2, at least 3, at least 4, or at least 5 mismatches.
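To make the percentage and mismatch criteria above concrete, the following Python sketch counts the positions at which a UBS or DBS fails to pair with its intended DNA site. It reads the percentages as complementarity to that site, which is an interpretive assumption because the paragraph uses the word "homologous." It also assumes an ungapped duplex of equal length. The helper name `binding_mismatches` is hypothetical.

```python
DNA_TO_PAIRING_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def binding_mismatches(binding_rna, site_dna):
    """Hypothetical helper: (mismatch count, percent complementarity) of a
    UBS/DBS (RNA, 5'->3') against its DNA binding site (5'->3').

    Assumes an ungapped, equal-length duplex paired antiparallel.
    """
    if len(binding_rna) != len(site_dna):
        raise ValueError("binding sequence and site must have the same length")
    # The perfectly pairing RNA is the reverse complement of the DNA site.
    perfect = "".join(DNA_TO_PAIRING_RNA[b] for b in reversed(site_dna.upper()))
    mismatches = sum(1 for x, y in zip(binding_rna.upper(), perfect) if x != y)
    percent = 100.0 * (len(site_dna) - mismatches) / len(site_dna)
    return mismatches, percent
```

For example, a 10-nucleotide binding sequence carrying one deliberate mismatch returns `(1, 90.0)`. This value would meet an "at least 85%" criterion and contain "at least 1" mismatch.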
[0130] In some embodiments, the 3'-nucleic acid extension further comprises a counter-loop sequence (CLS). The CLS can be of any suitable length. In certain embodiments, the CLS is 0 nucleotides in length. In certain embodiments, the CLS is at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, at least 41 nucleotides, at least 42 nucleotides, at least 43 nucleotides, at least 44 nucleotides, at least 45 nucleotides, at least 46 nucleotides, at least 47 nucleotides, at least 48 nucleotides, at least 49 nucleotides, or at least 50 nucleotides in length.
[0131] In some embodiments, the CLS forms a secondary structural feature. In some embodiments, the CLS forms a hairpin. In some embodiments, the CLS is flanked by the UBS and the DBS.
[0132] In certain embodiments, the 3'-nucleic acid extension is attached to the 3'-end of the gRNA.
[0133] In certain embodiments, the 3'-nucleic acid extension further comprises a secondary structural element. In certain embodiments, the secondary structural element is a tevopreQ1 motif.
[0134] In certain embodiments, the 3'-nucleic acid extension comprises any structure selected from:
5'-[UBS]-[CLS]-[DBS]-3';
5'-[UBS]-[DBS]-3';
5'-[UBS]-[CLS]-3';
5'-[CLS]-[DBS]-3';
5'-[CLS]-3';
5'-[UBS]-3';
5'-[DBS]-3';
5'-[UBS]-[CLS]-[tevopreQ1 motif]-[DBS]-3'; and
5'-[UBS]-[CLS]-[DBS]-[tevopreQ1 motif]-3',
wherein each instance of “]-[” comprises an optional linker, e.g., a nucleotide linker.
[0135] In certain embodiments, the agRNA comprises any structure selected from:
5'-[gRNA]-[UBS]-[CLS]-[DBS]-3';
5'-[gRNA]-[UBS]-[DBS]-3';
5'-[gRNA]-[UBS]-[CLS]-3';
5'-[gRNA]-[CLS]-[DBS]-3';
5'-[gRNA]-[CLS]-3';
5'-[gRNA]-[UBS]-3';
5'-[gRNA]-[DBS]-3';
5'-[gRNA]-[UBS]-[CLS]-[tevopreQ1 motif]-[DBS]-3'; and
5'-[gRNA]-[UBS]-[CLS]-[DBS]-[tevopreQ1 motif]-3',
wherein each instance of “]-[” comprises an optional linker, e.g., a nucleotide linker.
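The agRNA layouts listed above can be expressed as ordered segment names and assembled programmatically. The following Python sketch is hypothetical: `ALLOWED_LAYOUTS` and `build_agrna` are illustrative names, the gRNA is represented by a placeholder string, and the tevopreQ1 sequence is not given here and must be supplied by the user. As a simplification, the sketch places the same optional linker at every "]-[" junction.

```python
# Layouts of the structure list above, written as ordered segment names (5' -> 3').
ALLOWED_LAYOUTS = {
    ("gRNA", "UBS", "CLS", "DBS"),
    ("gRNA", "UBS", "DBS"),
    ("gRNA", "UBS", "CLS"),
    ("gRNA", "CLS", "DBS"),
    ("gRNA", "CLS"),
    ("gRNA", "UBS"),
    ("gRNA", "DBS"),
    ("gRNA", "UBS", "CLS", "tevopreQ1", "DBS"),
    ("gRNA", "UBS", "CLS", "DBS", "tevopreQ1"),
}


def build_agrna(parts, layout, linker=""):
    """Hypothetical helper: concatenate named segments 5'->3' in an allowed layout.

    parts: dict mapping segment name -> sequence string (user supplied).
    linker: optional sequence placed at every ']-[' junction ("" = no linker).
    """
    layout = tuple(layout)
    if layout not in ALLOWED_LAYOUTS:
        raise ValueError(f"unsupported layout: {layout}")
    missing = [name for name in layout if name not in parts]
    if missing:
        raise KeyError(f"missing segments: {missing}")
    return linker.join(parts[name] for name in layout)
```

With `parts = {"gRNA": "NNNN", "UBS": "GAU", "CLS": "UUCG", "DBS": "GUC"}`, the first layout yields `NNNNGAUUUCGGUC`. Passing `linker="A"` inserts an `A` at each of the three junctions.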
[0136] In certain embodiments, the 3'-nucleic acid extension has a nucleotide sequence of SEQ ID NO: 2-11, or a nucleotide sequence having at least 80% sequence identity therewith. In certain embodiments, the 3'-nucleic acid extension comprises a sequence selected from the group consisting of:
CGCGCGTTCGCGCGG (SEQ ID NO: 2 - 56114 agRNA);
CACGCGCGTTCGCGCTGGCACCA (SEQ ID NO: 3 - 39979 agRNA);
CTGGCGCGTCGCGCTCTGG (SEQ ID NO: 4 - 61531 agRNA);
CCTGCGCGTCGCGCTTCTGGCACCA (SEQ ID NO: 5 - 60514 agRNA);
CTCGCGGCTTCGCGTGGCAC (SEQ ID NO: 6 - 62809 agRNA);
CACGCGGCTTCGCGGGCACCA (SEQ ID NO: 7 - 41197 agRNA);
ACCGCGCTTCGCGTGGCACCA (SEQ ID NO: 8 - 48214 agRNA);
CACCCCTCGCGTTCGCGTTCTGGCA (SEQ ID NO: 9 - 35622 agRNA);
CCCTGGCGCGTTCGCGCGGCAC (SEQ ID NO: 10 - 56984 agRNA); and
TGGCGCGGCTCGCGCTGGCACCA (SEQ ID NO: 11 - 63661 agRNA).
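A quick way to check whether a candidate extension falls within "at least 80% sequence identity" of one of the listed sequences is sketched below in Python. The sequences are copied as printed above. The comparison is a deliberate simplification: it is ungapped and only considers references of the same length as the candidate, whereas identity is normally computed from an alignment. The helper names are hypothetical.

```python
# 3'-extension sequences of SEQ ID NO: 2-11, as printed (DNA letters).
SEQ_IDS = {
    2: "CGCGCGTTCGCGCGG",
    3: "CACGCGCGTTCGCGCTGGCACCA",
    4: "CTGGCGCGTCGCGCTCTGG",
    5: "CCTGCGCGTCGCGCTTCTGGCACCA",
    6: "CTCGCGGCTTCGCGTGGCAC",
    7: "CACGCGGCTTCGCGGGCACCA",
    8: "ACCGCGCTTCGCGTGGCACCA",
    9: "CACCCCTCGCGTTCGCGTTCTGGCA",
    10: "CCCTGGCGCGTTCGCGCGGCAC",
    11: "TGGCGCGGCTCGCGCTGGCACCA",
}


def ungapped_identity(a, b):
    """Percent identity of two equal-length sequences compared position by
    position; U and T are treated as the same base."""
    a, b = a.upper().replace("U", "T"), b.upper().replace("U", "T")
    if len(a) != len(b):
        raise ValueError("ungapped identity needs equal-length sequences")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def matching_seq_ids(candidate, threshold=80.0):
    """Hypothetical helper: SEQ ID NOs of the same length as the candidate that
    share at least `threshold` percent ungapped identity with it."""
    cand = candidate.upper().replace("U", "T")
    return [sid for sid, ref in SEQ_IDS.items()
            if len(ref) == len(cand) and ungapped_identity(cand, ref) >= threshold]
```

A 15-nucleotide variant of SEQ ID NO: 2 with three substitutions sits exactly at 80% identity (12 of 15 positions) and is still reported. A fourth substitution drops it to about 73% and it is not. Candidates containing insertions or deletions would need a gapped alignment instead.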
[0137] In certain embodiments, the agRNA is capable of binding to a napDNAbp by the scaffold sequence of the gRNA and directing the napDNAbp to a target nucleic acid sequence (e.g., genomic DNA). The target nucleic acid sequence comprises (i) a target strand and (ii) a complementary non-target strand. The target strand comprises a protospacer sequence that binds to the spacer sequence of the gRNA, forming an RNA-DNA hybrid. The non-target strand comprises the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor). The non-target strand binds to the 3'-nucleic acid extension of the agRNA. In some embodiments, the 3'-nucleic acid extension binds to the non-target strand using (a) an upstream binding sequence (UBS) that is complementary to the non-target strand and binds downstream of the target nucleic acid on the non-target strand and/or (b) a downstream binding sequence (DBS) that is complementary to the non-target strand and binds upstream of the target nucleic acid on the non-target strand (Figure 1B and Figure 5A). In certain embodiments, the “perfect edit” is a single nucleotide substitution of the target nucleic acid within a target nucleic acid sequence. In certain embodiments, the target nucleic acid can be referred to as the “target site” of a nucleobase editor.
[0138] In certain embodiments, the target nucleic acid sequence comprises a target nucleic acid (also referred to as “the target nucleobase”), wherein the target nucleic acid falls within a double-stranded DNA molecule such as a gene, a transcriptional regulatory region, an intron splice site, an exonic splicing enhancer site, or a nucleosome binding site. In certain embodiments, the target nucleic acid falls within a gene (e.g., DNMT1), a gene that is associated with a disease or disorder, or a gene that is associated with a disease or disorder caused by pathogenic single nucleotide polymorphisms (SNPs). SNPs are the most common type of genetic variation and are associated with various complex human diseases and disorders, including, but not limited to, inflammatory disorders, autoimmune disorders, and cancers. Treatment of any disease or disorder caused by SNPs is contemplated by the methods described herein.
[0139] The present disclosure contemplates the use of an agRNA described herein for multiplex editing with a nucleobase editor described herein or elsewhere. In certain embodiments, the target nucleic acid sequence comprises one or more target nucleic acids (also referred to as “the target nucleobases”), wherein the one or more target nucleic acids fall within a double-stranded DNA molecule such as a gene, a transcriptional regulatory region, an intron splice site, an exonic splicing enhancer site, or a nucleosome binding site. In certain embodiments, the target nucleic acid falls within a gene (e.g., DNMT1), a gene that is associated with a disease or disorder, or a gene that is associated with a disease or disorder caused by pathogenic single nucleotide polymorphisms (SNPs). SNPs are the most common type of genetic variation and are associated with various complex human diseases and disorders, including inflammatory disorders, autoimmune disorders, and cancers. Treatment of any disease or disorder caused by SNPs is contemplated by the methods described herein.
[0140] In another aspect, the present disclosure provides an agRNA library comprising a plurality of agRNAs described herein. Each agRNA library comprises agRNAs that bind upstream (e.g., utilizing UBSs) and/or downstream (e.g., utilizing DBSs) of a specific target DNA strand that is later present in the active site of the nucleobase editor (Figure 1B and Figure 5A). Therefore, each agRNA library is specific to the intended target of the nucleobase editor.
[0141] In some embodiments, the agRNA library comprises between 2,000 and 75,000 different agRNAs. In some embodiments, the agRNA library comprises 2,000 different agRNAs. In some embodiments, the agRNA library comprises 10,000 different agRNAs. In some embodiments, the agRNA library comprises 20,000 different agRNAs. In some embodiments, the agRNA library comprises 40,000 different agRNAs. In some embodiments, the agRNA library comprises 55,000 different agRNAs. In some embodiments, the agRNA library comprises 60,000 different agRNAs. In some embodiments, the agRNA library comprises 75,000 different agRNAs.
[0142] In some embodiments, the agRNA library comprises a plurality of agRNAs with different 3'-nucleic acid extensions. In some embodiments, the agRNA library varies the 3'-nucleic acid extensions by length. In some embodiments, the agRNA library varies the 3'-nucleic acid extensions by UBS, DBS, or CLS (e.g., the UBS and DBS are kept constant, while the CLS is varied). In some embodiments, the agRNA library varies the 3'-nucleic acid extensions by structure such that the 3'-nucleic acid extensions comprise different combinations of UBSs, CLSs, and DBSs (e.g., an agRNA library comprising agRNAs with a CLS in the 3'-nucleic acid extension versus an agRNA library comprising agRNAs without a CLS in the 3'-nucleic acid extension), or further comprise other secondary structural elements (e.g., a tevopreQ1 motif). In some embodiments, the agRNA library is used to identify which agRNAs improve editing efficiency and/or reduce bystander editing of a nucleobase editor.
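A combinatorial library of the kind described above can be enumerated as the product of UBS, CLS, and DBS arrays. The following Python sketch shows one way to do this. The helper name and the toy arrays are hypothetical, and an empty string is used to represent an omitted element (for example, an extension with no CLS).

```python
from itertools import product


def enumerate_extensions(ubs_options, cls_options, dbs_options):
    """Hypothetical helper: yield (ubs, cls, dbs, extension) for each
    UBS x CLS x DBS combination, in 5'-[UBS]-[CLS]-[DBS]-3' order.

    An empty string stands for an omitted element (e.g., "" in cls_options
    gives extensions without a counter-loop). Empty or duplicate extension
    sequences are skipped so that each library member is unique.
    """
    seen = set()
    for ubs, cls_seq, dbs in product(ubs_options, cls_options, dbs_options):
        extension = ubs + cls_seq + dbs
        if not extension or extension in seen:
            continue
        seen.add(extension)
        yield ubs, cls_seq, dbs, extension


# Toy arrays (hypothetical): 3 UBSs x 2 CLS options (none, UUCG) x 4 DBSs.
library = list(enumerate_extensions(["GAU", "CAU", "UAU"],
                                    ["", "UUCG"],
                                    ["GUC", "GUA", "GUG", "GUU"]))
```

For scale, 40 UBSs × 40 CLSs × 40 DBSs would give 64,000 combinations, which lies within the 2,000 to 75,000 range recited in [0141]. This is arithmetic only and does not describe how the inventors composed their libraries.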
[0143] In some embodiments, the agRNA library consists of combinations of an array of upstream binding sequences (UBSs), counter-loop sequences (CLSs), and downstream binding sequences (DBSs) for a particular target nucleic acid sequence. In some embodiments, the UBSs and the DBSs bind to the target nucleic acid sequence (e.g., genomic DNA) surrounding the targeted edit (also referred to herein as the “perfect edit” or “target nucleic acid”). In certain embodiments, the target nucleic acid sequence comprises a target nucleic acid, wherein the target nucleic acid falls within a double-stranded DNA molecule such as a gene, a transcriptional regulatory region, an intron splice site, an exonic splicing enhancer site, or a nucleosome binding site. In certain embodiments, the target nucleic acid falls within a gene, a gene that is associated with a disease or disorder, or a gene that is associated with a disease or disorder caused by pathogenic single nucleotide polymorphisms (SNPs). SNPs are the most common type of genetic variation and are associated with various complex human diseases and disorders, including, but not limited to, inflammatory disorders, autoimmune disorders, and cancers.
[0144] To measure the effect of an agRNA on the editing efficiency and/or bystander editing of a nucleobase editor, the inventors describe an agRNA screening method. Accordingly, in some embodiments, the disclosure provides polynucleotides, vectors, and cells comprising an agRNA described herein for screening the editing pattern of each nucleobase editor combined with a particular agRNA.
[0145] In another aspect, the disclosure provides a polynucleotide encoding an agRNA described herein.
[0146] In another aspect, the present disclosure provides a vector comprising any polynucleotide described herein. In certain embodiments, the vector comprises a polynucleotide encoding an agRNA described herein. In certain embodiments, the polynucleotide can be under the control of a promoter. In certain embodiments, the polynucleotide can be under the control of multiple promoters. The promoter can be any promoter recognized by a skilled artisan (e.g., a constitutive promoter, a tissue-specific promoter, or an inducible promoter). The promoter can be a U6 promoter. The promoter can also be a U6, U6v4, U6v7, or U6v9 promoter or a fragment thereof.
[0147]
[0148] In certain embodiments, the vector further comprises a polynucleotide sequence comprising an agRNA described herein and a target nucleic acid sequence (e.g., a gene of interest) that includes the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor) (Figure 5A). In certain embodiments, the target nucleic acid sequence is located downstream of the agRNA sequence. In certain embodiments, the agRNA and the target nucleic acid sequence are within a 50-600-nucleotide window (e.g., a 100-nucleotide window, a 300-nucleotide window, a 450-nucleotide window, etc.).
[0149] In certain embodiments, the vector further comprises at least one primer binding site. In a preferred embodiment, the vector further comprises at least two primer binding sites. In certain embodiments, the vector comprising the one or more primer binding sites is subjected to next-generation sequencing (NGS) to sequence the agRNA and the target nucleic acid after the editing process in order to analyze the editing pattern (e.g., editing efficiency and bystander editing) of a nucleobase editor with a given agRNA.
[0150] In some embodiments, a first primer binding site is located upstream of or within the agRNA, while a second primer binding site is located downstream of the target nucleic acid sequence. In certain embodiments, the distance between the first and second primer binding sites is less than 300 nucleotides. In some embodiments, the distance between the first and second primer binding sites is less than 600, less than 500, less than 300, less than 200, less than 100, or less than 50 nucleotides.
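The construct layout rules in [0148] and [0150] can be checked with simple coordinate arithmetic, as in the following Python sketch. Coordinates are 0-based and half-open. The sketch assumes that the "distance between the first and second primer binding sites" means the outer span from the start of the first site to the end of the second (that is, the amplicon length), and that the agRNA-to-target window runs from the start of the agRNA to the end of the target sequence. Both are interpretive assumptions. The names `Feature` and `check_construct` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A feature on the screening construct, 0-based half-open [start, end)."""
    start: int
    end: int


def check_construct(primer1, agrna, target, primer2, max_amplicon=300, max_window=600):
    """Hypothetical check of the layout rules; returns (amplicon_span, ok).

    amplicon_span is measured from the start of the first primer site to the
    end of the second (an assumed reading of 'distance between primer sites').
    """
    ordered = (
        primer1.start < agrna.end          # primer 1 upstream of, or within, the agRNA
        and agrna.end <= target.start      # target sequence downstream of the agRNA
        and target.end <= primer2.start    # primer 2 downstream of the target sequence
    )
    window = target.end - agrna.start      # span covering agRNA plus target sequence
    span = primer2.end - primer1.start
    return span, ordered and window <= max_window and span < max_amplicon
```

For instance, primers at 0-20 and 230-250 flanking an agRNA at 10-140 and a target at 160-200 give a 250-nucleotide span, which passes the 300-nucleotide limit. Moving the second primer to 300-320 gives a 320-nucleotide span, which fails it.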
Selection of agRNAs
[0151] In another aspect, the present disclosure provides an agRNA screening library comprising a plurality of vectors described above and provided in this disclosure. In certain embodiments, next-generation sequencing (NGS) is used to sequence the plurality of vectors of the agRNA screening library to analyze the editing pattern for each library clone within the 300-nucleotide window. In some embodiments, the target sequence is the human DNMT1 gene.
[0152] In another aspect, the present disclosure provides a method of selecting agRNAs comprising the steps of:
(a) transfecting the agRNA screening library described herein into cells; and
(b) using next-generation sequencing (NGS) to select agRNA vectors having (i) reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or (ii) improved editing efficiency of a target nucleic acid by a nucleobase editor, wherein the reduced bystander editing and/or editing efficiency are measured relative to the gRNA lacking the 3'-nucleic acid extension of the agRNA or a second agRNA.
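Step (b) can be pictured as a comparison of per-agRNA read tallies against a reference, as in the following Python sketch. It assumes that reads have already been assigned to agRNA clones and scored upstream. It also assumes several choices made for illustration only: bystander editing is summarized as the mean frequency across bystander positions, a hypothetical `min_reads` coverage filter is applied, and a `require_both` flag switches between the "and" and "or" readings of "and/or". All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReadCounts:
    total: int                  # reads spanning the editing window for this clone
    target_edited: int          # reads carrying the intended edit
    bystander_edited: List[int] = field(default_factory=list)  # per bystander position


def editing_metrics(c: ReadCounts):
    """(editing efficiency, mean bystander editing frequency), both as fractions."""
    efficiency = c.target_edited / c.total
    if not c.bystander_edited:
        return efficiency, 0.0
    bystander = sum(c.bystander_edited) / (len(c.bystander_edited) * c.total)
    return efficiency, bystander


def select_agrnas(counts: Dict[str, ReadCounts], reference: str,
                  min_reads: int = 100, require_both: bool = False) -> List[str]:
    """Names of agRNAs that beat the reference on efficiency and/or bystander editing."""
    ref_eff, ref_byst = editing_metrics(counts[reference])
    selected = []
    for name, c in counts.items():
        if name == reference or c.total < min_reads:
            continue  # skip the reference itself and poorly covered clones
        eff, byst = editing_metrics(c)
        better_eff, lower_byst = eff > ref_eff, byst < ref_byst
        hit = (better_eff and lower_byst) if require_both else (better_eff or lower_byst)
        if hit:
            selected.append(name)
    return selected
```

Setting `require_both=True` keeps only agRNAs that improve both metrics at once. The default keeps agRNAs that improve either metric.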
[0153] In certain embodiments, the bystander editing of the nucleobase editor at one or more sites is reduced by at least 2%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, at least 99%, or at least 99.5% relative to a reference agRNA.
[0154] In certain embodiments, the editing efficiency of the nucleobase editor at one or more sites is increased by at least 2%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, at least 99%, or at least 99.5% relative to a reference agRNA.
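The bystander-editing reductions recited in [0153] could be computed as follows. This assumes that "reduced by X% relative to a reference agRNA" means a relative reduction rather than a drop in percentage points. In the formula below, b_ref and b_agRNA are the bystander-editing frequencies observed with the reference agRNA and with the test agRNA, respectively.

```latex
\text{reduction}\,(\%) = 100 \times \frac{b_{\text{ref}} - b_{\text{agRNA}}}{b_{\text{ref}}}
```

For example, if bystander editing falls from 40% with the reference to 10% with the test agRNA, the relative reduction is 100 × (40 − 10) / 40 = 75%. The absolute drop in that case is 30 percentage points. The paragraph above does not state which reading is intended.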
Compositions and complexes
[0155] Some aspects of this disclosure provide compositions comprising any of the fusion proteins provided herein and an agRNA optionally bound to the napDNAbp of the fusion protein. Some aspects of this disclosure provide compositions comprising any of the fusion proteins provided herein and an agRNA optionally bound to a Cas9 domain (e.g., a dCas9, a nuclease active Cas9, or a Cas9 nickase) of the fusion protein.
[0156] Some aspects of this disclosure provide complexes comprising any of the fusion proteins provided herein and an agRNA bound to the napDNAbp of the fusion protein. Some aspects of this disclosure provide complexes comprising any of the fusion proteins provided herein and an agRNA bound to a Cas9 domain (e.g., a dCas9, a nuclease active Cas9, or a Cas9 nickase) of the fusion protein.
[0157] In another aspect, the present disclosure describes a composition comprising (a) an agRNA and (b) a nucleobase editor (e.g., ABEx1, ABEx2, ABEx3, or ABEx4) to carry out nucleobase editing. In certain embodiments, the composition further comprises (c) a target nucleic acid. In certain embodiments, the nucleobase editor is a fusion protein capable of base editing. In some embodiments, the fusion protein comprises a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp).
[0158] In some embodiments, the composition comprises (a) an agRNA, (b) an N-terminal portion of a split nucleobase editor fused at its C-terminus to an intein-N, and (c) a C-terminal portion of a split nucleobase editor fused at its N-terminus to an intein-C, such that the N-terminal portion of the split nucleobase editor and the C-terminal portion of the split nucleobase editor are joined to form a fusion protein of a deaminase and a napDNAbp. In certain embodiments, the composition further comprises (d) a target nucleic acid.
[0159] In another aspect, the present disclosure describes a complex comprising any of the agRNAs described herein and a nucleobase editor described herein or elsewhere. In another aspect, the present disclosure describes one or more polynucleotides encoding a complex of an agRNA and a nucleobase editor described herein or elsewhere.
[0160] In another aspect, the present disclosure describes one or more vectors comprising one or more polynucleotides encoding a complex of an agRNA and a nucleobase editor. In certain embodiments, the vector includes one or more promoters that drive the expression of the agRNA and the nucleobase editor or split nucleobase editor of the complex.
napDNAbps
[0161] The nucleobase editors described herein may comprise a nucleic acid programmable DNA binding protein (napDNAbp).
[0162] In one aspect, a napDNAbp can be associated with or complexed with at least one guide nucleic acid (e.g., a guide RNA or an agRNA), which localizes the napDNAbp to a DNA sequence that comprises a DNA strand (i.e., a target nucleic acid sequence) that is complementary to the guide nucleic acid, or a portion thereof (e.g., the spacer of a guide RNA, which anneals to the protospacer of the DNA target). In other words, the guide nucleic acid “programs” the napDNAbp (e.g., Cas9 or equivalent) to localize to and bind the complementary protospacer sequence in the DNA.
[0163] Any suitable napDNAbp may be used in the nucleobase editors described herein. In various embodiments, the napDNAbp may be any Class 2 CRISPR-Cas system, including any type II, type V, or type VI CRISPR-Cas enzyme.
[0164] In certain embodiments, the napDNAbp is selected from the group consisting of Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, Sp-Cas9, SpRY, SpG-Cas9, NG-Cas9, NRRH-Cas9, spCas9, geoCas9, saCas9, Nme2Cas9, Cas12(a-i), Cas14, Argonaute, and variants thereof. In certain embodiments, the nucleic acid programmable DNA binding proteins include, without limitation, Cas9 (e.g., dCas9 and nCas9), Cas12e (CasX), Cas12d (CasY), Cas12a (Cpf1), Cas12b1 (C2c1), and Cas12c (C2c3).
[0165] The above description of various napDNAbps which can be used in connection with the presently disclosed nucleobase editors is not meant to be limiting in any way. The nucleobase editors may comprise the canonical SpCas9, or any ortholog Cas9 protein, or any variant Cas9 protein (including any naturally occurring variant, mutant, or otherwise engineered version of Cas9) that is known or which can be made or evolved through a directed evolutionary or otherwise mutagenic process. In various embodiments, the Cas9 or Cas9 variants have a nickase activity, i.e., only cleave one strand of the target DNA sequence. In other embodiments, the Cas9 or Cas9 variants have inactive nucleases, i.e., are “dead” Cas9 proteins. Other variant Cas9 proteins that may be used are those having a smaller molecular weight than the canonical SpCas9 (e.g., for easier delivery) or having modified or rearranged primary amino acid structure (e.g., the circular permutant formats). The nucleobase editors described herein may also comprise Cas9 equivalents, including Cas12a/Cpf1 and Cas12b proteins, which are the result of convergent evolution. The napDNAbps used herein (e.g., SpCas9, Cas9 variant, or Cas9 equivalents) may also contain various modifications that alter/enhance their PAM specificities (e.g., SpRY). Lastly, the application contemplates any Cas9, Cas9 variant, or Cas9 equivalent which has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence, such as a reference canonical SpCas9 sequence or a reference Cas9 equivalent (e.g., Cas12a/Cpf1).
[0166] In various embodiments, the nucleobase editors described herein may be delivered to cells as two or more fragments that are assembled inside the cell into a reconstituted nucleobase editor, either passively or actively (e.g., using split intein sequences). In some cases, the self-assembly is passive, whereby the two or more nucleobase editor fragments associate covalently or non-covalently inside the cell to reconstitute the nucleobase editor. In other cases, the self-assembly may be catalyzed by dimerization domains installed on each of the fragments. In still other cases, the self-assembly may be catalyzed by split intein sequences installed on each of the nucleobase editor fragments. The use of split nucleobase editors analogous to those described herein is further described in, for example, International Patent Application Publication No. WO 2017/197238, published November 16, 2017, which is incorporated herein by reference.
[0167] In one embodiment, the nucleobase editor (BE) is divided at a split site within the napDNAbp.
Cytidine base editors
[0168] Fusion proteins useful for the methods disclosed herein include cytidine base editors (CBEs), in which the deaminase domain is a cytidine deaminase. In some embodiments, the deaminase domain is an apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like 1 (APOBEC1) deaminase domain. In some embodiments, a rat APOBEC1 (rAPOBEC1) is used. In other embodiments, a human APOBEC1 is used. Other cytidine deaminases include APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, and APOBEC3H deaminases, activation-induced deaminase (AID), cytidine deaminase 1 from Petromyzon marinus (pmCDA1), ACF1/ASE deaminase, CBE6, CGBE, TadCBE, and variants thereof. Reference is made to U.S. Patent Publication No. 2017/0121693, published May 4, 2017, which issued as U.S. Pat. No. 10,167,457 on Jan. 1, 2019, U.S. Patent Publication No. 2015/0166980, published Jun. 18, 2015, and U.S. Pat. No. 9,840,699, issued Dec. 12, 2017, each of which is incorporated herein by reference.
[0169] The cytidine base editors utilized in the disclosed methods may further comprise an inhibitor of base excision repair ("iBER") domain. In particular embodiments, the iBER domain comprises a uracil glycosylase inhibitor (UGI) domain, which prevents a U:G mismatch (or G:T mismatch) from being repaired back to the original C:G (or A:T) base pair. In other embodiments, the iBER domain comprises a catalytically inactive inosine-specific nuclease domain. A UGI domain comprises an amino acid sequence that is at least 80%, 85%, 90%, 95%, 98%, 99%, 99.5%, or 99.9% identical to the amino acid sequence:
MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML (SEQ ID NO: 12).
[0170] Configurations of the cytidine base editors utilized in the methods disclosed herein may comprise a dCas9 domain and, optionally, a UGI domain, in fusion proteins having the general structure NH2-[dCas9]-[cytidine deaminase domain]-COOH, NH2-[cytidine deaminase domain]-[dCas9]-COOH, NH2-[dCas9]-[cytidine deaminase domain]-[uracil glycosylase inhibitor]-COOH, or NH2-[cytidine deaminase domain]-[dCas9]-[uracil glycosylase inhibitor]-COOH, wherein each instance of "]-[" comprises an optional linker, e.g., a peptide linker.
[0171] Configurations of the cytidine base editors utilized in the methods disclosed herein may comprise an nCas9 domain and, optionally, a UGI domain, in fusion proteins having the general structure NH2-[nCas9]-[cytidine deaminase domain]-COOH, NH2-[cytidine deaminase domain]-[nCas9]-COOH, NH2-[nCas9]-[cytidine deaminase domain]-[uracil glycosylase inhibitor]-COOH, or NH2-[cytidine deaminase domain]-[nCas9]-[uracil glycosylase inhibitor]-COOH, wherein each instance of "]-[" comprises an optional linker, e.g., a peptide linker.
[0172] The cytidine base editors (CBEs) utilized in the disclosed methods may further comprise one, two, or more than two nuclear localization sequences (NLSs). Configurations of such base editors (having a dCas9 domain) may comprise fusion proteins having the structure NH2-[dCas9]-[cytidine deaminase domain]-[NLS]-COOH, NH2-[dCas9]-[cytidine deaminase domain]-[NLS]-[NLS]-COOH, NH2-[cytidine deaminase domain]-[dCas9]-[NLS]-COOH, NH2-[cytidine deaminase domain]-[dCas9]-[NLS]-[NLS]-COOH, NH2-[dCas9]-[cytidine deaminase domain]-[uracil glycosylase inhibitor]-[NLS]-COOH, NH2-[dCas9]-[cytidine deaminase domain]-[uracil glycosylase inhibitor]-[NLS]-[NLS]-COOH, NH2-[cytidine deaminase domain]-[dCas9]-[uracil glycosylase inhibitor]-[NLS]-COOH, NH2-[cytidine deaminase domain]-[dCas9]-[uracil glycosylase inhibitor]-[NLS]-[NLS]-COOH, NH2-[NLS]-[dCas9]-[cytidine deaminase domain]-COOH, NH2-[NLS]-[dCas9]-[NLS]-[cytidine deaminase domain]-COOH, NH2-[NLS]-[NLS]-[dCas9]-[cytidine deaminase domain]-COOH, NH2-[NLS]-[cytidine deaminase domain]-[dCas9]-COOH, NH2-[NLS]-[NLS]-[cytidine deaminase domain]-[dCas9]-COOH, or NH2-[NLS]-[cytidine deaminase domain]-[NLS]-[dCas9]-COOH, wherein each instance of "]-[" comprises an optional linker, e.g., a peptide linker.
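The bracket notation used in paragraphs [0170] to [0172] can be produced mechanically from an ordered list of domains. The sketch below is a hypothetical helper, not part of the disclosure (the function names `architecture` and `linker_junctions` are illustrative). It renders an N-to-C domain order in that notation and lists each "]-[" junction where an optional linker may be placed.

```python
from typing import List, Sequence, Tuple


def architecture(domains: Sequence[str]) -> str:
    """Render an N-to-C domain order as NH2-[...]-[...]-COOH."""
    if not domains:
        raise ValueError("a fusion protein needs at least one domain")
    return "NH2-" + "-".join(f"[{name}]" for name in domains) + "-COOH"


def linker_junctions(domains: Sequence[str]) -> List[Tuple[str, str]]:
    """Each adjacent pair is a "]-[" junction that may carry an optional linker."""
    return list(zip(domains[:-1], domains[1:]))


if __name__ == "__main__":
    # One of the dCas9-based CBE layouts listed in paragraph [0172].
    cbe = ["cytidine deaminase domain", "dCas9", "uracil glycosylase inhibitor", "NLS", "NLS"]
    print(architecture(cbe))
    for left, right in linker_junctions(cbe):
        print(f"optional linker between {left} and {right}")
```

Keeping the domain order as data rather than as hand-written strings makes it straightforward to generate or check the longer configuration lists consistently.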
Adenosine base editors
[0173] Fusion proteins useful for the methods disclosed herein include adenine base editors (ABEs), in which the deaminase domain is an adenosine deaminase. In various embodiments, the adenosine deaminase domain comprises the amino acid sequence of any one of SEQ ID NOs: 1 and 13-18.
[0174] In some embodiments, the adenosine deaminase is derived from a bacterium, such as E. coli, S. aureus, S. typhi, S. putrefaciens, H. influenzae, or C. crescentus. In some embodiments, the adenosine deaminase is a TadA deaminase. In some embodiments, the TadA deaminase is an E. coli TadA deaminase (ecTadA). In some embodiments, the TadA deaminase is a truncated E. coli TadA deaminase. For example, the truncated ecTadA may be missing one or more N-terminal amino acids relative to a full-length ecTadA. In some embodiments, the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 N-terminal amino acid residues relative to the full-length ecTadA. In some embodiments, the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 C-terminal amino acid residues relative to the full-length ecTadA. In some embodiments, the ecTadA deaminase does not comprise an N-terminal methionine. Reference is made to U.S. Patent Publication No. 2018/0073012, published Mar. 15, 2018, which issued as U.S. Pat. No. 10,113,163 on Oct. 30, 2018, and U.S. Patent Publication No. 2015/0166980, published Jun. 18, 2015, each of which is incorporated herein by reference.
[0175] In some embodiments, the adenosine deaminase is an N-terminal truncated E. coli TadA (ecTadA). In certain embodiments, the adenosine deaminase comprises a sequence that has at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the following amino acid sequence:
MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 13).
[0176] In some embodiments, the adenosine deaminase is a full-length E. coli TadA (“ecTadA(wt)”) deaminase. For example, in certain embodiments, the adenosine deaminase comprises a sequence that has at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the following amino acid sequence:
MRRAFITGVFFLSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 14).
[0177] In some embodiments, the adenosine deaminase comprises a D108N mutation in SEQ ID NO: 13, or a corresponding mutation in a homologous or orthologous adenosine deaminase. In other embodiments, the adenosine deaminase further comprises an A106V mutation in SEQ ID NO: 13, or a corresponding mutation in a homologous or orthologous adenosine deaminase. Exemplary adenine base editors disclosed herein, such as ecTadA(D108N)-XTEN-dCas9, catalyze adenine deamination reactions in eukaryotic cells (e.g., HEK293T mammalian cells). In certain examples, the fusion proteins disclosed herein have the general structure ecTadA*-XTEN-dCas9 (e.g., “ecTadA*(7.10)”), where ecTadA* represents an ecTadA variant comprising, among other substitutions, the A106V and D108N mutations relative to the amino acid sequence of SEQ ID NO: 13. For example, in some embodiments, the adenosine deaminase comprises the amino acid sequence:
MSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMPRQVFNAQKKAQSSTD (SEQ ID NO: 1).
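Paragraphs [0175] to [0177] relate the TadA sequences by percent identity and by named substitutions. The sketch below is a hypothetical helper, not part of the disclosure, that compares two equal-length sequences position by position. It assumes no gaps, which holds here because SEQ ID NOs: 13 and 1 are both 167 residues with shared numbering. Formal identity calculations normally use a gapped alignment tool such as BLAST or EMBOSS Needle. With the sequences transcribed as below, it should recover the 14 substitutions (including A106V and D108N) that distinguish SEQ ID NO: 1 from SEQ ID NO: 13, or roughly 91.6% identity.

```python
from typing import List

# Transcribed from SEQ ID NO: 13 (truncated wild-type ecTadA)
# and SEQ ID NO: 1 (ecTadA*(7.10)) above.
SEQ_ID_NO_13 = (
    "MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPT"
    "AHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKT"
    "GAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD"
)
SEQ_ID_NO_1 = (
    "MSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPT"
    "AHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKT"
    "GAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMPRQVFNAQKKAQSSTD"
)


def differences(reference: str, variant: str) -> List[str]:
    """List substitutions as <reference residue><1-based position><variant residue>."""
    if len(reference) != len(variant):
        raise ValueError("an ungapped comparison needs equal-length sequences")
    return [
        f"{ref}{pos}{var}"
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


def percent_identity(reference: str, variant: str) -> float:
    """Identical positions divided by sequence length, as a percentage."""
    mismatches = len(differences(reference, variant))
    return 100.0 * (len(reference) - mismatches) / len(reference)


if __name__ == "__main__":
    print(differences(SEQ_ID_NO_13, SEQ_ID_NO_1))
    print(f"{percent_identity(SEQ_ID_NO_13, SEQ_ID_NO_1):.1f}% identical")
```

For sequences of different lengths, or where insertions and deletions matter, an alignment step would have to replace the position-by-position comparison.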
[0178] Configurations of the adenine base editors utilized in the methods disclosed herein may comprise a dCas9 domain and may comprise fusion proteins having the structure NH2-[dCas9]-[adenine deaminase domain]-COOH, NH2-[adenine deaminase domain]-[dCas9]-COOH, NH2-[dCas9]-[adenine deaminase domain]-[NLS]-COOH, NH2-[dCas9]-[adenine deaminase domain]-[NLS]-[NLS]-COOH, NH2-[adenine deaminase domain]-[dCas9]-[NLS]-COOH, NH2-[adenine deaminase domain]-[dCas9]-[NLS]-[NLS]-COOH, NH2-[NLS]-[dCas9]-[adenine deaminase domain]-COOH, NH2-[NLS]-[dCas9]-[NLS]-[adenine deaminase domain]-COOH, NH2-[NLS]-[NLS]-[dCas9]-[adenine deaminase domain]-COOH, NH2-[NLS]-[adenine deaminase domain]-[dCas9]-COOH, NH2-[NLS]-[NLS]-[adenine deaminase domain]-[dCas9]-COOH, or NH2-[NLS]-[adenine deaminase domain]-[NLS]-[dCas9]-COOH, wherein each instance of “]-[” comprises an optional linker, e.g., a peptide linker.
[0179] Configurations of the adenine base editors utilized in the methods disclosed herein may comprise an nCas9 domain and may comprise fusion proteins having the structure NH2-[nCas9]-[adenine deaminase domain]-COOH, NH2-[adenine deaminase domain]-[nCas9]-COOH, NH2-[nCas9]-[adenine deaminase domain]-[NLS]-COOH, NH2-[nCas9]-[adenine deaminase domain]-[NLS]-[NLS]-COOH, NH2-[adenine deaminase domain]-[nCas9]-[NLS]-COOH, NH2-[adenine deaminase domain]-[nCas9]-[NLS]-[NLS]-COOH, NH2-[NLS]-[nCas9]-[adenine deaminase domain]-COOH, NH2-[NLS]-[nCas9]-[NLS]-[adenine deaminase domain]-COOH, NH2-[NLS]-[NLS]-[nCas9]-[adenine deaminase domain]-COOH, NH2-[NLS]-[adenine deaminase domain]-[nCas9]-COOH, NH2-[NLS]-[NLS]-[adenine deaminase domain]-[nCas9]-COOH, or NH2-[NLS]-[adenine deaminase domain]-[NLS]-[nCas9]-COOH, wherein each instance of “]-[” comprises an optional linker, e.g., a peptide linker.
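Paragraphs [0178] and [0179] each list twelve layouts of a napDNAbp, an adenine deaminase domain, and up to two NLSs. As a hypothetical cross-check (not part of the disclosure), the sketch below enumerates both orders of the two core domains with zero, one, or two NLS copies placed at the N terminus, at the internal junction, or at the C terminus. Under those assumptions it yields 20 layouts per napDNAbp, a superset of the twelve spelled out in the text. For example, NH2-[nCas9]-[NLS]-[adenine deaminase domain]-COOH is generated but not listed. The helper names are illustrative.

```python
from itertools import combinations_with_replacement, permutations
from typing import List, Sequence


def architecture(domains: Sequence[str]) -> str:
    return "NH2-" + "-".join(f"[{name}]" for name in domains) + "-COOH"


def with_nls(core: Sequence[str], max_nls: int = 2) -> List[List[str]]:
    """Insert 0..max_nls NLS copies into the slots around an ordered core.

    Slot 0 is the N terminus, slot len(core) is the C terminus, and slot k
    (0 < k < len(core)) lies between core[k - 1] and core[k].
    """
    n_slots = len(core) + 1
    layouts = []
    for count in range(max_nls + 1):
        for chosen in combinations_with_replacement(range(n_slots), count):
            layout: List[str] = []
            for slot in range(n_slots):
                layout.extend(["NLS"] * chosen.count(slot))
                if slot < len(core):
                    layout.append(core[slot])
            layouts.append(layout)
    return layouts


def enumerate_layouts(napdnabp: str, deaminase: str, max_nls: int = 2) -> List[str]:
    """Both N-to-C orders of the two core domains, each with 0..max_nls NLSs."""
    return [
        architecture(layout)
        for core in permutations([napdnabp, deaminase])
        for layout in with_nls(core, max_nls)
    ]


if __name__ == "__main__":
    layouts = enumerate_layouts("nCas9", "adenine deaminase domain")
    print(len(layouts))  # 2 core orders x 10 NLS placements
    for text in layouts:
        print(text)
```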
[0180] In certain embodiments, the adenosine deaminase is selected from the group consisting of TadA-8e, ABE8e, AYBE, ABE9, and variants thereof. In certain embodiments, the deaminase is a TadA-8e adenosine deaminase. In certain embodiments, the deaminase is a TadA-8e adenosine deaminase variant.
[0181] In certain embodiments, the deaminase is an adenosine deaminase comprising an amino acid sequence that is at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of SEQ ID NO: 1. In certain embodiments, the adenosine deaminase has an amino acid sequence that includes one or more amino acid substitutions at positions 8, 25, 26, 27, 28, 29, 30, 31, 33, 34, 37, 38, 39, 41, 42, 43, 44, 45, 48, 49, 50, 54, 56, 58, 78, 79, 80, 82, 84, 85, 86, 88, 90, 91, 92, 93, 94, 95, 96, 97, 99, 100, 101, 102, 106, 107, 109, 111, 123, 146, 149, 151, 152, 155, 156, and 157 of SEQ ID NO: 1. In certain embodiments, the adenosine deaminase has an amino acid sequence that includes one or more amino acid substitutions at positions 28, 34, and 151 of SEQ ID NO: 1. In certain embodiments, the adenosine deaminase of SEQ ID NO: 1 comprises one or more of the amino acid substitutions H8D, R25E, R26G, E27R, V28C, P29C, V30W, G31E, V33C, V33T, I34W, N37T, N37H, N38I, R39E, I41S, G42A, E43R, G44A, W45G, A48P, I49S, G50A, D54T, A56P, A58S, A78R, T79H, L80P, V82R, F84I, F84L, E85R, P86A, V88R, C90V, A91R, G92R, A93R, M94H, I95D, H96P, S97L, I99D, G100R, R101P, V102R, V106A, R107E, S109P, S109L, R111K, R111T, R111A, Y123H, C146Q, R146K, Y149F, M151E, M151Q, P152R, P152Q, V155E, F156K, and N157K. In certain embodiments, the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitutions V28C, L34W, and M151E. In certain embodiments, the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitution V28C (ABEx1). In certain embodiments, the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitution L34W (ABEx2). In certain embodiments, the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitution M151E (ABEx3). In certain embodiments, the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitutions V28C and M151E (ABEx4).
[0182] In some embodiments, the adenosine deaminase comprises a V28C, L34W, and/or M151E mutation in SEQ ID NO: 1, or a corresponding mutation in a homologous or orthologous adenosine deaminase. In some embodiments, the adenosine deaminase comprises a V28C mutation in SEQ ID NO: 1, or a corresponding mutation in a homologous or orthologous adenosine deaminase. In some embodiments, the adenosine deaminase comprises an M151E mutation in SEQ ID NO: 1, or a corresponding mutation in a homologous or orthologous adenosine deaminase. Methods for identifying adenosine deaminases homologous or orthologous to SEQ ID NO: 1 would be apparent to the skilled artisan.
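Paragraphs [0181] and [0182] name variants by substitutions such as V28C, L34W, and M151E. Each is read as the original residue, its 1-based position in SEQ ID NO: 1, and the new residue. The sketch below assumes that notation and numbering. It applies such substitutions and checks that the stated original residue is actually present at each position. The function name and the checking policy are illustrative assumptions, not part of the disclosure.

```python
import re
from typing import Iterable

SEQ_ID_NO_1 = (
    "MSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPT"
    "AHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKT"
    "GAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMPRQVFNAQKKAQSSTD"
)

# <original residue><1-based position><new residue>, e.g. "V28C"
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(sequence: str, substitutions: Iterable[str]) -> str:
    """Apply point substitutions, checking each original residue first.

    Substitutions are applied in order, so a later entry at the same
    position is checked against the residue installed by an earlier one.
    """
    residues = list(sequence)
    for sub in substitutions:
        match = _SUBSTITUTION.fullmatch(sub)
        if match is None:
            raise ValueError(f"cannot parse substitution {sub!r}")
        original, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{sub}: position outside 1..{len(residues)}")
        found = residues[position - 1]
        if found != original:
            raise ValueError(f"{sub}: expected {original} at {position}, found {found}")
        residues[position - 1] = new
    return "".join(residues)


if __name__ == "__main__":
    variant = apply_substitutions(SEQ_ID_NO_1, ["V28C", "L34W", "M151E"])
    print(variant)
```

Raising an error on a residue mismatch, instead of overwriting silently, is a deliberate choice. It catches numbering offsets, for example when a fused NLS or a missing N-terminal methionine shifts positions.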
[0183] Some aspects of the disclosure provide nucleobase editor fusion proteins comprising a TadA-8e deaminase variant and a napDNAbp. Exemplary TadA-8e deaminase variants include, without limitation, the following:
[0184] ABEx1 (V28C mutation is bolded and underlined, BPNLS sequence is underlined)
MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDERECPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSIN (SEQ ID NO: 15).
[0185] ABEx2 (L34W mutation is bolded and underlined, BPNLS sequence is underlined)
MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVWVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSIN (SEQ ID NO: 16).
[0186] ABEx3 (M151E mutation is bolded and underlined, BPNLS sequence is underlined)
MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVEGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDEYREPRQVFNAQKKAQSSIN (SEQ ID NO: 17).
[0187] ABEx4 (V28C/M151E mutations are bolded and underlined, BPNLS sequence is underlined)
MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDERECPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYREPRQVFNAQKKAQSSIN (SEQ ID NO: 18).
Additional nucleobase editor elements
[0188] In various embodiments, the fusion proteins disclosed herein further comprise one or more, preferably at least two, nuclear localization signals (NLSs). In certain embodiments, the fusion proteins comprise at least two NLSs. In embodiments with at least two NLSs, the NLSs can be the same or different. In addition, the NLSs may be expressed as part of a fusion protein with the remaining portions of the fusion protein. The NLS fusion can be located at the N-terminus, at the C-terminus, or within the sequence of a fusion protein (e.g., inserted between the encoded Cas9 and a DNA effector moiety, such as a deaminase).
[0189] The NLSs may be any NLS sequence known in the art. The NLSs may also be any NLS discovered in the future. The NLSs also may be any naturally occurring NLS or any non-naturally occurring NLS (e.g., an NLS with one or more desired mutations).
[0190] A nuclear localization signal or sequence (NLS) is an amino acid sequence that tags, designates, or otherwise marks a protein for import into the cell nucleus by nuclear transport. Typically, this signal consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface. Different nuclear localized proteins may share the same NLS. An NLS has the opposite function of a nuclear export signal (NES), which targets proteins out of the nucleus. A nuclear localization signal can also target the exterior surface of a cell. Thus, a single nuclear localization signal can direct the entity with which it is associated to the exterior of a cell and to the nucleus of a cell. Such sequences can be of any size and composition, for example more than 25, 25, 15, 12, 10, 8, 7, 6, 5 or 4 amino acids, but will preferably comprise at least a four to eight amino acid sequence known to function as a nuclear localization signal (NLS).
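Paragraph [0190] describes NLSs as short stretches of positively charged lysines or arginines. The sketch below illustrates that idea with two simplified patterns. The first is a commonly cited monopartite consensus, K-(K/R)-X-(K/R). The second is a bipartite pattern: two basic residues, a 10 to 12 residue spacer, then three consecutive basic residues. Both patterns, the helper name `scan`, and the simplifications are assumptions for illustration only. This is not a validated NLS predictor and is not part of the disclosure. The usage example scans the N-terminal BPNLS segment of SEQ ID NO: 15.

```python
import re
from typing import List, Tuple

# Simplified motifs (illustrative assumptions, not a validated predictor).
# Zero-width lookaheads let finditer report overlapping hits.
MONOPARTITE = re.compile(r"(?=(K[KR].[KR]))")
BIPARTITE = re.compile(r"(?=([KR]{2}.{10,12}[KR]{3}))")


def scan(protein: str, pattern: re.Pattern) -> List[Tuple[int, str]]:
    """Return (1-based start, matched motif) for every, possibly overlapping, hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein)]


if __name__ == "__main__":
    # N-terminal BPNLS segment of SEQ ID NO: 15 (ABEx1).
    segment = "MKRTADGSEFESPKKKRKV"
    print("monopartite:", scan(segment, MONOPARTITE))
    print("bipartite:", scan(segment, BIPARTITE))
```

Because the bipartite spacer is quantified greedily, each bipartite hit reports the longest spacer that still fits the pattern from that start position.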
[0191] The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport.
Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., international PCT application, PCT/EP2000/011690, filed Nov. 23, 2000, published as WO 2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences.
[0192] In certain embodiments, linkers may be used to link any of the peptides or peptide domains of the disclosure. As defined above, the term "linker," as used herein, refers to a chemical group or a molecule linking two molecules or moieties, e.g., a binding domain and a cleavage domain of a nuclease. In some embodiments, a linker joins a dCas9 and a deaminase domain (e.g., a cytidine or adenosine deaminase). Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and is connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
[0193] In some embodiments, the linker is a peptide linker, such as the 16-amino-acid XTEN linker.
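Paragraph [0193] names a 16-amino-acid XTEN linker without giving its sequence. The sketch below assumes the 16-residue XTEN sequence SGSETPGTSESATPES used in published base editors such as BE3. That sequence, the helper names, and the truncated placeholder domain fragments are all assumptions for illustration. The sketch joins domains N-to-C with the linker and reports where each domain lands in the fused protein. This is useful when mapping a substitution such as V28C onto a full fusion sequence.

```python
from typing import List, Sequence, Tuple

# Assumption: the 16-residue XTEN linker used in published BE3/ABE constructs.
XTEN16 = "SGSETPGTSESATPES"


def assemble_fusion(domains: Sequence[Tuple[str, str]], linker: str = XTEN16) -> str:
    """Join (name, sequence) domains N-to-C with `linker` at every junction."""
    if not domains:
        raise ValueError("need at least one domain")
    return linker.join(seq for _, seq in domains)


def domain_spans(domains: Sequence[Tuple[str, str]],
                 linker: str = XTEN16) -> List[Tuple[str, int, int]]:
    """1-based inclusive (name, start, end) of each domain inside the fusion."""
    spans = []
    start = 1
    for name, seq in domains:
        end = start + len(seq) - 1
        spans.append((name, start, end))
        start = end + 1 + len(linker)  # skip over the linker to the next domain
    return spans


if __name__ == "__main__":
    # Hypothetical, truncated placeholder fragments rather than full domains.
    parts = [("deaminase", "MSEVEF"), ("napDNAbp", "MDKKYS")]
    fusion = assemble_fusion(parts)
    print(fusion, len(fusion))
    print(domain_spans(parts))
```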
[0194] In some embodiments, the fusion protein described herein may comprise one or more heterologous protein domains, e.g., epitope tags and reporter gene sequences. In some embodiments, the heterologous protein domain comprises a reporter sequence comprising a p2A-GFP insert (Addgene plasmid #65562; RRID:Addgene_65562; see Li J, et al., Intron targeting-mediated and endogenous gene integrity-maintaining knockin in zebrafish using the CRISPR/Cas9 system. Cell Res. (2015)). Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporter genes include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP). A fusion protein may be fused to a gene sequence encoding a protein or a fragment of a protein that binds DNA molecules or other cellular molecules, including, but not limited to, maltose binding protein (MBP), S-tag, LexA DNA binding domain (DBD) fusions, GAL4 DNA binding domain fusions, and herpes simplex virus (HSV) VP16 protein fusions. Additional domains that may form part of a fusion protein are described in US Patent Publication No. 2011/0059502, published Mar. 10, 2011, and incorporated herein by reference in its entirety.
Delivery of base editors
[0195] In some aspects, the invention provides methods comprising delivering to a host cell one or more nucleobase editor-encoding polynucleotides, such as one or more vectors as described herein encoding one or more components of the base editing system described herein, one or more transcripts thereof, and/or one or more proteins expressed therefrom. In some aspects, the invention further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells. In some embodiments, a nucleobase editor as described herein in combination with (and optionally complexed with) an anchor guide sequence is delivered to a cell. Conventional viral and non-viral gene transfer methods can be used to introduce nucleic acids into mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding components of a nucleobase editor to cells in culture or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g., a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. For a review of gene therapy procedures, see Anderson, Science 256:808-813 (1992); Nabel & Felgner, TIBTECH 11:211-217 (1993); Mitani & Caskey, TIBTECH 11:162-166 (1993); Dillon, TIBTECH 11:167-175 (1993); Miller, Nature 357:455-460 (1992); Van Brunt, Biotechnology 6(10):1149-1154 (1988); Vigne, Restorative Neurology and Neuroscience 8:35-36 (1995); Kremer & Perricaudet, British Medical Bulletin 51(1):31-44 (1995); Haddada et al., in Current Topics in Microbiology and Immunology, Doerfler and Böhm (eds) (1995); and Yu et al., Gene Therapy 1:13-26 (1994).
[0196] Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386, 4,946,787, and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424 and WO 91/16024. Delivery can be to cells (e.g., in vitro or ex vivo administration) or to target tissues (e.g., in vivo administration).
[0197] The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to one of skill in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).
[0198] The use of RNA or DNA viral-based systems for the delivery of nucleic acids takes advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus. Viral vectors can be administered directly to patients (in vivo), or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo). Conventional viral-based systems include retroviral, lentiviral, adenoviral, adeno-associated viral, and herpes simplex viral vectors for gene transfer. Integration into the host genome is possible with the retroviral, lentiviral, and adeno-associated viral gene transfer methods, often resulting in long-term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.
Cells
[0199] In another aspect, the disclosure provides cells (e.g., transformed cell lines) that comprise the agRNA described herein. The cells can also comprise the nucleobase editing complexes described herein (e.g., wherein the cell comprises both an agRNA and a nucleobase editor). The cells can also comprise any of the polynucleotides described above, which express the agRNA, and optionally which express the nucleobase editors. In addition, the cells can comprise any of the vectors described above, which express the agRNA, and optionally which express the nucleobase editors.
[0200] In another aspect, the disclosure describes a method of selecting agRNAs, wherein the method involves transfecting the agRNA screening libraries described above and elsewhere in this disclosure into host cells and using next-generation sequencing (NGS) to select agRNA vectors that exhibit reduced bystander editing within the editing window of a target nucleic acid for a nucleobase editor and/or improved editing efficiency of a target nucleic acid by a nucleobase editor. In certain embodiments, the reduced bystander editing and/or editing efficiency of a nucleobase editor is measured relative to the gRNA lacking the 3'-nucleic acid extension of the agRNA, or relative to a second agRNA.
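Paragraph [0200] selects agRNAs by an NGS readout of editing efficiency and bystander editing. One plausible way to summarize such reads is sketched below. The function name, the assumption that reads are already aligned and ungapped over the editing window, and the A-to-G (ABE) default are illustrative assumptions, not the disclosed analysis pipeline. The toy reads in the usage example are synthetic and exist only to exercise the code.

```python
from typing import Dict, Sequence


def editing_metrics(reference: str, reads: Sequence[str], target: int,
                    ref_base: str = "A", edited_base: str = "G") -> Dict[str, object]:
    """Summarize target and bystander conversion across aligned, ungapped reads.

    `target` is the 0-based index of the intended edit within `reference`;
    every other occurrence of `ref_base` in `reference` is a bystander site.
    """
    if reference[target] != ref_base:
        raise ValueError("target index does not hold the reference base")
    if not reads or any(len(read) != len(reference) for read in reads):
        raise ValueError("reads must be non-empty and the same length as the reference")
    n = len(reads)
    sites = [i for i, base in enumerate(reference) if base == ref_base]
    bystanders = [i for i in sites if i != target]

    def frequency(i: int) -> float:
        return sum(read[i] == edited_base for read in reads) / n

    clean = sum(
        read[target] == edited_base and all(read[i] != edited_base for i in bystanders)
        for read in reads
    ) / n
    return {
        "target": frequency(target),
        "bystander": {i: frequency(i) for i in bystanders},
        "clean_target": clean,  # edited at target with no bystander conversion
    }


if __name__ == "__main__":
    # Synthetic toy reads, not experimental data.
    reference = "GAACA"
    reads = ["GAACA", "GAGCA", "GAGCA", "GGGCA"]
    print(editing_metrics(reference, reads, target=2))
```

One possible comparison of the kind described runs the same function on reads from an agRNA and from the corresponding unextended gRNA, then compares the `bystander` and `clean_target` values. For a CBE, pass `ref_base="C"` and `edited_base="T"`.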
[0201] In some embodiments, a host cell is transiently or non-transiently transfected with one or more vectors described herein. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, the cell that is transfected is derived from cells taken from a non-human subject, such as a mammal (e.g., a primate (e.g., cynomolgus monkey or rhesus monkey), elephant, or mouse) or a bird. In some embodiments, the cell that is transfected is derived from cells taken from plants, fungi, bacteria, or archaea. In some embodiments, a cell that is transfected is taken from a subject. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial, BALB/3T3 mouse embryo fibroblast, 3T3 Swiss, 3T3-L1, 132-d5 human fetal fibroblasts, 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A172, A20, A253, A431, A-549, ALC, B16, B35, BCP-1 cells, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3,
C3H-10T1/2, C6/36, Cal-27, CHO, CHO-7, CHO-K1, CHO-K2, CHO-T, CHO Dhfr-/-, COR-L23, COR-L23/CPR, COR-L23/5010, COR-L23/R23, COS-7, COV-434, CML T1, CMT, CT26, D17, DH82, DU145, DuCaP, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, HEK-293, HeLa, Hepa1c1c7, HL-60, HMEC, HT-29, Jurkat, JY cells, K562 cells, Ku812, KCL22, KG1, KYO1, LNCap, Ma-Mel 1-48, MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468, MDA-MB-435, MDCK II, MOR/0.2R, MONO-MAC 6, MTD-1A, MyEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NALM-1, NW-145, OPCN/OPCT cell lines, Peer, PNT-1A/PNT 2, RenCa, RIN-5F, RMA/RMAS, Saos-2 cells, Sf-9, SkBr3, T2, T-47D, T84, THP1 cell line, U373, U87, U937, VCaP, Vero cells, WM39, WT-49, X63, YAC-1, YAR, and transgenic varieties thereof. Cell lines are available from a variety of sources known to those with skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)). In some embodiments, a cell transfected with one or more vectors described herein is used to establish a new cell line comprising one or more vector-derived sequences. In some embodiments, a cell transiently
transfected with the components of a CRISPR system as described herein (such as by transient transfection of one or more vectors, or transfection with RNA), and modified through the activity of a CRISPR complex, is used to establish a new cell line comprising cells containing the modification but lacking any other exogenous sequence. In some embodiments, cells transiently or non-transiently transfected with one or more vectors described herein, or cell lines derived from such cells, are used in assessing one or more test compounds.
Pharmaceutical compositions
[0202] In another aspect, the disclosure provides a pharmaceutical composition comprising: (i) an agRNA described above, or a nucleobase editing complex described above, a polynucleotide described above, or a vector described above, or any of the cells described above, and (ii) a pharmaceutically acceptable excipient.
[0203] Other aspects of the present disclosure relate to pharmaceutical compositions comprising any of the various components of the base editing system described herein (e.g., including, but not limited to, the napDNAbps, deaminases, fusion proteins (e.g., comprising napDNAbps and deaminases), pegRNAs, and complexes comprising fusion proteins and agRNAs), as well as accessory elements.
[0204] The term “pharmaceutical composition”, as used herein, refers to a composition formulated for pharmaceutical use. In some embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable carrier. In some embodiments, the pharmaceutical composition comprises additional agents (e.g. for specific delivery, increasing half-life, or other therapeutic compounds).
[0205] As used herein, the term “pharmaceutically-acceptable carrier” means a pharmaceutically-acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the compound from one site (e.g., the delivery site) of the body to another site (e.g., organ, tissue or portion of the body). A pharmaceutically acceptable carrier is “acceptable” in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, physiologic pH, etc.). Some examples of materials which can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose,
methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL and LDL; (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives and antioxidants can also be present in the formulation. The terms “excipient”, “carrier”, “pharmaceutically acceptable carrier” and the like are used interchangeably herein.
[0206] In some embodiments, the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing. Suitable routes of administering the pharmaceutical composition described herein include, without limitation: topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, and intracerebroventricular administration.
[0207] In some embodiments, the pharmaceutical composition described herein is administered locally to a diseased site (e.g., tumor site). In some embodiments, the pharmaceutical composition described herein is administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gelatinous material, including a membrane, such as a Silastic membrane, or a fiber.
[0208] In other embodiments, the pharmaceutical composition described herein is delivered in a controlled release system. In one embodiment, a pump may be used (see, e.g., Langer, 1990, Science 249:1527-1533; Sefton, 1989, CRC Crit. Rev. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; Saudek et al., 1989, N. Engl. J. Med. 321:574). In another embodiment, polymeric materials can be used (see, e.g., Medical Applications of Controlled Release (Langer and Wise eds., CRC Press, Boca Raton, Fla., 1974); Controlled Drug Bioavailability, Drug Product Design and Performance (Smolen and Ball eds., Wiley, New York, 1984); Ranger and Peppas, 1983, Macromol. Sci. Rev. Macromol. Chem. 23:61; see also Levy et al., 1985, Science 228:190; During et al., 1989, Ann. Neurol. 25:351; Howard et al., 1989, J. Neurosurg. 71:105). Other controlled release systems are discussed, for example, in Langer, supra.
[0209] In some embodiments, the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject, e.g., a human. In some embodiments, pharmaceutical compositions for administration by injection are solutions in sterile isotonic aqueous buffer. Where necessary, the pharmaceutical composition can also include a solubilizing agent and a local anesthetic such as lignocaine to ease pain at the site of the injection. Generally, the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water-free concentrate in a hermetically sealed container such as an ampoule or sachet indicating the quantity of active agent. Where the pharmaceutical composition is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline. Where the pharmaceutical composition is administered by injection, an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.
[0210] A pharmaceutical composition for systemic administration may be a liquid, e.g., sterile saline, lactated Ringer’s or Hank’s solution. In addition, the pharmaceutical composition can be in solid forms and re-dissolved or suspended immediately prior to use. Lyophilized forms are also contemplated.
[0211] The pharmaceutical composition can be contained within a lipid particle or vesicle, such as a liposome or microcrystal, which is also suitable for parenteral administration. The particles can be of any suitable structure, such as unilamellar or plurilamellar, so long as compositions are contained therein. Compounds can be entrapped in “stabilized plasmid-lipid particles” (SPLP) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE), low levels (5-10 mol%) of cationic lipid, and stabilized by a polyethyleneglycol (PEG) coating (Zhang Y. P. et al., Gene Ther. 1999, 6:1438-47). Positively charged lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methylsulfate, or “DOTAP,” are particularly preferred for such particles and vesicles. The preparation of such lipid particles is well known. See, e.g., U.S. Patent Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; and 4,921,757; each of which is incorporated herein by reference.
[0212] The pharmaceutical composition described herein may be administered or packaged as a unit dose, for example. The term “unit dose” when used in reference to a pharmaceutical composition of the present disclosure refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent, i.e., carrier or vehicle.
[0213] Further, the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing a compound of the invention in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection. The pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized compound of the invention. Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.
[0214] In another aspect, an article of manufacture containing materials useful for the treatment of the diseases described above is included. In some embodiments, the article of manufacture comprises a container and a label. Suitable containers include, for example, bottles, vials, syringes, and test tubes. The containers may be formed from a variety of materials such as glass or plastic. In some embodiments, the container holds a composition that is effective for treating a disease described herein and may have a sterile access port. For example, the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle. The active agent in the composition is a compound of the invention. In some embodiments, the label on or associated with the container indicates that the composition is used for treating the disease of choice. The article of manufacture may further comprise a second container comprising a pharmaceutically-acceptable buffer, such as phosphate-buffered saline, Ringer's solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.
Computational Methods
[0215] In another aspect, the disclosure describes a computational method, which may be embodied in software, for designing a library of 3'-nucleic acid extensions. The method involves efficiently evaluating a nucleic acid target and generating one or more combinations of a UBS, DBS, and/or CLS for nucleobase editing.
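As a rough illustration of the combinatorial step in [0215], the sketch below enumerates candidate 3' extensions. It assumes that UBS, DBS, and CLS denote the upstream binding, downstream binding, and counterloop sequences described in Example 1, that binding sequences are drawn from every start point of an 11-nt flank (lengths 1 to 11), and that the extension is laid out 5'-DBS-CLS-UBS-3' so that it pairs antiparallel with the non-target strand. The layout, the function names, and the example flanks and counterloops are hypothetical and are not the disclosed design software.

```python
from itertools import product
from typing import Iterator, List, Sequence

# Watson-Crick pairing from a DNA base to the RNA base that pairs with it.
_DNA_TO_PAIRING_RNA = {"A": "U", "C": "G", "G": "C", "T": "A"}


def pairing_rna(dna: str) -> str:
    """Return the RNA (5'->3') that base-pairs antiparallel with a DNA segment."""
    return "".join(_DNA_TO_PAIRING_RNA[b] for b in reversed(dna.upper()))


def all_windows(region: str, max_len: int = 11) -> List[str]:
    """Every substring of `region` of length 1..max_len, at every start point."""
    return [
        region[start:start + length]
        for length in range(1, max_len + 1)
        for start in range(len(region) - length + 1)
    ]


def candidate_extensions(
    upstream: str, downstream: str, counterloops: Sequence[str]
) -> Iterator[str]:
    """Yield hypothetical 3' extensions laid out 5'-DBS-CLS-UBS-3'.

    `upstream`/`downstream` are the non-target-strand flanks (written 5'->3')
    on either side of the edited region. Pairing is antiparallel, so the RNA
    segment binding the downstream flank comes first (ASSUMED layout).
    """
    for ubs_site, dbs_site, cls in product(
        all_windows(upstream), all_windows(downstream), counterloops
    ):
        yield pairing_rna(dbs_site) + cls + pairing_rna(ubs_site)


up = "ACGTTGCAAGC"               # hypothetical 11-nt upstream flank
down = "TTGACCATGGA"             # hypothetical 11-nt downstream flank
loops = ["A", "GCGCUUCGGCGC"]    # hypothetical counterloops
library = list(candidate_extensions(up, down, loops))
print(len(library))  # 66 * 66 * 2 = 8712
```

An 11-nt flank yields 66 windows (11 + 10 + ... + 1), so in this sketch the library size grows as 66 × 66 × the number of counterloops.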
[0216] The terms “program” or “software” are used herein in a generic sense to refer to any type of computer code or set of processor-executable instructions that may be employed to program a computer or other processor to implement various aspects of embodiments as described above. Additionally, according to one aspect, one or more computer programs that when executed perform methods of the disclosure provided herein need not reside on a single computer or processor but may be distributed in a modular fashion among different computers or processors to implement various aspects of the disclosure provided herein.
[0217] Some aspects of this disclosure provide methods of selecting nucleobase editors that show reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or improved editing efficiency of a target nucleic acid, utilizing machine learning (ML) language models. In certain embodiments, the ML language models are able to predict evolutionary adaptive mutations resulting in nucleobase editor variants with improved fitness.
[0218] In some embodiments, the present disclosure describes a method of selecting nucleobase editors having (i) reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or (ii) improved editing efficiency of a target nucleic acid by a nucleobase editor, the method comprising:
(a) using machine learning (ML) language models to predict evolutionary adaptive mutations resulting in nucleobase editor variants with improved fitness;
(b) selecting one or more of the nucleobase editor variants of (a);
(c) cloning the nucleobase editor variants of (b) and testing them in target cells; and
(d) scoring the nucleobase editor variants based on enrichment values, wherein the nucleobase editor variant with improved fitness (i) reduces bystander editing within an editing window of a target nucleic acid for the one or more nucleobase editors and/or (ii) improves editing efficiency of a target nucleic acid by the one or more nucleobase editors.
[0219] In certain embodiments, the machine learning model is an ESM-1b language model and/or an ESM-1v language model, wherein said language models (i) learn natural amino acid patterns from millions of naturally occurring protein sequences, (ii) treat mutations observed in natural protein sequences as plausible mutations, and (iii) assume that plausible mutations with high likelihood scores correlate with improved protein fitness.
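One common way to turn a protein language model such as ESM-1v into a mutation ranking is a log-likelihood ratio: the model's log-probability of the mutant residue minus that of the wild-type residue at the same position. The disclosure does not state its scoring rule, so this is an assumption. To stay self-contained, the sketch does not call the ESM library; it takes a hypothetical table of per-position log-probabilities (which in practice would come from the model) and ranks candidate mutations for step (b). All names and probabilities are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

# position (1-based) -> {amino acid: log-probability at that position}
LogProbTable = Dict[int, Dict[str, float]]


def parse_mutation(mutation: str) -> Tuple[str, int, str]:
    """Split notation like 'F366L' into ('F', 366, 'L')."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]


def llr_score(mutation: str, wild_type: str, log_probs: LogProbTable) -> float:
    """log p(mutant) - log p(wild type) at the mutated position."""
    wt, pos, mt = parse_mutation(mutation)
    if wild_type[pos - 1] != wt:
        raise ValueError(f"{mutation}: residue {pos} is {wild_type[pos - 1]}, not {wt}")
    return log_probs[pos][mt] - log_probs[pos][wt]


def rank_mutations(
    mutations: Sequence[str], wild_type: str, log_probs: LogProbTable, top_k: int
) -> List[Tuple[str, float]]:
    """Return the top_k mutations by descending likelihood-ratio score."""
    scored = [(m, llr_score(m, wild_type, log_probs)) for m in mutations]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy, made-up probabilities for a 3-residue "protein" (illustration only).
toy_wt = "MKV"
toy_log_probs: LogProbTable = {
    2: {"K": math.log(0.5), "R": math.log(0.4), "E": math.log(0.01)},
    3: {"V": math.log(0.3), "I": math.log(0.6)},
}
print(rank_mutations(["K2R", "K2E", "V3I"], toy_wt, toy_log_probs, top_k=2))
```

Positive scores mark residues the model considers more plausible than wild type; the fitness correlation itself is the assumption stated in [0219].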
Evolution of nucleobase editor
[0220] Some aspects of this disclosure provide methods of phage-assisted, non-continuous evolution (PANCE) of a nucleobase editor.
[0221] In PANCE, selection phages (SP) are generated that encode an adenine deaminase with a C-terminal intein, together with selection plasmids (e.g., SP1-3) that encode an nCas9-SpRY with an N-terminal intein. The selection plasmid comprises (i) a pIII nucleotide sequence (encoding the phage coat protein pIII) that has been modified to contain at least one single nucleotide variant (SNV) and (ii) an agRNA nucleotide sequence encoding the corresponding agRNA, which targets the modified pIII nucleotide sequence so that the SNV is corrected (edited) to the wild-type sequence. The SNV results in a mutated pIII protein that lowers phage infectivity, and correction of the SNV to the wild-type sequence by a complex of the agRNA and the nucleobase editor increases phage infectivity. If the precise edit occurs, the pIII sequence reverts to the wild-type sequence and phage propagation occurs. In addition, the selection plasmid is designed such that bystander edits upstream and downstream of the target nucleic acid in the pIII nucleotide sequence introduce mutations that inhibit phage propagation.
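The selection logic of [0221] can be modeled as a simple rule over editing outcomes. This toy sketch, with hypothetical positions and function names, treats each phage's outcome as the set of edited positions in the pIII window and lets it propagate only when the target SNV is corrected and no bystander position is edited. It ignores partial infectivity and all population dynamics, which the real system presumably exhibits.

```python
from typing import Iterable, Sequence, Set


def propagates(edited: Set[int], target: int, bystanders: Iterable[int]) -> bool:
    """Toy selection rule: the pIII allele supports propagation only if the
    target SNV is edited and no bystander position is edited."""
    return target in edited and edited.isdisjoint(bystanders)


def fraction_propagating(
    outcomes: Sequence[Set[int]], target: int, bystanders: Iterable[int]
) -> float:
    """Share of editing outcomes (one set of edited positions per phage) that pass."""
    bystander_set = set(bystanders)
    if not outcomes:
        return 0.0
    passing = sum(propagates(o, target, bystander_set) for o in outcomes)
    return passing / len(outcomes)


# Hypothetical positions within the pIII target window.
TARGET, BYSTANDERS = 5, {4, 7}
outcomes = [{5}, {5, 7}, set(), {5}]
print(fraction_propagating(outcomes, TARGET, BYSTANDERS))  # 0.5
```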
[0222] In some embodiments, host cells further comprise a helper plasmid and/or a mutagenesis plasmid. In some embodiments, a mutagenesis plasmid comprises an arabinose-inducible promoter.
[0223] In some embodiments, the present disclosure describes a method of selecting nucleobase editors having (i) reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or (ii) improved editing efficiency of a target nucleic acid by a nucleobase editor, the method comprising using an agRNA described herein as part of a PANCE system.
[0224] In some embodiments, the method of selecting nucleobase editors having (i) reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or (ii) improved editing efficiency of a target nucleic acid by a nucleobase editor comprises the steps of:
(a) generating selection phages for PANCE,
(b) generating selection plasmids for PANCE, wherein the selection plasmids encode a pIII gene further comprising mutations that diminish pIII activity, wherein (i) pIII activity is restored by a nucleobase editor if the nucleobase editor edits the target nucleic acid or (ii) pIII activity is not restored by a nucleobase editor if the nucleobase editor edits bystander nucleic acids;
(c) generating selection cells for PANCE;
(d) performing PANCE as batch cultivations, wherein the PANCE system exerts selection pressure using one or more selection plasmids of (b), wherein adaptive mutations of the nucleobase editor that retain target nucleic acid base editing activity are necessary for phage propagation or wherein adaptive mutations of the nucleobase editor that increase bystander editing of one or more nucleic acids within the editing window are disadvantageous for phage propagation;
(e) performing NGS on each batch of (d) and identifying nucleobase editor variants;
(f) cloning the nucleobase editor variants of (e) and testing in target cells; and
(g) scoring the nucleobase editor variants based on enrichment values.
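Step (g) scores variants by enrichment. The disclosure does not define the enrichment value; a common choice, assumed here, is the log2 ratio of a variant's read frequency after selection to its frequency before selection, with a pseudocount so that a variant absent from one sample does not cause division by zero. Variant names and counts below are made up.

```python
import math
from typing import Dict


def log2_enrichment(
    pre: Dict[str, int], post: Dict[str, int], pseudocount: float = 0.5
) -> Dict[str, float]:
    """log2(post-selection frequency / pre-selection frequency) for each variant.

    Keep pseudocount > 0 whenever a variant may be missing from either sample.
    """
    variants = set(pre) | set(post)
    pre_total = sum(pre.values()) + pseudocount * len(variants)
    post_total = sum(post.values()) + pseudocount * len(variants)
    scores = {}
    for v in variants:
        f_pre = (pre.get(v, 0) + pseudocount) / pre_total
        f_post = (post.get(v, 0) + pseudocount) / post_total
        scores[v] = math.log2(f_post / f_pre)
    return scores


# Hypothetical NGS variant counts before and after a PANCE passage.
scores = log2_enrichment({"WT": 90, "V1": 10}, {"WT": 50, "V1": 50}, pseudocount=0.0)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Here V1 rises from 10% to 50% of reads, a fivefold change, so its score is log2(5); WT falls and scores below zero.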
[0225] In certain embodiments, the selection plasmids generated for PANCE comprise a sequence selected from any one of SEQ ID NOs: 19-21.
[0226] In certain embodiments, the selection plasmids generated for PANCE comprise the sequence:
[0227] SP1 (F366L)
TTGACAGGTGAACGCTCAGCTCTTATAATGCCTATAGGAAAAGAAGGTAAATATTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCCGCGCGTTCGCGCGGCGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAA (SEQ ID NO: 19).
[0228] In certain embodiments, the selection plasmids generated for PANCE comprise the sequence:
[0229] SP2 (K360R)
TTGACAGGTGAACGCTCAGCTCTTATAATGCCTATTTTCAAACAATATTTACCTTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCCGCGCGTTCGCGCGGCGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAA (SEQ ID NO: 20).
[0230] In certain embodiments, the selection plasmids generated for PANCE comprise the sequence:
[0231] SP3 (I411V)
TTGACAGGTGAACGCTCAGCTCTTATAATGCCTATTGTATATATTTTCTACGTTTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCCGCGCGTTCGCGCGGCGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAA (SEQ ID NO: 21).
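Each of SEQ ID NOs: 19-21 contains the canonical SpCas9 sgRNA scaffold (beginning GTTTTAGAGCTA), so the spacer can be read off as the nucleotides immediately 5' of it. The sketch below assumes a 20-nt spacer and a promoter, spacer, scaffold layout; it also strips the whitespace that line wrapping leaves in printed sequences. The helper names are hypothetical, and the sketch does not annotate the remaining elements of the cassette.

```python
import re

# 5' end of the canonical SpCas9 sgRNA scaffold.
SCAFFOLD_START = "GTTTTAGAGCTA"


def normalize(seq: str) -> str:
    """Drop whitespace left by line wrapping and upper-case the sequence."""
    return re.sub(r"\s+", "", seq).upper()


def extract_spacer(cassette: str, spacer_len: int = 20) -> str:
    """Return the spacer_len nucleotides immediately 5' of the scaffold (ASSUMED layout)."""
    seq = normalize(cassette)
    idx = seq.find(SCAFFOLD_START)
    if idx < spacer_len:  # also covers idx == -1 (scaffold not found)
        raise ValueError("scaffold not found or spacer would be truncated")
    return seq[idx - spacer_len:idx]


# SEQ ID NO: 19 (SP1), as printed with wrap spaces.
SP1 = (
    "TTGACAGGTGAACGCTCAGCTCTTATAATGCCTATAGGAAAAGAAGGTAAATATT "
    "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAA "
    "AAAGTGGCACCGAGTCGGTGCCGCGCGTTCGCGCGGCGCGGTTCTATCTAGTTAC "
    "GCGTTAAACCAACTAGAA"
)
print(extract_spacer(SP1))  # AGGAAAAGAAGGTAAATATT
```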
[0232] A person skilled in the art would appreciate that other evolution systems (e.g., phage-assisted continuous evolution (PACE)) can be used in conjunction with the agRNAs described herein to generate evolved nucleobase editors with reduced bystander editing within an editing window of a target nucleic acid and/or improved editing efficiency of a target nucleic acid.
Base editing
[0233] The present invention relates to an improved version of “base editing” that utilizes modified (or, equivalently, engineered) agRNAs comprising one or more structural modifications that improve one or more of their characteristics, including stability, cellular lifespan, affinity for Cas9 (or, more broadly, for a napDNAbp), or interaction with a target DNA, thereby increasing base editing efficiency and reducing bystander editing within the base editing window of a nucleobase editor.
[0234] Some aspects of this disclosure provide methods of using any of the fusion proteins (e.g., a Cas9 domain fused to an adenosine deaminase) provided herein, or complexes comprising an agRNA and a fusion protein (e.g., a Cas9 domain fused to an adenosine deaminase) provided herein. For example, some aspects of this disclosure provide methods comprising contacting a DNA or RNA molecule with any of the fusion proteins or nucleobase editors provided herein, and with at least one agRNA, wherein the agRNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence.
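The guide criteria in [0234] (about 15-100 nt, with at least 10 contiguous nucleotides complementary to a target sequence) can be checked mechanically. The sketch below treats the bounds as strict, converts U to T, and searches for any 10-nt run of the guide whose reverse complement occurs in the target DNA. It does not model mismatches, PAM requirements, or which strand is read, and the function names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters pass through)."""
    return dna.translate(_COMPLEMENT)[::-1]


def has_complementary_stretch(guide_rna: str, target_dna: str, min_len: int = 10) -> bool:
    """True if some min_len-nt run of the guide base-pairs with the target DNA."""
    guide = guide_rna.upper().replace("U", "T")
    target = target_dna.upper()
    return any(
        reverse_complement(guide[i:i + min_len]) in target
        for i in range(len(guide) - min_len + 1)
    )


def meets_guide_criteria(guide_rna: str, target_dna: str) -> bool:
    """15-100 nt long and >= 10 contiguous nt complementary to the target."""
    return 15 <= len(guide_rna) <= 100 and has_complementary_stretch(guide_rna, target_dna)


target = "ATGCGTACGTTAGCCATGCA"  # hypothetical target DNA
print(meets_guide_criteria("AAAAAAAAGCUAACGUACGC", target))  # True
```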
[0235] In another aspect, the disclosure provides a method of nucleobase editing (e.g., “base editing”) comprising contacting a target nucleic acid sequence with an agRNA described above and a nucleobase editor comprising a fusion protein comprising a deaminase and a napDNAbp or a split napDNAbp, wherein the editing efficiency is increased and/or the bystander editing is decreased as compared to the same method using a gRNA not comprising the 3'-nucleic acid extension.
[0236] The present disclosure contemplates the use of the agRNAs and methods described herein for base editing a target nucleic acid within a target nucleic acid sequence, wherein editing the target nucleic acid produces a single nucleotide variant (SNV), for engineering a cell (e.g., a prokaryotic or eukaryotic cell), a virus, a fungus, a plant, an insect, and/or an animal. In other aspects, the methods described herein may be used to modify a target nucleic acid sequence for research purposes.
[0237] In certain embodiments, the present disclosure contemplates the use of the base editing methods described herein for targeted modifications in the genomes of plants for improved crop varieties.
[0238] In some embodiments, the target DNA sequence comprises a sequence associated with a disease or disorder. In some embodiments, the target DNA sequence comprises a point mutation associated with a disease or disorder.
[0239] In some embodiments, the activity of the fusion protein (e.g., comprising a napDNAbp and deaminase), or the complex, results in a correction of the point mutation.
[0240] In some embodiments, the methods provided herein are used to introduce a deactivating point mutation into a gene or allele that encodes a gene product that is associated with a disease or disorder. For example, in some embodiments, methods are provided herein that employ a DNA editing fusion protein to introduce a deactivating point mutation into an oncogene (e.g., in the treatment of a proliferative disease). A deactivating mutation may, in some embodiments, generate a premature stop codon in a coding sequence, which results in the expression of a truncated gene product, e.g., a truncated protein lacking the function of the full-length protein.
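Whether a single-base substitution creates a premature stop codon, as described in [0240], depends only on the affected codon. The sketch below assumes an in-frame coding sequence that starts at the first base of the start codon and takes a 0-based position; it reports True when a sense codon becomes TAA, TAG, or TGA. For example, a C-to-T change turning CAG into TAG qualifies. The function name and sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def creates_premature_stop(cds: str, pos: int, new_base: str) -> bool:
    """Would substituting new_base at 0-based in-frame position pos turn a
    sense codon into a stop codon?"""
    cds = cds.upper()
    start = pos - pos % 3          # first base of the affected codon
    codon = cds[start:start + 3]
    if len(codon) != 3:
        raise ValueError("position falls outside a complete codon")
    offset = pos % 3
    mutated = codon[:offset] + new_base.upper() + codon[offset + 1:]
    return codon not in STOP_CODONS and mutated in STOP_CODONS


cds = "ATGCAGTGGTAA"  # hypothetical ORF: ATG CAG TGG TAA
print(creates_premature_stop(cds, 3, "T"))  # True  (CAG -> TAG)
print(creates_premature_stop(cds, 4, "G"))  # False (CAG -> CGG)
```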
[0241] In some embodiments, the purpose of the methods provided herein is to restore the function of a dysfunctional gene via genome editing. The nucleobase editing proteins provided herein can be validated for gene editing-based human therapeutics in vitro, e.g., by correcting a disease-associated mutation in human cell culture.
[0242] The successful correction of point mutations in disease-associated genes and alleles opens up new strategies for gene correction with applications in therapeutics and basic research.
[0243] The instant disclosure provides methods for the treatment of a subject diagnosed with a disease associated with or caused by a point mutation that can be corrected by a DNA editing fusion protein provided herein. For example, in some embodiments, a method is provided that comprises administering to a subject having such a disease, e.g., a cancer associated with a point mutation as described above, an effective amount of a nucleobase
editor fusion protein that corrects the point mutation or introduces a deactivating mutation into a disease-associated gene.
[0244] In some embodiments, the disease is a proliferative disease. In some embodiments, the disease is a genetic disease. In some embodiments, the disease is a neoplastic disease. In some embodiments, the disease is a metabolic disease. In some embodiments, the disease is a lysosomal storage disease. Other diseases or disorders that can be treated by correcting a point mutation or introducing a deactivating mutation into a disease-associated gene will be known to those of skill in the art, and the disclosure is not limited in this respect.
Kits
[0245] The compositions of the present disclosure may be assembled into kits. In some embodiments, the kit comprises nucleic acid vectors for the expression of the nucleobase editors described herein. In other embodiments, the kit further comprises appropriate guide nucleotide sequences (e.g., agRNAs) or nucleic acid vectors for the expression of such guide nucleotide sequences, to target the Cas9 protein or nucleobase editor to the desired target sequence.
[0246] The kit described herein may include one or more containers housing components for performing the methods described herein and optionally instructions for use. Any of the kits described herein may further comprise components needed for performing the assay methods. Each component of the kits, where applicable, may be provided in liquid form (e.g., in solution) or in solid form (e.g., a dry powder). In certain cases, some of the components may be reconstitutable or otherwise processible (e.g., to an active form), for example, by the addition of a suitable solvent or other species (for example, water), which may or may not be provided with the kit.
[0247] In some embodiments, the kits may optionally include instructions and/or promotion for use of the components provided. As used herein, “instructions” can define a component of instruction and/or promotion, and typically involve written instructions on or associated with packaging of the disclosure. Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), Internet, and/or web-based communications, etc. The written instructions may be in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which can also reflect approval by the agency of manufacture, use or sale for animal administration. As used herein, “promoted” includes all methods of doing business
including methods of education, hospital and other clinical instruction, scientific inquiry, drug discovery or development, academic research, pharmaceutical industry activity including pharmaceutical sales, and any advertising or other promotional activity including written, oral and electronic communication of any form, associated with the disclosure. Additionally, the kits may include other components depending on the specific application, as described herein.
[0248] The kits may contain any one or more of the components described herein in one or more containers. The components may be prepared sterilely, packaged in a syringe and shipped refrigerated. Alternatively, they may be housed in a vial or other container for storage. A second container may have other components prepared sterilely. Alternatively, the kits may include the active agents premixed and shipped in a vial, tube, or other container.
[0249] The kits may have a variety of forms, such as a blister pouch, a shrink wrapped pouch, a vacuum sealable pouch, a sealable thermoformed tray, or a similar pouch or tray form, with the accessories loosely packed within the pouch, one or more tubes, containers, a box or a bag. The kits may be sterilized after the accessories are added, thereby allowing the individual accessories in the container to be otherwise unwrapped. The kits can be sterilized using any appropriate sterilization techniques, such as radiation sterilization, heat sterilization, or other sterilization methods known in the art. The kits may also include other components, depending on the specific application, for example, containers, cell media, salts, buffers, reagents, syringes, needles, a fabric, such as gauze, for applying or removing a disinfecting agent, disposable gloves, a support for the agents prior to administration, etc.
Some aspects of this disclosure provide kits comprising a nucleic acid construct comprising a nucleotide sequence encoding the various components of the nucleobase editing system described herein (e.g., including, but not limited to, the napDNAbps, deaminases, fusion proteins (e.g., comprising napDNAbps and deaminases), agRNAs, and complexes comprising fusion proteins and agRNAs, as well as accessory elements). In some embodiments, the nucleotide sequence(s) comprise a heterologous promoter (or more than a single promoter) that drives expression of the nucleobase editing system components.
[0250] Other aspects of this disclosure provide kits comprising one or more nucleic acid constructs encoding the various components of the nucleobase editing system described herein, e.g., constructs comprising a nucleotide sequence encoding the components of the nucleobase editing system capable of modifying a target DNA sequence. In some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of the nucleobase editing system components.
[0251] Some aspects of this disclosure provide kits comprising a nucleic acid construct comprising (a) a nucleotide sequence encoding a napDNAbp (e.g., a Cas9 domain) fused to a deaminase and (b) a heterologous promoter that drives expression of the sequence of (a).
Additional terms
[0252] As used herein in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, for example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
[0253] The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
[0254] Use of ordinal terms such as “first,” “second,” “third,” etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed. Such
terms are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term). The phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof, is meant to encompass the items listed thereafter and additional items.
[0255] The foregoing concepts, and additional concepts discussed below, may be arranged in any suitable combination, as the present disclosure is not limited in this respect. Further, other advantages and novel features of the present disclosure will become apparent from the following detailed description of various non-limiting embodiments when considered in conjunction with the accompanying figures.
EXAMPLES
Example 1: 3'-extended gRNAs (agRNAs) for targeted manipulation of bystander edits, activity, and editing window
[0256] In the adenosine deamination mechanism, the TadA-8e domain engages the exposed single-stranded region of the PAM-distal nontarget strand (NTS) (Lapinaite) (FIG. 5A). When tethered to the Cas9 protein, the TadA-8e deaminase produces specific editing patterns within a narrow DNA region spanning several base pairs. This tethering restricts the enzyme to certain nucleotides, defining what is known as the editing window. Because different DNA contexts present diverse substrates, the enzyme must accommodate structural variations within its active site, and the DNA strand in the active site therefore retains a certain degree of freedom to move.
[0257] A system that stabilizes the DNA strand within the active site was therefore established, on the rationale that restricting its movement may result in a smaller editing window and thus minimize bystander editing. One option was to add nucleotides to the 3’ end of the gRNA scaffold. These anchor guide RNAs (agRNAs) were designed to bind the DNA strand up- and downstream of the DNA region that is later present in the active site of TadA-8e, thereby stabilizing the loop structure so that fewer bases are deaminated (FIGs. 1A and 5A).
[0258] With this aim, a library was designed comprising short sequences and entire hairpin structures (counterloops) on the opposite side of the loop, introducing structures that may sterically restrict the movement of the DNA and the TadA enzyme even further (FIGs. 1A and 5A).
[0259] The agRNA library consisted of combinations of an array of upstream binding sequences, counterloops, and downstream binding sequences (FIGs. 5A-5B). Both the upstream and downstream sequences bound the DNA strand surrounding the targeted edit. Sequences ranging from 1 to 11 base pairs (bp) in length were tested, with all possible starting points in an 11-bp window. The counterloop sequences ranged from 1 to 33 bp, with the longer ones forming guanine-cytosine (GC)-rich hairpins. This design process yielded an agRNA library containing ~60K candidates. To screen the library in a high-throughput manner, a plasmid carrying the editing target downstream of the agRNA (sensor library) was constructed. Because the agRNA and the target lie within a 300-nucleotide window, next-generation sequencing (NGS) of this region after the editing process made it possible to analyze the editing pattern for each library clone (FIGs. 1A and 5A).
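The per-clone analysis in [0259] amounts to counting, for each library clone, the fraction of reads carrying A-to-G at each adenine of the target window. The sketch below assumes reads have already been assigned to a clone identifier and aligned to the reference without indels (reads of the wrong length are skipped); the reference, reads, and clone identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def editing_pattern(reference: str, reads: Iterable[str]) -> Dict[int, float]:
    """Fraction of reads carrying A-to-G at each reference A (1-based positions)."""
    ref = reference.upper()
    a_sites = [i for i, base in enumerate(ref) if base == "A"]
    edited = {i: 0 for i in a_sites}
    n_reads = 0
    for read in reads:
        if len(read) != len(ref):  # skip reads that would need indel handling
            continue
        n_reads += 1
        for i in a_sites:
            if read[i].upper() == "G":
                edited[i] += 1
    return {i + 1: (edited[i] / n_reads if n_reads else 0.0) for i in a_sites}


def pattern_by_clone(
    reference: str, tagged_reads: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[int, float]]:
    """Group (clone_id, read) pairs by clone and compute each clone's pattern."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for clone_id, read in tagged_reads:
        groups[clone_id].append(read)
    return {cid: editing_pattern(reference, reads) for cid, reads in groups.items()}


reads = [("c1", "CAAGA"), ("c1", "CGAGA"), ("c2", "CGAGG")]
print(pattern_by_clone("CAAGA", reads))
```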
[0260] The tested library was designed to target a site in the human DNMT1 locus, an optimal candidate for screening because of both its high accessibility for editing in HEK293T cells and the multiple adenines within its editing window. The ABE8e-spCas9-WT nucleobase editor in combination with a non-modified guide (sgRNACtrl) showed high editing efficiency at the four adenines in the editing window (A13, A14, A15 and A17). A slight preference for the adenines at positions 13 and 15, which were edited with almost the same efficiency, was observed (FIGs. 1D, 5B, and 5D); the library was therefore designed to precisely edit A13. With current editing tools, undesired bystander edits in this context remained unavoidable (FIGs. 1D and 6B).
[0261] First, anchors that showed higher efficiency and lower bystander editing in the DNMT1 sensor library (FIG. 1C) were selected. Five candidates (clones 48214, 62809, 56114, 41197 and 39979) were selected for testing in the native context in HEK293T cells, and all of them showed a decrease in bystander editing (FIG. 1D). Clone 56114 showed the highest precision in A13 editing, with significant reductions of 44% at A17 and 34% at A16 (FIGs. 1D and 5C-5E). To test the reproducibility of this effect in different cell lines, agRNA56114-tevopreq1 was tested in HeLa and HepG2 cells, yielding similar bystander reduction patterns (FIG. 1I).
[0262] To determine whether the anchor served as more than just another 3' RNA degradation protector like the previously described tevopreq1 (Nelson et al.), various guide combinations were benchmarked with and without the tevopreq1 motif. The motif increased the editing efficiency of the control guide, potentially by conferring resistance to exonucleolytic degradation, as previously described (Nelson et al.). Removing the motif from the agRNA did not affect the bystander reduction effect, suggesting that the reduction may be independent of the tevopreq1 motif (FIG. 1G).
[0263] To test the ability to influence the editing pattern in different contexts, sgRNA and agRNA libraries were constructed targeting ~12,000 pathogenic single nucleotide variants (SNVs) that can be targeted by base editing based on their proximity to a PAM (Arbab). The sgRNACtrl-tevopreq1 and agRNA56114-tevopreq1 libraries decreased the editing efficiency of ABE8e-spCas9-WT when compared to sgRNACtrl (FIG. 1H). The library with agRNA56114 slightly decreased the editing efficiency. These results suggested that the 3' modification of the gRNAs maintains the guide's functionality, but that the context influences the efficiency of the anchor and the tevopreq1 motif. These data suggest that specific agRNAs should be designed for specific contexts to maximize both precision and efficiency.
Example 2: Evolution towards decreased bystander edits via Phage-Assisted Non-Continuous Evolution
[0264] Although a robust reduction in bystander edits was observed, a more precise (ideally perfect) edit at position A13 remained to be achieved. To refine the editing, the TadA-8e enzyme was evolved toward fewer bystander edits, and the impact of combining the new variants with agRNAs on the editing pattern was evaluated.
[0265] Phage-assisted non-continuous evolution has been used in the past to increase the activity of ABEs (Richter). A selection method was designed to evolve variants with decreased bystander edits that can also work with the agRNAs disclosed herein. A selection pressure that decreased the phage titer in response to bystander edits, but increased it upon the perfect edit, was required to prevent the evolution of an inactive TadA enzyme. To achieve this, the activity of the nucleobase editor encoded on the M13 phage genome was linked to the expression of gene III (encoding the pIII protein), which is required for phage replication (Esvelt). Because single-mutant data for pIII amino acid substitutions are well characterized (Weiss), groups of 2-3 amino acids whose substitution drastically decreases phage replication were identified (Weiss). Toward this goal, the codons of selected amino acids in pIII were mutated to be T/A rich, which made these contexts responsive to A>G editing. Three contexts were identified in which bystander edits would decrease the phage titer, while the perfect edit generated an adaptive mutation that also selects for on-target efficiency (FIGs. 2A-2B). Each identified pIII variant was cloned into an individual plasmid (SP1-3) (FIG. 6B). The selection plasmids also encoded the corresponding agRNA targeting the SNV, as well as a C-terminal Intein-nCas9-SpRY. Once the selection phage carrying the N-terminal Intein-TadA-8e fusion protein infected the cell (FIG. 6A), a functional nucleobase editor was expressed and could perform the editing (FIGs. 2A-2B and 6C).
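The codon logic behind this selection can be illustrated by enumerating every combination of A>G conversions in a T/A-rich codon and translating each outcome. This is a minimal sketch under stated assumptions: the codon "AAT" is a hypothetical example rather than one of the SP1-3 contexts, and edits are assumed to occur on the coding strand (if the protospacer lies on the template strand, the same edits appear as T>C on the coding strand).

```python
from itertools import combinations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; codons enumerated in TCAG order at each position.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def a_to_g_outcomes(codon):
    """Map every subset of A>G conversions in a codon to the amino acid it encodes."""
    a_positions = [i for i, base in enumerate(codon) if base == "A"]
    outcomes = {}
    for r in range(len(a_positions) + 1):
        for subset in combinations(a_positions, r):
            edited = "".join("G" if i in subset else b for i, b in enumerate(codon))
            outcomes[edited] = CODON_TABLE[edited]
    return outcomes


print(a_to_g_outcomes("AAT"))  # → {'AAT': 'N', 'GAT': 'D', 'AGT': 'S', 'GGT': 'G'}
```

A context is useful for this kind of selection when exactly one outcome (the perfect edit) restores a functional residue while the bystander outcomes encode disruptive residues.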
[0266] The initial batch cultivation was infected with the selection phage, and after 12 hours of cultivation, different volumes of the supernatant were used to infect the next batch cultivation. Phage DNA was isolated from each selection round and sequenced by NGS after the last round of evolution (FIG. 6D). An increase in phage titer was observed after round 4, suggesting increased activity of the enzyme mutants encoded by the phage (FIG. 6E).
[0267] The 3 replicates showed almost identical enrichment of the same variant over the course of the evolution (Table A). The sequencing data also showed slow enrichment of different mutants with amino acid exchanges, with a maximum enrichment of 3-4 % of the new amino acid relative to the wild type (FIG. 2D).
[0268] Table A
[0269] Once the TadA mutational landscape was determined, the 50 most enriched mutations were selected and individually tested at the DNMT1 site (FIGs. 7A and 7C). Compared with the ABE8e-SpRY base editor, variants with both higher efficiency at position A13 and a reduced bystander editing pattern were detected (FIG. 2E). The PANCE-evolved TadA variants were also benchmarked against the ABE9 nucleobase editor (both WT and SpRY), which showed low editing efficiency and no relative bystander reduction at the DNMT1 site (FIGs. 2E and 7A-7C). Variants V28C, L34W, D54T, and I95D showed potential to generate a perfect edit at position A13 (FIG. 2E). These variants were enriched across the different rounds of evolution, suggesting an advantage over other variants under the selection pressure (FIG. 2F). Despite being evolved with agRNA56114-tevopreq1, most of the variants were still functional with the unmodified DNMT1 gRNA (FIG. 7B).
[0270] The crystal structure of ABE8e was computationally analyzed to better understand the impact of these mutations on ABE8e. Variants V28C and L34W were generated in silico, separately, and their interactions with surrounding amino acids and nucleotides were compared to wild-type, but no changes in interactions were predicted (Methods) (FIG. 8A). It is possible that these mutations induce a conformational change in ABE8e that alters the interactions of ABE8e residues H57 and C87 with nucleotide 8-Az(26) of the gRNA. Based on the wild-type crystal structure, H57 and C87 were predicted to establish three van der Waals interactions and two hydrogen bonds with 8-Az(26), respectively. In the case of variant V28C, C28 could approach 8-Az(26) of the gRNA (to below the measured distance of 5.101 Å) from the side opposite H57 and C87 (FIG. 2F). In line with this, a decrease of 0.77 Å in the distance between residue 28 and 8-Az(26) was measured. In the case of variant L34W, the tryptophan, which is more hydrophobic than leucine, might alter the orientation of the alpha-helix arm (orange) where residue H57 lies (FIG. 2H).
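Distances such as the 5.101 Å value above can be computed as the smallest interatomic distance between a residue and a nucleotide (the source does not state which atoms were used). A minimal Python sketch of that measurement, assuming atom coordinates in Å have already been read from a structure file; the atom names and coordinates below are hypothetical and do not come from the ABE8e structure.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Return (distance_in_angstrom, atom_a, atom_b) for the closest atom pair.

    atoms_a, atoms_b: dicts mapping atom name -> (x, y, z) coordinates in Å.
    """
    best = None
    for name_a, xyz_a in atoms_a.items():
        for name_b, xyz_b in atoms_b.items():
            d = math.dist(xyz_a, xyz_b)  # Euclidean distance (Python 3.8+)
            if best is None or d < best[0]:
                best = (d, name_a, name_b)
    return best


# Hypothetical coordinates, for illustration only
wild_type_28 = {"CA": (0.0, 0.0, 0.0), "CG1": (0.5, 0.0, 0.0)}
mutant_28 = {"CA": (0.0, 0.0, 0.0), "SG": (1.0, 0.0, 0.0)}
nucleotide = {"N1": (4.0, 0.0, 0.0), "C2": (5.0, 0.0, 0.0)}

d_wt = min_distance(wild_type_28, nucleotide)[0]
d_mut = min_distance(mutant_28, nucleotide)[0]
print(f"change in distance: {d_wt - d_mut:.2f} Å")  # → change in distance: 0.50 Å
```

Comparing the wild-type and mutant models in this way gives the kind of distance change reported for residue 28.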
[0271] Non-stop-codon-based PANCE selection proved to be a powerful tool for evolving base editor mutants that showed decreased bystander editing without losing on-target activity.
Example 3: Machine learning-guided identification of additional ABE candidates.
[0272] Improving protein fitness traditionally involves screening a vast library of random mutations and selecting those that enhance a desired function under specific pressures (directed evolution). However, the PANCE experiment demonstrates that even with increased fitness, specific mutational patterns across protein families are essential for evolvability. In general, most random mutations will be destabilizing or neutral for protein function (FIG. 3A).
[0273] In contrast, protein language models trained on massive, non-redundant protein sequence datasets can learn these general, evolutionarily plausible mutational patterns (Hie et al., Meier et al., Hie et al. 2). This knowledge can be leveraged to predict mutations likely to be beneficial, guiding protein evolution more efficiently. Following training, these models can be used to predict the probability distribution of each amino acid at any given position along a protein sequence, where the probability distribution reflects the knowledge acquired by the models on their training dataset. Positions where the model assigns a higher probability to an amino acid than the wild-type residue are considered more likely than a random pick to yield a positive effect on the protein fitness.
[0274] In line with this, an ensemble of protein language models (Hie et al.) was used to predict mutations likely to have a positive effect on the TadA-8e sequence, as combining predictions from multiple models has been shown to increase prediction quality, notably on the task of increasing protein fitness (Hie et al., Meier et al.). Twenty-one evolutionarily plausible mutations were identified and individually tested for their editing pattern at the DNMT1 site (FIGs. 3A-3B and 9A-9B). A different editing pattern was observed, favoring A15, in contrast to the PANCE-evolved variants, which showed improved efficiency at A13 (FIG. 3B).
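The selection rule described above (a substitution is a candidate when a model assigns it a higher probability than the wild-type residue) and the ensemble step can be sketched as a simple vote count. This is a hypothetical Python sketch, not the pipeline used in the source: the per-position probability tables are assumed to have been obtained beforehand from each language model, the `min_votes` threshold is an illustrative choice, and the toy probabilities are invented.

```python
def recommend_mutations(wt_seq, model_probs, min_votes=2):
    """Consensus substitutions from several models.

    wt_seq: wild-type protein sequence.
    model_probs: one entry per model; each entry is a list with one dict per
        position, mapping amino acid -> predicted probability (hypothetical input).
    A substitution gets one vote from each model that assigns it a higher
    probability than the wild-type residue at that position.
    Returns [(mutation, votes)] with at least min_votes, most-voted first.
    """
    votes = {}
    for per_position in model_probs:
        for pos, (wt, dist) in enumerate(zip(wt_seq, per_position), start=1):
            wt_p = dist.get(wt, 0.0)
            for aa, p in dist.items():
                if aa != wt and p > wt_p:
                    name = f"{wt}{pos}{aa}"
                    votes[name] = votes.get(name, 0) + 1
    hits = [(m, v) for m, v in votes.items() if v >= min_votes]
    return sorted(hits, key=lambda mv: (-mv[1], mv[0]))


# Toy probabilities for a 2-residue sequence and three models (invented numbers)
models = [
    [{"M": 0.5, "E": 0.3}, {"K": 0.2, "R": 0.6}],
    [{"M": 0.2, "E": 0.6}, {"K": 0.5, "R": 0.3}],
    [{"M": 0.1, "E": 0.7, "D": 0.2}, {"K": 0.4, "R": 0.5}],
]
print(recommend_mutations("MK", models))  # → [('K2R', 2), ('M1E', 2)]
```

Requiring agreement between models is one simple way to combine an ensemble; averaging probabilities across models would be an equally plausible alternative.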
[0275] Next, variants that showed higher A15 editing efficiency and reduced A17/16/13 bystander edits were identified, and M151E was selected for further evaluation. Because this is an evolutionarily plausible mutation, it was cross-referenced against the variants obtained by PANCE. The amino acid substitutions at position 151 were evaluated, and an enrichment of aspartic acid across the different rounds of evolution was observed (FIGs. 3C-3D). Both glutamic acid and aspartic acid are negatively charged amino acids, suggesting that the two independent strategies identified similar mutations. Both substitutions showed similar editing patterns at the DNMT1 site, with M151D slightly more efficient at position A15 (FIG. 3E).
[0276] The computational analysis of ABE8e's structure highlights a change in interactions for the M151E mutant. The wild-type residue M151 formed two hydrogen bonds, with C146 and Q154. The M151E mutation allowed an additional hydrogen bond to form between the carboxyl group of glutamate (acceptor) and the amino group of Q154 (donor) (FIG. 8B). Glutamate also introduces a negative charge compared to methionine, potentially changing the local conformation, distances (measured distance 5.214 Å), and interactions between E151 and nucleotide C(25) of the gRNA (FIG. 3F).
Example 4: agRNA, PANCE and ML variants outperform current base editing variants
[0277] agRNA56114-tevopreq1 was combined with ABE variants (FIGs. 7C and 9B), herein referred to as ABEx1, ABEx2, ABEx3, and ABEx4. ABEx1 and ABEx2 were generated in the PANCE experiment, ABEx3 using ML, and ABEx4 as the combination of both techniques (FIG. 4A). All the ABEx-spCas9-SpRY variants showed improved efficiency and reduced bystander editing when combined with the agRNA at the target site.
[0278] ABEx1 (V28C) demonstrated higher editing efficiency at position A13 and reduced bystander editing compared to the control, using both sgRNA and agRNA (FIGs. 4B and 10A-10D). ABEx2 (L34W) achieved precise editing and exhibited the same efficiency as ABE8e-SpRY at position A13, while also minimizing bystander editing at positions A17/16/15 (FIG. 4B). ABEx3 (M151E) also reduced bystander editing when combined with agRNA56114-tevopreq1 and increased efficiency at position A15 (FIG. 4B). Similar editing patterns, but with higher editing efficiencies, were found when the ABEx-spCas9-WT variants were used (FIGs. 4C and 10B). To test a combination of mutations, the double mutant ABEx4 (V28C and M151E) was generated. Similar to ABE9, combining two mutations in the deaminase seemed to abolish editing activity at this site. However, ABEx4 was functional at HEK site 3, where it showed higher editing efficiency and bystander reduction, outperforming ABE9 (FIGs. 11A-11D).
[0279] A previous strategy to remove bystander editing using the PAMless SpRY variant of spCas9 was based on designing guides that shifted the editing window to isolate the target nucleobase [Alves]. Similarly, the ability of ABE8e-spCas9-SpRY to isolate A13 was tested after shifting the editing window downstream (-1, -2, -3 bp) and upstream (+1 bp) of the control sgRNA; ABE8e-spCas9-SpRY was unable to isolate A13 with either sgRNACtrl or agRNA56114-tevopreq1. In comparison to ABE8e-spCas9-SpRY, variant ABEx1 showed increased A13 editing with both the centered and +1 sgRNA and agRNA56114-tevopreq1 (FIG. 12). This result highlights the importance of the targeted design of guide RNAs and their combination with evolved nucleobase editors (e.g., ABEx1) in order to find the best combination (e.g., a nucleobase editor and agRNA pair) to correct a particular mutation with maximal efficiency and safety.
[0280] ABEx1 with both sgRNACtrl and agRNA56114-tevopreq1 showed no significant increase in Cas9-dependent (FIGs. 13A-13E) or Cas9-independent (FIGs. 13F-13H) off-target editing, despite the increased on-target activity at position A13.
[0281] To further investigate the ABEx variants in different contexts, the editing pattern of the nucleobase editor variants was analyzed using a library with ~12,000 different pathogenic contexts (NGG and NG PAMs) (FIGs. 4D, 14, and 15A-15B) (Arbab). ABEx1 and ABEx3 exhibited higher editing efficiency and narrower editing windows than ABE8e-spCas9-SpRY. ABEx2 also reduced bystander editing; however, the L34W mutation impaired the enzyme's efficiency. ABEx4 was more efficient than ABE9; however, both double mutants exhibited a large decrease in efficiency. Additionally, the ABEx variants were observed to favor position A15, whereas ABE9 favored position A16 (FIGs. 4C, 14, and 15C). The effect of the number of adenosines (As) in the editing window was also assessed, and a similar editing pattern was observed among the nucleobase editor variants (FIGs. 15C-15F). Overall, the ABEx mutants were found to work in a broad range of contexts with minimal C to A/G/T editing (FIG. 15H).
[0282] The editing fold change, normalized to ABE8e-SpRY, revealed that the evolved variants described herein were highly efficient, even when compared with previously described variants such as ABE8e and ABE9 (FIG. 4D). Evolution of nucleobase editors (by ML models and/or evolution systems), in combination with target-specific agRNAs, has the potential to revolutionize personalized medicine approaches.
Example 5: Methods
Bacterial media and reagents base editing variants
[0283] LB and 2xYT media were prepared using MP Biomedicals™ media capsules according to the manufacturer's protocol. For LB and 2xYT agar, 16 g/L agar was added for standard agar and 7 g/L agar for soft agar. All media were sterilized by autoclaving.
agRNA library generation
[0284] In some embodiments, the agRNA library consisted of an upstream binding sequence (UBS) that was the reverse complement of the sequence downstream of the target sequence (also referred to as "the target" or "target") of the nucleobase editor, a counterloop, and a downstream binding sequence (DBS) that bound the sequence upstream of the target. The upstream and downstream binding sequences had different lengths and different binding regions within the 1 to 11 bp region upstream and downstream of the target. The counterloop library consisted of 33 different DNA sequences, of which the longer ones form GC-rich hairpins. The final library contained every possible combination of a UBS, a counterloop, and a DBS.
[0285] Python script for the generation of agRNA libraries for specific contexts. Sequences disclosed within the Python script include TACCAGGACCCGCTCAATGTC (SEQ ID NO: 78), GCGCGGCTTCGCGC (SEQ ID NO: 79), GCGCGGCTCGCGC (SEQ ID NO: 80), GCGCGCTTCGCGC (SEQ ID NO: 81), GCGCGTTCGCGC (SEQ ID NO: 82), GCGCGTCGCGC (SEQ ID NO: 83), CGCGGCTTCGCG (SEQ ID NO: 84), CGCGCGCGGCTCGCG (SEQ ID NO: 85), CGCGCTTCGCG (SEQ ID NO: 86), CGCGTTCGCG (SEQ ID NO: 87), and GCGGCTTCGC (SEQ ID NO: 88):

```python
import openpyxl

# Target sequence (SEQ ID NO: 78)
input_sequence = "TACCAGGACCCGCTCAATGTC"

# Counterloop sequences
loop_sequences = [
    "GCGCGGCTTCGCGC", "GCGCGGCTCGCGC", "GCGCGCTTCGCGC", "GCGCGTTCGCGC",
    "GCGCGTCGCGC", "CGCGGCTTCGCG", "CGCGCGCGGCTCGCG", "CGCGCTTCGCG",
    "CGCGTTCGCG", "CGCGTCGCG", "GCGGCTTCGC", "GCGGCTCGC", "GCGCTTCGC",
    "GCGTTCGC", "GCGTCGC", "CGGCTTCG", "CGGCTCG", "CGCTTCG", "CGTTCG",
    "CGTCG", "GGCTTC", "GGCTC", "GCTTC", "GTTC", "GTC", "CCCC", "GGGG",
    "CCC", "GG", "CC", "GG", "C", "G",
]

# Reverse complement the input sequence
complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
reverse_complement = "".join(complement[base] for base in input_sequence[::-1])

UBS = set()  # upstream binding sequences
DBS = set()  # downstream binding sequences

# Every substring of the first 10 bases (indices 0-9) of the reverse complement -> UBS
for i in range(10):
    for j in range(i + 1, 11):
        UBS.add(reverse_complement[i:j])

# Every substring of the last 10 bases (indices 11-20) of the reverse complement -> DBS
for i in range(11, 21):
    for j in range(i + 1, 22):
        DBS.add(reverse_complement[i:j])

# Sets already hold only unique sequences; sort them for a reproducible order
UBS = sorted(UBS)
DBS = sorted(DBS)

# Every UBS + counterloop + DBS combination
combinations = [u + loop + d for u in UBS for loop in loop_sequences for d in DBS]

# Write the combinations to an Excel file, one per row
workbook = openpyxl.Workbook()
worksheet = workbook.active
for combination in combinations:
    worksheet.append([combination])
workbook.save("combinations.xlsx")

print("Unique Base Sequences (UBS):")
print(UBS)
print()
print("Unique Downstream Base Sequences (DBS):")
print(DBS)
print()
print("All possible combinations of UBS+Loop+DBS:")
print(combinations)
```
[0290] The agRNA library for DNMT1 was ordered as an Agilent DNA Oligo Pool. The oligos for the DNMT1 library contained a Gibson overhang, the gRNA, the gRNA scaffold, the agRNA library sequence, and a terminator followed by a short DNA sequence used as a primer binding site. The target for the DNMT1 library was already cloned on the plasmid used as the backbone. The DNA Oligo Pool library was amplified by PCR with the oligos Lib_F and Lib_R. The backbone pU6-tevopreq1-GG-acceptor (Addgene #174038) was PCR amplified using the oligos SplitF and SplitR. The backbone PCR product was DpnI digested overnight at 37 °C, and both PCR products were purified using the New England Biolabs Monarch PCR & DNA Cleanup Kit according to the manufacturer's protocol. The fragments were assembled using the New England Biolabs Gibson Assembly® Master Mix at a 10:1 library-to-backbone ratio with 150 ng of backbone DNA according to the manufacturer's protocol. 2 µL of the Gibson assembly mix were directly transformed into Lucigen's Endura Competent Cells and, after recovery in 1 mL of SOC medium, plated on carbenicillin agar plates poured in Nunc™ Square BioAssay Dishes (Cole-Parmer #EW-01929-00). For cloning into the backbone LentiGuide-Hygro (Addgene #139462), the library was amplified from the pU6-tevopreq1-GG-acceptor. The backbone was digested using PspXI and Esp3I and cloned by Gibson assembly following the previously described protocol.
[0291] For each library, colonies representing at least 10x library coverage were washed off the plates using LB medium and then spun down. Plasmid DNA was extracted from the resulting pellet using the QIAGEN® Plasmid Plus Midi Kit according to the manufacturer's protocol.
DNMT1 library testing in HEK cells
[0292] HEK293T cells were purchased from ATCC and maintained in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies) at 37 °C and 5% CO2. 20 million cells were seeded in a 225mm3 dish and co-transfected the following day using Lipofectamine 3000 (Thermo Fisher) with an amount of library plasmid corresponding to 1 plasmid per cell and 20 µg of the nucleobase editor pCMV-T7-ABE8e-nSpCas9-P2A-EGFP (KAC978) (Addgene #185910). Genomic DNA was collected from cells 5 days after transfection.
Context library generation
[0293] For the A-to-G nucleobase editor, ~12,000 different gRNAs targeting pathogenically relevant mutations (Arbab et al. 2020) were cloned as a library to test the performance of different hairpins and nucleobase editors.
[0294] The generation of these context libraries differed from that of the agRNA library, because extensive recombination events occurred when the gRNA, gRNA scaffold, agRNA and target were introduced as one oligo. The gRNA and target, with 11 bp upstream and 25 bp downstream of the native genomic context, were instead cloned as an oligo lacking the gRNA scaffold and hairpin. In their place, the oligo had two outward-facing BsaI cutting sites with 10 randomized base pairs at that position. The DNA Oligo Pool libraries were amplified by PCR with the oligos Lib_F and Lib_R. The backbone sgBbsI (p2Tol-U6-2xBbsI-sgRNA-HygR) (Addgene #71485) was PCR amplified using the oligos BB_R and BB_F. The backbone PCR product was DpnI digested overnight at 37 °C, and both PCR products were purified using the New England Biolabs Monarch PCR & DNA Cleanup Kit according to the manufacturer's protocol. The fragments were assembled using the New England Biolabs Gibson Assembly® Master Mix at a 10:1 library-to-backbone ratio with 150 ng of backbone DNA according to the manufacturer's protocol. 2 µL of the Gibson assembly mix were directly transformed into Lucigen's Endura Competent Cells and, after recovery, plated on agar plates poured in Nunc™ Square BioAssay Dishes. For each library, colonies representing at least 10x library coverage were washed off the plates using LB medium and then spun down. Plasmid DNA was extracted from the resulting pellet using the QIAGEN® Plasmid Plus Midi Kit according to the manufacturer's protocol.
[0295] The library was then digested using BsaI according to the manufacturer's protocol and gel purified using the New England Biolabs Monarch Gel Extraction Kit according to the manufacturer's protocol. The gRNA scaffold, hairpin and terminator, flanked up- and downstream by inward-facing BsaI cutting sites, were ordered as cloned gene synthesis from IDT. This plasmid was also BsaI digested, and the fragment was purified using the New England Biolabs Monarch Gel Extraction Kit according to the manufacturer's protocol. The insert and library were ligated using New England Biolabs T4 DNA Ligase according to the manufacturer's protocol with 150 ng backbone and a 10:1 insert-to-backbone ratio.
2 µL of the ligation mix were directly transformed into Lucigen's Endura Competent Cells and, after recovery, plated on agar plates poured in Nunc™ Square BioAssay Dishes. For each library, colonies representing at least 10x library coverage were washed off the plates using LB medium and then spun down. Plasmid DNA was extracted from the resulting pellet using the QIAGEN® Plasmid Plus Midi Kit according to the manufacturer's protocol.
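This two-step cloning relies on BsaI cutting outside its recognition site: BsaI recognizes GGTCTC and cuts 1 nt downstream on that strand and 5 nt downstream on the complementary strand, leaving a 4-nt 5' overhang. A minimal Python sketch that lists BsaI sites on both strands and reports each overhang as top-strand sequence, which could be used to check that a library oligo carries two outward-facing sites; the function name and the toy oligo are hypothetical.

```python
def bsai_sites(seq):
    """List BsaI sites as (strand, index of recognition site, 4-nt overhang).

    BsaI recognizes GGTCTC and cuts N1/N5 downstream of it. A site on the
    bottom strand appears as GAGACC on the given top strand and cuts upstream.
    Overhangs are reported as top-strand sequence; sites whose cut would fall
    outside seq are skipped.
    """
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 5):
        word = seq[i:i + 6]
        if word == "GGTCTC" and i + 11 <= len(seq):
            sites.append(("+", i, seq[i + 7:i + 11]))
        elif word == "GAGACC" and i >= 5:
            sites.append(("-", i, seq[i - 5:i - 1]))
    return sites


# Hypothetical oligo: left flank, two outward-facing sites around a 10-nt stuffer, right flank
oligo = "TTACGCAT" + "GAGACC" + "ACGTACGTAC" + "GGTCTC" + "A" + "GTTT" + "CCGG"
print(bsai_sites(oligo))  # → [('-', 8, 'CGCA'), ('+', 24, 'GTTT')]
```

Digesting such an oligo releases the stuffer together with both sites, and the two reported overhangs are the ones the inward-facing sites on the scaffold, hairpin and terminator insert must match.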
Path_Var library testing in HEK cells
[0296] HEK293T cells were purchased from ATCC and maintained in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies) at 37 °C and 5% CO2.
[0297] For stable Tol2 transposon-mediated library integration, 5 million cells (~400x coverage) were seeded in 175 mm3 dishes. The following day, cells were co-transfected using Lipofectamine 3000 (Thermo Fisher) with 10 µg of Tol2 transposase plasmid (pCMV-Tol2, Addgene #31823) and 10 µg of the Path_Var library. To generate stable library cell lines, cells were selected with hygromycin (25 µg/mL) starting the day after transfection and continuing for 2-3 weeks or longer. Next, 10 µg of nucleobase editor was transfected using Lipofectamine 3000 into 2.5 million cells (~200x coverage) that had been seeded the day before in a 100 mm3 dish. Genomic DNA was collected from cells 5 days after transfection.
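The coverage values quoted here follow from dividing the number of seeded cells by the library size. A short Python sketch of that arithmetic, using the approximate library size of 12,000 stated above; it assumes every seeded cell can carry a library member, so the result is an upper bound that ignores transfection and integration efficiency. The function name is hypothetical.

```python
def library_coverage(cells, library_size):
    """Average number of seeded cells per library member (upper bound on coverage)."""
    return cells / library_size


PATH_VAR_LIBRARY_SIZE = 12_000  # approximate size stated in the text

integration = library_coverage(5_000_000, PATH_VAR_LIBRARY_SIZE)
editing = library_coverage(2_500_000, PATH_VAR_LIBRARY_SIZE)
print(round(integration), round(editing))  # → 417 208
```

Both values are consistent with the ~400x and ~200x coverage given for the integration and editor-transfection steps.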
Illumina sequencing and bioinformatic analysis of the libraries
[0298] To sequence the libraries before (as quality control) and after testing in HEK cells, 1 ng of the isolated plasmid DNA (QuickExtract DNA Extraction Solution, Biosearch Technologies) was amplified using the oligo mix IllSeq_DNMT1_i5_F1-4 and IllSeq_DNMT1_i7_R1-4 with New England Biolabs Q5® High-Fidelity 2X Master Mix according to the manufacturer's protocol. The forward and reverse oligos contained an i5/i7 overhang for indexing as well as 4-7 Ns to stagger the reads, so that amplicons with high sequence identity could be sequenced. The resulting PCR products were amplified in a second PCR reaction using a compatible combination of the New England Biolabs NEBNext® Multiplex Oligos for Illumina® (96 Unique Dual Index Primer Pairs). The PCR products were purified using the New England Biolabs NEB Monarch® Gel Extraction Kit and quantified via Invitrogen™ Qubit™. A 4 nM pool of the different libraries was generated, and 10% 4 nM PhiX was added. The pool was sequenced using the Illumina MiSeq Reagent Kit v2 (300 cycles) according to the manufacturer's protocol.
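Qubit reports a mass concentration, so pooling at 4 nM requires converting to molarity. A minimal Python sketch using the common approximation of 660 g/mol per base pair of double-stranded DNA; the amplicon length, measured concentration and function names are hypothetical.

```python
AVG_G_PER_MOL_PER_BP = 660.0  # approximate molar mass of one dsDNA base pair


def ng_per_ul_to_nm(conc_ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration (ng/µL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_G_PER_MOL_PER_BP * length_bp)


def dilution_volumes(conc_ng_per_ul, length_bp, target_nm, final_ul):
    """Return (library µL, diluent µL) giving target_nm in a final volume of final_ul."""
    stock_nm = ng_per_ul_to_nm(conc_ng_per_ul, length_bp)
    if stock_nm < target_nm:
        raise ValueError("stock is more dilute than the target concentration")
    library_ul = target_nm * final_ul / stock_nm
    return library_ul, final_ul - library_ul


# Hypothetical amplicon: 10 ng/µL by Qubit, 400 bp including adapters
lib_ul, diluent_ul = dilution_volumes(10, 400, target_nm=4, final_ul=20)
print(f"{lib_ul:.3f} µL library + {diluent_ul:.3f} µL diluent")
```

Each indexed library would be brought to 4 nM this way before pooling and adding the PhiX spike-in.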
Genome editing of genomic loci
[0299] HEK293T, HeLa and HepG2 cells were purchased from ATCC and maintained in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies) at 37 °C and 5% CO2. 20,000 cells were seeded in 96-well plates (Corning) and transfected the following day using jetOPTIMUS® (Polyplus) according to the manufacturer's instructions. 50 ng of sgRNA or agRNA (both cloned in pU6-tevopreq1-GG-acceptor) were co-transfected with 150 ng of nucleobase editor, and cells were harvested after three days for Sanger sequencing (Genewiz) or high-throughput sequencing (Quintara Biosciences or in-house Illumina MiSeq).
Evaluation of the anchor library
[0300] The efficiency and precision of the nucleobase editor in combination with the anchor sequences were evaluated using custom scripts developed in R. The quality of the NGS reads was assessed before further processing. Variant calling was then applied to distinguish the perfect edit (conversion of the adenine at position 13 to guanine) from bystander edits, which encompassed any conversion of the other adenines, alone or in combination with A13. Anchor sequences yielding fewer than 20 reads were excluded to ensure robustness of the analysis. Furthermore, a quantitative score was devised and calculated using the following formula:
$$\text{Score} = \frac{\%\ \text{Perfect Edit}}{\left(\%\ \text{Perfect Edit} + \%\ \text{Bystander}\right)^{2}}$$
Anchors achieving the highest scores and at least 20% overall editing efficiency were further characterized experimentally.
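A minimal Python sketch of the filtering and scoring step, taking the score formula given above literally, with the denominator squared. Assumptions: reads have already been classified per anchor as perfect, bystander or unedited; "overall editing efficiency" is taken to be % perfect + % bystander; the function name and example counts are hypothetical. The source's implementation is in R.

```python
def anchor_scores(counts, min_reads=20, min_efficiency=20.0):
    """Rank anchors by Score = %perfect / (%perfect + %bystander) ** 2.

    counts: dict anchor_id -> (perfect_reads, bystander_reads, unedited_reads).
    Anchors with fewer than min_reads total reads, or with overall editing
    (%perfect + %bystander) below min_efficiency, are dropped.
    """
    ranked = []
    for anchor, (perfect, bystander, unedited) in counts.items():
        total = perfect + bystander + unedited
        if total < min_reads:
            continue
        pct_perfect = 100.0 * perfect / total
        pct_bystander = 100.0 * bystander / total
        edited = pct_perfect + pct_bystander
        if edited < min_efficiency:
            continue
        ranked.append((anchor, pct_perfect / edited ** 2))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical read counts per anchor
counts = {"a1": (30, 10, 60), "a2": (5, 1, 10), "a3": (10, 5, 85), "a4": (50, 50, 0)}
print(anchor_scores(counts))
```

Here "a2" is dropped for having fewer than 20 reads and "a3" for editing below 20%, leaving "a1" ranked above "a4".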
Evaluation of the context library
[0301] The evaluation of the context library involved analyzing gRNA and agRNA libraries, which comprised approximately 12,000 spacer sequences and their respective contextual sequences. The efficiency and editing profiles for each gRNA and agRNA were established using custom scripts developed in R. First, the target sites — where each spacer binds within the context — were extracted from the NGS reads. Subsequently, for each spacer in the library, all combinations of adenine to guanine conversions were aligned against these extracted sequences. Spacers with fewer than 25 total reads were excluded from the analysis. To quantify overall editing efficiency for the different nucleobase editors, the mean A to G conversion rate was calculated by averaging the editing frequencies at each targeted position.
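A minimal Python sketch of the averaging step, assuming target sequences have already been extracted from the reads and have the same length as the unedited reference, and that every A in the reference counts as a targeted position (the source does not define the targeted positions further). The function name and toy reads are hypothetical; the source's analysis was done in R.

```python
def mean_a_to_g(reference, reads, min_reads=25):
    """Mean A>G conversion over the adenine positions of one target site.

    reference: unedited target sequence.
    reads: target sequences extracted from NGS reads.
    Returns None when fewer than min_reads usable reads remain or no A is present.
    """
    usable = [r for r in reads if len(r) == len(reference)]
    if len(usable) < min_reads:
        return None
    a_positions = [i for i, base in enumerate(reference) if base == "A"]
    if not a_positions:
        return None
    # Editing frequency at each A: fraction of usable reads carrying G there
    per_position = [sum(r[i] == "G" for r in usable) / len(usable) for i in a_positions]
    return sum(per_position) / len(per_position)


reads = ["CAAT"] * 20 + ["CGAT"] * 5 + ["CGGT"] * 5
print(mean_a_to_g("CAAT", reads))
```

In this toy example the two adenines are converted in 10 and 5 of 30 reads, so the mean conversion rate is about 0.25.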
Generation of the selection phages for PANCE
[0302] The PANCE selection phages carry the CDS of the ABE8e adenine deaminase in place of the pIII CDS. The deaminase CDS includes part of the peptide linker sequence and a C-terminally fused intein CDS, so that the phage encodes only this relatively small protein rather than the whole nucleobase editor. The phages were generated by PCR amplifying the ABE8e adenine deaminase, including the partial peptide linker sequence, using the oligonucleotides ABE_M13_F and ABE_M13_R. The N-terminal Npu DnaE intein was ordered as a gBlock and amplified using the oligonucleotides Npu_ABE_F and Npu_M13_R. The phage backbone was amplified using the oligonucleotides GOI_M13_F and GOI_M13_R with wild-type M13 phage genomic DNA as template. All PCRs were performed using NEB Q5® High-Fidelity 2X Master Mix according to the manufacturer's protocol, with 1 ng of template DNA and an annealing temperature of 60 °C. All fragments were digested with DpnI (NEB) overnight at 37 °C in the PCR buffer and purified the next day using the NEB Monarch® PCR & DNA Cleanup Kit. The fragments were assembled using the NEB Gibson Assembly® Master Mix according to the manufacturer's protocol, and 3 µL of the reaction were directly transformed into electrocompetent S2060 pJC175e cells. The cells were recovered in 500 µL SOC medium for 45 min, after which 450 µL and 50 µL were each mixed with 500 µL of freshly grown S2060 pJC175e cells. The cells were immediately mixed with 3 mL soft LB agar (0.7 %) and plated on LB bottom agar plates containing 100 µg/mL carbenicillin. The plates were incubated at 37 °C overnight. Plaques were picked into 50 µL 2xYT medium, and 1 µL was used as template for colony PCR using the oligonucleotides ABE_M13_F and Npu_M13_R. Positive phages were amplified by adding the remaining 2xYT medium to a freshly grown S2060 pJC175e culture at an OD600 of 0.4 and cultivating for 16 hours at 37 °C. The cultures were spun down to remove the E. coli cells, and the phages were precipitated by adding a 20% polyethylene glycol (8000) and 2.5 M sodium chloride solution to the culture supernatant at a 1:4 ratio. The mixture was incubated for at least 3 hours at 4 °C, and the phage pellet was resuspended in PBS buffer. The phage titer was quantified using the Progen Phage Titration ELISA kit, and the phages were stored at 4 °C until use. Additionally, 3 mL of the culture supernatant were used for phage DNA isolation using the Omega Bio-tek E.Z.N.A.® M13 DNA Mini Kit. The isolated DNA was sent to Plasmidsaurus for whole phage DNA sequencing.
Generation of the selection cells for PANCE
[0303] The selection plasmids were designed on the basis of using pJC175e and adding mutations that when edited by the ABE base editor, only perfect edits restore Pill activity while bystander lower pill activity. The pJC175e backbone was amplified using the oligonucleotides pIII_gBlock_R and pJC175e_Cas_F. The part of the pill CDS containing the mutation followed by the corresponding guide correcting the introduced mutation downstream as well as the C-terminal DnaE intein necessary to fuse the ABE8e adenine base editor encoded by the phage to the Cas9 encoded by the selection plasmid were ordered as gBlock. The three different gBlocks for the three different selection plasmids each encoding a different pill mutation were amplified via PCR using the oligonucleotides gBlock_R and gBlock_pIII_F. The base Cas9 CDS was amplified from ABE8e plasmid (Addgene #138489) using the oligonucleotides BE_Npu_F and BE_pJC175e_R. All PCRs were performed using NEB Q5® High-Fidelity 2X Master Mix according to the manufacturer’s protocol using 1 ng of template DNA and an annealing temperature of 60 °C. All fragments were digested with Dpnl (NEB) overnight at 37 °C in the PCR buffer and PCR purified using the NEB Monarch® PCR & DNA Cleanup Kit the next day. The fragments were assembled using the NEB Gibson Assembly® Master Mix according to the manufacturer’s protocol and transformed into electrocompetent S2060 competent cells. The cells were recovered in 500 pL SOC media for Ih and after that plated on LB agar plates with 100 g/ML carbenicillin and incubated overnight at 37 °C. Colonies were screened via colony PCr and positive clones were sent to whole plasmid sequencing. The clones with verified sequence were used to generate electrocompetent cells that were then transformed with the mutation plasmid MP4 (Badran). The cells were recovered in 1 mL SOC media and plated on 2xYT agar plates containing 1 % glucose, 100 pg/mL carbenicillin and 25 pg/mL chloramphenicol. 
Five colonies were used to start 50 mL shake-flask cultivations in 2xYT with 1 % glucose, 100 µg/mL carbenicillin and 25 µg/mL chloramphenicol. After 16 hours, the cultures were used to freeze 20 % glycerol stocks in 1 mL aliquots. Plasmid DNA was also isolated from each culture for whole-plasmid sequencing by Plasmidsaurus, in order to select the glycerol stocks with no mutations in MP4 or the selection plasmid.
Phage-Assisted Non-Continuous Evolution (PANCE)
[0304] The evolution was performed as 10 consecutive batch cultivations in triplicate, using a mix of three different selection plasmids in each evolution. For the PANCE experiment, three 3 mL overnight cultures were prepared the day before the cultivation in 2xYT media with 1 % glucose, 100 µg/mL carbenicillin and 25 µg/mL chloramphenicol, each inoculated with the glycerol stock of one of the selection plasmids. The following day, 3-4 hours before phage infection, 50 mL shake flasks were inoculated to a combined OD600 of 0.1 with the pooled overnight cultures carrying the different selection plasmids. The cells were cultivated in 2xYT with 100 µg/mL carbenicillin and 25 µg/mL chloramphenicol. Thirty minutes before reaching an OD600 of 0.4, the cells were induced with 0.5 % arabinose, and at an OD600 of 0.4 they were infected with the selection phages at an MOI of 1 for the first selection round. Each round was run for 12 hours at 37 °C, after which the entire culture was spun down and the supernatant was filtered through 0.2 µm filters. For selection rounds 2-4, 500 µL; for rounds 5-6, 100 µL; and for the remaining rounds, 5 µL of the supernatant were used to infect the following evolution round. The phage titer after each selection round was determined using the Progen Phage Titration ELISA kit. A 3 mL volume of each culture supernatant was used for phage DNA isolation with the Omega Bio-tek E.Z.N.A.® M13 DNA Mini Kit.
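The round-1 infection at an MOI of 1 and the shrinking transfer volumes in later rounds can be made concrete with a short calculation. The sketch below is illustrative only: the OD600-to-cell conversion (8 × 10^8 cells/mL per OD unit), the culture volume and the phage titer are assumptions to be replaced with calibrated values, and the function names are hypothetical.

```python
"""Illustrative PANCE infection arithmetic (assumed constants, hypothetical names)."""

# ASSUMPTION: OD600 = 1.0 corresponds to ~8e8 E. coli cells/mL. This is a common
# rule of thumb, not a value from the protocol; calibrate for the strain/reader.
CELLS_PER_ML_PER_OD = 8e8


def phage_volume_ul(od600: float, culture_ml: float,
                    titer_pfu_per_ml: float, moi: float = 1.0) -> float:
    """Volume of phage stock (µL) needed to infect a culture at the given MOI."""
    cells = od600 * CELLS_PER_ML_PER_OD * culture_ml
    phage_needed = moi * cells                       # plaque-forming units
    return phage_needed / titer_pfu_per_ml * 1000.0  # mL -> µL


def transfer_volume_ul(round_number: int) -> float:
    """Filtered supernatant (µL) used to infect a given round, per the schedule above."""
    if round_number < 2:
        raise ValueError("round 1 is infected at a defined MOI, not by transfer")
    if round_number <= 4:
        return 500.0
    if round_number <= 6:
        return 100.0
    return 5.0


if __name__ == "__main__":
    # Hypothetical 50 mL culture at OD600 0.4 and a 1e10 pfu/mL phage stock.
    print(round(phage_volume_ul(0.4, 50.0, 1e10), 1))
    print([transfer_volume_ul(r) for r in range(2, 11)])
```

Under these assumptions a 50 mL culture at OD600 0.4 would need about 1.6 mL of a 1 × 10^10 pfu/mL stock; the decreasing transfer volumes raise stringency because fewer phage carry over unless they replicate well.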
Illumina sequencing of the PANCE experiment
[0305] To sequence the PANCE variants, 1 ng of the isolated phage DNA from each selection round was amplified using the oligo mixes IllSeq_ABE_i5_F1-4 and IllSeq_ABE_i5_R1-4 with New England Biolabs Q5® High-Fidelity 2X Master Mix according to the manufacturer’s protocol. The forward and reverse oligos contained an i5/i7 overhang for indexing as well as 4-7 Ns to stagger the amplicons, increasing base diversity so that highly similar sequences could be sequenced. The resulting PCR products were amplified in a second PCR reaction using a compatible combination of the New England Biolabs NEBNext® Multiplex Oligos for Illumina® (96 Unique Dual Index Primer Pairs). The PCR products were purified using the New England Biolabs Monarch® Gel Extraction Kit and quantified via Invitrogen™ Qubit™. A 4 nM pool of the different libraries was generated, and 10 % 4 nM PhiX was added. The pool was sequenced using the Illumina MiSeq Reagent Kit v3 (600-cycle) according to the manufacturer’s protocol.
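Preparing the 4 nM pool with a PhiX spike-in relies on the standard conversion from mass concentration to molarity for double-stranded DNA. The helpers below are a sketch that assumes an average mass of 660 g/mol per base pair and reads "10 % PhiX" as 10 % of the final mixed volume when library and PhiX are both at 4 nM; both readings are assumptions, and the function names are hypothetical.

```python
"""Sketch: dsDNA molarity and pool/PhiX volumes (assumed constants, hypothetical names)."""
from typing import Tuple

AVG_G_PER_MOL_PER_BP = 660.0  # ASSUMPTION: average molar mass of one dsDNA base pair


def library_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a Qubit mass concentration (ng/µL) to molarity (nM)."""
    # ng/µL equals mg/L (1e-3 g/L); dividing by g/mol and converting mol/L to
    # nmol/L (x1e9) gives the overall factor of 1e6.
    return conc_ng_per_ul * 1e6 / (AVG_G_PER_MOL_PER_BP * mean_fragment_bp)


def dilution_volumes(conc_nM: float, target_nM: float,
                     final_ul: float) -> Tuple[float, float]:
    """Return (library µL, diluent µL) needed to reach target_nM in final_ul."""
    if conc_nM < target_nM:
        raise ValueError("library is already below the target concentration")
    lib_ul = target_nM * final_ul / conc_nM
    return lib_ul, final_ul - lib_ul


def phix_volume_ul(pool_ul: float, phix_fraction: float = 0.10) -> float:
    """Volume of equimolar PhiX so that it makes up phix_fraction of the final mix."""
    return pool_ul * phix_fraction / (1.0 - phix_fraction)


if __name__ == "__main__":
    conc = library_nM(10.0, 500.0)  # hypothetical 10 ng/µL, ~500 bp amplicons
    print(round(conc, 2), dilution_volumes(conc, 4.0, 100.0), phix_volume_ul(540.0))
```

For example, under this interpretation 540 µL of the 4 nM pool plus about 60 µL of 4 nM PhiX gives a 10 % spike-in.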
Prediction of evolutionarily plausible mutations with machine learning
[0306] We followed the approach used by Hie et al., which consists of using an ensemble of six protein language models: ESM-1b (Rives et al.) and ESM-1v (Meier et al.), wherein ESM-1v is composed of five models (accessible at: github.com/facebookresearch/esm). Together, the six models were used to predict which amino acids, other than the wild-type amino acid itself, would be more likely to fill a particular amino acid position of an “input” protein sequence. The number of models in the ensemble that agreed on a given prediction (i.e., a specific amino acid substitution; e.g., 4 out of 6 versus 2 out of 6) determined the score given to that substitution, and a higher-scoring substitution was considered more likely to yield a positive result (e.g., an evolved protein that retained function/activity).
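The consensus scoring described above can be sketched as follows. This is a minimal illustration, assuming each model has already produced a per-position probability distribution over amino acids (obtaining these from ESM-1b/ESM-1v is not shown) and that a model "recommends" a substitution when it gives the mutant residue a higher probability than the wild type. The function names and toy probabilities are hypothetical, not the authors' code.

```python
"""Sketch of ensemble-consensus scoring of substitutions (hypothetical data layout)."""
from typing import Dict, List, Tuple

# One model's output: for each position (0-based), a mapping amino acid -> probability.
PositionProbs = List[Dict[str, float]]


def consensus_scores(wt_seq: str, models: List[PositionProbs]) -> Dict[str, int]:
    """Count, for every substitution, how many models rank it above the wild type.

    Keys use 1-based positions in the usual notation, e.g. "R26G".
    """
    scores: Dict[str, int] = {}
    for probs in models:
        if len(probs) != len(wt_seq):
            raise ValueError("each model must provide one distribution per residue")
        for i, (wt_aa, dist) in enumerate(zip(wt_seq, probs)):
            p_wt = dist.get(wt_aa, 0.0)
            for aa, p in dist.items():
                if aa != wt_aa and p > p_wt:
                    key = f"{wt_aa}{i + 1}{aa}"
                    scores[key] = scores.get(key, 0) + 1
    return scores


def ranked(scores: Dict[str, int]) -> List[Tuple[str, int]]:
    """Sort substitutions by number of agreeing models (descending), then by name."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Two toy "models" over a two-residue sequence; values are made up.
    m1 = [{"A": 0.2, "G": 0.5, "C": 0.3}, {"C": 0.9, "S": 0.1}]
    m2 = [{"A": 0.4, "G": 0.6}, {"C": 0.3, "S": 0.7}]
    print(ranked(consensus_scores("AC", [m1, m2])))
```

With six models the score ranges from 1 to 6, matching the scale of the values reported for TadA-8e.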
[0307] The ensemble of protein language models was applied to the TadA-8e sequence, which yielded the following predictions (scores in parentheses): R26G (6), F84L (6), N108D (6), Y149F (6), F156K (6), V106A (5), P152R (5), H8D (5), N157K (5), R111T (4), C146S (3), R111A (2), C146Q (2), M151E (2), C146K (1), Y123H (1), M151Q (1), A48P (1), V155E (1), S109P (1), P152Q (1).
Computational analysis of structural impact on ABE8e
[0308] The structure of ABE8e (PDB: 6vpc) was visualized and analyzed using the software ChimeraX (version 1.6.1 (2023-05-09)). Structural models of the ABE mutants described herein were generated from the structure of ABE8e (PDB: 6vpc) using the mutagenesis feature on the software ChimeraX (version 1.6.1 (2023-05-09)).
[0309] Using the command line in ChimeraX, target residues were visualized and mutated, and their interactions were analyzed. The interactions of nucleotides 5-8 in the gRNA with residues of ABE8e were also assessed. Furthermore, hydrogen bonds, non-polar (van der Waals) interactions between carbon atoms in the gRNA and protein at a maximum distance of 3.8 Å, and cationic interactions between nitrogen atoms within 5 Å of an aromatic carbon involving the target residues and nucleotides were identified.
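The distance criteria above (carbon-carbon contacts within 3.8 Å; nitrogen atoms within 5 Å of an aromatic carbon) can be illustrated geometrically. The sketch below is not ChimeraX code and does not attempt hydrogen-bond geometry; it simply filters atom pairs by element and distance on toy coordinates, and the `Atom` type and residue labels are hypothetical.

```python
"""Geometric sketch of the distance cutoffs used for contact analysis (toy coordinates)."""
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Atom:
    residue: str                      # hypothetical label, e.g. "V28" or "G6"
    element: str                      # "C", "N", "O", ...
    xyz: Tuple[float, float, float]   # coordinates in Å
    aromatic: bool = False            # True for carbons in aromatic rings


def pairs_within(a_atoms: List[Atom], b_atoms: List[Atom], cutoff: float,
                 keep_a: Callable[[Atom], bool],
                 keep_b: Callable[[Atom], bool]) -> List[Tuple[str, str, float]]:
    """Return (residue_a, residue_b, distance) for filtered atom pairs within cutoff."""
    hits = []
    for a in a_atoms:
        if not keep_a(a):
            continue
        for b in b_atoms:
            if keep_b(b):
                d = math.dist(a.xyz, b.xyz)
                if d <= cutoff:
                    hits.append((a.residue, b.residue, round(d, 2)))
    return hits


def _is_carbon(atom: Atom) -> bool:
    return atom.element == "C"


def carbon_contacts(protein: List[Atom], rna: List[Atom], cutoff: float = 3.8):
    """Non-polar C...C contacts between protein and gRNA atoms."""
    return pairs_within(protein, rna, cutoff, _is_carbon, _is_carbon)


def cation_aromatic(cations: List[Atom], rings: List[Atom], cutoff: float = 5.0):
    """Nitrogen atoms lying within the cutoff of an aromatic carbon."""
    return pairs_within(cations, rings, cutoff,
                        lambda a: a.element == "N",
                        lambda b: b.element == "C" and b.aromatic)
```

Calling `cation_aromatic` in both directions (protein nitrogens against nucleotide rings, and vice versa) would cover interactions in which either partner supplies the nitrogen.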
Software
[0310] Statistics and plots: Prism 10 (GraphPad Software LLC). Graphical abstracts were generated with BioRender.com.
References for Examples 1-5
1. Walton, Russell T et al. Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants. Science. 368(6488), 290-296 (2020).
2. Chen L, Zhang S, Xue N, et al. Engineering a precise adenine base editor with minimal bystander editing. Nat Chem Biol. 19(1), 101-110 (2023).
3. Sternberg, S. H., LaFrance, B., Kaplan, M. & Doudna, J. A. Conformational control of DNA target cleavage by CRISPR-Cas9. Nature. 527, 110-113 (2015).
4. Nishimasu, H. et al. Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell. 156, 935-949 (2014).
5. Epinat, J.-C. et al. A novel engineered meganuclease induces homologous recombination in yeast and mammalian cells. Nucleic Acids Res. 31, 2952-2962 (2003).
6. Kim, Y. B. et al. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat. Biotechnol. 35, 371-376 (2017).
7. Liu, Z. et al. Efficient base editing with high precision in rabbits using YFE-BE4max. Cell Death Dis. 11, 36 (2020).
8. Doman, J. L., Raguram, A., Newby, G. A. & Liu, D. R. Evaluation and minimization of Cas9-independent off-target DNA editing by cytosine base editors. Nat. Biotechnol. 38, 620-628 (2020).
9. Zhou, C. et al. Off-target RNA mutation induced by DNA base editing and its elimination by mutagenesis. Nature. 571, 275-278 (2019).
10. Zhao, Dongdong et al. Imperfect guide-RNA (igRNA) enables CRISPR single-base editing with ABE and CBE. Nucleic Acids Research. 50(7), 4161-4170 (2022).
11. Alves, C. R. R., Ha, L. L., Yaworski, R., et al. Base editing as a genetic treatment for spinal muscular atrophy. Preprint. bioRxiv. 2023.01.20.524978 (2023).
12. Chu, V. T. et al. Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells. Nat. Biotechnol. 33, 543-548 (2015).
13. Carlson-Stevermer, J. et al. Assembly of CRISPR ribonucleoproteins with biotinylated oligonucleotides via an RNA aptamer for precise gene editing. Nat. Commun. 8, 1711 (2017).
14. Miller, Shannon M et al. Phage-assisted continuous and non-continuous evolution. Nature Protocols. 15(12), 4101-4127 (2020).
15. Lapinaite, A. et al. DNA capture by a CRISPR-Cas9-guided adenine base editor. Science. 369, 566-571 (2020).
16. Richter, M. F. et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat. Biotechnol. (2020).
17. Weiss, Gregory A et al. Comprehensive mutagenesis of the C-terminal domain of the M13 gene-3 minor coat protein: the requirements for assembly into the bacteriophage particle. Journal of Molecular Biology. 332(4), 777-782 (2003).
18. Badran, A., Liu, D. Development of potent in vivo mutagenesis plasmids with broad mutational spectra. Nat Comm. 6, 8425 (2015).
19. Arbab, Mandana et al. Determinants of Base Editing Outcomes from Target Library Analysis and Machine Learning. Cell. 182(2), 463-480 (2020).
20. Sanchez-Rivera, Francisco J et al. Base editing sensor libraries for high-throughput engineering and functional analysis of cancer-associated single nucleotide variants. Nature Biotechnology. 40(6), 862-873 (2022).
21. Carlson JC, Badran AH, Guggiana-Nilo DA & Liu DR. Negative selection and stringency modulation in phage-assisted continuous evolution. Nat Chem Biol. 10, 216-222 (2014).
22. Esvelt KM, Carlson JC & Liu DR. A system for the continuous directed evolution of biomolecules. Nature. 472, 499-503 (2011).
23. Hubbard BP et al. Continuous directed evolution of DNA-binding proteins to improve TALEN specificity. Nat Methods. 12, 939-942 (2015).
24. Packer MS, Rees HA & Liu DR. Phage-assisted continuous evolution of proteases with altered substrate specificity. Nat Comm. 8, 956 (2017).
25. Lahens, Nicholas L., et al. A comparison of Illumina and Ion Torrent sequencing platforms in the context of differential gene expression. BMC Genomics. 18, 1-13 (2017).
26. Wang JY, Doudna JA. CRISPR technology: A decade of genome editing is only the beginning. Science. 379, 251 (2023).
Example 6:
[0311] Unlocking the full potential of base editing for therapeutic applications requires the development of a robust and adaptable methodology for designing novel base editing strategies that are both efficient and precise. Accordingly, the present disclosure describes a multifaceted approach to designing a base editing system that minimizes bystander editing while maintaining high efficiency. In particular, the methods described herein integrate three complementary techniques: (1) gRNA engineering to reduce bystander editing, (2) phage-assisted non-continuous evolution (PANCE) (17) to selectively evolve base editors with improved precision, and (3) protein language models to rationally engineer deaminases with optimized activity.
[0312] For gRNA engineering, a library of 3'-extended sgRNAs, or anchor guide RNAs (agRNAs), was designed and tested to improve the precision of ABEs. agRNA candidates from this library screen were then used as part of a Phage-Assisted Non-Continuous Evolution (PANCE) system to evolve a more precise TadA-8e enzyme. Using a dual selection pressure (favoring precise editing at the target site while penalizing bystander edits), several variants with narrowed editing windows were identified. Notably, the PANCE-evolved V28C variant exhibited enhanced on-target efficiency while reducing bystander editing. Editing patterns across ~12K pathogenic mutations demonstrated that V28C is ~2-3-fold more precise and ~20% more efficient than ABE8e.
[0313] Machine learning (ML) was applied to computationally design TadA-8e mutants with improved precision and efficiency. Among these, the M151E mutation narrowed the editing window while increasing on-target editing.
Example 7: 3'-gRNA extensions restrict editing window and reduce bystander editing
[0314] Without being bound by theory, the mechanism underlying adenosine deamination involves engagement of the TadA-8e domain with the exposed single-stranded region of the PAM-distal non-target strand (18) (FIG. 21A). The TadA-8e deaminase, when tethered to the Cas9 protein, produces characteristic editing patterns within a narrow DNA region spanning several base pairs. This tethering restricts the enzyme to acting on certain nucleotides, defining what is known as the editing window.
[0315] Thus, without being bound by theory, the present disclosure describes anchor guide RNAs (agRNAs) that stabilize the DNA strand within the active site. The agRNAs were designed by adding nucleotides to the 3' end of the gRNA scaffold, as a library of short sequences and entire hairpin structures (counter-loops) on the side opposite the editing loop, to introduce structures that may sterically restrict the movement of the DNA and the TadA enzyme even further (Fig. 1A, Fig. S1B). This design process yielded an agRNA library containing ~60K candidates. To screen the library in a high-throughput manner, a lentiviral vector was constructed with the editing target downstream of the agRNA. Because the agRNA and the target lie within a 300-nucleotide window, next-generation sequencing (NGS) of this region after editing could be used to analyze the editing pattern of each library clone (FIG. 16B).
[0316] The tested library was designed to target a site in the human DNMT1 locus, which was an optimal candidate for screening because of both its high accessibility for editing in HEK293T cells and its multiple adenines within the editing window. The ABE8e-spCas9-WT base editor in combination with a non-3'-extended guide (sgRNACtrl) showed high editing efficiency for the four adenines in the editing window (A13, A14, A15 and A17). The library was then designed to edit A13.
[0317] First, agRNAs (termed “anchors”) showing higher efficiency and lower bystander editing in the DNMT1 agRNA library were selected (FIG. 21C). Five candidates were selected for testing in the native context in HEK293T cells, and all of them showed a decrease in bystander editing (FIG. 16C).
[0318] A precision score (PreS) was introduced to select clones that showed higher on-target editing and reduced bystander editing: PreS = (% on-target edit / average % bystander editing) × (% on-target edit of the variant / % on-target edit of ABE8e). Clone 56114 (agRNA56114) showed a higher PreS at A13, with an editing reduction of 44% at A17 and 34% at A16 (FIGs. 16C and 16D, and FIG. 21D). NGS abundance analysis showed a 2.4-fold (± 0.2) increase in A13/A15 editing and a 1.5-fold (± 0.18) increase in A13 editing (Fig. S1E). To test the reproducibility of this effect in different cell lines, agRNA56114 was tested in HeLa and HepG2 cells, yielding similar bystander reduction patterns (FIG. 21F).
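The precision score combines two ratios: on-target editing relative to the mean bystander editing (specificity) and on-target editing relative to ABE8e (efficiency). A minimal sketch of the formula follows; the input values in the example are hypothetical, and how the original analysis handled zero bystander editing is not stated, so the sketch simply rejects that case.

```python
"""Sketch of the precision score (PreS) defined in [0318]."""
from typing import Sequence


def precision_score(on_target_pct: float, bystander_pcts: Sequence[float],
                    on_target_abe8e_pct: float) -> float:
    """PreS = (on-target / mean bystander) * (on-target variant / on-target ABE8e).

    All inputs are editing percentages at individual adenines.
    """
    if not bystander_pcts:
        raise ValueError("at least one bystander position is required")
    avg_bystander = sum(bystander_pcts) / len(bystander_pcts)
    if avg_bystander == 0 or on_target_abe8e_pct == 0:
        # The definition does not specify behaviour for zero denominators.
        raise ValueError("PreS is undefined for a zero denominator")
    return (on_target_pct / avg_bystander) * (on_target_pct / on_target_abe8e_pct)


if __name__ == "__main__":
    # Hypothetical: 40% at the target A, bystanders at 10/10/20%, ABE8e at 50%.
    print(precision_score(40.0, [10.0, 10.0, 20.0], 50.0))
```

Because the efficiency term is normalized to ABE8e, a variant that merely suppresses all editing scores poorly even if its bystander editing is low.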
Example 8: Precision-driven selection circuit enables evolution of highly accurate adenine base editors
[0319] To refine editing further, the TadA-8e enzyme was evolved to decrease bystander edits, and the effect of combining the new variants with agRNAs on the editing pattern was evaluated.
[0320] Phage-assisted non-continuous evolution (PANCE) has previously been used to increase the activity of ABEs (19). A selection method was designed to evolve variants that produce fewer bystander edits and that work with agRNAs. Specifically, the selection pressure decreases the phage titer in response to bystander edits but increases it upon “perfect editing” (i.e., editing only the “target” base), which prevents the evolution of an inactive TadA enzyme. To achieve this, the activity of the base editor encoded on the M13 phage genome was linked to the expression of gene III (encoding the pIII protein), which is required for phage replication (20). Since the effect of pIII amino acid exchanges on phage infectivity is well characterized (21), groups of 2-3 amino acids whose exchange drastically decreases phage replication were identified. The codons of selected amino acids in pIII were then mutated to be T/A rich, which makes these contexts responsive to A•T-to-G•C editing. Three contexts were identified in which bystander edits would decrease the phage titer, while the perfect edit generates an adaptive mutation that also selects for on-target efficiency (FIG. 17A, FIG. 22A). Each identified pIII variant was cloned into an individual plasmid (SP1-3) (Fig. 2B). The accessory plasmid encoded the corresponding agRNA targeting the single nucleotide variant (SNV), as well as a C-terminal Intein-nCas9-SpRY.
Once the selection phage carrying the N-terminal Intein-TadA-8e fusion protein infects the cell, a functional base editor can be expressed, capable of performing the editing.
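The logic of the dual selection can be illustrated with a toy replicator model: each variant's phage output is the outcome-weighted average of what the perfect, unedited and bystander outcomes do to pIII, and variant frequencies are reweighted each round. The outcome weights and variant profiles below are assumptions chosen only to reflect the stated ordering (perfect edit restores pIII, bystander edits reduce it further); the model ignores mutation, drift and titer effects and is not a description of the actual experiment.

```python
"""Toy replicator model of the PANCE dual-selection logic (all weights assumed)."""
from typing import Dict, Mapping, Tuple

# ASSUMED relative phage output per editing outcome: the perfect edit restores pIII,
# unedited pIII stays defective, and bystander edits damage pIII further.
OUTCOME_WEIGHTS: Mapping[str, float] = {"perfect": 1.0, "unedited": 0.1, "bystander": 0.02}


def variant_fitness(p_perfect: float, p_bystander: float,
                    weights: Mapping[str, float] = OUTCOME_WEIGHTS) -> float:
    """Expected relative phage output for a variant with the given outcome rates."""
    p_unedited = 1.0 - p_perfect - p_bystander
    if p_unedited < 0:
        raise ValueError("outcome fractions must sum to at most 1")
    return (p_perfect * weights["perfect"]
            + p_bystander * weights["bystander"]
            + p_unedited * weights["unedited"])


def run_rounds(freqs: Dict[str, float],
               outcomes: Dict[str, Tuple[float, float]],
               n_rounds: int) -> Dict[str, float]:
    """Reweight variant frequencies by fitness each round and renormalize."""
    current = dict(freqs)
    for _ in range(n_rounds):
        weighted = {v: f * variant_fitness(*outcomes[v]) for v, f in current.items()}
        total = sum(weighted.values())
        current = {v: w / total for v, w in weighted.items()}
    return current


if __name__ == "__main__":
    # Hypothetical variants: (fraction perfect, fraction bystander) per infection.
    profiles = {"precise": (0.30, 0.05), "sloppy": (0.30, 0.40), "inactive": (0.0, 0.0)}
    start = {name: 1 / 3 for name in profiles}
    print(run_rounds(start, profiles, 10))
```

In this toy setting, a precise variant outcompetes an equally active but sloppy one, while an inactive enzyme is depleted because unedited pIII remains defective.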
[0321] Phage DNA was isolated from each PANCE round and sequenced using NGS (FIG. 22B). An increase in phage titer was observed after round 4, which suggested an increased activity of the enzyme mutants encoded by the phage (FIG. 22C).
[0322] The 3 replicates showed almost identical enrichment of the same variant over the course of the evolution. The sequencing data also showed a slow enrichment of different mutants with amino acid exchanges, with a maximum enrichment of 3-4% of the new amino acid compared to the wildtype (FIG. 22D).
[0323] Once the TadA mutational landscape was determined, the 50 most enriched mutations were selected, and their editing patterns were individually tested at the human DNMT1 locus in HEK293 cells (FIG. 17C). Variants were detected with both higher efficiency and reduced bystander editing at position A13 compared with the ABE8e-SpRY base editor (FIG. 17C). The PANCE-evolved TadA variants were also benchmarked against the ABE9 base editor (both WT and SpRY), which showed low editing efficiency and no relative bystander reduction at the DNMT1 site (FIG. 17C). Variants carrying V28C (SEQ ID NO: 15) and L34W (SEQ ID NO: 16) showed the highest PreS with both sgRNACtrl and agRNA56114 (FIGs. 17B and 17C).
[0324] NGS abundance analysis demonstrated that ABE8e rarely achieved perfect editing (FIG. 17D). Unlike ABE8e-WT, the SpRY version did not reduce bystander editing with agRNA56114, but it did increase editing efficiency (FIG. 17D). The primary outcome for both V28C (SEQ ID NO: 15) and L34W (SEQ ID NO: 16) was perfect editing, with averages of 9.3% and 10.9%, respectively, when used with sgRNACtrl (FIG. 17D). V28C in combination with agRNA56114 not only reduced bystander editing but also increased perfect editing to 24.4%. L34W increased on-target editing to 18% and kept all bystander editing below 5% (FIG. 17D). For L34W in combination with agRNA56114, 80.8% (± 0.5) of the reads carried the perfect edit (Fig. 2E). V28C (SEQ ID NO: 15) with agRNA56114 (SEQ ID NO: 2) showed 53.4% (± 4.6) of reads with perfect edits and the highest fold-change improvement compared with ABE8e with sgRNACtrl (47.24 ± 1.95) (Fig. 2F). Both variants were enriched across all rounds of PANCE evolution (FIG. 17G). Cas9-WT base editors carrying these mutations also showed increased precision (FIG. 22E).
[0325] The increased on-target editing of V28C (SEQ ID NO: 15) was accompanied by Cas9-dependent DNA off-target editing similar to that of ABE8e-SpRY across four sites (FIG. 17H).
Next, an orthogonal R-loop assay was performed to assess Cas9-independent off-target editing, and a substantial decrease was observed for both variants across 5 different sites (FIG. 17I). RNA off-target editing was analyzed by RNA sequencing. The use of the anchor reduced A-to-I deamination by 3.6-fold (FIG. 22F). When combined with the evolved variants, the reduction was even greater: 14.2- and 22.7-fold for V28C (with sgRNACtrl and agRNA56114, respectively) and 18.33- and 21.9-fold for L34W (FIG. 22F).
Example 9: Machine Learning-guided design reveals overengineering constraints in ABE8e and unveils novel precise mutations
[0326] Improving protein fitness traditionally involves screening a vast library of random mutations and selecting those that enhance a desired function under specific pressures (directed evolution). However, the PANCE experiment demonstrated that even with increased fitness, specific mutational patterns across protein families were essential for evolvability. In general, most random mutations would likely be destabilizing or neutral for protein function (FIG.18 A).
[0327] In contrast, protein language models trained on massive, non-redundant protein sequence datasets learn these general, evolutionarily plausible mutational patterns (22, 23). This knowledge can be leveraged to predict mutations likely to be beneficial, guiding protein evolution more efficiently. Following training, these models can be used to predict the probability distribution of each amino acid at any given position along a protein sequence, where the probability distribution reflects the knowledge acquired by the models on their training dataset. Positions where the model assigns a higher probability to an amino acid than the wild-type residue are considered more likely than a random pick to yield a positive effect on the protein fitness. This offered a potential alternative to traditional directed evolution methods.
[0328] In line with this, an ensemble of protein language models (PLMs) (22) was used to predict mutations likely to have a positive effect on the TadA-8e sequence, since combining predictions from multiple models has been shown to increase prediction quality, notably for the task of increasing protein fitness (22, 23). Twenty-one evolutionarily plausible mutations were identified, and their editing patterns were individually tested at the DNMT1 site (FIG. 18A). A different editing pattern was observed, favoring A15, in contrast to the PANCE-evolved variants, for which A13 showed improved efficiency (FIG. 18B). M151E showed the highest PreS toward A15 with both sgRNACtrl and agRNA56114. Nine of the twenty-one predicted mutations reverted to the amino acid found in the ancestral TadA (FIG. 18C). Eight of these reversions (A48P, F84L, V106A, Y123H, C146S, P152R, V155E and N157K) showed increased PreS, while N108D abolished ABE8e editing at the DNMT1 site (FIG. 18C). These data suggest a possible overengineering of the ABE8e enzyme.
[0329] NGS abundance analysis of M151E editing showed increased precision at position A15, with the highest fold-change in combination with agRNA56114 (25.7-fold ± 2.4), and a 15.4-fold (± 1.3) increase at A13 (FIGs. 18D and 18E).
[0330] Next, the PANCE-derived variant V28C (SEQ ID NO: 15; highest PreS) was “machine-learning evolved” (ML-evolved) using the same approach. The ML-derived mutation with the highest PreS was combined with the PANCE-derived V28C mutation to determine whether the ML approach could be additive to PANCE. The V28C-M151E variant (SEQ ID NO: 18) showed reduced editing efficiency at the DNMT1 site but precise editing in other contexts, such as Site9 (FIG. 18G, FIGs. 23A and 23B).
[0331] The M151E mutation was then cross-referenced with the amino acid exchanges observed during PANCE. When the amino acid substitutions at position 151 were evaluated, an enrichment in aspartic acid was observed across the different rounds of evolution (FIG. 18H and 18G). Both glutamic acid and aspartic acid are negatively charged amino acids, and they delivered similar editing patterns at the DNMT1 site (FIG. 18J).
Example 10: ABE8e-V28C achieves superior precision and efficiency across diverse genomic sites
[0332] To further assess the performance of ABE variants across diverse genomic contexts, editing patterns were analyzed across a library of human pathogenic sites, including NGG and NG PAMs (Fig. 4D) (14). Thousands of pathogenic point mutations that are correctable via adenine base editing were integrated in HEK293T cells (FIG. 24A). Overall, the V28C variant (SEQ ID NO: 15) demonstrated a ~20% increase in on-target activity (A15) and a 2.5- to 3-fold increase in precision when the editing context contained more than two or three adenines, respectively (FIGs. 19A-19C, FIGs. 24B-24F).
[0333] Other variants that performed well at the DNMT1 site, including L34W, D54T, and I95D, also exhibited enhanced precision but with reduced on-target efficiency (FIGs. 19A-19C). M151E (SEQ ID NO: 17), in turn, improved on-target efficiency but with a wider editing window than V28C (FIGs. 19A-19C, FIGs. 24B-24F). The addition of the M151E mutation to the V28C variant drastically affected on-target efficiency (FIGs. 24B-24F).
[0334] ABE8e has been reported to favor deamination in YA contexts (Y = pyrimidine; T or C), whereas RA contexts (R = purine; A or G) are more challenging substrates (16). Although recently developed variants have improved potency and expanded sequence compatibility, they often do so at the expense of precision because of wider editing windows. In contrast, the V28C variant (SEQ ID NO: 15) demonstrated superior editing in both YA and RA contexts while maintaining a narrower editing window (FIG. 19D). No significant C-to-N changes were detected across the sites compared with ABE8e (FIG. 24G).
[0335] The library-based findings were validated by benchmarking the V28C variant (SEQ ID NO: 15) against ABE8e at 12 different sites in the human genome in HEK293T cells (FIGs. 19E-19N, FIGs. 24H and 24I). Consistently, the V28C variant refined the deaminase’s editing window, improving precision at every tested site (FIGs. 19E-19N). Notably, it also exhibited higher editing activity at the most frequently edited adenine (FIGs. 19E-19N). V28C produced a constrained 4-A editing window, yielding an average 27.1% increase in on-target efficiency (FIGs. 19O and 19P).
Example 11: V28C drastically improves precise correction of pathogenic variants in iPSCs
[0336] NGS abundance analysis revealed that the V28C mutation significantly enriched single-base deamination across multiple loci. Structural modeling of the wild-type enzyme predicts that H57 and C87 establish three van der Waals interactions and two hydrogen bonds with 8-Az(26) (FIG. 20A). Without being bound by theory, in the V28C mutant, C28 may shift closer to 8-Az(26), reducing the measured distance to below 5.101 Å. In line with this, a 0.77 Å decrease in the distance between residue 28 and 8-Az(26) was detected (FIG. 20A). This structural rearrangement likely constrains the active site, enhancing substrate specificity while reducing off-target deamination. By restricting the flexibility of the deaminase domain, V28C narrows the editing window and minimizes bystander editing without compromising catalytic efficiency.
[0337] The functional impact of this enhanced precision was assessed by testing the V28C variant (SEQ ID NO: 15) for generating a splicing disruption in PCSK9, a therapeutic target for lowering LDL levels and reducing the risk of coronary heart disease (24). Using a previously validated gRNA targeting the exon 1-intron 1 splice donor site (25), a loss-of-function mutation was introduced (FIG. 20B). Here, even in the absence of bystander editing, the V28C variant (SEQ ID NO: 15) exhibited an average editing efficiency of 75.7% (± 5.0), outperforming ABE8e by 19.7% (FIGs. 20C and 20D).
[0338] To further evaluate the precision of the V28C variant (SEQ ID NO: 15) in a clinically relevant setting, the SNCA E46K mutation was targeted, which causes early-onset Parkinson’s disease by promoting α-synuclein aggregation and neuronal toxicity (FIG. 20E). The target sequence shares 45% identity with the DNMT1 site used to identify agRNA56114 (FIG. 20F). As expected, the V28C variant (SEQ ID NO: 15) demonstrated superior precision in reverting the pathogenic mutation, achieving 11.6% (± 0.8) perfect edits (% of edited reads) compared with just 0.65% (± 0.14) with ABE8e, representing a 17.6-fold increase in precision (FIG. 20J). The combination of V28C with agRNA56114 (SEQ ID NO: 2) further improved precision, yielding 17.5% (± 1.06) perfect edits, a 26.6-fold enhancement over ABE8e alone (FIGs. 20I-20J). Introducing the PLM-predicted M151E mutation into V28C further increased perfect editing to 47.4% when combined with sgRNACtrl and 53.0% with agRNA56114 (calculated from edited reads).
Example 12: Methods
Bacterial media, reagents and plasmids used in the study.
[0339] LB and 2xYT media were prepared using MP Biomedicals™ media capsules according to the manufacturer’s protocol. For LB and 2xYT agar, 16 g/L agar was added for standard agar and 7 g/L agar for soft agar. All media were sterilized by autoclaving. Oligos/primers and plasmids used in the study can be found in Table A. All gRNAs were cloned using KLD (NEB) cloning according to the manufacturer’s protocol.
agRNA library generation
[0340] The agRNA library consists of an upstream binding sequence (UBS) that is the reverse complement of the sequence downstream of the target, a counter-loop, and a downstream binding sequence (DBS) that binds the sequence upstream of the target. The UBS and DBS vary in length and bind different regions within 1 to 11 bp upstream and downstream of the target. The counter-loop library consists of 33 different DNA sequences, the longer of which form GC-rich hairpins. The final library contains every possible combination of UBS, counter-loop (hairpin), and DBS. A script to generate the hairpin library for a novel context can be found in supplementary code 1.
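The combinatorial design can be sketched as a Cartesian product of binding arms and counter-loops. This is an independent illustration, not the authors' supplementary code 1: it assumes each binding arm is the reverse complement of a window lying inside an 11-bp flank of the target, that the extension is written 5'-UBS-loop-DBS-3', and the flank sequences, loop set and arm lengths in the example are hypothetical.

```python
"""Sketch of combinatorial agRNA 3'-extension design (hypothetical inputs)."""
from itertools import product
from typing import Iterable, List, Set

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def binding_arms(flank: str, lengths: Iterable[int]) -> Set[str]:
    """Reverse complements of every window of the given lengths inside a flank."""
    arms: Set[str] = set()
    for n in lengths:
        for start in range(len(flank) - n + 1):
            arms.add(revcomp(flank[start:start + n]))
    return arms


def agrna_extensions(upstream_flank: str, downstream_flank: str,
                     loops: Iterable[str], lengths: Iterable[int]) -> List[str]:
    """Every UBS + counter-loop + DBS combination (5'->3'), deduplicated.

    UBS binds the sequence downstream of the target; DBS binds the upstream one.
    """
    lengths = list(lengths)
    ubs = binding_arms(downstream_flank, lengths)
    dbs = binding_arms(upstream_flank, lengths)
    return sorted({u + loop + d for u, loop, d in product(ubs, loops, dbs)})


if __name__ == "__main__":
    # Hypothetical 11-bp flanks, two loops (one empty) and 4-nt arms.
    lib = agrna_extensions("ACGTACGTACG", "TTTTGGGGCCC", ["GAAA", ""], [4])
    print(len(lib), lib[:3])
```

Deduplication matters because repetitive flanks yield identical windows; with 33 loops and a range of arm lengths, the product grows quickly, which is consistent with a library in the tens of thousands.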
[0341] The agRNA library for DNMT1 was ordered as an Agilent DNA Oligo Pool (64,610 oligos). The oligos for the DNMT1 library contained a Gibson overhang, gRNA, gRNA scaffold, agRNA library sequence and a terminator, followed by a short DNA sequence used as a primer binding site. The target for the DNMT1 library was already cloned on the plasmid used as backbone. The DNA Oligo Pool library was amplified with the oligos Lib_F and Lib_R via PCR. The backbone pU6-tevopreq1-GG-acceptor (Addgene #174038) was PCR amplified using the oligos SplitF and SplitR. The PCR product of the backbone was DpnI digested overnight at 37 °C, and both PCR products were purified using the New England Biolabs Monarch PCR & DNA Cleanup Kit according to the manufacturer’s protocol. The fragments were assembled using the New England Biolabs Gibson Assembly® Master Mix at a 10:1 library-to-backbone ratio with 150 ng of backbone DNA, according to the manufacturer’s protocol. 2 µL of the Gibson assembly mix were directly transformed into Lucigen's Endura Competent Cells and, after recovery in 1 mL of SOC media, plated on carbenicillin agar plates poured in Nunc™ Square BioAssay Dishes (Cole-Parmer #EW-01929-00). For cloning into the backbone lentiGuide-Puro (Addgene #52963), the library was amplified from pU6-tevopreq1-GG-acceptor.
The backbone was digested using PspXI and Esp3I and cloned by Gibson assembly following the previously described protocol.
[0342] For each library, colonies representing at least 10x library coverage were washed off the plates using LB media and spun down. Plasmid DNA was extracted from the resulting pellet using the QIAGEN® Plasmid Plus Midi Kit according to the manufacturer’s protocol.
Lentiviral particles generation
[0343] HEK293T cells were purchased from ATCC and maintained in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies) at 37 °C and 5% CO2. Cells were seeded at a density of 5 × 10^6 cells per 10 cm plate in DMEM supplemented with 10% FBS and antibiotics (Thermo #15240062). The following day, cells were transfected using Transporter 5® Transfection Reagent with a plasmid mix containing pVSV-G (3.86 µg), pPax2 (8.57 µg), and the lentiviral transfer vector (9.23 µg) in Opti-MEM (Thermo Fisher). The DNA-Transporter complexes were incubated at room temperature for 20 minutes before being added to the culture media. After 24 hours, the media was replaced with fresh DMEM, and viral supernatants were collected at 48 and 72 hours post-transfection. The harvested media was filtered through a 0.45 µm vacuum filter system and concentrated using Lenti-X Concentrator (Takara Bio) at a 1:3 ratio (media:concentrator) by incubation at 4 °C for at least 30 minutes, followed by centrifugation at 1,500 x g for 45 minutes at 4 °C. The viral pellet was resuspended in phosphate-buffered saline (PBS), aliquoted, and stored at -80 °C until further use.
DNMT1 library testing in HEK cells
[0344] HEK293T cells were transduced with the lentiviral library at an MOI of 0.2. After 24-48 h, the media was removed and replaced with fresh media containing 2 µg/mL puromycin. Selection continued for ~14 days. To test the editing pattern across our library, 20 million cells (300X coverage) were seeded in a 225mm3 dish and transfected the following day using Lipofectamine 3000 (Thermo Fisher) with 20 µg of the base editor pCMV-T7-ABE8e-nSpCas9-P2A-EGFP (KAC978) (Addgene #185910). Genomic DNA was collected from cells 5 days after transfection.
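The low MOI and the stated coverage can be checked with standard arithmetic. The sketch below assumes that lentiviral integrations per cell follow a Poisson distribution (a common modeling assumption, not a measurement from this work) and uses the 64,610-oligo pool size given above; the function names are hypothetical.

```python
"""Poisson transduction and library-coverage arithmetic (standard assumptions)."""
import math
from typing import Dict


def poisson_pmf(moi: float, k: int) -> float:
    """Probability that a cell receives exactly k integrations at a given MOI."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def transduction_summary(moi: float) -> Dict[str, float]:
    """Fraction of cells transduced, and fraction of those carrying one construct."""
    infected = 1.0 - poisson_pmf(moi, 0)
    single = poisson_pmf(moi, 1)
    return {"infected": infected, "single_given_infected": single / infected}


def fold_coverage(n_cells: float, library_size: int) -> float:
    """Average number of cells per library member."""
    return n_cells / library_size


def min_units(library_size: int, fold: float) -> int:
    """Cells or colonies needed to reach a given fold coverage."""
    return math.ceil(library_size * fold)


if __name__ == "__main__":
    print(transduction_summary(0.2))
    print(round(fold_coverage(20e6, 64_610), 1), min_units(64_610, 10))
```

At an MOI of 0.2, roughly 18% of cells are transduced and about 90% of those carry a single agRNA-target construct, which keeps each sequenced editing outcome attributable to one library member. Twenty million cells correspond to roughly 310 cells per oligo, in line with the stated 300X coverage, and 10x colony coverage of the plasmid library corresponds to about 646,000 colonies.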
Evaluation of the anchor library
[0345] The efficiency and precision of the base editor in combination with the anchor sequences from the library were evaluated using custom scripts developed in R. The quality of the reads from the NGS samples was assessed before further processing. Variant calling was then applied to distinguish the perfect edit (conversion of the adenine at position 13 to guanine) from bystander edits, which encompassed any conversion of the other adenines, alone or in combination with A13. Anchors yielding fewer than 20 reads were excluded to ensure the robustness of the analysis. In addition, a quantitative score was devised and calculated using the following formula:
[0346] $$\text{Score} = \frac{\%\ \text{Perfect Edit}}{\left(\%\ \text{Perfect Edit} + \%\ \text{Bystander}\right)^{2}}$$
Anchors that achieved the highest scores and showed at least 20% overall editing efficiency were further characterized experimentally.
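The read classification and score described above can be sketched in Python. The original analysis used custom R scripts, which are not reproduced here. The sketch makes several assumptions:

- Reads are already aligned, without gaps, to the same reference window as the anchor's target.
- Positions are 1-based, so A13 corresponds to `target_pos=13`.
- Any A-to-G change other than A13 alone (with or without A13) counts as bystander.
- Non-A mismatches are ignored.

All function names are hypothetical.

```python
from collections import Counter

def classify_read(ref: str, read: str, target_pos: int) -> str:
    """Label an aligned read as 'perfect', 'bystander', 'unedited' or 'unaligned'."""
    if len(read) != len(ref):
        return "unaligned"
    converted = {i + 1 for i, (r, q) in enumerate(zip(ref, read)) if r == "A" and q == "G"}
    if not converted:
        return "unedited"
    return "perfect" if converted == {target_pos} else "bystander"

def anchor_score(ref, reads, target_pos=13, min_reads=20):
    """Return (score, % perfect, % bystander), or None if an anchor has too few reads."""
    counts = Counter(classify_read(ref, r, target_pos) for r in reads)
    n = sum(v for label, v in counts.items() if label != "unaligned")
    if n < min_reads:
        return None
    perfect = 100.0 * counts["perfect"] / n
    bystander = 100.0 * counts["bystander"] / n
    total = perfect + bystander
    score = perfect / total ** 2 if total > 0 else 0.0
    return score, perfect, bystander

def passes_followup(result, min_total_editing=20.0):
    """Anchors must also reach at least 20% overall editing to be followed up."""
    return result is not None and result[1] + result[2] >= min_total_editing
```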
Context library generation
[0347] For the A-to-G base editor, ~12,000 different gRNAs targeting pathogenic mutations (14; Arbab et al. 2020) were cloned as a library to test the performance of different hairpins and base editors.
[0348] The context libraries were generated differently from the agRNA libraries, because extensive recombination occurred when the gRNA, gRNA scaffold, agRNA, and target were introduced as a single oligo. To avoid recombination, the gRNA and the target (with 11 bp upstream and 25 bp downstream of native genomic context) were cloned as an oligo lacking the gRNA scaffold and hairpin. In their place, the oligo carried two outward-facing BsaI sites separated by 10 randomized base pairs. The DNA Oligo Pool libraries were amplified by PCR with the oligos Lib_F and Lib_R. The backbone sgBbsI (p2Tol-U6-2xBbsI-sgRNA-HygR) (Addgene #71485) was PCR-amplified using the oligos BB_R and BB_F. The backbone PCR product was DpnI-digested overnight at 37 °C, and both PCR products were purified using the New England Biolabs Monarch PCR & DNA Cleanup Kit according to the manufacturer’s protocol. The fragments were assembled with the New England Biolabs Gibson Assembly® Master Mix according to the manufacturer’s protocol, using 150 ng of backbone DNA and a 10:1 library-to-backbone ratio. 2 µL of the Gibson assembly mix were transformed directly into Lucigen's Endura Competent Cells and, after recovery, plated on agar plates poured in Nunc™ Square BioAssay Dishes. For each library, colonies representing at least 10x library coverage were washed off the plates with LB medium and pelleted by centrifugation. Plasmid DNA was extracted from the resulting pellet using the QIAGEN® Plasmid Plus Midi Kit according to the manufacturer’s protocol.
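The following is a minimal sketch of the context-oligo layout described above: the spacer, then a stuffer of two outward-facing BsaI sites (GAGACC ... GGTCTC, since BsaI recognizes GGTCTC and cuts 3' of it) around 10 random bases, then the target with its genomic flanks. The sketch also checks that the spacer and target contain no internal BsaI site, which would otherwise be cut during the later scaffold insertion. The Lib_F/Lib_R primer arms are omitted because their sequences are not given. The function names are hypothetical.

```python
import random

BSAI_FWD = "GGTCTC"  # BsaI site; cuts downstream (3') of this strand
BSAI_REV = "GAGACC"  # reverse complement; cuts upstream on this strand

def count_bsai_sites(seq: str) -> int:
    """Count BsaI recognition sites on either strand."""
    seq = seq.upper()
    return sum(seq[i:i + 6] in (BSAI_FWD, BSAI_REV) for i in range(len(seq) - 5))

def build_context_oligo(spacer: str, target_context: str,
                        rng: random.Random, max_tries: int = 100) -> str:
    """spacer + GAGACC + N10 + GGTCTC + target_context (primer arms omitted)."""
    if count_bsai_sites(spacer) or count_bsai_sites(target_context):
        raise ValueError("internal BsaI site in spacer or target context")
    for _ in range(max_tries):
        filler = "".join(rng.choice("ACGT") for _ in range(10))
        oligo = spacer + BSAI_REV + filler + BSAI_FWD + target_context
        if count_bsai_sites(oligo) == 2:  # reject fillers that create extra sites
            return oligo
    raise RuntimeError("no filler without extra BsaI sites found")
```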
[0349] The library was then digested with BsaI and gel-purified using the New England Biolabs Monarch Gel Extraction Kit, each according to the manufacturer’s protocol. The gRNA scaffold, hairpin, and terminator, flanked up- and downstream by inward-facing BsaI sites, were ordered as a cloned gene synthesis from IDT. This plasmid was also BsaI-digested, and the fragment was purified using the New England Biolabs Monarch Gel Extraction Kit according to the manufacturer’s protocol. The insert and library were ligated using New England Biolabs T4 DNA Ligase according to the manufacturer’s protocol, with 150 ng of backbone and a 10:1 insert-to-backbone ratio. 2 µL of the ligation mix were transformed directly into Lucigen's Endura Competent Cells and, after recovery, plated on agar plates poured in Nunc™ Square BioAssay Dishes. For each library, colonies representing at least 10x library coverage were washed off the plates with LB medium and pelleted by centrifugation. Plasmid DNA was extracted from the resulting pellet using the QIAGEN® Plasmid Plus Midi Kit according to the manufacturer’s protocol.
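Insert:backbone ratios such as the 10:1 used here are usually expressed as molar ratios, so the required insert mass scales with the ratio of fragment lengths. The short sketch below performs that conversion using the common approximation of 660 g/mol per base pair. The fragment lengths in the example are placeholders because the text does not state them.

```python
BP_MASS_G_PER_MOL = 660.0  # approximate mass of one base pair of dsDNA

def pmol_from_ng(mass_ng: float, length_bp: int) -> float:
    """Convert a dsDNA mass to picomoles."""
    return mass_ng * 1000.0 / (BP_MASS_G_PER_MOL * length_bp)

def insert_mass_ng(backbone_ng: float, backbone_bp: int,
                   insert_bp: int, molar_ratio: float) -> float:
    """Insert mass giving `molar_ratio` insert molecules per backbone molecule."""
    return backbone_ng * (insert_bp / backbone_bp) * molar_ratio

# Placeholder lengths (not from the source): 6,000 bp backbone, 300 bp insert
ng = insert_mass_ng(150, 6000, 300, 10)
print(f"{ng:.1f} ng insert, {pmol_from_ng(ng, 300):.3f} pmol")
```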
Path_Var library testing in HEK cells
[0350] HEK293T were purchased from ATCC and maintained in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies) and kept at 37°C and 5% CO2.
[0351] For stable Tol2 transposon-mediated library integration, 5 million cells (~400X coverage) were seeded in 175mm3 dishes. The following day, cells were co-transfected using Lipofectamine 3000 (Thermo Fisher) with 10 µg of Tol2 transposase plasmid (pCMV-Tol2, Addgene #31823) and 10 µg of the Path_Var library. To generate stable library cell lines, cells were selected with hygromycin (25 µg/mL) starting the day after transfection and continuing for at least 2-3 weeks. Subsequently, 10 µg of base editor was transfected using Lipofectamine 3000 into 2.5 million cells (~200X coverage) that had been seeded the day before in a 100mm3 dish. Genomic DNA was collected from cells 5 days after transfection.
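The stated coverages follow from dividing the number of cells by the library size. The quick check below assumes the ~12,000-member Path_Var library and treats every seeded cell as carrying one library member. Transfection, integration, and selection losses are not modeled, so the effective coverage will be lower.

```python
def fold_coverage(n_cells: int, library_size: int) -> float:
    """Cells per library member."""
    return n_cells / library_size

LIBRARY_SIZE = 12_000  # approximate number of gRNAs in the Path_Var library
steps = {"library integration": 5_000_000, "base-editor transfection": 2_500_000}
for step, cells in steps.items():
    print(f"{step}: ~{fold_coverage(cells, LIBRARY_SIZE):.0f}x")
# library integration: ~417x
# base-editor transfection: ~208x
```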
Illumina sequencing and bioinformatic analysis of the libraries
[0352] To sequence the libraries before testing in HEK cells (as quality control) and after it, 1 ng of isolated plasmid DNA (QuickExtract DNA Extraction Solution, Biosearch Technologies) was amplified with the oligo mix IllSeq_DNMT1_i5_F1-4 and IllSeq_DNMT1_i7_R1-4 using New England Biolabs Q5® High-Fidelity 2X Master Mix according to the manufacturer’s protocol. The forward and reverse oligos each contained an i5/i7 overhang for indexing and 4-7 Ns to stagger the reads, so that libraries of highly similar sequences could be sequenced. The resulting PCR products were amplified in a second PCR using a compatible combination of New England Biolabs NEBNext® Multiplex Oligos for Illumina® (96 Unique Dual Index Primer Pairs). The PCR products were purified using the New England Biolabs Monarch® Gel Extraction Kit and quantified with an Invitrogen™ Qubit™. A 4 nM pool of the different libraries was generated and 10% 4 nM PhiX was added. The pool was sequenced using the Illumina MiSeq Reagent Kit v2 (300 cycles) according to the manufacturer’s protocol.
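Building the 4 nM pool requires converting each Qubit reading (ng/µL) into a molar concentration using the amplicon length. The sketch below uses the common approximation of 660 g/mol per base pair. The concentration and length in the example are placeholders, not values from these experiments.

```python
BP_MASS_G_PER_MOL = 660.0  # approximate mass of one base pair of dsDNA

def dsdna_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Molar concentration (nM) of a dsDNA fragment from its mass concentration."""
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * length_bp)

def volume_for_dilution(stock_nM: float, target_nM: float, final_ul: float) -> float:
    """Stock volume (µL) to dilute to final_ul at target_nM (C1*V1 = C2*V2)."""
    if stock_nM < target_nM:
        raise ValueError("stock is more dilute than the target")
    return target_nM * final_ul / stock_nM

stock = dsdna_nM(10.0, 350)  # placeholder: 10 ng/µL, 350 bp amplicon
print(f"stock {stock:.1f} nM; use {volume_for_dilution(stock, 4.0, 20.0):.2f} µL in 20 µL")
```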
Evaluation of the context library
[0353] Evaluating the context library involved analyzing gRNA libraries comprising approximately 12,000 spacer sequences and their respective context sequences. The efficiency and editing profile of each gRNA were determined using custom scripts developed in R. First, the target sites (where each spacer binds within its context) were extracted from the NGS reads. Then, for each spacer in the library, all combinations of adenine-to-guanine conversions were aligned against these extracted sequences. Spacers with fewer than 25 total reads were excluded from the analysis. To quantify the overall editing efficiency of the different base editors, the mean A-to-G conversion rate was calculated by averaging the editing frequencies across the targeted positions.
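The following is a Python sketch of the per-spacer summary (the original analysis used R). It assumes that the extracted target sites are already aligned, without gaps, to the reference protospacer. It also treats every adenine in the reference as a targeted position; the actual window used is not stated, so this choice is an assumption. The function names are hypothetical.

```python
def a_to_g_rates(ref: str, reads: list[str]) -> dict[int, float]:
    """Percent A-to-G conversion at each reference adenine (1-based positions)."""
    usable = [r for r in reads if len(r) == len(ref)]
    if not usable:
        return {}
    return {
        i + 1: 100.0 * sum(r[i] == "G" for r in usable) / len(usable)
        for i, base in enumerate(ref) if base == "A"
    }

def mean_a_to_g(ref: str, reads: list[str], min_reads: int = 25):
    """Mean conversion across adenine positions, or None below the read cutoff."""
    usable = [r for r in reads if len(r) == len(ref)]
    if len(usable) < min_reads:
        return None
    rates = a_to_g_rates(ref, usable)
    return sum(rates.values()) / len(rates) if rates else None
```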
Genome editing of genomic loci
[0354] HEK293T, HeLa, and HepG2 cells were purchased from ATCC and maintained in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies) at 37 °C and 5% CO2. 20,000 cells were seeded in 96-well plates (Corning) and transfected the following day using jetOPTIMUS® (Polyplus) according to the manufacturer's instructions. 50 ng of sgRNA or agRNA (both cloned in BPK1520, Plasmid #65777) were co-transfected with 150 ng of base editor, and cells were harvested after three days for Sanger sequencing (Genewiz) or HTS (Quintara Biosciences or in-house Illumina MiSeq). HTS data were analyzed using CRISPResso and BE-Analyzer (CRISPR RGEN Tools) (26).
Generation of the selection phages for PANCE
[0355] The PANCE selection phages carry the CDS of the ABE8e adenine deaminase in place of the pIII CDS. The deaminase CDS includes part of the peptide linker and a C-terminally fused intein CDS, so that the phage encodes only the relatively small deaminase rather than the whole base editor. The phages were generated by PCR-amplifying the ABE8e adenine deaminase, including the partial peptide linker sequence, with the oligonucleotides ABE_M13_F and ABE_M13_R. The N-terminal Npu DnaE intein was ordered as a gBlock and amplified with the oligonucleotides Npu_ABE_F and Npu_M13_R. The phage backbone was amplified with the oligonucleotides GOI_M13_F and GOI_M13_R, using wild-type M13 phage genomic DNA as a template. All PCRs were performed using NEB Q5® High-Fidelity 2X Master Mix according to the manufacturer’s protocol, with 1 ng of template DNA and an annealing temperature of 60 °C. All fragments were digested with DpnI (NEB) overnight at 37 °C in the PCR buffer and purified with the NEB Monarch® PCR & DNA Cleanup Kit the next day. The fragments were assembled using the NEB Gibson Assembly® Master Mix according to the manufacturer’s protocol, and 3 µL of the reaction were transformed directly into electrocompetent S2060 (29) pJC175e cells. The cells were recovered in 500 µL SOC medium for 45 min, after which 450 µL and 50 µL were each mixed with 500 µL of freshly grown S2060 pJC175e cells. The cells were immediately mixed with 3 mL of soft LB agar (0.7%) and plated on LB bottom-agar plates containing 100 µg/mL carbenicillin. The plates were incubated at 37 °C overnight. Plaques were picked into 50 µL of 2xYT medium, and 1 µL was used as the template for colony PCR with the oligonucleotides ABE_M13_F and Npu_M13_R. Positive phages were amplified by adding the remaining 2xYT medium to a freshly grown S2060 pJC175e culture at an OD600 of 0.4 and cultivating for 16 h at 37 °C. The cultures were centrifuged to remove the E. coli cells, and the phages were precipitated by adding a solution of 20% polyethylene glycol 8000 and 2.5 M sodium chloride to the culture supernatant at a 1:4 ratio. The mixture was incubated for at least 3 h at 4 °C, and the phage pellet was resuspended in PBS. The phage titer was quantified using the Progen Phage Titration ELISA kit, and the phages were stored at 4 °C until use. In addition, 3 mL of the culture supernatant were used for phage DNA isolation with the Omega Bio-tek E.Z.N.A.® M13 DNA Mini Kit. The isolated DNA was sent to Plasmidsaurus for whole-phage DNA sequencing.
Generation of the selection cells for PANCE
[0356] The selection plasmids were designed on the basis of pJC175e (Addgene #79219) (27) by adding mutations such that, when the target is edited by the ABE base editor, only the perfect edit restores pIII activity, whereas bystander edits reduce pIII activity. The pJC175e backbone was amplified with the oligonucleotides pIII_gBlock_R and pJC175e_Cas_F. The following elements were ordered as a gBlock: the part of the pIII CDS containing the mutation; downstream of it, the corresponding guide that corrects the introduced mutation; and the C-terminal DnaE intein needed to fuse the phage-encoded ABE8e adenine deaminase to the Cas9 encoded by the selection plasmid. Each selection plasmid also encodes the agRNA that corrects the mutation in pIII (SP1 (F366L); SP2 (K360R); SP3 (141 IV)). The three gBlocks for the three selection plasmids, each encoding a different pIII mutation, were amplified by PCR with the oligonucleotides gBlock_R and gBlock_pIII_F. The Cas9 CDS was amplified from the ABE8e plasmid (Addgene #138489) with the oligonucleotides BE_Npu_F and BE_pJC175e_R. All PCRs were performed using NEB Q5® High-Fidelity 2X Master Mix according to the manufacturer’s protocol, with 1 ng of template DNA and an annealing temperature of 60 °C. All fragments were digested with DpnI (NEB) overnight at 37 °C in the PCR buffer and purified with the NEB Monarch® PCR & DNA Cleanup Kit the next day. The fragments were assembled using the NEB Gibson Assembly® Master Mix according to the manufacturer’s protocol and transformed into electrocompetent S2060 cells. The cells were recovered in 500 µL SOC medium for 1 h, plated on LB agar plates with 100 µg/mL carbenicillin, and incubated overnight at 37 °C. Colonies were screened by colony PCR, and positive clones were sent for whole-plasmid sequencing. Sequence-verified clones were used to prepare electrocompetent cells, which were then transformed with the mutagenesis plasmid MP4 (Addgene #69652) (28). The cells were recovered in 1 mL SOC medium and plated on 2xYT agar plates containing 1% glucose, 100 µg/mL carbenicillin, and 25 µg/mL chloramphenicol. Five colonies were used to start 50 mL shake-flask cultivations in 2xYT with 1% glucose, 100 µg/mL carbenicillin, and 25 µg/mL chloramphenicol. After 16 h, the cultivations were used to prepare 20% glycerol stocks in 1 mL aliquots. Plasmid DNA was also isolated from each culture for whole-plasmid sequencing by Plasmidsaurus, in order to select glycerol stocks with no mutations in MP4 or the selection plasmid.
PANCE
[0357] The evolution was performed as 10 consecutive batch cultivations in triplicate, using a mix of the three selection plasmids in each evolution. On the day before the PANCE cultivation, three 3 mL overnight cultures were prepared in 2xYT medium with 1% glucose, 100 µg/mL carbenicillin, and 25 µg/mL chloramphenicol, each inoculated with the glycerol stock of one of the selection plasmids. The following day, 3-4 h before phage infection, 50 mL shake flasks were inoculated with the pooled overnight cultures (one per selection plasmid) to a combined OD600 of 0.1. The cells were cultivated in 2xYT with 100 µg/mL carbenicillin and 25 µg/mL chloramphenicol. About 30 min before the cultures reached an OD600 of 0.4, the cells were induced with 0.5% arabinose. At an OD600 of 0.4, they were infected with the selection phages at an MOI of 1 for the first selection round. Each evolution round was run for 12 h at 37 °C, after which the entire culture was centrifuged and the supernatant was filtered through 0.2 µm filters. The following evolution round was infected with 500 µL of supernatant for selection rounds 2-4, 100 µL for rounds 5-6, and 5 µL for the remaining rounds. The phage titer after each selection round was determined using the Progen Phage Titration ELISA kit. 3 mL of each culture supernatant were used for phage DNA isolation with the Omega Bio-tek E.Z.N.A.® M13 DNA Mini Kit. The sequences of V28C, L34W, and V28C&M151E correspond to the amino acid sequences set forth in SEQ ID NOs: 15, 16, and 18.
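Infecting at a defined MOI requires an estimate of the cell number at OD600 0.4 and the phage titer from the ELISA. The minimal sketch below makes two assumptions:

- A conversion factor of 8 × 10^8 E. coli cells per mL per OD600 unit. This is a commonly used rule of thumb and should be calibrated for the strain and spectrophotometer.
- The ELISA titer (phage particles per mL) is treated as equivalent to infectious units.

The culture volume and titer in the example are placeholders.

```python
CELLS_PER_ML_PER_OD600 = 8e8  # assumed conversion; calibrate for strain/instrument

def phage_volume_ul(moi: float, od600: float, culture_ml: float,
                    titer_per_ml: float) -> float:
    """Volume of phage stock (µL) giving `moi` phage per cell."""
    cells = od600 * CELLS_PER_ML_PER_OD600 * culture_ml
    return moi * cells / titer_per_ml * 1000.0  # mL -> µL

# Placeholders: 10 mL culture at OD600 0.4, stock titer 1e11 particles/mL
print(f"{phage_volume_ul(1.0, 0.4, 10.0, 1e11):.1f} µL for MOI 1")
```

After the first round, the protocol transfers fixed supernatant volumes (500, 100, then 5 µL) rather than targeting a fixed MOI. Shrinking the transferred volume reduces the phage input per round, which presumably raises selection stringency.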
Illumina sequencing of the PANCE experiment
[0358] To sequence the PANCE variants, 1 ng of the isolated phage DNA from each selection round was amplified with the oligo mix IllSeq_ABE_i5_F1-4 and IllSeq_ABE_i5_R1-4 using New England Biolabs Q5® High-Fidelity 2X Master Mix according to the manufacturer’s protocol. The forward and reverse oligos each contained an i5/i7 overhang for indexing and 4-7 Ns to stagger the reads, so that libraries of highly similar sequences could be sequenced. The resulting PCR products were amplified in a second PCR using a compatible combination of New England Biolabs NEBNext® Multiplex Oligos for Illumina® (96 Unique Dual Index Primer Pairs). The PCR products were purified using the New England Biolabs Monarch® Gel Extraction Kit and quantified with an Invitrogen™ Qubit™. A 4 nM pool of the different libraries was generated and 10% 4 nM PhiX was added. The pool was sequenced using the Illumina MiSeq Reagent Kit v3 (600 cycles) according to the manufacturer’s protocol. Variant testing at the DNMT1 site followed the same conditions described in the Genome editing of genomic loci section. To assess the best-performing variants, we developed the Precision Score (PreS):
$$\text{PreS} = \frac{\%\ \text{On-target edit}}{\text{Avg. bystander editing}} \times \frac{\%\ \text{On-target edit}_{\text{Variant}}}{\%\ \text{On-target edit}_{\text{ABE8e}}}$$
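The Precision Score can be computed directly from three editing percentages. The sketch below assumes that the first factor uses the variant's own on-target and average bystander editing. The example values are placeholders.

```python
def precision_score(on_target_variant: float, avg_bystander_variant: float,
                    on_target_abe8e: float) -> float:
    """PreS = (on-target / avg bystander) * (on-target variant / on-target ABE8e)."""
    if avg_bystander_variant <= 0 or on_target_abe8e <= 0:
        raise ValueError("bystander and ABE8e reference editing must be positive")
    precision = on_target_variant / avg_bystander_variant
    relative_activity = on_target_variant / on_target_abe8e
    return precision * relative_activity

print(precision_score(40.0, 5.0, 50.0))  # placeholder percentages
```

The second factor penalizes variants that gain precision mainly by losing activity relative to ABE8e.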
Prediction of evolutionarily plausible mutations with machine learning
[0359] We followed the approach of Hie et al. (22), which uses an ensemble of six protein language models: ESM-1b (23) and ESM-1v (23), the latter itself composed of five models (accessible at: https://github.com/facebookresearch/esm). Together, the six models are used to predict which amino acids, if any, are more likely than the current wild-type amino acid at each position of the input protein sequence. Each substitution is scored by the number of models in the ensemble that agree on it, with a higher score indicating a substitution more likely to yield a positive result.
[0360] Applying the ensemble of protein language models to the TadA-8e sequence yielded the following predictions (scores in parentheses): R26G (6), F84L (6), N108D (6), Y149F (6), F156K (6), V106A (5), P152R (5), H8D (5), N157K (5), R111T (4), C146S (3), R111A (2), C146Q (2), M151E (2), C146K (1), Y123H (1), M151Q (1), A48P (1), V155E (1), S109P (1), P152Q (1).
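The consensus scoring can be sketched in two steps. First, each model proposes the substitutions whose predicted probability exceeds that of the wild-type residue. Second, each substitution is scored by the number of models that propose it. This is a simplified sketch: the per-position probabilities would come from the ESM models, which are not called here, and the toy inputs are hypothetical.

```python
from collections import Counter

def substitutions_above_wt(wt_seq: str, probs: list[dict[str, float]]) -> list[str]:
    """probs[i] maps amino acid -> model probability at 0-based position i."""
    subs = []
    for i, wt in enumerate(wt_seq):
        p_wt = probs[i][wt]
        for aa, p in probs[i].items():
            if aa != wt and p > p_wt:
                subs.append(f"{wt}{i + 1}{aa}")
    return subs

def ensemble_scores(per_model: list[list[str]]) -> list[tuple[str, int]]:
    """Score = number of models proposing each substitution, highest first."""
    votes = Counter()
    for subs in per_model:
        votes.update(set(subs))  # each model votes at most once per substitution
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
```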
Computational analysis of structural impact on ABE8e
[0361] The ABE8e structure deposited in the Protein Data Bank as 6VPC was visualized and analyzed with ChimeraX (version 1.6.1, 2023-05-09).
[0362] The ChimeraX command line was used to visualize, mutate, and analyze interactions of the target residues. Interactions of gRNA nucleotides 5-8 with ABE8e residues were analyzed. Two types of interaction involving the target residues and nucleotides were identified. Hydrogen bonds and non-polar (van der Waals) contacts were identified between carbon atoms of the gRNA and the protein at a maximum distance of 3.8 Å. Cation-π interactions were identified between nitrogen atoms within 5 Å of an aromatic carbon.
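The distance criteria above (3.8 Å for carbon-carbon contacts, 5 Å between a nitrogen and an aromatic carbon) amount to a pairwise distance filter. The plain-Python sketch below applies that filter to atom coordinates. It is an illustrative stand-in for the ChimeraX analysis, not a reproduction of it, and the atom labels and coordinates in the test are made up.

```python
import math

Atom = tuple[str, tuple[float, float, float]]  # (label, (x, y, z)) in Å

def contacts_within(atoms_a: list[Atom], atoms_b: list[Atom],
                    cutoff: float) -> list[tuple[str, str, float]]:
    """All (label_a, label_b, distance) pairs with distance <= cutoff."""
    hits = []
    for label_a, xyz_a in atoms_a:
        for label_b, xyz_b in atoms_b:
            d = math.dist(xyz_a, xyz_b)
            if d <= cutoff:
                hits.append((label_a, label_b, d))
    return hits

CARBON_CUTOFF = 3.8     # Å, gRNA carbon to protein carbon
CATION_PI_CUTOFF = 5.0  # Å, nitrogen to aromatic carbon
```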
iPSC editing
[0363] KOLF2.1J SNCA E46K-/- cells were purchased from The Jackson Laboratory and maintained in StemFlex medium (Life Technologies #A3349401) supplemented with 10% FBS (Life Technologies) at 37 °C and 5% CO2. Cells were cultured on plates coated with 1 mg/mL Synthemax stock solution (Synthemax II-SC, Cat# 3535, Corning) according to the manufacturer's instructions.
[0364] Nucleofection was performed using the Neon electroporation system (Thermo Fisher) 10 µL kit. 200,000 cells were resuspended in ~10 µL of Buffer R and mixed with 200 ng of gRNA vector and 200 µg of base editor. Nucleofection was performed with the following parameters: voltage 1400 V, width 20 ms, 2 pulses. After nucleofection, cells were plated in 12-well plates with 400 µL of StemFlex without antibiotics and a 1:100 dilution of RevitaCell™ Supplement (Cat# A26445-01, Gibco Life Technologies). After 24 h, the medium was replaced, and editing was evaluated after 48 h by NGS (Quintara Biosciences).
References for Example 1A-7A
1. Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464-471 (2017).
2. Walton, R. T., Christie, K. A., Whittaker, M. N. & Kleinstiver, B. P. Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants. Science 368, 290-296 (2020).
3. Chen, L. et al. Engineering a precise adenine base editor with minimal bystander editing. Nat Chem Biol 19, 101-110 (2023).
4. Sternberg, S. H., LaFrance, B., Kaplan, M. & Doudna, J. A. Conformational control of DNA target cleavage by CRISPR-Cas9. Nature 527, 110-113 (2015).
5. Nishimasu, H. et al. Crystal Structure of Cas9 in Complex with Guide RNA and Target DNA. Cell 156, 935-949 (2014).
6. Epinat, J.-C. A novel engineered meganuclease induces homologous recombination in yeast and mammalian cells. Nucleic Acids Research 31, 2952-2962 (2003).
7. Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat Rev Genet 19, 770-788 (2018).
8. Kim, Y. B. et al. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat Biotechnol 35, 371-376 (2017).
9. Liu, Z. et al. Efficient base editing with high precision in rabbits using YFE-BE4max. Cell Death Dis 11, 36 (2020).
10. Doman, J. L., Raguram, A., Newby, G. A. & Liu, D. R. Evaluation and minimization of Cas9-independent off-target DNA editing by cytosine base editors. Nat Biotechnol 38, 620-628 (2020).
11. Zhou, C. et al. Off-target RNA mutation induced by DNA base editing and its elimination by mutagenesis. Nature 571, 275-278 (2019).
12. Alves, C. R. R. et al. Base editing as a genetic treatment for spinal muscular atrophy. Preprint at https://doi.org/10.1101/2023.01.20.524978 (2023).
13. Chu, V. T. et al. Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells. Nat Biotechnol 33, 543-548 (2015).
14. Carlson-Stevermer, J. et al. Assembly of CRISPR ribonucleoproteins with biotinylated oligonucleotides via an RNA aptamer for precise gene editing. Nat Commun 8, 1711 (2017).
15. Arbab, M. et al. Determinants of Base Editing Outcomes from Target Library Analysis and Machine Learning. Cell 182, 463-480.e30 (2020).
16. Xiao, Y.-L., Wu, Y. & Tang, W. An adenine base editor variant expands context compatibility. Nat Biotechnol 42, 1442-1453 (2024).
17. Miller, S. M., Wang, T. & Liu, D. R. Phage-assisted continuous and non-continuous evolution. Nat Protoc 15, 4101-4127 (2020).
18. Lapinaite, A. et al. DNA capture by a CRISPR-Cas9-guided adenine base editor. Science 369, 566-571 (2020).
19. Richter, M. L. et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat Biotechnol 38, 883-891 (2020).
20. Esvelt, K. M., Carlson, J. C. & Liu, D. R. A system for the continuous directed evolution of biomolecules. Nature 472, 499-503 (2011).
21. Weiss, G. A., Roth, T. A., Baldi, P. L. & Sidhu, S. S. Comprehensive Mutagenesis of the C-terminal Domain of the M13 Gene-3 Minor Coat Protein: The Requirements for Assembly into the Bacteriophage Particle. Journal of Molecular Biology 332, 777-782 (2003).
22. Hie, B. L., Yang, K. K. & Kim, P. S. Evolutionary velocity with protein language models predicts evolutionary dynamics of diverse proteins. Cell Systems 13, 274-285.e6 (2022).
23. Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. U.S.A. 118, e2016239118 (2021).
24. Lepor, N. E. & Kereiakes, D. J. The PCSK9 Inhibitors: A Novel Therapeutic Target Enters Clinical Practice. Am Health Drug Benefits 8, 483-489 (2015).
25. Musunuru, K. et al. In vivo CRISPR base editing of PCSK9 durably lowers cholesterol in primates. Nature 593, 429-434 (2021).
26. Bae, S., Park, J. & Kim, J.-S. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473-1475 (2014).
27. Badran, A. H. et al. Continuous evolution of Bacillus thuringiensis toxins overcomes insect resistance. Nature 533, 58-63 (2016).
28. Badran, A. H. & Liu, D. R. Development of potent in vivo mutagenesis plasmids with broad mutational spectra. Nat Commun 6, 8425 (2015).
EQUIVALENTS AND SCOPE
[0365] In the claims articles such as “a,” “an,” and “the” may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include “or” between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.
[0366] Furthermore, the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims is introduced into another claim. For example, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim. Where elements are presented as lists, e.g., in Markush group format, each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements and/or features, certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements and/or features. For purposes of simplicity, those embodiments have not been specifically set forth in haec verba herein.
[0367] It is also noted that the terms “comprising” and “containing” are intended to be open and permit the inclusion of additional elements or steps. Where ranges are given, endpoints are included. Furthermore, unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or sub-range within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.
[0368] This application refers to various issued patents, published patent applications, journal articles, and other publications, all of which are incorporated herein by reference. If there is a conflict between any of the incorporated references and the instant specification, the specification shall control. In addition, any particular embodiment of the present invention that falls within the prior art may be explicitly excluded from any one or more of the claims. Because such embodiments are deemed to be known to one of ordinary skill in the art, they may be excluded even if the exclusion is not set forth explicitly herein. Any particular embodiment of the invention can be excluded from any claim, for any reason, whether or not related to the existence of prior art.
[0369] Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation many equivalents to the specific embodiments described herein. The scope of the present embodiments described herein is not intended to be limited to the above Description, but rather is as set forth in the appended claims. Those of ordinary skill in the art will appreciate that various changes and modifications to this description may be made without departing from the spirit or scope of the present invention, as defined in the following claims.
Claims
1. An anchor guide RNA (agRNA) for editing a target nucleic acid using a nucleobase editor comprising a guide RNA (gRNA) and a 3'-nucleic acid extension, wherein the 3'-nucleic acid extension comprises nucleic acids encoding an upstream binding sequence (UBS) and a downstream binding sequence (DBS), wherein the UBS binds to the nucleic acids downstream of the target nucleic acid; and wherein the DBS binds to the nucleic acids upstream of the target nucleic acid.
2. The agRNA of claim 1, wherein the gRNA is a single guide RNA (sgRNA).
3. The agRNA of claim 2, wherein the 3'-extension is attached to the 3'-end of the sgRNA.
4. The agRNA of any one of claims 1-3, wherein the 3'-nucleic acid extension further comprises a counterloop sequence (CLS).
5. The agRNA of any one of claims 1-4, wherein the agRNA improves the editing efficiency of a target nucleic acid by a nucleobase editor relative to the editing efficiency of the target nucleic acid by the nucleobase editor using a gRNA lacking the 3'-nucleic acid extension of the agRNA.
6. The agRNA of any one of claims 1-5, wherein the agRNA reduces bystander editing of bystander nucleic acids within an editing window of a target nucleic acid for a nucleobase editor relative to the bystander editing of the target nucleic acid by the nucleobase editor using a gRNA lacking the 3'-nucleic acid extension of the agRNA.
7. The agRNA of any one of claims 1-6, wherein the UBS comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides complementary to the nucleic acid sequence downstream of the target.
8. The agRNA of any one of claims 1-7, wherein the DBS comprises 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides complementary to the nucleic acid sequence upstream of the target.
9. The agRNA of any one of claims 4-8, wherein the CLS comprises 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides.
10. The agRNA of any one of claims 4-9, wherein the CLS is a hairpin.
11. The agRNA of any one of claims 4-10, wherein the UBS and the DBS flank the CLS.
12. The agRNA of any one of claims 1-11, wherein the 3'-nucleic acid extension further comprises a secondary structure element.
13. The agRNA of claim 12, wherein the secondary structure element is a tevopreQ1 motif.
14. The agRNA of any one of claims 4-13, wherein the 3'-nucleic acid extension comprises a structure selected from:
5'-[UBS]-[CLS]-[DBS]-3';
5'-[UBS]-[DBS]-3';
5'-[UBS]-[CLS]-3';
5'-[CLS]-[DBS]-3';
5'-[CLS]-3';
5'-[UBS]-3';
5'-[DBS]-3';
5'-[UBS]-[CLS]-[tevopreQ1 motif]-[DBS]-3'; or
5'-[UBS]-[CLS]-[DBS]-[tevopreQ1 motif]-3'.
15. The agRNA of any one of claims 4-14, wherein the agRNA comprises a structure selected from:
5'-[gRNA]-[UBS]-[CLS]-[DBS]-3';
5'-[gRNA]-[UBS]-[DBS]-3';
5'-[gRNA]-[UBS]-[CLS]-3';
5'-[gRNA]-[CLS]-[DBS]-3';
5'-[gRNA]-[CLS]-3';
5'-[gRNA]-[UBS]-3';
5'-[gRNA]-[DBS]-3';
5'-[gRNA]-[UBS]-[CLS]-[tevopreQ1 motif]-[DBS]-3'; or
5'-[gRNA]-[UBS]-[CLS]-[DBS]-[tevopreQ1 motif]-3'.
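The modular layouts in claims 14 and 15, together with the length ranges in claims 7-9, can be expressed as a small data structure. This is a sketch only, with the following design choices:

- It allows a UBS and DBS of 0-25 nt and a CLS of 0-50 nt.
- It places an optional tevopreQ1 motif either before or after the DBS, as in the last two layouts.
- It uses placeholder strings, because the motif and the example segment sequences here are not taken from the source.

The class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AgRNADesign:
    grna: str
    ubs: str = ""
    cls: str = ""
    dbs: str = ""
    motif: str = ""                # optional tevopreQ1 motif (placeholder sequence)
    motif_before_dbs: bool = False

    def validate(self) -> None:
        limits = {"UBS": (self.ubs, 25), "DBS": (self.dbs, 25), "CLS": (self.cls, 50)}
        for name, (seq, max_len) in limits.items():
            if len(seq) > max_len:
                raise ValueError(f"{name} longer than {max_len} nt")
        if not (self.ubs or self.cls or self.dbs):
            raise ValueError("3'-extension needs at least one of UBS, CLS or DBS")

    def sequence(self) -> str:
        """5'-[gRNA]-[UBS]-[CLS]-([motif])-[DBS]-([motif])-3'"""
        self.validate()
        if self.motif_before_dbs:
            extension = self.ubs + self.cls + self.motif + self.dbs
        else:
            extension = self.ubs + self.cls + self.dbs + self.motif
        return self.grna + extension
```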
16. The agRNA of any one of claims 1-15, wherein the target nucleic acid is within a double-stranded DNA molecule.
17. The agRNA of any one of claims 1-16, wherein the target nucleic acid falls within a gene, a transcriptional regulatory region, an intron splice site, an exonic splicing enhancer site, or a nucleosome binding site.
18. The agRNA of claim 17, wherein the target nucleic acid falls within a gene.
19. The agRNA of claim 18, wherein the gene is associated with a disease or disorder.
20. The agRNA of claim 19, wherein the disease or disorder is caused by pathogenic Single Nucleotide Polymorphisms (SNPs).
21. The agRNA of claim 19 or 20, wherein the disease or disorder is selected from a group consisting of inflammatory disorders, autoimmune disorders, and cancers.
22. The agRNA of any one of claims 1-18, wherein the target nucleic acid is within the gene encoding DNMT1.
23. The agRNA of any one of claims 4-22, wherein the 3'-nucleic acid extension comprises a sequence selected from any one of SEQ ID NOs: 2-11.
24. The agRNA of any one of claims 4-23, wherein the 3'-nucleic acid extension comprises a sequence selected from the group consisting of:
CGCGCGTTCGCGCGG (SEQ ID NO: 2);
CACGCGCGTTCGCGCTGGCACCA (SEQ ID NO: 3);
CTGGCGCGTCGCGCTCTGG (SEQ ID NO: 4);
CCTGCGCGTCGCGCTTCTGGCACCA (SEQ ID NO: 5);
CTCGCGGCTTCGCGTGGCAC (SEQ ID NO: 6);
CACGCGGCTTCGCGGGCACCA (SEQ ID NO: 7);
ACCGCGCTTCGCGTGGCACCA (SEQ ID NO: 8);
CACCCCTCGCGTTCGCGTTCTGGCA (SEQ ID NO: 9);
CCCTGGCGCGTTCGCGCGGCAC (SEQ ID NO: 10); and
TGGCGCGGCTCGCGCTGGCACCA (SEQ ID NO: 11).
25. A polynucleotide encoding the agRNA of any one of claims 1-24.
26. An agRNA library comprising a plurality of anchor guide RNAs (agRNAs) of any one of claims 1-24.
27. The agRNA library of claim 26, wherein the target nucleic acid of the plurality of agRNAs falls within a gene.
28. The agRNA library of claim 27, wherein the gene is associated with a disease or disorder.
29. The agRNA library of claim 28, wherein the disease or disorder is caused by pathogenic Single Nucleotide Polymorphisms (SNPs).
30. The agRNA library of claim 28 or 29, wherein the disease or disorder is selected from a group consisting of inflammatory disorders, autoimmune disorders, and cancers.
31. A vector comprising the polynucleotide of claim 25.
32. The vector of claim 31 further comprising a nucleic acid sequence comprising a target nucleotide sequence comprising (a) the target nucleic acid, (b) the nucleic acids upstream of the target nucleic acid, and (c) the nucleic acids downstream of the target nucleic acid.
33. The vector of claim 32 further comprising at least two primer binding sites, wherein a first primer binding site is located upstream or within the agRNA, and wherein a second primer binding site is located downstream of a target nucleic acid sequence, and wherein the distance between the first and second primer sites is less than 300 nucleotides.
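The spacing condition in claim 33 can be checked when laying out a screening vector. The claim does not say how the distance is measured, so this minimal sketch assumes 0-based half-open coordinates and counts the nucleotides lying between the end of the first site and the start of the second. `PrimerSite`, `inter_site_distance`, and `within_spacing_limit` are hypothetical names, and the coordinates are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerSite:
    """Hypothetical record of a primer binding site, 0-based half-open [start, end)."""
    start: int
    end: int


def inter_site_distance(first: PrimerSite, second: PrimerSite) -> int:
    """Nucleotides lying between the end of the first site and the start of the second."""
    if first.end > second.start:
        raise ValueError("the first primer site must lie upstream of the second")
    return second.start - first.end


def within_spacing_limit(first: PrimerSite, second: PrimerSite, limit: int = 300) -> bool:
    """True if the two sites are separated by fewer than `limit` nucleotides."""
    return inter_site_distance(first, second) < limit


# Illustrative coordinates: one site at the agRNA, one downstream of the target.
upstream_site = PrimerSite(start=100, end=120)
downstream_site = PrimerSite(start=380, end=400)
print(inter_site_distance(upstream_site, downstream_site))   # 260
print(within_spacing_limit(upstream_site, downstream_site))  # True
```

If the claim were instead read as measuring from site start to site end, `inter_site_distance` would be the only function to change.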
34. An agRNA screening library comprising a plurality of vectors of claim 33.
35. A composition comprising:
(a) the agRNA of any one of claims 1-24;
(b) a nucleobase editor, wherein the nucleobase editor comprises a fusion protein of a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp), wherein the napDNAbp is programmed by the agRNA of (a).
36. The composition of claim 35 further comprising:
(c) a target nucleic acid.
37. A composition comprising:
(a) the agRNA of any one of claims 1-24;
(b) an N-terminal portion of a split nucleobase editor fused at its C-terminus to an intein-N; and
(c) a C-terminal portion of a split nucleobase editor fused at its N-terminus to an intein-C, wherein the N-terminal portion of a split nucleobase editor and the C-terminal portion of a split nucleobase editor are joined to form a fusion protein of a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp), wherein the napDNAbp is programmed by the agRNA of (a).
38. The composition of claim 37 further comprising:
(d) a target nucleic acid.
39. The composition of any one of claims 35-38, wherein the napDNAbp is selected from the group consisting of Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, Sp-Cas9, SpRY, SpG-Cas9, NG-Cas9, NRRH-Cas9, spCas9, geoCas9, saCas9, Nme2Cas9, Cas12, and variants thereof.
40. The composition of any one of claims 35-39, wherein the deaminase is a cytidine deaminase or an adenosine deaminase.
41. The composition of any one of claims 35-40, wherein the deaminase is a cytidine deaminase.
42. The composition of claim 41, wherein the cytidine deaminase is selected from the group consisting of CBE6, CGBE, BE4max, TadCBE, and variants thereof.
43. The composition of any one of claims 35-40, wherein the deaminase is an adenosine deaminase.
44. The composition of claim 43, wherein the adenosine deaminase is selected from the group consisting of TadA-8e, ABE8e, AYBE, ABE9, and variants thereof.
45. The composition of any one of claims 43-44, wherein the adenosine deaminase comprises an amino acid sequence that is at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of SEQ ID NO: 1, wherein the amino acid sequence comprises one or more amino acid substitutions at position(s) selected from the group consisting of positions 8, 25, 26, 27, 28, 29, 30, 31, 33, 34, 37, 38, 39, 41, 42, 43, 44, 45, 48, 49, 50, 54, 56, 58, 78, 79, 80, 82, 84, 85, 86, 88, 90, 91, 92, 93, 94, 95, 96, 97, 99, 100, 101, 102, 106, 107, 109, 111, 123, 146, 149, 151, 152, 155, 156, and 157 of SEQ ID NO: 1.
46. The composition of any one of claims 43-45, wherein the adenosine deaminase comprises an amino acid sequence that is at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of SEQ ID NO: 1, wherein the amino acid sequence comprises one or more amino acid substitutions selected from the group consisting of H8D, R25E, R26G, E27R, V28C, P29C, V30W, G31E, V33C, V33T, L34W, N37T, N37H, N38I, R39E, I41S, G42A, E43R, G44A, W45G, A48P, I49S, G50A, D54T, A56P, A58S, A78R, T79H, L80P, V82R, F84I, F84L, E85R, P86A, V88R, C90V, A91R, G92R, A93R, M94H, I95D, H96P, S97L, I99D, G100R, R101P, V102R, V106A, R107E, S109P, S109L, R111K, R111T, R111A, Y123H, C146Q, R146K, Y149F, M151E, M151Q, P152R, P152Q, V155E, F156K, and N157K of SEQ ID NO: 1.
47. The composition of any one of claims 43-45, wherein the adenosine deaminase comprises an amino acid sequence that is at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of SEQ ID NO: 1, wherein the amino acid sequence comprises one or more amino acid substitutions at position(s) selected from the group consisting of positions 28, 34, and 151 of SEQ ID NO: 1.
48. The composition of any one of claims 43-47, wherein the adenosine deaminase comprises an amino acid sequence that is at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of SEQ ID NO: 1, wherein the amino acid sequence comprises one or more amino acid substitutions selected from the group consisting of V28C, L34W, and M151E of SEQ ID NO: 1.
49. The composition of any one of claims 43-48, wherein the adenosine deaminase comprises an amino acid sequence comprising amino acid substitution(s):
(a) V28C (ABEx1);
(b) L34W (ABEx2);
(c) M151E (ABEx3); and/or
(d) V28C and M151E (ABEx4).
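The substitution notation used in claims 46-49 (wild-type residue, 1-based position, new residue, e.g. V28C) maps directly onto a sequence edit. This sketch parses that notation and applies the ABEx1-ABEx4 substitutions of claim 49, refusing any substitution whose wild-type residue does not match. SEQ ID NO: 1 is not reproduced here, so the sequence below is an explicitly labeled poly-A placeholder that merely carries V28, L34, and M151; `apply_substitutions` and `TOY_SEQ` are hypothetical names.

```python
import re

# Wild-type residue, 1-based position, substituted residue, e.g. "V28C".
SUBSTITUTION = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")

# Variant definitions as listed in claim 49.
VARIANTS = {
    "ABEx1": ["V28C"],
    "ABEx2": ["L34W"],
    "ABEx3": ["M151E"],
    "ABEx4": ["V28C", "M151E"],
}


def apply_substitutions(seq: str, substitutions: list[str]) -> str:
    """Return seq with each substitution applied, checking the wild-type residue first."""
    residues = list(seq)
    for sub in substitutions:
        match = SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unparseable substitution: {sub}")
        wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{sub}: position {pos} lies outside the sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"{sub}: expected {wt} at position {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = mut
    return "".join(residues)


# Placeholder only: a poly-A stand-in carrying V28, L34 and M151, NOT SEQ ID NO: 1.
_toy = ["A"] * 160
_toy[27], _toy[33], _toy[150] = "V", "L", "M"
TOY_SEQ = "".join(_toy)

abex4 = apply_substitutions(TOY_SEQ, VARIANTS["ABEx4"])
print(abex4[27], abex4[150])  # C E
```

Checking the wild-type residue before substituting catches numbering mismatches between a reference sequence and a mutation list early.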
50. A complex of the agRNA of any one of claims 1-24 and the nucleobase editor of any one of claims 35-49.
51. One or more polynucleotides encoding the complex of claim 50.
52. One or more vectors comprising the one or more polynucleotides of claim 51 and one or more promoters that drive the expression of the agRNA and the nucleobase editor or split nucleobase editor of the complex.
53. The one or more vectors of claim 52, wherein one or more of the vectors is an adeno-associated virus (AAV) vector.
54. A cell comprising the vector of claim 52 or 53.
55. A cell comprising the agRNA of any one of claims 1-24, the polynucleotide of claim 25 or 51, the vector of any one of claims 31-33, 52, or 53, or the complex of claim 50.
56. A pharmaceutical composition comprising: (i) the agRNA of any of claims 1-24, the complex of claim 50, a polynucleotide of claim 51, or a vector of any of claims 52-53; and
(ii) a pharmaceutically acceptable excipient.
57. A method of designing a library of sequences encoding 3'-nucleic acid extensions, the method comprising:
(a) using software executing on at least one computer hardware processor to perform:
(i) obtaining input data indicative of the target nucleotide sequence;
(ii) generating an array of upstream binding sequences (UBSs) comprising nucleotides that are complementary to the downstream sequence of the target nucleotide;
(iii) generating an array of downstream binding sequences (DBSs) comprising nucleotides that are complementary to the upstream sequence of the target nucleotide;
(iv) removing duplicates from the arrays of UBSs and DBSs; and
(v) creating a list of combinations of UBSs, DBSs, and counterloop sequences (CLSs) from an array of CLSs, wherein each combination is output as a nucleic acid sequence encoding a single UBS, a single CLS, and a single DBS, wherein the UBS and DBS flank the CLS.
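Steps (i)-(v) of claim 57 can be sketched as a small enumeration. This is an illustration under stated assumptions, not the applicant's software. It assumes that "complementary" means the reverse complement (antiparallel Watson-Crick pairing) of a window of the target strand as written. It also assumes that binding sequences are taken from windows of a few chosen lengths within a fixed flank on each side of the target nucleotide, and that the CLS array is supplied by the user. The sequences are in the DNA alphabet because the library encodes the extensions. `design_extension_library` and its `flank` and `lengths` parameters are hypothetical.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def binding_sequences(region: str, lengths: range) -> list[str]:
    """Reverse complements of every window of each length in `lengths` within `region`."""
    return [
        reverse_complement(region[i:i + k])
        for k in lengths
        for i in range(len(region) - k + 1)
    ]


def design_extension_library(target: str, target_index: int, cls_array: list[str],
                             flank: int = 10, lengths: range = range(4, 7)) -> list[str]:
    # (i) input: target sequence plus the 0-based index of the target nucleotide
    upstream = target[max(0, target_index - flank):target_index]
    downstream = target[target_index + 1:target_index + 1 + flank]
    # (ii) UBSs: complementary to the sequence downstream of the target nucleotide
    ubs = binding_sequences(downstream, lengths)
    # (iii) DBSs: complementary to the sequence upstream of the target nucleotide
    dbs = binding_sequences(upstream, lengths)
    # (iv) remove duplicate UBSs and DBSs while keeping first-seen order
    ubs, dbs = list(dict.fromkeys(ubs)), list(dict.fromkeys(dbs))
    # (v) one UBS + one CLS + one DBS per entry, UBS and DBS flanking the CLS
    return [u + c + d for u, c, d in product(ubs, cls_array, dbs)]


library = design_extension_library("ACGTACGTAC", target_index=4, cls_array=["TTCG"],
                                   flank=3, lengths=range(2, 3))
print(library)  # ['CGTTCGCG', 'CGTTCGAC', 'ACTTCGCG', 'ACTTCGAC']
```

Library size grows as (number of unique UBSs) x (number of CLSs) x (number of unique DBSs), so the deduplication in step (iv) directly shrinks the output.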
58. A method of selecting agRNAs comprising the steps of:
(a) transfecting the agRNA screening library of claim 34 into cells; and
(b) using next generation sequencing (NGS) to select agRNA vectors having (i) reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or (ii) improved editing efficiency of a target nucleic acid by a nucleobase editor, wherein the reduced bystander editing and/or editing efficiency are measured relative to the gRNA lacking the 3'-nucleic acid extension of the agRNA or a second agRNA.
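The NGS readout in step (b) comes down to per-position conversion rates inside the editing window. This sketch assumes reads are already aligned to the reference amplicon without indels and that the editor is an adenine base editor (A to G on the read strand). A cytosine base editor would use `ref_base="C", alt_base="T"`. The "clean edit" fraction, meaning reads carrying the target edit and no bystander edit, is one common summary metric and is an assumption here, not a metric defined in the claim. `editing_summary` and the example reads are hypothetical.

```python
def editing_summary(reference: str, reads: list[str], window: range, target_pos: int,
                    ref_base: str = "A", alt_base: str = "G") -> dict:
    """Per-position conversion rates within the editing window of aligned, indel-free reads."""
    if not reads:
        raise ValueError("no reads supplied")
    if any(len(r) != len(reference) for r in reads):
        raise ValueError("reads must be aligned to the reference without indels")
    # Only positions that carry the editable base in the reference are scored.
    positions = [p for p in window if reference[p] == ref_base]
    if target_pos not in positions:
        raise ValueError("target position must be an editable base inside the window")
    rates = {p: sum(r[p] == alt_base for r in reads) / len(reads) for p in positions}
    bystanders = [p for p in positions if p != target_pos]
    # Reads carrying the intended edit and no bystander edit in the window.
    clean = sum(
        r[target_pos] == alt_base and all(r[p] != alt_base for p in bystanders) for r in reads
    ) / len(reads)
    return {
        "target_efficiency": rates[target_pos],
        "bystander_rates": {p: rates[p] for p in bystanders},
        "clean_edit_fraction": clean,
    }


reads = ["TAGAT", "TGGAT", "TAAAT", "TAGAT"]
print(editing_summary("TAAAT", reads, window=range(1, 4), target_pos=2))
# {'target_efficiency': 0.75, 'bystander_rates': {1: 0.25, 3: 0.0}, 'clean_edit_fraction': 0.5}
```

Running the same summary for an agRNA and for the unextended gRNA (or a second agRNA) gives the paired values that step (b) compares.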
59. The method of claim 58, wherein the bystander editing at one or more sites is reduced by at least 2%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, at least 99%, or at least 99.5% relative to a reference agRNA.
60. The method of claim 59, wherein the editing efficiency of a target nucleic acid by a nucleobase editor is improved by at least 2%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, at least 99%, or at least 99.5% relative to a reference agRNA.
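Claims 59 and 60 express changes "relative to a reference agRNA" as percentages but do not state whether these are relative changes or percentage-point differences. This sketch assumes relative change. Under that reading, bystander editing falling from 40% to 10% is a 75% reduction, whereas the percentage-point difference would be 30. The function names are hypothetical.

```python
def relative_reduction_pct(reference_rate: float, candidate_rate: float) -> float:
    """Percent decrease of candidate_rate relative to reference_rate."""
    if reference_rate <= 0:
        raise ValueError("reference rate must be positive")
    return 100.0 * (reference_rate - candidate_rate) / reference_rate


def relative_improvement_pct(reference_eff: float, candidate_eff: float) -> float:
    """Percent increase of candidate_eff relative to reference_eff."""
    if reference_eff <= 0:
        raise ValueError("reference efficiency must be positive")
    return 100.0 * (candidate_eff - reference_eff) / reference_eff


# Bystander editing falling from 40% to 10% is a 75% relative reduction;
# target editing rising from 50% to 60% is a 20% relative improvement.
print(round(relative_reduction_pct(0.40, 0.10), 6))    # 75.0
print(round(relative_improvement_pct(0.50, 0.60), 6))  # 20.0
```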
61. A method of selecting nucleobase editors having (i) reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or (ii) improved editing efficiency of a target nucleic acid by a nucleobase editor, the method comprising using an agRNA selected by the method of any one of claims 58-60 as part of an evolution system.
62. The method of claim 61, wherein the evolution system is Phage Assisted Continuous Evolution (PACE) or Phage Assisted Non-Continuous Evolution (PANCE).
63. The method of claim 61 or 62, wherein the evolution system is PANCE, the method comprising the steps of:
(a) generating selection phages for PANCE,
(b) generating selection plasmid for PANCE, wherein the selection phage encode a pIII gene further comprising mutations that diminish pIII activity, wherein (i) pIII activity is restored by a nucleobase editor if the nucleobase editor edits the target nucleic acid or (ii) pIII activity is not restored by a nucleobase editor if the nucleobase editor edits bystander nucleic acids;
(c) generating selection cells for PANCE;
(d) performing PANCE as batch cultivations, wherein the PANCE system exerts selection pressure using one or more selection plasmids of (b), wherein adaptive mutations of the nucleobase editor that retain target nucleic acid base editing activity are necessary for phage propagation or wherein adaptive mutations of the nucleobase editor that increase bystander editing of one or more nucleic acids within the editing window are disadvantageous for phage propagation;
(e) performing NGS on each batch of (d) and identifying nucleobase editor variants;
(f) cloning the nucleobase editor variants of (e) and testing in target cells; and
(g) scoring the nucleobase editor variants based on enrichment values.
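Step (g) scores variants "based on enrichment values" without defining them. This sketch uses a common choice, stated here as an assumption: the log2 ratio of a variant's read frequency after selection to its frequency before selection, with a pseudocount so that variants absent from one sample do not cause division by zero. The counts and variant names are invented placeholders, not data from this application.

```python
import math


def enrichment_scores(pre_counts: dict[str, int], post_counts: dict[str, int],
                      pseudocount: float = 0.5) -> dict[str, float]:
    """log2 ratio of each variant's read frequency after vs. before selection."""
    variants = sorted(set(pre_counts) | set(post_counts))
    pre_total = sum(pre_counts.values()) + pseudocount * len(variants)
    post_total = sum(post_counts.values()) + pseudocount * len(variants)
    scores = {}
    for v in variants:
        pre_freq = (pre_counts.get(v, 0) + pseudocount) / pre_total
        post_freq = (post_counts.get(v, 0) + pseudocount) / post_total
        scores[v] = math.log2(post_freq / pre_freq)
    return scores


# Invented read counts for illustration only.
pre = {"parent": 500, "variant_a": 250, "variant_b": 250}
post = {"parent": 100, "variant_a": 600, "variant_b": 300}
scores = enrichment_scores(pre, post)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['variant_a', 'variant_b', 'parent']
```

Positive scores mark variants that outgrew the pool during selection; these would be the candidates for cloning and testing in step (f).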
64. The method of claim 63, wherein the selection plasmids comprise a sequence selected from any one of SEQ ID NOs: 19-21.
65. A method of selecting nucleobase editors having (i) reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or (ii) improved editing efficiency of a target nucleic acid by a nucleobase editor comprising:
(a) using machine learning (ML) language models to predict evolutionary adaptive mutations resulting in nucleobase editor variants with improved fitness;
(b) selecting from the one or more nucleobase editor variant(s) of (a);
(c) cloning the nucleobase editor variants of (b) and testing in target cells; and
(d) scoring the nucleobase editor variants based on enrichment values, wherein the nucleobase editor variant with improved fitness (i) reduces bystander editing within an editing window of a target nucleic acid for the one or more nucleobase editors and/or (ii) improves editing efficiency of a target nucleic acid by the one or more nucleobase editors.
66. The method of claim 65, wherein the machine learning model is an ESM-1b language model and/or an ESM-1v language model, wherein said language models (i) learn natural amino acid patterns based on millions of naturally occurring protein sequences, (ii) consider mutations observed in sequences of natural proteins as plausible mutations and (iii) assume plausible mutations with high likelihood scores correlate with improved protein fitness.
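Claim 66 relies on the assumption that mutations the language model finds likely correlate with fitness. A widely used way to turn per-position amino-acid probabilities into a mutation score is the log-likelihood ratio of the mutant residue to the wild-type residue at that position, similar to the log-odds scoring commonly used with ESM-1v. This sketch assumes such probabilities are already available. It does not call any ESM library, and `toy_probs` is a hand-written placeholder rather than model output. The function names are hypothetical.

```python
import math


def log_likelihood_ratio(probs: dict[int, dict[str, float]], wt_seq: str, mutation: str) -> float:
    """log p(mutant) - log p(wild type) at the mutated position (1-based)."""
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    if wt_seq[pos - 1] != wt:
        raise ValueError(f"{mutation}: wild-type residue mismatch")
    column = probs[pos]
    return math.log(column[mut]) - math.log(column[wt])


def rank_mutations(probs: dict[int, dict[str, float]], wt_seq: str,
                   candidates: list[str]) -> list[tuple[str, float]]:
    """Candidates sorted from most to least plausible under the model."""
    scored = [(m, log_likelihood_ratio(probs, wt_seq, m)) for m in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hand-written placeholder probabilities standing in for language-model output.
toy_probs = {
    2: {"V": 0.2, "C": 0.4, "A": 0.1},
    3: {"L": 0.5, "W": 0.05},
}
for mutation, score in rank_mutations(toy_probs, "MVL", ["L3W", "V2C"]):
    print(mutation, round(score, 3))
# V2C 0.693
# L3W -2.303
```

A positive score means the model prefers the mutant over the wild-type residue. Under the assumption in claim 66, such mutations would be prioritized for selection in step (b) of claim 65.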
67. A method of selecting nucleobase editors having (i) reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or (ii) improved editing efficiency of a target nucleic acid by a nucleobase editor comprising:
(a) using the method of any one of claims 61-64; and
(b) using the method of any one of claims 65-66.
68. A method of base editing comprising: contacting a target nucleic acid sequence with a nucleobase editor, wherein the nucleobase editor comprises a fusion protein comprising a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp), wherein the napDNAbp is bound to the agRNA of any one of claims 1-24.
69. The method of claim 68, wherein the target nucleic acid sequence comprises a target nucleic acid.
70. The method of claim 69, wherein the target nucleic acid falls within a double-stranded DNA molecule.
71. The method of claim 70, wherein the double-stranded DNA molecule is genomic DNA of a cell.
72. The method of any one of claims 68-71, wherein the method is performed within a prokaryotic or eukaryotic cell.
73. Use of the agRNA of any one of claims 1-24 for base editing a target nucleic acid, wherein editing the target nucleic acid produces a single nucleotide variant (SNV) for engineering a cell, a virus, a fungus, a plant, an insect, and/or an animal.
74. A kit comprising the agRNA of any one of claims 1-24, the complex of claim 50, the polynucleotide of claim 51, the vector of any of claims 52-53, or the cell of claim 55.
Applications Claiming Priority
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US63/648,151 (US202463648151P) | 2024-05-15 | 2024-05-15 | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025240795A1 (en) | 2025-11-20 |
Family ID: 97720769
Family Applications (1)
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| PCT/US2025/029650 | Pending | WO2025240795A1 (en) | 2024-05-15 | 2025-05-15 | End-modified grnas for improved base editing |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025240795A1 (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20200339980A1 (en) * | 2016-06-08 | 2020-10-29 | Agilent Technologies, Inc. | High Specificity Genome Editing Using Chemically Modified Guide RNAs |
| US20230357766A1 (en) * | 2020-09-24 | 2023-11-09 | The Broad Institute, Inc. | Prime editing guide rnas, compositions thereof, and methods of using the same |
2025-05-15: PCT/US2025/029650 (WO2025240795A1), active, pending
Similar Documents
| Publication | Title |
|---|---|
| US20250011748A1 (en) | Base editors, compositions, and methods for modifying the mitochondrial genome |
| EP4100032B1 (en) | Gene editing methods for treating spinal muscular atrophy |
| US11344609B2 (en) | Compositions and methods for treating hemoglobinopathies |
| US20220315906A1 (en) | Base editors with diversified targeting scope |
| US20230021641A1 (en) | Cas9 variants having non-canonical pam specificities and uses thereof |
| US20230127008A1 (en) | Stat3-targeted base editor therapeutics for the treatment of melanoma and other cancers |
| US12281338B2 (en) | Nucleobase editors comprising GeoCas9 and uses thereof |
| US20220177877A1 (en) | Highly multiplexed base editing |
| US20220307001A1 (en) | Evolved cas9 variants and uses thereof |
| US20240173430A1 (en) | Base editing for treating hutchinson-gilford progeria syndrome |
| WO2021158995A1 (en) | Base editor predictive algorithm and method of use |
| JP2024041081A (en) | Use of adenosine base editors |
| WO2021222318A1 (en) | Targeted base editing of the ush2a gene |
| JP2020534795A (en) | Methods and Compositions for Evolving Base Editing Factors Using Phage-Supported Continuous Evolution (PACE) |
| WO2022261509A1 (en) | Improved cytosine to guanine base editors |
| US20250339559A1 (en) | Base editing-mediated readthrough of premature termination codons (bert) |
| US20250090687A1 (en) | Mitochondrial base editors and methods for editing mitochondrial dna |
| EP4192948A2 (en) | Rna and dna base editing via engineered adar |
| WO2025240795A1 (en) | End-modified grnas for improved base editing |
| WO2023205687A1 (en) | Improved prime editing methods and compositions |
| EP4573191A1 (en) | Evolved cytosine deaminases and methods of editing dna using same |
| US20250228981A1 (en) | Base editing methods and compositions for treating triplet repeat disorders |
| WO2024077267A1 (en) | Prime editing methods and compositions for treating triplet repeat disorders |
| WO2025122725A1 (en) | Methods and compositions for base editing of tpp1 in the treatment of batten disease |
| EP4658786A2 (en) | Gene editing methods, systems, and compositions for treating spinal muscular atrophy |