WO2025240795A1 - End-modified gRNA for improved base editing - Google Patents
End-modified gRNA for improved base editing
- Publication number
- WO2025240795A1 (PCT/US2025/029650)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- agrna
- nucleic acid
- editing
- sequence
- target nucleic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
- C12N9/222—Clustered regularly interspaced short palindromic repeats [CRISPR]-associated [CAS] enzymes
- C12N9/226—Class 2 CAS enzyme complex, e.g. single CAS protein
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
Definitions
- Base editing has expanded the genome editing toolkit by offering high editing efficiencies, both in vivo and in vitro, without inducing double-strand breaks.
- Adenine base editors (ABEs) catalyze the deamination of adenosine residues in a sequence-dependent manner and the conversion of A•T-to-G•C base pairs, while cytidine base editors (CBEs) catalyze the deamination of cytidine residues in a sequence-dependent manner and the conversion of C•G-to-T•A base pairs.
- because nucleobase editor (“NBE”) efficiency and editing pattern are influenced by the complex interaction between nucleobase editors, gRNAs, and target sequences (Arbab 2020), modifications to nucleobase editors and/or their components that result in increased editing efficiency and/or increased specificity would significantly advance the art.
- the present disclosure provides modified guide RNAs (gRNAs) comprising 3'-nucleic acid extensions (referred to herein as anchor guide RNAs (agRNAs)), wherein the agRNA has improved properties, including, but not limited to, improved editing efficiency and/or reduced bystander editing in a context-dependent manner when used in conjunction with a nucleobase editor, such as a fusion protein comprising a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp).
- agRNAs: modified guide RNAs comprising 3'-nucleic acid extensions (anchor guide RNAs)
- napDNAbp: nucleic acid programmable DNA binding protein
- these agRNAs improve editing efficiency and/or reduce bystander editing by a nucleobase editor by stabilizing the target nucleic acid sequence (e.g., genomic DNA) within the active site of the nucleobase editor, where stabilizing means restricting movement of the DNA within the active site of the nucleobase editor to result in a smaller editing window and/or deaminating fewer bases.
- the present disclosure further provides methods for engineering and/or evolving nucleobase editors to be used in conjunction with a given agRNA.
- the present disclosure further provides compositions, methods, uses, and kits for base editing comprising an agRNA and an optionally engineered and/or evolved nucleobase editor disclosed herein.
- the nucleobase editor is an engineered and/or evolved nucleobase editor.
- the agRNA comprises a gRNA and a 3'-nucleic acid extension (FIG. 1B and FIG. 5A).
- the gRNA comprises a spacer sequence and a scaffold sequence.
- the 3'-nucleic acid extension is attached to the 3'-end of the gRNA.
- the 3'-nucleic acid extension is attached to the 3'-end of the gRNA by a nucleotide linker.
- the nucleotide linker ranges from 1-50 nucleotides in length.
- the agRNA is capable of binding to a napDNAbp by the scaffold sequence of the gRNA and directing the napDNAbp to a target nucleic acid sequence (e.g., genomic DNA).
- the target nucleic acid sequence comprises (i) a target strand and (ii) a complementary non-target strand.
- the target strand comprises a protospacer sequence that binds to the spacer sequence of the gRNA forming an RNA-DNA hybrid.
- the non-target strand comprises the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor).
- the non-target strand binds to the 3'-nucleic acid extension of the agRNA.
- the 3'-nucleic acid extension binds to the non-target strand using (a) an upstream binding sequence (UBS) that is complementary to the non-target strand and binds downstream of the target nucleic acid on the non-target strand, and/or (b) a downstream binding sequence (DBS) that is complementary to the non-target strand and binds upstream of the target nucleic acid on the non-target strand (FIG. 1B and FIG. 5A).
- UBS: upstream binding sequence
- DBS: downstream binding sequence
- an agRNA for nucleobase editing comprises a 3'-nucleic acid extension comprising nucleic acids encoding the upstream binding sequence (UBS).
- the UBS is complementary to the non-target strand and binds downstream of the target nucleic acid on the non-target strand.
- an agRNA for nucleobase editing comprises a 3'-nucleic acid extension comprising nucleic acids encoding the downstream binding sequence (DBS).
- the DBS is complementary to the non-target strand and binds upstream of the target nucleic acid on the non-target strand (FIG. 1B and FIG. 5A).
- the UBS is about or more than about 0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or more nucleotides in length.
- the DBS is about or more than about 0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or more nucleotides in length.
- the UBS and/or the DBS are at least 85% homologous, at least 90% homologous, at least 95% homologous, at least 97% homologous, at least 99% homologous, at least 99.7% homologous, or 100% homologous to the non-target strand.
- the UBS and/or DBS comprises at least 1, at least 2, at least 3, at least 4, or at least 5 mismatches.
- the 3'-nucleic acid extension further comprises a counterloop sequence (CLS).
- the CLS is about or more than about 0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or more nucleotides in length.
- the CLS forms a secondary structural feature.
- the CLS is a hairpin.
- the CLS is flanked by the UBS and the DBS.
- the 3'-nucleic acid extension further comprises a secondary structural element.
- the secondary structural element is a tevopreQ1 motif (FIG. 1F).
- the location of the editing window can change when a different Cas domain is used, or when the deaminase domain changes.
- SaCas9 typically supports a broader editing window (approximately protospacer positions 3-12 for CBEs and 4-12 for ABEs) than SpCas9 (approximately protospacer positions 4-8 for CBEs and 4-7 for ABEs).
- a broader editing window increases the frequency of bystander editing by a nucleobase editor.
- agRNAs may be modified in one or more ways to restrict the movement of the non-target strand (e.g., the genomic DNA strand not bound by the spacer of the gRNA) within the active site of the nucleobase editor, thus resulting in a smaller editing window, minimizing the bystander effect, and/or improving the editing efficiency of the target nucleic acid by the nucleobase editor.
- the agRNA is modified to include a 3'-nucleic acid extension comprising, but not limited to, a UBS, a UBS and a CLS, a CLS, a CLS and a DBS, a UBS and a DBS, or a UBS, a CLS, and a DBS.
- the 3'-nucleic acid extension stabilizes the non-target strand (e.g., the genomic DNA strand not bound by the spacer of the gRNA) comprising the target nucleic acid (e.g., the nucleobase to be edited by a nucleobase editor) within the active site of the nucleobase editor (FIG. 1B).
- the agRNA improves the editing efficiency of a target nucleic acid by a nucleobase editor relative to the editing efficiency of the target nucleic acid by the nucleobase editor using a gRNA lacking the 3'-nucleic acid extension of the agRNA.
- the agRNA reduces bystander editing of bystander nucleic acids within an editing window of a target nucleic acid for a nucleobase editor relative to the bystander editing within the editing window of the target nucleic acid by the nucleobase editor using a gRNA lacking the 3'-nucleic acid extension of the agRNA.
- the target nucleic acid sequence comprises a target nucleic acid (also referred to as “the target nucleobase”), wherein the target nucleic acid falls within a double-stranded DNA molecule such as a gene, a transcriptional regulatory region, an intron splice site, an exonic splicing enhancer site, or a nucleosome binding site.
- the target nucleic acid falls within a gene, a gene that is associated with a disease or disorder, or a gene that is associated with a disease or disorder caused by pathogenic single nucleotide polymorphisms (SNPs).
- SNPs are the most common genetic variations for various complex human diseases and disorders, including inflammatory disorders, autoimmune disorders, and cancers. Treatment of any disease or disorder caused by SNPs is contemplated by the methods and uses described herein.
- the target nucleic acid sequence comprises one or more target nucleic acids (also referred to as “the target nucleobase”), wherein the one or more target nucleic acids fall within a double-stranded DNA molecule such as a gene, a transcriptional regulatory region, an intron splice site, an exonic splicing enhancer site, or a nucleosome binding site.
- the target nucleic acid falls within a gene (e.g., DNMT1), a gene that is associated with a disease or disorder, or a gene that is associated with a disease or disorder caused by pathogenic single nucleotide polymorphisms (SNPs).
- SNPs are the most common genetic variations for various complex human diseases and disorders, including, but not limited to, inflammatory disorders, autoimmune disorders, and cancers. Treatment of any disease or disorder caused by SNPs is contemplated by the methods and uses described herein.
- the present disclosure provides an agRNA library comprising a plurality of agRNAs described herein.
- Each agRNA library comprises agRNAs that bind upstream (e.g., utilizing UBSs) and/or downstream (e.g., utilizing DBSs) of a specific target DNA strand that is later present in the active site of the nucleobase editor (FIG. 1B and FIG. 5A). Therefore, each agRNA library is specific to the intended target nucleobase.
- the agRNA library comprises between 2,000 and 75,000 different agRNAs, wherein the agRNAs comprise different 3'-nucleic acid extensions.
- the agRNA library varies the 3'-nucleic acid extensions by length (e.g., the length of the 3'-nucleic acid extension is varied in the library). In some embodiments, the agRNA library varies the 3'-nucleic acid extensions by UBS, DBS, or CLS (e.g., the UBS and DBS are kept constant, while the CLS is varied).
- the agRNA library varies the 3'-nucleic acid extensions by structure such that the 3'-nucleic acid extensions comprise different combinations of UBSs, CLSs, and DBSs (e.g., an agRNA library comprising agRNAs with a CLS in the 3'-nucleic acid extension versus an agRNA library comprising agRNAs without a CLS in the 3'-nucleic acid extension), or further comprise other secondary structural elements (e.g., a tevopreQ1 motif).
- the agRNA library is used to screen which agRNAs improve editing efficiency and/or reduce bystander editing of a nucleobase editor.
- the disclosure provides polynucleotides, vectors, and cells comprising an agRNA described herein for screening the editing pattern of each nucleobase editor combined with a particular agRNA.
- the present disclosure describes a polynucleotide comprising an agRNA.
- the polynucleotide may further comprise a target nucleic acid sequence (e.g., a gene of interest) comprising the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor) downstream of the agRNA sequence.
- the present disclosure provides a vector comprising a polynucleotide described herein.
- the vector comprises a polynucleotide encoding an agRNA described herein.
- the polynucleotide can be under the control of a promoter.
- the polynucleotide can be under the control of multiple promoters.
- the promoter can be any promoter recognized by a skilled artisan (e.g., a constitutive promoter, a tissue-specific promoter, or an inducible promoter).
- the promoter can be a U6 promoter.
- the promoter can also be a U6, U6v4, U6v7, or U6v9 promoter or a fragment thereof.
- the vector further comprises a polynucleotide sequence comprising an agRNA described herein and a target nucleic acid sequence (e.g., a gene of interest) that includes the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor) (FIG. 5A).
- the target nucleic acid sequence is located downstream of the agRNA sequence.
- the agRNA and the target nucleic acid sequence are within a 50-600-nucleotide window (e.g., a 100-nucleotide window, a 300-nucleotide window, a 450-nucleotide window, etc.).
- the vector further comprises at least one primer binding site. In certain embodiments, the vector further comprises at least two primer binding sites.
- the vector comprising the one or more primer binding sites is subjected to next-generation sequencing (NGS) to sequence the agRNA and the target nucleic acid after the editing process in order to analyze the editing pattern (e.g., editing efficiency and bystander editing) of a nucleobase editor with a given agRNA.
- NGS: next-generation sequencing
- a first primer binding site is located upstream or within the agRNA, while a second primer binding site is located downstream of a target nucleic acid sequence.
- the distance between the first and second primer sites is less than 600, less than 500, less than 300, less than 200, less than 100, or less than 50 nucleotides. In certain embodiments, the distance between the first and second primer sites is less than 300 nucleotides.
- the present disclosure provides an agRNA screening library comprising a plurality of vectors described above and provided in this disclosure.
- next-generation sequencing (NGS) is used to sequence the plurality of vectors of the agRNA screening library to analyze the editing pattern (e.g., editing efficiency and bystander editing) of a nucleobase editor and a given agRNA.
- the agRNA and the target nucleic acid sequence are within a 300-nucleotide window.
- the target sequence falls within the human DNMT1 gene.
- the present disclosure describes a composition
- a composition comprising (a) an agRNA and (b) a nucleobase editor (e.g., ABEx1, ABEx2, ABEx3, or ABEx4) to carry out nucleobase editing.
- the composition further comprises (c) a target nucleic acid.
- the nucleobase editor is a fusion protein capable of base editing.
- the fusion protein comprises a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp).
- the composition comprises (a) an agRNA, (b) an N-terminal portion of a split nucleobase editor fused at its C-terminus to an intein-N, and (c) a C-terminal portion of a split nucleobase editor fused at its N-terminus to an intein-C, such that the N-terminal portion and the C-terminal portion of the split nucleobase editor are joined to form a fusion protein of a deaminase and a napDNAbp.
- the composition further comprises (d) a target nucleic acid.
- the napDNAbp is selected from the group consisting of Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, Sp-Cas9, SpRY, SpG-Cas9, NG-Cas9, NRRH-Cas9, spCas9, geoCas9, saCas9, Nme2Cas9, Cas12, and variants thereof.
- the deaminase is a cytidine deaminase or an adenosine deaminase.
- the cytidine deaminase is selected from the group consisting of CBE6, CGBE, BE4max, TadCBE, and variants thereof.
- the adenosine deaminase is selected from the group consisting of TadA-8e, ABE8e, AYBE, ABE9e, and variants thereof.
- the deaminase is an adenosine deaminase comprising an amino acid sequence that is at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of SEQ ID NO: 1 or a variant thereof.
- the adenosine deaminase has an amino acid sequence that includes one or more amino acid substitutions at positions 28, 34, and 151 relative to the corresponding position in the sequence of SEQ ID NO: 1 or a variant thereof.
- the adenosine deaminase has an amino acid sequence that comprises one or more amino acid substitutions selected from V28C, L34W, and M151E relative to the corresponding position in SEQ ID NO: 1 or a variant thereof.
- the present disclosure describes a complex comprising any of the agRNAs described herein and a nucleobase editor described herein or elsewhere.
- the present disclosure describes one or more polynucleotides encoding a complex of an agRNA and a nucleobase editor described herein or elsewhere.
- the present disclosure describes one or more vectors comprising one or more polynucleotides encoding the components of a complex of an agRNA and a nucleobase editor.
- the vector includes one or more promoters that drive the expression of the agRNA and the nucleobase editor or split nucleobase editor of the complex.
- the disclosure provides cells (e.g., transformed cell lines) that comprise the agRNA described herein.
- the cells can also comprise the nucleobase editing complexes described herein (e.g., wherein the cell comprises both an agRNA and a nucleobase editor).
- the cells can also comprise any of the polynucleotides described above, which express the agRNA, and optionally which express the nucleobase editors.
- the cells can comprise any of the vectors described above, which express the agRNA, and optionally which express the nucleobase editor.
- the disclosure provides a pharmaceutical composition comprising: (i) an agRNA described above, or a nucleobase editing complex described above, a polynucleotide described above, or a vector described above, or any of the cells described above, and (ii) a pharmaceutically acceptable excipient.
- the disclosure describes a computational method, which may be embodied in software, for designing a library of 3'-nucleic acid extensions.
- the method involves evaluating a target nucleic acid sequence, generating UBSs, DBSs, and CLSs of varying lengths (e.g., 0-50 nucleotides), and generating 3'-nucleic acid extensions comprising different combinations of the various UBSs, DBSs, and/or CLSs.
- the disclosure describes a method of selecting agRNAs, wherein the method involves transfecting the agRNA screening libraries described herein into cells and using next-generation sequencing (NGS) to select agRNA vectors that show reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or improved editing efficiency of a target nucleic acid by a nucleobase editor.
- the reduced bystander editing and/or improved editing efficiency of a nucleobase editor is measured relative to a gRNA lacking the 3'-nucleic acid extension of the agRNA, or relative to a second agRNA.
- PANCE: phage-assisted, non-continuous evolution
- the selection plasmid comprises (i) a pIII nucleotide sequence (encoding the phage coat protein pIII) that has been modified to contain at least one single nucleotide variant (SNV) and (ii) an agRNA nucleotide sequence encoding the corresponding agRNA that targets the modified pIII nucleotide sequence to correct (edit) the SNV to the wild-type sequence.
- SNV: single nucleotide variant
- the SNV results in a mutated pIII protein that lowers phage infectivity. Correction of the SNV to the WT sequence by a complex of the agRNA and nucleobase editor increases phage infectivity. If the perfect edit occurs, the pIII sequence is reverted to the wild-type sequence and phage propagation occurs.
- the selection plasmid comprises a sequence such that bystander edits upstream and downstream of the target nucleic acid in the pIII nucleotide sequence introduce mutations that inhibit phage propagation.
- host cells further comprise a helper plasmid and/or a mutagenesis plasmid.
- a mutagenesis plasmid comprises an arabinose-inducible promoter.
- Some aspects of this disclosure provide methods of selecting nucleobase editors that show reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or improved editing efficiency of a target nucleic acid utilizing machine learning (ML) language models.
- the ML language models are able to predict adaptive evolutionary mutations resulting in nucleobase editor variants with improved fitness.
- the disclosure provides a method of nucleobase editing (e.g., “base editing”) comprising contacting a target nucleic acid sequence with an agRNA described above and a nucleobase editor comprising a fusion protein comprising a deaminase and a napDNAbp or a split napDNAbp, wherein the editing efficiency is increased and/or the bystander editing is decreased as compared to the same method using a gRNA not including the 3'-nucleic acid extension.
- the present disclosure contemplates the use of the agRNAs described herein for base editing a target nucleic acid within a target nucleic acid sequence, wherein editing the target nucleic acid produces a single nucleotide variant (SNV) for engineering a cell (e.g., prokaryotic or eukaryotic cell), a virus, a fungus, a plant, an insect, and/or an animal.
- the present disclosure contemplates the use of the methods described herein for base editing a target nucleic acid within a target nucleic acid sequence, wherein editing the target nucleic acid produces a single nucleotide variant (SNV) for engineering a cell (e.g., prokaryotic or eukaryotic cell), a virus, a fungus, a plant, an insect, and/or an animal.
- the methods described herein may be used for modifying a target nucleic acid sequence for research purposes (e.g., to edit or introduce a nonpathogenic SNP that may enhance or abolish a function, a process, or a phenotype).
- the disclosure provides a kit comprising: (i) agRNA described above, or a nucleobase editing complex described above, a nucleic acid molecule described above, or a vector described above, or any of the cells described above, and (ii) a set of instructions for conducting nucleobase editing.
- FIGs. 1A-1I depict the design and testing of the agRNA library.
- FIG. 1A is a schematic workflow of the dual nucleobase editor system evolution, starting with sequencing the patient-specific mutation, testing existing base editing enzymes in that context, and identifying the editing pattern. If bystander mutations exist, a personalized agRNA for the specific context, in combination with an optionally evolved nucleobase editor, can generate a “bystander-less” and active base editing system that is personalized for the patient.
- FIG. 1B is a schematic of the library design.
- FIG. 1C is a dot plot representation of the ~60K-clone agRNA library after NGS evaluation. Shown as squares are agRNA candidates with high efficiency and low bystander editing in the cloned DNMT1 context.
- FIG. 1D shows the editing pattern of ABE8e at the human DNMT1 locus in combination with selected agRNAs.
- FIG. 1E shows the editing pattern of ABE8e in combination with agRNA56114-tevopreQ1 at the human DNMT1 locus in HEK293T cells.
- FIG. 1G shows the editing pattern of ABE8e and the different guide RNA combinations shown in FIG. 1F at the human DNMT1 locus.
- FIG. 1H shows the influence of agRNA56114 with and without the tevopreQ1 motif on the editing pattern in ~12K different pathogenic contexts.
- FIG. 1I is the editing pattern of ABE8e in combination with agRNA56114-tevopreQ1 at the human DNMT1 locus in HeLa and HepG2 cells. agRNA testing data in the native DNMT1 locus were obtained from n>3 independent experiments.
- FIGs. 2A-2G show phage-assisted non-continuous evolution of adenine base editors.
- FIG. 2A depicts a schematic representation of the PANCE.
- the three different selection plasmids encode different pIII sequences, each carrying a single nucleotide variant (SNV), together with the corresponding agRNA to correct the SNV to the wild-type sequence.
- FIG. 2B shows that once the selection phage and the selection plasmid meet in the same cell, the SNVs of SP1-3 can be corrected. If the perfect edit occurs, the sequence is reverted to the wild-type sequence (e.g., Leu, Arg, or Val). The SNVs introduce an amino acid exchange that lowers phage infectivity (Weiss). Bystander edits can introduce mutations that decrease phage replication (e.g., Pro or Gly).
- FIG. 2D is the editing pattern of the selected mutants generated in the PANCE experiment in the human DNMT1 locus in HEK293T cells (n>3 independent experiments).
- FIG. 2E shows the fold change of the amino acid exchanges V28C and L34W representing the two mutants evolved in the PANCE experiment. Dots represent the fold change in the three different replicates of the PANCE evolution.
- FIGs. 2F-2H depict the computational modeling of the structural change resulting from the amino acid exchanges of the V28C and L34W mutants.
- FIGs. 3A-3F depict machine learning guided base editor evolution.
- FIG. 3A is a schematic workflow of the machine learning approach to identify evolutionary plausible mutations.
- FIG. 3C shows the fold change of the amino acid exchange methionine (M) to glutamic acid (E) at position 151 in the PANCE experiment.
- FIG. 3F depicts the computational modeling of the structural change resulting from the amino acid exchanges of the M151E mutant.
- FIGs. 4A-4D show bystander abolishment by ABEx-agRNA combinations.
- FIG. 4A shows the ABEx variants and corresponding mutations. ABEx1 and ABEx3 were generated by PANCE, ABEx2 by machine learning, and ABEx4 by a combination of both techniques.
- FIG. 4B is the editing pattern in the human DNMT1 locus caused by ABEx spCas9-SpRY variants combined
- FIG. 4D shows the fold change in editing efficiency of the ABEx-SpRY variants normalized to ABE8e, analyzed in ~12,000 different pathogenic contexts.
- FIGs. 5A-5E show Sanger traces and NGS analysis of ABE8e-spCas9-WT with sgRNACtrl and agRNA56114.
- FIG. 5A is a schematic representation of the agRNA library design and workflow. The TadA-8e domain engages with the exposed single-stranded region of the PAM-distal non-target strand (NTS), fostering deamination. Based on this, a 3'-extended agRNA library was designed and cloned into (plasmid). Next, 20 million HEK293T cells were transfected, and the editing pattern was analyzed by Illumina sequencing (MiSeq v3, 300 cycles).
- FIGs. 5B-5C are Sanger sequencing chromatograms of the DNMT1 genome locus after editing using ABE8e-spCas9-WT with the sgRNACtrl and agRNA56114-tevopreQ1.
- FIGs. 5D-5E show the editing frequencies at the DNMT1 genome locus after editing using ABE8e-spCas9-WT with the sgRNACtrl and agRNA56114-tevopreQ1, analyzed by NGS.
- FIGs. 6A-6E show the PANCE constructs and titers.
- FIG. 6A depicts a selection phage design containing the TadA fused with the Npu N-terminal intein.
- FIG. 6B shows schematics of the Selection plasmids containing the gene III, the agRNA and the Cas9 fused with the Npu C-terminal intein.
- FIG. 6C depicts the mapping of agRNA binding sites on the gene III.
- FIG. 6D is a depiction of the PANCE workflow schematics.
- FIG. 6E shows the phage titer across the ten rounds of evolution.
- FIGs. 7A-7C show PANCE variant testing at the DNMT1 site.
- FIG. 7A shows RNP editing efficiencies using ABE8e-SpRY and ABE9-WT. ABE9 showed low editing efficiency with both plasmid and RNP editing strategies.
- FIGs. 7B-7C show the editing efficiency of 50 PANCE-evolved clones with the sgRNACtrl (FIG. 7B) and agRNA56114-tevopreQ1 (FIG. 7C).
- FIGs. 8A-8B are a structural comparison of ABE8e and ABEx variants.
- FIG. 8A depicts the structure modeling of ABE8e-WT. The snapshot is at the editing interface, showing the deaminase (pink), the WT mutated amino acids (grey), and the DNA (yellow).
- FIG. 8B shows the number of H bonds formed between amino acids in positions mutated (columns) and neighbor amino acids (rows). The table in FIG. 8B shows H bonds for those positions with wild-type amino acid (WT), mutated amino acid (mut), and the difference between mutated and wild type (dif).
- Both wild-type V28 and mutated C28 are predicted to establish the same interactions with surrounding residues, i.e., a hydrogen bond with V30. It is possible that a mutation at residue 28 induces a conformational change that allows it to interact with nucleotide 6 of the gRNA, given its proximity (FIG. 2F). Residues L34 and W34 are also predicted to establish the same interactions with surrounding amino acids, i.e., one and two hydrogen bonds with I41 and G42, respectively. Residue 34 is far from the gRNA, but the substitution from L to W, which introduces a bulkier aromatic side chain, could alter the orientation of the alpha-helix arm (orange) where residue H57 lies.
- FIGs. 9A-9B show ML variant testing at the DNMT1 site.
- FIGs. 9A-9B show the editing efficiencies of the 21 ML-derived variants with the sgRNACtrl (FIG. 9A) and agRNA56114 (FIG. 9B).
- FIGs. 10A-10E show the ABE8e and ABEx1 editing and structure comparison.
- FIGs. 10A-10B are Sanger sequencing chromatograms of the DNMT1 genome locus after editing using ABE8e-SpRY with sgRNACtrl and agRNA56114.
- FIGs. 11A-11D show the variant testing in HEK-Site3.
- FIG. 11A shows the editing efficiencies of the HEK site 3 locus using both PANCE and ML variants, together with the double mutants in HEK293T cells.
- FIGs. 11B-11C show the editing efficiencies of the HEK site 3 locus using agRNA56114-tevopreQ1 in HepG2 (FIG. 11B) and HeLa (FIG. 11C) cell lines.
- FIG. 12 shows movement of the editing window at the DNMT1 site: mean editing efficiencies at the DNMT1 locus using different sgRNAs and agRNAs targeting the same site. Guides are named -1, -2, or -3 if the editing window is moved upstream, and +1 if it is moved downstream, relative to the agRNACtrl used in this work (centered).
- FIGs. 13A-13H show the Cas-dependent and independent off-targets.
- FIG. 13A is the on-target editing of ABE8e-SpRY and ABEx1-SpRY at the DNMT1 locus.
- FIGs. 13B-13E show the Cas-dependent off-target analysis of ABE8e-SpRY and ABEx1-SpRY at four different loci.
- FIG. 13F shows schematics of the orthogonal R-loop assay.
- FIGs. 13G-13H show the Cas-independent off-target editing of ABE8e-SpRY and ABEx1-SpRY at two different sites.
- FIG. 14 shows the Path_Var library analysis: editing profiles of the different SpRY variants using sgRNACtrl and agRNA56114-tevopreQ1 compared with ABE8e and ABE9.
- FIGs. 15A-15H show the Path_Var library analysis.
- FIG. 15A shows read counts of each sample after Illumina sequencing.
- FIG. 15B shows an example of the quality scores of read 1 and read 2.
- FIG. 15C shows the editing profiles of the different SpRY variants using sgRNACtrl compared with ABE8e and ABE9.
- FIGs. 15D-15G are the editing profiles of the different SpRY variants using sgRNACtrl compared with ABE8e and ABE9 when more than 2 As are present in the editing window.
- FIG. 15H shows the C-to-A, C-to-G, and C-to-T editing profiles using sgRNACtrl for the different SpRY variants compared with ABE8e and ABE9.
- FIGs. 16A-16D show the effect of 3' extended gRNAs on the editing pattern at the human DNMT1 locus.
- FIG. 16A Scheme of the agRNA (“Anchor”, yellow) in the base editing system.
- FIG. 16B Experimental workflow and design of the agRNA library. More than 60k different agRNAs were cloned downstream of the scaffold within a 300 nt DNA sequence that also contains the target DNA site of the guide.
- FIG. 16C Editing pattern and PreS for the top five performing agRNAs within the human DNMT1 locus.
- FIGs. 17A-17I show phage-assisted non-continuous evolution of precise adenine base editors.
- FIG. 17A Schematic representation of the PANCE.
- the selection phage encoded the TadA-8e adenine deaminase with a C-terminal intein, while the selection plasmids (SP1-3) encoded nCas9-SpRY with an N-terminal intein.
- the three selection plasmids encode different pIII sequences, each carrying a single nucleotide variant (SNV), together with the corresponding agRNA to correct the SNV to the wild-type sequence.
- FIG. 17D NGS analysis of editing efficiency of the ABE8e control and the two best PANCE-generated variants (based on PreS), V28C and L34W, at the human DNMT1 locus with and without agRNA56114.
- FIG. 17E Precise A-to-G editing (A13) within edited reads.
- FIG. 17F Fold change in precision over ABE8e_SpRY with sgRNACtrl.
- FIG. 17G Fold change of the ratio of the V28C and L34W mutation across 10 rounds of evolution.
- FIG. 17H Editing of Cas9-dependent off-targets of the V28C and L34W in 4 different predicted sites across the human genome.
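The precision readouts in FIGs. 17E-17F (precise A13 editing within edited reads, and its fold change over ABE8e_SpRY) can be illustrated with a minimal Python sketch. It assumes reads are already aligned to the protospacer window without indels, counts a read as edited if any A reads as G, and treats a read as precise when its only A-to-G change is at the intended position. The function names and toy reads are hypothetical, and because the exact definition of PreS is not given in this section, the sketch should not be read as that metric.

```python
def edited_positions(ref: str, read: str) -> frozenset[int]:
    """1-based positions where an A in the reference window reads as G."""
    if len(ref) != len(read):
        raise ValueError("reads must be aligned to the reference window without indels")
    return frozenset(i + 1 for i, (r, q) in enumerate(zip(ref, read)) if r == "A" and q == "G")


def precise_fraction(ref: str, reads: list[str], target: int) -> float:
    """Fraction of A-to-G edited reads whose only A-to-G change is at `target`."""
    edited = [e for e in (edited_positions(ref, read) for read in reads) if e]
    if not edited:
        return 0.0
    return sum(1 for e in edited if e == {target}) / len(edited)


def precision_fold_change(ref: str, variant_reads: list[str],
                          control_reads: list[str], target: int) -> float:
    """Precise-edit fraction of a variant divided by that of a control editor."""
    control = precise_fraction(ref, control_reads, target)
    if control == 0:
        raise ZeroDivisionError("control editor produced no precise edits")
    return precise_fraction(ref, variant_reads, target) / control


# Toy window with adenines at positions 13 and 15 (hypothetical data).
ref = "G" * 12 + "AGA" + "G" * 5
precise = ref[:12] + "G" + ref[13:]      # A13>G only
bystander = ref[:12] + "GGG" + ref[15:]  # A13>G plus A15>G
control_reads = [precise, bystander, bystander, ref]
variant_reads = [precise, precise, precise, bystander, ref]
print(precision_fold_change(ref, variant_reads, control_reads, target=13))
```

Unedited reads are excluded from the denominator, so the fraction measures how clean the edits are rather than how frequent they are.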
- FIGs. 18A-18I show machine learning-guided design of adenine base editor variants with reduced bystander editing.
- FIG. 18A Schematic workflow of the machine learning approach to identify evolutionarily plausible mutations.
- FIG. 18B (left) Editing efficiencies at the human DNMT1 locus of single-amino-acid substitution ABE mutants predicted by the machine learning algorithm (with sgRNACtrl and agRNA56114) in HEK293T cells; (right) PreS score of the tested variants.
- the underlined mutations are mutations that revert to the wildtype TadA amino acid at that position.
- FIG. 18C NGS abundance analysis of the control and the best machine learning variant, M151E, at the human DNMT1 locus with and without the 56114 agRNA.
- FIG. 18D Fold change of precise mutations in A13 (top) or A15 (bottom).
- FIG. 18E V28C variant predicted mutations. The numbers indicate how many of the 6 models identified the mutation as plausible. The underlined mutations are mutations that revert to the wild-type TadA amino acid at that position.
- FIG. 18F Editing efficiencies at the endogenous DNMT1 (top) and Site9 (bottom) loci of the PANCE and machine learning generated double mutant V28C-M151E.
- FIG. 18G Amino acid abundance at position 151 after the PANCE experiment.
- FIG. 18H Fold change of ratio of the M151D mutation throughout the PANCE experiment.
- FIGs. 19A-19P show editing pattern of the base editors.
- FIG. 19A Normalized fold-change histograms of PANCE and ML evolved variants taking ABE8e_SpRY as a control across thousands of human pathogenic variants.
- FIG. 19B Fold change in precision correction of pathogenic mutations when more than 2 adenines are present in the target site.
- FIG. 19C Fold change in precision correction of pathogenic mutations when more than 3 adenines are present in the target site.
- FIG. 19D Normalized fold-change histograms of V28C variant in YA or RA contexts.
- FIGs. 19E-19N Editing pattern for twelve individual DNA sites in bulk and on single-read level after editing with the V28C variant and ABE8e_SpRY.
- FIG. 19E Site 2.
- FIG. 19F Site 4.
- FIG. 19G Site 7.
- FIG. 19H Site 8.
- FIG. 19I Site 9.
- FIG. 19J Site 10.
- FIG. 19K Site 12.
- FIG. 19L Site 13.
- FIG. 19M Site 14.
- FIG. 19N Site 18.
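FIG. 19D separates targets by whether the adenine sits in a YA or RA context. Using the IUPAC codes Y (C or T) and R (A or G), one way to label each adenine in a protospacer is by its immediate 5' neighbor on the same strand. The sketch below assumes that definition; the function name and example sequence are illustrative only.

```python
def adenine_contexts(protospacer: str) -> dict[int, str]:
    """Label each adenine (1-based position) as 'YA' or 'RA' from its 5' neighbor.

    Position 1 has no 5' neighbor inside the window and is skipped, as are
    adenines preceded by an ambiguous base such as N.
    """
    seq = protospacer.upper()
    contexts: dict[int, str] = {}
    for i in range(1, len(seq)):
        if seq[i] != "A":
            continue
        previous = seq[i - 1]
        if previous in "CT":
            contexts[i + 1] = "YA"
        elif previous in "AG":
            contexts[i + 1] = "RA"
    return contexts


print(adenine_contexts("CAGATA"))  # {2: 'YA', 4: 'RA', 6: 'YA'}
```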
- FIGs. 20A-20J show the performance of V28C as an editing tool for correcting human pathogenic mutations.
- FIG. 20A In silico model of the effect of the V28C mutation on the interaction with the target DNA.
- FIG. 20B Scheme of the PCSK9 editing strategy using adenine base editors to disrupt a splice site.
- FIG. 20D Quantification of A-to-G editing at the target site.
- FIG. 20E Scheme of the SNCA E46K mutation.
- FIG. 20F Sequence similarity of the guide targeting the human DNMT1 locus and the guide targeting the SNCA E46K mutation.
- FIG. 20G NGS abundance analysis results of the editing of the SNCA E46K mutation by ABE8e_SpRY and the V28C variant with agRNACtrl and agRNA56114. The sequence highlighted in green represents the perfect edit.
- FIG. 20H Editing pattern of the SNCA E46K mutation by ABE8e and the V28C variant with agRNACtrl and agRNA56114.
- FIG. 20I Precise A-to-G editing (A15) within edited reads.
- FIGs. 21A-21G show agRNA library design and testing.
- FIGs. 21A-21B Schematic representation of the agRNA library design and workflow.
- FIG. 21A The TadA-8e domain engages with the exposed single-stranded region of the PAM-distal nontarget strand (NTS), fostering deamination.
- FIG. 21B Based on this, a 3'-extended agRNA library was designed and cloned into (plasmid). Next, 20 million HEK293T cells were transfected, and the editing pattern was analyzed by Illumina sequencing (MiSeq v3, 300 cycles).
- FIG. 21C Dot plot representation of the library of ~60K agRNA clones after NGS evaluation.
- FIG. 21D Bulk editing efficiencies at the DNMT1 genome locus after editing using ABE8e-spCas9-WT with the sgRNACtrl and agRNA56114, analyzed by NGS.
- FIG. 21E Fold-change precision quantification of A13A15 and A13 edited species (based on NGS abundance analysis) using agRNA56114 over gRNACtrl.
- FIGs. 21F-21G A-to-G editing efficiencies in (FIG. 21F) HeLa and (FIG. 21G) HepG2 cells.
- FIGs. 22A-22F show PANCE constructs and their evaluation.
- FIG. 22A Mapping of agRNA binding sites on the gene III in the Selection Plasmid.
- FIG. 22B Schematic representation of the PANCE workflow.
- FIG. 22C Phage titer across the ten rounds of evolution.
- FIG. 22F Frequency of A-to-I editing.
- FIGs. 23A-23B show base editing variant testing in HEK-Site3.
- FIG. 23A Editing efficiencies of the HEK Site9 locus using both PANCE and ML variants, together with the double mutants in HEK293T cells.
- FIG. 23B Editing efficiencies of the HEK Site9 locus using agRNA56114.
- FIGs. 24A-24I show PathVar library analysis.
- FIG. 24A Schematic representation of the PathVar library workflow using Tol2 mediated transposition.
- FIG. 24B Editing profile for the different SpRY variants using sgRNACtrl compared with ABE8e and ABE9.
- FIGs. 24C-24F Editing profile for the different SpRY variants using sgRNACtrl compared with ABE8e and ABE9 when more than 2 As are present in the editing window.
- FIG. 24G C to A, C to G and C to T editing profiles using sgRNACtrl for the different SpRY variants compared with ABE8e and ABE9.
- FIGs. 24H and 24I NGS abundance analysis and editing pattern of ABE8e_SpRY and V28C_SpRY using sgRNACtrl at Site3 (FIG. 24H) and Site 16 (FIG. 24I).
- FIGs. 25A-25D show V28C-M151E variant performance in correcting mutation E46K in human iPSCs.
- FIG. 25A Editing efficiencies of V28C-M151E with sgRNACtrl and agRNA56114 compared to ABE8e and V28C variants.
- FIG. 25B NGS abundance analysis of V28C-M151E editing at the target site.
- FIG. 25C Precise A-to-G editing (A15) within edited reads.
- administer refers to implanting, absorbing, ingesting, injecting, inhaling, or otherwise introducing a treatment or therapeutic agent, or a composition of treatments or therapeutic agents, in or on a subject.
- biomolecule refers to any substance produced by cells or living organisms and includes carbohydrates, lipids, nucleic acids, proteins, and vitamins.
- cDNA refers to DNA that is derived from (e.g., by reverse transcription) and complementary to an RNA template (e.g., an mRNA template or an rRNA template).
- a “cell,” as used herein, may be present in a population of cells (e.g., in a tissue, a sample, a biopsy, an organ, or an organoid).
- a population of cells is composed of a plurality of different cell types.
- Cells for use in the methods and systems of the present disclosure can be present within an organism, a single cell type derived from an organism, or a mixture of cell types. Included are naturally occurring cells and cell populations, genetically engineered cell lines, cells derived from transgenic animals, cells from a subject, etc. Virtually any cell type and size can be accommodated in the methods and systems described herein.
- the cells are mammalian cells (e.g., complex cell populations such as naturally occurring tissues).
- the cells are from a human.
- the cells are collected from a subject (e.g., a human) through a medical procedure, such as a biopsy.
- the cells may be a cultured population (e.g., a culture derived from a complex population or a culture derived from a single cell type where the cells have differentiated into multiple lineages).
- the cells may also be provided in situ in a tissue sample.
- base editor or equivalently “nucleobase editor (NBE)” refers to an agent comprising a polypeptide that is capable of making a modification to a base (e.g., A, T, C, G, or U) within a nucleic acid sequence (e.g., DNA or RNA).
- the nucleobase editor is capable of deaminating a base within a nucleic acid.
- the nucleobase editor is capable of deaminating a base within a DNA molecule.
- the nucleobase editor is capable of deaminating a cytosine (C) in DNA.
- the nucleobase editor is capable of deaminating an adenine (A) in DNA. In some embodiments, the nucleobase editor is capable of excising a base within a DNA molecule. In some embodiments, the nucleobase editor is capable of excising an adenine, guanine, cytosine, thymine or uracil within a nucleic acid (e.g., DNA or RNA) molecule. In some embodiments, the nucleobase editor is a protein (e.g., a fusion protein) comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to a cytidine deaminase.
- the nucleobase editor is a protein (e.g., a fusion protein) comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to an adenosine deaminase.
- Cas9 or “Cas9 domain” refers to an RNA-guided nuclease comprising a Cas9 protein, or a fragment thereof (e.g., a protein comprising an active, inactive, or partially active DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9).
- a Cas9 nuclease is also sometimes referred to as a Csn1 nuclease or a CRISPR (clustered regularly interspaced short palindromic repeat)-associated nuclease.
- CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements and conjugative plasmids).
- CRISPR clusters contain spacers, which are sequences complementary to antecedent mobile elements and which target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA targets complementary to the spacer.
- Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self-versus non-self.
- Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar E.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc.
- Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference.
- a Cas9 nuclease has one inactive (e.g., an inactivated) DNA cleavage domain, that is, the Cas9 is a nickase.
- a Cas9 nuclease lacks an active DNA cleavage domain (e.g., both DNA cleavage domains are inactivated), that is, the Cas9 is a dead Cas9 (dCas9).
- nucleic acid programmable DNA binding protein refers to a protein that associates with a nucleic acid (e.g., DNA or RNA), such as a guide nucleic acid, that guides the napDNAbp to a specific nucleic acid sequence.
- a Cas9 protein can associate with a guide RNA that guides the Cas9 protein to a specific DNA sequence that is complementary to the guide RNA.
- the napDNAbp is a class 2 microbial CRISPR-Cas effector.
- the napDNAbp is a Cas9 domain, for example a nuclease active Cas9, a Cas9 nickase (nCas9), or a nuclease inactive Cas9 (dCas9).
- nucleic acid programmable DNA binding proteins include, without limitation, Cas9 (e.g., dCas9 and nCas9), CasX, CasY, Cpf1, C2c1, C2c2, C2c3, SpCas9, SpRY, SpG-Cas9, NG-Cas9, NRRH-Cas9, geoCas9, saCas9, Nme2Cas9, Cas12, and variants thereof.
- nucleic acid programmable DNA binding proteins also include nucleic acid programmable proteins that bind RNA.
- the napDNAbp may be associated with a nucleic acid that guides the napDNAbp to an RNA.
- Other nucleic acid programmable DNA binding proteins are also within the scope of this disclosure, though they may not be specifically listed in this disclosure.
- Cas9 fusion proteins as provided herein comprise the full-length amino acid sequence of a Cas9 protein, e.g., one of the Cas9 sequences provided herein. In other embodiments, however, fusion proteins as provided herein do not comprise a full-length Cas9 sequence, but only a fragment thereof.
- a Cas9 fusion protein provided herein comprises a Cas9 fragment, wherein the fragment binds crRNA and tracrRNA or sgRNA, but does not comprise a functional nuclease domain, e.g., in that it comprises only a truncated version of a nuclease domain or no nuclease domain at all.
- Cas9 refers to Cas9 from: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheria (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1), Listeria innocua (NCBI Ref: NP_472073.1), Camp
- dCas9 refers to a nuclease-inactive Cas9 or nuclease-dead Cas9, or a variant thereof, and embraces any naturally occurring dCas9 from any organism, any naturally-occurring dCas9 equivalent or functional fragment thereof, any dCas9 homolog, ortholog, or paralog from any organism, and any variant of a dCas9, naturally-occurring or engineered.
- dCas9 is not meant to be particularly limiting and may be referred to as a “dCas9 or variant thereof.”
- Exemplary dCas9 proteins and method for making dCas9 proteins are further described herein and/or are described in the art and are incorporated herein by reference.
- Any suitable mutation which inactivates both Cas9 endonuclease activities, such as the D10A and H840A mutations in the wild-type S. pyogenes Cas9 amino acid sequence (SEQ ID NO: 77), or the D10A and N580A mutations in the wild-type S. aureus Cas9 amino acid sequence, may be used to form the dCas9.
- wild-type Cas9 corresponds to Cas9 from Streptococcus pyogenes (UniProt Reference Sequence: Q99ZW2; SEQ ID NO: 77 (amino acid)).
- nCas9 or “Cas9 nickase” refers to a Cas9 or a variant thereof, which cleaves or nicks only one of the strands of a target cut site thereby introducing a nick in a double strand DNA molecule rather than creating a double strand break.
- This can be achieved by introducing appropriate mutations in a wild-type Cas9 which inactivates one of the two endonuclease activities of the Cas9.
- Any suitable mutation which inactivates one Cas9 endonuclease activity but leaves the other intact is contemplated, such as one of the D10A or H840A mutations in the wild-type S. pyogenes Cas9 amino acid sequence.
- the napDNAbp comprises a Cas9 nickase, wherein the Cas9 nickase is S. aureus Cas9 comprising a D10A mutation.
- Cas9 proteins (e.g., a nuclease-dead Cas9 (dCas9), a Cas9 nickase (nCas9), or a nuclease-active Cas9), including variants and homologs thereof, are within the scope of this disclosure.
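The relationship between the D10A/H840A substitutions and the nuclease-active, nickase (nCas9), and nuclease-dead (dCas9) forms described above can be expressed as a small lookup. This sketch considers only those two S. pyogenes substitutions (RuvC-inactivating D10A and HNH-inactivating H840A); other inactivating mutations, and the D10A/N580A pair in S. aureus Cas9, are outside its scope. The constant and function names are hypothetical.

```python
RUVC_INACTIVATING = {"D10A"}   # S. pyogenes Cas9 RuvC nuclease domain
HNH_INACTIVATING = {"H840A"}   # S. pyogenes Cas9 HNH nuclease domain


def classify_spcas9(mutations: set[str]) -> str:
    """Classify an SpCas9 variant by which nuclease domains remain active."""
    ruvc_active = not (mutations & RUVC_INACTIVATING)
    hnh_active = not (mutations & HNH_INACTIVATING)
    if ruvc_active and hnh_active:
        return "nuclease-active Cas9"
    if ruvc_active or hnh_active:
        return "nCas9 (nickase)"
    return "dCas9 (nuclease-dead)"


for variant in (set(), {"D10A"}, {"H840A"}, {"D10A", "H840A"}):
    print(sorted(variant), "->", classify_spcas9(variant))
```

Substitutions unrelated to catalysis (for example PAM-relaxing changes) are simply ignored by this lookup.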
- deaminase or “deaminase domain,” as used herein, refers to a protein or enzyme that catalyzes a deamination reaction.
- the deaminase or deaminase domain is a cytidine deaminase, catalyzing the hydrolytic deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively.
- the deaminase or deaminase domain is a cytidine deaminase domain, catalyzing the hydrolytic deamination of cytosine to uracil.
- the deaminase or deaminase domain is a naturally-occurring deaminase from an organism, such as a human, chimpanzee, gorilla, monkey, cow, dog, rat, or mouse. In some embodiments, the deaminase or deaminase domain is a variant of a naturally-occurring deaminase from an organism, wherein the variant does not occur in nature.
- the deaminase or deaminase domain is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally occurring deaminase from an organism.
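The identity thresholds above presuppose a way of scoring two aligned sequences. Conventions differ (for example in how gaps and alignment length are counted), so the following Python sketch simply assumes a precomputed pairwise alignment with '-' marking gaps and divides identical residues by the number of aligned columns. It does not perform the alignment itself, and the names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical residues divided by aligned columns, as a percentage.

    Inputs are two rows of a precomputed alignment, with '-' marking gaps.
    Columns that are gaps in both rows are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("both rows must come from the same alignment")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if percent identity is at least `threshold` (e.g., 90.0 for 'at least 90%')."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(round(percent_identity("MSEVEF", "MSDVEF"), 2))  # 83.33
print(meets_threshold("MSE-EF", "MSEVEF", 80.0))        # True
```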
- the napDNAbp of the nucleobase editor is a Cas9 domain.
- the nucleobase editor comprises a Cas9 protein fused to a cytidine deaminase.
- the nucleobase editor comprises a Cas9 nickase (nCas9) fused to a cytidine deaminase.
- the Cas9 nickase comprises a D10A mutation and comprises a histidine at residue 840 of SEQ ID NO: 77, or a corresponding mutation in any Cas9 provided herein, which renders Cas9 capable of cleaving only one strand of a nucleic acid duplex.
- the nucleobase editor comprises a nuclease-inactive Cas9 (dCas9) fused to a cytidine deaminase.
- the dCas9 domain comprises a D10A and a H840A mutation of SEQ ID NO: 77, or a corresponding mutation in any Cas9 provided herein, which inactivates the nuclease activity of the Cas9 protein.
- the cytidine deaminases may be enzymes that convert cytidine (C) to uracil (U) in DNA. If DNA replication occurs before uracil repair, the replication machinery may treat the uracil as thymine (T), leading to a C:G to T:A base pair conversion.
- the cytidine deaminases utilized in the nucleobase editor are apolipoprotein B mRNA-editing complex 1 (APOBEC1) deaminases, e.g., rat APOBEC1 deaminases.
- the adenosine deaminases may be enzymes that deaminate adenine (A) to inosine (I) in DNA; because inosine is read as guanine (G), this leads to an A:T to G:C base pair conversion.
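The net base-pair outcomes described for the two deaminase classes (C:G to T:A for cytidine deaminases, A:T to G:C for adenosine deaminases) can be traced on a short duplex. In this simplified Python sketch, the deaminated intermediate (uracil or inosine) is not modeled explicitly: the edited base is written as the base that replication is assumed to read it as, and the opposite strand is then filled in by complementarity. The function names and sequences are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Base on the edited strand -> base it is read as after replication.
OUTCOMES = {"CBE": ("C", "T"), "ABE": ("A", "G")}


def base_edit(top: str, index: int, editor: str) -> tuple[str, str]:
    """Return (top, bottom) after editing top[index] (0-based).

    The bottom strand is written 3'->5' beneath the top strand, so it is the
    plain complement of the top strand rather than its reverse complement.
    """
    source, product = OUTCOMES[editor]
    if top[index] != source:
        raise ValueError(f"{editor} edits {source}, found {top[index]} at index {index}")
    edited = top[:index] + product + top[index + 1:]
    return edited, edited.translate(COMPLEMENT)


print(base_edit("GACTA", 1, "ABE"))  # ('GGCTA', 'CCGAT')
print(base_edit("GACTA", 2, "CBE"))  # ('GATTA', 'CTAAT')
```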
- the adenosine deaminase is derived from a bacterium, such as E. coli, S. aureus, S. typhi, S. putrefaciens, H. influenzae, or C. crescentus.
- the adenosine deaminase is a TadA deaminase.
- the TadA deaminase is an E. coli TadA deaminase (ecTadA). In some embodiments, the TadA deaminase is a truncated E. coli TadA deaminase.
- the truncated ecTadA may be missing one or more N-terminal amino acids relative to a full-length ecTadA. In some embodiments, the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 N-terminal amino acid residues relative to the full-length ecTadA.
- the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 C-terminal amino acid residues relative to the full length ecTadA.
- the ecTadA deaminase does not comprise an N-terminal methionine.
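The truncated ecTadA embodiments amount to removing up to 20 residues from either terminus, optionally also dropping the initiator methionine. The sketch below shows that bookkeeping on a placeholder string; the placeholder is not the ecTadA sequence, and the function name and parameters are hypothetical.

```python
def truncate(protein: str, n_term: int = 0, c_term: int = 0,
             drop_initial_met: bool = False) -> str:
    """Remove n_term N-terminal and c_term C-terminal residues (0-20 each)."""
    if not (0 <= n_term <= 20 and 0 <= c_term <= 20):
        raise ValueError("this sketch covers truncations of 0-20 residues per terminus")
    if n_term + c_term >= len(protein):
        raise ValueError("truncation would remove the whole protein")
    trimmed = protein[n_term:len(protein) - c_term]
    if drop_initial_met and trimmed.startswith("M"):
        trimmed = trimmed[1:]
    return trimmed


placeholder = "MABCDEFGHIJ"  # hypothetical stand-in, not ecTadA
print(truncate(placeholder, n_term=3))                         # CDEFGHIJ
print(truncate(placeholder, c_term=2, drop_initial_met=True))  # ABCDEFGH
```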
- guide RNA is a particular type of guide nucleic acid which is commonly associated with a Cas protein (e.g., a Cas9 protein), directing the Cas protein to a specific sequence in a DNA molecule that includes complementarity to the protospacer sequence of the guide RNA.
- a gRNA as disclosed herein, may refer to a sgRNA or anchor guide RNA, herein referred to as “agRNA” (e.g., for base editing).
- agRNA may be naturally occurring, recombinant, synthetic, or any combination of these.
- a gRNA may direct a Cas protein (e.g., as part of a nucleobase editor) to a target site in the target gene.
- the Cas protein equivalents may include other napDNAbps from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system), and C2c3 (a type V CRISPR-Cas system).
- “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299), which is incorporated herein by reference. Exemplary sequences and structures of guide RNAs are provided herein.
- guide RNAs associate with a Cas protein, directing (or programming) the Cas protein to a specific sequence in a DNA molecule that includes a sequence complementary to the protospacer sequence for the guide RNA.
- a gRNA is a component of the CRISPR/Cas system.
- the sequence specificity of a Cas DNA-binding protein is determined by gRNAs, which have nucleotide base-pairing complementarity to target DNA sequences.
- the native gRNA comprises a 20 nucleotide (nt) Specificity Determining Sequence (SDS), or spacer, which specifies the DNA sequence to be targeted, and is immediately followed by an 80 nt scaffold sequence, which associates the gRNA with the Cas protein.
- an SDS of the present disclosure has a length of 15 to 100 nucleotides, or more.
- an SDS may have a length of 15 to 90, 15 to 85, 15 to 80, 15 to 75, 15 to 70, 15 to 65, 15 to 60, 15 to 55, 15 to 50, 15 to 45, 15 to 40, 15 to 35, 15 to 30, or 15 to 20 nucleotides.
- the SDS is 20 nucleotides long.
- the SDS may be 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides long. At least a portion of the target DNA sequence is complementary to the SDS of the gRNA.
- a region of the target sequence is complementary to the SDS of the gRNA sequence and is immediately followed by the correct protospacer adjacent motif (PAM) sequence.
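The requirement that the targeted region be immediately followed by a PAM can be checked with a regular expression. This minimal sketch assumes the canonical SpCas9 NGG PAM, written with a small subset of IUPAC codes, and searches only the supplied strand, i.e., the strand whose sequence matches the SDS (with T in place of U). Relaxed-PAM variants such as SpRY would need a different PAM pattern. The names and sequences are hypothetical.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def find_sites(dna: str, sds: str, pam: str = "NGG") -> list[int]:
    """0-based starts where `sds` (RNA or DNA letters) is immediately followed by `pam`."""
    spacer = sds.upper().replace("U", "T")
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # A zero-width lookahead lets overlapping sites be reported.
    pattern = re.compile("(?=" + re.escape(spacer) + pam_regex + ")")
    return [match.start() for match in pattern.finditer(dna.upper())]


sds = "GAUUACAGGCUUACGAUCCA"  # hypothetical 20-nt SDS
site = sds.replace("U", "T")
dna = "TTT" + site + "TGG" + "AAAA" + site + "TCA"
print(find_sites(dna, sds))  # [3]
```

The second copy of the site is rejected because it is followed by TCA rather than an NGG PAM.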
- an SDS is 100% complementary to its target sequence.
- the SDS sequence is less than 100% complementary to its target sequence and is, thus, considered to be partially complementary to its target sequence.
- a targeting sequence may be 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, or 90% complementary to its target sequence.
- the SDS of template DNA or target DNA may differ from a complementary region of a gRNA by 1, 2, 3, 4, or 5 nucleotides.
- the guide RNA is about 15-120 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence (e.g., a target sequence in DNMT1).
- the guide RNA is 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
- the guide RNA comprises a sequence of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more contiguous nucleotides that is complementary to a target sequence. Sequence complementarity refers to distinct interactions between adenine and thymine (DNA) or uracil (RNA), and between guanine and cytosine.
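The complementarity rules above (A with T in DNA or U in RNA, G with C), together with the partial-complementarity embodiments (e.g., 1 to 5 mismatches, or 90% to 99% complementarity), can be checked for a guide and its target strand. This sketch assumes both sequences are written 5'->3' and pair antiparallel over equal lengths. It counts only Watson-Crick pairs, so wobble pairs are scored as mismatches. All names and sequences are hypothetical.

```python
RNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}  # RNA base -> paired DNA base
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement_dna(seq: str) -> str:
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def complementarity(guide_rna: str, target_strand: str) -> tuple[int, float]:
    """Return (mismatches, percent complementarity) for an antiparallel RNA:DNA duplex."""
    if len(guide_rna) != len(target_strand):
        raise ValueError("compare equal-length regions")
    # Reverse the target so each DNA base lines up with the guide base it pairs with.
    antiparallel = target_strand.upper()[::-1]
    mismatches = sum(1 for r, d in zip(guide_rna.upper(), antiparallel) if RNA_PARTNER[r] != d)
    return mismatches, 100.0 * (len(guide_rna) - mismatches) / len(guide_rna)


guide = "GAUUACAGGCUUACGAUCCA"  # hypothetical SDS
target = reverse_complement_dna(guide.replace("U", "T"))
print(complementarity(guide, target))   # (0, 100.0)
one_off = ("A" if target[0] != "A" else "C") + target[1:]
print(complementarity(guide, one_off))  # (1, 95.0)
```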
- anchor gRNA refers to a gRNA comprising a 3'-nucleic acid extension attached at the 3'-end of the gRNA.
- the 3'-nucleic acid extension is about 1-120 nucleotides long and comprises a sequence of at least 3 contiguous nucleotides that is complementary to a target sequence.
- the 3'-nucleic acid extension is attached to the 3'-end of the gRNA by a nucleotide linker.
- the 3'-nucleic acid extension is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long.
- the 3'-nucleic acid extension comprises an upstream binding sequence (UBS) and a downstream binding sequence (DBS) that are complementary to a target sequence of a nucleobase editor.
- the UBS and/or the DBS is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long.
- the 3'-nucleic acid extension comprises a counterloop sequence (CLS).
- the CLS is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long.
- the counterloop sequence is a hairpin.
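To make the anchor gRNA layout concrete, the following Python sketch assembles a spacer, a scaffold, an optional nucleotide linker, and a 3' extension built from a UBS, a counterloop sequence, and a DBS. It also checks the length ranges recited above: UBS, DBS, and CLS each 1-50 nt, and the extension at most 120 nt. Several points are assumptions not stated in this section: the 5'-to-3' order UBS-CLS-DBS, that each binding sequence is the RNA reverse complement of a caller-supplied DNA segment, and a simple stem-loop form for the CLS hairpin. The scaffold string is a short dummy, not a functional Cas9 scaffold, and every name here is hypothetical.

```python
from dataclasses import dataclass

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def reverse_complement_rna(seq: str) -> str:
    return to_rna(seq).translate(RNA_COMPLEMENT)[::-1]


def hairpin(stem: str, loop: str) -> str:
    """Stem, loop, then the reverse complement of the stem (a simple counterloop)."""
    return to_rna(stem) + to_rna(loop) + reverse_complement_rna(stem)


def _check(name: str, seq: str, low: int, high: int) -> None:
    if not low <= len(seq) <= high:
        raise ValueError(f"{name} length {len(seq)} outside {low}-{high} nt")


@dataclass
class AnchorGuide:
    spacer: str
    scaffold: str
    upstream_target: str    # DNA segment the UBS is assumed to pair with (5'->3')
    downstream_target: str  # DNA segment the DBS is assumed to pair with (5'->3')
    linker: str = ""
    counterloop: str = ""

    def extension(self) -> str:
        ubs = reverse_complement_rna(self.upstream_target)
        dbs = reverse_complement_rna(self.downstream_target)
        _check("UBS", ubs, 1, 50)
        _check("DBS", dbs, 1, 50)
        if self.counterloop:
            _check("CLS", self.counterloop, 1, 50)
        ext = ubs + to_rna(self.counterloop) + dbs
        _check("3' extension", ext, 1, 120)
        return ext

    def sequence(self) -> str:
        return (to_rna(self.spacer) + to_rna(self.scaffold)
                + to_rna(self.linker) + self.extension())


design = AnchorGuide(
    spacer="GATTACAGGCTTACGATCCA",
    scaffold="GUUUUAGA",  # dummy placeholder, not a real scaffold
    upstream_target="AACG",
    downstream_target="GGAC",
    counterloop=hairpin("GCC", "GAAA"),
)
print(design.extension())  # CGUUGCCGAAAGGCGUCC
```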
- inhibitor of base repair (IBR) refers to a protein that is capable of inhibiting the activity of a nucleic acid repair enzyme, for example a base excision repair enzyme.
- the IBR is an inhibitor of inosine base excision repair.
- Exemplary inhibitors of base repair include inhibitors of APE1, Endo III, Endo IV, Endo V, Endo VIII, Fpg, hOGG1, hNEIL1, T7 EndoI, T4PDG, UDG, hSMUG1, and hAAG.
- the IBR is an inhibitor of Endo V or hAAG.
- the IBR is a catalytically inactive EndoV or a catalytically inactive hAAG.
- uracil glycosylase inhibitor (UGI) refers to a protein that is capable of inhibiting a uracil-DNA glycosylase base-excision repair enzyme.
- a UGI domain comprises a wild-type UGI or a UGI as set forth in SEQ ID NO: 12.
- the UGI proteins provided herein include fragments of UGI and proteins homologous to a UGI or a UGI fragment. A UGI variant shares homology to UGI, or a fragment thereof.
- nuclear localization sequence (NLS) refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport.
- Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., international PCT application PCT/EP2000/011690, filed November 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences.
- mutation refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)).
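The notation just described (original residue, position, new residue) is the one used for variants such as V28C, M151E, and the double mutant V28C-M151E. A small parser can apply such labels to a sequence and confirm that the stated original residue is actually present, which catches numbering mistakes. This sketch assumes 1-based numbering, substitutions only (no insertions or deletions), and '-' as the separator for combined mutations; the function names and toy sequence are hypothetical.

```python
import re

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutations(label: str) -> list[tuple[str, int, str]]:
    """Split 'V28C' or 'V28C-M151E' into (original, position, new) tuples."""
    parsed = []
    for token in label.split("-"):
        match = SUBSTITUTION.fullmatch(token)
        if match is None:
            raise ValueError(f"not a single-residue substitution: {token!r}")
        original, position, new = match.groups()
        parsed.append((original, int(position), new))
    return parsed


def apply_mutations(protein: str, label: str) -> str:
    """Apply substitutions (1-based positions), checking each original residue."""
    residues = list(protein)
    for original, position, new in parse_mutations(label):
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside a {len(residues)}-residue protein")
        if residues[position - 1] != original:
            raise ValueError(f"expected {original}{position}, found {residues[position - 1]}")
        residues[position - 1] = new
    return "".join(residues)


print(parse_mutations("V28C-M151E"))        # [('V', 28, 'C'), ('M', 151, 'E')]
print(apply_mutations("MAVKL", "V3C-L5E"))  # MACKE  (toy sequence, not TadA)
```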
- prevent refers to a prophylactic treatment of a subject who is not and was not with a disease but is at risk of developing the disease or who was with a disease, is not with the disease, but is at risk of regression of the disease.
- the subject is at a higher risk of developing the disease or at a higher risk of regression of the disease than an average healthy member of a population.
- polynucleotide refers to a series of nucleotide bases (also called “nucleotides”) in DNA and RNA, and means any chain of two or more nucleotides.
- the polynucleotides can be chimeric mixtures or derivatives or modified versions thereof, and single-stranded or double-stranded.
- the oligonucleotide can be modified at the base moiety, sugar moiety, or phosphate backbone, for example, to improve stability of the molecule, its hybridization parameters, etc.
- nucleic acid refers to a polymer of nucleotides.
- the polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5 bromouridine, C5 fluorouridine, C5 iodouridine, C5 propynyl uridine, C5 propynyl cytidine, C5 methylcytidine, 7 deazaadenosine, 7 de
- a “protein,” “peptide,” or “polypeptide” comprises a polymer of amino acid residues linked together by peptide bonds.
- the term refers to proteins, polypeptides, and peptides of any size, structure, or function. Typically, a protein will be at least three amino acids long.
- a protein may refer to an individual protein or a collection of proteins. Proteins may contain only natural amino acids, although non-natural amino acids (i.e., compounds that do not occur in nature but that can be incorporated into a polypeptide chain) and/or amino acid analogs as are known in the art may alternatively be employed.
- amino acids in a protein may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation or functionalization, or other modification.
- a protein may also be a single molecule or may be a multi-molecular complex.
- a protein may be a fragment of a naturally occurring protein or peptide.
- a protein may be naturally occurring, recombinant, synthetic, or any combination of these.
- a protein may also be a therapeutic protein administered as a treatment for a disease or disorder (e.g., one that is associated with a change in the RNA expression and/or translation profile of a cell taken from a subject).
- the protein is an antibody, or an antibody variant (including antibody fragments).
- the term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively.
- a protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic-acid editing protein.
- a protein comprises a proteinaceous part, e.g., an amino acid sequence constituting a nucleic acid binding domain, and an organic compound, e.g., a compound that can act as a nucleic acid cleavage agent.
- a protein is in a complex with, or is in association with, a nucleic acid, e.g., RNA.
- any of the proteins provided herein may be produced by any method known in the art.
- the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker.
- Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.
- promoter is art-recognized and refers to a nucleic acid molecule with a sequence recognized by the cellular transcription machinery and able to initiate transcription of a downstream gene.
- a promoter can be constitutively active, meaning that the promoter is always active in a given cellular context, or conditionally active, meaning that the promoter is only active in the presence of a specific condition.
- conditional promoter may only be active in the presence of a specific protein that connects a protein associated with a regulatory element in the promoter to the basic transcriptional machinery, or only in the absence of an inhibitory molecule.
- a subclass of conditionally active promoters are inducible promoters that require the presence of a small molecule “inducer” for activity.
- inducible promoters include, but are not limited to, arabinose-inducible promoters, Tet-on promoters, and tamoxifen-inducible promoters.
- constitutive, conditional, and inducible promoters are well known to the skilled artisan, and the skilled artisan will be able to ascertain a variety of such promoters useful in carrying out the instant invention, which is not limited in this respect.
- the disclosure provides vectors with appropriate promoters for driving expression of the nucleic acid sequences encoding the fusion proteins (or one or more individual components thereof).
- “RNA-programmable nuclease” and “RNA-guided nuclease” are used interchangeably herein and refer to a nuclease that forms a complex with (e.g., binds or associates with) one or more RNAs that are not targets for cleavage.
- an RNA-programmable nuclease, when in a complex with an RNA, may be referred to as a nuclease:RNA complex.
- the bound RNA(s) is referred to as a guide RNA (gRNA).
- gRNAs can exist as a complex of two or more RNAs, or as a single RNA molecule.
- gRNAs that exist as a single RNA molecule may be referred to as single-guide RNAs (sgRNAs), though “gRNA” is used interchangeably to refer to guide RNAs that exist as either single molecules or as a complex of two or more molecules.
- gRNAs that exist as single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (e.g., and directs binding of a Cas9 complex to the target); and (2) a domain that binds a Cas9 protein.
- domain (2) corresponds to a sequence known as a tracrRNA, and comprises a stem-loop structure.
- domain (2) is identical or homologous to a tracrRNA as provided in Jinek et al., Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference.
- a gRNA comprises two or more of domains (1) and (2), and may be referred to as an “extended gRNA.”
- an extended gRNA will, e.g., bind two or more Cas9 proteins and bind a target nucleic acid at two or more distinct regions, as described herein.
- the gRNA comprises a nucleotide sequence that complements a target site, which mediates binding of the nuclease/RNA complex to said target site, providing the sequence specificity of the nuclease:RNA complex.
- the RNA-programmable nuclease is the (CRISPR-associated system) Cas9 endonuclease, for example, Cas9 (Csn1) from Streptococcus pyogenes (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc.
- Methods of using RNA-programmable nucleases, such as Cas9, for site-specific cleavage (e.g., to modify a genome) are known in the art (see, e.g., Cong, L. et al., Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013); Mali, P. et al., RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013); Hwang, W.Y.
- recombinant protein or nucleic acid molecule comprises an amino acid or nucleotide sequence that comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations as compared to any naturally occurring sequence.
- target nucleic acid refers to a nucleotide in a “target sequence” within a nucleic acid molecule that is modified by a nucleobase editor, such as a fusion protein comprising an adenosine deaminase (e.g., a dCas9-adenosine deaminase fusion protein provided herein).
- RNA transcript is the product resulting from RNA polymerase-catalyzed transcription of a DNA sequence.
- primary transcript: when the RNA transcript is a complementary copy of a DNA sequence, it is referred to as the primary transcript; an RNA sequence derived from post-transcriptional processing of the primary transcript is referred to as the mature RNA.
- messenger RNA (mRNA) refers to RNA that is without introns and can be translated into a polypeptide by the cell.
- upstream and downstream are terms of relativity that define the linear position of at least two elements located in a nucleic acid molecule (whether single or double-stranded) that is orientated in a 5'-to-3' direction.
- a first element is upstream of a second element in a nucleic acid molecule where the first element is positioned somewhere that is 5' to the second element.
- a first element is downstream of a second element in a nucleic acid molecule where the first element is positioned somewhere that is 3' to the second element.
- a “subject” to which administration is contemplated refers to a human (i.e., male or female of any age group, e.g., pediatric subject (e.g., infant, child, or adolescent) or adult subject (e.g., young adult, middle-aged adult, or senior adult)) or non-human animal.
- the non-human animal is a mammal (e.g., primate (e.g., cynomolgus monkey or rhesus monkey) or mouse).
- the term “patient” refers to a subject in need of treatment of a disease.
- the subject is human.
- the patient is human.
- the human may be a male or female at any stage of development.
- a subject or patient “in need” of treatment of a disease or disorder includes, without limitation, those who exhibit any risk factors or symptoms of a disease or disorder.
- a subject is a non-human experimental animal (e.g., a mouse, rat, dog, pig, or non-human primate).
- An “effective amount” of a compound described herein refers to an amount sufficient to elicit the desired biological response.
- An effective amount of a compound described herein may vary depending on such factors as the desired biological endpoint, the pharmacokinetics of the compound, the condition being treated, the mode of administration, and the age and health of the subject. In certain embodiments, an effective amount is a therapeutically effective amount.
- an effective amount is a prophylactically effective amount. In certain embodiments, an effective amount is the amount of a compound described herein in a single dose. In certain embodiments, an effective amount is the combined amounts of a compound described herein in multiple doses.
- an effective amount of a nucleobase editor may refer to the amount of the nucleobase editor that is sufficient to induce a mutation of a target site specifically bound by the nucleobase editor.
- an effective amount of a fusion protein provided herein e.g., of a fusion protein comprising a nucleic acid programmable DNA binding protein and a deaminase domain (e.g., a cytidine deaminase domain) may refer to the amount of the fusion protein that is sufficient to induce editing of a target site specifically bound and edited by the fusion protein.
- the effective amount of an agent (e.g., a fusion protein, a nucleobase editor, a deaminase, a hybrid protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide) may vary depending on various factors, such as the desired biological response (e.g., the specific allele, genome, or target site to be edited), the cell or tissue being targeted, and the agent being used.
- a “therapeutically effective amount” of a treatment or therapeutic agent is an amount sufficient to provide a therapeutic benefit in the treatment of a condition or to delay or minimize one or more symptoms associated with the condition.
- a therapeutically effective amount of a treatment or therapeutic agent means an amount of the therapy, alone or in combination with other therapies, that provides a therapeutic benefit in the treatment of the condition.
- the term “therapeutically effective amount” can encompass an amount that improves overall therapy, reduces or avoids symptoms, signs, or causes of the condition, and/or enhances the therapeutic efficacy of another therapeutic agent.
- treatment refers to reversing, alleviating, delaying the onset of, or inhibiting the progress of a disease described herein.
- treatment may be administered after one or more signs or symptoms of the disease have developed or have been observed.
- treatment may be administered in the absence of signs or symptoms of the disease (e.g., prophylactically or upon suspicion or risk of disease).
- treatment may be administered to a susceptible subject prior to the onset of symptoms (e.g., in light of a history of symptoms in the subject, or family members of the subject). Treatment may also be continued after symptoms have resolved, for example, to delay or prevent recurrence.
- treatment may be administered after using the methods disclosed herein and observing a change in the RNA expression or translation profile in a cell or tissue in comparison to a healthy cell or tissue.
- variants should be taken to mean the exhibition of qualities that have a pattern that deviates from what occurs in nature, e.g., a variant Cas9 is a Cas9 comprising one or more changes in amino acid residues (i.e., “substitutions”) as compared to a wild type Cas9 amino acid sequence.
- the term “variant” encompasses homologous proteins having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with a reference sequence and having the same or substantially the same functional activity or activities as the reference sequence.
- vector refers to a nucleic acid that can be modified to encode a gene of interest and that is able to enter a host cell, mutate, and replicate within the host cell, and then transfer a replicated form of the vector into another host cell.
- exemplary suitable vectors include viral vectors, such as retroviral vectors or bacteriophages and filamentous phage, and conjugative plasmids. Additional suitable vectors will be apparent to those of skill in the art based on the instant disclosure.
- wild type is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms.
- the present disclosure provides modified guide RNAs (gRNAs) comprising 3'-nucleic acid extensions (referred to herein as anchor guide RNAs (agRNAs)), wherein the use of an agRNA results in improved editing efficiency and/or reduced bystander editing of a nucleobase editor.
- the nucleobase editor is a fusion protein comprising a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp).
- these agRNAs improved editing efficiency and/or reduced bystander editing of a nucleobase editor by stabilizing the target nucleic acid sequence (e.g., genomic DNA) within the active site of the nucleobase editor.
- the present disclosure further provides methods for evolving nucleobase editors to be used in conjunction with a given agRNA.
- the agRNA improves the editing efficiency of a target nucleic acid by a nucleobase editor relative to the editing efficiency of the target nucleic acid by the nucleobase editor using a gRNA lacking the 3'-nucleic acid extension of the agRNA.
- the agRNA reduces bystander editing of bystander nucleic acids within an editing window of a target nucleic acid for a nucleobase editor relative to the bystander editing of the target nucleic acid by the nucleobase editor using a gRNA lacking the 3'-nucleic acid extension of the agRNA.
- compositions, methods, uses, and kits for base editing comprising an agRNA and an optionally engineered and/or evolved nucleobase editor disclosed herein.
- the agRNAs are modified versions of a guide RNA.
- Guide RNAs may be naturally occurring, expressed from an encoding nucleic acid, or synthesized chemically. Methods are well known in the art for obtaining or otherwise synthesizing guide RNAs and for determining the appropriate sequence of the guide RNA, including the protospacer sequence which interacts and hybridizes with the target strand of a genomic target site of interest.
- the guide RNA sequence will depend upon the nucleotide sequence of a genomic target site of interest (i.e., the target site to be edited) and the type of napDNAbp (e.g., Cas9 protein) present in the base editing methods described herein, among other factors, such as PAM sequence locations, percent G/C content in the target sequence, the degree of microhomology regions, secondary structures, etc.
- a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a napDNAbp (e.g., a Cas9, Cas9 homolog, Cas9 nickase, dead Cas9 domain, or Cas9 variant) to the target sequence.
- the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
- Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows-Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
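To illustrate what an optimal alignment and a percent-identity figure mean in practice, here is a textbook Needleman-Wunsch global alignment in Python. The scoring values (match +1, mismatch -1, linear gap -2) are arbitrary illustrative assumptions. Identity is computed over all alignment columns, which is only one of several conventions, and the aligners listed above use their own scoring schemes and heuristics. Because the target strand is complementary to the guide, guide-to-target complementarity can be approximated as identity between the guide (written with T in place of U) and the corresponding non-target-strand sequence.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str]:
    """Globally align a and b (linear gap penalty) and return one optimal alignment."""
    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell, preferring the diagonal move.
    aligned_a: List[str] = []
    aligned_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)
```

With these parameters, "ACGTACGTAC" against "ACGTTCGTAC" aligns without gaps at 90.0% identity, while "ACGTACGT" against "ACGACGT" needs a single gap and scores 87.5%.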
- a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length.
- BE refers to a base editor.
- the agRNA comprises a gRNA and a 3'-nucleic acid extension (Figure 1B and Figure 5A).
- the gRNA comprises a spacer sequence and a scaffold sequence.
- the 3'-nucleic acid extension is attached to the 3'-end of the gRNA.
- the 3'-nucleic acid extension is attached to the 3'-end of the gRNA by a nucleotide linker.
- the nucleotide linker ranges from 1 to 50 nucleotides in length.
- the agRNA is capable of binding to a napDNAbp by the scaffold sequence of the gRNA and directing the napDNAbp to a target nucleic acid sequence (e.g., genomic DNA).
- the target nucleic acid sequence comprises (i) a target strand and (ii) a complementary non-target strand.
- the target strand comprises a protospacer sequence that binds to the spacer sequence of the gRNA, forming an RNA-DNA hybrid.
- the non-target strand comprises the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor).
- the non-target strand binds to the 3'-nucleic acid extension of the agRNA.
- the 3'-nucleic acid extension binds to the non-target strand using (a) an upstream binding sequence (UBS) that is complementary to the non-target strand and binds downstream of the target nucleic acid on the non-target strand and/or (b) a downstream binding sequence (DBS) that is complementary to the non-target strand and binds upstream of the target nucleic acid on the non-target strand (Figure 1B and Figure 5A).
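One way to read the orientation above: because the 3'-nucleic acid extension pairs antiparallel with the non-target strand, its 5' portion (the UBS) pairs with sequence 3' of (downstream of) the target nucleobase, and its 3' portion (the DBS) pairs with sequence 5' of (upstream of) it. The Python sketch below derives both as reverse complements of those flanks. The function name, the flank lengths, and the `skip` parameter (bases left unpaired on each side of the target) are illustrative assumptions, and sequences are kept in DNA letters, as in the SEQ ID listings.

```python
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Reverse complement in DNA letters (U is treated like T)."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_binding_sequences(non_target: str, target_index: int, ubs_len: int,
                             dbs_len: int, cls: str = "", skip: int = 0) -> dict:
    """Derive a hypothetical 5'-[UBS]-[CLS]-[DBS]-3' extension from the non-target strand.

    non_target is written 5'->3'; target_index is the 0-based position of the base
    to edit; `skip` bases on each side of the target are left unpaired.
    """
    if not 0 <= target_index < len(non_target) or skip < 0:
        raise ValueError("invalid target_index or skip")
    left = target_index - skip        # first position left unpaired
    right = target_index + skip + 1   # one past the last position left unpaired
    if left - dbs_len < 0 or right + ubs_len > len(non_target):
        raise ValueError("binding sequences would run past the supplied sequence")
    upstream_flank = non_target[left - dbs_len:left]      # 5' of the target
    downstream_flank = non_target[right:right + ubs_len]  # 3' of the target
    # Antiparallel pairing: the extension's 5' part (UBS) pairs with the 3'
    # (downstream) flank, and its 3' part (DBS) pairs with the 5' (upstream) flank.
    ubs = reverse_complement(downstream_flank)
    dbs = reverse_complement(upstream_flank)
    return {"UBS": ubs, "CLS": cls, "DBS": dbs, "extension": ubs + cls + dbs}
```

For the toy non-target strand "GATTACAACCGGTTA" with the target A at index 7 and 7-nt flanks, the UBS is "TAACCGG" and the DBS is "TGTAATC"; with no CLS, the extension is simply the reverse complement of the strand with the target base removed.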
- the UBS and/or the DBS can be of any suitable length. In certain embodiments, the UBS and/or the DBS is 0 nucleotides in length. In certain embodiments, the UBS and/or the DBS is at least N nucleotides in length, where N is any integer from 1 to 50 (i.e., at least 1 nucleotide, at least 2 nucleotides, and so on up to at least 50 nucleotides).
- the UBS and/or the DBS are at least 85% homologous, at least 90% homologous, at least 95% homologous, at least 97% homologous, at least 99% homologous, at least 99.7% homologous, or 100% homologous to the non-target strand.
- the UBS and/or DBS comprises at least 1, at least 2, at least 3, at least 4, or at least 5 mismatches.
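Whether a UBS or DBS is fully complementary to its site, or carries the one to five mismatches mentioned above, can be checked by walking the duplex antiparallel. This minimal sketch counts only Watson-Crick pairs as matches (treating G-U wobble as a mismatch is an assumption), and `pairing_report` is a hypothetical name.

```python
_WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"),
                 ("A", "U"), ("U", "A")}


def pairing_report(binding_seq: str, site: str):
    """Return (mismatch_positions, percent_complementarity) for an antiparallel duplex.

    Both strings are written 5'->3', so binding_seq[0] pairs with site[-1].
    Mismatch positions are 0-based indices into binding_seq.
    """
    if not site or len(binding_seq) != len(site):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = zip(binding_seq.upper(), reversed(site.upper()))
    mismatches = [i for i, pair in enumerate(pairs) if pair not in _WATSON_CRICK]
    percent = 100.0 * (len(site) - len(mismatches)) / len(site)
    return mismatches, percent
```

For instance, "TAACCGG" pairs perfectly with the site "CCGGTTA", whereas "TAACCGA" against the same site reports one mismatch, at its last position.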
- the 3'-nucleic acid extension further comprises a counterloop sequence (CLS).
- the CLS can be of any suitable length.
- the CLS is 0 nucleotides in length.
- the CLS is at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides
- the CLS forms a secondary structural feature. In some embodiments, the CLS forms a hairpin. In some embodiments, the CLS is flanked by the UBS and the DBS.
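A CLS that forms a hairpin needs two segments that pair with each other around a loop. The sketch below is a naive search for the longest contiguous Watson-Crick stem, assuming a minimum stem of 3 bp and a minimum loop of 3 nt. It ignores wobble pairs, bulges, and folding energies, which a thermodynamic tool such as ViennaRNA's RNAfold would take into account; `find_hairpin` is a hypothetical name.

```python
from typing import Optional, Tuple

_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"),
          ("A", "U"), ("U", "A")}


def find_hairpin(seq: str, min_stem: int = 3,
                 min_loop: int = 3) -> Optional[Tuple[int, int, int]]:
    """Return (stem_length, loop_length, start) of the longest perfect stem, else None.

    Only contiguous Watson-Crick stems are considered (no G-U wobble, no bulges,
    no free-energy model).
    """
    s = seq.upper()
    best: Optional[Tuple[int, int, int]] = None
    for i in range(len(s)):
        for j in range(len(s) - 1, i, -1):
            k = 0
            # Add pair (i+k, j-k) while it is Watson-Crick and the enclosed loop
            # would still contain at least min_loop unpaired bases.
            while ((j - k) - (i + k) - 1 >= min_loop
                   and (s[i + k], s[j - k]) in _PAIRS):
                k += 1
            if k >= min_stem and (best is None or k > best[0]):
                best = (k, j - i - 2 * k + 1, i)
    return best
```

As a usage example, `find_hairpin("GGACGAAAGTCC")` returns `(4, 4, 0)`: a 4-bp stem (GGAC paired with GTCC) closing a 4-nt GAAA loop, starting at index 0.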
- the 3'-nucleic acid extension further comprises a secondary structural element.
- the secondary structural element is a tevopreQ1 motif.
- the 3'-nucleic acid extension comprises any structure selected from:
- each instance of “]-[” comprises an optional linker, e.g., a peptide linker.
- the agRNA comprises any structure selected from: 5'-[gRNA]-[UBS]-[CLS]-[DBS]-3';
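The structural formula 5'-[gRNA]-[UBS]-[CLS]-[DBS]-3' amounts to ordered concatenation. A minimal sketch follows, assuming that the optional nucleotide linker sits between the gRNA and the extension (as in the 1-50-nucleotide linker embodiment above) and that an empty string stands for a 0-nt element. `assemble_agrna` is a hypothetical name, and the inputs in the tests are placeholders, not real spacer or scaffold sequences.

```python
def assemble_agrna(spacer: str, scaffold: str, ubs: str = "", cls: str = "",
                   dbs: str = "", linker: str = "", as_rna: bool = True) -> str:
    """Concatenate 5'-[spacer][scaffold]-[linker]-[UBS]-[CLS]-[DBS]-3'.

    Any extension element may be empty (0 nt). Input may use DNA or RNA letters.
    """
    if linker and not 1 <= len(linker) <= 50:
        raise ValueError("linker length outside the 1-50 nt range described")
    grna = spacer + scaffold      # spacer plus scaffold form the gRNA
    extension = ubs + cls + dbs   # the 3'-nucleic acid extension
    sequence = (grna + linker + extension).upper()
    # Return RNA letters (U) by default, or DNA letters (T) when as_rna is False.
    return sequence.replace("T", "U") if as_rna else sequence.replace("U", "T")
```

Keeping each element as a separate argument makes it easy to vary one element (for example, only the CLS) while holding the others fixed.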
- the 3'-nucleic acid extension has a nucleotide sequence of SEQ ID NO: 2-11, or a nucleotide sequence having at least 80% sequence identity therewith. In certain embodiments, the 3'-nucleic acid extension comprises a sequence selected from the group consisting of:
- CTGGCGCGTCGCGCTCTGG (SEQ ID NO: 4 - 61531 agRNA);
- CTCGCGGCTTCGCGTGGCAC (SEQ ID NO: 6 - 62809 agRNA);
- CACGCGGCTTCGCGGGCACCA (SEQ ID NO: 7 - 41197 agRNA);
- ACCGCGCTTCGCGTGGCACCA (SEQ ID NO: 8 - 48214 agRNA);
- CACCCCTCGCGTTCGCGTTCTGGCA (SEQ ID NO: 9 - 35622 agRNA);
- CCCTGGCGCGTTCGCGCGCGGCAC (SEQ ID NO: 10 - 56984 agRNA); and
- TGGCGCGGCTCGCTGGCACCA (SEQ ID NO: 11 - 63661 agRNA).
- the “perfect edit” is a single nucleotide substitution of the target nucleic acid within a target nucleic acid sequence.
- the target nucleic acid can be referred to as the “target site” of a nucleobase editor.
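Because the perfect edit is a single-nucleotide substitution at the target, sequencing reads can be sorted into outcome classes by comparing them base by base with the unedited reference. This sketch assumes reads have already been trimmed and aligned to the reference window. It treats any length change as a separate class rather than aligning indels, and it counts every other substitution as bystander editing (a simplification, since bystander edits are usually defined within the editing window). The labels and the name `classify_read` are hypothetical.

```python
def classify_read(reference: str, read: str, target_index: int) -> str:
    """Label an aligned read relative to the reference window (substitutions only)."""
    if len(read) != len(reference):
        return "length_change"  # indels are outside this substitution-only sketch
    changed = {i for i, (r, q) in enumerate(zip(reference, read)) if r != q}
    if not changed:
        return "unedited"
    if changed == {target_index}:
        return "perfect_edit"
    if target_index in changed:
        return "target_plus_bystander"
    return "bystander_only"
```

Note that this sketch does not check which base replaced the target; for an adenine base editor, one might additionally require an A-to-G change at `target_index`.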
- the present disclosure contemplates the use of an agRNA described herein for multiplex editing with a nucleobase editor described herein or elsewhere.
- the target nucleic acid sequence comprises one or more target nucleic acids (also referred to as “the target nucleobase”), wherein the one or more target nucleic acids fall within a double-stranded DNA molecule such as a gene, a transcriptional regulatory region, an intron splice site, an exonic splicing enhancer site, or a nucleosome binding site.
- the target nucleic acid falls within a gene (e.g., DNMT1), a gene that is associated with a disease or disorder, or a gene that is associated with a disease or disorder caused by pathogenic Single Nucleotide Polymorphisms (SNPs).
- SNPs are the most common genetic variations for various complex human diseases and disorders, including inflammatory disorders, autoimmune disorders, and cancers. Treatment of any disease or disorder caused by SNPs is contemplated by the methods described herein.
- the present disclosure provides an agRNA library comprising a plurality of agRNAs described herein.
- Each agRNA library comprises agRNAs that bind upstream (e.g., utilizing UBSs) and/or downstream (e.g., utilizing DBSs) of a specific target DNA strand that is later present in the active site of the nucleobase editor (Figure 1B and Figure 5A). Therefore, each agRNA library is specific to the intended target of the nucleobase editor.
- the agRNA library comprises between 2,000 and 75,000 different agRNAs. In some embodiments, the agRNA library comprises 2,000, 10,000, 20,000, 40,000, 55,000, 60,000, or 75,000 different agRNAs.
- the agRNA library comprises a plurality of agRNAs with different 3'-nucleic acid extensions. In some embodiments, the agRNA library varies the 3'-nucleic acid extensions by length. In some embodiments, the agRNA library varies the 3'-nucleic acid extensions by UBS, DBS, or CLS (e.g., the UBS and DBS are kept constant, while the CLS is varied).
- the agRNA library varies the 3'-nucleic acid extensions by structure such that the 3'-nucleic acid extensions comprise different combinations of UBSs, CLSs, and DBSs (e.g., an agRNA library comprising agRNAs with a CLS in the 3'-nucleic acid extension versus an agRNA library comprising agRNAs without a CLS in the 3'-nucleic acid extension), or further comprise other secondary structural elements (e.g., a tevopreQ1 motif).
- the agRNA library is used to screen which agRNAs improve editing efficiency and/or reduce bystander editing of a nucleobase editor.
- the agRNA library consists of combinations of an array of upstream binding sequences (UBSs), counter-loop sequences (CLSs), and downstream binding sequences (DBSs) for a particular target nucleic acid sequence.
- the UBSs and the DBSs bind to the target nucleic acid (e.g., genomic DNA) surrounding the targeted edit (also referred to herein as the “perfect edit” or “target nucleic acid”).
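A library built from arrays of UBSs, CLSs, and DBSs is the Cartesian product of the three arrays. The sketch uses `itertools.product` and keeps only the first of any combinations that concatenate to an identical extension. As a hypothetical sizing example, 40 UBS, 10 CLS, and 40 DBS options give at most 40 × 10 × 40 = 16,000 extensions, which falls within the 2,000-75,000 range given for agRNA libraries. `build_extension_library` is a hypothetical name.

```python
from itertools import product
from typing import Dict, Iterable, List


def build_extension_library(ubs_options: Iterable[str], cls_options: Iterable[str],
                            dbs_options: Iterable[str]) -> List[Dict[str, str]]:
    """Enumerate every UBS x CLS x DBS combination as a 5'-[UBS]-[CLS]-[DBS]-3' extension.

    Combinations that concatenate to the same extension are kept only once
    (the first one encountered), in enumeration order.
    """
    library: Dict[str, Dict[str, str]] = {}
    for ubs, cls, dbs in product(ubs_options, cls_options, dbs_options):
        extension = ubs + cls + dbs
        library.setdefault(extension, {"UBS": ubs, "CLS": cls, "DBS": dbs,
                                       "extension": extension})
    return list(library.values())
```

Including an empty string among the CLS options yields, within one library, extensions with and without a CLS.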
- the disclosure provides polynucleotides, vectors, and cells comprising an agRNA described herein for screening the editing pattern of each nucleobase editor combined with a particular agRNA.
- the disclosure provides a polynucleotide encoding an agRNA described herein.
- the present disclosure provides a vector comprising any polynucleotide described herein.
- the vector comprises a polynucleotide encoding an agRNA described herein.
- the polynucleotide can be under the control of a promoter.
- the polynucleotide can be under the control of multiple promoters.
- the promoter can be any promoter recognized by a skilled artisan (e.g., a constitutive promoter, a tissue-specific promoter, or an inducible promoter).
- the promoter can be a U6 promoter.
- the promoter can also be a U6, U6v4, U6v7, or U6v9 promoter or a fragment thereof.
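When an agRNA is expressed from a U6 (RNA polymerase III) promoter, two widely used design rules, neither of which is stated in this document, are worth checking. An internal run of four or more T residues can act as a premature Pol III terminator, and a G is commonly placed at the first transcribed position. The sketch below flags both; the threshold and the name `u6_expression_warnings` are assumptions.

```python
import re
from typing import List


def u6_expression_warnings(agrna: str, max_t_run: int = 3) -> List[str]:
    """Flag features that commonly complicate U6 (Pol III) expression of a guide.

    Assumed rules: a run of more than max_t_run T's may terminate Pol III
    transcription early, and a G is preferred at the first transcribed position.
    """
    seq = agrna.upper().replace("U", "T")
    warnings: List[str] = []
    run = re.search("T{%d,}" % (max_t_run + 1), seq)  # first sufficiently long T-run
    if run is not None:
        warnings.append(
            f"T-run of length {len(run.group())} at position {run.start()} "
            "may terminate Pol III transcription early"
        )
    if not seq.startswith("G"):
        warnings.append("first transcribed nucleotide is not G")
    return warnings
```

The check should be applied to the transcribed agRNA body only, since the intended terminator T-run that follows it in the vector would otherwise be flagged.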
- the vector further comprises a polynucleotide sequence comprising an agRNA described herein and a target nucleic acid sequence (e.g., a gene of interest) that includes the target nucleic acid (e.g., the nucleobase to be edited by the nucleobase editor) (Figure 5A).
- the target nucleic acid sequence is located downstream of the agRNA sequence.
- the agRNA and the target nucleic acid sequence are within a 50-600-nucleotide window (e.g., a 100-nucleotide window, a 300-nucleotide window, a 450-nucleotide window, etc.).
- the vector further comprises at least one primer binding site.
- the vector further comprises at least two primer binding sites.
- the vector comprising the one or more primer binding sites is subjected to next-generation sequencing (NGS) to sequence the agRNA and the target nucleic acid after the editing process in order to analyze the editing pattern (e.g., editing efficiency and bystander editing) of a nucleobase editor with a given agRNA.
- a first primer binding site is located upstream or within the agRNA, while a second primer binding site is located downstream of a target nucleic acid sequence.
- the distance between the first and second primer sites is less than 600, less than 500, less than 300, less than 200, less than 100, or less than 50 nucleotides. In certain embodiments, the distance between the first and second primer sites is less than 300 nucleotides.
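The primer-site layout can be checked directly on the construct sequence. This sketch assumes every element occurs exactly once on a single strand written 5'-to-3', gives the second primer site as it appears on that strand (not as the reverse primer itself), and measures distance as the number of nucleotides between the two sites. Other conventions, such as total amplicon length, would shift the threshold. `check_amplicon` is a hypothetical name.

```python
def check_amplicon(construct: str, fwd_site: str, rev_site: str, agrna: str,
                   target: str, max_span: int = 300) -> dict:
    """Check the layout of a hypothetical screening construct written 5'->3'.

    Elements are located by their first exact match in the construct.
    """
    elements = (("fwd", fwd_site), ("agrna", agrna), ("target", target), ("rev", rev_site))
    positions = {name: construct.find(seq) for name, seq in elements}
    missing = [name for name, pos in positions.items() if pos < 0]
    if missing:
        raise ValueError(f"not found in construct: {missing}")
    # Distance = nucleotides lying between the end of the first site and the second site.
    span = positions["rev"] - (positions["fwd"] + len(fwd_site))
    return {
        "fwd_upstream_or_within_agrna": positions["fwd"] < positions["agrna"] + len(agrna),
        "rev_downstream_of_target": positions["rev"] >= positions["target"] + len(target),
        "span": span,
        "span_ok": 0 <= span < max_span,
    }
```

Returning each check separately, rather than a single pass/fail value, shows which layout rule a failing construct breaks.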
- the present disclosure provides an agRNA screening library comprising a plurality of vectors described above and provided in this disclosure.
- next-generation sequencing (NGS) is used to sequence the plurality of vectors of the agRNA screening library to analyze the editing pattern for each library clone within the 300-nucleotide window.
- the target sequence is the human DNMT1 gene.
- the present disclosure provides a method of selecting agRNAs comprising the steps of:
- the bystander editing of the nucleobase editor at one or more sites is reduced by at least 2%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, at least 99%, or at least 99.5% relative to a reference agRNA.
- the editing efficiency of the nucleobase editor at one or more sites is increased by at least 2%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, at least 99%, or at least 99.5% relative to a reference agRNA.
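Percent changes relative to a reference agRNA can be computed from demultiplexed sequencing counts. This sketch assumes reads are already grouped by agRNA (for example, by the extension sequence captured on the same amplicon). It interprets a reduction as a relative change, (reference - test) / reference × 100, rather than a difference in percentage points. Both the interpretation and the function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_by_agrna(reads: Iterable[Tuple[str, str]], reference: str,
                       target_index: int) -> Dict[str, Dict[str, float]]:
    """Per-agRNA percentages of reads edited at the target and at any other position.

    reads: (agrna_id, target_window) pairs. Windows whose length differs from the
    reference count toward the total but never as edited (no indel handling).
    """
    counts = defaultdict(lambda: {"total": 0, "target": 0, "bystander": 0})
    for agrna_id, window in reads:
        tally = counts[agrna_id]
        tally["total"] += 1
        if len(window) != len(reference):
            continue
        changed = {i for i, (r, q) in enumerate(zip(reference, window)) if r != q}
        if target_index in changed:
            tally["target"] += 1
        if changed - {target_index}:
            tally["bystander"] += 1
    return {
        agrna_id: {
            "efficiency": 100.0 * t["target"] / t["total"],
            "bystander": 100.0 * t["bystander"] / t["total"],
        }
        for agrna_id, t in counts.items()
    }


def relative_reduction(value: float, reference_value: float) -> float:
    """Percent reduction of value relative to reference_value (assumed interpretation)."""
    return 100.0 * (reference_value - value) / reference_value
```

For example, an agRNA with 25% bystander editing compared with a reference agRNA at 50% gives a relative reduction of 50%.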
- compositions comprising any of the fusion proteins provided herein, and an agRNA optionally bound to the napDNAbp of the fusion protein.
- compositions comprising any of the fusion proteins provided herein, and an agRNA optionally bound to a Cas9 domain (e.g., a dCas9, a nuclease active Cas9, or a Cas9 nickase) of the fusion protein.
- Some aspects of this disclosure provide complexes comprising any of the fusion proteins provided herein, and an agRNA bound to the napDNAbp of the fusion protein. Some aspects of this disclosure provide complexes comprising any of the fusion proteins provided herein, and an agRNA bound to a Cas9 domain (e.g., a dCas9, a nuclease active Cas9, or a Cas9 nickase) of the fusion protein.
- a composition comprising (a) an agRNA and (b) a nucleobase editor (e.g., ABEx1, ABEx2, ABEx3, or ABEx4) to carry out nucleobase editing.
- the composition further comprises (c) a target nucleic acid.
- the nucleobase editor is a fusion protein capable of base editing.
- the fusion protein comprises a deaminase and a nucleic acid programmable DNA binding protein (napDNAbp).
- the composition comprises (a) an agRNA, (b) an N-terminal portion of a split nucleobase editor fused at its C-terminus to an intein-N, and (c) a C-terminal portion of a split nucleobase editor fused at its N-terminus to an intein-C, such that the N-terminal and C-terminal portions of the split nucleobase editor are joined to form a fusion protein of a deaminase and a napDNAbp.
- the composition further comprises (d) a target nucleic acid.
- the present disclosure describes a complex comprising any of the agRNAs described herein and a nucleobase editor described herein or elsewhere.
- the present disclosure describes one or more polynucleotides encoding a complex of an agRNA and a nucleobase editor described herein or elsewhere.
- the present disclosure describes one or more vectors comprising one or more polynucleotides encoding a complex of an agRNA and a nucleobase editor.
- the vector includes one or more promoters that drive the expression of the agRNA and the nucleobase editor or split nucleobase editor of the complex.
- the nucleobase editors described herein may comprise a nucleic acid programmable DNA binding protein (napDNAbp).
- a napDNAbp can be associated with or complexed with at least one guide nucleic acid (e.g., a guide RNA or an agRNA), which localizes the napDNAbp to a DNA sequence that comprises a DNA strand (i.e., a target nucleic acid sequence) that is complementary to the guide nucleic acid, or a portion thereof (e.g., the spacer of a guide RNA, which anneals to the protospacer of the DNA target).
- the guide nucleic acid “programs” the napDNAbp (e.g., Cas9 or equivalent) to localize and bind to the complementary sequence of the protospacer in the DNA.
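Guide-directed localization can be illustrated by searching a DNA sequence for a protospacer. The sketch assumes SpCas9, whose PAM (NGG) lies immediately 3' of the protospacer on the strand that has the same sequence as the spacer; the spacer itself anneals to the complementary strand. Other napDNAbps named in this section recognize different PAMs (the Cas12a PAM, for example, lies 5' of the protospacer), so `pam_regex` and the PAM position would need adapting. The spacer and flanking sequences are hypothetical, and only one strand is scanned.

```python
import re


def find_protospacer(dna: str, spacer: str, pam_regex: str = "[ACGT]GG") -> int:
    """0-based start of the first occurrence of `spacer` that is immediately
    followed (3') by a PAM matching `pam_regex`; -1 if none. Only the strand
    given as `dna` is scanned."""
    match = re.search(re.escape(spacer) + pam_regex, dna)
    return match.start() if match else -1


# Hypothetical 20-nt spacer (written as DNA) placed before a TGG PAM.
spacer = "GACGCATAAAGATGAGACGC"
site = "TTTT" + spacer + "TGG" + "AAAA"
print(find_protospacer(site, spacer))                              # 4
print(find_protospacer("TTTT" + spacer + "TAA" + "AAAA", spacer))  # -1
```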
- any suitable napDNAbp may be used in the nucleobase editors described herein.
- the napDNAbp may be any Class 2 CRISPR-Cas system, including any type II, type V, or type VI CRISPR-Cas enzyme.
- the napDNAbp is selected from the group consisting of Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, Sp-Cas9, SpRY, SpG-Cas9, NG-Cas9, NRRH-Cas9, spCas9, geoCas9, saCas9, Nme2Cas9, Cas12(a-i), Cas14, Argonaute, and variants thereof.
- the nucleic acid programmable DNA binding proteins include, without limitation, Cas9 (e.g., dCas9 and nCas9), Cas12e (CasX), Cas12d (CasY), Cas12a (Cpf1), Cas12b1 (C2c1), and Cas12c (C2c3).
- nucleobase editors may comprise the canonical SpCas9, any Cas9 ortholog, or any Cas9 variant (including any naturally occurring variant, mutant, or otherwise engineered version of Cas9) that is known or that can be made or evolved through directed evolution or another mutagenic process.
- the Cas9 or Cas9 variants have nickase activity, i.e., they cleave only one strand of the target DNA sequence.
- the Cas9 or Cas9 variants have inactive nucleases, i.e., are “dead” Cas9 proteins.
- Other variant Cas9 proteins that may be used are those having a smaller molecular weight than the canonical SpCas9 (e.g., for easier delivery) or having modified or rearranged primary amino acid structure (e.g., the circular permutant formats).
- the nucleobase editors described herein may also comprise Cas9 equivalents, including Cas12a/Cpf1 and Cas12b proteins, which are the result of convergent evolution.
- the napDNAbps used herein may also contain various modifications that alter/enhance their PAM specificities (e.g., SpRY).
- the application contemplates any Cas9, Cas9 variant, or Cas9 equivalent which has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence, such as a reference canonical SpCas9 sequence or a reference Cas9 equivalent (e.g., Cas12a/Cpf1).
- the nucleobase editors described herein may be delivered to cells as two or more fragments which become assembled inside the cell (either by passive assembly, or by active assembly, such as using split intein sequences) into a reconstituted nucleobase editor.
- the self-assembly may be passive whereby the two or more nucleobase editor fragments associate inside the cell covalently or non-covalently to reconstitute the nucleobase editor.
- the self-assembly may be catalyzed by dimerization domains installed on each of the fragments.
- the self-assembly may be catalyzed by split intein sequences installed on each of the nucleobase editor fragments.
- split nucleobase editors analogous to those described herein are further described in, for example, International Patent Application Publication No. WO 2017/197238, published November 16, 2017, which is incorporated herein by reference.
- the nucleobase editor (BE) is divided at a split site within the napDNAbp.
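Intein-mediated reassembly of a split editor can be pictured as string surgery on the protein sequence. The sketch below is an idealized model: it splits a placeholder sequence, appends intein-N to the N-terminal half and prepends intein-C to the C-terminal half, then removes both intein fragments and rejoins the halves, as protein trans-splicing does. Real split sites are chosen with the splicing chemistry in mind (many inteins need a Cys, Ser, or Thr at the first C-extein position), which this model ignores. The editor sequence and the `<IntN>`/`<IntC>` tags are hypothetical placeholders.

```python
from typing import Tuple


def split_editor(editor: str, split_site: int, intein_n: str, intein_c: str) -> Tuple[str, str]:
    """Divide `editor` before 0-based index `split_site` and fuse intein halves
    to the new ends."""
    if not 0 < split_site < len(editor):
        raise ValueError("split site must fall inside the sequence")
    n_half = editor[:split_site] + intein_n  # N-terminal portion, intein-N on its C-terminus
    c_half = intein_c + editor[split_site:]  # C-terminal portion, intein-C on its N-terminus
    return n_half, c_half


def trans_splice(n_half: str, c_half: str, intein_n: str, intein_c: str) -> str:
    """Idealized trans-splicing: excise both intein halves and join the exteins."""
    if not (n_half.endswith(intein_n) and c_half.startswith(intein_c)):
        raise ValueError("halves do not carry the expected intein fragments")
    return n_half[: len(n_half) - len(intein_n)] + c_half[len(intein_c):]


# Hypothetical placeholders: a short stand-in for an editor and bracketed intein tags.
editor = "MDKKYSIGLDIGTNSVGWAVITDEYK"
n_half, c_half = split_editor(editor, 10, "<IntN>", "<IntC>")
print(n_half)  # MDKKYSIGLD<IntN>
print(c_half)  # <IntC>IGTNSVGWAVITDEYK
print(trans_splice(n_half, c_half, "<IntN>", "<IntC>") == editor)  # True
```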
- Fusion proteins useful for the methods disclosed herein include cytidine base editors (CBEs), in which the deaminase domain is a cytidine deaminase.
- the deaminase domain is an apolipoprotein B mRNA-editing complex 1 (APOBEC1) deaminase domain.
- a rat APOBEC1 (rAPOBEC1) is used.
- a human APOBEC1 is used.
- cytidine deaminases include APOBEC2 deaminase, APOBEC3A deaminase, APOBEC3B deaminase, APOBEC3C deaminase, APOBEC3D deaminase, APOBEC3F deaminase, APOBEC3G deaminase, and APOBEC3H deaminase, an activation-induced deaminase (AID), a cytidine deaminase 1 from Petromyzon marinus (pmCDA1), an ACF1/ASE deaminase, CBE6, CGBE, TadCBE, or a variant thereof.
- the cytidine base editors utilized in the disclosed methods may further comprise an inhibitor of base excision repair ("iBER") domain.
- the iBER domain may comprise a uracil glycosylase inhibitor (UGI) domain.
- the uracil glycosylase inhibitor domain prevents a U:G mismatch (or G:T mismatch) from being repaired back to the original C:G (or A:T) base pair.
- the fusion protein comprises an inhibitor of base excision repair, such as a UGI domain or a catalytically inactive inosine-specific nuclease domain.
- a UGI domain comprises an amino acid sequence that is at least 80%, 85%, 90%, 95%, 98%, 99%, 99.5%, or 99.9% identical to the amino acid sequence:
- Configurations of the cytidine base editors utilized in the methods disclosed herein may comprise fusion proteins containing a dCas9 domain and, optionally, a UGI domain, having the general structure NH2-[dCas9]-[cytidine deaminase domain]-COOH, NH2-[cytidine deaminase domain]-[dCas9]-COOH, NH2-[dCas9]-[cytidine deaminase domain]-[uracil glycosylase inhibitor]-COOH, or NH2-[cytidine deaminase domain]-[dCas9]-[uracil glycosylase inhibitor]-COOH, wherein each instance of "]-[" comprises an optional linker, e.g., a peptide linker.
- Configurations of the cytidine base editors utilized in the methods disclosed herein may comprise fusion proteins containing an nCas9 domain and, optionally, a UGI domain, having the general structure NH2-[nCas9]-[cytidine deaminase domain]-COOH, NH2-[cytidine deaminase domain]-[nCas9]-COOH, NH2-[nCas9]-[cytidine deaminase domain]-[uracil glycosylase inhibitor]-COOH, or NH2-[cytidine deaminase domain]-[nCas9]-[uracil glycosylase inhibitor]-COOH, wherein each instance of "]-[" comprises an optional linker, e.g., a peptide linker.
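The bracket notation above lists domains from N- to C-terminus, and each "]-[" junction may carry an optional linker. A short sketch that renders such layouts and enumerates the four general CBE configurations named in the preceding two bullets follows; the function names and the generic "linker" label are illustrative only.

```python
from itertools import product
from typing import List, Optional


def render(domains: List[str], linker: Optional[str] = None) -> str:
    """Write a fusion N- to C-terminus as NH2-[..]-[..]-COOH, optionally
    placing `linker` at every junction between domains."""
    parts: List[str] = []
    for i, domain in enumerate(domains):
        if i > 0 and linker:
            parts.append(linker)
        parts.append(domain)
    return "NH2-" + "-".join(f"[{p}]" for p in parts) + "-COOH"


def cbe_layouts(cas: str) -> List[str]:
    """The four general layouts for a given Cas domain: both domain orders,
    each with or without a C-terminal UGI."""
    orders = ([cas, "cytidine deaminase domain"], ["cytidine deaminase domain", cas])
    layouts = []
    for order, with_ugi in product(orders, (False, True)):
        domains = order + (["uracil glycosylase inhibitor"] if with_ugi else [])
        layouts.append(render(domains))
    return layouts


for layout in cbe_layouts("nCas9"):
    print(layout)
print(render(["cytidine deaminase domain", "nCas9"], linker="linker"))
```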
- the cytidine base editors (CBE) utilized in the disclosed methods may further comprise one, two, or more than two nuclear localization sequences (NLS).
- Configurations of such base editors may comprise fusion proteins having the structure NH2-[dCas9]-[cytidine deaminase domain]-[NLS]-COOH, NH2-[dCas9]-[cytidine deaminase domain]-[NLS]-[NLS]-COOH, NH2-[cytidine deaminase domain]-[dCas9]-[NLS]-COOH, NH2-[cytidine deaminase domain]-[dCas9]-[NLS]-[NLS]-COOH, NH2-[dCas9]-[cytidine deaminase domain]-[uracil glycosylase inhibitor]-[NLS]
- Fusion proteins useful for the methods disclosed herein include adenine base editors (ABEs), in which the deaminase domain is an adenosine deaminase.
- the adenosine deaminase domain comprises the amino acid sequence of any one of SEQ ID NOs: 1 and 13-18.
- the adenosine deaminase is derived from a bacterium, such as E. coli, S. aureus, S. typhi, S. putrefaciens, H. influenzae, or C. crescentus.
- the adenosine deaminase is a TadA deaminase.
- the TadA deaminase is an E. coli TadA deaminase (ecTadA).
- the TadA deaminase is a truncated E. coli TadA deaminase.
- the truncated ecTadA may be missing one or more N-terminal amino acids relative to a full-length ecTadA.
- the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 N-terminal amino acid residues relative to the full-length ecTadA.
- the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 C-terminal amino acid residues relative to the full length ecTadA.
- the ecTadA deaminase does not comprise an N-terminal methionine.
- the adenosine deaminase is an N-terminal truncated E. coli TadA (ecTadA).
- the adenosine deaminase comprises a sequence that has at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the following amino acid sequence: MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 13).
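Statements of the form "at least N% sequence identity" compare a candidate with a reference sequence. A minimal sketch for the simplest case, substitution-only variants of equal length, follows. Patent practice normally computes identity from a sequence alignment (for example BLAST or a global pairwise alignment), which this ungapped count does not replace. The two-substitution variant is hypothetical.

```python
SEQ_ID_NO_13 = (
    "MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPT"
    "AHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKT"
    "GAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD"
)


def ungapped_identity_pct(query: str, reference: str) -> float:
    """Percent of positions with identical residues; equal lengths required."""
    if len(query) != len(reference):
        raise ValueError("ungapped identity needs equal-length sequences")
    matches = sum(q == r for q, r in zip(query, reference))
    return 100.0 * matches / len(reference)


# Hypothetical variant: two substitutions (residues 106 and 108, counting Met as 1).
variant = SEQ_ID_NO_13[:105] + "V" + SEQ_ID_NO_13[106] + "N" + SEQ_ID_NO_13[108:]
pct = ungapped_identity_pct(variant, SEQ_ID_NO_13)
print(len(SEQ_ID_NO_13), round(pct, 2))  # 167 98.8
for tier in (85, 90, 95, 99):
    print(f"at least {tier}%:", pct >= tier)
```

Two substitutions in a 167-residue sequence clear the 85%, 90%, and 95% tiers but not the 99% tier.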
- the adenosine deaminase is a full-length E. coli TadA (“ecTadA(wt)”) deaminase.
- the adenosine deaminase comprises a sequence that has at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the following amino acid sequence: MRRAFITGVFFLSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 14).
- the adenosine deaminase comprises a D108N mutation in SEQ ID NO: 13, or a corresponding mutation in a homologous or orthologous adenosine deaminase.
- the adenosine deaminase further comprises an A106V mutation in SEQ ID NO: 13, or a corresponding mutation in a homologous or orthologous adenosine deaminase.
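Substitutions such as A106V and D108N use the usual wild-type residue, position, new residue shorthand. The sketch below applies such a list to SEQ ID NO: 13, counting the initiator Met as residue 1 (an assumption about numbering, although with it residues 106 and 108 are indeed A and D), and refuses any substitution whose stated wild-type residue does not match. The function name is hypothetical.

```python
import re
from typing import List

SEQ_ID_NO_13 = (
    "MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPT"
    "AHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKT"
    "GAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD"
)

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(seq: str, substitutions: List[str]) -> str:
    """Apply substitutions written as <wild type><position><new residue>,
    positions counted from 1, after confirming the stated wild-type residue."""
    residues = list(seq)
    for sub in substitutions:
        match = SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"cannot parse {sub!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{sub}: position outside 1..{len(residues)}")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{sub}: residue {position} is {residues[position - 1]}")
        residues[position - 1] = new
    return "".join(residues)


ectada_star = apply_substitutions(SEQ_ID_NO_13, ["A106V", "D108N"])
print(ectada_star[100:110])  # RVVFGVRNAK
```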
- the fusion proteins disclosed herein have the general structure ecTadA*-XTEN-dCas9 (e.g. “ecTadA*(7.10)”), where ecTadA* represents an ecTadA variant comprising A106V and D108N mutations in the amino acid sequence of SEQ ID NO: 1.
- the adenosine deaminase comprises the amino acid sequence:
- Configurations of the adenine base editors utilized in the methods disclosed herein may comprise a dCas9 domain, and may comprise fusion proteins having the structure NH2-[dCas9]-[adenine deaminase domain]-COOH, NH2-[adenine deaminase domain]-[dCas9]-COOH, NH2-[dCas9]-[adenine deaminase domain]-[NLS]-COOH, NH2-[dCas9]-[adenine deaminase domain]-[NLS]-[NLS]-COOH, NH2-[adenine deaminase domain]-[dCas9]-[NLS]-COOH, NH2-[adenine deaminase domain]-[dCas9]-[NLS]-[NLS]-COOH, NH2-[NLS]-[dCas9]-[d
- Configurations of the adenine base editors utilized in the methods disclosed herein may comprise an nCas9 domain, and may comprise fusion proteins having the structure NH2-[nCas9]-[adenine deaminase domain]-COOH, NH2-[adenine deaminase domain]-[nCas9]-COOH, NH2-[nCas9]-[adenine deaminase domain]-[NLS]-COOH, NH2-[nCas9]-[adenine deaminase domain]-[NLS]-[NLS]-COOH, NH2-[adenine deaminase domain]-[nCas9]-[NLS]-COOH, NH2-[adenine deaminase domain]-[nCas9]-[NLS]-[NLS]-COOH, NH2-[NLS]-[nCas9]-[nC
- the adenosine deaminase is selected from the group consisting of TadA-8e, ABE8e, AYBE, ABE9, and variants thereof.
- the deaminase is a TadA-8e adenosine deaminase.
- the deaminase is a TadA-8e adenosine deaminase variant.
- the deaminase is an adenosine deaminase comprising an amino acid sequence that is at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of SEQ ID NO: 1.
- the adenosine deaminase has an amino acid sequence that includes one or more amino acid substitutions at positions 8, 25, 26, 27, 28, 29, 30, 31, 33, 34, 37, 38, 39, 41, 42, 43, 44, 45, 48, 49, 50, 54, 56, 58, 78, 79, 80, 82, 84, 85, 86, 88, 90, 91, 92, 93, 94, 95, 96, 97, 99, 100, 101, 102, 106, 107, 109, 111, 123, 146, 149, 151, 152, 155, 156, and 157 of SEQ ID NO: 1.
- the adenosine deaminase has an amino acid sequence that includes one or more amino acid substitutions at positions 28, 34, and 151 of SEQ ID NO: 1.
- the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitutions H8D, R25E, R26G, E27R, V28C, P29C, V30W, G31E, V33C, V33T, I34W, N37T, N37H, N38I, R39E, I41S, G42A, E43R, G44A, W45G, A48P, I49S, G50A, D54T, A56P, A58S, A78R, T79H, L80P, V82R, F84I, F84L, E85R, P86A, V88R, C90V, A91R, G92R, A93R, M94H, I95D, H96P, S97L, I99D,
- the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitutions V28C, L34W, and M151E. In certain embodiments, the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitution V28C (ABEx1). In certain embodiments, the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitution L34W (ABEx2). In certain embodiments, the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitution M151E (ABEx3). In certain embodiments, the adenosine deaminase of SEQ ID NO: 1 comprises the amino acid substitutions V28C and M151E (ABEx4).
- the adenosine deaminase comprises a V28C, L34W, and/or M151E mutation in SEQ ID NO: 1, or a corresponding mutation in a homologous or orthologous adenosine deaminase. In some embodiments, the adenosine deaminase comprises a V28C mutation in SEQ ID NO: 1, or a corresponding mutation in a homologous or orthologous adenosine deaminase.
- the adenosine deaminase comprises a M151E mutation in SEQ ID NO: 1, or a corresponding mutation in a homologous or orthologous adenosine deaminase.
- Methods for determining homologous or orthologous adenosine deaminases to SEQ ID NO: 1 would be apparent to the skilled artisan.
- nucleobase editor fusion proteins comprising a TadA-8e deaminase variant and a napDNAbp.
- Exemplary fusion proteins include, without limitation, the following TadA-8e deaminase variants:
- ABEx2 (L34W mutation; BPNLS sequence at the N-terminus) MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVWVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSIN (SEQ ID NO: 16).
- ABEx3 (M151E mutation; BPNLS sequence at the N-terminus) MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVEGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDEYREPRQVFNAQKKAQSSIN (SEQ ID NO: 17).
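The bold and underline marking of the mutated residues does not survive in plain text. One way to locate them, sketched below, skips the 19-residue N-terminal BPNLS (MKRTADGSEFESPKKKRKV) and assumes the TadA domain that follows starts at residue 2 (no Met1), the same numbering as SEQ ID NO: 13. Under that assumption residue 34 of ABEx2 reads W, matching L34W, and residues 106 and 108 read V and N. The helper name is hypothetical.

```python
BPNLS = "MKRTADGSEFESPKKKRKV"  # N-terminal BPNLS of the constructs listed above

ABEX2 = (
    "MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVWVLN"
    "NRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGA"
    "MIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFY"
    "RMPRQVFNAQKKAQSSIN"
)


def tada_residue(construct: str, position: int, prefix: str = BPNLS, first_position: int = 2) -> str:
    """Residue at a TadA position, skipping the N-terminal `prefix`.
    Assumes the TadA domain's first residue is numbered `first_position`
    (i.e., the initiator Met1 of TadA is absent)."""
    if not construct.startswith(prefix):
        raise ValueError("construct does not begin with the expected prefix")
    index = len(prefix) + position - first_position
    if not len(prefix) <= index < len(construct):
        raise ValueError("position lies outside the TadA domain")
    return construct[index]


for pos in (28, 34, 106, 108, 151):
    print(pos, tada_residue(ABEX2, pos))
```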
- the fusion proteins disclosed herein further comprise one or more, preferably at least two, nuclear localization signals.
- the fusion proteins comprise at least two NLSs.
- the NLSs can be the same NLSs or they can be different NLSs.
- the NLSs may be expressed as part of a fusion protein with the remaining portions of the fusion proteins.
- the location of the NLS fusion can be at the N-terminus, the C-terminus, or within a sequence of a fusion protein (e.g. inserted between the encoded Cas9 and a DNA effector moiety (e.g. a deaminase)).
- the NLSs may be any known NLS sequence in the art.
- the NLSs may also be any future-discovered NLSs for nuclear localization.
- the NLSs also may be any naturally occurring NLS, or any non-naturally occurring NLS (e.g. an NLS with one or more desired mutations).
- a nuclear localization signal or sequence is an amino acid sequence that tags, designates, or otherwise marks a protein for import into the cell nucleus by nuclear transport. Typically, this signal consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface. Different nuclear localized proteins may share the same NLS. An NLS has the opposite function of a nuclear export signal (NES), which targets proteins out of the nucleus.
- a nuclear localization signal can also target the exterior surface of a cell. Thus, a single nuclear localization signal can direct the entity with which it is associated to the exterior of a cell and to the nucleus of a cell.
- Such sequences can be of any size and composition, for example more than 25, 25, 15, 12, 10, 8, 7, 6, 5 or 4 amino acids, but will preferably comprise at least a four to eight amino acid sequence known to function as a nuclear localization signal (NLS).
- nuclear localization sequence refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport.
- Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., international PCT application, PCT/EP2000/011690, filed Nov. 23, 2000, published as WO 2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences.
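As a rough illustration of the "short sequences of positively charged lysines or arginines" described above, the sketch scans a protein for the classical monopartite NLS consensus K-(K/R)-X-(K/R). This is a heuristic only: a match does not demonstrate nuclear import, and bipartite NLSs (two basic clusters separated by a spacer) would need a different pattern. The example input is the N-terminal peptide of the ABEx constructs listed above.

```python
import re
from typing import List, Tuple

# Classical monopartite NLS consensus K-(K/R)-X-(K/R); the lookahead lets
# overlapping matches be reported.
MONOPARTITE_NLS = re.compile(r"(?=(K[KR].[KR]))")


def find_nls_motifs(protein: str) -> List[Tuple[int, str]]:
    """(0-based start, motif) for every consensus match in `protein`."""
    return [(m.start(), m.group(1)) for m in MONOPARTITE_NLS.finditer(protein)]


print(find_nls_motifs("MKRTADGSEFESPKKKRKV"))  # [(13, 'KKKR'), (14, 'KKRK')]
```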
- linkers may be used to link any of the peptides or peptide domains of the disclosure.
- the term "linker,” as used herein, refers to a chemical group or a molecule linking two molecules or moieties, e.g. a binding domain and a cleavage domain of a nuclease.
- a linker joins a dCas9 and deaminase domain (e.g. a cytidine or adenosine deaminase).
- the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two.
- the linker is an amino acid or a plurality of amino acids (e.g. a peptide or protein).
- the linker is an organic molecule, group, polymer, or chemical moiety.
- the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
- the linker is a peptide linker, such as the 16-amino-acid XTEN linker.
- the fusion protein described herein may comprise one or more heterologous protein domains, e.g. epitope tags and reporter gene sequences.
- the heterologous protein domain comprises a reporter sequence comprising a p2A-GFP insert (Addgene plasmid #65562; RRID:Addgene_65562; see Li J, et al., Intron targeting-mediated and endogenous gene integrity-maintaining knockin in zebrafish using the CRISPR/Cas9 system. Cell Res. (2015)).
- Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags.
- reporter genes include, but are not limited to, glutathione S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP).
- a fusion protein may be fused to a gene sequence encoding a protein, or a fragment of a protein, that binds DNA molecules or other cellular molecules, including, but not limited to, maltose binding protein (MBP), S-tag, LexA DNA binding domain (DBD) fusions, GAL4 DNA binding domain fusions, and herpes simplex virus (HSV) VP16 protein fusions. Additional domains that may form part of a fusion protein are described in US Patent Publication No. 2011/0059502, published Mar. 10, 2011 and incorporated herein by reference in its entirety.
- the invention provides methods comprising delivering one or more nucleobase editor-encoding polynucleotides, such as one or more vectors as described herein encoding one or more components of the base editing system described herein, one or more transcripts thereof, and/or one or more proteins translated therefrom, to a host cell.
- the invention further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells.
- a nucleobase editor as described herein in combination with (and optionally complexed with) an anchor guide sequence is delivered to a cell.
- Non-viral vector delivery systems include DNA plasmids, RNA (e.g. a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome.
- Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell.
- Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA.
- Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™).
- Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424; WO 91/16024. Delivery can be to cells (e.g. in vitro or ex vivo administration) or target tissues (e.g. in vivo administration).
- the preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to one of skill in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem.
- RNA or DNA viral based systems for the delivery of nucleic acids take advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus.
- Viral vectors can be administered directly to patients (in vivo) or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo).
- Conventional viral-based systems include retroviral, lentiviral, adenoviral, adeno-associated viral, and herpes simplex viral vectors for gene transfer. Integration into the host genome is possible with retroviral, lentiviral, and adeno-associated viral gene transfer methods, often resulting in long-term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.
- the disclosure provides cells (e.g., transformed cell lines) that comprise the agRNA described herein.
- the cells can also comprise the nucleobase editing complexes described herein (e.g., wherein the cell comprises both an agRNA and a nucleobase editor).
- the cells can also comprise any of the polynucleotides described above, which express the agRNA, and optionally which express the nucleobase editors.
- the cells can comprise any of the vectors described above, which express the agRNA, and optionally which express the nucleobase editors.
- the disclosure describes a method of selecting agRNAs, wherein the method involves transfecting the agRNA screening libraries described above and in this disclosure into host cells, and using NGS to select agRNA vectors that exhibit reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or improved editing efficiency of a target nucleic acid by a nucleobase editor.
- the reduced bystander editing and/or editing efficiency of a nucleobase editor is measured relative to a gRNA lacking the 3'-nucleic acid extension of the agRNA, or relative to a second agRNA.
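The selection step compares each library agRNA with the gRNA lacking the 3' extension. A hedged sketch of that filtering follows, assuming per-agRNA editing efficiency and bystander fractions have already been extracted from the NGS data. The relative thresholds (at least 20% bystander reduction, at most 10% efficiency loss) and the candidate values are hypothetical, not taken from the disclosure.

```python
from typing import Dict, List, Tuple

Metrics = Tuple[float, float]  # (on-target editing efficiency, bystander editing), as fractions


def select_agrnas(reference: Metrics, candidates: Dict[str, Metrics],
                  min_bystander_reduction: float = 0.2,
                  max_efficiency_loss: float = 0.1) -> List[str]:
    """Names of candidates whose bystander editing drops by at least
    `min_bystander_reduction` (relative) while efficiency falls by no more
    than `max_efficiency_loss` (relative), best bystander reduction first."""
    ref_eff, ref_bys = reference
    if ref_eff <= 0 or ref_bys <= 0:
        raise ValueError("reference metrics must be positive")
    kept = []
    for name, (eff, bys) in candidates.items():
        bystander_reduction = (ref_bys - bys) / ref_bys
        efficiency_change = (eff - ref_eff) / ref_eff
        if bystander_reduction >= min_bystander_reduction and efficiency_change >= -max_efficiency_loss:
            kept.append((bystander_reduction, name))
    return [name for _, name in sorted(kept, reverse=True)]


# Hypothetical NGS summaries; the reference is the gRNA without the 3' extension.
reference = (0.60, 0.30)
candidates = {
    "agRNA_A": (0.62, 0.10),  # less bystander editing, efficiency kept
    "agRNA_B": (0.30, 0.05),  # less bystander editing, but efficiency halved
    "agRNA_C": (0.65, 0.35),  # more bystander editing
    "agRNA_D": (0.60, 0.20),  # moderate bystander reduction
}
print(select_agrnas(reference, candidates))  # ['agRNA_A', 'agRNA_D']
```

Ranking by bystander reduction is one choice; a screen that prioritizes on-target efficiency would sort on `efficiency_change` instead.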
- a host cell is transiently or non-transiently transfected with one or more vectors described herein.
- a cell is transfected as it naturally occurs in a subject.
- the cell that is transfected is derived from cells taken from a non-human subject, such as a mammal (e.g., primate (e.g., cynomolgus monkey or rhesus monkey), elephant, or mouse) or an avian (e.g., a bird).
- the cell that is transfected is derived from cells taken from plants, fungi, bacteria, and archaea.
- a cell that is transfected is taken from a subject.
- the cell is derived from cells taken from a subject, such as a cell line.
- a wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7,
- a cell transfected with one or more vectors described herein is used to establish a new cell line comprising one or more vector-derived sequences.
- a cell transiently transfected with the components of a CRISPR system as described herein (such as by transient transfection of one or more vectors, or transfection with RNA), and modified through the activity of a CRISPR complex, is used to establish a new cell line comprising cells containing the modification but lacking any other exogenous sequence.
- cells transiently or non-transiently transfected with one or more vectors described herein, or cell lines derived from such cells are used in assessing one or more test compounds.
- a pharmaceutical composition comprising: (i) an agRNA described above, or a nucleobase editing complex described above, a polynucleotide described above, or a vector described above, or any of the cells described above, and (ii) a pharmaceutically acceptable excipient.
- compositions comprising any of the various components of the base editing system described herein, including, but not limited to, the napDNAbps, deaminases, fusion proteins (e.g., comprising napDNAbps and deaminases), pegRNAs, and complexes comprising fusion proteins and agRNAs, as well as accessory elements.
- the term “pharmaceutical composition” refers to a composition formulated for pharmaceutical use.
- the pharmaceutical composition further comprises a pharmaceutically acceptable carrier.
- the pharmaceutical composition comprises additional agents (e.g. for specific delivery, increasing half-life, or other therapeutic compounds).
- the term “pharmaceutically-acceptable carrier” means a pharmaceutically-acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the compound from one site (e.g., the delivery site) of the body to another site (e.g., organ, tissue or portion of the body).
- a pharmaceutically acceptable carrier is “acceptable” in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, physiologic pH, etc.).
- materials which can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols
- wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservative and antioxidants can also be present in the formulation.
- the terms “excipient,” “pharmaceutically acceptable carrier,” and the like are used interchangeably herein.
- the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing.
- Suitable routes of administering the pharmaceutical composition described herein include, without limitation: topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, and intracerebroventricular administration.
- the pharmaceutical composition described herein is administered locally to a diseased site (e.g., tumor site).
- the pharmaceutical composition described herein is administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gelatinous material, including a membrane, such as a silastic membrane, or a fiber.
- the pharmaceutical composition described herein is delivered in a controlled release system.
- a pump may be used (see, e.g., Langer, 1990, Science 249:1527-1533; Sefton, 1989, CRC Crit. Ref. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; Saudek et al., 1989, N. Engl. J. Med. 321:574).
- polymeric materials can be used.
- the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject, e.g., a human.
- pharmaceutical compositions for administration by injection are solutions in sterile isotonic aqueous buffer.
- the pharmaceutical composition can also include a solubilizing agent and a local anesthetic such as lignocaine to ease pain at the site of the injection.
- the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water-free concentrate in a hermetically sealed container such as an ampoule or sachet indicating the quantity of active agent.
- where the pharmaceutical composition is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical-grade water or saline.
- an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.
- a pharmaceutical composition for systemic administration may be a liquid, e.g., sterile saline, lactated Ringer’s or Hank’s solution.
- the pharmaceutical composition can be in solid forms and re-dissolved or suspended immediately prior to use. Lyophilized forms are also contemplated.
- the pharmaceutical composition can be contained within a lipid particle or vesicle, such as a liposome or microcrystal, which is also suitable for parenteral administration.
- the particles can be of any suitable structure, such as unilamellar or plurilamellar, so long as compositions are contained therein.
- Compounds can be entrapped in “stabilized plasmid-lipid particles” (SPLP) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE), low levels (5-10 mol%) of cationic lipid, and stabilized by a polyethyleneglycol (PEG) coating (Zhang Y. P. et al., Gene Ther. 1999, 6:1438-47).
- lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methylsulfate, or "DOTAP," are particularly preferred for such particles and vesicles.
- the preparation of such lipid particles is well known. See, e.g., U.S. Patent Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; and 4,921,757; each of which is incorporated herein by reference.
- the pharmaceutical composition described herein may be administered or packaged as a unit dose, for example.
- unit dose when used in reference to a pharmaceutical composition of the present disclosure refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent; i.e., carrier, or vehicle.
- the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing a compound of the invention in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection.
- the pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized compound of the invention.
- Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.
- an article of manufacture containing materials useful for the treatment of the diseases described above is included.
- the article of manufacture comprises a container and a label.
- Suitable containers include, for example, bottles, vials, syringes, and test tubes.
- the containers may be formed from a variety of materials such as glass or plastic.
- the container holds a composition that is effective for treating a disease described herein and may have a sterile access port.
- the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle.
- the active agent in the composition is a compound of the invention.
- the label on or associated with the container indicates that the composition is used for treating the disease of choice.
- the article of manufacture may further comprise a second container comprising a pharmaceutically-acceptable buffer, such as phosphate-buffered saline, Ringer's solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.
- the disclosure describes a computational method, which may be embodied in software, for designing a library of 3'-nucleic acid extensions.
- the method involves efficiently evaluating a nucleic acid target and generating one or more combinations of a UBS, DBS, and/or CLS for nucleobase editing.
- program or “software” are used herein in a generic sense to refer to any type of computer code or set of processor-executable instructions that may be employed to program a computer or other processor to implement various aspects of embodiments as described above.
- one or more computer programs that when executed perform methods of the disclosure provided herein need not reside on a single computer or processor but may be distributed in a modular fashion among different computers or processors to implement various aspects of the disclosure provided herein.
- Some aspects of this disclosure provide methods of selecting nucleobase editors that show reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or improved editing efficiency of a target nucleic acid utilizing machine learning (ML) language models.
- the ML language models are able to predict evolutionarily adaptive mutations that yield nucleobase editor variants with improved fitness.
- the present disclosure describes a method of selecting nucleobase editors having (i) reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or (ii) improved editing efficiency of a target nucleic acid by a nucleobase editor comprising:
- the machine learning model is an ESM-1b language model and/or an ESM-1v language model, wherein said language models (i) learn natural amino acid patterns from millions of naturally occurring protein sequences, (ii) consider mutations observed in sequences of natural proteins as plausible mutations, and (iii) assume that plausible mutations with high likelihood scores correlate with improved protein fitness.
- Evolution of nucleobase editor
- Some aspects of this disclosure provide methods of phage-assisted, non-continuous evolution (PANCE) of a nucleobase editor.
- SP: selection phages; SP1-3: selection plasmids.
- the selection plasmid comprises (i) a pIII nucleotide sequence (encoding the phage coat protein pIII) that has been modified to contain at least one single nucleotide variant (SNV) and (ii) an agRNA nucleotide sequence encoding the corresponding agRNA that targets the modified pIII nucleotide sequence to correct (edit) the SNV back to the wild-type sequence.
- the SNV results in a mutated pIII protein that lowers phage infectivity. Correction of the SNV to the wild-type sequence by a complex of the agRNA and nucleobase editor increases phage infectivity. If the perfect edit occurs, the pIII sequence reverts to the wild-type sequence and phage propagation occurs.
- the selection plasmid comprises a sequence such that bystander edits upstream and downstream of the target nucleic acid in the pIII nucleotide sequence introduce mutations that inhibit phage propagation.
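The selection rule above can be illustrated with a toy model. This is a minimal sketch, not the selection system itself: it assumes the relevant pIII region is represented as equal-length DNA strings (wild type, SNV-containing mutant, and the editor's output), and the function names `classify_edit` and `phage_propagates` are hypothetical. Only reversion of the SNV with no other change counts as a perfect edit that restores propagation.

```python
def classify_edit(wild_type: str, mutant: str, edited: str, target_pos: int) -> str:
    """Classify an edited pIII-region sequence (toy model, 0-based target_pos)."""
    if not (len(wild_type) == len(mutant) == len(edited)):
        raise ValueError("sequences must be the same length")
    # Positions other than the target where the editor changed the mutant sequence.
    bystanders = [
        i for i, (m, e) in enumerate(zip(mutant, edited))
        if i != target_pos and m != e
    ]
    corrected = edited[target_pos] == wild_type[target_pos]
    if corrected and not bystanders:
        return "perfect"
    if corrected:
        return "corrected_with_bystanders"
    return "uncorrected"


def phage_propagates(wild_type: str, mutant: str, edited: str, target_pos: int) -> bool:
    """In this model, only a perfect reversion restores functional pIII."""
    return classify_edit(wild_type, mutant, edited, target_pos) == "perfect"


wt, mut = "CTGGAT", "CTGAAT"  # hypothetical region: the SNV places an A at index 3
print(phage_propagates(wt, mut, "CTGGAT", 3))  # True: target A->G only
print(phage_propagates(wt, mut, "CTGGGT", 3))  # False: extra bystander A->G at index 4
```

Treating any off-target change as fatal mirrors the stated design in which bystander edits upstream or downstream of the target inhibit propagation.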
- host cells further comprise a helper plasmid and/or a mutagenesis plasmid.
- a mutagenesis plasmid comprises an arabinose-inducible promoter.
- the present disclosure describes a method of selecting nucleobase editors having (i) reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or (ii) improved editing efficiency of a target nucleic acid by a nucleobase editor, the method using an agRNA described herein as part of a PANCE system.
- the method of selecting nucleobase editors having (i) reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or (ii) improved editing efficiency of a target nucleic acid by a nucleobase editor comprises the steps:
- selection plasmids for PANCE encode a pIII gene further comprising mutations that diminish pIII activity, wherein (i) pIII activity is restored by a nucleobase editor if the nucleobase editor edits the target nucleic acid or (ii) pIII activity is not restored by a nucleobase editor if the nucleobase editor edits bystander nucleic acids;
- the selection plasmids generated for PANCE comprise a sequence selected from any one of SEQ ID NOs: 19-21.
- the selection plasmids generated for PANCE comprise the sequence:
- the selection plasmids generated for PANCE comprise the sequence:
- the selection plasmids generated for PANCE comprise the sequence:
- a person skilled in the art would appreciate the use of other evolution systems (e.g., phage-assisted, continuous evolution (PACE)) for generating evolved nucleobase editors with reduced bystander editing within an editing window of a target nucleic acid for a nucleobase editor and/or improved editing efficiency of a target nucleic acid in conjunction with the agRNAs described herein.
- the present invention relates to an improved version of "base editing" that utilizes modified (or, equivalently, engineered) agRNAs comprising one or more structural modifications that improve one or more characteristics, including their stability, cellular lifespan, affinity for Cas9 (or, more broadly, for a napDNAbp), or interaction with a target DNA, thereby increasing the editing efficiency of base editing and reducing bystander editing within the base editing window of a nucleobase editor.
- Some aspects of this disclosure provide methods of using any of the fusion proteins (e.g., a Cas9 domain fused to an adenosine deaminase) provided herein, or complexes comprising an agRNA and a fusion protein (e.g., a Cas9 domain fused to an adenosine deaminase) provided herein.
- some aspects of this disclosure provide methods comprising contacting a DNA or RNA molecule with any of the fusion proteins or nucleobase editors provided herein and with at least one agRNA, wherein the agRNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence.
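The two numeric criteria above (an overall length of about 15-100 nucleotides and at least 10 contiguous nucleotides complementary to the target) can be checked mechanically. The sketch below is an illustration under stated assumptions, not a method from the disclosure: it takes the ranges at face value, reads "complementary" as strict Watson-Crick pairing in antiparallel orientation (no G·U wobble), and the helper names are hypothetical.

```python
def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters pass through)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def meets_guide_criteria(agrna: str, target: str, min_len: int = 15,
                         max_len: int = 100, min_match: int = 10) -> bool:
    """Check the length window and the contiguous-complementarity requirement."""
    guide = agrna.upper().replace("U", "T")  # compare the RNA guide as DNA
    if not (min_len <= len(guide) <= max_len):
        return False
    # A guide segment that base-pairs (antiparallel) with the target equals the
    # reverse complement of some target substring.
    target_rc = reverse_complement(target.upper())
    return any(
        guide[i:i + min_match] in target_rc
        for i in range(len(guide) - min_match + 1)
    )


target = "GGGATCCTTAAGCCATGGCA"  # hypothetical target sequence
guide = (reverse_complement(target[:12]) + "GTTTTAGAGCTA").replace("T", "U")
print(meets_guide_criteria(guide, target))
```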
- the disclosure provides a method of nucleobase editing (e.g., "base editing") comprising contacting a target nucleic acid sequence with an agRNA described above and a nucleobase editor comprising a fusion protein comprising a deaminase and a napDNAbp or a split napDNAbp, wherein the editing efficiency is increased and/or the bystander editing is decreased as compared to the same method using a gRNA not comprising the 3'-nucleic acid extension.
- the present disclosure contemplates the use of the agRNAs described herein for base editing a target nucleic acid within a target nucleic acid sequence, wherein editing the target nucleic acid produces a single nucleotide variant (SNV) for engineering a cell (e.g., prokaryotic or eukaryotic cell), a virus, a fungus, a plant, an insect, and/or an animal.
- the present disclosure contemplates the use of the methods described herein for base editing a target nucleic acid within a target nucleic acid sequence, wherein editing the target nucleic acid produces a single nucleotide variant (SNV) for engineering a cell, a virus, a fungus, a plant, an insect, and/or an animal.
- the methods described herein may be used for modifying a target nucleic acid sequence for research purposes.
- the present disclosure contemplates the use of the base editing methods described herein for targeted modifications in the genomes of plants for improved crop varieties.
- the target DNA sequence comprises a sequence associated with a disease or disorder. In some embodiments, the target DNA sequence comprises a point mutation associated with a disease or disorder.
- the activity of the fusion protein results in a correction of the point mutation.
- the methods provided herein are used to introduce a deactivating point mutation into a gene or allele that encodes a gene product that is associated with a disease or disorder.
- methods are provided herein that employ a DNA editing fusion protein to introduce a deactivating point mutation into an oncogene (e.g., in the treatment of a proliferative disease).
- a deactivating mutation may, in some embodiments, generate a premature stop codon in a coding sequence, which results in the expression of a truncated gene product, e.g., a truncated protein lacking the function of the full-length protein.
- the purpose of the methods provided herein is to restore the function of a dysfunctional gene via genome editing.
- the nucleobase editing proteins provided herein can be validated for gene editing-based human therapeutics in vitro, e.g., by correcting a disease-associated mutation in human cell culture.
- the instant disclosure provides methods for the treatment of a subject diagnosed with a disease associated with or caused by a point mutation that can be corrected by a DNA editing fusion protein provided herein.
- a method comprises administering to a subject having such a disease, e.g., a cancer associated with a point mutation as described above, an effective amount of a nucleobase editor fusion protein that corrects the point mutation or introduces a deactivating mutation into a disease-associated gene.
- the disease is a proliferative disease.
- the disease is a genetic disease.
- the disease is a neoplastic disease.
- the disease is a metabolic disease.
- the disease is a lysosomal storage disease.
- Other diseases or disorders that can be treated by correcting a point mutation or introducing a deactivating mutation into a disease-associated gene will be known to those of skill in the art, and the disclosure is not limited in this respect.
- compositions of the present disclosure may be assembled into kits.
- the kit comprises nucleic acid vectors for the expression of the nucleobase editors described herein.
- the kit further comprises appropriate guide nucleotide sequences (e.g., agRNAs) or nucleic acid vectors for the expression of such guide nucleotide sequences, to target the Cas9 protein or nucleobase editor to the desired target sequence.
- the kits described herein may include one or more containers housing components for performing the methods described herein and, optionally, instructions for use. Any of the kits described herein may further comprise components needed for performing the assay methods.
- Each component of the kits may be provided in liquid form (e.g., in solution) or in solid form, (e.g., a dry powder). In certain cases, some of the components may be reconstitutable or otherwise processible (e.g., to an active form), for example, by the addition of a suitable solvent or other species (for example, water), which may or may not be provided with the kit.
- kits may optionally include instructions and/or promotion for use of the components provided.
- “instructions” can define a component of instruction and/or promotion, and typically involve written instructions on or associated with packaging of the disclosure. Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), Internet, and/or web-based communications, etc.
- the written instructions may be in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which can also reflect approval by the agency of manufacture, use or sale for animal administration.
- kits includes all methods of doing business including methods of education, hospital and other clinical instruction, scientific inquiry, drug discovery or development, academic research, pharmaceutical industry activity including pharmaceutical sales, and any advertising or other promotional activity including written, oral and electronic communication of any form, associated with the disclosure. Additionally, the kits may include other components depending on the specific application, as described herein.
- kits may contain any one or more of the components described herein in one or more containers.
- the components may be prepared sterilely, packaged in a syringe, and shipped refrigerated. Alternatively, they may be housed in a vial or other container for storage. A second container may have other components prepared sterilely.
- the kits may include the active agents premixed and shipped in a vial, tube, or other container.
- kits may have a variety of forms, such as a blister pouch, a shrink wrapped pouch, a vacuum sealable pouch, a sealable thermoformed tray, or a similar pouch or tray form, with the accessories loosely packed within the pouch, one or more tubes, containers, a box or a bag.
- the kits may be sterilized after the accessories are added, thereby allowing the individual accessories in the container to be otherwise unwrapped.
- the kits can be sterilized using any appropriate sterilization techniques, such as radiation sterilization, heat sterilization, or other sterilization methods known in the art.
- kits may also include other components, depending on the specific application, for example, containers, cell media, salts, buffers, reagents, syringes, needles, a fabric, such as gauze, for applying or removing a disinfecting agent, disposable gloves, a support for the agents prior to administration, etc.
- kits comprising a nucleic acid construct comprising a nucleotide sequence encoding the various components of the nucleobase editing system described herein (e.g., including, but not limited to, the napDNAbps, deaminases, fusion proteins (e.g., comprising napDNAbps and deaminases), agRNAs, and complexes comprising fusion proteins and agRNAs), as well as accessory elements.
- the nucleotide sequence(s) comprises a heterologous promoter (or more than a single promoter) that drives expression of the nucleobase editing system components.
- kits comprising one or more nucleic acid constructs encoding the various components of the nucleobase editing system described herein, e.g., comprising a nucleotide sequence encoding the components of the nucleobase editing system capable of modifying a target DNA sequence.
- the nucleotide sequence comprises a heterologous promoter that drives expression of the nucleobase editing system components.
- kits comprising a nucleic acid construct, comprising (a) a nucleotide sequence encoding a napDNAbp (e.g., a Cas9 domain) fused to a deaminase and (b) a heterologous promoter that drives expression of the sequence of (a).
- a nucleic acid construct comprising (a) a nucleotide sequence encoding a napDNAbp (e.g., a Cas9 domain) fused to a deaminase and (b) a heterologous promoter that drives expression of the sequence of (a).
- the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements.
- This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified.
- "at least one of A and B" can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
- a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
- Example 1 3'-extended gRNAs (agRNAs) for targeted manipulation of bystander edits, activity, and editing window
- the mechanism underlying the adenosine deamination process involves the TadA-8e domain engaging with the exposed single-stranded region of the PAM-distal nontarget strand (NTS) (Lapinaite) (FIG. 5A).
- the TadA-8e deaminase, when attached to the Cas9 protein, induces specific editing patterns within narrow DNA regions covering several base pairs. This tethering restricts the enzyme to acting on certain nucleotides, defining what is known as the editing window. Because different DNA contexts present relatively diverse substrates, the enzyme must accommodate structural variations within its active site, so the DNA strand in the active site retains a certain degree of freedom to move.
- a system that stabilizes the DNA strand within the active site was therefore established, on the rationale that restricting the strand's movement may narrow the editing window and thus minimize bystander editing. One option explored was adding nucleotides to the 3' end of the gRNA scaffold.
- These anchor guide RNAs were designed to bind the DNA strand up- and downstream of the DNA region that is later present in the active site of the TadA-8e, thereby stabilizing the loop structure resulting in fewer bases being deaminated (FIGs. 1A and 5A).
- the agRNA library consisted of combinations of an array of upstream binding sequences, counterloops, and downstream binding sequences (FIGs. 5A-5B). Both the upstream and downstream sequences bound the DNA strand surrounding the targeted edit. Sequences of lengths ranging from 1 to 11 base pairs (bp) were tested, with all possible starting points in an 11 bp window. The counterloop sequences ranged from 1 to 33 bp, with the longer ones forming guanine-cytosine (GC)-rich hairpins. This design process yielded an agRNA library containing ~60K candidates.
- a plasmid with the editing target downstream of the agRNA was constructed.
- NGS: next-generation sequencing.
- the tested library was designed to target a site in the human DNMT1 locus, an optimal candidate for screening because of both its high accessibility for editing in HEK293T cells and the multiple adenines within its editing window.
- the ABE8e-spCas9-WT nucleobase editor in combination with a non-modified guide (sgRNACtrl) showed a high editing efficiency for the four adenines in the editing window (A13, A14, A15 and A17).
- anchors that showed higher efficiency and lower bystander editing in the DNMT1 sensor library were selected.
- Five candidates were selected for testing in the native context in HEK293T cells, and all of them showed a decrease in bystander editing (FIG. 1D).
- Clone 56114 showed the highest precision in A13 editing, with a significant reduction of 44% at A17 and 34% at A16 (FIGs. 1D and 5C-5E).
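Comparisons of efficiency and bystander editing like those above rest on per-position editing frequencies computed from sequencing reads. The sketch below shows one simple way to summarize them; it is not the analysis pipeline used here. It assumes reads are already aligned, gap-free, and trimmed to the same window as the reference, uses 0-based positions, and calls a read "precise" when the target adenine is converted to G and no other adenine is. The function name is hypothetical.

```python
from typing import Dict, List


def editing_summary(reference: str, reads: List[str], target_pos: int) -> Dict[str, object]:
    """Per-adenine A-to-G frequency and the fraction of reads edited only at the target."""
    if not reads:
        raise ValueError("no reads supplied")
    a_positions = [i for i, base in enumerate(reference) if base == "A"]
    counts = {i: 0 for i in a_positions}
    precise = 0
    for read in reads:
        edited = {i for i in a_positions if read[i] == "G"}
        for i in edited:
            counts[i] += 1
        if edited == {target_pos}:  # target edited, no bystander adenine edited
            precise += 1
    n = len(reads)
    return {
        "per_position": {i: counts[i] / n for i in a_positions},
        "precise_fraction": precise / n,
    }


# Hypothetical 5 bp window with adenines at 1, 2 and 4; the target is position 1.
print(editing_summary("CAAGA", ["CGAGA", "CGGGA", "CAAGA", "CAGGA"], 1))
```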
- the agRNA56114-tevopreq1 was tested in HeLa and HepG2 cells, yielding similar bystander reduction patterns (FIG. 1I).
- sgRNA and agRNA libraries targeting ~12,000 pathogenic single nucleotide variants that can be targeted by base editing based on proximity to a PAM (Arbab) were constructed.
- the libraries sgRNACtrl-tevopreq1 and agRNA56114-tevopreq1 decreased the editing efficiency of the ABE8e-spCas9-WT when compared to sgRNACtrl (FIG. 1H).
- the library with agRNAs56114 slightly decreased the editing efficiency.
- Phage-assisted non-continuous evolution has been used in the past to increase the activity of ABEs (Richter).
- a selection method was designed to evolve variants with decreased bystander edits that can also work with the agRNAs disclosed herein.
- the activity of the nucleobase editor encoded on the M13 phage genome was linked to the expression of gene III (encoding the pIII protein), which is required for phage replication (Esvelt).
- the selection plasmids also encoded the corresponding agRNA targeting the SNV, as well as a C-terminal Intein-nCas9-SpRY.
- a functional nucleobase editor was expressed, and was capable of performing the editing (FIGs. 2A-2B and 6C).
- the initial batch culture was infected with the selection phage, and after 12 hours of cultivation, different volumes of the supernatant were used to infect the next batch culture. Phage DNA was isolated from each selection round and analyzed by NGS after the last round of evolution (FIG. 6D).
- the TadA mutational landscape was determined, and the top 50 most enriched mutations were selected and individually tested at the DNMT1 site (FIGs. 7A and 7C). Variants with both higher efficiency and a reduced bystander editing pattern at position A13 were detected when compared with the ABE8e-SpRY base editor (FIG. 2E).
- the PANCE-evolved TadA variants were also benchmarked against the ABE9 nucleobase editor (both WT and SpRY), which showed low editing efficiency and no relative bystander reduction at the DNMT1 site (FIGs. 2E and 7A-7C). Variants displaying V28C, L34W, D54T, and I95D showed potential to generate a perfect edit at position A13 (FIG. 2E).
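Selecting "the top 50 most enriched mutations" from NGS data can be sketched as a frequency-ratio ranking between an input pool and a post-selection pool. This is an illustrative sketch, not the procedure used here: it assumes translated, same-length protein sequences for each pool, skips sequences with indels, and smooths with a pseudocount of 0.5, which is an arbitrary choice. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_mutations(reference: str, variants: Iterable[str]) -> Tuple[Counter, int]:
    """Count substitutions (e.g. 'V28C', 1-based) across same-length variant sequences."""
    counts: Counter = Counter()
    total = 0
    for seq in variants:
        if len(seq) != len(reference):
            continue  # this toy model ignores indels and truncated reads
        total += 1
        for i, (wt, aa) in enumerate(zip(reference, seq)):
            if aa != wt:
                counts[f"{wt}{i + 1}{aa}"] += 1
    return counts, total


def top_enriched(reference: str, before: Iterable[str], after: Iterable[str],
                 n: int = 50, pseudocount: float = 0.5) -> List[Tuple[str, float]]:
    """Rank mutations by post-/pre-selection frequency ratio (pseudocount-smoothed)."""
    c_before, n_before = count_mutations(reference, before)
    c_after, n_after = count_mutations(reference, after)
    if n_before == 0 or n_after == 0:
        raise ValueError("both pools need at least one usable sequence")
    scores = [
        (mut, ((c + pseudocount) / n_after) / ((c_before[mut] + pseudocount) / n_before))
        for mut, c in c_after.items()
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:n]


# Hypothetical 4-residue fragment: K2E rises from 0/4 to 3/4 after selection.
print(top_enriched("MKVL", ["MKVL", "MKVL", "MRVL", "MKVL"],
                   ["MEVL", "MEVL", "MRVL", "MEVL"]))
```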
- The crystal structure of ABE8e was computationally analyzed to better understand the impact of these mutations on ABE8e. Variants V28C and L34W were each generated in silico, and their interactions with surrounding amino acids and nucleotides were compared to wild type, but no changes in interactions were predicted (Methods) (FIG. 8A). It is possible that these mutations induce a conformational change in ABE8e that alters the interactions of ABE8e residues H57 and C87 with nucleotide 8-Az(26) of the gRNA. Based on the wild-type crystal structure, H57 and C87 were predicted to establish three van der Waals interactions and two hydrogen bonds with 8-Az(26), respectively.
- Non-stop codon based PANCE selection proved to be a powerful tool to evolve base editing mutants that showed decreased bystander editing without losing on-target activity.
- Example 3 Machine learning guided identification of additional ABE candidates.
- protein language models trained on massive, non-redundant protein sequence datasets can learn these general, evolutionarily plausible mutational patterns (Hie et al., Meier et al., Hie et al. 2). This knowledge can be leveraged to predict mutations likely to be beneficial, guiding protein evolution more efficiently.
- these models can be used to predict the probability distribution of each amino acid at any given position along a protein sequence, where the probability distribution reflects the knowledge acquired by the models on their training dataset. Positions where the model assigns a higher probability to an amino acid than the wild-type residue are considered more likely than a random pick to yield a positive effect on the protein fitness.
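The selection criterion above (an amino acid the model rates as more probable than the wild-type residue at that position) can be expressed as a short filter over per-position probability tables. This sketch does not call any language-model library; the probability tables are hand-made placeholders standing in for model output such as ESM-1v predictions, the ranking by log-probability ratio is one common choice rather than the method stated here, and `propose_mutations` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple


def propose_mutations(wt_seq: str, probs: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """List substitutions the model prefers over the wild-type residue.

    probs[i] maps amino acids to the model's probability at position i (0-based);
    mutations are reported 1-based (e.g. 'M151E') and ranked by log p(mut) - log p(wt).
    """
    hits = []
    for i, (wt, dist) in enumerate(zip(wt_seq, probs)):
        p_wt = dist.get(wt, 0.0)
        for aa, p in dist.items():
            if aa != wt and p > p_wt:
                score = math.log(p / p_wt) if p_wt > 0 else math.inf
                hits.append((f"{wt}{i + 1}{aa}", score))
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits


# Hypothetical model output for a 3-residue fragment "MKV".
toy = [{"M": 0.9, "L": 0.1}, {"K": 0.2, "E": 0.5, "R": 0.3}, {"V": 0.4, "I": 0.6}]
print(propose_mutations("MKV", toy))
```

Position 1 yields no candidates because the model already prefers the wild-type methionine there.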
- the wild-type residue M151 formed two hydrogen bonds with C146 and Q154.
- the mutation M151E allowed an additional hydrogen bond to form between the carboxyl group of glutamate (acceptor) and the amino group of Q154 (donor) (FIG. 8B).
- Glutamate also introduced a negative charge compared to methionine, potentially changing the local conformation, distances (measured distance 5.214 Å), and interactions between E151 and nucleotide C(25) of the gRNA (FIG. 3F).
- Example 4 agRNA, PANCE and ML variants outperform current base editing variants
- agRNA56114-tevopreq1 was combined with ABE variants (FIGs. 7C and 9B), herein referred to as ABEx1, ABEx2, ABEx3, and ABEx4.
- ABEx1 and ABEx2 were generated in the PANCE experiment, ABEx3 using ML, and ABEx4 as the combination of both techniques (FIG. 4A). All the ABEx-spCas9-SpRY variants showed improved efficiency and reduced bystander editing when combined with the agRNA at the target site.
- ABEx1 demonstrated higher editing efficiency at position A13 and reduced bystander editing compared to the control, using both sgRNA and agRNA (FIG. 4B; FIGs. 10A-10D).
- ABEx2 (L34W) achieved precise editing and exhibited the same efficiency as ABE8e-SpRY at position A13, while also minimizing bystander editing at positions A15, A16, and A17 (FIG. 4B).
- ABEx3 (M151E) also reduced bystander editing when combined with agRNA56114-tevopreq1 and increased efficiency at position A15 (FIG. 4B).
- a previous strategy to remove bystander editing using the PAM-less SpRY variant of spCas9 was based on the design of guides that shifted the editing window to isolate the target nucleobase [Alves].
- the ability of ABE8e-spCas9-SpRY to isolate A13 after moving the editing window downstream (-1, -2, -3 bp) and upstream (+1 bp) relative to the control sgRNA was tested, and ABE8e-spCas9-SpRY was not able to isolate A13 with either the sgRNACtrl or the agRNA56114-tevopreq1.
- variant ABEx1 showed increased A13 editing with both the centered and +1 sgRNA and with agRNA56114-tevopreq1 (FIG. 12). This result highlights the importance of the targeted design of guide RNAs and their combination with evolved nucleobase editors (e.g., ABEx1) to find the best combination (e.g., a nucleobase editor and agRNA pair) for correcting a particular mutation with maximal efficiency and safety.
- LB and 2xYT media were generated using MP BiomedicalsTM media capsules according to the manufacturer’s protocol.
- 16 g/L agar was added for standard, and 7 g/L agar was added for soft agar. All media was sterilized by autoclaving.
- agRNA library generation
- the agRNA library consisted of an upstream binding sequence (UBS) that was the reverse complement of the downstream sequence of the target sequence (also referred to as “the target” or “target”) of the nucleobase editor, a counterloop, and a downstream binding sequence (DBS) that bound the upstream sequence of the target.
- the upstream and downstream binding sequences were of different lengths and had different binding regions in the 1 to 11 bp region upstream and downstream of the target.
- the counterloop library consisted of 33 different DNA sequences, of which the longer ones form GC-rich hairpins.
- the final library contained every possible combination of an UBS, counterloop, and DBS.
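The combinatorial design above (every UBS paired with every counterloop and every DBS) can be sketched as a generator. This is a minimal sketch under stated assumptions, not the disclosed design software: it assumes the extension is written 5' to 3' as UBS, counterloop, DBS in the order listed, that "bound" means reverse complement, and that binding windows are drawn from the 11 bp immediately flanking the target. Identical windows are deduplicated, so the count it produces need not match the reported library size. Function names and the example flanks and counterloops are hypothetical placeholders.

```python
from itertools import product
from typing import Iterator, List


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def binding_windows(region: str) -> List[str]:
    """All distinct contiguous subsequences of region, shortest first."""
    windows = (
        region[start:start + length]
        for length in range(1, len(region) + 1)
        for start in range(len(region) - length + 1)
    )
    return list(dict.fromkeys(windows))  # drop repeats, keep order


def agrna_extensions(upstream: str, downstream: str, counterloops: List[str],
                     max_len: int = 11) -> Iterator[str]:
    """Yield UBS + counterloop + DBS for every combination."""
    # UBS: reverse complement of a window in the max_len bp just downstream of the target.
    ubs = [reverse_complement(w) for w in binding_windows(downstream[:max_len])]
    # DBS: reverse complement of a window in the max_len bp just upstream of the target.
    dbs = [reverse_complement(w) for w in binding_windows(upstream[-max_len:])]
    for u, cl, d in product(ubs, counterloops, dbs):
        yield u + cl + d


library = list(agrna_extensions("ACGT", "GAT", ["T", "GC"]))
print(len(library), library[:3])
```

Using a generator keeps memory flat when the flanks are the full 11 bp and the counterloop set is large.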
- the agRNA for DNMT1 was ordered as an Agilent DNA Oligo Pool.
- the oligos for the DNMT1 library contained a Gibson overhang, gRNA, gRNA scaffold, agRNA library sequence, and a terminator, followed by a short DNA sequence used as a primer binding site.
- the target for the DNMT1 library was already cloned on the plasmid used as backbone.
- the DNA Oligo Pool library was amplified with the oligos Lib_F and Lib_R via PCR.
- the backbone pU6- tevopreql-GG- acceptor (Addgene #174038) was PCR amplified using the oligos SplitF and SplitR.
- the PCR product of the backbone was Dpnl digested overnight at 37 °C and both PCR products were purified using the New England Biolabs Monarch PCR & DNA Cleanup Kit according to the Manufacturer’s protocol.
- the fragments were assembled using the New England Biolabs Gibson Assembly® Master Mix in a 10:1 ratio Library to backbone and 150 ng of the backbone DNA according to the manufacturer’s protocol.
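If the 10:1 library-to-backbone ratio is read as a molar ratio, the amount of library insert to add alongside 150 ng of backbone follows from the two fragment lengths. The source does not say whether the ratio is molar or by mass, so the molar reading is an assumption. The sketch uses the common approximation of ~650 g/mol per base pair of double-stranded DNA, and the fragment lengths in the example are hypothetical.

```python
DALTONS_PER_BP = 650.0  # approximate average mass of one dsDNA base pair


def pmol_from_ng(mass_ng: float, length_bp: int) -> float:
    """Convert a dsDNA mass (ng) to picomoles."""
    return mass_ng * 1000.0 / (length_bp * DALTONS_PER_BP)


def insert_ng_for_ratio(backbone_ng: float, backbone_bp: int,
                        insert_bp: int, molar_ratio: float) -> float:
    """Insert mass (ng) giving `molar_ratio` insert molecules per backbone molecule."""
    return molar_ratio * backbone_ng * insert_bp / backbone_bp


if __name__ == "__main__":
    # Hypothetical lengths: 3,000 bp linearized backbone, 200 bp library insert.
    print(insert_ng_for_ratio(150.0, 3000, 200, 10))  # 100.0
    print(pmol_from_ng(150.0, 3000))
```

The mass of insert scales with its length relative to the backbone. A short oligo-pool insert therefore needs far less mass than the 150 ng of backbone, even at a 10-fold molar excess.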
- 2 µL of the Gibson assembly mix were transformed directly into Lucigen Endura Competent Cells, recovered in 1 mL of SOC medium, and plated on carbenicillin agar plates poured in Nunc™ Square BioAssay Dishes (Cole-Parmer #EW-01929-00).
- The library was amplified from the pU6-tevopreq1-GG-acceptor.
- The backbone was digested with PspXI and Esp3I and cloned by Gibson assembly following the protocol described above.
- HEK293T cells were purchased from ATCC and maintained at 37 °C and 5% CO2 in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies). 20 million cells were seeded in a 225 mm³ dish and co-transfected the following day using Lipofectamine 3000 (Thermo Fisher) with an amount of library plasmid corresponding to one plasmid per cell and 20 µg of the nucleobase editor pCMV-T7-ABE8e-nSpCas9-P2A-EGFP (KAC978) (Addgene #185910). Genomic DNA was collected 5 days after transfection.
- The context libraries were generated differently from the agRNA library, because extensive recombination occurred when the gRNA, gRNA scaffold, agRNA, and target were introduced as a single oligo.
- The gRNA and target, with 11 bp upstream and 25 bp downstream of native genomic context, were cloned as an oligo lacking the gRNA scaffold and hairpin. In their place, the oligo carried two outward-facing BsaI sites with 10 randomized base pairs at that position.
- The DNA Oligo Pool libraries were amplified by PCR with the oligos Lib_F and Lib_R.
- The backbone sgBbsI (p2Tol-U6-2xBbsI-sgRNA-HygR) (Addgene #71485) was PCR-amplified using the oligos BB_R and BB_F.
- The backbone PCR product was DpnI-digested overnight at 37 °C, and both PCR products were purified using the New England Biolabs Monarch PCR & DNA Cleanup Kit according to the manufacturer’s protocol.
- The fragments were assembled using the New England Biolabs Gibson Assembly® Master Mix at a 10:1 library-to-backbone ratio with 150 ng of backbone DNA, according to the manufacturer’s protocol.
- The library was then digested with BsaI according to the manufacturer’s protocol and gel-purified using the New England Biolabs Monarch Gel Extraction Kit according to the manufacturer’s protocol.
- The gRNA scaffold, hairpin, and terminator, flanked upstream and downstream by inward-facing BsaI sites, were ordered as a cloned gene synthesis product from IDT.
- This plasmid was also BsaI-digested, and the fragment was purified using the New England Biolabs Monarch Gel Extraction Kit according to the manufacturer’s protocol.
- The insert and library were ligated using New England Biolabs T4 DNA Ligase according to the manufacturer’s protocol, with 150 ng of backbone and a 10:1 insert-to-backbone ratio.
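The Golden Gate step relies on BsaI, a Type IIS enzyme. BsaI recognizes GGTCTC and cuts outside its site: 1 nt downstream on the top strand and 5 nt downstream on the bottom strand, leaving a 4-nt overhang. Whether a pair of sites is "inward-facing" or "outward-facing" therefore depends only on their orientation. The sketch below scans a sequence for BsaI sites and classifies a flanking pair. It assumes a linear sequence and ignores methylation, and the test sequences are hypothetical.

```python
import re

BSAI_SITE = "GGTCTC"     # top-strand recognition; cuts N1/N5 downstream
BSAI_SITE_RC = "GAGACC"  # the same site read on the bottom strand


def find_bsai_sites(seq: str) -> list[dict]:
    """Return BsaI sites as dicts with strand, start index and 4-nt overhang.

    '+' sites cut to their right (3'), '-' sites cut to their left. Overhangs
    are reported in top-strand coordinates and may be truncated if a site sits
    too close to a sequence end.
    """
    seq = seq.upper()
    sites = []
    for m in re.finditer(f"(?={BSAI_SITE})", seq):
        i = m.start()
        sites.append({"strand": "+", "start": i, "overhang": seq[i + 7:i + 11]})
    for m in re.finditer(f"(?={BSAI_SITE_RC})", seq):
        i = m.start()
        sites.append({"strand": "-", "start": i,
                      "overhang": seq[max(i - 5, 0):max(i - 1, 0)]})
    return sorted(sites, key=lambda s: s["start"])


def pair_orientation(left: dict, right: dict) -> str:
    """Classify two sites flanking a region as inward- or outward-facing."""
    if left["strand"] == "+" and right["strand"] == "-":
        return "inward"    # both cut toward the region between them
    if left["strand"] == "-" and right["strand"] == "+":
        return "outward"   # both cut away from the region between them
    return "same direction"
```

With outward-facing sites, digestion removes the region between them, which corresponds to the 10 randomized base pairs in the library. With inward-facing sites, digestion releases that region, which corresponds to the scaffold-hairpin-terminator insert. Matching overhangs then let the two ligate.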
- HEK293T cells were purchased from ATCC and maintained at 37 °C and 5% CO2 in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies).
Tol2 transposon-mediated library integration
- 5 million cells (~400× coverage) were seeded in 175 mm³ dishes. The following day, cells were co-transfected using Lipofectamine 3000 (Thermo Fisher) with 10 µg of Tol2 transposase plasmid (pCMV-Tol2, Addgene #31823) and 10 µg of the Path_Var library. To generate stable library cell lines, cells were selected with hygromycin (25 µg/mL) starting the day after transfection and continuing for more than 2-3 weeks. Subsequently, 2.5 million cells (~200× coverage) were seeded in a 100 mm³ dish, and the following day they were transfected with 10 µg of nucleobase editor using Lipofectamine 3000. Genomic DNA was collected 5 days after transfection.
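The stated coverages are consistent with a simple cells-per-library-member calculation against a library of ~12,000 members, the approximate spacer count of the context library analyzed in this section. On that basis, 5 million cells give ~417× and 2.5 million cells give ~208×. The sketch below makes the arithmetic explicit. It assumes one library member per cell and ignores transfection, integration, and selection losses.

```python
import math


def coverage(n_cells: float, library_size: int) -> float:
    """Average number of cells representing each library member."""
    return n_cells / library_size


def cells_needed(target_coverage: float, library_size: int) -> int:
    """Cells to seed for a desired fold-coverage, rounded up."""
    return math.ceil(target_coverage * library_size)


LIBRARY_SIZE = 12_000  # approximate number of spacers in the context library

if __name__ == "__main__":
    print(round(coverage(5_000_000, LIBRARY_SIZE)))  # 417 (integration step)
    print(round(coverage(2_500_000, LIBRARY_SIZE)))  # 208 (editor transfection)
    print(cells_needed(400, LIBRARY_SIZE))           # 4800000
```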
- The resulting PCR products were amplified in a second PCR reaction using a compatible combination of the New England Biolabs NEBNext® Multiplex Oligos for Illumina® (96 Unique Dual Index Primer Pairs).
- The PCR products were purified using the New England Biolabs Monarch® Gel Extraction Kit and quantified using an Invitrogen™ Qubit™.
- A 4 nM pool of the different libraries was prepared, and a 10% spike-in of 4 nM PhiX was added.
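Preparing a 4 nM pool and a 10% PhiX spike-in requires converting each library's Qubit mass concentration into molarity. The sketch below assumes the standard ~660 g/mol per base pair conversion used for Illumina library quantification. It also assumes that PhiX is spiked in by volume at the same 4 nM concentration, so that 10% by volume equals 10% by moles. The concentration, amplicon length, and volumes in the example are hypothetical.

```python
G_PER_MOL_PER_BP = 660.0  # average molar mass of one dsDNA base pair


def ng_per_ul_to_nm(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration from ng/uL to nM."""
    return conc_ng_per_ul * 1e6 / (G_PER_MOL_PER_BP * length_bp)


def dilution(stock_nm: float, target_nm: float,
             final_ul: float) -> tuple[float, float]:
    """Return (stock uL, diluent uL) for a C1V1 = C2V2 dilution."""
    stock_ul = target_nm * final_ul / stock_nm
    return stock_ul, final_ul - stock_ul


def phix_spike(total_ul: float, phix_fraction: float = 0.10) -> tuple[float, float]:
    """Split a final volume into (library pool uL, PhiX uL) at equal nM."""
    phix_ul = total_ul * phix_fraction
    return total_ul - phix_ul, phix_ul


if __name__ == "__main__":
    stock = ng_per_ul_to_nm(10.0, 300)  # hypothetical 10 ng/uL, 300 bp amplicon
    print(round(stock, 2))              # 50.51
    print(dilution(stock, 4.0, 20.0))   # uL stock and diluent for 20 uL at 4 nM
    print(phix_spike(20.0))             # (18.0, 2.0)
```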
- The pool was sequenced using the Illumina MiSeq Reagent Kit v2 (300-cycles) according to the manufacturer’s protocol.
Genome editing of genomic loci
- HEK293T, HeLa, and HepG2 cells were purchased from ATCC and maintained at 37 °C and 5% CO2 in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies). 20,000 cells were seeded in 96-well plates (Corning) and transfected the following day using jetOPTIMUS® (Polyplus) according to the manufacturer’s instructions. 50 ng of sgRNA or agRNA (both cloned in pU6-tevopreq1-GG-acceptor) and 150 ng of nucleobase editor were co-transfected, and cells were harvested after three days for Sanger sequencing (Genewiz) or high-throughput sequencing (Quintara Biosciences or in-house Illumina MiSeq).
- The context library evaluation involved analyzing gRNA and agRNA libraries comprising approximately 12,000 spacer sequences and their respective contextual sequences.
- The efficiency and editing profile of each gRNA and agRNA were determined using custom R scripts. First, the target sites (where each spacer binds within its context) were extracted from the NGS reads. Next, for each spacer in the library, all combinations of adenine-to-guanine conversions were aligned against these extracted sequences. Spacers with fewer than 25 total reads were excluded from the analysis. To quantify overall editing efficiency for the different nucleobase editors, the mean A-to-G conversion rate was calculated by averaging the editing frequencies at each targeted position.
Generation of the selection phages for PANCE
- The PANCE selection phages carry the CDS for the ABE8e adenine deaminase in place of the pIII CDS.
- The phage-encoded ABE8e adenine deaminase includes part of the peptide linker sequence and a C-terminally fused intein CDS, so that the phage needs to encode only this relatively small protein rather than the whole nucleobase editor.
- The phages were generated by PCR-amplifying the ABE8e adenine deaminase, including the partial peptide linker sequence, using the oligonucleotides ABE_M13_F and ABE_M13_R.
- The N-terminal Npu DnaE intein was ordered as a gBlock and amplified using the oligonucleotides Npu_ABE_F and Npu_M13_R.
- The phage backbone was amplified using the oligonucleotides GOI_M13_F and GOI_M13_R with wild-type M13 phage genomic DNA as the template. All PCRs were performed using NEB Q5® High-Fidelity 2X Master Mix according to the manufacturer’s protocol, with 1 ng of template DNA and an annealing temperature of 60 °C.
- The cells were immediately mixed with 3 mL of soft LB agar (0.7%) and plated on LB bottom-agar plates containing 100 µg/mL carbenicillin. The plates were incubated at 37 °C overnight. Plaques were picked into 50 µL of 2xYT medium, and 1 µL was used as a template for colony PCR with the oligonucleotides ABE_M13_F and Npu_M13_R. Positive phages were amplified by adding the remaining 2xYT medium to a freshly grown S2060 pJC175e culture at an OD600 of 0.4 and cultivating for 16 hours at 37 °C. The cultures were spun down to remove the E. coli cells, and the phages were precipitated by adding a solution of 20% polyethylene glycol (8000) and 2.5 M sodium chloride to the culture supernatant at a 1:4 ratio.
- The mixture was incubated for at least 3 hours at 4 °C, and the phage pellet was resuspended in PBS. The phage titer was quantified using the Progen Phage Titration ELISA kit, and the phages were stored at 4 °C until use.
- 3 mL of the culture supernatant were used for phage DNA isolation with the Omega Bio-tek E.Z.N.A.® M13 DNA Mini Kit. The isolated DNA was sent to Plasmidsaurus for whole-phage DNA sequencing.
Generation of the selection cells for PANCE
- The selection plasmids were designed based on pJC175e by introducing mutations into pIII such that, when edited by the ABE, only perfect edits restore pIII activity, whereas bystander edits lower pIII activity.
- The pJC175e backbone was amplified using the oligonucleotides pIII_gBlock_R and pJC175e_Cas_F.
- The portion of the pIII CDS containing the mutation, followed downstream by the corresponding guide for correcting the introduced mutation and by the C-terminal DnaE intein needed to join the phage-encoded ABE8e adenine deaminase to the Cas9 encoded by the selection plasmid, was ordered as a gBlock.
- The three gBlocks for the three selection plasmids, each encoding a different pIII mutation, were amplified by PCR using the oligonucleotides gBlock_R and gBlock_pIII_F.
- The base Cas9 CDS was amplified from the ABE8e plasmid (Addgene #138489) using the oligonucleotides BE_Npu_F and BE_pJC175e_R. All PCRs were performed using NEB Q5® High-Fidelity 2X Master Mix according to the manufacturer’s protocol, with 1 ng of template DNA and an annealing temperature of 60 °C.
- Sequence-verified clones were used to prepare electrocompetent cells, which were then transformed with the mutagenesis plasmid MP4 (Badran).
- The cells were recovered in 1 mL of SOC medium and plated on 2xYT agar plates containing 1% glucose, 100 µg/mL carbenicillin, and 25 µg/mL chloramphenicol. Five colonies were used to start 50 mL shake-flask cultures in 2xYT with 1% glucose, 100 µg/mL carbenicillin, and 25 µg/mL chloramphenicol. After 16 hours, the cultures were used to prepare 20% glycerol stocks in 1 mL aliquots. Each culture was also used to isolate plasmid DNA for whole-plasmid sequencing by Plasmidsaurus, in order to select glycerol stocks with no mutations in MP4 or the selection plasmid.
- The evolution was performed as 10 consecutive batch cultivations in triplicate, using a mix of three different selection plasmids in each evolution.
- The day before cultivation, three 3 mL overnight cultures were prepared in 2xYT medium with 1% glucose, 100 µg/mL carbenicillin, and 25 µg/mL chloramphenicol, each inoculated from the glycerol stock of one of the selection plasmids.
- 3-4 hours before phage infection, 50 mL shake flasks were inoculated with the pooled overnight cultures carrying the different selection plasmids to a combined OD600 of 0.1.
- The cells were cultivated in 2xYT with 100 µg/mL carbenicillin and 25 µg/mL chloramphenicol. 30 minutes before reaching an OD600 of 0.4, the cells were induced with 0.5% arabinose. When the cells reached an OD600 of 0.4, they were infected with the selection phages at an MOI of 1 for the first selection round.
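Two small calculations sit behind these steps. The first is how much of each overnight culture to add so that the three pooled strains reach a combined OD600 of 0.1. The second is how much phage stock gives an MOI of 1 at OD600 0.4. The sketch below rests on several assumptions:
- the three strains contribute equally;
- 1 OD600 unit corresponds to roughly 8 × 10^8 E. coli cells per mL, a figure that depends on strain and instrument;
- the ELISA titer is used directly as phage per mL.

The culture volume, overnight ODs, and titer in the example are hypothetical.

```python
def inoculum_ml(stock_ods: list[float], combined_od: float,
                culture_ml: float) -> list[float]:
    """Volume of each stock so all strains contribute equally to `combined_od`.

    Uses C1V1 = C2V2 with the final culture volume as V2.
    """
    share = combined_od / len(stock_ods)
    return [share * culture_ml / od for od in stock_ods]


def phage_volume_ul(od600: float, culture_ml: float, titer_per_ml: float,
                    moi: float = 1.0, cells_per_ml_per_od: float = 8e8) -> float:
    """Phage stock volume (uL) needed to reach `moi` phage per cell."""
    cells = od600 * cells_per_ml_per_od * culture_ml
    return moi * cells / titer_per_ml * 1000.0


if __name__ == "__main__":
    # Hypothetical: three overnight cultures at OD600 4.0, 20 mL final culture.
    print(inoculum_ml([4.0, 4.0, 4.0], 0.1, 20.0))
    # Hypothetical titer of 1e11 phage/mL at OD600 0.4 in 20 mL (~64 uL).
    print(phage_volume_ul(0.4, 20.0, 1e11))
```

Because the cell-density conversion is only approximate, the MOI computed this way is an estimate and not an exact count.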
- Each evolution round was run for 12 hours at 37 °C, after which the entire culture was spun down and the supernatant was filtered through 0.2 µm filters. For selection rounds 2-4, 500 µL; for rounds 5-6, 100 µL; and for the remaining rounds, 5 µL of the supernatant were used to infect the next round.
- The phage titer after each selection round was determined using the Progen Phage Titration ELISA kit. 3 mL of each culture supernatant were used for phage DNA isolation with the Omega Bio-tek E.Z.N.A.® M13 DNA Mini Kit.
- The resulting PCR products were amplified in a second PCR reaction using a compatible combination of the New England Biolabs NEBNext® Multiplex Oligos for Illumina® (96 Unique Dual Index Primer Pairs).
- The PCR products were purified using the New England Biolabs Monarch® Gel Extraction Kit and quantified using an Invitrogen™ Qubit™.
- A 4 nM pool of the different libraries was prepared, and a 10% spike-in of 4 nM PhiX was added.
- The pool was sequenced using the Illumina MiSeq Reagent Kit v3 (600-cycles) according to the manufacturer’s protocol.
- The number of models in the ensemble that agreed on a given prediction (i.e., a specific amino acid substitution) determined the score assigned to that substitution (e.g., 4 out of 6 versus 2 out of 6). A higher score was considered more likely to yield a positive result (e.g., an evolved protein that retained function or activity).
- The ensemble of protein language models was applied to the TadA-8e sequence, yielding the following predictions (score in parentheses): R26G (6), F84L (6), N108D (6), Y149F (6), F156K (6), V106A (5), P152R (5), H8D (5), N157K (5), R111T (4), C146S (3), R111A (2), C146Q (2), M151E (2), C146K (1), Y123H (1), M151Q (1), A48P (1), V155E (1), S109P (1), P152Q (1).
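The agreement-based scoring can be sketched as a vote count. Each model contributes at most one vote per substitution, and a substitution's score is the number of models that proposed it. This is an illustrative sketch only. The per-model prediction sets below are hypothetical; only the substitution names come from the list above. The source does not describe how individual models generate or threshold their predictions.

```python
from collections import Counter


def consensus_scores(model_predictions: list[set[str]]) -> Counter:
    """Number of models that proposed each substitution (one vote per model)."""
    votes = Counter()
    for predictions in model_predictions:
        votes.update(set(predictions))
    return votes


def ranked(votes: Counter, min_score: int = 1) -> list[tuple[str, int]]:
    """Substitutions with at least `min_score` votes, highest score first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    hits = [(sub, n) for sub, n in votes.items() if n >= min_score]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical outputs of three models (the ensemble in the text has six).
    models = [{"R26G", "F84L"}, {"R26G", "M151E"}, {"R26G", "F84L"}]
    print(ranked(consensus_scores(models)))
    # [('R26G', 3), ('F84L', 2), ('M151E', 1)]
```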
- For gRNA engineering, a library of 3′-extended sgRNAs, or anchor guide RNAs (agRNAs), was designed and tested to improve the precision of ABEs. agRNA candidates from this library screen were then used in a Phage-Assisted Non-Continuous Evolution (PANCE) system to evolve a more precise TadA-8e enzyme. Using a dual selection pressure (favoring precise editing at the target site while penalizing bystander edits), several variants with narrowed editing windows were identified. Notably, the PANCE-evolved V28C variant exhibited enhanced on-target efficiency while reducing bystander editing. Editing patterns across ~12K pathogenic mutations demonstrated that V28C is ~2-3-fold more precise and ~20% more efficient than ABE8e.
- Machine learning was applied to computationally design TadA-8e mutants with improved precision and efficiency.
- The M151E mutation narrowed the editing window while increasing on-target editing.
- Example 7 3′-gRNA extensions restrict the editing window and reduce bystander editing
- In the adenosine deamination mechanism, the TadA-8e domain engages the exposed single-stranded region of the PAM-distal non-target strand (18) (FIG. 21A).
- When attached to the Cas9 protein, the TadA-8e deaminase induces specific editing patterns within a narrow DNA region spanning several base pairs. This tethering restricts the enzyme to certain nucleotides, defining what is known as the editing window.
- The present disclosure describes anchor guide RNAs (agRNAs) that stabilized the DNA strand within the active site.
- The agRNAs were designed by adding nucleotides to the 3′ end of the gRNA scaffold.
- The agRNAs were designed as a library of short sequences and entire hairpin structures (counter-loops) positioned opposite the editing loop. These structures were intended to sterically restrict the movement of the DNA and the TadA enzyme even further (Fig. 1A, Fig. S1B). This design process yielded an agRNA library of ~60K candidates.
- A lentiviral vector was constructed with the editing target downstream of the agRNA.
- The library was designed to target a site in the human DNMT1 locus. This site is well suited for screening because of its high editing accessibility in HEK293T cells and its multiple adenines within the editing window.
- The ABE8e-spCas9-WT base editor combined with a non-3′-extended guide (sgRNACtrl) showed high editing efficiency at all four adenines in the editing window (A13, A14, A15, and A17).
- Next, a library was designed to edit A13.
- agRNAs (termed “anchors”) that showed higher efficiency and lower bystander editing in the DNMT1 agRNA library were selected (FIG. 21C). Five candidates were tested in the native context in HEK293T cells, and all of them decreased bystander editing (FIG. 16C).
- The TadA-8e enzyme was evolved to further decrease bystander edits and to evaluate how combining new variants with agRNAs could affect the editing pattern.
- Phage-assisted non-continuous evolution has been used in the past to increase the activity of ABEs (19).
- A selection method was designed to evolve variants that produce fewer bystander edits and work with agRNAs. Specifically, the selection pressure decreases phage titer in response to bystander edits but increases phage titer upon “perfect editing” (e.g., editing only the “target” base), preventing the evolution of an inactive TadA enzyme.
- The accessory plasmid encoded the corresponding agRNA targeting the single nucleotide variant (SNV), as well as a C-terminal intein-nCas9-SpRY.
- Phage DNA was isolated from each PANCE round and sequenced using NGS (FIG. 22B). An increase in phage titer was observed after round 4, which suggested an increased activity of the enzyme mutants encoded by the phage (FIG. 22C).
- The 50 most enriched mutations were selected, and their editing patterns were individually tested at the human DNMT1 locus in HEK293 cells (FIG. 17C). Compared with the ABE8e-SpRY base editor, several variants showed both higher efficiency and reduced bystander editing at position A13 (FIG. 17C).
- The PANCE-evolved TadA variants were also benchmarked against the ABE9 base editor (both WT and SpRY), which showed low editing efficiency and no relative bystander reduction at the DNMT1 site (FIG. 17C).
- Variants carrying V28C (SEQ ID NO: 15) and L34W (SEQ ID NO: 16) showed the highest PreS with both sgRNACtrl and agRNA56114 (FIGs. 17B and 17C).
- NGS abundance analysis demonstrated that ABE8e rarely achieved perfect editing (FIG. 17D). Unlike ABE8e-WT, the SpRY version did not reduce bystander editing with agRNA56114, but it did increase editing efficiency (FIG. 17D).
- Perfect editing was the primary outcome for both V28C (SEQ ID NO: 15) and L34W (SEQ ID NO: 16) when used with sgRNACtrl, averaging 9.3% and 10.9%, respectively (FIG. 17D).
- In combination with agRNA56114, V28C not only reduced bystander editing but also increased perfect editing to 24.4%.
- L34W increased on-target editing to 18% and kept all bystander editing below 5% (FIG. 17D).
- 80.8% (±0.5) of the reads for L34W in combination with agRNA56114 were perfect edits (Fig. 2E).
- V28C (SEQ ID NO: 15) with agRNA56114 (SEQ ID NO: 2) showed 53.4% (±4.6) of reads with perfect edits and the highest fold-change improvement relative to ABE8e with sgRNACtrl (47.24 ± 1.95) (Fig. 2F).
- Both variants were enriched across all rounds of PANCE evolution (FIG. 17G).
- Cas9-WT base editors with these mutations also showed increased precision (FIG. 22E).
- The increased on-target editing of V28C (SEQ ID NO: 15) was accompanied by Cas9-dependent DNA off-target editing similar to that of ABE8e-SpRY across four sites (FIG. 17H).
- An orthogonal R-loop assay was performed to assess Cas9-independent off-target editing, and a substantial decrease was observed for both variants across 5 different sites (FIG. 17I).
- RNA off-target editing was analyzed by RNA sequencing. Use of the anchor reduced A-to-I deamination by 3.6-fold (FIG. 22F). When combined with the evolved variants, the reduction was even greater: 14.2- and 22.7-fold for V28C (sgRNACtrl and agRNA56114, respectively) and 18.33- and 21.9-fold for L34W (FIG. 22F).
- Example 9 Machine Learning-guided design reveals overengineering constraints in ABE8e and unveils novel precise mutations
- NGS abundance analysis of M151E editing showed increased precision at position A15, with the highest fold-change obtained in combination with agRNA56114 (25.7-fold ± 2.4). A 15.4-fold (±1.3) increase was observed at A13 (FIGs. 18D and 18E).
- The PANCE-derived variant V28C (SEQ ID NO: 15; highest PreS) was then machine-learning evolved (“ML-evolved”) using the same approach.
- The ML-derived mutation with the highest PreS was combined with the PANCE-derived V28C mutation to determine whether the ML approach could be additive to PANCE.
- The V28C-M151E variant (SEQ ID NO: 18) showed reduced editing efficiency at the DNMT1 site but precise editing in other contexts, such as Site9 (FIG. 18G, FIGs. 23A and 23B).
- The M151E mutation was then cross-referenced with the amino acid exchanges observed during PANCE.
- When amino acid substitutions at position 151 were evaluated, an enrichment of aspartic acid was observed across the rounds of evolution (FIGs. 18H and 18G). Glutamic acid and aspartic acid are both negatively charged amino acids, and they produced similar editing patterns at the DNMT1 site (FIG. 18J).
- Example 10 ABE8e-V28C achieves superior precision and efficiency across diverse genomic sites
- No significant C-to-N changes were detected at any of the sites when compared with ABE8e (FIG. 19D).
- The V28C variant was compared with ABE8e across 12 different sites in the human genome in HEK293T cells.
- The V28C variant refined the deaminase’s editing window, improving precision at every tested site (FIGs. 19E-19N).
- V28C produced a constrained 4-A editing window, yielding an average 27.1% increase in on-target efficiency (FIGs. 19O and 19P).
- Example 11 V28C drastically improves precise correction of pathogenic variants in iPSCs
- NGS abundance analysis revealed that the V28C mutation significantly enriched single-base deamination across multiple loci. Structural modeling of the wild-type enzyme predicts that H57 and C87 establish three van der Waals interactions and two hydrogen bonds with 8-Az(26) (FIG. 20A). Without being bound by theory, in the V28C mutant, C28 may shift closer to 8-Az(26), reducing the measured distance to below 5.101 Å. In line with this, a 0.77 Å decrease in the distance between residue 28 and 8-Az(26) was detected (FIG. 20A).
- V28C narrows the editing window and minimizes bystander editing without compromising catalytic efficiency.
- PCSK9 is a therapeutic target for lowering LDL levels and reducing the risk of coronary heart disease. Using a gRNA targeting the exon 1-intron 1 splice donor site, a loss-of-function mutation was introduced (FIG. 20B).
- The SNCA E46K mutation, which causes early-onset Parkinson’s disease by promoting α-synuclein aggregation and neuronal toxicity, was targeted (FIG. 20E).
- The target sequence shares 45% identity with the DNMT1 site used to identify agRNA56114 (FIG. 20F).
- The V28C variant (SEQ ID NO: 15) demonstrated superior precision in reverting the pathogenic mutation. It achieved 11.6% (±0.8) perfect edits (% of edited reads), compared with just 0.65% (±0.14) with ABE8e, a 17.6-fold increase in precision (FIG. 20J).
- V28C with agRNA56114 (SEQ ID NO: 2) further improved precision, yielding 17.5% (±1.06) perfect edits, a 26.6-fold enhancement over ABE8e alone (FIGs. 20I-20J).
- Introducing the PLM-predicted M151E mutation into V28C further increased perfect editing to 47.4% with sgRNACtrl and 53.0% with agRNA56114 (calculated from edited reads).
- LB and 2xYT media were prepared using MP Biomedicals™ media capsules according to the manufacturer’s protocol.
- 16 g/L agar was added for standard agar, and 7 g/L agar for soft agar. All media were sterilized by autoclaving. Oligos/primers and plasmids used in the study can be found in Table A. All gRNAs were cloned using KLD (NEB) cloning according to the manufacturer’s protocol.
agRNA library generation
- The agRNA library consists of three parts: an upstream sequence (US), which is the reverse complement of the sequence downstream of the target; a counter-loop; and a downstream binding sequence (DS), which binds the sequence upstream of the target.
- The upstream and downstream binding sequences vary in length and bind different regions within 1 to 11 bp upstream or downstream of the target.
- The counter-loop library consists of 33 different DNA sequences, of which the longer ones form GC-rich hairpins.
- The final library contains every possible combination of UBS, hairpin, and DBS. A script to generate the hairpin library for a novel context can be found in supplementary code 1.
- The DNMT1 agRNA library was ordered as an Agilent DNA Oligo Pool (64,610 oligos).
- Each oligo in the DNMT1 library contained a Gibson overhang, the gRNA, the gRNA scaffold, the agRNA sequence, and a terminator, followed by a short DNA sequence used as a primer binding site.
- The DNMT1 target had already been cloned into the plasmid used as the backbone.
- The DNA Oligo Pool library was amplified by PCR with the oligos Lib_F and Lib_R.
- The backbone pU6-tevopreq1-GG-acceptor (Addgene #174038) was PCR-amplified using the oligos SplitF and SplitR.
- The backbone PCR product was DpnI-digested overnight at 37 °C, and both PCR products were purified using the New England Biolabs Monarch PCR & DNA Cleanup Kit according to the manufacturer’s protocol.
- The fragments were assembled using the New England Biolabs Gibson Assembly® Master Mix at a 10:1 library-to-backbone ratio with 150 ng of backbone DNA, according to the manufacturer’s protocol.
- 2 µL of the Gibson assembly mix were transformed directly into Lucigen Endura Competent Cells, recovered in 1 mL of SOC medium, and plated on carbenicillin agar plates poured in Nunc™ Square BioAssay Dishes (Cole-Parmer #EW-01929-00).
- The library was amplified from the pU6-tevopreq1-GG-acceptor.
- The backbone was digested with PspXI and Esp3I and cloned by Gibson assembly following the protocol described above.
- HEK293T cells were purchased from ATCC and maintained at 37 °C and 5% CO2 in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies). Cells were seeded at a density of 5 × 10^6 cells per 10 cm plate in DMEM supplemented with 10% FBS and antibiotics (Thermo #15240062). The following day, cells were transfected using Transporter 5® Transfection Reagent with a plasmid mix containing pVSV-G (3.86 µg), pPax2 (8.57 µg), and the lentiviral transfer vector (9.23 µg) in Opti-MEM (Thermo Fisher).
- The DNA-Transporter complexes were incubated at room temperature for 20 minutes before being added to the culture medium. After 24 hours, the medium was replaced with fresh DMEM, and viral supernatants were collected at 48 and 72 hours post-transfection.
- The harvested medium was filtered through a 0.45 µm vacuum filter system and concentrated using Lenti-X Concentrator (Takara Bio) at a 1:3 ratio (medium:concentrator). Concentration involved incubation at 4 °C for at least 30 minutes, followed by centrifugation at 1,500 × g for 45 minutes at 4 °C.
- The viral pellet was resuspended in phosphate-buffered saline (PBS), aliquoted, and stored at -80 °C until further use.
- HEK293T cells were transduced with the lentiviral library at an MOI of 0.2. After 24-48 h, the medium was replaced with fresh medium containing 2 µg/mL puromycin, and selection continued for ~14 days. To test the editing pattern across the library, 20 million cells (300× coverage) were seeded in a 225 mm³ dish and transfected the following day using Lipofectamine 3000 (Thermo Fisher) with 20 µg of the base editor pCMV-T7-ABE8e-nSpCas9-P2A-EGFP (KAC978) (Addgene #185910). Genomic DNA was collected 5 days after transfection.
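Transducing at an MOI of 0.2 keeps most transduced cells at a single integration. This follows from the usual Poisson model of viral transduction, which is an assumption here, since real integration also depends on functional titer and cell state. Under that model, about 18% of cells are transduced, and about 90% of those carry exactly one integration. The block also checks that 20 million cells give roughly 300× coverage when coverage is computed against the 64,610-oligo DNMT1 pool. Which library size the stated 300× refers to is itself an assumption.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k viral integrations."""
    return moi ** k * math.exp(-moi) / math.factorial(k)


def fraction_transduced(moi: float) -> float:
    """Probability of at least one integration."""
    return 1.0 - poisson_pmf(0, moi)


def single_copy_among_transduced(moi: float) -> float:
    """Share of transduced cells carrying exactly one integration."""
    return poisson_pmf(1, moi) / fraction_transduced(moi)


if __name__ == "__main__":
    print(f"{fraction_transduced(0.2):.3f}")           # 0.181
    print(f"{single_copy_among_transduced(0.2):.3f}")  # 0.903
    print(round(20_000_000 / 64_610))                  # 310
```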
- Score = $\frac{\%\,\text{Perfect Edit}}{(\%\,\text{Perfect Edit} + \%\,\text{Bystander})^{2}}$
- Anchors achieving the highest scores and at least 20% overall editing efficiency were further characterized experimentally.
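The anchor-ranking step can be sketched in four stages:
1. Classify each read at the target site as a perfect edit (only the intended adenine converted), a bystander edit (any other adenine converted), or unedited.
2. Compute the percentage of reads in each class.
3. Apply the 20% overall-editing filter.
4. Score the remaining anchors.

This is a hedged sketch with several assumptions. The read layout and target index are hypothetical. Reads that carry both the target edit and a bystander edit are counted as bystander. The score takes the extracted formula literally as %perfect / (%perfect + %bystander)^2, with the exponent applied to the denominator.

```python
def classify(reference: str, read: str, target: int) -> str:
    """Label a read as 'perfect', 'bystander' or 'unedited'.

    Assumes `read` is already trimmed to the same window as `reference`.
    """
    edited = {i for i, (ref, obs) in enumerate(zip(reference, read))
              if ref == "A" and obs == "G"}
    if not edited:
        return "unedited"
    return "perfect" if edited == {target} else "bystander"


def anchor_metrics(reference: str, reads: list[str],
                   target: int) -> dict[str, float]:
    """Percent perfect, bystander and overall editing for one anchor."""
    labels = [classify(reference, r, target) for r in reads]
    perfect = 100.0 * labels.count("perfect") / len(labels)
    bystander = 100.0 * labels.count("bystander") / len(labels)
    return {"perfect": perfect, "bystander": bystander,
            "overall": perfect + bystander}


def anchor_score(metrics: dict[str, float]) -> float:
    """Score read literally from the extracted formula (an assumption)."""
    total = metrics["perfect"] + metrics["bystander"]
    return metrics["perfect"] / total ** 2 if total else 0.0


def passes_filter(metrics: dict[str, float], min_overall: float = 20.0) -> bool:
    """Keep anchors with at least `min_overall` percent editing."""
    return metrics["overall"] >= min_overall
```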
- The backbone sgBbsI (p2Tol-U6-2xBbsI-sgRNA-HygR) (Addgene #71485) was PCR-amplified using the oligos BB_R and BB_F.
- The backbone PCR product was DpnI-digested overnight at 37 °C, and both PCR products were purified using the New England Biolabs Monarch PCR & DNA Cleanup Kit according to the manufacturer’s protocol.
- The fragments were assembled using the New England Biolabs Gibson Assembly® Master Mix at a 10:1 library-to-backbone ratio with 150 ng of backbone DNA, according to the manufacturer’s protocol.
- The library was then digested with BsaI according to the manufacturer’s protocol and gel-purified using the New England Biolabs Monarch Gel Extraction Kit according to the manufacturer’s protocol.
- The gRNA scaffold, hairpin, and terminator, flanked upstream and downstream by inward-facing BsaI sites, were ordered as a cloned gene synthesis product from IDT.
- This plasmid was also BsaI-digested, and the fragment was purified using the New England Biolabs Monarch Gel Extraction Kit according to the manufacturer’s protocol.
- The insert and library were ligated using New England Biolabs T4 DNA Ligase according to the manufacturer’s protocol, with 150 ng of backbone and a 10:1 insert-to-backbone ratio.
- HEK293T cells were purchased from ATCC and maintained at 37 °C and 5% CO2 in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies).
Tol2 transposon-mediated library integration
- 5 million cells (~400× coverage) were seeded in 175 mm³ dishes. The following day, cells were co-transfected using Lipofectamine 3000 (Thermo Fisher) with 10 µg of Tol2 transposase plasmid (pCMV-Tol2, Addgene #31823) and 10 µg of the Path_Var library. To generate stable library cell lines, cells were selected with hygromycin (25 µg/mL) starting the day after transfection and continuing for more than 2-3 weeks. Subsequently, 2.5 million cells (~200× coverage) were seeded in a 100 mm³ dish, and the following day they were transfected with 10 µg of base editor using Lipofectamine 3000. Genomic DNA was collected 5 days after transfection.
- The resulting PCR products were amplified in a second PCR reaction using a compatible combination of the New England Biolabs NEBNext® Multiplex Oligos for Illumina® (96 Unique Dual Index Primer Pairs).
- The PCR products were purified using the New England Biolabs Monarch® Gel Extraction Kit and quantified using an Invitrogen™ Qubit™.
- A 4 nM pool of the different libraries was prepared, and a 10% spike-in of 4 nM PhiX was added.
- The pool was sequenced using the Illumina MiSeq Reagent Kit v2 (300-cycles) according to the manufacturer’s protocol.
- The context library evaluation involved analyzing gRNA libraries comprising approximately 12,000 spacer sequences and their respective contextual sequences.
- The efficiency and editing profile of each gRNA were determined using custom R scripts. First, the target sites (where each spacer binds within its context) were extracted from the NGS reads. Next, for each spacer in the library, all combinations of adenine-to-guanine conversions were aligned against these extracted sequences. Spacers with fewer than 25 total reads were excluded from the analysis. To quantify overall editing efficiency for the different base editors, the mean A-to-G conversion rate was calculated by averaging the editing frequencies at each targeted position.
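The per-spacer quantification above was implemented with custom R scripts that are not reproduced here. As a hedged illustration only, the Python sketch below follows the same steps in simplified form:
- for each spacer, compute the per-position A-to-G frequency from reads already trimmed to the extracted target site;
- drop spacers with fewer than 25 reads;
- average the frequencies across adenine positions.

It departs from the original approach by comparing reads position by position rather than aligning all combinations of A-to-G conversions. The input layout, a dict mapping each spacer to its reference site and its reads, is hypothetical.

```python
from statistics import mean


def a_to_g_by_position(reference: str, reads: list[str]) -> dict[int, float]:
    """Fraction of reads carrying G at each adenine of the reference site.

    Assumes every read has the same length as `reference`.
    """
    a_positions = [i for i, base in enumerate(reference) if base == "A"]
    return {i: sum(read[i] == "G" for read in reads) / len(reads)
            for i in a_positions}


def summarize(library: dict[str, tuple[str, list[str]]],
              min_reads: int = 25) -> dict[str, float]:
    """Mean A-to-G conversion per spacer, skipping low-coverage spacers."""
    summary = {}
    for spacer, (reference, reads) in library.items():
        if len(reads) < min_reads:
            continue  # excluded: fewer than `min_reads` total reads
        per_position = a_to_g_by_position(reference, reads)
        if per_position:  # spacers without any adenine contribute nothing
            summary[spacer] = mean(per_position.values())
    return summary
```

An overall efficiency for a given editor could then be taken as `mean(summarize(library).values())`.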
- HEK293T, HeLa, and HepG2 cells were purchased from ATCC and maintained at 37 °C and 5% CO2 in DMEM (Life Technologies) supplemented with 10% FBS (Life Technologies). 20,000 cells were seeded in 96-well plates (Corning) and transfected the following day using jetOPTIMUS® (Polyplus) according to the manufacturer’s instructions. 50 ng of sgRNA or agRNA (both cloned in BPK1520 (Plasmid #65777)) and 150 ng of base editor were co-transfected, and cells were harvested after three days for Sanger sequencing (Genewiz) or HTS (Quintara Biosciences or in-house Illumina MiSeq). HTS data were analyzed using CRISPResso and BE-Analyzer (CRISPR RGEN Tools) (26).
- The PANCE selection phages carry the coding sequence (CDS) of the ABE8e adenine deaminase in place of the pIII CDS.
- The ABE8e adenine deaminase CDS includes part of the peptide linker sequence and a C-terminally fused intein CDS, so the phage needs to encode only this relatively small protein rather than the whole base editor.
- The phages were generated by PCR-amplifying the ABE8e adenine deaminase CDS, including the partial peptide linker sequence, with the oligonucleotides ABE_M13_F and ABE_M13_R.
- The N-terminal Npu DnaE intein was ordered as a gBlock and amplified with the oligonucleotides Npu_ABE_F and Npu_M13_R.
- The phage backbone was amplified with the oligonucleotides GOI_M13_F and GOI_M13_R, using wild-type M13 phage genomic DNA as the template. All PCRs were performed with NEB Q5® High-Fidelity 2X Master Mix according to the manufacturer's protocol, using 1 ng of template DNA and an annealing temperature of 60 °C.
- The cells were immediately mixed with 3 mL of soft LB agar (0.7%) and plated on LB bottom-agar plates containing 100 µg/mL carbenicillin. The plates were incubated overnight at 37 °C. Plaques were picked into 50 µL of 2xYT medium, and 1 µL was used as the template for colony PCR with the oligonucleotides ABE_M13_F and Npu_M13_R. Positive phages were amplified by adding the remaining 2xYT medium to a freshly grown S2060 pJC175e culture at an OD600 of 0.4 and cultivating for 16 h at 37 °C.
- The cultures were centrifuged to remove the E. coli cells, and the phages were precipitated by adding a solution of 20% polyethylene glycol (PEG 8000) and 2.5 M sodium chloride to the culture supernatant at a 1:4 ratio.
- The mixture was incubated for at least 3 h at 4 °C, and the phage pellet was resuspended in PBS. The phage titer was quantified using the Progen Phage Titration ELISA kit, and the phages were stored at 4 °C until use.
- Phage DNA was isolated from 3 mL of the culture supernatant using the Omega Bio-tek E.Z.N.A.® M13 DNA Mini Kit, and the isolated DNA was sent to Plasmidsaurus for whole-phage DNA sequencing.
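The PEG/NaCl precipitation step can be checked with a short calculation. The sketch below reads "1:4" as one volume of the 20% PEG 8000 / 2.5 M NaCl solution per four volumes of supernatant, and it treats the PEG percentage as w/v. Both readings are assumptions. Under them, the final mixture contains 4% PEG and 0.5 M NaCl.

```python
from __future__ import annotations

STOCK_PEG_PERCENT = 20.0  # % (w/v, assumed) PEG 8000 in the precipitation solution
STOCK_NACL_M = 2.5        # M NaCl in the precipitation solution


def peg_precipitation(supernatant_ml: float, parts_solution: int = 1,
                      parts_supernatant: int = 4) -> dict[str, float]:
    """Volume of PEG/NaCl solution to add and the resulting final PEG and NaCl concentrations."""
    solution_ml = supernatant_ml * parts_solution / parts_supernatant
    total_ml = supernatant_ml + solution_ml
    return {
        "solution_ml": solution_ml,
        "final_peg_percent": STOCK_PEG_PERCENT * solution_ml / total_ml,
        "final_nacl_m": STOCK_NACL_M * solution_ml / total_ml,
    }
```

For 40 mL of supernatant, this gives 10 mL of precipitation solution, 4% PEG, and 0.5 M NaCl.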
- The selection plasmids were designed based on pJC175e (Addgene #79219) (27) by introducing mutations into pIII such that, after editing by the ABE, only the precise edit restores pIII activity, whereas bystander edits lower pIII activity.
- The pJC175e backbone was amplified with the oligonucleotides pIII_gBlock_R and pJC175e_Cas_F.
- For each selection plasmid, a gBlock was ordered containing three elements: the part of the pIII CDS carrying the mutation; downstream of it, the corresponding guide that corrects the introduced mutation; and the C-terminal DnaE intein required to fuse the phage-encoded ABE8e adenine deaminase to the Cas9 encoded by the selection plasmid.
- Each selection plasmid also encodes the agRNA that corrects its pIII mutation (SP1, F366L; SP2, K360R; SP3, I411V).
- The three gBlocks, each encoding a different pIII mutation for one of the three selection plasmids, were amplified by PCR with the oligonucleotides gBlock_R and gBlock_pIII_F.
- The Cas9 CDS of the base editor was amplified from the ABE8e plasmid (Addgene #138489) with the oligonucleotides BE_Npu_F and BE_pJC175e_R.
- PCRs were performed with NEB Q5® High-Fidelity 2X Master Mix according to the manufacturer's protocol, using 1 ng of template DNA and an annealing temperature of 60 °C. All fragments were digested with DpnI (NEB) overnight at 37 °C in the PCR buffer and purified the next day with the NEB Monarch® PCR & DNA Cleanup Kit. The fragments were assembled with NEB Gibson Assembly® Master Mix according to the manufacturer's protocol and transformed into electrocompetent S2060 cells. The cells were recovered in 500 µL of SOC medium for 1 h, plated on LB agar plates with 100 µg/mL carbenicillin, and incubated overnight at 37 °C.
- Colonies were screened by colony PCR, and positive clones were sent for whole-plasmid sequencing. Sequence-verified clones were used to prepare electrocompetent cells, which were then transformed with the mutagenesis plasmid MP4 (Addgene #69652) (28). The cells were recovered in 1 mL of SOC medium and plated on 2xYT agar plates containing 1% glucose, 100 µg/mL carbenicillin, and 25 µg/mL chloramphenicol. Five colonies were used to start 50 µL shake-flask cultivations in 2xYT with 1% glucose, 100 µg/mL carbenicillin, and 25 µg/mL chloramphenicol.
- After 16 h, the cultivations were used to prepare 20% glycerol stocks in 1 mL aliquots. Plasmid DNA was also isolated from each culture for whole-plasmid sequencing by Plasmidsaurus, in order to select glycerol stocks with no mutations in MP4 or the selection plasmid.
PANCE
- The evolution was performed in triplicate as 10 consecutive batch cultivations, with each evolution using a mix of the three selection plasmids.
- The day before cultivation, three 3 mL overnight cultures were prepared in 2xYT medium with 1% glucose, 100 µg/mL carbenicillin, and 25 µg/mL chloramphenicol. Each culture was inoculated from the glycerol stock of one of the selection plasmids.
- Three to four hours before phage infection, 50 mL shake flasks were inoculated with the pooled overnight cultures of the different selection plasmids to a combined OD600 of 0.1.
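The inoculation volumes for the pooled flask can be computed from the OD600 of each overnight culture. The sketch below assumes that the three selection-plasmid strains contribute equal shares of the combined OD600 of 0.1. It takes the culture volume as a parameter because the text gives only the flask size (50 mL), and it neglects the volume added with the inoculum. The function name is hypothetical.

```python
from __future__ import annotations


def inoculum_volumes(overnight_od600: dict[str, float], culture_ml: float,
                     target_od600: float = 0.1) -> dict[str, float]:
    """Volume (mL) of each overnight culture so the pooled flask starts at target_od600.

    Assumes each strain contributes an equal share of the target OD600 and that the
    inoculum volume is negligible compared with culture_ml.
    """
    share = target_od600 / len(overnight_od600)
    return {name: share * culture_ml / od for name, od in overnight_od600.items()}
```

For example, suppose the overnight cultures measure OD600 values of 4.0, 5.0, and 2.5 and the culture volume is 20 mL. The required inoculum volumes are then roughly 0.17, 0.13, and 0.27 mL. Overnight ODs should be measured on dilutions within the spectrophotometer's linear range.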
- The cells were cultivated in 2xYT with 100 µg/mL carbenicillin and 25 µg/mL chloramphenicol. The cells were induced with 0.5% arabinose 30 min before the cultures reached an OD600 of 0.4. At an OD600 of 0.4, they were infected with the selection phages at an MOI of 1 for the first selection round.
- Each evolution round ran for 12 h at 37 °C. The entire culture was then centrifuged, and the supernatant was passed through 0.2 µm filters. To infect the following round, 500 µL of supernatant was used for rounds 2-4, 100 µL for rounds 5-6, and 5 µL for the remaining rounds.
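The first-round infection at an MOI of 1 and the transfer schedule above can be written down as two small helpers. The cell-density factor of about 8 × 10^8 E. coli cells/mL per OD600 unit is a commonly used approximation, not a value from the text, and should be calibrated for the strain and instrument. The titer is whatever the titration assay reports per mL. Function names are hypothetical.

```python
CELLS_PER_ML_PER_OD600 = 8e8  # assumed E. coli conversion factor; calibrate before use


def phage_volume_for_moi(moi: float, od600: float, culture_ml: float, titer_per_ml: float) -> float:
    """Volume (mL) of phage stock that gives the requested multiplicity of infection."""
    cells = od600 * CELLS_PER_ML_PER_OD600 * culture_ml
    return moi * cells / titer_per_ml


def transfer_volume_ul(round_number: int) -> int:
    """Filtered supernatant (uL) used to start a given round, following the schedule in the text."""
    if 2 <= round_number <= 4:
        return 500
    if 5 <= round_number <= 6:
        return 100
    if 7 <= round_number <= 10:
        return 5
    raise ValueError("round 1 starts from the phage stock at MOI 1; the text describes rounds 1-10")
```

For example, a 20 mL culture at OD600 0.4 and a stock of 1 × 10^10 phage/mL call for about 0.64 mL of phage stock at MOI 1, under the assumed conversion factor.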
- The V28C, L34W, and V28C&M151E variants correspond to the amino acid sequences set forth in SEQ ID NOs: 15, 16, and 18, respectively.
- The resulting PCR products were amplified in a second PCR using a compatible combination of New England Biolabs NEBNext® Multiplex Oligos for Illumina® (96 Unique Dual Index Primer Pairs).
- The PCR products were purified with the New England Biolabs NEB Monarch® Gel Extraction Kit and quantified with an Invitrogen™ Qubit™.
- A 4 nM pool of the libraries was prepared, and 4 nM PhiX was added at 10%.
- The pool was sequenced with the Illumina MiSeq Reagent Kit v3 (600-cycle) according to the manufacturer's protocol.
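Preparing the 4 nM pool requires converting Qubit mass concentrations into molarity. The sketch below uses the usual conversion for double-stranded DNA (about 660 g/mol per base pair) together with a mean fragment length, which must be supplied, for example from a gel or fragment analyzer. It assumes equimolar contributions from all libraries. It also interprets "10 % PhiX" as PhiX making up 10% of the final volume, which for equal 4 nM stocks is also 10% by moles; this interpretation is an assumption. Function names are hypothetical.

```python
from __future__ import annotations

BP_MASS_G_PER_MOL = 660  # average mass of one double-stranded base pair


def ng_per_ul_to_nanomolar(conc_ng_per_ul: float, fragment_bp: int) -> float:
    """Convert a dsDNA concentration (ng/uL) to nM for a given mean fragment length."""
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * fragment_bp)


def equimolar_pool(libraries: dict[str, tuple[float, int]], pool_ul: float,
                   pool_nanomolar: float = 4.0) -> dict[str, float]:
    """Volume (uL) of each library (ng/uL, bp) for an equimolar pool; the remainder is diluent."""
    per_library_nm = pool_nanomolar / len(libraries)
    volumes = {
        name: per_library_nm * pool_ul / ng_per_ul_to_nanomolar(conc, bp)
        for name, (conc, bp) in libraries.items()
    }
    if sum(volumes.values()) > pool_ul:
        raise ValueError("libraries are too dilute for the requested pool concentration")
    return volumes


def phix_spike_ul(pool_ul: float, fraction: float = 0.10) -> float:
    """Volume of PhiX (same molarity as the pool) so that PhiX is `fraction` of the final mix."""
    return pool_ul * fraction / (1 - fraction)
```

For example, a 300 bp library at 2 ng/µL is about 10.1 nM. Adding 60 µL of 4 nM PhiX to 540 µL of a 4 nM pool gives a 10% spike.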
- Variant testing at the DNMT1 site followed the conditions described in the "Genome editing of genomic loci" section.
- The ChimeraX command line was used to visualize and mutate the target residues and to analyze their interactions. Interactions of gRNA nucleotides 5-8 with ABE8e residues were analyzed. Hydrogen bonds and non-polar (van der Waals) interactions between carbon atoms of the gRNA and the protein were sought at a maximum distance of 3.8 Å. Cationic interactions, defined as nitrogen atoms within 5 Å of an aromatic carbon, were also sought between the target residues and nucleotides.
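ChimeraX provides its own `hbonds` and `contacts` commands, which apply their own geometric criteria. The distance cutoffs described above can also be applied outside ChimeraX to exported coordinates. The Python sketch below applies only the two distance rules (carbon-carbon within 3.8 Å, and nitrogen within 5 Å of an aromatic carbon) and does not evaluate hydrogen-bond geometry. The `Atom` record, its fields, and the example coordinates are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable


@dataclass(frozen=True)
class Atom:
    residue: str                     # illustrative label, e.g. "gRNA G5" or "LYS 1"
    name: str                        # atom name, e.g. "C8" or "NZ"
    element: str                     # "C", "N", "O", ...
    xyz: tuple[float, float, float]  # Cartesian coordinates in angstroms
    aromatic: bool = False


def close_pairs(group_a: Iterable[Atom], group_b: Iterable[Atom], cutoff: float,
                keep_a: Callable[[Atom], bool] = lambda atom: True,
                keep_b: Callable[[Atom], bool] = lambda atom: True) -> list[tuple[str, str, float]]:
    """(atom_a, atom_b, distance) for pairs that pass both filters and lie within cutoff angstroms."""
    pairs = []
    for a, b in product(group_a, group_b):
        if keep_a(a) and keep_b(b):
            d = math.dist(a.xyz, b.xyz)
            if d <= cutoff:
                pairs.append((f"{a.residue}:{a.name}", f"{b.residue}:{b.name}", round(d, 2)))
    return pairs


def is_carbon(atom: Atom) -> bool:
    return atom.element == "C"


def is_nitrogen(atom: Atom) -> bool:
    return atom.element == "N"


def is_aromatic_carbon(atom: Atom) -> bool:
    return atom.element == "C" and atom.aromatic
```

A van der Waals screen would call `close_pairs(grna_atoms, protein_atoms, 3.8, is_carbon, is_carbon)`. The cationic screen would call `close_pairs(grna_atoms, protein_atoms, 5.0, is_aromatic_carbon, is_nitrogen)`, and the reverse pairing (protein aromatic carbons, gRNA nitrogens) can be screened by swapping the arguments.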
- KOLF2.1J SNCA E46K-/- cells were purchased from The Jackson Laboratory and maintained in StemFlex medium (Life Technologies #A3349401) supplemented with 10% FBS (Life Technologies) at 37 °C and 5% CO2. Cells were cultured on plates coated with a 1 mg/mL Synthemax stock solution (Synthemax II-SC, Cat# 3535, Corning) according to the manufacturer's instructions.
- Nucleofection was performed with the Neon electroporation system (Thermo Fisher) using the 10 µL kit. 200,000 cells were resuspended in ~10 µL of buffer R and mixed with 200 ng of gRNA vector and 200 µg of base editor. The nucleofection parameters were: voltage, 1400 V; pulse width, 20 ms; 2 pulses. After nucleofection, cells were plated in 12-well plates with 400 µL of StemFlex without antibiotics and a 1:100 dilution of RevitaCell™ Supplement (Cat# A26445-01, Gibco Life Technologies). After 24 h, the medium was replaced, and editing was evaluated after 48 h by NGS (Quintara Biosciences).
- The invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims are introduced into another claim.
- Any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim.
- Where elements are presented as lists, e.g., in Markush group format, each subgroup of the elements is also disclosed, and any element(s) can be removed from the group.
- Where the invention, or aspects of the invention, is/are referred to as comprising particular elements and/or features, certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements and/or features. For purposes of simplicity, those embodiments have not been specifically set forth in haec verba herein.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Chemical & Material Sciences (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Wood Science & Technology (AREA)
- Organic Chemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Zoology (AREA)
- Biotechnology (AREA)
- General Engineering & Computer Science (AREA)
- Biochemistry (AREA)
- Microbiology (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Plant Pathology (AREA)
- Medicinal Chemistry (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
Abstract
The disclosure relates to compositions and methods for improving the editing efficiency and/or reducing the bystander editing of nucleobase editors, such as an adenine base editor (ABE). Certain aspects of the present disclosure describe sequence-specific anchor guide RNAs (agRNAs) with extended 3' nucleic acid extensions that improve the editing efficiency and/or reduce the bystander editing of a nucleobase editor in a context-dependent manner. Certain aspects of the disclosure relate to evolved nucleobase editors having narrowed editing windows and increased activity.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463648151P | 2024-05-15 | 2024-05-15 | |
| US63/648,151 | 2024-05-15 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025240795A1 (fr) | 2025-11-20 |
Family
ID=97720769
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2025/029650 (pending; WO2025240795A1, fr) | End-modified agRNA for improved base editing | 2024-05-15 | 2025-05-15 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025240795A1 (fr) |
- 2025-05-15: WO application PCT/US2025/029650 filed (WO2025240795A1); status: active, pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20250011748A1 (en) | | Base editors, compositions, and methods for modifying the mitochondrial genome |
| EP4100032B1 (fr) | | Genome editing methods for the treatment of spinal muscular atrophy |
| US11344609B2 (en) | | Compositions and methods for treating hemoglobinopathies |
| US20220315906A1 (en) | | Base editors with diversified targeting scope |
| US20230021641A1 (en) | | Cas9 variants having non-canonical PAM specificities and uses thereof |
| US12281338B2 (en) | | Nucleobase editors comprising GeoCas9 and uses thereof |
| US20230127008A1 (en) | | STAT3-targeted base editor therapeutics for the treatment of melanoma and other cancers |
| US20220177877A1 (en) | | Highly multiplexed base editing |
| US20220307001A1 (en) | | Evolved Cas9 variants and uses thereof |
| US20240173430A1 (en) | | Base editing for treating Hutchinson-Gilford progeria syndrome |
| WO2021158995A1 (fr) | | Base editor predictive algorithm and method of use |
| JP2024041081A (ja) | | Use of adenosine base editors |
| WO2021222318A1 (fr) | | Targeted base editing of the USH2A gene |
| JP2020534795A (ja) | | Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE) |
| WO2022261509A1 (fr) | | Improved cytosine-to-guanine base editors |
| US20250339559A1 (en) | | Base editing-mediated readthrough of premature termination codons (BERT) |
| US20250090687A1 (en) | | Mitochondrial base editors and methods for editing mitochondrial DNA |
| EP4192948A2 (fr) | | RNA and DNA base editing by engineered ADAR |
| WO2025240795A1 (fr) | | End-modified agRNA for improved base editing |
| WO2023205687A1 (fr) | | Improved prime editing methods and compositions |
| WO2024040083A1 (fr) | | Evolved cytosine deaminases and methods of editing DNA using same |
| US20250228981A1 (en) | | Base editing methods and compositions for treating triplet repeat disorders |
| WO2024077267A1 (fr) | | Prime editing methods and compositions for treating triplet repeat disorders |
| WO2025122725A1 (fr) | | Methods and compositions for base editing of TPP1 in the treatment of Batten disease |
| WO2024163862A2 (fr) | | Gene editing methods, systems, and compositions for the treatment of spinal muscular atrophy |