
WO2025010350A2 - Compositions and methods for precise genome editing using retrons - Google Patents

Compositions and methods for precise genome editing using retrons

Info

Publication number
WO2025010350A2
Authority
WO
WIPO (PCT)
Prior art keywords
retron
sequence
editor
nucleic acid
nuclease
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/036763
Other languages
English (en)
Other versions
WO2025010350A3 (fr)
Inventor
Ilya J. FINKELSTEIN
Hung-Che KUO
Kuang Hu
Jesse BUFFINGTON
Kamyab JAVANMANDI
You-Chiun CHANG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Texas System
University of Texas at Austin
Original Assignee
University of Texas System
University of Texas at Austin
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Texas System, University of Texas at Austin
Publication of WO2025010350A2
Publication of WO2025010350A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]

Definitions

  • Retrons are bacterial anti-phage defense systems that consist of a self-priming reverse transcriptase (retron-RT), a cognate non-coding RNA (ncRNA) that primes and templates reverse transcription, and an accessory protein that participates in anti-viral immunity [36-44].
  • the ncRNA consists of two main regions: the msr (msDNA-specific region) and the msd (multi-copy ssDNA-coding region) (see Fig. 1A).
  • the msr is located at the 5' end of the ncRNA and forms a specific structure that is recognized by the RT [45, 46]. This region typically contains one to three stable stem loops with 7-10 base pair stems and 3-10 nucleotide loops.
  • the msr also includes a highly conserved guanosine residue at the 5' end, which serves as the branching point for initiating reverse transcription (see Fig. 1A).
  • the msd is positioned downstream of the msr and can be divided into two parts: a dispensable region that can be replaced with a desired sequence (i.e., the donor DNA for genome editing) and a conserved region that is essential for the proper folding and function of the ncRNA.
  • the RT primes from the msr, and uses the msd as a template [45, 46].
  • the resulting msDNA remains covalently linked to the ncRNA through a 2',5'-phosphodiester bond formed between the branching guanosine residue in the msr and the 5' end of the msDNA [45, 46].
  • the host RNase H degrades the RNA-DNA hybrid to expose the ssDNA (see Fig. 1A, right) [39].
  • retron-RTs can generate templates for homology-directed DNA repair in cells.
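The msd layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sequences are invented toy strings, the helper names `reverse_transcribe` and `build_msd` are hypothetical, and placing the donor between two conserved msd segments is an assumed simplification of the real ncRNA structure.

```python
# Hypothetical illustration only: sequences and function names are made up
# and do not come from the patent.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}  # RNA template base -> DNA base


def reverse_transcribe(rna_template: str) -> str:
    """Return the DNA strand copied from an RNA template, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna_template))


def build_msd(conserved_5p: str, donor_rna: str, conserved_3p: str) -> str:
    """Place a donor in the replaceable region between conserved msd segments (assumed layout)."""
    return conserved_5p + donor_rna + conserved_3p


# Toy donor: left homology arm + insert + right homology arm (all hypothetical).
left_arm, insert, right_arm = "GGAUCC", "AUGC", "CCUAGG"
donor = left_arm + insert + right_arm

msd = build_msd("GCGC", donor, "GCGC")
msdna = reverse_transcribe(msd)  # ssDNA that could serve as a repair template
print(msdna)
```

The point of the sketch is that the donor is written into the RNA template, so the single-stranded DNA that results is its reverse complement.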
  • DNA can be coupled to a nuclease such as Cas9, Cas12a, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), or similar to enable genetic editing, targeted genome mutations, and other gene engineering applications.
  • a retron editor system comprising: a) a retron, wherein said retron comprises non-coding RNA (ncRNA) and a nucleotide sequence encoding a reverse transcriptase (RT) further wherein the ncRNA comprises an msd sequence and a msr sequence; b) a nucleotide sequence encoding a guide RNA (gRNA); and c) a nucleotide sequence encoding a nuclease.
  • ncRNA: non-coding RNA
  • RT: reverse transcriptase
  • a vector comprising a) a retron, wherein said retron comprises non-coding RNA (ncRNA) and a nucleotide sequence encoding a reverse transcriptase (RT) further wherein the ncRNA comprises an msd sequence and a msr sequence; b) a nucleotide sequence encoding a guide RNA (gRNA); and c) a nucleotide sequence encoding a nuclease.
  • a method of using a retron editor system to edit target nucleic acid comprising: a) providing a retron editor system, wherein said system comprises: i) a nucleotide sequence encoding a guide RNA (gRNA); ii) a retron, wherein said retron comprises non-coding RNA (ncRNA) and a nucleotide sequence encoding a reverse transcriptase (RT) further wherein the ncRNA comprises an msd sequence and a msr sequence, wherein said msd sequence comprises a donor nucleic acid; and iii) a nucleotide sequence encoding a nuclease; b) expressing a product from the retron editor system; and c) placing the retron editor system under conditions such that gene editing of target nucleic acid takes place.
  • gRNA: guide RNA
  • Also disclosed is a method of modifying one or more target nucleic acids of interest at one or more target loci in a host cell comprising: a) transforming the host cell with a vector encoding a retron editing system; b) culturing the host cell or transformed progeny of the host cell under conditions sufficient for expressing a retron editor system from the vector; c) providing conditions suitable for the nuclease of the retron editor system to cut at or near the target loci; and d) providing conditions for the donor nucleic acid insertion sequence to recombine with the one or more target nucleic acid sequences to insert, delete, and/or substitute one or more bases of the sequence of the one or more target nucleic acid sequences to induce one or more sequence modifications at the one or more target loci.
  • a method of screening for functional retron editors comprising: a) providing a potential retron editor system; b) transforming a cell with the potential retron editor system, wherein said cell has been modified to express a signal upon successful transformation using the gene-editing retron system; and c) detecting the presence of the signal.
  • a retron editor system comprising: a) a first zinc finger nuclease (ZFN) or a ZFN nucleic acid sequence encoding a zinc finger nuclease, that binds a first area of a target region, wherein the first ZFN comprises a cleavage domain and a ZFN protein; b) a second ZFN or nucleic acid encoding a second ZFN that binds a second area of a target region, wherein the second ZFN comprises a cleavage domain and a second ZFN protein; wherein the first and second ZFN are capable of dimerization and cleavage of the target region; and c) a retron, wherein said retron comprises non-coding RNA (ncRNA) and a nucleotide sequence encoding a reverse transcriptase (RT) further wherein the ncRNA comprises an msd sequence and a msr sequence.
  • a retron editor system comprising: a) a first transcription activator-like effector nuclease (TALEN) or a nucleic acid encoding a first TALEN, wherein the first TALEN comprises a target region binding site and a nuclease; b) a second TALEN or a nucleic acid encoding a second TALEN, wherein the second TALEN comprises a target region binding site and a nuclease; and c) a retron, wherein said retron comprises non-coding RNA (ncRNA) and a nucleotide sequence encoding a reverse transcriptase (RT) further wherein the ncRNA comprises an msd sequence and a msr sequence.
  • Figure 1A-H shows that a metagenomic survey reveals highly active RTs in mammalian cells.
  • Figure 1A shows that retron RTs self-prime from a non-coding RNA, termed the msr-msd.
  • msr: gray.
  • msd stem: black.
  • variable region: black, in dotted box.
  • Arrow indicates the direction of reverse transcription.
  • Figure 1B shows a schematic of a retron editor. The RT is linked to Cas9 (shown) or another nuclease.
  • Figure 1C shows reverse transcription of the variable region of the msd generates a ssDNA template for homology-directed repair of the cleavage site.
  • Figure 1D shows a plasmid-encoded fluorescent reporter assay.
  • the RFP has a 9 bp deletion proximal to a Y64L mutation to completely turn off RFP fluorescence.
  • the reporter is co-transfected with a plasmid that encodes the retron editor, along with an msd that repairs the RFP.
  • RFP+ cells are imaged via confocal microscopy and quantified via flow cytometry.
  • Figure 1E shows confocal microscopy images of cells transfected with Cas9 + Eco1-RT (left), Cas9 + ssODN (middle), and Cas9-Mva1-RT (right).
  • Figure 1F shows phylogenetic classification of novel retron systems discovered from metagenomic sources.
  • Figure 1G shows a rank-ordered list of RFP repair efficiency with 98 metagenomically discovered retron-RTs using flow cytometry.
  • Dashed line: RFP+ repair with Eco1-RT.
  • Inset: flow cytometry data for Cas9 with a scrambled sgRNA (top left); RFP-targeting sgRNA (top right); RFP sgRNA and a ssODN repair template (bottom left); RFP sgRNA and Mva1-RT (bottom right).
  • Error bars: mean of three replicates. The three most active RTs are labeled.
  • Figure 1H shows gene editing activity of the six most active retron-RTs, along with Eco1-RT, with a cognate (diagonal) or non-cognate msr-msd. Flow cytometry was used to score activity with the transient RFP reporter. Mean of three replicates.
  • Figure 2A-B shows an overview of the bioinformatic retron discovery pipeline.
  • Figure 2A shows the pipeline involves five steps: 1: predict open reading frames (ORFs) with Prodigal [1]; 2: annotate reverse transcriptase (RT) genes using HMMER [2]; 3: identify putative msr-msd sequences in non-coding regions using cmfinder and infernal [3, 4]; 4: reannotate adjacent ORFs with HMMER; and 5: manually inspect msr-msd structures with ViennaRNA.
  • ORFs: open reading frames
  • Figure 2B shows the analysis was conducted on 2,068,918 reference genomes from the human gut microbiome [5], along with 15,574 bacterial and 531 archaeal genomes from the NCBI database [6]. After 95% de-duplication at the amino acid sequence level and annotation of msr-msds, 568 new candidate systems were identified.
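The 95% de-duplication step can be illustrated with a toy greedy filter. This is a sketch under loud assumptions: the text does not name the clustering tool or the identity measure that was used, so the position-wise `identity` function (no alignment) and the greedy `dedupe` helper below are hypothetical stand-ins, not the pipeline's actual method.

```python
# Toy stand-in for sequence de-duplication; NOT the tool used in the pipeline.
def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the longer length (no alignment)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longest


def dedupe(seqs, threshold: float = 0.95):
    """Keep a sequence only if it is below the identity threshold to every kept representative."""
    representatives = []
    for seq in sorted(seqs, key=len, reverse=True):  # longest first, as in greedy clustering
        if all(identity(seq, rep) < threshold for rep in representatives):
            representatives.append(seq)
    return representatives


# Hypothetical amino-acid-like strings: the second is 99% identical to the first.
candidates = ["A" * 100, "A" * 99 + "C", "A" * 50 + "C" * 50]
print(len(dedupe(candidates)))
```

Production tools align sequences before scoring identity; the sketch only shows why near-identical RT sequences collapse into one representative.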
  • Figure 3A-B shows a comparison of top RTs in transient and genomically-integrated RFP reporter.
  • Figure 3A shows confocal images of the indicated RTs, or Cas9 with a scrambled sgRNA. Scale bar: 100 µm; inset: 50 µm.
  • Figure 3B shows correlation between retron editing activity in a genomic vs. transient RFP reporter system. The genomic RFP reporter was integrated into the AAVS1 locus.
  • Figure 4A-B shows sequence identity and structural analysis of highly active retron-RTs.
  • Figure 4A shows sequence identity heatmap of RT's amino acid sequences and ncRNA sequences for retron candidates.
  • Figure 4B shows structure of a representative msr-msd transcript encoded by retron candidates. The structures are predicted by ViennaRNA. The msd region is highlighted by a dashed rectangle.
  • Figure 5A-E shows Efe1-RT catalyzes precise genomic insertions across multiple loci.
  • Figure 5A shows a schematic of the NGS library preparation strategy. Genomic DNA is first amplified with primers that are outside the homology arms to avoid amplifying the retron-synthesized msDNA. After gel extraction, a second round of PCR amplifies and barcodes the insertion site for deep sequencing. Blue, orange: universal Illumina P5/I5 and P7/I7 adapters and indices.
  • Figure 5B shows normalized insertion efficiency for the top 5 retron-RTs at the CFTR and EMX1 loci. Error bars: mean of three replicates.
  • Figure 5C shows the relative insertion frequency of a 10 nt cargo at the EMX1 locus, along with the four most frequent misincorporated sequences (shown, from top to bottom, are SEQ ID NOS: 125, 126, 127, 128, 129, and 130).
  • the most common errors are a deletion or insertion at the periphery of the homology arms. Error bars: mean of three replicates. Dots represent individual replicates.
  • Figure 5D shows Efe1-RT substitution errors (left) are less frequent than ssODN insertion (right) at the EMX1 locus. Substitution rates are computed from the insert in the EMX1 locus across three biological replicates.
  • Figure 5E shows schematic (top) and results (bottom) of the insertion efficiency with Efe1-RT as a function of the homology arm length. Templated insertion is most active with 50 nt homology arms at five genomic loci. Error bars indicate three replicates.
  • Figure 6A-G shows rational engineering of an Efe1-RT-based retron editor.
  • Figure 6A shows results of the effect of splitting the sgRNA and msr-msd (left), the identity of the nuclear localization sequences (NLSs), and the linker between the Cas9 and Efe1-RT (bottom).
  • Figure 6B shows that expressing the sgRNA and msr-msd separately increased gene editing by 60% relative to a fused sgRNA-msr-msd design.
  • Figure 6C shows optimization of the N- and C-terminal nuclear localization sequences (NLSs). Gray: reference design that was used for normalization. Error bars: mean of three replicates.
  • Figure 6D shows optimization of the linker peptide between the Cas9 and Efe1-RT. Retron editors tolerate a broad range of flexible (blue) and rigid (orange) linkers. Splitting the two enzymes via a ribosomal skipping peptide (T2A, light blue) also retains most activity. However, multimerization domains abrogated activity (green). Gray: reference design that was used for normalization. Error bars: mean of three replicates.
  • Figure 6E shows schematic (top) and editing activity of a Cas12a-based retron editor at five genomic loci. Error bars: mean of three replicates.
  • Figure 6F shows the relative insertion frequency of a 10 nt cargo at the BRD8 locus, along with the four most frequent misincorporated sequences.
  • Cas12a editors generate substitution errors in the insert.
  • Error bars: mean of three replicates (dots). Shown in order from top to bottom are SEQ ID NOS: 131-136, respectively.
  • Figure 6G shows Efe1-RT substitution errors at the BRD8 locus. Substitution rates are computed from the insert in the locus. Error bars indicate three replicates.
  • Figure 7 shows Cas12a-based editing outcomes at the indicated loci.
  • Figure 8A-G shows inhibiting non-homologous end joining boosts templated insertion.
  • Figure 8A shows schematics of two strategies that boost templated insertion. Left: Cas9 is fused to proteins that alter DNA repair pathway choice. Right: small-molecule inhibition of the DNA-dependent protein kinase catalytic subunit (DNA-PKcs) or CDC7.
  • Figure 8B shows the effect of inhibitors (left) and Cas9 fusions (right) on the relative rate of templated insertion at five loci.
  • AZD7648 (left, top) and Cas9-CtIP-dnRNF168 both have the strongest effect at all tested loci.
  • Open circles: editing with no inhibitor or DNA repair protein.
  • Figure 8C shows a schematic of experiments with 50 nt homology arms and increasing insert lengths at the EMX1 locus.
  • Figure 8D shows AZD7648 increases the insertion efficiency across all cargo sizes tested in this study. Error bars: mean of three replicates.
  • Figure 8E shows AZD7648 outperforms TAK-931 and M3814 in boosting insertion efficiency at EMX1 without increasing mutational signature or Cas9-generated indels. Error bars: mean of three replicates.
  • Figure 8F shows the insertion efficiency decreases for all Cas9-repair protein fusions at EMX1. Error bars: mean of three replicates.
  • Figure 8G shows Cas9 fused to CtIP-dnRNF168 increased insertion efficiency of a 10 nt cargo at EMX1 without increasing mutational signature or Cas9-generated indels compared to no DNA repair fusion, DN1S, and hGem1/110. Error bars indicate three replicates.
  • Figure 9A-E shows optimizing inhibitor and DNA repair fusions improves gene editing with nickase Cas9 (nCas9).
  • Figure 9A shows optimization of inhibitor concentrations for optimal retron editing at the EMX1 locus. Dashed line: gene editing without inhibitors.
  • Figure 9B shows the effect of combining DNA repair inhibitors with Cas9-DNA repair domain fusions. Combining the most active inhibitor, AZD7648, with Cas9-CtIP-dnRNF168 reduces overall insertion activity. In other cases, AZD7648 improves retron editing, likely due to the limited improvement observed with Cas9-DN1S and Cas9-hGem1/110 fusions.
  • Figure 9C shows an illustration of the Cas9(D10A) nickase-DNA repair fusions and a nickase-based retron editor.
  • Figure 9D shows fusing Cas9(D10A) with CtIP-dnRNF168, hRad51, and Rep-X helicase [7-9] increases the relative rates of templated repair at EMX1.
  • Open circles: editing with no repair factor fusion.
  • Closed circles: editing with the indicated repair factor fusion. All circles indicate the mean across three biological replicates.
  • Arrow: change in editing efficiency with the indicated Cas9-DNA repair protein fusion.
  • Figure 9E shows a breakdown of editing outcomes, as reported by NGS read counts. Bottom: retron editor-mediated insertion; middle: nCas9-only edits without any insert; top: unmodified reads. Error bars indicate three biological replicates.
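The three-way breakdown of NGS read counts described for Figure 9E (templated insertion, edits without the insert, unmodified reads) can be sketched as follows. This is a simplified illustration, assuming exact string matching against hypothetical reference and edited amplicons; real amplicon analyses align reads and tolerate sequencing errors, and the function names here are invented.

```python
from collections import Counter

# Hypothetical sketch: exact-match classification of amplicon reads.
CATEGORIES = ("insertion", "other_edit", "unmodified")


def classify_read(read: str, reference: str, edited: str) -> str:
    """Assign one read to a category by exact comparison with two expected amplicons."""
    if read == edited:
        return "insertion"      # carries the templated insert
    if read == reference:
        return "unmodified"     # identical to the unedited locus
    return "other_edit"         # anything else, e.g. an indel without the insert


def outcome_fractions(reads, reference: str, edited: str) -> dict:
    """Return the fraction of reads in each category."""
    counts = Counter(classify_read(r, reference, edited) for r in reads)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in CATEGORIES}
    return {name: counts.get(name, 0) / total for name in CATEGORIES}


# Toy amplicons (made up): a 4-nt insert placed in the middle of the reference.
reference = "ACGTACGT"
edited = "ACGTTTTTACGT"
reads = [edited, edited, reference, "ACGACGT"]
print(outcome_fractions(reads, reference, edited))
```

Reporting all three fractions together, rather than the insertion rate alone, makes it visible when a gain in insertion comes at the cost of more insert-free edits.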
  • Figure 10A-C shows in-frame precise epitope insertion with retron editors.
  • Figure 10A shows schematic of the split super-folder GFP system.
  • the 11th GFP β-strand (GFP11) is expressed as a fusion to the protein of interest (POI). Reconstitution of GFP11 with GFP1-10 restores fluorescence.
  • Figure 10B shows GFP1-10 is expressed from a genomically-integrated inducible promoter.
  • the POI-GFP11 is in its native genomic locus.
  • Figure 10C shows confocal imaging of three GFP11-protein fusions. Scale bars: 10 µm.
  • Figure 11A-G shows retron editor delivery via an all-RNA package.
  • Figure 11A shows an illustration of RNA-based retron editing. Cas9 and Efe1-RT are delivered as capped and poly-A tailed mRNAs. The sgRNA and msr-msd are mixed with the mRNAs in a 10:1 molar ratio prior to transfection into HEK293T cells.
  • Figure 11B shows insertion efficiency of a 10 nt insert at the indicated loci with RNA-based delivery. Error bars: mean of three replicates. Dots represent individual replicates.
  • Figure 11C shows the relative insertion frequency of a 10 nt cargo at the EMX1 locus, along with the four most frequent misincorporated sequences.
  • Figure 11D shows a schematic of the three substitutions that are introduced in kif6 by mRNA injection. The first mutation is silent but abolishes a BsaI cut site. Shown from top to bottom are SEQ ID NOS: 137 and 138.
  • Figure 11E shows deep sequencing of embryos 24 hr post RNA injection for the indicated conditions. Each point is a single embryo; p-values are determined by a one-way ANOVA test.
  • Figure 11F shows BsaI restriction enzyme digests of the edit site sub-cloned from WT, kif6A, and edited embryos.
  • Figure 11G shows that Sanger sequencing of the edited embryo from Figure 11E confirms precise editing at the three expected sites (triangles). The sequence shown is SEQ ID NO: 138.
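The 10:1 molar ratio mentioned for the all-RNA delivery in Figure 11A can be turned into amounts with a small calculation. The sketch below rests on several assumptions that are not in the source: the ratio is read as guide RNA to mRNA, the RNA lengths and the 0.5 pmol starting amount are invented, and the molecular weight uses the common approximation of about 320.5 g/mol per nucleotide plus 159 g/mol for single-stranded RNA.

```python
# Hypothetical helper names and example numbers; only the 10:1 ratio is from the text.
def rna_mass_ng(pmol: float, length_nt: int) -> float:
    """Approximate mass in ng of single-stranded RNA of a given length."""
    molecular_weight = length_nt * 320.5 + 159.0  # g/mol, rule-of-thumb estimate
    return pmol * molecular_weight / 1000.0       # pmol * g/mol = pg; divide by 1000 for ng


def guide_pmol(mrna_pmol: float, ratio: float = 10.0) -> float:
    """Picomoles of guide RNA for a given mRNA amount (assumed guide:mRNA ratio)."""
    return mrna_pmol * ratio


mrna_pmol = 0.5                          # made-up starting amount
guides = guide_pmol(mrna_pmol)           # picomoles of sgRNA (or msr-msd)
print(guides, rna_mass_ng(guides, 100))  # 100 nt is a hypothetical guide length
```

Because the guide RNAs are far shorter than the mRNAs, a tenfold molar excess corresponds to a much smaller difference in mass, which is why the ratio is best tracked in moles.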
  • Figure 12A-D shows a schematic for exemplary embodiments of retron editor cassettes.
  • Figure 12A shows a retron editing cassette comprising a nucleotide sequence encoding Cas9.
  • Figure 12B shows a retron editing cassette comprising a nucleotide sequence encoding Cas12a.
  • Figure 12C shows a retron editing cassette comprising a nucleotide sequence encoding zinc finger nuclease (ZFN).
  • Figure 12D shows a retron editing cassette comprising a nucleotide sequence encoding TALENs.
  • "retron” can refer to the reverse transcriptase only.
  • Figure 13 shows the structure of Retron Editor 2.0. It includes a nuclear localization signal (NLS), a nuclease, a linker, and sgRNA and msr-msd extension and expression, as well as msr-msd structure/codon juggling.
  • NLS: nuclear localization signal
  • Figure 14A-B shows various linker designs that were experimentally tested.
  • Figure 14A shows flexible linkers (5 total) and rigid linkers (6 total).
  • Figure 14B shows dimerization/multimerization polyvalent linkers (5 total).
  • Figure 15 shows linker results for NRT-49 with a split ncRNA.
  • Figure 16 shows the designs of a series of NLS sequences that were tested experimentally.
  • Figure 17 shows an editing summary of NRT-49. Shown are SEQ ID NO: 79 and SEQ ID NO: 80.
  • Figure 18A-C shows the use of nickase along with the retron editor.
  • Figure 18A is a schematic.
  • Figure 18B shows proteins which increase the relative rates of templated repair.
  • Figure 18C shows the percentage of editing events using various proteins.
  • Figure 19 shows results from DNA repair fusion domains and Cas12a.
  • the term "about” in relation to a reference numerical value can include a range of values plus or minus 10% from that value.
  • the amount “about 10” includes amounts from 9 to 11, including the reference numbers of 9, 10, and 11.
  • the term “about” in relation to a reference numerical value can also include a range of values plus or minus 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% from that value.
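The definition of "about" above amounts to a symmetric tolerance window around the reference value. A minimal sketch, with hypothetical function names and the 10% default taken from the first definition:

```python
# Illustrative helpers; the names are not from the patent.
def about_range(reference: float, tolerance: float = 0.10):
    """Return the (low, high) bounds covered by 'about reference'."""
    delta = abs(reference) * tolerance
    return (reference - delta, reference + delta)


def is_about(value: float, reference: float, tolerance: float = 0.10) -> bool:
    """True if value lies within the tolerance window, bounds included."""
    low, high = about_range(reference, tolerance)
    return low <= value <= high


print(about_range(10))         # the "about 10" example from the text
print(is_about(9.6, 10, 0.05)) # narrower 5% window
```

Passing a smaller `tolerance` (for example 0.05 or 0.01) models the narrower ranges listed in the second definition.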
  • the terms "5'" and "3'" denote the positions of elements or features relative to the overall arrangement of the retron-guide RNA cassettes, vectors, or retron donor DNA-guide molecules of the present invention in which they are included. Positions are not, unless otherwise specified, referred to in the context of the orientation of a particular element or features.
  • the term “upstream” refers to a position that is 5' of a point of reference.
  • the term “downstream” refers to a position that is 3' of a point of reference.
  • the term "gene editing” or “genome editing” refers to a type of genetic engineering in which DNA is inserted, replaced, or removed from a target DNA (e.g., the genome of a cell) using one or more nucleases and/or nickases.
  • the nucleases create specific double-strand breaks (DSBs) at desired locations in the genome, and harness the cell's endogenous mechanisms to repair the induced break by homology-directed repair (HDR) (e.g., homologous recombination) or by nonhomologous end joining (NHEJ).
  • HDR: homology-directed repair
  • NHEJ: nonhomologous end joining
  • two nickases can be used to create two single-strand breaks on opposite strands of a target DNA, thereby generating a blunt or a sticky end.
  • Any suitable DNA nuclease can be introduced into a cell to induce genome editing of a target DNA sequence.
  • a nickase can be used in place of a nuclease in the retron editors described herein.
  • the term "programmable nuclease" refers to an enzyme capable of cleaving the phosphodiester bonds between the nucleotide subunits of DNA, and may be an endonuclease or an exonuclease.
  • the programmable nuclease may be engineered so that it can be used to induce gene editing of a target nucleic acid sequence.
  • Any suitable nuclease can be used including, but not limited to, CRISPR-associated protein (Cas) nucleases, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), or other endo- or exo-nucleases, variants thereof, fragments thereof, and combinations thereof.
  • Cas: CRISPR-associated protein
  • ZFNs: zinc finger nucleases
  • TALENs: transcription activator-like effector nucleases
  • double-strand break or "DSB” or “double-strand cut” refers to the severing or cleavage of both strands of the DNA double helix.
  • the DSB may result in cleavage of both strands at the same position, leading to "blunt ends", or in staggered cleavage resulting in a region of single-stranded DNA at the end of each DNA fragment, or "sticky ends".
  • a DSB may arise from the action of one or more DNA nucleases.
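The distinction between blunt and sticky ends comes down to whether the two strands are cut at the same position. A minimal sketch, assuming both cut sites are given as integer positions on one shared coordinate axis (the function name and coordinate convention are hypothetical):

```python
# Illustrative only: positions are indices on a common coordinate axis.
def classify_break(top_cut: int, bottom_cut: int) -> str:
    """Describe the ends left by cutting the top and bottom strands at the given positions."""
    offset = abs(top_cut - bottom_cut)
    if offset == 0:
        return "blunt ends"
    return f"sticky ends with a {offset}-nt single-stranded overhang"


print(classify_break(17, 17))  # both strands cut at the same position
print(classify_break(18, 23))  # staggered cuts, e.g. from two offset nicks
```

The same function covers the paired-nickase case described earlier: two single-strand breaks on opposite strands give a blunt end when aligned and a sticky end when offset.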
  • nonhomologous end joining refers to a pathway that repairs double-strand DNA breaks in which the break ends are directly ligated without the need for a homologous template.
  • nucleic acid refers to deoxyribonucleic acids (DNA), ribonucleic acids (RNA) and polymers thereof in either single- , double- or multi-stranded form.
  • the term includes, but is not limited to, single-, double- or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and/or pyrimidine bases or other natural, chemically modified, biochemically modified, non-natural, synthetic or derivatized nucleotide bases.
  • a nucleic acid can comprise a mixture of DNA, RNA and analogs thereof.
  • the term encompasses nucleic acids containing known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, single nucleotide polymorphisms (SNPs), and complementary sequences as well as the sequence explicitly indicated.
  • degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); and Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)).
  • single nucleotide polymorphism refers to a change of a single nucleotide within a polynucleotide, including within an allele. This can include the replacement of one nucleotide by another, as well as the deletion or insertion of a single nucleotide. Most typically, SNPs are biallelic markers although tri- and tetra-allelic markers can also exist.
  • a nucleic acid molecule comprising SNP A/C may include a C or A at the polymorphic position.
  • the term "gene” means the segment of DNA involved in producing a polypeptide chain.
  • the DNA segment may include regions preceding and following the coding region (leader and trailer) involved in the transcription/translation of the gene product and the regulation of the transcription/translation, as well as intervening sequences (introns) between individual coding segments (exons).
  • a heterologous component (e.g., polynucleotide, polypeptide, other molecule, or cell)
  • a “host cell” refers to an in vivo or in vitro eukaryotic cell, prokaryotic cell (e.g., bacterial or archaeal cell), or cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, into which a heterologous polynucleotide or polypeptide has been introduced.
  • the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, an insect cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.
  • the cell is in vitro.
  • the cell is in vivo.
  • the term "recombinant" refers to an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis, or manipulation of isolated segments of nucleic acids by genetic engineering techniques.
  • Plasmid refers to a linear or circular extrachromosomal element often carrying genes that are not part of the central metabolism of the cell, and usually in the form of double-stranded DNA.
  • Such elements may be autonomously replicating sequences, genome integrating sequences, phage, or nucleotide sequences, in linear or circular form, of a single- or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a polynucleotide of interest into a cell.
  • Transformation cassette refers to a specific vector comprising a gene and having elements in addition to the gene that facilitate transformation of a particular host cell.
  • Expression cassette refers to a specific vector comprising a gene and having elements in addition to the gene that allow for expression of that gene in a host.
  • a recombinant DNA construct comprises an artificial combination of nucleic acid sequences, e.g., regulatory and coding sequences that are not all found together in nature.
  • a recombinant DNA construct may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source but arranged in a manner different than that found in nature.
  • Such a construct may be used by itself or may be used in conjunction with a vector.
  • If a vector is used, then the choice of vector is dependent upon the method that will be used to introduce the vector into the host cells, as is well known to those skilled in the art.
  • a plasmid vector can be used.
  • the skilled artisan is well aware of the genetic elements that must be present on the vector in order to successfully transform, select and propagate host cells.
  • the skilled artisan will also recognize that different independent transformation events may result in different levels and patterns of expression (Jones et al. , (1985) EMBO J 4:2411-2418; De Almeida et al. , (1989 )Mol Gen Genetics 218:78-86), and thus that multiple events are typically screened in order to obtain lines displaying the desired expression level and pattern.
  • Such screening may be accomplished by standard molecular biological, biochemical, and other assays including Southern analysis of DNA, Northern analysis of mRNA expression, PCR, real time quantitative PCR (qPCR), reverse transcription PCR (RT-PCR), immunoblotting analysis of protein expression, enzyme or activity assays, and/or phenotypic analysis.
  • heterologous refers to the difference between the original environment, location, or composition of a particular polynucleotide or polypeptide sequence and its current environment, location, or composition.
  • heterologous in reference to a sequence can refer to a sequence that originates from a different species, variety, foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention.
  • a promoter operably linked to a heterologous polynucleotide is from a species different from the species from which the polynucleotide was derived, or, if from the same/analogous species, one or both are substantially modified from their original form and/or genomic locus, or the promoter is not the native promoter for the operably linked polynucleotide.
  • expression refers to the production of a functional end-product (e.g., an mRNA, guide RNA, or a protein) in either precursor or mature form.
  • a "mature" protein refers to a post-translationally processed polypeptide (i.e., one from which any pre- or propeptides present in the primary translation product have been removed).
  • Precursor protein refers to the primary product of translation of mRNA (i.e., with pre- and propeptides still present). Pre- and propeptides may be but are not limited to intracellular localization signals.
  • operably linked refers to two or more genetic elements, such as a polynucleotide coding sequence and a promoter, placed in relative positions that permit the proper biological functioning of the elements, such as the promoter directing transcription of the coding sequence.
  • inducible promoter refers to a promoter that responds to environmental factors and/or external stimuli that can be artificially controlled in order to modify the expression of, or the level of expression of, a polynucleotide sequence or refers to a combination of elements, for example an exogenous promoter and an additional element such as a trans-activator operably linked to a separate promoter.
  • An inducible promoter may respond to abiotic factors such as oxygen levels or to chemical or biological molecules. In some embodiments, the chemical or biological molecules may be molecules not naturally present in humans.
  • vector and "expression vector” refer to a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular polynucleotide sequence in a host cell.
  • An expression vector may be part of a plasmid, viral genome, or nucleic acid fragment.
  • an expression vector includes a polynucleotide to be transcribed, operably linked to a promoter.
  • promoter is used herein to refer to an array of nucleic acid control sequences that direct transcription of a nucleic acid.
  • a promoter includes necessary nucleic acid sequences near the start site of transcription, such as, in the case of a polymerase II type promoter, a TATA element.
  • a promoter also optionally includes distal enhancer or repressor elements, which can be located as much as several thousand base pairs from the start site of transcription.
  • Other elements that may be present in an expression vector include those that enhance transcription (e.g., enhancers) and terminate transcription (e.g., terminators).
  • reporter and “selectable marker” can be used interchangeably and refer to a gene product that permits a cell expressing that gene product to be identified and/or isolated from a mixed population of cells. Such isolation might be achieved through the selective killing of cells not expressing the selectable marker, which may be, as a non-limiting example, an antibiotic resistance gene.
  • the selectable marker may permit identification and/or subsequent isolation of cells expressing the marker as a result of the expression of a fluorescent protein such as GFP or the expression of a cell surface marker which permits isolation of cells by fluorescence-activated cell sorting (FACS), magnetic-activated cell sorting (MACS), or analogous methods.
  • Suitable cell surface markers include CD8, CD19, and truncated CD19.
  • cell surface markers used for isolating desired cells are non-signaling molecules, such as subunit or truncated forms of CD8, CD19, or CD20. Suitable markers and techniques are known in the art. Also described herein is “traffic light reporting" (Kawalpreet K Aneja.
  • culture when referring to cell culture itself or the process of culturing, can be used interchangeably to mean that a cell (e.g., human cell) is maintained outside its normal environment under controlled conditions, e.g., under conditions suitable for survival.
  • Cultured cells are allowed to survive, and culturing can result in cell growth, stasis, differentiation or division. The term does not imply that all cells in the culture survive, grow, or divide, as some may naturally die or senesce.
  • Cells are typically cultured in media, which can be changed during the course of the culture.
  • subject means a vertebrate, preferably a mammal, more preferably a human.
  • Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.
  • administering includes oral administration, topical contact, administration as a suppository, intravenous, intraperitoneal, intramuscular, intralesional, intrathecal, intranasal, or subcutaneous administration to a subject. Administration is by any route, including parenteral and transmucosal (e.g., buccal, sublingual, palatal, gingival, nasal, vaginal, rectal, or transdermal). Parenteral administration includes, e.g., intravenous, intramuscular, intra-arteriole, intradermal, subcutaneous, intraperitoneal, intraventricular, and intracranial. Other modes of delivery include, but are not limited to, the use of liposomal formulations, intravenous infusion, transdermal patches, etc.
  • treating refers to an approach for obtaining beneficial or desired results including, but not limited to, a therapeutic benefit and/or a prophylactic benefit.
  • therapeutic benefit is meant any therapeutically relevant improvement in or effect on one or more diseases, conditions, or symptoms under treatment.
  • the compositions may be administered to a subject at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, even though the disease, condition, or symptom may not have yet been manifested.
  • the term "effective amount” or “sufficient amount” refers to the amount of an agent that is sufficient to effect beneficial or desired results.
  • the therapeutically effective amount may vary depending upon one or more of: the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of administration and the like, which can readily be determined by one of ordinary skill in the art.
  • the specific amount may vary depending on one or more of: the particular agent chosen, the host cell type, the location of the host cell in the subject, the dosing regimen to be followed, whether it is administered in combination with other compounds, timing of administration, and the physical delivery system in which it is carried.
  • pharmaceutically acceptable carrier refers to a substance that aids the administration of an active agent to a cell, an organism, or a subject.
  • “Pharmaceutically acceptable carrier” refers to a carrier or excipient that can be included in the compositions of the invention and that causes no significant adverse toxicological effect on the patient.
  • Non-limiting examples of pharmaceutically acceptable carriers include water, NaCl, normal saline solutions, lactated Ringer's, normal sucrose, normal glucose, cell culture media, and the like.
  • cellular localization tag refers to an amino acid sequence, also known as a “protein localization signal,” that targets a protein for localization to a specific cellular or subcellular region, compartment, or organelle (e.g., nuclear localization sequence, Golgi retention signal).
  • Cellular localization tags are typically located at either the N-terminal or C-terminal end of a protein. For more information regarding cellular localization tags, see, e.g., Negi, et al. Database (Oxford). 2015: bav003 (2015); incorporated herein by reference in its entirety for all purposes.
  • Percent similarity in the context of polynucleotide or peptide sequences, is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the sequence (e.g., an msr locus sequence) in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence which does not comprise additions or deletions, for optimal alignment of the two sequences.
  • the percentage is calculated by determining the number of positions at which the identical nucleotide or amino acid occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of similarity (e.g., sequence similarity).
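The calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `percent_similarity`, the use of `-` as the gap character, and the assumption that the two sequences have already been optimally aligned to equal length are all choices made for this example and are not taken from the disclosure.

```python
def percent_similarity(aligned_ref: str, aligned_test: str, gap: str = "-") -> float:
    """Percent of comparison-window positions carrying the same residue.

    Both inputs are assumed to be pre-aligned (gaps already inserted), so
    they must have the same length. The denominator is the total number of
    positions in the window, as in the definition above.
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_ref:
        raise ValueError("comparison window is empty")
    matches = sum(
        1
        for a, b in zip(aligned_ref.upper(), aligned_test.upper())
        if a == b and a != gap  # a gap never counts as a matched position
    )
    return 100.0 * matches / len(aligned_ref)


if __name__ == "__main__":
    # Toy 10-position window: 8 identical positions, 1 gap, 1 mismatch.
    print(percent_similarity("ACGT-ACGTA", "ACGTTACGAA"))
```

With the toy window shown, 8 of 10 positions match, so the function returns 80.0, which would meet the "at least about 80%" threshold discussed below.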
  • a polynucleotide or peptide has at least about 80% similarity (e.g., sequence similarity), preferably at least about 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% similarity, to a reference sequence, when compared and aligned for maximum correspondence over a comparison window, or designated region as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection, such sequences are then said to be "substantially similar.”
  • this definition also refers to the complement of a test sequence.
  • For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared.
  • test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated.
  • The sequence comparison algorithm then calculates the percent sequence similarities for the test sequences relative to the reference sequence, based on the program parameters. For sequence comparison of nucleic acids and proteins, the BLAST and BLAST 2.0 algorithms and the default parameters discussed below are used.
  • BLAST and BLAST 2.0 algorithms are described in Altschul et al. (1990) J. Mol. Biol. 215: 403-410 and Altschul et al. (1997) Nucleic Acids Res. 25: 3389-3402, respectively.
  • Software for performing BLAST analyses is publicly available at the National Center for Biotechnology Information website, ncbi.nlm.nih.gov.
  • the algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence.
  • T is referred to as the neighborhood word score threshold (Altschul et al., supra).
  • These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them.
  • the word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0).
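The seed-and-extend step described above can be illustrated with a small ungapped extension routine. This is a simplified sketch and not the BLAST implementation: the function names, the `x_drop` stopping rule, and the default values for the reward (M) and penalty (N) are assumptions chosen for illustration, not program defaults.

```python
def _extend(pairs, match, mismatch, x_drop):
    """Walk residue pairs outward from the seed.

    Returns (best cumulative gain, number of residues kept to reach it).
    """
    best, best_len, running = 0, 0, 0
    for length, (a, b) in enumerate(pairs, start=1):
        running += match if a == b else mismatch
        if running > best:
            best, best_len = running, length
        elif best - running > x_drop:
            break  # score has fallen too far below the best seen so far
    return best, best_len


def extend_hit(query, subject, q_start, s_start, word_len,
               match=1, mismatch=-2, x_drop=5):
    """Extend an exact word hit in both directions without gaps.

    match plays the role of M (>0) and mismatch the role of N (<0).
    Returns (score, q_begin, q_end), with q_end exclusive, for the
    best-scoring segment of the query that contains the seed word.
    """
    right = zip(query[q_start + word_len:], subject[s_start + word_len:])
    left = zip(reversed(query[:q_start]), reversed(subject[:s_start]))
    gain_r, len_r = _extend(right, match, mismatch, x_drop)
    gain_l, len_l = _extend(left, match, mismatch, x_drop)
    score = match * word_len + gain_l + gain_r  # seed assumed to match exactly
    return score, q_start - len_l, q_start + word_len + len_r


if __name__ == "__main__":
    # Seed word "GTA" found at position 4 in both toy sequences.
    print(extend_hit("TTACGTACGGA", "GGACGTACGCC", 4, 4, 3))
```

For the toy sequences shown, the seed grows by two matching residues on each side before mismatches stop improving the score, giving a segment covering query positions 2 through 8 with a score of 7.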
  • the BLASTP program uses as defaults a word size (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see, e.g., Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)).
  • the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul, Proc. Nat'l. Acad. Sci. USA, 90:5873-5877 (1993)).
  • One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
  • a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001.
  • Binding refers to a sequence-specific, non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid). Not all components of a binding interaction need be sequence-specific (e.g., contacts with phosphate residues in a DNA backbone), as long as the interaction as a whole is sequence-specific. Such interactions are generally characterized by a dissociation constant (Kd) of 10⁻⁶ M⁻¹ or lower. "Affinity” refers to the strength of binding: increased binding affinity being correlated with a lower Kd.
  • a "binding protein” is a protein that is able to bind non-covalently to another molecule.
  • a binding protein can bind to, for example, a DNA molecule (a DNA-binding protein), an RNA molecule (an RNA-binding protein) and/or a protein molecule (a protein-binding protein).
  • In the case of a protein-binding protein, it can bind to itself (to form homodimers, homotrimers, etc.) and/or it can bind to one or more molecules of a different protein or proteins.
  • a binding protein can have more than one type of binding activity.
  • a "zinc-finger DNA binding protein” (or binding domain) is a protein, or a domain within a larger protein, that binds DNA in a sequence-specific manner through one or more zinc-fingers, which are regions of amino acid sequence within the binding domain whose structure is stabilized through coordination of a zinc ion.
  • a zinc-finger DNA binding protein that is fused to a nuclease (i.e., the nuclease domain of FokI) is referred to as a zinc-finger nuclease (ZFN).
  • TALEN or "TALE-nucleases” refers to an endonuclease comprising a DNA-binding domain, which in one embodiment comprises 14-20 or 16-22 TAL domain repeats, which can be fused to any portion of the FokI nuclease domain.
  • WO2011072246 herein incorporated by reference in its entirety.
  • the retron editor systems disclosed herein comprise two main components: the nuclease component and the retron component. Together, these components can edit nucleic acids and insert a donor sequence.
  • the retron component delivers the donor nucleic acid, and the nuclease component cleaves the target nucleic acid, allowing for insertion of a donor nucleic acid.
  • Fig. 1B shows the mechanism by which the retron produces a reverse transcribed single-stranded DNA (ssDNA), which can be inserted into a gene of interest by using a programmable nuclease.
  • a retron is a distinct nucleotide sequence found in the genome of many bacteria. These naturally occurring retrons can be engineered to produce single stranded DNA with a donor nucleic acid inserted therein.
  • the retrons disclosed herein comprise the elements needed to produce an edited gene. This includes, but is not limited to, non-coding RNA (ncRNA) and a nucleotide sequence encoding a reverse transcriptase (RT).
  • ncRNA comprises an msd sequence and an msr sequence.
  • the msr is the immediate precursor to the synthesis of msDNA.
  • the retron msr RNA folds into a characteristic secondary structure that contains a conserved guanosine residue at the end of a stem loop.
  • Synthesis of DNA by the retron-encoded reverse transcriptase (RT) results in a DNA/RNA chimera which is composed of small single-stranded DNA linked to small single-stranded RNA. This is referred to herein as ncDNA.
  • the RNA strand is joined to the 5' end of the DNA chain via a 2'-5' phosphodiester linkage that occurs from the 2' position of the conserved internal guanosine residue.
  • a description of retrons and how they function can be found, for example, in Simon et al. (2019), herein incorporated by reference in its entirety for its teaching concerning retrons.
  • the nuclease component of the retron editor system can comprise the components needed to edit a gene.
  • the nuclease component is referred to herein as a "programmable nuclease," meaning that it can be programmed to target a nucleic acid of interest.
  • This programmable nuclease can include, for example, a nucleotide sequence encoding a nuclease, as well as a guide RNA (gRNA) sequence.
  • the programmable nuclease component can be any system known in the art which can allow restriction of the target nucleic acid to occur, as well as guidance of the nuclease to the proper location within the target nucleic acid.
  • These programmable nuclease components include, but are not limited to, Cas9, Cas12a, Cas12f, TnpB, TALEN, and ZFN systems, as well as any combination thereof.
  • the programmable nuclease component of the retron editor system disclosed herein can comprise those components needed to carry out cleavage of the target. This can include, at a minimum, a guide RNA and a nuclease. Those specific components which are needed for specific gene editing systems are known in the art and are discussed in more detail below.
  • Fig. 1B shows a general schematic of how the nuclease component creates a double strand break in DNA, which can then be "repaired” using retron generated single stranded DNA with an inserted donor nucleic acid sequence.
  • This inserted donor nucleic acid sequence can originate from the msd portion of an engineered retron and is described in further detail herein.
  • the retron editor systems disclosed herein can comprise linkers and nuclear localization sequences (NLS) as well. These are described in detail below.
  • Retron Editor Cassettes. The components of a retron editor system can be encoded in one or more retron editor cassettes, which comprise nucleic acid encoding not only the retron and nuclease components, but also any other elements or components needed to form a fully functional cassette or cassettes.
  • every element needed to form a retron editor system can be found in the same cassette.
  • various elements of the retron editor system can be in different cassettes.
  • each of the following can be in a separate cassette: the retron, the nucleotide sequence encoding the gRNA, and the nucleotide sequence encoding the nuclease.
  • the elements within the retron itself can also be in separate cassettes.
  • the reverse transcriptase (RT) can be found in a separate cassette from the ncRNA.
  • the RT can be coupled to the gRNA, for example. Examples of such cassettes can be found in Figs. 1 and 2.
  • When this cassette or these cassettes are introduced into a cell, such as in the form of a vector, the cassette(s) are capable of producing a fully functional retron editor system, which can then edit nucleic acids found in the cell.
  • retrons can produce intracellular DNA at high concentration in different hosts, including mammalian cells.
  • various components of the retron editor system can be integrated into the host cell genome and need not be present in a vector.
  • the RT component can be encoded in a sequence that has been integrated into the host cell genome.
  • retrons use reverse transcriptase to form a multicopy single-stranded DNA (msDNA), which is a molecule comprising a single-stranded DNA that is branched out from an internal nucleic acid of an RNA molecule (msdRNA) via a 2',5'-phosphodiester linkage.
  • a retron comprises the components necessary to form this msDNA: 1) a nucleotide sequence encoding reverse transcriptase (RT); and 2) a nucleotide sequence encoding non-coding RNA.
  • This non-coding RNA comprises two segments, an msr sequence and an msd sequence.
  • the msr sequence is recognized by the RT, and the msd sequence is reverse transcribed by the RT, and can comprise donor nucleic acids. Prior to reverse transcription, the msr and msd form a single highly structured transcript (Simon et al. 2019). Reverse transcription requires both an msr-msd sequence and its cognate RT.
  • the msd region of a retron transcript typically codes for the DNA component of msDNA, and the msr region is the RNA component of msDNA.
  • the msr and msd loci have overlapping ends, and may be oriented opposite one another with a promoter located upstream of the msr locus which transcribes through the msr and msd loci (Fig. 1).
  • the msd sequence can be modified to include a donor nucleic acid, which can be introduced into a target nucleic acid sequence via the nuclease component and gRNA.
  • the msd and msr regions of retron transcripts generally contain first and second inverted repeat sequences, which together make up a stable stem structure (Simon et al. 2019).
  • the combined msr-msd region of the retron transcript serves not only as a template for reverse transcription but, by virtue of its secondary structure, also serves as a primer (i.e., self-priming) for msDNA synthesis by a reverse transcriptase.
  • the first inverted repeat sequence coding region is located within the 5' end of the msr locus.
  • the second inverted repeat sequence coding region is located 3' of the msd locus.
  • the first inverted repeat sequence is located within the 5' end of the msr region.
  • the second inverted repeat sequence is located 3' of the msd region.
  • SEQ ID NOS: 1-22 represent the protein sequence of the transcribed retrons
  • SEQ ID NOS: 23-44 represent nucleic acids of the associated msr-msd sequences.
  • While in its native form, specific proteins are associated with specific msr-msd sequences (such as SEQ ID NO: 1 and 23, SEQ ID NO: 2 and 24, etc.), it is noted that the protein and the msr-msd sequence can be interchangeable in the case of highly homologous retrons, so, for example, it is contemplated herein that the protein represented by SEQ ID NO: 1 can be used with any of the msr-msd sequences found in SEQ ID NOS: 23-44. Likewise, SEQ ID NO: 2 can be used with any of SEQ ID NOS: 23-44, and so on for all of SEQ ID NOS: 1-22, as each can be used with any of the msr-msd sequences disclosed herein.
  • contemplated herein are not only the exact sequences of SEQ ID NOS: 1-22, but proteins with 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% homology to any of SEQ ID NOS: 1-22.
  • nucleic acid msr-msd sequences with 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% homology to any of SEQ ID NOS: 23-44.
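The pairing scheme described above (each protein of SEQ ID NOS: 1-22 usable with any msr-msd sequence of SEQ ID NOS: 23-44) can be enumerated as a simple cross product. The sketch below is only a bookkeeping illustration: the function names are hypothetical, and the native pairing rule (protein n with msr-msd n + 22) is an assumption extrapolated from the two examples given in the text (1 with 23, 2 with 24).

```python
from itertools import product

PROTEIN_IDS = range(1, 23)    # SEQ ID NOS: 1-22
MSR_MSD_IDS = range(23, 45)   # SEQ ID NOS: 23-44


def native_pairs():
    """Assumed native pairing: protein n with msr-msd sequence n + 22."""
    return [(p, p + 22) for p in PROTEIN_IDS]


def all_pairs():
    """Every protein combined with every msr-msd sequence."""
    return list(product(PROTEIN_IDS, MSR_MSD_IDS))


if __name__ == "__main__":
    print(len(native_pairs()), "native pairs")
    print(len(all_pairs()), "contemplated pairings in total")
```

Under these assumptions there are 22 native pairs within 22 × 22 = 484 possible protein and msr-msd combinations, before any sequence variants at 80% to 100% homology are considered.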
  • retrons may be used in alternative embodiments of the present invention.
  • the nucleotide sequence of a native retron may be modified, for example using known codon optimization techniques, so that expression within the desired host is optimized.
  • codon optimization it is meant the selection of appropriate DNA nucleotides for the synthesis of oligonucleotide building blocks, and their subsequent enzymatic assembly, of a structural gene or fragment thereof in order to approach codon usage within the host.
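The codon-selection part of this definition can be sketched as a synonymous recoding step. The example below is a minimal illustration and covers only the choice of codons, not oligonucleotide synthesis or enzymatic assembly. The function names and the `preferred` mapping are hypothetical: a real workflow would take preferred codons from a measured codon usage table for the intended host, whereas the two entries used here are placeholders.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence; '*' marks a stop codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: dict) -> str:
    """Swap each codon for the host-preferred synonym, when one is given.

    preferred maps a one-letter amino acid code to a codon. Codons whose
    amino acid has no entry are left unchanged.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        new_codon = preferred.get(amino_acid, codon)
        if CODON_TABLE[new_codon] != amino_acid:
            raise ValueError(f"{new_codon} does not encode {amino_acid}")
        out.append(new_codon)
    return "".join(out)


if __name__ == "__main__":
    placeholder_preferences = {"L": "CTG", "K": "AAG"}  # illustrative only
    original = "ATGTTAAAATAA"
    optimized = recode(original, placeholder_preferences)
    print(optimized, translate(original) == translate(optimized))
```

The check at the end confirms the property that matters for this kind of modification: the nucleotide sequence changes while the encoded protein sequence does not.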
  • Cas endonucleases unwind the DNA duplex at the target sequence and optionally cleave at least one DNA strand, as mediated by recognition of the target sequence by a polynucleotide (such as, but not limited to, a guide RNA (gRNA), which can comprise a CRISPR RNA (crRNA) or an sgRNA) that is in complex with the Cas effector protein.
  • PAM sequences are described in more detail below.
  • a Cas endonuclease herein may lack DNA cleavage or nicking activity but can still specifically bind to a DNA target sequence when complexed with a suitable RNA component.
  • Cas endonucleases may occur as individual effectors (Class 2 CRISPR systems) or as part of larger effector complexes (Class 1 CRISPR systems).
  • Cas endonucleases that have been described include, but are not limited to, for example: Cas3 (a feature of Class 1 type I systems), Cas9 (a feature of Class 2 type II systems) and Cas12-family enzymes (e.g., Cpf1) (a feature of Class 2 type V systems).
  • Cas3 (and its variants Cas3' and Cas3") functions as a single-stranded DNA nuclease (HD domain) and an ATP-dependent helicase.
  • a variant of the Cas3 endonuclease can be obtained by disabling the functional activity of one or both domains of the Cas3 endonuclease polypeptide.
  • Disabling the ATP-dependent helicase activity can convert the cleavage-ready Cascade comprising the modified Cas3 endonuclease into a nickase (as the HD domain is still functional).
  • Disabling the HD endonuclease activity, which can be accomplished by any method known in the art, such as but not limited to mutagenesis of critical residues of the HD domain, can convert the cleavage-ready Cascade comprising the modified Cas3 endonuclease into a helicase.
  • Disabling both the Cas3 helicase and Cas3 HD endonuclease activities, which can be accomplished by any method known in the art, such as but not limited to mutagenesis of critical residues of both the helicase and HD domains, can convert the cleavage-ready Cascade comprising the modified Cas3 endonuclease into a binder protein that binds to a target sequence.
  • Cas9 (formerly referred to as Cas5, Csn1, or Csx12) is a Cas endonuclease that forms a complex with a crRNA and a tracrRNA, or with a single guide polynucleotide, for specifically recognizing and cleaving all or part of a DNA target sequence.
  • Some Cas9 endonucleases recognize a 3' GC-rich PAM sequence on the target dsDNA, while other Cas9 endonucleases recognize other PAM sequences.
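As an illustration of PAM-dependent target recognition, the sketch below scans one strand of a DNA sequence for candidate sites consisting of a protospacer followed directly by a 3' PAM. The NGG motif (commonly associated with Streptococcus pyogenes Cas9), the 20-nucleotide protospacer length, and the function name are assumptions for this example and are not specified in the text above. Other Cas9 endonucleases would need a different motif, and the reverse strand is not searched.

```python
def find_ngg_sites(dna: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for each forward-strand NGG site.

    start is the 0-based index of the first protospacer nucleotide. The
    PAM is the three nucleotides immediately 3' of the protospacer.
    """
    dna = dna.upper()
    sites = []
    # Last valid start leaves room for the protospacer plus a 3-nt PAM.
    for i in range(len(dna) - protospacer_len - 2):
        pam = dna[i + protospacer_len:i + protospacer_len + 3]
        if pam[1:] == "GG":  # N can be any nucleotide
            sites.append((i, dna[i:i + protospacer_len], pam))
    return sites


if __name__ == "__main__":
    toy_target = "A" * 20 + "TGG" + "CC"  # made-up sequence, one site expected
    for start, protospacer, pam in find_ngg_sites(toy_target):
        print(start, protospacer, pam)
```

A guide RNA spacer would then be designed to match a reported protospacer. The PAM itself is read from the target DNA and is not part of the guide.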
  • a Cas9 protein comprises a RuvC nuclease with an HNH (H-N-H) nuclease adjacent to the RuvC-II domain.
  • the RuvC nuclease and HNH nuclease each can cleave a single DNA strand at a target sequence (the concerted action of both domains leads to DNA double-strand cleavage, whereas activity of one domain leads to a nick).
  • the RuvC domain comprises subdomains I, II and III, where subdomain I is located near the N-terminus of Cas9 and subdomains II and III are located in the middle of the protein, flanking the HNH domain (Hsu et al., 2014, Cell 157:1262-1278).
  • Cas9 endonucleases are typically derived from a type II CRISPR system, which includes a DNA cleavage system utilizing a Cas9 endonuclease in complex with at least one polynucleotide component.
  • a Cas9 can be in complex with a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA).
  • a Cas9 can be in complex with a single guide RNA (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15).
  • Cas12-family enzymes (formerly referred to as Cpf1, and variants c2c1, c2c3, CasX, and CasY) comprise a RuvC nuclease domain and produce staggered 5' overhangs on the dsDNA target. Some variants do not require a tracrRNA, unlike Cas9. Cas12 and its variants recognize a 5' AT-rich PAM sequence on the target dsDNA.
  • An insert domain of the Cas12a protein, called Nuc, has been proposed to be responsible for target strand cleavage (Yamano et al., Cell 2016, 165:949-962). Additional mutation studies demonstrated that the Nuc domain contributes to guide and target binding, with the RuvC domain responsible for cleavage of both DNA strands (Swarts et al., Mol Cell 2017, 66:221-233 e224).
  • Cas endonucleases and effector proteins can be used for targeted genome editing (via simplex and multiplex double-strand breaks and nicks) and targeted genome regulation (via tethering of epigenetic effector domains to either the Cas protein or sgRNA).
  • a Cas endonuclease can also be engineered to function as an RNA-guided recombinase, and via RNA tethers could serve as a scaffold for the assembly of multiprotein and nucleic acid complexes (Mali et al., 2013, Nature Methods Vol. 10:957-963).
  • Zinc-finger binding domains can be engineered to bind to a sequence of choice. See, for example, Beerli et al. (2002) Nature Biotechnol. 20:135-141; Pabo et al. (2001) Ann. Rev. Biochem. 70:313-340; Isalan et al. (2001) Nature Biotechnol. 19:656-660; Segal et al. (2001) Curr. Opin. Biotechnol. 12:632-637; Choo et al. (2000) Curr. Opin. Struct. Biol. 10:411-416.
  • An engineered zinc-finger binding domain can have a novel binding specificity, compared to a naturally-occurring zinc-finger protein.
  • Engineering methods include, but are not limited to, rational design and various types of selection.
  • Rational design includes, for example, using databases comprising triplet (or quadruplet) nucleotide sequences and individual zinc-finger amino acid sequences, in which each triplet or quadruplet nucleotide sequence is associated with one or more amino acid sequences of zinc-fingers which bind the particular triplet or quadruplet sequence.
  • Exemplary selection methods, including phage display and two-hybrid systems, are disclosed in U.S. Pat. Nos. 5,789,538; 5,925,523; 6,007,988; 6,013,453; 6,410,248; 6,140,466; 6,200,759; and 6,242,568; as well as WO 98/37186; WO 98/53057; WO 00/27878; WO 01/88197 and GB 2,338,237.
  • the ZFNs used with the systems disclosed herein can be nucleic acids which encode ZFNs, or can be the ZFN itself.
  • the ZFN can also comprise a nuclease (cleavage domain, cleavage half-domain).
  • the cleavage domain portion of the fusion proteins disclosed herein can be obtained from any endonuclease or exonuclease.
  • Exemplary endonucleases from which a cleavage domain can be derived include, but are not limited to, restriction endonucleases and homing endonucleases. See, for example, 2002-2003 Catalogue, New England Biolabs, Beverly, Mass.; and Belfort et al. (1997) Nucleic Acids Res.
  • cleavage half-domain can be derived from any nuclease or portion thereof, as set forth above, that requires dimerization for cleavage activity.
  • two fusion proteins are required for cleavage if the fusion proteins comprise cleavage half-domains.
  • a single protein comprising two cleavage half-domains can be used.
  • the two cleavage half-domains can be derived from the same endonuclease (or functional fragments thereof), or each cleavage half-domain can be derived from a different endonuclease (or functional fragments thereof).
  • the target sites for the two fusion proteins are preferably disposed, with respect to each other, such that binding of the two fusion proteins to their respective target sites places the cleavage half-domains in a spatial orientation to each other that allows the cleavage half-domains to form a functional cleavage domain, e.g., by dimerizing.
  • the near edges of the target sites are separated by 5-8 nucleotides or by 15-18 nucleotides.
  • any integral number of nucleotides or nucleotide pairs can intervene between two target sites (e.g., from 2 to 50 nucleotide pairs or more).
  • the site of cleavage lies between the target sites.
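The spacing constraint described above for a pair of fusion proteins (near edges of the two target sites separated by 5-8 or by 15-18 nucleotides) can be sketched as a simple check. This is a minimal illustration, not part of the disclosure: the function names and the 0-based, half-open coordinate convention are assumptions chosen for the example.

```python
# Illustrative sketch only. Names and the coordinate convention
# (0-based, half-open (start, end) pairs) are assumptions.
ALLOWED_GAPS = (range(5, 9), range(15, 19))  # 5-8 nt or 15-18 nt


def gap_between_sites(left_site, right_site):
    """Return the number of nucleotides between the near edges of two sites.

    Both sites lie on the same reference sequence, and left_site must be
    upstream of right_site.
    """
    left_end = left_site[1]
    right_start = right_site[0]
    if right_start < left_end:
        raise ValueError("target sites overlap or are out of order")
    return right_start - left_end


def spacing_is_typical(left_site, right_site):
    """True if the gap falls in one of the two spacing windows named above."""
    gap = gap_between_sites(left_site, right_site)
    return any(gap in allowed for allowed in ALLOWED_GAPS)
```

Because the text also allows any integral number of intervening nucleotides, a failed check here means only that the pair falls outside the two named windows, not that cleavage is excluded.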
  • TALENs: Transcription Activator-Like Effector Nucleases.
  • TALEs: Transcription activator-like effectors.
  • TALEN is also used to refer to one or both members of a pair of TALENs that are engineered to work together to cleave DNA at the same site.
  • TALENs that work together may be referred to as a left-TALEN and a right-TALEN, or a first and second TALEN, which references the handedness of DNA. See U.S. Ser. No. 12/965,590; U.S. Ser. No. 13/426,991 (U.S. Pat. No. 8,450,471); U.S. Ser. No. 13/427,040 (U.S. Pat. No. 8,440,431); U.S. Ser. No. 13/427,137 (U.S. Pat. No. 8,440,432); and U.S. Ser. No. 13/738,381, all of which are incorporated by reference herein in their entirety.
  • the nuclease used with TALEN is selected from a group consisting of PvuII, MutH, TevI, FokI, AlwI, MlyI, SbfI, SdaI, StsI, CleDORF, Clo051, and Pept071.
  • when FokI is fused to a TALE domain, each member of the TALEN pair binds to the DNA sites flanking a target site, and the FokI monomers dimerize and cause a DSB at the target site.
  • Besides the wild-type FokI cleavage domain, variants of the FokI cleavage domain with mutations have been designed to improve cleavage specificity and cleavage activity.
  • the FokI domain functions as a dimer, requiring two constructs with unique DNA binding domains for sites in the target genome with proper orientation and spacing. Both the number of amino acid residues between the TALEN DNA binding domain and the FokI cleavage domain, and the number of bases between the two individual TALEN binding sites are parameters for achieving high levels of activity.
  • PvuII, MutH, and TevI cleavage domains are useful alternatives to FokI and FokI variants for use with TALEs.
  • PvuII functions as a highly specific cleavage domain when coupled to a TALE (see Yanik et al. 2013. PLoS One. 8: e82539). MutH is capable of introducing strand-specific nicks in DNA (see Gabsalilow et al. 2013. Nucleic Acids Research. 41: e83). TevI introduces double-stranded breaks in DNA at targeted sites (see Beurdeley et al., 2013. Nature Communications. 4: 1762).
  • TALE-NT TAL Effector-Nucleotide Targeter
  • Engineered TALEN nucleases of the invention can be delivered into a cell in the form of a protein or, preferably, as a nucleic acid encoding the engineered nuclease.
  • Such nucleic acid can be DNA (e.g., circular or linearized plasmid DNA or PCR products) or RNA or a combination of RNAs.
  • RNA may have various stability, various lengths and be delivered in various amounts.
  • the engineered TALEN nuclease coding sequence can be operably linked to a promoter to facilitate transcription of the nuclease gene (TALEN or meganuclease gene).
  • Mammalian promoters suitable for the invention include constitutive promoters such as the cytomegalovirus early (CMV) promoter (Thomsen et al. (1984), Proc Natl Acad Sci USA. 81(3):659-63) or the SV40 early promoter (Benoist and Chambon (1981), Nature. 290(5804):304-10) as well as inducible promoters such as the tetracycline-inducible promoter (Dingermann et al. (1992), Mol Cell Biol. 12(9):4038-45).
  • the guide polynucleotide enables target recognition, binding, and optionally cleavage by the nuclease, and can be a single molecule or a double molecule.
  • the guide polynucleotide sequence can be an RNA sequence, a DNA sequence, or a combination thereof (an RNA-DNA combination sequence).
  • the guide polynucleotide can comprise at least one nucleotide, phosphodiester bond or linkage modification such as, but not limited to, Locked Nucleic Acid (LNA), 5-methyl dC, 2,6-Diaminopurine, 2'-Fluoro A, 2'-Fluoro U, 2'-O-Methyl RNA, phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 (hexaethylene glycol chain) molecule, or 5' to 3' covalent linkage resulting in circularization.
  • a guide polynucleotide that solely comprises ribonucleic acids is also referred to as a "guide RNA” or “gRNA” (US20150082478 published 19 Mar. 2015 and US20150059010 published 26 Feb. 2015).
  • a guide polynucleotide may be engineered or synthetic.
  • the guide polynucleotide includes a chimeric non-naturally occurring guide RNA comprising regions that are not found together in nature (i.e., they are heterologous with each other).
  • a chimeric non-naturally occurring guide RNA comprising a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA, linked to a second nucleotide sequence that can recognize the Cas endonuclease, such that the first and second nucleotide sequence are not found linked together in nature.
  • the guide polynucleotide can be a double molecule (also referred to as duplex guide polynucleotide) comprising a crNucleotide sequence (such as a crRNA) and a tracrNucleotide (such as a tracrRNA) sequence.
  • a linker polynucleotide that connects the crRNA and tracrRNA to form a single guide, for example an sgRNA.
  • the crNucleotide includes a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA and a second nucleotide sequence (also referred to as a tracr mate sequence) that is part of a Cas endonuclease recognition (CER) domain.
  • the tracr mate sequence can hybridize to a tracrNucleotide along a region of complementarity, and together they form the Cas endonuclease recognition domain or CER domain.
  • the CER domain is capable of interacting with a Cas endonuclease polypeptide.
  • the crNucleotide and the tracrNucleotide of the duplex guide polynucleotide can be RNA, DNA, and/or RNA-DNA-combination sequences.
  • the crNucleotide molecule of the duplex guide polynucleotide is referred to as "crDNA” (when composed of a contiguous stretch of DNA nucleotides) or "crRNA” (when composed of a contiguous stretch of RNA nucleotides), or "crDNA-RNA” (when composed of a combination of DNA and RNA nucleotides).
  • the crNucleotide can comprise a fragment of the crRNA naturally occurring in Bacteria and Archaea.
  • the size of the fragment of the crRNA naturally occurring in Bacteria and Archaea that can be present in a crNucleotide disclosed herein can range from, but is not limited to, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides.
  • the tracr nucleotide is referred to as "tracrRNA" (when composed of a contiguous stretch of RNA nucleotides) or "tracrDNA" (when composed of a contiguous stretch of DNA nucleotides) or "tracrDNA-RNA" (when composed of a combination of DNA and RNA nucleotides).
  • the RNA that guides the RNA/Cas9 endonuclease complex is a duplexed RNA comprising a duplex crRNA-tracrRNA.
  • the tracrRNA (transactivating CRISPR RNA) comprises, in the 5'-to-3' direction, (i) a sequence that anneals with the repeat region of CRISPR type II crRNA and (ii) a stem loop-comprising portion (Deltcheva et al., Nature 471:602-607).
  • the duplex guide polynucleotide can form a complex with a Cas endonuclease, wherein said guide polynucleotide/Cas endonuclease complex (also referred to as a guide polynucleotide/Cas endonuclease system) can direct the Cas endonuclease to a genomic target site, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double-strand break) the target site.
  • the guide RNA includes a dual molecule comprising a chimeric non-naturally occurring crRNA linked to at least one tracrRNA.
  • a chimeric non-naturally occurring crRNA includes a crRNA that comprises regions that are not found together in nature (i.e., they are heterologous with each other).
  • a crRNA comprising a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA, linked to a second nucleotide sequence (also referred to as a tracr mate sequence) such that the first and second sequence are not found linked together in nature.
  • the guide polynucleotide can also be a single molecule (also referred to as single guide polynucleotide) comprising a crNucleotide sequence linked to a tracr nucleotide sequence.
  • the single guide polynucleotide comprises a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA and a Cas endonuclease recognition domain (CER domain), that interacts with a Cas endonuclease polypeptide.
  • the VT domain and/or the CER domain of a single guide polynucleotide can comprise a RNA sequence, a DNA sequence, or a RNA-DNA-combination sequence.
  • the single guide polynucleotide being comprised of sequences from the crNucleotide and the tracrNucleotide may be referred to as "single guide RNA" (when composed of a contiguous stretch of RNA nucleotides) or "single guide DNA” (when composed of a contiguous stretch of DNA nucleotides) or “single guide RNA-DNA” (when composed of a combination of RNA and DNA nucleotides).
  • the single guide polynucleotide can form a complex with a Cas endonuclease, wherein said guide polynucleotide/Cas endonuclease complex (also referred to as a guide polynucleotide/Cas endonuclease system) can direct the Cas endonuclease to a genomic target site, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double-strand break) the target site.
  • a chimeric non-naturally occurring single guide RNA includes a sgRNA that comprises regions that are not found together in nature (i.e., they are heterologous with each other).
  • a sgRNA comprising a first nucleotide sequence domain (referred to as Variable Targeting domain or VT domain) that can hybridize to a nucleotide sequence in a target DNA linked to a second nucleotide sequence (also referred to as a tracr mate sequence) that are not found linked together in nature.
  • the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can comprise a RNA sequence, a DNA sequence, or a RNA-DNA combination sequence.
  • the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide (also referred to as "loop") can be at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
  • the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can comprise a tetraloop sequence, such as, but not limited to, a GAAA tetraloop sequence.
  • the guide polynucleotide can be produced by any method known in the art, including chemically synthesizing guide polynucleotides (such as, but not limited to, Hendel et al. 2015, Nature Biotechnology 33, 985-989), in vitro generated guide polynucleotides, and/or self-splicing guide RNAs.
  • the degree of complementarity between a guide sequence of the gRNA (i.e., crRNA sequence) and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
  • Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
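The degree of complementarity between a guide sequence and its target can be illustrated with a deliberately simplified calculation. The sketch below assumes an ungapped, equal-length comparison and does not implement any of the alignment algorithms named above (Smith-Waterman, Needleman-Wunsch, and so on). The function name and the convention that both strands are written 5'-to-3' are assumptions made for this example.

```python
# Illustrative sketch only: an ungapped comparison, not an optimal alignment.
# Watson-Crick partner of each RNA base on a DNA target strand.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(guide_rna, target_dna):
    """Percent of guide positions that base-pair with the target strand.

    Both sequences are written 5'->3', so pairing is antiparallel: the first
    guide base is compared with the last base of the target strand.
    """
    guide = guide_rna.upper()
    target = target_dna.upper()
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        RNA_TO_DNA_PARTNER.get(g) == t
        for g, t in zip(guide, reversed(target))
    )
    return 100.0 * paired / len(guide)
```

A value computed this way could then be compared with the thresholds listed above (for example, 90% or 95%).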
  • a crRNA sequence is about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some instances, a crRNA sequence is about 20 nucleotides in length. In other instances, a crRNA sequence is about 15 nucleotides in length. In other instances, a crRNA sequence is about 25 nucleotides in length.
  • the nucleotide sequence of a modified gRNA can be selected using any of the web-based software described above. Considerations for selecting a DNA-targeting RNA include the PAM sequence for the nuclease (e.g., Cas9 or Cpf1) to be used, and strategies for minimizing off-target modifications. Tools, such as the CRISPR Design Tool, can provide sequences for preparing the gRNA, for assessing target modification efficiency, and/or assessing cleavage at off-target sites.
  • the gRNA molecule is about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, or more nucleotides in length.
  • the gRNA is about 100 nucleotides in length.
  • the gRNA is about 90 nucleotides in length.
  • the gRNA is about 110 nucleotides in length.
  • Nucleotide sequence modification of the guide polynucleotide can be selected from, but not limited to, the group consisting of a 5' cap, a 3' polyadenylated tail, a riboswitch sequence, a stability control sequence, a sequence that forms a dsRNA duplex, a modification or sequence that targets the guide polynucleotide to a subcellular location, a modification or sequence that provides for tracking, a modification or sequence that provides a binding site for proteins, a Locked Nucleic Acid (LNA), a 5-methyl dC nucleotide, a 2,6-Diaminopurine nucleotide, a 2'-Fluoro A nucleotide, a 2'-Fluoro U nucleotide, a 2'-O-Methyl RNA nucleotide, a phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol
  • the additional beneficial feature is selected from the group of a modified or regulated stability, a subcellular targeting, tracking, a fluorescent label, a binding site for a protein or protein complex, modified binding affinity to complementary target sequence, modified resistance to cellular degradation, and increased cellular permeability.
  • the gRNA described herein can be provided within a cassette along with a retron and a nuclease component, or can be provided alone with the retron or alone with the nuclease in separate cassettes. It can also be provided by being encoded directly into a host cell genome. In one particular embodiment, the gRNA can be fused to ncRNA of the retron, as described above.
  • a "protospacer adjacent motif” herein refers to a short nucleotide sequence adjacent to a target sequence (protospacer) that can be recognized (targeted) by a guide polynucleotide/Cas endonuclease system.
  • the Cas endonuclease may not successfully recognize a target DNA sequence if the target DNA sequence is not followed by a PAM sequence.
  • the sequence and length of a PAM herein can differ depending on the Cas protein or Cas protein complex used.
  • the PAM sequence can be of any length but is typically 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides long.
  • a "randomized PAM” and "randomized protospacer adjacent motif” are used interchangeably herein, and refer to a random DNA sequence adjacent to a target sequence (protospacer) that is recognized (targeted) by a guide polynucleotide/Cas endonuclease system.
  • the randomized PAM sequence can be of any length but is typically 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides long.
  • a randomized nucleotide includes any one of the nucleotides A, C, G or T.
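The relationship between a protospacer and its adjacent PAM can be sketched as a scan over one DNA strand. The text above does not fix a particular PAM. The 5'-NGG-3' pattern and the 20-nucleotide protospacer length used as defaults here are assumptions, taken from the motif commonly associated with Streptococcus pyogenes Cas9. The function name is hypothetical.

```python
import re


def find_protospacers(dna, pam_pattern="[ACGT]GG", protospacer_len=20):
    """List (start, protospacer, pam) for each PAM directly 3' of a protospacer.

    Only the given strand is scanned. pam_pattern is a regular expression;
    the default encodes the assumed NGG motif. Overlapping PAMs are reported
    because the pattern is wrapped in a lookahead.
    """
    dna = dna.upper()
    pam_re = re.compile("(?=(" + pam_pattern + "))")
    hits = []
    for match in pam_re.finditer(dna):
        pam_start = match.start()
        start = pam_start - protospacer_len
        if start < 0:
            # Not enough sequence upstream of this PAM for a full protospacer.
            continue
        hits.append((start, dna[start:pam_start], match.group(1)))
    return hits
```

Passing a different `pam_pattern` or `protospacer_len` adapts the scan to a nuclease with another PAM requirement, consistent with the statement that PAM sequence and length depend on the Cas protein used.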
  • the retron editor systems disclosed herein can include a nuclear localization sequence (NLS).
  • a heterologous NLS amino acid sequence herein may be of sufficient strength to drive accumulation of the nuclease, the RT, or a combination thereof in a detectable amount in the nucleus of a mammalian cell herein, for example.
  • An NLS may comprise one (monopartite) or more (e.g., bipartite) short sequences (e.g., 2 to 30 residues) of basic, positively charged residues (e.g., lysine and/or arginine), and can be located anywhere in the nuclease or retron RT amino acid sequence but such that it is exposed on the protein surface.
  • An NLS may be operably linked to the N-terminus or C-terminus of a Cas protein, the retron RT, or their combination, for example.
  • Two or more NLS sequences can be linked to a Cas protein and the retron RT, for example, such as on both the N- and C- termini of the retron RT.
  • the Cas endonuclease gene and the RT can be operably linked to a SV40 nuclear targeting signal upstream of the coding region and a bipartite VirD2 nuclear localization signal (Tinland et al. (1992) Proc. Natl. Acad. Sci. USA 89:7442-6) downstream of the coding region.
  • suitable NLS sequences herein include those disclosed in U.S. Pat. Nos. 6,660,830 and 7,309,576.
  • a linker can be used with the retron editor systems described herein.
  • Fig. 1D shows an example of where the linker can be placed in the cassette.
  • the linker can be placed such that it connects the C-terminus of the nuclease with the N-terminus of the retron RT, the N-terminus of the nuclease with the C-terminus of the RT, or linkers can connect two internal regions.
  • the linker can be a flexible linker, a rigid linker, an in vivo cleavable linker, a polyvalent linker, or any combination thereof.
  • the linker can form a one-to-one connection between the RT and nuclease, or the linker can connect one nuclease to two or more RTs, or two or more nucleases to a single RT (polyvalent linkers).
  • a linker may link functional domains together (as in flexible and rigid linkers) or release free functional domains in vivo (as in in vivo cleavable linkers).
  • Linkers may improve biological activity, increase expression yield, and achieve desirable pharmacokinetic profiles.
  • a linker can also comprise a hydrazone, peptide, disulfide, or thioether.
  • a linker sequence described herein can include a flexible linker.
  • Flexible linkers can be applied when a joined domain requires a certain degree of movement or interaction.
  • Flexible linkers can be composed of small, non-polar (e.g., Gly) or polar (e.g., Ser or Thr) amino acids.
  • a flexible linker can have sequences consisting primarily of stretches of Gly and Ser residues ("GS" linker).
  • An example of a flexible linker can have the sequence of (Gly-Gly-Ser)n. By adjusting the copy number "n", the length of this exemplary GS linker can be optimized to achieve appropriate separation of functional domains, or to maintain necessary inter-domain interactions.
  • flexible linkers can be utilized for recombinant fusion proteins.
  • flexible linkers can also be rich in small or polar amino acids such as Gly and Ser, but can contain additional amino acids such as Thr and Ala to maintain flexibility.
  • polar amino acids such as Lys and Glu can be used to improve solubility.
  • Flexible linkers included in linker sequences described herein can be rich in small or polar amino acids such as Gly and Ser to provide good flexibility and solubility. Flexible linkers can be suitable choices when certain movements or interactions are desired for fusion protein domains. In addition, although flexible linkers may not have rigid structures, they can serve as a passive linker to keep a distance between functional domains. The length of flexible linkers can be adjusted to allow for proper folding or to achieve optimal biological activity of the fusion proteins.
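The (Gly-Gly-Ser)n linker described above can be sketched in one-letter amino acid code, where adjusting the copy number n changes the linker length. The function names and the placeholder domain strings in the tests are hypothetical and are not sequences from the disclosure.

```python
# Illustrative sketch only. One-letter amino acid code: G = Gly, S = Ser.
def gs_linker(n, unit="GGS"):
    """Return a flexible linker made of n tandem copies of the repeat unit."""
    if n < 1:
        raise ValueError("copy number n must be at least 1")
    return unit * n


def fuse(n_terminal_domain, c_terminal_domain, n):
    """Join two domain sequences with a (Gly-Gly-Ser)n linker between them."""
    return n_terminal_domain + gs_linker(n) + c_terminal_domain
```

For example, `gs_linker(3)` gives a nine-residue linker, and increasing n lengthens the separation between the two fused domains in steps of three residues.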
  • a linker described herein can further include a rigid linker in some cases.
  • a rigid linker may be utilized to maintain a fixed distance between domains of a polypeptide.
  • Rigid linkers can exhibit relatively stiff structures by adopting α-helical structures or by containing multiple Pro residues in some cases.
  • a linker described herein can include a polyvalent linker that connects one nuclease to two or more RTs or one RT to two or more nucleases.
  • a polyvalent linker may be utilized to increase the copy number of a particular enzyme relative to another enzyme (e.g., to bring multiple RTs to the site of a DNA break induced by Cas9).
  • examples of polyvalent linkers include SunTag and SpyTag.
  • Polyvalent linkers can link two or more entities to a single polypeptide. Examples of such linkers can be found, for example, in WO2016/011070A2 and EP3303374B1, both of which are incorporated by reference herein.
  • a linker described herein can be cleavable in some cases. In other cases a linker is not cleavable.
  • Linkers that are not cleavable may covalently join functional domains together to act as one molecule throughout in vivo processes or ex vivo processes.
  • a linker can also be cleavable in vivo.
  • a cleavable linker can be introduced to release free functional domains in vivo.
  • a cleavable linker can be cleaved by, for example, reducing reagents or proteases. For example, a disulfide bond that is cleaved upon reduction may be used to produce a cleavable linker.
  • disulfide exchange with a thiol, such as glutathione, could cleave the linker.
  • a cleavable linker can also comprise a hydrazone, peptide, disulfide, or thioether.
  • a hydrazone can confer serum stability.
  • a hydrazone can allow for cleavage in an acidic compartment.
  • An acidic compartment can have a pH up to 7.
  • a linker can also include a thioether.
  • a thioether can be nonreducible.
  • a thioether can be designed for intracellular proteolytic degradation.
  • a linker can be engineered.
  • Methods of designing linkers can be computational.
  • computational methods can include graphic techniques. Computational methods can be used to search for suitable peptides from libraries of three-dimensional peptide structures derived from databases. For example, the Brookhaven Protein Data Bank (PDB) can be searched for peptides that span the distance in space between selected amino acids of a linker.
  • RNA delivery systems disclosed herein can be introduced into a cell via a variety of methods.
  • methods of delivering RNA to a cell can include providing the RNA in a therapeutic payload, such as in a synthetic vehicle like lipid or lipid-based nanoparticles. Examples of such systems can be found in Paunovska et al. (Drug delivery systems for RNA therapeutics. Nat Rev Genet 23, 265-280, 2022, herein incorporated by reference in its entirety for its teaching concerning RNA delivery). Delivery can also be accomplished by way of a vector.
  • the delivery system comprises lipid particles as described in Kanasty et al., Delivery materials for siRNA therapeutics, Nat Mater. 12(11):967-77 (2013), which is hereby incorporated by reference.
  • the lipid-based vector is a lipid nanoparticle, which is a lipid particle between about 1 and about 100 nanometers in size.
  • Cells include, but are not limited to, human, non-human, animal, bacterial, fungal, insect, yeast, non-conventional yeast, and plant cells as well as plants and seeds produced by the methods described herein.
  • Methods for introducing the retron editor system into cells or organisms also include, but are not limited to, microinjection, electroporation, stable transformation methods, transient transformation methods, ballistic particle acceleration (particle bombardment), whiskers mediated transformation, direct gene transfer, viral-mediated introduction, transfection, transduction, cell-penetrating peptides, mesoporous silica nanoparticle (MSN)-mediated direct protein delivery, topical applications, or any combination thereof.
  • Vectors and constructs include circular plasmids, and linear polynucleotides, comprising a polynucleotide of interest and optionally other components including linkers, adapters, regulatory or analysis.
  • a recognition site and/or target site can be comprised within an intron, coding sequence, 5' UTRs, 3' UTRs, and/or regulatory regions.
  • the invention further provides expression constructs for expressing in a prokaryotic or eukaryotic cell/organism a gene editing system that is capable of recognizing, binding to, and optionally nicking, unwinding, or cleaving all or part of a target sequence.
  • the delivery vehicles may be administered by any method known in the art, including injection, optionally by direct injection to target tissues.
  • Nucleic acid modification can be monitored over time by, for example, periodic biopsy with PCR amplification and/or sequencing of the target region from genomic DNA, or by RT-PCR and/or sequencing of the expressed transcripts. Alternatively, nucleic acid modification can be monitored by detection of a reporter gene or reporter sequence. Alternatively, nucleic acid modification can be monitored by expression or activity of a corrected gene product or a therapeutic effect in the subject.
  • the expression constructs of the disclosure comprise a promoter operably linked to a nucleotide sequence encoding a Cas gene and a promoter operably linked to a guide RNA of the present disclosure.
  • the promoter is capable of driving expression of an operably linked nucleotide sequence in a prokaryotic or eukaryotic cell/organism.
  • the cassette can further comprise at least one promoter.
  • the promoter can be capable of driving sgRNA transcription.
  • An example of such a promoter is a U6 promoter.
  • the promoter can be capable of driving msr-msd transcription.
  • An example of such a promoter is the H1 promoter.
  • the promoter can drive nuclease and RT expression, or the expression of their fusion.
  • An example of such a promoter is a CMV promoter.
  • the promoter can be operably linked to non-coding RNA, sgRNA, msr, msd (or both msr and msd), the nuclease, the RT, or to a nuclease-RT fusion nucleic acid.
  • the promoter can be an RNA polymerase II promoter or an RNA polymerase III promoter.
  • compositions can be employed to obtain a cell or organism having donor nucleic acid inserted in a target site by a retron editor system. Such methods can employ homologous recombination (HR) to provide integration of the polynucleotide of interest at the target site.
  • the donor nucleic acid can further comprise a first and a second region of homology that flank the polynucleotide of interest.
  • the first and second regions of homology of the donor nucleic acid can share homology to a first and a second genomic region, respectively, present in or flanking the target site of the cell or organism genome.
  • the donor nucleic acid sequence can be tethered to the guide polynucleotide.
  • the ncDNA can be fused to the gRNA.
  • Tethered donor DNAs can allow for colocalizing target and donor DNA, useful in genome editing, gene insertion, and targeted genome regulation, and can also be useful in targeting post-mitotic cells where function of endogenous HR machinery is expected to be highly diminished (Mali et al., 2013, Nature Methods Vol. 10:957-963).
  • the amount of homology or sequence identity shared by a target nucleic acid and a donor nucleic acid can vary and includes total lengths and/or regions having unit integral values in the ranges of about 1-20 bp, 20-50 bp, 50-100 bp, 75-150 bp, 100-250 bp, 150-300 bp, 200-400 bp, 250-500 bp, 300-600 bp, 350-750 bp, 400-800 bp, 450-900 bp, 500-1000 bp, 600-1250 bp, 700-1500 bp, 800-1750 bp, 900-2000 bp, 1-2.5 kb, 1.5-3 kb, 2-4 kb, 2.5-5 kb, 3-6 kb, 3.5-7 kb, 4-8 kb, 5-10 kb, or up to and including the total length of the target site.
  • ranges include every integer within the range, for example, the range of 1-20 bp includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 bps.
  • the amount of homology can also be described by percent sequence identity over the full aligned length of the two polynucleotides which includes percent sequence identity at least of about 50%, 55%, 60%, 65%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, between 98% and 99%, 99%, between 99% and 100%, or 100%.
  • Sufficient homology includes any combination of polynucleotide length, global percent sequence identity, and optionally conserved regions of contiguous nucleotides or local percent sequence identity, for example sufficient homology can be described as a region of 75-150 bp having at least 80% sequence identity to a region of the target locus. Sufficient homology can also be described by the predicted ability of two polynucleotides to specifically hybridize under high stringency conditions, see, for example, Sambrook et al., (1989) Molecular Cloning: A Laboratory Manual, (Cold Spring Harbor Laboratory Press, NY); Current Protocols in Molecular Biology, Ausubel et al., Eds (1994) Current Protocols, (Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.); and, Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology— Hybridization with Nucleic Acid Probes, (Elsevier, New York).
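The example criterion above (a region of 75-150 bp having at least 80% sequence identity to a region of the target locus) can be sketched as a combined length-and-identity check. This is a simplified illustration that assumes the two regions are already aligned without gaps and have equal length. It does not model hybridization under high stringency conditions. The function names and default thresholds are assumptions drawn from that single example.

```python
# Illustrative sketch only: ungapped, position-by-position identity.
def percent_identity(seq_a, seq_b):
    """Percent of positions at which two equal-length sequences match."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def is_sufficient_homology(donor_arm, genomic_region,
                           min_len=75, max_len=150, min_identity=80.0):
    """Apply the example criterion: length in range and identity >= threshold."""
    if not (min_len <= len(donor_arm) <= max_len):
        return False
    return percent_identity(donor_arm, genomic_region) >= min_identity
```

The thresholds are parameters so that any of the other length ranges or identity percentages listed above can be substituted.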
  • the structural similarity between a given genomic region and the corresponding region of homology found on the donor nucleic acid can be any degree of sequence identity that allows for homologous recombination to occur.
  • the amount of homology or sequence identity shared by the "region of homology" of donor nucleic acid and the "genomic region” of the organism genome can be at least 50%, 55%, 60%, 65%, 70%, 75%,
  • the region of homology on the donor nucleic acid can have homology to any sequence flanking the target site. While in some instances the regions of homology share significant sequence homology to the genomic sequence immediately flanking the target site, it is recognized that the regions of homology can be designed to have sufficient homology to regions that may be further 5' or 3' to the target site.
  • the regions of homology can also have homology with a fragment of the target site along with downstream genomic regions.
  • the first region of homology further comprises a first fragment of the target site and the second region of homology comprises a second fragment of the target site, wherein the first and second fragments are dissimilar.
  • the retron editor systems described herein can be used for gene editing.
  • gene editing can be performed by cleaving one or both strands at a specific polynucleotide sequence in a cell with a nuclease associated with a suitable donor nucleic acid sequence. Once a single or double-strand break is induced in the DNA, the cell's DNA repair mechanism is activated to repair the break via nonhomologous end-joining (NHEJ) or Homology-Directed Repair (HDR) processes which can lead to modifications at the target site. This is illustrated in Fig. 1C.
  • the length of the DNA sequence at the target site can vary, and includes, for example, target sites that are at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more than 30 nucleotides in length. It is further possible that the target site can be palindromic, that is, the sequence on one strand reads the same in the opposite direction on the complementary strand.
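The palindrome definition above (the sequence on one strand reads the same in the opposite direction on the complementary strand) amounts to a sequence being equal to its own reverse complement. A short Python sketch, with hypothetical function names and assuming only unambiguous A/C/G/T bases:

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)
```

For instance, `is_palindromic("GAATTC")` holds because both strands read GAATTC in the 5' to 3' direction.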
  • the nick/cleavage site can be within the target sequence or the nick/cleavage site could be outside of the target sequence.
  • the cleavage could occur at nucleotide positions immediately opposite each other to produce a blunt end cut or, in other cases, the incisions could be staggered to produce single-stranded overhangs, also called "sticky ends", which can be either 5' overhangs, or 3' overhangs.
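The distinction between blunt and staggered cuts can be expressed with two coordinates. In the sketch below, both cut positions are given in top-strand coordinates as the number of nucleotides lying 5' of the cut on the top strand; this coordinate convention and the function name are assumptions made for illustration.

```python
def classify_cut(top_cut: int, bottom_cut: int) -> tuple[str, int]:
    """Classify a double-strand cut as blunt, 5' overhang, or 3' overhang.

    top_cut    : nucleotides 5' of the cut on the top strand
    bottom_cut : position of the bottom-strand cut, in the same top-strand coordinates
    Returns (end type, overhang length in nt).
    """
    if top_cut == bottom_cut:
        return ("blunt", 0)
    if top_cut < bottom_cut:
        # The top strand is cut further 5', so each fragment keeps a protruding 5' end.
        return ("5' overhang", bottom_cut - top_cut)
    # The top strand is cut further 3', so each fragment keeps a protruding 3' end.
    return ("3' overhang", top_cut - bottom_cut)
```

As a check on the convention, a G^AATTC site cut after position 1 on the top strand and after position 5 on the bottom strand gives a 4 nt 5' overhang.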
  • Active variants of genomic target sites can also be used. Such active variants can comprise at least 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%,
  • Assays to measure the single or double-strand break of a target site by an endonuclease are known in the art and generally measure the overall activity and specificity of the agent on DNA substrates comprising recognition sites.
  • a targeting method herein can be performed such that two or more DNA target sites are targeted. Such a method can optionally be characterized as a multiplex method. Two, three, four, five, six, seven, eight, nine, ten, or more target sites can be targeted at the same time in certain embodiments.
  • a multiplex method is typically performed by a targeting method herein in which multiple different RNA components are provided, each designed to guide a guide polynucleotide/Cas endonuclease complex to a unique DNA target site.
  • Direct delivery of the retron editors described herein can be accompanied by direct delivery (co-delivery) of other mRNAs that can promote the enrichment and/or visualization of cells receiving the retron editor system.
  • direct co-delivery of the system, together with mRNA encoding phenotypic markers, can enable the selection and enrichment of cells without the use of an exogenous selectable marker by restoring function to a non-functional gene product, as described in WO2017070032, published 27 Apr. 2017.
  • fluorescence can be detected.
  • An example of detecting fluorescence includes the use of "traffic light" reporters.
  • Other examples of reporters can be found in the literature, such as in Stepanenko OV, Verkhusha VV, Kuznetsova IM, Uversky VN, Turoverov KK. Fluorescent proteins as biomarkers and biosensors: throwing color lights on molecular and cellular processes. Curr Protein Pept Sci. 2008 Aug;9(4):338-69, which is herein incorporated by reference in its entirety for its teaching of reporter systems.
  • Detection can also be via sequencing, such as by Sanger or next generation (NGS) DNA sequencing.
  • Retron editor. Disclosed herein is the development of a highly efficient retron gene editor. More than 500 high-confidence retrons from metagenomic sources were bioinformatically identified. Using a functional reporter system, 98 variants were screened in mammalian cells and 17 RTs were identified that were more active than the previously established Eco1-RT. Further rational design achieved editing efficiencies that were comparable to conventional single-stranded oligodeoxynucleotide (ssODN) donors. Steering DNA repair outcomes towards HDR via small molecule inhibitors and Cas9-DNA repair protein fusions boosted targeted DNA insertion. Retron editors also function with Cas12a, significantly broadening their genomic target range.
  • the nickase Cas9(D10A) also supports retron editing, and this activity can be improved with DNA repair protein fusions. Retron editors were applied for installing in-frame epitopes for live cell imaging in U2OS cells. Finally, all RNA-based retron editing was demonstrated in cell lines and vertebrates.
  • a metagenomic survey identifies retron-RTs active in mammalian cells.
  • the present study explored whether a metagenomic survey of retron-RTs will uncover variants that improve homology-directed repair in heterologous hosts.
  • The reporter expressed RFP and GFP that were separated by a ribosomal skipping T2A sequence [47]. RFP had a 9 basepair (bp) deletion (Δ9) adjacent to a Y64L mutation [48]. These mutations ensured that RFP(Δ9) was dark until the wild type (WT) sequence was restored via templated HDR following a Cas9-generated double stranded break (DSB). GFP served as a transfection control, and also reported on Cas9-generated insertions and/or deletions (indels) that shifted the open reading frame out of frame.
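The reading-frame logic behind the reporter can be illustrated with a short Python sketch. An edit whose net length change is a multiple of three (such as the 9 bp deletion or its templated restoration) preserves the reading frame, whereas other indels shift it and would be expected to disrupt GFP downstream. The function name and return labels are hypothetical.

```python
def indel_effect(reference_len: int, edited_len: int) -> str:
    """Classify an edit by its net length change relative to the reading frame."""
    delta = edited_len - reference_len
    if delta == 0:
        return "no length change"
    # Python's % returns a non-negative result for a positive modulus,
    # so deletions (negative delta) are handled the same way as insertions.
    return "in-frame" if delta % 3 == 0 else "frameshift"
```

This only considers net length; an in-frame edit can still inactivate a protein, as the Δ9 deletion itself does for RFP.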
  • the well-characterized Eco1-RT also repaired RFP, although HDR activity was substantially lower than the ssODN (see Fig. 1E, left).
  • the variable msd region included 29 nt of homology flanking a 9 nt insertion that paired with the target strand and also reverted the Y64L mutation. Having established this assay, additional RTs were tested from diverse microbes.
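The layout of such a repair template (homology on both sides of an insertion) can be sketched as simple string assembly. The sketch below is a hedged illustration only: it assumes equal-length arms taken directly from the target sequence around a single insertion coordinate, and it does not model strand choice or the surrounding msd scaffold. The function name and parameters are hypothetical, and whether the 29 nt in the text refers to each arm or to the total homology is not resolved here.

```python
def build_repair_template(target: str, insert_pos: int, insert: str, arm_len: int) -> str:
    """Assemble left arm + insert + right arm around a 0-based insertion coordinate.

    target     : target-site sequence (5' to 3')
    insert_pos : index in target before which the insert is placed
    insert     : sequence to install
    arm_len    : homology arm length on each side (nt)
    """
    if arm_len < 0 or insert_pos - arm_len < 0 or insert_pos + arm_len > len(target):
        raise ValueError("homology arms extend beyond the target sequence")
    left_arm = target[insert_pos - arm_len:insert_pos]
    right_arm = target[insert_pos:insert_pos + arm_len]
    return left_arm + insert + right_arm
```

A call such as `build_repair_template(target, 40, nine_nt_insert, 29)` would return a 67 nt template when the insert is 9 nt long.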
  • Retron-RTs are ubiquitous throughout bacteria, but only a handful have been tested experimentally [33]. Therefore, a bioinformatics pipeline was developed to identify new retron-RTs from metagenomic sources (Fig. 2A).
  • all RTs in the NCBI database of non-redundant bacterial and archaeal genomes were annotated, as well as 2M partially assembled bacterial genomes from the human microbiome [49, 50] (Fig. 2A). It was considered that human microbiome-derived RTs would also be active at physiological conditions.
  • the msr-msd non-coding RNA (ncRNA) and accessory proteins were queried.
  • This search identified >500 high confidence, non-redundant retrons with well-annotated msr-msds (see Fig. 2B).
  • the identified systems were classified into a phylogenetic tree and sorted into 11 clades following a prior bioinformatic survey (Fig. 1F) [51]. The highest-confidence systems across multiple clades were prioritized for experimental characterization in mammalian cells.
  • Mva1-RT, derived from Myxococcus vastator, had an editing efficiency that was 6-fold higher than Eco1-RT in this transient repair assay (see Fig. 1G, inset).
  • the best-performing RTs all belonged to clade 9, indicating that these enzymes were especially active in mammalian cells, and/or that the bioinformatics workflow was most accurate in predicting the msr-msd sequences from this clade.
  • the top systems also showed mostly RFP+ cells via confocal microscopy (see Figs. 1E, 3A). To test these RTs in a genomic reporter, the RFP cassette was integrated into the AAVS1 locus of HEK293T cells (see Fig. 3B).
  • Retron RTs co-evolve with a cognate msr-msd, but their ability to reverse transcribe from the msr-msd of other retrons is unknown. Therefore, the feasibility of using two or more orthogonal RTs for multiplexed retron editing was explored (Fig. 1H).
  • the RFP repair activity of six active novel RTs was tested, along with Eco1-RT, when co-expressed with the msr-msd from other systems.
  • Mva1-RT, the most broadly cross-reactive RT, shared only 33-36% amino acid sequence identity and 49-59% msr-msd nucleotide identity with all other systems (see Fig. 4A).
  • Efe1-RT shared 44-46% amino acid sequence identity but remained exclusive to its cognate msr-msd.
  • the remaining RTs used a broad range of msr-msds, including those from Eco1-RT.
  • Vibrio rotiferianus (Vro1)- and Vibrio aphrogenes (Vap1)-RTs showed comparable activity with Proteus sp. (Psp1) msr-msd and their native msr-msds.
  • all RTs shared a similar msr-msd secondary structure, including a palindromic repeat and an extended msd hairpin (see Fig. 4B).
  • RTs were tested for their ability to integrate a 10 nt cargo into native genomic loci (see Fig. 5).
  • Two rounds of PCR were used to generate libraries for next generation DNA sequencing (NGS, Fig. 5A).
  • the first PCR reaction primed outside the homology arms to avoid amplifying the reverse transcribed ssDNA.
  • the second PCR reaction barcoded the amplicons for short-read NGS.
  • Cas9 and an ssODN with the same homology arms were used as a positive control, and to benchmark the RT fidelity.
  • Retron editing efficiency was distinguished from Cas9-generated indels by scoring the percentage of modified reads that had the intended insert relative to all modified reads. Editing efficiencies ranged from 8-30%.
  • Efe1-RT showed the highest editing activity at EMX1 and CFTR, which was consistent with the genomic RFP(Δ9) assay (Fig. 5B). Thus, Efe1-RT was selected for all subsequent experiments.
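The scoring rule described above (reads carrying the intended insert as a percentage of all modified reads) can be written as a one-line calculation. The sketch below is illustrative; the function and argument names are hypothetical, and the read counts would come from whatever amplicon analysis tool is used upstream.

```python
def precise_insert_percentage(intended_insert_reads: int, modified_reads: int) -> float:
    """Percentage of modified reads that carry the intended insert."""
    if modified_reads <= 0:
        raise ValueError("modified_reads must be positive")
    if not 0 <= intended_insert_reads <= modified_reads:
        raise ValueError("intended_insert_reads must be between 0 and modified_reads")
    return 100.0 * intended_insert_reads / modified_reads
```

Under this definition, unmodified reads do not enter the denominator, so the value reflects how breaks were repaired rather than how often the nuclease cut.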
  • Retron-RTs interacted extensively with their ncRNA and msDNA via their C-terminal domains [41, 62]. An extended C-terminal NLS may impair this interaction, reducing overall repair activity but not Cas9-catalyzed DNA cleavage. It was concluded that nuclear import was likely not the limiting factor for templated insertion.
  • the linker between the nuclease and the RT can also impact editing outcomes.
  • AZD7648 and M3814 inhibit the DNA-dependent protein-kinase catalytic subunit (DNA-PKcs) to improve templated repair of Cas9 breaks [68-71]
  • TAK-931 is a CDC7-selective inhibitor that arrests cells in S phase, thereby increasing the HDR time window [72]
  • the optimal working concentrations were established for each inhibitor (see Fig. 9A). All three inhibitors improved insertion activity, with the strongest improvements with AZD7648 at all loci (see Fig. 8B, left).
  • M3814 showed strong improvements at all loci except HBB, and TAK-931 decreased retron editing at F9.
  • retron editors can insert larger cargos with 50 nt homology arms, and how this is modulated by inhibitors or DNA repair proteins (see Fig.
  • AZD7648 stimulated insertion of 25 and 50 nt inserts by 8.8- and 6.0-fold respectively at EMX1 (Fig. 8D). TAK-931 showed more modest 2.4- and 1.8-fold editing increases for 25 and 50 nt cargos, respectively.
  • a comparison of Cas9-generated indels and retron-driven insertions confirmed that AZD7648 did not increase Cas9 cleavage but increased the utilization of a template ssDNA for genomic repair. Additionally, AZD7648 reduced the mutational signature at the target site, further highlighting the utility of repair pathway modulation in retron editing applications.
  • Cas9 fusion proteins can improve templated DNA insertion locally without perturbing repair pathways globally [53]. At this step, the focus was on three fusions that had previously been characterized across multiple loci and cell types (see Figs. 8A, 8B, right) [73-75]. Fusing Cas9 with the HDR-promoting CtIP and a dominant negative RNF168 (dnRNF168) increased retron editing efficiency 1.8- to 2.5-fold across five loci. A dominant negative mutant of 53BP1 (DN1S) had variable effects across the five loci, with no improvements at EMX1 (see Fig.
  • Retron editing in cell lines and vertebrates. Retron editors were used to insert a split GFP for live cell imaging of endogenously expressed proteins in U2OS cells.
  • GFP1-10, comprising the first 10 GFP β-strands, was expressed from an integrated and inducible promoter (see Fig. 10). The 11th β-strand was fused to the protein of interest via a short linker.
  • GFP1-10 was not fluorescent until it was complemented by GFP11 because chromophore maturation requires a critical GFP11-encoded glutamic acid [79]. Thus, fusing GFP11 to a target protein allowed visualization of sub-cellular localization in live cells [80-82].
  • Efe1-RT can synthesize 200 nt ssDNAs. More broadly, this approach can be readily used for installing epitopes, disease-specific alleles, and other large insertions across the entire proteome from a genetically encoded cassette.
  • Mutations in kif6 ut2 ° cause scoliosis in zebrafish and are linked to neurological defects in humans [83]. The msd was designed to correct two base mutations that reverted a pathogenic Pro->Thr substitution. In addition, a silent T->C mutation was introduced that abolished a BsaI cleavage site for downstream restriction enzyme analysis (see Fig. 5E-F). Embryos were injected with the sgRNA, msr-msd, and a fused Cas9-RT or split Cas9 and Efe1-RT mRNAs. Genomic DNA was harvested 24 hours post injection and was submitted to NGS, restriction enzyme digestion, and Sanger sequencing (see Fig. 11D-F).
  • Retron-RT and ncRNA gene blocks were ordered from IDT or Twist Biosciences and cloned into a GFP dropout entry vector via Golden Gate assembly.
  • mRNAs were purchased from Cisterna Biologics. Retron editor expression plasmids were assembled by combining the sgRNA, retron msr-msd, the RT, and SpCas9 or AsCas12a in a ccdB dropout mammalian expression vector using Golden Gate assembly.
  • Tissue Culture. HEK293T were generously provided by Professor Xiaolu A. Cambronne.
  • HEK293T and U2OS were cultured in Dulbecco's modified Eagle's medium (DMEM) with 10% fetal bovine serum (Gibco) and 1% Penicillin-Streptomycin (Gibco). All cell lines were cultured and maintained at 37 °C and 5% CO2.
  • U2OS Flp-In T-REx GFP1-10 cells were generated by stably integrating a GFP1-10 construct at the single FRT locus through dual transfection of pcDNA5/FRT/TO/Intron-eGFP1-10 and pOG44 plasmids in a 1:10 ratio and subsequent selection with hygromycin at 200 µg/mL and blasticidin at 15 µg/mL.
  • CMfinder 0.4.1 is an RNA motif predictor that leverages both folding energy and sequence covariation [99].
  • Covariance models were then built with the Infernal suite's cmbuild and used to search for analogous structures around the start of the RT open reading frame (ORF).
  • the msr-msd regions of all retron candidates were manually inspected, especially those that did not return any hits via the automated pipeline.
  • RNAfold was used to inspect structured regions and to compare them to known msr-msd transcripts [100]
  • MAFFT Q-INS-i was used for multiple sequence alignments (MSAs), focusing on identifying conserved sequences in related genomes.
  • An MSA was constructed from the RT0-7 domain of 1,912 sequences, sourced from a dataset of 9,141 entries previously categorized as retron/retron-like RTs and an additional 16 RTs from experimentally verified retrons [33].
  • Phylogenetic trees were generated using FastTree, applying the WAG evolutionary model, combined with a discrete gamma model featuring twenty rate categories.
  • the RT tree was built using IQ-TREE v1.6.12, incorporating 1000 ultra-fast bootstraps (UFBoot) and the SH-like approximate likelihood ratio test (SH-aLRT) with 1000 iterations [103].
  • the best-fit model, identified by ModelFinder as LG+F+R10 due to its minimal Bayesian Information Criterion (BIC) value among 546 protein models, was used.
  • the RT clades' internal nodes exhibited UFBoot and SH-aLRT support values exceeding 85% [101].
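Model selection by minimal BIC, as mentioned above, follows the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of alignment sites, and L the maximized likelihood. The Python sketch below shows only that arithmetic; the function names, the candidate dictionary layout, and any numbers passed to it are hypothetical and do not reproduce the actual ModelFinder run.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model_by_bic(candidates: dict[str, tuple[float, int]], n_sites: int) -> str:
    """Return the name of the candidate with the lowest BIC.

    candidates maps a model name to (log-likelihood, number of free parameters).
    """
    if not candidates:
        raise ValueError("no candidate models supplied")
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))
```

Because the penalty term grows with the parameter count, a richer model is preferred only when its gain in log-likelihood outweighs the added k ln(n).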
  • Plasmid-based fluorescent reporter assays. 1.2 × 10^5 HEK293T cells were seeded in a 24-well plate 18-24 hours before transfection. 0.35 µg of the retron editor plasmid and 0.35 µg of the fluorescent reporter plasmid were co-transfected using Lipofectamine 2000 (Invitrogen). Cells were trypsinized and collected for flow analysis 72 hours after transfection. Flow analysis was conducted on a Novocyte flow cytometer (ACEA Biosciences). Cells were gated to exclude dead cells and doublets, and 10,000 cells were analyzed in all samples. Cells were then gated by FITC-A (x-axis) and PE-Texas Red-A (y-axis). The editing efficiency was reported as the percentage of cells in the quadrant of FITC-A and PE-Texas Red-A.
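The quadrant readout can be sketched as a simple threshold count over per-cell intensities. The text does not name the quadrant explicitly; the sketch below assumes it is the double-positive one (FITC-A and PE-Texas Red-A both above their gates). The function name, the event layout, and the threshold arguments are hypothetical, and live/singlet gating is assumed to have been applied beforehand.

```python
from typing import Iterable, Tuple


def percent_double_positive(events: Iterable[Tuple[float, float]],
                            fitc_gate: float, pe_texas_red_gate: float) -> float:
    """Percentage of events above both gates.

    events : (FITC-A, PE-Texas Red-A) intensity pairs for gated single live cells
    """
    events = list(events)
    if not events:
        raise ValueError("no events supplied")
    hits = sum(1 for fitc, pe in events if fitc > fitc_gate and pe > pe_texas_red_gate)
    return 100.0 * hits / len(events)
```

In practice the gate positions would be set from untransfected and single-color controls rather than chosen by hand.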
  • Genomic reporter assays. The fluorescent reporter was cloned in a plasmid designed for a Bxb1 recombinase-driven landing pad system [104]. This plasmid was transfected into landing pad HEK293T cells followed by doxycycline induction and AP1903 selection to generate stably integrated reporter cells. 1.2 × 10^5 HEK293T reporter cells were seeded in a 24-well plate 18-24 hours prior to transfection. 1 µg of the retron editor plasmid was transfected using Lipofectamine 2000. 48-72 hours after transfection, cells were treated with doxycycline to induce the expression of the fluorescent reporter. Cells were trypsinized and collected for flow analysis and genomic DNA extraction (Qiagen DNeasy Blood and Tissue kit).
  • HEK293T cells were seeded and transfected as described for the plasmid-based fluorescent reporter assay (see above). 48 hours post-transfection, cells were seeded into 15 mm glass-bottom cell culture dishes (NEST) and incubated for an additional 24 hours. Cells were then imaged with a Nikon Ti2 Spinning Disk Confocal Microscope at 20x magnification. 8858 x 8858 pixel images were acquired and processed using ImageJ.
  • U2OS Flp-In T-REx GFP1-10 cells were transfected with 1 µg retron editor plasmids that target either the N- or C-terminus of the protein of interest to insert a GFP11 fragment. 48-72 hours after transfection, cells were expanded into 10-cm plates. Cells with high GFP intensities were then sorted via a cell sorter (Sony MA900). For confocal imaging, cells were incubated with doxycycline for 48-72 hours before imaging to induce the expression of the GFP1-10 construct. Image acquisition was performed with live cells under spinning-disk confocal microscopy (Olympus).
  • Genomic samples were subjected to two rounds of PCR for NGS library preparation 2.
  • the first round of PCR was performed using the KOD One PCR master mix (TOYOBO). Primers were designed about 600 base pairs away from the cut site on each side to avoid amplifying the RT-generated ssDNA.
  • PCR products were gel purified and barcoded via a second round of PCR with Illumina P5/P7 adapters using Q5 HotStart High Fidelity master mix (NEB). PCR amplicons were sequenced on an Illumina Novaseq. Reads were demultiplexed using NovaSeq Reporter (Illumina).
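The primer placement rule for the first PCR round (priming outside the region covered by the RT-generated ssDNA) can be checked with interval arithmetic. The sketch below is an assumption-laden illustration: it treats the donor footprint as the cut site plus or minus one homology arm, uses 0-based half-open coordinates, and the function name and arguments are hypothetical.

```python
from typing import Tuple


def primers_flank_donor(cut_site: int, arm_len: int,
                        fwd_primer: Tuple[int, int],
                        rev_primer: Tuple[int, int]) -> bool:
    """True if both primer binding sites lie entirely outside the donor footprint.

    cut_site   : 0-based genomic coordinate of the cut
    arm_len    : homology arm length on each side of the cut (nt)
    fwd_primer : (start, end) half-open interval of the forward primer site
    rev_primer : (start, end) half-open interval of the reverse primer site
    """
    donor_start = cut_site - arm_len
    donor_end = cut_site + arm_len
    fwd_ok = fwd_primer[1] <= donor_start   # forward primer ends before the left arm
    rev_ok = rev_primer[0] >= donor_end     # reverse primer starts after the right arm
    return fwd_ok and rev_ok
```

With primers placed about 600 bp from the cut site, as described above, this check passes with a wide margin for arms of tens of nucleotides.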
  • the msr-msd was PCR amplified from a plasmid or gene block with a T7 promoter.
  • the ncRNA was generated using the HiScribe T7 High Yield RNA Synthesis Kit (NEB) according to the manufacturer's protocol. RNA products were purified using the RNeasy Mini Kit (Qiagen).
  • Injected embryos and un-injected sibling controls were incubated at 28.5 °C in fish water until 24 hr post fertilization, at which point, surviving embryos were euthanized in excess Tricaine (0.4% MS-222).
  • Genomic DNA was extracted from individual embryos using the HotSHOT method [106]. Briefly, embryos were transferred into 50 mM NaOH and heated to 95 °C for 20 min. The samples were neutralized with a quarter volume of 1 M Tris-HCl, pH 8, prior to downstream analysis.
  • genomic DNA was PCR amplified to extend the amplicon with Illumina adapters using the KOD One PCR master mix (TOYOBO). PCR amplicons were directly sequenced on an Illumina NovaSeq sequencer.
  • Cas12a-based retron editors further expand the potential target range and create opportunities for multiplexed retron editing due to Cas12a's ability to process its crRNA [84]. Based on the results, it appears that retron-RTs are compatible with other established (e.g., transcription activator-like effectors (TALEs)) and emerging RNA/DNA-guided nucleases. Importantly, further development of nickase-based retron editors can avoid the induction of double-stranded DNA breaks (Fig. 9).
  • Retron editors are uniquely capable of synthesizing high copy numbers of the repair template at the edit site [48, 86, 87]. Boosting ssDNA synthesis via RT and ncRNA engineering will further improve the processivity, fidelity, and ultimately, edit length and efficiency.
  • rational engineering of a retron RT-based prime editor boosted editing efficiency more than 8-fold [11]
  • the insertion and truncation site of the native msd, homology arm length, target/nontarget strand selection, and overall RNA structure can be modified.
  • RNA circularization, structured RNA pseudoknots, and chemical modifications also increase ncRNA stability in mammalian cells [88]. Further optimization in these directions can enhance the overall efficiency of retron editing.
  • General principles and predictive algorithms for msr-msd and repair template design will further improve retron editors.
  • SSTR was maximized with 30-60 nt homology arms [92, 93]
  • SSTR is a RAD52-dependent process in yeast and human cells, suggesting that nuclease-RAD52 and/or RT-RAD52 fusions may boost retron editing [75, 94]
  • SSTR competes with two error-prone DSB repair pathways: classical NHEJ and polymerase theta-mediated end joining (TMEJ) [52, 95]
  • Dual inhibition of NHEJ and TMEJ may further synergize with RAD52 fusions [47, 69]
  • Rational design of asymmetric templates, cleavage-blocking mutations, and dual Cas9 nickases can be done to maximize editing efficiency [93, 96]
  • Mechanistic studies of retron editor-mediated repair will further improve editing outcomes across all domains of life.
  • retron editors are emerging as a highly promising gene editing tool. Their unique ability to accurately insert or replace sizeable DNA segments opens up possibilities for correcting complex genetic mutations that were previously challenging to address. Compatibility with an all-RNA formulation opens new avenues for therapeutic delivery into cells and organisms. Additionally, retron editors are poised to broaden the scope of high-throughput functional screens, allowing for the characterization of complex genetic variants with single-base resolution. Integration of retron editors into existing screening pipelines holds great promise for advancing the understanding of gene function and regulation, ultimately paving the way for novel therapeutic interventions and biotechnological applications.
  • a bioinformatics discovery pipeline that identifies genes in the retron operon and annotates the RT-associated non-coding RNAs was developed. Open reading frames in metagenomic contigs are identified using established tools (e.g., Prodigal), while putative retron RTs are identified using Hidden Markov Models (HMMs). To reduce false positives, a bootstrapped phylogeny of retron RTs based on multiple sequence alignments (MSAs) was constructed, with non-retron RTs as outgroups. Accessory retron genes that are adjacent to the RT were annotated using the Pfam database.
  • the non-coding RNA that is recognized by the retron RT is annotated, based on conserved features of the ncRNA: it is almost always in an intergenic region between the RT and the next ORF, and the 5'-3' ends encode a long palindromic repeat.
  • covariance models of the msr-msd region were built using the experimentally validated msr-msd sequences.
  • the putative ncRNA is folded using ViennaRNA 2.0 to predict the msr-msd structure and visual verification is conducted on a subset of candidates. This approach ensures that the most likely RT-ncRNA candidates can be identified for downstream experimental validation.
  • the combination of phylogenetic clustering, the presence of expected accessory proteins, and the identification of plausible, adjacent msr-msd sequences provide a set of high-confidence retron systems for downstream characterization.
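One of the conserved ncRNA features used above, the palindromic repeat formed by the 5' and 3' ends, can be screened for with a simple end-to-end comparison. The Python sketch below measures how many nucleotides at the 5' end are perfectly reverse-complementary to the 3' end. It is a simplified heuristic and not the pipeline's actual implementation: it allows no mismatches, bulges, or offsets, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a nucleic acid sequence (U is read as T, output is DNA)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(seq: str) -> int:
    """Length of the perfect inverted repeat shared by the two ends of seq.

    The first k nt pair with the last k nt exactly when seq[:k] equals the
    first k nt of the reverse complement of seq.
    """
    s = seq.upper().replace("U", "T")
    rc = reverse_complement(s)
    limit = len(s) // 2  # the two repeat arms cannot overlap
    k = 0
    while k < limit and s[k] == rc[k]:
        k += 1
    return k
```

A candidate intergenic region could then be kept for folding and manual inspection only if this length exceeds a chosen cutoff; the cutoff itself would need to be calibrated on validated msr-msd sequences.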
  • The reporter is a "traffic light" HEK293T cell line that consists of a red fluorescent protein (RFP) with a deletion that renders it inactive, and a GFP that is separated by a T2A tag.
  • Expression of a retron-Cas9 fusion, along with the ncRNA, will generate a DNA break adjacent to the RFP deletion. This break can then be repaired by the retron-generated msDNA to restore RFP fluorescence in mammalian tissue culture cells. This is detected via fluorescence activated cell sorting.
  • Table 2 (continued) Table 3: Spacers for single guide RNAs (sgRNAs) and CRISPR RNAs (crRNAs) used in Example 1
  • ISSN 1674-800X, 1674-8018 (2024) (Nov. 2021).
  • ISSN 0021-9258 (July 1992). Shimamoto, T., Inouye, M. & Inouye, S. The formation of the 2',5'-phosphodiester linkage in the cDNA priming reaction by bacterial reverse transcriptase in a cell-free system, eng. The Journal of Biological Chemistry 270, 581-588. ISSN: 0021-9258 (Jan. 1995). Schimmel, J. et al. Modulating mutational outcomes and improving precise gene editing at CRISPR-Cas9-induced breaks by chemical inhibition of end-joining pathways, eng. Cell Reports 42, 112019. ISSN: 2211-1247 (Feb. 2023). Savic, N. et al.
  • NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.
  • CRISPR-Cas9 fusion to dominant-negative 53BP1 enhances HDR and inhibits NHEJ specifically at Cas9 target sites, en. Nature Communications 10, 2866.
  • Carusillo, A. et al. A novel Cas9 fusion protein promotes targeted genome editing with reduced mutational burden in primary human cells, en. Nucleic Acids Research 51, 4660- 4673. ISSN: 0305-1048, 1362-4962. (2024) (May 2023).
  • ISSN 0167-7799, 1879-3096. (2024) (Aug. 2023). Carlson-Stevermer, J. et al. Assembly of CRISPR ribonucleoproteins with biotinylated oligonucleotides via an RNA aptamer for precise gene editing, en. Nature Communications 8. Publisher: Nature Publishing Group, 1711. ISSN: 2041-1723. (2024) (Nov. 2017). Aird, E. J., Lovendahl, K. N., St. Martin, A., Harris, R. S. & Gordon, W. R. Increasing Cas9-mediated homology-directed repair efficiency through covalent tethering of DNA repair template, en. Communications Biology 1.
  • HMMER web server interactive sequence similarity searching, en. Nucleic Acids Research 39, W29-W37. ISSN: 0305-1048, 1362-4962. (2024) (July 2011). Eddy, S. R. Accelerated Profile HMM Searches, en. PLoS Computational Biology 7 (ed Pearson, W. R.) e1002195. ISSN: 1553-7358. (2024) (Oct. 2011). Yao, Z., Weinberg, Z. & Ruzzo, W. L. CMfinder - a covariance model based RNA motif finding algorithm, en. Bioinformatics 22, 445-452. ISSN: 1367-4811, 1367-4803. (2024) (Feb. 2006). Lorenz, R. et al. ViennaRNA Package 2.0. en. Algorithms for Molecular Biology 6, 26. ISSN: 1748-7188.
  • CRISPResso2 provides accurate and rapid genome editing sequence analysis, eng. Nature Biotechnology 37, 224-226. ISSN: 1546-1696 (Mar. 2019). Meeker, N. D., Hutchinson, S. A., Ho, L. & Trede, N. S. Method for Isolation of PCR-Ready Genomic DNA from Zebrafish Tissues. EN. BioTechniques. Publisher: Taylor & Francis. (2024) (Nov. 2007).
  • NRT-36AP018689.1-1528883-1529843 cgcacccttagcgaatgagcttacttagttcattggatagcgtttcgctatcctgcatacaatctgattcaatgccgcatga aatgtgcagagccagaatacagtagtttctggaactgcacattttcatccgcgacttaagacgtaagggtgtg
  • NRT-75CP002582.1-3047327-3048323 aaaagagcaactagattgaggcgattcgcctccttggaaaagggtactaagtttctgtcgcacaccaatttataagcttat aaattggtgtgcgacagaaatgaaataatagtagttgctctttttttt
  • Vrol (SEQ ID NO: 26): CACACCCTTA GCGAATGAGC TAACTTAGTT CATTGGATAG CGTTTCGCTA TCCTGCATAC
  • RFP reporter (SEQ ID NO: 115):
  • HBB (SEQ ID NO: 118):
  • CTGCCCAGGG CCTCACCACC AACTTCATCC ACGTTCACCT TGCCCCACAG GCTACATGCT
  • BRD8 (SEQ ID NO: 121): GGAGACTAGG AAGGAGGAGG CCTAAGGATG GGGCTTTTCT GTCACCAATC GCTACATGCT
  • RAB11 SEQ ID NO: 1233:
  • TAGAGTGCGA GAGCCCATGG CCTCACCTTT AAAGAGGTAG TCGTACTCGT CGTCCCGTGT GCCGCCGCCA CCTGTAATCC CAGCAGCATT TACATACTCA TGAAGGACCA TGTGGTCACG CATTGCGCGG CCGAGGAGCG AAAGGGCGGG AGCAGCAGTG GTATCTGTGG GACCAGGGGG


Abstract

Retron editors enable the insertion of a user-defined DNA segment into a host cell genome. To maximize the activity of retron editors, they were optimized through improved linkers, nuclear localization sequences, and RNA sequences. The resulting retron editor can programmably insert, delete, or modify the genome of a host cell or organism, including human genomes.
PCT/US2024/036763 2023-07-03 2024-07-03 Compositions and methods for precise genome editing using retrons Pending WO2025010350A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363511699P 2023-07-03 2023-07-03
US63/511,699 2023-07-03

Publications (2)

Publication Number Publication Date
WO2025010350A2 true WO2025010350A2 (fr) 2025-01-09
WO2025010350A3 WO2025010350A3 (fr) 2025-05-30

Family

ID=94172134

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/036763 Pending WO2025010350A2 (fr) 2023-07-03 2024-07-03 Compositions and methods for precise genome editing using retrons

Country Status (1)

Country Link
WO (1) WO2025010350A2 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120518784A (zh) * 2025-07-24 2025-08-22 Yazhouwan National Laboratory A nucleotide sequence editing method and a fusion protein with editing activity

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240263173A1 (en) * 2021-08-11 2024-08-08 The Board Of Trustees Of The Leland Stanford Junior University High-throughput precision genome editing in human cells

Also Published As

Publication number Publication date
WO2025010350A3 (fr) 2025-05-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24836578

Country of ref document: EP

Kind code of ref document: A2