
WO2024124204A2 - Retrotransposon compositions and methods of use - Google Patents

Retrotransposon compositions and methods of use

Info

Publication number
WO2024124204A2
Authority
WO
WIPO (PCT)
Prior art keywords
retrotransposase
cell
sequence
nucleic acid
seq
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2023/083232
Other languages
French (fr)
Other versions
WO2024124204A3 (en)
Inventor
Brian C. Thomas
Lisa ALEXANDER
Christopher Brown
Cindy CASTELLE
Daniela S.A. Goltsman
Sarah Laperriere
Morayma TEMOCHE-DIAZ
Anu Thomas
Mary Kaitlyn TSAI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Metagenomi Inc
Original Assignee
Metagenomi Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Metagenomi Inc
Priority to EP23901699.1A (EP4630544A2)
Publication of WO2024124204A2
Publication of WO2024124204A3
Anticipated expiration
Ceased

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62 DNA sequences coding for fusion proteins
    • C12N15/625 DNA sequences coding for fusion proteins containing a sequence coding for a signal sequence
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C12N9/12 Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C12N9/1241 Nucleotidyltransferases (2.7.7)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/90 Vectors containing a transposable element

Definitions

  • Transposable elements are movable DNA sequences and play a crucial role in gene function and evolution. While transposable elements are found in nearly all forms of life, their prevalence varies among organisms, with a large proportion of the eukaryotic genome encoding transposable elements.
  • the retrotransposase comprises an amino acid sequence having at least 80% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the retrotransposase comprises an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase is encoded by a nucleic acid having at least 75% sequence identity to any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 80% sequence identity to any one of SEQ ID NOs: 17-19, 24, and 76-81.
  • the retrotransposase is encoded by a nucleic acid sequence having at least 90% sequence identity to any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase has less than 80% sequence identity to a known retrotransposase. In some embodiments, the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR).
  • UTR untranslated region
  • the retrotransposase is configured to transpose the cargo nucleotide sequence via a ribonucleic acid polynucleotide intermediate.
  • the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.
  • the retrotransposase comprises one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of the retrotransposase.
  • the NLS comprises a sequence at least 80% identical to a sequence from the group consisting of SEQ ID NO: 49-64. In some embodiments, the NLS comprises SEQ ID NO: 50. In some embodiments, the NLS is proximal to the N-terminus of the retrotransposase. In some embodiments, the NLS comprises SEQ ID NO: 49. In some embodiments, the NLS is proximal to the C-terminus of the retrotransposase. In some embodiments, the retrotransposase is derived from an uncultivated microorganism.
  • polypeptides comprising a reverse transcriptase comprising an amino acid sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47 fused N- or C-terminally to a non-retrotransposase domain or an affinity tag.
  • the non-retrotransposase domain is an RNA-binding protein domain.
  • the RNA binding protein domain comprises a bacteriophage MS2 coat protein (MCP) domain.
  • nucleic acids encoding the engineered retrotransposase system described herein or the polypeptide described herein.
  • Described herein, in certain embodiments, are methods for modifying a target nucleic acid sequence comprising contacting the target nucleic acid sequence using the engineered nuclease system described herein.
  • modifying the target nucleic acid sequence comprises binding, nicking, or cleaving, the target nucleic acid sequence.
  • the target nucleic acid sequence comprises genomic DNA, viral DNA, viral RNA, or bacterial DNA.
  • the target nucleic acid sequence comprises deoxyribonucleic acid (DNA).
  • the modification is in vitro.
  • the modification is in vivo.
  • the modification is ex vivo.
  • the vector is a plasmid, a minicircle, a CELiD, an adeno-associated virus (AAV) derived virion, or a lentivirus.
  • the cell is a eukaryotic cell.
  • the cell is a mammalian cell.
  • the cell is an immortalized cell.
  • the cell is an insect cell.
  • the cell is a yeast cell. In some embodiments, the cell is a plant cell. In some embodiments, the cell is a fungal cell. In some embodiments, the cell is a prokaryotic cell. In some embodiments, the cell is an A549, HEK-293, HEK-293T, BHK, CHO, HeLa, MRC5, Sf9, Cos-1, Cos-7, Vero, BSC 1, BSC 40, BMT 10, WI38, HeLa, Saos, C2C12, L cell, HT1080, HepG2, Huh7, K562, primary cell, or a derivative thereof. In some embodiments, the cell is an engineered cell. In some embodiments, the cell is a stable cell.
  • the present disclosure provides for an engineered retrotransposase system, comprising: (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence, wherein the cargo nucleotide sequence is configured to interact with a retrotransposase; and (b) a retrotransposase, wherein: (i) the retrotransposase is configured to transpose the cargo nucleotide sequence to a target nucleic acid locus; and (ii) the retrotransposase is derived from an uncultivated microorganism.
  • the retrotransposase comprises a sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the retrotransposase comprises a reverse transcriptase domain. In some embodiments, the retrotransposase further comprises one or more zinc finger domains. In some embodiments, the retrotransposase further comprises an endonuclease domain. In some embodiments, the retrotransposase has less than 80% sequence identity to a known retrotransposase. In some embodiments, the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR). In some embodiments, the retrotransposase is configured to transpose the cargo nucleotide sequence via a ribonucleic acid polynucleotide intermediate.
  • the retrotransposase comprises one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of the retrotransposase.
  • NLS nuclear localization sequences
  • the NLS comprises a sequence at least 80% identical to a sequence from the group consisting of SEQ ID NO: 49-64.
  • sequence identity is determined by a BLASTP, CLUSTALW, MUSCLE, MAFFT, or CLUSTALW with the parameters of the Smith-Waterman homology search algorithm.
  • sequence identity is determined by the BLASTP homology search algorithm using parameters of a wordlength (W) of 3, an expectation (E) of 10, and a BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment.
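As an illustration only, the BLASTP settings listed above could be passed to the NCBI BLAST+ `blastp` command line roughly as sketched below. The FASTA file names are hypothetical, and mapping "conditional compositional score matrix adjustment" to `-comp_based_stats 2` is an assumption based on the BLAST+ option descriptions, not something the disclosure states. The sketch only builds the argument list; it does not run BLAST.

```python
from typing import List


def blastp_args(query_fasta: str, subject_fasta: str) -> List[str]:
    """Build a blastp argument list mirroring the parameters in the text.

    File names are supplied by the caller and are hypothetical here.
    """
    return [
        "blastp",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-word_size", "3",          # wordlength (W) of 3
        "-evalue", "10",            # expectation (E) of 10
        "-matrix", "BLOSUM62",      # BLOSUM62 scoring matrix
        "-gapopen", "11",           # gap existence cost of 11
        "-gapextend", "1",          # gap extension cost of 1
        "-comp_based_stats", "2",   # assumed: conditional compositional adjustment
        "-outfmt", "6 qseqid sseqid pident length evalue",
    ]


# Hypothetical file names, for illustration only.
args = blastp_args("candidate.faa", "reference.faa")
print(" ".join(args))
```

If BLAST+ is installed, the list could be handed to `subprocess.run(args, capture_output=True, text=True)`; the `pident` column of the tabular output would then report percent identity over the aligned region.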
  • the present disclosure provides for an engineered retrotransposase system, comprising: (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence, wherein the cargo nucleotide sequence is configured to interact with a retrotransposase; and (b) a retrotransposase, wherein: (i) the retrotransposase is configured to transpose the cargo nucleotide sequence to a target nucleic acid locus; and (ii) the retrotransposase comprises a sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the retrotransposase is derived from an uncultivated microorganism.
  • the retrotransposase comprises a reverse transcriptase domain. In some embodiments, the retrotransposase further comprises one or more zinc finger domains. In some embodiments, the retrotransposase further comprises an endonuclease domain. In some embodiments, the retrotransposase has less than 80% sequence identity to a known retrotransposase. In some embodiments, the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR). In some embodiments, the retrotransposase is configured to transpose the cargo nucleotide sequence via a ribonucleic acid polynucleotide intermediate.
  • the sequence identity is determined by a BLASTP, CLUSTALW, MUSCLE, MAFFT, or CLUSTALW with the parameters of the Smith-Waterman homology search algorithm.
  • the sequence identity is determined by the BLASTP homology search algorithm using parameters of a wordlength (W) of 3, an expectation (E) of 10, and a BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment.
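The percent-identity thresholds recited above (75%, 80%, 90%, 95%) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned by one of the named tools and that identity is counted as identical columns divided by alignment length, which is how BLAST's tabular `pident` is reported; the disclosure itself does not spell out the denominator. The two short sequences are invented placeholders, not SEQ ID NOs.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pairwise alignment ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as identical only if both residues match and are not gaps.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold_pct: float) -> bool:
    """True if the alignment has at least `threshold_pct` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold_pct


# Hypothetical pre-aligned sequences (9 columns, 7 identical).
candidate = "MKTAY-IAK"
reference = "MKSAYQIAK"

for threshold in (75, 80, 90, 95):
    print(threshold, meets_threshold(candidate, reference, threshold))
```

With these placeholder sequences, 7 of 9 columns match, so the pair clears the 75% threshold but not the 80% one.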
  • the present disclosure provides for a deoxyribonucleic acid polynucleotide encoding the engineered retrotransposase system of any one of the aspects or embodiments described herein.
  • the present disclosure provides for a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes a retrotransposase, and wherein the retrotransposase is derived from an uncultivated microorganism, wherein the organism is not the uncultivated microorganism.
  • the retrotransposase comprises at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the retrotransposase comprises a sequence encoding one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of the retrotransposase.
  • NLSs nuclear localization sequences
  • the NLS comprises a sequence selected from SEQ ID NOs: 49-64. In some embodiments, the NLS comprises SEQ ID NO: 50. In some embodiments, the NLS is proximal to the N-terminus of the retrotransposase. In some embodiments, the NLS comprises SEQ ID NO: 49. In some embodiments, the NLS is proximal to the C-terminus of the retrotransposase. In some embodiments, the organism is prokaryotic, bacterial, eukaryotic, fungal, plant, mammalian, rodent, or human.
  • the present disclosure provides for a vector comprising the nucleic acid of any one of the aspects or embodiments described herein.
  • the method further comprises a nucleic acid encoding a cargo nucleotide sequence configured to form a complex with the retrotransposase.
  • the vector is a plasmid, a minicircle, a CELiD, an adeno-associated virus (AAV) derived virion, or a lentivirus.
  • the present disclosure provides for a cell comprising the vector of any one of the aspects or embodiments described herein.
  • the present disclosure provides for a method of manufacturing a retrotransposase, comprising cultivating the cell of any one of the aspects or embodiments described herein.
  • the present disclosure provides for a method for binding, nicking, cleaving, marking, modifying, or transposing a double-stranded deoxyribonucleic acid polynucleotide, comprising: (a) contacting the double-stranded deoxyribonucleic acid polynucleotide with a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid locus; and (b) wherein the retrotransposase comprises a sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the retrotransposase is derived from an uncultivated microorganism.
  • the retrotransposase comprises a reverse transcriptase domain. In some embodiments, the retrotransposase further comprises one or more zinc finger domains. In some embodiments, the retrotransposase further comprises an endonuclease domain. In some embodiments, the retrotransposase has less than 80% sequence identity to a known retrotransposase. In some embodiments, the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR). In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide is transposed via a ribonucleic acid polynucleotide intermediate.
  • the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.
  • the present disclosure provides for a method of modifying a target nucleic acid locus, the method comprising delivering to the target nucleic acid locus the engineered retrotransposase system of any one of the aspects or embodiments described herein, wherein the retrotransposase is configured to transpose the cargo nucleotide sequence to the target nucleic acid locus, and wherein the complex is configured such that upon binding of the complex to the target nucleic acid locus, the complex modifies the target nucleic acid locus.
  • the target nucleic acid locus comprises binding, nicking, cleaving, marking, modifying, or transposing the target nucleic acid locus.
  • the target nucleic acid locus comprises deoxyribonucleic acid (DNA). In some embodiments, the target nucleic acid locus comprises genomic DNA, viral DNA, or bacterial DNA. In some embodiments, the target nucleic acid locus is in vitro. In some embodiments, the target nucleic acid locus is within a cell. In some embodiments, the cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, a human cell, or a primary cell. In some embodiments, the cell is a primary cell. In some embodiments, the primary cell is a T cell.
  • the primary cell is a hematopoietic stem cell (HSC).
  • delivering the engineered retrotransposase system to the target nucleic acid locus comprises delivering the nucleic acid of any one of the aspects or embodiments described herein or the vector of any one of the aspects or embodiments described herein.
  • delivering the engineered retrotransposase system to the target nucleic acid locus comprises delivering a nucleic acid comprising an open reading frame encoding the retrotransposase.
  • the nucleic acid comprises a promoter to which the open reading frame encoding the retrotransposase is operably linked.
  • delivering the engineered retrotransposase system to the target nucleic acid locus comprises delivering a capped mRNA containing the open reading frame encoding the retrotransposase. In some embodiments, delivering the engineered retrotransposase system to the target nucleic acid locus comprises delivering a translated polypeptide. In some embodiments, the retrotransposase does not induce a break at or proximal to the target nucleic acid locus.
  • In some aspects, the present disclosure provides for a host cell comprising an open reading frame encoding a heterologous retrotransposase having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47 or a variant thereof.
  • the host cell is an E. coli cell.
  • the E. coli cell is a λDE3 lysogen or the E. coli cell is a BL21 (DE3) strain.
  • the E. coli cell has an ompT lon genotype.
  • the open reading frame is operably linked to a T7 promoter sequence, a T7-lac promoter sequence, a lac promoter sequence, a tac promoter sequence, a trc promoter sequence, a ParaBAD promoter sequence, a PrhaBAD promoter sequence, a T5 promoter sequence, a cspA promoter sequence, an araPBAD promoter, a strong leftward promoter from phage lambda (pL promoter), or any combination thereof.
  • the open reading frame comprises a sequence encoding an affinity tag linked in-frame to a sequence encoding the retrotransposase.
  • the affinity tag is an immobilized metal affinity chromatography (IMAC) tag.
  • the IMAC tag is a polyhistidine tag.
  • the affinity tag is a myc tag, a human influenza hemagglutinin (HA) tag, a maltose binding protein (MBP) tag, a glutathione S-transferase (GST) tag, a streptavidin tag, a FLAG tag, or any combination thereof.
  • the affinity tag is linked in-frame to the sequence encoding the retrotransposase via a linker sequence encoding a protease cleavage site.
  • the protease cleavage site is a tobacco etch virus (TEV) protease cleavage site, a PreScission® protease cleavage site, a Thrombin cleavage site, a Factor Xa cleavage site, an enterokinase cleavage site, or any combination thereof.
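The tag, linker, and cleavage-site arrangement described above can be sketched at the protein level. This is a simplified illustration, not the disclosed constructs: it assumes an N-terminal polyhistidine tag of six residues, the canonical TEV recognition sequence ENLYFQG (cleaved between Q and G), and a made-up placeholder in place of a retrotransposase sequence. The source describes the linkage at the nucleic acid level; working with the translated sequence is a simplification.

```python
from typing import Tuple

HIS6_TAG = "HHHHHH"     # assumed six-histidine IMAC tag
TEV_SITE = "ENLYFQG"    # canonical TEV site; cleavage occurs between Q and G


def build_tagged_protein(protein: str, tag: str = HIS6_TAG,
                         site: str = TEV_SITE) -> str:
    """Return initiator Met + tag + cleavage-site linker + protein."""
    return "M" + tag + site + protein


def cleave_tev(fusion: str, site: str = TEV_SITE) -> Tuple[str, str]:
    """Split a fusion at the TEV site; returns (tag fragment, released protein)."""
    idx = fusion.find(site)
    if idx < 0:
        raise ValueError("no TEV site found in fusion protein")
    cut = idx + len(site) - 1  # cut after the Q, leaving G on the protein
    return fusion[:cut], fusion[cut:]


# Placeholder sequence standing in for a retrotransposase (hypothetical).
placeholder_protein = "ACDEFGHIK"
fusion = build_tagged_protein(placeholder_protein)
tag_fragment, released = cleave_tev(fusion)
print(fusion)
print(tag_fragment, released)
```

After cleavage the released protein keeps a single N-terminal glycine from the recognition site, while the histidine-containing fragment retains the affinity for an IMAC resin.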
  • the open reading frame is codon-optimized for expression in the host cell.
  • the open reading frame is provided on a vector.
  • the open reading frame is integrated into a genome of the host cell.
  • the present disclosure provides for a culture comprising the host cell of any one of the aspects or embodiments described herein in compatible liquid medium.
  • the present disclosure provides for a method of producing a retrotransposase, comprising cultivating the host cell of any one of the aspects or embodiments described herein in compatible growth medium.
  • the method further comprising inducing expression of the retrotransposase by addition of an additional chemical agent or an increased amount of a nutrient.
  • the additional chemical agent or increased amount of a nutrient comprises isopropyl β-D-1-thiogalactopyranoside (IPTG) or additional amounts of lactose.
  • the method further comprising isolating the host cell after the cultivation and lysing the host cell to produce a protein extract.
  • the method further comprises subjecting the protein extract to IMAC, or ion-affinity chromatography.
  • the open reading frame comprises a sequence encoding an IMAC affinity tag linked in-frame to a sequence encoding the retrotransposase.
  • the IMAC affinity tag is linked in-frame to the sequence encoding the retrotransposase via a linker sequence encoding a protease cleavage site.
  • the protease cleavage site comprises a tobacco etch virus (TEV) protease cleavage site, a PreScission® protease cleavage site, a Thrombin cleavage site, a Factor Xa cleavage site, an enterokinase cleavage site, or any combination thereof.
  • the method further comprises cleaving the IMAC affinity tag by contacting a protease corresponding to the protease cleavage site to the retrotransposase.
  • the method further comprises performing subtractive IMAC affinity chromatography to remove the affinity tag from a composition comprising the retrotransposase.
  • the present disclosure provides for a method of disrupting a locus in a cell, comprising contacting to the cell a composition comprising: (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence, wherein the cargo nucleotide sequence is configured to interact with a retrotransposase; and (b) a retrotransposase, wherein: (i) the retrotransposase is configured to transpose the cargo nucleotide sequence to a target nucleic acid locus; (ii) the retrotransposase comprises a sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47; and (iii) the retrotransposase has at least equivalent transposition activity to a known retrotransposase in a cell.
  • the transposition activity is measured in vitro by introducing the retrotransposase to cells comprising the target nucleic acid locus and detecting transposition of the target nucleic acid locus in the cells.
  • the composition comprises 20 pmoles or less of the retrotransposase. In some embodiments, the composition comprises 1 pmol or less of the retrotransposase.
  • FIG. 1 depicts the genomic context of a bacterial retrotransposon.
  • MG140-34 is a predicted retrotransposase (arrow) encoding a reverse transcriptase domain. Regions flanking the retrotransposase display secondary structure that possibly represent binding sites for the retrotransposase (secondary structure boxes and zoomed images).
  • FIG. 2 shows microbial MG retrotransposases (black branches on clade 4) are more closely related to Eukaryotic than viral retrotransposases (grey branches on clade 6).
  • Clade 1 Telomerase reverse transcriptases
  • clade 2 Group II intron reverse transcriptases
  • clade 3 Eukaryotic R1 type retrotransposases
  • clade 4 microbial and Eukaryotic R2 retrotransposases
  • clade 5 Eukaryotic retrovirus-related reverse transcriptases
  • clade 6 viral reverse transcriptases.
  • FIG. 3 depicts Clades 3 and 4 from the phylogenetic gene tree from (A).
  • Some microbial MG retrotransposases contain multiple Zn-finger motifs (vertical rectangles), the conserved RVT_1 reverse transcriptase domain, and APE/RLE or other endonuclease domains (top and bottom panel).
  • Some microbial MG retrotransposases lack an endonuclease domain (mid-panel).
  • FIG. 4 depicts a phylogenetic tree inferred from a multiple sequence alignment of the reverse transcriptase domain from diverse enzymes. RT sequences were derived from DNA, as well as RNA assemblies. Reference RTs were included in the tree for classification purposes.
  • FIG. 5A depicts a phylogenetic tree inferred from a multiple sequence alignment of RT domains identified from families of RTs (MG148).
  • FIG. 5B depicts genomic context of MG140-34-R2 RT. Predicted genes not associated with the RT are displayed as white arrows.
  • FIG. 5C depicts nucleotide sequence alignment of four members of the MG148 family indicating conserved regions (boxes underneath the sequence) upstream of the RT (arrow annotated over the consensus sequence).
  • FIG. 6 depicts screening of in vitro activity of RTns family of enzymes by qPCR (MG148). Activity was detected by qPCR using primers that amplify the full-length cDNA product derived from a primer extension reaction containing the respective RT. Samples are derived from RT reactions containing 100 nM substrate.
  • the negative control is a no-template water in the in vitro transcription/translation system reaction.
  • FIG. 7A depicts a phylogenetic tree inferred from a multiple sequence alignment of full-length Group II intron RTs identified sequences of Class C.
  • FIG. 7B depicts a summary table of the MG153 family of Group II introns.
  • AAI average pairwise amino acid identity of family members to reference Group II intron sequences.
  • FIGs. 8A and 8B depict screening of in vitro activity of GII intron Class C candidates MG153-22, MG153-23, and MG153-24 by primer extension assay.
  • FIG. 8A lane numbers correspond to the following: 1-PURExpress (in vitro transcription/translation system) no template control, 2-MMLV control RT, 3-TGIRT-III control RT, 4-MarathonRT control RT, 5-7 correspond to candidates MG153-22 through 24. Numbering in bold corresponds to gel lanes with active candidates. Results are representative of two independent experiments.
  • FIG. 8B depicts detection of full-length cDNA production by qPCR. Dark grey bars correspond to RTs that generate product at least 10-fold above background. Results were determined from two technical replicates.
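The "at least 10-fold above background" criterion can be made concrete with a short sketch. The source does not state how fold change was derived from the qPCR data, so this assumes the common delta-Cq approach with perfect doubling per cycle and averaging of the two technical replicates; all Cq values below are invented for illustration.

```python
from statistics import mean
from typing import Sequence


def fold_over_background(cq_sample: Sequence[float],
                         cq_background: Sequence[float],
                         efficiency: float = 2.0) -> float:
    """Fold signal over background from replicate Cq values.

    Assumes amplification multiplies product by `efficiency` each cycle
    (2.0 means perfect doubling), so a lower Cq means more template.
    """
    delta_cq = mean(cq_background) - mean(cq_sample)
    return efficiency ** delta_cq


def is_active(cq_sample: Sequence[float], cq_background: Sequence[float],
              min_fold: float = 10.0) -> bool:
    """Apply the 10-fold-above-background cutoff described in the text."""
    return fold_over_background(cq_sample, cq_background) >= min_fold


# Hypothetical technical replicates (two per condition).
no_template_control = [25.0, 25.5]
candidate_rt = [20.0, 20.5]
weak_rt = [24.0, 24.5]

print(fold_over_background(candidate_rt, no_template_control))
print(is_active(candidate_rt, no_template_control))
print(is_active(weak_rt, no_template_control))
```

Under these assumptions a 5-cycle difference corresponds to a 32-fold signal and passes the cutoff, whereas a 1-cycle difference (2-fold) does not.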
  • FIG. 9 depicts screening to assess the ability of indicated control RTs and GII intron Class C candidates to synthesize cDNA in mammalian cells. Detection of 542 bp PCR products by D1000 TapeStation for MG153-23. Lanes not relevant for the described experiment are covered by black boxes.
  • FIG. 10 depicts the genomic context of the MG160-7 retron-like single-domain RT.
  • the region upstream from the RT (dotted box) is conserved across MG160 members and folds into secondary structures (inset) that may be required for activity and function.
  • FIGs. 11A and 11B depict screening of in vitro activity of retron-like candidate MG160-7 by primer extension assay.
  • FIG. 11A lane numbers correspond to the following samples: 1-PURExpress (in vitro transcription/translation system) no template control, 2-MMLV control RT, 3-TGIRT-III control RT, 4: MG160-7.
  • FIG. 11B depicts quantification of full-length cDNA production by qPCR. Dark grey bars correspond to RTs that generate product at least 10-fold above background. Results were determined from two technical replicates.
  • FIG. 12 depicts a screening of the ability of MG153 GII derived RTs to synthesize cDNA in mammalian cells. 542 bp cDNA synthesis PCR products were detected by Taqman qPCR. cDNA activity was normalized to the activity of the TGIRT control, where TGIRT represents a value of 1. The Y axis is shown on a log10 scale.
  • FIGs. 13A and 13B depict protein expression of MG153 GII derived RTs by immunoblots.
  • FIG. 13A Cells were transfected with plasmids containing the candidate RTs and protein expression was evaluated by immunoblot, detecting the HA peptide fused to the N termini of the RTs. All lanes were normalized to total protein concentration. Lanes not relevant for the described experiment in FIG. 13A are covered by black boxes.
  • FIG. 13B Table of expected molecular sizes for tested RTs.
  • FIG. 14 depicts relative activity of MG153-23 GII derived RT normalized to protein expression. cDNA synthesis was detected by Taqman qPCR, protein expression was detected by immunoblots. Activity relative to TGIRT was normalized per total protein concentration. Y axis is shown in a linear scale.
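The two normalizations described for FIG. 12 and FIG. 14 (activity relative to TGIRT, then activity per unit of expressed protein) can be sketched as simple ratios. The exact formulas are not given in the source, so this is one plausible reading; the signal and protein values are invented, and the function names are hypothetical.

```python
def relative_activity(cdna_signal: float, control_signal: float) -> float:
    """cDNA signal relative to the control RT, so the control equals 1."""
    return cdna_signal / control_signal


def activity_per_protein(cdna_signal: float, protein_level: float,
                         control_signal: float, control_protein: float) -> float:
    """Relative activity after dividing each signal by its protein level."""
    return (cdna_signal / protein_level) / (control_signal / control_protein)


# Hypothetical measurements in arbitrary units.
tgirt_signal, tgirt_protein = 100.0, 1.0
candidate_signal, candidate_protein = 50.0, 0.25

print(relative_activity(candidate_signal, tgirt_signal))
print(activity_per_protein(candidate_signal, candidate_protein,
                           tgirt_signal, tgirt_protein))
```

In this invented example the candidate shows half the raw activity of the control but is expressed at a quarter of the level, so its activity per unit protein is twice that of the control. This is the kind of difference that normalizing to expression is meant to expose.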
  • FIGs. 15A-15C depict a screen of the ability of indicated control RTs and candidates RTs to synthesize cDNA in mammalian cells.
  • FIG. 15A depicts a schematic illustration showing the methodology used to detect cDNA synthesis in mammalian cells.
  • the first (FAM) and last (HEX) 100 bps of a 4.1 kb RNA template are detected using Taqman based qPCR.
  • Taqman qPCR was used to detect the first (FAM probe) and last (HEX probe) 100 bp PCR products amplified from cDNA synthesized from an RNA template by MG148 family of non-LTR retrotransposon derived RTs (FIG. 15B) and retron-like MG160-7 (FIG. 15C).
  • SEQ ID NOs: 1-16 show the full-length peptide sequences of MG140 transposition proteins.
  • SEQ ID NOs: 32-41 show the full-length peptide sequences of MG148 reverse transcriptase proteins.
  • SEQ ID NOs: 25-31 show the nucleotide sequences of genes encoding HA-His-tagged MG148 reverse transcriptase proteins.
  • SEQ ID NOs: 76-80 show the nucleotide sequences of genes encoding MG148 reverse transcriptase proteins optimized for expression in mammalian cells.
  • SEQ ID NOs: 42-44 show the full-length peptide sequences of MG153 reverse transcriptase proteins.
  • SEQ ID NOs: 17-19 show the nucleotide sequences of E. coli codon optimized genes encoding MG153 reverse transcriptase proteins.
  • SEQ ID NOs: 20-23 show the nucleotide sequences of genes encoding strep-tagged MG153 reverse transcriptase proteins.
  • SEQ ID NOs: 45-47 show the full-length peptide sequences of MG160 reverse transcriptase proteins.
  • SEQ ID NO: 24 shows the nucleotide sequence of an E. coli codon optimized gene encoding an MG160 reverse transcriptase protein.
  • SEQ ID NO: 48 shows the nucleotide sequence of a gene encoding an MG160 reverse transcriptase protein optimized for expression in mammalian cells and cloned into a tethered spCas9 (H840A) plasmid.
  • SEQ ID NO: 81 shows the nucleotide sequences of genes encoding MG160 reverse transcriptase proteins optimized for expression in mammalian cells.
  • SEQ ID NOs: 66-69 show the nucleotide sequences of primers.
  • SEQ ID NOs: 70-71 show the nucleotide sequences of Taqman probes for qPCR.
  • SEQ ID NO: 65 shows the nucleotide sequence of an RNA template for cDNA synthesis.
  • SEQ ID NOs: 72-75 show the nucleotide sequences of genes encoding control reverse transcriptase proteins optimized for expression in mammalian cells.
  • the term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within one or more than one standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 15%, up to 10%, up to 5%, or up to 1% of a given value.
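The percentage-range reading of "about" can be expressed as a small check. This sketch covers only the "up to N% of a given value" alternative from the definition above, not the standard-deviation alternative, and the numbers are arbitrary examples.

```python
from typing import List


def within_about(observed: float, stated: float, tolerance_pct: float) -> bool:
    """True if `observed` lies within +/- tolerance_pct percent of `stated`."""
    return abs(observed - stated) <= abs(stated) * tolerance_pct / 100.0


def satisfied_tolerances(observed: float, stated: float) -> List[int]:
    """List which of the recited ranges (20, 15, 10, 5, 1 percent) cover the value."""
    return [pct for pct in (20, 15, 10, 5, 1)
            if within_about(observed, stated, pct)]


print(within_about(108, 100, 10))
print(within_about(125, 100, 20))
print(satisfied_tolerances(104, 100))
```

For example, an observed value of 104 against a stated value of 100 falls within the 20%, 15%, 10%, and 5% ranges but not the 1% range.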
  • nucleotide refers to a base-sugar-phosphate combination.
  • Contemplated nucleotides include naturally occurring nucleotides and synthetic nucleotides.
  • Nucleotides are monomeric units of a nucleic acid sequence (e.g., deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)).
  • nucleotide includes ribonucleoside triphosphates adenosine triphosphate (ATP), uridine triphosphate (UTP), cytidine triphosphate (CTP), guanosine triphosphate (GTP) and deoxyribonucleoside triphosphates such as dATP, dCTP, dITP, dUTP, dGTP, dTTP, or derivatives thereof.
  • Such derivatives include, for example, [αS]dATP, 7-deaza-dGTP and 7-deaza-dATP, and nucleot
  • nucleotide as used herein encompasses dideoxyribonucleoside triphosphates (ddNTPs) and their derivatives.
  • Illustrative examples of ddNTPs include, but are not limited to, ddATP, ddCTP, ddGTP, ddITP, and ddTTP.
  • a nucleotide may be unlabeled or detectably labeled, such as using moieties comprising optically detectable moieties (e.g., fluorophores) or quantum dots.
  • Detectable labels include, for example, radioactive isotopes, fluorescent labels, chemiluminescent labels, bioluminescent labels, and enzyme labels.
  • Fluorescent labels of nucleotides include but are not limited to fluorescein, 5-carboxyfluorescein (FAM), 2'7'-dimethoxy-4'5-dichloro-6-carboxyfluorescein (JOE), rhodamine, 6-carboxyrhodamine (R6G), N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA), 6-carboxy-X-rhodamine (ROX), 4-(4'-dimethylaminophenylazo)benzoic acid (DABCYL), Cascade Blue, Oregon Green, Texas Red, Cyanine and 5-(2'-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS).
  • fluorescently labeled nucleotides include [R6G]dUTP, [TAMRA]dUTP, [R110]dCTP, [R6G]dCTP, [TAMRA]dCTP, [JOE]ddATP, [R6G]ddATP, [FAM]ddCTP, [R110]ddCTP, [TAMRA]ddGTP, [ROX]ddTTP, [dR6G]ddATP, [dR110]ddCTP, [dTAMRA]ddGTP, and [dROX]ddTTP, available from Perkin Elmer, Foster City, Calif.
  • nucleotide encompasses chemically modified nucleotides.
  • An exemplary chemically-modified nucleotide is biotin-dNTP.
  • biotinylated dNTPs include biotin-dATP (e.g., bio-N6-ddATP, biotin-14-dATP), biotin-dCTP (e.g., biotin-11-dCTP, biotin-14-dCTP), and biotin-dUTP (e.g., biotin-11-dUTP, biotin-16-dUTP, biotin-20-dUTP).
  • polynucleotide, oligonucleotide, and nucleic acid refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof, either in single-, double-, or multistranded form.
  • Contemplated polynucleotides include a gene or fragment thereof.
  • Exemplary polynucleotides include, but are not limited to, DNA, RNA, coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, cell-free polynucleotides including cell-free DNA (cfDNA) and cell-free RNA (cfRNA), nucleic acid probes, and primers.
  • a T means U (Uracil) in RNA and T (Thymine) in DNA.
  • a polynucleotide can be exogenous or endogenous to a cell and/or exist in a cell-free environment.
  • the term polynucleotide encompasses modified polynucleotides (e.g., altered backbone, sugar, or nucleobase). If present, modifications to the nucleotide structure are imparted before or after assembly of the polymer.
  • Non-limiting examples of modifications include: 5-bromouracil, peptide nucleic acid, xeno nucleic acid, morpholinos, locked nucleic acids, glycol nucleic acids, threose nucleic acids, dideoxynucleotides, cordycepin, 7-deaza-GTP, fluorophores (e.g., rhodamine or fluorescein linked to the sugar), thiol-containing nucleotides, biotin-linked nucleotides, fluorescent base analogs, CpG islands, methyl-7-guanosine, methylated nucleotides, inosine, thiouridine, pseudouridine, dihydrouridine, queuosine, and wyosine.
  • the sequence of nucleotides may be interrupted by non-nucleotide components.
  • transfection refers to introduction of a nucleic acid into a cell by non-viral or viral-based methods.
  • the nucleic acid molecules may be gene sequences encoding complete proteins or functional portions thereof.
  • peptide refers to a polymer of at least two amino acid residues joined by peptide bond(s). This term does not connote a specific length of polymer, nor is it intended to imply or distinguish whether the peptide is produced using recombinant techniques, chemical or enzymatic synthesis, or is naturally occurring. The terms apply to naturally occurring amino acid polymers as well as amino acid polymers comprising at least one modified amino acid. In some cases, the polymer is interrupted by non-amino acids. The terms include amino acid chains of any length, including full length proteins, and proteins with or without secondary or tertiary structure (e.g., domains).
  • amino acid polymer that has been modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, oxidation, and any other manipulation such as conjugation with a labeling component.
  • amino acid and amino acids refer to natural and non-natural amino acids, including, but not limited to, modified amino acids.
  • Modified amino acids include amino acids that have been chemically modified to include a group or a chemical moiety not naturally present on the amino acid.
  • amino acid includes both D-amino acids and L-amino acids.
  • non-native refers to a nucleic acid or polypeptide sequence that is non-naturally occurring.
  • Non-native refers to a non-naturally occurring nucleic acid or polypeptide sequence that comprises modifications such as mutations, insertions, or deletions.
  • the term non-native encompasses fusion nucleic acids or polypeptides that encode or exhibit an activity (e.g., enzymatic activity, methyltransferase activity, acetyltransferase activity, kinase activity, ubiquitinating activity, etc.) of the nucleic acid or polypeptide sequence to which the non-native sequence is fused.
  • a non-native nucleic acid or polypeptide sequence includes those linked to a naturally-occurring nucleic acid or polypeptide sequence (or a variant thereof) by genetic engineering to generate a chimeric nucleic acid or polypeptide sequence encoding a chimeric nucleic acid or polypeptide.
  • promoter refers to the regulatory DNA region which controls transcription or expression of a polynucleotide (e.g., a gene) and which may be located adjacent to or overlapping a nucleotide or region of nucleotides at which RNA transcription is initiated.
  • a promoter may contain specific DNA sequences which bind protein factors, often referred to as transcription factors, which facilitate binding of RNA polymerase to the DNA leading to gene transcription.
  • Eukaryotic basal promoters typically, though not necessarily, contain a TATA-box and/or a CAAT box.
  • expression refers to the process by which a nucleic acid sequence or a polynucleotide is transcribed from a DNA template (such as into mRNA or other RNA transcript) and/or the process by which a transcribed mRNA is subsequently translated into peptides, polypeptides, or proteins. Transcripts and encoded polypeptides may be collectively referred to as “gene product.” If the polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell.
  • operably linked refers to an arrangement of genetic elements, e.g., a promoter, an enhancer, a polyadenylation sequence, etc., wherein an operation (e.g., movement or activation) of a first genetic element has some effect on the second genetic element.
  • the effect on the second genetic element can be, but need not be, of the same type as operation of the first genetic element.
  • two genetic elements are operably linked if movement of the first element causes an activation of the second element.
  • a regulatory element which may comprise promoter and/or enhancer sequences, is operatively linked to a coding region if the regulatory element helps initiate transcription of the coding sequence. There may be intervening residues between the regulatory element and coding region so long as this functional relationship is maintained.
  • a “vector” as used herein refers to a macromolecule or association of macromolecules that comprises or associates with a polynucleotide and which mediates delivery of the polynucleotide to a cell.
  • vectors include nucleic acid-based vectors (e.g., plasmids and viral vectors) and liposomes.
  • An exemplary nucleic acid-based vector comprises genetic elements, e.g., regulatory elements, operatively linked to a gene to facilitate expression of the gene in a target.
  • expression cassette and “nucleic acid cassette” are used interchangeably to refer to a component of a vector comprising a combination of nucleic acid sequences or elements (e.g., therapeutic gene, promoter, and a terminator) that are expressed together or are operably linked for expression.
  • the terms encompass an expression cassette including a combination of regulatory elements and a gene or genes to which they are operably linked for expression.
  • a “functional fragment” of a DNA or protein sequence refers to a fragment that retains a biological activity (either functional or structural) that is substantially similar to a biological activity of the full-length DNA or protein sequence.
  • a biological activity of a DNA sequence includes its ability to influence expression in a manner attributed to the full-length sequence.
  • “engineered,” “synthetic,” and “artificial” are used interchangeably herein to refer to an object that has been modified by human intervention. For example, the terms refer to a polynucleotide or polypeptide that is non-naturally occurring.
  • An engineered peptide has, but does not require, low sequence identity (e.g., less than 50% sequence identity, less than 25% sequence identity, less than 10% sequence identity, less than 5% sequence identity, less than 1% sequence identity) to a naturally occurring human protein.
  • VPR and VP64 domains are synthetic transactivation domains.
  • Non-limiting examples include the following: a nucleic acid modified by changing its sequence to a sequence that does not occur in nature; a nucleic acid modified by ligating it to a nucleic acid that it does not associate with in nature such that the ligated product possesses a function not present in the original nucleic acid; an engineered nucleic acid synthesized in vitro with a sequence that does not exist in nature; a protein modified by changing its amino acid sequence to a sequence that does not exist in nature; an engineered protein acquiring a new function or property.
  • An “engineered” system comprises at least one engineered component.
  • transposable element refers to a DNA sequence that can move from one location in the genome to another (i.e., it can be “transposed”).
  • Transposable elements can be generally divided into two classes. Class I transposable elements, or “retrotransposons”, are transposed via transcription and translation of an RNA intermediate which is subsequently reincorporated into its new location into the genome via reverse transcription (a process mediated by a reverse transcriptase). Class II transposable elements, or “DNA transposons”, are transposed via a complex of single- or double-stranded DNA flanked on either side by a transposase.
  • retrotransposons refers to Class I transposable elements that function according to a two-part “copy and paste” mechanism involving an RNA intermediate.
  • “Retrotransposase” refers to an enzyme responsible for transposition of a retrotransposon.
  • the retrotransposase can comprise a reverse transcriptase domain, one or more zinc finger domains, an endonuclease domain, or combinations thereof.
  • “Gene editing” and “genome editing” can be used interchangeably.
  • Gene editing or genome editing means to change the nucleic acid sequence of a gene or a genome.
  • Genome editing can include, for example, insertions, deletions, and mutations.
  • Genome editing can be performed by a gene editing system, for example a retrotransposase.
  • complex refers to a joining of at least two components. The two components may each retain the properties/activities they had prior to forming the complex or gain properties as a result of forming the complex.
  • the joining includes, but is not limited to, covalent bonding, non-covalent bonding (i.e., hydrogen bonding, ionic interactions, Van der Waals interactions, and hydrophobic bond), use of a linker, fusion, or any other suitable method.
  • Contemplated components of the complex include polynucleotides, polypeptides, or combinations thereof.
  • a complex comprises an endonuclease and a guide polynucleotide.
  • sequence identity refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a local or global comparison window, as measured using a sequence comparison algorithm.
  • Suitable sequence comparison algorithms for polypeptide sequences include, e.g., BLASTP using parameters of a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment for polypeptide sequences longer than 30 residues; BLASTP using parameters of a wordlength (W) of 2, an expectation (E) of 1000000, and the PAM30 scoring matrix setting gap costs at 9 to open gaps and 1 to extend gaps for sequences of less than 30 residues (these are the default parameters for BLASTP in the BLAST suite available at https://blast.ncbi.nlm.nih.gov); CLUSTALW with the Smith-Waterman homology search algorithm parameters with a match of 2, a mismatch of -1, and a gap of -1; MUSCLE with default parameters; MAFFT with parameters of a retree of 2 and max iterations of 1000; Novafold with default parameters; HMMER hmmalign with
  • optimally aligned, in the context of two or more nucleic acids or polypeptide sequences, refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that have been aligned to maximal correspondence of amino acid residues or nucleotides, for example, as determined by the alignment producing a highest or “optimized” percent identity score.
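  • As a minimal sketch of how a percent identity score can be computed once two sequences have already been aligned, the Python example below counts identical columns in a gapped pairwise alignment. The function names, the "-" gap character, and the choice of denominator (all columns that are not gaps in both sequences) are assumptions made for illustration; the alignment tools listed above may count columns differently.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: the denominator is the number of alignment columns that are
    not gaps in both sequences. Other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        columns += 1
        if a == b:
            matches += 1  # identical residue or nucleotide in both sequences
    if columns == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold_percent: float) -> bool:
    """True if the pair has at least the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


# Hypothetical aligned peptide fragments (not sequences from this disclosure).
score = percent_identity("MKTAY-IAK", "MKTGYQIAK")
print(f"{score:.1f}% identity")
```

  • A check such as meets_identity_threshold(a, b, 70) could then be used to test a candidate against a stated identity threshold, provided the same alignment settings are used for every comparison.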
  • open reading frame refers to a nucleotide sequence that can encode a protein, or a portion of a protein.
  • An open reading frame can begin with a start codon (represented as, e.g., AUG for an RNA molecule and ATG in a DNA molecule in the standard code) and can be read in codon-triplets until the frame ends with a STOP codon (represented as, e.g., UAA, UGA, or UAG for an RNA molecule and TAA, TGA, or TAG in a DNA molecule in the standard code).
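  • The reading of an open reading frame described above can be sketched in Python as follows: locate a start codon, then read codon triplets in frame until a STOP codon of the standard code is reached. The function name first_orf and the example sequences are hypothetical; the sketch scans only the forward strand and returns the result in the DNA alphabet.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TGA", "TAG"}  # standard code, DNA alphabet


def first_orf(sequence: str) -> Optional[str]:
    """Return the first start-to-stop open reading frame, or None.

    RNA input is accepted: U is converted to T before scanning.
    """
    seq = sequence.upper().replace("U", "T")
    start = seq.find("ATG")
    while start != -1:
        # Read in-frame triplets beginning at the start codon.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]  # include the STOP codon
        # No in-frame stop for this start codon; try the next ATG.
        start = seq.find("ATG", start + 1)
    return None


print(first_orf("CCATGAAATTTTAGGG"))  # ATGAAATTTTAG
```

  • In the example, the TGA at positions 4-6 is out of frame relative to the ATG and is therefore skipped; the frame ends at the in-frame TAG.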
  • variants of any of the enzymes described herein with one or more conservative amino acid substitutions can be made in the amino acid sequence of a polypeptide without disrupting the three-dimensional structure or function of the polypeptide.
  • Conservative substitutions can be accomplished by substituting amino acids with similar hydrophobicity, polarity, and R chain length for one another. Additionally, or alternatively, by comparing aligned sequences of homologous proteins from different species, conservative substitutions can be identified by locating amino acid residues that have been mutated between species (e.g., non-conserved residues) without altering the basic functions of the encoded proteins.
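  • Both approaches to identifying conservative substitutions can be illustrated with a short Python sketch. The residue classes below are one common textbook-style grouping chosen for illustration only; they are an assumption, not a grouping defined in this disclosure, and the function names and example sequences are hypothetical.

```python
# Illustrative physicochemical classes (an assumption, not from the source).
SIMILARITY_CLASSES = (
    frozenset("AVLIM"),  # nonpolar, aliphatic
    frozenset("FWY"),    # aromatic
    frozenset("STNQ"),   # polar, uncharged
    frozenset("KRH"),    # positively charged
    frozenset("DE"),     # negatively charged
)


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues differ but fall in the same illustrative class."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # not a substitution at all
    return any(a in cls and b in cls for cls in SIMILARITY_CLASSES)


def variable_positions(aligned_homologs: list) -> list:
    """0-based alignment columns at which aligned homologs differ."""
    if len({len(s) for s in aligned_homologs}) > 1:
        raise ValueError("aligned sequences must have the same length")
    return [i for i, column in enumerate(zip(*aligned_homologs))
            if len(set(column)) > 1]


print(is_conservative("L", "I"))                     # True
print(variable_positions(["MKLV", "MKIV", "MRLV"]))  # [1, 2]
```

  • Columns reported by variable_positions correspond to the non-conserved residues mentioned above; whether a substitution at such a position preserves function still has to be established experimentally.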
  • Such conservatively substituted variants may include variants with at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identity to any one of the retrotransposase protein sequences described herein (e.g., MG140, MG148, MG153, or MG160 family retrotransposases described herein, or any other family retrotransposase described herein).
  • such conservatively substituted variants are functional variants.
  • Such functional variants can encompass sequences with substitutions such that the activity of one or more critical active site residues of the retrotransposase is not disrupted.
  • a functional variant of any of the proteins described herein lacks substitution of at least one of the conserved or functional residues. In some embodiments, a functional variant of any of the proteins described herein lacks substitution of all of the conserved or functional residues.
  • a decreased activity variant of a protein described herein comprises a disrupting substitution of at least one, at least two, or all three catalytic residues.
  • transposable elements with unique functionality and structure may offer the potential to further disrupt deoxyribonucleic acid (DNA) editing technologies, improving speed, specificity, functionality, and ease of use.
  • Metagenomic sequencing from natural environmental niches containing large numbers of microbial species may offer the potential to drastically increase the number of new transposable elements known and speed the discovery of new oligonucleotide editing functionalities.
  • Transposable elements are deoxyribonucleic acid sequences that can change position within a genome, often resulting in the generation or amelioration of mutations. In eukaryotes, a great proportion of the genome, and a large share of the mass of cellular DNA, is attributable to transposable elements. Although transposable elements are “selfish genes” which propagate themselves at the expense of other genes, they have been found to serve various important functions and to be crucial to genome evolution. Based on their mechanism, transposable elements are classified as either Class I “retrotransposons” or Class II “DNA transposons”.
  • Class I transposable elements also referred to as retrotransposons, function according to a two-part “copy and paste” mechanism involving an RNA intermediate.
  • the retrotransposon is transcribed.
  • the resulting RNA is subsequently converted back to DNA by reverse transcriptase (generally encoded by the retrotransposon itself), and the reverse transcribed retrotransposon is integrated into its new position in the genome by integrase.
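  • The “copy and paste” steps above (transcription, reverse transcription, integration) can be sketched at the level of sequence strings. The Python example below is a deliberately simplified toy, assuming a single-stranded text representation of the genome; it does not model integrase chemistry, target-site selection, or any enzyme of this disclosure, and all function names and sequences are hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def transcribe(dna_coding_strand: str) -> str:
    """RNA copy of a DNA coding strand (T becomes U)."""
    return dna_coding_strand.upper().replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """First-strand cDNA, written 5'->3', complementary to an RNA given 5'->3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA strand."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def copy_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Insert a new copy of genome[start:end] at index `target`.

    The donor copy stays in place, which is what distinguishes this from a
    cut-and-paste mechanism.
    """
    element = genome[start:end]
    rna = transcribe(element)            # step 1: RNA intermediate
    cdna = reverse_transcribe(rna)       # step 2: reverse transcription
    new_copy = reverse_complement(cdna)  # second strand, same sense as element
    return genome[:target] + new_copy + genome[target:]


print(copy_and_paste("GGATGCTT", 2, 6, 8))  # GGATGCTTATGC
```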
  • Retrotransposons are further classified into three orders. Retrotransposons with long terminal repeats (“LTRs”) encode reverse transcriptase and are flanked by long strands of repeating DNA.
  • Retrotransposons with long interspersed nuclear elements (“LINEs”) encode reverse transcriptase, lack LTRs, and are transcribed by RNA polymerase II.
  • Retrotransposons with short interspersed nuclear elements (“SINEs”) are transcribed by RNA polymerase III but lack reverse transcriptase, instead relying on the reverse transcription machinery of other transposable elements (e.g., LINEs).
  • Class II transposable elements also referred to as DNA transposons, function according to mechanisms that do not involve an RNA intermediate.
  • Many DNA transposons display a “cut and paste” mechanism in which transposase binds terminal inverted repeats (“TIRs”) flanking the transposon, cleaves the transposon from the donor region, and inserts it into the target region of the genome.
  • Others, referred to as “helitrons,” display a “rolling circle” mechanism involving a single-stranded DNA intermediate and mediated by an undocumented protein believed to possess HUH endonuclease function and 5’ to 3’ helicase activity. First, a circular strand of DNA is nicked to create two single DNA strands.
  • the protein remains attached to the 5’ phosphate of the nicked strand, leaving the 3’ hydroxyl end of the complementary strand exposed and thus allowing a polymerase to replicate the non-nicked strand.
  • the new strand disassociates and is itself replicated along with the original template strand.
  • Still other DNA transposons, “Polintons,” are theorized to undergo a “self-synthesis” mechanism.
  • the transposition is initiated by an integrase’s excision of a single-stranded extra-chromosomal Polinton element, which forms a racket-like structure.
  • the Polinton undergoes replication with DNA polymerase B, and the double stranded Polinton is inserted into the genome by the integrase.
  • Other DNA transposons, such as those in the IS200/IS605 family, proceed via a “peel and paste” mechanism in which TnpA excises a piece of single-stranded DNA (as a circular “transposon joint”) from the lagging strand template of the donor gene and reinserts it into the replication fork of the target gene.
  • While transposable elements have found some use as biological tools, documented transposable elements do not encompass the full range of possible biodiversity and targetability, and may not represent all possible activities. Here, thousands of genomic fragments were mined from numerous metagenomes for transposable elements. The documented diversity of transposable elements may have been expanded and novel systems may have been developed into highly targetable, compact, and precise gene editing agents.
  • the retrotransposase is an MG140, MG148, MG153, or MG160 retrotransposase (see FIG. 1).
  • the retrotransposases are less than about 1,400 amino acids in length.
  • the retrotransposases simplify delivery and extend therapeutic applications.
  • the present disclosure provides for an engineered retrotransposase system discovered through metagenomic sequencing.
  • the metagenomic sequencing is conducted on samples.
  • the samples are collected from a variety of environments.
  • the environment is a human microbiome, an animal microbiome, an environment with high temperatures, or an environment with low temperatures.
  • the environment includes sediment.
  • the present disclosure provides for an engineered retrotransposase system comprising a retrotransposase derived from an uncultivated microorganism.
  • the retrotransposase is configured to bind a 3’ untranslated region (UTR).
  • the retrotransposase binds a 5’ untranslated region (UTR).
  • the retrotransposase comprises a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the retrotransposase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the retrotransposase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the retrotransposase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having 100% identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the retrotransposase is an MG140 retrotransposase (i.e., SEQ ID NOs: 1-16).
  • the retrotransposase comprises a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to any one of SEQ ID NOs: 1-16.
  • the retrotransposase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 1-16.
  • the retrotransposase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having 100% identity to any one of SEQ ID NOs: 1-16.
  • the retrotransposase is an MG148 retrotransposase (i.e., SEQ ID NOs: 32-41).
  • the retrotransposase comprises a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to any one of SEQ ID NOs: 32-41.
  • the retrotransposase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 32-41.
  • the retrotransposase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having 100% identity to any one of SEQ ID NOs: 32-41.
  • the retrotransposase is an MG153 retrotransposase (i.e., SEQ ID NOs: 42-44).
  • the retrotransposase comprises a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to any one of SEQ ID NOs: 42-44.
  • the retrotransposase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 42-44.
  • the retrotransposase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having 100% identity to any one of SEQ ID NOs: 42-44.
  • the retrotransposase is an MG160 retrotransposase (i.e., SEQ ID NOs: 45-47).
  • the retrotransposase comprises a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to any one of SEQ ID NOs: 45-47.
  • the retrotransposase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 45-47.
  • the retrotransposase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having 100% identity to any one of SEQ ID NOs: 45-47.
  • the retrotransposase is encoded by a nucleic acid sequence that is codon optimized. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence that is codon optimized for expression in a mammalian cell.
  • the retrotransposase is encoded by a nucleic acid sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81.
  • the retrotransposase is encoded by a nucleic acid sequence having at least 70% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 75% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 80% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81.
  • the retrotransposase is encoded by a nucleic acid sequence having at least 85% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 90% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 95% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81.
  • the retrotransposase is encoded by a nucleic acid sequence having at least 96% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 97% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 98% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81.
  • the retrotransposase is encoded by a nucleic acid sequence having at least 99% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81.
  • the retrotransposase is tagged with a tag such as a His-tag or strep-tag or tethered to an enzyme (e.g., spCas9).
  • the retrotransposase is encoded by a nucleic acid sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about
  • the retrotransposase is encoded by a nucleic acid sequence having at least 70% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 75% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48.
  • the retrotransposase is encoded by a nucleic acid sequence having at least 80% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 85% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 90% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48.
  • the retrotransposase is encoded by a nucleic acid sequence having at least 95% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 96% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 97% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48.
  • the retrotransposase is encoded by a nucleic acid sequence having at least 98% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 99% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48.
  • the retrotransposase comprises a reverse transcriptase domain. In some embodiments, the retrotransposase further comprises one or more zinc finger domains. In some embodiments, the retrotransposase further comprises an endonuclease finger domain.
  • the retrotransposase has less than about 90%, less than about 85%, less than about 80%, less than about 75%, less than about 70%, less than about 65%, less than about 60%, less than about 55%, less than about 50%, less than about 45%, less than about 40%, less than about 35%, less than about 30%, less than about 25%, less than about 20%, less than about 15%, less than about 10%, or less than about 5% sequence identity to a known or documented retrotransposase.
  • the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR).
  • the retrotransposase comprises one or more nuclear localization sequences (NLSs).
  • the NLS is proximal to the N- or C-terminus of the retrotransposase.
  • the NLS is appended N-terminal or C-terminal of the retrotransposase and comprises any one of SEQ ID NOs: 49-64, or a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 49-64.
  • the NLS comprises a sequence having at least about 80% identity to SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 85% identity to SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 90% identity to SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 91% identity to SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 92% identity to SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 93% identity to SEQ ID NOs: 49-64.
  • the NLS comprises a sequence having at least about 94% identity to SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 95% identity to SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 96% identity to SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 97% identity to SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 98% identity to SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 99% identity to SEQ ID NOs: 49-64.
  • the NLS comprises a sequence having 100% identity to SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having 100% identity to SEQ ID NO: 49. In some cases, the NLS comprises a sequence having 100% identity to SEQ ID NO: 50.
  • Table 1 Example NLS Sequences that may be used with retrotransposases according to the disclosure
  • the retrotransposase comprises a tag.
  • the tag is an affinity tag.
  • affinity tags include, but are not limited to, a His-tag, a Flag tag, a Myc-tag, an MBP-tag, and a GST-tag.
  • the retrotransposase comprises a protease cleavage site.
  • exemplary protease cleavage sites include, but are not limited to, a TEV site, a C3 site, a Factor Xa site, and an Enterokinase site.
  • the retrotransposase is tethered to a site directed nuclease. In some embodiments, the retrotransposase is fused to a site directed nuclease. In some embodiments, the retrotransposase is recruited to a site directed nuclease. In some embodiments, the site directed nuclease is an endonuclease. In some embodiments, the site directed nuclease is a Cas nuclease. In some embodiments, the Cas nuclease is an RNA guided CRISPR Cas9 nuclease. In some embodiments, the site directed nuclease is a dead nuclease or a nickase. In some embodiments, the site directed nuclease brings the retrotransposase into close proximity of a target site that is to be modified.
  • the retrotransposase system further comprises a site directed nuclease and a guide RNA (e.g., gRNA).
  • a T means U (Uracil) in RNA and T (Thymine) in DNA.
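The T/U convention above can be illustrated with a minimal Python sketch. The function names `dna_to_rna` and `rna_to_dna` are hypothetical helpers introduced here for illustration; they are not part of the disclosure, and they assume a plain single-letter sequence string.

```python
def dna_to_rna(seq: str) -> str:
    """Rewrite a sequence listed in the DNA alphabet into the RNA alphabet (T -> U)."""
    return seq.upper().replace("T", "U")


def rna_to_dna(seq: str) -> str:
    """Rewrite a sequence listed in the RNA alphabet into the DNA alphabet (U -> T)."""
    return seq.upper().replace("U", "T")


# A sequence listing written with T can be read as RNA by substituting U.
listed = "ATGCTT"
as_rna = dna_to_rna(listed)
```

Only the alphabet changes; the order and identity of the remaining bases are untouched, so the two helpers are inverses of each other on unambiguous sequences.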
  • the retrotransposase systems described herein comprise a means for directing the site directed nuclease to a particular location in the target nucleic acid.
  • the guide RNA comprises synthetic nucleotides or modified nucleotides.
  • the guide RNA comprises one or more inter-nucleoside linkers modified from the natural phosphodiester.
  • all of the internucleoside linkers of the guide RNA, or contiguous nucleotide sequence thereof, are modified.
  • the inter-nucleoside linkage comprises Sulphur (S), such as a phosphorothioate inter-nucleoside linkage.
  • the guide RNA comprises modifications to a ribose sugar or nucleobase.
  • the guide RNA comprises one or more nucleosides comprising a modified sugar moiety, wherein the modified sugar moiety is a modification of the sugar moiety when compared to the ribose sugar moiety found in deoxyribose nucleic acid (DNA) and RNA.
  • the modification is within the ribose ring structure.
  • Exemplary modifications include, but are not limited to, replacement with a hexose ring (HNA), a bicyclic ring having a biradical bridge between the C2 and C4 carbons on the ribose ring (e.g., locked nucleic acids (LNA)), or an unlinked ribose ring which typically lacks a bond between the C2 and C3 carbons (e.g., UNA).
  • the sugar-modified nucleosides comprise bicyclohexose nucleic acids or tricyclic nucleic acids.
  • the modified nucleosides comprise nucleosides where the sugar moiety is replaced with a non-sugar moiety, for example peptide nucleic acids (PNA) or morpholino nucleic acids.
  • the guide RNA comprises one or more modified sugars.
  • the sugar modifications comprise modifications made by altering the substituent groups on the ribose ring to groups other than hydrogen, or the 2’-OH group naturally found in DNA and RNA nucleosides.
  • substituents are introduced at the 2’, 3’, 4’, or 5’ positions, or combinations thereof.
  • nucleosides with modified sugar moieties comprise 2’ modified nucleosides, e.g., 2’ substituted nucleosides.
  • a 2’ sugar modified nucleoside in some embodiments, is a nucleoside that has a substituent other than -H or -OH at the 2’ position (2’ substituted nucleoside) or comprises a 2’ linked biradical, and comprises 2’ substituted nucleosides and LNA (2’-4’ biradical bridged) nucleosides.
  • 2’-substituted modified nucleosides comprise, but are not limited to, 2’-O-alkyl-RNA, 2’-O-methyl-RNA, 2’-alkoxy-RNA, 2’-O-methoxyethyl-RNA (MOE), 2’-amino-DNA, 2’-Fluoro-RNA, and 2’-F-ANA nucleosides.
  • the modification in the ribose group comprises a modification at the 2’ position of the ribose group.
  • the modification at the 2’ position of the ribose group is selected from the group consisting of 2’-O-methyl, 2’-fluoro, 2’-deoxy, and 2’-O-(2-methoxyethyl).
  • the guide RNA comprises one or more modified sugars. In some embodiments, the guide RNA comprises only modified sugars. In certain embodiments, the guide RNA comprises greater than about 10%, 25%, 50%, 75%, or 90% modified sugars. In some embodiments, the modified sugar is a bicyclic sugar. In some embodiments, the modified sugar comprises a 2’-O-methoxyethyl group. In some embodiments, the guide RNA comprises both inter-nucleoside linker modifications and nucleoside modifications.
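As a rough illustration of the "greater than about 10%, 25%, 50%, 75%, or 90% modified sugars" language, the following Python sketch computes the fraction of modified sugar positions in a guide. The per-position list representation, the function name `modified_sugar_fraction`, and the example guide are all assumptions made for illustration; the disclosure does not prescribe any data format.

```python
from typing import Optional, Sequence


def modified_sugar_fraction(sugars: Sequence[Optional[str]]) -> float:
    """Return the fraction of positions that carry a sugar modification.

    Each entry is a label such as "2'-O-methyl", or None for an unmodified ribose.
    """
    if not sugars:
        raise ValueError("guide must contain at least one nucleotide")
    modified = sum(1 for s in sugars if s is not None)
    return modified / len(sugars)


# Hypothetical 10-nt guide fragment with three modified positions.
guide = ["2'-O-methyl", "2'-O-methyl", None, None, None,
         None, None, None, None, "2'-fluoro"]
fraction = modified_sugar_fraction(guide)

# Which of the recited thresholds does this guide exceed?
exceeded = [t for t in (0.10, 0.25, 0.50, 0.75, 0.90) if fraction > t]
```

A guide comprising only modified sugars corresponds to a fraction of 1.0 under this representation.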
  • the guide RNA comprises a sequence complementary to a eukaryotic, fungal, plant, mammalian, or human genomic polynucleotide sequence. In some cases, the guide RNA comprises a sequence complementary to a eukaryotic genomic polynucleotide sequence. In some cases, the guide RNA comprises a sequence complementary to a fungal genomic polynucleotide sequence. In some cases, the guide RNA comprises a sequence complementary to a plant genomic polynucleotide sequence. In some cases, the guide RNA comprises a sequence complementary to a mammalian genomic polynucleotide sequence. In some cases, the guide RNA comprises a sequence complementary to a human genomic polynucleotide sequence.
  • the guide RNA is 30-400 nucleotides in length. In some cases, the guide RNA is 85-245 nucleotides in length. In some cases, the guide RNA is more than 90 nucleotides in length. In some cases, the guide RNA is less than 245 nucleotides in length. In some embodiments, the guide RNA is 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 220, 240, or more than 240 nucleotides in length.
  • the guide RNA is about 30 to about 40, about 30 to about 50, about 30 to about 60, about 30 to about 70, about 30 to about 80, about 30 to about 90, about 30 to about 100, about 30 to about 120, about 30 to about 140, about 30 to about 160, about 30 to about 180, about 30 to about 200, about 30 to about 220, about 30 to about 240, about 50 to about 60, about 50 to about 70, about 50 to about 80, about 50 to about 90, about 50 to about 100, about 50 to about 120, about 50 to about 140, about 50 to about 160, about 50 to about 180, about 50 to about 200, about 50 to about 220, about 50 to about 240, about 100 to about 120, about 100 to about 140, about 100 to about 160, about 100 to about 180, about 100 to about 200, about 100 to about 220, about 100 to about 240, about 160 to about 180, about 160 to about 200, about 160 to about 220, or about 160 to about 240 nucleotides in length.
  • the sequence identity is determined by a BLASTP, CLUSTALW, MUSCLE, or MAFFT algorithm, or a CLUSTALW algorithm with the Smith-Waterman homology search algorithm parameters.
  • the sequence identity is determined by the BLASTP homology search algorithm using parameters of a wordlength (W) of 3, an expectation (E) of 10, and a BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment.
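The BLASTP parameters recited above can be written out as a command line. The sketch below assumes the NCBI BLAST+ `blastp` program and maps each recited parameter onto what is believed to be the corresponding BLAST+ option; that mapping, the function name `blastp_command`, and the file names are assumptions for illustration and are not stated in the disclosure. The code only builds the argument list and does not run BLAST.

```python
from typing import List


def blastp_command(query_fasta: str, subject_fasta: str) -> List[str]:
    """Build a blastp argument list using the parameters recited in the text."""
    return [
        "blastp",
        "-query", query_fasta,        # hypothetical FASTA file of the candidate protein
        "-subject", subject_fasta,    # hypothetical FASTA file of the reference SEQ ID NO
        "-word_size", "3",            # wordlength (W) of 3
        "-evalue", "10",              # expectation (E) of 10
        "-matrix", "BLOSUM62",        # BLOSUM62 scoring matrix
        "-gapopen", "11",             # gap existence cost of 11
        "-gapextend", "1",            # gap extension cost of 1
        "-comp_based_stats", "2",     # conditional compositional score matrix adjustment
        "-outfmt", "6 qseqid sseqid pident length evalue",  # tabular output with percent identity
    ]


cmd = blastp_command("candidate.fasta", "reference.fasta")
```

The resulting list could be passed to `subprocess.run` on a machine where BLAST+ is installed; the `pident` column of the tabular output would then report the percent identity of each alignment.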
  • the retrotransposase system comprises a cargo nucleic acid or polynucleotide.
  • the cargo nucleic acid is comprised in a double-stranded deoxyribonucleic acid.
  • the cargo nucleic acid is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.
  • the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR).
  • the cargo nucleic acid comprises synthetic nucleotides or modified nucleotides.
  • the cargo nucleic acid comprises one or more internucleoside linkers modified from the natural phosphodiester.
  • all of the inter-nucleoside linkers of the cargo nucleic acid, or contiguous nucleotide sequence thereof, are modified.
  • the inter-nucleoside linkage comprises Sulphur (S), such as a phosphorothioate inter-nucleoside linkage.
  • the cargo nucleic acid comprises modifications to a ribose sugar or nucleobase.
  • the cargo nucleic acid comprises one or more nucleosides comprising a modified sugar moiety, wherein the modified sugar moiety is a modification of the sugar moiety when compared to the ribose sugar moiety found in deoxyribose nucleic acid (DNA) and RNA.
  • the modification is within the ribose ring structure.
  • Exemplary modifications include, but are not limited to, replacement with a hexose ring (HNA), a bicyclic ring having a biradical bridge between the C2 and C4 carbons on the ribose ring (e.g., locked nucleic acids (LNA)), or an unlinked ribose ring which typically lacks a bond between the C2 and C3 carbons (e.g., UNA).
  • the sugar-modified nucleosides comprise bicyclohexose nucleic acids or tricyclic nucleic acids.
  • the modified nucleosides comprise nucleosides where the sugar moiety is replaced with a non-sugar moiety, for example peptide nucleic acids (PNA) or morpholino nucleic acids.
  • the cargo nucleic acid comprises one or more modified sugars.
  • the sugar modifications comprise modifications made by altering the substituent groups on the ribose ring to groups other than hydrogen, or the 2’-OH group naturally found in DNA and RNA nucleosides.
  • substituents are introduced at the 2’, 3’, 4’, or 5’ positions, or combinations thereof.
  • nucleosides with modified sugar moieties comprise 2’ modified nucleosides, e.g., 2’ substituted nucleosides.
  • a 2’ sugar modified nucleoside in some embodiments, is a nucleoside that has a substituent other than -H or -OH at the 2’ position (2’ substituted nucleoside) or comprises a 2’ linked biradical, and comprises 2’ substituted nucleosides and LNA (2’-4’ biradical bridged) nucleosides.
  • Examples of 2’-substituted modified nucleosides comprise, but are not limited to, 2’-O-alkyl-RNA, 2’-O-methyl-RNA, 2’-alkoxy-RNA, 2’-O-methoxyethyl-RNA (MOE), 2’-amino-DNA, 2’-Fluoro-RNA, and 2’-F-ANA nucleosides.
  • the modification in the ribose group comprises a modification at the 2’ position of the ribose group.
  • the modification at the 2’ position of the ribose group is selected from the group consisting of 2’-O-methyl, 2’-fluoro, 2’-deoxy, and 2’-O-(2-methoxyethyl).
  • the cargo nucleic acid comprises one or more modified sugars. In some embodiments, the cargo nucleic acid comprises only modified sugars. In certain embodiments, the cargo nucleic acid comprises greater than about 10%, 25%, 50%, 75%, or 90% modified sugars. In some embodiments, the modified sugar is a bicyclic sugar. In some embodiments, the modified sugar comprises a 2’-O-methoxyethyl group. In some embodiments, the cargo nucleic acid comprises both inter-nucleoside linker modifications and nucleoside modifications.
  • engineered retrotransposase system comprising: (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence.
  • engineered retrotransposase systems described herein comprise a means for cutting a target nucleic acid sequence.
  • the engineered retrotransposase system comprises (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence and comprising an amino acid sequence having at least 70% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the engineered retrotransposase system comprises (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence and comprising an amino acid sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the engineered retrotransposase system comprises (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence and comprising an amino acid sequence having at least 80% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the engineered retrotransposase system comprises (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence and comprising an amino acid sequence having at least 85% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the engineered retrotransposase system comprises (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence and comprising an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the engineered retrotransposase system comprises (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence and comprising an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the engineered retrotransposase system comprises (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence and comprising an amino acid sequence having at least 96% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the engineered retrotransposase system comprises (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence and comprising an amino acid sequence having at least 97% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the engineered retrotransposase system comprises (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence and comprising an amino acid sequence having at least 98% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the engineered retrotransposase system comprises (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence and comprising an amino acid sequence having at least 98% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
  • the engineered retrotransposase system comprises (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence and comprising an amino acid sequence having 100% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
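The identity thresholds recited above (at least 70% through 100%) can be illustrated with a minimal Python sketch that scores two sequences that have already been aligned. The function names `percent_identity` and `meets_threshold`, the toy sequences, and the choice of denominator (all alignment columns except those gapped in both sequences) are assumptions for illustration; alignment tools differ in how they define percent identity, and the disclosure specifies alignment algorithms rather than this calculation.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column matches only if both residues are identical and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair has at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy alignment: 5 identical columns out of 7.
candidate = "MKT-LLV"
reference = "MKTALIV"
identity = percent_identity(candidate, reference)
```

With this toy pair, 5 of 7 columns match, so the pair satisfies an "at least 70%" limitation but not an "at least 75%" limitation under the stated convention.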
  • Described herein, in certain embodiments, is a cell comprising the systems described herein.
  • the cell is a eukaryotic cell (e.g., a plant cell, an animal cell, a protist cell, or a fungal cell), a mammalian cell (a Chinese hamster ovary (CHO) cell, baby hamster kidney (BHK), human embryo kidney (HEK), mouse myeloma (NS0), or human retinal cells), an immortalized cell (e.g., a HeLa cell, a COS cell, a HEK-293T cell, a MDCK cell, a 3T3 cell, a PC12 cell, a Huh7 cell, a HepG2 cell, a K562 cell, a N2a cell, or a SY5Y cell), an insect cell e.g., a Spodoptera frugiperda cell, a Trichoplusia ni cell, Drosophila melanogaster cell, a S2 cell, or a Heliothis virescens cell
  • the cell is a eukaryotic cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is an immortalized cell. In some embodiments, the cell is an insect cell. In some embodiments, the cell is a yeast cell. In some embodiments, the cell is a plant cell. In some embodiments, the cell is a fungal cell. In some embodiments, the cell is a prokaryotic cell.
  • the cell is an A549, HEK-293, HEK-293T, BHK, CHO, HeLa, MRC5, Sf9, Cos-1, Cos-7, Vero, BSC 1, BSC 40, BMT 10, WI38, HeLa, Saos, C2C12, L cell, HT1080, HepG2, Huh7, K562, a primary cell, or derivative thereof.
  • the cell is an engineered cell.
  • the cell is a stable cell (i.e., a cell that has constant expression of a specific gene or protein).
  • nucleic acid sequences encoding the engineered retrotransposase systems described herein.
  • the present disclosure provides a nucleic acid comprising an engineered nucleic acid sequence encoding a retrotransposase described herein.
  • the engineered nucleic acid sequence encoding a retrotransposase is optimized for expression in an organism.
  • the retrotransposase is derived from an uncultivated microorganism. In some embodiments, the organism is not the uncultivated organism.
  • the organism is prokaryotic. In some embodiments, the organism is bacterial. In some embodiments, the organism is eukaryotic. In some embodiments, the organism is fungal. In some embodiments, the organism is a plant. In some embodiments, the organism is mammalian. In some embodiments, the organism is a rodent. In some embodiments, the organism is human.
  • the nucleic acid encoding the engineered retrotransposase system is a DNA, for example a linear DNA, a plasmid DNA, or a minicircle DNA.
  • the nucleic acid encoding the engineered nuclease system is an RNA, for example a mRNA.
  • the nucleic acid encoding the engineered retrotransposase systems is delivered by a nucleic acid-based vector.
  • the nucleic acid-based vector is a plasmid (e.g., circular DNA molecules that can autonomously replicate inside a cell), cosmid (e.g., pWE or sCos vectors), artificial chromosome, human artificial chromosome (HAC), yeast artificial chromosomes (YAC), bacterial artificial chromosome (BAC), P1-derived artificial chromosomes (PAC), phagemid, phage derivative, bacmid, or virus.
  • the vector is selected from the group consisting of: pSF-CMV-NEO-NH2-PPT-3XFLAG, pSF-CMV-NEO-COOH-3XFLAG, pSF-CMV-PURO-NH2-GST-TEV, pSF-OXB20-COOH-TEV-FLAG(R)-6His, pCEP4, pDEST27, pSF-CMV-Ub-KrYFP, pSF-CMV-FMDV-daGFP, pEF1a-mCherry-N1 vector, pEF1a-tdTomato vector, pSF-CMV-FMDV-Hygro, pSF-CMV-PGK-Puro, pMCP-tag(m), pSF-CMV-PURO-NH2-CMYC, pSF-OXB20-BetaGal, pSF-OXB20-Fluc, pSF-OXB20, pSF-Ta
  • the virus is an alphavirus, a parvovirus, an adenovirus, an AAV, a baculovirus, a Dengue virus, a lentivirus, a herpesvirus, a poxvirus, an anellovirus, a bocavirus, a vaccinia virus, or a retrovirus.
  • the virus is an alphavirus.
  • the virus is a parvovirus.
  • the virus is an adenovirus.
  • the virus is an AAV.
  • the virus is a baculovirus.
  • the virus is a Dengue virus.
  • the virus is a lentivirus. In some embodiments, the virus is a herpesvirus. In some embodiments, the virus is a poxvirus. In some embodiments, the virus is an anellovirus. In some embodiments, the virus is a bocavirus. In some embodiments, the virus is a vaccinia virus. In some embodiments, the virus is a retrovirus.
  • the AAV is AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV-rh8, AAV-rh10, AAV-rh20, AAV-rh39, AAV-rh74, AAV-rhM4-1, AAV-hu37, AAV-Anc80, AAV-Anc80L65, AAV-7m8, AAV-PHP-B, AAV-PHP-EB, AAV-2.5, AAV-2tYF, AAV-3B, AAV-LK03, AAV-HSC1, AAV-HSC2, AAV-HSC3, AAV-HSC4, AAV-HSC5, AAV-HSC6, AAV-HSC7, AAV-HSC8, AAV-HSC9, AAV-HSC10, AAV-HSC11,
  • the nucleic acid encoding the engineered retrotransposase system is delivered by a non-nucleic acid-based delivery system (e.g., a non-viral delivery system).
  • the non-viral delivery system is, e.g., a liposome.
  • the nucleic acid is associated with a lipid.
  • the nucleic acid associated with a lipid in some embodiments, is encapsulated in the aqueous interior of a liposome, interspersed within the lipid bilayer of a liposome, attached to a liposome via a linking molecule that is associated with both the liposome and the nucleic acid, entrapped in a liposome, complexed with a liposome, dispersed in a solution containing a lipid, mixed with a lipid, combined with a lipid, contained as a suspension in a lipid, contained or complexed with a micelle, or otherwise associated with a lipid.
  • the nucleic acid is comprised in a lipid nanoparticle (LNP).
  • the endonuclease or gene editing system (e.g., retrotransposase) is introduced into a cell (e.g., host cell) in any suitable way, either stably or transiently.
  • the endonuclease or gene editing system is transfected into the cell.
  • the cell is transduced or transfected with a nucleic acid construct that encodes the endonuclease or gene editing system.
  • a cell is transduced (e.g., with a virus encoding the endonuclease or gene editing system), or transfected (e.g., with a plasmid encoding the endonuclease or gene editing system) with a nucleic acid that encodes the endonuclease or gene editing system.
  • the transduction is a stable or transient transduction.
  • cells expressing the endonuclease or gene editing system or containing the endonuclease or gene editing system are transduced or transfected with one or more gRNA molecules, for example when the endonuclease or gene editing system comprises the retrotransposase.
  • a plasmid expressing the endonuclease or gene editing system is introduced into cells through electroporation, transient (e.g., lipofection) or stable genome integration (e.g., piggybac), or viral transduction (for example lentivirus or AAV), or other methods known to those of skill in the art.
  • the gene editing system is introduced into the cell as one or more polypeptides.
  • delivery is achieved through the use of RNP complexes. Delivery methods to cells for polypeptides and/or RNPs are known in the art, for example by electroporation or by cell squeezing.
  • Exemplary methods of delivery of nucleic acids include lipofection, nucleofection, electroporation, stable genome integration (e.g., piggybac), microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA.
  • lipofection is described in e.g., U.S. Pat. Nos.
  • lipofection reagents are sold commercially (e.g., Transfectam™, Lipofectin™ and SF Cell Line 4D-Nucleofector X Kit™ (Lonza)).
  • Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of WO 91/17424 and WO 91/16024.
  • the delivery is to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration).
  • the nucleic acid is comprised in a liposome or a nanoparticle that specifically targets a host cell.
  • Systems of the present disclosure may be used for various applications, such as, for example, nucleic acid editing (e.g., gene editing), binding to a nucleic acid molecule (e.g., sequence-specific binding).
  • Such systems may be used, for example, for addressing (e.g., removing or replacing) a genetically inherited mutation that may cause a disease in a subject, inactivating a gene in order to ascertain its function in a cell, as a diagnostic tool to detect disease-causing genetic elements (e.g., via cleavage of reverse-transcribed viral RNA or an amplified DNA sequence encoding a disease-causing mutation), as deactivated enzymes in combination with a probe to target and detect a specific nucleotide sequence (e.g., sequence encoding antibiotic resistance in bacteria), to render viruses inactive or incapable of infecting host cells by targeting viral genomes, to add genes or amend metabolic pathways to engineer organisms to produce valuable small molecules, macromolecules, or secondary metabolites, to establish a
  • Described herein, in certain embodiments, are methods for modifying a target nucleic acid comprising providing an engineered retrotransposase system.
  • the present disclosure provides a method for binding, nicking, cleaving, marking, modifying, or transposing a double-stranded deoxyribonucleic acid polynucleotide.
  • the method comprises contacting the double-stranded deoxyribonucleic acid polynucleotide with a retrotransposase.
  • the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.
  • the retrotransposase is configured to transpose the cargo nucleotide sequence as single-stranded deoxyribonucleic acid polynucleotide.
  • the retrotransposase is configured to transpose the cargo nucleotide sequence as double-stranded deoxyribonucleic acid polynucleotide.
  • the retrotransposase is configured to transpose the cargo nucleotide sequence via a ribonucleic acid polynucleotide intermediate.
  • the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR).
  • the present disclosure provides a method of modifying a target nucleic acid sequence (e.g., locus).
  • the method comprises delivering to the target nucleic acid sequence the engineered retrotransposase system described herein.
  • the complex is configured such that upon binding of the complex to the target nucleic acid sequence, the complex modifies the target nucleic acid sequence.
  • modifying the target nucleic acid sequence comprises binding, nicking, cleaving, marking, modifying, or transposing the target nucleic acid sequence.
  • the target nucleic acid sequence comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
  • the target nucleic acid comprises genomic DNA, viral DNA, viral RNA, or bacterial DNA.
  • the target nucleic acid sequence is in vitro.
  • the target nucleic acid sequence is within a cell.
  • the cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, or a human cell.
  • the cell is a primary cell.
  • the primary cell is a T cell.
  • the primary cell is a hematopoietic stem cell (HSC).
  • the cell is a human cell.
  • the cell is genome edited ex vivo. In some embodiments, the cell is genome edited in vivo.
  • delivery of the engineered retrotransposase system to the target nucleic acid sequence comprises delivering the nucleic acid described herein or the vector described herein. In some embodiments, delivery of engineered retrotransposase system to the target nucleic acid sequence comprises delivering a nucleic acid comprising an open reading frame encoding the retrotransposase. In some embodiments, the nucleic acid comprises a promoter. In some embodiments, the open reading frame encoding the retrotransposase is operably linked to the promoter.
  • delivery of the engineered retrotransposase system to the target nucleic acid sequence comprises delivering a capped mRNA containing the open reading frame encoding the retrotransposase. In some embodiments, delivery of the engineered retrotransposase system to the target nucleic acid sequence comprises delivering a translated polypeptide. In some embodiments, delivery of the engineered retrotransposase system to the target nucleic acid sequence comprises delivering a deoxyribonucleic acid (DNA) encoding the engineered retrotransposase operably linked to a ribonucleic acid (RNA) pol III promoter.
  • the retrotransposase does not induce a break at or proximal to the target nucleic acid sequence.
  • the transposition activity is measured in vitro by introducing the retrotransposase to cells comprising the target nucleic acid sequence and detecting transposition of the target nucleic acid sequence in the cells.
  • the composition comprises 20 pmoles or less of the retrotransposase. In some embodiments, the composition comprises 1 pmol or less of the retrotransposase.
  • the method comprises cultivating a host cell with the engineered retrotransposase system described herein.
  • the host cell is a bacterial cell.
  • the bacterial cell is Bifidobacterium longum, Bifidobacterium lactis, Bifidobacterium animalis, Bifidobacterium breve, Bifidobacterium infantis, Bifidobacterium adolescentis, Lactobacillus acidophilus, Lactobacillus casei, Lactobacillus paracasei, Lactobacillus salivarius, Lactobacillus reuteri, Lactobacillus rhamnosus, Lactobacillus johnsonii, Lactobacillus plantarum, Lactobacillus fermentum, Lactococcus lactis, Streptococcus thermophilus, Lactococcus lactis, Lactococcus diacetylactis, Lactococcus cremoris, Lactobacillus bulgaricus, Lactobacillus helveticus, Lactobacill
  • the host cell is an E. coli cell.
  • the E. coli cell is a λDE3 lysogen or a BL21(DE3) strain.
  • the E. coli cell has an ompT lon genotype.
  • the host cell is an E. coli cell.
  • the E. coli cell is a λDE3 lysogen or the E. coli cell is a BL21(DE3) strain.
  • the E. coli cell has an ompT lon genotype.
  • the open reading frame is operably linked to a promoter sequence.
  • the promoter is selected from the group consisting of a mini promoter, an inducible promoter, a constitutive promoter, and derivatives thereof.
  • the promoter is selected from the group consisting of CMV, CBA, EF1a, CAG, PGK, TRE, U6, UAS, T7, Sp6, lac, araBad, trp, Ptac, p5, p19, p40, Synapsin, CaMKII, GRK1, and derivatives thereof.
  • the open reading frame is operably linked to a T7 promoter sequence, a T7-lac promoter sequence, a lac promoter sequence, a tac promoter sequence, a trc promoter sequence, a ParaBAD promoter sequence, a PrhaBAD promoter sequence, a T5 promoter sequence, a cspA promoter sequence, an araPBAD promoter, a strong leftward promoter from phage lambda (pL promoter), or any combination thereof.
  • the open reading frame comprises a sequence encoding an affinity tag linked in-frame to a sequence encoding the retrotransposase.
  • the affinity tag is an immobilized metal affinity chromatography (IMAC) tag.
  • the IMAC tag is a polyhistidine tag.
  • the affinity tag is a myc tag, a human influenza hemagglutinin (HA) tag, a maltose binding protein (MBP) tag, a glutathione S-transferase (GST) tag, a streptavidin tag, a FLAG tag, or any combination thereof.
  • the affinity tag is linked in-frame to the sequence encoding the retrotransposase via a linker sequence encoding a protease cleavage site.
  • the protease cleavage site is a tobacco etch virus (TEV) protease cleavage site, a PreScission® protease cleavage site, a Thrombin cleavage site, a Factor Xa cleavage site, an enterokinase cleavage site, or any combination thereof.
  • the open reading frame is codon-optimized for expression in the host cell. In some embodiments, the open reading frame is provided on a vector. In some embodiments, the open reading frame is integrated into a genome of the host cell.
  • the present disclosure provides a culture comprising a host cell described herein in compatible liquid medium.
  • the present disclosure provides a method of producing a retrotransposase, comprising cultivating a host cell described herein in compatible growth medium.
  • the method further comprises inducing expression of the retrotransposase by addition of an additional chemical agent or an increased amount of a nutrient.
  • the additional chemical agent or increased amount of a nutrient comprises isopropyl β-D-1-thiogalactopyranoside (IPTG) or additional amounts of lactose.
  • the method further comprises isolating the host cell after the cultivation and lysing the host cell to produce a protein extract.
  • the method further comprises subjecting the protein extract to IMAC, or ion-affinity chromatography.
  • the open reading frame comprises a sequence encoding an IMAC affinity tag linked in-frame to a sequence encoding the retrotransposase.
  • the IMAC affinity tag is linked in-frame to the sequence encoding the retrotransposase via a linker sequence encoding a protease cleavage site.
  • the protease cleavage site comprises a tobacco etch virus (TEV) protease cleavage site, a PreScission® protease cleavage site, a Thrombin cleavage site, a Factor Xa cleavage site, an enterokinase cleavage site, or any combination thereof.
  • the method further comprises cleaving the IMAC affinity tag by contacting a protease corresponding to the protease cleavage site to the retrotransposase.
  • the method further comprises performing subtractive IMAC affinity chromatography to remove the affinity tag from a composition comprising the retrotransposase.
  • kits comprising one or more nucleic acid constructs encoding the various components of the retrotransposase or gene editing system described herein, e.g., comprising a nucleotide sequence encoding the components of the retrotransposase or gene editing system capable of modifying a target DNA sequence.
  • the nucleotide sequence comprises a heterologous promoter that drives expression of the gene editing system components.
  • any of the retrotransposase or gene editing systems disclosed herein is assembled into a pharmaceutical, diagnostic, or research kit to facilitate its use in therapeutic, diagnostic, or research applications.
  • a kit may include one or more containers housing any of the vectors disclosed herein and instructions for use.
  • the kit may be designed to facilitate use of the methods described herein by researchers and can take many forms.
  • Each of the compositions of the kit may be provided in liquid form (e.g., in solution), or in solid form, (e.g., a dry powder).
  • some of the compositions may be constitutable or otherwise processable (e.g., to an active form), for example, by the addition of a suitable solvent or other species (for example, water or a cell culture medium), which may or may not be provided with the kit.
  • Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), Internet, and/or web-based communications, etc.
  • the written instructions in some embodiments, are in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which instructions can also reflect approval by the agency of manufacture, use, or sale for animal administration.
  • Example 1 A method of metagenomic analysis for new proteins
  • Samples for metagenomic analysis were collected from sediment, soil, and animals. Samples were collected with consent of property owners. Additional raw sequence data from public sources included animal microbiomes, sediment, soil, hot springs, hydrothermal vents, marine, peat bogs, permafrost, and sewage sequences. Deoxyribonucleic acid (DNA) was extracted with a DNA mini-prep kit and sequenced. Metagenomic sequence data was searched using Hidden Markov Models generated based on documented retrotransposase protein sequences to identify new retrotransposases. Retrotransposase proteins identified by the search were aligned to documented proteins to identify potential active sites. This metagenomic workflow resulted in the delineation of the MG140 family described herein.
  • Analysis of the data from the metagenomic analysis of Example 1 revealed a new cluster of undescribed putative retrotransposase systems comprising 1 family (MG140). The corresponding protein sequences for these new enzymes and their subdomains are presented as SEQ ID NOs: 1-16 and 32-47.
  • Integrase activity can be interrogated via expression in an E. coli lysate-based expression system.
  • the required components for in vitro testing are three plasmids: an expression plasmid with the retrotransposon gene(s) under a T7 promoter, a target plasmid, and a donor plasmid which contains the required 5’ and 3’ UTR sequences recognized by the retrotransposase around a selection marker gene (e.g., Tet resistance gene).
  • the lysate-based expression products, target DNA, and donor plasmid are incubated to allow for transposition to occur. Transposition is detected via PCR.
  • the transposition product will be tagmented with Tn5 and sequenced via NGS to determine the insertion sites on a population of transposition events.
  • the in vitro transposition products can be transformed into E. coli under antibiotic (e.g., Tet) selection, where growth requires the selection marker to be stably inserted into a plasmid. Either single colonies or a population of E. coli can be sequenced to determine the insertion sites.
  • Integration efficiency can be measured via ddPCR or qPCR of the experimental output of target DNA with integrated cargo, normalized to the amount of unmodified target DNA also measured via ddPCR.
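The normalization described above can be illustrated with a short calculation. This is a minimal sketch, assuming the ddPCR readout gives copies per microliter for the integrated-cargo target and for the unmodified target; the function name and the example numbers are hypothetical, and the fraction-of-total value is an added convenience that the text does not specify.

```python
def integration_efficiency(integrated_copies: float, unmodified_copies: float) -> dict:
    """Summarise integration from two ddPCR measurements (copies per microliter).

    integrated_copies: target DNA carrying the integrated cargo.
    unmodified_copies: target DNA without cargo (the normaliser).
    """
    if integrated_copies < 0 or unmodified_copies <= 0:
        raise ValueError("copy numbers must be non-negative, with unmodified > 0")
    # Ratio normalised to unmodified target, as described in the text.
    ratio = integrated_copies / unmodified_copies
    # Fraction of all target molecules that carry cargo (assumed alternative view).
    fraction = integrated_copies / (integrated_copies + unmodified_copies)
    return {"ratio_to_unmodified": ratio, "fraction_of_total": fraction}


# Hypothetical readout: 5 copies/uL integrated, 95 copies/uL unmodified.
result = integration_efficiency(5.0, 95.0)
print(result)
```

Reporting both values avoids ambiguity when integration is frequent enough that the ratio and the fraction diverge noticeably.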
  • This assay may also be conducted with purified protein components rather than from lysate-based expression.
  • the proteins are expressed in E. coli protease-deficient B strain under T7 inducible promoter, the cells are lysed using sonication, and the His-tagged protein of interest is purified using Ni-NTA affinity chromatography on an FPLC. Purity is determined using densitometry of the protein bands resolved on SDS-PAGE and Coomassie stained acrylamide gels.
  • the protein is desalted in storage buffer composed of 50 mM Tris-HCl, 300 mM NaCl, 1 mM TCEP, 5% glycerol; pH 7.5 (or other buffers as determined for maximum stability) and stored at -80°C. After purification the transposon gene(s) are added to the target DNA and donor plasmid as described above in a reaction buffer, for example 26 mM HEPES pH
  • the retrotransposon ends are tested for retrotransposase binding via an electrophoretic mobility shift assay (EMSA).
  • a target DNA fragment 100-500 bp
  • FAM-labeled primers 100-500 bp
  • the 3’ UTR RNA and 5’ UTR RNA are generated in vitro using T7 RNA polymerase and purified.
  • the retrotransposase proteins are synthesized in an in vitro transcription/translation system.
  • binding buffer e.g. 20 mM HEPES pH 7.5, 2.5 mM Tris pH 7.5, 10 mM NaCl, 0.0625 mM EDTA, 5 mM TCEP, 0.005% BSA, 1 µg/mL poly(dI-dC), and 5% glycerol.
  • 6X loading buffer 60 mM KCl, 10 mM Tris pH
  • Engineered E. coli strains are transformed with a plasmid expressing the retrotransposon genes and a plasmid containing a temperature-sensitive origin of replication with a selectable marker flanked by 5’ and 3’ UTR of the retrotransposon required for integration. Transformants induced for expression of these genes are then screened for transfer of the marker to a genomic target by selection at restrictive temperature for plasmid replication and the marker integration in the genome is confirmed by PCR.
  • Integrations are screened using an unbiased approach.
  • purified gDNA is tagmented with Tn5
  • DNA of interest is then PCR amplified using primers specific to the Tn5 tagmentation and the selectable marker.
  • the amplicons are then prepared for NGS sequencing. The resulting reads are trimmed of the transposon sequences, the flanking sequences are mapped to the genome to determine insertion position, and insertion rates are determined.
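The trimming step can be sketched as follows. This is an illustrative example only: it assumes each read contains the known transposon end sequence followed by the genomic flank, and the function name, the minimum flank length, and the toy sequences are hypothetical. Mapping the flank to the genome would be done with a separate aligner and is not shown.

```python
from typing import Optional


def extract_flank(read: str, transposon_end: str, min_flank: int = 20) -> Optional[str]:
    """Return the genomic flank that follows the transposon end, or None.

    Reads that lack the transposon end, or whose remaining flank is too
    short to map reliably, are discarded (None).
    """
    idx = read.find(transposon_end)
    if idx == -1:
        return None
    flank = read[idx + len(transposon_end):]
    return flank if len(flank) >= min_flank else None


# Toy read: 4 nt of adapter residue, a 4 nt stand-in for the transposon end,
# then 25 nt of flank.
read = "AAAA" + "GGCC" + "T" * 25
print(extract_flank(read, "GGCC"))
```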
  • Example 7 Integration of reverse transcribed DNA into mammalian genomes (prophetic)
  • the integrase proteins are purified in E. coli or Sf9 cells with 2 NLS peptides at the N-terminus, the C-terminus, or both termini of the protein sequence.
  • a plasmid containing a selectable neomycin resistance marker (NeoR) or a fluorescent marker flanked by the 5’ and 3’ UTR regions required for transposition and under control of a CMV promoter is synthesized. Cells are transfected with the plasmid, recovered for 4-6 hours for RNA transcription, and subsequently electroporated with purified integrase proteins.
  • Antibiotic resistance integration into the genome is quantified by G418-resistant colony counts (selection to start 7 days post-transfection), and positive transposition by the fluorescent marker is assayed by fluorescence activated cell cytometry. 7-10 days after the second transfection, genomic DNA is extracted and used for the preparation of an NGS library. Off-target frequency is assayed by fragmenting the genome and preparing amplicons of the transposon marker and flanking DNA for NGS library preparation. At least 40 different target sites are chosen for testing each targeting system’s activity.
  • RNA delivery: An RNA encoding the retrotransposase with 2 NLS is designed, and a cap and polyA tail are added. A second RNA is designed containing a selectable neomycin resistance marker (NeoR) or a fluorescent marker flanked by the 5’ and 3’ UTR regions.
  • the RNA constructs are introduced into mammalian cells via a liposome transfection reagent. 10 days post-transfection, genomic DNA is extracted to measure transposition efficiency using ddPCR and NGS.
  • the domain sequences were clustered at 50% identity over 80% coverage, representative sequences (26,824 in total) were aligned, and the domain alignment was used to infer a phylogenetic tree.
  • Phylogenetic analysis of RT domains suggests that many different classes of RTs with high sequence diversity were recovered (FIG. 4).
  • the MG148 family of retrotransposon-associated RTs includes extremely divergent RT homologs, predicted to be active by the presence of all expected catalytic residues and multiple Zn-binding ribbon motifs (FIGs. 5A and 5B). Alignment at the nucleotide level for several family members uncovered conserved regions within the 5’ UTR, which are possibly involved in RT function, activity, or mobilization (FIG. 5C).
  • RNA template 200 nt
  • reaction buffer containing 40 mM Tris-HCl (pH 7.5), 0.2 M NaCl, 10 mM MgCl2, 1 mM TCEP, and 0.5 mM dNTPs.
  • the resulting full-length cDNA product was quantified by qPCR by extrapolating values from a standard curve generated with the DNA template of known concentrations.
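Standard-curve quantification of this kind can be sketched in a few lines. The example below is a minimal illustration, assuming the standards are DNA templates of known concentration, that Cq is linear in log10(concentration), and that unknowns fall within the range spanned by the standards; the function names and all numbers are hypothetical.

```python
import math


def fit_standard_curve(concentrations, cq_values):
    """Least-squares fit of Cq = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct standard concentrations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(cq, slope, intercept):
    """Convert a measured Cq back to a concentration using the fitted curve."""
    return 10 ** ((cq - intercept) / slope)


# Hypothetical ten-fold dilution series (arbitrary concentration units).
standards = [1e2, 1e3, 1e4, 1e5]
cqs = [30.00, 26.68, 23.36, 20.04]
slope, intercept = fit_standard_curve(standards, cqs)
print(slope, intercept, quantify(25.0, slope, intercept))
```

A slope near -3.32 corresponds to roughly 100% amplification efficiency, which is a common sanity check on the standards before unknowns are read off the curve.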
  • MG148 family members MG140-33-R2 through MG140-34-R2 (SEQ ID NOs: 5-6), MG140-42-R2 through MG140-44-R2 (SEQ ID NOs: 14-16), and MG148-12 (SEQ ID NO: 32) are active at cDNA synthesis as determined by primer extension (FIG. 6).
  • Example 10 Group II intron RTs (MG153 family)
  • Group II introns are capable of integrating large cargo into a target site via reverse transcription of an RNA template.
  • RT domains from Group II introns were identified and delineated in the phylogenetic tree in FIG. 4. Over 10,000 unique full-length Group II intron proteins containing RT domains from contigs with > 2 kb of sequence flanking the RT enzyme were aligned. A phylogenetic tree was inferred from this alignment and Group II intron families were further identified (FIGs. 7A-7B).
  • Group II introns of Class C were identified, and their domain architecture includes an RT domain predicted to be active, as well as a maturase domain involved in intron mobilization. Some Group II intron proteins contain an additional endonuclease domain likely involved in target recognition and cleavage. Many candidates from all families identified were nominated for laboratory characterization.
  • GII intron Class C (MG153) RTs were assessed by a primer extension reaction containing RT enzyme derived from a cell-free expression system. Expression constructs were codon-optimized for E. coli and contained an N-terminal single Strep tag. Expression of the RT was confirmed by SDS-PAGE analysis.
  • the substrate for the reaction was 100 nM of RNA template (200 nt) annealed to a 5’-FAM labeled primer.
  • the reaction buffer contained the following components: 50 mM Tris-HCl (pH 8.0), 75 mM KCl, 3 mM MgCl2, 10 mM DTT, and 0.5 mM dNTPs.
  • Following incubation at 37 °C for 1 h, the reaction was quenched via incubation with RNase H, followed by the addition of 2X RNA loading dye. The resulting cDNA product(s) were separated on a 10% denaturing polyacrylamide gel and were visualized using an imaging system. RT activity was also assessed by qPCR with primers that amplify the full-length cDNA product. Products from the primer extension assay were diluted to ensure cDNA concentrations were within the linear range of detection. The amount of cDNA was quantified by extrapolating values from a standard curve generated with the DNA template of known concentrations.
  • a plasmid containing MCP fused to the RT candidate under CMV promoter was cloned and isolated for transfection in HEK293T cells. Transfection was performed using Lipofectamine 2000. mRNA coding for nanoluciferase was made using an mRNA synthesizer according to the manufacturer’s instructions. In order to degrade any DNA template left in the mRNA preparation, the reaction was treated with DNase for 1 hour, and the mRNA was cleaned using a Clean-Up kit. The mRNA was hybridized to a complementary DNA primer in 10 mM Tris pH 7.5, 50 mM NaCl at 95 °C for 2 min and cooled to 4 °C at the rate of 0.1 °C/s.
  • the mRNA/DNA hybrid was transfected into HEK293T cells using a liposome-based transfection reagent 6 hours after the plasmid containing the MCP-RT fusion was transfected. 18 hours post mRNA/DNA transfection, cells were lysed using DNA Extraction Solution; 100 µL of quick extract was added per well of a 24-well plate.
  • the nanoluciferase is ~500 bp long, and primers to amplify products of 100 bp and 542 bp from the newly synthesized cDNA were designed.
  • cDNA was amplified using the set of primers mentioned above, and PCR products were detected by agarose gel electrophoresis or DNA Tape Station.
  • The ability of GII RTs to synthesize cDNA in a mammalian cell environment was tested as previously described with a small modification. cDNA synthesis was previously detected using PCR and analyzed by agarose gel electrophoresis. In order to have a quantitative readout, a Taqman qPCR assay was developed using Taqman qPCR primers previously described with a Taqman probe “ACTCTGTGAGCGGATCTTGGCTTAGCC” (SEQ ID NO: 70). MG153-23 and MG153-24 RTs were active to various degrees, with MG153-23 nearly as active as the TGIRT control (FIG. 12).
  • Retrons are DNA elements of approximately 2000 bp in length that encode an RT-coding gene (ret) and a contiguous non-coding RNA containing inverted sequences, the msr and msd. Retrons employ a unique mechanism for RT-DNA synthesis, in which the ncRNA template folds into a conserved secondary structure, insulated between two inverted repeats (a1/a2). The retron RT recognizes the folded ncRNA, and reverse transcription is initiated from a conserved guanosine 2’OH adjacent to the inverted repeats, forming a 2’-5’ linkage between the template RNA and the nascent cDNA strand.
  • this 2’-5’ linkage persists into the mature form of processed RT-DNA, while in others an exonuclease cleaves the DNA product resulting in a free 5’ end.
  • the RT only targets the msr-msd derived from the same retron as its RNA template, providing specificity that may avoid off-target reverse transcription.
  • a divergent group of “retron-like” single-domain RT sequences were identified within the retron clade in FIG. 4.
  • the single-domain RTs of the MG160 family range between 250 and 300 aa and are predicted to be active based on the presence of expected RT catalytic residues [F/Y]XDD.
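The two screening criteria named here, a length of 250 to 300 aa and the presence of an [F/Y]XDD motif, can be expressed as a simple filter. This is a sketch under stated assumptions: sequences are upper-case one-letter amino acid strings, X is taken to mean any residue, and the function name and toy sequences are hypothetical. Passing this filter is only a prediction and does not by itself show that an enzyme is active.

```python
import re

# [F/Y]XDD: phenylalanine or tyrosine, any residue, then two aspartates.
RT_MOTIF = re.compile(r"[FY].DD")


def is_candidate_rt(protein: str, min_len: int = 250, max_len: int = 300) -> bool:
    """True if the sequence is in the length window and contains the motif."""
    in_window = min_len <= len(protein) <= max_len
    return in_window and RT_MOTIF.search(protein) is not None


# Toy sequences (not real proteins): 265 aa with a YADD motif, and one without.
with_motif = "M" + "A" * 200 + "YADD" + "A" * 60
without_motif = "M" + "A" * 264
print(is_candidate_rt(with_motif), is_candidate_rt(without_motif))
```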
  • the 5’ UTRs of the MG160 family are conserved among family members and fold into conserved secondary structures (FIG. 10) that are likely important for element activity or mobilization.
  • The in vitro activity of retron-like RTs (MG160 family) was assessed by a primer extension reaction containing RT enzyme derived from a cell-free expression system. Expression constructs were codon-optimized for E. coli and contained an N-terminal single Strep tag.
  • the substrate for the reaction was 100 nM of RNA template (200 nt) annealed to a 5’-FAM labeled primer.
  • the reaction buffer contained the following components: 50 mM Tris-HCl (pH 8.0), 75 mM KCl, 3 mM MgCl2, 10 mM DTT, and 0.5 mM dNTPs.
  • Retron RTs are produced in a cell-free expression system by incubating 10 ng/µL of a DNA template encoding the E. coli-optimized gene with an N-terminal single Strep tag with the in vitro transcription/translation system components for 2 h at 37 °C. All tested retron RTs are expressed as indicated by SDS-PAGE analysis.
  • the retron ncRNAs are generated using a T7 in vitro transcription kit and a DNA template encoding the respective ncRNA gene following a T7 promoter. The reaction is then incubated with DNase-I to eliminate the DNA template and purified by an RNA cleanup kit. Quantity of the ncRNA is determined by nanodrop and the purity assessed by electrophoretic RNA analysis.
  • the retron RT enzyme is produced in a cell-free expression system using a construct containing an E. coli codon-optimized gene with an N-terminal single Strep tag as described above. Expression of the enzyme is confirmed by SDS-PAGE analysis. Retron RT activity on a general template is determined by a primer extension assay as described above, containing a 200 nt RNA annealed to a 5’-FAM labeled DNA primer. The resulting cDNA product(s) are detected on a denaturing polyacrylamide gel or by qPCR with primers specific for the full-length cDNA product.
  • Retron RT in vitro activity on its own ncRNA is assessed in a reaction containing buffer, dNTPs, the retron RT produced from a cell-free expression system, and the refolded ncRNA.
  • RT activity before and after purification of the RT from the cell-free expression system via the N-terminal single Strep tag is compared. After incubation, half of the reaction is treated with RNase A/T1. Products before and after RNase A/T1 treatment are evaluated on a denaturing polyacrylamide gel and visualized by SYBR gold staining.
  • RNase A/T1 should digest away the RNA template and result in a mass shift towards a smaller product containing only the ssDNA.
  • Since RNase H is expected to improve homogeneity of the 5’ and 3’ ssDNA boundaries, the impact of RNase H on the distribution of products is also evaluated by gel analysis.
  • the covalent linkage between the ncRNA template and ssDNA is confirmed by incubating the RT product with a 5’ to 3’ ssDNA exonuclease (RecJ) before or after treatment with a debranching enzyme (DBR1). RecJ should only be able to degrade the ssDNA after DBR1 has removed the 2’-5’ phosphodiester linkage between the RNA and ssDNA.
  • Example 14 Determining retron msr-msd boundaries by NGS (prophetic)
  • the msr-msd boundaries are determined by unbiased ligation of adapter sequences to the 5’ and 3’ end of the msDNA product after removal of the 2’-5’ phosphodiester linkage by DBR1.
  • the resulting ligated product is PCR-amplified, library prepped, and subjected to next generation sequencing. Sequencing reads are aligned to the reference sequence to determine the 5’ and 3’ boundaries of the msd.
  • the impact of the presence of RNase H in the RT reaction on the homogeneity of 5’ and 3’ msd boundaries is also evaluated.
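One way to summarize boundary calls and their homogeneity from aligned reads is shown below. This is a hedged sketch, not the method of the disclosure: it assumes each read has already been aligned and reduced to a (start, end) coordinate pair on the reference, it takes the most frequent coordinate as the boundary, and it reports the fraction of reads supporting that coordinate as a simple homogeneity measure. The function name and the coordinates are hypothetical.

```python
from collections import Counter


def call_boundaries(alignments):
    """Call 5' and 3' boundaries as the most frequent read start and end.

    alignments: list of (start, end) reference coordinates, one per read.
    Returns each boundary together with the fraction of reads supporting it.
    """
    if not alignments:
        raise ValueError("no alignments supplied")
    total = len(alignments)
    starts = Counter(start for start, _ in alignments)
    ends = Counter(end for _, end in alignments)
    five_prime, n_start = starts.most_common(1)[0]
    three_prime, n_end = ends.most_common(1)[0]
    return {
        "five_prime": five_prime,
        "five_prime_support": n_start / total,
        "three_prime": three_prime,
        "three_prime_support": n_end / total,
    }


# Hypothetical reads: most start at 10 and end at 95, with two outliers.
reads = [(10, 95)] * 8 + [(11, 95), (10, 96)]
print(call_boundaries(reads))
```

Comparing the support fractions between reactions run with and without RNase H gives a direct numerical view of the change in boundary homogeneity.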
  • RT activity is assessed using a primer extension assay containing the RT derived from a cell-free expression system and an RNA template annealed to a DNA primer as described above.
  • the resulting cDNA product(s) are detected by a denaturing polyacrylamide gel and qPCR as described above. Detection of cDNA drop-off products on the denaturing gel provides a relative assessment of processivity for candidates.
  • Primer length preference is determined by testing the RT’s activity on an RNA template annealed to 5’-FAM labeled DNA primers of either 6, 8, 10, 13, 16, or 20 nucleotides in length.
  • the RT is derived from a cell-free expression system as described above. After incubating the reaction, the reaction is quenched via the addition of RNase H. The size distribution of cDNA products is analyzed on a denaturing polyacrylamide gel as described above.
  • Optimal primer length is determined as the length that enables the RT to convert the most primer into cDNA product. The experimentally determined optimal primer length is then used in subsequent experiments, such as fidelity and processivity assays, to further characterize the RT in vitro.
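The selection rule can be written out explicitly. The sketch below assumes gel densitometry gives, for each primer length, one signal for unextended primer and one for cDNA product; the function name and the intensity values are hypothetical and serve only to illustrate the comparison.

```python
def optimal_primer_length(band_intensities):
    """Pick the primer length with the highest fraction converted to cDNA.

    band_intensities: {primer_length_nt: (unextended_primer, cdna_product)}
    Returns the best length and the per-length conversion fractions.
    """
    fractions = {}
    for length, (primer, product) in band_intensities.items():
        total = primer + product
        if total <= 0:
            raise ValueError(f"no signal for primer length {length}")
        fractions[length] = product / total
    best = max(fractions, key=fractions.get)
    return best, fractions


# Hypothetical densitometry values for the six primer lengths tested.
gel = {6: (90, 10), 8: (70, 30), 10: (40, 60), 13: (20, 80), 16: (25, 75), 20: (30, 70)}
best, fractions = optimal_primer_length(gel)
print(best, fractions[best])
```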
  • RT fidelity is assessed by a primer extension assay as described above with the exception that a 14-nt unique molecular identifier (UMI) barcode is included in the primer for the reverse transcription reaction.
  • The resulting full-length cDNA product is PCR-amplified, library-prepped, and subjected to next-generation sequencing. Barcodes with >5 reads are analyzed. After aligning to the reference sequence, substitutions, insertions, and deletions are counted only if the error is present in all sequence reads with the same barcode. Errors present in one or more, but not all, sequencing reads are considered to have been introduced during PCR or sequencing. Further analysis of the substitution, insertion, and deletion profile is performed, in addition to identification of mutation hotspots within the RNA template. The fidelity measurements will also be performed with modified bases, e.g., pseudouridine, in the template.
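The barcode-consensus rule above can be sketched as follows. This is a simplified illustration, assuming reads are already aligned to the reference as equal-length strings in which "-" marks a deleted base; insertions are not handled, and the function name and input format are hypothetical.

```python
from collections import defaultdict

def call_rt_errors(reads, reference, min_reads=6):
    """Return, per UMI, the errors shared by every read carrying that UMI.

    reads: iterable of (umi, aligned_sequence) pairs; each aligned sequence has
    the same length as the reference, with '-' marking a deletion (assumption).
    min_reads=6 implements the ">5 reads" cutoff.
    """
    by_umi = defaultdict(list)
    for umi, seq in reads:
        by_umi[umi].append(seq)

    errors = {}
    for umi, seqs in by_umi.items():
        if len(seqs) < min_reads:
            continue  # barcode family too small to analyze
        per_read = [
            {(i, ref, base) for i, (ref, base) in enumerate(zip(reference, seq)) if ref != base}
            for seq in seqs
        ]
        # Only errors present in all reads of the family are attributed to the RT;
        # errors seen in a subset are treated as PCR or sequencing artifacts.
        errors[umi] = sorted(set.intersection(*per_read))
    return errors
```

Each reported tuple is (position, reference base, observed base), which makes it straightforward to tabulate substitution profiles or hotspot positions afterwards.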
  • RT processivity is evaluated using a primer extension assay containing the RT enzyme derived from a cell-free expression system as described above and RNA templates between 1.6 kb and 6.6 kb in length annealed to either a 5’-FAM labeled primer (for gel analysis) or an unlabeled primer (for sequencing analysis).
  • Reverse transcription reactions are performed under single cycle conditions to prevent rebinding of RT enzymes that have dropped off the RNA template during cDNA synthesis.
  • The optimal trap molecule and concentration to achieve single cycle conditions are experimentally determined. The selected condition should provide sufficient inhibition of cDNA synthesis if incubated prior to reaction initiation but otherwise should not impact the velocity of the reaction.
  • Optimal trap molecules to test include unrelated RNA templates and unrelated RNA templates annealed to DNA primers of various lengths.
  • Processivity is evaluated by initiating the reaction with the addition of dNTPs and the selected trap molecule after pre-equilibrating the RT with the RNA template annealed to a DNA primer in the reaction buffer. After incubation, the reaction is quenched by the addition of RNase H. The size distribution of cDNA products is analyzed on a denaturing polyacrylamide gel as described above and/or subjected to PCR and library prepped for long-read sequencing. From these experiments, a processivity coefficient is quantified as the template length which yields 50% of the full-length cDNA product.
  • The median length of the cDNA product from the single cycle primer extension reaction is used to estimate the probability that the RT will dissociate on the tested template. From this, the probability that the RT will dissociate at each nucleotide position is calculated, assuming that each dissociation is an independent event and that the probability of dissociation is equal at all nucleotide positions.
  • The processivity coefficient, representing the length of template required for 50% of the RT to dissociate, is then determined as 1/(2*Pd), where Pd is the probability of dissociation at each nucleotide.
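The calculation above can be sketched as follows. The coefficient 1/(2*Pd) is taken directly from the text. The first helper is an assumption added for illustration: it estimates Pd from the fraction of full-length product on a template of known length under the stated model of independent, equal-probability dissociation at each nucleotide. Both function names are hypothetical.

```python
def per_nt_dissociation_probability(full_length_fraction, template_length):
    """Estimate Pd from the fraction of RT molecules reaching the template end.

    Assumption for illustration: survival over L nucleotides is (1 - Pd) ** L,
    so Pd = 1 - full_length_fraction ** (1 / L).
    """
    if not 0 < full_length_fraction <= 1:
        raise ValueError("full_length_fraction must be in (0, 1]")
    if template_length <= 0:
        raise ValueError("template_length must be positive")
    return 1.0 - full_length_fraction ** (1.0 / template_length)

def processivity_coefficient(pd):
    """Processivity coefficient as defined in the text: 1 / (2 * Pd)."""
    if pd <= 0:
        raise ValueError("Pd must be positive")
    return 1.0 / (2.0 * pd)
```

For example, a per-nucleotide dissociation probability of 0.001 corresponds to a coefficient of about 500 nt under this definition.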
  • The RNA template contains one of the following challenge motifs at a fixed distance (100-300 nt) downstream of the primer binding site: homopolymeric stretches, a thermodynamically stable GC-rich stem loop, a pseudoknot, a tRNA, a GII intron, or an RNA template containing base or backbone modifications (e.g., pseudouridine, phosphorothioate bonds).
  • The ligated product(s) are then PCR-amplified and library prepped for next-generation sequencing to identify both sites of RT misincorporation/insertions/deletions and sites of RT drop-off with single nucleotide resolution. The extent of RT drop-off at a given position is quantified by comparing the number of sequencing reads corresponding to the drop-off product to the number of sequencing reads corresponding to the full-length product.
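The drop-off comparison above can be sketched as follows. This is a minimal illustration, assuming each read has been reduced to the template position at which its cDNA ends; the function name and input format are hypothetical.

```python
from collections import Counter

def dropoff_ratios(cdna_end_positions, full_length):
    """Ratio of drop-off reads to full-length reads at each termination position.

    cdna_end_positions: iterable of positions (nt from the primer) at which each
    sequenced cDNA ends; full_length: end position of the full-length product.
    """
    counts = Counter(cdna_end_positions)
    full = counts.get(full_length, 0)
    if full == 0:
        raise ValueError("no full-length reads; ratio is undefined")
    return {pos: n / full for pos, n in sorted(counts.items()) if pos < full_length}
```

Positions with high ratios that coincide with a challenge motif would point to that motif as a likely cause of termination.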
  • Non-templated addition of bases to the 5’ end of the cDNA product is evaluated by next generation sequencing.
  • Primer extension reactions containing the RT derived from the cell-free expression system and RNA template are conducted as described above. Systematic analysis of different RNA template lengths and sequence motifs at the 5’ end are tested.
  • An adapter sequence is ligated in an unbiased manner to the 3’ ends of the resulting cDNA products by T4 ligase, resulting in capture of all cDNA products despite the potentially heterogeneous nature of their 3’ ends.
  • The ligated product(s) are then PCR-amplified and library prepped for next-generation sequencing. Comparison of the expected full-length cDNA reference sequence to experimentally produced cDNA sequences that are longer than full-length enables identification of both the type and number of base additions to the 5’-end that were not templated by the RNA.
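The comparison above can be sketched as follows. This is a simplified illustration, assuming adapter-trimmed reads are oriented so that any extra bases follow the expected full-length sequence, and using exact prefix matching rather than a real alignment; the function name is hypothetical.

```python
from collections import Counter

def non_templated_additions(reads, expected):
    """Tally extra bases found beyond the expected full-length cDNA sequence.

    reads: adapter-trimmed read sequences; expected: the full-length cDNA
    reference. Only reads that contain the whole expected sequence as a prefix
    and extend beyond it are counted (assumption for illustration).
    """
    tails = Counter()
    for read in reads:
        if len(read) > len(expected) and read.startswith(expected):
            tails[read[len(expected):]] += 1  # type and number of added bases
    return tails
```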
  • Proteins of interest are purified via a Twin-Strep tag after IPTG-induced overexpression in E. coli. Purified proteins are tested against 1 kb and 4 kb cargos flanked by the 3’ UTRs identified from their native contexts and the 5’ UTRs plus 400 bp past the start codon. The effect of the 5’ and 3’ flanking sequences on activity is assayed via qPCR targeting sections near the end of the template to determine whether cargos with these native features are preferred substrates.
  • Example 23 Human cells cDNA synthesis results (prophetic) [0206] The ability of these enzymes to produce cDNA in a mammalian environment is tested by expressing them in mammalian cells and detecting cDNA synthesis by PCR, followed by agarose electrophoresis.
  • Reverse transcriptases are cloned in a plasmid for mammalian expression under the CMV promoter as fusion proteins having MS2 coat protein (MCP) at the N terminus, in addition to a flag-HA tag (FH).
  • MCP is a protein derived from the MS2 bacteriophage that recognizes a 20 nucleotide RNA stem loop with high affinity (subnanomolar Kd).
  • A plasmid containing MCP fused to the RT candidate under the CMV promoter is cloned and isolated for transfection in HEK293T cells. Transfection is performed using Lipofectamine 2000. mRNA encoding nanoluciferase is made using an mRNA synthesizer. To degrade any DNA template left in the mRNA preparation, the reaction is treated with DNase for 1 hour and the mRNA is cleaned up using a Transcription Clean-Up kit. The mRNA is hybridized to a complementary DNA primer in 10 mM Tris pH 7.5, 50 mM NaCl at 95 °C for 2 min and cooled to 4 °C at a rate of 0.1 °C/s.
  • The mRNA/DNA hybrid is transfected into HEK293T cells using Lipofectamine MessengerMAX 6 hours after the plasmid containing the MCP-RT fusion is transfected. At 18 hours post mRNA/DNA transfection, cells are lysed using a DNA Extraction Solution; 100 µL of quick extract is added per well of a 24-well plate.
  • The nanoluciferase is ~500 bp long; primers to amplify products of 100 bp and 542 bp from the newly synthesized cDNA are designed.
  • cDNA is amplified using the set of primers mentioned above and PCR products are detected by agarose gel electrophoresis.
  • Example 24 - RT cDNA synthesis activity can be harnessed for multiple applications (prophetic)
  • RNA important in RNA biology such as expression, processing, modifications, and half-life, as well as quality control steps in biotechnology
  • RTs used for these purposes include the MMLV RT, AMV RT, and GsI-IIC RT (TGIRT).
  • the first two represent retroviral RTs, while the latter is a GII intron-derived RT.
  • GII intron-derived RTs, as well as non-LTR derived RTs show several advantages compared to their retroviral counterparts. For example, they are more processive, reading through structural and modified RNAs.
  • RNAs can’t be properly reverse transcribed by retroviral RTs, as they create early termination products that can be misinterpreted as RNA fragments.
  • The template-switching ability of some RTs can be harnessed for early adaptor addition, removing the adaptor ligation step during library preparation. Therefore, highly processive RTs are suitable for the generation of libraries with complex RNA. Further, some highly processive RTs are smaller than currently used retroviral RTs, making their production and associated downstream steps easier. Data disclosed herein demonstrate that several RTs described herein outperform the commercially available TGIRT enzyme, some with over six-fold its cDNA synthesis activity.
  • Example 25 - cDNA synthesis by non-LTR retrotransposon RTs and retron-like RTs
  • Non-LTR retrotransposases are capable of integrating large cargo into a target site via reverse transcription of an RNA template.
  • These reverse transcriptases integrate an RNA template via target primed reverse transcription (TPRT), a mechanism in which cDNA synthesis is primed by the free 3’ hydroxyl group at the target DNA nick.
  • The MG160 family of RTs is a divergent group of “retron-like” single-domain RT enzymes previously identified within the retron RT clade, where they form a distantly branching group. The enzymes are predicted to be active based on the presence of the expected RT catalytic residues [F/Y]XDD.
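A check for the [F/Y]XDD catalytic motif mentioned above can be sketched with a regular expression. This is an illustration only: it finds non-overlapping motif occurrences anywhere in a protein sequence and does not verify that a match lies in the RT active site; the function name is hypothetical.

```python
import re

# F or Y, any residue, then two aspartates: the [F/Y]XDD pattern
RT_MOTIF = re.compile(r"[FY].DD")

def find_rt_motifs(protein_seq):
    """Return (0-based position, matched residues) for each [F/Y]XDD occurrence."""
    return [(m.start(), m.group()) for m in RT_MOTIF.finditer(protein_seq.upper())]
```

Applied to a made-up test string such as "MKLYADDGSFVDDK", the function reports YADD at position 3 and FVDD at position 9.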
  • The ability of RTs to produce cDNA in a mammalian environment was tested by expressing them in mammalian cells and detecting cDNA synthesis by qPCR.
  • Reverse transcriptases were cloned in a plasmid for mammalian expression under the CMV promoter as fusion proteins having MS2 coat protein (MCP) at the N terminus, in addition to a flag-HA tag (FH).
  • MCP is a protein derived from the MS2 bacteriophage that recognizes a 20 nucleotide RNA stem loop with high affinity (subnanomolar Kd).
  • By fusing the RTs with MCP and including the MS2 loops in the RNA template, it was ensured that once the RT is translated, it finds the RNA template and starts cDNA synthesis from the DNA primer hybridized to the RNA template.
  • A plasmid containing MCP fused to the RT candidate under the CMV promoter was cloned and isolated for transfection in HEK293T cells. Transfection was performed using Lipofectamine 2000. mRNA encoding dCas9 fused to nanoluciferase was made using an mRNA synthesizer. To degrade any DNA template left in the mRNA preparation, the reaction was treated with DNase for 1.5 hours and the mRNA was cleaned up using a Transcription Clean-Up kit.
  • The mRNA was hybridized to a complementary DNA primer in 10 mM Tris pH 7.5, 50 mM NaCl at 95 °C for 2 min and cooled to 4 °C at a rate of 0.1 °C/s.
  • The mRNA/DNA hybrid was transfected into HEK293T cells 6 hours after the plasmid containing the MCP-RT fusion was transfected. At 18 hours post mRNA/DNA transfection, cells were lysed using a DNA Extraction Solution; 100 µL of quick extract was added per well of a 24-well plate.
  • The RNA template was ~4247 nt. Primers to amplify the first and last 100 bp of the newly synthesized cDNA (4100 bp) were designed, along with TaqMan probes to quantify their amplification (FIG. 15A).
  • Control group II intron RT TGIRT and control R2 non-LTR retrotransposon RT R2Tg showed a closer FAM/HEX ratio, demonstrating their high processivity (FIGs. 15B and 15C).
  • Five candidates of the MG148 family of non-LTR retrotransposon RTs were tested in mammalian cells (FIG. 15B). All tested candidates showed low activity compared to the control RTs.
  • MG160-7, a retron-like RT, was tested similarly. It displayed poor activity and poor processivity, as evidenced by FAM and HEX values that were below background (indicated by the dotted line parallel to the x-axis) (FIG. 15C).

Landscapes

  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Wood Science & Technology (AREA)
  • Biomedical Technology (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Plant Pathology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Medicinal Chemistry (AREA)
  • Mycology (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Steroid Compounds (AREA)

Abstract

The present disclosure provides systems and methods for transposing a cargo nucleotide sequence to a target nucleic acid site. These systems and methods may comprise a first double-stranded nucleic acid comprising the cargo nucleotide sequence, wherein the cargo nucleotide sequence is configured to interact with a retrotransposase, and the retrotransposase, wherein the retrotransposase is configured to transpose the cargo nucleotide sequence to the target nucleic acid site.

Description

RETROTRANSPOSON COMPOSITIONS AND METHODS OF USE
CROSS-REFERENCE
[0001] This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/386,867, filed December 9, 2022, U.S. Provisional Patent Application No. 63/489,156 filed March 8, 2023, and U.S. Provisional Patent Application No. 63/491,942 filed March 23, 2023, each of which is incorporated by reference in its entirety herein.
BACKGROUND
[0002] Transposable elements are movable DNA sequences and play a crucial role in gene function and evolution. While transposable elements are found in nearly all forms of life, their prevalence varies among organisms, with a large proportion of the eukaryotic genome encoding for transposable elements.
SUMMARY
[0003] While the foundational research on transposable elements was conducted in the 1940s, their potential utility in DNA manipulation and gene editing applications has only been recognized in recent years.
[0004] Described herein, in certain embodiments, are engineered retrotransposase systems, comprising: (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence and comprising an amino acid sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises an amino acid sequence having at least 80% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase is encoded by a nucleic acid having at least 75% sequence identity to any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 80% sequence identity to any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 90% sequence identity to any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase has less than 80% sequence identity to a known retrotransposase. In some embodiments, the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR). In some embodiments, the retrotransposase is configured to transpose the cargo nucleotide sequence via a ribonucleic acid polynucleotide intermediate.
In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide. In some embodiments, the retrotransposase comprises one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of the retrotransposase. In some embodiments, the NLS comprises a sequence at least 80% identical to a sequence from the group consisting of SEQ ID NO: 49-64. In some embodiments, the NLS comprises SEQ ID NO: 50. In some embodiments, the NLS is proximal to the N-terminus of the retrotransposase. In some embodiments, the NLS comprises SEQ ID NO: 49. In some embodiments, the NLS is proximal to the C-terminus of the retrotransposase. In some embodiments, the retrotransposase is derived from an uncultivated microorganism.
[0005] Described herein, in certain embodiments, are polypeptides comprising a reverse transcriptase comprising an amino acid sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47 fused N- or C-terminally to a non-retrotransposase domain or an affinity tag. In some embodiments, the non-retrotransposase domain is an RNA-binding protein domain. In some embodiments, the RNA binding protein domain comprises a bacteriophage MS2 coat protein (MCP) domain.
[0006] Described herein, in certain embodiments, are nucleic acids encoding the engineered retrotransposase system described herein or the polypeptide described herein.
[0007] Described herein, in certain embodiments, are methods for modifying a target nucleic acid sequence comprising contacting the target nucleic acid sequence using the engineered nuclease system described herein. In some embodiments, modifying the target nucleic acid sequence comprises binding, nicking, or cleaving the target nucleic acid sequence. In some embodiments, the target nucleic acid sequence comprises genomic DNA, viral DNA, viral RNA, or bacterial DNA. In some embodiments, the target nucleic acid sequence comprises deoxyribonucleic acid (DNA). In some embodiments, the modification is in vitro. In some embodiments, the modification is in vivo. In some embodiments, the modification is ex vivo.
[0008] Described herein, in certain embodiments, are methods of modifying a target nucleic acid sequence in a mammalian cell comprising contacting the mammalian cell using the engineered nuclease system described herein.
[0009] Described herein, in certain embodiments, are vectors comprising the nucleic acid described herein. In some embodiments, the vector is a plasmid, a minicircle, a CELiD, an adeno-associated virus (AAV) derived virion, or a lentivirus.
[0010] Described herein, in certain embodiments, are cells comprising the engineered nuclease system described herein or the polypeptide described herein. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is an immortalized cell. In some embodiments, the cell is an insect cell. In some embodiments, the cell is a yeast cell. In some embodiments, the cell is a plant cell. In some embodiments, the cell is a fungal cell. In some embodiments, the cell is a prokaryotic cell. In some embodiments, the cell is an A549, HEK-293, HEK-293T, BHK, CHO, HeLa, MRC5, Sf9, Cos-1, Cos-7, Vero, BSC 1, BSC 40, BMT 10, WI38, HeLa, Saos, C2C12, L cell, HT1080, HepG2, Huh7, K562, primary cell, or a derivative thereof. In some embodiments, the cell is an engineered cell. In some embodiments, the cell is a stable cell.
[0011] In some aspects, the present disclosure provides for an engineered retrotransposase system, comprising: (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence, wherein the cargo nucleotide sequence is configured to interact with a retrotransposase; and (b) a retrotransposase, wherein: (i) the retrotransposase is configured to transpose the cargo nucleotide sequence to a target nucleic acid locus; and (ii) the retrotransposase is derived from an uncultivated microorganism. In some embodiments, the retrotransposase comprises a sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a reverse transcriptase domain. In some embodiments, the retrotransposase further comprises one or more zinc finger domains. In some embodiments, the retrotransposase further comprises an endonuclease domain. In some embodiments, the retrotransposase has less than 80% sequence identity to a known retrotransposase. In some embodiments, the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR). In some embodiments, the retrotransposase is configured to transpose the cargo nucleotide sequence via a ribonucleic acid polynucleotide intermediate. In some embodiments, the retrotransposase comprises one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of the retrotransposase. In some embodiments, the NLS comprises a sequence at least 80% identical to a sequence from the group consisting of SEQ ID NO: 49-64. In some embodiments, the sequence identity is determined by a BLASTP, CLUSTALW, MUSCLE, MAFFT, or CLUSTALW with the parameters of the Smith-Waterman homology search algorithm. 
In some embodiments, the sequence identity is determined by the BLASTP homology search algorithm using parameters of a wordlength (W) of 3, an expectation (E) of 10, and a BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment.
[0012] In some aspects, the present disclosure provides for an engineered retrotransposase system, comprising: (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence, wherein the cargo nucleotide sequence is configured to interact with a retrotransposase; and (b) a retrotransposase, wherein: (i) the retrotransposase is configured to transpose the cargo nucleotide sequence to a target nucleic acid locus; and (ii) the retrotransposase comprises a sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase is derived from an uncultivated microorganism. In some embodiments, the retrotransposase comprises a reverse transcriptase domain. In some embodiments, the retrotransposase further comprises one or more zinc finger domains. In some embodiments, the retrotransposase further comprises an endonuclease domain. In some embodiments, the retrotransposase has less than 80% sequence identity to a known retrotransposase. In some embodiments, the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR). In some embodiments, the retrotransposase is configured to transpose the cargo nucleotide sequence via a ribonucleic acid polynucleotide intermediate. In some embodiments, the sequence identity is determined by a BLASTP, CLUSTALW, MUSCLE, MAFFT, or CLUSTALW with the parameters of the Smith-Waterman homology search algorithm.
In some embodiments, the sequence identity is determined by the BLASTP homology search algorithm using parameters of a wordlength (W) of 3, an expectation (E) of 10, and a BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment.
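For illustration, the BLASTP parameters recited above can be expressed as an NCBI BLAST+ `blastp` command line. The sketch below only builds the argument list and applies the 75% identity threshold to a reported percent identity; the FASTA file names are hypothetical, and it is an assumption that the tabular `pident` value (computed over the local alignment) is the identity measure intended by the disclosure.

```python
def blastp_command(query_fasta, subject_fasta):
    """Build a BLAST+ blastp argument list using the recited parameters."""
    return [
        "blastp",
        "-query", query_fasta,          # hypothetical file name supplied by the caller
        "-subject", subject_fasta,      # hypothetical file name supplied by the caller
        "-word_size", "3",              # wordlength (W) of 3
        "-evalue", "10",                # expectation (E) of 10
        "-matrix", "BLOSUM62",          # BLOSUM62 scoring matrix
        "-gapopen", "11",               # gap existence cost of 11
        "-gapextend", "1",              # gap extension cost of 1
        "-comp_based_stats", "2",       # conditional compositional score matrix adjustment
        "-outfmt", "6 qseqid sseqid pident length evalue",
    ]

def meets_identity_threshold(pident, threshold=75.0):
    """True if a reported percent identity is at least the threshold."""
    return pident >= threshold
```

The list can be passed to `subprocess.run` on a machine where BLAST+ is installed; the resulting `pident` column can then be screened with `meets_identity_threshold`.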
[0013] In some aspects, the present disclosure provides for a deoxyribonucleic acid polynucleotide encoding the engineered retrotransposase system of any one of the aspects or embodiments described herein.
[0014] In some aspects, the present disclosure provides for a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes a retrotransposase, and wherein the retrotransposase is derived from an uncultivated microorganism, wherein the organism is not the uncultivated microorganism. In some embodiments, the retrotransposase comprises at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence encoding one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of the retrotransposase. In some embodiments, the NLS comprises a sequence selected from SEQ ID NOs: 49-64. In some embodiments, the NLS comprises SEQ ID NO: 50. In some embodiments, the NLS is proximal to the N-terminus of the retrotransposase. In some embodiments, the NLS comprises SEQ ID NO: 49. In some embodiments, the NLS is proximal to the C-terminus of the retrotransposase. In some embodiments, the organism is prokaryotic, bacterial, eukaryotic, fungal, plant, mammalian, rodent, or human.
[0015] In some aspects, the present disclosure provides for a vector comprising the nucleic acid of any one of the aspects or embodiments described herein. In some embodiments, the method further comprises a nucleic acid encoding a cargo nucleotide sequence configured to form a complex with the retrotransposase. In some embodiments, the vector is a plasmid, a minicircle, a CELiD, an adeno-associated virus (AAV) derived virion, or a lentivirus.
[0016] In some aspects, the present disclosure provides for a cell comprising the vector of any one of the aspects or embodiments described herein.
[0017] In some aspects, the present disclosure provides for a method of manufacturing a retrotransposase, comprising cultivating the cell of any one of the aspects or embodiments described herein.
[0018] In some aspects, the present disclosure provides for a method for binding, nicking, cleaving, marking, modifying, or transposing a double-stranded deoxyribonucleic acid polynucleotide, comprising: (a) contacting the double-stranded deoxyribonucleic acid polynucleotide with a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid locus; and (b) wherein the retrotransposase comprises a sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase is derived from an uncultivated microorganism. In some embodiments, the retrotransposase comprises a reverse transcriptase domain. In some embodiments, the retrotransposase further comprises one or more zinc finger domains. In some embodiments, the retrotransposase further comprises an endonuclease domain. In some embodiments, the retrotransposase has less than 80% sequence identity to a known retrotransposase. In some embodiments, the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR). In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide is transposed via a ribonucleic acid polynucleotide intermediate. In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide. 
[0019] In some aspects, the present disclosure provides for a method of modifying a target nucleic acid locus, the method comprising delivering to the target nucleic acid locus the engineered retrotransposase system of any one of the aspects or embodiments described herein, wherein the retrotransposase is configured to transpose the cargo nucleotide sequence to the target nucleic acid locus, and wherein the complex is configured such that upon binding of the complex to the target nucleic acid locus, the complex modifies the target nucleic acid locus. In some embodiments, the target nucleic acid locus comprises binding, nicking, cleaving, marking, modifying, or transposing the target nucleic acid locus. In some embodiments, the target nucleic acid locus comprises deoxyribonucleic acid (DNA). In some embodiments, the target nucleic acid locus comprises genomic DNA, viral DNA, or bacterial DNA. In some embodiments, the target nucleic acid locus is in vitro. In some embodiments, the target nucleic acid locus is within a cell. In some embodiments, the cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, a human cell, or a primary cell. In some embodiments, the cell is a primary cell. In some embodiments, the primary cell is a T cell. In some embodiments, the primary cell is a hematopoietic stem cell (HSC). In some embodiments, delivering the engineered retrotransposase system to the target nucleic acid locus comprises delivering the nucleic acid of any one of the aspects or embodiments described herein or the vector of any one of the aspects or embodiments described herein. In some embodiments, delivering the engineered retrotransposase system to the target nucleic acid locus comprises delivering a nucleic acid comprising an open reading frame encoding the retrotransposase. 
In some embodiments, the nucleic acid comprises a promoter to which the open reading frame encoding the retrotransposase is operably linked. In some embodiments, delivering the engineered retrotransposase system to the target nucleic acid locus comprises delivering a capped mRNA containing the open reading frame encoding the retrotransposase. In some embodiments, delivering the engineered retrotransposase system to the target nucleic acid locus comprises delivering a translated polypeptide. In some embodiments, the retrotransposase does not induce a break at or proximal to the target nucleic acid locus.
[0020] In some aspects, the present disclosure provides for a host cell comprising an open reading frame encoding a heterologous retrotransposase having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47 or a variant thereof. In some embodiments, the host cell is an E. coli cell. In some embodiments, the E. coli cell is a λDE3 lysogen or the E. coli cell is a BL21 (DE3) strain. In some embodiments, the E. coli cell has an ompT lon genotype. In some embodiments, the open reading frame is operably linked to a T7 promoter sequence, a T7-lac promoter sequence, a lac promoter sequence, a tac promoter sequence, a trc promoter sequence, a ParaBAD promoter sequence, a PrhaBAD promoter sequence, a T5 promoter sequence, a cspA promoter sequence, an araPBAD promoter, a strong leftward promoter from phage lambda (pL promoter), or any combination thereof. In some embodiments, the open reading frame comprises a sequence encoding an affinity tag linked in-frame to a sequence encoding the retrotransposase. In some embodiments, the affinity tag is an immobilized metal affinity chromatography (IMAC) tag. In some embodiments, the IMAC tag is a polyhistidine tag.
In some embodiments, the affinity tag is a myc tag, a human influenza hemagglutinin (HA) tag, a maltose binding protein (MBP) tag, a glutathione S-transferase (GST) tag, a streptavidin tag, a FLAG tag, or any combination thereof. In some embodiments, the affinity tag is linked in-frame to the sequence encoding the retrotransposase via a linker sequence encoding a protease cleavage site. In some embodiments, the protease cleavage site is a tobacco etch virus (TEV) protease cleavage site, a PreScission® protease cleavage site, a Thrombin cleavage site, a Factor Xa cleavage site, an enterokinase cleavage site, or any combination thereof. In some embodiments, the open reading frame is codon-optimized for expression in the host cell. In some embodiments, the open reading frame is provided on a vector. In some embodiments, the open reading frame is integrated into a genome of the host cell.
[0021] In some aspects, the present disclosure provides for a culture comprising the host cell of any one of the aspects or embodiments described herein in compatible liquid medium.
[0022] In some aspects, the present disclosure provides for a method of producing a retrotransposase, comprising cultivating the host cell of any one of the aspects or embodiments described herein in compatible growth medium. In some embodiments, the method further comprises inducing expression of the retrotransposase by addition of an additional chemical agent or an increased amount of a nutrient. In some embodiments, the additional chemical agent or increased amount of a nutrient comprises isopropyl β-D-1-thiogalactopyranoside (IPTG) or additional amounts of lactose. In some embodiments, the method further comprises isolating the host cell after the cultivation and lysing the host cell to produce a protein extract. In some embodiments, the method further comprises subjecting the protein extract to IMAC, or ion-affinity chromatography. In some embodiments, the open reading frame comprises a sequence encoding an IMAC affinity tag linked in-frame to a sequence encoding the retrotransposase. In some embodiments, the IMAC affinity tag is linked in-frame to the sequence encoding the retrotransposase via a linker sequence encoding a protease cleavage site. In some embodiments, the protease cleavage site comprises a tobacco etch virus (TEV) protease cleavage site, a PreScission® protease cleavage site, a Thrombin cleavage site, a Factor Xa cleavage site, an enterokinase cleavage site, or any combination thereof. In some embodiments, the method further comprises cleaving the IMAC affinity tag by contacting a protease corresponding to the protease cleavage site to the retrotransposase. In some embodiments, the method further comprises performing subtractive IMAC affinity chromatography to remove the affinity tag from a composition comprising the retrotransposase.
[0023] In some aspects, the present disclosure provides for a method of disrupting a locus in a cell, comprising contacting to the cell a composition comprising: (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence, wherein the cargo nucleotide sequence is configured to interact with a retrotransposase; and (b) a retrotransposase, wherein: (i) the retrotransposase is configured to transpose the cargo nucleotide sequence to a target nucleic acid locus; (ii) the retrotransposase comprises a sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47; and (iii) the retrotransposase has at least equivalent transposition activity to a known retrotransposase in a cell. In some embodiments, the transposition activity is measured in vitro by introducing the retrotransposase to cells comprising the target nucleic acid locus and detecting transposition of the target nucleic acid locus in the cells. In some embodiments, the composition comprises 20 pmoles or less of the retrotransposase. In some embodiments, the composition comprises 1 pmol or less of the retrotransposase.
[0024] Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] The novel features of the disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure are utilized, and the accompanying drawings of which:
[0026] FIG. 1 depicts the genomic context of a bacterial retrotransposon. MG140-34 is a predicted retrotransposase (arrow) encoding a reverse transcriptase domain. Regions flanking the retrotransposase display secondary structures that possibly represent binding sites for the retrotransposase (secondary structure boxes and zoomed images).
[0027] FIG. 2 shows microbial MG retrotransposases (black branches on clade 4) are more closely related to Eukaryotic than viral retrotransposases (grey branches on clade 6). Clade 1: Telomerase reverse transcriptases; clade 2: Group II intron reverse transcriptases; clade 3: Eukaryotic R1 type retrotransposases; clade 4: microbial and Eukaryotic R2 retrotransposases; clade 5: Eukaryotic retrovirus-related reverse transcriptases; and clade 6: viral reverse transcriptases.
[0028] FIG. 3 depicts Clades 3 and 4 from the phylogenetic gene tree from (A). Some microbial MG retrotransposases contain multiple Zn-finger motifs (vertical rectangles), the conserved RVT_1 reverse transcriptase domain, and APE/RLE or other endonuclease domains (top and bottom panel). Some microbial MG retrotransposases lack an endonuclease domain (mid-panel).
[0029] FIG. 4 depicts a phylogenetic tree inferred from a multiple sequence alignment of the reverse transcriptase domain from diverse enzymes. RT sequences were derived from DNA, as well as RNA assemblies. Reference RTs were included in the tree for classification purposes.
[0030] FIG. 5A depicts a phylogenetic tree inferred from a multiple sequence alignment of RT domains identified from families of RTs (MG148).
[0031] FIG. 5B depicts genomic context of MG140-34-R2 RT. Predicted genes not associated with the RT are displayed as white arrows.
[0032] FIG. 5C depicts nucleotide sequence alignment of four members of the MG148 family indicating conserved regions (boxes underneath the sequence) upstream of the RT (arrow annotated over the consensus sequence).
[0033] FIG. 6 depicts screening of in vitro activity of RTns family of enzymes by qPCR (MG148). Activity was detected by qPCR using primers that amplify the full-length cDNA product derived from a primer extension reaction containing the respective RT. Samples are derived from RT reactions containing 100 nM substrate. The negative control is a no-template water in the in vitro transcription/translation system reaction. Positive control: R2Tg (Taeniopygia guttata), a previously described retrotransposon. Active candidates, defined as at least 10-fold signal above the negative control, are marked in dark grey while candidates inactive in these conditions are in light grey.
[0034] FIG. 7A depicts a phylogenetic tree inferred from a multiple sequence alignment of full-length Group II intron RTs identified sequences of Class C.
[0035] FIG. 7B depicts a summary table of the MG153 family of Group II introns. AAI: average pairwise amino acid identity of family members to reference Group II intron sequences.
[0036] FIGs. 8A and 8B depict screening of in vitro activity of GII intron Class C candidates MG153-22, MG153-23, and MG153-24 by primer extension assay. FIG. 8A lane numbers correspond to the following: 1-PURExpress (in vitro transcription/translation system) no template control, 2-MMLV control RT, 3-TGIRT-III control RT, 4-MarathonRT control RT, 5-7 correspond to candidates MG153-22 through 24. Numbering in bold corresponds to gel lanes with active candidates. Results are representative of two independent experiments. FIG. 8B depicts detection of full-length cDNA production by qPCR. Dark grey bars correspond to RTs that generate product at least 10-fold above background. Results were determined from two technical replicates.
[0037] FIG. 9 depicts screening to assess the ability of indicated control RTs and GII intron Class C candidates to synthesize cDNA in mammalian cells. Detection of 542 bp PCR products by D1000 TapeStation for MG153-23. Lanes not relevant for the described experiment are covered by black boxes.
[0038] FIG. 10 depicts the genomic context of the MG160-7 retron-like single-domain RT. The region upstream from the RT (dotted box) is conserved across MG160 members and folds into secondary structures (inset) that may be required for activity and function.
[0039] FIGs. 11A and 11B depict screening of in vitro activity of retron-like candidate MG160-7 by primer extension assay. FIG. 11A lane numbers correspond to the following samples: 1-PURExpress (in vitro transcription/translation system) no template control, 2-MMLV control RT, 3-TGIRT-III control RT, 4-MG160-7. FIG. 11B depicts quantification of full-length cDNA production by qPCR. Dark grey bars correspond to RTs that generate product at least 10-fold above background. Results were determined from two technical replicates.
[0040] FIG. 12 depicts a screening of the ability of MG153 GII derived RTs to synthesize cDNA in mammalian cells. The 542 bp cDNA synthesis PCR products were assayed by Taqman qPCR. cDNA activity was normalized to the activity of the TGIRT control, where TGIRT represents a value of 1. Y axis is shown in log10 scale.
[0041] FIGs. 13A and 13B depict protein expression of MG153 GII derived RTs by immunoblots. FIG. 13A: Cells were transfected with plasmids containing the candidate RTs and protein expression was evaluated by immunoblot, detecting the HA peptide fused to the N termini of the RTs. All lanes were normalized to total protein concentration. Lanes not relevant for the described experiment in FIG. 13A are covered by black boxes. FIG. 13B: Table of expected molecular sizes for tested RTs.
[0042] FIG. 14 depicts relative activity of MG153-23 GII derived RT normalized to protein expression. cDNA synthesis was detected by Taqman qPCR, protein expression was detected by immunoblots. Activity relative to TGIRT was normalized per total protein concentration. Y axis is shown in a linear scale.
[0043] FIGs. 15A-15C depict a screen of the ability of indicated control RTs and candidate RTs to synthesize cDNA in mammalian cells. FIG. 15A depicts a schematic illustration showing the methodology used to detect cDNA synthesis in mammalian cells. The first (FAM) and last (HEX) 100 bps of a 4.1 kb RNA template are detected using Taqman based qPCR. Taqman qPCR was used to detect the first (FAM probe) and last (HEX probe) 100 bp PCR products amplified from cDNA synthesized from an RNA template by MG148 family of non-LTR retrotransposon derived RTs (FIG. 15B) and retron-like MG160-7 (FIG. 15C).
BRIEF DESCRIPTION OF THE SEQUENCE LISTING
[0044] The Sequence Listing filed herewith provides exemplary polynucleotide and polypeptide sequences for use in methods, compositions, and systems according to the disclosure. Below are exemplary descriptions of sequences therein.
MG140
[0045] SEQ ID NOs: 1-16 show the full-length peptide sequences of MG140 transposition proteins.
MG148
[0046] SEQ ID NOs: 32-41 show the full-length peptide sequences of MG148 reverse transcriptase proteins.
[0047] SEQ ID NOs: 25-31 show the nucleotide sequences of genes encoding HA-His-tagged MG148 reverse transcriptase proteins.
[0048] SEQ ID NOs: 76-80 show the nucleotide sequences of genes encoding MG148 reverse transcriptase proteins optimized for expression in mammalian cells.
MG153
[0049] SEQ ID NOs: 42-44 show the full-length peptide sequences of MG153 reverse transcriptase proteins.
[0050] SEQ ID NOs: 17-19 show the nucleotide sequences of E. coli codon optimized genes encoding MG153 reverse transcriptase proteins.
[0051] SEQ ID NOs: 20-23 show the nucleotide sequences of genes encoding strep-tagged MG153 reverse transcriptase proteins.
MG160
[0052] SEQ ID NOs: 45-47 show the full-length peptide sequences of MG160 reverse transcriptase proteins.
[0053] SEQ ID NO: 24 shows the nucleotide sequence of an E. coli codon optimized gene encoding an MG160 reverse transcriptase protein.
[0054] SEQ ID NO: 48 shows the nucleotide sequence of a gene encoding an MG160 reverse transcriptase protein optimized for expression in mammalian cells and cloned into a tethered spCas9 (H840A) plasmid.
[0055] SEQ ID NO: 81 shows the nucleotide sequence of a gene encoding an MG160 reverse transcriptase protein optimized for expression in mammalian cells.
Other Sequences
[0056] SEQ ID NOs: 66-69 show the nucleotide sequences of primers.
[0057] SEQ ID NOs: 70-71 show the nucleotide sequences of Taqman probes for qPCR.
[0058] SEQ ID NO: 65 shows the nucleotide sequence of an RNA template for cDNA synthesis.
[0059] SEQ ID NOs: 72-75 show the nucleotide sequences of genes encoding control reverse transcriptase proteins optimized for expression in mammalian cells.
DETAILED DESCRIPTION
[0060] While various embodiments of the disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed.
[0061] The practice of some methods disclosed herein employs, unless otherwise indicated, techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant DNA.
[0062] As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, to the extent that the terms “including,” “includes,” “having,” “has,” “with,” or variants thereof are used in either the detailed description and/or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising”.
[0063] The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within one or more than one standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 15%, up to 10%, up to 5%, or up to 1% of a given value.
[0064] The term “nucleotide,” as used herein, refers to a base-sugar-phosphate combination. Contemplated nucleotides include naturally occurring nucleotides and synthetic nucleotides. Nucleotides are monomeric units of a nucleic acid sequence (e.g., deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)). The term nucleotide includes ribonucleoside triphosphates adenosine triphosphate (ATP), uridine triphosphate (UTP), cytidine triphosphate (CTP), guanosine triphosphate (GTP) and deoxyribonucleoside triphosphates such as dATP, dCTP, dITP, dUTP, dGTP, dTTP, or derivatives thereof. Such derivatives include, for example, [αS]dATP, 7-deaza-dGTP and 7-deaza-dATP, and nucleotide derivatives that confer nuclease resistance on the nucleic acid molecule containing them. The term nucleotide as used herein encompasses dideoxyribonucleoside triphosphates (ddNTPs) and their derivatives. Illustrative examples of ddNTPs include, but are not limited to, ddATP, ddCTP, ddGTP, ddITP, and ddTTP. A nucleotide may be unlabeled or detectably labeled, such as using moieties comprising optically detectable moieties (e.g., fluorophores) or quantum dots. Detectable labels include, for example, radioactive isotopes, fluorescent labels, chemiluminescent labels, bioluminescent labels, and enzyme labels. Fluorescent labels of nucleotides include but are not limited to fluorescein, 5-carboxyfluorescein (FAM), 2'7'-dimethoxy-4'5-dichloro-6-carboxyfluorescein (JOE), rhodamine, 6-carboxyrhodamine (R6G), N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA), 6-carboxy-X-rhodamine (ROX), 4-(4'-dimethylaminophenylazo) benzoic acid (DABCYL), Cascade Blue, Oregon Green, Texas Red, Cyanine and 5-(2'-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS). Specific examples of fluorescently labeled nucleotides include [R6G]dUTP, [TAMRA]dUTP, [R110]dCTP, [R6G]dCTP, [TAMRA]dCTP, [JOE]ddATP, [R6G]ddATP, [FAM]ddCTP, [R110]ddCTP, [TAMRA]ddGTP, [ROX]ddTTP, [dR6G]ddATP, [dR110]ddCTP, [dTAMRA]ddGTP, and [dROX]ddTTP available from Perkin Elmer, Foster City, Calif.; FluoroLink DeoxyNucleotides, FluoroLink Cy3-dCTP, FluoroLink Cy5-dCTP, FluoroLink Fluor X-dCTP, FluoroLink Cy3-dUTP, and FluoroLink Cy5-dUTP available from Amersham, Arlington Heights, IL; Fluorescein-15-dATP, Fluorescein-12-dUTP, Tetramethyl-rhodamine-6-dUTP, IR770-9-dATP, Fluorescein-12-ddUTP, Fluorescein-12-UTP, and Fluorescein-15-2'-dATP available from Boehringer Mannheim, Indianapolis, Ind.; and Chromosome Labeled Nucleotides, BODIPY-FL-14-UTP, BODIPY-FL-4-UTP, BODIPY-TMR-14-UTP, BODIPY-TMR-14-dUTP, BODIPY-TR-14-UTP, BODIPY-TR-14-dUTP, Cascade Blue-7-UTP, Cascade Blue-7-dUTP, fluorescein-12-UTP, fluorescein-12-dUTP, Oregon Green 488-5-dUTP, Rhodamine Green-5-UTP, Rhodamine Green-5-dUTP, tetramethylrhodamine-6-UTP, tetramethylrhodamine-6-dUTP, Texas Red-5-UTP, Texas Red-5-dUTP, and Texas Red-12-dUTP available from Molecular Probes, Eugene, Oreg. The term nucleotide encompasses chemically modified nucleotides. An exemplary chemically-modified nucleotide is biotin-dNTP. Non-limiting examples of biotinylated dNTPs include biotin-dATP (e.g., bio-N6-ddATP, biotin-14-dATP), biotin-dCTP (e.g., biotin-11-dCTP, biotin-14-dCTP), and biotin-dUTP (e.g., biotin-11-dUTP, biotin-16-dUTP, biotin-20-dUTP).
[0065] The terms “polynucleotide,” “oligonucleotide,” and “nucleic acid” are used interchangeably to refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof, either in single-, double-, or multi-stranded form. Contemplated polynucleotides include a gene or fragment thereof. Exemplary polynucleotides include, but are not limited to, DNA, RNA, coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, cell-free polynucleotides including cell-free DNA (cfDNA) and cell-free RNA (cfRNA), nucleic acid probes, and primers. In a polynucleotide when referring to a T, a T means U (Uracil) in RNA and T (Thymine) in DNA. A polynucleotide can be exogenous or endogenous to a cell and/or exist in a cell-free environment. The term polynucleotide encompasses modified polynucleotides (e.g., altered backbone, sugar, or nucleobase). If present, modifications to the nucleotide structure are imparted before or after assembly of the polymer. Non-limiting examples of modifications include: 5-bromouracil, peptide nucleic acid, xeno nucleic acid, morpholinos, locked nucleic acids, glycol nucleic acids, threose nucleic acids, dideoxynucleotides, cordycepin, 7-deaza-GTP, fluorophores (e.g., rhodamine or fluorescein linked to the sugar), thiol-containing nucleotides, biotin-linked nucleotides, fluorescent base analogs, CpG islands, methyl-7-guanosine, methylated nucleotides, inosine, thiouridine, pseudouridine, dihydrouridine, queuosine, and wyosine. The sequence of nucleotides may be interrupted by non-nucleotide components.
[0066] The terms “transfection” or “transfected” refer to introduction of a nucleic acid into a cell by non-viral or viral-based methods. The nucleic acid molecules may be gene sequences encoding complete proteins or functional portions thereof.
[0067] The terms “peptide,” “polypeptide,” and “protein” are used interchangeably herein to refer to a polymer of at least two amino acid residues joined by peptide bond(s). This term does not connote a specific length of polymer, nor is it intended to imply or distinguish whether the peptide is produced using recombinant techniques, chemical or enzymatic synthesis, or is naturally occurring. The terms apply to naturally occurring amino acid polymers as well as amino acid polymers comprising at least one modified amino acid. In some cases, the polymer is interrupted by non-amino acids. The terms include amino acid chains of any length, including full length proteins, and proteins with or without secondary or tertiary structure (e.g., domains). The terms also encompass an amino acid polymer that has been modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, oxidation, and any other manipulation such as conjugation with a labeling component. The terms “amino acid” and “amino acids,” as used herein, refer to natural and non-natural amino acids, including, but not limited to, modified amino acids. Modified amino acids include amino acids that have been chemically modified to include a group or a chemical moiety not naturally present on the amino acid. The term “amino acid” includes both D-amino acids and L-amino acids.
[0068] As used herein, the term “non-native” refers to a nucleic acid or polypeptide sequence that is non-naturally occurring. Non-native refers to a non-naturally occurring nucleic acid or polypeptide sequence that comprises modifications such as mutations, insertions, or deletions. The term non-native encompasses fusion nucleic acids or polypeptides that encode or exhibit an activity (e.g., enzymatic activity, methyltransferase activity, acetyltransferase activity, kinase activity, ubiquitinating activity, etc.) of the nucleic acid or polypeptide sequence to which the non-native sequence is fused. A non-native nucleic acid or polypeptide sequence includes those linked to a naturally-occurring nucleic acid or polypeptide sequence (or a variant thereof) by genetic engineering to generate a chimeric nucleic acid or polypeptide sequence encoding a chimeric nucleic acid or polypeptide.
[0069] The term “promoter”, as used herein, refers to the regulatory DNA region which controls transcription or expression of a polynucleotide (e.g., a gene) and which may be located adjacent to or overlapping a nucleotide or region of nucleotides at which RNA transcription is initiated. A promoter may contain specific DNA sequences which bind protein factors, often referred to as transcription factors, which facilitate binding of RNA polymerase to the DNA leading to gene transcription. Eukaryotic basal promoters typically, though not necessarily, contain a TATA-box and/or a CAAT box.
[0070] The term “expression,” as used herein, refers to the process by which a nucleic acid sequence or a polynucleotide is transcribed from a DNA template (such as into mRNA or other RNA transcript) and/or the process by which a transcribed mRNA is subsequently translated into peptides, polypeptides, or proteins. Transcripts and encoded polypeptides may be collectively referred to as “gene product.” If the polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell.
[0071] As used herein, “operably linked”, “operable linkage”, “operatively linked”, or grammatical equivalents thereof refer to an arrangement of genetic elements, e.g., a promoter, an enhancer, a polyadenylation sequence, etc., wherein an operation (e.g., movement or activation) of a first genetic element has some effect on the second genetic element. The effect on the second genetic element can be, but need not be, of the same type as operation of the first genetic element. For example, two genetic elements are operably linked if movement of the first element causes an activation of the second element. For instance, a regulatory element, which may comprise promoter and/or enhancer sequences, is operatively linked to a coding region if the regulatory element helps initiate transcription of the coding sequence. There may be intervening residues between the regulatory element and coding region so long as this functional relationship is maintained.
[0072] A “vector” as used herein, refers to a macromolecule or association of macromolecules that comprises or associates with a polynucleotide and which mediates delivery of the polynucleotide to a cell. Examples of vectors include nucleic acid-based vectors (e.g., plasmids and viral vectors) and liposomes. An exemplary nucleic acid-based vector comprises genetic elements, e.g., regulatory elements, operatively linked to a gene to facilitate expression of the gene in a target.
[0073] As used herein, “expression cassette” and “nucleic acid cassette” are used interchangeably to refer to a component of a vector comprising a combination of nucleic acid sequences or elements (e.g., therapeutic gene, promoter, and a terminator) that are expressed together or are operably linked for expression. The terms encompass an expression cassette including a combination of regulatory elements and a gene or genes to which they are operably linked for expression.
[0074] A “functional fragment” of a DNA or protein sequence refers to a fragment that retains a biological activity (either functional or structural) that is substantially similar to a biological activity of the full-length DNA or protein sequence. A biological activity of a DNA sequence includes its ability to influence expression in a manner attributed to the full-length sequence.
[0075] The terms “engineered,” “synthetic,” and “artificial” are used interchangeably herein to refer to an object that has been modified by human intervention. For example, the terms refer to a polynucleotide or polypeptide that is non-naturally occurring. An engineered peptide has, but does not require, low sequence identity (e.g., less than 50% sequence identity, less than 25% sequence identity, less than 10% sequence identity, less than 5% sequence identity, less than 1% sequence identity) to a naturally occurring human protein. For example, VPR and VP64 domains are synthetic transactivation domains. Non-limiting examples include the following: a nucleic acid modified by changing its sequence to a sequence that does not occur in nature; a nucleic acid modified by ligating it to a nucleic acid that it does not associate with in nature such that the ligated product possesses a function not present in the original nucleic acid; an engineered nucleic acid synthesized in vitro with a sequence that does not exist in nature; a protein modified by changing its amino acid sequence to a sequence that does not exist in nature; an engineered protein acquiring a new function or property. An “engineered” system comprises at least one engineered component.
[0076] As used herein, the term “transposable element” refers to a DNA sequence that can move from one location in the genome to another (i.e., it can be “transposed”). Transposable elements can be generally divided into two classes. Class I transposable elements, or “retrotransposons”, are transposed via transcription and translation of an RNA intermediate which is subsequently reincorporated into its new location into the genome via reverse transcription (a process mediated by a reverse transcriptase). Class II transposable elements, or “DNA transposons”, are transposed via a complex of single- or double-stranded DNA flanked on either side by a transposase.
[0077] As used herein, the term “retrotransposons” refers to Class I transposable elements that function according to a two-part “copy and paste” mechanism involving an RNA intermediate. “Retrotransposase” refers to an enzyme responsible for transposition of a retrotransposon. The retrotransposase can comprise a reverse transcriptase domain, one or more zinc finger domains, an endonuclease domain, or combinations thereof.
[0078] As used herein, the terms “gene editing” and “genome editing” can be used interchangeably. Gene editing or genome editing means to change the nucleic acid sequence of a gene or a genome. Genome editing can include, for example, insertions, deletions, and mutations. Genome editing can be performed by a gene editing system, for example a retrotransposase.
[0079] As used herein, the term “complex” refers to a joining of at least two components. The two components may each retain the properties/activities they had prior to forming the complex or gain properties as a result of forming the complex. The joining includes, but is not limited to, covalent bonding, non-covalent bonding (e.g., hydrogen bonding, ionic interactions, Van der Waals interactions, and hydrophobic interactions), use of a linker, fusion, or any other suitable method. Contemplated components of the complex include polynucleotides, polypeptides, or combinations thereof. For example, a complex comprises an endonuclease and a guide polynucleotide.
[0080] The term “sequence identity” or “percent identity” in the context of two or more nucleic acids or polypeptide sequences, refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a local or global comparison window, as measured using a sequence comparison algorithm. Suitable sequence comparison algorithms for polypeptide sequences include, e.g., BLASTP using parameters of a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment for polypeptide sequences longer than 30 residues; BLASTP using parameters of a wordlength (W) of 2, an expectation (E) of 1000000, and the PAM30 scoring matrix setting gap costs at 9 to open gaps and 1 to extend gaps for sequences of less than 30 residues (these are the default parameters for BLASTP in the BLAST suite available at https://blast.ncbi.nlm.nih.gov); CLUSTALW with the Smith-Waterman homology search algorithm parameters with a match of 2, a mismatch of -1, and a gap of -1; MUSCLE with default parameters; MAFFT with parameters of a retree of 2 and max iterations of 1000; Novafold with default parameters; HMMER hmmalign with default parameters.
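As a non-limiting illustration of the percent identity calculation described above, the following Python sketch scores two sequences that have already been aligned (for example, by one of the algorithms listed above). The function name `percent_identity`, the gap character, and the choice to count only columns in which neither sequence has a gap are assumptions made for this example; other conventions (e.g., dividing by the full alignment length) exist and give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption for this sketch: only columns where neither sequence
    carries a gap are counted in the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where the two residues are identical
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment: five gap-free columns, four of them identical.
    print(percent_identity("MKV-LA", "MKVGLS"))
```

Under this convention, a candidate would satisfy an "at least 75% sequence identity" criterion when the returned value is 75.0 or greater.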
[0081] The term “optimally aligned” in the context of two or more nucleic acids or polypeptide sequences, refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that have been aligned to maximal correspondence of amino acid residues or nucleotides, for example, as determined by the alignment producing a highest or “optimized” percent identity score.
[0082] The term “open reading frame” or “ORF” refers to a nucleotide sequence that can encode a protein, or a portion of a protein. An open reading frame can begin with a start codon (represented as, e.g., AUG for an RNA molecule and ATG in a DNA molecule in the standard code) and can be read in codon-triplets until the frame ends with a STOP codon (represented as, e.g., UAA, UGA, or UAG for an RNA molecule and TAA, TGA, or TAG in a DNA molecule in the standard code).
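As a non-limiting illustration of the open reading frame definition above, the following Python sketch scans the three forward reading frames of a sequence for a start codon followed in-frame by a stop codon of the standard code. The function name `find_orfs`, the `min_codons` parameter, and the restriction to the forward strand are assumptions made for this example; it is not a description of any annotation pipeline used in the disclosure.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}  # standard-code stop codons (DNA form)


def find_orfs(seq: str, min_codons: int = 1):
    """Return (start, end, orf_sequence) tuples for forward-strand ORFs.

    start is the 0-based index of the ATG; end is exclusive and includes
    the stop codon. RNA input is accepted by mapping U to T.
    min_codons counts codons before the stop codon (start codon included).
    """
    dna = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        # Step through the frame one codon-triplet at a time.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))
```

A frame that opens with a start codon but never reaches a stop codon is not reported by this sketch, although the definition above also permits an ORF to encode only a portion of a protein.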
[0083] Included in the current disclosure are variants of any of the enzymes described herein with one or more conservative amino acid substitutions. Such conservative substitutions can be made in the amino acid sequence of a polypeptide without disrupting the three-dimensional structure or function of the polypeptide. Conservative substitutions can be accomplished by substituting amino acids with similar hydrophobicity, polarity, and R chain length for one another. Additionally, or alternatively, by comparing aligned sequences of homologous proteins from different species, conservative substitutions can be identified by locating amino acid residues that have been mutated between species (e.g., non-conserved residues) without altering the basic functions of the encoded proteins. Such conservatively substituted variants may include variants with at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identity to any one of the retrotransposase protein sequences described herein (e.g., MG140, MG148, MG153, or MG160 family retrotransposases described herein, or any other family retrotransposase described herein). In some embodiments, such conservatively substituted variants are functional variants. Such functional variants can encompass sequences with substitutions such that the activity of one or more critical active site residues of the retrotransposase are not disrupted. In some embodiments, a functional variant of any of the proteins described herein lacks substitution of at least one of the conserved or functional residues. In some embodiments, a functional variant of any of the proteins described herein lacks substitution of all of the conserved or functional residues.
[0084] Also included in the current disclosure are variants of any of the enzymes described herein with substitution of one or more catalytic residues to decrease or eliminate activity of the enzyme (e.g., decreased-activity variants). In some embodiments, a decreased-activity variant of a protein described herein comprises a disrupting substitution of at least one, at least two, or all three catalytic residues.
[0085] Conservative substitution tables providing functionally similar amino acids are available from a variety of references (see, e.g., Creighton, Proteins: Structures and Molecular Properties (W H Freeman & Co.; 2nd edition (December 1993))). The following eight groups each contain amino acids that are conservative substitutions for one another:
1) Alanine (A), Glycine (G);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W);
7) Serine (S), Threonine (T); and
8) Cysteine (C), Methionine (M).
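As an illustration only, the eight groups listed above can be encoded as a lookup so that a proposed substitution can be checked against them. The following Python sketch is not part of the disclosure: the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and the check reflects only the groups in this list (Methionine appears in both group 5 and group 8).

```python
# Hypothetical helper (not from the disclosure): encodes the eight
# conservative-substitution groups listed above, using one-letter codes.
CONSERVATIVE_GROUPS = [
    {"A", "G"},            # 1) Alanine, Glycine
    {"D", "E"},            # 2) Aspartic acid, Glutamic acid
    {"N", "Q"},            # 3) Asparagine, Glutamine
    {"R", "K"},            # 4) Arginine, Lysine
    {"I", "L", "M", "V"},  # 5) Isoleucine, Leucine, Methionine, Valine
    {"F", "Y", "W"},       # 6) Phenylalanine, Tyrosine, Tryptophan
    {"S", "T"},            # 7) Serine, Threonine
    {"C", "M"},            # 8) Cysteine, Methionine
]


def is_conservative(original: str, substitute: str) -> bool:
    """Return True if two distinct residues share at least one listed group."""
    a, b = original.upper(), substitute.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

Under these groups, for example, D to E and M to C count as conservative, whereas C to L does not, because Cysteine and Leucine never share a group.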
Overview
[0086] The discovery of new transposable elements with unique functionality and structure may offer the potential to further disrupt deoxyribonucleic acid (DNA) editing technologies, improving speed, specificity, functionality, and ease of use. Relative to the predicted prevalence of transposable elements in microbes and the sheer diversity of microbial species, relatively few functionally characterized transposable elements exist in the literature. This is partly because a huge number of microbial species may not be readily cultivated in laboratory conditions. Metagenomic sequencing from natural environmental niches containing large numbers of microbial species may offer the potential to drastically increase the number of new transposable elements known and speed the discovery of new oligonucleotide editing functionalities.
[0087] Transposable elements are deoxyribonucleic acid sequences that can change position within a genome, often resulting in the generation or amelioration of mutations. In eukaryotes, a great proportion of the genome, and a large share of the mass of cellular DNA, is attributable to transposable elements. Although transposable elements are “selfish genes” which propagate themselves at the expense of other genes, they have been found to serve various important functions and to be crucial to genome evolution. Based on their mechanism, transposable elements are classified as either Class I “retrotransposons” or Class II “DNA transposons”.
[0088] Class I transposable elements, also referred to as retrotransposons, function according to a two-part “copy and paste” mechanism involving an RNA intermediate. First, the retrotransposon is transcribed. The resulting RNA is subsequently converted back to DNA by reverse transcriptase (generally encoded by the retrotransposon itself), and the reverse transcribed retrotransposon is integrated into its new position in the genome by integrase. Retrotransposons are further classified into three orders. Retrotransposons with long terminal repeats (“LTRs”) encode reverse transcriptase and are flanked by long strands of repeating DNA. Retrotransposons with long interspersed nuclear elements (“LINEs”) encode reverse transcriptase, lack LTRs, and are transcribed by RNA polymerase II. Retrotransposons with short interspersed nuclear elements (“SINEs”) are transcribed by RNA polymerase III but lack reverse transcriptase, instead relying on the reverse transcription machinery of other transposable elements (e.g., LINEs).
[0089] Class II transposable elements, also referred to as DNA transposons, function according to mechanisms that do not involve an RNA intermediate. Many DNA transposons display a “cut and paste” mechanism in which transposase binds terminal inverted repeats (“TIRs”) flanking the transposon, cleaves the transposon from the donor region, and inserts it into the target region of the genome. Others, referred to as “helitrons,” display a “rolling circle” mechanism involving a single-stranded DNA intermediate and mediated by an undocumented protein believed to possess HUH endonuclease function and 5’ to 3’ helicase activity. First, a circular strand of DNA is nicked to create two single DNA strands. The protein remains attached to the 5’ phosphate of the nicked strand, leaving the 3’ hydroxyl end of the complementary strand exposed and thus allowing a polymerase to replicate the non-nicked strand. Once replication is complete, the new strand disassociates and is itself replicated along with the original template strand. Still other DNA transposons, “Polintons,” are theorized to undergo a “self-synthesis” mechanism. The transposition is initiated by an integrase’s excision of a single-stranded extra-chromosomal Polinton element, which forms a racket-like structure. The Polinton undergoes replication with DNA polymerase B, and the double stranded Polinton is inserted into the genome by the integrase. Additionally, some DNA transposons, such as those in the IS200/IS605 family, proceed via a “peel and paste” mechanism in which TnpA excises a piece of single-stranded DNA (as a circular “transposon joint”) from the lagging strand template of the donor gene and reinserts it into the replication fork of the target gene.
[0090] While transposable elements have found some use as biological tools, documented transposable elements do not encompass the full range of possible biodiversity and targetability, and may not represent all possible activities. Here, thousands of genomic fragments were mined from numerous metagenomes for transposable elements. The documented diversity of transposable elements may have been expanded and novel systems may have been developed into highly targetable, compact, and precise gene editing agents.
MG Enzymes
[0091] Described herein, in certain embodiments, are retrotransposases. In some embodiments, the retrotransposase is a MG140, MG148, MG153, or MG160 retrotransposase (see FIG. 1). In some embodiments, the retrotransposases are less than about 1,400 amino acids in length. In some embodiments, the retrotransposases simplify delivery and extend therapeutic applications.
[0092] In some embodiments, the present disclosure provides for an engineered retrotransposase system discovered through metagenomic sequencing. In some embodiments, the metagenomic sequencing is conducted on samples. In some embodiments, the samples are collected from a variety of environments. In some embodiments, the environment is a human microbiome, an animal microbiome, an environment with high temperatures, or an environment with low temperatures. In some embodiments, the environment includes sediment.
[0093] In some embodiments, the present disclosure provides for an engineered retrotransposase system comprising a retrotransposase derived from an uncultivated microorganism. In some embodiments, the retrotransposase is configured to bind a 3’ untranslated region (UTR). In some embodiments, the retrotransposase binds a 5’ untranslated region (UTR).
[0094] In some embodiments, the retrotransposase comprises a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 1-16 and 32-47. 
In some embodiments, the retrotransposase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the retrotransposase comprises a sequence having 100% identity to any one of SEQ ID NOs: 1-16 and 32-47.
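To make the identity thresholds above concrete, percent identity between two already-aligned sequences can be computed along the following lines. This is a minimal sketch, not the method of the disclosure: it assumes the sequences were aligned beforehand (gaps written as `-`) and that identity is the number of identical columns divided by the number of columns in which neither sequence has a gap. Other denominators are in common use, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment.

    Assumption: both inputs are rows of the same alignment, so they
    have equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair has at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

With this definition, a candidate would be tested against each reference sequence in turn (e.g., each of SEQ ID NOs: 1-16 and 32-47) and accepted if any comparison meets the chosen threshold.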
[0095] In some embodiments, the retrotransposase is a MG140 retrotransposase (i.e., SEQ ID NOs: 1-16). In some embodiments, the retrotransposase comprises a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 1-16. 
In some embodiments, the retrotransposase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 1-16. In some embodiments, the retrotransposase comprises a sequence having 100% identity to any one of SEQ ID NOs: 1-16.
[0096] In some embodiments, the retrotransposase is a MG148 retrotransposase (i.e., SEQ ID NOs: 32-41). In some embodiments, the retrotransposase comprises a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 32-41. 
In some embodiments, the retrotransposase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 32-41. In some embodiments, the retrotransposase comprises a sequence having 100% identity to any one of SEQ ID NOs: 32-41.
[0097] In some embodiments, the retrotransposase is a MG153 retrotransposase (i.e., SEQ ID NOs: 42-44). In some embodiments, the retrotransposase comprises a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 42-44. 
In some embodiments, the retrotransposase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 42-44. In some embodiments, the retrotransposase comprises a sequence having 100% identity to any one of SEQ ID NOs: 42-44.
[0098] In some embodiments, the retrotransposase is a MG160 retrotransposase (i.e., SEQ ID NOs: 45-47). In some embodiments, the retrotransposase comprises a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 45-47. 
In some embodiments, the retrotransposase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 45-47. In some embodiments, the retrotransposase comprises a sequence having 100% identity to any one of SEQ ID NOs: 45-47.
[0099] In some embodiments, the retrotransposase is encoded by a nucleic acid sequence that is codon optimized. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence that is codon optimized for expression in a mammalian cell. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 70% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 75% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 80% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 85% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 90% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 95% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 96% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 97% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 98% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 99% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence of any one of SEQ ID NOs: 17-19, 24, and 76-81.
[0100] In some embodiments, the retrotransposase is tagged with a tag such as a His-tag or strep-tag or tethered to an enzyme (e.g., spCas9). In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 70% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 75% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 80% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 85% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 90% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 95% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 96% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 97% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 98% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence having at least 99% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48. In some embodiments, the retrotransposase is encoded by a nucleic acid sequence of any one of SEQ ID NOs: 20-23, 25-31, and 48.
[0101] In some embodiments, the retrotransposase comprises a reverse transcriptase domain. In some embodiments, the retrotransposase further comprises one or more zinc finger domains. In some embodiments, the retrotransposase further comprises an endonuclease finger domain.
[0102] In some embodiments, the retrotransposase has less than about 90%, less than about 85%, less than about 80%, less than about 75%, less than about 70%, less than about 65%, less than about 60%, less than about 55%, less than about 50%, less than about 45%, less than about 40%, less than about 35%, less than about 30%, less than about 25%, less than about 20%, less than about 15%, less than about 10%, or less than about 5% sequence identity to a known or documented retrotransposase.
[0103] In some embodiments, the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR).
[0104] In some embodiments, the retrotransposase comprises one or more nuclear localization sequences (NLSs). In some embodiments, the NLS is proximal to the N- or C-terminus of the retrotransposase. In some embodiments, the NLS is appended N-terminal or C-terminal of the retrotransposase and comprises any one of SEQ ID NOs: 49-64, or a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 91% identity to any one of SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 92% identity to any one of SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 93% identity to any one of SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 94% identity to any one of SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having 100% identity to any one of SEQ ID NOs: 49-64. In some cases, the NLS comprises a sequence having 100% identity to SEQ ID NO: 49. In some cases, the NLS comprises a sequence having 100% identity to SEQ ID NO: 50.
Table 1: Example NLS Sequences that may be used with retrotransposases according to the disclosure
[Table 1 is provided as images (imgf000029_0001, imgf000030_0001) in the original publication.]
[0105] In some embodiments, the retrotransposase comprises a tag. In some embodiments, the tag is an affinity tag. Exemplary affinity tags include, but are not limited to, a His-tag, a Flag tag, a Myc-tag, an MBP-tag, and a GST-tag.
[0106] In some embodiments, the retrotransposase comprises a protease cleavage site. Exemplary protease cleavage sites include, but are not limited to, a TEV site, a C3 site, a Factor Xa site, and an Enterokinase site.
[0107] In some embodiments, the retrotransposase is tethered to a site directed nuclease. In some embodiments, the retrotransposase is fused to a site directed nuclease. In some embodiments, the retrotransposase is recruited to a site directed nuclease. In some embodiments, the site directed nuclease is an endonuclease. In some embodiments, the site directed nuclease is a Cas nuclease. In some embodiments, the Cas nuclease is an RNA guided CRISPR Cas9 nuclease. In some embodiments, the site directed nuclease is a dead nuclease or a nickase. In some embodiments, the site directed nuclease brings the retrotransposase into close proximity of a target site that is to be modified.
Guide Nucleic Acids
[0108] In some embodiments, the retrotransposase system further comprises a site directed nuclease and a guide RNA (e.g., gRNA). When a polynucleotide sequence is written with a T, the T denotes U (Uracil) in RNA and T (Thymine) in DNA. In some embodiments, the retrotransposase systems described herein comprise a means for directing the site directed nuclease to a particular location in the target nucleic acid.
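The T/U convention stated above can be applied mechanically when a guide sequence written in the DNA alphabet needs to be displayed as RNA. The short Python sketch below is illustrative only; the function name `as_rna` is hypothetical and does not come from the disclosure.

```python
def as_rna(sequence: str) -> str:
    """Render a sequence written with T as RNA by replacing T with U.

    Only T/t is changed; case and all other characters are preserved.
    """
    return sequence.replace("T", "U").replace("t", "u")
```

For instance, a spacer written as `GATTACA` would be displayed as `GAUUACA` when it denotes RNA.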
[0109] In some embodiments, the guide RNA comprises synthetic nucleotides or modified nucleotides. In some embodiments, the guide RNA comprises one or more inter-nucleoside linkers modified from the natural phosphodiester. In some embodiments, all of the inter-nucleoside linkers of the guide RNA, or contiguous nucleotide sequence thereof, are modified. For example, in some embodiments, the inter-nucleoside linkage comprises Sulphur (S), such as a phosphorothioate inter-nucleoside linkage.
[0110] In some embodiments, the guide RNA comprises modifications to a ribose sugar or nucleobase. In some embodiments, the guide RNA comprises one or more nucleosides comprising a modified sugar moiety, wherein the modified sugar moiety is a modification of the sugar moiety when compared to the ribose sugar moiety found in deoxyribose nucleic acid (DNA) and RNA. In some embodiments, the modification is within the ribose ring structure. Exemplary modifications include, but are not limited to, replacement with a hexose ring (HNA), a bicyclic ring having a biradical bridge between the C2 and C4 carbons on the ribose ring (e.g., locked nucleic acids (LNA)), or an unlinked ribose ring which typically lacks a bond between the C2 and C3 carbons (e.g., UNA). In some embodiments, the sugar-modified nucleosides comprise bicyclohexose nucleic acids or tricyclic nucleic acids. In some embodiments, the modified nucleosides comprise nucleosides where the sugar moiety is replaced with a non-sugar moiety, for example peptide nucleic acids (PNA) or morpholino nucleic acids.
[0111] In some embodiments, the guide RNA comprises one or more modified sugars. In some embodiments, the sugar modifications comprise modifications made by altering the substituent groups on the ribose ring to groups other than hydrogen, or the 2’-OH group naturally found in DNA and RNA nucleosides. In some embodiments, substituents are introduced at the 2’, 3’, 4’, or 5’ positions, or combinations thereof. In some embodiments, nucleosides with modified sugar moieties comprise 2’ modified nucleosides, e.g., 2’ substituted nucleosides. A 2’ sugar modified nucleoside, in some embodiments, is a nucleoside that has a substituent other than -H or -OH at the 2’ position (2’ substituted nucleoside) or comprises a 2’ linked biradical, and comprises 2’ substituted nucleosides and LNA (2’-4’ biradical bridged) nucleosides. Examples of 2’-substituted modified nucleosides comprise, but are not limited to, 2’-O-alkyl-RNA, 2’-O-methyl-RNA, 2’-alkoxy-RNA, 2’-O-methoxyethyl-RNA (MOE), 2’-amino-DNA, 2’-Fluoro-RNA, and 2’-F-ANA nucleosides. In some embodiments, the modification in the ribose group comprises a modification at the 2’ position of the ribose group. In some embodiments, the modification at the 2’ position of the ribose group is selected from the group consisting of 2’-O-methyl, 2’-fluoro, 2’-deoxy, and 2’-O-(2-methoxyethyl).
[0112] In some embodiments, the guide RNA comprises one or more modified sugars. In some embodiments, the guide RNA comprises only modified sugars. In certain embodiments, the guide RNA comprises greater than about 10%, 25%, 50%, 75%, or 90% modified sugars. In some embodiments, the modified sugar is a bicyclic sugar. In some embodiments, the modified sugar comprises a 2’-O-methoxyethyl group. In some embodiments, the guide RNA comprises both inter-nucleoside linker modifications and nucleoside modifications.
[0113] In some cases, the guide RNA comprises a sequence complementary to a eukaryotic, fungal, plant, mammalian, or human genomic polynucleotide sequence. In some cases, the guide RNA comprises a sequence complementary to a eukaryotic genomic polynucleotide sequence. In some cases, the guide RNA comprises a sequence complementary to a fungal genomic polynucleotide sequence. In some cases, the guide RNA comprises a sequence complementary to a plant genomic polynucleotide sequence. In some cases, the guide RNA comprises a sequence complementary to a mammalian genomic polynucleotide sequence. In some cases, the guide RNA comprises a sequence complementary to a human genomic polynucleotide sequence.
[0114] In some cases, the guide RNA is 30-400 nucleotides in length. In some cases, the guide RNA is 85-245 nucleotides in length. In some cases, the guide RNA is more than 90 nucleotides in length. In some cases, the guide RNA is less than 245 nucleotides in length. In some embodiments, the guide RNA is 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 220, 240, or more than 240 nucleotides in length. In some embodiments, the guide RNA is about 30 to about 40, about 30 to about 50, about 30 to about 60, about 30 to about 70, about 30 to about 80, about 30 to about 90, about 30 to about 100, about 30 to about 120, about 30 to about 140, about 30 to about 160, about 30 to about 180, about 30 to about 200, about 30 to about 220, about 30 to about 240, about 50 to about 60, about 50 to about 70, about 50 to about 80, about 50 to about 90, about 50 to about 100, about 50 to about 120, about 50 to about 140, about 50 to about 160, about 50 to about 180, about 50 to about 200, about 50 to about 220, about 50 to about 240, about 100 to about 120, about 100 to about 140, about 100 to about 160, about 100 to about 180, about 100 to about 200, about 100 to about 220, about 100 to about 240, about 160 to about 180, about 160 to about 200, about 160 to about 220, or about 160 to about 240 nucleotides in length.
[0115] In some embodiments, the sequence identity is determined by a BLASTP, CLUSTALW, MUSCLE, or MAFFT algorithm, or a CLUSTALW algorithm with the Smith-Waterman homology search algorithm parameters. In some embodiments, the sequence identity is determined by the BLASTP homology search algorithm using parameters of a wordlength (W) of 3, an expectation (E) of 10, and a BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment.
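By way of non-limiting illustration, the BLASTP parameters recited above may be expressed as command-line arguments for the NCBI BLAST+ `blastp` program, and a percent identity may be computed from a pairwise alignment. The following is a minimal sketch only. The file and database names are hypothetical placeholders, and the identity definition used here (identical residues divided by alignment length) is an assumption, since other denominators are also in common use.

```python
"""Sketch: BLASTP settings from the paragraph above plus a simple identity check."""
from typing import List


def blastp_command(query_fasta: str, subject_db: str) -> List[str]:
    """Build a BLAST+ blastp argument list (file/database names are hypothetical)."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", subject_db,
        "-word_size", "3",         # wordlength (W) of 3
        "-evalue", "10",           # expectation (E) of 10
        "-matrix", "BLOSUM62",     # scoring matrix
        "-gapopen", "11",          # gap existence cost
        "-gapextend", "1",         # gap extension cost
        "-comp_based_stats", "2",  # conditional compositional score matrix adjustment
    ]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns.

    Assumed definition: identical, non-gap columns / total alignment length.
    Both inputs must already be aligned (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 70.0) -> bool:
    """True if the aligned pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

The argument list can be passed to a process launcher such as `subprocess.run` where BLAST+ is installed. The threshold default of 70 is only an example value and may be replaced by any other identity level of interest.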
Cargo Nucleic Acids
[0116] In some embodiments, the retrotransposase system comprises a cargo nucleic acid or polynucleotide. In some embodiments, the cargo nucleic acid is comprised in a double-stranded deoxyribonucleic acid. In some embodiments, the cargo nucleic acid is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide. In some embodiments, the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR).
[0117] In some embodiments, the cargo nucleic acid comprises synthetic nucleotides or modified nucleotides. In some embodiments, the cargo nucleic acid comprises one or more inter-nucleoside linkers modified from the natural phosphodiester. In some embodiments, all of the inter-nucleoside linkers of the cargo nucleic acid, or contiguous nucleotide sequence thereof, are modified. For example, in some embodiments, the inter-nucleoside linkage comprises sulfur (S), such as a phosphorothioate inter-nucleoside linkage.
[0118] In some embodiments, the cargo nucleic acid comprises modifications to a ribose sugar or nucleobase. In some embodiments, the cargo nucleic acid comprises one or more nucleosides comprising a modified sugar moiety, wherein the modified sugar moiety is a modification of the sugar moiety when compared to the ribose sugar moiety found in deoxyribonucleic acid (DNA) and RNA. In some embodiments, the modification is within the ribose ring structure. Exemplary modifications include, but are not limited to, replacement with a hexose ring (HNA), a bicyclic ring having a biradical bridge between the C2 and C4 carbons on the ribose ring (e.g., locked nucleic acids (LNA)), or an unlinked ribose ring which typically lacks a bond between the C2 and C3 carbons (e.g., UNA). In some embodiments, the sugar-modified nucleosides comprise bicyclohexose nucleic acids or tricyclic nucleic acids. In some embodiments, the modified nucleosides comprise nucleosides where the sugar moiety is replaced with a non-sugar moiety, for example peptide nucleic acids (PNA) or morpholino nucleic acids.
[0119] In some embodiments, the cargo nucleic acid comprises one or more modified sugars. In some embodiments, the sugar modifications comprise modifications made by altering the substituent groups on the ribose ring to groups other than hydrogen, or the 2’-OH group naturally found in DNA and RNA nucleosides. In some embodiments, substituents are introduced at the 2’, 3’, 4’, 5’ positions, or combinations thereof. In some embodiments, nucleosides with modified sugar moieties comprise 2’ modified nucleosides, e.g., 2’ substituted nucleosides. A 2’ sugar modified nucleoside, in some embodiments, is a nucleoside that has a substituent other than -H or -OH at the 2’ position (2’ substituted nucleoside) or comprises a 2’ linked biradical, and comprises 2’ substituted nucleosides and LNA (2’-4’ biradical bridged) nucleosides. Examples of 2’-substituted modified nucleosides comprise, but are not limited to, 2’-O-alkyl-RNA, 2’-O-methyl-RNA, 2’-alkoxy-RNA, 2’-O-methoxyethyl-RNA (MOE), 2’-amino-DNA, 2’-Fluoro-RNA, and 2’-F-ANA nucleosides. In some embodiments, the modification in the ribose group comprises a modification at the 2’ position of the ribose group. In some embodiments, the modification at the 2’ position of the ribose group is selected from the group consisting of 2’-O-methyl, 2’-fluoro, 2’-deoxy, and 2’-O-(2-methoxyethyl).
[0120] In some embodiments, the cargo nucleic acid comprises one or more modified sugars. In some embodiments, the cargo nucleic acid comprises only modified sugars. In certain embodiments, the cargo nucleic acid comprises greater than about 10%, 25%, 50%, 75%, or 90% modified sugars. In some embodiments, the modified sugar is a bicyclic sugar. In some embodiments, the modified sugar comprises a 2’-O-methoxyethyl group. In some embodiments, the cargo nucleic acid comprises both inter-nucleoside linker modifications and nucleoside modifications.
MG Systems
[0121] Described herein, in certain embodiments, are engineered retrotransposase systems comprising: (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence. In some embodiments, engineered retrotransposase systems described herein comprise a means for cutting a target nucleic acid sequence.
[0122] In some embodiments, the engineered retrotransposase system comprises (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and (b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence and comprising an amino acid sequence having at least 70% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47. In some embodiments, the amino acid sequence of the retrotransposase of (b) has at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
Cells
[0123] Described herein, in certain embodiments, is a cell comprising the systems described herein.
[0124] In some embodiments, the cell is a eukaryotic cell (e.g., a plant cell, an animal cell, a protist cell, or a fungal cell), a mammalian cell (e.g., a Chinese hamster ovary (CHO) cell, a baby hamster kidney (BHK) cell, a human embryo kidney (HEK) cell, a mouse myeloma (NS0) cell, or a human retinal cell), an immortalized cell (e.g., a HeLa cell, a COS cell, a HEK-293T cell, a MDCK cell, a 3T3 cell, a PC12 cell, a Huh7 cell, a HepG2 cell, a K562 cell, a N2a cell, or a SY5Y cell), an insect cell (e.g., a Spodoptera frugiperda cell, a Trichoplusia ni cell, a Drosophila melanogaster cell, an S2 cell, or a Heliothis virescens cell), a yeast cell (e.g., a Saccharomyces cerevisiae cell, a Cryptococcus cell, or a Candida cell), a plant cell (e.g., a parenchyma cell, a collenchyma cell, or a sclerenchyma cell), a fungal cell (e.g., a Saccharomyces cerevisiae cell, a Cryptococcus cell, or a Candida cell), or a prokaryotic cell (e.g., an E. coli cell, a Streptococcus bacterium cell, a Streptomyces soil bacterium cell, or an archaeal cell). In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is an immortalized cell. In some embodiments, the cell is an insect cell. In some embodiments, the cell is a yeast cell. In some embodiments, the cell is a plant cell. In some embodiments, the cell is a fungal cell. In some embodiments, the cell is a prokaryotic cell.
[0125] In some embodiments, the cell is an A549, HEK-293, HEK-293T, BHK, CHO, HeLa, MRC5, Sf9, Cos-1, Cos-7, Vero, BSC 1, BSC 40, BMT 10, WI38, Saos, C2C12, L cell, HT1080, HepG2, Huh7, K562, a primary cell, or derivative thereof. In some embodiments, the cell is an engineered cell. In some embodiments, the cell is a stable cell (i.e., a cell that has constant expression of a specific gene or protein).
Delivery and Vectors
[0126] Disclosed herein, in some embodiments, are nucleic acid sequences encoding the engineered retrotransposase systems described herein.
[0127] In some embodiments, the present disclosure provides a nucleic acid comprising an engineered nucleic acid sequence encoding a retrotransposase described herein. In some embodiments, the engineered nucleic acid sequence encoding a retrotransposase is optimized for expression in an organism. In some embodiments, the retrotransposase is derived from an uncultivated microorganism. In some embodiments, the organism is not the uncultivated organism.
[0128] In some embodiments, the organism is prokaryotic. In some embodiments, the organism is bacterial. In some embodiments, the organism is eukaryotic. In some embodiments, the organism is fungal. In some embodiments, the organism is a plant. In some embodiments, the organism is mammalian. In some embodiments, the organism is a rodent. In some embodiments, the organism is human.
[0129] In some embodiments, the nucleic acid encoding the engineered retrotransposase system is a DNA, for example a linear DNA, a plasmid DNA, or a minicircle DNA. In some embodiments, the nucleic acid encoding the engineered retrotransposase system is an RNA, for example an mRNA.
[0130] In some embodiments, the nucleic acid encoding the engineered retrotransposase systems is delivered by a nucleic acid-based vector. In some embodiments, the nucleic acid-based vector is a plasmid (e.g., circular DNA molecules that can autonomously replicate inside a cell), cosmid (e.g., pWE or sCos vectors), artificial chromosome, human artificial chromosome (HAC), yeast artificial chromosome (YAC), bacterial artificial chromosome (BAC), P1-derived artificial chromosome (PAC), phagemid, phage derivative, bacmid, or virus. In some embodiments, the vector is selected from the group consisting of: pSF-CMV-NEO-NH2-PPT-3XFLAG, pSF-CMV-NEO-COOH-3XFLAG, pSF-CMV-PURO-NH2-GST-TEV, pSF-OXB20-COOH-TEV-FLAG(R)-6His, pCEP4, pDEST27, pSF-CMV-Ub-KrYFP, pSF-CMV-FMDV-daGFP, pEF1a-mCherry-N1 vector, pEF1a-tdTomato vector, pSF-CMV-FMDV-Hygro, pSF-CMV-PGK-Puro, pMCP-tag(m), pSF-CMV-PURO-NH2-CMYC, pSF-OXB20-BetaGal, pSF-OXB20-Fluc, pSF-OXB20, pSF-Tac, pRI 101-AN DNA, pCambia2301, pTYB21, pKLAC2, pAc5.1/V5-His A, and pDEST8.
[0131] In some embodiments, the virus is an alphavirus, a parvovirus, an adenovirus, an AAV, a baculovirus, a Dengue virus, a lentivirus, a herpesvirus, a poxvirus, an anellovirus, a bocavirus, a vaccinia virus, or a retrovirus. In some embodiments, the virus is an alphavirus. In some embodiments, the virus is a parvovirus. In some embodiments, the virus is an adenovirus. In some embodiments, the virus is an AAV. In some embodiments, the virus is a baculovirus. In some embodiments, the virus is a Dengue virus. In some embodiments, the virus is a lentivirus. In some embodiments, the virus is a herpesvirus. In some embodiments, the virus is a poxvirus. In some embodiments, the virus is an anellovirus. In some embodiments, the virus is a bocavirus. In some embodiments, the virus is a vaccinia virus. In some embodiments, the virus is a retrovirus.
[0132] In some embodiments, the AAV is AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV-rh8, AAV-rh10, AAV-rh20, AAV-rh39, AAV-rh74, AAV-rhM4-1, AAV-hu37, AAV-Anc80, AAV-Anc80L65, AAV-7m8, AAV-PHP-B, AAV-PHP-EB, AAV-2.5, AAV-2tYF, AAV-3B, AAV-LK03, AAV-HSC1, AAV-HSC2, AAV-HSC3, AAV-HSC4, AAV-HSC5, AAV-HSC6, AAV-HSC7, AAV-HSC8, AAV-HSC9, AAV-HSC10, AAV-HSC11, AAV-HSC12, AAV-HSC13, AAV-HSC14, AAV-HSC15, AAV-TT, AAV-DJ/8, AAV-Myo, AAV-NP40, AAV-NP59, AAV-NP22, AAV-NP66, AAV-HSC16, or a derivative thereof. In some embodiments, the herpesvirus is HSV type 1, HSV-2, VZV, EBV, CMV, HHV-6, HHV-7, or HHV-8.
[0133] In some embodiments, the nucleic acid encoding the engineered retrotransposase system is delivered by a non-nucleic acid-based delivery system (e.g., a non-viral delivery system). In some embodiments, the non-viral delivery system is a liposome. In some embodiments, the nucleic acid is associated with a lipid. The nucleic acid associated with a lipid, in some embodiments, is encapsulated in the aqueous interior of a liposome, interspersed within the lipid bilayer of a liposome, attached to a liposome via a linking molecule that is associated with both the liposome and the nucleic acid, entrapped in a liposome, complexed with a liposome, dispersed in a solution containing a lipid, mixed with a lipid, combined with a lipid, contained as a suspension in a lipid, contained or complexed with a micelle, or otherwise associated with a lipid. In some embodiments, the nucleic acid is comprised in a lipid nanoparticle (LNP).
[0134] In some embodiments, the endonuclease or gene editing system (e.g., retrotransposase) is introduced into a cell (e.g., host cell) in any suitable way, either stably or transiently. In some embodiments, the endonuclease or gene editing system is transfected into the cell. In some embodiments, the cell is transduced or transfected with a nucleic acid construct that encodes the endonuclease or gene editing system. For example, a cell is transduced (e.g., with a virus encoding the endonuclease or gene editing system), or transfected (e.g., with a plasmid encoding the endonuclease or gene editing system) with a nucleic acid that encodes the endonuclease or gene editing system. In some embodiments, the transduction is a stable or transient transduction. In some embodiments, cells expressing the endonuclease or gene editing system or containing the endonuclease or gene editing system are transduced or transfected with one or more gRNA molecules, for example when the endonuclease or gene editing system comprises the retrotransposase. In some embodiments, a plasmid expressing the endonuclease or gene editing system is introduced into cells through electroporation, transient (e.g., lipofection) or stable genome integration (e.g., piggybac), or viral transduction (for example lentivirus or AAV), or other methods known to those of skill in the art. In some embodiments, the gene editing system is introduced into the cell as one or more polypeptides. In some embodiments, delivery is achieved through the use of RNP complexes. Delivery methods to cells for polypeptides and/or RNPs are known in the art, for example by electroporation or by cell squeezing.
[0135] Exemplary methods of delivery of nucleic acids include lipofection, nucleofection, electroporation, stable genome integration (e.g., piggybac), microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355; and lipofection reagents are sold commercially (e.g., Transfectam™, Lipofectin™, and SF Cell Line 4D-Nucleofector X Kit™ (Lonza)). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of WO 91/17424 and WO 91/16024. In some embodiments, the delivery is to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration). In some embodiments, the nucleic acid is comprised in a liposome or a nanoparticle that specifically targets a host cell.
[0136] Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, for example, US 2003/0087817.
Methods of Use
[0137] Systems of the present disclosure may be used for various applications, such as, for example, nucleic acid editing (e.g., gene editing) or binding to a nucleic acid molecule (e.g., sequence-specific binding). Such systems may be used, for example, for addressing (e.g., removing or replacing) a genetically inherited mutation that may cause a disease in a subject, inactivating a gene in order to ascertain its function in a cell, as a diagnostic tool to detect disease-causing genetic elements (e.g., via cleavage of reverse-transcribed viral RNA or an amplified DNA sequence encoding a disease-causing mutation), as deactivated enzymes in combination with a probe to target and detect a specific nucleotide sequence (e.g., a sequence encoding antibiotic resistance in bacteria), to render viruses inactive or incapable of infecting host cells by targeting viral genomes, to add genes or amend metabolic pathways to engineer organisms to produce valuable small molecules, macromolecules, or secondary metabolites, to establish a gene drive element for evolutionary selection, or to detect cell perturbations by foreign small molecules and nucleotides as a biosensor.
[0138] Described herein, in certain embodiments, are methods for modifying a target nucleic acid comprising providing an engineered retrotransposase system. In some embodiments, the present disclosure provides a method for binding, nicking, cleaving, marking, modifying, or transposing a double-stranded deoxyribonucleic acid polynucleotide. In some embodiments, the method comprises contacting the double-stranded deoxyribonucleic acid polynucleotide with a retrotransposase.
[0139] In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.
[0140] In some embodiments, the retrotransposase is configured to transpose the cargo nucleotide sequence as a single-stranded deoxyribonucleic acid polynucleotide. In some embodiments, the retrotransposase is configured to transpose the cargo nucleotide sequence as a double-stranded deoxyribonucleic acid polynucleotide. In some embodiments, the retrotransposase is configured to transpose the cargo nucleotide sequence via a ribonucleic acid polynucleotide intermediate. In some embodiments, the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR).
[0141] In some embodiments, the present disclosure provides a method of modifying a target nucleic acid sequence (e.g., locus). In some embodiments, the method comprises delivering to the target nucleic acid sequence the engineered retrotransposase system described herein. In some embodiments, the complex is configured such that upon binding of the complex to the target nucleic acid sequence, the complex modifies the target nucleic acid sequence.
[0142] In some embodiments, modifying the target nucleic acid sequence comprises binding, nicking, cleaving, marking, modifying, or transposing the target nucleic acid sequence. In some embodiments, the target nucleic acid sequence comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). In some embodiments, the target nucleic acid comprises genomic DNA, viral DNA, viral RNA, or bacterial DNA. In some embodiments, the target nucleic acid sequence is in vitro. In some embodiments, the target nucleic acid sequence is within a cell. In some embodiments, the cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, or a human cell. In some embodiments, the cell is a primary cell. In some embodiments, the primary cell is a T cell. In some embodiments, the primary cell is a hematopoietic stem cell (HSC). In some embodiments, the cell is a human cell. In some embodiments, the cell is genome edited ex vivo. In some embodiments, the cell is genome edited in vivo.
[0143] In some embodiments, delivery of the engineered retrotransposase system to the target nucleic acid sequence comprises delivering the nucleic acid described herein or the vector described herein. In some embodiments, delivery of engineered retrotransposase system to the target nucleic acid sequence comprises delivering a nucleic acid comprising an open reading frame encoding the retrotransposase. In some embodiments, the nucleic acid comprises a promoter. In some embodiments, the open reading frame encoding the retrotransposase is operably linked to the promoter.
[0144] In some embodiments, delivery of the engineered retrotransposase system to the target nucleic acid sequence comprises delivering a capped mRNA containing the open reading frame encoding the retrotransposase. In some embodiments, delivery of the engineered retrotransposase system to the target nucleic acid sequence comprises delivering a translated polypeptide. In some embodiments, delivery of the engineered retrotransposase system to the target nucleic acid sequence comprises delivering a deoxyribonucleic acid (DNA) encoding the engineered retrotransposase operably linked to a ribonucleic acid (RNA) pol III promoter.
[0145] In some embodiments, the retrotransposase does not induce a break at or proximal to the target nucleic acid sequence.
[0146] In some embodiments, the transposition activity is measured in vitro by introducing the retrotransposase to cells comprising the target nucleic acid sequence and detecting transposition of the target nucleic acid sequence in the cells. In some embodiments, the composition comprises 20 pmoles or less of the retrotransposase. In some embodiments, the composition comprises 1 pmol or less of the retrotransposase.
[0147] Further described herein, in certain embodiments, are methods of manufacturing a retrotransposase. In some embodiments, the method comprises cultivating a host cell with the engineered retrotransposase system described herein.
[0148] In some embodiments, the host cell is a bacterial cell. In some embodiments, the bacterial cell is Bifidobacterium longum, Bifidobacterium lactis, Bifidobacterium animalis, Bifidobacterium breve, Bifidobacterium infantis, Bifidobacterium adolescentis, Lactobacillus acidophilus, Lactobacillus casei, Lactobacillus paracasei, Lactobacillus salivarius, Lactobacillus reuteri, Lactobacillus rhamnosus, Lactobacillus johnsonii, Lactobacillus plantarum, Lactobacillus fermentum, Lactococcus lactis, Streptococcus thermophilus, Lactococcus diacetylactis, Lactococcus cremoris, Lactobacillus bulgaricus, Lactobacillus helveticus, Lactobacillus delbrueckii, or Escherichia coli. In some embodiments, the host cell is an E. coli cell. In some embodiments, the E. coli cell is a λDE3 lysogen or a BL21(DE3) strain. In some embodiments, the E. coli cell has an ompT lon genotype.
[0149] In some embodiments, the host cell is an E. coli cell. In some embodiments, the E. coli cell is a λDE3 lysogen or the E. coli cell is a BL21(DE3) strain. In some embodiments, the E. coli cell has an ompT lon genotype.
[0150] In some embodiments, the open reading frame is operably linked to a promoter sequence. In some embodiments, the promoter is selected from the group consisting of a mini promoter, an inducible promoter, a constitutive promoter, and derivatives thereof. In some embodiments, the promoter is selected from the group consisting of CMV, CBA, EF1a, CAG, PGK, TRE, U6, UAS, T7, Sp6, lac, araBAD, trp, Ptac, p5, p19, p40, Synapsin, CaMKII, GRK1, and derivatives thereof.
[0151] In some embodiments, the open reading frame is operably linked to a T7 promoter sequence, a T7-lac promoter sequence, a lac promoter sequence, a tac promoter sequence, a trc promoter sequence, a ParaBAD promoter sequence, a PrhaBAD promoter sequence, a T5 promoter sequence, a cspA promoter sequence, an araPBAD promoter, a strong leftward promoter from phage lambda (pL promoter), or any combination thereof.
[0152] In some embodiments, the open reading frame comprises a sequence encoding an affinity tag linked in-frame to a sequence encoding the retrotransposase. In some embodiments, the affinity tag is an immobilized metal affinity chromatography (IMAC) tag. In some embodiments, the IMAC tag is a polyhistidine tag. In some embodiments, the affinity tag is a myc tag, a human influenza hemagglutinin (HA) tag, a maltose binding protein (MBP) tag, a glutathione S-transferase (GST) tag, a streptavidin tag, a FLAG tag, or any combination thereof. In some embodiments, the affinity tag is linked in-frame to the sequence encoding the retrotransposase via a linker sequence encoding a protease cleavage site. In some embodiments, the protease cleavage site is a tobacco etch virus (TEV) protease cleavage site, a PreScission® protease cleavage site, a Thrombin cleavage site, a Factor Xa cleavage site, an enterokinase cleavage site, or any combination thereof.
[0153] In some embodiments, the open reading frame is codon-optimized for expression in the host cell. In some embodiments, the open reading frame is provided on a vector. In some embodiments, the open reading frame is integrated into a genome of the host cell.
[0154] In some embodiments, the present disclosure provides a culture comprising a host cell described herein in compatible liquid medium.
[0155] In some embodiments, the present disclosure provides a method of producing a retrotransposase, comprising cultivating a host cell described herein in compatible growth medium. In some embodiments, the method further comprises inducing expression of the retrotransposase by addition of an additional chemical agent or an increased amount of a nutrient. In some embodiments, the additional chemical agent or increased amount of a nutrient comprises isopropyl β-D-1-thiogalactopyranoside (IPTG) or additional amounts of lactose. In some embodiments, the method further comprises isolating the host cell after the cultivation and lysing the host cell to produce a protein extract. In some embodiments, the method further comprises subjecting the protein extract to IMAC, or ion-affinity chromatography. In some embodiments, the open reading frame comprises a sequence encoding an IMAC affinity tag linked in-frame to a sequence encoding the retrotransposase. In some embodiments, the IMAC affinity tag is linked in-frame to the sequence encoding the retrotransposase via a linker sequence encoding a protease cleavage site. In some embodiments, the protease cleavage site comprises a tobacco etch virus (TEV) protease cleavage site, a PreScission® protease cleavage site, a Thrombin cleavage site, a Factor Xa cleavage site, an enterokinase cleavage site, or any combination thereof. In some embodiments, the method further comprises cleaving the IMAC affinity tag by contacting a protease corresponding to the protease cleavage site to the retrotransposase. In some embodiments, the method further comprises performing subtractive IMAC affinity chromatography to remove the affinity tag from a composition comprising the retrotransposase.
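By way of non-limiting illustration, the tag, cleavage, and subtractive IMAC steps described above may be modeled at the amino acid sequence level. The following is a minimal sketch under stated assumptions: the tag is taken to be an N-terminal hexahistidine tag, the linker is taken to be the canonical TEV protease recognition sequence ENLYFQG (cleaved between Q and G), and the protein sequence used in the usage note is a hypothetical placeholder rather than any sequence of this disclosure.

```python
"""Sketch: in-frame His-tag/TEV fusion, protease cleavage, and subtractive IMAC."""
from typing import Tuple

HIS6_TAG = "HHHHHH"   # polyhistidine (IMAC) tag, assumed N-terminal
TEV_SITE = "ENLYFQG"  # canonical TEV site; cleavage occurs between Q and G


def build_fusion(protein: str, tag: str = HIS6_TAG, site: str = TEV_SITE) -> str:
    """Return start Met + tag + protease site + protein as one polypeptide."""
    return "M" + tag + site + protein


def tev_cleave(fusion: str, site: str = TEV_SITE) -> Tuple[str, str]:
    """Cut at the first TEV site; return (tag fragment, released protein).

    The final residue of the site (G) stays on the released protein.
    """
    idx = fusion.find(site)
    if idx < 0:
        raise ValueError("no TEV site found")
    cut = idx + len(site) - 1  # position just after Q
    return fusion[:cut], fusion[cut:]


def binds_imac(fragment: str, tag: str = HIS6_TAG) -> bool:
    """Crude model of IMAC retention: the fragment carries the His tag."""
    return tag in fragment


def subtractive_imac(fragments: Tuple[str, ...]) -> Tuple[str, ...]:
    """Keep only the flow-through, i.e. fragments that do not bind the resin."""
    return tuple(f for f in fragments if not binds_imac(f))
```

For example, `tev_cleave(build_fusion("MSKLTPEQ"))` yields a tag fragment that is retained by the resin and a released protein that begins with the residual glycine, and `subtractive_imac` then returns only the released protein.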
Kits
[0156] In some embodiments, this disclosure provides kits comprising one or more nucleic acid constructs encoding the various components of the retrotransposase or gene editing system described herein, e.g., comprising a nucleotide sequence encoding the components of the retrotransposase or gene editing system capable of modifying a target DNA sequence. In some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of the gene editing system components.
[0157] In some embodiments, any of the retrotransposase or gene editing systems disclosed herein is assembled into a pharmaceutical, diagnostic, or research kit to facilitate its use in therapeutic, diagnostic, or research applications. A kit may include one or more containers housing any of the vectors disclosed herein and instructions for use.
[0158] The kit may be designed to facilitate use of the methods described herein by researchers and can take many forms. Each of the compositions of the kit, where applicable, may be provided in liquid form (e.g., in solution), or in solid form, (e.g., a dry powder). In certain cases, some of the compositions may be constitutable or otherwise processable (e.g., to an active form), for example, by the addition of a suitable solvent or other species (for example, water or a cell culture medium), which may or may not be provided with the kit. As used herein, "instructions" can define a component of instruction and/or promotion, and typically involve written instructions on or associated with packaging of the disclosure. Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), Internet, and/or web-based communications, etc. The written instructions, in some embodiments, are in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which instructions can also reflect approval by the agency of manufacture, use, or sale for animal administration.
EXAMPLES
Example 1 - A method of metagenomic analysis for new proteins
[0159] Samples for metagenomic analysis were collected from sediment, soil, and animals. Samples were collected with consent of property owners. Additional raw sequence data from public sources included animal microbiomes, sediment, soil, hot springs, hydrothermal vents, marine, peat bogs, permafrost, and sewage sequences. Deoxyribonucleic acid (DNA) was extracted with a DNA mini-prep kit and sequenced. Metagenomic sequence data was searched using Hidden Markov Models generated based on documented retrotransposase protein sequences to identify new retrotransposases. Retrotransposase proteins identified by the search were aligned to documented proteins to identify potential active sites. This metagenomic workflow resulted in the delineation of the MG140 family described herein.
Example 2 - Discovery of MG140, MG148, MG153, and MG160 Families of Retrotransposases
[0160] Analysis of the data from the metagenomic analysis of Example 1 revealed a new cluster of undescribed putative retrotransposase systems comprising 1 family (MG140). The corresponding protein sequences for these new enzymes and their subdomains are presented as SEQ ID NOs: 1-16 and 32-47.
Example 3 - Integration of reverse transcribed DNA in vitro activity (prophetic)
[0161] Integrase activity can be interrogated via expression in an E. coli lysate-based expression system. The required components for in vitro testing are three plasmids: an expression plasmid with the retrotransposon gene(s) under a T7 promoter, a target plasmid, and a donor plasmid which contains the required 5’ and 3’ UTR sequences recognized by the retrotransposase around a selection marker gene (e.g., Tet resistance gene). The lysate-based expression products, target DNA, and donor plasmid are incubated to allow for transposition to occur. Transposition is detected via PCR. In addition, the transposition product will be tagmented with Tn5 and sequenced via NGS to determine the insertion sites on a population of transposition events. Alternatively, the in vitro transposition products can be transformed into E. coli under antibiotic (e.g., Tet) selection, where growth requires the selection marker to be stably inserted into a plasmid. Either single colonies or a population of E. coli can be sequenced to determine the insertion sites.
[0162] Integration efficiency can be measured via ddPCR or qPCR of the experimental output of target DNA with integrated cargo, normalized to the amount of unmodified target DNA also measured via ddPCR.
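The normalization in [0162] can be illustrated with a minimal Python sketch. The function name and the copy-number inputs are hypothetical and are not part of the disclosure; the sketch assumes that both quantities are reported in the same units (e.g., copies per reaction from ddPCR).

```python
def integration_efficiency(integrated_copies: float, unmodified_copies: float) -> float:
    """Return integrated target copies normalized to unmodified target copies.

    Both inputs are assumed to be measured in the same units (hypothetical
    ddPCR or qPCR copy numbers from the same reaction output).
    """
    if integrated_copies < 0:
        raise ValueError("integrated copies cannot be negative")
    if unmodified_copies <= 0:
        raise ValueError("unmodified target copies must be positive")
    return integrated_copies / unmodified_copies


# Illustrative values only: 50 integrated copies against 1000 unmodified copies.
efficiency = integration_efficiency(50.0, 1000.0)
```

A ratio to total target (integrated plus unmodified) is an alternative convention; the sketch follows the wording of [0162], which normalizes to unmodified target DNA.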
[0163] This assay may also be conducted with purified protein components rather than from lysate-based expression. In this case, the proteins are expressed in E. coli protease-deficient B strain under T7 inducible promoter, the cells are lysed using sonication, and the His-tagged protein of interest is purified using Ni-NTA affinity chromatography on an FPLC. Purity is determined using densitometry of the protein bands resolved on SDS-PAGE and Coomassie stained acrylamide gels. The protein is desalted in storage buffer composed of 50 mM Tris-HCl, 300 mM NaCl, 1 mM TCEP, 5% glycerol; pH 7.5 (or other buffers as determined for maximum stability) and stored at -80°C. After purification the transposon gene(s) are added to the target DNA and donor plasmid as described above in a reaction buffer, for example 26 mM HEPES pH 7.5, 4.2 mM TRIS pH 8, 50 µg/mL BSA, 2 mM ATP, 2.1 mM DTT, 0.05 mM EDTA, 0.2 mM MgCl2, 30-200 mM NaCl, 21 mM KCl, 1.35% glycerol (final pH 7.5), supplemented with 15 mM MgOAc2.
Example 4 - Retrotransposon end verification via gel shift (prophetic)
[0164] The retrotransposon ends are tested for retrotransposase binding via an electrophoretic mobility shift assay (EMSA). In this case, a target DNA fragment (100-500 bp) is end-labeled with FAM via PCR with FAM-labeled primers. The 3’ UTR RNA and 5’ UTR RNA are generated in vitro using T7 RNA polymerase and purified. The retrotransposase proteins are synthesized in an in vitro transcription/translation system. After synthesis, 1 µL of protein is added to 50 nM of the labeled DNA and 100 ng of the 3’ or 5’ UTR RNA in a 10 µL reaction in binding buffer (e.g., 20 mM HEPES pH 7.5, 2.5 mM Tris pH 7.5, 10 mM NaCl, 0.0625 mM EDTA, 5 mM TCEP, 0.005% BSA, 1 µg/mL poly(dI-dC), and 5% glycerol). The binding is incubated at 30 °C for 40 minutes, then 2 µL of 6X loading buffer (60 mM KCl, 10 mM Tris pH 7.6, 50% glycerol) is added. The binding reaction is separated on a 5% TBE gel and visualized. Shifts of the 3’ or 5’ UTR in the presence of retrotransposase protein and target DNA can be attributed to successful binding and are indicative of retrotransposase activity. This assay can also be performed with retrotransposase truncations or mutations, as well as using E. coli extract or purified protein.
Example 5 - Cleavage of target DNA verification (prophetic)
[0165] To confirm that the retrotransposase is involved in cleavage of target DNA, short (~140 bp) DNA fragments are labelled at both ends with FAM via PCR with FAM-labeled primers. In vitro transcription/translation retrotransposase products are pre-incubated with 1 µg of RNase A (negative control), or 3’ UTR, 5’ UTR or non-specific RNA fragments (control), followed by incubating with labeled target DNA at 37°C. The DNA is then analyzed on a denaturing gel. Cleavage of one or both strands of DNA can result in labelled fragments of various sizes, which migrate at different rates on the gel.
Example 6 - Integrase activity in E. coli (prophetic)
[0166] Engineered E. coli strains are transformed with a plasmid expressing the retrotransposon genes and a plasmid containing a temperature-sensitive origin of replication with a selectable marker flanked by 5’ and 3’ UTR of the retrotransposon required for integration. Transformants induced for expression of these genes are then screened for transfer of the marker to a genomic target by selection at restrictive temperature for plasmid replication and the marker integration in the genome is confirmed by PCR.
[0167] Integrations are screened using an unbiased approach. In brief, purified gDNA is tagmented with Tn5, and DNA of interest is then PCR amplified using primers specific to the Tn5 tagmentation and the selectable marker. The amplicons are then prepared for NGS sequencing. For analysis, the resulting reads are trimmed of the transposon sequences, and the flanking sequences are mapped to the genome to determine insertion positions and insertion rates.
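The read analysis in [0167] (trimming transposon sequence, then mapping the flank) can be sketched in Python as follows. This is a toy illustration under stated assumptions: reads and genome are plain strings, matching is exact and forward-strand only, and the function names, the `transposon_end` sequence, and the `min_flank` cutoff are hypothetical. A real pipeline would use a dedicated read aligner.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def trim_transposon(read: str, transposon_end: str) -> Optional[str]:
    """Return the sequence downstream of the transposon end, or None if absent."""
    idx = read.find(transposon_end)
    if idx == -1:
        return None
    return read[idx + len(transposon_end):]


def map_insertion_sites(reads: Iterable[str], transposon_end: str,
                        genome: str, min_flank: int = 12) -> Counter:
    """Count reads per genome position where the trimmed flank matches uniquely."""
    sites: Counter = Counter()
    for read in reads:
        flank = trim_transposon(read, transposon_end)
        if flank is None or len(flank) < min_flank:
            continue  # no transposon junction, or flank too short to place
        pos = genome.find(flank)
        # Keep only flanks that occur exactly once on the forward strand.
        if pos != -1 and genome.find(flank, pos + 1) == -1:
            sites[pos] += 1
    return sites


def insertion_rates(sites: Counter) -> Dict[int, float]:
    """Fraction of mapped junction reads observed at each insertion position."""
    total = sum(sites.values())
    return {pos: n / total for pos, n in sites.items()} if total else {}
```

Here the reported position is the genome coordinate at which the flanking sequence begins, i.e., the junction adjacent to the inserted marker.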
Example 7 - Integration of reverse transcribed DNA into mammalian genomes (prophetic)
[0168] To show targeting and cleavage activity in mammalian cells, the integrase proteins are purified in E. coli or Sf9 cells with 2 NLS peptides at the N-terminus, the C-terminus, or both termini of the protein sequence. A plasmid containing a selectable neomycin resistance marker (NeoR) or a fluorescent marker flanked by the 5’ and 3’ UTR regions required for transposition and under control of a CMV promoter is synthesized. Cells are transfected with the plasmid, recovered for 4-6 hours for RNA transcription, and subsequently electroporated with purified integrase proteins. Antibiotic resistance integration into the genome is quantified by G418-resistant colony counts (selection to start 7 days post-transfection), and positive transposition by the fluorescent marker is assayed by fluorescence-activated cell cytometry. 7-10 days after the second transfection, genomic DNA is extracted and used for the preparation of an NGS library. Off target frequency is assayed by fragmenting the genome and preparing amplicons of the transposon marker and flanking DNA for NGS library preparation. At least 40 different target sites are chosen for testing each targeting system’s activity.
[0169] Integration in mammalian cells can also be assessed via RNA delivery. An RNA encoding the retrotransposase with 2 NLS is designed, and cap and polyA tail are added. A second RNA is designed containing a selectable neomycin resistance marker (NeoR) or a fluorescent marker flanked by the 5’ and 3’ UTR regions. The RNA constructs are introduced into mammalian cells via a liposome transfection reagent. 10 days post-transfection, genomic DNA is extracted to measure transposition efficiency using ddPCR and NGS.
Example 8 - Bioinformatic discovery of RTs
[0170] An extensive assembly-driven metagenomic database of microbial, viral, and eukaryotic genomes was mined to retrieve predicted proteins with reverse transcriptase function. Over 4.5 million RT proteins were predicted on the basis of having a hit to the Pfam domains PF00078 and PF07727, of which 3.4 million had a significant e-value (< 1 × 10^-5). After filtering for complete ORFs with an RT domain coverage of > 70%, and with predicted catalytic residues ([F/Y]XDD), nearly half a million proteins were retained for further analysis. The RT domains were extracted from this set of proteins, as well as from reference sequences retrieved from public databases. The domain sequences were clustered at 50% identity over 80% coverage, and representative sequences (26,824 in total) were aligned. The domain alignment was used to infer a phylogenetic tree. Phylogenetic analysis of RT domains suggests that many different classes of RTs with high sequence diversity were recovered (FIG. 4).
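The filtering criteria stated in [0170] (e-value, ORF completeness, RT domain coverage, and the [F/Y]XDD motif) can be expressed as a short Python sketch. The `RTHit` record and its field names are hypothetical, and searching for the motif anywhere in the protein sequence is a simplification; the actual analysis may have located the motif within the aligned RT domain.

```python
import re
from dataclasses import dataclass
from typing import List

# [F/Y]XDD: phenylalanine or tyrosine, any residue, then two aspartates.
CATALYTIC_MOTIF = re.compile(r"[FY].DD")


@dataclass
class RTHit:
    protein_id: str
    sequence: str            # amino acid sequence of the predicted protein
    evalue: float            # e-value of the hit to the RT domain model
    domain_coverage: float   # fraction of the RT domain covered (0 to 1)
    complete_orf: bool       # True if the ORF has both start and stop


def passes_filters(hit: RTHit, max_evalue: float = 1e-5,
                   min_coverage: float = 0.70) -> bool:
    """Apply the e-value, completeness, coverage, and motif criteria."""
    return (
        hit.evalue < max_evalue
        and hit.complete_orf
        and hit.domain_coverage > min_coverage
        and CATALYTIC_MOTIF.search(hit.sequence) is not None
    )


def filter_hits(hits: List[RTHit]) -> List[RTHit]:
    """Return only the hits that satisfy every criterion."""
    return [h for h in hits if passes_filters(h)]
```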
Example 9 - Non-LTR retrotransposons (MG148 family)
[0171] Retrotransposon-associated RT bioinformatic analysis
[0172] The MG148 family of retrotransposon-associated RTs includes extremely divergent RT homologs, predicted to be active by the presence of all expected catalytic residues and multiple Zn-binding ribbon motifs (FIGs. 5A and 5B). Alignment at the nucleotide level for several family members uncovered conserved regions within the 5’ UTR, which are possibly involved in RT function, activity, or mobilization (FIG. 5C).
[0173] Testing the in vitro activity of retrotransposon RTs by qPCR
[0174] The in vitro activity of retrotransposon RTs was assessed by a primer extension reaction containing RT enzyme derived from a cell-free expression system and 100 nM of RNA template (200 nt) annealed to a DNA primer in reaction buffer containing 40 mM Tris-HCl (pH 7.5), 0.2 M NaCl, 10 mM MgCl2, 1 mM TCEP, and 0.5 mM dNTPs. The resulting full-length cDNA product was quantified by qPCR by extrapolating values from a standard curve generated with the DNA template of known concentrations. MG148 family members MG140-33-R2 through MG140-34-R2 (SEQ ID NOs: 5-6), MG140-42-R2 through MG140-44-R2 (SEQ ID NOs: 14-16), and MG148-12 (SEQ ID NO: 32) are active at cDNA synthesis as determined by primer extension (FIG. 6).
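Quantification against a standard curve, as described in [0174], is commonly done by fitting the quantification cycle (Cq) against the logarithm of the known template amounts and inverting the fit for unknown samples. The following Python sketch assumes a log-linear relationship and ordinary least squares; the function names are hypothetical, and the disclosure does not specify the fitting procedure.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(amounts: Sequence[float],
                       cq_values: Sequence[float]) -> Tuple[float, float]:
    """Fit Cq = slope * log10(amount) + intercept by least squares."""
    if len(amounts) != len(cq_values) or len(amounts) < 2:
        raise ValueError("need at least two matched standards")
    xs = [math.log10(a) for a in amounts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(cq: float, slope: float, intercept: float) -> float:
    """Convert a sample Cq into an amount using the fitted curve."""
    return 10 ** ((cq - intercept) / slope)
```

Samples are best diluted so that their Cq values fall within the range spanned by the standards, consistent with the dilution step described for the primer extension products elsewhere herein.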
Example 10 - Group II intron RTs (MG153 family)
[0175] Group II intron bioinformatic analysis
[0176] Group II introns are capable of integrating large cargo into a target site via reverse transcription of an RNA template. RT domains from Group II introns were identified and delineated in the phylogenetic tree in FIG. 4. Over 10,000 unique full-length Group II intron proteins containing RT domains from contigs with > 2 kb of sequence flanking the RT enzyme were aligned. A phylogenetic tree was inferred from this alignment and Group II intron families were further identified (FIGs. 7A-7B). Group II introns of Class C were identified, and their domain architecture includes an RT domain predicted to be active, as well as a maturase domain involved in intron mobilization. Some Group II intron proteins contain an additional endonuclease domain likely involved in target recognition and cleavage. Many candidates from all families identified were nominated for laboratory characterization.
[0177] Testing the in vitro activity of Group II intron RTs Class C
[0178] The in vitro activity of GII intron Class C (MG153) RTs was assessed by a primer extension reaction containing RT enzyme derived from a cell-free expression system. Expression constructs were codon-optimized for E. coli and contained an N-terminal single Strep tag. Expression of the RT was confirmed by SDS-PAGE analysis. The substrate for the reaction was 100 nM of RNA template (200 nt) annealed to a 5’-FAM labeled primer. The reaction buffer contained the following components: 50 mM Tris-HCl (pH 8.0), 75 mM KCl, 3 mM MgCl2, 10 mM DTT, and 0.5 mM dNTPs. Following incubation at 37 °C for 1 h, the reaction was quenched via incubation with RNase H, followed by the addition of 2X RNA loading dye. The resulting cDNA product(s) were separated on a 10% denaturing polyacrylamide gel and were visualized using an imaging system. RT activity was also assessed by qPCR with primers that amplify the full-length cDNA product. Products from the primer extension assay were diluted to ensure cDNA concentrations were within the linear range of detection. The amount of cDNA was quantified by extrapolating values from a standard curve generated with the DNA template of known concentrations. By detection of cDNA products on a denaturing gel and by qPCR, the following GII intron class C candidates are active under these experimental conditions: MG153-22 through MG153-24 (SEQ ID NOs: 42-44) (FIGs. 8A-8B).
[0179] Human cells cDNA synthesis results
[0180] The ability of these enzymes to produce cDNA in a mammalian environment was tested by expressing them in mammalian cells and detecting cDNA synthesis by PCR, followed by agarose electrophoresis. Reverse transcriptases were cloned in a plasmid for mammalian expression under the CMV promoter as fusion proteins having MS2 coat protein (MCP) at the N terminus, in addition to a flag-HA tag (FH). MCP is a protein derived from the MS2 bacteriophage that recognizes a 20 nucleotide RNA stem loop with high affinity (subnanomolar Kd). By fusing the RTs with MCP and having the MS2 loops in the RNA template, it is ensured that once the RT is translated, it finds the RNA template and starts cDNA synthesis from the DNA primer hybridized to the RNA template.
[0181] A plasmid containing MCP fused to the RT candidate under CMV promoter was cloned and isolated for transfection in HEK293T cells. Transfection was performed using lipofectamine 2000. mRNA coding for nanoluciferase was made using an mRNA synthesizer according to the manufacturer instructions. In order to degrade any DNA template left in the mRNA preparation, the reaction was treated with DNase for 1 hour, and the mRNA was cleaned using a Clean-Up kit. The mRNA was hybridized to a complementary DNA primer in 10 mM Tris pH 7.5, 50 mM NaCl at 95 °C for 2 min and cooled to 4 °C at the rate of 0.1 °C/s. The mRNA/DNA hybrid was transfected into HEK293T cells using a liposome-based transfection reagent 6 hours after the plasmid containing the MCP-RT fusion was transfected. 18 hours post mRNA/DNA transfection, cells were lysed using DNA Extraction Solution; 100 µL of quick extract was added per well in a 24-well plate. The nanoluciferase is ~500 bp long; primers to amplify products of 100 bp and 542 bp from the newly synthesized cDNA were designed. cDNA was amplified using the set of primers mentioned above, and PCR products were detected by agarose gel electrophoresis or DNA Tape Station.
[0182] Activity for the control GII intron RT, TGIRT, was detected (FIG. 9), as shown by the presence of a 500 bp DNA product. Moreover, cDNA synthesis activity for a GII intron-derived RT, MG153-23 (SEQ ID NO: 43), was also shown (FIG. 9). Altogether, this shows that these newly discovered RTs are expressed, fold properly, and are active inside living mammalian cells, opening options for their biotechnological applications.
[0183] Human cells RT expression and cDNA synthesis results
[0184] The ability of GII RTs to synthesize cDNA in a mammalian cell environment was tested as previously described with a small modification. cDNA synthesis was previously detected using PCR and analyzed by agarose gel electrophoresis. In order to have a quantitative readout, a Taqman qPCR assay was developed using Taqman qPCR primers previously described with a Taqman probe “ACTCTGTGAGCGGATCTTGGCTTAGCC” (SEQ ID NO: 70). MG153-23 and MG153-24 RTs were active to various degrees, with MG153-23 nearly as active as the TGIRT control (FIG. 12).
[0185] In order to understand protein expression and stability of the GII RTs in mammalian cells, immunoblots were performed. Briefly, transfected cells were lysed with RIPA lysis buffer supplemented with protease inhibitors (80 µL per well in a 24-well format). The lysate was centrifuged at 14,000 × g for 10 min at 4 °C in order to remove insoluble aggregates. Proteins were quantified using BCA. 3 or 10 µg of total protein was loaded per lane in a 4-12% polyacrylamide SDS gel. All lanes were normalized to the same amount of protein. Proteins were transferred to a PVDF membrane using a gel transfer system. Proteins were detected using a rabbit HA antibody with an HRP-based detection method. Results indicate that MG153-23 is expressed in human cells, as given by the intensity of the band (FIGs. 13A-13B). When cDNA synthesis was normalized to the quantified expression, MG153-23 outperformed the TGIRT control by over sixfold (FIG. 14).
Example 11 - Retron-like RTs (MG160 family)
[0186] Retron bioinformatic analysis
[0187] Bacterial retrons are DNA elements of approximately 2000 bp in length that encode an RT-coding gene (ret) and a contiguous non-coding RNA containing inverted sequences, the msr and msd. Retrons employ a unique mechanism for RT-DNA synthesis, in which the ncRNA template folds into a conserved secondary structure, insulated between two inverted repeats (a1/a2). The retron RT recognizes the folded ncRNA, and reverse transcription is initiated from a conserved guanosine 2’OH adjacent to the inverted repeats, forming a 2’-5’ linkage between the template RNA and the nascent cDNA strand. In some retrons, this 2’-5’ linkage persists into the mature form of processed RT-DNA, while in others an exonuclease cleaves the DNA product resulting in a free 5’ end. Moreover, the RT only targets the msr-msd derived from the same retron as its RNA template, providing specificity that may avoid off-target reverse transcription.
[0188] A divergent group of “retron-like” single-domain RT sequences were identified within the retron clade in FIG. 4. The single-domain RTs of the MG160 family range between 250 and 300 aa and are predicted to be active based on the presence of expected RT catalytic residues [F/Y]XDD. The 5’ UTRs of the MG160 family are conserved among family members and fold into conserved secondary structures (FIG. 10) that are likely important for element activity or mobilization.
[0189] Testing the in vitro activity of the MG160 family of retron-like RTs
[0190] The in vitro activity of retron-like RTs (MG160 family) was assessed by a primer extension reaction containing RT enzyme derived from a cell-free expression system. Expression constructs were codon-optimized for E. coli and contained an N-terminal single Strep tag. The substrate for the reaction was 100 nM of RNA template (200 nt) annealed to a 5’-FAM labeled primer. The reaction buffer contained the following components: 50 mM Tris-HCl (pH 8.0), 75 mM KCl, 3 mM MgCl2, 10 mM DTT, and 0.5 mM dNTPs. Following incubation at 37 °C for 1 h, the reaction was quenched via incubation with RNase H, followed by the addition of 2X RNA loading dye. The resulting cDNA product(s) were separated on a 10% denaturing polyacrylamide gel and were visualized using an imaging system. RT activity was also assessed by qPCR with primers that amplify the full-length cDNA product. Products from the primer extension assay were diluted to ensure cDNA concentrations were within the linear range of detection. The amount of cDNA was quantified by extrapolating values from a standard curve generated with the DNA template of known concentrations. By gel analysis and by qPCR, MG160-7 (SEQ ID NO: 45) is active (FIGs. 11A-11B).
Example 12 - Cell-free expression of retron RTs and in vitro transcription of retron ncRNAs (prophetic)
[0191] Retron RTs are produced in a cell-free expression system by incubating 10 ng/µL of a DNA template encoding the E. coli-optimized gene with an N-terminal single Strep tag with the in vitro transcription/translation system components for 2 h at 37 °C. All tested retron RTs are expressed as indicated by SDS-PAGE analysis.
[0192] The retron ncRNAs are generated using a T7 in vitro transcription kit and a DNA template encoding the respective ncRNA gene following a T7 promoter. The reaction is then incubated with DNase-I to eliminate the DNA template and purified by an RNA cleanup kit. Quantity of the ncRNA is determined by nanodrop and the purity assessed by electrophoretic RNA analysis.
Example 13 - Testing retron RT in vitro activity (prophetic)
[0193] The retron RT enzyme is produced in a cell-free expression system using a construct containing an E. coli codon-optimized gene with an N-terminal single Strep tag as described above. Expression of the enzyme is confirmed by SDS-PAGE analysis. Retron RT activity on a general template is determined by a primer extension assay as described above, containing a 200 nt RNA annealed to a 5’-FAM labeled DNA primer. The resulting cDNA product(s) are detected on a denaturing polyacrylamide gel or by qPCR with primers specific for the full-length cDNA product.
[0194] Retron RT in vitro activity on its own ncRNA is assessed in a reaction containing buffer, dNTPs, the retron RT produced from a cell-free expression system, and the refolded ncRNA. RT activity before and after purification of the RT from the cell-free expression system via the N-terminal single Strep tag is compared. After incubation, half of the reaction is treated with RNase A/T1. Products before and after RNase A/T1 treatment are evaluated on a denaturing polyacrylamide gel and visualized by SYBR gold staining. RNase A/T1 should digest away the RNA template and result in a mass shift towards a smaller product containing only the ssDNA. Since RNase H is expected to improve homogeneity of the 5’ and 3’ ssDNA boundaries, the impact of RNase H on the distribution of products is also evaluated by gel analysis. The covalent linkage between the ncRNA template and ssDNA is confirmed by incubating the RT product with a 5’ to 3’ ssDNA exonuclease (RecJ) before or after treatment with a debranching enzyme (DBR1). RecJ should only be able to degrade the ssDNA after DBR1 has removed the 2’-5’ phosphodiester linkage between the RNA and ssDNA.
Example 14 - Determining retron msr-msd boundaries by NGS (prophetic)
[0195] The msr-msd boundaries are determined by unbiased ligation of adapter sequences to the 5’ and 3’ end of the msDNA product after removal of the 2’-5’ phosphodiester linkage by DBR1. The resulting ligated product is PCR-amplified, library prepped, and subjected to next generation sequencing. Sequencing reads are aligned to the reference sequence to determine the 5’ and 3’ boundaries of the msd. The impact of the presence of RNase H in the RT reaction on the homogeneity of 5’ and 3’ msd boundaries is also evaluated.
Example 15 - Systematic evaluation of insertion sequences into the msd on RT activity (prophetic)
[0196] Sequences of distinct length, predicted secondary structure, and GC-content are inserted into the msd at select insertion sites informed by the msd boundaries determined by NGS and secondary structure predictions of the ncRNA. The impact of these insertion sequences on RT activity is assessed by gel analysis or NGS as described above.
Example 16 - Testing the in vitro activity of RTs (prophetic)
[0197] RT activity is assessed using a primer extension assay containing the RT derived from a cell-free expression system and an RNA template annealed to a DNA primer as described above. The resulting cDNA product(s) are detected by a denaturing polyacrylamide gel and qPCR as described above. Detection of cDNA drop-off products on the denaturing gel provides a relative assessment of processivity for candidates.
Example 17 - Evaluating the priming requirements of RTs (prophetic)
[0198] Primer length preference is determined by testing the RT’s activity on an RNA template annealed to 5’-FAM labeled DNA primers of either 6, 8, 10, 13, 16, or 20 nucleotides in length. The RT is derived from a cell-free expression system as described above. After incubating the reaction, the reaction is quenched via the addition of RNase H. The size distribution of cDNA products is analyzed on a denaturing polyacrylamide gel as described above. Optimal primer length is determined as the length that enables the RT to convert the most primer into cDNA product. The experimentally determined optimal primer length is then used in subsequent experiments, such as fidelity and processivity assays, to further characterize the RT in vitro.
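The selection rule in [0198] (the primer length that converts the most primer into cDNA product) can be sketched in Python. The sketch assumes that each gel lane yields two FAM band intensities, one for unextended primer and one for extended product; the function names and this two-band summary are hypothetical simplifications.

```python
from typing import Dict, Tuple


def fraction_extended(unextended: float, extended: float) -> float:
    """Fraction of labeled primer converted into cDNA product in one lane."""
    total = unextended + extended
    if total <= 0:
        raise ValueError("lane has no signal")
    return extended / total


def optimal_primer_length(lanes: Dict[int, Tuple[float, float]]) -> int:
    """Return the primer length (nt) with the highest converted fraction.

    `lanes` maps primer length to (unextended, extended) band intensities.
    """
    if not lanes:
        raise ValueError("no lanes provided")
    return max(lanes, key=lambda n: fraction_extended(*lanes[n]))
```

Using the converted fraction rather than the raw product intensity makes lanes comparable when the amount of labeled primer loaded differs between reactions.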
Example 18 - Evaluating RT fidelity (prophetic)
[0199] To account for errors introduced during PCR and sequencing, RT fidelity is assessed by a primer extension assay as described above with the exception that a 14-nt unique molecular identifier (UMI) barcode is included in the primer for the reverse transcription reaction. The resulting full-length cDNA product is PCR-amplified, library-prepped, and subjected to next-generation sequencing. Barcodes with >5 reads are analyzed. After aligning to the reference sequence, mutations, insertions, and deletions are counted only if the error is present in all sequence reads with the same barcode. Errors present in one but not all sequencing reads are considered to be introduced during PCR or sequencing. Further analysis of substitution, insertion, and deletion profile is performed, in addition to identification of mutation hotspots within the RNA template. The fidelity measurements will also be performed with modified bases, e.g., pseudouridine, in the template.
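The UMI consensus rule in [0199] can be sketched in Python for the substitution case. The sketch assumes that reads have already been aligned and are the same length as the reference, so insertions and deletions are not handled; the function name and input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rt_errors_by_umi(reads: Iterable[Tuple[str, str]], reference: str,
                     min_reads: int = 6) -> Tuple[List[Tuple[str, int, str, str]], int]:
    """Call substitutions supported by every read sharing a UMI.

    `reads` is an iterable of (umi, aligned_sequence) pairs. UMI families
    with more than 5 reads (min_reads = 6) are analyzed. A substitution is
    attributed to the RT only if all reads in the family carry the same
    non-reference base at that position; anything else is treated as a
    PCR or sequencing error and ignored.
    Returns (errors, families_analyzed), where each error is
    (umi, position, reference_base, observed_base).
    """
    families: Dict[str, List[str]] = defaultdict(list)
    for umi, seq in reads:
        if len(seq) == len(reference):  # substitution-only simplification
            families[umi].append(seq)

    errors: List[Tuple[str, int, str, str]] = []
    families_analyzed = 0
    for umi, seqs in families.items():
        if len(seqs) < min_reads:
            continue
        families_analyzed += 1
        for pos, ref_base in enumerate(reference):
            bases = {s[pos] for s in seqs}
            if len(bases) == 1 and ref_base not in bases:
                errors.append((umi, pos, ref_base, next(iter(bases))))
    return errors, families_analyzed
```

An error rate can then be estimated as the number of called errors divided by the number of families analyzed times the reference length.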
Example 19 - Determining the processivity coefficient of RTs (prophetic)
[0200] RT processivity is evaluated using a primer extension assay containing the RT enzyme derived from a cell-free expression system as described above and RNA templates between 1.6 kb and 6.6 kb in length annealed to either a 5’-FAM labeled primer (for gel analysis) or an unlabeled primer (for sequencing analysis).
[0201] Reverse transcription reactions are performed under single cycle conditions to prevent rebinding of RT enzymes that have dropped off the RNA template during cDNA synthesis. The optimal trap molecule and concentration to achieve single cycle conditions are experimentally determined. The selected condition should provide sufficient inhibition of cDNA synthesis if incubated prior to reaction initiation but otherwise should not impact the velocity of the reaction. Optimal trap molecules to test include unrelated RNA templates and unrelated RNA templates annealed to DNA primers of various lengths.
[0202] Once single cycle reaction conditions have been optimized, processivity is evaluated by initiating the reaction with the addition of dNTPs and the selected trap molecule after pre-equilibrating the RT with the RNA template annealed to a DNA primer in the reaction buffer. After incubating the reaction, the reaction is quenched by the addition of RNase H. The size distribution of cDNA products is analyzed on a denaturing polyacrylamide gel as described above and/or subjected to PCR and library prepped for long-read sequencing. From these experiments, a processivity coefficient is quantified as the template length which yields 50% of the full-length cDNA product. The median length of the cDNA product from the single cycle primer extension reaction is used to estimate the probability that the RT will dissociate on the tested template. From this, the probability that the RT will dissociate at each nucleotide position is calculated, assuming that each dissociation is an independent event and that the probability of dissociation is equal at all nucleotide positions. The processivity coefficient, representing the length of template required for 50% of the RT to dissociate, is then determined as 1/(2 × Pd), where Pd is the probability of dissociation at each nucleotide.
Example 20 - Systematic analysis of challenge structures on primer extension (prophetic)
[0203] To evaluate the impact of challenging templates on RT activity, a primer extension reaction is conducted as stated above, with modifications. The RNA template contains one of the following challenge motifs at fixed distance (100-300 nt) downstream of the primer binding site: homopolymeric stretches, thermodynamically stable GC-rich stem loop, pseudoknot, tRNA, GII intron, and RNA template containing base or backbone modifications (e.g., pseudouridine, phosphorothioate bonds). After quenching the reaction, the size distribution of cDNA products is analyzed by denaturing polyacrylamide gel.
An adapter sequence is also unbiasedly ligated to the 3’ ends of the cDNA products using T4 ligase. The ligated product(s) are then PCR-amplified, and library prepped for next generation sequencing to identify both sites of RT misincorporation/insertions/deletions and sites of RT drop-off with single nucleotide resolution. Extent of RT drop-off at a given position is quantified by comparing the number of sequencing reads corresponding to the drop-off product to the number of sequencing reads corresponding to the full-length product.
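The two quantities defined in Examples 19 and 20, the processivity coefficient 1/(2 × Pd) and the drop-off extent at a position, can be illustrated with a short Python sketch. The function names are hypothetical, and Pd is taken as a given input because its estimation from the median product length is not detailed beyond the stated assumptions. For comparison only, the sketch also computes the length at which exactly half of the RT would have dissociated under the same independence assumption, ln(2)/(-ln(1 - Pd)); the expression 1/(2 × Pd) corresponds to the first-order approximation N × Pd = 0.5.

```python
import math


def processivity_coefficient(pd: float) -> float:
    """Processivity coefficient as stated in the text: 1 / (2 * Pd)."""
    if not 0.0 < pd < 1.0:
        raise ValueError("Pd must be between 0 and 1 (exclusive)")
    return 1.0 / (2.0 * pd)


def exact_half_dissociation_length(pd: float) -> float:
    """Length N solving (1 - Pd)**N = 0.5 for independent, equal per-nt Pd.

    Shown for comparison only; this is not the formula used in the text.
    """
    if not 0.0 < pd < 1.0:
        raise ValueError("Pd must be between 0 and 1 (exclusive)")
    return math.log(0.5) / math.log(1.0 - pd)


def dropoff_ratio(dropoff_reads: int, full_length_reads: int) -> float:
    """Reads ending at a drop-off position relative to full-length reads."""
    if full_length_reads <= 0:
        raise ValueError("need at least one full-length read")
    return dropoff_reads / full_length_reads
```

For an illustrative Pd of 0.001 per nucleotide, the stated formula gives a coefficient of about 500 nt, while the geometric calculation gives about 693 nt, so the two conventions should not be mixed when comparing enzymes.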
Example 21 - Evaluating non-templated base additions (prophetic)
[0204] Non-templated addition of bases to the 5’ end of the cDNA product is evaluated by next generation sequencing. Primer extension reactions containing the RT derived from the cell-free expression system and RNA template are conducted as described above. Different RNA template lengths and sequence motifs at the 5’ end are systematically tested. An adapter sequence is unbiasedly ligated to the 3’ ends of the resulting cDNA products by T4 ligase, resulting in capture of all cDNA products despite the potentially heterogeneous nature of their 3’ ends. The ligated product(s) are then PCR-amplified and library prepped for next generation sequencing. Comparison of the expected full-length cDNA reference sequence to experimentally produced cDNA sequences that are longer than full-length enables identification of both the type and number of base additions to the 5’-end that were not templated by the RNA.
Example 22 - Determining 5’ and 3’ UTR requirements for activity and processivity for R2-like systems (prophetic)
[0205] Proteins of interest are purified via a Twin-strep tag after IPTG-induced overexpression in E. coli. Purified proteins are tested against 1 kb and 4 kb cargos flanked by the 3’ UTRs identified from their native contexts and the 5’ UTRs plus 400 bp past the start codon. The 5’ and 3’ flanking sequences’ effect on activity is assayed via qPCR to sections near the end of the template to determine if cargos with these native features are preferred substrates.
Example 23 - Human cells cDNA synthesis results (prophetic)
[0206] The ability of these enzymes to produce cDNA in a mammalian environment is tested by expressing them in mammalian cells and detecting cDNA synthesis by PCR, followed by agarose electrophoresis. Reverse transcriptases are cloned in a plasmid for mammalian expression under the CMV promoter as fusion proteins having MS2 coat protein (MCP) at the N terminus, in addition to a flag-HA tag (FH). MCP is a protein derived from the MS2 bacteriophage that recognizes a 20 nucleotide RNA stem loop with high affinity (subnanomolar Kd). By fusing the RTs with MCP and having the MS2 loops in the RNA template, it is ensured that once the RT is translated, it finds the RNA template and starts cDNA synthesis from the DNA primer hybridized to the RNA template.
[0207] A plasmid containing MCP fused to the RT candidate under CMV promoter is cloned and isolated for transfection in HEK293T cells. Transfection is performed using lipofectamine 2000. mRNA encoding nanoluciferase is made using an mRNA synthesizer. In order to degrade any DNA template left in the mRNA preparation, the reaction is treated with DNase for 1 hour and the mRNA is cleaned using a Transcription Clean-Up kit. The mRNA is hybridized to a complementary DNA primer in 10 mM Tris pH 7.5, 50 mM NaCl at 95 °C for 2 min and cooled to 4 °C at the rate of 0.1 °C/s. The mRNA/DNA hybrid is transfected into HEK293T cells using Lipofectamine Messenger Max 6 hours after the plasmid containing the MCP-RT fusion is transfected. 18 hours post mRNA/DNA transfection, cells are lysed using a DNA Extraction Solution; 100 µL of quick extract is added per well in a 24-well plate. The nanoluciferase is ~500 bp long; primers to amplify products of 100 bp and 542 bp from the newly synthesized cDNA are designed. cDNA is amplified using the set of primers mentioned above and PCR products are detected by agarose gel electrophoresis.
Example 24 - RT cDNA synthesis activity can be harnessed for multiple applications (prophetic)
[0208] Studies of RNA biology, including expression, processing, modifications, and half-life, as well as quality control steps in biotechnology, depend on a crucial step: conversion of RNA to cDNA. Therefore, multiple RTs have been used for the production of cDNA libraries over the years. RTs used for these purposes include the MMLV RT, AMV RT, and GsI-IIC RT (TGIRT). The first two are retroviral RTs, while the latter is a GII intron-derived RT. GII intron-derived RTs, as well as non-LTR derived RTs, show several advantages compared to their retroviral counterparts. For example, they are more processive, reading through structured and modified RNAs. Structured and/or modified RNAs cannot be properly reverse transcribed by retroviral RTs, which create early termination products that can be misinterpreted as RNA fragments. In addition, the template-switching ability of some RTs can be harnessed for early adaptor addition, removing the adaptor ligation step during library preparation. Therefore, highly processive RTs are suitable for the generation of libraries from complex RNA. Further, some highly processive RTs are smaller than currently used retroviral RTs, making their production and associated downstream steps easier. Data disclosed herein demonstrate that several RTs described herein outperform the commercially available TGIRT enzyme, some with over six-fold its cDNA synthesis activity.
Example 25 - cDNA synthesis by non-LTR retrotransposon RTs and retron-like RTs
[0209] Non-LTR retrotransposases are capable of integrating large cargo into a target site via reverse transcription of an RNA template. These reverse transcriptases (RTs) integrate an RNA template via target primed reverse transcription (TPRT), a mechanism in which cDNA synthesis is primed by the free 3’ hydroxyl group at the target DNA nick. The MG160 family of RTs are a divergent group of “retron-like” single-domain RT enzymes previously identified within the retron RT clade, which form a distantly branching group. The enzymes are predicted to be active based on the presence of expected RT catalytic residues [F/Y]XDD.
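The catalytic-residue criterion above, the [F/Y]XDD motif, can be screened for with a simple pattern match. The following is a minimal sketch, assuming X stands for any single residue written as a one-letter code; the function name find_rt_motifs and the toy sequence are hypothetical and do not correspond to any SEQ ID NO in this disclosure.

```python
import re

# [F/Y]XDD: Phe or Tyr, any one residue, then two aspartates.
# The lookahead lets overlapping candidates be reported as well.
RT_MOTIF = re.compile(r"(?=([FY][A-Z]DD))")


def find_rt_motifs(protein_seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, matched residues) for every [F/Y]XDD hit."""
    seq = protein_seq.upper()
    return [(m.start(1) + 1, m.group(1)) for m in RT_MOTIF.finditer(seq)]


# Toy sequence invented for illustration only.
toy_sequence = "MKTAYADDLLVFSDDG"
for position, residues in find_rt_motifs(toy_sequence):
    print(position, residues)
```

A hit only shows that the residues are present; it does not by itself establish that an enzyme is active, which is why the text describes activity as predicted.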
[0210] Results: Human cells cDNA synthesis by RTs
[0211] The ability of RTs to produce cDNA in a mammalian environment was tested by expressing them in mammalian cells and detecting cDNA synthesis by qPCR. Reverse transcriptases were cloned in a plasmid for mammalian expression under the CMV promoter as fusion proteins having MS2 coat protein (MCP) at the N terminus, in addition to a flag-HA tag (FH). MCP is a protein derived from the MS2 bacteriophage that recognizes a 20 nucleotide RNA stem loop with high affinity (subnanomolar Kd). By fusing the RTs with MCP and having the MS2 loops in the RNA template, it was ensured that once the RT is translated it finds the RNA template and starts cDNA synthesis from the DNA primer hybridized to the RNA template.
[0212] A plasmid containing MCP fused to the RT candidate under the CMV promoter was cloned and isolated for transfection into HEK293T cells. Transfection was performed using Lipofectamine 2000. mRNA encoding dCas9 fused to nanoluciferase was made using an mRNA synthesizer. To degrade any DNA template left in the mRNA preparation, the reaction was treated with DNase for 1.5 hours and the mRNA was cleaned up using a Transcription Clean-Up kit. The mRNA was hybridized to a complementary DNA primer in 10 mM Tris pH 7.5, 50 mM NaCl at 95 °C for 2 min and cooled to 4 °C at a rate of 0.1 °C/s. The mRNA/DNA hybrid was transfected into HEK293T cells 6 hours after transfection of the plasmid containing the MCP-RT fusion. 18 hours after mRNA/DNA transfection, cells were lysed using a DNA Extraction Solution; 100 µL of quick extract was added per well of a 24-well plate. The RNA template was ~4247 nt. Primers to amplify the first and last 100 bp products from the newly synthesized cDNA (4100 bp) were designed, along with TaqMan probes to quantify their amplification (FIG. 15A).
[0213] Activity for the control GII intron RT TGIRT, the retroviral MMLV (WT and pentamutant), as well as a positive control for R2 RTs, R2Tg, was detected (FIGs. 15B and 15C), as shown by early amplification of the first and last 100 bp products. As expected for a low-processivity RT, the retroviral RTs (MMLVs) showed high amplification levels of the first 100 bp (FAM signal), but the levels at which they completed cDNA synthesis (the last 100 bp) were lower (20-fold lower than the first 100 bp, as observed by the FAM/HEX signal ratio). Control group II intron RT TGIRT and control R2 non-LTR retrotransposon RT R2Tg showed a closer FAM/HEX ratio, demonstrating their high processivity (FIGs. 15B and 15C). Five candidates of the MG148 family of non-LTR retrotransposon RTs were tested in mammalian cells (FIG. 15B). All tested candidates showed low activity compared to the control RTs. MG160-7, a retron-like RT, was tested similarly. It displayed poor activity and poor processivity, as evidenced by FAM and HEX values that were below background (indicated by the dotted line parallel to the x-axis) (FIG. 15C).
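The comparison above between the first 100 bp (FAM) and last 100 bp (HEX) products can be related to qPCR threshold cycles. The disclosure does not state how the FAM/HEX ratio was calculated, so the following is only a sketch of the standard delta-Ct relationship, assuming both assays amplify with the same efficiency and that an efficiency of 1.0 means perfect doubling per cycle. The function names are hypothetical.

```python
import math


def fold_first_over_last(ct_first: float, ct_last: float, efficiency: float = 1.0) -> float:
    """Fold excess of the first-100-bp product over the last-100-bp product.

    Assumption: both amplicons share the same efficiency, so each cycle of
    delay corresponds to a (1 + efficiency)-fold difference in template.
    """
    return (1.0 + efficiency) ** (ct_last - ct_first)


def delta_ct_for_fold(fold: float, efficiency: float = 1.0) -> float:
    """Cycle difference that corresponds to a given fold difference."""
    return math.log(fold) / math.log(1.0 + efficiency)


# A 20-fold drop from the first to the last 100 bp, as reported for MMLV,
# corresponds to this many cycles of delay under the stated assumptions.
print(f"{delta_ct_for_fold(20.0):.2f} cycles")
```

Under these assumptions a processive enzyme would show a delta-Ct near zero between the two probes, whereas a 20-fold loss corresponds to a delay of a little over four cycles.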
[0214] While preferred embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the disclosure be limited by the specific examples provided within the specification. While the disclosure has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the disclosure. Furthermore, it shall be understood that all aspects of the disclosure are not limited to the specific depictions, configurations, or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. It is therefore contemplated that the disclosure shall also cover any such alternatives, modifications, variations, or equivalents. It is intended that the following claims define the scope of the disclosure and that methods and structures within the scope of these claims and their equivalents be covered thereby. Attorney Docket No. MTG-007W
Table 2 - Protein and nucleic acid sequences referred to herein
[Table 2 is provided as page images imgf000058_0001 through imgf000098_0001.]

Claims

WHAT IS CLAIMED IS:
1. An engineered retrotransposase system, comprising:
(a) a double-stranded nucleic acid comprising a cargo nucleotide sequence configured to form a complex with a retrotransposase; and
(b) a retrotransposase configured to transpose the cargo nucleotide sequence to a target nucleic acid sequence and comprising an amino acid sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
2. The engineered retrotransposase system of claim 1, wherein the retrotransposase comprises an amino acid sequence having at least 80% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
3. The engineered retrotransposase system of claim 1, wherein the retrotransposase comprises an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
4. The engineered retrotransposase system of claim 1, wherein the retrotransposase comprises an amino acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47.
5. The engineered retrotransposase system of claim 1, wherein the retrotransposase is encoded by a nucleic acid having at least 75% sequence identity to any one of SEQ ID NOs: 17-19, 24, and 76-81.
6. The engineered retrotransposase system of claim 1, wherein the retrotransposase is encoded by a nucleic acid sequence having at least 80% sequence identity to any one of SEQ ID NOs: 17-19, 24, and 76-81.
7. The engineered retrotransposase system of claim 1, wherein the retrotransposase is encoded by a nucleic acid sequence having at least 90% sequence identity to any one of SEQ ID NOs: 17-19, 24, and 76-81.
8. The engineered retrotransposase system of claim 1, wherein the retrotransposase is encoded by a nucleic acid sequence having at least 95% sequence identity to any one of SEQ ID NOs: 17-19, 24, and 76-81.
9. The engineered retrotransposase system of any one of claims 1-4, wherein the retrotransposase has less than 80% sequence identity to a known retrotransposase.
10. The engineered retrotransposase system of any one of claims 1-9, wherein the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR).
11. The engineered retrotransposase system of any one of claims 1-10, wherein the retrotransposase is configured to transpose the cargo nucleotide sequence via a ribonucleic acid polynucleotide intermediate.
12. The engineered retrotransposase system of any one of claims 1-11, wherein the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.
13. The engineered retrotransposase system of any one of claims 1-12, wherein the retrotransposase comprises one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of the retrotransposase.
14. The engineered retrotransposase system of claim 13, wherein the NLS comprises a sequence at least 80% identical to a sequence from the group consisting of SEQ ID NO: 49-64.
15. The engineered retrotransposase system of claim 13, wherein the NLS comprises SEQ ID NO: 50.
16. The engineered retrotransposase system of claim 13, wherein the NLS is proximal to the N-terminus of the retrotransposase.
17. The engineered retrotransposase system of claim 13, wherein the NLS comprises SEQ ID NO: 49.
18. The engineered retrotransposase system of claim 13, wherein the NLS is proximal to the C-terminus of the retrotransposase.
19. The engineered retrotransposase system of any one of claims 1-18, wherein the retrotransposase is derived from an uncultivated microorganism.
20. A polypeptide comprising a reverse transcriptase comprising an amino acid sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-16 and 32-47 fused N- or C-terminally to a non-retrotransposase domain or an affinity tag.
21. The polypeptide of claim 20, wherein the non-retrotransposase domain is an RNA-binding protein domain.
22. The polypeptide of claim 21, wherein the RNA binding protein domain comprises a bacteriophage MS2 coat protein (MCP) domain.
23. A nucleic acid encoding the engineered retrotransposase system of any one of claims 1-19 or the polypeptide of any one of claims 20-22.
24. A method for modifying a target nucleic acid sequence comprising contacting the target nucleic acid sequence using the engineered nuclease system of any one of claims 1-19.
25. The method of claim 23, wherein modifying the target nucleic acid sequence comprises binding, nicking, or cleaving, the target nucleic acid sequence.
26. The method of any one of claims 23-25, wherein the target nucleic acid sequence comprises genomic DNA, viral DNA, viral RNA, or bacterial DNA.
27. The method of any one of claims 23-25, wherein the target nucleic acid sequence comprises deoxyribonucleic acid (DNA).
28. The method of any one of claims 23-27, wherein the modification is in vitro.
29. The method of any one of claims 23-27, wherein the modification is in vivo.
30. The method of any one of claims 23-27, wherein the modification is ex vivo.
31. A method of modifying a target nucleic acid sequence in a mammalian cell comprising contacting the mammalian cell using the engineered nuclease system of any one of claims 1-19.
32. A vector comprising the nucleic acid of claim 23.
33. The vector of claim 32, wherein the vector is a plasmid, a minicircle, a CELiD, an adeno-associated virus (AAV) derived virion, or a lentivirus.
34. A cell comprising the engineered nuclease system of any one of claims 1-19 or the polypeptide of any one of claims 20-22.
35. The cell of claim 34, wherein the cell is a eukaryotic cell.
36. The cell of claim 34, wherein the cell is a mammalian cell.
37. The cell of claim 34, wherein the cell is an immortalized cell.
38. The cell of claim 34, wherein the cell is an insect cell.
39. The cell of claim 34, wherein the cell is a yeast cell.
40. The cell of claim 34, wherein the cell is a plant cell.
41. The cell of claim 34, wherein the cell is a fungal cell.
42. The cell of claim 34, wherein the cell is a prokaryotic cell.
43. The cell of claim 34, wherein the cell is an A549, HEK-293, HEK-293T, BHK, CHO, HeLa, MRC5, Sf9, Cos-1, Cos-7, Vero, BSC 1, BSC 40, BMT 10, WI38, HeLa, Saos, C2C12, L cell, HT1080, HepG2, Huh7, K562, primary cell, or a derivative thereof.
44. The cell of claim 34, wherein the cell is an engineered cell.
45. The cell of claim 34, wherein the cell is a stable cell.

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP23901699.1A EP4630544A2 (en) 2022-12-09 2023-12-08 Retrotransposon compositions and methods of use

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US202263386867P 2022-12-09 2022-12-09
US63/386,867 2022-12-09
US202363489156P 2023-03-08 2023-03-08
US63/489,156 2023-03-08
US202363491942P 2023-03-23 2023-03-23
US63/491,942 2023-03-23

Publications (2)

Publication Number Publication Date
WO2024124204A2 true WO2024124204A2 (en) 2024-06-13
WO2024124204A3 WO2024124204A3 (en) 2024-07-11

Family

ID=91380305

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/083232 Ceased WO2024124204A2 (en) 2022-12-09 2023-12-08 Retrotransposon compositions and methods of use

Country Status (2)

Country Link
EP (1) EP4630544A2 (en)
WO (1) WO2024124204A2 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220298495A1 (en) * 2019-06-12 2022-09-22 Emendobio Inc. Novel genome editing tool

Also Published As

Publication number Publication date
WO2024124204A3 (en) 2024-07-11
EP4630544A2 (en) 2025-10-15

Similar Documents

Publication Publication Date Title
US20240287484A1 (en) Systems, compositions, and methods involving retrotransposons and functional fragments thereof
US20240327871A1 (en) Systems and methods for transposing cargo nucleotide sequences
AU2023314925A1 (en) Class ii, type v crispr systems
WO2024233984A2 (en) Systems and methods for transposing cargo nucleotide sequences
EP4615983A2 (en) Serine recombinases for gene editing
CA3244138A1 (en) Systems and methods for transposing cargo nucleotide sequences
EP4482971A2 (en) Systems and methods for transposing cargo nucleotide sequences
WO2024124204A2 (en) Retrotransposon compositions and methods of use
US20240360477A1 (en) Systems and methods for transposing cargo nucleotide sequences
JP2025542108A (en) Retrotransposon compositions and methods of use
WO2024055013A1 (en) Systems and methods for transposing cargo nucleotide sequences
WO2024055012A1 (en) Systems and methods for transposing cargo nucleotide sequences
WO2024124197A2 (en) Retrotransposon compositions and methods of use
WO2024187119A2 (en) Systems and methods for transposing cargo nucleotide sequences
AU2024233048A1 (en) Class 2, type v crispr systems
KR20250175370A (en) Class 2, V-type CRISPR system
WO2023164592A2 (en) Fusion proteins
WO2024102666A2 (en) Serine recombinases for gene editing
WO2024086661A2 (en) Gene editing systems comprising reverse transcriptases
WO2025059585A1 (en) Engineered and chimeric nucleases

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23901699

Country of ref document: EP

Kind code of ref document: A2

ENP Entry into the national phase

Ref document number: 2025531027

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2025531027

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 2023901699

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2023901699

Country of ref document: EP

Effective date: 20250709

WWP Wipo information: published in national office

Ref document number: 2023901699

Country of ref document: EP