WO2024187119A2 - Systems and methods for transposing cargo nucleotide sequences - Google Patents

Info

Publication number
WO2024187119A2
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
transposase
seq
nos
cell
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/019145
Other languages
English (en)
Other versions
WO2024187119A3 (fr
Inventor
Brian C. Thomas
Lisa ALEXANDER
Christopher Brown
Daniela S.A. Goltsman
Maria Jose SOTO CONTRERAS
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Metagenomi Inc
Original Assignee
Metagenomi Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Metagenomi Inc
Publication of WO2024187119A2
Publication of WO2024187119A3
Anticipated expiration
Pending

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/87: Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90: Stable introduction of foreign DNA into chromosome
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)
    • C12N9/16: Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22: Ribonucleases [RNase]; Deoxyribonucleases [DNase]

Definitions

  • the cargo nucleotide sequence is flanked by a flanking sequence recognized by the transposase.
  • the flanking sequence comprises a nucleic acid sequence having at least 80% identity to any one of SEQ ID NOs: 478-481, 485-488, 507-511, 482-484, 489-491, and 512-524. In some embodiments, the flanking sequence comprises a sequence having at least 80% identity to any one of SEQ ID NOs: 350-454. In some embodiments, the cargo nucleotide comprises a sequence having at least 80% identity to any one of SEQ ID NOs: 471-474, 503-506, 528, 475-477, and 525-526. In some embodiments, the transposase is not a TnpA transposase or a TnpB transposase.
  • the transposase has less than 80% sequence identity to a TnpA transposase. In some embodiments, the transposase has less than 80% sequence identity to a TnpB transposase. In some embodiments, the transposase comprises a catalytic tyrosine residue. In some embodiments, the transposase is configured to bind a left-hand region comprising a subterminal palindromic sequence and a right-hand region comprising a subterminal palindromic sequence. In some embodiments, the transposase is configured to transpose the cargo nucleotide sequence as a single-stranded deoxyribonucleic acid polynucleotide.
  • a first flanking sequence flanks a 5’ end of the cargo nucleic acid sequence, and a second flanking sequence flanks a 3’ end of the cargo nucleic acid sequence.
  • the transposase is configured to recognize an insertion motif adjacent to the target nucleic acid site.
  • the insertion motif comprises at least three, four, five, or six consecutive nucleotides of the sequence AATGAC.
  • the insertion motif comprises a sequence TTAT, TTAG or TCAT.
  • the transposase comprises one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of the transposase.
  • the NLS comprises a sequence at least 80% identical to a sequence from the group consisting of SEQ ID NO: 455-470 and 571-596. In some embodiments, the NLS comprises SEQ ID NO: 456. In some embodiments, the NLS is proximal to the N-terminus of the transposase. In some embodiments, the NLS comprises SEQ ID NO: 455. In some embodiments, the NLS is proximal to the C-terminus of the transposase.
  • the flanking sequence comprises a nucleic acid sequence having at least 80% identity to any one of SEQ ID NOs: 478-481, 485-488, and 507-511. In some embodiments, the flanking sequence comprises a sequence having at least 80% identity to any one of SEQ ID NOs: 350-454.
  • the transposase is configured to recognize an insertion motif adjacent to the target nucleic acid site. In some embodiments, the insertion motif comprises a sequence of any one of TTAT or TTAG.
  • the flanking sequence comprises a nucleic acid sequence having at least 80% identity to any one of SEQ ID NOs: 482-484, 489-491, and 512-524.
  • the transposase is configured to recognize an insertion motif adjacent to the target nucleic acid site.
  • the insertion motif comprises a sequence TCAT.
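
The insertion-motif language above can be illustrated with a minimal sketch. It assumes plain string matching on one strand of a target sequence; the function names and the example target are hypothetical and are not taken from the application.

```python
# Insertion motifs named in the text above.
INSERTION_MOTIFS = ("TTAT", "TTAG", "TCAT")


def find_insertion_motifs(target, motifs=INSERTION_MOTIFS):
    """Return sorted (position, motif) pairs for every motif occurrence.

    Positions are 0-based offsets into `target`. Only the given strand is
    scanned; this is an illustrative assumption, not the application's assay.
    """
    target = target.upper()
    hits = []
    for motif in motifs:
        start = target.find(motif)
        while start != -1:
            hits.append((start, motif))
            start = target.find(motif, start + 1)  # allow overlapping hits
    return sorted(hits)


def consecutive_submotifs(full="AATGAC", min_len=3):
    """All runs of at least `min_len` consecutive nucleotides of `full`."""
    return {
        full[i:i + k]
        for k in range(min_len, len(full) + 1)
        for i in range(len(full) - k + 1)
    }


# Hypothetical target used only for illustration.
example_hits = find_insertion_motifs("GGTTATCCTCATAA")
```

Here `consecutive_submotifs` enumerates the "at least three, four, five, or six consecutive nucleotides of the sequence AATGAC" alternatives, so a candidate motif can be checked by set membership.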
  • engineered transposase systems comprising: (a) a double-stranded nucleic acid configured to interact with a transposase and comprising a cargo nucleotide sequence; (b) a transposase configured to transpose the cargo nucleotide sequence to a target nucleic acid site and comprising a sequence having at least 80% sequence identity to any one of SEQ ID NOs: 1-349 and 530-538; (c) an endonuclease; and (d) an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to the target nucleic acid sequence.
  • the guide polynucleotide comprises a sequence having at least 80% sequence identity to any one of SEQ ID NOs: 551-562. In some embodiments, the guide polynucleotide comprises a sequence having at least 90% sequence identity to any one of SEQ ID NOs: 551-562. In some embodiments, the guide polynucleotide comprises a sequence having 100% sequence identity to any one of SEQ ID NOs: 551-562.
  • the guide polynucleotide comprises a sequence having at least 80% sequence identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide comprises a sequence having at least 90% sequence identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide comprises a sequence having 100% sequence identity to any one of SEQ ID NOs: 551-556.
  • the guide polynucleotide comprises a sequence having at least 80% sequence identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide comprises a sequence having at least 90% sequence identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide comprises a sequence having 100% sequence identity to any one of SEQ ID NOs: 557-562.
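
The "at least 80% sequence identity" thresholds above depend on how identity is computed, which this excerpt does not specify. The sketch below is one possible reading, assuming a simple global (Needleman-Wunsch style) alignment with unit scores and identity defined as matches divided by alignment columns; the scoring values and function names are illustrative assumptions.

```python
def percent_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity over a global alignment of sequences a and b.

    Assumption: identity = identical columns / total alignment columns.
    Other conventions (e.g. dividing by the shorter length) give other values.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back one optimal alignment, counting identical columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * matches / columns if columns else 100.0


def meets_identity(a, b, threshold=80.0):
    """True if the two sequences reach the given percent-identity threshold."""
    return percent_identity(a, b) >= threshold
```

In practice a dedicated alignment tool with a stated scoring scheme would be used; this sketch only makes the threshold comparison concrete.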
  • nucleic acids encoding the engineered transposase system described herein.
  • cells comprising the system described herein.
  • the cell is a eukaryotic cell.
  • the cell is a mammalian cell.
  • the cell is an immortalized cell.
  • the cell is an insect cell.
  • the cell is a yeast cell.
  • the cell is a plant cell.
  • the cell is a fungal cell.
  • the cell is a prokaryotic cell.
  • the cell is an A549, HEK-293, HEK-293T, BHK, CHO, HeLa, MRC5, Sf9, Cos-1, Cos-7, Vero, BSC 1, BSC 40, BMT 10, WI38, HeLa, Saos, C2C12, L cell, HT1080, HepG2, Huh7, K562, primary cell, or a derivative thereof.
  • the cell is an engineered cell.
  • the cell is a stable cell.
  • the cell is a primary cell.
  • the cell is a T cell.
  • FIGS. 1A and 1B depict MG transposases.
  • FIG. 1A depicts the organization of a transposon comprising the tyrosine (Y1) transposase MG92-1 locus.
  • MG92-1 is encoded at the 5’ end of the transposon, followed by the accessory transposition protein TnpB and other cargo.
  • the transposon ends contain direct repeats of 16-17 bp, and they exhibit secondary structure likely involved in transposition activity.
  • FIG. 1B depicts multiple sequence alignment (MSA) of MG Y1 transposase homologs, MG92-1, MG92-2, MG92-3, MG92-4, MG92-5, MG92-6, MG92-7, and MG92-8.
  • the consensus sequence is at the top.
  • Catalytic residues HUH and Y are identified by additional small boxes directly below the consensus sequence and on the MSA (boxes).
  • FIG. 2 depicts a phylogenetic tree of TnpA protein sequences. The tree was built from a multiple sequence alignment of 414 TnpA sequences recovered here (black dots) and 19 reference TnpA sequences (grey dots). Labels for reference sequences were included.
  • FIG. 3 depicts an example insertion sequence IS200/IS605 MG92-28. Top panel: Genomic context of the MG92-28 insertion sequence encoding the TnpA-like transposase and its associated TnpB-like gene. Both genes are flanked by LE and RE (boxes) predicted from covariance models. Bottom panel: LE (top left) and RE (bottom right) delineate the boundaries of the insertion sequence. Region predicted by the covariance models is annotated as arrows below the sequence. LE and RE secondary structures and sequences are shown for each end.
  • FIG. 4 depicts a Western blot of TnpA-like proteins expressed in an in vitro expression system.
  • Lanes are: ladder, 1: HpTnpA, 2: HhTnpA, 3: 92-2, 4: 92-3, 5: 92-4, 6: 92-5, 7: 92-6, 8: 92-7, 9: 92-8, 10: 92-10, 11: 92-11.
  • HpTnpA and HhTnpA are positive controls from H. pylori and H. heilmannii, respectively.
  • Molecular weights range from 17-23 kilodaltons (kDa).
  • FIG. 5A depicts the PCR product for the LE of the transposition reaction. All reactions have the protein and its paired specific cargo, except the control lane where the cargo is specified. Lanes are: 1: Ladder, 2: negative control NTC with HpTnpA cargo, 3: 92-1, 4: 92-2, 5: 92-3, 6: 92-4, 7: 92-5, 8: 92-6, 9: 92-7, 10: 92-8, 11: 92-10, 12: 92-11, 13: HpTnpA, 14;
  • Expected transposition product can range from 200 to 300 bp depending on LE size and is marked with an arrow.
  • the band at ~200 bp in 92-5 is related to non-specific primer interactions.
  • FIG. 5B depicts the PCR product for the RE of the transposition reaction. All reactions have the protein and its paired specific cargo, except the control lane where the cargo is specified. Lanes are: 1: NTC with HpTnpA cargo, 2: 92-1, 3: 92-2, 4: 92-3, 5: 92-4, 6: 92-5, 7: 92-6, 8: 92-7, 9: 92-8, 10: 92-10, 11: 92-11, 12: HpTnpA, 13: HhTnpA, and 14: ladder.
  • Expected transposition product can range from 300 to 500 bp depending on RE size and is marked with an arrow. Transposition that occurs into the 8N region will have a much weaker band than transposition into flanking sequence, so the faint bands are expected.
  • FIG. 6 depicts Sanger sequencing data confirming transposition for MG92-3.
  • the chromatogram trace is shown mapped to the cargo sequence, where shaded letters match the cargo.
  • the trace instead maps onto the target sequence (boxed).
  • Analysis of the target reveals the insertion motif, which is shared sequence between the LE and the target. Downstream hairpins with flanking non-canonical base interactions can be identified.
  • FIG. 7 depicts Sanger sequencing data confirming transposition for MG92-3.
  • the chromatogram trace is shown mapped to the cargo, and shaded letters match the cargo.
  • the trace instead maps onto the target sequence (boxed).
  • Analysis of the target reveals the insertion motif.
  • the cleavage position in the putative RE defines the boundary of the RE, which folds into a canonical hairpin to allow TnpA recognition and strand cleavage (inset of dotted box).
  • FIG. 8 depicts analysis of chimeric NGS reads showing cargo and target sequence joints which were analyzed to determine the breakpoint.
  • the x-axis is the position along the cargo sequence and the y-axis is the count of reads which transition at that position.
  • the identified peak in the breakpoint at 2030 nt on the cargo matches the breakpoint identified in Sanger sequencing, confirming the position of LE cleavage.
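
The breakpoint analysis described for FIG. 8 (counting, per cargo position, the chimeric reads that transition from cargo to target at that position) can be sketched as a simple histogram. The read positions below are hypothetical placeholders chosen only so that the peak falls at 2030 nt as in the text; they are not data from the application, and the function names are assumptions.

```python
from collections import Counter


def breakpoint_histogram(breakpoints):
    """Map each cargo position to the number of reads transitioning there."""
    return Counter(breakpoints)


def peak_breakpoint(breakpoints):
    """Return (position, read_count) for the most frequent breakpoint.

    Returns None when no breakpoints were observed.
    """
    counts = breakpoint_histogram(breakpoints)
    if not counts:
        return None
    return counts.most_common(1)[0]


# Hypothetical per-read breakpoint positions on the cargo (in nt).
example_breakpoints = [2030] * 5 + [2029, 2031, 1500]
example_peak = peak_breakpoint(example_breakpoints)
```

The histogram keys correspond to the x-axis (position along the cargo) and the values to the y-axis (count of reads transitioning at that position).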
  • FIG. 9 depicts NGS sequencing data confirming transposition for MG92-4.
  • the NGS reads are shown mapped to the target, and light-shaded letters match the cargo.
  • the trace instead maps onto the cargo sequence (boxed).
  • the cleavage position in the putative RE defines the boundary of the RE, which folds into a canonical hairpin to allow TnpA recognition and strand cleavage (inset of dotted box).
  • the NGS read histogram shows the frequency of reads corresponding to this breakpoint on the cargo.
  • FIG. 10 depicts TnpA enzymes that are ssDNA tyrosine (Y) transposases.
  • FIG. 10 depicts the genomic region encoding a TnpA transposase as part of an insertion sequence IS200/IS605. Transposon left end (LE) and right end (RE) were predicted using covariance models.
  • FIG. 11 depicts the PCR readout of MG179 TnpAs insertion reaction into ultramer target 503.
  • Lanes are: (1) MG179-14, (2) MG179-15, (3) MG179-36, (4) MG179-58, (5) MG179-60, and (6) in vitro expression system NTC control with MG179-60 cargo.
  • Lanes 1, 3, and 5 show activity as evidenced by the band at 139 bp.
  • Ladder lanes are (from bottom): 100, 200, 300, 400, 500 bp.
  • FIG. 12 depicts Sanger sequencing of PCR readout for TnpA insertion assay.
  • Sanger sequencing maps back to the target, ultramer 503, up to and including the TTAT insertion motif of MG92-4. Following the insertion motif, sequencing maps back to the LE sequence of MG92-4 indicating that insertion of the cargo into the target has occurred.
  • FIG. 13 depicts Sanger sequencing of PCR readout for TnpA insertion assay.
  • Sanger sequencing maps back to the target, ultramer 503, up to the TTAG insertion motif of MG92-11. Following the insertion motif, sequencing maps back to the LE sequence of MG92-11 indicating insertion into the target has occurred.
  • FIG. 14 depicts Sanger sequencing of PCR readout for TnpA insertion assay. Sanger sequencing maps back to the target, ultramer 503, up to the TCAT insertion motif of MG179-36. Following the insertion motif, sequencing maps back to the LE sequence of MG179-36 indicating insertion into the target has occurred.
  • FIG. 15 depicts Sanger sequencing of PCR readout for TnpA insertion assay.
  • Sanger sequencing maps back to the target, ultramer 503, up to the TCAT insertion motif of MG179-58. Following the insertion motif, sequencing maps back to the LE sequence of MG179-58 indicating insertion into the target has occurred.
  • FIG. 16 depicts Sanger sequencing of PCR readout for TnpA insertion assay.
  • Sanger sequencing maps back to the target, ultramer 503, up to the TCAT insertion motif of MG179-60. Following the insertion motif, sequencing maps back to the LE sequence of MG179-60 indicating insertion into the target has occurred.
  • FIG. 17 depicts the PCR readout of MG92-4 insertion reaction into ultramer targets with engineered LE sequences (ELEs). Lanes are: (1) MG92-4 with ELE1 and Ultramer 1, (2) MG92-4 with ELE3 and Ultramer 1, (3) MG92-4 with ELE3 and Ultramer 7, (4) MG92-4 with ELE5 and Ultramer 5, (5) MG92-4 with ELE6 and Ultramer 1, (6) MG92-4 with ELE8 (LE trim) and Ultramer 1, (7) MG92-4 with ELE9 (LE trim) and Ultramer 1, and (8) NTC + ELE3 + Ultramer 1. Lanes 2, 3, and 6 show activity as evidenced by the higher molecular weight bands. [0030] FIG.
  • TnpA landing pad has 3 nested motif sites and 5 expanded motif sites for candidate TnpAs to recognize and integrate into.
  • the nested motif sites provide a compact region of integration for multiple TnpAs, whereas the expanded LE motif sites allow for each motif to be in different lengths from the 3-6 PAM site.
  • the top panel shows how all the LE motifs, spacer sequences and PAM sites are oriented relative to each other.
  • the bottom panel shows a diagram depicting a zoomed-out picture of the full integration cassette that the lentivirus integrated into the genome. This cassette includes the landing pad but also a hygromycin resistance marker to allow for the selection of cells that contain the integrated cassette.
  • FIG. 19 depicts an illustration of the general TnpA cargo design.
  • the LE and RE sequences are unique for each TnpA candidate and allow for binding and processing by the TnpA dimer.
  • the buffer sequences are added on the ends to protect the interior sequence from intracellular exonuclease activity.
  • Nanoluciferase is highly expressed and can be easily detected using a plate-based luciferase assay.
  • a puromycin resistance marker was also added to the cassette. The long 18-day selection for puromycin resistance ensures that only integrated copies of the Nanoluciferase and puromycin resistance marker remain. Different amounts were transfected into the landing pad cell line to find optimal expression with minimal toxicity.
  • FIG. 20 depicts ssDNA cargo integration into human cells mediated by the MG92-4 TnpA. Barplot shows luminescence levels after cell transfection with 400 ng or 100 ng of cargo, or a no-cargo negative control. There was a 129-fold enrichment in luminescence in the fused with guide 4 (g4) treatment (diagonal lined bars) vs. the fused and unguided treatment (horizontal lined bars, not visible) using 100 ng of cargo. In addition, the unfused and guided treatment (solid grey bar) showed an 8.5-fold enrichment in luminescence over the unfused and unguided treatment (dotted bars) with the 100 ng of cargo.
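
The fold-enrichment figures quoted for FIG. 20 are ratios of luminescence in a treatment to luminescence in its matched control. A minimal sketch of that arithmetic follows; the luminescence readings are hypothetical values in arbitrary units, chosen only to reproduce the stated ratios, and are not measurements from the application.

```python
def fold_enrichment(treated, control):
    """Ratio of treated signal to control signal.

    Raises ValueError for a non-positive control, where a ratio is undefined.
    """
    if control <= 0:
        raise ValueError("control signal must be positive")
    return treated / control


# Hypothetical luminescence readings (arbitrary units), for illustration only.
fused_guided, fused_unguided = 1290.0, 10.0
unfused_guided, unfused_unguided = 850.0, 100.0

fused_fold = fold_enrichment(fused_guided, fused_unguided)        # 129.0
unfused_fold = fold_enrichment(unfused_guided, unfused_unguided)  # 8.5
```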
  • FIG. 21 depicts the PCR readout of the R-loop insertion assay mediated by the MG92-4
  • the generated PCR product was sequence confirmed to correspond to the insertion product generated by R-loop insertion by Sanger sequencing. Lanes are: (1) MG92-4 positive control targeting ssDNA, (2) no target negative control, (3) no cargo negative control, (4) no dSpyCas9 negative control.
  • Lanes showing transposition within the R-loop in dsDNA with the insertion motif TTAC 2-12 bases from the PAM are: (5) MG92-4 with dsDNA target_ttac_2-6 92-4, (6) MG92-4 with dsDNA target_ttac_4-8 92-4, (7) MG92-4 with dsDNA target_ttac_6-10 92-4, (8) MG92-4 with dsDNA target_ttac_8-12 92-4, (9) MG92-4 with dsDNA target_ttac_10-14 92-4, and (10) MG92-4 with dsDNA target_ttac_12-16 92-4.
  • FIG. 22 depicts the PCR readout of R-loop insertion assay mediated by the MG179-36.
  • the generated PCR product was sequence confirmed to correspond to the insertion product generated by R-loop insertion by Sanger sequencing.
  • Lanes are: (1) MG179-36 positive control targeting ssDNA, (2) no target negative control, (3) no cargo negative control, (4) no dSpyCas9 negative control.
  • Lanes showing transposition within the R-loop in dsDNA with the insertion motif 2-12 bases from the PAM are: (5) MG179-36 with dsDNA target_ttac_2-6 179s, (6) MG179-36 with dsDNA target_ttac_4-8 179s, (7) MG179-36 with dsDNA target_ttac_6-10 179s, (8) MG179-36 with dsDNA target_ttac_8-12 179s, (9) MG179-36 with dsDNA target_ttac_10-14 179s, and (10) MG179-36 with dsDNA target_ttac_12-16 179s.
  • FIG. 23 depicts the PCR readout of R-loop insertion assay mediated by MG179-58.
  • the generated PCR product was sequence confirmed to correspond to the insertion product generated by R-loop insertion by Sanger sequencing.
  • Lanes are: (1) MG179-58 positive control targeting ssDNA, (2) no target negative control, (3) no cargo negative control, (4) no dSpyCas9 negative control.
  • Lanes showing transposition within the R-loop in dsDNA with the insertion motif 2-12 bases from the PAM are: (5) MG179-58 with dsDNA target_ttac_2-6 179s, (6)
  • FIGs. 24A and 24B depict the PCR readout of R-loop insertion assay and Sanger sequencing mediated by MG179-60.
  • FIG. 24A Lanes are: (1) MG179-60 positive control targeting ssDNA, (2) no target negative control, (3) no cargo negative control, (4) no dSpyCas9 negative control.
  • Lanes showing transposition within the R-loop in dsDNA with the insertion motif 2-12 bases from the PAM are: (5) MG179-60 with dsDNA target_ttac_2-6 179s, (6) MG179-60 with dsDNA target_ttac_4-8 179s, (7) MG179-60 with dsDNA target_ttac_6-10 179s, (8) MG179-60 with dsDNA target_ttac_8-12 179s, (9) MG179-60 with dsDNA target_ttac_10-14 179s, and (10) MG179-60 with dsDNA target_ttac_12-16 179s.
  • FIG. 24B Sanger sequencing of lane 9 which shows insertion at the expected position in the R-loop.
  • FIGs. 25A-25C depict schematic illustrations of LE re-programming assay cargo.
  • FIG. 25A Schematic illustration of the introduced N modifications in positions 3 or 4 of the LE cleavage motif and hairpin-adjacent motifs (HAM) of the cargo sequences.
  • FIG. 25B Schematic illustration of single stranded DNA targets which also contain an N in positions 3 or 4 that were then used with the modified cargos depicted in FIG. 25A.
  • FIG. 25C Schematic illustration of LE cleavage motif - HAM interactions within the LE sequence.
  • FIGs. 26A-26B depict heatmaps showing TTNT and TTAN NGS re-programming summary data for MG92-4.
  • Cleavage motif/HAM combinations for MG92-4 were quantified based on the number of times the combinations specified in the heat map were present in the NGS reads.
  • FIG. 26A Heat map of the cleavage motif/HAM combinations for MG92-4 where the 3rd position within the 4 nt cleavage motif of MG92-4 is variable.
  • FIG. 26B Heat map of the cleavage motif/HAM combinations for MG92-4 where the 4th position within the 4 nt cleavage motif of MG92-4 is variable.
  • FIGs. 27A-27B depict heatmaps showing TCNT NGS re-programming summary data for MG179-36 and MG179-60. Cleavage motif/HAM combinations for MG179-36 and MG179-60 were quantified based on the number of times the combinations specified in the heat map were present in the NGS reads.
  • FIG. 27A Heat map of the cleavage motif/HAM combinations for MG179-36 where the 3rd position within the 4 nt cleavage motif of MG179-36 is variable.
  • FIG. 27B Heat map of the cleavage motif/HAM combinations for MG179-60 where the 3rd position within the 4 nt cleavage motif of MG179-60 is variable.
  • FIGs. 28A-28C depict heatmaps showing TCAN NGS re-programming summary data for MG179-36, MG179-58, and MG179-60.
  • Cleavage motif/HAM combinations for MG179-36, MG179-58, and MG179-60 were quantified based on the number of times the combinations specified in the heat map were present in the NGS reads.
  • FIG. 28A Heat map of the cleavage motif/HAM combinations for MG179-36 where the 4th position within the 4 nt cleavage motif of MG179-36 is variable.
  • FIG. 28B Heat map of the cleavage motif/HAM combinations for MG179-58 where the 4th position within the 4 nt cleavage motif of MG179-58 is variable.
  • FIG. 28C Heat map of the cleavage motif/HAM combinations for MG179-60 where the 4th position within the 4 nt cleavage motif of MG179-60 is variable.
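
The heat maps in FIGs. 26-28 are described as counts of how often each cleavage motif/HAM combination appears in the NGS reads. A minimal sketch of that tally follows, assuming each read has already been reduced to a (cleavage motif, HAM) pair; the HAM labels and observations are hypothetical placeholders, and the extraction step from raw reads is not shown.

```python
from collections import Counter


def count_motif_ham_pairs(pairs):
    """Count how many reads carry each (cleavage_motif, HAM) combination."""
    return Counter(pairs)


def to_matrix(counts, motifs, hams):
    """Arrange counts as rows = cleavage motifs, columns = HAMs.

    Combinations that were never observed are filled with zero.
    """
    return [[counts.get((motif, ham), 0) for ham in hams] for motif in motifs]


# Hypothetical per-read observations (placeholder HAM labels).
example_pairs = [("TTAT", "HAM_a"), ("TTAT", "HAM_a"), ("TTCT", "HAM_b")]
example_counts = count_motif_ham_pairs(example_pairs)
example_matrix = to_matrix(example_counts, ["TTAT", "TTCT"], ["HAM_a", "HAM_b"])
```

The resulting nested list can be passed to any plotting routine that renders a matrix as a heat map.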
  • SEQ ID NOs: 1-349 show the full-length amino acid sequences of MG92 transposition proteins.
  • SEQ ID NOs: 350-454 show the amino acid sequences of MG92 transposon ends
  • SEQ ID NOs: 478-481, 485-488, and 507-511 show the nucleotide sequences of MG92 transposon ends.
  • TTAT and TTAG are the nucleotide sequences of MG92 insertion motifs.
  • SEQ ID NOs: 471-474, 503-506, and 528 show the nucleotide sequences of MG92 transposon cargos.
  • SEQ ID NO: 527 shows the nucleotide sequence of the MG92-4 coding sequence codon optimized for expression in mammalian cells.
  • SEQ ID NOs: 539-544 show the nucleotide sequence of R-loop ds targets for MG92 transposition proteins.
  • SEQ ID NOs: 551-556 show the nucleotide sequence of sgRNA suitable for MG92 transposition proteins.
  • SEQ ID NOs: 530-538 show the full-length amino acid sequences of MG179 transposition proteins.
  • SEQ ID NOs: 482-484, 489-491, and 512-524 show the nucleotide sequences of MG179 transposon ends.
  • TCAT is the nucleotide sequence of the MG179 insertion motif.
  • SEQ ID NOs: 475-477 and 525-526 show the nucleotide sequences of MG179 transposon cargos.
  • SEQ ID NOs: 545-550 show the nucleotide sequence of R-loop ds targets for MG179 transposition proteins.
  • SEQ ID NOs: 557-562 show the nucleotide sequence of sgRNA suitable for MG179 transposition proteins.
  • SEQ ID NOs: 455-470 and 571-596 show the amino acid sequences of nuclear localization sequences (NLSs) suitable for use with MG92 transposition proteins described herein.
  • SEQ ID NOs: 497-502 show the nucleotide sequences of insertion assay target sequences.
  • SEQ ID NOs: 563-566 show the nucleotide sequences of LE reprogramming IN cargos.
  • SEQ ID NOs: 567-570 show the nucleotide sequences of LE reprogramming IN targets.
  • the term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within one or more than one standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 15%, up to 10%, up to 5%, or up to 1% of a given value.
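
The percentage reading of “about” above can be made concrete with a small check. This is a sketch under the assumption that “up to X% of a given value” is a symmetric relative tolerance around the reference value; the function name and default are illustrative.

```python
def within_about(value, reference, tolerance=0.20):
    """True if `value` lies within `tolerance` (a fraction) of `reference`.

    tolerance=0.20 corresponds to "up to 20%"; 0.15, 0.10, 0.05, and 0.01
    correspond to the narrower ranges listed in the definition.
    """
    if reference == 0:
        return value == 0
    return abs(value - reference) <= tolerance * abs(reference)
```

For example, under the 10% reading a measurement of 108 counts as "about 100", while 125 does not count as "about 100" even under the widest 20% reading.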
  • nucleotide refers to a base-sugar-phosphate combination.
  • Contemplated nucleotides include naturally occurring nucleotides and synthetic nucleotides.
  • Nucleotides are monomeric units of a nucleic acid sequence (e.g., deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)).
  • the term nucleotide includes ribonucleoside triphosphates adenosine triphosphate (ATP), uridine triphosphate (UTP), cytidine triphosphate (CTP), guanosine triphosphate (GTP) and deoxyribonucleoside triphosphates such as dATP, dCTP, dITP, dUTP, dGTP, dTTP, or derivatives thereof.
  • derivatives include, for example, [αS]dATP, 7-deaza-dGTP and 7-deaza-dATP, and nucleotide derivatives that confer nuclease resistance on the nucleic acid molecule containing them.
  • nucleotide as used herein encompasses dideoxyribonucleoside triphosphates (ddNTPs) and their derivatives.
  • ddNTPs include, but are not limited to, ddATP, ddCTP, ddGTP, ddITP, and ddTTP.
  • a nucleotide may be unlabeled or detectably labeled, such as using moieties comprising optically detectable moieties (e.g., fluorophores) or quantum dots.
  • Detectable labels include, for example, radioactive isotopes, fluorescent labels, chemiluminescent labels, bioluminescent labels, and enzyme labels.
  • Fluorescent labels of nucleotides include but are not limited to fluorescein, 5-carboxyfluorescein (FAM), 2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein (JOE), rhodamine, 6-carboxyrhodamine (R6G), N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA), 6-carboxy-X-rhodamine (ROX), 4-(4'-dimethylaminophenylazo) benzoic acid (DABCYL), Cascade Blue, Oregon Green, Texas Red, Cyanine and 5-(2'-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS).
  • fluorescently labeled nucleotides include [R6G]dUTP, [TAMRA]dUTP, [R110]dCTP, [R6G]dCTP, [TAMRA]dCTP, [JOE]ddATP.
  • nucleotide encompasses chemically modified nucleotides.
  • An exemplary chemically-modified nucleotide is biotin-dNTP.
  • biotinylated dNTPs include biotin-dATP (e.g., bio-N6-ddATP, biotin-14-dATP), biotin-dCTP (e.g., biotin-11-dCTP, biotin-14-dCTP), and biotin-dUTP (e.g., biotin-11-dUTP, biotin-16-dUTP, biotin-20-dUTP).
  • polynucleotide, oligonucleotide, and nucleic acid refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof, either in single-, double-, or multi-stranded form.
  • Contemplated polynucleotides include a gene or fragment thereof.
  • Exemplary polynucleotides include, but are not limited to, DNA, RNA, coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, cell-free polynucleotides including cell-free DNA (cfDNA) and cell-free RNA (cfRNA), nucleic acid probes, and primers.
  • a T means U (Uracil) in RNA and T (Thymine) in DNA.
  • a polynucleotide can be exogenous or endogenous to a cell and/or exist in a cell-free environment.
  • the term polynucleotide encompasses modified polynucleotides (e.g., altered backbone, sugar, or nucleobase). If present, modifications to the nucleotide structure are imparted before or after assembly of the polymer.
  • Non-limiting examples of modifications include: 5-bromouracil, peptide nucleic acid, xeno nucleic acid, morpholinos, locked nucleic acids, glycol nucleic acids, threose nucleic acids, dideoxynucleotides, cordycepin, 7-deaza-GTP, fluorophores (e.g., rhodamine or fluorescein linked to the sugar), thiol-containing nucleotides, biotin-linked nucleotides, fluorescent base analogs, CpG islands, methyl-7-guanosine, methylated nucleotides, inosine, thiouridine, pseudouridine, dihydrouridine, queuosine, and wyosine.
  • the sequence of nucleotides may be interrupted by non-nucleotide components.
  • peptide refers to a polymer of at least two amino acid residues joined by peptide bond(s). This term does not connote a specific length of polymer, nor is it intended to imply or distinguish whether the peptide is produced using recombinant techniques, chemical or enzymatic synthesis, or is naturally occurring. The terms apply to naturally occurring amino acid polymers as well as amino acid polymers comprising at least one modified amino acid. In some embodiments, the polymer is interrupted by non-amino acids. The terms include amino acid chains of any length, including full length proteins, and proteins with or without secondary or tertiary structure (e.g., domains).
  • amino acid polymer that has been modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, oxidation, and any other manipulation such as conjugation with a labeling component.
  • amino acid and amino acids refer to natural and non-natural amino acids, including, but not limited to, modified amino acids.
  • Modified amino acids include amino acids that have been chemically modified to include a group or a chemical moiety not naturally present on the amino acid.
  • amino acid includes both D-amino acids and L-amino acids.
  • operably linked refers to an arrangement of genetic elements, e.g., a promoter, an enhancer, a polyadenylation sequence, etc., wherein an operation (e.g., movement or activation) of a first genetic element has some effect on the second genetic element.
  • the effect on the second genetic element can be, but need not be, of the same type as operation of the first genetic element.
  • two genetic elements are operably linked if movement of the first element causes an activation of the second element.
  • a regulatory element, which may comprise promoter and/or enhancer sequences, is operatively linked to a coding region if the regulatory element helps initiate transcription of the coding sequence. There may be intervening residues between the regulatory element and coding region so long as this functional relationship is maintained.
  • a “functional fragment” of a DNA or protein sequence refers to a fragment that retains a biological activity (either functional or structural) that is substantially similar to a biological activity of the full-length DNA or protein sequence.
  • a biological activity of a DNA sequence includes its ability to influence expression in a manner attributed to the full-length sequence.
  • engineered, synthetic, and artificial are used interchangeably herein to refer to an object that has been modified by human intervention.
  • the terms refer to a polynucleotide or polypeptide that is non-naturally occurring.
  • An engineered peptide has, but does not require, low sequence identity (e.g..
  • VPR and VP64 domains are synthetic transactivation domains.
  • Non-limiting examples include the following: a nucleic acid modified by changing its sequence to a sequence that does not occur in nature; a nucleic acid modified by ligating it to a nucleic acid that it does not associate with in nature such that the ligated product possesses a function not present in the original nucleic acid; an engineered nucleic acid synthesized in vitro with a sequence that does not exist in nature; a protein modified by changing its amino acid sequence to a sequence that does not exist in nature; an engineered protein acquiring a new function or property.
  • An “engineered” system comprises at least one engineered component.
  • the term “complex” refers to a joining of at least two components.
  • the two components may each retain the properties/activities they had prior to forming the complex or gain properties as a result of forming the complex.
  • the joining includes, but is not limited to, covalent bonding, non-covalent bonding (i.e., hydrogen bonding, ionic interactions, Van der Waals interactions, and hydrophobic bond), use of a linker, fusion, or any other suitable method.
  • Contemplated components of the complex include polynucleotides, polypeptides, or combinations thereof.
  • a complex comprises an endonuclease and a guide polynucleotide.
  • transposable element refers to a DNA sequence that can move from one location in the genome to another (i.e., they can be “transposed”).
  • Transposable elements can be generally divided into two classes. Class I transposable elements, or “retrotransposons”, are transposed via transcription and translation of an RNA intermediate which is subsequently reincorporated into its new location into the genome via reverse transcription (a process mediated by a reverse transcriptase). Class II transposable elements, or “DNA transposons”, are transposed via a complex of single- or double-stranded DNA flanked on either side by a transposase.
  • TnpA refers to the transposase found in members of the IS200/IS605 bacterial insertion sequence (“IS”) family. Unlike other documented IS transposases, which carry out DNA transposition via double-stranded DNA intermediates, TnpA proceeds via a single-stranded DNA intermediate. TnpA also differs from other documented IS transposases in that it contains flanking subterminal palindromic sequences rather than terminal inverted repeats. Further, TnpA inserts 3’ to specific AT-rich tetra- or pentanucleotides without duplication of the target site.
  • TnpA belongs to the His-hydrophobic-His (“HuH”) superfamily of enzymes rather than the “DDE” superfamily of other IS transposases.
  • TnpB refers to an enzyme of undocumented function (though speculated to play a regulatory role in transposition) found alongside TnpA in IS200/IS605 bacteria.
  • IS200/IS605 transposases are “Y1 transposases”, meaning that they are single-domain proteins comprising a single catalytic tyrosine residue.
  • TnpA-like refers to a protein which exhibits one or more functional, structural, biochemical, biophysical, or other properties or characteristics in common with a TnpA protein.
  • TnpB-like refers to a protein which exhibits one or more functional, structural, biochemical, biophysical, or other properties or characteristics in common with a TnpB protein.
  • sequence identity refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a local or global comparison window, as measured using a sequence comparison algorithm.
  • Suitable sequence comparison algorithms for polypeptide sequences include, e.g., BLASTP using parameters of a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment for polypeptide sequences longer than 30 residues; BLASTP using parameters of a wordlength (W) of 2, an expectation (E) of 1000000, and the PAM30 scoring matrix setting gap costs at 9 to open gaps and 1 to extend gaps for sequences of less than 30 residues (these are the default parameters for BLASTP in the BLAST suite available at https://blast.ncbi.nlm.nih.gov); CLUSTALW with parameters of the Smith-Waterman homology search algorithm with parameters of a match of 2, a mismatch of -1, and a gap of -1; MUSCLE with default parameters; MAFFT with parameters retree of 2 and maxiterations of 1000; Novafold with default parameters.
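As an illustration of the percentage described above, the following minimal Python sketch computes percent identity from two sequences that have already been aligned by one of the listed tools. The function name `percent_identity`, the use of `-` as the gap character, and the choice of denominator (all alignment columns that are not gap-only) are assumptions made for this example; alignment programs differ in how they define the denominator, so values may not match a given tool's report.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: '-' marks a gap, and the denominator is the number of
    alignment columns in which at least one sequence has a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    columns = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # gap-only column carries no information
        columns += 1
        if x == y:  # both are residues here, since gap-only columns were skipped
            matches += 1

    if columns == 0:
        raise ValueError("alignment contains no comparable columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Toy sequences, not taken from the sequence listing.
    print(percent_identity("AC-GT", "ACAGT"))
```

The alignment step itself (BLASTP, MUSCLE, MAFFT, and so on) is outside this sketch; only the counting step is shown.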
  • variants of any of the enzymes described herein with one or more conservative amino acid substitutions can be made in the amino acid sequence of a polypeptide without disrupting the three-dimensional structure or function of the polypeptide.
  • Conservative substitutions can be accomplished by substituting amino acids with similar hydrophobicity, polarity, and R chain length for one another. Additionally or alternatively, by comparing aligned sequences of homologous proteins from different species, conservative substitutions can be identified by locating amino acid residues that have been mutated between species (e.g., non-conserved residues) without altering the basic functions of the encoded proteins.
  • conservatively substituted variants may include variants with at least about 20%.
  • Such functional variants can encompass sequences with substitutions such that the activity of critical active site residues of the endonuclease are not disrupted.
  • a functional variant of any of the proteins described herein lacks substitution of at least one of the residues predicted as essential.
  • a functional variant of any of the proteins described herein lacks substitution of all of the residues predicted as essential.
  • a functional variant of any of the proteins described herein lacks substitution of at least one of the conserved or functional residues called out in FIG. 1B.
  • a functional variant of any of the proteins described herein lacks substitution of all of the conserved or functional residues called out in FIG. 1B.
  • a decreased activity variant as a protein described herein comprises a disrupting substitution of at least one, at least two, or all three catalytic residues called out in FIG. 1B.
  • Conservative substitution tables providing functionally similar amino acids are available from a variety of references (see, e.g., Creighton, Proteins: Structures and Molecular Properties (W H Freeman & Co.; 2nd edition (December 1993))). The following eight groups each contain amino acids that are conservative substitutions for one another:
  • the discovery of new transposable elements with unique functionality and structure may offer the potential to further disrupt deoxyribonucleic acid (DNA) editing technologies, improving speed, specificity, functionality, and ease of use.
  • Metagenomic sequencing from natural environmental niches containing large numbers of microbial species may offer the potential to drastically increase the number of new transposable elements documented and speed the discovery of new oligonucleotide editing functionalities.
  • Transposable elements are deoxyribonucleic acid sequences that can change position within a genome, often resulting in the generation or amelioration of mutations. In eukaryotes, a great proportion of the genome, and a large share of the mass of cellular DNA, is attributable to transposable elements. Although transposable elements are “selfish genes” which propagate themselves at the expense of other genes, they have been found to serve various important functions and to be crucial to genome evolution.
  • transposable elements are classified as either Class I “retrotransposons” or Class II “DNA transposons.”
  • Class I transposable elements, also referred to as retrotransposons, function according to a two-part “copy and paste” mechanism involving an RNA intermediate.
  • the retrotransposon is transcribed.
  • the resulting RNA is subsequently converted back to DNA by reverse transcriptase (generally encoded by the retrotransposon itself), and the reverse transcribed retrotransposon is finally integrated into its new position in the genome by integrase.
  • Retrotransposons are further classified into three orders. Retrotransposons with long terminal repeats (“LTRs”) encode reverse transcriptase and are flanked by long strands of repeating DNA.
  • Retrotransposons with long interspersed nuclear elements (“LINEs”) encode reverse transcriptase, lack LTRs, and are transcribed by RNA polymerase II.
  • Retrotransposons with short interspersed nuclear elements (“SINEs”) are transcribed by RNA polymerase III but lack reverse transcriptase, instead relying on the reverse transcription machinery of other transposable elements (e.g., LINEs).
  • Class II transposable elements, also referred to as DNA transposons, function according to mechanisms that do not involve an RNA intermediate.
  • Many DNA transposons display a “cut and paste” mechanism in which transposase binds terminal inverted repeats (“TIRs”) flanking the transposon, cleaves the transposon from the donor region, and inserts it into the target region of the genome.
  • Others, referred to as “helitrons,” display a “rolling circle” mechanism involving a single-stranded DNA intermediate and mediated by an undocumented protein believed to possess HUH endonuclease function and 5’ to 3’ helicase activity. First, a circular strand of DNA is nicked to create two single DNA strands.
  • the protein remains attached to the 5’ phosphate of the nicked strand, leaving the 3’ hydroxyl end of the complementary strand exposed and thus allowing a polymerase to replicate the non-nicked strand.
  • the new strand disassociates and is itself replicated along with the original template strand.
  • Still other DNA transposons, “Polintons,” are theorized to undergo a “self-synthesis” mechanism.
  • the transposition is initiated by an integrase’s excision of a single-stranded extra-chromosomal Polinton element, which forms a racket-like structure.
  • the Polinton undergoes replication with DNA polymerase B, and the double stranded Polinton is inserted into the genome by the integrase.
  • DNA transposons, such as those in the IS200/IS605 family, proceed via a “peel and paste” mechanism in which TnpA excises a piece of single-stranded DNA (as a circular “transposon joint”) from the lagging strand template of the donor gene and reinserts it into the replication fork of the target gene.
  • While transposable elements have found some use as biological tools, documented transposable elements do not encompass the full range of possible biodiversity and targetability, and may not represent all possible activities. Here, thousands of genomic fragments were mined from numerous metagenomes for transposable elements. The documented diversity of transposable elements may have been expanded and novel systems may have been developed into highly targetable, compact, and precise gene editing agents.
Gene Editing Systems
  • transposase systems comprising a transposase and a cargo nucleotide sequence.
  • the transposase is a MG92 transposase (i.e., SEQ ID NOs: 1-349).
  • the transposase is a MG179 transposase (i.e., SEQ ID NOs: 530-538).
  • the engineered transposase system is discovered through metagenomic sequencing.
  • the metagenomic sequencing is conducted on samples collected from various environments.
  • the environment is a human microbiome, an animal microbiome, an environment with high temperatures, an environment with low temperatures, or sediment.
  • the transposase comprises a sequence having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 1-349 and 530-538.
  • the transposase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 1-349 and 530-538.
  • the transposase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 1-349 and 530-538. In some embodiments, the transposase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 1-349 and 530-538. In some embodiments, the transposase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 1-349 and 530-538. In some embodiments, the transposase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 1-349 and 530-538.
  • the transposase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 1-349 and 530-538. In some embodiments, the transposase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 1-349 and 530-538. In some embodiments, the transposase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 1-349 and 530-538. In some embodiments, the transposase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 1-349 and 530-538.
  • the transposase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 1-349 and 530-538. In some embodiments, the transposase comprises a sequence having 100% identity to any one of SEQ ID NOs: 1-349 and 530-538.
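The tiered identity language above can be read as a lookup against a fixed list of thresholds. The sketch below is a hedged illustration only: it assumes percent identities to each reference sequence have already been computed by some alignment tool, and it reports the best-matching reference together with the highest listed tier that the match reaches. The name `highest_tier_met`, the dictionary input, and the strict `>=` reading of "at least about" are all assumptions of this example, not definitions from the text.

```python
from typing import Dict, Optional, Tuple

# Identity tiers (percent) recited in the embodiments above.
TIERS = (30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 96, 97, 98, 99)


def highest_tier_met(
    identities_by_reference: Dict[str, float],
) -> Optional[Tuple[str, int]]:
    """Return (best reference id, highest tier reached), or None if no tier is met.

    Assumption: values are percent identities (0-100) of one candidate
    sequence against each reference, computed elsewhere.
    """
    if not identities_by_reference:
        return None

    # "Any one of" the references: only the best match matters.
    best_ref, best_identity = max(
        identities_by_reference.items(), key=lambda item: item[1]
    )
    met = [tier for tier in TIERS if best_identity >= tier]
    return (best_ref, met[-1]) if met else None


if __name__ == "__main__":
    # Hypothetical reference labels and values, for illustration only.
    print(highest_tier_met({"ref_a": 72.4, "ref_b": 88.1}))
```

How much tolerance the word "about" allows is a matter of interpretation that this sketch deliberately does not model.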
  • the transposase is a MG92 transposase (i.e., SEQ ID NOs: 1-349).
  • the transposase comprises a sequence having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 1-349.
  • the transposase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 1-349. In some embodiments, the transposase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 1-349. In some embodiments, the transposase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 1-349. In some embodiments, the transposase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 1-349. In some embodiments, the transposase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 1-349.
  • the transposase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 1-349. In some embodiments, the transposase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 1-349. In some embodiments, the transposase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 1-349. In some embodiments, the transposase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 1-349. In some embodiments, the transposase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 1-349. In some embodiments, the transposase comprises a sequence having 100% identity to any one of SEQ ID NOs: 1-349.
  • the transposase is a MG179 transposase (i.e., SEQ ID NOs: 530-538).
  • the transposase comprises a sequence having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 530-538.
  • the transposase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 530-538. In some embodiments, the transposase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 530-538. In some embodiments, the transposase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 530-538. In some embodiments, the transposase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 530-538. In some embodiments, the transposase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 530-538.
  • the transposase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 530-538. In some embodiments, the transposase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 530-538. In some embodiments, the transposase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 530-538. In some embodiments, the transposase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 530-538. In some embodiments, the transposase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 530-538. In some embodiments, the transposase comprises a sequence having 100% identity to any one of SEQ ID NOs: 530-538.
  • the transposase is encoded by a nucleic acid sequence that is codon optimized. In some embodiments, the transposase is encoded by a nucleic acid sequence that is codon optimized for expression in a mammalian cell. In some embodiments, the transposase is encoded by a nucleic acid sequence having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%.
  • the transposase is encoded by a nucleic acid sequence having at least 70% sequence identity with the nucleic acid sequence of any one of SEQ ID NO: 527. In some embodiments, the transposase is encoded by a nucleic acid sequence having at least 75% sequence identity with the nucleic acid sequence of any one of SEQ ID NO: 527. In some embodiments, the transposase is encoded by a nucleic acid sequence having at least 80% sequence identity with the nucleic acid sequence of any one of SEQ ID NO: 527.
  • the transposase is encoded by a nucleic acid sequence having at least 85% sequence identity with the nucleic acid sequence of any one of SEQ ID NO: 527. In some embodiments, the transposase is encoded by a nucleic acid sequence having at least 90% sequence identity with the nucleic acid sequence of any one of SEQ ID NO: 527. In some embodiments, the transposase is encoded by a nucleic acid sequence having at least 95% sequence identity with the nucleic acid sequence of any one of SEQ ID NO: 527. In some embodiments, the transposase is encoded by a nucleic acid sequence having at least 96% sequence identity with the nucleic acid sequence of any one of SEQ ID NO: 527.
  • the transposase is encoded by a nucleic acid sequence having at least 97% sequence identity with the nucleic acid sequence of any one of SEQ ID NO: 527. In some embodiments, the transposase is encoded by a nucleic acid sequence having at least 98% sequence identity with the nucleic acid sequence of any one of SEQ ID NO: 527. In some embodiments, the transposase is encoded by a nucleic acid sequence having at least 99% sequence identity with the nucleic acid sequence of any one of SEQ ID NO: 527. In some embodiments, the transposase is encoded by a nucleic acid sequence of any one of SEQ ID NO: 527.
  • the transposase has at least equivalent transposition activity to TnpA transposase.
  • the transposase comprises a nuclear localization sequence (NLS).
  • the NLS is at an N-terminus of the transposase.
  • the NLS is at a C-terminus of the transposase.
  • the NLS is at an N-terminus and a C-terminus of the transposase.
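The three NLS placements described above (N-terminus, C-terminus, or both) can be sketched as simple string concatenation at the protein level. This is a minimal illustration under stated assumptions: the function `add_nls` and its `position` argument are hypothetical, the example NLS is the widely documented SV40 large T antigen motif used as a stand-in rather than a sequence from the listing, and practical construct details such as linkers or placement of the initiator methionine are ignored.

```python
# Stand-in NLS for illustration (SV40 large T antigen motif); it is NOT
# asserted to be one of the NLS sequences recited in this document.
EXAMPLE_NLS = "PKKKRKV"


def add_nls(protein: str, nls: str, position: str = "n") -> str:
    """Return the protein sequence with an NLS fused at the chosen terminus.

    position: "n" (N-terminus), "c" (C-terminus), or "both".
    Assumption: direct fusion with no linker and no start-codon handling.
    """
    if position == "n":
        return nls + protein
    if position == "c":
        return protein + nls
    if position == "both":
        return nls + protein + nls
    raise ValueError("position must be 'n', 'c', or 'both'")


if __name__ == "__main__":
    toy_protein = "MTEST"  # toy sequence, not a real transposase
    for where in ("n", "c", "both"):
        print(where, add_nls(toy_protein, EXAMPLE_NLS, where))
```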
  • the NLS comprises a sequence of any one of SEQ ID NOs: 455-470 and 571-596, or a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about
  • the NLS comprises a sequence having at least about 80% identity to SEQ ID NOs: 455-470 and 571-596. In some cases, the NLS comprises a sequence having at least about 85% identity to SEQ ID NOs: 455-470 and 571-596. In some cases, the NLS comprises a sequence having at least about 90% identity to SEQ ID NOs: 455-470 and 571-596. In some cases, the NLS comprises a sequence having at least about 91% identity to SEQ ID NOs: 455-470 and 571-596.
  • the NLS comprises a sequence having at least about 92% identity to SEQ ID NOs: 455-470 and 571-596. In some cases, the NLS comprises a sequence having at least about 93% identity to SEQ ID NOs: 455-470 and 571-596. In some cases, the NLS comprises a sequence having at least about 94% identity to SEQ ID NOs: 455-470 and 571-596. In some cases, the NLS comprises a sequence having at least about 95% identity to SEQ ID NOs: 455-470 and 571-596. In some cases, the NLS comprises a sequence having at least about 96% identity to SEQ ID NOs: 455-470 and 571-596.
  • the NLS comprises a sequence having at least about 97% identity to SEQ ID NOs: 455-470 and 571-596. In some cases, the NLS comprises a sequence having at least about 98% identity to SEQ ID NOs: 455-470 and 571-596. In some cases, the NLS comprises a sequence having at least about 99% identity to SEQ ID NOs: 455-470 and 571-596. In some cases, the NLS comprises a sequence having 100% identity to SEQ ID NOs: 455-470 and 571-596.
  • Described herein, in certain embodiments, are engineered transposase systems comprising a transposase and a cargo nucleotide sequence.
  • the transposase is configured to interact with the cargo nucleotide sequence and transpose the cargo nucleotide sequence to a target nucleic acid site.
  • the cargo nucleotide sequence is flanked by a flanking sequence recognized by the transposase.
  • the cargo nucleotide sequence is double stranded. In some embodiments, the cargo nucleotide sequence is double stranded DNA. In some embodiments, the cargo nucleotide sequence is single stranded. In some embodiments, the cargo nucleotide sequence is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.
  • the target nucleic acid is double stranded. In some embodiments, the target nucleic acid is double stranded DNA. In some embodiments, the target nucleic acid is single stranded. In some embodiments, the target nucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.
  • the cargo nucleotide comprises a nucleic acid sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%.
  • the cargo nucleotide comprises a nucleic acid sequence having at least 70% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 471-474, 503-506, 528, 475-477, 525-526. In some embodiments, the cargo nucleotide comprises a nucleic acid sequence having at least 75% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 471-474, 503-506, 528, 475-477, 525-526.
  • the cargo nucleotide comprises a nucleic acid sequence having at least 80% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 471-474, 503-506, 528, 475-477, 525-526. In some embodiments, the cargo nucleotide comprises a nucleic acid sequence having at least 85% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 471-474, 503-506, 528, 475-477, 525-526. In some embodiments, the cargo nucleotide comprises a nucleic acid sequence having at least 90% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 471-474, 503-506, 528, 475-477.
  • the cargo nucleotide comprises a nucleic acid sequence having at least 95% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 471-474, 503-506, 528, 475-477, 525-526. In some embodiments, the cargo nucleotide comprises a nucleic acid sequence having at least 96% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 471-474, 503-506, 528, 475-477, 525-526.
  • the cargo nucleotide comprises a nucleic acid sequence having at least 97% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 471-474, 503-506, 528, 475-477, 525-526. In some embodiments, the cargo nucleotide comprises a nucleic acid sequence having at least 98% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 471-474, 503-506, 528, 475-477, 525-526. In some embodiments, the cargo nucleotide comprises a nucleic acid sequence having at least 99% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 471-474, 503-506.
  • the cargo nucleotide comprises a nucleic acid sequence of any one of SEQ ID NOs: 471-474, 503-506, 528, 475-477, 525-526.
  • the cargo nucleotide is configured to interact with the transposase through transposon ends (e.g., flanking sequences).
  • the cargo nucleotide is flanked by a first flanking sequence at a 5’ end of the cargo nucleotide and a second flanking sequence at a 3’ end of the cargo nucleotide.
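The arrangement described above, a cargo with a first flanking sequence at its 5’ end and a second flanking sequence at its 3’ end, can be modeled as a small data structure. The sketch below is illustrative only: the class name `DonorConstruct`, its field names, and the toy sequences are assumptions of this example and are not drawn from the sequence listing.

```python
from dataclasses import dataclass
from typing import Tuple

DNA_ALPHABET = set("ACGTN")


@dataclass(frozen=True)
class DonorConstruct:
    """Cargo flanked by a 5' (left) and a 3' (right) flanking sequence."""

    left_flank: str
    cargo: str
    right_flank: str

    def __post_init__(self) -> None:
        # Reject anything that is not a plain DNA string (assumed alphabet).
        for name in ("left_flank", "cargo", "right_flank"):
            value = getattr(self, name)
            if not value or not set(value.upper()) <= DNA_ALPHABET:
                raise ValueError(f"{name} must be a non-empty DNA sequence")

    def sequence(self) -> str:
        """Full 5'->3' sequence: left flank, cargo, right flank."""
        return self.left_flank + self.cargo + self.right_flank

    def cargo_span(self) -> Tuple[int, int]:
        """0-based, end-exclusive coordinates of the cargo in sequence()."""
        start = len(self.left_flank)
        return start, start + len(self.cargo)


if __name__ == "__main__":
    donor = DonorConstruct(left_flank="AAT", cargo="GGCC", right_flank="TTA")
    print(donor.sequence(), donor.cargo_span())
```

Keeping the flanks and cargo as separate fields makes it straightforward to swap in a different cargo while leaving the transposon ends unchanged.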
  • the flanking sequence comprises a nucleic acid sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to any one of SEQ ID NOs: 478-481, 485-488, 507-511, 482-484, 489-491, and 512-524.
  • the flanking sequence comprises a nucleic acid sequence having at least 70% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 478-481, 485-488, 507-511, 482-484, 489-491, and 512-524. In some embodiments, the flanking sequence comprises a nucleic acid sequence having at least 75% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 478-481, 485-488, 507-511, 482-484, 489-491, and 512-524.
  • the flanking sequence comprises a nucleic acid sequence having at least 80% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 478-481, 485-488, 507-511, 482-484, 489-491, and 512-524. In some embodiments, the flanking sequence comprises a nucleic acid sequence having at least 85% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 478-481, 485-488, 507-511, 482-484, 489-491, and 512-524.
  • the flanking sequence comprises a nucleic acid sequence having at least 90% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 478-481, 485-488, 507-511, 482-484, 489-491, and 512-524. In some embodiments, the flanking sequence comprises a nucleic acid sequence having at least 95% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 478-481, 485-488, 507-511, 482-484, 489-491, and 512-524.
  • the flanking sequence comprises a nucleic acid sequence having at least 96% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 478-481, 485-488, 507-511, 482-484, 489-491, and 512-524. In some embodiments, the flanking sequence comprises a nucleic acid sequence having at least 97% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 478-481, 485-488, 507-511, 482-484, 489-491, and 512-524.
  • the flanking sequence comprises a nucleic acid sequence having at least 98% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 478-481, 485-488, 507-511, 482-484, 489-491, and 512-524. In some embodiments, the flanking sequence comprises a nucleic acid sequence having at least 99% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 478-481, 485-488, 507-511, 482-484, 489-491, and 512-524. In some embodiments, the flanking sequence comprises a nucleic acid sequence of any one of SEQ ID NOs: 478-481, 485-488, 507-511, 482-484, 489-491, and 512-524.
  • the flanking sequence comprises a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to any one of SEQ ID NOs: 350-454.
  • the flanking sequence comprises a sequence having at least 70% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 350-454. In some embodiments, the flanking sequence comprises a sequence having at least 75% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 350-454. In some embodiments, the flanking sequence comprises a sequence having at least 80% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 350-454. In some embodiments, the flanking sequence comprises a sequence having at least 85% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 350-454.
  • the flanking sequence comprises a sequence having at least 90% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 350-454. In some embodiments, the flanking sequence comprises a sequence having at least 95% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 350-454. In some embodiments, the flanking sequence comprises a sequence having at least 96% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 350-454. In some embodiments, the flanking sequence comprises a sequence having at least 97% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 350-454.
  • the flanking sequence comprises a sequence having at least 98% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 350-454. In some embodiments, the flanking sequence comprises a sequence having at least 99% sequence identity with the nucleic acid sequence of any one of SEQ ID NOs: 350-454. In some embodiments, the flanking sequence comprises a sequence of any one of SEQ ID NOs: 350-454.
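The identity thresholds recited above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (with `-` for gaps) and that identity is the number of matching columns divided by the alignment length; both are assumptions made for illustration, since alignment tools differ in how they choose the denominator. The function names `percent_identity` and `meets_threshold` are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumption: inputs are pre-aligned, equal-length strings using '-' for gaps.
    Columns where both sequences have a gap are ignored; a column counts as a
    match only when both residues are identical and neither is a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold_percent: float) -> bool:
    """True when the aligned pair reaches the stated identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent
```

For example, two aligned 10-mers that differ at one position have 90% identity under this definition, so they satisfy an "at least 90%" threshold but not an "at least 95%" one.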
  • the cargo nucleotide sequence comprises synthetic nucleotides or modified nucleotides.
  • the cargo polynucleotide comprises one or more inter-nucleoside linkers modified from the natural phosphodiester.
  • all of the inter-nucleoside linkers of the cargo polynucleotide, or contiguous nucleotide sequence thereof, are modified.
  • the inter-nucleoside linkage comprises sulfur (S), such as a phosphorothioate inter-nucleoside linkage.
  • the cargo nucleotide sequence comprises modifications to a ribose sugar or nucleobase.
  • the cargo nucleotide sequence comprises one or more nucleosides comprising a modified sugar moiety, wherein the modified sugar moiety is a modification of the sugar moiety when compared to the ribose sugar moiety found in deoxyribose nucleic acid (DNA) and RNA.
  • the modification is within the ribose ring structure.
  • Exemplary modifications include, but are not limited to, replacement with a hexose ring (HNA), a bicyclic ring having a biradical bridge between the C2 and C4 carbons on the ribose ring (e.g., locked nucleic acids (LNA)), or an unlinked ribose ring which typically lacks a bond between the C2 and C3 carbons (e.g., UNA).
  • the sugar-modified nucleosides comprise bicyclohexose nucleic acids or tricyclic nucleic acids.
  • the modified nucleosides comprise nucleosides where the sugar moiety is replaced with a non-sugar moiety, for example peptide nucleic acids (PNA) or morpholino nucleic acids.
  • the cargo nucleotide sequence comprises one or more modified sugars.
  • the sugar modifications comprise modifications made by altering the substituent groups on the ribose ring to groups other than hydrogen, or the 2’-OH group naturally found in DNA and RNA nucleosides.
  • substituents are introduced at the 2’, 3’, 4’, or 5’ positions, or combinations thereof.
  • nucleosides with modified sugar moieties comprise 2’ modified nucleosides, e.g., 2’ substituted nucleosides.
  • a 2’ sugar modified nucleoside, in some embodiments, is a nucleoside that has a substituent other than -H or -OH at the 2’ position (2’ substituted nucleoside) or comprises a 2’ linked biradical, and comprises 2’ substituted nucleosides and LNA (2’-4’ biradical bridged) nucleosides.
  • Examples of 2’-substituted modified nucleosides comprise, but are not limited to, 2’-O-alkyl-RNA, 2’-O-methyl-RNA, 2’-alkoxy-RNA, 2’-O-methoxyethyl-RNA (MOE), 2’-amino-DNA, 2’-Fluoro-RNA, and 2’-F-ANA nucleosides.
  • the modification in the ribose group comprises a modification at the 2’ position of the ribose group.
  • the modification at the 2’ position of the ribose group is selected from the group consisting of 2’-O-methyl, 2’-fluoro, 2’-deoxy, and 2’-O-(2-methoxyethyl).
  • the cargo nucleotide sequence comprises one or more modified sugars. In some embodiments, the cargo nucleotide sequence comprises only modified sugars. In certain embodiments, the cargo nucleotide sequence comprises greater than about 10%, 25%, 50%, 75%, or 90% modified sugars. In some embodiments, the modified sugar is a bicyclic sugar. In some embodiments, the modified sugar comprises a 2’-O-methoxyethyl group. In some embodiments, the cargo nucleotide sequence comprises both inter-nucleoside linker modifications and nucleoside modifications.
  • the cargo nucleotide sequence is 30-250 nucleotides in length. In some embodiments, the cargo nucleotide sequence is more than 90 nucleotides in length. In some embodiments, the cargo nucleotide sequence is less than 245 nucleotides in length. In some embodiments, the cargo nucleotide sequence is 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 220, 240, or more than 240 nucleotides in length. In some embodiments, the cargo nucleotide sequence is about 30 to about 40.
  • about 50 to about 140, about 50 to about 160, about 50 to about 180, about 50 to about 200, about 50 to about 220, about 50 to about 240, about 100 to about 120, about 100 to about 140, about 100 to about 160, about 100 to about 180, about 100 to about 200, about 100 to about 220, about 100 to about 240, about 160 to about 180, about 160 to about 200, about 160 to about 220, or about 160 to about 240 nucleotides in length.
  • Described herein, in certain embodiments, are engineered transposase systems comprising a transposase and a cargo nucleotide sequence that further may comprise a guide polynucleotide, e.g., a guide ribonucleic acid (gRNA), a single gRNA, or a dual guide RNA.
  • a T means U (Uracil) in RNA and T (Thymine) in DNA.
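As a small illustration of this convention, a sequence written with the DNA alphabet can be read as RNA by substituting U for every T. The helper name `dna_to_rna` is hypothetical, and the sketch assumes plain, unmodified one-letter base codes.

```python
def dna_to_rna(seq: str) -> str:
    """Rewrite a sequence given in the DNA alphabet as RNA (T -> U).

    Assumption: the input uses one-letter base codes; case is preserved and
    all other characters are left untouched.
    """
    return seq.replace("T", "U").replace("t", "u")
```

Under this sketch, a listed sequence such as `ATGCTT` would be read as `AUGCUU` when it describes an RNA molecule such as a guide RNA.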
  • the guide polynucleotide is encoded by any one of SEQ ID NOs: 551-556 or a sequence having at least 90%, 95%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 551-556.
  • the guide polynucleotide comprises a sequence comprising at least about 46-80 consecutive nucleotides having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%.
  • the guide polynucleotide is encoded by a sequence having at least about 80% identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide is encoded by a sequence having at least about 85% identity to any one of SEQ ID NOs: 551-556.
  • the guide polynucleotide is encoded by a sequence having at least about 90% identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide is encoded by a sequence having at least about 95% identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide is encoded by a sequence having at least about 96% identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide is encoded by a sequence having at least about 97% identity to any one of SEQ ID NOs: 551-556.
  • the guide polynucleotide is encoded by a sequence having at least about 98% identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide is encoded by a sequence having at least about 99% identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide is encoded by a sequence having 100% identity to any one of SEQ ID NOs: 551-556.
  • the guide polynucleotide hybridizes or targets a sequence complementary to any one of SEQ ID NOs: 551-556 or a sequence having at least 90%, 95%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 80% identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 85% identity to any one of SEQ ID NOs: 551-556.
  • the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 90% identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 95% identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 96% identity to any one of SEQ ID NOs: 551-556.
  • the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 97% identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 98% identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 99% identity to any one of SEQ ID NOs: 551-556. In some embodiments, the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having 100% identity to any one of SEQ ID NOs: 551-556.
  • the guide polynucleotide is encoded by any one of SEQ ID NOs: 557-562 or a sequence having at least 90%, 95%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 557-562.
  • the guide polynucleotide comprises a sequence comprising at least about 46-80 consecutive nucleotides having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%.
  • the guide polynucleotide is encoded by a sequence having at least about 80% identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide is encoded by a sequence having at least about 85% identity to any one of SEQ ID NOs: 557-562.
  • the guide polynucleotide is encoded by a sequence having at least about 90% identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide is encoded by a sequence having at least about 95% identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide is encoded by a sequence having at least about 96% identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide is encoded by a sequence having at least about 97% identity to any one of SEQ ID NOs: 557-562.
  • the guide polynucleotide is encoded by a sequence having at least about 98% identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide is encoded by a sequence having at least about 99% identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide is encoded by a sequence having 100% identity to any one of SEQ ID NOs: 557-562.
  • the guide polynucleotide hybridizes or targets a sequence complementary to any one of SEQ ID NOs: 557-562 or a sequence having at least 90%, 95%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 80% identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 85% identity to any one of SEQ ID NOs: 557-562.
  • the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 90% identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 95% identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 96% identity to any one of SEQ ID NOs: 557-562.
  • the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 97% identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 98% identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having at least about 99% identity to any one of SEQ ID NOs: 557-562. In some embodiments, the guide polynucleotide hybridizes or targets a sequence complementary to a sequence having 100% identity to any one of SEQ ID NOs: 557-562.
  • the guide polynucleotide is configured to form a complex with the endonuclease. In some embodiments, the guide polynucleotide binds to the endonuclease to form a complex. In some embodiments, the guide polynucleotide binds (e.g., non-covalently through electrostatic interactions or hydrogen bonds) to the endonuclease to form a complex. In some embodiments, the guide polynucleotide is fused to the endonuclease to form a complex. In some embodiments, the guide polynucleotide comprises a spacer sequence.
  • the spacer sequence is configured to hybridize to a target nucleic acid sequence.
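Hybridization of a spacer to a target can be sketched as a Watson-Crick complementarity check. The example below is only an illustration: it assumes an RNA spacer and a DNA target strand, both written 5’ to 3’, and it ignores G-U wobble pairing, bulges, and thermodynamics. The names `spacer_for_target` and `count_mismatches` are hypothetical.

```python
# DNA target base -> pairing RNA spacer base (Watson-Crick only; an assumption)
_DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def spacer_for_target(target_dna: str) -> str:
    """RNA spacer (5'->3') that is fully complementary to a DNA target strand (5'->3').

    The strands pair antiparallel, so the target is read in reverse.
    """
    return "".join(_DNA_TO_RNA_COMPLEMENT[base] for base in reversed(target_dna.upper()))


def count_mismatches(spacer_rna: str, target_dna: str) -> int:
    """Number of positions at which the spacer fails to pair with the target."""
    ideal = spacer_for_target(target_dna)
    if len(ideal) != len(spacer_rna):
        raise ValueError("spacer and target must have the same length")
    return sum(1 for s, i in zip(spacer_rna.upper(), ideal) if s != i)
```

A spacer with zero mismatches is perfectly complementary to the target strand under these assumptions; how many mismatches a given system tolerates is not something this sketch addresses.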
  • the endonuclease is configured to bind to a protospacer adjacent motif (PAM) sequence.
  • the guide polynucleotide targets a gene or locus in a mammalian cell.
  • the mammalian cell is a pig, a cow, a goat, a sheep, a rodent, a rat, a mouse, a non-human primate, or a human cell.
  • the guide polynucleotides comprise various structural elements including but not limited to: a spacer sequence which binds to the protospacer sequence (target sequence), a crRNA, and an optional tracrRNA.
  • the genome editing system comprises a CRISPR guide RNA.
  • the guide RNA comprises a crRNA comprising a spacer sequence.
  • the guide RNA additionally comprises a tracrRNA or a modified tracrRNA.
  • the systems provided herein comprise one or more guide polynucleotides.
  • the guide polynucleotide comprises a sense sequence.
  • the guide polynucleotide comprises an anti-sense sequence.
  • the guide polynucleotide comprises nucleotide sequences other than the region complementary to or substantially complementary to a region of a target sequence.
  • a crRNA is part or considered part of a guide polynucleotide, or is comprised in a guide polynucleotide, e.g., a crRNA:tracrRNA chimera.
  • the guide polynucleotide comprises synthetic nucleotides or modified nucleotides.
  • the guide polynucleotide comprises one or more inter-nucleoside linkers modified from the natural phosphodiester.
  • all of the inter-nucleoside linkers of the guide polynucleotide, or contiguous nucleotide sequence thereof, are modified.
  • the inter-nucleoside linkage comprises sulfur (S), such as a phosphorothioate inter-nucleoside linkage.
  • the guide polynucleotide comprises greater than about 10%, 25%, 50%, 75%, or 90% modified inter-nucleoside linkers. In some embodiments, the guide polynucleotide comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 modified inter-nucleoside linkers (e.g., phosphorothioate inter-nucleoside linkage).
  • the guide polynucleotide comprises modifications to a ribose sugar or nucleobase.
  • the guide polynucleotide comprises one or more nucleosides comprising a modified sugar moiety, wherein the modified sugar moiety is a modification of the sugar moiety when compared to the ribose sugar moiety found in deoxyribose nucleic acid (DNA) and RNA.
  • the modification is within the ribose ring structure.
  • Exemplary modifications include, but are not limited to, replacement with a hexose ring (HNA), a bicyclic ring having a biradical bridge between the C2 and C4 carbons on the ribose ring (e.g., locked nucleic acids (LNA)), or an unlinked ribose ring which typically lacks a bond between the C2 and C3 carbons (e.g., UNA).
  • the sugar-modified nucleosides comprise bicyclohexose nucleic acids or tricyclic nucleic acids.
  • the modified nucleosides comprise nucleosides where the sugar moiety is replaced with a non-sugar moiety, for example peptide nucleic acids (PNA) or morpholino nucleic acids.
  • the guide polynucleotide comprises one or more modified sugars.
  • the sugar modifications comprise modifications made by altering the substituent groups on the ribose ring to groups other than hydrogen, or the 2’-OH group naturally found in DNA and RNA nucleosides.
  • substituents are introduced at the 2’, 3’, 4’, or 5’ positions, or combinations thereof.
  • nucleosides with modified sugar moieties comprise 2’ modified nucleosides, e.g., 2’ substituted nucleosides.
  • a 2’ sugar modified nucleoside, in some embodiments, is a nucleoside that has a substituent other than -H or -OH at the 2’ position (2’ substituted nucleoside) or comprises a 2’ linked biradical, and comprises 2’ substituted nucleosides and LNA (2’-4’ biradical bridged) nucleosides.
  • Examples of 2’-substituted modified nucleosides comprise, but are not limited to, 2’-O-alkyl-RNA, 2’-O-methyl-RNA, 2’-alkoxy-RNA.
  • the modification in the ribose group comprises a modification at the 2’ position of the ribose group.
  • the modification at the 2’ position of the ribose group is selected from the group consisting of 2’-O-methyl, 2’-fluoro, 2’-deoxy, and 2’-O-(2-methoxyethyl).
  • the guide polynucleotide comprises one or more modified sugars. In some embodiments, the guide polynucleotide comprises only modified sugars. In some embodiments, the guide polynucleotide comprises greater than about 10%, 25%, 50%, 75%, or 90% modified sugars. In some embodiments, the modified sugar is a bicyclic sugar. In some embodiments, the modified sugar comprises a 2’-O-methyl. In some embodiments, the modified sugar comprises a 2’-fluoro. In some embodiments, the modified sugar comprises a 2’-O-methoxyethyl group. In some embodiments, the guide polynucleotide comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 modified sugars (e.g., comprising a 2’-O-methyl or 2’-fluoro).
  • the guide polynucleotide comprises both inter-nucleoside linker modifications and nucleoside modifications. In some embodiments, the guide polynucleotide comprises greater than about 10%, 25%, 50%, 75%, or 90% modified inter-nucleoside linkers and greater than about 10%, 25%, 50%, 75%, or 90% modified sugars. In some embodiments, the guide polynucleotide comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 modified inter-nucleoside linkers (e.g., phosphorothioate inter-nucleoside linkage) and 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 modified sugars (e.g., comprising a 2’-O-methyl or 2’-fluoro).
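The counts and percentages of modified sugars and modified inter-nucleoside linkers described above can be tallied with a short sketch. The `Nucleoside` record and the `modification_summary` function are hypothetical names introduced only for illustration; the sketch assumes a linear polynucleotide in which each nucleoside except the last carries one linkage to its 3’ neighbor.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Nucleoside:
    """One position of a polynucleotide (hypothetical record for illustration)."""
    base: str
    sugar_mod: Optional[str] = None    # e.g. "2'-O-methyl" or "2'-fluoro"; None = unmodified sugar
    linkage_mod: Optional[str] = None  # linkage to the next nucleoside, e.g. "phosphorothioate";
                                       # None = natural phosphodiester


def modification_summary(oligo: Sequence[Nucleoside]) -> dict:
    """Count modified sugars and modified linkers and express each as a percentage."""
    n = len(oligo)
    if n == 0:
        raise ValueError("polynucleotide must contain at least one nucleoside")
    n_links = n - 1  # a linear chain of n nucleosides has n - 1 inter-nucleoside linkages
    sugars = sum(1 for x in oligo if x.sugar_mod is not None)
    links = sum(1 for x in oligo[:-1] if x.linkage_mod is not None)
    return {
        "modified_sugars": sugars,
        "modified_sugar_percent": 100.0 * sugars / n,
        "modified_linkers": links,
        "modified_linker_percent": 100.0 * links / n_links if n_links else 0.0,
    }
```

For instance, a 5-mer whose first two nucleosides each carry a 2’-O-methyl sugar and a phosphorothioate linkage has 2 modified sugars (40%) and 2 modified linkers out of 4 (50%), so under this tally it would exceed the "greater than about 25%" levels but not the 75% or 90% levels.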
  • the guide polynucleotide comprises a sequence complementary to a eukaryotic, fungal, plant, mammalian, or human genomic polynucleotide sequence. In some embodiments, the guide polynucleotide comprises a sequence complementary to a eukaryotic genomic polynucleotide sequence. In some embodiments, the guide polynucleotide comprises a sequence complementary to a fungal genomic polynucleotide sequence. In some embodiments, the guide polynucleotide comprises a sequence complementary to a plant genomic polynucleotide sequence.
  • the guide polynucleotide comprises a sequence complementary to a mammalian genomic polynucleotide sequence. In some embodiments, the guide polynucleotide comprises a sequence complementary to a human genomic polynucleotide sequence.
  • the guide polynucleotide is 30-250 nucleotides in length. In some embodiments, the guide polynucleotide is more than 90 nucleotides in length. In some embodiments, the guide polynucleotide is less than 245 nucleotides in length. In some embodiments, the guide polynucleotide is 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 220, 240, or more than 240 nucleotides in length.
  • the guide polynucleotide is about 30 to about 40, about 30 to about 50, about 30 to about 60, about 30 to about 70, about 30 to about 80, about 30 to about 90, about 30 to about 100, about 30 to about 120, about 30 to about 140, about 30 to about 160, about 30 to about 180, about 30 to about 200, about 30 to about 220, about 30 to about 240, about 50 to about 60.
  • engineered transposase systems comprising (a) a double-stranded nucleic acid configured to interact with a transposase and comprising a cargo nucleotide sequence; and (b) a transposase configured to transpose the cargo nucleotide sequence to a target nucleic acid site.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 1-349 and a cargo nucleotide comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 1-349 and a cargo nucleotide comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 1-349 and a cargo nucleotide comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 1-349 and a cargo nucleotide comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 1-349 and a cargo nucleotide comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 1-349 and a cargo nucleotide comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 1-349 and a cargo nucleotide comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 1-349 and a cargo nucleotide comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 1-349 and a cargo nucleotide comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 1-349 and a cargo nucleotide comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528.
  • the engineered transposase system comprises a transposase comprising 100% identity to any one of SEQ ID NOs: 1-349 and a cargo nucleotide comprising 100% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, and a flanking sequence flanking the cargo nucleotide having at least about 70% identity to any one of SEQ ID NOs: 350-454 or encoded by a nucleic acid sequence having at least about 70% identity to any one of SEQ ID NOs: 478-481, 485-488, and 507-511.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, and a flanking sequence flanking the cargo nucleotide having at least about 75% identity to any one of SEQ ID NOs: 350-454 or encoded by a nucleic acid sequence having at least about 75% identity to any one of SEQ ID NOs: 478-481, 485-488, and 507-511.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, and a flanking sequence flanking the cargo nucleotide having at least about 80% identity to any one of SEQ ID NOs: 350-454 or encoded by a nucleic acid sequence having at least about 80% identity to any one of SEQ ID NOs: 478-481, 485-488, and 507-511.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, and a flanking sequence flanking the cargo nucleotide having at least about 85% identity to any one of SEQ ID NOs: 350-454 or encoded by a nucleic acid sequence having at least about 85% identity to any one of SEQ ID NOs: 478-481, 485-488, and 507-511.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, and a flanking sequence flanking the cargo nucleotide having at least about 90% identity to any one of SEQ ID NOs: 350-454 or encoded by a nucleic acid sequence having at least about 75% identity to any one of SEQ ID NOs: 478-481, 485-488, and 507-511.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, and a flanking sequence flanking the cargo nucleotide having at least about 95% identity to any one of SEQ ID NOs: 350-454 or encoded by a nucleic acid sequence having at least about 95% identity to any one of SEQ ID NOs: 478-481, 485-488, and 507-511.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, and a flanking sequence flanking the cargo nucleotide having at least about 96% identity to any one of SEQ ID NOs: 350-454 or encoded by a nucleic acid sequence having at least about 96% identity to any one of SEQ ID NOs: 478-481, 485-488, and 507-511.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, and a flanking sequence flanking the cargo nucleotide having at least about 97% identity to any one of SEQ ID NOs: 350-454 or encoded by a nucleic acid sequence having at least about 97% identity to any one of SEQ ID NOs: 478-481, 485-488, and 507-511.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, and a flanking sequence flanking the cargo nucleotide having at least about 98% identity to any one of SEQ ID NOs: 350-454 or encoded by a nucleic acid sequence having at least about 98% identity to any one of SEQ ID NOs: 478-481, 485-488, and 507-511.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, and a flanking sequence flanking the cargo nucleotide having at least about 99% identity to any one of SEQ ID NOs: 350-454 or encoded by a nucleic acid sequence having at least about 99% identity to any one of SEQ ID NOs: 478-481, 485-488, and 507-511.
  • the engineered transposase system comprises a transposase comprising 100% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising 100% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, and a flanking sequence flanking the cargo nucleotide having at least about 100% identity to any one of SEQ ID NOs: 350-454 or encoded by a nucleic acid sequence having at least about 100% identity to any one of SEQ ID NOs: 478-481, 485-488, and 507-511.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 530-538 and a cargo nucleotide comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 475-477 and 525-526.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 530-538 and a cargo nucleotide comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 475-477 and 525-526.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 530-538 and a cargo nucleotide comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 475-477 and 525-526.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 530-538 and a cargo nucleotide comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 475-477 and 525-526.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 530-538 and a cargo nucleotide comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 475-477 and 525-526.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 530-538 and a cargo nucleotide comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 475-477 and 525-526.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 530-538 and a cargo nucleotide comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 475-477 and 525-526.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 530-538 and a cargo nucleotide comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 475-477 and 525-526.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 530-538 and a cargo nucleotide comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 475-477 and 525-526.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 530-538 and a cargo nucleotide comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 475-477 and 525-526.
  • the engineered transposase system comprises a transposase comprising 100% identity to any one of SEQ ID NOs: 530-538 and a cargo nucleotide comprising 100% identity to any one of SEQ ID NOs: 475-477 and 525-526.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 475-477 and 525-526, and a flanking sequence flanking the cargo nucleotide encoded by a nucleic acid sequence having at least about 70% sequence identity to any one of SEQ ID NOs: 482-484, 489-491, and 512-524.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 475-477 and 525-526, and a flanking sequence flanking the cargo nucleotide encoded by a nucleic acid sequence having at least about 75% sequence identity to any one of SEQ ID NOs: 482-484, 489-491, and 512-524.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 475-477 and 525-526, and a flanking sequence flanking the cargo nucleotide encoded by a nucleic acid sequence having at least about 80% sequence identity to any one of SEQ ID NOs: 482-484, 489-491, and 512-524.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 475-477 and 525-526, and a flanking sequence flanking the cargo nucleotide encoded by a nucleic acid sequence having at least about 85% sequence identity to any one of SEQ ID NOs: 482-484, 489-491, and 512-524.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 475-477 and 525-526, and a flanking sequence flanking the cargo nucleotide encoded by a nucleic acid sequence having at least about 90% sequence identity to any one of SEQ ID NOs: 482-484, 489-491, and 512-524.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 475-477 and 525-526, and a flanking sequence flanking the cargo nucleotide encoded by a nucleic acid sequence having at least about 95% sequence identity to any one of SEQ ID NOs: 482-484, 489-491, and 512-524.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 475-477 and 525-526, and a flanking sequence flanking the cargo nucleotide encoded by a nucleic acid sequence having at least about 96% sequence identity to any one of SEQ ID NOs: 482-484, 489-491, and 512-524.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 475-477 and 525-526, and a flanking sequence flanking the cargo nucleotide encoded by a nucleic acid sequence having at least about 97% sequence identity to any one of SEQ ID NOs: 482-484, 489-491, and 512-524.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 475-477 and 525-526, and a flanking sequence flanking the cargo nucleotide encoded by a nucleic acid sequence having at least about 98% sequence identity to any one of SEQ ID NOs: 482-484, 489-491, and 512-524.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 475-477 and 525-526, and a flanking sequence flanking the cargo nucleotide encoded by a nucleic acid sequence having at least about 99% sequence identity to any one of SEQ ID NOs: 482-484, 489-491, and 512-524.
  • the engineered transposase system comprises a transposase comprising a sequence having 100% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having 100% identity to any one of SEQ ID NOs: 475-477 and 525-526, and a flanking sequence flanking the cargo nucleotide encoded by a nucleic acid sequence having 100% sequence identity to any one of SEQ ID NOs: 482-484, 489-491, and 512-524.
  • engineered transposase systems comprising (a) a double-stranded nucleic acid configured to interact with a transposase and comprising a cargo nucleotide sequence; (b) a transposase configured to transpose the cargo nucleotide sequence to a target nucleic acid site; (c) an endonuclease configured to form a complex with the transposase; and (d) an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to the target nucleic acid sequence.
  • the endonuclease is catalytically dead. In some embodiments, the endonuclease has reduced or altered endonuclease (e.g., nickase) activity.
  • the transposase is configured to form a complex with the endonuclease. In some embodiments, the transposase binds to the endonuclease to form a complex. In some embodiments, the transposase is fused to the endonuclease to form a complex.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 551-556.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 551-556.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 551-556.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 551-556.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 551-556.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 551-556.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 551-556.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 551-556.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 551-556.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 551-556.
  • the engineered transposase system comprises a transposase comprising a sequence having 100% identity to any one of SEQ ID NOs: 1-349, a cargo nucleotide comprising a sequence having 100% identity to any one of SEQ ID NOs: 471-474, 503-506, and 528, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having 100% identity to any one of SEQ ID NOs: 551-556.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 475-477 and 525-526, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 557-562.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 475-477 and 525-526, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 557-562.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 475-477 and 525-526, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 557-562.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 475-477 and 525-526, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 557-562.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 475-477 and 525-526, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 557-562.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 475-477 and 525-526, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 557-562.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 475-477 and 525-526, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 557-562.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 475-477 and 525-526, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 557-562.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 475-477 and 525-526, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 557-562.
  • the engineered transposase system comprises a transposase comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 475-477 and 525-526, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 557-562.
  • the engineered transposase system comprises a transposase comprising a sequence having 100% identity to any one of SEQ ID NOs: 530-538, a cargo nucleotide comprising a sequence having 100% identity to any one of SEQ ID NOs: 475-477 and 525-526, an endonuclease, and an engineered guide polynucleotide configured to form a complex with the endonuclease and to hybridize to a target nucleic acid sequence, the engineered guide polynucleotide comprising a sequence having 100% identity to any one of SEQ ID NOs: 557-562.
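The identity thresholds recited in the embodiments above (70% through 100%) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length with `-` as the gap character, and it defines percent identity as matching columns divided by aligned columns. The passage above does not specify an alignment method or the tolerance implied by "about", so the function names, the gap convention, and the exact-threshold comparison are illustrative assumptions rather than the method of the disclosure.

```python
# Illustrative sketch only: names, gap convention, and exact comparison are assumptions.
THRESHOLDS = (70, 75, 80, 85, 90, 95, 96, 97, 98, 99, 100)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored; a column counts
    as a match only when both residues are identical and not a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> list[int]:
    """Return every recited threshold that the given identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # Toy sequences, not taken from any SEQ ID NO in the disclosure.
    identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(identity, thresholds_met(identity))
```

In practice the alignment itself would come from a dedicated tool, and the choice of denominator (aligned columns, shorter sequence, or reference length) changes the reported value, so any comparison against these thresholds should state which convention was used.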
  • nucleic acid sequences encoding an engineered transposase system described herein are disclosed.
  • the nucleic acid encoding the engineered transposase system is a DNA, for example a linear DNA, a plasmid DNA, or a minicircle DNA.
  • the nucleic acid encoding the engineered transposase system is an RNA, for example an mRNA.
  • the nucleic acid encoding the engineered transposase system is delivered by a nucleic acid-based vector.
  • the nucleic acid-based vector is a plasmid (e.g., a circular DNA molecule that can autonomously replicate inside a cell), cosmid (e.g., pWE or sCos vectors), artificial chromosome, human artificial chromosome (HAC), yeast artificial chromosome (YAC), bacterial artificial chromosome (BAC), P1-derived artificial chromosome (PAC), phagemid, phage derivative, bacmid, or virus.
  • the nucleic acid-based vector is selected from the list consisting of: pSF-CMV-NEO-NH2-PPT-3XFLAG, pSF-CMV-NEO-COOH-3XFLAG, pSF-CMV-PURO-NH2-GST-TEV, pSF-OXB20-COOH-TEV-FLAG(R)-6His, pCEP4, pDEST27, pSF-CMV-Ub-KrYFP, pSF-CMV-FMDV-daGFP, pEF1a-mCherry-N1 vector, pEF1a-tdTomato vector, pSF-CMV-FMDV-Hygro, pSF-CMV-PGK-Puro, pMCP-tag(m), pSF-CMV-PURO-NH2-CMYC, pSF-OXB20-BetaGal, pSF-OXB20-Fluc, pSF-OXB20,
  • the nucleic acid-based vector comprises a promoter.
  • the promoter is selected from the group consisting of a mini promoter, an inducible promoter, a constitutive promoter, and derivatives thereof.
  • the promoter is selected from the group consisting of CMV, CBA, EF1a, CAG, PGK, TRE, U6, UAS, T7, Sp6, lac, araBAD, trp, Ptac, p5, p19, p40, Synapsin, CaMKII, GRK1, and derivatives thereof.
  • the promoter is a U6 promoter.
  • the promoter is a CAG promoter.
  • the nucleic acid-based vector is a virus.
  • the virus is an alphavirus, a parvovirus, an adenovirus, an AAV, a baculovirus, a Dengue virus, a lentivirus, a herpesvirus, a poxvirus, an anellovirus, a bocavirus, a vaccinia virus, or a retrovirus.
  • the virus is an alphavirus.
  • the virus is a parvovirus.
  • the virus is an adenovirus.
  • the virus is an AAV.
  • the virus is a baculovirus.
  • the virus is a Dengue virus. In some embodiments, the virus is a lentivirus. In some embodiments, the virus is a herpesvirus. In some embodiments, the virus is a poxvirus. In some embodiments, the virus is an anellovirus. In some embodiments, the virus is a bocavirus. In some embodiments, the virus is a vaccinia virus. In some embodiments, the virus is a retrovirus.
  • the AAV is AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV-rh8, AAV-rh10, AAV-rh20, AAV-rh39, AAV-rh74, AAV-rhM4-1, AAV-hu37, AAV-Anc80, AAV-Anc80L65, AAV-7m8, AAV-PHP-B, AAV-PHP-EB, AAV-2.5, AAV-2tYF, AAV-3B, AAV-LK03, AAV-HSC1, AAV-HSC2, AAV-HSC3, AAV-HSC4, AAV-HSC5, AAV-HSC6, AAV-HSC7, AAV-HSC8, AAV-HSC9, AAV-HSC10, AAV-HSC11
  • the virus is AAV1 or a derivative thereof. In some embodiments, the virus is AAV2 or a derivative thereof. In some embodiments, the virus is AAV3 or a derivative thereof. In some embodiments, the virus is AAV4 or a derivative thereof. In some embodiments, the virus is AAV5 or a derivative thereof. In some embodiments, the virus is AAV6 or a derivative thereof. In some embodiments, the virus is AAV7 or a derivative thereof. In some embodiments, the virus is AAV8 or a derivative thereof. In some embodiments, the virus is AAV9 or a derivative thereof. In some embodiments, the virus is AAV10 or a derivative thereof. In some embodiments, the virus is AAV11 or a derivative thereof.
  • the virus is AAV12 or a derivative thereof. In some embodiments, the virus is AAV13 or a derivative thereof. In some embodiments, the virus is AAV14 or a derivative thereof. In some embodiments, the virus is AAV15 or a derivative thereof. In some embodiments, the virus is AAV16 or a derivative thereof. In some embodiments, the virus is AAV-rh8 or a derivative thereof. In some embodiments, the virus is AAV-rh10 or a derivative thereof. In some embodiments, the virus is AAV-rh20 or a derivative thereof. In some embodiments, the virus is AAV-rh39 or a derivative thereof. In some embodiments, the virus is AAV-rh74 or a derivative thereof.
  • the virus is AAV-rhM4-1 or a derivative thereof. In some embodiments, the virus is AAV-hu37 or a derivative thereof. In some embodiments, the virus is AAV-Anc80 or a derivative thereof. In some embodiments, the virus is AAV-Anc80L65 or a derivative thereof. In some embodiments, the virus is AAV-7m8 or a derivative thereof. In some embodiments, the virus is AAV-PHP-B or a derivative thereof. In some embodiments, the virus is AAV-PHP-EB or a derivative thereof. In some embodiments, the virus is AAV-2.5 or a derivative thereof. In some embodiments, the virus is AAV-2tYF or a derivative thereof.
  • the virus is AAV-3B or a derivative thereof. In some embodiments, the virus is AAV-LK03 or a derivative thereof. In some embodiments, the virus is AAV-HSC1 or a derivative thereof. In some embodiments, the virus is AAV-HSC2 or a derivative thereof. In some embodiments, the virus is AAV-HSC3 or a derivative thereof. In some embodiments, the virus is AAV-HSC4 or a derivative thereof. In some embodiments, the virus is AAV-HSC5 or a derivative thereof. In some embodiments, the virus is AAV-HSC6 or a derivative thereof. In some embodiments, the virus is AAV-HSC7 or a derivative thereof.
  • the virus is AAV-HSC8 or a derivative thereof. In some embodiments, the virus is AAV-HSC9 or a derivative thereof. In some embodiments, the virus is AAV-HSC10 or a derivative thereof. In some embodiments, the virus is AAV-HSC11 or a derivative thereof. In some embodiments, the virus is AAV-HSC12 or a derivative thereof. In some embodiments, the virus is AAV-HSC13 or a derivative thereof. In some embodiments, the virus is AAV-HSC14 or a derivative thereof. In some embodiments, the virus is AAV-HSC15 or a derivative thereof. In some embodiments, the virus is AAV-TT or a derivative thereof.
  • the virus is AAV-DJ/8 or a derivative thereof. In some embodiments, the virus is AAV-Myo or a derivative thereof. In some embodiments, the virus is AAV-NP40 or a derivative thereof. In some embodiments, the virus is AAV-NP59 or a derivative thereof. In some embodiments, the virus is AAV-NP22 or a derivative thereof. In some embodiments, the virus is AAV-NP66 or a derivative thereof. In some embodiments, the virus is AAV-HSC16 or a derivative thereof. In some embodiments, the virus is HSV-1 or a derivative thereof. In some embodiments, the virus is HSV-2 or a derivative thereof. In some embodiments, the virus is VZV or a derivative thereof.
  • the virus is EBV or a derivative thereof. In some embodiments, the virus is CMV or a derivative thereof. In some embodiments, the virus is HHV-6 or a derivative thereof. In some embodiments, the virus is HHV-7 or a derivative thereof. In some embodiments, the virus is HHV-8 or a derivative thereof.
  • the nucleic acid encoding the engineered transposase system is delivered by a non-nucleic acid-based delivery system (e.g., a non-viral delivery system).
  • the non-viral delivery system is a liposome.
  • the nucleic acid is associated with a lipid.
  • the nucleic acid associated with a lipid is encapsulated in the aqueous interior of a liposome, interspersed within the lipid bilayer of a liposome, attached to a liposome via a linking molecule that is associated with both the liposome and the nucleic acid, entrapped in a liposome, complexed with a liposome, dispersed in a solution containing a lipid, mixed with a lipid, combined with a lipid, contained as a suspension in a lipid, contained or complexed with a micelle, or otherwise associated with a lipid.
  • the nucleic acid is comprised in a lipid nanoparticle (LNP).
  • the engineered transposase system is introduced into the cell in any suitable way, either stably or transiently.
  • an engineered transposase system is transfected into the cell.
  • the cell is transduced or transfected with a nucleic acid construct that encodes an engineered transposase system.
  • a cell is transduced (e.g., with a virus encoding an engineered transposase system), or transfected (e.g., with a plasmid encoding an engineered transposase system) with a nucleic acid that encodes an engineered transposase system, or the translated engineered transposase system.
  • the transduction is a stable or transient transduction.
  • a plasmid expressing an engineered transposase system is introduced into cells through electroporation, transient transfection (e.g., lipofection), stable genome integration (e.g., piggyBac), viral transduction (for example, lentivirus or AAV), or other methods known to those of skill in the art.
  • the gene editing system is introduced into the cell as one or more polypeptides.
  • delivery is achieved through the use of RNP complexes. Delivery methods to cells for polypeptides and/or RNPs are known in the art, for example by electroporation or by cell squeezing.
  • Exemplary methods of delivery of nucleic acids include lipofection, nucleofection, electroporation, stable genome integration (e.g., piggyBac), microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA.
  • lipofection is described in, e.g., U.S. Pat. Nos.
  • lipofection reagents are sold commercially (e.g., Transfectam™, Lipofectin™, and SF Cell Line 4D-Nucleofector X Kit™ (Lonza)).
  • Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of WO 91/17424 and WO 91/16024.
  • the delivery is to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration).
  • the nucleic acid is comprised in a liposome or a nanoparticle that specifically targets a host cell.
  • Described herein, in certain embodiments, is a cell comprising the engineered transposase system described herein.
  • the cell is a eukaryotic cell (e.g., a plant cell, an animal cell, a protist cell, or a fungal cell), a mammalian cell (e.g., a Chinese hamster ovary (CHO) cell, a baby hamster kidney (BHK) cell, a human embryo kidney (HEK) cell, a mouse myeloma (NS0) cell, or a human retinal cell), an immortalized cell (e.g., a HeLa cell, a COS cell, a HEK-293T cell, a MDCK cell, a 3T3 cell, a PC12 cell, a Huh7 cell, a HepG2 cell, a K562 cell, a N2a cell, or a SY5Y cell), an insect cell (e.g., a Spodoptera frugiperda cell, a Trichoplusia ni cell, a Drosophila melanogaster cell, a S2 cell, or a Heliothis virescens cell), a yeast cell (e.g., a Saccharomyces cerevisiae cell, a Cryptococcus cell, or a Candida cell), a plant cell (e.g., a parenchyma cell, a collenchyma cell, or a sclerenchyma cell), a fungal cell (e.g., a Saccharomyces cerevisiae cell, a Cryptococcus cell, or a Candida cell), or a prokaryotic cell.
  • the cell is a eukaryotic cell.
  • the cell is a mammalian cell.
  • the cell is an immortalized cell.
  • the cell is an insect cell.
  • the cell is a yeast cell.
  • the cell is a plant cell.
  • the cell is a fungal cell.
  • the cell is a prokaryotic cell.
  • the cell is an A549, HEK-293, HEK-293T, BHK, CHO, HeLa, MRC5, Sf9, Cos-1, Cos-7, Vero, BSC 1, BSC 40, BMT 10, WI38, HeLa, Saos, C2C12, L cell, HT1080, HepG2, Huh7, K562, a primary cell, or derivative thereof.
  • Described herein, in certain embodiments, are methods for modifying a target nucleic acid comprising providing an engineered transposase system disclosed herein.
  • the engineered transposase system comprises a transposase and cargo nucleotide sequence.
  • the target nucleic acid is double stranded.
  • the target nucleic acid is double stranded DNA.
  • the target nucleic acid is single stranded.
  • the methods are used to introduce a modification in the genome of a cell.
  • the modification is an insertion, deletion, or mutation.
  • the methods are used to introduce site-directed insertions, deletions, and/or mutations in the genome of a cell (for example an insertion and a mutation).
  • the methods are used in combination with a nucleic acid template to facilitate site-directed insertions into the genome of a cell.
  • the cell is a human cell.
  • the cell genome or a vector comprised in the cell is modified.
  • the cell genome is modified ex vivo.
  • the cell genome is modified in vivo.
  • Systems of the present disclosure may be used for various applications, such as, for example, nucleic acid editing (e.g., gene editing) or binding to a nucleic acid molecule (e.g., sequence-specific binding).
  • Such systems may be used, for example, for addressing (e.g., removing or replacing) a genetically inherited mutation that may cause a disease in a subject, inactivating a gene in order to ascertain its function in a cell, as a diagnostic tool to detect disease-causing genetic elements (e.g., via cleavage of reverse-transcribed viral RNA or an amplified DNA sequence encoding a disease-causing mutation), as deactivated enzymes in combination with a probe to target and detect a specific nucleotide sequence (e.g., sequence encoding antibiotic resistance in bacteria), to render viruses inactive or incapable of infecting host cells by targeting viral genomes, to add genes or amend metabolic pathways to engineer organisms to produce valuable small molecules, macromolecules, or secondary metabolites, to establish a
  • Described herein, in certain embodiments, are methods for modifying a target nucleic acid comprising providing an engineered transposase system.
  • the present disclosure provides a method for binding, nicking, cleaving, marking, modifying, or transposing a double-stranded deoxyribonucleic acid polynucleotide.
  • the method comprises contacting the double-stranded deoxyribonucleic acid polynucleotide with a transposase.
  • the transposase induces a single-stranded break or a double-stranded break at or proximal to the target nucleic acid site.
  • the transposase induces a staggered single stranded break within or 5’ to the target site.
  • the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.
  • the transposase is configured to transpose the cargo nucleotide sequence as single-stranded deoxyribonucleic acid polynucleotide. In some embodiments, the transposase is configured to transpose the cargo nucleotide sequence as double-stranded deoxyribonucleic acid polynucleotide. In some embodiments, the transposase is configured to transpose the cargo nucleotide sequence via a ribonucleic acid polynucleotide intermediate. In some embodiments, the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR).
  • the present disclosure provides a method of modifying a target nucleic acid site.
  • the method comprises delivering to the target nucleic acid site the engineered transposase system described herein.
  • the engineered transposase system is configured such that upon binding of the engineered transposase system to the target nucleic acid site, the engineered transposase system modifies the target nucleic acid site.
  • modifying the target nucleic acid site comprises binding, nicking, cleaving, marking, modifying, or transposing the target nucleic acid site.
  • the target nucleic acid site comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
  • the target nucleic acid comprises genomic DNA, viral DNA, viral RNA, or bacterial DNA.
  • the target nucleic acid site is in vitro.
  • the target nucleic acid site is within a cell.
  • the cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, or a human cell.
  • the cell is a primary cell.
  • the primary cell is a T cell. In some embodiments, the primary cell is a hematopoietic stem cell (HSC). In some embodiments, the cell is a human cell. In some embodiments, the cell is genome edited ex vivo. In some embodiments, the cell is genome edited in vivo.
  • delivery of the engineered transposase system to the target nucleic acid site comprises delivering the nucleic acid described herein or the vector described herein. In some embodiments, delivery of the engineered transposase system to the target nucleic acid site comprises delivering a nucleic acid comprising an open reading frame encoding the transposase. In some embodiments, the nucleic acid comprises a promoter. In some embodiments, the open reading frame encoding the transposase is operably linked to the promoter.
  • delivery of the engineered transposase system to the target nucleic acid site comprises delivering a capped mRNA containing the open reading frame encoding the transposase.
  • delivery of the engineered transposase system to the target nucleic acid site comprises delivering a translated polypeptide.
  • the transposase does not induce a break at or proximal to the target nucleic acid site.
  • the transposition activity is measured in vitro by introducing the transposase to cells comprising the target nucleic acid site and detecting transposition of the target nucleic acid site in the cells.
  • the composition comprises 20 pmoles or less of the transposase. In some embodiments, the composition comprises 1 pmol or less of the transposase.
  • the method comprises cultivating a host cell with the engineered transposase system described herein.
  • the host cell is a bacterial cell.
  • the bacterial cell is Bifidobacterium longum, Bifidobacterium lactis, Bifidobacterium animalis, Bifidobacterium breve, Bifidobacterium infantis, Bifidobacterium adolescentis, Lactobacillus acidophilus, Lactobacillus casei.
  • the host cell is an E. coli cell.
  • the E. coli cell is a λDE3 lysogen or a BL21(DE3) strain.
  • the E. coli cell has an ompT lon genotype.
  • the host cell is an E. coli cell.
  • the E. coli cell is a λDE3 lysogen or the E. coli cell is a BL21(DE3) strain.
  • the E. coli cell has an ompT lon genotype.
  • the open reading frame is operably linked to a promoter sequence.
  • the promoter is selected from the group consisting of a mini promoter, an inducible promoter, a constitutive promoter, and derivatives thereof.
  • the promoter is selected from the group consisting of CMV, CBA, EF1a, CAG, PGK, TRE, U6, UAS, T7, Sp6, lac, araBad, trp, Ptac, p5, p19, p40, Synapsin, CaMKII, GRK1, and derivatives thereof.
  • the open reading frame is operably linked to a T7 promoter sequence, a T7-lac promoter sequence, a lac promoter sequence, a tac promoter sequence, a trc promoter sequence, a ParaBAD promoter sequence, a PrhaBAD promoter sequence, a T5 promoter sequence, a cspA promoter sequence, an araPBAD promoter, a strong leftward promoter from phage lambda (pL promoter), or any combination thereof.
  • the open reading frame comprises a sequence encoding an affinity tag linked in-frame to a sequence encoding the transposase.
  • the affinity tag is an immobilized metal affinity chromatography (IMAC) tag.
  • the IMAC tag is a polyhistidine tag.
  • the affinity tag is a myc tag, a human influenza hemagglutinin (HA) tag, a maltose binding protein (MBP) tag, a glutathione S-transferase (GST) tag, a streptavidin tag, a FLAG tag, or any combination thereof.
  • the affinity tag is linked in-frame to the sequence encoding the transposase via a linker sequence encoding a protease cleavage site.
  • the protease cleavage site is a tobacco etch virus (TEV) protease cleavage site, a PreScission® protease cleavage site, a Thrombin cleavage site, a Factor Xa cleavage site, an enterokinase cleavage site, or any combination thereof.
  • the open reading frame is codon-optimized for expression in the host cell. In some embodiments, the open reading frame is provided on a vector. In some embodiments, the open reading frame is integrated into a genome of the host cell.
  • the present disclosure provides a culture comprising a host cell described herein in compatible liquid medium.
  • the present disclosure provides a method of producing a transposase, comprising cultivating a host cell described herein in compatible growth medium.
  • the method further comprises inducing expression of the transposase by addition of an additional chemical agent or an increased amount of a nutrient.
  • the additional chemical agent or increased amount of a nutrient comprises isopropyl β-D-1-thiogalactopyranoside (IPTG) or additional amounts of lactose.
  • the method further comprises isolating the host cell after the cultivation and lysing the host cell to produce a protein extract.
  • the method further comprises subjecting the protein extract to IMAC, or ion-affinity chromatography.
  • the open reading frame comprises a sequence encoding an IMAC affinity tag linked in-frame to a sequence encoding the transposase.
  • the IMAC affinity tag is linked in-frame to the sequence encoding the transposase via a linker sequence encoding a protease cleavage site.
  • the protease cleavage site comprises a tobacco etch virus (TEV) protease cleavage site, a PreScission® protease cleavage site, a Thrombin cleavage site, a Factor Xa cleavage site, an enterokinase cleavage site, or any combination thereof.
  • the method further comprises cleaving the IMAC affinity tag by contacting a protease corresponding to the protease cleavage site to the transposase.
  • the method further comprises performing subtractive IMAC affinity chromatography to remove the affinity tag from a composition comprising the transposase.
  • kits comprising one or more nucleic acid constructs encoding the various components of the transposase or gene editing system described herein, e.g., comprising a nucleotide sequence encoding the components of the transposase or gene editing system capable of modifying a target DNA sequence.
  • the nucleotide sequence comprises a heterologous promoter that drives expression of the gene editing system components.
  • any of the transposase or gene editing systems disclosed herein is assembled into a pharmaceutical, diagnostic, or research kit to facilitate its use in therapeutic, diagnostic, or research applications.
  • a kit may include one or more containers housing any of the vectors disclosed herein and instructions for use.
  • the kit may be designed to facilitate use of the methods described herein by researchers and can take many forms.
  • Each of the compositions of the kit may be provided in liquid form (e.g., in solution), or in solid form (e.g., a dry powder).
  • some of the compositions may be constitutable or otherwise processable (e.g., to an active form), for example, by the addition of a suitable solvent or other species (for example, water or a cell culture medium), which may or may not be provided with the kit.
  • Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), Internet, and/or web-based communications.
  • the written instructions in some embodiments, are in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which instructions can also reflect approval by the agency of manufacture, use, or sale for animal administration.
  • Example 1 A method of metagenomic analysis for new proteins
  • Metagenomic samples were collected from sediment, soil, and animals.
  • Deoxyribonucleic acid (DNA) was extracted with a DNA mini-prep kit and sequenced on an. Samples were collected with consent of property owners. Additional raw sequence data from public sources included animal microbiomes, sediment, soil, hot springs, hydrothermal vents, marine, peat bogs, permafrost, and sewage sequences.
  • Metagenomic sequence data was searched using Hidden Markov Models generated based on documented transposase protein sequences to identify new transposases. Transposase proteins identified by the search were aligned to documented proteins to identify potential active sites. This metagenomic workflow resulted in the delineation of the MG92 family described herein.
  • Example 2 Discovery of MG92 Family of Transposases
  • Analysis of the data from the metagenomic analysis of Example 1 revealed a new cluster of previously undescribed putative transposase systems comprising 1 family (MG92). The corresponding protein sequences for these new enzymes and their example subdomains are presented as SEQ ID NOs: 1-349.
  • Integrase activity can be assayed via expression in an E. coli lysate-based expression system.
  • the required components for in vitro testing are three plasmids: an expression plasmid with the transposon gene(s) under a T7 promoter, a target plasmid, and a donor plasmid which contains the required left end (LE) and right end (RE) DNA sequences for transposition around a cargo gene (e.g., a Tet resistance gene).
  • the lysate-based expression products, target DNA, and donor DNA are incubated to allow for transposition to occur. Transposition is detected via PCR.
  • the transposition product will be tagmented with Tn5 and sequenced via NGS to determine the insertion sites on a population of transposition events.
  • the in vitro transposition products can be transformed into E. coli under antibiotic (e.g., Tet) selection, where growth requires the transposition cargo to be stably inserted into a plasmid. Either single colonies or a population of E. coli can be sequenced to determine the insertion sites.
  • Integration efficiency can be measured via ddPCR or qPCR of the experimental output of target DNA with integrated cargo, normalized to the amount of unmodified target DNA also measured via ddPCR.
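  • The normalization described above can be sketched as a simple ratio. This is a minimal illustration only: the function name and the example copy numbers are hypothetical, and it assumes that the ddPCR readout has already been converted to copy counts for the integrated and unmodified target.

```python
def integration_efficiency(integrated_copies: float, unmodified_copies: float) -> float:
    """Return integrated-target copies normalized to unmodified-target copies.

    Both inputs are assumed to be copy counts (or copies per microliter)
    measured by ddPCR or qPCR on the same sample.
    """
    if unmodified_copies <= 0:
        raise ValueError("unmodified target copies must be positive")
    if integrated_copies < 0:
        raise ValueError("integrated target copies cannot be negative")
    return integrated_copies / unmodified_copies


# Hypothetical example values, not measured data.
example = integration_efficiency(integrated_copies=50, unmodified_copies=1000)
print(f"integration efficiency: {example:.2%}")
```

  • Some workflows instead normalize to total target (integrated plus unmodified); the ratio above follows the wording of the assay description.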
  • This assay may also be conducted with purified protein components rather than from lysate-based expression.
  • the proteins are expressed in an E. coli protease-deficient B strain under a T7 inducible promoter, the cells are lysed using sonication, and the His-tagged protein of interest is purified using Ni-NTA affinity chromatography on an FPLC. Purity is determined using densitometry of the protein bands resolved on SDS-PAGE and Coomassie-stained acrylamide gels.
  • the protein is desalted in storage buffer composed of 50 mM Tris-HCl, 300 mM NaCl, 1 mM TCEP, 5% glycerol; pH 7.5 (or other buffers as determined for maximum stability) and stored at -80°C.
  • the transposon gene(s) are added to the target DNA and donor DNA as described above in a reaction buffer, for example 26 mM HEPES pH 7.5, 4.2 mM TRIS pH 8, 50 µg/mL BSA, 2 mM ATP, 2.1 mM DTT, 0.05 mM EDTA, 0.2 mM MgCl2, 28 mM NaCl, 21 mM KCl, 1.35% glycerol (final pH 7.5), supplemented with 15 mM MgOAc2.
  • transposon ends are tested for transposase binding via an electrophoretic mobility shift assay (EMSA).
  • the potential LE or RE is synthesized as a DNA fragment (100-500 bp) and end-labeled with FAM via PCR with FAM-labeled primers.
  • the transposase protein is synthesized in an in vitro transcription/translation system.
  • binding buffer (e.g., 20 mM HEPES pH 7.5, 2.5 mM Tris pH 7.5, 10 mM NaCl, 0.0625 mM EDTA, 5 mM TCEP, 0.005% BSA, 1 µg/mL poly(dI-dC), and 5% glycerol).
  • the binding is incubated at 30 °C for 40 minutes, then 2 µL of 6X loading buffer (60 mM KCl, 10 mM Tris pH 7.6, 50% glycerol) is added.
  • Shifts of the LE or RE in the presence of transposase protein can be attributed to successful binding and are indicative of transposase activity.
  • This assay can also be performed with transposase truncations or mutations, as well as using E. coli extract or purified protein.
  • short (~140 bp) fragments containing RE-LE junctions separated by up to 10 bp are labelled at both ends with FAM via PCR with FAM-labeled primers.
  • Labeled DNA fragments are incubated with in vitro transcription/translation transposase products and the DNA is analyzed on a denaturing gel. Cleavage at each end of the junction can result in two labelled single-strand fragments which migrate at different rates on the gel.
  • Engineered E. coli strains are transformed with a plasmid expressing the transposon genes and a plasmid containing a temperature-sensitive origin of replication with a selectable marker flanked by left end (LE) and right end (RE) transposon motifs for integration.
  • ssDNA plasmid supercoiling can be used as donor.
  • Transformants induced for expression of these genes are then screened for transfer of the marker to a genomic target by selection at restrictive temperature for plasmid replication and the marker integration in the genome is confirmed by PCR.
  • Integrations are screened using an unbiased approach.
  • purified gDNA is tagmented with Tn5
  • DNA of interest is then PCR amplified using primers specific to the Tn5 tagmentation and the selectable marker.
  • the amplicons are then prepared for NGS sequencing. The resulting reads are trimmed of the transposon sequences, the flanking sequences are mapped to the genome to determine insertion position, and insertion rates are determined.
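  • The trim-and-map step can be illustrated with a small sketch. It is not the pipeline used in the disclosure: it assumes exact forward-strand matching, a toy genome held in memory, and hypothetical names (`insertion_position`, `min_flank`); a real analysis would use a read aligner.

```python
from typing import Optional


def insertion_position(read: str, transposon_end: str, genome: str,
                       min_flank: int = 12) -> Optional[int]:
    """Trim the transposon end from a read and map the remaining flank.

    Returns the 0-based genome coordinate of the first flank base, or None if
    the transposon end is absent, the flank is too short, or the flank does
    not map to exactly one position (exact matching, forward strand only).
    """
    idx = read.find(transposon_end)
    if idx < 0:
        return None
    flank = read[idx + len(transposon_end):]
    if len(flank) < min_flank:
        return None
    pos = genome.find(flank)
    if pos < 0 or genome.find(flank, pos + 1) >= 0:
        return None  # unmapped or ambiguous
    return pos


# Toy sequences for illustration only.
genome = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
transposon_end = "GGGTTCAC"
read = "TT" + transposon_end + genome[10:25]
print(insertion_position(read, transposon_end, genome))
```

  • Counting how many reads map to each returned position gives a per-site tally from which insertion rates can be estimated.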
  • a polA mutant E. coli strain, MM383, which produces a DNA polymerase I (Pol I) that is defective at 42°C is used to detect integration. Resistance to a selectable marker after growth at 42°C indicates incorporation of donor DNA into the chromosome. The pUC19 plasmid without donor is used as a control following growth for 24 hours at 42°C without antibiotic selection.
  • E. coli strains that successfully grow in selection media are presumed to have integrated the donor DNA encoding the cargo resistance gene. Colonies growing in antibiotic selection plates are genotyped for cargo presence and NGS of whole genome sequence is performed.
  • each of the transposon proteins is purified with 2 NLS peptides on either terminus of the protein sequence.
  • a plasmid containing a selectable neomycin resistance marker (NeoR) or a fluorescent marker flanked by the left end (LE) and right end (RE) motifs is synthesized.
  • Cells are then transfected with the plasmid, recovered for 4-6 hours, and subsequently electroporated with transposon proteins.
  • Antibiotic resistance integration into the genome is quantified by G418-resistant colony counts, and positive transposition by the fluorescent marker is assayed by fluorescence activated cell cytometry. 72 hours after co-transfection, genomic DNA is extracted and used for the preparation of an NGS-library. Integration frequency is assayed by Tn5 tagmentation.
  • Each TnpA-like candidate had a unique cargo comprising the putative left end (LE) and right end (RE) sequences identified in the metagenomic contig. These putative LE and RE sequences were cloned to flank a kanamycin (Kan) resistance cargo gene via Gibson assembly.
  • the ssDNA cargo was generated via PCR of the Kan cargo plasmid with common primers outside of the LE/RE regions with forward primer GTGCGGTAGTAAAGGTTAATACTGTT (SEQ ID NO: 597) and a 5’-phosphate-modified reverse primer CTATAGTGAGTCGTATTA (SEQ ID NO: 598) using standard cycling conditions with Phusion HF.
  • the DNA bottom strand was degraded using Lambda exonuclease and the remaining top strand was purified using a DCC-5 spin column.
  • the single stranded DNA was checked on an agarose gel to verify complete conversion of dsDNA and quantified, yielding an average concentration of 20 nM.
  • each TnpA-like protein gene was synthesized in pET21(+) codon-optimized for E. coli translation under control of a T7 promoter and flanked by C-terminal HA and His tags, with the exception of 92-1 that lacks the HA tag.
  • TnpA-like protein plasmids were then amplified using primers that bind ~150 bp upstream of the T7 promoter and downstream of the T7 terminator (primers TGGCGAGAAAGGAAGGGAAG (SEQ ID NO: 599) and CCGAAACAAGCGCTCATGAG (SEQ ID NO: 600)) and purified via SPRI bead clean-up to give final template concentrations >80 ng/µL.
  • TnpA-like protein candidates were first expressed in an in vitro transcription-translation (IVTT) kit following manufacturer’s recommended conditions at 37 °C for 2 hours with a minimum template concentration of 8 ng/µL. Expression was verified via Western blot to the HA tag, with the exception of 92-1, which lacks this tag (FIG. 4).
  • Transposition assays were set up with 1 µL of IVTT product added per 10 µL reaction, an average of 5 nM of ssDNA cargo and 50 nM of a 161 nt “target” ssDNA containing an 8N randomized sequence in reaction buffer (20 mM HEPES (pH 7.5), 160 mM NaCl, 5 mM MgCl2, 5 mM TCEP, 20 µg/mL BSA, 0.5 µg/mL of poly-dIdC, and 20% glycerol).
  • Control reactions contained a no-template control (NTC) reaction of IVTT where Tris buffer was added instead of PCR template to the IVTT.
  • Reactions were incubated at 37 °C for 1 hour to allow transposition to occur, then the reaction was diluted 10-fold in water and transposition was detected via PCR.
  • the LE junction was detected via a forward primer on the 5' end of the target and reverse primer within the Kan cargo, and the RE junction via a forward primer in the Kan cargo and a reverse primer on the 3’ end of the target.
  • PCR products were run on an agarose gel to detect transposition (FIGs. 5A and 5B), and sequenced via Sanger and NGS sequencing. Chimeric reads that contained both target and cargo sequence were analyzed to determine the junction of transposition, the insertion motif, and the cleavage sites on the cargo (FIGs. 6-9).
  • the insertion motif can be identified from overlapping sequence identity between the cargo and the target. For example, the junction between target and the LE for MG92-3 is identified as the point where sequences for the target and cargo no longer overlap (FIG. 6).
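  • The junction logic described above can be sketched in a few lines. This is an illustrative simplification, not the analysis used to generate the figures: it assumes error-free reads, exact string matching, and hypothetical toy sequences and function names.

```python
from typing import Tuple


def find_junction(read: str, target: str, cargo: str) -> Tuple[int, int, str]:
    """Locate the target/cargo junction in a chimeric read.

    Returns (target_end, cargo_start, overlap), where read[:target_end] is the
    longest read prefix found in the target, read[cargo_start:] is the longest
    read suffix found in the cargo, and overlap is the stretch shared by both
    (a candidate insertion motif), or "" if the two do not overlap.
    """
    target_end = 0
    while target_end < len(read) and read[:target_end + 1] in target:
        target_end += 1
    cargo_start = len(read)
    while cargo_start > 0 and read[cargo_start - 1:] in cargo:
        cargo_start -= 1
    overlap = read[cargo_start:target_end] if cargo_start < target_end else ""
    return target_end, cargo_start, overlap


# Toy sequences for illustration only (not SEQ ID NOs from this disclosure).
target = "GGCATCGAATGACCTAGG"
cargo = "TGACTTTTGGGGAAAA"
read = "GGCATCGAATGACTTTTGGGG"
print(find_junction(read, target, cargo))
```

  • In this toy case the read matches the target up to position 13 and the cargo from position 9 onward, so the shared four nucleotides mark the point where target and cargo sequence overlap.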
  • the insertion motif can be identified via analysis of the flanking sequence of the target DNA without transposition. In the case of insertion into the 8N, the target motif can only be identified without ambiguity in the LE read, not the RE read.
  • the insertion motif was identified as AATGAC or a subset of nucleotides therein, for example TGAC (FIGs. 6-7).
  • the RE junction is identified via the breakpoint where reads switch between mapping to the cargo and the target (FIG. 7). Sequencing for the LE junction and the RE junction shows the same insertion location. The LE junction was further confirmed via NGS, which identified the same cleavage point in the LE as determined via Sanger sequencing (FIG. 8).
  • the LE boundary can be determined as: TGAAAACAAACATTTTACCAAGGCCCGCAGGCTCCGTCTATAGCGACAAGCGCTAA CTTTGGCTACGCTTGTCGTTTAGGCGGGGTTAGT (SEQ ID NO: 601). This is a subset of the full MG92-3 LE and will be recognized by MG92-3 only when flanked by the recognition motif AATGAC, or a subset of nucleotides therein.
  • the RE boundary can be identified as: GTTTGCGCTGTATCTGTGGTCAGGTATCCACTCCTACCTAAAGTAGCAGGCATGAAC GAAAGTTTATGCGGAGTTTGGAAGCCCCGTCTATATTCGCGAAAGCGGATTAGGCGG GGAGGGTTCAC (SEQ ID NO: 602), some or all of which is required for recognition, excision, and insertion by TnpA-like proteins.
  • Both of the sequences contain predicted hairpins for TnpA-like protein recognition flanked by non-canonical base pairing interactions which TnpA and TnpA-like proteins recognize (FIGs. 6-7), as described in Cell 132, 208-220 (2008) and Nucleic Acids Res 39, 8503-8512 (2011).
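  • As a rough illustration of how candidate hairpins might be flagged in an LE or RE sequence, the sketch below scans for perfect inverted repeats separated by a short loop. This is a stand-in technique, not the structure prediction used in the disclosure: it only detects canonical Watson-Crick stems, ignores the non-canonical interactions mentioned above, and the function name, parameters, and test sequence are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_hairpins(seq: str, stem: int = 5, min_loop: int = 3,
                  max_loop: int = 8) -> List[Tuple[int, int]]:
    """Return (stem_start, loop_length) for each perfect stem-loop candidate."""
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        partner = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the 3' arm of the stem
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == partner:
                hits.append((i, loop))
    return hits


# Toy sequence: stem GCGTC, loop TTTT, complementary arm GACGC.
print(find_hairpins("AAGCGTCTTTTGACGCAA"))
```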
  • TnpA-like protein candidates are expressed in an in vitro transcription-translation (IVTT) kit following manufacturer’s recommended conditions at 37 °C for 2 hours with a minimum template concentration of 8 ng/µL.
  • Excision assays are set with 1 µL of IVTT product added per 10 µL reaction and 100 ng of LE-Kan-RE ssDNA (about 2.2 kb) for 60 minutes at 37 °C in TnpA reaction buffer (20 mM HEPES (pH 7.5), 160 mM NaCl, 5 mM MgCl2, 10 mM TCEP, 20 mg/mL BSA, 0.5 mg of poly-dIdC, and 20% glycerol).
  • Reactions are terminated with the addition of 0.1% SDS and incubation of an additional 15 minutes at 37 °C. Reactions are subsequently RNase treated and run on a DNA agarose gel to determine if excision of the LE-Kan-RE ssDNA has occurred. The excised Kan sequence is then gel extracted and submitted for sequencing for determination of the LE and RE cleavage motifs.
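  • For comparison with the nanomolar cargo amounts used in the transposition assays, the mass of ssDNA substrate in the excision assay can be converted to an approximate molar concentration. The sketch below assumes an average mass of about 330 g/mol per nucleotide of ssDNA (a common rule-of-thumb approximation, not a value from this disclosure) and reads the reaction volume as 10 µL; the function name is hypothetical.

```python
def ssdna_nanomolar(mass_ng: float, length_nt: int, volume_ul: float,
                    g_per_mol_per_nt: float = 330.0) -> float:
    """Approximate molar concentration (nM) of an ssDNA of given length.

    g_per_mol_per_nt is an assumed average nucleotide mass; sequence-specific
    molecular weights will differ slightly.
    """
    if length_nt <= 0 or volume_ul <= 0:
        raise ValueError("length and volume must be positive")
    molecular_weight = length_nt * g_per_mol_per_nt      # g/mol
    moles = mass_ng * 1e-9 / molecular_weight            # mol
    litres = volume_ul * 1e-6                            # L
    return moles / litres * 1e9                          # nM


# 100 ng of a ~2.2 kb ssDNA in a 10 µL reaction.
print(round(ssdna_nanomolar(100, 2200, 10), 1))
```

  • Under these assumptions the substrate is on the order of 14 nM, the same range as the 5-10 nM ssDNA cargo used in the transposition reactions.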
  • In vivo excision assays are also performed by co-transforming E. coli with 2 plasmids, one containing the LE-Kan-RE cargo and the other TnpA. Following transformation and overnight growth, excision is determined by mini-prep of overnight culture and detection of reclosed donor backbone molecules from which the Kan sequence has been removed on a DNA gel. Controls for this experiment include the transformation of a single plasmid or the transformation of both the TnpA-containing plasmid and the cargo plasmid with an inverted origin of replication. The excised DNA backbone is gel extracted and subjected to sequencing to yield the RE and LE boundaries of the TnpA transposon. The insertion motif remains in the excised backbone and can also be identified at the sealed junction.
  • Example 15 - TnpA can be used with sequence-specific endonucleases for programmable integrations
  • IS200/IS605 transposons are a type of mobile genetic element that integrate at specific target sites. These transposons are mobilized by their encoded TnpA-like transposase, an enzyme that belongs to the family of tyrosine (Y) transposases.
  • the mechanism of IS200/IS605 transposon mobilization involves its excision by TnpA or a TnpA-like protein, followed by its integration at a recognized target site during host replication, when target sites are accessible as ssDNA at the replication fork.
  • The RNA-guided binding ability of certain sequence-specific (e.g., Cas) endonuclease effectors to a target site that is shared with TnpA-like proteins may aid TnpA-like effector-mediated integration of a desired cargo by making ssDNA and target site available through formation of the R-loop.
  • a desired cargo (for example, a fluorescence marker gene flanked by TnpA-like-recognizable LE and RE) is excised from a donor template by TnpA or a TnpA-like effector and integrated into a desired target site (which contains the TnpA or TnpA-like protein recognizable motif) that is made available by the binding of a (fused) sequence-specific endonuclease.
  • the sequence-specific endonuclease may be engineered to be catalytically dead or have reduced or altered endonuclease (e.g., nickase) activity.
  • TnpA-like proteins can be “programmed” to insert a desired cargo into a TAM-dependent target site made available by fused, engineered (e.g., dead or nickase) sequence-specific endonuclease effectors.
  • The ability of TnpA-like proteins to insert into ssDNA generated as an R-loop in dsDNA can be tested using active TnpA-like proteins identified in vitro and their corresponding LE and RE sequences.
  • the R-loop can be generated via a sequence-specific endonuclease, such as an RNA-directed nuclease-dead enzyme or nickase that is expressed in an IVTT reaction or added as purified RNP.
  • the TnpA-like protein is tested as described in the in vitro insertion assay, except the target ssDNA is replaced by the dsDNA and RNP.
  • Insertion activity is assayed via PCR with a primer in the dsDNA target and the ssDNA cargo, flanking either the LE junction or the RE junction.
  • the optimal location of the insertion site is tested by placing the insertion motif at various positions along the R-loop to determine the site with best accessibility by the TnpA-like protein. Insertion into ssDNA bubbles in dsDNA where mismatched DNA strands are annealed can also be tested.
  • Example 17 - TnpA transposases derived from metagenomic data are active for ssDNA transposition in vitro
  • TnpB protein sequences were clustered at 90% amino acid identity (AAI) with coverage mode 1 and 80% coverage of the target sequence. Sequence representatives were chosen to build a multiple sequence alignment and a phylogenetic tree. A family of TnpA transposases associated with the TnpB nucleases with low sequence similarity to sequences in public databases was identified (MG179, SEQ ID NOs: 530-538), and the presence of the tyrosine (Y) catalytic residue for TnpA was confirmed from the protein alignment.
  • Covariance models were built from active IS200/IS605 transposon LE and RE sequences. Specifically, a multiple sequence alignment (MSA) of LE and RE sequences was built and the secondary structure of the alignment was inferred. Covariance models were built and genomic fragments containing candidate TnpA and TnpB were searched using the covariance models. Covariance models predicted LE and RE for many candidate IS200/IS605 insertion sequences (FIG. 10, SEQ ID NOs: 512-524).
  • TnpA candidates were first expressed in an in vitro transcription-translation (IVTT) kit following manufacturer’s recommended conditions at 37 °C for 2 hours with a minimum template concentration of 8 ng/µL.
  • Transposition assays were set up with 1 µL of IVTT product added per 10 µL reaction, 10 nM of ssDNA cargo (SEQ ID NOs: 471-477), and 50 nM of a “target” ssDNA in reaction buffer (20 mM HEPES (pH 7.5), 160 mM NaCl, 5 mM MgCl2, 5 mM TCEP, 20 µg/mL BSA, 0.5 µg/mL of poly-dIdC, and 20% glycerol).
  • the target DNA can contain a designed insertion site, such as TTAC or TCAT (SEQ ID NO: 497), or can contain a 4N random sequence for the insertion site (SEQ ID NO: 498), or can be a concatemer of various possible four-nucleotide combinations (SEQ ID NOs: 499-502).
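  • To make the size of the motif space concrete, the sketch below enumerates every possible four-nucleotide insertion site and joins a subset into a concatemer-style target. It is illustrative only: the helper names are hypothetical, and the concatemers of SEQ ID NOs: 499-502 are not reproduced or implied here.

```python
from itertools import product
from typing import List


def all_motifs(length: int = 4, alphabet: str = "ACGT") -> List[str]:
    """Enumerate every DNA motif of the given length in alphabetical order."""
    return ["".join(bases) for bases in product(alphabet, repeat=length)]


def concatemer(motifs: List[str], spacer: str = "") -> str:
    """Join motifs head to tail, optionally separated by a spacer sequence."""
    return spacer.join(motifs)


motifs_4n = all_motifs(4)
print(len(motifs_4n))                 # number of distinct 4N sites
print(concatemer(motifs_4n[:4]))      # first four motifs joined together
```

  • A 4N library therefore samples 256 possible sites, whereas an 8N randomized region, as used for the MG92 candidates, spans 4^8 = 65,536 sequences.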
  • Control reactions contained a no-template control (NTC) reaction of IVTT where Tris buffer was added instead of PCR template to the IVTT but still contained target and cargo ssDNA. Reactions were incubated at 37 °C for 1 hour to allow transposition to occur, then the reaction was diluted 10-fold in water and transposition was detected via PCR.
  • the LE junction was detected via a forward primer on the 5’ end of the target and reverse primer within the cargo, and the RE junction via a forward primer in the cargo and a reverse primer on the 3’ end of the target.
  • PCR products were run on an agarose gel to detect transposition (FIG. 11) and sequenced via Sanger sequencing. Chimeric reads that contained both target and cargo sequence were analyzed to determine the junction of transposition, the insertion motif, and the cleavage sites on the cargo (FIGS. 11-16).
  • Table 2 Experimentally validated active proteins and their insertion motifs.
  • Example 18 - TnpA insertion sites can be re-programmed
  • LE re-programming
  • To demonstrate LE re-programming for active MG TnpA candidates, known insertion motif sequences and the interacting non-canonical base pairs within the “guide sequence” of the LE were altered (Table 3) and tested for in vitro transposition according to the in vitro assay described in Example 17.
  • PCR amplification of the transposition product showed successful reprogramming for ELE3, changing the insertion motif from TTAT to TTTT (FIG. 17).
  • Successful reprogramming of the LE allows for more flexible targeting of the transposase, as the insertion motif could be designed to match the target sites of interest without requiring protein engineering.
  • Example 19 - TnpA can target ssDNA generated in an R-loop (prophetic)
  • in vitro transposition assays are performed with a double-stranded (ds) DNA target instead of ssDNA.
  • the target is composed of 2 annealed ultramers containing an internal insertion site and PAM sequence with variable spacing between the insertion site and PAM.
  • An sgRNA is used to direct dSpyCas9 to bind the ds ultramer target at the PAM to generate an ss-DNA R-loop.
  • TnpA can then insert into the R-loop, to be detected via PCR and sequencing of the chimeric product.
  • the dSpyCas9 RNP is prepared by pre-incubating 1 pM of protein and sgRNA in reaction buffer for 20 minutes at room temperature. Final transposition reactions with 50 nM dsDNA target. 5 nM ssDNA cargo, 250 nM of dSpyCas9 RNP, and 10% v/v IVTT TnpA in reaction buffer (20 mM HEPES (pH 7.5), 160 mM NaCl, 5 mM MgCl 2, 5 mM TCEP, 20 pg/rnl BSA, 0.5 pg/ml of poly-dldC, and 20% glycerol) are incubated at 37 °C for 1 hour.
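  • The final concentrations above translate into pipetting volumes through the usual dilution relation C1·V1 = C2·V2. The sketch below assumes the RNP is pre-incubated at 1 µM and diluted to 250 nM; the 10 µL reaction volume and the function name `volume_needed` are assumptions for illustration, since the text does not give a reaction volume for this example.

```python
def volume_needed(stock_nM, final_nM, reaction_uL):
    """Volume of stock (in µL) that gives final_nM in a reaction of
    reaction_uL, from C1 * V1 = C2 * V2."""
    if stock_nM <= 0:
        raise ValueError("stock concentration must be positive")
    if final_nM > stock_nM:
        raise ValueError("final concentration cannot exceed the stock")
    return final_nM / stock_nM * reaction_uL


REACTION_UL = 10.0  # assumed reaction volume, not stated for this example

# RNP pre-incubated at 1 µM (1000 nM), used at 250 nM final.
rnp_uL = volume_needed(1000, 250, REACTION_UL)

# IVTT TnpA is specified as 10% v/v rather than as a molar concentration.
ivtt_uL = 0.10 * REACTION_UL

print(f"RNP: {rnp_uL} µL, IVTT: {ivtt_uL} µL")
```

  • The target and cargo volumes follow the same relation once their stock concentrations are known; those stocks are not given in the text.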
  • Example 20 - TnpA is active for transposition in human cells
  • Lentiviruses were used to generate an engineered HEK293T cell line.
  • The cell line was engineered with multiple TnpA integration sites comprising insertion motifs adjacent to MG3-6 PAM sequences.
  • The integration site cassette is also known as the “landing pad” (FIG. 18; SEQ ID NO: 529).
  • The landing pad was flanked by a hygromycin selection marker to select for cells that contain the landing pad sequence.
  • TnpA candidates were codon optimized for mammalian cell expression (SEQ ID NO: 527) and cloned into two formats. In one format, TnpA was cloned as a fusion protein linked to MG3-6 (and a UGI; “fused”). In the other, TnpA was cloned into a plasmid by itself, with the nuclease (MG3-6) encoded separately on a second plasmid (“unfused”). Both designs were tested because, while a single-plasmid design may give higher efficiency, it is not known whether TnpA is active in a fusion construct.
  • The ssDNA substrates for the TnpAs to bind and integrate into the HEK293T landing pad cell line were designed to contain the LE sequence, nanoluciferase, P2A, a puromycin selection cassette, and the RE sequence (FIG. 19; SEQ ID NO: 528).
  • The LE and RE sequences are specific to each TnpA candidate and were determined using in silico prediction and confirmed with in vitro assays.
  • The nanoluciferase, driven by an EF1α promoter, serves as a reporter; the presence of luciferase enzyme is measured using a luminescence-based plate reader assay.
  • The puromycin resistance marker was used to select for cells that contain integrated copies of the cassette.
  • The single-stranded cargo was generated by ordering double-stranded cargo gene blocks from IDT. The gene blocks were then amplified using PCR with a standard forward primer and a 5’-phosphorylated reverse primer. The PCR products were cleaned up using a PCR Clean-up System and then digested using lambda exonuclease for 1 hour. The now single-stranded cargo was further purified using a clean and concentrator and then used in the assay.
  • Engineered TnpA landing pad HEK293T cells were transfected with either 200 ng of TnpA plasmid and 200 ng of MG3-6 plasmid, or 400 ng of the fused TnpA-MG3-6 plasmid, along with 0, 100, or 400 ng of ssDNA cargo. 10 pmol of MG3-6 guide RNA was transfected 30 minutes after the initial transfection; the no-guide conditions received a mock transfection with 0 pmol of guide.
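  • The transfection design above can be enumerated programmatically to check that no condition is missed. The sketch below assumes a full factorial layout (every plasmid format crossed with every cargo amount and both guide conditions); the text does not state that all combinations were run, and the dictionary keys and function name are hypothetical.

```python
from itertools import product

# Plasmid amounts in ng, as listed in the text.
PLASMID_FORMATS = {
    "separate": {"TnpA plasmid": 200, "MG3-6 plasmid": 200},
    "fused": {"fused TnpA-MG3-6 plasmid": 400},
}
CARGO_NG = (0, 100, 400)   # ssDNA cargo amounts
GUIDE_PMOL = (0, 10)       # 0 pmol = mock transfection (no-guide control)


def transfection_conditions():
    """Enumerate every combination of plasmid format, cargo amount,
    and guide amount (assumed full factorial design)."""
    conditions = []
    for (fmt, plasmids), cargo, guide in product(
        PLASMID_FORMATS.items(), CARGO_NG, GUIDE_PMOL
    ):
        conditions.append(
            {
                "format": fmt,
                "plasmids_ng": dict(plasmids),
                "cargo_ng": cargo,
                "guide_pmol": guide,
            }
        )
    return conditions


conditions = transfection_conditions()
for c in conditions:
    print(c["format"], c["cargo_ng"], c["guide_pmol"])
```

  • Both formats deliver 400 ng of plasmid in total, which keeps the DNA load comparable between the fused and separate designs.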
  • Example 21 - TnpA can selectively integrate into an R-loop
  • TnpA candidates were first expressed in an in vitro transcription-translation (IVTT) kit following the manufacturer’s recommended conditions at 37 °C for 2 hours with a minimum template concentration of 8 ng/µL.
  • R-loop transposition assays were set up with 1 µL of IVTT product added per 10 µL reaction, 10 nM of ssDNA cargo (SEQ ID NOs: 473, 475-477), and 50 nM of a “target” dsDNA (SEQ ID NOs: 539-550) in reaction buffer (20 mM HEPES (pH 7.5), 160 mM NaCl, 5 mM MgCl2, 5 mM TCEP, 20 µg/ml BSA, 0.5 µg/ml of poly-dIdC, and 20% glycerol), together with a ribonucleoprotein (RNP) complex consisting of 1.25 µM endonuclease-dead Streptococcus pyogenes Cas9 (dSpyCas9) protein and 5 µM of a single guide RNA (sgRNA, SEQ ID NOs: 551-562).
  • sgRNAs were in vitro transcribed (IVTed) from DNA templates using a kit and following the manufacturer’s recommended conditions: incubation at 37 °C for 16 hours with a minimum template concentration of 20 ng/µL.
  • The insertion reaction target DNA contains a designed insertion site, such as TTAT or TCAT (SEQ ID NOs: 539-550), at a variable distance (2-12 bp) from the PAM of dSpyCas9. Control reactions contained no cargo, no template, or no sgRNA, but still contained the IVTTed TnpA of interest.
  • Reactions were incubated at 37 °C for 1 hour to allow transposition to occur, then the reaction was diluted 10-fold in water and transposition was detected via PCR.
  • The LE junction was detected via a forward primer on the 5’ end of the target and a reverse primer within the cargo, and the RE junction was detected via a forward primer in the cargo and a reverse primer on the 3’ end of the target.
  • PCR products were run on an agarose gel to detect transposition (FIGs. 21-23, and 24A) and sequenced via Sanger sequencing as exemplified in FIG. 24B. Chimeric reads that contained both target and cargo sequence were analyzed to determine the junction of transposition, the insertion motif, and the cleavage sites on the cargo (FIG. 24B).
  • Proteins MG92-4 (FIG. 21), MG179-36 (FIG. 22), MG179-58 (FIG. 23), and MG179-60 (FIG. 24A) show active transposition into R-loops.
  • MG92-4 prefers insertion when the insertion motif is 10-12 bases upstream of the PAM (FIG. 21, lanes 9 and 10).
  • MG179-36 also prefers insertion at position 10 or 12 in the R-loop (FIG. 22, lanes 9 and 10).
  • MG179-58 prefers insertion at position 8 in the R-loop (FIG. 23, lane 8).
  • MG179-60 prefers insertion at position 8 or 10 in the R-loop (FIG. 24A, lanes 8 and 9).
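  • The target design used in this example, an insertion motif placed at a variable distance from the dSpyCas9 PAM, can be illustrated with a short sketch. Everything below is a simplified assumption: the filler bases, the 20-nt protospacer length, the choice of TGG as an NGG PAM, the convention that distance is counted from the 3’ end of the motif to the PAM, and the function name `build_target` are all hypothetical and do not reproduce SEQ ID NOs: 539-550.

```python
def build_target(motif, distance, pam="TGG", protospacer_len=20):
    """Build a toy protospacer + PAM with the motif ending `distance`
    bases upstream of the PAM. Filler bases are arbitrary (all C)."""
    left = protospacer_len - distance - len(motif)
    if left < 0:
        raise ValueError("motif does not fit inside the protospacer")
    return "C" * left + motif + "C" * distance + pam


# Even spacings from 2 to 12 bp; the step of 2 is an assumption.
designs = {d: build_target("TTAT", d) for d in range(2, 13, 2)}
for distance, seq in designs.items():
    print(distance, seq)
```

  • Keeping the motif inside the protospacer places it within the region that dSpyCas9 is expected to unwind, which is the premise of the R-loop assay.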
  • Example 22 - Left end sequence can be re-programmed to alternate insertion sites
  • The in vitro assay described above (Example 21) was modified to test the targeting flexibility of the LE motif.
  • The LE contains the insertion/cleavage motif followed by a conserved hairpin 15-20 nt downstream. Adjacent to the hairpin is another 4 nt motif which forms non-canonical base pairs with the cleavage motif.
  • The transposition product was amplified via PCR and sequenced via NGS to determine the frequency of the various LEs that were transposition-competent (FIGs. 26A-26B, 27A-27B, 28A-28C).
  • MG92-4 does not have targeting flexibility in the 3rd nucleotide position of the cleavage motif, as only the WT cleavage motif TTAT was active (FIG. 26A). However, an activating mutation in the LE HAM enhances transposition activity: AAGA is more active than the WT AATA (FIG. 26A). In the 4th position, MG92-4 shows activity with all 3 altered cleavage motifs of TTAT, TTAC, and TTAG when paired with the complementary base mutation in the HAM. TTAC is the most active cleavage motif, with higher activity than WT TTAT (FIG. 26B).
  • MG179 proteins followed a similar trend in the 3rd position, where the WT “A” is preferred in the cleavage motif but alternate bases in the HAM enhance activity (FIGs. 27A-27B).
  • GATA is an activating mutation over the WT.
  • MG179-36 has higher activity with an alternate motif TCAA>TCAT when paired with the complementary base mutation in the HAM (FIG. 28A).
  • MG179-58 prefers the WT motif but also has weak activity with TCAA when paired with the complementary base mutation in the HAM (FIG. 28B).
  • MG179-60 prefers the WT motif but has activity with all 4 possible bases in the 4th position when paired with the complementary base mutation in the HAM where TCAT>TCAA>TCAC>TCAG (FIG. 28C).
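  • The paired cleavage-motif and HAM variants tested in this example can be enumerated with a short sketch. Two assumptions are made here that the text does not confirm: that position i of the HAM pairs with position i of the cleavage motif, and that the “complementary base mutation” is the Watson-Crick complement of the new cleavage-motif base. The WT pair TTAT/AATA for MG92-4 is taken from the text; the function name `paired_variants` is hypothetical.

```python
# Watson-Crick complements (assumed pairing rule, see lead-in).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def paired_variants(cleavage_wt, ham_wt, position):
    """Return (cleavage motif, HAM) pairs in which the base at the given
    1-based position of the cleavage motif is changed and the HAM base at
    the same position is set to its complement (assumed pairing)."""
    if len(cleavage_wt) != len(ham_wt):
        raise ValueError("motif and HAM must have the same length")
    i = position - 1
    variants = []
    for base in "ACGT":
        if base == cleavage_wt[i]:
            continue  # skip the wild-type base
        cleavage = cleavage_wt[:i] + base + cleavage_wt[i + 1:]
        ham = ham_wt[:i] + COMPLEMENT[base] + ham_wt[i + 1:]
        variants.append((cleavage, ham))
    return variants


# MG92-4 wild-type pair from the text, varied at the 4th position.
variants = paired_variants("TTAT", "AATA", 4)
for cleavage, ham in variants:
    print(cleavage, ham)
```

  • Because the motif and HAM are described as forming non-canonical base pairs, the actual compensating bases may differ from this simple rule; the sketch only shows how such a variant panel could be laid out.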

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Zoology (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Mycology (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Plant Pathology (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)
  • Enzymes And Modification Thereof (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

The present disclosure provides systems and methods for transposing a cargo nucleotide sequence to a target nucleic acid site. These systems and methods may comprise a first double-stranded nucleic acid comprising the cargo nucleotide sequence, the cargo nucleotide sequence being configured to interact with a transposase, and said transposase being configured to transpose the cargo nucleotide sequence to the target nucleic acid site.
PCT/US2024/019145 2023-03-08 2024-03-08 Systems and methods for transposing cargo nucleotide sequences Pending WO2024187119A2 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202363489160P 2023-03-08 2023-03-08
US63/489,160 2023-03-08
US202363505960P 2023-06-02 2023-06-02
US63/505,960 2023-06-02

Publications (2)

Publication Number Publication Date
WO2024187119A2 true WO2024187119A2 (fr) 2024-09-12
WO2024187119A3 WO2024187119A3 (fr) 2025-01-16

Family

ID=92675724

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/019145 Pending WO2024187119A2 (fr) 2023-03-08 2024-03-08 Systems and methods for transposing cargo nucleotide sequences

Country Status (1)

Country Link
WO (1) WO2024187119A2 (fr)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220081692A1 (en) * 2020-09-05 2022-03-17 Verne A. Luckow Combinatorial Assembly of Composite Arrays of Site-Specific Synthetic Transposons Inserted Into Sequences Comprising Novel Target Sites in Modular Prokaryotic and Eukaryotic Vectors
MX2023003436A (es) * 2020-09-24 2023-04-14 Metagenomi Inc Systems and methods for transposing cargo nucleotide sequences.

Also Published As

Publication number Publication date
WO2024187119A3 (fr) 2025-01-16

Similar Documents

Publication Publication Date Title
US12123014B2 (en) Class II, type V CRISPR systems
US20240327871A1 (en) Systems and methods for transposing cargo nucleotide sequences
US20240287484A1 (en) Systems, compositions, and methods involving retrotransposons and functional fragments thereof
WO2024026499A2 (fr) Class II, type V CRISPR systems
WO2024233984A2 (fr) Systems and methods for transposing cargo nucleotide sequences
WO2024102667A2 (fr) Serine recombinases for gene editing
AU2023364078A1 (en) Gene editing systems comprising reverse transcriptases
WO2024187119A2 (fr) Systems and methods for transposing cargo nucleotide sequences
EP4482971A2 (fr) Systems and methods for transposing cargo nucleotide sequences
WO2023164593A2 (fr) Systems and methods for transposing cargo nucleotide sequences
WO2024055012A1 (fr) Systems and methods for transposing cargo nucleotide sequences
WO2024055013A1 (fr) Systems and methods for transposing cargo nucleotide sequences
EP4630544A2 (fr) Retrotransposon compositions and methods of use
WO2024187140A2 (fr) Class 2, type V CRISPR systems
US20240360477A1 (en) Systems and methods for transposing cargo nucleotide sequences
US20250059568A1 (en) Class ii, type v crispr systems
KR20250175370A (ko) Class 2, type V CRISPR systems
WO2024124197A2 (fr) Retrotransposon compositions and methods of use
WO2024086661A2 (fr) Gene editing systems comprising reverse transcriptases
AU2023226059A1 (en) Fusion proteins
WO2024229449A2 (fr) Base editing enzymes
KR20250153814A (ko) Enzymes with RuvC domains

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE