WO2025038989A1 - RNA-guided genome recombineering at kilobase scale - Google Patents

Info

Publication number
WO2025038989A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
protein
composition
cell
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/042871
Other languages
French (fr)
Inventor
Le Cong
Chengkun WANG
Yuanhao QU
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leland Stanford Junior University
Original Assignee
Leland Stanford Junior University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Leland Stanford Junior University
Publication of WO2025038989A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907 Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K48/00 Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
    • A61K48/005 Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy characterised by an aspect of the 'active' part of the composition delivered, i.e. the nucleic acid delivered
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]

Definitions

  • the present invention relates to RNA-guided recombineering-editing systems using phage recombination enzymes as well as methods, vectors, nucleic acid compositions, and kits thereof.
  • STDU2-42312.601 S22-113 BACKGROUND OF THE INVENTION
  • ERF proteins form a family that shows no evolutionary relationship to other single-stranded annealing proteins (SSAPs) such as RecT and Redβ.
  • the ERF proteins often function with an exonuclease (e.g., phage D3 exonuclease, also known as orf51) in viral two-component recombinases.
  • the recombination protein may be a single stranded DNA annealing protein (SSAP), including but not limited to a microbial recombination protein, for example D3 ERF (orf52), D3 Exo (orf51), RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof.
  • the system further comprises a donor nucleic acid, including but not limited to donor DNA or donor RNA.
  • the system further comprises a nucleic acid polymerase, such as without limitation, a reverse transcriptase.
  • the target DNA sequence is a genomic DNA sequence in a host cell.
  • the system comprises a recruitment system which recruits the recombination protein and a nucleic acid that directs the recombination protein to a target.
  • the recruitment system recruits the recombination protein, the nucleic acid that directs the recombination protein, and a CRISPR component.
  • the invention provides a recombination system or composition comprising (i) a Cas protein; (ii) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence; and (iii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iv) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii) for expression in vivo in a cell; or, (v) vector(s) containing the nucleic acid molecule(s) of (iii) for expression in vivo in a cell.
  • SSAP single stranded DNA annealing protein
  • SSB single stranded DNA binding protein
  • the recombination protein comprises an amino acid sequence with at least 70% similarity or identity, or at least 75% similarity or identity, or at least 80% similarity or identity, or at least 85% similarity or identity, or at least 90% similarity or identity, or at least 92% similarity or identity, or at least 95% similarity or identity, or at least 96% similarity or identity, or at least 97% similarity or identity, or at least 98% similarity or identity, or at least 99% similarity or identity, or is identical to SEQ ID NO:779; SEQ ID NO:766; SEQ ID NO:767; SEQ ID NO:768; SEQ ID NO:769; SEQ ID NO:770; SEQ ID NO:771; SEQ ID NO:772; SEQ ID NO:773; SEQ ID NO:774; SEQ ID NO:775; SEQ ID NO:776; SEQ ID NO:777; SEQ ID NO:778; SEQ ID NO:780; SEQ ID NO:781; SEQ ID NO:
  • the system or composition comprises a recruitment system for recruiting a guide nucleic acid and a recombination protein.
  • the recruitment system comprises at least one aptamer sequence and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
  • the at least one aptamer sequence is an RNA aptamer sequence or a peptide aptamer sequence.
  • the nucleic acid molecule or nucleic acid molecules additionally comprises the at least one RNA aptamer sequence or comprises one, two, three, or more RNA aptamer sequences.
  • two aptamer sequences comprise the same sequence or comprise sequences that bind to the same aptamer binding protein.
  • the aptamer binding protein comprises an MS2 coat protein, or a functional derivative or variant thereof.
  • the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof.
  • the at least one peptide aptamer sequence is conjugated to the guide RNA.
  • the at least one peptide aptamer sequence comprises between 1 and 24 peptide aptamer sequences.
  • two or more aptamer sequences comprise the same sequence.
  • an aptamer sequence comprises a GCN4 peptide sequence.
  • the recombination protein N-terminus is linked to the aptamer binding protein C-terminus.
  • the recombination protein and the aptamer binding protein are operably linked by a linker.
  • the recombination system or composition comprises at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is linked to the recombination protein.
  • the NLS is located at the recombination protein C-terminus or at the recombination protein N-terminus.
  • the recombination protein comprises a microbial recombination protein or active portion thereof, a mitochondrial recombination protein or active portion thereof, a viral recombination protein or active portion thereof, or a eukaryotic recombination protein or active portion thereof, including without limitation, a recombination protein set forth in Table 12 or derivative or variant or functional portion thereof.
  • the recombination protein comprises an amino acid sequence with at least 70% identity, or at least 75% identity, or at least 80% identity, or at least 85% identity, or at least 90% identity, or at least 92% identity, or at least 95% identity, or at least 96% identity, or at least 97% identity, or at least 98% identity, or at least 99% identity to a recombination protein set forth in Table 12 or derivative or variant or functional portion thereof.
  • the system or composition comprises a donor nucleic acid.
  • the donor nucleic acid comprises a single stranded nucleic acid which can comprise RNA and/or DNA and/or modified nucleotides.
  • the donor nucleic acid comprises a double stranded nucleic acid which can comprise RNA and/or modified nucleotides. In certain embodiments, the donor nucleic acid comprises RNA. In certain embodiments, the donor nucleic acid comprises DNA. In certain embodiments, the donor nucleic acid comprises homology arms. In certain embodiments, the target DNA sequence comprises a genomic DNA sequence in a host cell. In certain embodiments, the target DNA sequence comprises a mitochondrial DNA or a plastid or chloroplast DNA in a host cell. In certain embodiments, the target DNA is an episomal or viral nucleic acid sequence in a host cell.
  • the recombination system is comprised in a cell, for example, a eukaryotic cell, a mammalian cell, an animal cell, a human cell, or a plant cell.
  • the recruitment system is adaptable to a multitude of combinations and configurations of recombination proteins. For example, by selecting and incorporating multiple nucleic acid aptamers, the system can comprise multiple recombination proteins, which may be the same or different and in various ratios.
  • the system comprises an exonuclease.
  • the system comprises an SSAP.
  • the system comprises an SSB.
  • the system comprises an exonuclease and an SSAP. In certain embodiments, the system comprises an exonuclease and an SSB. In certain embodiments, the system comprises an SSAP and an SSB. In certain embodiments, the system comprises an exonuclease and an SSAP and does not comprise an SSB. In certain embodiments, the system comprises an exonuclease and an SSB and does not comprise an SSAP. In certain embodiments, the system comprises an SSAP and an SSB and does not comprise an exonuclease. In certain embodiments, the system comprises an exonuclease, an SSAP, and an SSB.
  • the invention provides a recombination system comprising a recombination protein and a nucleic acid polymerase, including but not limited to a reverse transcriptase (RT).
  • a reverse transcriptase RT
  • the invention provides a system or composition comprising: (i) a reverse transcriptase(s) (RT); (ii) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription, or nucleic acid molecules comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription; and (iii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iv) nucleic acid molecule(s
  • the recombination protein comprises an amino acid sequence with at least 70% similarity or identity, or at least 75% similarity or identity, or at least 80% similarity or identity, or at least 85% similarity or identity, or at least 90% similarity or identity, or at least 92% similarity or identity, or at least 95% similarity or identity, or at least 96% similarity or identity, or at least 97% similarity or identity, or at least 98% similarity or identity, or at least 99% similarity or identity, or is identical to SEQ ID NO:779; SEQ ID NO:766; SEQ ID NO:767; SEQ ID NO:768; SEQ ID NO:769; SEQ ID NO:770; SEQ ID NO:771; SEQ ID NO:772; SEQ ID NO:773; SEQ ID NO:774; SEQ ID NO:775; SEQ ID NO:776; SEQ ID NO:777; SEQ ID NO:778; SEQ ID NO:780; SEQ ID NO:779; SEQ ID NO
  • the nucleic acid polymerase comprises a reverse transcriptase, such as but not limited to a reverse transcriptase which comprises an amino acid sequence having at least 70% similarity or 70% identity or at least 75% similarity or identity, or at least 80% similarity or identity, or at least 85% similarity or identity, or at least 90% similarity or identity, or at least 92% similarity or identity, or at least 95% similarity or identity, or at least 96% similarity or identity, or at least 97% similarity or identity, or at least 98% similarity or identity, or at least 99% similarity or identity, or is identical to the reverse transcriptase of any one of SEQ ID NO:627 to SEQ ID NO:755.
  • the recombination system comprises a recombination protein and a prime editor.
  • prime editors include prime editor 1, which comprises a wild-type Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase fused to the Cas9 H840A nickase C-terminus, prime editor 2, which comprises mutant M-MLV RT (D200N/L603W/T330P/T306K/W313F) fused to the Cas9 H840A nickase C-terminus, prime editor 3, which comprises a nicking guide to nick the unedited strand in a prime editor system, prime editor systems which comprise a component that knocks down endogenous mismatch repair, prime editor systems that comprise Cas9 nuclease instead of Cas9 nickase, and twin prime editors.
  • M-MLV Moloney Murine Leukemia Virus
  • the system or composition further comprises a Cas protein; or (iv) comprises nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or a Cas protein for expression in vivo in a cell; or the vector(s) of (v) additionally contains nucleic acid molecule(s) encoding a Cas protein.
  • one or more of the components is provided as a complex.
  • a protein or a fusion protein and a nucleic acid are provided as a ribonucleoprotein (RNP).
  • Non-limiting examples of an RNP include a CRISPR-guide RNA complex, and an SSAP-guide RNA complex.
  • a fusion protein comprises one or more components. Non-limiting examples include a Cas9-SSAP fusion, a Cas9-RT fusion, and a SSAP-RT fusion.
  • the system or composition comprises a recruitment system for recruiting a guide nucleic acid and a recombination protein. In certain embodiments, the recruitment system comprises at least one aptamer sequence and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
  • the at least one aptamer sequence is an RNA aptamer sequence or a peptide aptamer sequence.
  • the nucleic acid molecule or nucleic acid molecules additionally comprises the at least one RNA aptamer sequence or comprises one, two, three, or more RNA aptamer sequences.
  • two aptamer sequences comprise the same sequence or comprise sequences that bind to the same aptamer binding protein.
  • the aptamer binding protein comprises an MS2 coat protein, or a functional derivative or variant thereof.
  • the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof.
  • the at least one peptide aptamer sequence is conjugated to the guide RNA. In certain embodiments, the at least one peptide aptamer sequence comprises between 1 and 24 peptide aptamer sequences. In certain embodiments, two or more aptamer sequences comprise the same sequence. In certain embodiments, an aptamer sequence comprises a GCN4 peptide sequence.
  • the recombination protein N-terminus is linked to the aptamer binding protein C-terminus. In certain embodiments, the recombination protein and the aptamer binding protein are operably linked by a linker.
  • the recombination system or composition comprises at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is linked to the recombination protein.
  • NLS nuclear localization sequence
  • the NLS is located at the recombination protein C-terminus or at the recombination protein N-terminus.
  • the recombination protein comprises a microbial recombination protein or active portion thereof, a mitochondrial recombination protein or active portion thereof, a viral recombination protein or active portion thereof, or a eukaryotic recombination protein or active portion thereof, including without limitation, a recombination protein set forth in Table 12 or derivative or variant or functional portion thereof.
  • the recombination protein comprises an amino acid sequence with at least 70% identity, or at least 75% identity, or at least 80% identity, or at least 85% identity, or at least 90% identity, or at least 92% identity, or at least 95% identity, or at least 96% identity, or at least 97% identity, or at least 98% identity, or at least 99% identity to a recombination protein set forth in Table 12 or derivative or variant or functional portion thereof.
  • the system or composition comprises a donor nucleic acid.
  • the donor nucleic acid comprises homology arms.
  • the recombination system is comprised in a cell, for example, a eukaryotic cell, a mammalian cell, an animal cell, a human cell, or a plant cell.
  • the invention provides a system or composition comprising: (i) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence; and (ii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iii) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) for expression in vivo in a cell; or, (iv) vector(s) containing the nucleic acid molecule(s) of (iii) for expression
  • the system or composition does not comprise a CRISPR protein, or does not comprise a Cas protein, or does not comprise a Cas9 protein, or does not comprise a Cas12a protein.
  • the invention provides a method of recombination, which comprises providing in a cell, a system or composition, (i) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence; wherein the target DNA sequence comprises a genomic DNA sequence in the cell, and (ii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iii) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) for expression in vivo in a cell;
  • (i) and (ii) further comprise a Cas protein or a nucleic acid polymerase, including but not limited to a native or engineered polymerase having reverse transcriptase activity such as a reverse transcriptase (RT) or a Cas protein and RT; or (iii) comprises nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or a Cas protein and/or an RT for expression in vivo in the cell; or the vector(s) of (iv) additionally contains nucleic acid molecule(s) encoding a Cas protein and/or RT.
  • RT reverse transcriptase
  • the recombination protein comprises an amino acid sequence with at least 70% similarity or identity, or at least 75% similarity or identity, or at least 80% similarity or identity, or at least 85% similarity or identity, or at least 90% similarity or identity, or at least 92% similarity or identity, or at least 95% similarity or identity, or at least 96% similarity or identity, or at least 97% similarity or identity, or at least 98% similarity or identity, or at least 99% similarity or identity, or is identical to SEQ ID NO:779; SEQ ID NO:766; SEQ ID NO:767; SEQ ID NO:768; SEQ ID NO:769; SEQ ID NO:770; SEQ ID NO:771; SEQ ID NO:772; SEQ ID NO:773; SEQ ID NO:774; SEQ ID NO:775; SEQ ID NO:776; SEQ ID NO:777; SEQ ID NO:778; SEQ ID NO:780; SEQ ID NO:779; SEQ ID NO
  • the nucleic acid polymerase comprises a reverse transcriptase, such as but not limited to a reverse transcriptase which comprises an amino acid sequence having at least 70% similarity or 70% identity or at least 75% similarity or identity, or at least 80% similarity or identity, or at least 85% similarity or identity, or at least 90% similarity or identity, or at least 92% similarity or identity, or at least 95% similarity or identity, or at least 96% similarity or identity, or at least 97% similarity or identity, or at least 98% similarity or identity, or at least 99% similarity or identity, or is identical to the reverse transcriptase of any one of SEQ ID NO:627 to SEQ ID NO:755.
  • one or more of the components is provided as a complex.
  • a protein or a fusion protein and a nucleic acid are provided as a ribonucleoprotein (RNP).
  • RNP ribonucleoprotein
  • Non-limiting examples of an RNP include a CRISPR-guide RNA complex, and an SSAP-guide RNA complex.
  • a fusion protein comprises one or more components. Non-limiting examples include a Cas9-SSAP fusion, a Cas9-RT fusion, and a SSAP-RT fusion.
  • the target DNA sequence comprises a genomic sequence of albumin (ALB), AAVS1, HSP90AA1, DYNLT1, ACTB, BCAP31, HIST1H2BK, CLTA, or RAB11A.
  • the system or composition comprises a recruitment system for recruiting a guide nucleic acid and a recombination protein.
  • the recruitment system comprises at least one aptamer sequence and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
  • the at least one aptamer sequence is an RNA aptamer sequence or a peptide aptamer sequence.
  • the nucleic acid molecule or nucleic acid molecules additionally comprises the at least one RNA aptamer sequence or comprises one, two, three, or more RNA aptamer sequences. In certain embodiments, two aptamer sequences comprise the same sequence or comprise sequences that bind to the same aptamer binding protein. In certain embodiments, the aptamer binding protein comprises an MS2 coat protein, or a functional derivative or variant thereof. In certain embodiments, the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof. In certain embodiments, the at least one peptide aptamer sequence is conjugated to the guide RNA.
  • the at least one peptide aptamer sequence comprises between 1 and 24 peptide aptamer sequences.
  • two or more aptamer sequences comprise the same sequence.
  • an aptamer sequence comprises a GCN4 peptide sequence.
  • the recombination protein N-terminus is linked to the aptamer binding protein C-terminus.
  • the recombination protein and the aptamer binding protein are operably linked by a linker.
  • the linker comprises 39115.
  • the recombination system or composition comprises at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is linked to the recombination protein.
  • the NLS comprises the amino acid sequence of SEQ ID NO:16.
  • the NLS is located at the recombination protein C-terminus or at the recombination protein N-terminus.
  • the recombination protein comprises a microbial recombination protein or active portion thereof, a mitochondrial recombination protein or active portion thereof, a viral recombination protein or active portion thereof, or a eukaryotic recombination protein or active portion thereof, including without limitation, a recombination protein set forth in Table 12 or derivative or variant or functional portion thereof.
  • the recombination protein comprises an amino acid sequence with at least 70% identity, or at least 75% identity, or at least 80% identity, or at least 85% identity, or at least 90% identity, or at least 92% identity, or at least 95% identity, or at least 96% identity, or at least 97% identity, or at least 98% identity, or at least 99% identity to a recombination protein set forth in Table 12 or derivative or variant or functional portion thereof.
  • the system or composition comprises a donor nucleic acid.
  • the donor nucleic acid comprises homology arms.
  • the recombination system is comprised in a cell, for example, a eukaryotic cell, a mammalian cell, an animal cell, a human cell, or a plant cell.
  • the Cas protein is Cas9 or Cas12a.
  • the Cas protein is catalytically dead.
  • the Cas9 protein is wild-type Streptococcus pyogenes Cas9 or a wild-type Staphylococcus aureus Cas9.
  • the Cas9 protein is a Cas9 nickase (e.g., wild-type Streptococcus pyogenes Cas9 with the amino acid substitution D10A at position 10).
  • a eukaryotic cell comprising the systems or vectors disclosed herein.
  • methods of altering a target genomic DNA sequence in a host cell comprise contacting the systems, compositions, or vectors described herein with a target DNA sequence (e.g., introducing the systems, compositions, or vectors described herein into a host cell comprising a target genomic DNA sequence).
  • Kits containing one or more reagents or other components useful, necessary, or sufficient for practicing any of the methods are also disclosed herein.
  • the invention provides a system or composition comprising: (i) a nucleic acid polymerase, such as a reverse transcriptase(s) (RT); (ii) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription, or nucleic acid molecules comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription; and (iii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iv) nucleic acid molecule(s) encoding or delivering (i), (ii) and (iii
  • the RT system or composition can involve (i) being enzyme, (ii) being nucleic acid molecule(s), and (iii) being nucleic acid molecules; or (i) being nucleic acid molecule(s) encoding the enzyme(s), (ii) being nucleic acid molecule(s), and (iii) being protein, or all of (i), (ii) and (iii) being nucleic acid molecules.
  • the RT system or composition can include more than one reverse transcriptase. When there is more than one reverse transcriptase there can be more than one RNA for reverse transcription.
  • composition (i), (ii) and (iii) further comprises a Cas protein; or (iv) further comprises nucleic acid molecule(s) encoding a Cas protein, e.g., (iv) comprises nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii) and/or a Cas protein for expression in vivo in a cell; or the vector(s) of (v) additionally contain nucleic acid molecule(s) encoding a Cas protein.
  • Reverse transcriptases that can be used according to the inventions herein include, without limitation, reverse transcriptases, retrotransposon reverse transcriptases, retron reverse transcriptases, LINE-1 reverse transcriptase, Ec86 reverse transcriptase, Human immunodeficiency virus (HIV) RT, Moloney murine leukemia virus (M-MLV) RT, a group II intron RT, a group II intron-like RT, a chimeric RT, Moloney murine leukemia virus (M-MLV) reverse transcriptase, Rous sarcoma virus (RSV) reverse transcriptase, avian myeloblastosis virus (AMV) reverse transcriptase, Rous-associated virus (RAV) reverse transcriptase and myeloblastosis-associated virus (MAV) reverse transcriptase or other Avian Sarcoma leukovirus (Avian sarcoma leukosis
  • Such engineered polymerases include, with limitation, human DNA polymerase ⁇ which has reverse transcriptase activity in cellular environments (Su et al. 2019, J. Biol. Chem. 294(15):6073-81), and Taq DNA polymerase engineered to enhance reverse transcription and strand displacement (Barnes et el., Front. Bioeng.
  • telomerase reverse transcriptase and related reverse transcriptases that are eukaryotic polymerase genes that do not represent a component of any mobile element or virus, or cellular single-copy rvt reverse transcriptase and related reverse transcriptases with similar domain structure, or R2 mobile element or R2 retrotransposable element reverse transcriptase (R2 RT), or engineered phage or prokaryotic polymerases including derivatives of DNA polymerases with reverse transcriptase activities, or chimeric reverse transcriptases that are engineered by fusion, with or without a peptide linker, between two reverse transcriptases can be used.
  • TERT telomerase reverse transcriptase
  • R2 RT retrotransposable element reverse transcriptase
  • a chimeric or fusion reverse transcriptase will have the N-terminus from one reverse transcriptase and the C-terminus from another reverse transcriptase.
  • one type of fusion or chimeric reverse transcriptase consists of N-terminal polymerase domain of one reverse transcriptase and the C-terminal RNaseH domain of another reverse transcriptase, with the fusion site either before, within, or after the connection domain (originally located between the polymerase domain and the RNaseH domain).
  • Reverse transcriptases further include, without limitation, those which comprise an amino acid sequence having at least 70% similarity or identity, or at least 75% similarity or identity, or at least 80% similarity or identity, or at least 85% similarity or identity, or at least 90% similarity or identity, or at least 92% similarity or identity, or at least 95% similarity or identity, or at least 96% similarity or identity, or at least 97% similarity or identity, or at least 98% similarity or identity, or at least 99% similarity or identity, or are identical to a reverse transcriptase comprised in any one of SEQ ID NO:627 to SEQ ID NO:755.
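The identity thresholds recited above can be illustrated with a small calculation. The following is a minimal sketch only: the text does not specify how percent identity is computed, so the sketch assumes two sequences that have already been aligned (gaps written as "-") and assumes identity is counted as identical residues divided by aligned columns. The function names `percent_identity` and `meets_threshold` are hypothetical and are not taken from the specification.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding identical residues.

    Assumption: inputs are pre-aligned strings of equal length, with '-'
    marking gaps. Columns where both sequences are gapped are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A gap paired with a residue counts as a non-identical column.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the aligned pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other conventions (for example, dividing by the length of the shorter sequence, or scoring similarity with a substitution matrix) would give different numbers, so any threshold comparison should state which convention is used.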
  • the RT system or composition further comprises a recruitment system comprising at least one aptamer sequence; and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
  • the at least one aptamer sequence is an RNA aptamer sequence or a peptide aptamer sequence.
  • the RT system or composition having a recruitment system has a nucleic acid molecule or nucleic acid molecules that additionally comprise the at least one RNA aptamer sequence, such as a nucleic acid molecule or nucleic acid molecules comprising two RNA aptamer sequences; for instance, wherein the two RNA aptamer sequences comprise the same sequence.
  • the RT system or composition having a recruitment system has the aptamer binding protein comprising an MS2 coat protein, or a functional derivative or variant thereof; and/or the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof; and/or the at least one peptide aptamer sequence is conjugated to the Cas protein; and/or the at least one peptide aptamer sequence comprises between 1 and 24 peptide aptamer sequences; and/or the aptamer sequences comprise the same sequence.
  • the RT system or composition having a recruitment system has the aptamer sequence comprising a GCN4 peptide sequence.
  • the recombination protein N-terminus is linked to the aptamer binding protein C-terminus; and in some embodiments, the RT system or composition further comprises a linker between the recombination protein and the aptamer binding protein; for instance, in some embodiments, the linker comprises the amino acid sequence of SEQ ID NO:15.
  • the system or composition includes at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is/are linked to the recombination protein, the Cas protein, or the reverse transcriptase, or wherein at least one NLS is on each of at least two or all three of the recombination protein, the reverse transcriptase, and the Cas protein; for instance, the nuclear localization sequence in some embodiments comprises the amino acid sequence of SEQ ID NO:16.
  • the nuclear localization sequence is on the C-terminus of the recombination protein or the Cas protein.
  • the recombination protein comprises a recombination protein or active portion thereof. In some embodiments of the RT system or composition, the recombination protein comprises a mitochondrial recombination protein or active portion thereof. In some embodiments of the RT system or composition, the recombination protein comprises a viral recombination protein or active portion thereof. In some embodiments of the RT system or composition, the recombination protein comprises a recombination protein or active portion thereof. In some embodiments of the RT system or composition, the recombination protein comprises RecE or RecT or RecE and RecT or derivative or variant or functional portion thereof.
  • the RecE, or derivative or variant thereof comprises an amino acid sequence with at least 70% (or any whole number integer from 70 to 100% e.g., at least 71%, 72%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%) similarity or identity or homology to an amino acid sequence selected from the group consisting of SEQ ID NOs:1-8.
  • the fusion protein comprises RecT, or derivative or variant thereof.
  • the RecT, or derivative or variant thereof comprises an amino acid sequence with at least 70% (or any whole number integer from 70 to 100% e.g., at least 71%, 72%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%) similarity or identity or homology to an amino acid sequence selected from the group consisting of SEQ ID NOs:9-14.
  • the Cas protein is catalytically inactive (less than 5% nuclease activity as compared with a wild-type or non-mutated version of the Cas protein) or catalytically dead.
  • the Cas protein comprises Cas9 or Cas12a.
  • the Cas9 protein comprises wild-type Streptococcus pyogenes Cas9 or a wild-type Staphylococcus aureus Cas9.
  • the Cas protein comprises a nickase.
  • the nickase comprises wild-type Streptococcus pyogenes Cas9 with an amino acid substitution at position 10 of D10A.
  • the RT system or composition further comprises donor nucleic acid.
  • the target DNA sequence is a genomic DNA sequence in a host cell.
  • the RT and recombination protein are functionally linked to each other and comprise a fusion protein.
  • the aptamer binding protein and the recombination protein are functionally linked to each other and comprise a fusion protein.
  • the RT and the Cas protein are functionally linked to each other and comprise a fusion protein.
  • the recombination protein and the Cas protein are functionally linked to each other and comprise a fusion protein.
  • the RT, the Cas protein, and the recombination protein are functionally linked to each other and comprise a fusion protein.
  • With regard to RT and linkers or ways to functionally link components of embodiments of the RT system or composition (as well as with regard to linkers or ways to functionally link components of systems or compositions discussed herein that do not involve RT), mention is made of WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, WO2020/191171 and WO2021/226558, which involve what is known as prime editing and twin prime editing.
  • Each of WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, WO2020/191171 and WO2021/226558 is hereby incorporated herein by reference.
  • RTs of WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, WO2020/191171 and WO2021/226558 can be used in the practice of the present invention.
  • the invention comprehends a cell or eukaryotic cell comprising any herein-described or discussed RT system or composition.
  • the invention comprehends a method of altering a target genomic DNA sequence in a cell comprising a target genomic DNA sequence, comprising introducing any herein-discussed or described RT system or composition.
  • the cell or eukaryotic cell is a mammalian cell, or in the methods the cell or eukaryotic cell is a mammalian cell; for instance, a human cell; for instance, a stem cell.
  • the method involves the target genomic DNA sequence encoding a gene product.
  • in the method, introducing into a cell comprises administering to a subject.
  • the method involves the subject being a mammalian non-human animal (e.g., a laboratory animal such as a rodent, rat, mouse, or rabbit; a domestic animal such as a horse, dog or canine, or cat or feline; a zoo non-domesticated animal in human care and custody; or a production animal such as a cow or pig), or a human.
  • in the method, the administering comprises in vivo administration.
  • the cell or eukaryotic cell or mammalian cell is an ex vivo or in vitro cell.
  • the method comprises after the introducing step, administering to a subject the ex vivo or in vitro cells; and in such embodiments, the subject is a mammalian non-human animal or a human.
  • the invention involves use of the RT system or composition for the alteration of a target DNA sequence in a cell.
  • aspects of the RT system that do not pertain to RT or the RT system e.g., linkers
  • the linker may be a peptide of 5-30, 10-30, 10-20 or 15 amino acid residues.
  • the linker may be - (Gly-Gly-Gly-Gly-Ser)2 - (SEQ ID NO:560), - (Gly-Gly-Gly-Gly-Ser)3 - (SEQ ID NO:561), or - (Gly-Gly-Gly-Gly-Ser)4 - (SEQ ID NO:562).
  • the linker is - (Gly-Gly-Gly-Gly-Ser)3 - (SEQ ID NO:561).
  • the amino acid sequence of SEQ ID NO:561 may be encoded by the nucleic acid sequence of SEQ ID NO:563.
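The (Gly-Gly-Gly-Gly-Ser)n linkers listed above can be written out in one-letter code to check their lengths against the stated 5-30 residue range. This is an illustrative sketch only; the helper name `ggggs_linker` is hypothetical, and the pairing of repeat counts with SEQ ID NOs simply restates the list above.

```python
def ggggs_linker(repeats: int) -> str:
    """Return the one-letter amino acid sequence of (Gly-Gly-Gly-Gly-Ser)n."""
    if repeats < 1:
        raise ValueError("repeats must be a positive integer")
    return "GGGGS" * repeats


# Repeat counts paired with the SEQ ID NOs recited in the text above.
SEQ_ID_BY_REPEATS = {2: 560, 3: 561, 4: 562}

for n, seq_id in SEQ_ID_BY_REPEATS.items():
    linker = ggggs_linker(n)
    print(f"SEQ ID NO:{seq_id}: {linker} ({len(linker)} residues)")
```

With two, three, and four repeats the linkers are 10, 15, and 20 residues long, so the three-repeat linker corresponds to the 15-residue case.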
  • a linker is made up of a majority of amino acids that are sterically unhindered, such as glycine and alanine.
  • exemplary linkers are polyglycines (particularly (Glys, poly(Gly-Ala), and polyalanines.
  • One exemplary suitable linker as shown in the Examples below is (Gly-Ser), such as - (Gly-Gly-Gly-Gly-Ser)2 - (SEQ ID NO:560), - (Gly-Gly-Gly-Gly-Ser)3 - (SEQ ID NO:561), or - (Gly-Gly-Gly-Gly-Ser)4 - (SEQ ID NO:562).
  • Linkers may also be non-peptide linkers.
  • These alkyl linkers may further be substituted by any non-sterically hindering group such as lower alkyl (e.g., C1-4), lower acyl, halogen (e.g., Cl, Br), CN, NH2, phenyl, etc.
  • Amino acid sequence of linker: GGGGSGGGGSGGGGS (SEQ ID NO:561)
  • such that Applicants reserve the right and hereby disclose a disclaimer of any previously known product, process, or method. It is further noted that the invention does not intend to encompass within the scope of the invention any product, process, or making of the product or method of using the product, which does not meet the written description and enablement requirements of the USPTO (35 U.S.C.
  • FIG. 1A and FIG. 1B are the reconstructed RecE (FIG. 1A) and RecT (FIG. 1B) phylogenetic trees with eukaryotic recombination enzymes from yeast and human.
  • FIG. 2A is a phylogenetic tree and length distribution of RecE/RecT homologs.
  • FIG. 2B is the metagenomics distribution of RecE/T.
  • FIG. 2C is a schematic showing disclosed herein.
  • FIG. 3A and 3B are graphs of the high-throughput sequencing (HTS) reads of homology directed repair (HDR) at the EMX1 (FIG. 3A) locus and the VEGFA (FIG. 3B) locus.
  • FIGS. 3C-3E are graphs of the mKate knock-in efficiency at HSP90AA1 (FIG. 3C), DYNLT1 (FIG. 3D), and AAVS1 (FIG. 3E) loci in HEK293T cells.
  • FIG. 3F shows images of mKate knock-in efficiency in HEK293T cells with RecT.
  • FIG. 3G is a schematic of an exemplary AAVS1 knock-in strategy and chromatogram trace from RecT knock-in group.
  • FIG. 3H is schematics and graphs of the recruitment control experiment and corresponding knock-in efficiency. All results are normalized to NR.
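Normalizing results to the no-recombinator (NR) group, as stated above, amounts to dividing each group's efficiency by the NR value. The sketch below is an assumption about how such a normalization could be done; the function name `normalize_to_control` is hypothetical, and the numbers in the usage example are invented for illustration and are not data from the figures.

```python
def normalize_to_control(efficiencies: dict[str, float],
                         control: str = "NR") -> dict[str, float]:
    """Express each group's knock-in efficiency relative to a control group."""
    baseline = efficiencies[control]
    if baseline <= 0:
        raise ValueError("control efficiency must be positive to normalize")
    return {group: value / baseline for group, value in efficiencies.items()}


# Illustrative values only (percent knock-in); not taken from the patent.
example = {"NC": 0.0, "NR": 2.0, "RecT": 6.0}
relative = normalize_to_control(example)
```

After normalization the NR group is 1.0 by construction, and every other group reads as a fold change over NR.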
  • NC, no cutting; NR, no recombinator.
  • FIGS. 4A-4C are graphs of the relative mKate knock-in efficiencies to the NE group at HSP90AA1 (FIG. 4A), DYNLT1 (FIG. 4B), and AAVS1 (FIG. 4C) loci in HEK293T cells.
  • NC no cutting control group.
  • NR no recombinator control group.
  • FIG. 4D is an image of an exemplary agarose gel of junction PCR that validates mKate knock-in at AAVS1 locus.
  • FIG. 4E and 4F are graphs of the absolute (FIG. 4E) and relative (FIG. 4F) LOV knock-in efficiencies at AAVS1 locus.
  • FIG. 4G are the Sanger sequencing results of the junction PCR product of an exemplary mKate knock-in at AAVS1 locus.
  • FIGS. 5A-5D are graphs of the genomic knock-in efficiencies at different loci across cell lines A549 (FIG. 5A), HepG2 (FIG. 5B), HeLa (FIG. 5C), and hESCs (H9) (FIG. 5D).
  • FIG. 5E is images of mKate knock-ins in hESCs.
  • FIG. 5F and 5G are genome-wide off-target site (OTS) counts (FIG. 5F) and OTS chromosomal distribution (FIG. 5G) of REDITv1 tools.
  • FIGS. 6A-6D are graphs of the relative mKate knock-in efficiency at the AAVS1 locus and the DYNLT1 locus in A549 cell line (FIG. 6A), the DYNLT1 locus and the HSP90AA1 locus in HepG2 cell line (FIG. 6B), the DYNLT1 locus and the HSP90AA1 locus in HeLa cell line (FIG.
  • FIG. 6C is representative FACS results of HSP90AA1 mKate knock-in in hES-H9 cells.
  • FIGS. 7A-7D are graphs of the absolute mKate knock-in efficiencies of different homology arm lengths at the DYNLT1 (FIG. 7A) and HSP90AA1 (FIG. 7B) loci and the no recombinator controls for DYNLT1 (FIG. 7C) and HSP90AA1 (FIG.
  • FIGS. 8A-8F are graphs of the indel rates of the top 3 predicted off-target loci associated with sgEMX1 (FIGS. 8A-8C) or sgVEGFA (FIGS. 8D-8F) in the
  • FIG. 9A is a schematic of select embodiments of REDITv2N and corresponding knock-in efficiencies in HEK293T cells.
  • FIG. 9B and 9C are graphs of genome-wide off-target site (OTS) counts (FIG. 9B) and OTS chromosomal distribution (FIG. 9C) comparing REDITv2N against REDITv1.
  • OTS genome-wide off-target site
  • FIG. 9D is a schematic of select embodiments of REDITv2D and corresponding knock-in efficiencies.
  • FIG. 9E is a graph of editing efficiency of REDITv1, REDITv2N, and REDITv2D under serum starvation conditions.
  • FIG. 9F is the knock-in efficiencies of REDITv3 in hESCs.
  • FIG. 9G is images of mKate knock in using REDITv3 in hESCs.
  • FIG. 10A and 10B are schematics and graphs of the relative mKate knock-in efficiencies of select embodiments of REDITv2N (FIG. 10A) and REDITv2D (FIG. 10B) at the DYNLT1 locus and the HSP90AA1 locus.
  • FIGS. 11A-11D are images of agarose gels showing junction PCR of mKate knock-in at the DYNLT1 locus and the HSP90AA1 locus for a select REDITv2N system.
  • FIG. 11E is the chromatogram sequence of junction PCR products at the DYNLT1 locus.
  • FIG. 12A and 12B are graphs of the genomic distribution of detected off-target cleavages of select embodiments of REDITv2 (FIG. 12A) and REDITv2N (FIG. 12B).
  • a pileup includes alignments that have two or more reads overlapping with each other. Flanking pairs include alignments that show up on opposite strands within 200bp upstream of each other.
  • Target matched includes alignments that match to a treated target in the upstream sequence (up to 6 mismatches, including 1 mismatch in the PAM, are allowed in the target sequence).
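The three alignment categories described above (pileup, flanking pair, target matched) can be sketched as simple predicates. This is a minimal illustration under stated assumptions, not the analysis pipeline used for the figures: the `Alignment` record, the function names, the use of start coordinates for the 200 bp test, and the "NGG" default PAM are all assumptions introduced here. The mismatch rule is read as at most 6 mismatches in total, of which at most 1 may fall in the PAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    chrom: str
    start: int   # 0-based start of the aligned read (assumed convention)
    end: int     # exclusive end
    strand: str  # "+" or "-"


def is_pileup(a: Alignment, b: Alignment) -> bool:
    """Two reads on the same chromosome whose intervals overlap."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def is_flanking_pair(a: Alignment, b: Alignment, max_distance: int = 200) -> bool:
    """Reads on opposite strands whose starts lie within max_distance bp."""
    return (a.chrom == b.chrom and a.strand != b.strand
            and abs(a.start - b.start) <= max_distance)


def count_mismatches(site: str, target: str) -> int:
    """Count mismatched positions; 'N' in the target matches any base."""
    if len(site) != len(target):
        raise ValueError("site and target must have equal length")
    return sum(1 for s, t in zip(site.upper(), target.upper())
               if t != "N" and s != t)


def is_target_matched(site_protospacer: str, site_pam: str,
                      target_protospacer: str, target_pam: str = "NGG",
                      max_total: int = 6, max_pam: int = 1) -> bool:
    """Apply the 'up to 6 mismatches, including 1 in the PAM' rule."""
    pam_mismatches = count_mismatches(site_pam, target_pam)
    total = pam_mismatches + count_mismatches(site_protospacer, target_protospacer)
    return pam_mismatches <= max_pam and total <= max_total
```

A site with six protospacer mismatches and a perfect PAM passes under this reading, while six protospacer mismatches plus one PAM mismatch (seven in total) does not.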
  • FIG. 12C is a graph of the HTS HDR and indel reads at EMX1 locus for REDITv2N system.
  • FIG. 13A is an image of an agarose gel showing junction PCR of mKate knock-ins at the DYNLT1 locus for REDITv2D system.
  • FIG. 13B is the chromatogram sequence of junction PCR products at the DYNLT1 locus.
  • FIGS. 14A-14C are graphs of the mKate knock-in efficiencies at the HSP90AA1 locus in REDITv2 (FIG. 14A), REDITv2N (FIG. 14B) and REDITv2D (FIG. 14C) when treated with different FBS concentrations.
  • FIGS. 14D-14F are graphs of the mKate knock-in efficiencies at the HSP90AA1 locus in REDITv2 (FIG. 14D), REDITv2N (FIG. 14E) and REDITv2D (FIG. 14F) when treated with different serum FBS concentrations.
  • FIG. 15 is images of the nuclear localization of RecE_587 and RecT following EGFP fusion to the REDITv1 systems. Nuclei were stained with NucBlue Live Ready Probes Reagent.
  • FIG. 16A and 16B are the relative mKate knock-in efficiencies at HSP90AA1 and DYNLT1 loci following fusion of different nuclear localization sequences to either the N- or C-terminus of RecT and RecE_587.
  • FIG. 16C and 16D are graphs of the absolute mKate knock-in efficiencies of the constructs from FIGS. 16A and 16B for the DYNLT1 locus (FIG. 16C) and the HSP90AA1 locus (FIG. 16D).
  • FIGS. 17A-17D are graphs of the relative (FIGS. 17A and 17B) and absolute (FIGS. 17C and 17D) mKate knock-in efficiencies for the DYNLT1 locus (FIGS. 17A and 17C) and the HSP90AA1 locus (FIGS. 17B and 17D) following fusion of new NLS sequences as well as optimal linkers to REDITv2 and REDITv3 variants.
  • the REDITv2 versions using REDITv2N (D10A or H840A) and REDITv2D (dCas9) are indicated on the horizontal axis, along with the number of guides used.
  • the different colors represent the different control groups and REDIT versions.
  • FIG. 18 is a graph of the relative editing efficiency of REDITv3N system at HSP90AA1 locus in hES-H9 cells.
  • FIG. 19A is a diagram of an exemplary saCas9 expression vector.
  • FIGS. 19B-19E are graphs of the relative mKate knock-in efficiencies at the AAVS1 locus (FIG. 19B) and HSP90AA1 locus (FIG. 19C) of different effectors in saCas9 system and the respective absolute efficiencies (FIG. 19D and 19E, respectively).
  • FIG. 20A is a schematic of RecT truncations.
  • FIGS. 20B and 20C are graphs of the relative mKate knock-in efficiencies at the DYNLT1 locus for wild-type Streptococcus pyogenes Cas9 and Streptococcus pyogenes Cas9n(D10A) with single- and double-nicking.
  • FIG. 21A is a schematic of RecE_587 truncations.
  • FIGS. 21B and 21C are graphs of the relative mKate knock-in efficiencies at the DYNLT1 locus for wild-type Streptococcus pyogenes Cas9 and Streptococcus pyogenes Cas9n(D10A) with single- and double-nicking.
  • FIGS. 22A and 22B are graphs comparing the efficiency of recombineering-based editing with various exonucleases (FIG. 22A) and single-strand DNA annealing proteins (SSAP) (FIG. 22B) from naturally occurring recombineering systems, including NR (no recombinator) as negative control.
  • the gene-editing activity was measured using an mKate knock-in assay at genomic loci (DYNLT1 and HSP90AA1). The data shown are the percentage of successful mKate knock-in using human HEK293 cells, each experiment was performed in .
  • FIGS. 23A-23E show a compact recruitment system using boxB and N22.
  • the REDIT recombinator proteins were fused to N22 peptide and within the sgRNA was boxB, the short cognate sequence of N22 peptide (FIG. 23A).
  • FIGS. 23B-23E are graphs of the gene-editing efficiency using mKate knock-in assay, with wildtype SpCas9, with side-by-side comparisons to the MS2-MCP recruitment system.
  • FIGS. 23B and 23D are absolute mKate knock-in efficiencies at the DYNLT1 and HSP90AA1 loci, and FIGS. 23C and 23E are relative efficiencies.
  • FIGS. 24A-24C show a SunTag recruitment system.
  • the REDIT recombinator proteins were fused to scFv antibody and the GCN4 peptide in tandem fashion (10 copies of GCN4 peptide separated by linkers) was fused to the Cas9 protein (FIG. 24A).
  • An mKate knock-in experiment (FIG. 24B) with the DYNLT1 locus was used to measure the gene-editing knock-in efficiency (FIG. 24C). All data are measurements of gene-editing efficiency using mKate knock-in assay, with wildtype SpCas9.
  • FIGS. 25A and 25B exemplify REDIT with a Cas12a system.
  • a Cpf1/Cas12a based REDIT system via the SunTag recruitment design was created (FIG. 25A) for two different Cpf1/Cas12a proteins.
  • the efficiencies at two endogenous loci were measured (FIG. 25B).
  • FIGS. 27A and 27B are a schematic showing the SunTag-based recruitment of SSAP RecT to the Cas9-gRNA complex for gene-editing (FIG. 27A) and a graph of the efficiencies of SunTag compared to MS2-based strategies (FIG. 27B).
  • FIGS. 28A-28C show comparisons of REDIT with alternative HDR-enhancing gene-editing approaches.
  • FIG. 28A is schematics showing alternative HDR-enhancing approaches via fusing functional domains, CtIP or Geminin (Gem), to Cas9 protein (left) and when combined with REDIT (right).
  • FIG. 28C is comparisons of gene-editing efficiencies using REDIT and alternative HDR-enhancing tools, Cas9-HE (CtIP fusion), Cas9-Gem (Geminin fusion), and Nocodazole (noc), along with combination of REDIT with these methods (Cas9-HE/Cas9-Gem/noc+REDIT).
  • Donor DNAs have 200 + 400 bp (DYNLT1) or 200 + 200bp (HSP90AA1) of HAs. All assays performed with no donor, NTC and Cas9 (no enhancement) controls.
  • FIGS. 29A-29D show template design guideline, junction precision, and capacity of REDIT gene-editing methods.
  • FIG. 29A is graphs of a homology arm (HA) length test comparing different template designs of HDR donors (longer HAs) or NHEJ/MMEJ donors (zero/shorter HAs) using REDIT and Cas9 references. Top and bottom are two genomic loci tested using mKate knock-in assay.
  • HA homology arm
  • FIG. 29B is a design of an exemplary junction profiling assay through isolation of knock-in clones, followed by genomic PCR using primers (fwd, rev) binding outside the donor to avoid template amplification. Paired Sanger sequencing of the PCR products reveals homologous and non-homologous edits at the 5’ and 3’ junctions.
  • FIG. 29C is a graph of the percentage of colonies with indicated junction profiles from the Sanger sequencing of knock-in clones as in FIG. 29B. Editing methods and donor DNA are listed at the bottom (HA lengths indicated in bracket).
  • FIG. 29D is a graph of knock-in efficiencies using a 2-kb cassette to insert dual-GFP/mKate tags to validate REDIT methods with Cas9.
  • FIGS. 30A-30C show GIS-seq results (Figures 6C–6E) indicating that REDIT is an efficient method with the ability to insert kilobase-length sequences with fewer unwanted editing events.
  • FIG. 30A is a schematic showing the design, procedures, and analysis steps for GIS-seq to measure genome-wide insertion sites of the knock-in cassettes. High-molecular-weight (HMW) genomic DNA purification was needed to remove potential contamination from donor DNAs. Donor DNAs had 200 bp HAs on each side.
  • FIG. 30B is representative GIS-seq results showing plus/minus reads at on-target locus DYNLT1.
  • FIG. 30C is a summary of top GIS-seq insertion sites comparing Cas9dn and REDITdn groups, showing the expected on-target insertion site (highlighted) and reduced number of identified off-target insertion sites when using REDITdn. (Left) DYNLT1 and (Right) ACTB loci with MLE calculated from the distribution of filtered and trimmed GIS-seq reads.
  • FIGS. 31A-31F show the dependence of REDIT gene-editing on endogenous DNA repair and applying REDIT methods for human stem cell engineering.
  • FIG. 31A is a model showing the editing process and major repair pathways involved when using REDIT or Cas9 for gene-editing; the HDR pathway is highlighted for chemical perturbation (inhibition of RAD51). Donor DNAs with 200 + 200 bp HAs are used for all inhibitor experiments.
  • FIGS. 31B and 31C are graphs showing the relative knock-in efficiency of REDIT tools compared with Cas9 reference treated with RAD51 inhibitor B02 and RI-1, or vehicle-treated, for the wtCas9-based REDIT and Cas9 (FIG.
  • FIG. 31B are graphs of knock-in efficiencies in hESCs (H9) using REDIT and REDITdn tested across three genomic loci, compared with corresponding Cas9 and Cas9dn references.
  • FIGS. 31E and 31F are flow cytometry plots of mKate knock-in results in hESCs using REDIT, REDITdn with Cas9, Cas9dn, and NTC controls. Donor DNAs in the hESC experiments have 200 + 200 bp HAs across all loci tested.
  • FIGS. 32A-32B show chemical perturbations to dCas9 REDIT. Gene editing efficiencies were determined when treated with mammalian DNA repair pathway inhibitors (Mirin, RI-1, and B02) with (FIG. 32A) and without (FIG. 32B) cell cycle inhibitor (Thy, double thymidine) blocking. Statistical analyses are from t-test results with 1% FDR via a two-stage step-up method.
  • FIGS. 33A and 33B are schematics of the DNA components (gene-editing vectors and template DNA) and tail vein injection of mice, respectively.
  • FIGS. 34A-34C are results from the tail vein injection of mice with gene-editing vectors.
  • FIG. 34A is a schematic and gel electrophoresis of PCR analysis of liver hepatocytes from the injected mice.
  • FIG. 34B is the Sanger sequencing results of the PCR amplicon.
  • FIG. 34C is a schematic of next-generation sequencing and a graph of the quantification of knock-in junction errors.
  • FIGS. 35A and 35B are schematics of the DNA components (gene-editing and control vector) and adeno-associated virus (AAV) treatment, respectively.
  • FIG. 35C is fluorescent images of lungs from AAV treated mice and graphs of corresponding quantitation of tumor number.
  • FIGS. 36A-36C show the predicted structure of E.
  • FIGS. 37A-37B show predicted interactions of EcRecT SSAP amino acids with ssDNA.
  • FIGS. 38A-38F show development of the dCas9 gene-editor through mining microbial SSAPs.
  • FIG. 38A Schematic model of dCas9 editor with single-strand annealing proteins (SSAP).
  • FIG. 38B Design of the genomic knock-in assay to measure gene-editing efficiencies (left); workflow of the SSAP screening experiments (right).
  • FIG. 38C Construct designs for screening gene-editing efficiency of SSAPs using the 2A-mKate knock-in assay, with an 800bp transgene.
  • FIG. 38D Results of initial screen of three SSAPs: Bet protein from Lambda phage (LBet), RecT protein from Rac prophage (RacRecT), and gp2.5 from T7 phage (T7gp2.5).
  • FIG. 38E Screening RecT-like SSAP candidates via metagenomic homolog mining and knock-in assay. The most active candidate is labeled as dCas9-SSAP.
  • NTC non-target control.
  • Donor templates were added in all groups except the no-donor controls, with the homology arm (HA) lengths: DYNLT1, 200+200bp; HSP90AA1, 200+400bp; ACTB, 200+400bp.
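The homology arm lengths listed above can be made concrete with a small sketch of how a knock-in donor is laid out: a left arm copied from the genome immediately upstream of the insertion point, the transgene cassette, and a right arm copied from immediately downstream. This is an illustrative assumption about donor layout, not the construct design from the Examples; `build_donor` and `HA_LENGTHS` are hypothetical names, and reading "200+400bp" as left arm plus right arm is also an assumption.

```python
# Homology arm lengths (left, right) in bp, restating the list above.
HA_LENGTHS = {"DYNLT1": (200, 200), "HSP90AA1": (200, 400), "ACTB": (200, 400)}


def build_donor(reference: str, insertion_index: int, cassette: str,
                left_len: int, right_len: int) -> str:
    """Assemble left homology arm + cassette + right homology arm.

    reference is the genomic sequence around the target, and
    insertion_index is the 0-based position where the cassette is inserted.
    """
    if insertion_index - left_len < 0 or insertion_index + right_len > len(reference):
        raise ValueError("reference too short for the requested homology arms")
    left_arm = reference[insertion_index - left_len:insertion_index]
    right_arm = reference[insertion_index:insertion_index + right_len]
    return left_arm + cassette + right_arm


# Toy sequences for illustration only; not real genomic sequence.
toy_reference = "A" * 300 + "C" * 500
toy_cassette = "G" * 800
left, right = HA_LENGTHS["HSP90AA1"]
donor = build_donor(toy_reference, 300, toy_cassette, left, right)
```

With an 800 bp cassette and 200 + 400 bp arms, the assembled toy donor is 1400 bp long.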
  • FIGS. 39A-39H show on-target and off-target editing errors of dCas9-SSAP. (FIG.
  • FIG. 39A Deep sequencing to measure the levels of indel formation when using dCas9-SSAP and Cas9 references at endogenous targets.
  • the donor templates used are 200bp-HA HDR templates. Details of the assay described in Methods.
  • FIG. 39B Clonal Sanger sequencing to analyze the accuracy of knock-in editing using dCas9-SSAP and Cas9 references with different HDR and MMEJ donors.
  • the donor templates used are the 200bp-HA HDR templates and 25bp-HA MMEJ templates (Methods and Supplementary Notes).
  • FIG. 39C- FIG. 39E Genome-wide detection of insertion sites of knock-in cassette using unbiased sequencing, showing (FIG.
  • FIG. 39D representative reads aligned at knock-in genomic site, and (FIG. 39E) summary of detected on-target and off-target insertion sites.
  • FIG. 39F-FIG. 39G workflow and results for measuring cell fitness effect as defined by percentage of live cells after editing (normalized to mock controls).
  • FIG. 39H Summary analysis of knock-in accuracy of dCas9-SSAP editor, in comparison with Cas9 HDR and Cas9 MMEJ methods. Accuracy is defined as the overall yield (%) of correct knock-in within all edited outcomes (correct knock-in, knock-in with indels, and NHEJ indels).
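The accuracy definition above can be written as a one-line calculation: correct knock-ins divided by all edited outcomes (correct knock-in, knock-in with indels, and NHEJ indels), expressed as a percentage. The sketch below follows that definition directly; the function name `knock_in_accuracy` is hypothetical, and the counts in the usage example are invented for illustration rather than taken from FIG. 39H.

```python
def knock_in_accuracy(correct_knock_in: float,
                      knock_in_with_indels: float,
                      nhej_indels: float) -> float:
    """Yield (%) of correct knock-in among all edited outcomes."""
    total_edited = correct_knock_in + knock_in_with_indels + nhej_indels
    if total_edited == 0:
        raise ValueError("no edited outcomes; accuracy is undefined")
    return 100.0 * correct_knock_in / total_edited


# Illustrative counts only; not data from the patent.
accuracy = knock_in_accuracy(correct_knock_in=90, knock_in_with_indels=5, nhej_indels=5)
```

Unedited alleles are excluded from the denominator, so this metric describes the quality of the edits that occurred rather than the overall editing rate.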
  • FIGS 40A-40G show validation of dCas9-SSAP editor and comparison with Cas9 reference and other HDR-enhancing methods.
  • FIG 40B Imaging verification of mKate knock-in at endogenous genome locus using dCas9-SSAP editor.
  • FIG 40C Design of knock-in donor with different lengths of transgenes.
  • FIG 40D knock-in efficiencies for different transgene lengths using dCas9-SSAP editors.
  • Donor HA lengths are 200bp+200bp for DYNLT1, 200bp+400bp for HSP90AA1.
  • FIG 40E performance of dCas9-SSAP editor compared with Cas9 references across 7 endogenous loci in HEK293T cells. ND, no-donor controls; NT, non-target controls.
  • FIGS 41A-41D show chemical perturbations to probe the editing mechanism of dCas9-SSAP editor.
  • FIGS. 42A-42D show minimization of dCas9-SSAP editor as a compact CRISPR knock-in tool for convenient delivery.
  • FIG. 42A Schematic showing the EcRecT predicted secondary structure and priming sites for constructing truncated EcRecT proteins based on the structural prediction.
  • FIG. 42B Relative knock-in efficiencies of various All groups were normalized to Cas9 references (individually for each target).
  • FIG. 42C Schematic of dSaCas9-mSSAP system in AAV construct using the compact SaCas9 (left, sizes of elements not shown to scale) and (FIG.
  • FIGS. 43A-43E show gel electrophoresis and sequencing verification of knock-in-specific PCR products using dCas9-SSAP.
  • FIG. 43A Agarose gel results of knock-in-specific junction PCR at DYNLT1 locus.
  • FIG. 43B- FIG. 43E Sanger sequencing chromatogram of genomic junctions from knock-in experiments at DYNLT1 locus. For all samples, Applicants amplified the 5’ (FIG. 43B, FIG.
  • FIG. 44 shows a phylogenetic tree and amino acid alignment of representative RecT homologs along with the protein conserved domain annotated.
  • FIGS. 45A-45B show deep sequencing of short-sequence editing comparing dCas9-SSAP and Cas9 editors.
  • FIG. 45A Donor design of 16-bp replacement at EMX1.
  • FIG. 45B Analysis of precision HDR and indel editing outcomes using deep sequencing at EMX1 genomic locus.
  • FIGS. 46A-46B are schematics showing the workflows used in Sanger sequencing of knock-in products (FIG. 46A) and the sequencing method used in deep on-target indel assay (FIG. 46B). Assays described here correspond to FIG. 41. gPCR, genomic PCR. Seq-F/seq-R are primers for Sanger sequencing binding upstream/downstream of the knock-in templates.
  • FIGS. 47A-47B show Sanger sequencing chromatograms of genomic junctions from dCas9-SSAP experiments at DYNLT1 locus. The sequences in the red boxes were not precisely repaired.
  • the 5’ (FIG. 47A) and 3’ (FIG. 47B) ends of genomic DNA were amplified using junction-spanning primers to confirm knock-in precision.
  • the genomic-binding primers used are completely outside of the donor DNAs to avoid contamination.
  • FIGS. 48A-48B show Sanger sequencing chromatograms of genomic junctions from dCas9-SSAP experiments at HSP90AA1 locus.
  • the 5’ FIG.
  • FIGS. 49A-49B show genome-wide insertion site mapping and quantification.
  • FIG. 49A Overall workflow for unbiased genome-wide insertion site mapping process. On-target and off-target insertions sites are recovered from reads that align to the reference genome (hg38). Full protocol and data analysis pipeline are detailed in Methods.
  • FIGS. 50A-50B show testing of dCas9-SSAP editor tool using single-guide (FIG. 50A) and dual-guide (FIG. 50B) designs across three genomic targets (shown on the top). The donor DNAs used are the same as shown in Fig. 3a with 800-bp knock-in design.
  • FIGS. 52A-52C show the full set of flow cytometry analysis data using dCas9-SSAP editor for human stem cell engineering. Flow cytometry analysis of knock-in gene-editing at HSP90AA1 (FIG. 52A), ACTB (FIG. 52B), OCT4 (FIG.
  • FIG. 53 is a schematic showing the RecT protein secondary structure predicted using an online tool (CFSSP, see Methods). The prediction results (secondary structure visualized at top, alignment at bottom) formed the basis for developing a truncated functional RecT variant.
  • FIGS. 54A-54C show optimization of dCas9-SSAP for efficient and durable gene-editing. (FIG. 54A) Knock-in efficiencies for SSAP dosage optimization.
  • FIG. 54B Performance of dCas9-SSAP editor compared with Cas9 references across 7 endogenous loci in HEK293T cells after SSAP dosage optimization and donor HA extension.
  • FIGS. 55A-55C show optimization of donor dosages and homology arms of donor DNA.
  • FIG.55A Quantification of genomic mKate knock-in efficiency at DYNLT1, HSP90AA1, ACTB loci for donor dosage optimization when using dCas9-SSAP editor. non target, non-target controls.
  • FIGS.56A-56D show validation of dCas9-SSAP editor with protein functional assays.
  • FIG. 56A Design of genomic Puromycin/Blasticidin-resistance-cassette knock-in assay to validate functional on-target editing by dCas9-SSAP.
  • FIG. 56B Immunoblotting confirms the presence and sizes of on-target dCas9-SSAP knock-in products at HSP90AA1 and ACTB loci, performed with anti-V5 antibody recognizing in-frame fusion with endogenous protein. Data shown represent 3 biologically independent experiments.
  • FIGS. 57A-57E show validation of the stability of on-target editing.
  • FIG. 57A Workflow of the long-term time-course experiments to evaluate the editing outcome stability using dCas9-SSAP editor.
  • FIG.57B Flow cytometry analysis of knock-in gene-editing at HSP90AA1, ACTB endogenous loci at different time points post delivery of dCas9-SSAP and
  • FIG. 57E Quantification of HSP90AA1 and ACTB gene expression levels in HEK293T cells by bulk RNA-seq analysis, demonstrating significantly higher levels of HSP90AA1 expression. This led to the better cell survival in the HSP90AA1 group compared with ACTB group.
  • FIG. 58 shows SSAP + Cas9 mediated knock-in editing with deactivated guide RNA (dgRNA).
  • the SSAP + Cas9 comprises RecT and wtCas9.
  • mKate knock-ins are depicted at DYNLT1, HSP90AA1, and ACTB.
  • FIG. 59 shows dCas9-SSAP mediated knock-in of luciferase-expressing or mKate expressing 600-bp transgenes at the human albumin (ALB) locus (top) or the AAVS1 locus (bottom) in human HEK293T cells or human hepatocytes.
  • FIG. 60 shows dCas9-SSAP mediated knock-in of luciferase-expressing or mKate expressing 800-bp transgenes at the human albumin (ALB) locus (top) or the AAVS1 locus (bottom) in human HEK293T cells or human hepatocytes.
  • Transgene knock-ins at the albumin locus were highly expressed in hepatocytes but not HEK293T (top).
  • FIGS. 61A-61B show electroporation of an RNP comprising an 800bp mKate-encoding transgene in K562 cells.
  • Cells were electroporated with RNP comprising purified Cas9 or dCas9 protein complexed with guideRNA, a double stranded 800bp mKate transgene, with and without RecT.
  • Knock-ins were at the HSP90AA1 (FIG. 61A) or HIST1H2BK (FIG. 61B) locus.
  • FIG. 62 shows delivery of RNP comprising Cas9 or dCas9, with or without SSAP to mouse primary hematopoietic stem cells (HSC) and AAV6 to knock in a GFP-expressing transgene.
  • FIG. 63 shows transgene expression.
  • FIGS. 64A-64D depict SSAP-mediated knock-in of transgenes using an R-loop-forming guide without CRISPR components.
  • FIG. 64A Model of guide-RNA-SSAP mediated gene editing showing MCP-MS2 aptamer pairing of SSAP and R-loop-gRNA.
  • FIGS. 65A-65B show R-loop-guide RNA design.
  • the R-loop-guideRNA comprises two components, guide and scaffold, depicted in guide-scaffold and scaffold-guide configurations (e.g., the guide at the 5' or 3' end of scaffold).
  • the guide sequence is designed to match a target DNA.
  • FIG. 66 shows a chimeric guide RNA comprising an MS2/PP7-aptamer.
  • FIG. 67 shows the effect of varying guide length on knock-in efficiency at the ACTB locus comparing R-loop-SSAP (no CRISPR), Cas9 HDR, and dCas9-SSAP. Donor-only included only the mKate knock-in donor.
  • FIG. 68 shows the effect of varying guide length on knock-in efficiency at the HIST locus comparing R-loop-SSAP (no CRISPR), Cas9 HDR, and dCas9-SSAP. Donor-only included only the mKate knock-in donor.
  • FIG. 69 shows the effect of varying guide length on knock-in efficiency at the HSP90AA1 locus comparing R-loop-SSAP (no CRISPR), Cas9 HDR, and dCas9-SSAP. Donor-only included only the mKate knock-in donor.
  • FIG. 70 shows R-loop-SSAP mediated knock-in of luciferase-expressing or mKate expressing 600-bp transgenes at the human albumin (ALB) locus (top) or the AAVS1 locus (bottom) in human HEK293T cells or human hepatocytes.
  • Transgene knock-ins at the albumin locus were highly expressed in hepatocytes but not HEK293T (top).
  • Transgene knock-ins at the AAVS1 locus were highly expressed in HEK293T but not hepatocytes (bottom).
  • FIG. 71 shows R-loop-SSAP mediated knock-in of luciferase-expressing or mKate expressing 800-bp transgenes at the human albumin (ALB) locus (top) or the AAVS1 locus (bottom) in human HEK293T cells or human hepatocytes.
  • Transgene knock-ins at the albumin locus were highly expressed in hepatocytes but not HEK293T (top).
  • Transgene knock-ins at the AAVS1 locus were highly expressed in HEK293T but not hepatocytes (bottom).
  • FIG. 72 shows schematics comparing RNA-mediated SSAP editing without reverse transcriptase (top) with RNA-mediated SSAP editing with reverse transcriptase (bottom).
  • the RNA template/donor as depicted includes a Homology Arm (HA) region with one HA.
  • the RNA template/donor comprises two HA regions, one at each end, each matched with the genomic region next to the editing site so that SSAP can promote editing.
  • FIG.73 shows insertion rate for a 4bp sequence inserted at the human EMX locus using an in vitro transcribed (IVT) RNA template.
  • System components are 1.
  • FIG.74 shows a U6-expressed RNA template in sense or anti-sense orientation used to replace a 16bp sequence (install 16bp edits) at human EMX1.
  • System components comprise 1. SpCas9 guideRNA targeting human EMX1; 2. dead/deactivated guideRNA binding to the region.
  • FIG. 75 shows dosage relationship of a U6-expressed RNA template in sense or anti-sense orientation used to replace a 16bp sequence (install 16bp edits) at human EMX1.
  • System components comprise 1. SpCas9 guideRNA targeting human EMX1; 2. dead/deactivated guideRNA binding to a region.
  • FIGS. 76A-76B show a system of the invention inserted at the human AAVS1 locus (FIG. 76A) and repair of a defective Venus (green fluorescent protein) locus (FIG. 76B).
  • FIG. 77 shows a sgRNA+dgRNA system schematic (top) and example based on the TLR locus (bottom).
  • SpCas9 guide_20bp sgRNA 20bp guide for sgRNA targeting TLR used in first guideRNA. Also shown are SaCas9 guide designs.
  • dg1-dg6 different 15bp/16bp dead/deactivated guide in dgRNA targeting TLR, used in second guideRNA with aptamer for recruitment.
  • FIG. 78 shows a demonstration of the sgRNA+dgRNA system including signal achieved in GFP (green) channel indicating repair of Venus protein.
  • FIG. 79 shows a sgRNA+dgRNA system schematic with a direct fusion of RNA template/donor to dgRNA.
  • the circular RNA can enhance stability and efficiency.
  • FIG.80 shows a demonstration of a system with fusion of dgRNA to the RNA template donor and SSAP. The box highlights repair of Venus significantly higher than control.
  • FIG. 81 shows a test of pol2 (CMV) vs. pol3 (U6) promoters for TLR genomic editing.
  • FIG. 82 shows a sgRNA+dgRNA system example based on the EMX1 locus.
  • FIG. 83 shows dgRNA with fusion RNA template donor and SSAP targeted at the human EMX site.
  • pA19 has Cas9 and the guide RNA with sg334/sg516 that are two guides targeting the EMX1 gene-editing reporter genome region.
  • BB is backbone, serving as negative control.
  • FIG.84 shows a schematic of a system incorporating SSAP and prime editing. An MS2 aptamer recruits SSAP-MCP to a Cas9-RT complex.
  • FIG. 85 shows SSAP + prime editing mediated editing at the HEK3 locus (top) and RFN2 locus (bottom). 293T cells were transfected with a Cas9n-RT construct, pegRNA construct, nicking/recruiting sgRNA-MS2 construct, and SSAP-MCP construct.
  • FIG. 87 shows a schematic of an editing system incorporating SSAP and retron.
  • Retron-sgRNA can be subdivided into three regions: the region of RNA that is reverse transcribed (called “msd”), a region that remains as RNA in the final molecule (called “msr”), and finally the guide RNA region (guide RNA or other aptamer to recruit SSAP).
  • the gRNA region can be derived from Cas9 scaffold. This msr/msd RNA helps initiate the RT process that generates reverse-transcribed ssDNA directly linked to sgRNA.
  • the RT of a retron recognizes retron RNA and completes reverse transcription of the donor template (a linked RNA-DNA hybrid molecule).
  • FIGS. 88A-88D depict SSAP array screening, showing cell viability vs. editing efficiency (fold over negative control (FIGS. 88A, 88C) or percent of mKate knock-in (FIGS. 88B, 88D)) for the ACTB target (FIGS. 88A, 88B) and the HSP90AA1 target (FIGS. 88C, 88D).
  • the positive control is EcRecT.
  • FIGS. 89A-89C depicts normalized (FIG.
  • FIGS. 90A-90D depict by scatter plot a comparison of cell viability vs. normalized (FIG. 90A) or absolute (FIG. 90B) editing efficiency for all targets combined. Bar graphs compare editing efficiency at two targets, HSP90 and ACTB, normalized (FIG. 90C) or absolute (FIG. 90D) for each of the candidates.
  • the positive control is EcRecT.
  • FIG. 91 depicts a tree and sequence alignment for SSAP_16 (1, SEQ ID NO:185), SSAP_10 (2, SEQ ID NO:179), SSAP_36 (3, SEQ ID NO:205), SSAP_152 (4, SEQ ID NO:321), and SSAP_184 (5, SEQ ID NO:353) compared with EcRecT (SEQ ID NO:171). See Table 12.
  • FIG. 92 depicts a tree and sequence alignment for SSAP_16 (1, SEQ ID NO:185), SSAP_10 (2, SEQ ID NO:179), SSAP_36 (3, SEQ ID NO:205), SSAP_152 (4, SEQ ID NO:321), SSAP_184 (5, SEQ ID NO:353), SSAP_197 (6, SEQ ID NO:366), SSAP_305 (7, SEQ ID NO:424), SSAP_210 (8; SEQ ID NO:379), and SSAP_190 (9, SEQ ID NO:359) compared with EcRecT (SEQ ID NO:171). See Table 12.
  • FIG. 94 depicts an evolution tree of candidate SSAPs. 296 candidates were selected applying a set of filters and maximizing evolution distances. The SSAPs cover a phylogenetic family (branches) within the SSAP family.
  • FIGS. 95A-95C depict editing efficiencies of 10 top-ranked SSAPs compared to EcRecT and a negative control using the dCas9 editing system.
  • FIG. 95A mKate knock-ins at the ACTB locus.
  • FIG. 95B mKate knock-ins at the HSP90 locus.
  • FIG. 95C Scatter plot depicting editing efficiency at the ACTB target and at the HSP90AA1 target for the candidate SSAPs.
  • FIG. 96 compares editing efficiencies in a Cas-free system of 10 top-ranked SSAPs with a negative control (pA25 expresses MCP-EBFP) and pCK914 which expresses MCP-EcRecT.
  • FIGS. 97A-97C depict editing efficiencies of top SSAPs compared to EcRecT using a dCas9 editing system with a transcribed AAV donor in primary hepatocyte (mouse).
  • the dCas9 is virally delivered separately using adeno-viral-Cas9 under the control of strong CMV promoter (Adeno-CMV-Cas9).
  • AAV donor designs (top) typical AAV donor DNA; (bottom) AAV vector includes a promoter that transcribes the donor cargo into RNA. The donor RNA is transcribed in anti-sense orientation to avoid cargo expression.
  • a 600bp luciferase cargo was knocked in at the mouse Albumin locus (FIG. 97B) or ACTB locus (FIG. 97C).
  • FIGS.98A-98C depict AAV donor designs and editing efficiency.
  • the dCas9 is virally delivered separately using adeno-viral-Cas9 under the control of strong CMV promoter (Adeno-CMV-Cas9).
  • FIG. 98A top: 5’ release AAV design.
  • a second guide RNA is provided to bind/cleave the 5’ end of the cargo (hsgRNA cleavage site adjacent to the left homology arm (Left HA)).
  • middle 3’ release AAV design.
  • a second guide RNA is provided to bind/cleave the 3’ end of the cargo (hsgRNA cleavage site adjacent to the right homology arm (Right HA)).
  • Bottom Intact AAV design.
  • FIG. 99 depicts genome engineering across multiple human targets with SSAPs.
  • the editing system included dCas9 (dSpCas9), guideRNA with MS2 aptamer, MCP protein fused to candidate SSAP, and donor DNA inserting a mKate fluorescent protein cargo in-frame into the indicated endogenous genomic loci.
  • FIGS. 100A-100C depict engineering of RecT. Model of RecT in complex with a duplex intermediate of DNA annealing.
  • FIGS. 101A-101B depict a model of LiRecT showing interaction with dsDNA (highlighted) consistent with knock-in efficiency of N-terminal truncated mini-SSAP (mSSAP) (See, e.g., FIG. 42).
  • FIG. 102 depicts a plasmid map of R2RT-Cas9-GCN4 (11495 bp).
  • FIG. 103 depicts a plasmid map of U6-R2Bm_RNA-MS2-guideRNA (8410 bp).
  • FIGS. 104A-104L depict a model and gating strategy for detecting in-frame 800bp knock-in encoding fluorescent protein.
  • FIG. 104A schematic showing dCas9, guide, knock-in donor and MCP-SSAP fusion protein.
  • FIGS. 104B-104D donor only;
  • FIG. 104E-104H dCas9 only;
  • FIG. 104I-FIG. 104L donor + dCas9.
  • FIG. 105 depicts comparison of ERF family SSAPs with RecT and SSAP-16.
  • FIGS. 106A-106C depict vectors according to the invention.
  • FIG. 106A depicts a map of an exemplary lentiviral vector for prime editor (PE) expression.
  • FIG. 106B depicts a map of an exemplary vector for expressing MS2-pegRNA.
  • FIGS. 107A-107C depict vector maps for expression of SSAP-16 (FIG. 107A), D3 (orf52)-ERF (FIG. 107B) and D3 (orf52)-EXO (FIG. 107C).
  • FIG.108 depicts editing efficiency of recombineering system of the invention designed to include prime editors and SSAPs.
  • SSAPs are RecT (SEQ ID NO:764), SSAP-16 (SEQ ID NO:765) and SSAP-ERF (SEQ ID NO:767).
  • FIG.110 depicts an exemplary circular single stranded DNA (cssDNA) donor construct for homology directed knock-in of mCherry into the RAB11a locus.
  • FIGS. 111A-111C depict RAB11A knock-in efficiencies using a cssDNA mKate donor, nCas9 or dCas9, and SSAPs of the invention.
  • FIG. 111A 90 ng donor / well.
  • FIG. 111B 30 ng donor / well.
  • FIG. 111C 10 ng donor / well.
  • SSAPs are RecT (SEQ ID NO:764), SSAP-16 (SEQ ID NO:765), SSAP D3_Orf52_Erf (SEQ ID NO:767) and SSAP ERF-N10 (SEQ ID NO:779).
  • the present invention is directed to a system and the components for DNA editing.
  • the disclosed system is based on CRISPR targeting and homology directed repair by phage recombination enzymes.
  • the system results in superior recombination efficiency and accuracy on a kilobase scale.
  • the degree of complementarity between two nucleic acid sequences can be indicated by the percentage of nucleotides in a nucleic acid sequence which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 50%, 60%, 70%, 80%, 90%, and 100% complementary).
  • Two nucleic acid sequences are “perfectly complementary” if all the contiguous nucleotides of a nucleic acid sequence will hydrogen bond with the same number of contiguous nucleotides in a second nucleic acid sequence.
  • Two nucleic acid sequences are “substantially complementary” if the degree of complementarity between the two nucleic acid sequences is at least 60% (e.g., 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100%) over a region of at least 8 nucleotides (e.g., 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more nucleotides), or if the two nucleic acid sequences hybridize under at least moderate, preferably high, stringency conditions.
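The percentage-based definition of complementarity above can be illustrated with a minimal sketch. The function names, the equal-length and no-gap restrictions, and the treatment of U as T are assumptions made for illustration only; the default thresholds (60%, 8 nucleotides) are the ones stated in the definition.

```python
# Watson-Crick partners for DNA letters; RNA "U" is mapped to "T" first
# (an assumption of this sketch, not something the text prescribes).
WC_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions in seq_a that can Watson-Crick pair with seq_b.

    Both sequences are written 5'->3', so seq_b is reversed before the
    position-by-position comparison (antiparallel pairing).
    The sequences must have equal length; gaps are not modeled.
    """
    a = seq_a.upper().replace("U", "T")
    b = seq_b.upper().replace("U", "T")[::-1]
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(1 for x, y in zip(a, b) if WC_PAIR.get(x) == y)
    return 100.0 * paired / len(a)


def is_substantially_complementary(seq_a: str, seq_b: str,
                                   min_percent: float = 60.0,
                                   min_length: int = 8) -> bool:
    """Apply the percentage and length thresholds from the definition."""
    return (len(seq_a) >= min_length
            and percent_complementarity(seq_a, seq_b) >= min_percent)


if __name__ == "__main__":
    print(percent_complementarity("ATGCATGC", "GCATGCAT"))
    print(is_substantially_complementary("ATGCATGC", "GCATGCAA"))
```

The sketch covers only the percentage criterion; the alternative criterion based on hybridization under moderate or high stringency conditions is experimental and is not modeled.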
  • Exemplary moderate stringency conditions include overnight incubation at 37° C in a solution comprising 20% formamide, 5× SSC (150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5× Denhardt’s solution, 10% dextran sulfate, and 20 mg/ml denatured sheared salmon sperm DNA, followed by washing the filters in 1× SSC at about 37-50° C., or substantially similar conditions, e.g., the moderately stringent conditions described in Sambrook et al., infra.
  • High stringency conditions are conditions that use, for example (1) low ionic strength and high temperature for washing, such as 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate (SDS) at 50° C, (2) employ a denaturing agent during hybridization, such as formamide, for example, 50% (v/v) formamide with 0.1% bovine serum albumin (BSA)/0.1% Ficoll/0.1% polyvinylpyrrolidone (PVP)/50 mM sodium phosphate buffer at pH 6.5 with 750 mM sodium chloride and 75 mM sodium citrate at 42° C., or (3) employ 50% formamide, 5× SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5× Denhardt’s solution, sonicated salmon sperm DNA (50 µg/ml), 0.1% S
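The salt concentrations quoted for the stringency conditions are multiples of one SSC stock. As a rough arithmetic sketch, taking 1× SSC to be 150 mM NaCl and 15 mM trisodium citrate (an assumption that is consistent with the 0.75 M / 0.075 M figures quoted for 5× SSC), the concentrations at any dilution follow by simple scaling. The helper name is hypothetical.

```python
# Assumed composition of 1x SSC in millimolar; this matches the 5x SSC
# figures (0.75 M NaCl, 0.075 M sodium citrate) quoted in the text.
SSC_1X_MM = {"NaCl": 150.0, "trisodium citrate": 15.0}


def ssc_concentrations(fold: float) -> dict:
    """Return the mM concentration of each SSC component at `fold`x strength."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return {component: mm * fold for component, mm in SSC_1X_MM.items()}


if __name__ == "__main__":
    for fold in (5, 1, 0.1):
        print(f"{fold}x SSC:", ssc_concentrations(fold))
```

Under this assumption, the low ionic strength wash of 0.015 M sodium chloride/0.0015 M sodium citrate corresponds to 0.1× SSC.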
  • a cell has been “genetically modified,” “transformed,” or “transfected” by exogenous DNA, e.g., a recombinant expression vector, when such DNA has been introduced inside the cell.
  • the presence of the exogenous DNA results in permanent or transient genetic change.
  • the transforming DNA may or may not be integrated (covalently linked) into the genome of the cell.
  • the transforming DNA may be maintained on an episomal element such as a plasmid.
  • a stably transformed cell is one in which the transforming DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication.
  • a “nucleic acid” or a “nucleic acid sequence” refers to a polymer or oligomer of pyrimidine and/or purine bases, preferably cytosine, thymine, and uracil, and adenine and guanine, respectively.
  • the present technology contemplates any deoxyribonucleotide, ribonucleotide, or peptide nucleic acid component, and any chemical variants thereof, such as methylated, hydroxymethylated, or glycosylated forms of these bases, and the like.
  • the polymers or oligomers may be heterogenous or homogenous in composition and may be isolated from naturally occurring sources or may be artificially or synthetically produced.
  • the nucleic acids may be DNA or RNA, or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states.
  • a nucleic acid or nucleic acid sequence comprises other kinds of nucleic acid structures such as, for instance, a DNA/RNA helix, peptide nucleic acid (PNA), morpholino nucleic acid (see, e.g., Braasch and Corey, Biochemistry, 41(14): 4503-4510 (2002) and U.S. Pat. No. 5,034,506, incorporated herein by reference), locked nucleic acid (LNA; see Wahlestedt et al., Proc. Natl. Acad. Sci.
  • the terms “nucleic acid” and “nucleic acid sequence” may also encompass a chain comprising non-natural nucleotides, modified nucleotides, and/or non-nucleotide building blocks that can exhibit the same function as natural nucleotides (e.g., “nucleotide analogs”); further, the term “nucleic acid sequence” as used herein refers to an oligonucleotide, nucleotide or polynucleotide, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin, which may be single or double-stranded, and represent the sense or antisense strand.
  • nucleic acid refers to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof.
  • a “peptide” or “polypeptide” is a linked sequence of two or more amino acids linked by peptide bonds.
  • the peptide or polypeptide can be natural, synthetic, or a modification or combination of natural and synthetic.
  • Polypeptides include proteins such as binding proteins, receptors, and antibodies. The proteins may be modified by the addition of sugars, lipids or other moieties not included in the amino acid chain.
  • the terms “polypeptide” and “protein” are used interchangeably herein.
  • percent sequence identity refers to the percentage of nucleotides or nucleotide analogs in a nucleic acid sequence, or amino acids in an amino acid sequence, that is identical with the corresponding nucleotides or amino acids in a reference sequence after aligning the two sequences and introducing gaps, if necessary, to achieve the maximum percent identity.
  • additional nucleotides in the nucleic acid, that do not align with the reference sequence are not taken into account for determining sequence identity.
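The definition of percent sequence identity above can be sketched for two sequences that have already been aligned, with gaps written as "-". The function name and the choice of denominator are assumptions: this sketch skips alignment columns where the reference has a gap (one reading of "additional nucleotides ... that do not align with the reference sequence are not taken into account") and counts every remaining column. It does not perform the alignment itself.

```python
def percent_identity(aligned_query: str, aligned_reference: str) -> float:
    """Percent identity of a query to a reference, given a pairwise alignment.

    Both strings must come from the same alignment (equal length, '-' for gaps).
    Columns where the reference has a gap are extra query residues and are
    ignored; all other columns count toward the denominator.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for q, r in zip(aligned_query.upper(), aligned_reference.upper()):
        if r == "-":
            continue  # extra query residue with no reference counterpart
        compared += 1
        if q == r:
            matches += 1
    if compared == 0:
        raise ValueError("reference contains no residues")
    return 100.0 * matches / compared


if __name__ == "__main__":
    print(percent_identity("ACGTTAGC", "ACG-TAGC"))
    print(percent_identity("ACG-TAGC", "ACGTTAGA"))
```

Other conventions for the denominator (for example, the length of the shorter sequence or the full alignment length) give different values, so the convention should be stated whenever an identity threshold is applied.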
  • a “vector” or “expression vector” is a replicon, such as plasmid, phage, virus, or cosmid, to which another DNA segment, e.g., an “insert,” may be attached or incorporated so as to bring about the replication of the attached segment in a cell.
  • wild-type refers to a gene or a gene product that has the characteristics of that gene or gene product when isolated from a naturally occurring source.
  • a wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene.
  • the term “modified,” or “polymorphic” refers to a gene or gene product that displays modifications in sequence and/or functional properties (e.g., altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product.
  • CRISPR/Cas systems provide immunity by incorporating fragments of invading phage, virus, and plasmid DNA into CRISPR loci and using corresponding CRISPR RNAs (“crRNAs”) to guide the degradation of homologous sequences.
  • Each CRISPR locus encodes acquired “spacers” that are separated by repeat sequences. Transcription of a CRISPR locus produces a “pre-crRNA,” which is processed to yield crRNAs containing spacer-repeat fragments that guide effector nuclease complexes to cleave dsDNA sequences complementary to the spacer.
  • Three different types of CRISPR systems are known, type I, type II, and type III, classified based on the Cas protein type and the use of a proto-spacer-adjacent motif (PAM) for selection of proto-spacers in invading DNA.
  • the endogenous type II systems comprise the Cas9 protein and two noncoding crRNAs: trans-activating crRNA (tracrRNA) and a precursor crRNA (pre-crRNA) array containing nuclease guide sequences (also referred to as “spacers”) interspaced by identical direct repeats (DRs).
  • tracrRNA is important for processing the pre-crRNA and formation of the Cas9 complex.
  • tracrRNAs hybridize to repeat regions of the pre-crRNA.
  • each mature complex locates a target double stranded DNA (dsDNA) sequence and cleaves both strands using the nuclease activity of Cas9.
  • CRISPR/Cas gene editing systems are commonly based on the RNA-guided Cas9 nuclease from the type II prokaryotic clustered regularly interspaced short palindromic repeats (CRISPR) adaptive immune system.
  • Engineering CRISPR/Cas systems for use in eukaryotic cells typically involves reconstitution of the crRNA- tracrRNA-Cas9 complex.
  • the Cas9 amino acid sequence may be codon-optimized and modified to include an appropriate nuclear localization signal, and the crRNA and tracrRNA sequences may be expressed individually or as a single chimeric molecule via an RNA polymerase II promoter.
  • the crRNA and tracrRNA sequences are expressed as a chimera and are referred to collectively as “guide RNA” (gRNA) or single guide RNA (sgRNA).
  • the terms “guide RNA,” “single guide RNA,” and “synthetic guide RNA” are used interchangeably herein and refer to a nucleic acid sequence comprising a tracrRNA and a pre-crRNA array containing a guide sequence.
  • the terms “guide sequence,” “guide,” and “spacer” are used interchangeably herein and refer to the approximately 20-nucleotide sequence within a guide RNA that specifies the target site.
  • the guide RNA contains an approximate 20-nucleotide guide sequence followed by a protospacer adjacent motif (PAM) that directs Cas9 via Watson-Crick base pairing to a target sequence.
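The relationship between a roughly 20-nucleotide guide sequence and an adjacent PAM can be illustrated by scanning a DNA sequence for candidate target sites. This is a minimal sketch, not a guide-design tool: it assumes the canonical 5'-NGG-3' PAM of Streptococcus pyogenes Cas9 (the text does not specify a PAM), and the function names and example sequence are hypothetical.

```python
import re


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_target_sites(dna: str, guide_len: int = 20):
    """List candidate sites as (strand, start, protospacer, pam) tuples.

    A site is `guide_len` nucleotides immediately followed by an NGG PAM.
    For the "-" strand, `start` is a coordinate on the reverse complement
    of the input, not on the input itself.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    sites = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


if __name__ == "__main__":
    example = "TT" + "GAGTCCGAGCAGAAGAAGAA" + "GGG" + "TT"  # illustrative only
    for site in find_target_sites(example):
        print(site)
```

Other Cas proteins recognize different PAMs, so the pattern would need to be changed for orthologs such as Staphylococcus aureus Cas9 or Cas12a.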
  • the system comprises: a Cas protein, a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence and a recombination protein.
  • the recombination protein comprises a microbial recombination protein.
  • the recombination protein comprises a viral recombination protein. In certain embodiments, the recombination protein comprises a eukaryotic recombination protein. In certain embodiments, the recombination protein comprises a mitochondrial recombination protein.
  • Cas protein families are described in further detail in, e.g., Haft et al., PLoS Comput. Biol., 1(6): e60 (2005), incorporated herein by reference.
  • the Cas protein may be any Cas endonuclease. In some embodiments, the Cas protein is Cas9 or Cas12a, otherwise referred to as Cpf1.
  • the Cas9 protein is a wild-type Cas9 protein.
  • the Cas9 protein can be obtained from any suitable microorganism, and a number of bacteria express Cas9 protein orthologs or variants.
  • the Cas9 is from Streptococcus pyogenes or Staphylococcus aureus.
  • Cas9 proteins of other species are known in the art (see, e.g., U.S. Patent Application Publication 2017/0051312, incorporated herein by reference) and may be used in connection with the present invention.
  • the amino acid sequences of Cas proteins from a variety of species are publicly available through the GenBank and UniProt databases.
  • the Cas9 protein is a Cas9 nickase (Cas9n).
  • Wild-type Cas9 has two catalytic nuclease domains facilitating double-stranded DNA breaks.
  • a Cas9 nickase protein is typically engineered through inactivating point mutation(s) in one of the catalytic nuclease domains causing Cas9 to nick or enzymatically break only one of the two DNA strands using the remaining active nuclease domain.
  • Cas9 nickases are known in the art (see, e.g., U.S. Patent Application Publication 2017/0051312, incorporated herein by reference) and include, for example, Streptococcus pyogenes with point mutations at D10 or H840.
  • the Cas9 nickase is Streptococcus pyogenes Cas9n (D10A).
  • the Cas protein is a catalytically dead Cas.
  • catalytically dead Cas9 is essentially a DNA-binding protein due to, typically, two or more mutations within its catalytic nuclease domains which renders the protein with very little or no catalytic nuclease activity.
  • Streptococcus pyogenes Cas9 may be rendered catalytically dead by mutations of D10 and at least one of E762, H840, N854, N863, or D986, typically H840 and/or N863 (see, e.g., U.S. Patent Application Publication 2017/0051312, incorporated herein by reference). Mutations in corresponding orthologs are known, such as N580 in Staphylococcus aureus Cas9. Oftentimes, such mutations cause catalytically dead Cas proteins to possess no more than 3% of the normal nuclease activity.
  • In certain embodiments, the system comprises a nucleic acid molecule comprising a guide RNA sequence complementary to a target DNA sequence.
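The residue-based description of Streptococcus pyogenes Cas9 nickase and catalytically dead variants given above can be written as a small rule check. This is a simplified sketch that encodes only the residues listed in the text; the function name and return labels are hypothetical, and real activity depends on the specific substitution, which is not modeled here.

```python
# Residue positions taken from the text for Streptococcus pyogenes Cas9.
NICKASE_POSITIONS = {"D10", "H840"}
DEAD_PARTNER_POSITIONS = {"E762", "H840", "N854", "N863", "D986"}


def classify_spcas9(mutated_positions) -> str:
    """Classify a SpCas9 variant from the set of mutated residue positions.

    - D10 plus at least one of E762/H840/N854/N863/D986: catalytically dead
    - D10 or H840 mutated otherwise: nickase
    - no mutations: wild-type
    - anything else: not classified by this simplified rule
    """
    mutated = set(mutated_positions)
    if "D10" in mutated and mutated & DEAD_PARTNER_POSITIONS:
        return "catalytically dead"
    if mutated & NICKASE_POSITIONS:
        return "nickase"
    if not mutated:
        return "wild-type"
    return "not classified by this rule"


if __name__ == "__main__":
    print(classify_spcas9({"D10", "H840"}))
    print(classify_spcas9({"D10"}))
    print(classify_spcas9(set()))
```

Orthologs use different residue numbering (the text mentions N580 in Staphylococcus aureus Cas9), so the position sets would have to be replaced for any protein other than SpCas9.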
  • the guide RNA sequence specifies the target site with an approximate 20-nucleotide guide sequence followed by a protospacer adjacent motif (PAM) that directs Cas9 via Watson-Crick base pairing to a target sequence.
  • the system comprises a nucleic acid molecule comprising a deactivated guide RNA (dgRNA) sequence complementary to a target DNA sequence.
  • the deactivated guide is shortened or modified such that a CRISPR complex comprising the dgRNA binds to but does not cut or nick target DNA.
  • Non-limiting examples include guides such as are described by WO/2017/094872, which are modified in a manner which allows for formation of a CRISPR complex and successful binding to a target, while at the same time, not allowing for successful nuclease activity (e.g., without nuclease activity / without indel activity).
  • the guide nucleic acids can be thought of as catalytically inactive or conformationally inactive with regard to nuclease activity.
  • dgRNAs with short target recognition sequences can dramatically improve Cas9-mediated editing specificity by binding to and shielding off-target sites from an active Cas9 sgRNA complex.
  • target DNA sequence refers to a polynucleotide (nucleic acid, gene, chromosome, genome, etc.) to which a guide sequence (e.g., a guide RNA) is designed to have complementarity, wherein hybridization between the target sequence and a guide sequence promotes the formation of a Cas9/CRISPR complex, provided sufficient conditions for binding exist.
  • the target sequence is a genomic DNA sequence.
  • genomic refers to a nucleic acid sequence (e.g., a gene or locus) that is located on a chromosome in a cell.
  • a target sequence may comprise any polynucleotide, such as DNA or RNA.
  • Suitable DNA/RNA binding conditions include physiological conditions normally present in a cell.
  • Other suitable DNA/RNA binding conditions e.g., conditions in a cell-free system are known in the art; see, e.g., Sambrook, referenced herein and incorporated by reference.
  • the target genomic DNA sequence may encode a gene product.
  • gene product refers to any biochemical product resulting from expression of a gene. Gene products may be RNA or protein.
  • RNA gene products include non-coding RNA, such as tRNA, rRNA, micro RNA (miRNA), and small interfering RNA (siRNA), and coding RNA, such as messenger RNA (mRNA).
  • the target genomic DNA sequence encodes a protein or polypeptide.
  • two nucleic acid molecules comprising a guide RNA sequence may be utilized. The two nucleic acid molecules may have the same or different guide RNA sequences, thus complementary to the same or different target DNA sequence.
  • the guide RNA sequences of the two nucleic acid molecules are complementary to target DNA sequences at opposite ends (e.g., the 3’ and 5’ ends) and/or on opposite strands of the insert location.
  • the system further comprises a recruitment system comprising at least one aptamer sequence and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
  • the aptamer sequence is an RNA aptamer sequence.
  • the nucleic acid molecule comprising the guide RNA also comprises one or more RNA aptamers, or distinct RNA secondary structures or sequences that can recruit and bind another molecular species, an adaptor molecule, such as a nucleic acid or protein.
  • CRISPR systems are compatible with guide RNA insertions and extensions, including but not limited to SpCas9, SaCas9, and LbCas12a (aka Cpf1).
  • the RNA aptamers can be naturally occurring or synthetic oligonucleotides that have been engineered through repeated rounds of in vitro selection or SELEX (systematic evolution of ligands by exponential enrichment) to bind to a specific target molecular species.
  • the nucleic acid comprises two or more aptamer sequences.
  • the aptamer sequences may be the same or different and may target the same or different adaptor proteins.
  • the nucleic acid comprises two aptamer sequences.
  • Any RNA aptamer/ aptamer binding protein pair known may be selected and used in connection with the present invention (see, e.g., Jayasena, S.D., Clinical Chemistry, 1999. 45(9): p. 1628-1650; Gelinas, et al., Current Opinion in Structural Biology, 2016. 36: p. 122-132; and Hasegawa, H., Molecules, 2016; 21(4): p.
  • RNA aptamer binding, or adaptor, proteins exist, including a diverse array of bacteriophage coat proteins.
  • coat proteins include but are not limited to: MS2, Qβ, F2, GA, fr, JP501, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, φCb5, φCb8r, φCb12r, φCb23r, 7s and PRR1.
  • the RNA aptamer binds MS2 bacteriophage coat protein or a functional derivative, fragment or variant thereof.
  • MS2 binding RNA aptamers commonly have a simple stem-loop structure, classically defined by a 19 nucleotide RNA molecule with a single bulged adenine on the 5’ leg of the stem (Witherell G.W., et al., (1991) Prog. Nucleic Acid Res. Mol. Biol., 40, 185–220, incorporated herein by reference).
  • MS2 coat protein (Parrott AM, et al., Nucleic Acids Res. 2000;28(2):489–497; Buenrostro JD, et al., Nature Biotechnology 2014; 32, 562-568, incorporated herein by reference).
  • any RNA aptamer sequence known to bind the MS2 bacteriophage coat protein may be utilized in connection with the present invention to bind to fusion proteins comprising MS2.
  • the MS2 RNA aptamer sequence comprises AACAUGAGGAUCACCCAUGUCUGCAG (SEQ ID NO:145), AGCAUGAGGAUCACCCAUGUCUGCAG (SEQ ID NO:146), or AGCGUGAGGAUCACCCAUGCCUGCAG (SEQ ID NO:147).
  • N-proteins (Nut-utilization site proteins) of bacteriophages contain arginine-rich conserved RNA recognition motifs of ~20 amino acids, referred to as N peptides.
  • the RNA aptamer may bind a phage N peptide or a functional derivative, fragment or variant thereof.
  • the phage N peptide is the lambda or P22 phage N peptide or a functional derivative, fragment or variant thereof.
  • the N peptide is lambda phage N22 peptide, or a functional derivative, fragment or variant thereof.
  • the N22 peptide comprises an amino acid sequence with at least 70% similarity to the amino acid sequence GNARTRRRERRAEKQAQWKAAN (SEQ ID NO:149).
  • N22 peptide, the 22 amino acid RNA-binding domain of the λ bacteriophage antiterminator protein N (λN-(1–22) or λN peptide), is capable of specifically binding to specific stem-loop structures, including but not limited to the BoxB stem-loop. See, for example, Cilley and Williamson, RNA 1997; 3(1):57-67, incorporated herein by reference. A number of different BoxB stem-loop primary sequences are known to bind the N22 peptide and any of those may be utilized in connection with the present invention.
  • the N22 peptide RNA aptamer sequence comprises a nucleotide sequence with at least 70% similarity to an RNA sequence selected from the group consisting of GCCCUGAAAAAGGGC (SEQ ID NO:150), GCCCUGAAGAAGGGC (SEQ ID NO:151), GCGCUGAAAAAGCGC (SEQ ID NO:152), GCCCUGACAAAGGGC (SEQ ID NO:153), and GCGCUGACAAAGCGC (SEQ ID NO:154).
  • the N22 peptide RNA aptamer sequence is selected from the group consisting of SEQ ID NOs: 150-154.
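The stem-loop character of the BoxB sequences listed above (SEQ ID NOs: 150-154) can be checked with a short sketch. Reading each 15-nt sequence as a 5-bp stem closed by a 5-nt loop is an interpretation of the listed sequences rather than something the text states, and only Watson-Crick pairs are considered; the helper names are hypothetical.

```python
# Watson-Crick pairing only; G-U wobble pairs are deliberately ignored in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def stem_loop_parts(rna, stem_len=5):
    """Split an RNA into (5' stem, loop, 3' stem, stems_pair)."""
    five, loop, three = rna[:stem_len], rna[stem_len:-stem_len], rna[-stem_len:]
    return five, loop, three, reverse_complement(three) == five

# BoxB sequences as listed in the text, keyed by SEQ ID NO.
BOXB = {
    150: "GCCCUGAAAAAGGGC",
    151: "GCCCUGAAGAAGGGC",
    152: "GCGCUGAAAAAGCGC",
    153: "GCCCUGACAAAGGGC",
    154: "GCGCUGACAAAGCGC",
}

if __name__ == "__main__":
    for seq_id, seq in BOXB.items():
        five, loop, three, ok = stem_loop_parts(seq)
        print(seq_id, five, loop, three, "stem pairs" if ok else "stem mismatch")
```

Under these assumptions each listed sequence folds into a fully paired 5-bp stem, with the variants differing in the stem composition or in the loop.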
  • the N peptide is the P22 phage N peptide, or a functional derivative, fragment or variant thereof.
  • a number of different BoxB stem-loop primary sequences are known to bind the P22 phage N peptide and variants thereof and any of those may be utilized in connection with the present invention. See, for example Cocozaki, Ghattas, and Smith, Journal of Bacteriology 2008; 190(23):7699-7708, incorporated herein by reference.
  • the P22 phage N peptide comprises an amino acid sequence with at least 70% similarity to the amino acid sequence GNAKTRRHERRRKLAIERDTI (SEQ ID NO:155).
  • the P22 phage N peptide RNA aptamer sequence comprises a sequence with at least 70% similarity to an RNA sequence selected from the group consisting of GCGCUGACAAAGCGC (SEQ ID NO:156) and CCGCCGACAACGCGG (SEQ ID NO:157). In some embodiments, the P22 phage N peptide RNA aptamer sequence is selected from the group consisting of SEQ ID NOs: 156-157, UGCGCUGACAAAGCGCG (SEQ ID NO:158) or ACCGCCGACAACGCGGU (SEQ ID NO:159).
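To make a threshold such as "at least 70% similarity" concrete, the sketch below computes a simple ungapped, position-by-position percent identity. This is an assumption made for illustration: the specification may define similarity differently (for example with alignment gaps or conservative substitutions), and `percent_identity` is a hypothetical helper, not a method taken from the text.

```python
def percent_identity(a, b):
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length sequences")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def meets_threshold(candidate, reference, threshold=70.0):
    return percent_identity(candidate, reference) >= threshold

# The two P22 BoxB sequences listed in the text.
SEQ_156 = "GCGCUGACAAAGCGC"
SEQ_157 = "CCGCCGACAACGCGG"

if __name__ == "__main__":
    print(round(percent_identity(SEQ_156, SEQ_157), 1))
```

For sequences of unequal length a real comparison would need an alignment step, which this sketch leaves out on purpose.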
  • aptamer/aptamer binding protein pairs can be selected to bring together a combination of recombination proteins and functions.
  • the aptamer sequence is a peptide aptamer sequence.
  • the peptide aptamers can be naturally occurring or synthetic peptides that are specifically recognized by an affinity agent.
  • Such aptamers include, but are not limited to, a c-Myc affinity tag, an HA affinity tag, a His affinity tag, an S affinity tag, a methionine-His affinity tag, an RGD-His affinity tag, a 7× His tag (SEQ ID NO:763), a FLAG octapeptide, a strep tag or strep tag II, a V5 tag, or a VSV-G epitope.
  • Corresponding aptamer binding proteins are well-known in the art and include, for example, primary antibodies, biotin, affimers, single domain antibodies, and antibody mimetics.
  • An exemplary peptide aptamer includes a GCN4 peptide (Tanenbaum et al., Cell 2014; 159(3):635-646, incorporated herein by reference). Antibodies, or GCN4 binding protein can be used as the aptamer binding proteins.
  • the peptide aptamer sequence is conjugated to the Cas protein.
  • the peptide aptamer sequence may be fused to the Cas in any orientation (e.g., N-terminus to C-terminus, C-terminus to N-terminus, N-terminus to N-terminus).
  • the peptide aptamer is fused to the C-terminus of the Cas protein.
  • peptide aptamer sequences may be conjugated to the Cas protein.
  • the aptamer sequences may be the same or different and may target the same or different aptamer binding proteins.
  • 1 to 24 tandem repeats of the same peptide aptamer sequence are conjugated to the Cas protein.
  • between 4 and 18 tandem repeats are conjugated to the Cas protein.
  • the individual aptamers may be separated by a linker region. Suitable linker regions are known in the art.
  • the linker may be flexible or configured to allow the binding of affinity agents to adjacent aptamers without or with decreased steric hindrance.
  • the linker sequences may provide an unstructured or linear region of the polypeptide, for example, with the inclusion of one or more glycine and/or serine residues. Linker sequences can be at least about 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acids in length.
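The tandem-repeat arrangement described above can be sketched as simple string assembly. The FLAG octapeptide sequence (DYKDDDDK) is the standard one; the glycine/serine linker `GGSGGS`, the function names, and the placeholder Cas sequence in the usage note are hypothetical choices made only for illustration.

```python
FLAG = "DYKDDDDK"  # FLAG octapeptide, one of the peptide aptamers named in the text

def tandem_array(epitope, copies, linker="GGSGGS"):
    """Join `copies` repeats of one peptide aptamer, separated by a linker.

    The 1 to 24 range follows the text; the default linker is an assumption.
    """
    if not 1 <= copies <= 24:
        raise ValueError("the text describes 1 to 24 tandem repeats")
    return linker.join([epitope] * copies)

def fuse_c_terminal(cas_protein, array, linker="GGSGGS"):
    """Append the aptamer array to the C-terminus of a Cas protein sequence."""
    return cas_protein + linker + array
```

For example, `fuse_c_terminal(cas_seq, tandem_array(FLAG, 10))` would give a Cas sequence carrying ten linker-separated FLAG repeats, where `cas_seq` stands in for a real Cas amino acid sequence.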
  • the fusion protein comprises a recombination protein functionally linked to an aptamer binding protein.
  • the recombination protein comprises a microbial recombination protein.
  • the recombination protein comprises a recombinase.
  • the recombination protein comprises 5’-3’ exonuclease activity.
  • the recombination protein comprises 3’-5’ exonuclease activity. In certain embodiments, the recombination protein comprises ssDNA binding activity. In certain embodiments, the recombination protein comprises ssDNA annealing activity.
  • the bacteriophage λ-encoded genetic recombination machinery, named the λ red system, comprises the exo and bet genes, assisted by the gam gene, together designated λ red genes. Exo is a 5’-3’ exonuclease which targets dsDNA and Bet is a ssDNA-binding protein. Bet functions include protecting ssDNA from degradation and promoting annealing of complementary ssDNA strands.
  • the microbial recombination protein may be RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof.
  • Recombination proteins and functional fragments thereof useful in the invention include nucleases, ssDNA-binding proteins (SSBs), and ssDNA annealing proteins (SSAPs).
  • Among microbial proteins, these include, without limitation, E. coli proteins such as ExoI (xonA; sbcB), ExoIII (xthA), ExoIV (orn), ExoVII (xseA, xseB), ExoIX (ygdG), ExoX (exoX), DNA Pol I 5’ Exo (ExoVI) (polA), DNA Pol I 3’ Exo (ExoII) (polA), DNA Pol II 3’ Exo (polB), DNA Pol III 3’ Exo (dnaQ, mutD), RecBCD (recB, recC, recD), and RecJ (recJ) and their functional fragments.
  • Useful SSBs include, without limitation, SSBs of prokaryotes, bacteriophage, eukaryotes, mammals, mitochondria, and viruses. While SSBs are found in every organism, the proteins themselves share surprisingly little sequence similarity, and may differ in subunit composition and oligomerization states. SSB proteins may comprise certain structural features.
  • One feature is the use of oligonucleotide/oligosaccharide-binding (OB) domains to bind ssDNA through a combination of electrostatic and base-stacking interactions with the phosphodiester backbone and nucleotide bases.
  • Another feature is oligomerization that brings together DNA-binding OB folds.
  • Eukaryotic SSBs are regulated by phosphorylation on serine and threonine residues. Tyrosine phosphorylation of microbial SSBs is observed in taxonomically distant bacteria and substantially increases affinity for ssDNA.
  • the human mitochondrial ssDNA-binding protein is structurally similar to SSB from Escherichia coli (EcoSSB), but lacks the C-terminal disordered domain.
  • Eukaryotic replication protein A shares function, but not sequence homology with bacterial SSB.
  • the herpes simplex virus (HSV-1) SSB, ICP8, is a nuclear protein that, along with other replication proteins, is required for viral DNA replication.
  • exonuclease activities and ssDNA binding activities of the recombination proteins of the invention uncover and protect single stranded regions of template and target DNAs, thereby facilitating recombination.
  • targeting can be cooperative, involving target directed CRISPR-mediated nicking of chromosomal DNA coordinated with recombination directed by homology arms designed into template DNAs. In certain embodiments of the invention, off-target effects are minimized.
  • phages are recognized to encode their own single stranded DNA annealing protein (SSAP) recombinases, which substitute for classic RecA proteins while functioning with host proteins to control DNA metabolism.
  • Steczkiewicz et al. classified SSAPs into seven families (RecA, Gp2.5, RecT/Redβ, Erf, Rad52/22, Sak3, and Sak4) organized into three superfamilies including prokaryotes, eukaryotes, and phage (Steczkiewicz et al., 2021, Front. Microbiol 12:644622).
  • Non- limiting examples of SSAPs that can be used according to the invention are provided in Table 7. Any one or more of the SSAPs can be employed in the invention.
  • a microbial recombination protein is RecE or RecT, or a derivative or variant thereof.
  • derivatives and variants of RecE and RecT are functionally equivalent proteins or polypeptides which possess substantially similar function to wild-type RecE and RecT.
  • RecE and RecT derivatives or variants include biologically active amino acid sequences similar to the wild-type sequences but differing due to amino acid substitutions, additions, deletions, truncations, post-translational modifications, or other modifications.
  • the derivatives may improve translation, purification, biological half-life, activity, or eliminate or lessen any undesirable side effects or reactions.
  • the derivatives or variants may be naturally occurring polypeptides, synthetic or chemically synthesized polypeptides or genetically engineered polypeptides.
  • RecE and RecT bioactivities are known to, and easily assayed by, those of ordinary skill in the art, and include, for example exonuclease and single-stranded nucleic acid binding, respectively.
  • the RecE or RecT may be from a number of organisms, including Escherichia coli, Pantoea breeneri, Type-F symbiont of Plautia stali, Providencia sp. MGF014, Shigella sonnei, Pseudobacteriovorax antillogorgiicola, among others.
  • the RecE and RecT protein is derived from Escherichia coli.
  • the fusion protein comprises RecE, or a derivative or variant thereof.
  • the RecE, or derivative or variant thereof may comprise an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8.
  • the RecE, or derivative or variant thereof may comprise an amino acid sequence with at least 70% (e.g., 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8.
  • the RecE, or derivative or variant thereof comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8.
  • the RecE, or derivative or variant thereof comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-3.
  • the fusion protein comprises RecT, or a derivative or variant thereof.
  • the RecT, or derivative or variant thereof may comprise an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14.
  • the RecT, or derivative or variant thereof may comprise an amino acid sequence with at least 70% (e.g., 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14.
  • the RecT, or derivative or variant thereof comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14.
  • the RecT comprises an amino acid sequence with at least 90% similarity to the amino acid sequence of SEQ ID NO:9.
  • the fusion protein comprises a recombination protein comprising an amino acid sequence at least 75% similar, or at least 75% identical to a recombination protein of SEQ ID NO:166 to SEQ ID NO:491, a recombination protein of Table 9, a recombination protein of SEQ ID NO:179, SEQ ID NO:185, SEQ ID NO:205, SEQ ID NO:321, SEQ ID NO:353, SEQ ID NO:359, SEQ ID NO:366, SEQ ID NO:424, or SEQ ID NO:479, or a recombination protein of SEQ ID NO:166, SEQ ID NO:168, SEQ ID NO:169, SEQ ID NO:170, SEQ ID NO:171, SEQ ID NO:241, SEQ ID NO:
  • the fusion protein comprises a recombination protein comprising a sequence having at least 80%, at least 85%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 98.5%, at least 99%, at least 99.5%, or 100% similarity or identity to the above referenced recombination proteins.
  • Truncations may be from either the C-terminal or N-terminal ends, or both. For example, as demonstrated in Example 6 below, a diverse set of truncations from either end or both provided a functional product.
  • the recombination protein comprises a tyrosine recombinase or functional fragment thereof. In some embodiments, the recombination protein comprises a serine recombinase or functional fragment thereof. In some embodiments, the recombination protein comprises an integrase, resolvase, or invertase, or functional fragment thereof.
  • the recombinase protein comprises a site-specific recombinase protein or functional fragment thereof. In some embodiments, the recombination protein comprises an exonuclease or functional fragment thereof. In some embodiments, the recombination protein comprises an ssDNA-binding protein or functional fragment thereof.
  • the fusion protein comprises, without limitation, Hin, Gin, Tn3, β/six, CinH, Min, ParA, γδ, Bxb1, φC31, TP901-1, TG1, Wβ, φ370.1, φK38, φBT1, R4, φRV1, φFC1, MR11, A118, U153, Bxz2, gp29, Cre, Dre, Vika, Flp, Kw, SprA, HK022, P22, L1, or L5 or a homolog of any of such proteins or functional fragment thereof.
  • recombinases which may be classified in the art as integrases, resolvases, or invertases, may share substructures and activities with exonucleases and SSBs and be used according to the invention.
  • the invention provides a system which comprises a reverse transcriptase, a guide nucleic acid, and a recombination protein, and optionally a Cas protein.
  • reverse transcriptase describes a class of polymerases characterized as RNA-dependent DNA polymerases. All known reverse transcriptases require a primer to synthesize a DNA transcript from an RNA template.
  • reverse transcriptase has been used primarily to transcribe mRNA into cDNA which can then be cloned into a vector for further manipulation.
  • Avian myeloblastosis virus (AMV) reverse transcriptase was the first widely used RNA-dependent DNA polymerase (Verma, Biochim. Biophys. Acta 473:1 (1977)).
  • the enzyme has 5'-3' RNA-directed DNA polymerase activity, 5'-3' DNA-directed DNA polymerase activity, and RNase H activity.
  • RNase H is a processive 5' and 3' ribonuclease specific for the RNA strand of RNA-DNA hybrids (Perbal, A Practical Guide to Molecular Cloning, New York: Wiley & Sons (1984)). Errors in transcription cannot be corrected by reverse transcriptase because known viral reverse transcriptases lack the 3'-5' exonuclease activity necessary for proof-reading (Saunders and Saunders, Microbial Genetics Applied to Biotechnology, London: Croom Helm (1987)). A detailed study of the activity of AMV reverse transcriptase and its associated RNase H activity has been presented by Berger et al., Biochemistry 22:2365-2372 (1983).
  • M-MLV Moloney murine leukemia virus
  • With regard to RT and to linkers or ways to functionally link components of embodiments of the invention, such as the RT system or composition of the invention (as well as linkers or ways to functionally link components of systems or compositions discussed herein that do not involve RT), mention is made of WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, and WO2020/191171, which involve what is known as prime editing and twin prime editing.
  • Each of WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, WO2020/191171 and WO2021/226558 is hereby incorporated herein by reference.
  • RTs of WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, WO2020/191171 and WO2021/226558 can be used in the practice of the present invention.
  • Linkers or ways to functionally link components described in WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, WO2020/191171 and WO2021/226558 can be used in the practice of the present invention.
  • WO/2020/191153 describes a system comprising a CRISPR protein (e.g., a Cas9 nickase) and a reverse transcriptase for use with a guide RNA that specifies a target site and templates synthesis of a desired edit in the form of a replacement DNA strand by way of an extension (either DNA or RNA) engineered onto a guide nucleic acid (e.g., at the 5' or 3' end, or at an internal portion of a guide RNA).
  • the invention provides a single stranded binding protein (e.g., SSAP or SSB) used with a reverse transcriptase to edit without CRISPR-mediated nicking or cleavage of target DNA.
  • RT systems and compositions
  • Current genome editing technology is limited by low efficiency and accuracy in precision editing, which makes current tools such as the CRISPR system unreliable for introducing accurate replacements, deletions, or insertions in mammalian cells.
  • the usual process involves delivery of a gene editing tool (like CRISPR) and a DNA repair template for introducing desirable changes to the genome sequence.
  • the DNA delivered into the cell could insert non-specifically into off-target genomic loci or unintended targets, posing a major challenge for gene editing for therapeutic purposes.
  • Description regarding RT systems and compositions
  • Here Applicants describe the invention using RNA as a molecular entity to mediate gene editing.
  • RNA as template donor
  • SSAP single-strand annealing protein, exemplified by RecT, lambda Red, T7gp2.5.
  • Applicants here show the efficiency of gene editing through the process of delivering three components into a cell: (1) a CRISPR system that introduces local DNA cleavage, nicking, or R-loop formation, composed of a CRISPR enzyme (Cas9/Cas9n/dCas9 or Cas12a/nCas12a/dCas12a, respectively, for cleavage/nicking/R-loop formation) and a guide RNA, where the guide RNA contains an aptamer (such as MS2, PP7, or BoxB) to recruit the SSAP protein; (2) an RNA sequence bearing the desirable DNA changes with one or more homology arm (HA) region(s) that is either fused/linked to the guide RNA in (1), or fused/linked to a second guide RNA.
  • the HA region is at least 20bp and provides a homology region next to the editing site for SSAP-mediated editing.
  • this second guide RNA binds to a nearby genomic site, located 0 bp to 150 bp away from the target site of the guide RNA in (1).
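The two numeric constraints stated above (homology arms of at least 20 bp, and a second guide site within 0 to 150 bp of the first) lend themselves to a small design check. The sketch below is illustrative only: the `EditDesign` container, the field names, and the choice to measure distance between site coordinates are assumptions, not part of the described system.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_HA_BP = 20          # homology arm of at least 20 bp, per the text
MAX_GUIDE_GAP_BP = 150  # second guide site lies 0 to 150 bp from the first

@dataclass
class EditDesign:
    homology_arms: List[str]          # HA sequences carried on the RNA template
    guide1_pos: int                   # hypothetical coordinate of the first guide's site
    guide2_pos: Optional[int] = None  # set only when the template rides on a second guide

def design_problems(design: EditDesign) -> List[str]:
    """Return human-readable problems; an empty list means the checks pass."""
    problems = []
    if not design.homology_arms:
        problems.append("at least one homology arm is required")
    for i, arm in enumerate(design.homology_arms, start=1):
        if len(arm) < MIN_HA_BP:
            problems.append(f"HA {i} is {len(arm)} bp; expected at least {MIN_HA_BP} bp")
    if design.guide2_pos is not None:
        gap = abs(design.guide2_pos - design.guide1_pos)
        if gap > MAX_GUIDE_GAP_BP:
            problems.append(f"second guide is {gap} bp away; expected 0 to {MAX_GUIDE_GAP_BP} bp")
    return problems
```

Returning a list of problems rather than raising on the first failure lets a design tool report every violated constraint at once.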
  • This second guide RNA then forms a complex with CRISPR enzymes (such as Cas9/nCas9/dCas9 and Cas12a/nCas12a/dCas12a), is recruited to the target genomic loci, and serves to provide the RNA template/donor for the editing.
  • the enzymes are either regular CRISPR enzymes or Cas proteins, but could also be nicking or deactivated CRISPR enzymes (dCas9, dCas12a, etc.) that only bind to target loci.
  • the guide is regular guide RNA or shorter guide RNA (typically 2 to 6 bp shorter than the regular guide RNA, so 14 bp to 18 bp) to allow efficient binding but not cleavage of targets.
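The length arithmetic for shortened guides (a 20-nt guide shortened by 2 to 6 nt gives 14 to 18 nt) can be sketched as follows. The text does not say which end is shortened; trimming the 5' (PAM-distal) end is an assumption here, and the example guide sequence is hypothetical.

```python
def truncate_guide(guide, remove):
    """Drop `remove` nucleotides from the 5' end of a guide (assumed truncation end)."""
    if not 2 <= remove <= 6:
        raise ValueError("the text describes guides 2 to 6 nt shorter than regular")
    if len(guide) <= remove:
        raise ValueError("guide is too short to truncate")
    return guide[remove:]

REGULAR_GUIDE = "GACCUAGCUAGGAUCCAUGC"  # hypothetical 20-nt guide sequence

# One shortened variant per allowed truncation, keyed by nucleotides removed.
SHORT_GUIDES = {n: truncate_guide(REGULAR_GUIDE, n) for n in range(2, 7)}
```

Each variant keeps the PAM-proximal portion of the guide, so it still addresses the same target site while being shorter than a regular guide.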
  • the RBP is MS2 coat protein (MCP), PP7 coat protein (PCP), or BoxB binding peptide from lambda phage (lambda N22 peptide).
  • in RNA-templated SSAP gene-editing, when Applicants fuse a reverse transcriptase (RT) to the SSAP protein via a long peptide linker, making this third component RBP-SSAP-RT or RBP-RT-SSAP (where "-" represents a linker), editing efficiencies are further enhanced.
  • the Cas9/nCas9/dCas9 or Cas12a/nCas12a/dCas12a protein is fused via linker to a reverse transcriptase (RT).
  • the guide RNA in this design also contains a primer binding site (PBS) of at least 14 bp, which is complementary to a region at the editing site.
  • This PBS helps to initiate RT activity.
  • another design uses the same guide RNA as in the first embodiment and, to initiate RT activity, a short oligo DNA (14 bp or more in length) that is complementary to a region at the editing site is supplied to the cell. This oligo DNA initiates RT activity and allows SSAP-mediated gene-editing.
  • the Cas9/nCas9/dCas9 or Cas12a/nCas12a/dCas12a protein is fused via linker to a reverse transcriptase (RT) from a retron system.
  • the guide RNA in this design has an msr/msd sequence from a retron, and also one or more homology arm (HA) region(s), which is complementary to a region at the editing site.
  • the msr/msd sequence helps to initiate RT activity.
  • the HA region helps to mediate SSAP gene-editing.
  • RNA- mediated/RNA-templated gene editing in eukaryotic/mammalian cells.
  • Applicants further demonstrated that through designing cleavable RNA template using endogenous tRNA, ribozyme, or the direct repeat from Cas12a system, Applicants also achieve multiple-target gene editing using RNA as template.
  • Description regarding Prime Editing systems and compositions
  • Here Applicants describe the invention using SSAPs to enhance editing methods that employ RNA as a molecular entity with reverse transcriptases to mediate gene editing.
  • Prime-editing has been generally described by Anzalone et al. (Anzalone et al., Nature. 2019; 576:149-157). Prime-editors use an engineered reverse transcriptase fused to Cas9 nickase and a prime-editing guide RNA (pegRNA). The pegRNA differs from regular sgRNAs and plays a major role in the system's function.
  • the pegRNA contains not only (a) the sequence complementary to the target sites that directs nCas9 to its target sequence, but also (b) an additional sequence spelling the desired sequence changes (Anzalone et al., Nature. 2019; 576:149-157).
  • the 5′ of the pegRNA binds to the primer binding site (PBS) region on the DNA, exposing the noncomplementary strand.
  • the unbound DNA of the PAM-containing strand is nicked by Cas9, creating a primer for the reverse transcriptase (RT) that is linked to nCas9.
  • the nicked PAM-strand is then extended by the RT by using the interior of the pegRNA as a template, consequently modifying the target region in a programmable manner.
  • the result of this step is two redundant PAM DNA flaps: the edited 3′ flap that was reverse transcribed from the pegRNA and the original, unedited 5′ flap.
  • the choice of which flap hybridizes with the non-PAM containing DNA-strand is an equilibrium process, in which the perfectly complementary 5′ flap would likely be thermodynamically favored.
  • the 5′ flaps are preferentially degraded by cellular endonucleases that are ubiquitous during lagging-strand DNA synthesis (Hosfield et al., Cell. 1998; 95:135-146).
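The nick-and-extend steps above can be sketched in code. This is a simplified illustration under stated assumptions, not Applicants' design procedure: the nick is placed 3 nt upstream of the PAM on the PAM-containing strand (as for SpCas9), the pegRNA 3' extension is laid out as RT template followed by PBS (the general layout of Anzalone et al.), only substitution edits are handled, and the function names and default lengths are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp_as_rna(dna):
    """Reverse complement of a DNA string, written as RNA."""
    return dna.translate(DNA_COMPLEMENT)[::-1].replace("T", "U")

def peg_extension(pam_strand, edited_strand, protospacer_start,
                  pbs_len=14, rtt_len=15, guide_len=20):
    """Build a pegRNA 3' extension (RT template + PBS), read 5' to 3'.

    pam_strand / edited_strand: PAM-containing strand before and after the
    desired substitution(s), 5' to 3', in the same coordinates.
    """
    nick = protospacer_start + guide_len - 3  # assumed nick position
    if nick - pbs_len < 0 or nick + rtt_len > len(edited_strand):
        raise ValueError("sequence too short for the requested PBS/RT template")
    # The PBS pairs with the 3' end of the nicked PAM strand, giving the RT a primer.
    pbs = revcomp_as_rna(pam_strand[nick - pbs_len:nick])
    # The RT template encodes the edited sequence downstream of the nick.
    rtt = revcomp_as_rna(edited_strand[nick:nick + rtt_len])
    return rtt + pbs
```

Reverse-complementing the returned extension gives the primer region followed by the newly synthesized edited 3′ flap, which is a convenient sanity check on a design.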
  • PEI Polypeptide immunodeficiency virus reverse transcriptase
  • M-MLV RT Moloney murine leukemia virus reverse transcriptase
  • Prime-editing shows other advantages over previous CRISPR-mediated base-editing approaches, including less stringent PAM requirements due to the varied length of the RT template and reduced “bystander” editing.
  • prime editor systems use Cas9 nuclease instead of a Cas9 nickase. (See, e.g. Adikusuma, F. et al., Nucleic Acids Research. 2021; 49(18):10785-10795).
  • prime editor systems employ two or more prime editors (e.g. “twin prime editing”) which operate on both strands of a target DNA to promote editing of large pieces of DNA or strands of different targets to promote recombination (Anzalone et al., Nature Biotechnology. 2021; 40(5):731-740).
  • the prime editor designated “PEmax” includes optimizations of the PE2 protein by varying RT codon usage, SpCas9 mutations, NLS sequences, and the length and composition of peptide linkers between nCas9 and RT (Chen, PJ et al., Cell. 2021; 552).
  • the targeted modifications may be introduced using recombination proteins of the invention with a technique described in US Patent Nos. 11932884, 11898179, 11795443, 11732274, 11560566, 11542509, 11542496, 11447770 and/or 11268082 and/or International Patent Publications WO2015/089406, WO2017/070632, WO2018/176009, WO2020/191241, WO2020/191248, WO2021/226558, WO2021/072328 and/or WO2021/155065.
  • advantages of the RNA-templated SSAP gene editing system include: (1) it has reduced off-target effects or toxicity due to the use of RNA, which is less immunogenic compared with the DNA used in existing gene editing processes, and RNA cannot be integrated directly into unintended genomic DNA sites or off-target DNA sites; (2) Applicants can easily multiplex the precision gene editing methods by using cleavable RNA templates in Applicants’ methods; (3) RNA is easier to deliver into cells, easier to manufacture, and less expensive to scale up for clinical usage; (4) RNA has considerable engineering potential through combination with other regulatory or combinatorial payloads/components via chemical linkage or biochemical coupling, to enable more efficient delivery, editing, or synergistic action of RNA-templated gene editing with other types of gene editing or therapeutic modalities; and (5) the efficiency of RNA-templated gene editing could be enhanced via RNA and protein factors and is orthogonal to regular DNA-repair pathways that may be critical for the health of target cells.
  • RNA-guided Recombination Protein System
  • In certain embodiments of the invention, there is provided a system or composition for RNA-guided recombineering that does not rely primarily on CRISPR proteins.
  • the system or composition comprises: a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence and a recombination protein.
  • the system or composition is capable of promoting R-loop formation.
  • the system or composition is capable of recombination.
  • the system or composition is free of CRISPR proteins.
  • the recombination protein comprises a microbial recombination protein.
  • the recombination protein comprises a viral recombination protein. In certain embodiments, the recombination protein comprises a eukaryotic recombination protein. In certain embodiments, the recombination protein comprises a mitochondrial recombination protein. In various embodiments, the recombination protein comprises a single stranded DNA annealing protein (SSAP), a single stranded DNA binding protein (SSB), an exonuclease, or a combination of two or more thereof. In certain embodiments, the system or composition does not comprise a Cas9. In certain embodiments, the system or composition does not comprise a Cas12a.
  • the system or composition does not comprise a Cas. In certain embodiments, the system or composition does not comprise a CRISPR.
  • the system can be thought of as comprising a guide nucleic acid that promotes R-loop formation by binding to target DNA and a recombination protein that promotes recombination between the target nucleic acids and donor nucleic acids.
  • the guide RNA and the recombination protein are effectively linked. In some embodiments, the linkage is covalent. In some embodiments, the linkage is non- covalent.
  • the guide nucleic acid comprises an aptamer sequence and the recombination protein comprises or is joined to an aptamer binding domain.
  • Table 1 provides non-limiting examples of R-loop guide nucleic acids used in the invention.
  • the RLoop-guideRNA comprises a guide component and a scaffold component in various arrangements, e.g., guide-scaffold and scaffold-guide. In some embodiments, the RLoop-guideRNA comprises the guide at the 5' end of the scaffold.
  • RLoop-guideRNA comprises the guide at the 3’ end of scaffold.
  • the guide sequence is engineered to bind to target DNA (genome target).
  • the guide is from 17 to 160 bases.
  • the scaffold comprises one or more aptamer sequences.
  • Aptamers used in the invention include, without limitation, MS2, PP7, BoxB, and others.
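  • The arrangement described above (a guide of 17 to 160 bases joined to an aptamer-bearing scaffold at either its 5' or 3' end) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the example guide, and the example scaffold are hypothetical placeholders and are not sequences from Table 1.

```python
MIN_GUIDE_LEN = 17   # lower bound on guide length stated in the text
MAX_GUIDE_LEN = 160  # upper bound on guide length stated in the text


def build_rloop_guide_rna(guide: str, scaffold: str, guide_at_5_prime: bool = True) -> str:
    """Join a guide and an aptamer-bearing scaffold, written 5'->3'.

    Hypothetical helper: accepts DNA or RNA letters and returns RNA letters.
    """
    guide = guide.upper().replace("T", "U")
    scaffold = scaffold.upper().replace("T", "U")
    if set(guide + scaffold) - set("ACGU"):
        raise ValueError("sequences may contain only A, C, G, U (or T)")
    if not MIN_GUIDE_LEN <= len(guide) <= MAX_GUIDE_LEN:
        raise ValueError(f"guide must be {MIN_GUIDE_LEN}-{MAX_GUIDE_LEN} bases long")
    # guide-scaffold places the guide 5' of the scaffold; scaffold-guide places it 3'
    return guide + scaffold if guide_at_5_prime else scaffold + guide


# Made-up inputs for illustration only (not real guide or aptamer sequences).
example_guide = "GACUGACUGACUGACUGACU"   # 20 nt
example_scaffold = "ACGUACGUACGU"        # stand-in for an aptamer scaffold

guide_first = build_rloop_guide_rna(example_guide, example_scaffold, guide_at_5_prime=True)
scaffold_first = build_rloop_guide_rna(example_guide, example_scaffold, guide_at_5_prime=False)
```

  • In practice the scaffold argument would carry one or more real aptamer sequences (for example MS2, PP7, or BoxB); the sketch only checks the length and alphabet and fixes the orientation.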
  • the fusion protein comprises an RNA binding component that binds to an aptamer such as is described above and an SSAP protein such as but not limited to RecT, LambdaRed, T7gp2.5, and others.
  • Donor nucleic acids can be single-strand or double-stranded DNA and comprise (1) various lengths of homology arms (HA) to match a genomic target region, and (2) a transgene, e.g., knock-in sequence or replacement sequence etc. There is no limit to the size of the transgene. Insertions of 600-bp (FIG. 70) and 800-bp (FIG. 71) are exemplified herein.
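  • The donor layout described above (homology arms flanking a transgene) can be modeled as a simple string operation. The sketch below is a toy model under stated assumptions: the class and function names are hypothetical, the sequences are made up, and only a pure insertion between two adjacent homology arms is modeled, not a replacement.

```python
from typing import NamedTuple


class Donor(NamedTuple):
    """Hypothetical container for a donor: left arm, transgene, right arm."""
    left_arm: str
    transgene: str
    right_arm: str

    @property
    def sequence(self) -> str:
        # Full donor molecule written 5'->3'
        return self.left_arm + self.transgene + self.right_arm


def simulate_knock_in(target: str, donor: Donor) -> str:
    """Insert the transgene where the two homology arms abut in the target."""
    junction = donor.left_arm + donor.right_arm
    pos = target.find(junction)
    if pos == -1:
        raise ValueError("homology arms do not match a contiguous site in the target")
    cut = pos + len(donor.left_arm)  # insertion point between the arms
    return target[:cut] + donor.transgene + target[cut:]


# Made-up toy sequences for illustration only.
target = "TTTT" + "ACGTACGT" + "GGCCGGCC" + "AAAA"
donor = Donor(left_arm="ACGTACGT", transgene="CATCAT", right_arm="GGCCGGCC")
edited = simulate_knock_in(target, donor)
```

  • Real homology arms are far longer than these eight-base stand-ins; the point is only that the arms locate the insertion site and the transgene ends up between them.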
  • an RLoop-guideRNA binds to an RNA-binding-protein or domain fused to a recombination protein such as but not limited to SSAP.
  • the invention provides fusion proteins.
  • a recombination protein may be linked to either terminus of an aptamer binding protein in any orientation (e.g., N-terminus to C-terminus, C-terminus to N-terminus, N-terminus to N-terminus).
  • a recombination protein N-terminus is linked to the aptamer binding protein C-terminus.
  • the overall fusion protein from N- to C-terminus comprises the aptamer binding protein (N- to C-terminus) linked to the recombination protein (N- to C-terminus).
  • the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to a nuclease. In some embodiments, the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to an endonuclease. In some embodiments, the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to an exonuclease. In some embodiments, the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to a nuclease and/or a Cas or dCas.
  • the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to an endonuclease and/or a Cas or dCas.
  • the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to an exonuclease and/or a Cas or dCas.
  • the recombination protein may be expressed independently from, and not as a fusion protein with, a nuclease.
  • the recombination protein may be expressed independently from, not a fusion protein with an endonuclease.
  • the recombination protein may be expressed independently from, not a fusion protein with an exonuclease. In some embodiments, the recombination protein may be expressed independently from, not a fusion protein with a nuclease and/or a Cas or dCas. In some embodiments, the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to an aptamer and/or aptamer binding protein. In some embodiments, the recombination protein may be expressed independently, not as a fusion protein, with an aptamer and/or aptamer binding protein.
  • the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to a nuclease and/or Cas or dCas and/or to an aptamer and/or aptamer binding protein.
  • the recombination protein may be expressed independently from, not a fusion protein with a nuclease and/or a Cas or dCas and/or an aptamer and/or aptamer binding protein.
  • the aptamer and/or aptamer binding protein is an MCP protein.
  • the recombination protein may be an SSAP.
  • nuclease refers to an agent, such as a protein or small molecule, that is capable of cleaving phosphodiester bonds that join nucleotide residues in a nucleic acid molecule.
  • the nuclease is a protein, e.g., an enzyme that is capable of binding to a nucleic acid molecule and cleaving phosphodiester bonds linking nucleotide residues in the nucleic acid molecule.
  • the nuclease may be an endonuclease, which cleaves a phosphodiester bond in a polynucleotide strand, or an exonuclease, which cleaves a phosphodiester bond at the end of a polynucleotide strand.
  • the nuclease is a site-specific nuclease that binds to and/or cleaves a particular phosphodiester bond within a particular nucleotide sequence, which is also referred to herein as a “recognition sequence,” “nuclease target site,” or “target site.”
  • the nuclease is an RNA-guided (e.g., RNA-programmable) nuclease that complexes (e.g., binds) to RNA having a sequence complementary to the target site, thereby providing sequence specificity of the nuclease.
  • the nuclease recognizes a single-stranded target site, while in other embodiments, the nuclease recognizes a double-stranded target site, e.g., a double-stranded DNA target site.
  • Target sites for many naturally occurring nucleases, for example many naturally occurring DNA restriction nucleases, are well known to those skilled in the art.
  • DNA nucleases such as EcoRI, HindIII or BamHI recognize palindromic double-stranded DNA target sites that are 4 to 10 base pairs in length and cut each of the two DNA strands at specific positions within the target site.
  • Some endonucleases symmetrically cleave a double-stranded nucleic acid target site, e.g., cleave both strands at the same position, such that the ends comprise base-paired nucleotides, also referred to herein as blunt ends.
  • Other endonucleases cleave double-stranded nucleic acid target sites asymmetrically, e.g., each strand is cleaved at a different position such that the ends contain unpaired nucleotides.
  • Unpaired nucleotides at the ends of a double-stranded DNA molecule are also referred to as “overhangs,” e.g., “5'-overhangs” or “3'-overhangs,” depending on whether the unpaired nucleotides form the 5' or 3' end of the corresponding DNA strand.
  • the ends of a double-stranded DNA molecule that terminate in unpaired nucleotides are also referred to as sticky ends, because they can “stick” to the ends of other double-stranded DNA molecules that contain complementary unpaired nucleotides.
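  • The notions of palindromic target sites, blunt ends, and 5' or 3' overhangs can be checked with a short Python sketch. The recognition sequences and top-strand cut positions used here (EcoRI G^AATTC, HindIII A^AGCTT, BamHI G^GATCC, and, for contrast, EcoRV GAT^ATC and PstI CTGCA^G) are well-known values supplied for illustration and are not stated in this document; the function names are hypothetical, and the bottom-strand cut is assumed to mirror the top-strand cut, which holds only for symmetric cutters of palindromic sites.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A site is palindromic when it equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


def classify_ends(site: str, top_cut: int) -> tuple[str, str]:
    """Classify the ends left by a symmetric cutter.

    top_cut is the number of bases 5' of the cut on the top strand.
    Returns the end type and the unpaired stretch in top-strand coordinates.
    """
    bottom_cut = len(site) - top_cut  # mirrored cut, in top-strand coordinates
    if top_cut == bottom_cut:
        return "blunt", ""
    lo, hi = sorted((top_cut, bottom_cut))
    kind = "5'-overhang" if top_cut < bottom_cut else "3'-overhang"
    return kind, site[lo:hi]


# (recognition site, top-strand cut position)
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
    "EcoRV": ("GATATC", 3),   # not named in the text; blunt cutter for contrast
    "PstI": ("CTGCAG", 5),    # not named in the text; 3'-overhang for contrast
}

results = {name: classify_ends(site, cut) for name, (site, cut) in ENZYMES.items()}
```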
  • Nuclease proteins typically comprise a “binding domain” that mediates interaction of the protein with a nucleic acid substrate (in some cases also specifically binding to a target site) and a “cleavage domain” that catalyzes the cleavage of phosphodiester bonds within the nucleic acid backbone.
  • the nuclease protein is capable of binding and cleaving a nucleic acid molecule in a monomeric form, while in other embodiments, the nuclease protein must dimerize or multimerize in order to cleave a target nucleic acid molecule.
  • Binding and cleavage domains of naturally occurring nucleases, as well as modular binding and cleavage domains that can be fused to create nucleases, are well known to those of skill in the art.
  • a zinc finger or transcriptional activator-like element can be used as a binding domain to specifically bind a desired target site and fused or conjugated to a cleavage domain, such as the cleavage domain of FokI, to create an engineered nuclease that cleaves the target site.
  • exonuclease examples include exonuclease I, exonuclease II, exonuclease III, exonuclease IV, exonuclease V, exonuclease VII, exonuclease VIII, lambda exonuclease, Xrn1, mung bean nuclease, TREX2, exonuclease T, T7 exonuclease, strandase exonuclease, 3’-5’ exophosphodiesterase, and Bal31 nuclease.
  • the fusion protein further comprises a linker between the recombination protein and the aptamer binding protein.
  • the linkers may comprise any amino acid sequence of any length.
  • the linkers may be flexible such that they do not constrain either of the two components they link together in any particular orientation.
  • the linkers may essentially act as a spacer.
  • the linker links the C-terminus of the recombination protein to the N-terminus of the aptamer binding protein.
  • the linker comprises the amino acid sequence of the 16-residue XTEN linker, SGSETPGTSESATPES (SEQ ID NO:15) or the 37-residue EXTEN linker, SASGGSSGGSSGSETPGTSESATPESSGGSSGGSGGS (SEQ ID NO:148).
  • the fusion protein further comprises a nuclear localization sequence (NLS).
  • the nuclear localization sequence may be at any location within the fusion protein (e.g., C-terminal of the aptamer binding protein, N-terminal of the aptamer binding protein, C-terminal of the recombination protein).
  • the nuclear localization sequence is linked to the C-terminus of the recombination protein.
  • a number of nuclear localization sequences are known in the art (see, e.g., Lange, A., et al., J Biol Chem.2007; 282(8): 5101-5105, incorporated herein by reference) and may be used in connection with the present invention.
  • the nuclear localization sequence may be the SV40 NLS, PKKKRKV (SEQ ID NO:16); the Ty1 NLS, NSKKRSLEDNETEIKVSRDTWNTKNMRSLEPPRSKKRIH (SEQ ID NO:17); the c-Myc NLS, PAAKRVKLD (SEQ ID NO:18); the biSV40 NLS, KRTADGSEFESPKKKRKV (SEQ ID NO:19); and the Mut NLS, PEKKRRRPSGSVPVLARPSPPKAGKSSCI (SEQ ID NO:20).
  • the nuclear localization sequence is the SV40 NLS, PKKKRKV (SEQ ID NO:16).
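  • One of the fusion layouts described above (aptamer binding protein, then linker, then recombination protein, then a C-terminal NLS) can be sketched as string concatenation. The linker and NLS sequences below are copied from the text (SEQ ID NOs: 15, 148, and 16); the function name and the two short protein stand-ins are hypothetical placeholders, not real MCP or SSAP sequences.

```python
XTEN_LINKER = "SGSETPGTSESATPES"                        # SEQ ID NO:15
EXTEN_LINKER = "SASGGSSGGSSGSETPGTSESATPESSGGSSGGSGGS"  # SEQ ID NO:148
SV40_NLS = "PKKKRKV"                                    # SEQ ID NO:16


def assemble_fusion(aptamer_binding_protein: str,
                    recombination_protein: str,
                    linker: str = XTEN_LINKER,
                    nls: str = SV40_NLS) -> str:
    """Concatenate the parts from N- to C-terminus.

    Layout assumed here: aptamer binding protein - linker - recombination
    protein - NLS. Other orientations described in the text are not modeled.
    """
    return aptamer_binding_protein + linker + recombination_protein + nls


# Placeholder amino acid strings for illustration only.
mcp_stand_in = "M" + "A" * 10
ssap_stand_in = "G" * 12

fusion_xten = assemble_fusion(mcp_stand_in, ssap_stand_in)
fusion_exten = assemble_fusion(mcp_stand_in, ssap_stand_in, linker=EXTEN_LINKER)
```

  • Checking `len(XTEN_LINKER)` and `len(EXTEN_LINKER)` confirms the 16-residue and 37-residue lengths quoted for the two linkers.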
  • the Cas protein and the fusion protein are desirably included in a single composition alone, in combination with each other, and/or the polynucleotide(s) (e.g., a vector) comprising the guide RNA sequence and the aptamer sequence.
  • the Cas protein and/or the fusion protein may or may not be physically or chemically bound to the polynucleotide.
  • the Cas protein and/or the recombination protein can be associated with a polynucleotide using any suitable method for protein-protein linking or protein-virus linking known in the art.
  • the invention further provides compositions and vectors comprising a polynucleotide comprising a nucleic acid sequence encoding a fusion protein comprising a recombination protein functionally linked to an RNA aptamer binding protein.
  • the compositions or vectors may further comprise at least one or both of a polynucleotide comprising a nucleic acid sequence encoding a Cas protein and a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence.
  • the nucleic acid molecule comprising a guide RNA sequence further comprises at least one RNA aptamer sequence.
  • the polynucleotide comprising a nucleic acid sequence encoding a Cas protein further comprises a sequence encoding at least one peptide aptamer sequence.
  • Descriptions of the nucleic acid molecule comprising a guide RNA sequence, the aptamer sequences, the Cas proteins, the recombination proteins, and the aptamer binding proteins set forth above in connection with the inventive system also are applicable to the polynucleotides of the recited compositions and vectors.
  • the nucleic acid sequence encoding the Cas protein and/or the nucleic acid sequence encoding a fusion protein comprising a recombination protein functionally linked to an aptamer binding protein can be provided to a cell on the same vector (e.g., in cis) as the nucleic acid molecule comprising the guide RNA sequence and/or the RNA aptamer sequence.
  • a unidirectional promoter can be used to control expression of each nucleic acid sequence.
  • a combination of bidirectional and unidirectional promoters can be used to control expression of multiple nucleic acid sequences.
  • a nucleic acid sequence encoding the Cas protein, the nucleic acid sequence encoding a fusion protein comprising a recombination protein functionally linked to an aptamer binding protein, and the nucleic acid molecule comprising the guide RNA sequence and/or the RNA aptamer sequence can be provided to a cell on separate vectors (e.g., in trans).
  • Each of the nucleic acid sequences in each of the separate vectors can comprise the same or different expression control sequences.
  • the separate vectors can be provided to cells simultaneously or sequentially.
  • the vector(s) comprising the nucleic acid sequences encoding the Cas protein and encoding a fusion protein comprising a recombination protein functionally linked to an aptamer binding protein can be introduced into a host cell that is capable of expressing the polypeptide encoded thereby, including any suitable prokaryotic or eukaryotic cell.
  • the invention provides an isolated cell comprising the vector or nucleic acid sequences disclosed herein.
  • Preferred host cells are those that can be easily and reliably grown, have reasonably fast growth rates, have well characterized expression systems, and can be transformed or transfected easily and efficiently.
  • suitable prokaryotic cells include, but are not limited to, cells from the genera Bacillus (such as Bacillus subtilis and Bacillus brevis), Escherichia (such as E. coli), Pseudomonas, Streptomyces, Salmonella, and Erwinia.
  • Suitable eukaryotic cells are known in the art and include, for example, yeast cells, insect cells, and mammalian cells. Yeast cells include those from the genera Kluyveromyces, Pichia, Rhinosporidium, Saccharomyces, and Schizosaccharomyces.
  • Exemplary insect cells include Sf-9 and HI5 (Invitrogen, Carlsbad, Calif.) and are described in, for example, Kitts et al., Biotechniques, 14: 810-817 (1993); Luckow, Curr. Opin. Biotechnol., 4: 564-572 (1993); and Luckow et al., J. Virol., 67: 4566-4579 (1993), incorporated herein by reference.
  • the host cell is a mammalian cell, and in some embodiments, the host cell is a human cell.
  • a number of suitable mammalian and human host cells are known in the art, and many are available from the American Type Culture Collection (ATCC, Manassas, Va.).
  • suitable mammalian cells include, but are not limited to, Chinese hamster ovary cells (CHO) (ATCC No. CCL61), CHO DHFR-cells (Urlaub et al., Proc. Natl. Acad. Sci. USA, 77: 4216-4220 (1980)), human embryonic kidney (HEK) 293 or 293T cells (ATCC No. CRL1573), and 3T3 cells (ATCC No. CCL92).
  • Other suitable mammalian cell lines are the monkey COS-1 (ATCC No. CRL1650) and COS-7 cell lines (ATCC No. CRL1651), as
  • mammalian host cells include primate, rodent, and human cell lines, including transformed cell lines. Normal diploid cells, cell strains derived from in vitro culture of primary tissue, as well as primary explants, are also suitable.
  • Other suitable mammalian cell lines include, but are not limited to, mouse neuroblastoma N2A cells, HeLa, HEK, A549, HepG2, mouse L-929 cells, and BHK or HaK hamster cell lines. Methods for selecting suitable mammalian host cells and methods for transformation, culture, amplification, screening, and purification of cells are known in the art.
  • Methods of Altering Target DNA. The invention also provides a method of altering a target DNA.
  • the method alters genomic DNA sequence in a cell, although any desired nucleic acid may be modified.
  • the method comprises introducing the systems, compositions, or vectors described herein into a cell comprising a target genomic DNA sequence.
  • Descriptions of the nucleic acid molecule comprising a guide RNA sequence, the Cas proteins, the recombination proteins, the recruitment systems, and polynucleotides encoding thereof, the cell, the target genomic DNA sequence, and components thereof, set forth above in connection with the inventive system are also applicable to the method of altering a target genomic DNA sequence in a cell.
  • delivery of editing systems or components comprises delivery of a ribonucleoprotein (RNP) complex.
  • targeting nucleic acids including but not limited to gRNAs, dgRNAs can be provided in complexes, such as without limitation, complexes comprising Cas9 or dCas9.
  • an RNP complex comprises a guide nucleic acid and a Cas9 fusion protein, such as without limitation a complex comprising dCas9-SSAP.
  • an RNP complex comprises a guide nucleic acid and a recombination protein, e.g., an SSAP or SSB, which may be adapted or modified to bind to the guide nucleic acid.
  • the guide nucleic acid and the recombination protein or Cas9 fusion protein comprise binding elements that promote complex formation.
  • a recombination protein comprises an MCP domain and a guide RNA comprises an MS2 aptamer, whereby binding of the MS2 aptamer to the MCP domain produces an RNP.
  • the guide RNA and the Cas and/or recombination protein polypeptide are incubated together to form a ribonucleoprotein (RNP) complex prior to introducing into a cell, for example mixed together in a vessel to form an RNP complex, and then the RNP complex is introduced into the cell.
  • the Cas polypeptide described herein can be an mRNA encoding the Cas polypeptide, which Cas mRNA is introduced into the primary cell together with the modified sgRNA as an “All RNA” CRISPR system.
  • the RNP complex and donor nucleic acid or vector are concomitantly introduced into a cell.
  • the RNP complex and the donor nucleic acid or vector are sequentially introduced into the primary cell.
  • the RNP complex is introduced into the primary cell before the donor.
  • the donor is introduced into the primary cell before the RNP complex.
  • the RNP complex can be introduced into a cell about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 90, 120, 150, 180, 210, or 240 minutes or more before the donor nucleic acid or vector, or vice versa.
  • US Pat. 11,193,141 describes introduction of an RNP complex and a homologous donor adeno-associated viral (AAV) vector into a cell to mediate targeted integration. The methods described can be used with the instant invention.
  • US Patent Publication 2019/0093128 describes introducing into the zygote a ribonucleoprotein (RNP) comprising a class 2 CRISPR/Cas endonuclease complexed with a corresponding CRISPR/Cas guide RNA that hybridizes to a target sequence within the genomic DNA of the zygote.
  • Non-limiting examples include use of (1) purified Cas9 or dCas9 protein; (2) synthesized guideRNA with MS2 aptamer; (3) purified MCP-SSAP fusion protein; (4) donor DNA (double, single strand DNA donor for HEK293 and K562, and AAV donor for HSC), delivered into HEK293, K562, and primary hematopoietic stem cells (mouse and human) for knock-in editing.
  • the following table provides exemplary sequences for generating knock-ins including at ALB and AAVS1. The sequences can be employed in RNPs, nucleic acids, vectors, for expression, and the like. Table 2.
  • the cell is in an organism or host, such that introducing the disclosed systems, compositions, vectors into the cell comprises administration to a subject.
  • the method may comprise providing or administering to the subject, in vivo, or by transplantation of ex vivo treated cells, systems, compositions, vectors of the present system.
  • a “subject” may be human or non-human and may include, for example, animal strains or species used as “model systems” for research purposes, such as a mouse model as described herein.
  • subject may include either adults or juveniles (e.g., children).
  • subject may mean any living organism, preferably a mammal (e.g., human or non-human) that may benefit from the administration of compositions contemplated herein.
  • mammals include, but are not limited to, any member of the Mammalian class: humans, non-human primates such as chimpanzees, and other apes and monkey species; farm animals such as cattle, horses, sheep, goats, swine; domestic animals such as rabbits, dogs, and cats; laboratory animals including rodents, such as rats, mice and guinea pigs, and the like.
  • non-mammals include, but are not limited to, birds, fish and the like.
  • the mammal is a human. Plants include without limitation sugar cane, corn, wheat, rice, oil palm fruit, potatoes, soy beans, vegetables, cassava, sugar beets, tomatoes, barley, bananas, watermelon, onions, sweet potatoes, cucumbers, apples, seed cotton, oranges, and the like.
  • the terms “providing,” “administering,” and “introducing,” are used interchangeably herein and refer to the placement of the systems of the invention into a subject by a method or route which results in at least partial localization of the system to a desired site.
  • the systems can be administered by any appropriate route which results in delivery to a desired location in the subject.
  • altering a DNA sequence refers to modifying at least one physical feature of a DNA sequence of interest.
  • DNA alterations include, for example, single or double strand DNA breaks, deletion, or insertion of one or more nucleotides, and other modifications that affect the structural integrity or nucleotide sequence of the DNA sequence.
  • the modifications of a target sequence in genomic DNA may lead to, for example, gene correction, gene replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, gene mutation, gene knock-down, and the like.
  • the systems and methods described herein may be used to correct one or more defects or mutations in a gene (referred to as “gene correction”).
  • the target genomic DNA sequence encodes a defective version of a gene
  • the system further comprises a donor nucleic acid molecule which encodes a wild-type or corrected version of the gene.
  • the target genomic DNA sequence is a “disease-associated” gene.
  • the term “disease-associated gene,” refers to any gene or polynucleotide whose gene products are expressed at an abnormal level or in an abnormal form in cells obtained from a disease-affected individual as compared with tissues or cells obtained from an individual not affected by the disease.
  • a disease-associated gene may be expressed at an abnormally high level or at an abnormally low level, where the altered expression correlates with the occurrence and/or progression of the disease.
  • a disease-associated gene also refers to a gene, the mutation or genetic variation of which is directly responsible or is in linkage disequilibrium with a gene(s) that is responsible for the etiology of a disease.
  • genes responsible for such “single gene” or “monogenic” diseases include, but are not limited to, adenosine deaminase, α-1 antitrypsin, cystic fibrosis transmembrane conductance regulator (CFTR), β-hemoglobin (HBB), oculocutaneous albinism II (OCA2), Huntingtin (HTT), dystrophia myotonica-protein kinase (DMPK), low-density lipoprotein receptor (LDLR), apolipoprotein B (APOB), neurofibromin 1 (NF1), polycystic kidney disease 1 (PKD1), polycystic kidney disease 2 (PKD2), coagulation factor VIII (F8), dystrophin (DMD), phosphate-regulating endopeptidase homologue, X-linked (PHEX), methyl-CpG-binding protein 2 (MECP2), and ubiquitin-specific peptidase 9Y, Y-linked (USP9Y).
  • the invention provides knock-ins of large transgenes at therapeutically relevant loci in the human genome.
  • the locus provides cell or tissue-specific expression.
  • the invention comprises insertion of nucleic acids into the albumin (ALB) locus.
  • the ALB locus provides for liver targeting in human hepatocytes and is highly expressed in a liver-specific manner.
  • the invention comprises insertion of nucleic acids into the AAVS1 locus.
  • the AAVS1 locus is a safe-harbor locus for gene therapy that is well expressed in certain tissue types and can be used in a wide variety of treatments, with low expression in liver.
  • US Patent Publication 2018/0214490 A1 describes gene therapy for lysosomal storage diseases, including targeting transgenes to safe harbor loci such as the AAVS1, HPRT and CCR5 genes in human cells, and Rosa26 in murine cells.
  • US Patent 9267154 describes integration of exogenous nucleic acid sequences into the PPP1R12C locus, which is widely expressed in most tissues.
  • the target genomic DNA sequence can comprise a gene, the mutation of which contributes to a particular disease in combination with mutations in other genes. Diseases caused by the contribution of multiple genes which lack simple (e.g., Mendelian) inheritance patterns are referred to in the art as “multifactorial” or “polygenic” diseases.
  • multifactorial or polygenic diseases include, but are not limited to, asthma, diabetes, epilepsy, hypertension, bipolar disorder, and schizophrenia. Certain developmental abnormalities also can be inherited in a multifactorial or polygenic pattern and include, for example, cleft lip/palate, congenital heart defects, and neural tube defects.
  • the method of altering a target genomic DNA sequence can be used to delete nucleic acids from a target sequence in a cell by cleaving the target sequence and allowing the cell to repair the cleaved sequence in the absence of an exogenously provided donor nucleic acid molecule.
  • Deleting a nucleic acid sequence in this manner can be used in a variety of applications, such as, for example, to remove disease-causing trinucleotide repeat sequences in neurons, to create gene knock-outs or knock-downs, and to generate mutations for disease models in research.
  • the term “donor nucleic acid molecule” refers to a nucleotide sequence that is inserted into the target DNA (e.g., genomic DNA). As described above the donor DNA may include, for example, a gene or part of a gene, a sequence encoding a tag or localization sequence, or a regulating element. The donor nucleic acid molecule may be of any length.
  • the donor nucleic acid molecule is between 10 and 10,000 nucleotides in length, for example, between about 100 and 5,000 nucleotides in length, between about 200 and 2,000 nucleotides in length, between about 500 and 1,000 nucleotides in length, between about 500 and 5,000 nucleotides in length, between about 1,000 and 5,000 nucleotides in length, or between about 1,000 and 10,000 nucleotides in length.
  • the disclosed systems and methods overcome challenges encountered during conventional gene editing, including low efficiency and off-target events, particularly with kilobase-scale nucleic acids. In some embodiments, the disclosed systems and methods improve the efficiency of gene editing.
  • the disclosed systems and methods can have a 2- to 10-fold increase in efficiency over conventional CRISPR-Cas9 systems and methods, as shown in Examples 2, 3, and 5.
  • the improvement in efficiency is accompanied by a reduction in off-target events.
  • the off-target events may be reduced by greater than 50% compared to conventional CRISPR-Cas9 systems and methods, for example, a reduction of off-target events by about 90% is shown in Example 3.
  • Another aspect of increasing the overall accuracy of a gene editing system is reducing the on-target insertion-deletions (indels), a byproduct of HDR editing.
  • the disclosed systems and methods reduce the on-target indels by greater than 90% compared to conventional CRISPR-Cas9 systems and methods, as shown in Example 3.
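  • The comparisons above are stated as a fold increase in efficiency and a percent reduction in off-target events or indels. The arithmetic behind such figures can be sketched as follows. The read counts are hypothetical numbers chosen only to make the arithmetic easy to follow; they are not data from the Examples, and the function names are illustrative.

```python
def rate(event_reads: int, total_reads: int) -> float:
    """Fraction of reads showing a given outcome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return event_reads / total_reads


def fold_change(test_rate: float, baseline_rate: float) -> float:
    """How many times higher the test rate is than the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return test_rate / baseline_rate


def percent_reduction(test_rate: float, baseline_rate: float) -> float:
    """Percent by which the test rate falls below the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return 100.0 * (baseline_rate - test_rate) / baseline_rate


# Hypothetical counts per 1,000 sequenced reads (illustration only).
baseline_knock_in = rate(50, 1000)
test_knock_in = rate(200, 1000)
baseline_off_target = rate(100, 1000)
test_off_target = rate(10, 1000)

knock_in_fold = fold_change(test_knock_in, baseline_knock_in)                 # about 4-fold
off_target_drop = percent_reduction(test_off_target, baseline_off_target)     # about 90 percent
```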
  • the invention further provides kits containing one or more reagents or other components useful, necessary, or sufficient for practicing any of the methods described herein.
  • kits may include CRISPR reagents (Cas protein, guide RNA, vectors, compositions, etc.), recombineering reagents (recombination protein-aptamer binding protein fusion protein, the aptamer sequence, vectors, compositions, etc.) transfection or administration reagents, negative and positive control samples (e.g., cells, template DNA), cells, containers housing one or more components (e.g., microcentrifuge tubes, boxes), detectable labels, detection and analysis instruments, software, instructions, and the like.
  • the RNAs may be delivered using adeno associated virus (AAV), lentivirus, adenovirus or other viral vector types, or combinations thereof.
  • the RNAs can be packaged into one or more viral vectors.
  • the viral vector is delivered to the tissue of interest by, for example, an intramuscular injection, while other times the viral delivery is via intravenous, transdermal, intranasal, oral, mucosal, or other delivery methods. Such delivery may be either via a single dose, or multiple doses.
  • the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector chosen, the target cell, organism, or tissue, the general condition of the subject to be treated, the degree of transformation/modification sought, the administration route, the administration mode, the type of transformation/modification sought, etc.
  • Such a dosage may further contain, for example, a carrier (water, saline, ethanol, glycerol, lactose, sucrose, calcium phosphate, gelatin, dextran, agar, pectin, peanut oil, sesame oil, etc.), a diluent, a pharmaceutically-acceptable carrier (e.g., phosphate-buffered saline), a pharmaceutically-acceptable excipient, and/or other compounds known in the art.
  • the dosage may further contain one or more pharmaceutically acceptable salts such as, for example, a mineral acid salt such as a hydrochloride, a hydrobromide, a phosphate, a sulfate, etc.; and the salts of organic acids such as acetates, propionates, malonates, benzoates, etc.
  • auxiliary substances such as wetting or emulsifying agents, pH buffering substances, gels or gelling materials, flavorings, colorants, microspheres, polymers, suspension agents, etc.
  • Suitable exemplary ingredients include microcrystalline cellulose, carboxymethylcellulose sodium, polysorbate 80, phenylethyl alcohol, chlorobutanol, potassium sorbate, sorbic acid, sulfur dioxide, propyl gallate, the parabens, ethyl vanillin, glycerin, phenol, parachlorophenol, gelatin, albumin, and a combination thereof.
  • the delivery is via an adenovirus, which may be at a single booster dose containing at least 1×10^5 particles (also referred to as particle units, pu) of adenoviral vector.
  • the dose preferably is at least about 1×10^6 particles (for example, about 1×10^6 to 1×10^12 particles), more preferably at least about 1×10^10 particles, more preferably at least about 1×10^8 particles (e.g., about 1×10^8 to 1×10^11 particles or about 1×10^8 to 1×10^12 particles), and most preferably at least about 1×10^10 particles (e.g., about 1×10^9 to 1×10^10 particles or about 1×10^9 to 1×10^12 particles), or even at least about 1×10^10 particles (e.g., about 1×10^10 to 1×10^12 particles) of the adenoviral vector.
  • the dose comprises no more than about 1×10^14 particles, preferably no more than about 1×10^13 particles, even more preferably no more than about 1×10^12 particles, even more preferably no more than about 1×10^11 particles, and most preferably no more than about 1×10^10 particles (e.g., no more than about 1×10^9 particles).
  • the dose may contain a single dose of adenoviral vector with, for example, about 1×10^6 particle units (pu), about 2×10^6 pu, about 4×10^6 pu, about 1×10^7 pu, about 2×10^7 pu, about 4×10^7 pu, about 1×10^8 pu, about 2×10^8 pu, about 4×10^8 pu, about 1×10^9 pu, about 2×10^9 pu, about 4×10^9 pu, about 1×10^10 pu, about 2×10^10 pu, about 4×10^10 pu, about 1×10^11 pu, about 2×10^11 pu, about 4×10^11 pu, about 1×10^12 pu, about 2×10^12 pu, or about 4×10^12 pu of adenoviral vector.
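  • The exemplary single-dose values listed above follow a regular 1, 2, 4 pattern within each power of ten from 10^6 to 10^12 pu. As an illustrative sketch only (the function name and defaults are hypothetical and not part of the disclosure), the series can be enumerated as follows:

```python
def adenoviral_dose_series(min_exp=6, max_exp=12, mantissas=(1, 2, 4)):
    """Enumerate doses of the form m x 10^e particle units (pu), ascending.

    The defaults reproduce the exemplary list in the text; the function
    itself is an illustration, not a dosing recommendation.
    """
    return [m * 10**e for e in range(min_exp, max_exp + 1) for m in mantissas]


doses = adenoviral_dose_series()
for dose in doses:
    # Scientific notation keeps the wide range of values readable.
    print(f"about {dose:.0e} pu")
```

  • Changing `min_exp`, `max_exp`, or `mantissas` yields other grids; any such grid is an assumption beyond the values stated in the text.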
  • the adenovirus is delivered via multiple doses. [00288] In an embodiment herein, the delivery is via an AAV.
  • a therapeutically effective dosage for in vivo delivery of the AAV to a human is believed to be in the range of 20 to about 50 ml of saline solution containing from about 1×10^10 to about 1×10^10 functional AAV/ml solution.
  • the AAV dose is generally in the range of concentrations of from about 1×10^5 to 1×10^50 genomes AAV, from about 1×10^8 to 1×10^20 genomes AAV, from about 1×10^10 to about 1×10^16 genomes, or about 1×10^11 to about 1×10^16 genomes AAV.
  • a human dosage may be about 1×10^13 genomes AAV.
  • concentrations may be delivered in from about 0.001 ml to about 100 ml, about 0.05 to about 50 ml, or about 10 to about 25 ml of a carrier solution.
  • Other effective dosages can be readily established by one of ordinary skill in the art through routine trials establishing dose response curves.
  • the delivery is via a plasmid.
  • the dosage should be a sufficient amount of plasmid to elicit a response.
  • suitable quantities of plasmid DNA in plasmid compositions can be from about 0.1 to about 2 mg, or from about 1 μg to about 10 μg.
  • the doses herein are based on an average 70 kg individual.
  • the frequency of administration is within the ambit of the medical or veterinary practitioner (e.g., physician, veterinarian), or scientist skilled in the art.
  • Lentiviruses are complex retroviruses that have the ability to infect and express their genes in both mitotic and post-mitotic cells.
  • the most commonly known lentivirus is the human immunodeficiency virus (HIV), which uses the envelope glycoproteins of other viruses to target a broad range of cell types.
  • Lentiviruses may be prepared as follows.
  • Lentivirus may be purified as follows. Viral supernatants were harvested after 48 hours. Supernatants were first cleared of debris and filtered through a 0.45 μm low protein binding (PVDF) filter. They were then spun in an ultracentrifuge for 2 hours at 24,000 rpm.
  • lentiviral vectors based on the equine infectious anemia virus (EIAV) are also contemplated, especially for ocular gene therapy (see, e.g., Balagaan, J Gene Med 2006; 8: 275-285, published online 21 Nov. 2005 in Wiley InterScience; available at the website: interscience.wiley.com. DOI: 10.1002/jgm.845).
  • RetinoStat®, an equine infectious anemia virus-based lentiviral gene therapy vector that expresses the angiostatic proteins endostatin and angiostatin and that is delivered via a subretinal injection for the treatment of the wet form of age-related macular degeneration
  • Lentiviral vectors have been disclosed for the treatment of Parkinson's Disease, see, e.g., US Patent Publication No. 20120295960 and U.S. Pat. Nos. 7,303,910 and 7,351,585.
  • Lentiviral vectors have also been disclosed for the treatment of ocular diseases, see e.g., US Patent Publication Nos. 20060281180, 20090007284, US20110117189; US20090017543; US20070054961, US20100317109. Lentiviral vectors have also been disclosed for delivery to the brain, see, e.g., US Patent Publication Nos. US20110293571; US20110293571, US20040013648, US20070025970, US20090111106 and U.S. Pat. No. 7,259,015. [00296] Several types of particle delivery systems and/or formulations are known to be useful in a diverse spectrum of biomedical applications.
  • a particle is defined as a small object that behaves as a whole unit with respect to its transport and properties. Particles are further classified according to diameter. Coarse particles cover a range between 2,500 and 10,000 nanometers. Fine particles are sized between 100 and 2,500 nanometers. Ultrafine particles, or nanoparticles, are generally between 1 and 100 nanometers in size. The basis of the 100-nm limit is the fact that novel properties that differentiate particles from the bulk material typically develop at a critical length scale of under 100 nm.
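  • The diameter classes described above can be expressed as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, and assigning the shared boundary values (100 nm and 2,500 nm) to the smaller class is an assumption, since the text does not say which class a boundary value belongs to.

```python
def classify_particle(diameter_nm: float) -> str:
    """Classify a particle by diameter using the ranges given in the text.

    Coarse: 2,500 to 10,000 nm; fine: 100 to 2,500 nm;
    ultrafine (nanoparticle): 1 to 100 nm.
    Boundary values are assigned to the smaller class (an assumption).
    """
    if diameter_nm <= 0:
        raise ValueError("diameter must be positive")
    if diameter_nm < 1:
        return "below the ultrafine range"
    if diameter_nm <= 100:
        return "ultrafine (nanoparticle)"
    if diameter_nm <= 2500:
        return "fine"
    if diameter_nm <= 10000:
        return "coarse"
    return "above the coarse range"


for d in (80, 500, 5000):
    print(d, "nm ->", classify_particle(d))
```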
  • a particle delivery system/formulation is defined as any biological delivery system/formulation which includes a particle in accordance with the present
  • a particle in accordance with the present invention is any entity having a greatest dimension (e.g., diameter) of less than 100 microns (μm). In some embodiments, inventive particles have a greatest dimension of less than 10 microns (μm). In some embodiments, inventive particles have a greatest dimension of less than 2000 nanometers (nm). In some embodiments, inventive particles have a greatest dimension of less than 1000 nanometers (nm).
  • inventive particles have a greatest dimension of less than 900 nm, 800 nm, 700 nm, 600 nm, 500 nm, 400 nm, 300 nm, 200 nm, or 100 nm.
  • inventive particles have a greatest dimension (e.g., diameter) of 500 nm or less.
  • inventive particles have a greatest dimension (e.g., diameter) of 250 nm or less.
  • inventive particles have a greatest dimension (e.g., diameter) of 200 nm or less.
  • inventive particles have a greatest dimension (e.g., diameter) of 150 nm or less.
  • inventive particles have a greatest dimension (e.g., diameter) of 100 nm or less. Smaller particles, e.g., having a greatest dimension of 50 nm or less are used in some embodiments of the invention. In some embodiments, inventive particles have a greatest dimension ranging between 25 nm and 200 nm. [00298] Particle characterization (including e.g., characterizing morphology, dimension, etc.) is done using a variety of different techniques.
  • TEM, SEM electron microscopy
  • AFM atomic force microscopy
  • DLS dynamic light scattering
  • XPS X-ray photoelectron spectroscopy
  • XRD powder X-ray diffraction
  • FTIR Fourier transform infrared spectroscopy
  • MALDI-TOF matrix-assisted laser desorption/ionization time-of-flight mass spectrometry
  • NMR nuclear magnetic resonance
  • Characterization may be made as to native particles (e.g., preloading) or after loading of the cargo (herein cargo refers to one or more RNAs and/or vectors encoding the same, and may include additional components, carriers and/or excipients) to provide particles of an optimal size for delivery for any in vitro, ex vivo and/or in vivo application of the present invention.
  • particle dimension (e.g., diameter) characterization is based on measurements using dynamic laser scattering (DLS).
  • any of the delivery systems described herein including but not limited to, e.g., lipid-based systems, liposomes, micelles, microvesicles, exosomes, or gene gun may be provided as particle delivery systems within the scope of the present invention.
  • CRISPR enzyme mRNA and guide RNA may be delivered simultaneously using nanoparticles or lipid envelopes.
  • Other delivery systems or vectors may be used in conjunction with the nanoparticle aspects of the invention.
  • a “nanoparticle” refers to any particle having a diameter of less than 1000 nm.
  • nanoparticles of the invention have a greatest dimension (e.g., diameter) of 500 nm or less. In other preferred embodiments, nanoparticles of the invention have a greatest dimension ranging between 25 nm and 200 nm. In other preferred embodiments, nanoparticles of the invention have a greatest dimension of 100 nm or less. In other preferred embodiments, nanoparticles of the invention have a greatest dimension ranging between 35 nm and 60 nm.
  • Nanoparticles encompassed in the present invention may be provided in different forms, e.g., as solid nanoparticles (e.g., metal such as silver, gold, iron, or titanium, non-metal, lipid-based solids, polymers), suspensions of nanoparticles, or combinations thereof.
  • Metal, dielectric, and semiconductor nanoparticles may be prepared, as well as hybrid structures (e.g., core-shell nanoparticles).
  • Nanoparticles made of semiconducting material may also be labeled quantum dots if they are small enough (typically sub 10 nm) that quantization of electronic energy levels occurs. Such nanoscale particles are used in biomedical applications as drug carriers or imaging agents and may be adapted for similar purposes in the present invention.
  • Nanoparticles with one half hydrophilic and the other half hydrophobic are termed Janus particles and are particularly effective for stabilizing emulsions. They can self-assemble at water/oil interfaces and act as solid surfactants.
  • Su X, Fricke J, Kavanagh D G, Irvine D J, “In vitro and in vivo mRNA delivery using lipid-enveloped pH-responsive polymer nanoparticles” Mol Pharm. 2011 Jun.
  • PBAE poly(β-amino ester)
  • bilayer shell.
  • the pH-responsive PBAE component was chosen to promote endosome disruption, while the lipid surface layer was to minimize toxicity of the polycation core. Such are, therefore, preferred for delivering RNA of the present invention.
  • nanoparticles based on self assembling bioadhesive polymers are contemplated, which may be applied to oral delivery of peptides, intravenous delivery of peptides and nasal delivery of peptides, all to the brain.
  • Other embodiments, such as oral absorption and ocular delivery of hydrophobic drugs, are also contemplated.
  • the molecular envelope technology involves an engineered polymer envelope which is protected and delivered to the site of the disease (see, e.g., Mazza, M. et al. ACSNano, 2013. 7(2): 1016-1026; Siew, A., et al. Mol Pharm, 2012. 9(1):14-28; Lalatsa, A., et al.
  • nanoparticles that can deliver RNA to a cancer cell to stop tumor growth, developed by Dan Anderson's lab at MIT, may be used and/or adapted to the CRISPR Cas system of the present invention.
  • the Anderson lab developed fully automated, combinatorial systems for the synthesis, purification, characterization, and formulation of new biomaterials and nanoformulations.
  • US patent application 20110293703 relates to lipidoid compounds that are also particularly useful in the administration of polynucleotides, which may be applied to deliver the CRISPR Cas system of the present invention.
  • the aminoalcohol lipidoid compounds are combined with an agent to be delivered to a cell or a subject to form microparticles, nanoparticles, liposomes, or micelles.
  • the agent to be delivered by the particles, liposomes, or micelles may be in the form of a gas, liquid, or solid, and the agent may be a polynucleotide, protein, peptide, or small molecule.
  • the aminoalcohol lipidoid compounds may be combined with other compounds, polymers (synthetic or natural), surfactants, cholesterol, carbohydrates, proteins, lipids, etc. to form the particles. These particles may then optionally be combined with a pharmaceutical excipient to form a pharmaceutical composition.
  • US Patent Publication No. 20110293703 also provides methods of preparing the aminoalcohol lipidoid compounds. One or more equivalents of an amine are allowed to react with one or more equivalents of an epoxide-terminated compound under suitable conditions to form an aminoalcohol lipidoid compound of the present invention.
  • all the amino groups of the amine are fully reacted with the epoxide-terminated compound to form tertiary amines. In other embodiments, all the amino groups of the amine are not fully reacted with the epoxide-terminated compound to form tertiary amines thereby resulting in primary or secondary amines in the aminoalcohol lipidoid compound. These primary or secondary amines are left as is or may be reacted with another electrophile such as a different epoxide-terminated compound.
  • amines may be fully functionalized with two epoxide-derived compound tails while other molecules will not be completely functionalized with epoxide-derived compound tails.
  • a diamine or polyamine may include one, two, three, or four epoxide-derived compound tails off the various amino moieties of the molecule resulting in primary, secondary, and tertiary amines. In certain embodiments, all the amino groups are not fully functionalized.
  • two of the same types of epoxide-terminated compounds are used. In other embodiments, two or more different epoxide-terminated compounds are used.
  • the synthesis of the aminoalcohol lipidoid compounds is performed with or without solvent, and the synthesis may be performed at higher temperatures ranging from 30-100° C., preferably at approximately 50-90° C.
  • the prepared aminoalcohol lipidoid compounds may be optionally purified.
  • the mixture of aminoalcohol lipidoid compounds may be purified to yield an aminoalcohol lipidoid compound with a particular number of epoxide-derived compound tails. Or the mixture may be purified to yield a particular stereo- or regioisomer.
  • the aminoalcohol lipidoid compounds may also be alkylated using an alkyl halide (e.g., methyl iodide) or other alkylating agent, and/or they may be acylated.
  • US Patent Publication No. 20110293703 also provides libraries of aminoalcohol lipidoid compounds prepared by the inventive methods. These aminoalcohol lipidoid compounds may be prepared and/or screened using high-throughput techniques involving liquid handlers, robots, microtiter plates, computers, etc.
  • the aminoalcohol lipidoid compounds are screened for their ability to transfect polynucleotides or other agents (e.g., proteins, peptides, small molecules) into the cell.
  • US Patent Publication No. 20130302401 relates to a class of poly(beta-amino alcohols) (PBAAs) that has been prepared using combinatorial polymerization.
  • the inventive PBAAs may be used in biotechnology and biomedical applications as coatings (such as coatings of films or multilayer films for medical devices or implants), additives, materials, excipients, non-biofouling agents, micropatterning agents, and cellular encapsulation agents.
  • these PBAAs When used as surface coatings, these PBAAs elicited different levels of inflammation, both in vitro and in vivo, depending on their chemical structures.
  • the large chemical diversity of this class of materials allowed us to identify polymer coatings that inhibit macrophage activation in vitro. Furthermore, these coatings reduce the recruitment of inflammatory cells, and reduce fibrosis, following the subcutaneous implantation of carboxylated polystyrene microparticles. These polymers may be used to form polyelectrolyte complex capsules for cell encapsulation.
  • the invention may also have many other biological applications such as antimicrobial coatings, DNA or siRNA delivery, and stem cell tissue engineering.
  • US Patent Publication No. 20130302401 may be applied to the system of the present invention.
  • lipid nanoparticles are contemplated.
  • an antitransthyretin small interfering RNA encapsulated in lipid nanoparticles may be applied to the system of the present invention.
  • Doses of about 0.01 to about 1 mg per kg of body weight administered intravenously are contemplated.
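  • For orientation, a weight-normalized dose converts to a total dose by multiplying by body weight. The sketch below is illustrative only; the function name is hypothetical, and the 70 kg default simply reuses the average individual mentioned earlier in this description.

```python
def total_dose_mg(dose_mg_per_kg: float, body_weight_kg: float = 70.0) -> float:
    """Convert a per-kilogram dose into a total dose in mg."""
    if dose_mg_per_kg < 0 or body_weight_kg <= 0:
        raise ValueError("dose must be non-negative and weight positive")
    return dose_mg_per_kg * body_weight_kg


# Bounds of the contemplated range, applied to a 70 kg individual.
low_mg = total_dose_mg(0.01)
high_mg = total_dose_mg(1.0)
print(f"about {low_mg:.2f} mg to about {high_mg:.0f} mg total")
```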
  • Medications to reduce the risk of infusion-related reactions, such as dexamethasone, acetaminophen, diphenhydramine or cetirizine, and ranitidine, are contemplated.
  • Lipids including, but not limited to, DLin-KC2-DMA4, C12-200 and the colipids distearoylphosphatidylcholine, cholesterol, and PEG-DMG may be formulated with RNA instead of siRNA (see, e.g., Novobrantseva, Molecular Therapy—Nucleic Acids (2012) 1, e4; doi:10.1038/mtna.2011.3) using a spontaneous vesicle formation procedure.
  • the component molar ratio may be about 50/10/38.5/1.5 (DLin-KC2-DMA or C12-200/distearoylphosphatidylcholine/cholesterol/PEG-DMG).
  • the final lipid:siRNA weight ratio may be ~12:1 in the case of DLin-KC2-DMA and C12-200 lipid nanoparticles (LNPs), respectively.
  • the formulations may have mean particle diameters of ~80 nm with >90% entrapment efficiency. A 3 mg/kg dose may be contemplated.
  • LNPs have been shown to be highly effective in delivering siRNAs to the liver (see, e.g., Tabernero et al., Cancer Discovery, April 2013, Vol. 3, No. 4, pages 363-470) and are therefore contemplated for delivering CRISPR Cas to the liver.
  • a dosage of about four doses of 6 mg/kg of the LNP (or RNA of the CRISPR-Cas) every two weeks may be contemplated.
  • Tabernero et al. demonstrated that tumor regression was observed after the first 2 cycles of LNPs dosed at 0.7 mg/kg, and by the end of 6 cycles the patient had achieved a partial response with complete regression of the lymph node metastasis and substantial shrinkage of the liver tumors.
  • ionizable cationic lipids with pKa values below 7 were developed (see, e.g., Rosin et al, Molecular Therapy, vol. 19, no. 12, pages 1286-2200, December 2011).
  • Negatively charged polymers such as siRNA oligonucleotides may be loaded into LNPs at low pH values (e.g., pH 4) where the ionizable lipids display a positive charge.
  • the LNPs exhibit a low surface charge compatible with longer circulation times.
  • Four species of ionizable cationic lipids have been focused upon, namely 1,2-dilineoyl-3-dimethylammonium-propane (DLinDAP), 1,2-dilinoleyloxy-3-N,N-dimethylaminopropane (DLinDMA), 1,2-dilinoleyloxy-keto-N,N-dimethyl-3-aminopropane (DLinKDMA), and 1,2-dilinoleyl-4-(2-dimethylaminoethyl)-[1,3]-dioxolane (DLinKC2-DMA).
  • LNP siRNA systems containing these lipids exhibit remarkably different gene silencing properties in hepatocytes in vivo, with potencies varying according to the series DLinKC2-DMA>DLinKDMA>DLinDMA>>DLinDAP employing a Factor VII gene silencing model (see, e.g., Rosin et al, Molecular Therapy, vol. 19, no. 12, pages 1286-2200, December 2011).
  • a dosage of 1 μg/ml levels may be contemplated, especially for a formulation containing DLinKC2-DMA.
  • Preparation of LNPs and CRISPR Cas encapsulation may be used and/or adapted from Rosin et al, Molecular Therapy, vol. 19, no. 12, pages 1286-2200, December 2011.
  • the cationic lipids 1,2-dilineoyl-3-dimethylammonium-propane (DLinDAP), 1,2-dilinoleyloxy-3-N,N-dimethylaminopropane (DLinDMA), 1,2-dilinoleyloxyketo-N,N-dimethyl-3-aminopropane (DLinK-DMA), 1,2-dilinoleyl-4-(2-dimethylaminoethyl)-[1,3]-dioxolane (DLinKC2-DMA), (3-o-[2′′-(methoxypolyethyleneglycol 2000) succinoyl]-1,2-dimyristoyl-sn-glycol (PEG-S-DMG), and R-3-[(w-
  • Cholesterol may be purchased from Sigma (St Louis, Mo.).
  • the specific CRISPR Cas RNA may be encapsulated in LNPs containing DLinDAP, DLinDMA, DLinK-DMA, and DLinKC2-DMA (cationic lipid:DSPC:CHOL:PEG-S-DMG or PEG-C-DOMG at 40:10:40:10 molar ratios).
  • 0.2% SP-DiOC18 Invitrogen, Burlington, Canada
  • Encapsulation may be performed by dissolving lipid mixtures comprised of cationic lipid:DSPC:cholesterol:PEG-c-DOMG (40:10:40:10 molar ratio) in ethanol to a final lipid concentration of 10 mmol/l.
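  • The molar ratio and total lipid concentration stated above fix the concentration of each component. A minimal sketch of that arithmetic follows; the function and variable names are hypothetical, and only the ratio and the 10 mmol/l total come from the text.

```python
def component_concentrations(molar_ratio: dict, total_mmol_per_l: float) -> dict:
    """Split a total lipid concentration across components by molar ratio."""
    total_parts = sum(molar_ratio.values())
    if total_parts <= 0:
        raise ValueError("molar ratio parts must sum to a positive number")
    return {
        name: total_mmol_per_l * parts / total_parts
        for name, parts in molar_ratio.items()
    }


# 40:10:40:10 molar ratio at 10 mmol/l total lipid, as stated in the text.
ratio = {"cationic lipid": 40, "DSPC": 10, "cholesterol": 40, "PEG-c-DOMG": 10}
conc = component_concentrations(ratio, 10.0)
for name, value in conc.items():
    print(f"{name}: {value:.1f} mmol/l")
```

  • Converting these molar concentrations to masses would additionally require each lipid's molecular weight, which the text does not give.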
  • This ethanol solution of lipid may be added drop-wise to 50 mmol/l citrate, pH 4.0 to form multilamellar vesicles to produce a final concentration of 30% ethanol vol/vol.
  • Large unilamellar vesicles may be formed following extrusion of multilamellar vesicles through two stacked 80 nm Nuclepore polycarbonate filters using the Extruder (Northern Lipids, Vancouver, Canada).
  • Encapsulation may be achieved by adding RNA dissolved at 2 mg/ml in 50 mmol/l citrate, pH 4.0 containing 30% ethanol vol/vol drop-wise to extruded preformed large unilamellar vesicles and incubation at 31° C. for 30 minutes with constant mixing to a final RNA/lipid weight ratio of 0.06/1 wt/wt. Removal of ethanol and neutralization of formulation buffer were performed by dialysis against phosphate-buffered saline (PBS), pH 7.4 for 16 hours using Spectra/Por 2 regenerated cellulose dialysis membranes.
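  • The RNA/lipid weight ratio and the RNA stock concentration given above determine how much RNA solution to add for a given lipid mass. The following is a sketch of that calculation; the function name and the 10 mg lipid example are hypothetical, while the 0.06/1 wt/wt ratio and 2 mg/ml stock are taken from the text.

```python
def rna_for_lipid(lipid_mg: float,
                  rna_to_lipid_wt: float = 0.06,
                  rna_stock_mg_per_ml: float = 2.0):
    """Return (RNA mass in mg, RNA stock volume in ml) for a lipid mass."""
    if lipid_mg < 0 or rna_to_lipid_wt < 0 or rna_stock_mg_per_ml <= 0:
        raise ValueError("inputs must be non-negative, stock concentration positive")
    rna_mg = lipid_mg * rna_to_lipid_wt
    stock_ml = rna_mg / rna_stock_mg_per_ml
    return rna_mg, stock_ml


# Hypothetical example: 10 mg of total lipid in the preformed vesicles.
rna_mg, stock_ml = rna_for_lipid(10.0)
print(f"{rna_mg:.2f} mg RNA, i.e. {stock_ml:.2f} ml of 2 mg/ml stock")
```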
  • Nanoparticle size distribution may be determined by dynamic light scattering using a NICOMP 370 particle sizer, the vesicle/intensity modes, and Gaussian fitting (Nicomp Particle Sizing, Santa Barbara, Calif.).
  • the particle size for all three LNP systems may be ~70 nm in diameter.
  • siRNA encapsulation efficiency may be determined by removal of free siRNA using VivaPureD MiniH columns Biotech) from samples collected before and after dialysis. The encapsulated RNA may be extracted from the eluted nanoparticles and quantified at 260 nm.
  • siRNA to lipid ratio was determined by measurement of cholesterol content in vesicles using the Cholesterol E enzymatic assay from Wako Chemicals USA (Richmond, Va.).
  • PEGylated liposomes (or LNPs) can also be used for delivery.
  • Preparation of large LNPs may be used and/or adapted from Rosin et al, Molecular Therapy, vol. 19, no. 12, pages 1286-2200, December 2011.
  • a lipid premix solution (20.4 mg/ml total lipid concentration) may be prepared in ethanol containing DLinKC2-DMA, DSPC, and cholesterol at 50:10:38.5 molar ratios.
  • Sodium acetate may be added to the lipid premix at a molar ratio of 0.75:1 (sodium acetate:DLinKC2-DMA).
  • the lipids may be subsequently hydrated by combining the mixture with 1.85 volumes of citrate buffer (10 mmol/l, pH 3.0) with vigorous stirring, resulting in spontaneous liposome formation in aqueous buffer containing 35% ethanol.
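  • The 35% ethanol figure follows from the mixing ratio: one volume of ethanolic lipid premix plus 1.85 volumes of buffer gives 1/2.85, or roughly 35%, ethanol by volume. The sketch below reproduces that arithmetic; the function name is hypothetical, and it assumes the volumes are additive and the premix is essentially pure ethanol, neither of which the text states explicitly.

```python
def ethanol_fraction(ethanol_volumes: float, buffer_volumes: float) -> float:
    """Volume fraction of ethanol after mixing, assuming additive volumes."""
    total = ethanol_volumes + buffer_volumes
    if ethanol_volumes < 0 or buffer_volumes < 0 or total == 0:
        raise ValueError("volumes must be non-negative and not both zero")
    return ethanol_volumes / total


# 1 volume of ethanolic premix hydrated with 1.85 volumes of citrate buffer.
fraction = ethanol_fraction(1.0, 1.85)
print(f"{fraction:.1%} ethanol vol/vol")
```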
  • the liposome solution may be incubated at 37° C. to allow for time-dependent increase in particle size. Aliquots may be removed at various times during incubation to investigate changes in liposome size by dynamic light scattering (Zetasizer Nano ZS, Malvern Instruments, Worcestershire, UK).
  • the liposomes should maintain their size, effectively quenching further growth.
  • RNA may then be added to the empty liposomes at an siRNA to total lipid ratio of approximately 1:10 (wt:wt), followed by incubation for 30 minutes at 37° C. to form loaded LNPs. The mixture may be subsequently dialyzed overnight in PBS and filtered with a 0.45-μm syringe filter.
  • Spherical Nucleic Acid (SNA™) constructs and other nanoparticles (particularly gold nanoparticles) are also contemplated as a means to deliver the CRISPR/Cas system to intended targets.
  • Significant data show that AuraSense Therapeutics' Spherical Nucleic Acid (SNA™) constructs, based upon nucleic acid-functionalized gold nanoparticles, are superior to alternative platforms based on multiple key success factors, such as:
  • the constructs demonstrate a transfection efficiency of 99% with no need for carriers or transfection agents.
  • Therapeutic targeting: The unique target binding affinity and specificity of the constructs allow exquisite specificity for matched target sequences (e.g., limited off-target effects).
  • Superior efficacy: The constructs significantly outperform leading conventional transfection reagents (Lipofectamine 2000 and Cytofectin).
  • Low toxicity: The constructs can enter a variety of cultured cells, primary cells, and tissues with no apparent toxicity.
  • the constructs elicit minimal changes in global gene expression as measured by whole-genome microarray studies and cytokine-specific protein assays.
  • Chemical tailorability: Any number of single or combinatorial agents (e.g., proteins, peptides, small molecules) can be used to tailor the surface of the constructs.
  • This platform for nucleic acid-based therapeutics may be applicable to numerous disease states, including inflammation and infectious disease, cancer, skin disorders and cardiovascular disease.
  • Citable literature includes: Cutler et al., J. Am. Chem. Soc. 2011 133:9254-9257, Hao et al., Small. 2011 7:3158-3162, Zhang et al., ACS Nano.
  • Self-assembling nanoparticles with siRNA may be constructed with polyethyleneimine (PEI) that is PEGylated with an Arg-Gly-Asp (RGD) peptide ligand attached at the distal end of the polyethylene glycol (PEG), for example, as a means to target tumor neovasculature expressing integrins and used to deliver siRNA inhibiting vascular endothelial growth factor receptor-2 (VEGF R2) expression and thereby tumor angiogenesis (see, e.g., Schiffelers et al., Nucleic Acids Research, 2004, Vol.
  • Nanoplexes may be prepared by mixing equal volumes of aqueous solutions of cationic polymer and nucleic acid to give a net molar excess of ionizable nitrogen (polymer) to phosphate (nucleic acid) over the range of 2 to 6.
  • the electrostatic interactions between cationic polymers and nucleic acid resulted in the formation of polyplexes with average particle size distribution of about 100 nm, hence referred to here as nanoplexes.
  • a dosage of about 100 to 200 mg of CRISPR Cas is envisioned for delivery in the self-assembling nanoparticles of Schiffelers et al. [00326]
  • the nanoplexes of Bartlett et al. PNAS, Sep. 25, 2007, vol.
  • the nanoplexes of Bartlett et al. are prepared by mixing equal volumes of aqueous solutions of cationic polymer and nucleic acid to give a net molar excess of ionizable nitrogen (polymer) to phosphate (nucleic acid) over the range of 2 to 6.
  • the electrostatic interactions between cationic polymers and nucleic acid resulted in the formation of polyplexes with average particle size distribution of about 100 nm, hence referred to here as nanoplexes.
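  • The nanoplex mixing condition described above is commonly expressed as an N/P ratio, i.e., moles of ionizable polymer nitrogen per mole of nucleic acid phosphate. The sketch below shows the arithmetic for checking or targeting a ratio in the stated range of 2 to 6. The function names and the example quantities are hypothetical; how many ionizable nitrogens a given polymer carries and how many phosphates a given nucleic acid carries must be supplied by the user.

```python
def np_ratio(nitrogen_mol: float, phosphate_mol: float) -> float:
    """Molar ratio of ionizable nitrogen (polymer) to phosphate (nucleic acid)."""
    if phosphate_mol <= 0 or nitrogen_mol < 0:
        raise ValueError("phosphate must be positive and nitrogen non-negative")
    return nitrogen_mol / phosphate_mol


def nitrogen_needed(phosphate_mol: float, target_np: float) -> float:
    """Moles of ionizable nitrogen required to reach a target N/P ratio."""
    if phosphate_mol <= 0 or target_np <= 0:
        raise ValueError("inputs must be positive")
    return phosphate_mol * target_np


def in_stated_range(ratio: float, low: float = 2.0, high: float = 6.0) -> bool:
    """True if the ratio lies in the 2 to 6 range given in the text."""
    return low <= ratio <= high


# Hypothetical example: 1 micromole of phosphate, target N/P of 3.
needed = nitrogen_needed(1.0e-6, 3.0)
print(f"nitrogen needed: {needed:.2e} mol")
print("within stated range:", in_stated_range(np_ratio(needed, 1.0e-6)))
```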
  • the DOTA-RNAsense conjugate was ethanol-precipitated, resuspended in water, and annealed to the unmodified antisense strand to yield DOTA-siRNA. All liquids were pretreated with Chelex-100 (Bio-Rad, Hercules, Calif.) to remove trace metal contaminants. Tf-targeted and nontargeted siRNA nanoparticles may be formed by using cyclodextrin-containing polycations. Typically, nanoparticles were formed in water at a charge ratio of 3 (+/−) and an siRNA concentration of 0.5 g/liter.
  • adamantane-PEG-Tf adamantane-PEG-Tf
  • the nanoparticles were suspended in a 5% (wt/vol) glucose carrier solution for injection.
  • Davis et al. (Nature, Vol 464, 15 Apr. 2010) conducts a siRNA clinical trial that uses a targeted nanoparticle-delivery system (clinical trial registration number NCT00689065).
  • Patients with solid cancers refractory to standard-of-care therapies are administered doses of targeted nanoparticles on days 1, 3, 8 and 10 of a 21-day cycle by a 30-min intravenous infusion.
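  • The dosing schedule described above (days 1, 3, 8 and 10 of a 21-day cycle) can be turned into calendar dates. The sketch below is illustrative only: the function name and the example start date are hypothetical, and it assumes that day 1 is the cycle start date and that cycles follow one another without gaps.

```python
from datetime import date, timedelta


def dosing_dates(cycle_start: date,
                 n_cycles: int = 1,
                 dosing_days=(1, 3, 8, 10),
                 cycle_length_days: int = 21):
    """List calendar dates of doses; day 1 is the cycle start date."""
    dates = []
    for cycle in range(n_cycles):
        start = cycle_start + timedelta(days=cycle * cycle_length_days)
        for day in dosing_days:
            # Day numbering is 1-based, so day 1 adds zero days.
            dates.append(start + timedelta(days=day - 1))
    return dates


# Hypothetical start date, two consecutive cycles.
schedule = dosing_dates(date(2024, 1, 1), n_cycles=2)
for d in schedule:
    print(d.isoformat())
```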
  • the nanoparticles consist of a synthetic delivery system containing: (1) a linear, cyclodextrin-based polymer (CDP), (2) a human transferrin protein (TF) targeting ligand displayed on the exterior of the nanoparticle to engage TF receptors (TFR) on the surface of the cancer cells, (3) a hydrophilic polymer (polyethylene glycol (PEG) used to promote nanoparticle stability in biological fluids), and (4) siRNA designed to reduce the expression of the RRM2 (sequence used in the clinic was previously denoted siR2B+5).
  • TFR has long been known to be upregulated in malignant cells, and RRM2 is an established anti-cancer target.
  • the delivery of the invention may be achieved with nanoparticles containing a linear, cyclodextrin-based polymer (CDP), a human transferrin protein (TF) targeting ligand displayed on the exterior of the nanoparticle to engage TF receptors (TFR) on the surface of the cancer cells and/or a hydrophilic polymer (for example, polyethylene glycol (PEG) used to promote nanoparticle stability in biological fluids).
  • Liposomes are spherical vesicle structures composed of a uni- or multilamellar lipid bilayer surrounding internal aqueous compartments and a relatively impermeable outer lipophilic phospholipid bilayer. Liposomes have gained considerable attention as drug delivery carriers because they are biocompatible, nontoxic, can deliver both hydrophilic and lipophilic drug molecules, protect their cargo from degradation by plasma enzymes, and transport their load across biological membranes and the blood brain barrier (BBB) (see, e.g., Spuch and Navarro, Journal of Drug Delivery, vol. 2011, Article ID 469679, 12 pages, 2011. doi:10.1155/2011/469679 for review).
  • Liposomes can be made from several different types of lipids; however, phospholipids are most commonly used to generate liposomes as drug carriers. Although liposome formation is spontaneous when a lipid film is mixed with an aqueous solution, it can also be expedited by applying force in the form of shaking by using a homogenizer, sonicator, or an extrusion apparatus (see, e.g., Spuch and Navarro, Journal of Drug Delivery, vol. 2011, Article ID 469679, 12 pages, 2011. doi:10.1155/2011/469679 for review). [00330] Several other additives may be added to liposomes in order to modify their structure and properties.
  • liposomes are prepared from hydrogenated egg phosphatidylcholine or egg phosphatidylcholine, cholesterol, and dicetyl phosphate, and their mean vesicle sizes were adjusted to about 50 and 100 nm. (see, e.g., Spuch and Navarro, Journal of Drug Delivery, vol.2011, Article ID 469679, 12 pages, 2011. doi:10.1155/2011/469679 for review).
  • Conventional liposome formulation is mainly comprised of natural phospholipids and lipids such as 1,2-distearoyl-sn-glycero-3-phosphatidyl choline (DSPC), sphingomyelin, egg phosphatidylcholines and monosialoganglioside. Since this formulation is made up of phospholipids only, liposomal formulations have encountered many challenges, one of them being instability in plasma. Several attempts to overcome these challenges have been made, specifically in the manipulation of the lipid membrane. One of these attempts focused on the manipulation of cholesterol.
  • DSPC 1,2-distearoyl-sn-glycero-3-phosphatidyl choline
  • nucleic acid molecule e.g., DNA, RNA
  • liposomes may be contemplated for in vivo administration in liposomes.
  • the system may be administered in liposomes, such as a stable nucleic-acid-lipid particle (SNALP) (see, e.g., Morrissey et al., Nature Biotechnology, Vol. 23, No. 8, August 2005).
  • SNALP stable nucleic-acid-lipid particle
  • Daily intravenous injections of about 1, 3 or 5 mg/kg/day of a specific CRISPR Cas targeted in a SNALP are contemplated.
  • the daily treatment may be over about three days and then weekly for about five weeks.
  • a specific CRISPR Cas encapsulated in a SNALP and administered by intravenous injection at doses of about 1 or 2.5 mg/kg is also contemplated (see, e.g., Zimmerman et al., Nature Letters, Vol. 441, 4 May 2006).
  • the SNALP formulation may contain the lipids 3-N-[(ω-methoxypoly(ethylene glycol)2000)carbamoyl]-1,2-dimyristyloxy-propylamine (PEG-C-DMA), 1,2-dilinoleyloxy-N,N-dimethyl-3-aminopropane (DLinDMA), 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC) and cholesterol, in a 2:40:10:48 molar percent ratio (see, e.g., Zimmerman et al., Nature Letters, Vol. 441, 4 May 2006).
  • DLinDMA 1,2-dilinoleyloxy-N,N-dimethyl-3-aminopropane
  • DSPC 1,2-distearoyl-sn-glycero-3-phosphocholine
  • SNALPs stable nucleic-acid-lipid particles
  • the SNALP liposomes may be prepared by formulating D-Lin-DMA and PEG-C-DMA with distearoylphosphatidylcholine (DSPC), Cholesterol and siRNA using a 25:1 lipid/siRNA ratio and a 48/40/10/2 molar ratio of Cholesterol/D-Lin-DMA/DSPC/PEG-C-DMA.
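As an illustration of the arithmetic behind a molar ratio such as 48/40/10/2 (Cholesterol/D-Lin-DMA/DSPC/PEG-C-DMA), the following Python sketch converts molar parts into mole fractions and splits a total lipid amount among the components. The function names and the 100 micromole total are hypothetical and chosen only for illustration; they are not taken from the formulation described above.

```python
def mole_fractions(parts):
    """Normalize molar parts (e.g., 48/40/10/2) to fractions that sum to 1."""
    total = sum(parts.values())
    if total <= 0:
        raise ValueError("molar parts must sum to a positive number")
    return {name: value / total for name, value in parts.items()}


def split_total_moles(parts, total_umol):
    """Split a total lipid amount (micromoles) according to the molar parts."""
    fractions = mole_fractions(parts)
    return {name: frac * total_umol for name, frac in fractions.items()}


# Molar ratio stated in the text for the SNALP preparation.
snalp_parts = {"Cholesterol": 48, "D-Lin-DMA": 40, "DSPC": 10, "PEG-C-DMA": 2}

# Hypothetical total of 100 micromoles of lipid, for illustration only.
amounts = split_total_moles(snalp_parts, total_umol=100.0)
for lipid, umol in amounts.items():
    print(f"{lipid}: {umol:.1f} umol")
```

Because the stated parts already sum to 100, each part can be read directly as a molar percentage; the normalization step matters only for ratios that do not sum to 100.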
  • a SNALP may comprise synthetic cholesterol (Sigma-Aldrich, St Louis, Mo., USA), dipalmitoylphosphatidylcholine (Avanti Polar Lipids, Alabaster, Ala., USA), 3-N-[(ω-methoxy poly(ethylene glycol)2000)carbamoyl]-1,2-dimyristyloxypropylamine, and cationic 1,2-dilinoleyloxy-3-N,N-dimethylaminopropane (see, e.g., Geisbert et al., Lancet 2010; 375: 1896-905).
  • a SNALP may comprise synthetic cholesterol (Sigma-Aldrich), 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC; Avanti Polar Lipids Inc.), PEG-cDMA, and 1,2-dilinoleyloxy-3-(N,N-dimethyl)aminopropane (DLinDMA) (see, e.g., Judge, J. Clin. Invest. 119:661-673 (2009)).
  • Formulations used for in vivo studies may comprise a final lipid/RNA mass ratio of about 9:1.
  • Other cationic lipids such as amino lipid 2,2-dilinoleyl-4-dimethylaminoethyl-[1,3]-dioxolane (DLin-KC2-DMA) may be utilized to encapsulate CRISPR Cas (see, e.g., Jayaraman, Angew. Chem. Int. Ed. 2012, 51, 8529-8533).
  • a preformed vesicle with the following lipid composition may be contemplated: amino lipid, distearoylphosphatidylcholine (DSPC), cholesterol and (R)-2,3-bis(octadecyloxy) propyl-1-(methoxy poly(ethylene glycol)2000)propylcarbamate (PEG-lipid) in the molar ratio 40/10/40/10, respectively, and a FVII siRNA/total lipid ratio of approximately 0.05 (w/w).
  • the particles may be extruded up to three times through 80 nm membranes prior to adding the CRISPR Cas RNA.
  • Particles containing the highly potent amino lipid 16 may be used, in which the molar ratio of the four lipid components 16, DSPC, cholesterol and PEG-lipid (50/10/38.5/1.5) may be further optimized to enhance in vivo activity.
  • Any element of any suitable CRISPR/Cas gene editing system known in the art can be employed in the systems and methods described herein, as appropriate.
  • E. coli Rac prophage RecT (NP_415865.1) and RecE (NP_415866.1) were used as queries in position-specific iterated (PSI)-BLAST to retrieve protein homologs. Hits were clustered with CD-HIT, and representative sequences were selected from each cluster for multiple alignment with MUSCLE. Then, FastTree was used for maximum likelihood tree reconstruction with default parameters. A diverse set of RecET homologs was selected, synthesized by GenScript, and cloned into pMPH_MCP vectors for testing.
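The clustering and representative-selection step of this pipeline can be illustrated with a toy Python sketch. It follows the general greedy incremental idea used by CD-HIT (longest sequence first, each later sequence joins the first representative it matches above a threshold), but it is not CD-HIT itself: the similarity measure here is `difflib.SequenceMatcher` from the Python standard library, used as a stand-in for alignment-based identity, and the sequences, names, and 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity score in [0, 1]; not an alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold=0.9):
    """Greedy incremental clustering: longest sequences become representatives."""
    # Process longest first; break ties by name so the result is deterministic.
    order = sorted(seqs, key=lambda name: (-len(seqs[name]), name))
    clusters = {}  # representative name -> list of member names
    for name in order:
        for rep in clusters:
            if similarity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Hypothetical toy sequences, for illustration only.
toy = {
    "hitA": "MKTAYIAKQRQISFVKSHFSRQ",
    "hitB": "MKTAYIAKQRQISFVKSHFSR",
    "hitC": "GGGGGGGGPPPPPPPPWWWW",
}
clusters = greedy_cluster(toy)
print(clusters)
```

The keys of the returned dictionary play the role of the representative sequences that would be passed on to multiple alignment and tree reconstruction.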
  • PSI position-specific iterated
  • Plasmid construction: pX330, pMPH and pU6-(BbsI)_CBh-Cas9-T2A-BFP plasmids were obtained from Addgene. Tested effector DNA fragments were ordered from IDT, Genewiz, and GenScript. The fragments were Gibson assembled into the backbones using NEBuilder HiFi DNA Assembly Master Mix (New England BioLabs). All sgRNAs (Table 3) were inserted into backbones using Golden Gate cloning. All constructs were sequence-verified with Sanger sequencing of prepped plasmids. Table 3.
  • hES-H9 cells were maintained in mTeSR1 medium (StemCell Technologies) at 37 °C with 5% CO2. Culture plates were pre-coated with Matrigel (Corning) 12 hours prior to use, and cells were supplemented with 10 μM Y27632 (Sigma) for the first 24 hours after passaging. Culture media was changed every 24 hours. [00345] Transfection: HEK293T cells were seeded into 96-well plates (Corning) 12-24 hours prior to transfection at a density of 30,000 cells/well, and 250 ng of total DNA was transfected per well.
  • HeLa and HepG2 cells were seeded into 48-well plates (Corning) one day prior to transfection at a density of 50,000 and 30,000 cells/well respectively, and 400 ng of total DNA was transfected per well. Transfections were performed with Lipofectamine 3000 (Life Technologies) following the manufacturer’s instructions. [00346] Electroporation: For hES-H9 related transfection experiments, P3 Primary Cell 4D-Nucleofector™ X Kit S (Lonza) was used following the manufacturer’s protocol. For each reaction, 300,000 cells were nucleofected with 4 μg total DNA using the DC100 Nucleofector Program.
  • Fluorescence-activated cell sorting (FACS): mKate knock-in efficiency was analyzed on a CytoFLEX flow cytometer (Beckman Coulter; Stanford Stem Cell FACS Core). 72 hours after transfection, cells were washed once with PBS and dissociated with TrypLE Express Enzyme (Thermo Fisher Scientific). Cell suspension was then transferred to a 96-well U-bottom plate (Thermo Fisher Scientific) and centrifuged at 300 × g for 5 minutes. After removing the supernatant, pelleted cells were resuspended with 50 μl 4% FBS in PBS, and cells were sorted within 30 minutes of preparation.
  • FACS Fluorescence-activated cell sorting
  • RFLP: HEK293T cells were transfected with plasmid DNA and PCR templates and harvested after 72 hours for genomic DNA using the QuickExtract DNA Extraction Solution (Biosearch Technologies) following the manufacturer’s protocol.
  • the target genomic region was amplified using specific primers outside of the homology arms of the PCR template.
  • PCR products were purified with Monarch PCR & DNA Cleanup Kit (New England BioLabs). 300 ng product was digested with BsrGI (EMX1, New England BioLabs) or XbaI (VEGFA, NEB), and the digested products were analyzed on a 5% Mini-PROTEAN TBE gel (Bio-Rad).
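The text does not state how gel band intensities were converted into an editing percentage. A common convention, assumed here and not confirmed by the source, is to treat the restriction-digested bands as the edited fraction and divide their summed intensity by the total lane intensity. The Python sketch below shows that calculation; the function name and the intensity values are hypothetical.

```python
def rflp_hdr_percent(uncut_intensity, cut_intensities):
    """Estimate percent HDR from one gel lane.

    Assumption: digested (cut) bands arise only from edited alleles that
    carry the introduced restriction site, and intensities are
    background-subtracted and on the same scale.
    """
    cut_total = sum(cut_intensities)
    total = uncut_intensity + cut_total
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * cut_total / total


# Hypothetical intensities (arbitrary units), for illustration only.
example = rflp_hdr_percent(uncut_intensity=80.0, cut_intensities=[12.0, 8.0])
print(f"Estimated HDR: {example:.1f}%")
```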
  • iGUIDE Off-target Analysis: Genome-wide, unbiased off-target analysis was performed following the iGUIDE pipeline (Nobles, C.L., et al. Genome Biol 20, 14 (2019), incorporated herein by reference), which is based on the previously developed GUIDE-seq method (Tsai, S., et al. Nat Biotechnol 33, 187–197 (2015), incorporated herein by reference).
  • HEK293T cells were transfected in 20 μL Lonza SF Cell Line Nucleofector Solution on a Lonza Nucleofector 4-D with program DS-150 according to the manufacturer’s instructions.
  • gRNA-Cas9 plasmids or 150 ng of each gRNA-Cas9n plasmid for the double nickase
  • 150 ng of the effector plasmids and 5 pmol of double stranded oligonucleotides (dsODN) were transfected.
  • Cells were harvested after 72 hours for genomic DNA using the Agencourt DNAdvance reagent kit. 400 ng of purified gDNA was then fragmented to an average of 500 bp and ligated with adaptors using the NEBNext Ultra II FS DNA Library Prep kit following the manufacturer’s instructions.
  • Microbial recombineering has two major steps: template DNA is chewed back by exonucleases (Exo), then the single-strand annealing protein (SSAP) supports homology directed repair by the template, optionally facilitated by nuclease inhibitor.
  • SSAP single-strand annealing protein
  • a system for RNA-guided targeting of RecE/T recombineering activities was developed and achieved kilobase (kb) human gene-editing without DNA cutting.
  • RecE/T proteins were determined to be the most distant from eukaryotic recombination proteins and among the most compact (FIG. 1). Thus, RecE/T systems were utilized for downstream analysis. [00356]
  • the NCBI protein database was systematically searched for RecE/T homologs. To develop a portable tool, evolutionary relationships and lengths were examined (FIG. 2A). Co-occurrence analysis revealed that most RecE/T systems have only one of the two proteins (FIG. 2B).
  • RecT is only 269 amino acids (AA) long
  • RecE was truncated from AA587 (RecE_587) and the carboxy terminus domain (RecE_CTD) based on functional studies (Muyrers, J.P., Genes Dev. (2000); 14, 1971-1982, incorporated herein by reference).
  • HDR homology directed repair
  • RecE variants demonstrated variable increases in knock-in efficiency
  • RecT significantly enhanced HDR in all cases, replacing ~16 bp sequences at EMX1 and VEGFA, and knocking-in ~1 kb cassette at HSP90AA1, DYNLT1, AAVS1 (FIGS. 3A-E, FIG. 4).
  • imaging FIG. 3F
  • junction sites were sequenced using Sanger sequencing to confirm precise insertion (FIG.3G, FIG.4G).
  • a no-recruitment control with the PP7 coat protein (PCP) that recognizes PP7 aptamers not MS2 aptamers was employed.
  • PCP PP7 coat protein
  • RecE had activities without recruitment, whereas RecT showed efficiency increases in a recruitment-dependent manner (FIG. 3H). Without being bound by theory, this may be explained by RecE exonuclease activity acting promiscuously (FIG. 2C).
  • the RecE/T recombineering-edit (REDIT) tools were termed REDITv1, with REDITv1_RecT as the preferred
  • Example 3 [00359] Three tests on REDITv1 were performed to explore: 1) activity across cell types, 2) optimal designs of HDR template, and 3) specificity. REDITv1 activity was robust across multiple genomic sites in HEK, A549, HepG2, and HeLa cells (FIGS. 5A-C, FIGS.
  • REDITv1 achieved efficient kilobase editing using HA length as short as 200 bp total, with longer HA supporting higher efficiency. It achieved up to 10% efficiency (without selection) for kb-scale knock-in, a 5-fold increase over Cas9-HDR and significantly higher than the 1~2% typical efficiency (FIG. 7).
  • REDITv1 accuracy was determined using deep sequencing of predicted off-target sites (OTSs) and GUIDE-seq. Although REDITv1 did not increase off-target effects, detectable OTSs remained at previously reported sites for EMX1 and VEGFA (FIGS. 5F-G, FIG. 8). In short, REDITv1 showcased kilobase-scale genome recombineering but retained the off-target issues, with REDITv1_RecT having the highest efficiency.
  • Example 4 [00360] To alleviate unwanted edits, a version of REDIT with non-cutting Cas9 nickases (Cas9n) was assessed.
  • Results showed minimal off-target cleavage and a reduction of OTSs by ~90% compared to REDITv1 (FIG. 9B). Specifically, for DYNLT1-targeting guides, the most abundant KIF6 OTS was significantly enriched in REDITv1 group but disappeared when using REDITv2N (FIG. 9C). REDITv2N was highly accurate (FIGS. 9B-C, FIG. 12).
  • Another byproduct of HDR editing is on-target insertion-deletions (indels). They could drastically lower yields of gene-editing, especially for long sequences. Indel formation was measured in an EMX1 knock-in experiment using deep sequencing.
  • When using catalytically dead Cas9 (dCas9) to construct REDITv2D, an exact genomic knock-in of a kilobase cassette was observed in human cells (FIG. 9D, top, FIG. 13). While REDITv2D has lower efficiency than REDITv2N, it achieved programmable DNA-damage-free editing at kilobase-scale with 1~2% efficiency and no selection (FIG. 9D, FIG. 10B). It was hypothesized that two processes could be contributing to the REDITv2D recombineering. One possibility was via dCas9 unwinding.
  • If dCas9 could unwind DNA as it induces sequence-specific formation of an R-loop, a double-binding with two guide RNAs would be expected to promote genome accessibility to RecE/T.
  • a significant increase upon delivering two guide RNAs was not observed (FIG. 9D, bottom).
  • Another possibility was that the unwinding of DNA during cell cycle permitted RecE/T to access the target region mediated by dCas9 binding.
  • a 1kb knock-in was performed with different REDIT tools at varying serum levels (10% regular, 2% reduced, and no serum). As serum starvation arrests cell proliferation, the results indicated that the cell cycle correlated positively with REDITv2D recombineering (FIG.9E).
  • REDITv3 further achieved a 2- to 3-fold increase of HDR efficiencies over REDITv2 across genome targets and Cas9 variants (wtCas9, Cas9n, dCas9) (FIG. 17).
  • REDITv3 was utilized in hESCs to engineer kilobase knock-in alleles in human stem cells.
  • REDITv3N single- and double-nicking designs resulted in 5-fold and 20-fold increased HDR efficiencies over no-recombinator controls, respectively (FIG.9F). The efficacy and fidelity were confirmed via a combination of assays described for previous REDIT versions (FIGS.9F-G, FIG.18).
  • RecT and RecE_587 were truncated at various lengths as shown in FIG. 20A and FIG. 21A, respectively. The resulting efficiencies were measured using an mKate knock-in assay, with both wildtype SpCas9 and Cas9n(D10A) with single- and double-nicking at the DYNLT1 locus (FIGS. 20B-C and FIGS. 21B-C, respectively).
  • Efficiencies of the no recombination group are shown as the [00368]
  • the truncated versions of both RecT and RecE_587 retained significant recombineering activity when used with different Cas9s.
  • the new truncated versions such as RecT(93-264aa) are over 30% smaller yet they preserved essentially the full activities of RecT in stimulating recombination in eukaryotic cells.
  • truncated versions such as RecE_587(120-221aa) and RecE_587(120-209aa) are over 60% smaller but still retained high recombination activities in human cells.
  • These truncated versions not only demonstrated the potential to further engineer minimal-functional recombineering enzymes using RecE and RecT protein variants, but also provide valuable compact recombineering tools for human genome editing that are ideal for in vitro, ex vivo, and in vivo delivery given their small size.
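The size reductions quoted for the truncated variants can be checked with simple arithmetic. The Python sketch below computes how much smaller a truncation is than its parent protein. The parent lengths are assumptions not stated alongside the truncations: 269 residues for full-length E. coli RecT, and 280 residues for RecE_587 (taken as residues 587 to 866 of an 866-residue E. coli RecE). It is also assumed that the residue ranges of the RecE_587 truncations are numbered relative to RecE_587.

```python
def span_length(start, end):
    """Number of residues in an inclusive residue range."""
    return end - start + 1


def percent_smaller(parent_length, start, end):
    """Percent size reduction of a truncation relative to its parent protein."""
    return 100.0 * (1.0 - span_length(start, end) / parent_length)


RECT_LENGTH = 269                         # assumption: full-length E. coli RecT
RECE_587_LENGTH = span_length(587, 866)   # assumption: RecE_587 = residues 587-866

variants = {
    "RecT(93-264aa)": (RECT_LENGTH, 93, 264),
    "RecE_587(120-221aa)": (RECE_587_LENGTH, 120, 221),
    "RecE_587(120-209aa)": (RECE_587_LENGTH, 120, 209),
}
reductions = {name: percent_smaller(*spec) for name, spec in variants.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.1f}% smaller")
```

Under these assumptions the computed reductions are consistent with the "over 30%" and "over 60%" figures given in the text.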
  • Example 8 [00370] The reconstructed RecE and RecT phylogenetic trees with eukaryotic recombination enzymes from yeast and human (FIGS.1A and 1B) show the evolutionary distance of the proteins based on sequence homology. The dotted boxes indicate the full-length E. coli RecB and E. coli RecE protein. The catalytic core domain of E. coli RecB and E. coli RecE protein (solid boxes) was used for the comparison.
  • the gene-editing activities of these families of recombineering proteins were measured using the MS2-MCP recruitment system, where sgRNA bearing MS2 stem-loop is used with recombineering proteins fused to the MCP protein via peptide linker and with nuclear-localization signals.
  • Three exonuclease proteins were used: the exonuclease from phage Lambda, the RecE587 core domain of E. coli RecE protein, and the exonuclease (gene name gp6) from phage T7 (FIG. 22A).
  • the gene-editing activity was measured using mKate knock-in assay at genomic loci (DYNLT1 and HSP90AA1).
  • E. coli prophage RecE and RecT proteins, T7 phage exonuclease gp6 and single-strand binding gp2.5 proteins. All six proteins from three systems achieved efficient gene editing to knock-in kilobase-long sequences into mammalian genome across two genomic loci. Overall, the exonucleases showed ~3-fold higher recombination efficiency (up to 4% mKate genome knock-in) when compared with no-recombinator controls.
  • the single-strand annealing proteins (SSAP) showed higher activities, with 4-fold to 8-fold higher gene-editing activities over the control groups.
  • a REDIT system using SunTag recruitment was developed (FIGS. 24A and 27A). Because SunTag is based on fusion protein design, the sgRNA or guideRNAs are the same as wild-type CRISPR system. Specifically, the REDIT recombinator proteins were fused to scFV antibody peptide (replacing MCP), and the GCN4 peptide was fused in tandem fashion (10 copies of GCN4 peptide separated by linkers) to the Cas9 protein. Thus, the scFV-REDIT could be recruited to the Cas9 complex via GCN4’s affinity to scFV.
  • Example 10 [00377] In order to demonstrate the generalizability of REDIT protein design and develop versatile REDIT system applicable to a range of CRISPR enzymes, Cpf1/Cas12a based REDIT system using the SunTag recruitment design was developed (FIG. 25A). Two different Cpf1/Cas12a proteins were tested (Lachnospiraceae bacterium ND2006, LbCpf1 and Acidaminococcus sp. BV3L6) using the mKate knock-in assay as previously shown (FIG. 25B).
  • Cpf1/Cas12a enzymes have different catalytic residues and DNA-recognition mechanisms from the Cas9 enzymes.
  • the REDIT recombination proteins exonucleases and single-strand annealing proteins
  • CRISPR enzyme components Cas9, Cpf1/Cas12a, and others.
  • Example 11 Fifteen different species of microbes having RecE/RecT proteins were selected for a screen of various RecE and RecT proteins across the microbial kingdom (Table 5).
  • Each protein was codon-optimized and synthesized. As previously described for E. coli RecE/RecT based REDIT systems, each protein was fused via E-XTEN linker to the MCP protein with localization signal. mKate knock-in gene-editing assay was used to measure efficiencies at DYNLT1 locus (FIG. 26A, Table 6) and HSP90AA1 locus (FIG. 26B, Table 6). The homologs demonstrated the ability to enable and enhance precision gene-editing. Table 5. RecE and RecT protein homologs. Homolog Source Protein T15 Photobacterium sp.
  • Example 13 The effect of template HA lengths on the editing efficiency of REDIT was quantified when using the canonical HDR donor bearing HAs of at least 100 bp on each side (FIG. 29A, left). Higher HDR rates were observed for both Cas9 and RecT groups with increasing HA lengths, and REDIT effectively stimulated HDR over Cas9 using HA lengths as short as ~100 bp each side.
  • REDIT and REDITdn editing used donor DNAs with 200-bp HAs on each side and achieved up to over 5% efficiency for kb-scale gene-editing without selection compared with ~1% efficiency using non-REDIT methods. Additionally, REDIT improved knock-in efficiencies in A549 (lung-derived), HepG2 (liver-derived), and HeLa (cervix-derived) cells, demonstrating up to ~15% kb-scale genomic knock-in without selection.
  • Example 15 [00389] In vivo use of dCas9-EcRecT (SAFE-dCas9) was tested using cleavage free dCas9 editor via hydrodynamic tail vein injection.
  • the gene editing vectors and template DNA used are shown in FIG. 33A.
  • Successful gene editing of liver hepatocytes was monitored by transgene-encoded protein expression from the albumin locus.
  • a schematic of the experimental procedure is shown in FIG.
  • LTC mice include three genome alleles: 1) the Lkb1 (flox/flox) allele, which is knocked out (KO) when Cre is expressed; 2) the R26(LSL-TdTom) allele, which allows detection of AAV-transduced cells via TdTom red fluorescent protein; and 3) the H11(LSL-Cas9) allele, which allows expression of Cas9 in AAV-transduced cells.
  • Schematics of the REDIT gene editing vector and Cas9 control vectors are shown in FIG. 35A. As shown in FIG.
  • Escherichia coli RecE amino acid sequence (SEQ ID NO:1): MSTKPLFLLRKAKKSSGEPDVVLWASNDFESTCATLDYLIVKSGKKLSSYFKAVATNFP VVNDLPAEGEIDFTWSERYQLSKDSMTWELKPGAAPDNAHYQGNTNVNGEDMTEIEEN MLLPISGQELPIRWLAQHGSEKPVTHVSRDGLQALHIARAEELPAVTALAVSHKTSLLDP LEIRELHKLVRDTDKVFPNPGNSNLGLITAFFEAYLNADYTDRGLLTKEWMKGNRVSHI TRTASGANAGGGNLTDRGEGFVHDLTSLARDVATGVLARSMDLDIYNLHPAHAKRIEEI IAENKPPFSVFRDKFITMPGGLDYSRAIVVASVKEAPIGIEVIPAHVTEYLNKVLTETDHA NPDPEIVDIACGRSSAPMPQRVTEEGKQDDEEKPQPSGTTAVEQGEAETMEPDATEHHQ DTQPL
  • MGF014 RecE amino acid sequence (SEQ ID NO:6): MKEGIYYNISNEDYHNGLGISKSQLDLINEMPAEYIWSKEAPVDEEKIKPLEIGTALHCLL LEPDEYHKRYKIGPDVNRRTNVGKEKEKEFFDMCEKEGITPITHDDNRKLMIMRDSALA HPIAKWCLEADGVSESSIYWTDKETDVLCRCRPDRIITAHNYIIDVKSSGDIEKFDYEYYN YRYHVQDAFYSDGYKEVTGITPTFLFLVVSTKIDCGKYPVRTYVMSEEAKSAGRTAYK HNLLTYAECLKTDEWAGIRTLSLPRWAKELRNE Shigella sonnei RecE amino acid sequence (SEQ ID NO:7): DRGLLTKEWRKGNRVSRITRTASGANAGGGNLTDRGEGFVHDLTSLARDIATGVLARS MDVDIYNLHPAHAKRIEEIIAENKPPFSVFRDKFITMPGGLDYSRAIVVA
  • MGF014 RecT amino acid sequence (SEQ ID NO:12): MSNPPLAQSDLQKTQGTEVKVKTKDQQLIQFINQPSMKAQLAAALPRHMTPDRMIRIVT TEIRKTPALATCDMQSFVGAVVQCSQLGLEPGNALGHAYLLPFGNGKAKSGQSNVQLII GYRGMIDLARRSNQIISISARTVRQGDNFHFEYGLNEDLTHTPSENEDSPITHVYAVARL KDGGVQFEVMTYNQVEKVRASSKAGQNGPWVSHWEEMAKKTVIRRLFKYLPVSIEMQ KAVVLDEKAEANVDQENATIFEGEYEEVGTDGN Shigella sonnei RecT amino acid sequence (SEQ ID NO:13): MTKQPPIAKADLQKTQENRAPAAIKNNDVISFINQPSMKEQLAAALPRHMTAERMIRIA TTEIRKVPALGNCDTMSFVSAIVQCSQLGLEPGSALGHAY
  • JCM 19050 RecT DNA (SEQ ID NO:111): AACACCGACATGATCGCCATGCCCCCTTCTCCAGCCATCAGCATGCTGGACACAAGC AAGCTGGATGTGATGGTGCGGGCAGCAGAGCTGATGTCCCAGGCCGTGGTCATGGT GCCCGACCACTTCAAGGGCAAGCCAGCCGATTGCCTGGCAGTGGTCATGCAGACCAGTGGGGCATGAACCCCTTTACCGTGGCCCAGAAAACCCACCTGGTGAGCGGC ACCCTGGGATACGAGTCCCAGCTGGTGAATGCCGTGATCAGCTCCTCTAAGGCCATC AAGGGCCGGTTCCACTATGAGTGGTCTGATGGCTGGGAGAGACTGGCCGGCAAGGT GCAGTACGTGAAGGAGTCTCGGCAGAGAAAGGGCCAGCAGGGCAGCTATCAGGTGACCGTGGCCAAGCCAACATGGAAGCCAGAGAGGGCCTGTGGGTGCGGT GTGGAGCCGTGCTGGCCGGAGAAGGAAGGA
  • JCM 19050 RecE DNA (SEQ ID NO:112): GCCGAGCGGGTGAGAACCTATCAGCGGGACGCCGTGTTCGCACACGAGCTGAAGGC CGAGTTTGATGAGGCCGTGGAACGGCAAGACCGGCGTGACACTGGAGGACCAGG CCAGGGCCAAGAGGATGGTGCACGAGGCCACCACAAACCCCGCCTCTCGGAATTGG GGAGGCAGGCCTGGTGCTGAAGGCCAGGCCTGACAAGGAGATCGGCAACAATCTGA TCGATGTGAAGTCCATCGAGGTGCCAACCGACGTGTGCGCCTGTGATCTGAACGCCT ATATCAATCGGCAGATCGAAGAGAGGCTACCACATCTCCGCCGCCCACTATCTGT CTGGCACAGGCAAGGACCGCTTCTTTTGGATCTTCATCAATAAGGTGAAGGGCTACG AGTGGGTGGCAATCGTGGAGGCCTCTCCCCTGCACATCGAGCTGGGCTTCTTTTGGATCTTCATCA
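The relationship between a codon-optimized DNA listing and its protein listing can be checked by translating the DNA with the standard genetic code. The Python sketch below translates the first 57 nucleotides of the JCM 19050 RecT DNA listing (SEQ ID NO:111) and compares the result with the start of the corresponding protein listing (SEQ ID NO:141). The helper names are hypothetical. Only a short prefix is used because the listings as reproduced here may be incomplete, and the DNA listing as shown appears to begin after the initiator methionine codon, which the comparison accounts for.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame 1; any trailing partial codon is ignored."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 57 nt of SEQ ID NO:111 and first 20 residues of SEQ ID NO:141,
# copied from the listings.
dna_prefix = "AACACCGACATGATCGCCATGCCCCCTTCTCCAGCCATCAGCATGCTGGACACAAGC"
protein_prefix = "MNTDMIAMPPSPAISMLDTS"

translated = translate(dna_prefix)
# Skip the leading M of the protein listing before comparing.
print(translated == protein_prefix[1:])
```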
  • MGF014 RecT Protein (SEQ ID NO:125): MSNPPLAQSDLQKTQGTEVKVKTKDQQLIQFINQPSMKAQLAAALPRHMTPDRMIRIVT TEIRKTPALATCDMQSFVGAVVQCSQLGLEPGNALGHAYLLPFGNGKAKSGQSNVQLII GYRGMIDLARRSNQIISISARTVRQGDNFHFEYGLNEDLTHTPSENEDSPITHVYAVARL KDGGVQFEVMTYNQVEKVRASSKAGQNGPWVSHWEEMAKKTVIRRLFKYLPVSIEMQ KAVVLDEKAEANVDQENATIFEGEYEEVGTDGN Providencia sp.
  • MGF014 RecE Protein (SEQ ID NO:126): MKEGIYYNISNEDYHNGLGISKSQLDLINEMPAEYIWSKEAPVDEEKIKPLEIGTALHCLL LEPDEYHKRYKIGPDVNRRTNVGKEKEKEFFDMCEKEGITPITHDDNRKLMIMRDSALA HPIAKWCLEADGVSESSIYWTDKETDVLCRCRPDRIITAHNYIIDVKSSGDIEKFDYEYYN YRYHVQDAFYSDGYKEVTGITPTFLFLVVSTKIDCGKYPVRTYVMSEEAKSAGRTAYK HNLLTYAECLKTDEWAGIRTLSLPRWAKELRNE Shewanella putrefaciens RecT Protein (SEQ ID NO:127): MQTAQVKLSVPHQQVYQDNFNYLSSQVVGHLVDLNEEIGYLNQIVFNSLSTASPLDVA APWSVYGLLLNVCRLGLSLNPEKKLAYVMPSWSETGE
  • MUM 116 RecT Protein (SEQ ID NO:129): MSKQLTTVNTQAVVGTFSQAELDTLKQTIAKGTTNEQFALFVQTCANSRLNPFLNHIHCI VYNGKEGATMSLQIAVEGILYLARKTDGYKGIECQLIHENDEFKFDAKSKEVDHQIGFP RGNVIGGYAIAKREGFDDVVVLMESNEVDHMLKGRNGHMWRDWFNDMFKKHIMKR AAKLQYGIEIAEDETVSSGPSVDNIPEYKPQPRKDITPNQDVIDAPPQQPKQDDEAAKLK AARSEVSKKFKKLGIVKEDQTEYVEKHVPGFKGTLSDFIGLSQLLDLNIEAQEAQSADG DLLD Bacillus sp.
  • JCM 19050 RecT Protein (SEQ ID NO:141): MNTDMIAMPPSPAISMLDTSKLDVMVRAAELMSQAVVMVPDHFKGKPADCLAVVMQ ADQWGMNPFTVAQKTHLVSGTLGYESQLVNAVISSSKAIKGRFHYEWSDGWERLAGK VQYVKESRQRKGQQGSYQVTVAKPTWKPEDEQGLWVRCGAVLAGEKDITWGPKLYL ASVLVRNSELWTTKPYQQAAYTALKDWSRLYTPAVMQGSMTGKSWSLTGRLISPR Photobacterium sp.
  • JCM 19050 RecE Protein (SEQ ID NO:142): MAERVRTYQRDAVFAHELKAEFDEAVENGKTGVTLEDQARAKRMVHEATTNPASRN WFRYDGELAACERSYFWRDEEAGLVLKARPDKEIGNNLIDVKSIEVPTDVCACDLNAYI NRQIEKRGYHISAAHYLSGTGKDRFFWIFINKVKGYEWVAIVEASPLHIELGTYEVLEGL RSIASSTKEADYPAPLSHPVNERGIPQPLMSNLSTYAMKRLEQFREL Providencia alcalifaciens DSM 30120 RecT Protein (SEQ ID NO:143): MKAQLAAALPKHITSDRMIRIVSTEIRKTPSLANCDIQSFIGAVVQCSQLGLEPGNALGH AYLLPFGNGKSDNGKSNVQLIIGYRGMIDLARRSGQIISISARTVRQGDNFHFEYGLNEN LTHIPEGNEDSPITHVYAVARLKDEGVQFEVMTYNQIE
  • SH-Sr6A (SEQ ID NO:226) SEATKEAKPETAIAKKPNGIKDWLKSDALKTQIASVAPKHMAPERVMRIALNAVSRTPK LADCTPESFMRCLLDLSSWGLVPDGRHAHLIPYGTECTLVLDYKGLVTLAYRSGWVKKI HADVVFEGDIFVYSLGTVCQHIPWEFRDDANKPEHKGHFRAAYCVVTMADGIEKHEV MTASEIDAIKAKSRSGNSGPWKDHYTEMAKKTVYRRASKWLPLSPEQADAMERDDDRI IDAVSVAVTQRLSKAAMPLIGANETGDTE UPI00078EBE91 RecT [Pirellula sp.
  • SH-Sr6A (SEQ ID NO:227) SEATKEAKPETAIAKKPNGIKDWLKSDALKTQIASVAPKHMAPERVMRIALNAVSRTPK HADVVFEGDIFVYSLGTVCQHIPWEFRDDANKPEHKGHFRAAYCVVTMADGIEKHEV MTASEIDAIKAKSRSGNSGPWKDHYTEMAKKTVYRRASKWLPLSPEQADAMERDDDRI IDAVSVAVTQRLSKAAMPLIGANETGDTE UPI00078ED021 (SEQ ID NO:228) SEIQQQAEAQTQAHPTAVLDDYRGAIASVAPPGTNIDLFIRMTKSNVNRSDEIVAAVKR NPGLFMQAVMDSAALGHIPGSEYYYLTPRRDGISGIESWKGVAKRIFNTGRYQRIVCEV VYEGEQWEFQPGEDLKPKHVIDWDARQVGSKVRFTYAYAVDFEGNPSTVAVCTKLDL
  • UNC322MFChir4.1 (SEQ ID NO:281) ATNKDVKNQLANRKENKPATPEQKVEAYMTAMAPRFAEVLPKHMSMDRMSRIALTTI RTNPKLLECSVPSLMGAVMQAVQLGLEPGLLGHCYILPYKSEATFIIGYKGMIDLARRSG HIQSIYAHAVYENDEFDYELGLHPKLTHKPSFGERGEFIGAYAVAHFKDGGHQMEFMPK SEIEKRRSRSASGNSSYSPWKSDYEEMAKKTVVRYMFKYLPISIEVQSQAQHDEVVRKD ITEEPEFIEMDSIEVAEASEGDGQKEFVIEE RDC50983.1 RecT [Acinetobacter sp.
  • FCS006 PYLFGGQMKEQEIKNQLAAKAVETTNPKLSKNMNIADLIKAIEPEIKKALPTVITPERFTR IALSALNTTPKLAECSQMSFLAALMNAAQLGLEVNSPLGQAYLIPYNNKGKLECQFQIG YKGMLGLAYRNPEIQTIQAQVVYENDDFKYELGLDSKLYHKPSLSDRGKVRCYYALYK LRNGGYGFEVMSRRDVEEYAKRYSKVTDSLYSPWANNFDSMAKKTVIKQLLKYAPLR TDLEKAMSMDESIKTRVSVDMSEVENEETFDAEVEV WP_107514794.1 RecT [Staphylococcus equorum] (SEQ ID NO:309) ATNETLKQKVVERKPNGVKEQSPKTQLNHLLKKMAPEIQRALPKHMDSDRMARIAMT AVSNTPKLLECDQMSFIAALMQASQLGVEPNTGLG
  • AFS017336 (SEQ ID NO:325) ATNESLKNQITNKKTGEVPLTPAQQVSSYLKAYEGTFQQIAPKHFNTERFQRIALSEIRKN PKLLECSVPSLMSAVLQSVKLGLEPGLFGQAYLIPYGKEVQFQIGYRGLIELSQRSGRILK IQAREVYENDEFEVSYGIDDNIIHKPALDVDRGKVRLYYAVAWFKDGGAQFELMSISDV EKHRDKFSKTAKFGPWKDHFDEMAKKTVLKKLVKQLPMDVEFQEAVQEDETVRKTIT DEPEILQAEFEIVDQPEISVE WP_087290962.1 RecT [Pseudoflavonifractor sp.
  • KNHs210] (SEQ ID NO:334) TTRTGNIKEELAKKAEGTNGDTRLTKAMSIADLIKAMEPEIKKALPEVITPERFTRMALS ALNTTPKLRECTQISFLAAMMNAAQLGLEPNTPLGQAYLIPFNNKGTMECQFQIGYKGM IDLSYRNPQMQMISAQAVYENDEFKYELGLNPTLIHRPVLRGRGEVILFYGLFKLTNGGY GFEVMSKEEMDAYAKAYSKAIDSSFSPWKSNYNGMAKKTVIKQVLKYAPIKADFRKAL SSDETIKNEISENMSEIHGEIIFDTDYMEESA WP_117768035.1 RecT [Blautia sp.
  • JCM 19055] (SEQ ID NO:346) GYKGMIDLARRSGHIKSIYAHTVHANDEFEYELGLEPKLVHKPATGDRGNMEYAYAVA HFVDGGYQFEVFSHHDIEQVKKRSKAGNFGPWKTDYEEMAKKTVVRRMFKYLPISIEIQ QHASQDETVRRDITEEAEKVDNIIDLPNYEDPNNIDVPDEEQDEQKDEKQKQQGSAEEIA LDFK WP_135329961.1 RecT [Streptomyces sp.
  • SEQ ID NO:347 STNLAARVEARRQNPTTKQPARRGKAAQQPTLVQFVQSMRGEIARALPSHVASPERIAR IALTELRRVDHLAECTQESFGGALMTCAALGLEPGGVGGEAYLLPFWNKKVRAYEVTL VIGYQGMVRLFWQHPAAAGLAAHTVHEGDEFDFEYGLEPFLRHKPARTGRGKPTDYY AVAKMANGGSAFVVMNVEDIEAIRHRSKARDAGPWSTDYGLRRHGAQDLHSAVVQV AAEVC WP_079588582.1 RecT [Acetoanaerobium noterae] (SEQ ID NO:348) SNLKNELAKKANNSVTDGNKEPQTIKDWIKVMEPAIKKALPSVITPERFTRMALTAISVN PKLAECTPKSFMGSLMNAAQLGLEPNTPLGQAYLIPYKNKGNMEVQFQIGYKGLIELAY
  • SH-Sr6A SEATKEAKPETAIAKKPNGIKDWLKSDALKTQIASVAPKHMAPERVMRIALNAVSRTPK LADCTPESFMRCLLDLSSWGLEPDGRHAHLIPYGTECTLVLDYKGLVTLAYRSGWVKKI HADVVFEGDIFVYSLGTVCQHIPWEFRDDANKPEHKGHFRAAYCVVTMADGIEKHEV MTASEIDAIKAKSRSGNSGPWKDHYTEMAKKTVYRRASKWLPLSPEQADAMERDDDRI IDAVSVAVTQRLSKAAMPLIGANETGDTE WP_126032909.1 RecT [Bifidobacterium castoris] (SEQ ID NO:355) GALATTAKNNELTTMNTMGDIHALIRGRRAQIESVMSGVLTPERLYSLLQSAVSHEPKL LQCTPESIVACCMKCAVLGLEPSNVDGLGKAYILPYGNKNYQTGQVEATFILGYKGMIE
  • TolDC (SEQ ID NO:362) KASEIASMVKKEDERRNHKPDPLAGIVKNLTSIKGEIANALPDAGITPERMIRIVVTLLRQ NKSLAEAAMQNPASLLGAVMMAAQLGLDPTNGLDQCALVPRKGKVCFDIMYEGLVEL GYRSDRMESIVARTVYEKDTFSLKYGLNEELVHIPYLDGDPGESKGYYMVGKLKGGGN IIVYMTKEQVHKIRDRYSVAYKAGLSGSRKDSPWFTSEDRMGEKTVVKAGFRWIPKSPII RTALALDETAREASRLPMRN WP_109196224.1 RecT [Streptomyces sp.
  • NIBRBAC000502771 (SEQ ID NO:373) TSQLAEATAAKAVEQRKNPTARDLIQAQQAAIETQLAGAMNSAAFVRAAISSVSASPQL QQATPASLLGGIMLAAQLKLEIGPALGHFHLTPRMVSKKDGDNWVEVWTCLPIIGYQG YIELAYRSGRIEKIESLLVRKGDKFDHGANSERGRFFDWAPADYEETREWTGVIALAKIK GAGTVWAYLPKEKVIARRPDRWEKTPWATNEEEMARKSGIRALAPYLPKSTELGKALE ADEHKVEHIAGVHDLVVSKAEDEPLEEPTA TAK04183.1 EPO34_03495 [Patescibacteria group bacterium] (SEQ ID NO:374) TNQPTTHVATTPNQRPATTLEQFRHQLVGDYQKQVLNYFNGQKEKAMKFMSAVVYSA QKNPALLECDRTTLLHAFMACAEYQLYPSSVSGEAYVIPYKGKAQ
  • NRRL S-1824 (SEQ ID NO:381) SEISNAIATRDQGPAAQIEAYRDEYAALVPSHINVDQWVRLAAGAIRGNEDLMEAARND IGVFLRELKTAARLGLEPGTEQFYLTARKSKAHGYALIIKGIVGYQGIVELIYRAGAVSSV IVEAVRANDTFSYVPGRDDRPIHEIDWFGGDRGPLVGVYAYAVMKDGAVSKVVVMNH KRVMEIKARSDSKNSQYSPWNTDEESMWLKSAIRQLAKWVPTSAEYKSEQLRAHAEAI GELASVASAPLPPQPSVLDDVDPDDEGPIEGELVD RKT60104.1 RecT [Agromyces sp.
  • OV415 (SEQ ID NO:382) STTVALPAQKAEAVIQQVTGAANGFAAALADRIGPDRFVRAAVTSIRTSPQLAQCEPLSI LGGLFVAAQLALEVGGPRGLAYLVPYGREAQLIVGYRGYVELFYRAGARKVEWFIVRD GDTFRQWSTGRGGRDYEWTPLDDDSNRRPIGAVAQIQGAHGEFQFEHMTVDQINERRP KRATSGPWVDWYEEMALKTVMRQLAKTARQSTDDLAFAAANDGAVITQVEGGQARV VHPATSEPEQPLSLDALERTPGELAEETNP WP_017415747.1 RecT [Clostridium tunisiense] (SEQ ID NO:383) TTKANVTSVKNALKEQIQVQQVAAQTDTSFQGVLTKQLQHQFKAIQSLVPKHVTPERLC RIGINAASRNPQLMNCTPETIVGAIVNCATLGLEPNLLGHAYIVPFYNNKTGKMEAQF
  • BV3C26 (SEQ ID NO:400) TNIQKQENRALSPVNQMKNLLANQGMQNLFADALKENKDRFIASIIDLYNGDNYLQNC DPKEVAMEALKAATLNLPINKSLGYAYIVPFKNKGKLTPQFQIGYKGYIQMAQRSGQYK ALNAGIMYEGMEIKRDFLRGTFEIVGEPKSDKAIGYFAYFQLLNGYEKALYMSKEDITD HAKRYSQSFGSDFSPWKNQFDEMAQKTVLRRLLTKYGVLTTEFQEAAKREEDEEVLKA TEENAMIEMNSQEETIAVDPKTGEIIEETEAPF PCR98661.1 RecT [Lactococcus fujiensis JCM 16395] (SEQ ID NO:401) KSAPVQARFQEVLGKKSSGFVSSLLTVVNNNNLLKRATPDSIMTAAMKAATLDLPIEPS LGFAYIIPYGQEAQF
  • CAG:76 (SEQ ID NO:422) AERKQITTKEYLAEVKGGLENELNLNAKALPENFNQSRFVLNCISLIKSNLSNYNNITPES VYLALAKGAYLGLDFFNGECYAIPYSGEVNFQTDYKGEIKLAKTYSRNPIKDIYAKNVR DGDFFEEIIESGKQSVNFRPVPFSDKKIIGTFAVVLFKDGSMMYDTMSVKEIEEVRNNFS KAKNSKAWAATPGEMYKKTVLRRLCKLIDLDFNSQQRLAYEDAGDFDKEKADEPVAD DTVNVFDAEFKEVEPENKDAAIIEEMGLEEA WP_099299656.1 RecT [Pediococcus pentosaceus] (SEQ ID NO:423) MNDISKVPMKVLVQQDKVQRMLENTLKGKTRQFTTSLINVVNSNQSLADVDQMSVIKS AMVAASLDLPIDQNLGFMWLVPYK
  • VMFN-D1 (SEQ ID NO:453) AKALLENKLQERAAGASTPSTQGTSLKALLNSPAIKKRFDELLDKRSAQYMTSIVNLYN SDAMLQKAEPMSVISSCIVAATLDLPVDKNLGYAWIVPYSGKAQFQLGYKGYIQLALRT GQYKAINVIEVYEGELVKWNPLTEALELDFEKRKSDAVIGYAGYFELINGFRKSVYWTR EQIESHRKKFSKSDFGWKKDYDAMAKKTIIRNMLSKWGILSIEMQDAYSKEIEAIPPLNN ENEEDPPIDLTPEDYRVGDEPQDGKEQGEMNFE WP_123849158.1 RecT [Chitinophaga lutea] (SEQ ID NO:454) SNVNAPAAPVKSKIEVLKDIMNAPSVQEQFQNALRENSGVFVASVIDLFNSDTYLQNCE PKQVVMECLKAATLKLPINKNLGFAYVVP
  • Cas9 editing can cause DNA damage at on- and off-target sites and relies on the endogenous DNA repair mechanisms that are error-prone. These features often lead to unwanted mutations and safety concerns, which can be exacerbated when Applicants alter long sequences.
  • SSAPs single-strand annealing proteins
  • dCas9-based editor had very low editing errors at target loci, minimal detectable off-target effect, and higher overall accuracy than Cas9 editors.
  • dCas9-SSAP editor had efficiencies comparable to those of Cas9 editors, with robust performance across human cell lines and stem cells.
  • This dCas9-SSAP editor was effective for inserting sequences of variable lengths, up to kilobase scale.
  • dCas9-SSAP editing demonstrated notable independence from endogenous mammalian repair pathways.
  • WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, WO2020/191171 and WO2021/226558 is hereby incorporated herein by reference.
  • RTs of WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, WO2020/191171 and WO2021/226558 can be used in the practice of the present invention.
  • Bacteriophages evolved enzymes that take advantage of accessible replicating genome DNA to perform precise recombination. Applicants reasoned that the key enzyme for microbial recombination, namely the single-strand annealing protein (SSAP), could be useful for gene editing in mammalian cells, and it would not explicitly cleave DNA and not rely on the error-prone pathway that was needed by Cas9 editing.
  • SSAP single-strand annealing protein
  • dCas9 deactivated Cas9
  • dCas9-SSAP dCas9-SSAP editor
  • Applicants performed a metagenomic search of SSAPs focusing on RecT homologs, and identified EcRecT as the most efficient one for human genome knock-in.
  • dCas9-SSAP To help with delivery of dCas9-SSAP for future applications, Applicants optimized its molecular design using structure-guided truncation and obtained a minimized dSaCas9-mSSAP, achieving over 50% reduction in size while retaining similar levels of efficiency.
  • This minimal dCas9 editor would allow convenient delivery using viral vectors such as adeno-associated virus (AAV), potentially useful for hard-to-transfect cell types or in vivo applications.
  • AAV adeno-associated virus
  • the dCas9-SSAP editor is capable of efficient, accurate knock-in genome engineering. With space for further improvement, it has potential research and therapeutic values as a cleavage-free gene-editing tool for mammalian cells.
  • phage SSAPs for dCas9 knock-in gene editing.
  • Most CRISPR-based editors capable of long-sequence knock-in require SSNs or DSBs, which can trigger the competing, error-prone NHEJ pathways, resulting in variable efficiency and accuracy.
  • bacteriophages evolved DNA-modifying enzymes to integrate themselves into the genomes of host bacteria via sequence homology, e.g., Lambda Red.
  • sequence homology e.g., Lambda Red.
  • Such precise phage integration relies on a major homology-directed step: recombination between genomic and donor DNA is stimulated by the SSAPs, e.g., Lambda Bet or its functional homolog, RecT.
  • Phage SSAPs may not rely on DNA cleavage thanks to their unusual ATP-independent activity, in contrast to the ATP-dependent RAD51 protein in human cells.
  • The high affinity of phage SSAPs for single- and double-stranded DNAs may allow attachment to donor templates when multiple SSAPs are recruited to genomic targets via RNA-guided dCas9. The SSAPs could then promote genomic-donor DNA exchange without cleavage, as target DNA strands become transiently accessible during dCas9-mediated DNA unwinding and R-loop formation.
  • the dCas9 protein cannot cut DNA but retains the ability to unwind target sites and form R-loop, rendering the non-target strand putatively accessible for SSAP-stimulated homologous recombination.
  • RNA aptamer MS2 stem-loop
  • Applicants generated knock-in donors with an 800-bp transgene encoding fluorescent protein (FP) cassette flanked by homology-arms (HA), which allow in-frame insertion of the FP into housekeeping genes, e.g., DYNLT1, HSP90AA1, ACTB (FIG. 38B, left).
  • FP fluorescent protein
  • housekeeping genes e.g., DYNLT1, HSP90AA1, ACTB
  • Applicants’ initial test identified that RecT has higher knock-in editing activities relative to other SSAPs in human cells, whereas no editing above background was observed with dCas9-only or non-target controls (FIGS. 38C, 38D).
  • Development of dCas9-SSAP as a mammalian gene-editing tool Applicants conducted metagenomic mining to identify the best SSAP for mammalian gene-editing. Applicants focused on RecT homologs and sought to maximize evolutionary diversity via a phylogenetic analysis. Applicants systematically searched the NCBI non-redundant sequence database for RecT homologs, and identified 2,071 initial candidates.
  • Applicants built phylogenetic trees, filtered out proteins with high sequence homology, and subsampled the evolutionary branches, obtaining 16 highly diverse SSAP candidates (FIG. 44).
  • Applicants examined the SSAP candidates by knock-in screening and evaluating their editing efficiencies across three genomic loci: HSP90AA1, DYNLT1, and ACTB (FIG. 38E).
  • EcRecT demonstrated the highest efficiency for dCas9 editing, achieving genomic knock-in of a kilobase-scale cassette with up to ~6% efficiency in human cells.
  • dCas9-SSAP Characterizing the accuracy of dCas9-SSAP gene-editing. The motivation for developing dCas9-SSAP is to perform potentially safer, cleavage-free dCas9 editing with the help of SSAP.
  • Applicants experimentally evaluated the accuracy of dCas9-SSAP for knock-in editing where the target sequence is ~1 kb in length.
  • Applicants measured the on-target error, off-target insertion, cell fitness effect, and editing yields of dCas9-SSAP, in comparison with Cas9 references.
  • On-target error analysis There are two types of on-target errors: (1) on-target indel formation, whose occurrence means that knock-in is unsuccessful; (2) knock-in errors, which means that knock-in happens but is imperfect, and that junction indels occur.
  • To evaluate (1) Applicants used deep sequencing to measure the on-target indel formation of dCas9 editor.
  • Applicants used a nested PCR design with an initial primer binding outside the donor DNA to avoid template contamination (FIG. 39A, FIG. 46). Deep sequencing of on-target sites showed that the dCas9 editor’s level of on-target error is as low as that of negative controls, in contrast to the high levels of indel formation observed for the Cas9 editor (FIG. 39A). [00409] To evaluate (2), Applicants benchmarked the knock-in errors of dCas9-SSAP and measured junction indels. Applicants clonally isolated edited cells and then amplified the knock-in genomic loci using a similar 2-step nested PCR design to avoid contamination (FIG. 39B, FIG. 46). Applicants assessed the edited genomic alleles via Sanger sequencing.
  • Applicants also performed down-sampling to ensure all groups have the same sequencing depth/coverage. Considering insertion sites with >1% of total aligned reads, Applicants’ results confirmed that dCas9-SSAP had no detected off-target insertion site, while Cas9 references led to a significant number of off-target error sites (FIG. 39E). Notably, in all dCas9-SSAP samples, there were significantly fewer off-target sites when Applicants considered all sites with at least one UMI aligned, in contrast to the Cas9 editor (FIG. 49). This result suggests that dCas9-SSAP could help to address the off-target issues that are prominent for long-sequence knock-in.
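The down-sampling and thresholding described above can be illustrated with a minimal Python sketch. It assumes each aligned read has already been reduced to a site label; the function names, the sample data, and the site labels are hypothetical and are not taken from the Applicants' pipeline.

```python
import random
from collections import Counter

def downsample(reads, depth, seed=0):
    """Randomly keep `depth` reads so every sample is compared at equal depth."""
    if depth > len(reads):
        raise ValueError("depth exceeds number of reads")
    return random.Random(seed).sample(reads, depth)

def call_insertion_sites(reads, min_fraction=0.01):
    """Return sites whose share of aligned reads exceeds `min_fraction` (>1% by default)."""
    counts = Counter(reads)
    total = len(reads)
    return {site: n / total for site, n in counts.items() if n / total > min_fraction}

# Hypothetical data: each read is labelled with the site it aligned to.
samples = {
    "dCas9-SSAP": ["on_target"] * 995 + ["chr1:100"] * 5,
    "Cas9": ["on_target"] * 1400 + ["chr2:200"] * 80 + ["chr3:300"] * 20,
}

# Down-sample every group to the depth of the shallowest sample, then call sites.
depth = min(len(reads) for reads in samples.values())
sites = {name: call_insertion_sites(downsample(reads, depth))
         for name, reads in samples.items()}
```

In this toy data the only dCas9-SSAP off-target label accounts for 0.5% of reads and therefore falls below the 1% threshold.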
  • dCas9-SSAP has higher accuracy for knock-in editing, validated its efficiencies and usages.
  • Applicants’ data showed that dCas9-SSAP had consistent performances, with comparable and often higher efficiencies than Cas9 references across the transgene lengths tested (FIG. 40D).
  • dCas9-SSAP editor demonstrated efficiencies up to 12% without selection, comparable and often slightly higher than Cas9 references using the same donors (FIG. 40E).
  • Applicants applied dCas9-SSAP to three cell lines with distinctive tissue origins (cervix-derived HeLa cells, liver-derived HepG2 cells, and bone-derived U-2OS cells). Applicants observed consistent knock-in efficiencies comparable to Cas9 references in all three lines (FIG.
  • dCas9-SSAP editor in human embryonic stem cells (hESCs) to engineer sequences in a more therapeutically relevant setting.
  • dCas9-SSAP editing used short ~200-bp HAs and achieved up to ~3% efficiency for kb-scale editing without selection, comparable and often higher than the Cas9 references in human stem cells (FIG. 40G, FIG. 52).
  • Chemical perturbations suggest dCas9-SSAP gene-editing has less dependence on endogenous DNA repair pathways.
  • dCas9-SSAP acts through the activity of SSAP when recruited by the dCas9-guideRNA complex and differs from Cas9 editing.
  • Applicants investigated how cell cycling affects the dCas9-SSAP editor. Cell cycling has been shown to facilitate the accessibility of mammalian genomes. More specifically, genome replication (during S phase) may provide a favorable environment for dCas9 to unwind DNA and allow SSAP-mediated recombination (FIG. 41C). To test this, Applicants synchronized cells at the G1/S boundary using a double thymidine block (DTB).
  • DTB Thymidine blockage
  • dSaCas9-mSSAP editor harmonizes the RNA-guided programmability of CRISPR genome-targeting with the SSAP activity of phage enzyme RecT.
  • the fragments encoding the recombination enzymes were Gibson assembled into backbones (Addgene plasmid #61423) using Q5® High-Fidelity 2X Master Mix (New England BioLabs). The amino acid sequences for these SSAPs can be found in Table 10. All sgRNAs were inserted into backbones (dCas9-SSAP and dSaCas9-SSAP plasmids) using Golden Gate cloning. dCas9-SSAP plasmids bearing BbsI (dSpCas9) and BsaI (dSaCas9) sites as gRNA backbones were sequence-verified (Eton and Genewiz).
  • HEK 293T, HeLa, HepG2 and U2OS cells were maintained in Dulbecco’s Modified Eagle’s Medium (DMEM, Life Technologies), with 10% fetal bovine serum (FBS, BenchMark), 100 U/mL penicillin, and 100 µg/mL streptomycin (Life Technologies) at 37 °C with 5% CO2.
  • DMEM Modified Eagle’s Medium
  • FBS fetal bovine serum
  • streptomycin Life Technologies
  • hES-H9 cells were maintained in mTeSR1 medium (StemCell Technologies) at 37 °C with 5% CO2. Culture plates were pre-coated with Matrigel (Corning) 12 hours prior to use. 10 µM Rho Kinase inhibitor Y27632 (Sigma) was added for the first 24 hours after each passaging. Culture media was changed every 24 hours. [00428] Transfection.
  • HEK293T, Hela, HepG2 and U2OS cells were seeded into 96-well plates (Corning) 12-24 hours prior to transfection at a density of 30,000 cells/well, and 250 ng of total DNA was transfected per well.
  • Cells were transfected with Lipofectamine 3000 (Life Technologies) following the manufacturer’s instructions when the cells were ~70% confluent.
  • Applicants used 250 ng total DNA and 0.4 µL Lip3000 reagent, mixed with 10 µL of Opti-MEM per well.
  • Applicants used 160 ng of dCas9-SSAP guideRNA plasmids (for the double-sgRNA design, equal amounts of the two guideRNA plasmids were used, e.g., 80 ng each), 60 ng of pMCP-RecT or GFP control plasmid (Addgene #64539), and 30 ng of PCR template DNA (the PCR primers can be found in Table 9; the template sequences can be found in Supplementary Sequences). Three days later, the cells were analyzed using FACS. [00429] Electroporation. For hES-H9 transfection, the P3 Primary Cell 4D-Nucleofector™ X Kit S (Lonza) was used following the manufacturer’s protocol.
  • the hES-H9 cells were resuspended using Accutase (Innovative Cell Technology) and washed with PBS twice before the electroporation. For each reaction, 300,000 cells were nucleofected with 4 µg total DNA mixed in 20 µL electroporation buffer using the DC100 Nucleofector Program.
  • Applicants used 2.6 µg of dCas9-SSAP guideRNA plasmids (for the double-sgRNA design, equal amounts of the two guideRNA plasmids were used, e.g., 1.3 µg each), 1 µg of pMCP-RecT or GFP control plasmid, and 0.4 µg of PCR template DNA (the PCR primers can be found in Table 9; the template sequences can be found in Supplementary Sequences).
  • the cells were seeded into 12-well plates with 1 mL of mTeSR1 media supplemented with 10 µM Y27632. Culture media was changed every 24 hours. Four days later, the cells were analyzed using FACS.
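As a quick arithmetic check of the DNA amounts given for lipofection (per well) and electroporation (per reaction), the following Python sketch sums the components and splits the guideRNA plasmid mass for a double-sgRNA design. The dictionary keys and function names are hypothetical labels chosen for illustration; the masses are those stated in the text, expressed in ng.

```python
# Component masses in ng, as stated in the text.
LIPOFECTION_NG = {"guide_plasmids": 160, "ssap_or_gfp_plasmid": 60, "pcr_template": 30}
ELECTROPORATION_NG = {"guide_plasmids": 2600, "ssap_or_gfp_plasmid": 1000, "pcr_template": 400}

def total_ng(mix):
    """Total DNA mass of a mix in ng."""
    return sum(mix.values())

def mass_fractions(mix):
    """Fraction of the total DNA mass contributed by each component."""
    total = total_ng(mix)
    return {name: ng / total for name, ng in mix.items()}

def split_guides(guide_ng, n_guides=2):
    """Divide the guideRNA plasmid mass equally among `n_guides` plasmids."""
    return [guide_ng / n_guides] * n_guides

print(total_ng(LIPOFECTION_NG))        # 250 ng per well
print(total_ng(ELECTROPORATION_NG))    # 4000 ng (4 µg) per reaction
print(split_guides(160), split_guides(2600))
```

The two mixes sum to the stated totals (250 ng and 4 µg), although their component ratios differ slightly (64:24:12 versus 65:25:10 percent).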
  • FACS Fluorescence-activated cell analysis
  • HEK293T cells transfected with plasmid DNA and HDR templates were harvested 72 hours after transfection.
  • the genomic DNA of these cells was extracted using the QuickExtract DNA Extraction Solution (Biosearch Technologies) following the manufacturer’s protocol.
  • the target genomic region was amplified using specific primers outside of the homology arms of the HDR template.
  • the primers used for Sanger sequencing or NGS analysis can be found in Table 9.
  • PCR products were purified with Monarch PCR & DNA Cleanup Kit (New England BioLabs). 100 ng of purified product was sent for Sanger sequencing with target-specific primers (EtonBio or Genewiz).
  • the knock-in events were amplified using specific TA colony primers targeted to the DYNLT1 or HSP90AA1 locus (Table 9) using Phusion Flash High-Fidelity PCR Master Mix (ThermoScientific, F-548L). The targeted PCR products were purified using a gel extraction kit (New England BioLabs, T1020L) following the manufacturer’s instructions. An A-tail was added to the PCR products using Taq polymerase (New England BioLabs, M0273S) by incubating at 72 °C for 30 minutes. The TOPO cloning reaction and transformation were set up following the manufacturer’s instructions (Thermo Scientific, K457501).
  • the quantification window was increased to 10 bp surrounding the expected cut site to better capture diverse editing outcomes, but substitutions were ignored to avoid inclusion of sequencing errors. Only reads containing no mismatches to the expected amplicon were considered for HDR quantification; reads containing indels that partially matched the expected amplicons were included in the overall reported indel frequency.
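The read-classification rules described above can be sketched in Python as follows. This is a simplified illustration, not the analysis tool the Applicants used: it assumes each read has already been aligned and summarized as a list of indel intervals (half-open, in amplicon coordinates) plus a flag for a mismatch-free match to the expected HDR amplicon, and it assumes the 10-bp window extends on each side of the cut site. Substitutions are never passed in, which mirrors ignoring them.

```python
from collections import Counter

def classify_read(indels, perfect_hdr_match, cut_site, window=10):
    """Classify one aligned read as 'HDR', 'indel', or 'unmodified'.

    indels: list of (start, end) half-open intervals in amplicon coordinates.
    perfect_hdr_match: True only if the read matches the expected HDR amplicon
        with no mismatches.
    """
    if perfect_hdr_match:
        return "HDR"
    lo, hi = cut_site - window, cut_site + window
    for start, end in indels:
        # Count the indel only if it overlaps the quantification window.
        if start < hi and end > lo:
            return "indel"
    return "unmodified"

def outcome_frequencies(reads, cut_site, window=10):
    """Fraction of reads in each outcome class."""
    counts = Counter(
        classify_read(r["indels"], r["perfect_hdr_match"], cut_site, window)
        for r in reads
    )
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}

# Hypothetical reads around a cut site at amplicon position 50.
reads = [
    {"indels": [], "perfect_hdr_match": True},
    {"indels": [(45, 48)], "perfect_hdr_match": False},
    {"indels": [(10, 12)], "perfect_hdr_match": False},
    {"indels": [], "perfect_hdr_match": False},
]
freqs = outcome_frequencies(reads, cut_site=50)
```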
  • the computation work was supported by the SCG cluster hosted by the Genetics Bioinformatics Service Center (GBSC) at the Department of Genetics of Stanford. All customized scripts for data analysis will be deposited to Github under Cong Lab and made available for download. [00436] Insertion site mapping and analysis.
  • Applicants used a process that was previously developed (GIS-seq) and adapted for the genome-wide, unbiased off-target analysis of mKate knock-in, following a similar protocol to that in Applicants’ previous study. Briefly, Applicants harvested the HEK293T cells 3 days after transfection. The genomic DNA was size-selected via the DNAdvance genomic DNA kit (A48705, Beckman Coulter) to avoid template contamination in the following step. 400 ng of purified genomic DNA was fragmented to an average of 500 bp using NEB Fragmentase, ligated with adaptors, and size-selected using the NEBNext Ultra II FS DNA Library Prep kit following the manufacturer’s instructions.
  • Applicants’ search follows two guidelines: 1) Closely-related candidates are less likely to have differential activities; 2) Microbial enzymes that function well when heterologously expressed in eukaryotic cells are difficult to predict, thus sampling diverse evolutionary branches of RecT homologs would be ideal.
  • Applicants built phylogenetic trees and selected representative candidates after filtering out proteins with high sequence homology. Then, Applicants used a threshold of at least 10% sequence divergence and sizes up to 300-aa (to avoid extremely large proteins that are hard to synthesize and less portable) to refine the hits, and randomly sampled the evolutionary branches to obtain a final list of 16 SSAPs (FIG. 38E, FIG. 44).
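The candidate-refinement steps above (redundancy filtering, a size cap, and random sampling across branches) can be sketched in Python. This is a toy illustration under stated assumptions: divergence is approximated by a Hamming fraction on equal-length sequences rather than an alignment-based measure, branch labels are supplied by hand rather than derived from a phylogenetic tree, and all names and data are hypothetical.

```python
import random

def hamming_divergence(a, b):
    """Fraction of differing positions; a toy stand-in for alignment-based divergence."""
    if len(a) != len(b):
        raise ValueError("toy divergence requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def select_candidates(candidates, n_final, min_divergence=0.10, max_length=300, seed=0):
    """Filter by size and redundancy, then randomly pick one candidate per branch.

    candidates: list of dicts with 'name', 'seq', and 'branch' keys.
    """
    kept = []
    for cand in candidates:
        if len(cand["seq"]) > max_length:
            continue  # skip proteins above the size cap
        if all(hamming_divergence(cand["seq"], k["seq"]) >= min_divergence for k in kept):
            kept.append(cand)  # keep only sufficiently divergent sequences
    by_branch = {}
    for cand in kept:
        by_branch.setdefault(cand["branch"], []).append(cand)
    rng = random.Random(seed)
    picks = [rng.choice(members) for members in by_branch.values()]
    rng.shuffle(picks)
    return [cand["name"] for cand in picks[:n_final]]

# Hypothetical 10-residue toy sequences (real candidates are up to 300 aa).
toy = [
    {"name": "A", "seq": "AAAAAAAAAA", "branch": 1},
    {"name": "B", "seq": "AAAAAAAAAA", "branch": 1},    # identical to A, filtered out
    {"name": "C", "seq": "AAAAACCCCC", "branch": 2},
    {"name": "D", "seq": "AAAAAAAAAAAA", "branch": 3},  # too long for max_length=10
]
chosen = select_candidates(toy, n_final=16, max_length=10)
```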
  • dCas9-SSAP benefited from successively longer HA within the donor, regardless of whether the HAs are for HDR-type or MMEJ-type, in contrast to Cas9 editor that showed a boost of knock-in efficiencies when using the MMEJ donors (FIG.38F, HDR and MMEJ donors). This is consistent with the assumption that the enhancing effect when using MMEJ donors is dependent on Cas9 cleavage of target genomic sites.
  • target sequence usually 20-bp
  • PAM protospacer adjacent motif
  • NVG protospacer adjacent motif
  • NGRRT protospacer adjacent motif
  • Two DNA oligos could be ordered based on selected guides, with golden gate cloning overhangs, as shown below.
  • N denotes the guide sequences. Standard desalting oligos are sufficient for this cloning. The two oligos above will be annealed to form the insert fragments in the next step.
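The oligo-pair design can be sketched in Python. The overhang sequences are deliberately left as required parameters because the actual overhangs depend on the backbone and are not reproduced in this text; the values used in the example call are placeholders only, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def design_oligos(guide, top_overhang, bottom_overhang):
    """Return (top, bottom) oligos, both written 5'->3', for annealing.

    top_overhang and bottom_overhang are the cloning overhangs required by the
    chosen backbone; they are inputs here, not values taken from the source.
    """
    guide = guide.upper()
    if set(guide) - set("ACGT"):
        raise ValueError("guide must contain only A, C, G, T")
    top = top_overhang + guide
    bottom = bottom_overhang + reverse_complement(guide)
    return top, bottom

# Placeholder overhangs for illustration only; substitute the backbone's own.
top, bottom = design_oligos("GGTAGTCGTACTCGTCGTCG", "CACC", "AAAC")
```

After annealing, the guide portions of the two oligos pair with each other and the overhangs remain single-stranded for ligation into the digested backbone.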
  • B Annealing of two DNA oligos for each guideRNA target. Perform phosphorylation and annealing of each pair of oligos via reaction setup below.
  • the wild-type Cas9 plasmids for this step will be: pCas9-MS2-BB_BbsI (see list of plasmids at end of protocol) Item Volume Note as needed.
  • dCas9-SSAP Golden Gate Cloning of annealed oligos into sgRNA/dspCas9 (dCas9-SSAP) plasmid
  • dCas9-SSAP sgRNA/dspCas9
  • the backbone vectors for the cloning will bear BbsI cloning sites matching the annealed oligos from Step B.
  • the dCas9-SSAP plasmids for this step will be: pdCas9-SSAP-MS2-BB_BbsI (see list of plasmids at end of protocol) Item Volume Note e. [00457] D.
  • dCas9-SSAP plasmids Perform gene-editing via delivery of dCas9-SSAP plasmids and template DNA
  • the three components of dCas9-SSAP editing method are ready for experiments: the guideRNA/Cas9 plasmid (cloned in step A-C), the template D), and the SSAP plasmid (pMCP-RecT, can be obtained from Addgene).
  • routine transfection or electroporation could be performed following the recommended conditions by the reagent or equipment manufacturer and selected based on the cell types. For HEK293T cells as an example, a typical transfection condition is described below: [00461] 1.
  • Transfection material dCas9-SSAP guideRNA plasmids, 160 ng (for the double-sgRNA design, use equal amounts of the two guideRNA plasmids, e.g., 80 ng each); pMCP-RecT or GFP control plasmid, 60 ng; Template DNA, up to 30 ng.
  • 4. Mix plasmids with template DNA and perform transfection according to the manufacturer's protocol for HEK293T/Hela/HepG2/U2OS cells. [00465] 5. 12-24 hours after transfection, if applicable could switch to fresh media. [00466] 6. After at least 3 days post transfection, cells could be harvested or proceed to downstream experiments or analysis as needed.
  • p-dCas9-SSAP-MS2-BB_BbsI pU6-MS2-gRNA-backbone(BbsI)-CBH- es are: guides starting with sp indicate SpCas9 guide RNA targets, and guides starting with dsp indicate dSpCas9 guide RNA targets. Table 8.
  • DYNLT1 P2A-mKate knock-in HDR template sequence (SEQ ID NO:548) Left Homology Arm (italicized)-Insertion Sequence-Right Homology Arm (bold) the inserted mKate fluorescent protein sequence, the preceding non-underlined part is the P2A peptide sequence) AGTGACCTGTGTAATTATGCAGAAGAATGGAGCTGGATTACACACAGCAAGTTCCT TAAATGCTGCTCTCTTCCCTCCCGCAGGGAGCTGCACTGTGCGATGGGAATAA GACCATGTACTGCATCGTCAGTGCCTTCGGACTGTCTATTGGAAGCGGAGCTACTA GTATCCCTGTAGGTCACCTGCAGCCTGCGTTGCCACTTGTCTTAACTCTGAATATT TCATTTCAAAGGTGCTAAAATCTGAAATC
  • Applicants sought to functionally validate the ability of the dCas9-SSAP editor to insert diverse payloads at endogenous loci (FIG. 56A). Briefly, Applicants constructed knock-in donors with selectable payloads (Puromycin and Blasticidin resistance cassettes) as fusion proteins with endogenous genes (FIG. 56B, left). Applicants examined the knock-in results from dCas9-SSAP and the Cas9 reference using Western blot. Immunoblotting confirmed the presence and expected sizes of the knock-in fusion proteins using dCas9-SSAP across targets (HSP90AA1, ACTB) and payloads (FIG. 56B).
  • Editing efficiency was tested for a system combining SSAP with a reverse transcriptase and Cas9.
  • Cas9-RT pA131
  • Cas9(H840A) + RT expressing Cas9 nickase fused to reverse transcriptase (RT)
  • guideRNA with RNA template/donor [00485]
  • pA132 non-targeting control
  • pA132_HEK3_CTT_ins U6 driving guideRNA fused to RNA template/donor to insert, CTT, a 3bp sequence, at HEK3 genomic site in human genome
  • pA132_RNF2_GTA_ins U6 driving guideRNA fused to RNA template/donor to insert, GTA, a 3bp sequence, at RNF2 genomic site in human
  • Example 20 [00496] SSAP-RT for different lengths of genomic edits in HEK293T cells [00497] SSAP + Reverse Transcriptase with Cas9 [00498] 48-well plate HEK293T cell, the cell density was 60%. [00499] For lipofectamine 2000, 1086 ng DNA + 1 ul Lip2000, mix in 30 ul opti-MEM per well.
  • Cas9-RT [00502]
  • Cas9-RT [00503] pA131, Cas9(H840A) + RT: expressing Cas9 nickase fused to reverse transcriptase (RT)
  • guideRNA with RNA template/donor [00505]
  • pA132_HEK3_12_ins U6 driving guideRNA fused to RNA template/donor to insert 12bp sequence at HEK3 genomic site in human genome
  • pA132_HEK3_36_ins (U6 driving guideRNA fused to RNA template/donor to insert 36bp sequence at HEK3 genomic site in human genome)
  • SSAP-encoding plasmids were purified and quantified. [00523] Each SSAP encoding plasmid was tested in duplicate, including a negative control (same plasmid encoding Flag_HA which is not expected to promote gene editing). Transfections were in 96-well plates and transfection efficiency was estimated to be 50%. [00524] Knock-in templates: 1. HSP90AA1: gCK240+241, tm 66.1C, mKate/pCK1451/pCK1452 as PCR template 2.
  • ACTB gCK115+116, tm 63.6C, mKate/pCK1453/pCK1454 as PCR templateLG
  • mKate positive cells and cell viability were quantified across all replicates, along with positive (original RecT SSAP) and negative (Flag-HA control protein) controls.
  • Higher frequency of mKate+ cells indicates a candidate SSAP is more active (e.g., has higher ability to mediate precision knock-in editing of the kilobase-scale transgene).
  • the cell viability was measured by live cell counts via flow cytometry, to help quantify the fitness effect of SSAP on mammalian cells.
  • FIG.88 shows results of SSAP array screening, showing editing efficiency as fold over negative control or percent of mKate knock-in and cell viability for the ACTB target and the HSP90AA1 target.
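The "fold over negative control" normalization used to report screening results can be sketched in Python. The function names and example numbers are hypothetical; the sketch assumes duplicate measurements of percent mKate-positive cells per candidate and a negative control (Flag-HA) measured in the same way, and averages the fold values over the two targets.

```python
from statistics import mean

def fold_over_control(candidate_reps, control_reps):
    """Mean candidate signal divided by mean negative-control signal."""
    control = mean(control_reps)
    if control <= 0:
        raise ValueError("negative-control mean must be positive")
    return mean(candidate_reps) / control

def normalized_average(fold_by_target):
    """Average the fold-over-control values across targets."""
    return mean(list(fold_by_target.values()))

# Hypothetical duplicate measurements (% mKate+ cells).
folds = {
    "ACTB": fold_over_control([4.0, 6.0], [0.5, 0.5]),
    "HSP90AA1": fold_over_control([9.0, 11.0], [0.5, 0.5]),
}
score = normalized_average(folds)
```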
  • FIG. 89 shows normalized (89A) and absolute (89B) editing efficiency at HSP90AA1 compared to editing efficiency at ACTB.
  • FIG. 89C shows cell viability, comparing SSAP use for HSP90AA1 knock-ins with ACTB knock-ins.
  • FIG.90 provides plots comparing cell viability and editing efficiency, normalized (90A) and absolute (90B) over all targets and bar graphs illustrating normalized (C) or absolute (D) editing efficiency at ACTB and HSP90 for each of the SSAP candidates.
  • FIG. 91, 92, and 93 Alignments and phylogenic trees depicting related proteins and sequence alignments for several of the top targets are provided in FIG. 91, 92, and 93.
  • the alignments indicate certain conserved regions and motifs, consistent with regions of predicted 3D structure (e.g., FIG. 36, 37, 44, 53).
  • At least 3 regions are highly conserved: (1) the N-terminal part has a S/N/Y-R/K-F/L/I-rich region resembling a Serine/Tyrosine recombinase motif; (2) the middle part has a M-R/K-R/K-rich region; (3) the C-terminal part includes a D/E-D/E-F/Y region that resembles a transposase-like motif. Some candidate SSAPs may have one or more of these regions. This is also in agreement with the predicted 3D structure of SSAP and interaction of the SSAP with DNA that promotes homology-based recombination via highly-charged amino acids.
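A simple way to look for the three residue patterns in a candidate sequence is a regular-expression scan, sketched below in Python. Reading each "rich region" as a single three-residue pattern is a simplifying assumption made for illustration; the pattern names, the function, and the toy sequence are hypothetical, and real conserved regions would normally be identified from the alignments.

```python
import re

# Each pattern encodes one alternative residue set per position (an assumption).
MOTIFS = {
    "n_terminal": re.compile(r"[SNY][RK][FLI]"),
    "middle": re.compile(r"M[RK][RK]"),
    "c_terminal": re.compile(r"[DE][DE][FY]"),
}

def find_motifs(protein_seq):
    """Return 0-based start positions of non-overlapping matches for each pattern."""
    seq = protein_seq.upper()
    return {name: [m.start() for m in pattern.finditer(seq)]
            for name, pattern in MOTIFS.items()}

# Toy sequence constructed to contain one instance of each pattern.
hits = find_motifs("ASRFGGMRKAADEF")
```

This scan ignores where in the protein a match falls; restricting each pattern to the N-terminal, middle, or C-terminal portion would require an additional position check.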
  • Top scoring SSAP proteins are shown in Table 12.
  • the table shows editing efficiency as the normalized average of two targets (HSP90 and ACTB), absolute editing efficiency, and cell viability.
  • SSAP proteins are identified by Uniparc deposit number and SEQ ID NO. Alignment numbers correspond to SSAPs in FIG. 91, 92, and 93.
  • R2 element reverse transcriptase and related non-LTR retrotransposable element reverse transcriptase can be used.
  • Reference 1 Wilkinson ME, Frangieh CJ, Macrae RK, Zhang F. Structure of the R2 non-LTR retrotransposon initiating target-primed reverse transcription. Science. 2023 Apr 21;380(6642):301-308. doi: 10.1126/science.adg7883. Epub 2023 Apr 6. PMID: 37023171.
  • Reference 2 Deng P, Tan SQ, Yang QY, Fu L, Wu Y, Zhu HZ, Sun L, Bao Z, Lin Y, Zhang QC, Wang H, Wang J, Liu JG.
  • Reference 1 Heller RC, Chung S, Crissy K, Dumas K, Schuster D, Schoenfeld TW. Engineering of a thermostable viral polymerase using metagenome- derived diversity for highly sensitive and specific RT-PCR. Nucleic Acids Res. 2019 Apr 23;47(7):3619-3630. doi: 10.1093/nar/gkz104. PMID: 30767012; PMCID: PMC6468311.
  • Reference 2 Ellefson JW, Gollihar J, Shroff R, Shivram H, Iyer VR, Ellington AD. Synthetic evolutionary origin of a proofreading reverse transcriptase. Science.2016 Jun 24;352(6293):1590- 3. doi: 10.1126/science.aaf5409.
  • SEQ ID No. N1-N10 for the chimeric or fusion RT design.
  • Chimeric reverse transcriptase that are engineered by fusion, with or without peptide linker, between two reverse transcriptase can be used. These chimeric or fusion reverse transcriptase will have N-term from one reverse transcriptase and the C-term from another reverse transcriptase.
  • one type of fusion or chimeric reverse transcriptase consists of N-terminal polymerase domain of one reverse transcriptase and the C-terminal RNaseH domain of another reverse transcriptase, with the fusion site either before, within, or after the connection domain (originally located between the polymerase domain and the RNaseH domain).
  • R2 retron designs that showcase SSAP along with the R2 RNA element, or a family of non-long terminal repeat (non-LTR) retrotransposable elements called R2 retron, are as follows: [00536] R2 protein fused to Cas9 nickase (H840A) or dCas9 (D10A, H840A), with an MS2 aptamer fused to the R2 element RNA to recruit SSAP via MS2-coat-protein (MCP). Different versions of the fusion proteins, SEQ ID N126 for Cas9 nickase and SEQ ID N127 for dCas9, were designed.
  • the sequence is designed as SEQ ID N131-N136, with different location of the aptamer and number of aptamers.
  • SEQ ID N128 for Cas9 nickase and SEQ ID N129 for dCas9 were designed.
  • the R2 element basic RNA sequence is designed as SEQ ID N130. Table 13.
  • the cargo contains promoter-less mKate (~1 kb) for in-frame insertion into specific endogenous genome loci.
  • FIG. 104B through FIG. 104L show the gating strategy for detecting mKate+ knock-ins. Top candidates were then compared for activity using Cas9 nickase and Cas9 wild-type nuclease.
  • Fig.105 shows candidate SSAPs and D3 phage exonuclease.
  • Table 15 lists ERF family proteins identified in the screen, including SSAPs and exonucleases. Table 15.
  • PE-Max was used as a benchmark for comparison with SSAPs using SpCas9-H840A nickase. Two types of edits were compared: a 12-bp insertion using a 40-bp flap donor in HEK3 (see Anzalone et al., Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nature Biotechnology 40, 731-740 (2022)); and a lentiviral reporter to detect splice correction (see Gould et al., High-throughput evaluation of genetic variants with prime editing sensor libraries. Nature Biotechnology (2024) doi: 10.1038/s41587-024-02172-9). PE-Max system and reporter vectors are depicted in FIG.
  • SSAP vectors are depicted in FIG. 107A-FIG. 107C. [00542] As shown in FIG. 108 and FIG. 109, SSAPs enhanced PE efficiency in the study by both measures.
  • Example 25 [00543] Circular ssDNA donor with SSAP for high-efficiency genome insertion.
  • HEK293T cells were transfected with Cas9n or dCas9 with an MS2-guideRNA (GGTAGTCGTACTCGTCGTCG (SEQ ID NO:500)). SSAP was recruited using an MS2 aptamer.
  • the donor was circular single stranded DNA (cssDNA) RAB11A-mCherry including RAB11A flanking homology arms (FIG. 110).
  • cssDNA doses were 90 ng/well, 30 ng/well, and 10 ng/well.
  • Cells were subject to FACS analysis three days post-transfection. [00545] Controls were “no donor” and “non-SSAP.” As shown in FIG. 111A - FIG. 111C, SSAPs tested enhanced efficiency at all cssDNA doses tested.
  • a system or composition comprising: (i) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence; and (ii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iii) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) for expression in vivo in a cell; or, (iv) vector(s) containing the nucleic acid molecule(s) of (iii) for expression in vivo in a cell.
  • SSAP single stranded DNA annealing protein
  • SSB single stranded DNA binding protein
  • Paragraph 2 The system or composition of paragraph 1, wherein the system does not comprise a CRISPR protein, or does not comprise a Cas protein, or does not comprise a Cas9 protein, or does not comprise a Cas12a protein.
  • Paragraph 3 The system or composition of paragraph 1 or 2, further comprising a recruitment system comprising: at least one aptamer; and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
  • Paragraph 4. The system or composition of paragraph 3, wherein the at least one aptamer comprises an RNA aptamer or a peptide aptamer.
  • Paragraph 5 The system or composition of paragraph 4, wherein the nucleic acid molecule or nucleic acid molecules comprise two RNA aptamer sequences.
  • Paragraph 6 The system or composition of paragraph 5, wherein the two RNA aptamer sequences comprise the same sequence.
  • Paragraph 7. The system or composition of any of paragraphs 3 to 6, wherein the aptamer binding protein comprises a MS2 coat protein, or a functional derivative or variant thereof.
  • Paragraph 8. The system or composition of any of paragraphs 3 to 6, wherein the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof.
  • Paragraph 9. The system or composition of paragraph 3 or 4, wherein the at least one aptamer is linked to the guide RNA.
  • Paragraph 10. The system or composition of paragraph 9, wherein the guide RNA sequence comprises between 1 and 24 aptamer sequences.
  • NLS nuclear localization sequence
  • Paragraph 17 The system or composition of paragraph 16, wherein the nuclear localization sequence comprises the amino acid sequence of SEQ ID NO:16.
  • Paragraph 18 The system or composition of paragraph 16 or 17, wherein the nuclear localization sequence is on the recombination protein C-terminus or on the recombination protein N-terminus.
  • Paragraph 19 The system or composition of any one of paragraphs 1 to 18, wherein the recombination protein comprises a microbial recombination protein or active portion thereof.
  • Paragraph 20 The system or composition of any of paragraphs 1 to 15, including at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is linked to the recombination protein.
  • Paragraph 29 The system or composition of any of paragraphs 1 to 26, wherein the target DNA sequence is a genomic DNA sequence in a host cell.
  • Paragraph 29 The system or composition of any of paragraphs 3 to 28 wherein the aptamer binding protein and the recombination protein are functionally linked to each other and comprise a fusion protein.
  • Paragraph 30 A cell or eukaryotic cell comprising the system or composition of any one of paragraphs 1 to 29.
  • Paragraph 31 A method of altering a target genomic DNA sequence in a cell comprising a target genomic DNA sequence, comprising introducing the system or composition of any one of paragraphs 1 to 29 into the cell.
  • Paragraph 32 A method of altering a target genomic DNA sequence in a cell comprising a target genomic DNA sequence, comprising introducing the system or composition of any one of paragraphs 1 to 29 into the cell.
  • the method of any one of paragraphs 31 to 34, wherein the introducing into a cell comprises administering to a subject.
  • a system or composition comprising: (i) a nucleic acid polymerase(s); (ii) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription, or nucleic acid molecules comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription; and (iii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iv) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii) for expression in vivo in a cell; or, (v) vector(
  • Paragraph 44 The system or composition of paragraph 43 wherein (i), (ii) and (iii), further comprises a Cas protein; or (iv) comprises nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii) and/or a Cas protein for expression in vivo in a cell; or the vector(s) of (v) additionally contains nucleic acid molecule(s) encoding a Cas protein.
  • Paragraph 45 The system or composition of paragraph 43 or 44, further comprising a recruitment system comprising: at least one aptamer; and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
  • Paragraph 46 The system or composition of paragraph 45, wherein the at least one aptamer is an RNA aptamer or a peptide aptamer.
  • Paragraph 47 The system or composition of paragraph 46, wherein the nucleic acid molecule or nucleic acid molecules comprises two RNA aptamers.
  • Paragraph 48 The system or composition of paragraph 47, wherein the two RNA aptamer sequences comprise the same sequence.
  • Paragraph 49. The system or composition of any of paragraphs 45 to 47, wherein the aptamer binding protein comprises a MS2 coat protein, or a functional derivative or variant thereof.
  • Paragraph 50 The system or composition of any of paragraphs 45 to 47, wherein the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof.
  • Paragraph 51. The system or composition of paragraph 45, wherein the at least one aptamer sequence is linked to the Cas protein.
  • Paragraph 52. The system or composition of paragraph 45, wherein the at least one aptamer sequence is linked to the guide RNA.
  • Paragraph 53. The system or composition of paragraph 45, wherein the recruitment system comprises from 1 to 24 aptamers.
  • Paragraph 54 The system or composition of any one of paragraphs 51 to 53, wherein two or more aptamers comprise the same sequence.
  • any of paragraphs 43 to 58 including at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is / are linked to the recombination protein or the Cas protein or the reverse transcriptase or at least one NLS on each at least two or three of the recombination protein, the reverse transcriptase or the Cas protein.
  • Paragraph 60. the nuclear localization sequence comprises the amino acid sequence of SEQ ID NO:16.
  • paragraph 59 or 60 which comprises a NLS on one or more of the recombination protein C-terminus, the recombination protein N-terminus or on the Cas protein.
  • Paragraph 62 The system or composition of any one of paragraphs 43 to 61, wherein the recombination protein comprises a microbial recombination protein or active portion thereof.
  • Paragraph 63 The system or composition of any one of paragraphs 43 to 61, wherein the recombination protein comprises a mitochondrial recombination protein or active portion thereof.
  • Paragraph 64 The system or composition of any one of paragraphs 43 to 61, wherein the recombination protein comprises a viral recombination protein or active portion thereof.
  • Paragraph 65 The system or composition of paragraph 59 or 60, which comprises a NLS on one or more of the recombination protein C-terminus, the recombination protein N-terminus or on the Cas protein.
  • Paragraph 76 The system or composition of any of paragraphs 43 to 75, wherein the target DNA sequence is a genomic DNA sequence in a host cell.
  • Paragraph 77 The system or composition of any of paragraphs 43 to 76, wherein the nucleic acid polymerase comprises reverse transcriptase activity.
  • Paragraph 78 The system or composition of any of paragraphs 43 to 76, wherein the nucleic acid polymerase comprises a retron RT.
  • Paragraph 79. The system or composition of any one of paragraphs 43 to 78, wherein the nucleic acid polymerase and recombination protein are functionally linked to each other and comprise a fusion protein.
  • Paragraph 80 The system or composition of any one of paragraphs 43 to 78, wherein the nucleic acid polymerase and recombination protein are functionally linked to each other and comprise a fusion protein.
  • Paragraph 84 A cell or eukaryotic cell comprising the system or composition of any one of paragraphs 43 to 83.
  • Paragraph 85 A method of altering a target genomic DNA sequence in a cell comprising a target genomic DNA sequence, comprising introducing the system or composition of any one of paragraphs 43 to 83 into the cell.
  • Paragraph 86 The cell or eukaryotic cell of paragraph 84 or the method of paragraph 85, wherein the cell or eukaryotic cell is a mammalian cell.
  • Paragraph 87 The cell or eukaryotic cell of paragraph 84 or the method of paragraph 85, wherein the cell or eukaryotic cell is a mammalian cell.
  • Paragraph 88. The cell or eukaryotic cell or method of any of paragraphs 84 to 87, wherein the cell or eukaryotic cell or mammalian cell is a stem cell.
  • Paragraph 89. The method of any one of paragraphs 85 to 88, wherein the target genomic DNA sequence encodes a gene product.
  • the method of any one of paragraphs 85 to 88, wherein the introducing into a cell comprises administering to a subject.
  • Paragraph 91. The method of paragraph 90, wherein the subject is a mammalian non-human animal or a human.
  • Paragraph 93 The method of any one of paragraphs 85 to 88, wherein the cell or eukaryotic cell or mammalian cell is an ex vivo or in vitro cell.
  • Paragraph 94 The method of paragraph 93, further comprising, after the introducing step, administering to a subject the ex vivo or in vitro cells.
  • Paragraph 95 The method of paragraph 90, wherein the subject is a mammalian non-human animal or a human.
  • Paragraph 96 Use of the system or composition of any one of paragraphs 43 to 83 for the alteration of a target DNA sequence in a cell.
  • Paragraph 97 The method of paragraph 90 or 91, wherein the administering comprises in vivo administration.
  • a method of recombination which comprises providing in a cell, a system or composition comprising: (i) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence; wherein the target DNA sequence comprises a genomic DNA sequence in the cell, and (ii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iii) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) for expression in vivo in a cell; or, (iv) vector(s) containing the nucleic acid molecule(s) of (iii) for expression in vivo in a cell.
  • Paragraph 98 The method of paragraph 97, wherein (i) and (ii) further comprises a Cas protein or a reverse transcriptase (RT) or a Cas protein and RT; or (iii) comprises nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or a Cas protein and/or a nucleic acid polymerase for expression in vivo in the cell; or the vector(s) of (iv) additionally contains nucleic acid molecule(s) encoding a Cas protein and/or RT.
  • Paragraph 99. the target DNA sequence comprises a genomic sequence of albumin (ALB), AAVS1, HSP90AA1, DYNLT1, ACTB, BCAP31, HIST1H2BK, CLTA, or RAB11A.
  • Paragraph 100 The method of any of paragraphs 97 to 99, further comprising a recruitment system comprising: at least one aptamer; and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
  • Paragraph 101 The method of paragraph 100, wherein the at least one aptamer is an RNA aptamer or a peptide aptamer.
  • Paragraph 102 [...] nucleic acid molecule or nucleic acid molecules comprises two RNA aptamers.
  • Paragraph 103 The method of paragraph 102, wherein the two RNA aptamer sequences comprise the same sequence.
  • Paragraph 104. The method of any of paragraphs 100 to 102, wherein the aptamer binding protein comprises a MS2 coat protein, or a functional derivative or variant thereof.
  • Paragraph 105. The method of any of paragraphs 100 to 102, wherein the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof.
  • Paragraph 106 The method of paragraph 100, wherein the at least one aptamer is linked to the Cas protein. Paragraph 107.
  • any of paragraphs 100 to 111 wherein further comprises a linker between the recombination protein and the aptamer binding protein.
  • Paragraph 113. The method of paragraph 112, wherein the linker comprises the amino acid sequence of SEQ ID NO:15.
  • Paragraph 114 The method of any of paragraphs 97 to 113, including at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is / are linked to the recombination protein or the Cas protein or the reverse transcriptase or at least one NLS on each at least two or three of the recombination protein, the reverse transcriptase or the Cas protein.
  • the nuclear localization sequence comprises the amino acid sequence of SEQ ID NO:16.
  • Paragraph 116 The method of paragraph 114 or 115, wherein the system comprises a NLS on one or more of the recombination protein C-terminus, the recombination protein N-terminus or on the Cas protein.
  • Paragraph 117 The method of any one of paragraphs 97 to 116, wherein the recombination protein comprises a microbial recombination protein or active portion thereof.
  • Paragraph 118. The method of any one of paragraphs 97 to 116, wherein the recombination protein comprises a mitochondrial recombination protein or active portion thereof.
  • Paragraph 119 The method of paragraph 114, wherein the nuclear localization sequence comprises the amino acid sequence of SEQ ID NO:16.
  • Paragraph 142 The method of any of paragraphs 97 to 141 wherein the cell or eukaryotic cell or mammalian cell or human cell is a stem cell.
  • Paragraph 143 A recombinant cell or organism produced by the method of any of claims 97 to 142.
* * *
[00549] Having thus described in detail preferred embodiments of the present invention, it is to be understood that the invention defined by the above paragraphs is not to be limited to particular details set forth in the above description as many apparent variations thereof are possible without departing from the spirit or scope of the present invention.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Organic Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Plant Pathology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Mycology (AREA)
  • Medicinal Chemistry (AREA)
  • Cell Biology (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

The present invention provides recombineering-editing systems using CRISPR and recombination enzymes as well as methods, vectors, nucleic acid compositions, and kits thereof. The methods and systems provide means for altering target DNA, including genomic DNA in a host cell.

Description

STDU2-42312.601 (S22-113) RNA-GUIDED GENOME RECOMBINEERING AT KILOBASE SCALE
RELATED APPLICATIONS AND INCORPORATION BY REFERENCE
[0001] This application claims priority to US provisional application Serial No. 63/533,192, filed August 17, 2023, which is incorporated by reference herein in its entirety.
[0002] Reference is made to PCT/US2023/062406, filed February 10, 2023, PCT/US2022/075850, filed September 1, 2022, and PCT/US2021/020513, filed March 2, 2021.
[0003] The foregoing applications, and all documents cited therein or during their prosecution (“appln cited documents”) and all documents cited or referenced in the appln cited documents, and all documents cited or referenced herein (“herein cited documents”), and all documents cited or referenced in herein cited documents, together with any manufacturer’s instructions, descriptions, product specifications, and product sheets for any products mentioned herein or in any document incorporated by reference herein, are hereby incorporated herein by reference, and may be employed in the practice of the invention. More specifically, all referenced documents are incorporated by reference to the same extent as if each individual document was specifically and individually indicated to be incorporated by reference.
SEQUENCE LISTING
[0004] The instant application contains a Sequence Listing which has been submitted via Patent Center and is hereby incorporated by reference in its entirety. Said .xml copy, created on August 18, 2024, is named STDU2-42312601.xml, and is 1,096,300 bytes in size.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
[0005] This invention was made with Government support under contracts GM141627 and HG011316 awarded by the National Institutes of Health. The Government has certain rights in the invention.
FIELD OF THE INVENTION
[0006] The present invention relates to RNA-guided recombineering-editing systems using phage recombination enzymes as well as methods, vectors, nucleic acid compositions, and kits thereof.
BACKGROUND OF THE INVENTION
[0007] The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) system, originally found in bacteria and archaea as part of the immune system to defend against invading viruses, forms the basis for genome editing technologies that can be programmed to target specific stretches of a genome or other DNA for editing at precise locations. While various CRISPR-based tools are available, the majority are geared towards editing short sequences. Long-sequence editing is highly sought after in the engineering of model systems, therapeutic cell production and gene therapy. Prior studies have developed technologies to improve Cas9-mediated homology-directed repair (HDR) (K. S. Pawelczak, et al., ACS Chem. Biol. 13, 389–396 (2018)), and tools leveraging nucleic acid modification enzymes with Cas9, e.g., prime-editing (A. V. Anzalone, et al., Nature. 576, 149–157 (2019)) that demonstrated editing up to 80 base-pairs (bp) in length. Despite this progress, there is continued demand for large-scale mammalian genome engineering with high efficiency and fidelity.
[0008] Citation or identification of any document in this application is not an admission that such document is available as prior art to the present invention.
SUMMARY OF THE INVENTION
[0009] Provided herein are systems and methods that facilitate nucleic acid editing in a manner that allows large-scale nucleic acid editing with high accuracy and low off-target errors. These systems and methods employ a recombination protein component and optionally a CRISPR component.
[0010] The Pseudomonas bacteriophage D3 ERF protein (also known as orf52) is involved in circularization of the linear dsDNA phage genome upon entry into the host cell. Homologs of ERF are encoded in several bacterial and phage genomes from diverse taxa. ERF proteins form a family that shows no evolutionary relationship to other single-stranded annealing proteins (SSAPs) such as RecT and Redβ. The ERF proteins often function with an exonuclease (e.g., phage D3 exonuclease, also known as orf51) in viral two-component recombinases.
[0011] Disclosed herein are systems comprising a binding protein, a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence, and a recombination protein. The recombination protein may be a single stranded DNA annealing protein (SSAP), including but not limited to a microbial recombination protein, for example D3 ERF (orf52), D3 Exo (orf51), RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof. In some embodiments, the system further comprises a donor nucleic acid, including but not limited to donor DNA or donor RNA. In some embodiments, the system further comprises a nucleic acid polymerase, such as without limitation, a reverse transcriptase. In some embodiments, the target DNA sequence is a genomic DNA sequence in a host cell. In certain embodiments, there is no CRISPR component. In certain embodiments, the system comprises a recruitment system which recruits the recombination protein and a nucleic acid that directs the recombination protein to a target. In certain embodiments, the recruitment system recruits the recombination protein, the nucleic acid that directs the recombination protein, and a CRISPR component.
[0012] In an aspect, the invention provides a recombination system or composition comprising (i) a Cas protein; (ii) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence; and (iii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iv) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii) for expression in vivo in a cell; or, (v) vector(s) containing the nucleic acid molecule(s) of (iii) for expression in vivo in a cell. In certain embodiments, the recombination protein comprises an amino acid sequence with at least 70% similarity or identity, or at least 75% similarity or identity, or at least 80% similarity or identity, or at least 85% similarity or identity, or at least 90% similarity or identity, or at least 92% similarity or identity, or at least 95% similarity or identity, or at least 96% similarity or identity, or at least 97% similarity or identity, or at least 98% similarity or identity, or at least 99% similarity or identity, or is identical to SEQ ID NO:779; SEQ ID NO:766; SEQ ID NO:767; SEQ ID NO:768; SEQ ID NO:769; SEQ ID NO:770; SEQ ID NO:771; SEQ ID NO:772; SEQ ID NO:773; SEQ ID NO:774; SEQ ID NO:775; SEQ ID NO:776; SEQ ID NO:777; SEQ ID NO:778; SEQ ID NO:780; SEQ ID NO:781; SEQ ID NO:782; SEQ ID NO:783; SEQ ID NO:784; SEQ ID NO:785; or SEQ ID NO:786, or derivative or variant or functional portion thereof.
[0013] In certain embodiments, the system or composition comprises a recruitment system for recruiting a guide nucleic acid and a recombination protein. In certain embodiments, the recruitment system comprises at least one aptamer sequence and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
In certain embodiments, the at least one aptamer sequence is an RNA aptamer sequence or a peptide aptamer sequence. In certain embodiments, the nucleic acid molecule or nucleic acid molecules additionally comprises the at least one RNA aptamer sequence or comprises one, two, three, or more RNA aptamer sequences. In certain embodiments two aptamer sequences comprise the same sequence or comprise sequences that bind to the same aptamer binding protein.
[0014] In certain embodiments, the aptamer binding protein comprises a MS2 coat protein, or a functional derivative or variant thereof. In certain embodiments, the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof. In certain embodiments, the at least one peptide aptamer sequence is conjugated to the guide RNA. In certain embodiments, the at least one peptide aptamer sequence comprises between 1 and 24 peptide aptamer sequences. In certain embodiments, two or more aptamer sequences comprise the same sequence. In certain embodiments, an aptamer sequence comprises a GCN4 peptide sequence.
[0015] In certain embodiments, the recombination protein N-terminus is linked to the aptamer binding protein C-terminus. In certain embodiments, the recombination protein and the aptamer binding protein are operably linked by a linker.
[0016] In certain embodiments, the recombination system or composition comprises at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is linked to the recombination protein. In certain embodiments, the NLS is located at the recombination protein C-terminus or at the recombination protein N-terminus.
[0017] In certain embodiments, the recombination protein comprises a microbial recombination protein or active portion thereof, a mitochondrial recombination protein or active portion thereof, a viral recombination protein or active portion thereof, or a eukaryotic recombination protein or active portion thereof, including without limitation, a recombination protein set forth in Table 12 or derivative or variant or functional portion thereof.
In certain embodiments, the recombination protein comprises an amino acid sequence with at least 70% identity, or at least 75% identity, or at least 80% identity, or at least 85% identity, or at least 90% identity, or at least 92% identity, or at least 95% identity, or at least 96% identity, or at least 97% identity, or at least 98% identity, or at least 99% identity to a recombination protein set forth in Table 12 or derivative or variant or functional portion thereof.
[0018] In certain embodiments, the system or composition comprises a donor nucleic acid. In certain embodiments, the donor nucleic acid comprises a single stranded nucleic acid which can comprise RNA and/or DNA and/or modified nucleotides. In certain embodiments, the donor nucleic acid comprises a double stranded nucleic acid which can comprise RNA and/or modified nucleotides. In certain embodiments, the donor nucleic acid comprises RNA. In certain embodiments, the donor nucleic acid comprises DNA. In certain embodiments, the donor nucleic acid comprises homology arms.
[0019] In certain embodiments, the target DNA sequence comprises a genomic DNA sequence in a host cell. In certain embodiments, the target DNA sequence comprises a mitochondrial DNA or a plastid or chloroplast DNA in a host cell. In certain embodiments, the target DNA is an episomal or viral nucleic acid sequence in a host cell.
[0020] In certain embodiments, the recombination system is comprised in a cell, for example, a eukaryotic cell, a mammalian cell, an animal cell, a human cell, or a plant cell.
[0021] The recruitment system is adaptable to a multitude of combinations and configurations of recombination proteins. For example, by selecting and incorporating multiple nucleic acid aptamers, the system can comprise multiple recombination proteins, which may be the same or different and in various ratios. In certain embodiments, the system comprises an exonuclease. In certain embodiments, the system comprises an SSAP. In certain embodiments, the system comprises an SSB. In certain embodiments, the system comprises an exonuclease and an SSAP. In certain embodiments, the system comprises an exonuclease and an SSB. In certain embodiments, the system comprises an SSAP and an SSB. In certain embodiments, the system comprises an exonuclease and an SSAP and does not comprise an SSB. In certain embodiments, the system comprises an exonuclease and an SSB and does not comprise an SSAP. In certain embodiments, the system comprises an SSAP and an SSB and does not comprise an exonuclease. In certain embodiments, the system comprises an exonuclease, an SSAP, and an SSB.
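By way of non-limiting illustration, the combinations of recombination protein classes recited above (exonuclease, SSAP, SSB, alone or together) correspond to the non-empty subsets of three component classes. The following minimal Python sketch enumerates them. The names `COMPONENTS` and `recombination_configurations` are hypothetical labels introduced only for this illustration and do not appear in the specification; the sketch models which classes are present and nothing about ratios, identity of the proteins, or recruitment.

```python
from itertools import combinations

# Component classes named in the text; the labels are illustrative only.
COMPONENTS = ("exonuclease", "SSAP", "SSB")


def recombination_configurations(components=COMPONENTS):
    """Return every non-empty subset of the component classes, smallest first."""
    configs = []
    for size in range(1, len(components) + 1):
        for subset in combinations(components, size):
            configs.append(subset)
    return configs


if __name__ == "__main__":
    for included in recombination_configurations():
        # Classes not in the subset are the ones a given embodiment omits.
        excluded = [c for c in COMPONENTS if c not in included]
        print(
            "comprises:", ", ".join(included),
            "| does not comprise:", ", ".join(excluded) or "none",
        )
```

Under these assumptions the enumeration yields seven configurations: three single-class systems, three two-class systems, and one system comprising all three classes.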
[0022] In an aspect, the invention provides a recombination system comprising a recombination protein and a nucleic acid polymerase, including but not limited to a reverse transcriptase (RT). In certain embodiments, the invention provides a system or composition comprising: (i) a reverse transcriptase(s) (RT); (ii) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription, or nucleic acid molecules comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription; and (iii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iv) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) for expression in vivo in a cell; or, (v) vector(s) containing the nucleic acid molecule(s) of (iv) for expression in vivo in a cell.
[0023] In certain embodiments, the recombination protein comprises an amino acid sequence with at least 70% similarity or identity, or at least 75% similarity or identity, or at least 80% similarity or identity, or at least 85% similarity or identity, or at least 90% similarity or identity, or at least 92% similarity or identity, or at least 95% similarity or identity, or at least 96% similarity or identity, or at least 97% similarity or identity, or at least 98% similarity or identity, or at least 99% similarity or identity, or is identical to SEQ ID NO:779; SEQ ID NO:766; SEQ ID NO:767; SEQ ID NO:768; SEQ ID NO:769; SEQ ID NO:770; SEQ ID NO:771; SEQ ID NO:772; SEQ ID NO:773; SEQ ID NO:774; SEQ ID NO:775; SEQ ID NO:776; SEQ ID NO:777; SEQ ID NO:778; SEQ ID NO:780; SEQ ID NO:781; SEQ ID NO:782; SEQ ID NO:783; SEQ ID NO:784; SEQ ID NO:785; or SEQ ID NO:786, or derivative or variant or functional portion thereof.
[0024] In certain embodiments, the nucleic acid polymerase comprises a reverse transcriptase, such as but not limited to a reverse transcriptase which comprises an amino acid sequence having at least 70% similarity or 70% identity, or at least 75% similarity or identity, or at least 80% similarity or identity, or at least 85% similarity or identity, or at least 90% similarity or identity, or at least 92% similarity or identity, or at least 95% similarity or identity, or at least 96% similarity or identity, or at least 97% similarity or identity, or at least 98% similarity or identity, or at least 99% similarity or identity, or is identical to the reverse transcriptase of any one of SEQ ID NO:627 to SEQ ID NO:755.
[0025] In certain embodiments, the recombination system comprises a recombination protein and a prime editor.
Non-limiting examples of prime editors include prime editor 1, which comprises a wild-type Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase fused to the Cas9 H840A nickase C-terminus; prime editor 2, which comprises mutant M-MLV RT (D200N/L603W/T330P/T306K/W313F) fused to the Cas9 H840A nickase C-terminus; prime editor 3, which comprises a nicking guide to nick the unedited strand in a prime editor system; prime editor systems which comprise a component that knocks down endogenous mismatch repair; prime editor systems that comprise Cas9 nuclease instead of Cas9 nickase; and twin prime editors.
[0026] In certain embodiments, the system or composition further comprises a Cas protein; or (iv) comprises nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or a Cas protein for expression in vivo in a cell; or the vector(s) of (v) additionally contains nucleic acid molecule(s) encoding a Cas protein.
[0027] In certain embodiments, one or more of the components is provided as a complex. For example, a protein or a fusion protein and a nucleic acid are provided as a ribonucleoprotein (RNP). Non-limiting examples of an RNP include a CRISPR-guide RNA complex and an SSAP-guide RNA complex. In certain embodiments, a fusion protein comprises one or more components. Non-limiting examples include a Cas9-SSAP fusion, a Cas9-RT fusion, and a SSAP-RT fusion.
[0028] In certain embodiments, the system or composition comprises a recruitment system for recruiting a guide nucleic acid and a recombination protein. In certain embodiments, the recruitment system comprises at least one aptamer sequence and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein. In certain embodiments, the at least one aptamer sequence is an RNA aptamer sequence or a peptide aptamer sequence. In certain embodiments, the nucleic acid molecule or nucleic acid molecules additionally comprises the at least one RNA aptamer sequence or comprises one, two, three, or more RNA aptamer sequences. In certain embodiments two aptamer sequences comprise the same sequence or comprise sequences that bind to the same aptamer binding protein.
[0029] In certain embodiments, the aptamer binding protein comprises a MS2 coat protein, or a functional derivative or variant thereof. In certain embodiments, the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof. In certain embodiments, the at least one peptide aptamer sequence is conjugated to the guide RNA. In certain embodiments, the at least one peptide aptamer sequence comprises between 1 and 24 peptide aptamer sequences.
In certain embodiments, two or more aptamer sequences comprise the same sequence. In certain embodiments, an aptamer sequence comprises a GCN4 peptide sequence.
[0030] In certain embodiments, the recombination protein N-terminus is linked to the aptamer binding protein C-terminus. In certain embodiments, the recombination protein and the aptamer binding protein are operably linked by a linker.
[0031] In certain embodiments, the recombination system or composition comprises at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is linked to the recombination protein. In certain embodiments, the NLS is located at the recombination protein C-terminus or at the recombination protein N-terminus.
[0032] In certain embodiments, the recombination protein comprises a microbial recombination protein or active portion thereof, a mitochondrial recombination protein or active portion thereof, a viral recombination protein or active portion thereof, or a eukaryotic recombination protein or active portion thereof, including without limitation, a recombination protein set forth in Table 12 or derivative or variant or functional portion thereof. In certain embodiments, the recombination protein comprises an amino acid sequence with at least 70% identity, or at least 75% identity, or at least 80% identity, or at least 85% identity, or at least 90% identity, or at least 92% identity, or at least 95% identity, or at least 96% identity, or at least 97% identity, or at least 98% identity, or at least 99% identity to a recombination protein set forth in Table 12 or derivative or variant or functional portion thereof.
[0033] In certain embodiments, the system or composition comprises a donor nucleic acid. In certain embodiments, the donor nucleic acid comprises homology arms.
[0034] In certain embodiments, the recombination system is comprised in a cell, for example, a eukaryotic cell, a mammalian cell, an animal cell, a human cell, or a plant cell.
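By way of non-limiting illustration, the percent-identity thresholds recited above for the recombination protein (at least 70% through at least 99%) can be checked computationally once two amino acid sequences have been aligned. The following minimal Python sketch assumes that identity is counted over aligned columns in which neither sequence has a gap; this passage does not specify a particular alignment tool or identity convention, and other conventions (for example, dividing by the full alignment length) give different values. The function names `percent_identity` and `thresholds_met` and the toy sequences in the usage note are hypothetical and are not taken from the Sequence Listing.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes the two inputs are already aligned to the same length; this
    convention is an illustrative assumption, not one stated in the text.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Thresholds recited in the text, in percent.
THRESHOLDS = (70, 75, 80, 85, 90, 92, 95, 96, 97, 98, 99)


def thresholds_met(identity: float, thresholds=THRESHOLDS):
    """Return the recited thresholds that a given percent identity satisfies."""
    return [t for t in thresholds if identity >= t]
```

For example, under this convention two made-up ten-residue strings that differ at one position, such as `"MKTAYIAKQR"` and `"MKTAYIAKQL"`, have nine matching columns out of ten, so `thresholds_met(percent_identity("MKTAYIAKQR", "MKTAYIAKQL"))` lists the thresholds from 70 up to and including 90.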
[0035] In an aspect, the invention provides a system or composition comprising: (i) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence; and (ii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iii) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) for expression in vivo in a cell; or, (iv) vector(s) containing the nucleic acid molecule(s) of (iii) for expression in vivo in a cell; wherein the recombination protein comprises an amino acid sequence with at least 70% similarity or 70% identity, or at least 75% similarity or identity, or at least 80% similarity or identity, or at least 85% similarity of identity, or at least 90% similarity or identity, or at least 92% similarity or identity, or at least 95% similarity or identity, or at least 96% similarity or identity, or at least 97% similarity or identity, or at least 98% similarity or identity, or at least 99% similarity or identity, or is identical to any one of SEQ ID NO:779; SEQ ID NO:766; SEQ ID NO:767; SEQ ID NO:768; SEQ ID NO:769; SEQ ID NO:770; SEQ ID NO:771; SEQ ID NO:772; SEQ ID NO:773; SEQ ID NO:774; SEQ ID NO:775; SEQ ID NO:776; SEQ ID NO:777; SEQ ID NO:778; SEQ ID NO:780; SEQ ID NO:781; SEQ ID NO:782; SEQ ID STDU2-42312.601 (S22-113) NO:783; SEQ ID NO:784; SEQ ID NO:785; or SEQ ID NO:786, or derivative or variant or
functional portion thereof. [0036] In certain embodiments, the system or composition does not comprise a CRISPR protein, or does not comprise a Cas protein, or does not comprise a Cas9 protein, or does not comprise a Cas12a protein. [0037] In an aspect, the invention provides a method of recombination, which comprises providing in a cell a system or composition comprising (i) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence, wherein the target DNA sequence comprises a genomic DNA sequence in the cell; and (ii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iii) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) for expression in vivo in a cell; or, (iv) vector(s) containing the nucleic acid molecule(s) of (iii) for expression in vivo in a cell. [0038] In certain embodiments, (i) and (ii) further comprise a Cas protein or a nucleic acid polymerase, including but not limited to a native or engineered polymerase having reverse transcriptase activity such as a reverse transcriptase (RT), or a Cas protein and RT; or (iii) comprises nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or a Cas protein and/or an RT for expression in vivo in the cell; or the vector(s) of (iv) additionally contain nucleic acid molecule(s) encoding a Cas protein and/or RT. 
[0039] In certain embodiments, the recombination protein comprises an amino acid sequence with at least 70% similarity or identity, or at least 75% similarity or identity, or at least 80% similarity or identity, or at least 85% similarity or identity, or at least 90% similarity or identity, or at least 92% similarity or identity, or at least 95% similarity or identity, or at least 96% similarity or identity, or at least 97% similarity or identity, or at least 98% similarity or identity, or at least 99% similarity or identity, or is identical to SEQ ID NO:779; SEQ ID NO:766; SEQ ID NO:767; SEQ ID NO:768; SEQ ID NO:769; SEQ ID NO:770; SEQ ID NO:771; SEQ ID NO:772; SEQ ID NO:773; SEQ ID NO:774; SEQ ID NO:775; SEQ ID NO:776; SEQ ID NO:777; SEQ ID NO:778; SEQ ID NO:780; SEQ ID NO:781; SEQ ID NO:782; SEQ ID NO:783; SEQ ID NO:784; SEQ ID NO:785; or SEQ ID NO:786, or derivative or variant or functional portion thereof. [0040] In certain embodiments, the nucleic acid polymerase comprises a reverse transcriptase,
such as but not limited to a reverse transcriptase which comprises an amino acid sequence with at least 70% similarity or identity, or at least 75% similarity or identity, or at least 80% similarity or identity, or at least 85% similarity or identity, or at least 90% similarity or identity, or at least 92% similarity or identity, or at least 95% similarity or identity, or at least 96% similarity or identity, or at least 97% similarity or identity, or at least 98% similarity or identity, or at least 99% similarity or identity, or is identical to the reverse transcriptase of any one of SEQ ID NO:627 to SEQ ID NO:755. [0041] In certain embodiments, one or more of the components is provided as a complex. For example, a protein or a fusion protein and a nucleic acid are provided as a ribonucleoprotein (RNP). Non-limiting examples of an RNP include a CRISPR-guide RNA complex and an SSAP-guide RNA complex. In certain embodiments, a fusion protein comprises one or more components. Non-limiting examples include a Cas9-SSAP fusion, a Cas9-RT fusion, and an SSAP-RT fusion. [0042] In certain embodiments, the target DNA sequence comprises a genomic sequence of albumin (ALB), AAVS1, HSP90AA1, DYNLT1, ACTB, BCAP31, HIST1H2BK, CLTA, or RAB11A. [0043] In certain embodiments, the system or composition comprises a recruitment system for recruiting a guide nucleic acid and a recombination protein. In certain embodiments, the recruitment system comprises at least one aptamer sequence and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein. In certain embodiments, the at least one aptamer sequence is an RNA aptamer sequence or a peptide aptamer sequence. In certain embodiments, the nucleic acid molecule or nucleic acid molecules additionally comprise the at least one RNA aptamer sequence or comprise one, two, three, or more RNA aptamer sequences. 
In certain embodiments, two aptamer sequences comprise the same sequence or comprise sequences that bind to the same aptamer binding protein. [0044] In certain embodiments, the aptamer binding protein comprises an MS2 coat protein, or a functional derivative or variant thereof. In certain embodiments, the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof. In certain embodiments, the at least one peptide aptamer sequence is conjugated to the guide RNA. In certain embodiments, the at least one peptide aptamer sequence comprises between 1 and 24 peptide aptamer sequences. In certain embodiments, two or more aptamer sequences comprise the same sequence. In certain
embodiments, an aptamer sequence comprises a GCN4 peptide sequence. [0045] In certain embodiments, the recombination protein N-terminus is linked to the aptamer binding protein C-terminus. In certain embodiments, the recombination protein and the aptamer binding protein are operably linked by a linker. In certain embodiments, the linker comprises the amino acid sequence of SEQ ID NO:15. [0046] In certain embodiments, the recombination system or composition comprises at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is linked to the recombination protein. In certain embodiments, the NLS comprises the amino acid sequence of SEQ ID NO:16. In certain embodiments, the NLS is located at the recombination protein C-terminus or at the recombination protein N-terminus. [0047] In certain embodiments, the recombination protein comprises a microbial recombination protein or active portion thereof, a mitochondrial recombination protein or active portion thereof, a viral recombination protein or active portion thereof, or a eukaryotic recombination protein or active portion thereof, including without limitation, a recombination protein set forth in Table 12 or derivative or variant or functional portion thereof. In certain embodiments, the recombination protein comprises an amino acid sequence with at least 70% identity, or at least 75% identity, or at least 80% identity, or at least 85% identity, or at least 90% identity, or at least 92% identity, or at least 95% identity, or at least 96% identity, or at least 97% identity, or at least 98% identity, or at least 99% identity to a recombination protein set forth in Table 12 or derivative or variant or functional portion thereof. [0048] In certain embodiments, the system or composition comprises a donor nucleic acid. In certain embodiments, the donor nucleic acid comprises homology arms. 
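To make the notion of a donor nucleic acid with homology arms concrete, the following Python sketch assembles a donor as a left arm, an insert, and a right arm taken from the sequence flanking an intended insertion site. The function name, the 0-based coordinate convention, the symmetric arm length, and the toy sequences are all assumptions for illustration only; the disclosure does not fix arm lengths or a donor layout here.

```python
def build_donor(genome: str, site: int, insert: str, arm_length: int) -> str:
    """Return left_arm + insert + right_arm for an insertion at `site`.

    `site` is a 0-based position between bases: the left arm is the
    `arm_length` bases ending at `site`, and the right arm is the
    `arm_length` bases starting at `site`. All names are hypothetical.
    """
    if arm_length <= 0:
        raise ValueError("arm_length must be positive")
    if site - arm_length < 0 or site + arm_length > len(genome):
        raise ValueError("homology arms would run past the sequence ends")
    left_arm = genome[site - arm_length:site]
    right_arm = genome[site:site + arm_length]
    return left_arm + insert + right_arm


if __name__ == "__main__":
    # Toy sequences, not taken from the disclosure.
    toy_genome = "AAAACCCCGGGGTTTT"
    toy_insert = "ATGATG"
    print(build_donor(toy_genome, site=8, insert=toy_insert, arm_length=4))
```

Because each arm matches the genomic sequence on one side of the site, the payload sits exactly where the two arms meet in the target locus.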
[0049] In certain embodiments, the recombination system is comprised in a cell, for example, a eukaryotic cell, a mammalian cell, an animal cell, a human cell, or a plant cell. [0050] In some embodiments, the Cas protein is Cas9 or Cas12a. In some embodiments, the Cas protein is catalytically dead. In some embodiments, the Cas9 protein is wild-type Streptococcus pyogenes Cas9 or a wild-type Staphylococcus aureus Cas9. In some embodiments, the Cas9 protein is a Cas9 nickase (e.g., wild-type Streptococcus pyogenes Cas9 with the amino acid substitution D10A at position 10). [0051] Also disclosed is a eukaryotic cell comprising the systems or vectors disclosed herein. [0052] Further disclosed herein are methods of altering a target genomic DNA sequence in a host cell. The methods comprise contacting the systems, compositions, or vectors described herein with a target DNA sequence (e.g., introducing the systems, compositions, or vectors described herein into a host cell comprising a target genomic DNA sequence). Kits containing one or more reagents or other components useful, necessary, or sufficient for practicing any of the methods are also disclosed herein. 
[0053] In some embodiments, the invention provides a system or composition comprising: (i) a nucleic acid polymerase, such as a reverse transcriptase(s) (RT); (ii) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription, or nucleic acid molecules comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription; and (iii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iv) nucleic acid molecule(s) encoding or delivering (i), (ii) and (iii) for expression in vivo in a cell; or, (v) vector(s) containing the nucleic acid molecule(s) of (iv) for expression in vivo in a cell. In this system or composition involving an RT (“the RT system or composition”), (iv) can involve (i) being enzyme, (ii) being nucleic acid molecule(s), and (iii) being nucleic acid molecules; or (i) being nucleic acid molecule(s) encoding the enzyme(s), (ii) being nucleic acid molecule(s), and (iii) being protein; or all of (i), (ii) and (iii) being nucleic acid molecules. In some embodiments, the RT system or composition can include more than one reverse transcriptase. When there is more than one reverse transcriptase, there can be more than one RNA for reverse transcription. In some embodiments, in the RT system or composition, (i), (ii) and (iii) further comprise a Cas protein; or (iv) further comprises nucleic acid molecule(s) encoding a Cas protein, e.g., (iv) comprises nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii) and/or a Cas protein for expression in vivo in a cell; or the vector(s) of (v) additionally contain nucleic acid molecule(s) encoding a Cas protein. 
[0054] Reverse transcriptases that can be used according to the inventions herein include, without limitation, reverse transcriptases, retrotransposon reverse transcriptases, retron reverse transcriptases, LINE-1 reverse transcriptase, Ec86 reverse transcriptase, human immunodeficiency virus (HIV) RT, Moloney murine leukemia virus (M-MLV) RT, a group II intron RT, a group II intron-like RT, a chimeric RT, Rous sarcoma virus (RSV) reverse transcriptase, avian myeloblastosis virus (AMV) reverse transcriptase, Rous-associated virus (RAV) reverse transcriptase, myeloblastosis-associated virus (MAV) reverse transcriptase, or other avian sarcoma leukosis virus (ASLV) reverse transcriptase, and other naturally occurring and engineered nucleic acid polymerases. Such engineered polymerases include, without limitation, human DNA polymerase η, which has reverse transcriptase activity in cellular environments (Su et al. 2019, J. Biol. Chem. 294(15):6073-81), and Taq DNA polymerase engineered to enhance reverse transcription and strand displacement (Barnes et al., Front. Bioeng. Biotechnol., 14 January 2021, doi.org/10.3389/fbioe.2020.553474). Telomerase reverse transcriptase (TERT) and related reverse transcriptases that are eukaryotic polymerase genes that do not represent a component of any mobile element or virus, or cellular single-copy rvt reverse transcriptase and related reverse transcriptases with similar domain structure, or R2 mobile element or R2 retrotransposable element reverse transcriptase (R2 RT), or engineered phage or prokaryotic polymerases including derivatives of DNA polymerases with reverse transcriptase activities, or chimeric reverse transcriptases that are engineered by fusion, with or without a peptide linker, between two reverse transcriptases, can also be used. 
These chimeric or fusion reverse transcriptases will have the N-terminal portion from one reverse transcriptase and the C-terminal portion from another reverse transcriptase. Specifically, one type of fusion or chimeric reverse transcriptase consists of the N-terminal polymerase domain of one reverse transcriptase and the C-terminal RNaseH domain of another reverse transcriptase, with the fusion site either before, within, or after the connection domain (originally located between the polymerase domain and the RNaseH domain). [0055] Reverse transcriptases further include, without limitation, those which comprise an amino acid sequence having at least 70% similarity or identity, or at least 75% similarity or identity, or at least 80% similarity or identity, or at least 85% similarity or identity, or at least 90% similarity or identity, or at least 92% similarity or identity, or at least 95% similarity or identity, or at least 96% similarity or identity, or at least 97% similarity or identity, or at least 98% similarity or identity, or at least 99% similarity or identity, or are identical to a reverse transcriptase comprised in any one of SEQ ID NO:627 to SEQ ID NO:755. [0056] In some embodiments, the RT system or composition further comprises a recruitment system comprising at least one aptamer sequence; and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein. In some embodiments, in the RT system or composition having a recruitment system, the at least one aptamer
sequence is an RNA aptamer sequence or a peptide aptamer sequence. In some embodiments, the RT system or composition having a recruitment system has a nucleic acid molecule or nucleic acid molecules that additionally comprise the at least one RNA aptamer sequence, such as a nucleic acid molecule or nucleic acid molecules comprising two RNA aptamer sequences; for instance, wherein the two RNA aptamer sequences comprise the same sequence. In some embodiments, the RT system or composition having a recruitment system has the aptamer binding protein comprising an MS2 coat protein, or a functional derivative or variant thereof; and/or the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof; and/or the at least one peptide aptamer sequence is conjugated to the Cas protein; and/or the at least one peptide aptamer sequence comprises between 1 and 24 peptide aptamer sequences; and/or the aptamer sequences comprise the same sequence. In some embodiments, the RT system or composition having a recruitment system has the aptamer sequence comprising a GCN4 peptide sequence. [0057] In some embodiments of the RT system or composition, the recombination protein N-terminus is linked to the aptamer binding protein C-terminus; and in some embodiments, the RT system or composition further comprises a linker between the recombination protein and the aptamer binding protein; for instance, in some embodiments, the linker comprises the amino acid sequence of SEQ ID NO:15. 
[0058] In some embodiments of the RT system or composition, the system or composition includes at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is/are linked to the recombination protein, the Cas protein, or the reverse transcriptase, or wherein at least one NLS is on each of at least two or three of the recombination protein, the reverse transcriptase, or the Cas protein; for instance, the nuclear localization sequence in some embodiments comprises the amino acid sequence of SEQ ID NO:16. In some embodiments of the RT system or composition, the nuclear localization sequence is on the C-terminus of the recombination protein or the Cas protein. [0059] In some embodiments of the RT system or composition, the recombination protein comprises a recombination protein or active portion thereof. In some embodiments of the RT system or composition, the recombination protein comprises a mitochondrial recombination protein or active portion thereof. In some embodiments of the RT system or composition, the recombination protein comprises a viral recombination protein or active portion thereof. In some
embodiments of the RT system or composition, the recombination protein comprises a recombination protein or active portion thereof. In some embodiments of the RT system or composition, the recombination protein comprises RecE or RecT or RecE and RecT or derivative or variant or functional portion thereof. In some embodiments of the RT system or composition, the RecE, or derivative or variant thereof, comprises an amino acid sequence with at least 70% (or any whole number integer from 70 to 100%, e.g., at least 71%, 72%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%) similarity or identity or homology to an amino acid sequence selected from the group consisting of SEQ ID NOs:1-8. In some embodiments of the RT system or composition, the fusion protein comprises RecT, or derivative or variant thereof. In some embodiments of the RT system or composition, the RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 70% (or any whole number integer from 70 to 100%, e.g., at least 71%, 72%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%) similarity or identity or homology to an amino acid sequence selected from the group consisting of SEQ ID NOs:9-14. [0060] In some embodiments of the RT system or composition, the Cas protein is catalytically inactive (less than 5% nuclease activity as compared with a wild-type or non-mutated form of the Cas protein) or catalytically dead. In some embodiments of the RT system or composition, the Cas protein comprises Cas9 or Cas12a. In some embodiments of the RT system or composition, the Cas9 protein comprises wild-type Streptococcus pyogenes Cas9 or a wild-type Staphylococcus aureus Cas9. In some embodiments of the RT system or composition, the Cas protein comprises a nickase. 
In some embodiments of the RT system or composition, the nickase comprises wild-type Streptococcus pyogenes Cas9 with the amino acid substitution D10A at position 10. [0061] In some embodiments, the RT system or composition further comprises a donor nucleic acid. In some embodiments of the RT system or composition, the target DNA sequence is a genomic DNA sequence in a host cell. [0062] In some embodiments of the RT system or composition, the RT and recombination protein are functionally linked to each other and comprise a fusion protein. In some embodiments of the RT system or composition, the aptamer binding protein and the recombination protein are functionally linked to each other and comprise a fusion protein. In some embodiments of the RT system or composition, the RT and the Cas protein are functionally linked to each other and comprise a fusion protein. In some embodiments of the RT system or composition the
recombination protein and the Cas protein are functionally linked to each other and comprise a fusion protein. In some embodiments of the RT system or composition, the RT, the Cas protein, and the recombination protein are functionally linked to each other and comprise a fusion protein. [0063] With regard to RT, and linkers or ways to functionally link components of embodiments of the RT system or composition (as well as with regard to linkers or ways to functionally link components of systems or compositions discussed herein that do not involve RT), mention is made of WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, WO2020/191171 and WO2021/226558, which involve what is known as prime editing and twin prime editing. Each of these publications is hereby incorporated herein by reference. The RTs of these publications can be used in the practice of the present invention. The linkers or ways to functionally link components described in these publications can likewise be used in the practice of the present invention. [0064] In some embodiments, the invention comprehends a cell or eukaryotic cell comprising any herein-described or discussed RT system or composition. 
In some embodiments, the invention comprehends a method of altering a target genomic DNA sequence in a cell comprising a target genomic DNA sequence, comprising introducing any herein-discussed or described RT system or composition. In some embodiments, the cell or eukaryotic cell is a mammalian cell, or in the methods the cell or eukaryotic cell is a mammalian cell; for instance, a human cell; for instance, a stem cell. In some embodiments, the method involves the target genomic DNA sequence encoding a gene product. In some embodiments of the method, introducing into a cell comprises administering to a subject. In some embodiments, the method involves the subject being a mammalian non-human animal (e.g., a laboratory animal such as a rodent, rat, mouse, rabbit, or a
domestic animal such as a horse, dog or canine, or cat or feline, or a zoo non-domesticated animal in human care and custody, or a production animal such as a cow or pig), or a human. In some embodiments of the method, the administering comprises in vivo administration. In some embodiments, in the method the cell or eukaryotic cell or mammalian cell is an ex vivo or in vitro cell. In some embodiments, the method comprises, after the introducing step, administering to a subject the ex vivo or in vitro cells; and in such embodiments, the subject is a mammalian non-human animal or a human. [0065] In some embodiments, the invention involves use of the RT system or composition for the alteration of a target DNA sequence in a cell. [0066] In systems or compositions discussed herein that do not involve RT, aspects of the RT system that do not pertain to RT or the RT system (e.g., linkers) can be applied to the herein-discussed systems or compositions that do not include RT. [0067] The linker may be a peptide of 5-30, 10-30, 10-20 or 15 amino acid residues. The linker may be -(Gly-Gly-Gly-Gly-Ser)2- (SEQ ID NO:560), -(Gly-Gly-Gly-Gly-Ser)3- (SEQ ID NO:561), or -(Gly-Gly-Gly-Gly-Ser)4- (SEQ ID NO:562). In certain embodiments, the linker is -(Gly-Gly-Gly-Gly-Ser)3- (SEQ ID NO:561). The amino acid sequence of SEQ ID NO:561 may be encoded by the nucleic acid sequence of SEQ ID NO:563. [0068] In certain embodiments, a linker is made up of a majority of amino acids that are sterically unhindered, such as glycine and alanine. Exemplary linkers are polyglycines (particularly (Gly)5), poly(Gly-Ala), and polyalanines. One exemplary suitable linker as shown in the Examples below is (Gly-Ser), such as -(Gly-Gly-Gly-Gly-Ser)2- (SEQ ID NO:560), -(Gly-Gly-Gly-Gly-Ser)3- (SEQ ID NO:561), or -(Gly-Gly-Gly-Gly-Ser)4- (SEQ ID NO:562). [0069] Linkers may also be non-peptide linkers. For example, alkyl linkers such as -NH-, -(CH2)s-C(O)-, wherein s=2-20, can be used. 
These alkyl linkers may further be substituted by any non-sterically hindering group such as lower alkyl (e.g., C1-4), lower acyl, halogen (e.g., Cl, Br), CN, NH2, phenyl, etc.
Nucleic acid sequence of linker
[sequence rendered as an image in the source; not recoverable from the extracted text]
Amino acid sequence of linker
GGGGSGGGGSGGGGS (SEQ ID NO:561)
[0070] [...] any previously known product, process of making the product, or method of using the product, such that Applicants reserve the right and hereby disclose a disclaimer of any previously known product, process, or method. It is further noted that the invention does not intend to encompass within the scope of the invention any product, process, or making of the product or method of using the product, which does not meet the written description and enablement requirements of the USPTO (35 U.S.C. §112, first paragraph) or the EPO (Article 83 of the EPC), such that Applicants reserve the right and hereby disclose a disclaimer of any previously described product, process of making the product, or method of using the product. It may be advantageous in the practice of the invention to be in compliance with Art. 53(c) EPC and Rule 28(b) and (c) EPC. All rights to explicitly disclaim any embodiments that are the subject of any granted patent(s) of applicant in the lineage of this application or in any other lineage or in any prior filed application of any third party are explicitly reserved. Nothing herein is to be construed as a promise. [0071] It is noted that in this disclosure and particularly in the claims and/or paragraphs, terms such as “comprises”, “comprised”, “comprising” and the like can have the meaning attributed to them in U.S. Patent law; e.g., they can mean “includes”, “included”, “including”, and the like; and that terms such as “consisting essentially of” and “consists essentially of” have the meaning ascribed to them in U.S. Patent law, e.g., they allow for elements not explicitly recited, but exclude elements that are found in the prior art or that affect a basic or novel characteristic of the invention. [0072] These and other embodiments are disclosed or are obvious from and encompassed by the following Detailed Description. [0073] Other aspects and embodiments of the invention will be apparent in light of the following detailed description and accompanying figures. BRIEF DESCRIPTION OF THE DRAWINGS [0074] FIG. 1A and FIG. 
1B are the reconstructed RecE (FIG. 1A) and RecT (FIG. 1B) phylogenetic trees with eukaryotic recombination enzymes from yeast and human. [0075] FIG. 2A is a phylogenetic tree and length distribution of RecE/RecT homologs. FIG.
2B is the metagenomics distribution of RecE/T. FIG. 2C is a schematic showing disclosed herein. FIG. 2D are graphs of the genome knock-in efficiency of RecE/T homologs. [0076] FIG. 3A and 3B are graphs of the high-throughput sequencing (HTS) reads of homology directed repair (HDR) at the EMX1 (FIG. 3A) locus and the VEGFA (FIG. 3B) locus. FIGS. 3C-3E are graphs of the mKate knock-in efficiency at HSP90AA1 (FIG. 3C), DYNLT1 (FIG. 3D), and AAVS1 (FIG. 3E) loci in HEK293T cells. FIG. 3F is images of mKate knock-in efficiency in HEK293T cells with RecT. FIG. 3G is a schematic of an exemplary AAVS1 knock-in strategy and chromatogram trace from the RecT knock-in group. FIG. 3H is schematics and graphs of the recruitment control experiment and corresponding knock-in efficiency. All results are normalized to NR. (NC, no cutting; NR, no recombinator). [0077] FIGS. 4A-4C are graphs of the mKate knock-in efficiencies relative to the NR group at HSP90AA1 (FIG. 4A), DYNLT1 (FIG. 4B), and AAVS1 (FIG. 4C) loci in HEK293T cells. (NC, no cutting control group. NR, no recombinator control group.) FIG. 4D is an image of an exemplary agarose gel of junction PCR that validates mKate knock-in at AAVS1 locus. FIG. 4E and 4F are graphs of the absolute (FIG. 4E) and relative (FIG. 4F) LOV knock-in efficiencies at AAVS1 locus. FIG. 4G are the Sanger sequencing results of the junction PCR product of an exemplary mKate knock-in at AAVS1 locus. [0078] FIGS. 5A-5D are graphs of the genomic knock-in efficiencies at different loci across cell lines A549 (FIG. 5A), HepG2 (FIG. 5B), HeLa (FIG. 5C), and hESCs (H9) (FIG. 5D). FIG. 5E is images of mKate knock-ins in hESCs. FIG. 5F and 5G are genome-wide off-target site (OTS) counts (FIG. 5F) and OTS chromosomal distribution (FIG. 5G) of REDITv1 tools. [0079] FIGS. 6A-6D are graphs of the relative mKate knock-in efficiency at the AAVS1 locus and the DYNLT1 locus in A549 cell line (FIG. 6A), the DYNLT1 locus and the HSP90AA1 locus in HepG2 cell line (FIG. 
6B), the DYNLT1 locus and the HSP90AA1 locus in HeLa cell line (FIG. 6C), and the HSP90AA1 locus and the OCT4 locus in hES-H9 cell line (FIG. 6D). (NC, no cutting control group. NR, no recombinator control group. All data normalized to NR group.) FIG. 6E is representative FACS results of HSP90AA1 mKate knock-in in hES-H9 cells. [0080] FIGS. 7A-7D are graphs of the absolute mKate knock-in efficiencies of different homology arm lengths at the DYNLT1 (FIG. 7A) and HSP90AA1 (FIG. 7B) loci and the no recombinator controls for DYNLT1 (FIG. 7C) and HSP90AA1 (FIG. 7D). [0081] FIGS. 8A-8F are graphs of the indel rates of the top 3 predicted off-target loci
associated with sgEMX1 (FIGS. 8A-8C) or sgVEGFA (FIGS. 8D-8F) in the [...] [0082] FIG. 9A is a schematic of select embodiments of REDITv2N and corresponding knock-in efficiencies in HEK293T cells. FIG. 9B and 9C are graphs of genome-wide off-target site (OTS) counts (FIG. 9B) and OTS chromosomal distribution (FIG. 9C) comparing REDITv2N against REDITv1. FIG. 9D is a schematic of select embodiments of REDITv2D and corresponding knock-in efficiencies. FIG. 9E is a graph of editing efficiency of REDITv1, REDITv2N, and REDITv2D under serum starvation conditions. FIG. 9F is the knock-in efficiencies of REDITv3 in hESCs. FIG. 9G is images of mKate knock-in using REDITv3 in hESCs. [0083] FIG. 10A and 10B are schematics and graphs of the relative mKate knock-in efficiencies of select embodiments of REDITv2N (FIG. 10A) and REDITv2D (FIG. 10B) at the DYNLT1 locus and the HSP90AA1 locus. [0084] FIGS. 11A-11D are images of agarose gels showing junction PCR of mKate knock-in at the DYNLT1 locus and the HSP90AA1 locus for a select REDITv2N system. FIG. 11E is the chromatogram sequence of junction PCR products at the DYNLT1 locus. [0085] FIG. 12A and 12B are graphs of the genomic distribution of detected off-target cleavages of select embodiments of REDITv2 (FIG. 12A) and REDITv2N (FIG. 12B). A pileup includes alignments that have two or more reads overlapping with each other. Flanking pairs include alignments that show up on opposite strands within 200 bp upstream of each other. Target matched includes alignments that match to a treated target in the upstream sequence (up to 6 mismatches, including 1 mismatch in the PAM, are allowed in the target sequence). FIG. 12C is a graph of the HTS HDR and indel reads at EMX1 locus for REDITv2N system. [0086] FIG. 13A is an image of an agarose gel showing junction PCR of mKate knock-ins at the DYNLT1 locus for REDITv2D system. FIG. 13B is the chromatogram sequence of junction PCR products at the DYNLT1 locus. [0087] FIGS. 
14A-14C are graphs of the mKate knock-in efficiencies at the HSP90AA1 locus in REDITv2 (FIG. 14A), REDITv2N (FIG. 14B) and REDITv2D (FIG. 14C) when treated with different FBS concentrations. FIGS. 14D-14F are graphs of the mKate knock-in efficiencies at the HSP90AA1 locus in REDITv2 (FIG. 14D), REDITv2N (FIG. 14E) and REDITv2D (FIG. 14F) when treated with different serum FBS concentrations. [0088] FIG. 15 is images of the nuclear localization of RecE_587 and RecT following EGFP fusion to the REDITv1 systems. Nuclei were stained with NucBlue Live Ready Probes Reagent. [0089] FIG. 16A and 16B are the relative mKate knock-in efficiencies at HSP90AA1 and DYNLT1 loci following fusion of different nuclear localization sequences to either the N- or C-terminus of RecT and RecE_587. FIG. 16C and 16D are graphs of the absolute mKate knock-in efficiencies of the constructs from FIGS. 16A and 16B for the DYNLT1 locus (FIG. 16C) and the HSP90AA1 locus (FIG. 16D). [0090] FIGS. 17A-17D are graphs of the relative (FIGS. 17A and 17B) and absolute (FIGS. 17C and 17D) mKate knock-in efficiencies for the DYNLT1 locus (FIGS. 17A and 17C) and the HSP90AA1 locus (FIGS. 17B and 17D) following fusion of new NLS sequences as well as optimal linkers to REDITv2 and REDITv3 variants. The REDITv2 versions using REDITv2N (D10A or H840A) and REDITv2D (dCas9) are indicated on the horizontal axis, along with the number of guides used. The different colors represent the different control groups and REDIT versions. [0091] FIG. 18 is a graph of the relative editing efficiency of REDITv3N system at HSP90AA1 locus in hES-H9 cells. [0092] FIG. 19A is a diagram of an exemplary saCas9 expression vector. FIGS. 19B-19E are graphs of the relative mKate knock-in efficiencies at the AAVS1 locus (FIG. 19B) and HSP90AA1 locus (FIG. 19C) of different effectors in saCas9 system and the respective absolute efficiencies (FIG. 19D and 19E, respectively). (NC, no cutting control group. 
NR, no recombinator control group.)
[0093] FIG. 20A is a schematic of RecT truncations. FIGS. 20B and 20C are graphs of the relative mKate knock-in efficiencies at the DYNLT1 locus for wild-type Streptococcus pyogenes Cas9 and Streptococcus pyogenes Cas9n(D10A) with single- and double-nicking.
[0094] FIG. 21A is a schematic of RecE_587 truncations. FIGS. 21B and 21C are graphs of the relative mKate knock-in efficiencies at the DYNLT1 locus for wild-type Streptococcus pyogenes Cas9 and Streptococcus pyogenes Cas9n(D10A) with single- and double-nicking.
[0095] FIGS. 22A and 22B are graphs comparing the efficiency of recombineering-based editing with various exonucleases (FIG. 22A) and single-strand DNA annealing proteins (SSAPs) (FIG. 22B) from naturally occurring recombineering systems, including NR (no recombinator) as negative control. The gene-editing activity was measured using the mKate knock-in assay at genomic loci (DYNLT1 and HSP90AA1). The data shown are percentage of successful
mKate knock-in using human HEK293 cells, each experiment was performed in .
[0096] FIGS. 23A-23E show a compact recruitment system using boxB and N22. The REDIT recombinator proteins were fused to the N22 peptide, and the sgRNA contained boxB, the short cognate sequence of the N22 peptide (FIG. 23A). FIGS. 23B-23E are graphs of the gene-editing efficiency using the mKate knock-in assay, with wildtype SpCas9, with side-by-side comparisons to the MS2-MCP recruitment system. FIGS. 23B and 23D are absolute mKate knock-in efficiencies at the DYNLT1 and HSP90AA1 loci, and FIGS. 23C and 23E are relative efficiencies. The data shown are percentage of successful mKate knock-in using HEK293 human cells; each experiment was performed in triplicate (n=3).
[0097] FIGS. 24A-24C show a SunTag recruitment system. The REDIT recombinator proteins were fused to an scFV antibody, and the GCN4 peptide in tandem fashion (10 copies of GCN4 peptide separated by linkers) was fused to the Cas9 protein (FIG. 24A). An mKate knock-in experiment (FIG. 24B) with the DYNLT1 locus was used to measure the gene-editing knock-in efficiency (FIG. 24C). All data are measurements of gene-editing efficiency using the mKate knock-in assay, with wildtype SpCas9. Absolute mKate knock-in efficiencies at DYNLT1 are shown in the bottom right corner of each flow cytometry plot, where the control is without recombinator (NR), which included scFV fused to GFP protein as negative control. All experiments were done in HEK293 human cells.
[0098] FIGS. 25A and 25B exemplify REDIT with a Cas12a system. A Cpf1/Cas12a-based REDIT system via the SunTag recruitment design was created (FIG. 25A) for two different Cpf1/Cas12a proteins. Using the mKate knock-in assay, the efficiencies at two endogenous loci (DYNLT1 and AAVS1) were measured (FIG. 25B).
Shown are absolute mKate knock-in efficiencies as measured by mKate+ cell percentage using HEK293 human cells; each experiment was performed in triplicate (n=3), where the negative control is without recombinator (NR).
[0099] FIGS. 26A and 26B are the measurements of precision recombineering activity via the mKate knock-in gene-editing assay using RecE and RecT homologs at the DYNLT1 locus (FIG. 26A) and the HSP90AA1 locus (FIG. 26B). Shown are absolute mKate knock-in efficiencies as measured by mKate+ cell percentage using HEK293 human cells; each experiment was performed in triplicate (n=3), where the negative controls are without recombinator (NR) and no cutting (NC). The original RecE and RecT from E. coli were also included as positive controls.
[00100] FIGS. 27A and 27B are a schematic showing the SunTag-based recruitment of SSAP
RecT to Cas9-gRNA complex for gene-editing (FIG. 27A) and a graph of efficiencies of SunTag compared to MS2-based strategies (FIG. 27B).
[00101] FIGS. 28A-28C show comparisons of REDIT with alternative HDR-enhancing gene-editing approaches. FIG. 28A is schematics showing alternative HDR-enhancing approaches via fusing functional domains, CtIP or Geminin (Gem), to Cas9 protein (left) and when combined with REDIT (right). FIG. 28B is an alternative small-molecule HDR-enhancing approach through cell cycle control. Nocodazole was used to synchronize cells at the G2/M boundary (left) according to the timeline shown (right). FIG. 28C is comparisons of gene-editing efficiencies using REDIT and alternative HDR-enhancing tools, Cas9-HE (CtIP fusion), Cas9-Gem (Geminin fusion), and Nocodazole (noc), along with combination of REDIT with these methods (Cas9-HE/Cas9-Gem/noc+REDIT). Donor DNAs have 200 + 400 bp (DYNLT1) or 200 + 200 bp (HSP90AA1) of HAs. All assays were performed with no donor, NTC and Cas9 (no enhancement) controls. #P < 0.05, compared to REDIT; ##P < 0.01, compared to REDIT.
[00102] FIGS. 29A-29D show template design guidelines, junction precision, and capacity of REDIT gene-editing methods. FIG. 29A is graphs of a homology arm (HA) length test comparing different template designs of HDR donors (longer HAs) or NHEJ/MMEJ donors (zero/shorter HAs) using REDIT and Cas9 references. Top and bottom are two genomic loci tested using the mKate knock-in assay. FIG. 29B is a design of an exemplary junction profiling assay through isolation of knock-in clones, followed by genomic PCR using primers (fwd, rev) binding outside the donor to avoid template amplification. Paired Sanger sequencing of the PCR products reveals homologous and non-homologous edits at the 5’- and 3’- junctions. FIG. 29C is a graph of the percentage of colonies with indicated junction profiles from the Sanger sequencing of knock-in clones as in FIG. 29B.
Editing methods and donor DNA are listed at the bottom (HA lengths indicated in brackets). FIG. 29D is a graph of knock-in efficiencies using a 2-kb cassette to insert dual-GFP/mKate tags to validate REDIT methods with Cas9. HA lengths of donor DNAs are indicated at the bottom.
[00103] FIGS. 30A-30C show GIS-seq results (Figures 6C–6E) indicating that REDIT is an efficient method with the ability to insert kilobase-length sequences with fewer unwanted editing events. FIG. 30A is a schematic showing the design, procedures, and analysis steps for GIS-seq to measure genome-wide insertion sites of the knock-in cassettes. High-molecular-weight (HMW) genomic DNA purification was needed to remove potential contamination from donor DNAs. Donor DNAs had 200 bp HAs on each side. FIG. 30B is representative GIS-seq results showing
plus/minus reads at on-target locus DYNLT1. The expected 2A-mKate knock-in site at the stop codon of the last exon is the center of the trimmed reads (reads clipped to remove the 2A-mKate cassette). The template mutations, which help to avoid gRNA targeting and distinguish genomic and edited reads, are labeled. FIG. 30C is a summary of top GIS-seq insertion sites comparing Cas9dn and REDITdn groups, showing the expected on-target insertion site (highlighted) and reduced number of identified off-target insertion sites when using REDITdn. (Left) DYNLT1 and (Right) ACTB loci with MLE calculated from the distribution of filtered and trimmed GIS-seq reads.
[00104] FIGS. 31A-31F show the dependence of REDIT gene-editing on endogenous DNA repair and the application of REDIT methods to human stem cell engineering. FIG. 31A is a model showing the editing process and major repair pathways involved when using REDIT or Cas9 for gene-editing; the HDR pathway is highlighted for chemical perturbation (inhibition of RAD51). Donor DNAs with 200 + 200 bp HAs are used for all inhibitor experiments. FIGS. 31B and 31C are graphs showing the relative knock-in efficiency of REDIT tools compared with Cas9 reference treated with RAD51 inhibitors B02 and RI-1, or vehicle-treated, for the wtCas9-based REDIT and Cas9 (FIG. 31B) and for Cas9 nickase-based REDITdn and Cas9dn (FIG. 31C). All conditions were measured with a 1-kb knock-in assay at two genomic loci (DYNLT1 and HSP90AA1). FIG. 31D is graphs of knock-in efficiencies in hESCs (H9) using REDIT and REDITdn tested across three genomic loci, compared with corresponding Cas9 and Cas9dn references. FIGS. 31E and 31F are flow cytometry plots of mKate knock-in results in hESCs using REDIT, REDITdn with Cas9, Cas9dn, and NTC controls. Donor DNAs in the hESC experiments have 200 + 200 bp HAs across all loci tested.
[00105] FIGS. 32A-32B show chemical perturbations to dCas9 REDIT.
Gene editing efficiencies were determined when treated with mammalian DNA repair pathway inhibitors (Mirin, RI-1, and B02) with (FIG. 32A) and without (FIG. 32B) cell cycle inhibitor (Thy, double thymidine) blocking. Statistical analyses are from t-test results with 1% FDR via a two-stage step-up method.
[00106] FIGS. 33A and 33B are schematics of the DNA components (gene-editing vectors and template DNA) and tail vein injection of mice, respectively.
[00107] FIGS. 34A-34C are results from the tail vein injection of mice with gene-editing vectors. FIG. 34A is a schematic and gel electrophoresis of PCR analysis of liver hepatocytes from the injected mice. FIG. 34B is the Sanger sequencing results of the PCR amplicon. FIG. 34C is a schematic of next-generation sequencing and a graph of the quantification of knock-in junction errors.
[00108] FIGS. 35A and 35B are schematics of the DNA components (gene-editing and control vector) and adeno-associated virus (AAV) treatment, respectively. FIG. 35C is fluorescent images of lungs from AAV-treated mice and graphs of corresponding quantitation of tumor number.
[00109] FIGS. 36A-36C show the predicted structure of E. coli RecT (EcRecT) alone (FIG. 36A) and with bound single-strand DNA (FIGS. 36B, 36C).
[00110] FIGS. 37A-37B show predicted interactions of EcRecT SSAP amino acids with ssDNA.
[00111] FIGS. 38A-38F show development of the dCas9 gene-editor through mining microbial SSAPs. (FIG. 38A) Schematic model of dCas9 editor with single-strand annealing proteins (SSAP). (FIG. 38B) Design of the genomic knock-in assay to measure gene-editing efficiencies (left); workflow of the SSAP screening experiments (right). (FIG. 38C) Construct designs for screening gene-editing efficiency of SSAPs using the 2A-mKate knock-in assay, with an 800bp transgene. (FIG.
38D) Results of initial screen of three SSAPs: Bet protein from Lambda phage (LBet), RecT protein from Rac prophage (RacRecT), and gp2.5 from T7 phage (T7gp2.5). (FIG. 38E) Screening RecT-like SSAP candidates via metagenomic homolog mining and knock-in assay. The most active candidate is labeled as dCas9-SSAP. NTC: non-target control. Donor templates were added in all groups except the no-donor controls, with the homology arm (HA) lengths: DYNLT1, 200+200bp; HSP90AA1, 200+400bp; ACTB, 200+400bp. (FIG. 38F) Measurement of gene-editing efficiencies using three types of donor designs with different HA lengths at DYNLT1 (left) and HSP90AA1 (right) loci in HEK293T cells. All results in this and the following figures are from replicate experiments with error bars representing standard error of the mean (S.E.M.), n = 3, unless otherwise noted.
[00112] FIGS. 39A-39H show on-target and off-target editing errors of dCas9-SSAP. (FIG. 39A) Deep sequencing to measure the levels of indel formation when using dCas9-SSAP and Cas9 references at endogenous targets. The donor templates used are 200bp-HA HDR templates. Details of the assay are described in Methods. (FIG. 39B) Clonal Sanger sequencing to analyze the accuracy of knock-in editing using dCas9-SSAP and Cas9 references with different HDR and MMEJ donors. The donor templates used are the 200bp-HA HDR templates and 25bp-HA MMEJ templates (Methods and Supplementary Notes). (FIG. 39C- FIG. 39E) Genome-wide detection of
insertion sites of knock-in cassette using unbiased sequencing, showing (FIG. (FIG. 39D) representative reads aligned at knock-in genomic site, and (FIG. 39E) summary of detected on-target and off-target insertion sites. (FIG. 39F- FIG. 39G) Workflow and results for measuring cell fitness effect as defined by percentage of live cells after editing (normalized to mock controls). (FIG. 39H) Summary analysis of knock-in accuracy of dCas9-SSAP editor, in comparison with Cas9 HDR and Cas9 MMEJ methods. Accuracy is defined as the overall yield (%) of correct knock-in within all edited outcomes (correct knock-in, knock-in with indels, and NHEJ indels).
[00113] FIGS. 40A-40G show validation of dCas9-SSAP editor and comparison with Cas9 reference and other HDR-enhancing methods. (FIG. 40A) Comparison of efficiencies using dCas9-SSAP and other alternative Cas9, nCas9, and HDR-enhancing tools: Cas9-HE (CtIP-fusion Cas9), Cas9-Gem (Geminin-fusion Cas9), nCas9 (Cas9-D10A nickase reference), and nCas9-hRAD51 (an improved Cas9 nickase editor). Donor templates are the same as in Fig. 1. (FIG. 40B) Imaging verification of mKate knock-in at endogenous genome locus using dCas9-SSAP editor. (FIG. 40C) Design of knock-in donor with different lengths of transgenes. (FIG. 40D) Knock-in efficiencies for different transgene lengths using dCas9-SSAP editors. Donor HA lengths are 200bp+200bp for DYNLT1, 200bp+400bp for HSP90AA1. (FIG. 40E) Performance of dCas9-SSAP editor compared with Cas9 references across 7 endogenous loci in HEK293T cells. ND, no-donor controls; NT, non-target controls. (FIG. 40F- FIG. 40G) Knock-in gene-editing in human embryonic stem cells (hESC, H9) using dCas9-SSAP editor, with quantified HDR efficiencies (FIG. 40F) and flow cytometry analysis (FIG. 40G). All statistical analyses are performed using multiple t-tests to compare across all genomic targets, with 1% false-discovery rate (FDR) via a two-stage step-up method of Benjamini, Krieger and Yekutieli.
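For readers unfamiliar with the two-stage step-up method of Benjamini, Krieger and Yekutieli named in the statistical notes, the following is a minimal sketch of the procedure as it is commonly stated: a Benjamini-Hochberg pass at level q/(1+q), an estimate of the number of true null hypotheses from that pass, then a second Benjamini-Hochberg pass at an adjusted level. It is an illustration only and is not the software or settings used to generate the figures; the function names and the example p-values are hypothetical.

```python
def bh_reject_count(sorted_p, q):
    """Benjamini-Hochberg step-up: number of rejections at level q.

    sorted_p must be in ascending order. Returns the largest rank i
    such that p_(i) <= i * q / m, or 0 if no rank qualifies.
    """
    m = len(sorted_p)
    count = 0
    for rank, p in enumerate(sorted_p, start=1):
        if p <= rank * q / m:
            count = rank
    return count


def two_stage_step_up(p_values, q=0.01):
    """Two-stage step-up FDR control (sketch, hypothetical helper).

    Returns a list of booleans aligned with p_values; True means the
    corresponding comparison is declared a discovery at FDR level q.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Sort p-values but remember where each one came from.
    order = sorted(range(m), key=lambda i: p_values[i])
    sorted_p = [p_values[i] for i in order]

    # Stage 1: BH at the reduced level q' = q / (1 + q).
    q1 = q / (1.0 + q)
    r1 = bh_reject_count(sorted_p, q1)

    if r1 == 0 or r1 == m:
        # Nothing rejected, or everything rejected: stop after stage 1.
        r = r1
    else:
        # Stage 2: estimate the number of true nulls, then rerun BH
        # at the adjusted level q' * m / m0_hat.
        m0_hat = m - r1
        r = bh_reject_count(sorted_p, q1 * m / m0_hat)

    rejected = [False] * m
    for rank in range(r):
        rejected[order[rank]] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, one per genomic target.
    example = [0.0001, 0.0002, 0.0003, 0.012]
    print(two_stage_step_up(example, q=0.01))
```

In this hypothetical example a single Benjamini-Hochberg pass at 1% would reject only the first three comparisons, whereas the second stage also rejects the fourth because the first stage estimates that few true nulls remain.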
[00114] FIGS. 41A-41D show chemical perturbations to probe the editing mechanism of dCas9-SSAP editor. Gene-editing efficiency of dCas9-SSAP editor when treated with DNA repair pathway inhibitors (Mirin, RI-1 and B02) without (FIGS. 41A, 41B) or with (FIGS. 41C, 41D) cell cycle synchronization (DTB, double thymidine blocking). All donor templates are the same as in Fig. 38. Statistical analyses are from t-test results with 1% FDR via a two-stage step-up method of Benjamini, Krieger and Yekutieli.
[00115] FIGS. 42A-42D show minimization of dCas9-SSAP editor as a compact CRISPR knock-in tool for convenient delivery. (FIG. 42A) Schematic showing the EcRecT predicted secondary structure and priming sites for constructing truncated EcRecT proteins based on the
structural prediction. (FIG. 42B) Relative knock-in efficiencies of various All groups were normalized to Cas9 references (individually for each target). (FIG. 42C) Schematic of dSaCas9-mSSAP system in AAV construct using the compact SaCas9 (left, sizes of elements not shown to scale) and (FIG. 42D) knock-in efficiencies at AAVS1 and HSP90AA1 endogenous targets via in vitro delivery of AAV2 vectors carrying the original and minimized dSaCas9-SSAP editors in HEK293T cells.
[00116] FIGS. 43A-43E show gel electrophoresis and sequencing verification of knock-in-specific PCR products using dCas9-SSAP. (FIG. 43A) Agarose gel results of knock-in-specific junction PCR at DYNLT1 locus. (FIG. 43B- FIG. 43E) Sanger sequencing chromatogram of genomic junctions from knock-in experiments at DYNLT1 locus. For all samples, Applicants amplified the 5’ (FIG. 43B, FIG. 43C) and 3’ (FIG. 43D, FIG. 43E) end of genomic DNA using junction-spanning primers outside of the donor DNAs to confirm knock-in.
[00117] FIG. 44 shows a phylogenetic tree and amino acid alignment of representative RecT homologs along with the protein conserved domain annotated.
[00118] FIGS. 45A-45B show deep sequencing of short-sequence editing comparing dCas9-SSAP and Cas9 editors. (FIG. 45A) Donor design of 16-bp replacement at EMX1. (FIG. 45B) Analysis of precision HDR and indel editing outcomes using deep sequencing at EMX1 genomic locus. The first round of PCR used sequencing primers completely outside of the donor to ensure the sequencing results will be free from the donor template contamination, validated by the non-target control (where the donor DNAs are delivered into the cells).
[00119] FIGS. 46A-46B are schematics showing the workflows used in Sanger sequencing of knock-in products (FIG. 46A) and the sequencing method used in deep on-target indel assay (FIG. 46B). Assays described here correspond to Fig. 41. gPCR, genomic PCR.
Seq-F/seq-R are primers for Sanger sequencing binding upstream/downstream of the knock-in templates.
[00120] FIGS. 47A-47B show Sanger sequencing chromatograms of genomic junctions from dCas9-SSAP experiments at DYNLT1 locus. The sequences in the red boxes were not precisely repaired. For all samples, the 5’ (FIG. 47A) and 3’ (FIG. 47B) ends of genomic DNA were amplified using junction-spanning primers to confirm knock-in precision. The genomic-binding primers used are completely outside of the donor DNAs to avoid contamination.
[00121] FIGS. 48A-48B show Sanger sequencing chromatograms of genomic junctions from
dCas9-SSAP experiments at HSP90AA1 locus. For all samples, the 5’ (FIG. 48A) and 3’ (FIG. 48B) ends of genomic DNA were amplified using junction-spanning primers to confirm knock-in precision. The genomic-binding primers used are completely outside of the donor DNAs to avoid contamination.
[00122] FIGS. 49A-49B show genome-wide insertion site mapping and quantification. (FIG. 49A) Overall workflow for unbiased genome-wide insertion site mapping process. On-target and off-target insertion sites are recovered from reads that align to the reference genome (hg38). Full protocol and data analysis pipeline are detailed in Methods. (FIG. 49B) Quantification of genome-wide insertion sites counting all aligned reads (with valid UMI) showed decreased insertion site abundance using Cas9-SSAP compared with Cas9 HDR, across two genomic loci (DYNLT1 and HSP90AA1). The abundance of insertion sites is measured as RPKU, or Reads Per Thousand UMIs.
[00123] FIGS. 50A-50B show testing of dCas9-SSAP editor tool using single-guide (FIG. 50A) and dual-guide (FIG. 50B) designs across three genomic targets (shown on the top). The donor DNAs used are the same as shown in Fig. 3a with 800-bp knock-in design.
[00124] FIGS. 51A-51C show validation of dCas9-SSAP knock-in efficiencies in three additional cell lines: HepG2 (FIG. 51A), HeLa (FIG. 51B), and U2OS (FIG. 51C). The knock-in experiments used similar donor DNA with ~800-bp cassettes encoding 2A-mKate transgene for all cell lines tested.
[00125] FIGS. 52A-52C show the full set of flow cytometry analysis data using dCas9-SSAP editor for human stem cell engineering. Flow cytometry analysis of knock-in gene-editing at HSP90AA1 (FIG. 52A), ACTB (FIG. 52B), OCT4 (FIG. 52C) endogenous loci in human embryonic stem cells (hESC, H9) using dCas9-SSAP compared with non-target controls and Cas9 (Cas9 HDR) references.
[00126] FIG. 53 is a schematic showing the RecT protein secondary structure predicted using an online tool (CFSSP, see Methods).
The prediction results (secondary structure visualized at top, alignment at bottom) formed the basis for developing a truncated functional RecT variant.
[00127] FIGS. 54A-54C show optimization of dCas9-SSAP for efficient and durable gene-editing. (FIG. 54A) Knock-in efficiencies for SSAP dosage optimization. Donor HA lengths are ~200bp for DYNLT1, ~300bp for HSP90AA1. n=3 biologically independent experiments. (FIG. 54B) Performance of dCas9-SSAP editor compared with Cas9 references across 7 endogenous loci in HEK293T cells after SSAP dosage optimization and donor HA extension. Donor HA lengths are ~200bp for DYNLT1, 673bp+750bp for HSP90AA1, 500bp+800bp for ACTB, 608bp+740bp for BCAP31, 212bp+413bp for HIST1H2BK, 705bp+602bp for CLTA, 464bp+440bp for RAB11A. All knock-in donors target the C-termini of endogenous proteins, except for CLTA/RAB11A donors which target N-termini. n=3 biologically independent experiments. (FIG. 54C) Stability of transgene expression at HSP90AA1 (left) and ACTB (right), post sorting at Day 3 after dCas9-SSAP knock-in. Variable sorting efficiencies led to different starting mKate+ rates (FIG. 57). Error bars in FIGS. 54A and 54B represent standard error of the mean (SEM).
[00128] FIGS. 55A-55C show optimization of donor dosages and homology arms of donor DNA. (FIG. 55A) Quantification of genomic mKate knock-in efficiency at DYNLT1, HSP90AA1, ACTB loci for donor dosage optimization when using dCas9-SSAP editor. non target, non-target controls. Donor HA lengths are 200bp+200bp for DYNLT1, 200bp+400bp for HSP90AA1, 200bp+400bp for ACTB. Quantification of mKate knock-in efficiency at HSP90AA1 (FIG. 55B) and ACTB (FIG. 55C) locus for donor homology arm (HA) optimization when using dCas9-SSAP editor. non target, non-target controls. Donor HA lengths are 200bp+200bp or 673bp+750bp for HSP90AA1, 200bp+400bp or 500bp+800bp for ACTB. n=3 biologically independent experiments.
All results in this figure are from replicate experiments with error bars representing standard error of the mean (S.E.M.).
[00129] FIGS. 56A-56D show validation of dCas9-SSAP editor with protein functional assays. (FIG. 56A) Design of genomic Puromycin/Blasticidin-resistance-cassette knock-in assay to validate functional on-target editing by dCas9-SSAP. (FIG. 56B) Immunoblotting confirms the presence and sizes of on-target dCas9-SSAP knock-in products at HSP90AA1 and ACTB loci, performed with anti-V5 antibody recognizing in-frame fusion with endogenous protein. Data shown represent 3 biologically independent experiments. (FIG. 56C - 56D) Validation and quantification of on-target knock-in using dCas9-SSAP via colony formation assay. Cells were selected by the knock-in resistance cassettes, then quantified using crystal violet staining. Scale bar=500 um. n=4 biologically independent experiments. Error bars represent standard error of the mean (SEM).
[00130] FIGS. 57A-57E show validation of the stability of on-target editing. (FIG. 57A) Workflow of the long-term time-course experiments to evaluate the editing outcome stability using dCas9-SSAP editor. (FIG. 57B) Flow cytometry analysis of knock-in gene-editing at HSP90AA1,
ACTB endogenous loci at different time points post delivery of dCas9-SSAP and (FIG. 57C-57D) Representative crystal violet staining images for the on-target puromycin knock-in at HSP90AA1 and ACTB locus. Scale bar=500 um. The assay has been performed 3 times with similar results. (FIG. 57E) Quantification of HSP90AA1 and ACTB gene expression levels in HEK293T cells by bulk RNA-seq analysis, demonstrating significantly higher levels of HSP90AA1 expression. This led to better cell survival in the HSP90AA1 group compared with the ACTB group. Data from 2 biologically independent experiments are presented.
[00131] FIG. 58 shows SSAP + Cas9 mediated knock-in editing with deactivated guide RNA (dgRNA). The SSAP + Cas9 comprises RecT and wtCas9. mKate knock-ins are depicted at DYNLT1, HSP90AA1, and ACTB.
[00132] FIG. 59 shows dCas9-SSAP mediated knock-in of luciferase-expressing or mKate-expressing 600-bp transgenes at the human albumin (ALB) locus (top) or the AAVS1 locus (bottom) in human HEK293T cells or human hepatocytes. Transgene knock-ins at the albumin locus were highly expressed in hepatocytes but not HEK293T (top). Transgene knock-ins at the AAVS1 locus were highly expressed in HEK293T but not hepatocytes (bottom).
[00133] FIG. 60 shows dCas9-SSAP mediated knock-in of luciferase-expressing or mKate-expressing 800-bp transgenes at the human albumin (ALB) locus (top) or the AAVS1 locus (bottom) in human HEK293T cells or human hepatocytes. Transgene knock-ins at the albumin locus were highly expressed in hepatocytes but not HEK293T (top). Transgene knock-ins at the AAVS1 locus were highly expressed in HEK293T but not hepatocytes (bottom).
[00134] FIGS. 61A-61B show electroporation of an RNP comprising an 800bp mKate-encoding transgene in K562 cells. Cells were electroporated with RNP comprising purified Cas9 or dCas9 protein complexed with guideRNA, a double stranded 800bp mKate transgene, with and without RecT. Knock-ins were at the HSP90AA1 (FIG.
61A) or HIST1H2BK (FIG. 61B) locus.
[00135] FIG. 62 shows delivery of RNP comprising Cas9 or dCas9, with or without SSAP, to mouse primary hematopoietic stem cells (HSC) and AAV6 to knock in a GFP-expressing transgene.
[00136] FIG. 63 shows transgene expression.
[00137] FIGS. 64A-64D depict SSAP-mediated knock-in of transgenes using an R-loop-forming guide without CRISPR components. (FIG. 64A) Model of guide-RNA-SSAP mediated gene editing showing MCP-MS2 aptamer pairing of SSAP and R-loop-gRNA. (FIG. 64B)
Vector/plasmid designs to express guide RNA, dsDNA donor, and SSAP. (FIG. 64C) Plot showing identification of mCherry-expressing subset. (FIG. 64D) Transgene knock-in at the ACTB locus using varying guide lengths (18nt, 20nt, 25nt) with and without RecT.
[00138] FIGS. 65A-65B show R-loop-guide RNA design. The R-loop-guideRNA comprises two components, guide and scaffold, depicted in guide-scaffold and scaffold-guide configurations (e.g., the guide at the 5' or 3' end of the scaffold). The guide sequence is designed to match a target DNA. One or more aptamers can be incorporated, including without limitation MS2, PP7, and BoxB. MS2 is depicted.
[00139] FIG. 66 shows a chimeric guide RNA comprising an MS2/PP7-aptamer.
[00140] FIG. 67 shows the effect of varying guide length on knock-in efficiency at the ACTB locus comparing R-loop-SSAP (no CRISPR), Cas9 HDR, and dCas9-SSAP. Donor-only included only the mKate knock-in donor.
[00141] FIG. 68 shows the effect of varying guide length on knock-in efficiency at the HIST locus comparing R-loop-SSAP (no CRISPR), Cas9 HDR, and dCas9-SSAP. Donor-only included only the mKate knock-in donor.
[00142] FIG. 69 shows the effect of varying guide length on knock-in efficiency at the HSP90AA1 locus comparing R-loop-SSAP (no CRISPR), Cas9 HDR, and dCas9-SSAP. Donor-only included only the mKate knock-in donor.
[00143] FIG. 70 shows R-loop-SSAP mediated knock-in of luciferase-expressing or mKate-expressing 600-bp transgenes at the human albumin (ALB) locus (top) or the AAVS1 locus (bottom) in human HEK293T cells or human hepatocytes. Transgene knock-ins at the albumin locus were highly expressed in hepatocytes but not HEK293T (top). Transgene knock-ins at the AAVS1 locus were highly expressed in HEK293T but not hepatocytes (bottom).
[00144] FIG.
71 shows R-loop-SSAP mediated knock-in of luciferase-expressing or mKate-expressing 800-bp transgenes at the human albumin (ALB) locus (top) or the AAVS1 locus (bottom) in human HEK293T cells or human hepatocytes. Transgene knock-ins at the albumin locus were highly expressed in hepatocytes but not HEK293T (top). Transgene knock-ins at the AAVS1 locus were highly expressed in HEK293T but not hepatocytes (bottom).
[00145] FIG. 72 shows schematics comparing RNA-mediated SSAP editing without reverse transcriptase (top) with RNA-mediated SSAP editing with reverse transcriptase (bottom). The RNA template/donor as depicted includes a Homology Arm (HA) region with one HA. In some
embodiments, the RNA template/donor comprises two HA regions, one at each is matched with the genomic region next to the editing site so SSAP can promote editing.
[00146] FIG. 73 shows the insertion rate for a 4bp sequence inserted at the human EMX1 locus using an in vitro transcribed (IVT) RNA template. System components are 1. gE3 or sg2 or sgHE SpCas9 guideRNA targeting human EMX1; 2. dg2 or dg3 dead/deactivated guideRNA binding to a region near gE3/sg2/sgHE; and 3. 100/200HA: 100bp or 200bp HA region next to the 4bp edits, on both ends.
[00147] FIG. 74 shows a U6-expressed RNA template in sense or anti-sense orientation used to replace a 16bp sequence (install 16bp edits) at human EMX1. System components comprise 1. SpCas9 guideRNA targeting human EMX1; 2. dead/deactivated guideRNA binding to the region. Numbers at the top of each lane of the gel indicate lengths of homology arm (HA) region next to the edits, on both ends.
[00148] FIG. 75 shows the dosage relationship of a U6-expressed RNA template in sense or anti-sense orientation used to replace a 16bp sequence (install 16bp edits) at human EMX1. System components comprise 1. SpCas9 guideRNA targeting human EMX1; 2. dead/deactivated guideRNA binding to a region.
[00149] FIGS. 76A-76B show a system of the invention inserted at the human AAVS1 locus (FIG. 76A) and repair of a defective Venus (green fluorescent protein) locus (FIG. 76B).
[00150] FIG. 77 shows a sgRNA+dgRNA system schematic (top) and example based on the TLR locus (bottom). SpCas9 guide_20bp sgRNA: 20bp guide for sgRNA targeting TLR used in first guideRNA. Also shown are SaCas9 guide designs. dg1-dg6: different 15bp/16bp dead/deactivated guides in dgRNA targeting TLR, used in second guideRNA with aptamer for recruitment.
[00151] FIG. 78 shows a demonstration of the sgRNA+dgRNA system including signal achieved in GFP (green) channel indicating repair of Venus protein.
pA19 encodes Cas9 and the guide RNA with sg318/sg530, which are two guides targeting the TLR gene-editing reporter genome region. BB is backbone, serving as negative control. dg532/534/536/538 are different dead guideRNAs comprising an MS2/PP7 aptamer. The red box indicates that the design with the RNA repair template/donor for Venus at the 3’ end of the RNA works the best.
[00152] FIG. 79 shows a sgRNA+dgRNA system schematic with a direct fusion of RNA
template/donor to dgRNA. In certain embodiments, one or both of the sgRNA and dgRNA can be circular RNA (e.g., a configuration where the 5’ end and 3’ end of the RNA are covalently linked). The circular RNA can enhance stability and efficiency.
[00153] FIG. 80 shows a demonstration of a system with fusion of dgRNA to the RNA template donor and SSAP. The box highlights repair of Venus significantly higher than control.
[00154] FIG. 81 shows a test of pol2 (CMV) vs. pol3 (U6) promoters for TLR genomic editing.
[00155] FIG. 82 shows a sgRNA+dgRNA system example based on the EMX1 locus. gE3: 20bp guide for sgRNA targeting EMX1, used in the first guideRNA; dg1-dg8: 15bp/16bp dead/deactivated guides in dgRNA targeting EMX1, used in second guideRNA with fused RNA template/donor.
[00156] FIG. 83 shows dgRNA with fusion RNA template donor and SSAP targeted at the human EMX1 site. pA19 has Cas9 and the guide RNA with sg334/sg516, which are two guides targeting the EMX1 gene-editing reporter genome region. BB is backbone, serving as negative control. dg518/520/522/526 are different dead guideRNAs that have an MS2/PP7 aptamer, bind to locations near sg334/sg516, and have the RNA template/donor fused by a 36bp linker (36L). The designs are all antisense with 300bp homology arm region (a300+300). Red box: the design with optimal location of dg guideRNA supported higher editing efficiencies.
[00157] FIG. 84 shows a schematic of a system incorporating SSAP and prime editing. An MS2 aptamer recruits SSAP-MCP to a Cas9-RT complex. The system 1) provides a locally reverse-transcribed template donor for SSAP, 2) bypasses the endogenous HDR machinery restricted to dividing cells, and 3) benefits from use of the homology arm region of the template/donor and allows SSAP editing with Cas/nCas/dCas.
[00158] FIG.
85 shows SSAP + prime editing mediated editing at the HEK3 locus (top) and RFN2 locus (bottom). 293T cells were transfected with a Cas9n-RT construct, pegRNA construct, nicking/recruiting sgRNA-MS2 construct, and SSAP-MCP construct.
[00159] FIG. 86 shows different length edits mediated by SSAP + prime editing at the HEK3 locus (top) and RFN2 locus (bottom). 293T cells were transfected with a Cas9n-RT construct, pegRNA construct, nicking/recruiting sgRNA-MS2 construct, and SSAP-MCP construct.
[00160] FIG. 87 shows a schematic of an editing system incorporating SSAP and retron. A retron-SSAP editor has three components: (1) Retron-sgRNA can be subdivided into three regions: the region of RNA that is reverse transcribed (called “msd”) and a region that remains as RNA in
the final molecule (called “msr”), and finally the guide RNA region (guide RNA or other aptamer to recruit SSAP). The gRNA region can be derived from the Cas9 scaffold. This msr/msd RNA helps initiate the RT process that generates reverse-transcribed ssDNA directly linked to sgRNA. (2) The RT of a retron, which recognizes retron RNA and completes reverse transcription of the donor template (a linked RNA-DNA hybrid molecule). This RT can be fused to Cas9 or to MCP-SSAP. (3) The SSAP protein fused to MCP, optionally also fused to RT if RT is not fused to Cas9.
[00161] FIGS. 88A-88D depict SSAP array screening, showing cell viability vs. editing efficiency (fold over negative control (FIGS. 88A, 88C) or percent of mKate knock-in (FIGS. 88B, 88D)) for the ACTB target (FIGS. 88A, 88B) and the HSP90AA1 target (FIGS. 88C, 88D). The positive control is EcRecT.
[00162] FIGS. 89A-89C depict normalized (FIG. 89A) and absolute (FIG. 89B) editing efficiency, comparing activity at two targets, HSP90AA1 and ACTB. FIG. 89C shows cell viability, comparing SSAP use for HSP90AA1 knock-ins with ACTB knock-ins. The positive control is EcRecT.
[00163] FIGS. 90A-90D depict by scatter plot a comparison of cell viability vs. normalized (FIG. 90A) or absolute (FIG. 90B) editing efficiency for all targets combined. Bar graphs compare editing efficiency at two targets, HSP90 and ACTB, normalized (FIG. 90C) or absolute (FIG. 90D) for each of the candidates. The positive control is EcRecT.
[00164] FIG. 91 depicts a tree and sequence alignment for SSAP_16 (1, SEQ ID NO:185), SSAP_10 (2, SEQ ID NO:179), SSAP_36 (3, SEQ ID NO:205), SSAP_152 (4, SEQ ID NO:321), and SSAP_184 (5, SEQ ID NO:353) compared with EcRecT (SEQ ID NO:171). See Table 12.
[00165] FIG. 
92 depicts a tree and sequence alignment for SSAP_16 (1, SEQ ID NO:185), SSAP_10 (2, SEQ ID NO:179), SSAP_36 (3, SEQ ID NO:205), SSAP_152 (4, SEQ ID NO:321), SSAP_184 (5, SEQ ID NO:353), SSAP_197 (6, SEQ ID NO:366), SSAP_305 (7, SEQ ID NO:424), SSAP_210 (8, SEQ ID NO:379), and SSAP_190 (9, SEQ ID NO:359) compared with EcRecT (SEQ ID NO:171). See Table 12.
[00166] FIG. 93 depicts a tree and sequence alignment for SSAP_16 (1, SEQ ID NO:185), SSAP_10 (2, SEQ ID NO:179), SSAP_36 (3, SEQ ID NO:205), SSAP_197 (6, SEQ ID NO:366), and SSAP_210 (8, SEQ ID NO:379) compared with EcRecT (SEQ ID NO:171). See Table 12.
[00167] FIG. 94 depicts an evolution tree of candidate SSAPs. 296 candidates were selected
by applying a set of filters and maximizing evolution distances. The SSAPs cover a phylogenetic family (branches) within the SSAP family.
[00168] FIGS. 95A-95C depict editing efficiencies of 10 top-ranked SSAPs compared to EcRecT and a negative control using the dCas9 editing system. (FIG. 95A) mKate knock-ins at the ACTB locus. (FIG. 95B) mKate knock-ins at the HSP90 locus. (FIG. 95C) Scatter plot depicting editing efficiency at the ACTB target and at the HSP90AA1 target for the candidate SSAPs.
[00169] FIG. 96 compares editing efficiencies in a Cas-free system of 10 top-ranked SSAPs with a negative control (pA25, which expresses MCP-EBFP) and pCK914, which expresses MCP-EcRecT. Gene-editing efficiencies are for knock-in of an 800bp mKate donor with homology arms, in HEK293 human cells, with a 20 base guideRNA matching the target genomic insertion site HSP90AA1 or ACTB. Constructs used are i) guideRNA with MS2 aptamer to recruit SSAP; ii) MCP-SSAP fusion protein; and iii) donor DNA that inserts mKate cargo (without promoter).
[00170] FIGS. 97A-97C depict editing efficiencies of top SSAPs compared to EcRecT using a dCas9 editing system with a transcribed AAV donor in primary mouse hepatocytes. The dCas9 is virally delivered separately using adeno-viral-Cas9 under the control of a strong CMV promoter (Adeno-CMV-Cas9). (FIG. 97A) AAV donor designs: (top) typical AAV donor DNA; (bottom) AAV vector that includes a promoter that transcribes the donor cargo into RNA. The donor RNA is transcribed in anti-sense orientation to avoid cargo expression. A 600bp luciferase cargo was knocked in at the mouse Albumin locus (FIG. 97B) or ACTB locus (FIG. 97C).
[00171] FIGS. 98A-98C depict AAV donor designs and editing efficiency. The dCas9 is virally delivered separately using adeno-viral-Cas9 under the control of a strong CMV promoter (Adeno-CMV-Cas9). (FIG. 98A) top: 5’ release AAV design. 
A second guide RNA is provided to bind/cleave the 5’ end of the cargo (hsgRNA cleavage site adjacent to the left homology arm (Left HA)). middle: 3’ release AAV design. A second guide RNA is provided to bind/cleave the 3’ end of the cargo (hsgRNA cleavage site adjacent to the right homology arm (Right HA)). bottom: Intact AAV design. A 600bp luciferase cargo was knocked in at the mouse Albumin locus (FIG. 98B) or ACTB locus (FIG. 98C).
[00172] FIG. 99 depicts genome engineering across multiple human targets with SSAPs. The editing system included dCas9 (dSpCas9), guideRNA with MS2 aptamer, MCP protein fused to candidate SSAP, and donor DNA inserting a mKate fluorescent protein cargo in-frame into the
indicated endogenous genomic loci.
[00173] FIGS. 100A-100C depict engineering of RecT. Model of RecT in complex with a duplex intermediate of DNA annealing. RecT and DNA strands have extensive interactions (yellow dash lines) with selected protein residues, including a core tyrosine amino acid, highlighted at top right.
[00174] FIGS. 101A-101B depict a model of LiRecT showing interaction with dsDNA (highlighted) consistent with knock-in efficiency of N-terminal truncated mini-SSAP (mSSAP) (See, e.g., FIG. 42).
[00175] FIG. 102 depicts a plasmid map of R2RT-Cas9-GCN4 (11495 bp).
[00176] FIG. 103 depicts a plasmid map of U6-R2Bm_RNA-MS2-guideRNA (8410 bp).
[00177] FIGS. 104A-104L depict a model and gating strategy for detecting in-frame 800bp knock-in encoding fluorescent protein. FIG. 104A: schematic showing dCas9, guide, knock-in donor and MCP-SSAP fusion protein. FIGS. 104B-104D: donor only; FIGS. 104E-104H: dCas9 only; FIGS. 104I-104L: donor + dCas9.
[00178] FIG. 105 depicts comparison of ERF family SSAPs with RecT and SSAP-16. SSAP proteins are RecT (SEQ ID NO:764), SSAP-16 (SEQ ID NO:765), D3_Orf52_Erf (SEQ ID NO:767), ERF-N10 (SEQ ID NO:779), and D3_Orf51_Exo (SEQ ID NO:766). All were tested with Cas9, nCas9, and dCas9 (left-to-right) with each SSAP.
[00179] FIGS. 106A-106C depict vectors according to the invention. FIG. 106A depicts a map of an exemplary lentiviral vector for prime editor (PE) expression. FIG. 106B depicts a map of an exemplary vector for expressing MS2-pegRNA. FIG. 106C depicts a map of a vector for delivery of a reporting construct that undergoes splicing correction.
[00180] FIGS. 107A-107C depict vector maps for expression of SSAP-16 (FIG. 107A), D3 (orf52)-ERF (FIG. 107B) and D3 (orf52)-EXO (FIG. 107C).
[00181] FIG. 108 depicts editing efficiency of a recombineering system of the invention designed to include prime editors and SSAPs. SSAPs are RecT (SEQ ID NO:764), SSAP-16 (SEQ ID NO:765) and SSAP-ERF (SEQ ID NO:767).
[00182] FIG. 
109 depicts splice correction via systems of the invention designed to include prime editors and SSAPs. SSAPs are RecT (SEQ ID NO:764), SSAP-16 (SEQ ID NO:765) and SSAP-ERF (SEQ ID NO:767).
[00183] FIG. 110 depicts an exemplary circular single stranded DNA (cssDNA) donor construct
for homology directed knock-in of mCherry into the RAB11a locus.
[00184] FIGS. 111A-111C depict RAB11A knock-in efficiencies using a cssDNA mKate donor, nCas9 or dCas9, and SSAPs of the invention. FIG. 111A: 90 ng donor / well. FIG. 111B: 30 ng donor / well. FIG. 111C: 10 ng donor / well. In each panel, nCas9 is in columns 1-6 and dCas9 in columns 7-12. SSAPs are RecT (SEQ ID NO:764), SSAP-16 (SEQ ID NO:765), SSAP D3_Orf52_Erf (SEQ ID NO:767) and SSAP ERF-N10 (SEQ ID NO:779).
[00185] The following detailed description, given by way of example, but not intended to limit the invention solely to the specific embodiments described, may best be understood in conjunction with the accompanying drawings.
DETAILED DESCRIPTION OF THE INVENTION
[00186] The present invention is directed to a system and the components for DNA editing. In particular, the disclosed system is based on CRISPR targeting and homology directed repair by phage recombination enzymes. The system results in superior recombination efficiency and accuracy on a kilobase scale.
[00187] To facilitate an understanding of the present technology, a number of terms and phrases are defined below. Additional terminology is set forth throughout the detailed description.
[00188] The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms, or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an” and “the” include plural references unless the context clearly dictates otherwise. The present invention also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.
[00189] For the recitation of numeric ranges herein, each intervening number therebetween with the same degree of precision is explicitly contemplated. 
For example, for the range of 6-9, the numbers 7 and 8 are contemplated in addition to 6 and 9, and for the range 6.0-7.0, the numbers 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, and 7.0 are explicitly contemplated.
[00190] Unless otherwise defined herein, scientific and technical terms used in connection with the present invention shall have the meanings that are commonly understood by those of ordinary skill in the art. For example, any nomenclature used in connection with, and techniques of, cell and tissue culture, molecular biology, immunology, microbiology, genetics and protein and nucleic
acid chemistry and hybridization described herein are those that are well known and commonly used in the art. The meaning and scope of the terms should be clear; in the event, however, of any latent ambiguity, definitions provided herein take precedence over any dictionary or extrinsic definition. Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular.
[00191] The terms “complementary” and “complementarity” refer to the ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid sequence by either traditional Watson-Crick base-pairing or other non-traditional types of pairing. The degree of complementarity between two nucleic acid sequences can be indicated by the percentage of nucleotides in a nucleic acid sequence which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 50%, 60%, 70%, 80%, 90%, and 100% complementary). Two nucleic acid sequences are “perfectly complementary” if all the contiguous nucleotides of a nucleic acid sequence will hydrogen bond with the same number of contiguous nucleotides in a second nucleic acid sequence. Two nucleic acid sequences are “substantially complementary” if the degree of complementarity between the two nucleic acid sequences is at least 60% (e.g., 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100%) over a region of at least 8 nucleotides (e.g., 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more nucleotides), or if the two nucleic acid sequences hybridize under at least moderate, preferably high, stringency conditions. 
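By way of illustration only, the percentage-based definitions of complementarity above can be sketched in Python as follows. The function names, the restriction to canonical Watson-Crick DNA pairs, and the gapless antiparallel alignment of two equal-length sequences are assumptions made for this sketch; non-traditional pairing and the hybridization-based criterion are not modeled.

```python
# Illustrative sketch only. Assumption: only canonical Watson-Crick DNA pairs
# are counted; wobble or other non-traditional pairing is ignored.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(seq_a: str, seq_b: str) -> float:
    """Percent of positions in seq_a (5'->3') that pair with seq_b (5'->3')
    when the two strands are aligned antiparallel without gaps."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    a = seq_a.upper()
    b_reversed = seq_b.upper()[::-1]  # read seq_b 3'->5' so it faces seq_a
    paired = sum((x, y) in WATSON_CRICK for x, y in zip(a, b_reversed))
    return 100.0 * paired / len(a)


def is_substantially_complementary(seq_a: str, seq_b: str,
                                   min_percent: float = 60.0,
                                   min_length: int = 8) -> bool:
    """Apply the 'at least 60% over a region of at least 8 nucleotides' test
    to the whole of two equal-length sequences (hypothetical helper)."""
    return (len(seq_a) >= min_length
            and percent_complementarity(seq_a, seq_b) >= min_percent)
```

For example, under these assumptions `percent_complementarity("ATGCATGC", "GCATGCAT")` evaluates to 100.0, i.e., the two sequences are perfectly complementary, whereas a 4-nucleotide pair fails the length requirement of the substantial-complementarity test regardless of its pairing.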
Exemplary moderate stringency conditions include overnight incubation at 37° C. in a solution comprising 20% formamide, 5×SSC (1×SSC = 150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5×Denhardt’s solution, 10% dextran sulfate, and 20 mg/ml denatured sheared salmon sperm DNA, followed by washing the filters in 1×SSC at about 37-50° C., or substantially similar conditions, e.g., the moderately stringent conditions described in Sambrook et al., infra. High stringency conditions are conditions that, for example, (1) use low ionic strength and high temperature for washing, such as 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate (SDS) at 50° C., (2) employ a denaturing agent during hybridization, such as formamide, for example, 50% (v/v) formamide with 0.1% bovine serum albumin (BSA)/0.1% Ficoll/0.1% polyvinylpyrrolidone (PVP)/50 mM sodium phosphate buffer at pH 6.5 with 750 mM sodium chloride and 75 mM sodium citrate at 42° C., or (3) employ 50% formamide, 5×SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5×Denhardt’s solution, sonicated salmon sperm DNA (50 μg/ml), 0.1% SDS, and 10% dextran sulfate at 42° C., with washes at (i) 42° C. in 0.2×SSC, (ii) 55° C. in 50% formamide, and (iii) 55° C. in 0.1×SSC (preferably in combination with EDTA). Additional details and an explanation of stringency of hybridization reactions are provided in, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (2001); and Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates and John Wiley & Sons, New York (1994).
[00192] A cell has been “genetically modified,” “transformed,” or “transfected” by exogenous DNA, e.g., a recombinant expression vector, when such DNA has been introduced inside the cell. 
The presence of the exogenous DNA results in permanent or transient genetic change. The transforming DNA may or may not be integrated (covalently linked) into the genome of the cell. In prokaryotes, yeast, and mammalian cells, for example, the transforming DNA may be maintained on an episomal element such as a plasmid. With respect to eukaryotic cells, a stably transformed cell is one in which the transforming DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones that comprise a population of daughter cells containing the transforming DNA. A “clone” is a population of cells derived from a single cell or common ancestor by mitosis. A “cell line” is a clone of a primary cell that is capable of stable growth in vitro for many generations.
[00193] As used herein, a “nucleic acid” or a “nucleic acid sequence” refers to a polymer or oligomer of pyrimidine and/or purine bases, preferably cytosine, thymine, and uracil, and adenine and guanine, respectively. The present technology contemplates any deoxyribonucleotide, ribonucleotide, or peptide nucleic acid component, and any chemical variants thereof, such as methylated, hydroxymethylated, or glycosylated forms of these bases, and the like. The polymers or oligomers may be heterogeneous or homogeneous in composition and may be isolated from naturally occurring sources or may be artificially or synthetically produced. In addition, the nucleic acids may be DNA or RNA, or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states. 
In some embodiments, a nucleic acid or nucleic acid sequence comprises other kinds of nucleic acid structures such as, for instance, a DNA/RNA helix, peptide nucleic acid (PNA), morpholino nucleic acid (see, e.g., Braasch and Corey, Biochemistry, 41(14): 4503-4510 (2002) and U.S. Pat. No. 5,034,506, incorporated herein by reference), locked nucleic acid (LNA; see Wahlestedt et al., Proc. Natl. Acad. Sci. U.S.A., 97: 5633-5638 (2000), incorporated herein by reference), cyclohexenyl nucleic acids (see Wang, J. Am. Chem. Soc., 122: 8595-8602 (2000), incorporated herein by reference), and/or a ribozyme. Hence, the term “nucleic acid” or “nucleic acid sequence” may also encompass a chain comprising non-natural nucleotides, modified nucleotides, and/or non-nucleotide building blocks that can exhibit the same function as natural nucleotides (e.g., “nucleotide analogs”); further, the term “nucleic acid sequence” as used herein refers to an oligonucleotide, nucleotide or polynucleotide, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin, which may be single or double-stranded, and represent the sense or antisense strand. The terms “nucleic acid,” “polynucleotide,” “nucleotide sequence,” and “oligonucleotide” are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof.
[00194] A “peptide” or “polypeptide” is a sequence of two or more amino acids linked by peptide bonds. The peptide or polypeptide can be natural, synthetic, or a modification or combination of natural and synthetic. Polypeptides include proteins such as binding proteins, receptors, and antibodies. The proteins may be modified by the addition of sugars, lipids or other moieties not included in the amino acid chain. The terms “polypeptide” and “protein” are used interchangeably herein. 
[00195] As used herein, the term “percent sequence identity” refers to the percentage of nucleotides or nucleotide analogs in a nucleic acid sequence, or amino acids in an amino acid sequence, that are identical with the corresponding nucleotides or amino acids in a reference sequence after aligning the two sequences and introducing gaps, if necessary, to achieve the maximum percent identity. Hence, in case a nucleic acid according to the technology is longer than a reference sequence, additional nucleotides in the nucleic acid that do not align with the reference sequence are not taken into account for determining sequence identity. Methods and computer programs for alignment are well known in the art, including BLAST, Align 2, and FASTA.
[00196] A “vector” or “expression vector” is a replicon, such as plasmid, phage, virus, or cosmid, to which another DNA segment, e.g., an “insert,” may be attached or incorporated so as to bring about the replication of the attached segment in a cell.
[00197] The term “wild-type” refers to a gene or a gene product that has the characteristics of that gene or gene product when isolated from a naturally occurring source. A wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the
“normal” or “wild-type” form of the gene. In contrast, the term “modified,” or “polymorphic” refers to a gene or gene product that displays modifications in sequence and/or functional properties (e.g., altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product.
[00198] RNA-guided CRISPR Recombineering System. In bacteria and archaea, CRISPR/Cas systems provide immunity by incorporating fragments of invading phage, virus, and plasmid DNA into CRISPR loci and using corresponding CRISPR RNAs (“crRNAs”) to guide the degradation of homologous sequences. Each CRISPR locus encodes acquired “spacers” that are separated by repeat sequences. Transcription of a CRISPR locus produces a “pre-crRNA,” which is processed to yield crRNAs containing spacer-repeat fragments that guide effector nuclease complexes to cleave dsDNA sequences complementary to the spacer. Three different types of CRISPR systems are known (type I, type II, and type III), classified based on the Cas protein type and the use of a proto-spacer-adjacent motif (PAM) for selection of proto-spacers in invading DNA. The endogenous type II systems comprise the Cas9 protein and two noncoding crRNAs: trans-activating crRNA (tracrRNA) and a precursor crRNA (pre-crRNA) array containing nuclease guide sequences (also referred to as “spacers”) interspaced by identical direct repeats (DRs). tracrRNA is important for processing the pre-crRNA and formation of the Cas9 complex. First, tracrRNAs hybridize to repeat regions of the pre-crRNA. Second, endogenous RNaseIII cleaves the hybridized crRNA-tracrRNAs, and a second event removes the 5’ end of each spacer, yielding mature crRNAs that remain associated with both the tracrRNA and Cas9. 
Third, each mature complex locates a target double stranded DNA (dsDNA) sequence and cleaves both strands using the nuclease activity of Cas9.
[00199] CRISPR/Cas gene editing systems have been developed to enable targeted modifications to a specific gene of interest in eukaryotic cells. CRISPR/Cas gene editing systems are commonly based on the RNA-guided Cas9 nuclease from the type II prokaryotic clustered regularly interspaced short palindromic repeats (CRISPR) adaptive immune system. Engineering CRISPR/Cas systems for use in eukaryotic cells typically involves reconstitution of the crRNA-tracrRNA-Cas9 complex. In human cells, for example, the Cas9 amino acid sequence may be codon-optimized and modified to include an appropriate nuclear localization signal, and the crRNA and tracrRNA sequences may be expressed individually or as a single chimeric molecule
via an RNA polymerase II promoter. Typically, the crRNA and tracrRNA sequences are expressed as a chimera and are referred to collectively as “guide RNA” (gRNA) or single guide RNA (sgRNA). Thus, the terms “guide RNA,” “single guide RNA,” and “synthetic guide RNA,” are used interchangeably herein and refer to a nucleic acid sequence comprising a tracrRNA and a pre-crRNA array containing a guide sequence. The terms “guide sequence,” “guide,” and “spacer,” are used interchangeably herein and refer to the about 20 nucleotide sequence within a guide RNA that specifies the target site. In CRISPR/Cas9 systems, the guide RNA contains an approximately 20-nucleotide guide sequence that directs Cas9 via Watson-Crick base pairing to a target sequence that is followed by a protospacer adjacent motif (PAM).
[00200] In some embodiments of the invention, there is provided a system or composition for RNA-guided recombineering utilizing tools from CRISPR gene editing systems. The system comprises: a Cas protein, a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence, and a recombination protein. In certain embodiments, the recombination protein comprises a microbial recombination protein. In certain embodiments, the recombination protein comprises a viral recombination protein. In certain embodiments, the recombination protein comprises a eukaryotic recombination protein. In certain embodiments, the recombination protein comprises a mitochondrial recombination protein.
[00201] Cas protein families are described in further detail in, e.g., Haft et al., PLoS Comput. Biol., 1(6): e60 (2005), incorporated herein by reference. The Cas protein may be any Cas endonuclease. In some embodiments, the Cas protein is Cas9 or Cas12a, otherwise referred to as Cpf1. In one embodiment, the Cas9 protein is a wild-type Cas9 protein. The Cas9 protein can be obtained from any suitable microorganism, and a number of bacteria express Cas9 protein orthologs or variants. 
In some embodiments, the Cas9 is from Streptococcus pyogenes or Staphylococcus aureus. Cas9 proteins of other species are known in the art (see, e.g., U.S. Patent Application Publication 2017/0051312, incorporated herein by reference) and may be used in connection with the present invention. The amino acid sequences of Cas proteins from a variety of species are publicly available through the GenBank and UniProt databases.
[00202] In some embodiments, the Cas9 protein is a Cas9 nickase (Cas9n). Wild-type Cas9 has two catalytic nuclease domains facilitating double-stranded DNA breaks. A Cas9 nickase protein is typically engineered through inactivating point mutation(s) in one of the catalytic nuclease domains causing Cas9 to nick or enzymatically break only one of the two DNA strands using the
remaining active nuclease domain. Cas9 nickases are known in the art (see, e.g., U.S. Patent Application Publication 2017/0051312, incorporated herein by reference) and include, for example, Streptococcus pyogenes Cas9 with point mutations at D10 or H840. In select embodiments, the Cas9 nickase is Streptococcus pyogenes Cas9n (D10A).
[00203] In some embodiments, the Cas protein is a catalytically dead Cas. For example, catalytically dead Cas9 is essentially a DNA-binding protein due to, typically, two or more mutations within its catalytic nuclease domains which render the protein with very little or no catalytic nuclease activity. Streptococcus pyogenes Cas9 may be rendered catalytically dead by mutations of D10 and at least one of E762, H840, N854, N863, or D986, typically H840 and/or N863 (see, e.g., U.S. Patent Application Publication 2017/0051312, incorporated herein by reference). Mutations in corresponding orthologs are known, such as N580 in Staphylococcus aureus Cas9. Oftentimes, such mutations cause catalytically dead Cas proteins to possess no more than 3% of the normal nuclease activity.
[00204] In certain embodiments, the system comprises a nucleic acid molecule comprising a guide RNA sequence complementary to a target DNA sequence. The guide RNA sequence, as described above, specifies the target site with an approximately 20-nucleotide guide sequence that directs Cas9 via Watson-Crick base pairing to a target sequence that is followed by a protospacer adjacent motif (PAM).
[00205] In certain embodiments, the system comprises a nucleic acid molecule comprising a deactivated guide RNA (dgRNA) sequence complementary to a target DNA sequence. The deactivated guide is shortened or modified such that a CRISPR complex comprising the dgRNA binds to but does not cut or nick target DNA. 
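By way of illustration only, the relationship between a full-length guide sequence, its PAM-adjacent target site, and a shortened deactivated guide can be sketched in Python as follows. The NGG PAM (the canonical Streptococcus pyogenes Cas9 PAM), the scan of a single DNA strand, the truncation from the 5’ (PAM-distal) end to 15 nucleotides, and all function names are assumptions made for this sketch and are not the design procedure of the disclosure.

```python
import re


def find_protospacers(dna: str, guide_len: int = 20):
    """Yield (start, protospacer, pam) for each guide_len-nt site on the given
    strand that is immediately followed by an NGG PAM (assumed SpCas9 PAM)."""
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    for match in pattern.finditer(dna):
        yield match.start(), match.group(1), match.group(2)


def truncate_to_dead_guide(spacer: str, dead_len: int = 15) -> str:
    """Shorten a spacer from its 5' (PAM-distal) end, keeping the PAM-proximal
    bases. Truncating at the 5' end is an assumption made for illustration."""
    if not 0 < dead_len <= len(spacer):
        raise ValueError("dead_len must be between 1 and the spacer length")
    return spacer[-dead_len:]
```

In this sketch a 20-nucleotide site found by `find_protospacers` would serve as the guide sequence of an active sgRNA, while `truncate_to_dead_guide` returns a 15-nucleotide version of the kind described for dgRNAs; whether a given shortened guide actually abolishes cleavage must be determined experimentally.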
Non-limiting examples include guides such as are described by WO/2016/094872, which are modified in a manner which allows for formation of a CRISPR complex and successful binding to a target, while at the same time, not allowing for successful nuclease activity (e.g., without nuclease activity / without indel activity). The guide nucleic acids can be thought of as catalytically inactive or conformationally inactive with regard to nuclease activity. dgRNAs with short target recognition sequences can dramatically improve Cas9-mediated editing specificity by binding to and shielding off-target sites from an active Cas9 sgRNA complex (Rose et al., Suppression of unwanted CRISPR-Cas9 editing by co-administration of catalytically inactivating truncated guide RNAs. Nature Communications (2020) vol. 11, article 2697). Shortened / modified dgRNAs are used according to the invention to recruit
Cas9-SSAP for cleavage-free knock-in of long sequences.
[00206] The terms “target DNA sequence,” “target nucleic acid,” “target sequence,” and “target site” are used interchangeably herein to refer to a polynucleotide (nucleic acid, gene, chromosome, genome, etc.) to which a guide sequence (e.g., a guide RNA) is designed to have complementarity, wherein hybridization between the target sequence and a guide sequence promotes the formation of a Cas9/CRISPR complex, provided sufficient conditions for binding exist. In some embodiments, the target sequence is a genomic DNA sequence. The term “genomic,” as used herein, refers to a nucleic acid sequence (e.g., a gene or locus) that is located on a chromosome in a cell. The target sequence and guide sequence need not exhibit complete complementarity, provided that there is sufficient complementarity to cause hybridization and promote formation of a CRISPR complex. A target sequence may comprise any polynucleotide, such as DNA or RNA. Suitable DNA/RNA binding conditions include physiological conditions normally present in a cell. Other suitable DNA/RNA binding conditions (e.g., conditions in a cell-free system) are known in the art; see, e.g., Sambrook, referenced herein and incorporated by reference. The strand of the target DNA that is complementary to and hybridizes with the DNA-targeting RNA is referred to as the “complementary strand” and the strand of the target DNA that is complementary to the “complementary strand” (and is therefore not complementary to the DNA-targeting RNA) is referred to as the “noncomplementary strand” or “non-complementary strand.”
[00207] The target genomic DNA sequence may encode a gene product. The term “gene product,” as used herein, refers to any biochemical product resulting from expression of a gene. Gene products may be RNA or protein. 
RNA gene products include non-coding RNA, such as tRNA, rRNA, micro RNA (miRNA), and small interfering RNA (siRNA), and coding RNA, such as messenger RNA (mRNA). In some embodiments, the target genomic DNA sequence encodes a protein or polypeptide.
[00208] In some embodiments, for instance, when the system includes a Cas9 nickase or a catalytically dead Cas9, two nucleic acid molecules comprising a guide RNA sequence may be utilized. The two nucleic acid molecules may have the same or different guide RNA sequences, and are thus complementary to the same or different target DNA sequences. In some embodiments, the guide RNA sequences of the two nucleic acid molecules are complementary to target DNA sequences at opposite ends (e.g., 3’ or 5’) and/or on opposite strands of the insert location.
[00209] In some embodiments, the system further comprises a recruitment system comprising
at least one aptamer sequence and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
[00210] In some embodiments, the aptamer sequence is an RNA aptamer sequence. In some embodiments, the nucleic acid molecule comprising the guide RNA also comprises one or more RNA aptamers, or distinct RNA secondary structures or sequences that can recruit and bind another molecular species, an adaptor molecule, such as a nucleic acid or protein. Several CRISPR systems are compatible with guide RNA insertions and extensions, including but not limited to SpCas9, SaCas9, and LbCas12a (aka Cpf1). The RNA aptamers can be naturally occurring or synthetic oligonucleotides that have been engineered through repeated rounds of in vitro selection or SELEX (systematic evolution of ligands by exponential enrichment) to bind to a specific target molecular species. In some embodiments, the nucleic acid comprises two or more aptamer sequences. The aptamer sequences may be the same or different and may target the same or different adaptor proteins. In select embodiments, the nucleic acid comprises two aptamer sequences.
[00211] Any RNA aptamer/aptamer binding protein pair known may be selected and used in connection with the present invention (see, e.g., Jayasena, S.D., Clinical Chemistry, 1999. 45(9): p. 1628-1650; Gelinas, et al., Current Opinion in Structural Biology, 2016. 36: p. 122-132; and Hasegawa, H., Molecules, 2016; 21(4): p. 421, incorporated herein by reference).
[00212] A number of RNA aptamer binding, or adaptor, proteins exist, including a diverse array of bacteriophage coat proteins. Examples of such coat proteins include but are not limited to: MS2, Qβ, F2, GA, fr, JP501, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, ϕCb5, ϕCb8r, ϕCb12r, ϕCb23r, 7s and PRR1. In some embodiments, the RNA aptamer binds MS2 bacteriophage coat protein or a functional derivative, fragment or variant thereof. 
MS2 binding RNA aptamers commonly have a simple stem-loop structure, classically defined by a 19 nucleotide RNA molecule with a single bulged adenine on the 5’ leg of the stem (Witherell G.W., et al., (1991) Prog. Nucleic Acid Res. Mol. Biol., 40, 185–220, incorporated herein by reference). However, a number of vastly different primary sequences were found to be able to bind the MS2 coat protein (Parrott AM, et al., Nucleic Acids Res. 2000;28(2):489–497, and Buenrostro JD, et al., Nature Biotechnology 2014; 32, 562-568, incorporated herein by reference). Any of the RNA aptamer sequences known to bind the MS2 bacteriophage coat protein may be utilized in connection with the present invention to bind to fusion proteins comprising
Figure imgf000047_0001
MS2. In select embodiments, the MS2 RNA aptamer sequence AACAUGAGGAUCACCCAUGUCUGCAG (SEQ ID NO:145), AGCAUGAGGAUCACCCAUGUCUGCAG (SEQ ID NO:146), or AGCGUGAGGAUCACCCAUGCCUGCAG (SEQ ID NO:147). [00213] N-proteins (Nut-utilization site proteins) of bacteriophages contain arginine-rich conserved RNA recognition motifs of ∼20 amino acids, referred to as N peptides. The RNA aptamer may bind a phage N peptide or a functional derivative, fragment or variant thereof. In some embodiments, the phage N peptide is the lambda or P22 phage N peptide or a functional derivative, fragment or variant thereof. [00214] In select embodiments, the N peptide is lambda phage N22 peptide, or a functional derivative, fragment or variant thereof. In some embodiments, the N22 peptide comprises an amino acid sequence with at least 70% similarity to the amino acid sequence GNARTRRRERRAEKQAQWKAAN (SEQ ID NO:149). N22 peptide, the 22 amino acid RNA- binding domain of the λ bacteriophage antiterminator protein N (λN-(1–22) or λN peptide), is capable of specifically binding to specific stem-loop structures, including but not limited to the BoxB stem-loop. See, for example Cilley and Williamson, RNA 1997; 3(1):57-67, incorporated herein by reference. A number of different BoxB stem-loop primary sequences are known to bind the N22 peptide and any of those may be utilized in connection with the present invention. In some embodiments, the N22 peptide RNA aptamer sequence comprises a nucleotide sequence with at least 70% similarity to an RNA sequence selected from the group consisting of GCCCUGAAAAAGGGC (SEQ ID NO:150), GCCCUGAAGAAGGGC (SEQ ID NO:151), GCGCUGAAAAAGCGC (SEQ ID NO:152), GCCCUGACAAAGGGC (SEQ ID NO:153), and GCGCUGACAAAGCGC (SEQ ID NO:154). In some embodiments, the N22 peptide RNA aptamer sequence is selected from the group consisting of SEQ ID NOs: 150-154. [00215] In select embodiments, the N peptide is the P22 phage N peptide, or a functional derivative, fragment or variant thereof. 
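As an illustration of the stem-loop architecture of the BoxB sequences of SEQ ID NOs: 150-154 recited above, the following minimal Python sketch counts the Watson-Crick base pairs that can form inward from the 5’ and 3’ ends of each sequence and reports the unpaired loop that remains. The function name `closing_stem` and the restriction to canonical A-U and G-C pairs are assumptions made for illustration only; the sketch is not a structure-prediction method and is not part of the disclosed compositions.

```python
# Illustrative sketch only: counts canonical base pairs inward from the two
# ends of an RNA hairpin. The function name and pairing rule are assumptions.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# BoxB sequences as recited in the text for the N22 peptide.
BOXB_APTAMERS = {
    "SEQ ID NO:150": "GCCCUGAAAAAGGGC",
    "SEQ ID NO:151": "GCCCUGAAGAAGGGC",
    "SEQ ID NO:152": "GCGCUGAAAAAGCGC",
    "SEQ ID NO:153": "GCCCUGACAAAGGGC",
    "SEQ ID NO:154": "GCGCUGACAAAGCGC",
}


def closing_stem(rna):
    """Return (stem_length, loop) for pairs formed from the ends inward."""
    i, j = 0, len(rna) - 1
    while i < j and (rna[i], rna[j]) in WATSON_CRICK:
        i += 1
        j -= 1
    return i, rna[i:j + 1]


for name, seq in BOXB_APTAMERS.items():
    stem, loop = closing_stem(seq)
    print(f"{name}: {stem} bp stem, loop {loop}")
```

Under these assumptions, each of the five listed sequences yields a 5 bp stem closing a 5 nucleotide loop (for example, GAAAA for SEQ ID NO:150 and GACAA for SEQ ID NO:153).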
A number of different BoxB stem-loop primary sequences are known to bind the P22 phage N peptide and variants thereof and any of those may be utilized in connection with the present invention. See, for example, Cocozaki, Ghattas, and Smith, Journal of Bacteriology 2008; 190(23):7699-7708, incorporated herein by reference. In some embodiments, the P22 phage N peptide comprises an amino acid sequence with at least 70% similarity to the amino acid sequence GNAKTRRHERRRKLAIERDTI (SEQ ID NO:155). In some embodiments, the P22 phage N peptide RNA aptamer sequence comprises a sequence with at least 70% similarity to an RNA sequence selected from the group consisting of GCGCUGACAAAGCGC (SEQ ID NO:156) and CCGCCGACAACGCGG (SEQ ID NO:157). In some embodiments, the P22 phage N peptide RNA aptamer sequence is selected from the group consisting of SEQ ID NOs: 156-157, UGCGCUGACAAAGCGCG (SEQ ID NO:158), and ACCGCCGACAACGCGGU (SEQ ID NO:159).
[00216] In certain embodiments, different aptamer/aptamer binding protein pairs can be selected to bring together a combination of recombination proteins and functions.
[00217] In some embodiments, the aptamer sequence is a peptide aptamer sequence. The peptide aptamers can be naturally occurring or synthetic peptides that are specifically recognized by an affinity agent. Such aptamers include, but are not limited to, a c-Myc affinity tag, an HA affinity tag, a His affinity tag, an S affinity tag, a methionine-His affinity tag, an RGD-His affinity tag, a 7× His tag (SEQ ID NO:763), a FLAG octapeptide, a strep tag or strep tag II, a V5 tag, or a VSV-G epitope. Corresponding aptamer binding proteins are well-known in the art and include, for example, primary antibodies, biotin, affimers, single domain antibodies, and antibody mimetics.
[00218] An exemplary peptide aptamer includes a GCN4 peptide (Tanenbaum et al., Cell 2014; 159(3):635-646, incorporated herein by reference).
Antibodies or GCN4 binding proteins can be used as the aptamer binding proteins.
[00219] In some embodiments, the peptide aptamer sequence is conjugated to the Cas protein. The peptide aptamer sequence may be fused to the Cas protein in any orientation (e.g., N-terminus to C-terminus, C-terminus to N-terminus, N-terminus to N-terminus). In select embodiments, the peptide aptamer is fused to the C-terminus of the Cas protein.
[00220] In some embodiments, between 1 and 24 peptide aptamer sequences may be conjugated to the Cas protein. The aptamer sequences may be the same or different and may target the same or different aptamer binding proteins. In select embodiments, 1 to 24 tandem repeats of the same peptide aptamer sequence are conjugated to the Cas protein. In preferred embodiments, between 4 and 18 tandem repeats are conjugated to the Cas protein. The individual aptamers may be separated by a linker region. Suitable linker regions are known in the art. The linker may be flexible or configured to allow the binding of affinity agents to adjacent aptamers without or with decreased steric hindrance. The linker sequences may provide an unstructured or linear region of the polypeptide, for example, with the inclusion of one or more glycine and/or serine residues. The linker sequences can be at least about 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acids in length.
[00221] In some embodiments, the fusion protein comprises a recombination protein functionally linked to an aptamer binding protein. In some embodiments, the recombination protein comprises a microbial recombination protein. In some embodiments, the recombination protein comprises a recombinase. In certain embodiments, the recombination protein comprises 5’-3’ exonuclease activity. In certain embodiments, the recombination protein comprises 3’-5’ exonuclease activity. In certain embodiments, the recombination protein comprises ssDNA binding activity. In certain embodiments, the recombination protein comprises ssDNA annealing activity.
[00222] The bacteriophage λ-encoded genetic recombination machinery, named the λ red system, comprises the exo and bet genes, assisted by the gam gene, together designated λ red genes. Exo is a 5’-3’ exonuclease which targets dsDNA and Bet is a ssDNA-binding protein. Bet functions include protecting ssDNA from degradation and promoting annealing of complementary ssDNA strands. Another bacteriophage system found in E. coli is the Rac prophage system, comprising recE and recT genes which are functionally similar to exo and bet. In some embodiments, the microbial recombination protein may be RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof.
[00223] Recombination proteins and functional fragments thereof useful in the invention include nucleases, ssDNA-binding proteins (SSBs), and ssDNA annealing proteins (SSAPs). Among microbial proteins, these include, without limitation, E.
coli proteins such as ExoI (xonA; sbcB), ExoIII (xthA), ExoIV (orn), ExoVII (xseA, xseB), ExoIX (ygdG), ExoX (exoX), DNA Pol I 5’ Exo (ExoVI) (polA), DNA Pol I 3’ Exo (ExoII) (polA), DNA Pol II 3’ Exo (polB), DNA Pol III 3’ Exo (dnaQ, mutD), RecBCD (recB, recC, recD), and RecJ (recJ) and their functional fragments.
[00224] While double-stranded DNA contains genetic information, use of the information involves single-stranded intermediates. Whereas the single-stranded intermediates form secondary structures and are sensitive to chemical and nucleolytic degradation, cells encode ssDNA binding proteins (SSBs) that bind to and stabilize ssDNA. Useful SSBs include, without limitation, SSBs of prokaryotes, bacteriophage, eukaryotes, mammals, mitochondria, and viruses. While SSBs are found in every organism, the proteins themselves share surprisingly little sequence similarity, and may differ in subunit composition and oligomerization states. SSB proteins may comprise certain structural features. One is use of oligonucleotide/oligosaccharide-binding (OB) domains to bind ssDNA through a combination of electrostatic and base-stacking interactions with the phosphodiester backbone and nucleotide bases. Another feature is oligomerization that brings together DNA-binding OB folds. Eukaryotic SSBs are regulated by phosphorylation on serine and threonine residues. Tyrosine phosphorylation of microbial SSBs is observed in taxonomically distant bacteria and substantially increases affinity for ssDNA. The human mitochondrial ssDNA-binding protein is structurally similar to SSB from Escherichia coli (EcoSSB), but lacks the C-terminal disordered domain. Eukaryotic replication protein A (RPA) shares function, but not sequence homology, with bacterial SSB. The herpes simplex virus (HSV-1) SSB, ICP8, is a nuclear protein that, along with other replication proteins, is required for viral DNA replication.
[00225] Without being bound by theory, it is thought that exonuclease activities and ssDNA binding activities of the recombination proteins of the invention uncover and protect single stranded regions of template and target DNAs, thereby facilitating recombination. Also, targeting can be cooperative, involving target-directed CRISPR-mediated nicking of chromosomal DNA coordinated with recombination directed by homology arms designed into template DNAs. In certain embodiments of the invention, off-target effects are minimized. For example, whereas targeted recombination involves coordinated CRISPR and recombination functions, at off-target sites, homology with the HR template DNA is absent and nick repair may be favored.
[00226] Single stranded DNA annealing proteins (SSAPs) also are ubiquitous among organisms with diverse sequences and have been classified into families and superfamilies by bioinformatics and experimental analysis. Moreover, phages are recognized to encode their own SSAP recombinases, which substitute for classic RecA proteins while functioning with host proteins to control DNA metabolism. Steczkiewicz et al. classified SSAPs into seven families (RecA, Gp2.5, RecT/Redβ, Erf, Rad52/22, Sak3, and Sak4) organized into three superfamilies including prokaryotes, eukaryotes, and phage (Steczkiewicz et al., 2021, Front. Microbiol 12:644622). Non-limiting examples of SSAPs that can be used according to the invention are provided in Table 7. Any one or more of the SSAPs can be employed in the invention.
[00227] In certain embodiments, a microbial recombination protein is RecE or RecT, or a derivative or variant thereof. Derivatives or variants of RecE and RecT are functionally equivalent proteins or polypeptides which possess substantially similar function to wild-type RecE and RecT.
RecE and RecT derivatives or variants include biologically active amino acid sequences similar to the wild-type sequences but differing due to amino acid substitutions, additions, deletions, truncations, post-translational modifications, or other modifications. In some embodiments, the derivatives may improve translation, purification, biological half-life, or activity, or eliminate or lessen any undesirable side effects or reactions. The derivatives or variants may be naturally occurring polypeptides, synthetic or chemically synthesized polypeptides, or genetically engineered polypeptides. RecE and RecT bioactivities are known to, and easily assayed by, those of ordinary skill in the art, and include, for example, exonuclease activity and single-stranded nucleic acid binding, respectively.
[00228] The RecE or RecT may be from a number of organisms, including Escherichia coli, Pantoea brenneri, Type-F symbiont of Plautia stali, Providencia sp. MGF014, Shigella sonnei, and Pseudobacteriovorax antillogorgiicola, among others. Other non-limiting sources include Desulfotalea psychrophila, Lactococcus lactis, Flavobacterium psychrophilum, Mycobacterium smegmatis, Lactobacillus rhamnosus, Psychrobacter arcticus, Psychrobacter cryohalolentis, Psychromonas ingrahamii, Photobacterium profundum, Psychroflexus torquis, and Caulobacter crescentus. In certain embodiments, the RecE or RecT protein is derived from Escherichia coli.
[00229] In some embodiments, the fusion protein comprises RecE, or a derivative or variant thereof. The RecE, or derivative or variant thereof, may comprise an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8. The RecE, or derivative or variant thereof, may comprise an amino acid sequence with at least 70% (e.g., 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8.
In select embodiments, the RecE, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8. In exemplary embodiments, the RecE, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-3.
[00230] In some embodiments, the fusion protein comprises RecT, or a derivative or variant thereof. The RecT, or derivative or variant thereof, may comprise an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14. The RecT, or derivative or variant thereof, may comprise an amino acid sequence with at least 70% (e.g., 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14. In select embodiments, the RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14. In exemplary embodiments, the RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to the amino acid sequence of SEQ ID NO:9.
[00231] In certain embodiments, the fusion protein comprises a recombination protein comprising an amino acid sequence at least 75% similar, or at least 75% identical, to a recombination protein of SEQ ID NO:166 to SEQ ID NO:491, a recombination protein of Table 9, a recombination protein of SEQ ID NO:179, SEQ ID NO:185, SEQ ID NO:205, SEQ ID NO:321, SEQ ID NO:353, SEQ ID NO:359, SEQ ID NO:366, SEQ ID NO:424, or SEQ ID NO:479, or a recombination protein of SEQ ID NO:166, SEQ ID NO:168, SEQ ID NO:169, SEQ ID NO:170, SEQ ID NO:171, SEQ ID NO:241, SEQ ID NO:253, SEQ ID NO:290, SEQ ID NO:408, SEQ ID NO:411, or SEQ ID NO:442. In certain embodiments, the fusion protein comprises a recombination protein comprising a sequence having at least 80%, at least 85%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 98.5%, at least 99%, at least 99.5%, or 100% similarity or identity to the above referenced recombination proteins. Truncations may be from either the C-terminal or N-terminal ends, or both. For example, as demonstrated in Example 6 below, a diverse set of truncations from either end or both provided a functional product. In some embodiments, one or more (2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120 or more) amino acids may be truncated from the C-terminal end, the N-terminal end, or both, as compared to the wild-type sequence.
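The percent-identity thresholds and terminal truncations described above can be illustrated with a short Python sketch. It assumes that the two sequences have already been aligned to equal length (gaps written as `-`) and computes identity over the aligned columns. The function names, the gap convention, and the toy sequences are hypothetical and are not taken from the Sequence Listing. Similarity scoring, which also credits conservative substitutions, would additionally require a substitution matrix and is not shown.

```python
# Illustrative sketch only; names, gap convention, and sequences are assumptions.

def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns (ignoring gap-gap columns) with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


def truncate(seq, n_term=0, c_term=0):
    """Remove n_term residues from the N-terminus and c_term from the C-terminus."""
    if n_term < 0 or c_term < 0 or n_term + c_term >= len(seq):
        raise ValueError("truncation must leave at least one residue")
    return seq[n_term:len(seq) - c_term]


# Toy 10-residue example with a single substitution (I -> L).
reference = "MKTAYIAKQR"
variant = "MKTAYLAKQR"
print(percent_identity(reference, variant))        # 90.0
print(meets_threshold(reference, variant, 90.0))   # True
print(truncate(reference, n_term=2, c_term=3))     # TAYIA
```

In practice the alignment itself would come from a standard pairwise alignment tool, and the denominator convention (alignment length versus the length of the shorter sequence) should be stated, since it changes the reported percentage.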
[00232] In some embodiments, the recombination protein comprises a tyrosine recombinase or functional fragment thereof. In some embodiments, the recombination protein comprises a serine recombinase or functional fragment thereof. In some embodiments, the recombination protein comprises an integrase, resolvase, or invertase, or functional fragment thereof. In some embodiments, the recombination protein comprises a site-specific recombinase protein or functional fragment thereof. In some embodiments, the recombination protein comprises an exonuclease or functional fragment thereof. In some embodiments, the recombination protein comprises an ssDNA-binding protein or functional fragment thereof. In certain embodiments, the fusion protein comprises, without limitation, Hin, Gin, Tn3, β/six, CinH, Min, ParA, γδ, Bxb1, φC31, TP901-1, TG1, Wβ, φ370.1, φK38, φBT1, R4, φRV1, φFC1, MR11, A118, U153, Bxz2, gp29, Cre, Dre, Vika, Flp, Kw, SprA, HK022, P22, L1, or L5, or a homolog of any of such proteins or functional fragment thereof. Such recombinases, which may be classified in the art as integrases, resolvases, or invertases, may share substructures and activities with exonucleases and SSBs and be used according to the invention.
[00233] The invention provides a system which comprises a reverse transcriptase, a guide nucleic acid, and a recombination protein, and optionally a Cas protein.
[00234] The term “reverse transcriptase” describes a class of polymerases characterized as RNA-dependent DNA polymerases. All known reverse transcriptases require a primer to synthesize a DNA transcript from an RNA template. Historically, reverse transcriptase has been used primarily to transcribe mRNA into cDNA which can then be cloned into a vector for further manipulation. Avian myeloblastosis virus (AMV) reverse transcriptase was the first widely used RNA-dependent DNA polymerase (Verma, Biochim. Biophys. Acta 473:1 (1977)). The enzyme has 5'-3' RNA-directed DNA polymerase activity, 5'-3' DNA-directed DNA polymerase activity, and RNase H activity. RNase H is a processive 5' and 3' ribonuclease specific for the RNA strand of RNA-DNA hybrids (Perbal, A Practical Guide to Molecular Cloning, New York: Wiley & Sons (1984)). Errors in transcription cannot be corrected by reverse transcriptase because known viral reverse transcriptases lack the 3'-5' exonuclease activity necessary for proof-reading (Saunders and Saunders, Microbial Genetics Applied to Biotechnology, London: Croom Helm (1987)). A detailed study of the activity of AMV reverse transcriptase and its associated RNase H activity has been presented by Berger et al., Biochemistry 22:2365-2372 (1983). Another reverse transcriptase which is used extensively in molecular biology is reverse transcriptase originating from Moloney murine leukemia virus (M-MLV).
See, e.g., Gerard, G. R., DNA 5:271-279 (1986) and Kotewicz, M. L. et al., Gene 35:249-258 (1985). M-MLV reverse transcriptase substantially lacking in RNase H activity has also been described. See, e.g., U.S. Pat. No. 5,244,797. The invention contemplates the use of any such reverse transcriptases, or variants or mutants thereof (RT). With regard to RT, and linkers or ways to functionally link components of embodiments of the invention, such as the RT system or composition of the invention (as well as with regard to linkers or ways to functionally link components of systems or compositions discussed herein that do not involve RT), mention is made of WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, WO2020/191171 and WO2021/226558, which involve what is known as prime editing and twin prime editing. Each of these publications is hereby incorporated herein by reference. The RTs of these publications, and their linkers or ways to functionally link components, can be used in the practice of the present invention. Mention is also made of US Patent Publications US2014/0349400 and US2018/0298391, both incorporated herein by reference, which involve systems containing Cas9, a reverse transcriptase, guide RNA and RNA for the activity of the reverse transcriptase; reverse transcriptases and other aspects of these earlier systems can be used in the practice of the present invention.
[00235] WO2020/191153 describes a system comprising a CRISPR protein (e.g., a Cas9 nickase) and a reverse transcriptase for use with a guide RNA that specifies a target site and templates synthesis of a desired edit in the form of a replacement DNA strand by way of an extension (either DNA or RNA) engineered onto a guide nucleic acid (e.g., at the 5' or 3' end, or at an internal portion of a guide RNA). Through DNA repair and/or replication machinery, the endogenous strand of the target site is replaced by the newly synthesized replacement strand containing the desired edit.
The invention provides a single stranded binding protein (e.g., SSAP or SSB) used with a reverse transcriptase to edit without CRISPR-mediated nicking or cleavage of target DNA.
[00236] Background regarding RT systems and compositions: Current genome editing technology is limited by the low efficiency and accuracy of precision editing, which makes current tools such as the CRISPR system unreliable for introducing accurate replacements, deletions, or insertions in mammalian cells. The usual process involves delivery of a gene editing tool (such as CRISPR) and a DNA repair template to introduce the desired changes to the genome sequence. However, the DNA delivered into the cell can insert non-specifically into off-target genomic loci or unintended targets, which poses a major challenge for gene editing for therapeutic purposes.
[00237] Description regarding RT systems and compositions: Here Applicants describe the invention using RNA as a molecular entity to mediate gene editing. Applicants designed and validated components of systems and methods to apply RNA as a template (donor) to insert, delete, replace, or control genomic DNA sequences, mediated through the activity of an SSAP (single-strand annealing protein, exemplified by RecT, lambda Red, and T7 gp2.5).
[00238] In a first embodiment, Applicants here show the efficiency of gene editing through the process of delivering three components into a cell: (1) Applicants introduced local DNA cleavage, nicking, or R-loop formation using a CRISPR system composed of a CRISPR enzyme (Cas9/nCas9/dCas9 or Cas12a/nCas12a/dCas12a, corresponding respectively to cleavage/nicking/R-loop formation) and a guide RNA, where the guide RNA contains an aptamer (such as MS2, PP7, or BoxB) to recruit the SSAP protein; (2) an RNA sequence bearing the desired DNA changes with one or more homology arm (HA) region(s) that is either fused/linked to the guide RNA in (1), or fused/linked to a second guide RNA. The HA region is at least 20 bp and provides a homology region next to the editing site for SSAP-mediated editing. When a second guide RNA is used, it binds to a nearby genomic site located between 0 bp and 150 bp away from the site of the guide RNA in (1). This second guide RNA then forms a complex with a CRISPR enzyme (such as Cas9/nCas9/dCas9 or Cas12a/nCas12a/dCas12a), is recruited to the target genomic locus, and serves to provide the RNA template/donor for the editing. The enzymes are either regular CRISPR enzymes or Cas proteins, but could also be nicking or deactivated CRISPR enzymes (dCas9, dCas12a, etc.) that only bind to target loci.
The guide is a regular guide RNA or a shorter guide RNA (typically 2 to 6 bp shorter than the regular guide RNA, i.e., 14 bp to 18 bp) to allow efficient binding but not cleavage of targets. (3) An SSAP protein fused to an RNA-aptamer-binding protein (RBP) via a linker. The RBP is MS2 coat protein (MCP), PP7 coat protein (PCP), or the BoxB binding peptide from lambda phage (lambda N22 peptide). For this component, Applicants also identified an additional factor that enhances this RNA-templated SSAP gene editing: fusing a reverse transcriptase (RT) to the SSAP protein via a long peptide linker, making this third component RBP-SSAP-RT or RBP-RT-SSAP (where each - represents a linker), further enhances editing efficiency.
[00239] In the second embodiment, the Cas9/nCas9/dCas9 or Cas12a/nCas12a/dCas12a protein is fused via a linker to a reverse transcriptase (RT). The guide RNA in this design also comprises a primer binding site (PBS) of at least 14 bp, which is complementary to a region at the editing site. This PBS helps to initiate RT activity. Alternatively, another design uses the same guide RNA as in the first embodiment and, to initiate RT activity, a short oligo DNA (14 bp or more in length) that is complementary to a region at the editing site is supplied to the cell. This oligo DNA initiates RT activity and allows SSAP-mediated gene editing.
[00240] In the third embodiment, the Cas9/nCas9/dCas9 or Cas12a/nCas12a/dCas12a protein is fused via a linker to a reverse transcriptase (RT) from a retron system. The guide RNA in this design has an msr/msd sequence from the retron, and also one or more homology arm (HA) region(s), which are complementary to a region at the editing site. The msr/msd sequence helps to initiate RT activity. The HA region helps to mediate SSAP gene editing.
[00241] Overall, this suite of tools and methods provides novel and nonobvious RNA-mediated/RNA-templated gene editing in eukaryotic/mammalian cells. Applicants further demonstrated that, by designing a cleavable RNA template using endogenous tRNA, a ribozyme, or the direct repeat from the Cas12a system, multiple-target gene editing can also be achieved using RNA as the template.
[00242] Description regarding Prime Editing systems and compositions: Here Applicants describe the invention using SSAPs to enhance editing methods that employ RNA as a molecular entity with reverse transcriptases to mediate gene editing. Applicants designed and validated components of systems and methods that apply RNA in prime editing as a template (donor) to insert, delete, replace, or control genomic DNA sequences. Prime-editing has been generally described by Anzalone et al. (Anzalone et al., Nature. 2019; 576:149-157).
Prime-editors use an engineered reverse transcriptase fused to Cas9 nickase and a prime-editing guide RNA (pegRNA). The pegRNA differs from regular sgRNAs and plays a major role in the system's function. The pegRNA contains not only (a) the sequence complementary to the target sites that directs nCas9 to its target sequence, but also (b) an additional sequence spelling the desired sequence changes (Anzalone et al., Nature. 2019; 576:149-157). The 5′ of the pegRNA binds to the primer binding site (PBS) region on the DNA, exposing the non-complementary strand. The unbound DNA of the PAM-containing strand is nicked by Cas9, creating a primer for the reverse transcriptase (RT) that is linked to nCas9. The nicked PAM-strand is then extended by the RT by using the interior of the pegRNA as a template, consequently modifying the target region in a programmable manner. The result of this step is two redundant PAM DNA flaps: the edited 3′ flap that was reverse transcribed from the pegRNA and the original, unedited 5′ flap. The choice of which flap hybridizes with the non-PAM containing DNA-strand is an equilibrium process, in which the perfectly complementary 5′ flap would likely be thermodynamically favored. However, the 5′ flaps are preferentially degraded by cellular endonucleases that are ubiquitous during lagging-strand DNA synthesis (Hosfield et al., Cell. 1998; 95:135-146). Finally, the resulting heteroduplex containing the unedited strand and edited 3′ flap is resolved and stably integrated into the host genome via cellular replication and repair processes.
[00243] The first generation of PEs (PE1) comprised Moloney murine leukemia virus reverse transcriptase (M-MLV RT) linked to the C-terminus of nCas9, and a pegRNA, which was expressed on a second plasmid. PE1 reached a maximum editing efficiency of 0.7-5.5% (Anzalone et al., Nature. 2019; 576:149-157). To further enhance the efficiency of the reverse transcriptase, Anzalone and colleagues tested different M-MLV RT variants that have been shown to enhance binding, enzyme processivity, and thermostability. As was previously applied to enhance editing in CBE and ABE systems, a separate sgRNA was directed to introduce a nick in the non-edited strand, thus directing DNA repair to that strand using the edited strand as a template. This yielded another generation prime-editor, designated PE3, which performed all 12 possible transition and transversion mutations (24 single-nucleotide substitutions) with average editing efficiencies of 33% (±7.9%) (Anzalone et al., Nature. 2019; 576:149-157). The number of off-target effects observed with PEs was reduced, likely due to the need for complementation at Cas9 binding, PBS binding, and RT product complementation for flap resolution (Anzalone et al., Nature. 2019; 576:149-157).
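The count of 12 possible transition and transversion mutations mentioned above follows from simple enumeration, which the following minimal Python sketch makes explicit. It lists every ordered pair of distinct DNA bases and classifies each change as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion. The variable and function names are illustrative assumptions; the sketch only enumerates mutation classes and does not model the 24 substitutions tested in the cited study.

```python
from itertools import permutations

# Illustrative sketch only; names are assumptions.
PURINES = {"A", "G"}  # C and T are pyrimidines


def classify(ref, alt):
    """Transition if both bases are purines or both are pyrimidines."""
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


# Every ordered pair of distinct bases is one class of point mutation.
point_mutations = {f"{ref}>{alt}": classify(ref, alt)
                   for ref, alt in permutations("ACGT", 2)}

transitions = sorted(m for m, kind in point_mutations.items()
                     if kind == "transition")
transversions = sorted(m for m, kind in point_mutations.items()
                       if kind == "transversion")

print(len(point_mutations), len(transitions), len(transversions))  # 12 4 8
print(transitions)  # ['A>G', 'C>T', 'G>A', 'T>C']
```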
Prime-editing shows other advantages over previous CRISPR-mediated base-editing approaches, including less stringent PAM requirements due to the varied length of the RT template and reduced “bystander” editing.
[00244] In certain embodiments, prime editor systems use Cas9 nuclease instead of a Cas9 nickase. (See, e.g., Adikusuma, F. et al., Nucleic Acids Research. 2021; 49(18):10785-10795). In certain embodiments, prime editor systems employ two or more prime editors (e.g., “twin prime editing”) which operate on both strands of a target DNA to promote editing of large pieces of DNA or strands of different targets to promote recombination (Anzalone et al., Nature Biotechnology. 2021; 40(5):731-740). The prime editor designated “PEmax” includes optimizations of the PE2 protein by varying RT codon usage, SpCas9 mutations, NLS sequences, and the length and composition of peptide linkers between nCas9 and RT (Chen, PJ et al., Cell. 2021; 552).
[00245] In certain embodiments, for example, the targeted modifications may be introduced using recombination proteins of the invention with a technique described in US Patent Nos. 11932884, 11898179, 11795443, 11732274, 11560566, 11542509, 11542496, 11447770 and/or 11268082 and/or International Patent Publications WO2015/089406, WO2017/070632, WO2018/176009, WO2020/191241, WO2020/191248, WO2021/226558, WO2021/072328 and/or WO2021/155065.
[00246] Features regarding RT systems and compositions: There are five advantages of Applicants’ RNA-templated SSAP gene editing system: (1) it has reduced off-target effects or toxicity due to the use of RNA, which is less immunogenic than the DNA used in existing gene editing processes and cannot be integrated directly into unintended genomic DNA sites or off-target DNA sites; (2) Applicants easily multiplex the precision gene editing methods by using cleavable RNA templates in Applicants’ methods; (3) RNA is easier to deliver into cells, easier to manufacture, and less expensive to scale up for clinical usage; (4) RNA has substantial engineering potential, since it can be combined with other regulatory or combinatorial payloads/components via chemical linkage or biochemical coupling to enable more efficient delivery, editing, or synergistic action of RNA-templated gene editing with other types of gene editing or therapeutic modalities; and (5) the efficiency of RNA-templated gene editing could be enhanced via RNA and protein factors and is orthogonal to regular DNA-repair pathways that may be critical for the health of target cells.
[00247] RNA-guided Recombination Protein System. In certain embodiments of the invention, there is provided a system or composition for RNA-guided recombineering that does not rely primarily on CRISPR proteins.
In such embodiments, the system or composition comprises: a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence; and a recombination protein. In certain embodiments, the system or composition is capable of promoting R-loop formation. In certain embodiments, the system or composition is capable of recombination. In certain embodiments, the system or composition is free of CRISPR proteins. In certain embodiments, the recombination protein comprises a microbial recombination protein. In certain embodiments, the recombination protein comprises a viral recombination protein. In certain embodiments, the recombination protein comprises a eukaryotic recombination protein. In certain embodiments, the recombination protein comprises a
mitochondrial recombination protein. In various embodiments, the recombination protein comprises a single-stranded DNA annealing protein (SSAP), a single-stranded DNA binding protein (SSB), an exonuclease, or a combination of two or more thereof. In certain embodiments, the system or composition does not comprise a Cas9. In certain embodiments, the system or composition does not comprise a Cas12a. In certain embodiments, the system or composition does not comprise a Cas. In certain embodiments, the system or composition does not comprise a CRISPR. [00248] Without being bound by theory, the system can be thought of as comprising a guide nucleic acid that promotes R-loop formation by binding to target DNA and a recombination protein that promotes recombination between the target nucleic acids and donor nucleic acids. [00249] In certain embodiments, the guide RNA and the recombination protein are effectively linked. In some embodiments, the linkage is covalent. In some embodiments, the linkage is non-covalent. In certain embodiments, the guide nucleic acid comprises an aptamer sequence and the recombination protein comprises or is joined to an aptamer binding domain. The following table provides non-limiting examples of R-loop guide nucleic acids used in the invention. Table 1. R-loop guide nucleic acid sequences
(Table 1 is reproduced as page images in the source; only the following row fragments are legible.)
RL-gRNA-simple-…: R-loop guideRNA double boxB (AAACAAC linker, one boxB …); sequence fragments: AAACAAACggccGCCCTGAAGAAGGGC … ggccCTGTCTCTCgccagcGCCCTGAAGAA …
RL-gRNA-ms2-3: R-loop guideRNA design3 (enhanced scaffold with two MS2 …); sequence fragments: gtttAagagctaggccAACATGAGGATC ACCCATGTCTGCAGggcctagcAAG …
[00250] The RLoop-guideRNA comprises a guide component and a scaffold component in various arrangements, e.g., guide-scaffold and scaffold-guide. In certain embodiments, the RLoop-guideRNA comprises the guide at the 5' end of the scaffold. In certain embodiments, the RLoop-guideRNA comprises the guide at the 3' end of the scaffold. [00251] The guide sequence is engineered to bind to target DNA (genome target). In certain embodiments, the guide is from 17 to 160 bases. The scaffold comprises one or more aptamer sequences. Aptamers used in the invention include, without limitation, MS2, PP7, BoxB, and others. In certain embodiments, the fusion protein comprises an RNA binding component that binds to an aptamer such as is described above and an SSAP such as, but not limited to, RecT, LambdaRed, T7gp2.5, and others. [00252] Donor nucleic acids can be single-stranded or double-stranded DNA and comprise (1) homology arms (HA) of various lengths to match a genomic target region, and (2) a transgene, e.g., a knock-in sequence or replacement sequence. There is no limit to the size of the transgene. Insertions of 600 bp (FIG. 70) and 800 bp (FIG. 71) are exemplified herein. [00253] In certain embodiments, an RLoop-guideRNA binds to an RNA-binding protein or domain fused to a recombination protein such as, but not limited to, an SSAP. [00254] In various embodiments, the invention provides fusion proteins. In a fusion protein, a recombination protein may be linked to either terminus of an aptamer binding protein in any orientation (e.g., N-terminus to C-terminus, C-terminus to N-terminus, N-terminus to N-terminus). In select embodiments, a recombination protein N-terminus is linked to the aptamer binding protein C-terminus. Thus, the overall fusion protein from N- to C-terminus comprises the aptamer binding protein (N- to C-terminus) linked to the recombination protein (N- to C-terminus). [00255] In some embodiments, the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to a nuclease. 
In some embodiments, the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to an endonuclease. In some embodiments, the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to an exonuclease. In some embodiments, the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to a nuclease and/or a Cas or dCas. In some embodiments, the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to an endonuclease and/or a Cas or dCas. In some embodiments, the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to an exonuclease and/or a Cas or dCas. In
some embodiments, the recombination protein may be expressed independently from, and not as a fusion protein with, a nuclease. In some embodiments, the recombination protein may be expressed independently from, and not as a fusion protein with, an endonuclease. In some embodiments, the recombination protein may be expressed independently from, and not as a fusion protein with, an exonuclease. In some embodiments, the recombination protein may be expressed independently from, and not as a fusion protein with, a nuclease and/or a Cas or dCas. In some embodiments, the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to an aptamer and/or aptamer binding protein. In some embodiments, the recombination protein may be expressed independently from, and not as a fusion protein with, an aptamer and/or aptamer binding protein. In some embodiments, the recombination protein may be functionally linked as a fusion protein or chimera or chimeric molecule to a nuclease and/or Cas or dCas and/or to an aptamer and/or aptamer binding protein. In some embodiments, the recombination protein may be expressed independently from, and not as a fusion protein with, a nuclease and/or a Cas or dCas and/or an aptamer and/or aptamer binding protein. In some embodiments, the aptamer and/or aptamer binding protein is an MCP protein. In some embodiments, the recombination protein may be an SSAP. [00256] The term “nuclease,” as used herein, refers to an agent, such as a protein or small molecule, that is capable of cleaving phosphodiester bonds that join nucleotide residues in a nucleic acid molecule. In some embodiments, the nuclease is a protein, e.g., an enzyme that is capable of binding to a nucleic acid molecule and cleaving phosphodiester bonds linking nucleotide residues in the nucleic acid molecule. The nuclease may be an endonuclease, which cleaves a phosphodiester bond within a polynucleotide strand, or an exonuclease, which cleaves a phosphodiester bond at the end of a polynucleotide strand. 
In some embodiments, the nuclease is a site-specific nuclease that binds to and/or cleaves a particular phosphodiester bond within a particular nucleotide sequence, which is also referred to herein as a “recognition sequence,” “nuclease target site,” or “target site.” In some embodiments, the nuclease is an RNA-guided (e.g., RNA-programmable) nuclease that complexes with (e.g., binds to) RNA having a sequence complementary to the target site, thereby providing sequence specificity of the nuclease. In some embodiments, the nuclease recognizes a single-stranded target site, while in other embodiments, the nuclease recognizes a double-stranded target site, e.g., a double-stranded DNA target site. Target sites for many naturally occurring nucleases, for example many naturally occurring DNA restriction nucleases, are well known to those skilled in the art. In many cases, DNA nucleases such as EcoRI, HindIII or BamHI recognize palindromic double-stranded DNA target sites that are 4 to 10 base pairs in length and cut each of the two DNA strands at specific positions within the target site. Some endonucleases symmetrically cleave a double-stranded nucleic acid target site, i.e., cleave both strands at the same position, such that the ends comprise base-paired nucleotides, also referred to herein as blunt ends. Other endonucleases cleave double-stranded nucleic acid target sites asymmetrically, i.e., each strand is cleaved at a different position such that the ends contain unpaired nucleotides. Unpaired nucleotides at the ends of a double-stranded DNA molecule are also referred to as “overhangs,” e.g., “5'-overhangs” or “3'-overhangs,” depending on whether the unpaired nucleotide forms the 5' or 3' end of the corresponding DNA strand. 
The ends of a double-stranded DNA molecule that terminate in unpaired nucleotides are also referred to as sticky ends, because they can “stick” to the ends of other double-stranded DNA molecules that contain complementary unpaired nucleotides. Nuclease proteins typically comprise a “binding domain” that mediates interaction of the protein with a nucleic acid substrate (in some cases also specifically binding to a target site) and a “cleavage domain” that catalyzes the cleavage of phosphodiester bonds within the nucleic acid backbone. In some embodiments, the nuclease protein is capable of binding and cleaving a nucleic acid molecule in a monomeric form, while in other embodiments, the nuclease protein must dimerize or multimerize in order to cleave a target nucleic acid molecule. Binding and cleavage domains of naturally occurring nucleases, as well as modular binding and cleavage domains that can be fused to create nucleases, are well known to those of skill in the art. For example, a zinc finger or transcriptional activator-like element can be used as a binding domain to specifically bind a desired target site and fused or conjugated to a cleavage domain, such as the cleavage domain of FokI, to create an engineered nuclease that cleaves the target site. [00257] Non-limiting examples of an exonuclease include exonuclease I, exonuclease II, exonuclease III, exonuclease IV, exonuclease V, exonuclease VII, exonuclease VIII, lambda exonuclease, Xrn1, mung bean nuclease, TREX2, exonuclease T, T7 exonuclease, strandase exonuclease, 3’-5’ exophosphodiesterase, and Bal31 nuclease. [00258] In some embodiments, the fusion protein further comprises a linker between the recombination protein and the aptamer binding protein. The linkers may comprise any amino acid sequence of any length. The linkers may be flexible such that they do not constrain either of the two components they link together in any particular orientation. The linkers may essentially act as
a spacer. In select embodiments, the linker links the C-terminus of the recombination protein to the N-terminus of the aptamer binding protein. In select embodiments, the linker comprises the amino acid sequence of the 16-residue XTEN linker, SGSETPGTSESATPES (SEQ ID NO:15) or the 37-residue EXTEN linker, SASGGSSGGSSGSETPGTSESATPESSGGSSGGSGGS (SEQ ID NO:148). [00259] In some embodiments, the fusion protein further comprises a nuclear localization sequence (NLS). The nuclear localization sequence may be at any location within the fusion protein (e.g., C-terminal of the aptamer binding protein, N-terminal of the aptamer binding protein, C-terminal of the recombination protein). In select embodiments, the nuclear localization sequence is linked to the C-terminus of the recombination protein. A number of nuclear localization sequences are known in the art (see, e.g., Lange, A., et al., J Biol Chem. 2007; 282(8): 5101-5105, incorporated herein by reference) and may be used in connection with the present invention. The nuclear localization sequence may be the SV40 NLS, PKKKRKV (SEQ ID NO:16); the Ty1 NLS, NSKKRSLEDNETEIKVSRDTWNTKNMRSLEPPRSKKRIH (SEQ ID NO:17); the c-Myc NLS, PAAKRVKLD (SEQ ID NO:18); the biSV40 NLS, KRTADGSEFESPKKKRKV (SEQ ID NO:19); or the Mut NLS, PEKKRRRPSGSVPVLARPSPPKAGKSSCI (SEQ ID NO:20). In select embodiments, the nuclear localization sequence is the SV40 NLS, PKKKRKV (SEQ ID NO:16). [00260] The Cas protein and the fusion protein are desirably included in a single composition alone, in combination with each other, and/or with the polynucleotide(s) (e.g., a vector) comprising the guide RNA sequence and the aptamer sequence. The Cas protein and/or the fusion protein may or may not be physically or chemically bound to the polynucleotide. The Cas protein and/or the recombination protein can be associated with a polynucleotide using any suitable method for protein-protein linking or protein-virus linking known in the art. 
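By way of illustration only, the fusion protein layout of paragraphs [00254], [00258] and [00259] (aptamer binding protein, linker, recombination protein and NLS, read from N- to C-terminus) can be sketched in Python as below. This is a minimal sketch under stated assumptions and is not taken from the specification: the function name `assemble_fusion` and the two short placeholder protein strings are hypothetical, and only the linker and NLS sequences are those recited above.

```python
# Illustrative sketch only; the helper name and placeholder proteins are hypothetical.
XTEN_LINKER = "SGSETPGTSESATPES"                        # SEQ ID NO:15
EXTEN_LINKER = "SASGGSSGGSSGSETPGTSESATPESSGGSSGGSGGS"  # SEQ ID NO:148
SV40_NLS = "PKKKRKV"                                    # SEQ ID NO:16

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def assemble_fusion(aptamer_binding_protein, recombination_protein,
                    linker=XTEN_LINKER, nls=SV40_NLS):
    """Join the parts N- to C-terminus: binder, linker, recombination protein, NLS."""
    parts = {
        "aptamer_binding_protein": aptamer_binding_protein,
        "linker": linker,
        "recombination_protein": recombination_protein,
        "nls": nls,
    }
    for name, seq in parts.items():
        if not seq:
            raise ValueError(f"{name} is empty")
        unknown = set(seq.upper()) - AMINO_ACIDS
        if unknown:
            raise ValueError(f"{name} has non-standard residues: {sorted(unknown)}")
    # Dicts preserve insertion order, so the parts are joined in the order listed.
    return "".join(seq.upper() for seq in parts.values())


# Hypothetical placeholders, NOT real MCP or SSAP sequences.
PLACEHOLDER_BINDER = "MASKG"
PLACEHOLDER_RECOMBINASE = "MTKQP"

fusion = assemble_fusion(PLACEHOLDER_BINDER, PLACEHOLDER_RECOMBINASE)
print(fusion)
```

Passing `linker=EXTEN_LINKER` swaps in the longer linker. The other orientations mentioned in paragraph [00254] would need a different joining order, which this sketch does not attempt.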
[00261] The invention further provides compositions and vectors comprising a polynucleotide comprising a nucleic acid sequence encoding a fusion protein comprising a recombination protein functionally linked to an RNA aptamer binding protein. [00262] The compositions or vectors may further comprise one or both of a polynucleotide comprising a nucleic acid sequence encoding a Cas protein and a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence. In some embodiments, the nucleic acid molecule comprising a guide RNA sequence further comprises at least one RNA aptamer sequence. In some embodiments, the polynucleotide comprising a nucleic acid sequence encoding a Cas protein further comprises a sequence encoding at least one peptide aptamer sequence. [00263] Descriptions of the nucleic acid molecule comprising a guide RNA sequence, the aptamer sequences, the Cas proteins, the recombination proteins, and the aptamer binding proteins set forth above in connection with the inventive system also are applicable to the polynucleotides of the recited compositions and vectors. [00264] The nucleic acid sequence encoding the Cas protein and/or the nucleic acid sequence encoding a fusion protein comprising a recombination protein functionally linked to an aptamer binding protein can be provided to a cell on the same vector (e.g., in cis) as the nucleic acid molecule comprising the guide RNA sequence and/or the RNA aptamer sequence. In such embodiments, a unidirectional promoter can be used to control expression of each nucleic acid sequence. In another embodiment, a combination of bidirectional and unidirectional promoters can be used to control expression of multiple nucleic acid sequences. 
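As a rough illustration of the nucleic acid components referred to above, the guide and donor layouts of paragraphs [00250] to [00252] (a guide of 17 to 160 bases placed at the 5' or 3' end of an aptamer-bearing scaffold, and a donor made of homology arms flanking a transgene) can be sketched in Python as follows. This is a sketch under stated assumptions, not the applicants' design procedure: the function names, the `guide_end` parameter and every sequence used are hypothetical placeholders, and the sequences are written in DNA format.

```python
# Illustrative sketch only; names and sequences are hypothetical placeholders.
DNA_BASES = set("ACGT")
MIN_GUIDE_LEN, MAX_GUIDE_LEN = 17, 160  # guide length range from paragraph [00251]


def _check_dna(name, seq):
    """Return the sequence upper-cased, or raise if it is empty or not plain A/C/G/T."""
    if not seq or set(seq.upper()) - DNA_BASES:
        raise ValueError(f"{name} must be a non-empty A/C/G/T string")
    return seq.upper()


def build_rloop_guide(guide, scaffold, guide_end="5prime"):
    """Place the guide at the 5' or 3' end of the scaffold (paragraph [00250])."""
    guide = _check_dna("guide", guide)
    scaffold = _check_dna("scaffold", scaffold)
    if not MIN_GUIDE_LEN <= len(guide) <= MAX_GUIDE_LEN:
        raise ValueError(
            f"guide must be {MIN_GUIDE_LEN}-{MAX_GUIDE_LEN} bases, got {len(guide)}"
        )
    if guide_end == "5prime":
        return guide + scaffold
    if guide_end == "3prime":
        return scaffold + guide
    raise ValueError("guide_end must be '5prime' or '3prime'")


def build_donor(left_arm, transgene, right_arm):
    """Flank the transgene with left and right homology arms (paragraph [00252])."""
    named = (("left_arm", left_arm), ("transgene", transgene), ("right_arm", right_arm))
    return "".join(_check_dna(name, seq) for name, seq in named)


# Placeholder inputs chosen only to exercise the functions.
guide = "ACGT" * 5            # 20 bases
scaffold = "GGCCTTAAGGCC"     # stand-in for an aptamer-bearing scaffold
rl_guide = build_rloop_guide(guide, scaffold)
donor = build_donor("A" * 50, "C" * 600, "G" * 50)
print(len(rl_guide), len(donor))
```

The 600-base placeholder transgene mirrors the insertion size mentioned for FIG. 70. The arm lengths are arbitrary, since the text leaves homology arm length open.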
[00265] In other embodiments, a nucleic acid sequence encoding the Cas protein, the nucleic acid sequence encoding a fusion protein comprising a recombination protein functionally linked to an aptamer binding protein, and the nucleic acid molecule comprising the guide RNA sequence and/or the RNA aptamer sequence can be provided to a cell on separate vectors (e.g., in trans). Each of the nucleic acid sequences in each of the separate vectors can comprise the same or different expression control sequences. The separate vectors can be provided to cells simultaneously or sequentially. [00266] The vector(s) comprising the nucleic acid sequences encoding the Cas protein and encoding a fusion protein comprising a recombination protein functionally linked to an aptamer binding protein can be introduced into a host cell that is capable of expressing the polypeptide encoded thereby, including any suitable prokaryotic or eukaryotic cell. As such, the invention provides an isolated cell comprising the vector or nucleic acid sequences disclosed herein. Preferred host cells are those that can be easily and reliably grown, have reasonably fast growth rates, have well characterized expression systems, and can be transformed or transfected easily and efficiently. Examples of suitable prokaryotic cells include, but are not limited to, cells from the genera Bacillus (such as Bacillus subtilis and Bacillus brevis), Escherichia (such as E. coli), Pseudomonas, Streptomyces, Salmonella, and Erwinia. Suitable eukaryotic cells are known in the
art and include, for example, yeast cells, insect cells, and mammalian cells. Exemplary yeast cells include those from the genera Kluyveromyces, Pichia, Rhinosporidium, Saccharomyces, and Schizosaccharomyces. Exemplary insect cells include Sf-9 and HI5 (Invitrogen, Carlsbad, Calif.) and are described in, for example, Kitts et al., Biotechniques, 14: 810-817 (1993); Luckow, Curr. Opin. Biotechnol., 4: 564-572 (1993); and Luckow et al., J. Virol., 67: 4566-4579 (1993), incorporated herein by reference. Desirably, the host cell is a mammalian cell, and in some embodiments, the host cell is a human cell. A number of suitable mammalian and human host cells are known in the art, and many are available from the American Type Culture Collection (ATCC, Manassas, Va.). Examples of suitable mammalian cells include, but are not limited to, Chinese hamster ovary cells (CHO) (ATCC No. CCL61), CHO DHFR- cells (Urlaub et al., Proc. Natl. Acad. Sci. USA, 77: 4216-4220 (1980)), human embryonic kidney (HEK) 293 or 293T cells (ATCC No. CRL1573), and 3T3 cells (ATCC No. CCL92). Other suitable mammalian cell lines are the monkey COS-1 (ATCC No. CRL1650) and COS-7 cell lines (ATCC No. CRL1651), as well as the CV-1 cell line (ATCC No. CCL70). Further exemplary mammalian host cells include primate, rodent, and human cell lines, including transformed cell lines. Normal diploid cells, cell strains derived from in vitro culture of primary tissue, as well as primary explants, are also suitable. Other suitable mammalian cell lines include, but are not limited to, mouse neuroblastoma N2A cells, HeLa, HEK, A549, HepG2, mouse L-929 cells, and BHK or HaK hamster cell lines. Methods for selecting suitable mammalian host cells and methods for transformation, culture, amplification, screening, and purification of cells are known in the art. [00267] Methods of Altering Target DNA. The invention also provides a method of altering a target DNA. 
In some embodiments, the method alters a genomic DNA sequence in a cell, although any desired nucleic acid may be modified. When applied to DNA contained in cells, the method comprises introducing the systems, compositions, or vectors described herein into a cell comprising a target genomic DNA sequence. Descriptions of the nucleic acid molecule comprising a guide RNA sequence, the Cas proteins, the recombination proteins, the recruitment systems, and polynucleotides encoding them, the cell, the target genomic DNA sequence, and components thereof, set forth above in connection with the inventive system are also applicable to the method of altering a target genomic DNA sequence in a cell. The systems, compositions, or vectors may be introduced in any manner known in the art including, but not limited to, chemical transfection, electroporation, microinjection, biolistic delivery via gene guns, or magnetic-assisted transfection,
depending on the cell type. [00268] In certain embodiments, delivery of editing systems or components comprises delivery of a ribonucleoprotein (RNP) complex. According to the invention, targeting nucleic acids, including but not limited to gRNAs and dgRNAs, can be provided in complexes, such as, without limitation, complexes comprising Cas9 or dCas9. In certain embodiments, an RNP complex comprises a guide nucleic acid and a Cas9 fusion protein, such as, without limitation, a complex comprising dCas9-SSAP. In certain embodiments, an RNP complex comprises a guide nucleic acid and a recombination protein, e.g., an SSAP or SSB, which may be adapted or modified to bind to the guide nucleic acid. In certain embodiments, the guide nucleic acid and the recombination protein or Cas9 fusion protein comprise binding elements that promote complex formation. In a non-limiting example, a recombination protein comprises an MCP domain and a guide RNA comprises an MS2 aptamer, whereby binding of the MS2 aptamer to the MCP domain produces an RNP. [00269] In some embodiments, the guide RNA and the Cas and/or recombination protein polypeptide are incubated together to form a ribonucleoprotein (RNP) complex prior to introduction into a cell, for example mixed together in a vessel to form an RNP complex, and then the RNP complex is introduced into the cell. In other embodiments, the Cas polypeptide described herein can be provided as an mRNA encoding the Cas polypeptide, which Cas mRNA is introduced into the primary cell together with the modified sgRNA as an “All RNA” CRISPR system. [00270] In some embodiments, the RNP complex and donor nucleic acid or vector are concomitantly introduced into a cell. In other embodiments, the RNP complex and the donor nucleic acid or vector are sequentially introduced into the primary cell. In some instances, the RNP complex is introduced into the primary cell before the donor. In other instances, the donor is introduced into the primary cell before the RNP complex. 
For example, the RNP complex can be introduced into a cell about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 90, 120, 150, 180, 210, or 240 minutes or more before the donor nucleic acid or vector, or vice versa. US Pat. 11,193,141 describes introduction of an RNP complex and a homologous donor adeno-associated viral (AAV) vector into a cell to mediate targeted integration. The methods described can be used with the instant invention. US Patent Publication 2019/0093128 describes introducing into the zygote a ribonucleoprotein (RNP) comprising a class 2 CRISPR/Cas endonuclease complexed with a corresponding CRISPR/Cas guide RNA that
hybridizes to a target sequence within the genomic DNA of the zygote. [00271] Non-limiting examples include use of (1) purified Cas9 or dCas9 protein; (2) synthesized guideRNA with MS2 aptamer; (3) purified MCP-SSAP fusion protein; and (4) donor DNA (double- or single-stranded DNA donor for HEK293 and K562 cells, and AAV donor for HSC), delivered into HEK293 cells, K562 cells, and primary hematopoietic stem cells (mouse and human) for knock-in editing. [00272] The following table provides exemplary sequences for generating knock-ins, including at ALB and AAVS1. The sequences can be employed in RNPs, nucleic acids, and vectors, for expression, and the like. Table 2. Sequences for knock-ins
(Table 2 is reproduced as page images in the source and its rows cannot be reconstructed; the legible fragments are listed below in order of appearance.)
AAGTAACTTAGAGTGACTGAAACTTCACAGAGCT AGCctgacctcttctcttcctcccacagggctcgagagatctggcagcggaatg …
Homology Arm (right): AGATGGTAAATATACACAAGGGATTTAGTCAAAC AATTTTTTGGCAAGAATATTATGAATTTTGTAATC …
AAGTTAAAATATTGATGAATCAAATTTAATGTTTC TAATAGTGTTGTTTATTATTCTAAAGTGCTTATATT …
gataaggccagtagccagccccgtcctggcagggctgtggtgaggaggggggt gtccgtgtggaaaactccctttgtgagaatggtgcgtcctaggtgttcaccaggtcg …
tgttcctccgtgcgtcagttttacctgtgagataaggccagtagccagccccgtcct ggcagggctgtggtgaggaggggggtgtccgtgtggaaaactccctttgtgagaa …
Row label fragments: “targeting human HBA1 for …” (SEQ ID NO:593); “modified base, “*” indicate …”; “sequence shown in DNA format”.
[…] genomic DNA sequence, the guide RNA sequence binds to the target genomic DNA sequence in the cell genome, the Cas protein associates with the guide RNA and may induce a double-strand break or single-strand nick in the target genomic DNA sequence, and the aptamer recruits the recombination proteins to the target genomic DNA sequence through the aptamer binding protein of the fusion protein, thereby altering the target genomic DNA sequence in the cell. When introducing the compositions or vectors described herein into the cell, the nucleic acid molecule comprising a guide RNA sequence, the Cas9 protein, and the fusion protein are first expressed in the cell. [00274] In some embodiments, the cell is in an organism or host, such that introducing the disclosed systems, compositions, or vectors into the cell comprises administration to a subject. The method may comprise providing or administering to the subject, in vivo or by transplantation of ex vivo treated cells, the systems, compositions, or vectors of the present system. [00275] A “subject” may be human or non-human and may include, for example, animal strains or species used as “model systems” for research purposes, such as a mouse model as described herein. Likewise, subject may include either adults or juveniles (e.g., children). Moreover, subject may
mean any living organism, preferably a mammal (e.g., human or non-human), that may benefit from the administration of compositions contemplated herein. Examples of mammals include, but are not limited to, any member of the class Mammalia: humans; non-human primates such as chimpanzees and other apes and monkey species; farm animals such as cattle, horses, sheep, goats, and swine; domestic animals such as rabbits, dogs, and cats; laboratory animals including rodents, such as rats, mice and guinea pigs; and the like. Examples of non-mammals include, but are not limited to, birds, fish and the like. In one embodiment of the methods and compositions provided herein, the mammal is a human. Plants include, without limitation, sugar cane, corn, wheat, rice, oil palm fruit, potatoes, soy beans, vegetables, cassava, sugar beets, tomatoes, barley, bananas, watermelon, onions, sweet potatoes, cucumbers, apples, seed cotton, oranges, and the like. [00276] As used herein, the terms “providing,” “administering,” and “introducing” are used interchangeably and refer to the placement of the systems of the invention into a subject by a method or route which results in at least partial localization of the system to a desired site. The systems can be administered by any appropriate route which results in delivery to a desired location in the subject. [00277] The phrase “altering a DNA sequence,” as used herein, refers to modifying at least one physical feature of a DNA sequence of interest. DNA alterations include, for example, single- or double-strand DNA breaks, deletion or insertion of one or more nucleotides, and other modifications that affect the structural integrity or nucleotide sequence of the DNA sequence. The modifications of a target sequence in genomic DNA may lead to, for example, gene correction, gene replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, gene mutation, gene knock-down, and the like. 
[00278] In some embodiments, the systems and methods described herein may be used to correct one or more defects or mutations in a gene (referred to as “gene correction”). In such cases, the target genomic DNA sequence encodes a defective version of a gene, and the system further comprises a donor nucleic acid molecule which encodes a wild-type or corrected version of the gene. Thus, in other words, the target genomic DNA sequence is a “disease-associated” gene. The term “disease-associated gene” refers to any gene or polynucleotide whose gene products are expressed at an abnormal level or in an abnormal form in cells obtained from a disease-affected individual as compared with tissues or cells obtained from an individual not affected by the disease. A disease-associated gene may be expressed at an abnormally high level or at an abnormally low
level, where the altered expression correlates with the occurrence and/or progression of the disease. A disease-associated gene also refers to a gene, the mutation or genetic variation of which is directly responsible for, or is in linkage disequilibrium with a gene(s) that is responsible for, the etiology of a disease. Examples of genes responsible for such “single gene” or “monogenic” diseases include, but are not limited to, adenosine deaminase, α-1 antitrypsin, cystic fibrosis transmembrane conductance regulator (CFTR), β-hemoglobin (HBB), oculocutaneous albinism II (OCA2), Huntingtin (HTT), dystrophia myotonica-protein kinase (DMPK), low-density lipoprotein receptor (LDLR), apolipoprotein B (APOB), neurofibromin 1 (NF1), polycystic kidney disease 1 (PKD1), polycystic kidney disease 2 (PKD2), coagulation factor VIII (F8), dystrophin (DMD), phosphate-regulating endopeptidase homologue, X-linked (PHEX), methyl-CpG-binding protein 2 (MECP2), and ubiquitin-specific peptidase 9Y, Y-linked (USP9Y). Other single gene or monogenic diseases are known in the art and described in, e.g., Chial, H. Rare Genetic Disorders: Learning About Genetic Disease Through Gene Mapping, SNPs, and Microarray Data, Nature Education 1(1):192 (2008), incorporated herein by reference; Online Mendelian Inheritance in Man (OMIM); and the Human Gene Mutation Database (HGMD). [00279] The invention provides knock-ins of large transgenes at therapeutically relevant loci in the human genome. In certain embodiments, the locus provides cell- or tissue-specific expression. In certain embodiments, the invention comprises insertion of nucleic acids into the albumin (ALB) locus. The ALB locus provides for liver targeting in human hepatocytes and is highly expressed in a liver-specific manner. In certain embodiments, the invention comprises insertion of nucleic acids into the AAVS1 locus. 
The AAVS1 locus is a safe-harbor locus for gene therapy that is well expressed in certain tissue types and can be used in a wide variety of treatments, with low expression in liver. US Patent Publication 2018/0214490 A1 describes gene therapy for lysosomal storage diseases, including targeting transgenes to safe harbor loci such as the AAVS1, HPRT and CCR5 genes in human cells, and Rosa26 in murine cells. US Patent 9267154 describes integration of exogenous nucleic acid sequences into the PPP1R12C locus, which is widely expressed in most tissues. […] describes cell-specific expression by targeting transgenes (e.g., encoding chimeric antigen receptors (CARs)) to the T-cell receptor α constant (TRAC) locus. These are exemplary and non-limiting as to loci that can be targeted according to the invention. [00280] In another embodiment, the target genomic DNA sequence can comprise a gene, the
mutation of which contributes to a particular disease in combination with mutations in other genes. Diseases caused by the contribution of multiple genes which lack simple (e.g., Mendelian) inheritance patterns are referred to in the art as “multifactorial” or “polygenic” diseases. Examples of multifactorial or polygenic diseases include, but are not limited to, asthma, diabetes, epilepsy, hypertension, bipolar disorder, and schizophrenia. Certain developmental abnormalities also can be inherited in a multifactorial or polygenic pattern and include, for example, cleft lip/palate, congenital heart defects, and neural tube defects. [00281] In another embodiment, the method of altering a target genomic DNA sequence can be used to delete nucleic acids from a target sequence in a cell by cleaving the target sequence and allowing the cell to repair the cleaved sequence in the absence of an exogenously provided donor nucleic acid molecule. Deletion of a nucleic acid sequence in this manner can be used in a variety of applications, such as, for example, to remove disease-causing trinucleotide repeat sequences in neurons, to create gene knock-outs or knock-downs, and to generate mutations for disease models in research. [00282] The term “donor nucleic acid molecule” refers to a nucleotide sequence that is inserted into the target DNA (e.g., genomic DNA). As described above, the donor DNA may include, for example, a gene or part of a gene, a sequence encoding a tag or localization sequence, or a regulating element. The donor nucleic acid molecule may be of any length. In some embodiments, the donor nucleic acid molecule is between 10 and 10,000 nucleotides in length. 
For example, the donor nucleic acid molecule may be between about 100 and 5,000 nucleotides in length, between about 200 and 2,000 nucleotides in length, between about 500 and 1,000 nucleotides in length, between about 500 and 5,000 nucleotides in length, between about 1,000 and 5,000 nucleotides in length, or between about 1,000 and 10,000 nucleotides in length. [00283] The disclosed systems and methods overcome challenges encountered during conventional gene editing, including low efficiency and off-target events, particularly with kilobase-scale nucleic acids. In some embodiments, the disclosed systems and methods improve the efficiency of gene editing. For example, the disclosed systems and methods can have a 2- to 10-fold increase in efficiency over conventional CRISPR-Cas9 systems and methods, as shown in Examples 2, 3, and 5. In some embodiments, the improvement in efficiency is accompanied by a reduction in off-target events. The off-target events may be reduced by greater than 50% compared to conventional CRISPR-Cas9 systems and methods; for example, a reduction of off-target events by about 90% is shown in Example 3. Another aspect of increasing the overall accuracy of a gene editing system is reducing the on-target insertion-deletions (indels), a byproduct of HDR editing. In some embodiments, the disclosed systems and methods reduce the on-target indels by greater than 90% compared to conventional CRISPR-Cas9 systems and methods, as shown in Example 3. [00284] The invention further provides kits containing one or more reagents or other components useful, necessary, or sufficient for practicing any of the methods described herein. For example, kits may include CRISPR reagents (Cas protein, guide RNA, vectors, compositions, etc.), recombineering reagents (recombination protein-aptamer binding protein fusion protein, the aptamer sequence, vectors, compositions, etc.), 
transfection or administration reagents, negative and positive control samples (e.g., cells, template DNA), cells, containers housing one or more components (e.g., microcentrifuge tubes, boxes), detectable labels, detection and analysis instruments, software, instructions, and the like. [00285] The RNAs may be delivered using adeno-associated virus (AAV), lentivirus, adenovirus or other viral vector types, or combinations thereof. The RNAs can be packaged into one or more viral vectors. In some embodiments, the viral vector is delivered to the tissue of interest by, for example, an intramuscular injection, while in other embodiments the viral delivery is via intravenous, transdermal, intranasal, oral, mucosal, or other delivery methods. Such delivery may be either via a single dose, or multiple doses. One skilled in the art understands that the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector chosen, the target cell, organism, or tissue, the general condition of the subject to be treated, the degree of transformation/modification sought, the administration route, the administration mode, the type of transformation/modification sought, etc. [00286] Such a dosage may further contain, for example, a carrier (water, saline, ethanol, glycerol, lactose, sucrose, calcium phosphate, gelatin, dextran, agar, pectin, peanut oil, sesame oil, etc.), a diluent, a pharmaceutically-acceptable carrier (e.g., phosphate-buffered saline), a pharmaceutically-acceptable excipient, and/or other compounds known in the art. Such a dosage formulation is readily ascertainable by one skilled in the art. The dosage may further contain one or more pharmaceutically acceptable salts such as, for example, a mineral acid salt such as a hydrochloride, a hydrobromide, a phosphate, a sulfate, etc.; and the salts of organic acids such as acetates, propionates, malonates, benzoates, etc. 
Additionally, auxiliary substances, such as wetting or emulsifying agents, pH buffering substances, gels or gelling materials, flavorings,
colorants, microspheres, polymers, suspension agents, etc. may also be present. One or more other conventional pharmaceutical ingredients, such as preservatives, humectants, suspending agents, surfactants, antioxidants, anticaking agents, fillers, chelating agents, coating agents, chemical stabilizers, etc. may also be present, especially if the dosage form is a reconstitutable form. Suitable exemplary ingredients include microcrystalline cellulose, carboxymethylcellulose sodium, polysorbate 80, phenylethyl alcohol, chlorobutanol, potassium sorbate, sorbic acid, sulfur dioxide, propyl gallate, the parabens, ethyl vanillin, glycerin, phenol, parachlorophenol, gelatin, albumin, and a combination thereof. A thorough discussion of pharmaceutically acceptable excipients is available in REMINGTON'S PHARMACEUTICAL SCIENCES (Mack Pub. Co., N.J. 1991) which is incorporated by reference herein. [00287] In an embodiment herein the delivery is via an adenovirus, which may be at a single booster dose containing at least 1×10⁵ particles (also referred to as particle units, pu) of adenoviral vector. In an embodiment herein, the dose preferably is at least about 1×10⁶ particles (for example, about 1×10⁶-1×10¹² particles), more preferably at least about 1×10¹⁰ particles, more preferably at least about 1×10⁸ particles (e.g., about 1×10⁸-1×10¹¹ particles or about 1×10⁸-1×10¹² particles), and most preferably at least about 1×10¹⁰ particles (e.g., about 1×10⁹-1×10¹⁰ particles or about 1×10⁹-1×10¹² particles), or even at least about 1×10¹⁰ particles (e.g., about 1×10¹⁰-1×10¹² particles) of the adenoviral vector. Alternatively, the dose comprises no more than about 1×10¹⁴ particles, preferably no more than about 1×10¹³ particles, even more preferably no more than about 1×10¹² particles, even more preferably no more than about 1×10¹¹ particles, and most preferably no more than about 1×10¹⁰ particles (e.g., no more than about 1×10⁹ particles). 
Thus, the dose may contain a single dose of adenoviral vector with, for example, about 1×10⁶ particle units (pu), about 2×10⁶ pu, about 4×10⁶ pu, about 1×10⁷ pu, about 2×10⁷ pu, about 4×10⁷ pu, about 1×10⁸ pu, about 2×10⁸ pu, about 4×10⁸ pu, about 1×10⁹ pu, about 2×10⁹ pu, about 4×10⁹ pu, about 1×10¹⁰ pu, about 2×10¹⁰ pu, about 4×10¹⁰ pu, about 1×10¹¹ pu, about 2×10¹¹ pu, about 4×10¹¹ pu, about 1×10¹² pu, about 2×10¹² pu, or about 4×10¹² pu of adenoviral vector. See, for example, the adenoviral vectors in U.S. Pat. No. 8,454,972 B2 to Nabel et al., granted on Jun. 4, 2013, incorporated by reference herein, and the dosages at col. 29, lines 36-58 thereof. In an embodiment herein, the adenovirus is delivered via multiple doses. [00288] In an embodiment herein, the delivery is via an AAV. A therapeutically effective
dosage for in vivo delivery of the AAV to a human is believed to be in the range of 20 to about 50 ml of saline solution containing from about 1×10¹⁰ to about 1×10¹⁰ functional AAV/ml solution. The dosage may be adjusted to balance the therapeutic benefit against any side effects. In an embodiment herein, the AAV dose is generally in the range of concentrations of from about 1×10⁵ to 1×10⁵⁰ genomes AAV, from about 1×10⁸ to 1×10²⁰ genomes AAV, from about 1×10¹⁰ to about 1×10¹⁶ genomes, or about 1×10¹¹ to about 1×10¹⁶ genomes AAV. A human dosage may be about 1×10¹³ genomes AAV. Such concentrations may be delivered in from about 0.001 ml to about 100 ml, about 0.05 to about 50 ml, or about 10 to about 25 ml of a carrier solution. Other effective dosages can be readily established by one of ordinary skill in the art through routine trials establishing dose response curves. See, for example, U.S. Pat. No. 8,404,658 B2 to Hajjar, et al., granted on Mar. 26, 2013, at col. 27, lines 45-60. [00289] In an embodiment herein the delivery is via a plasmid. In such plasmid compositions, the dosage should be a sufficient amount of plasmid to elicit a response. For instance, suitable quantities of plasmid DNA in plasmid compositions can be from about 0.1 to about 2 mg, or from about 1 μg to about 10 μg. [00290] The doses herein are based on an average 70 kg individual. The frequency of administration is within the ambit of the medical or veterinary practitioner (e.g., physician, veterinarian), or scientist skilled in the art. Mice used in experiments are about 20 g. From that which is administered to a 20 g mouse, one can extrapolate to a 70 kg individual. [00291] Lentiviruses are complex retroviruses that have the ability to infect and express their genes in both mitotic and post-mitotic cells. The most commonly known lentivirus is the human immunodeficiency virus (HIV), which uses the envelope glycoproteins of other viruses to target a broad range of cell types. 
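The extrapolation in paragraph [00290] from a 20 g mouse to a 70 kg individual, and the AAV concentration and volume ranges in paragraph [00288], can be illustrated with a short calculation. The sketch below assumes strictly linear scaling by body mass, which the text does not specify; the function names are hypothetical and are used only for illustration.

```python
# Illustrative sketch only. Linear scaling by body mass is an ASSUMPTION;
# the specification says only that one "can extrapolate" from mouse to human.

MOUSE_MASS_KG = 0.020   # about 20 g, as stated in paragraph [00290]
HUMAN_MASS_KG = 70.0    # average 70 kg individual, as stated in paragraph [00290]


def scale_dose_by_mass(dose_in_mouse, mouse_mass_kg=MOUSE_MASS_KG,
                       human_mass_kg=HUMAN_MASS_KG):
    """Scale a mouse dose to a human dose in proportion to body mass.

    The unit of the result is the unit of `dose_in_mouse`
    (e.g., vector genomes or micrograms).
    """
    if mouse_mass_kg <= 0 or human_mass_kg <= 0:
        raise ValueError("body masses must be positive")
    return dose_in_mouse * (human_mass_kg / mouse_mass_kg)


def total_vector_genomes(genomes_per_ml, volume_ml):
    """Total vector genomes delivered for a given concentration and volume."""
    if genomes_per_ml < 0 or volume_ml < 0:
        raise ValueError("concentration and volume must be non-negative")
    return genomes_per_ml * volume_ml


if __name__ == "__main__":
    # Hypothetical mouse dose of 1e10 vector genomes, scaled by mass.
    print(f"{scale_dose_by_mass(1e10):.2e} genomes (linear mass scaling)")
    # 20 ml at 1e10 functional AAV/ml, the lower end of the stated volume range.
    print(f"{total_vector_genomes(1e10, 20):.2e} functional AAV in 20 ml")
```

Under this assumption the scaling factor is 70 kg / 0.020 kg, or about 3,500. Other scaling conventions (for example, scaling by body surface area) give different factors and may be more appropriate in practice.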
[00292] Lentiviruses may be prepared as follows. After cloning pCasES10 (which contains a lentiviral transfer plasmid backbone), HEK293FT cells at low passage (p=5) were seeded in a T-75 flask to 50% confluence the day before transfection in DMEM with 10% fetal bovine serum and without antibiotics. After 20 hours, media was changed to OptiMEM (serum-free) media and transfection was done 4 hours later. Cells were transfected with 10 μg of lentiviral transfer plasmid (pCasES10) and the following packaging plasmids: 5 μg of pMD2.G (VSV-g pseudotype), and 7.5 μg of psPAX2 (gag/pol/rev/tat). Transfection was done in 4 mL OptiMEM with a cationic lipid delivery agent (50 μL Lipofectamine 2000 and 100 μL Plus reagent). After 6 hours, the media was
changed to antibiotic-free DMEM with 10% fetal bovine serum. [00293] Lentivirus may be purified as follows. Viral supernatants were harvested after 48 hours. Supernatants were first cleared of debris and filtered through a 0.45 μm low protein binding (PVDF) filter. They were then spun in an ultracentrifuge for 2 hours at 24,000 rpm. Viral pellets were resuspended in 50 μL of DMEM overnight at 4 °C. They were then aliquoted and immediately frozen at −80 °C. [00294] In another embodiment, minimal non-primate lentiviral vectors based on the equine infectious anemia virus (EIAV) are also contemplated, especially for ocular gene therapy (see, e.g., Balaggan, J Gene Med 2006; 8: 275-285, published online 21 Nov. 2005 in Wiley InterScience, available at the website: interscience.wiley.com. DOI: 10.1002/jgm.845). In another embodiment, RetinoStat®, an equine infectious anemia virus-based lentiviral gene therapy vector that expresses the angiostatic proteins endostatin and angiostatin and that is delivered via a subretinal injection for the treatment of the wet form of age-related macular degeneration, is also contemplated (see, e.g., Binley et al., HUMAN GENE THERAPY 23:980-991 (September 2012)) and may be modified for the system of the present invention. [00295] Lentiviral vectors have been disclosed for the treatment of Parkinson's Disease, see, e.g., US Patent Publication No. 20120295960 and U.S. Pat. Nos. 7,303,910 and 7,351,585. Lentiviral vectors have also been disclosed for the treatment of ocular diseases, see, e.g., US Patent Publication Nos. 20060281180, 20090007284, US20110117189, US20090017543, US20070054961, and US20100317109. Lentiviral vectors have also been disclosed for delivery to the brain, see, e.g., US Patent Publication Nos. US20110293571, US20040013648, US20070025970, US20090111106 and U.S. Pat. No. 7,259,015. [00296] Several types of particle delivery systems and/or formulations are known to be useful in a diverse spectrum of biomedical applications. 
In general, a particle is defined as a small object that behaves as a whole unit with respect to its transport and properties. Particles are further classified according to diameter. Coarse particles cover a range between 2,500 and 10,000 nanometers. Fine particles are sized between 100 and 2,500 nanometers. Ultrafine particles, or nanoparticles, are generally between 1 and 100 nanometers in size. The basis of the 100-nm limit is the fact that novel properties that differentiate particles from the bulk material typically develop at a critical length scale of under 100 nm. [00297] As used herein, a particle delivery system/formulation is defined as any biological
delivery system/formulation which includes a particle in accordance with the present invention. A particle in accordance with the present invention is any entity having a greatest dimension (e.g., diameter) of less than 100 microns (μm). In some embodiments, inventive particles have a greatest dimension of less than 10 microns (μm). In some embodiments, inventive particles have a greatest dimension of less than 2000 nanometers (nm). In some embodiments, inventive particles have a greatest dimension of less than 1000 nanometers (nm). In some embodiments, inventive particles have a greatest dimension of less than 900 nm, 800 nm, 700 nm, 600 nm, 500 nm, 400 nm, 300 nm, 200 nm, or 100 nm. Typically, inventive particles have a greatest dimension (e.g., diameter) of 500 nm or less. In some embodiments, inventive particles have a greatest dimension (e.g., diameter) of 250 nm or less. In some embodiments, inventive particles have a greatest dimension (e.g., diameter) of 200 nm or less. In some embodiments, inventive particles have a greatest dimension (e.g., diameter) of 150 nm or less. In some embodiments, inventive particles have a greatest dimension (e.g., diameter) of 100 nm or less. Smaller particles, e.g., having a greatest dimension of 50 nm or less, are used in some embodiments of the invention. In some embodiments, inventive particles have a greatest dimension ranging between 25 nm and 200 nm. [00298] Particle characterization (including, e.g., characterizing morphology, dimension, etc.) is done using a variety of different techniques. Common techniques are electron microscopy (TEM, SEM), atomic force microscopy (AFM), dynamic light scattering (DLS), X-ray photoelectron spectroscopy (XPS), powder X-ray diffraction (XRD), Fourier transform infrared spectroscopy (FTIR), matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF), ultraviolet-visible spectroscopy, dual polarization interferometry and nuclear magnetic resonance (NMR). 
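The diameter classes of paragraph [00296] (coarse, fine, ultrafine) and the 25 nm to 200 nm range mentioned in paragraph [00297] can be expressed as a small classifier applied to a measured diameter, for example one obtained by DLS. This is a minimal sketch: the function names are hypothetical, and the treatment of the shared boundary values (100 nm and 2,500 nm) is an assumption, since the text does not say which class a boundary value belongs to.

```python
# Illustrative sketch only. Boundary handling (which class 100 nm and
# 2,500 nm fall into) is an ASSUMPTION; the text gives only the ranges.

def classify_particle(diameter_nm):
    """Return the size class of a particle from its diameter in nanometers."""
    if diameter_nm <= 0:
        raise ValueError("diameter must be positive")
    if diameter_nm < 1:
        return "below ultrafine range"
    if diameter_nm <= 100:
        return "ultrafine (nanoparticle)"   # 1 to 100 nm
    if diameter_nm <= 2500:
        return "fine"                       # 100 to 2,500 nm
    if diameter_nm <= 10000:
        return "coarse"                     # 2,500 to 10,000 nm
    return "above coarse range"


def in_range_25_to_200_nm(diameter_nm):
    """True if the greatest dimension lies between 25 nm and 200 nm."""
    return 25 <= diameter_nm <= 200


if __name__ == "__main__":
    for d in (70, 150, 3000):
        print(d, "nm:", classify_particle(d),
              "| within 25-200 nm:", in_range_25_to_200_nm(d))
```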
Characterization (dimension measurements) may be made as to native particles (e.g., preloading) or after loading of the cargo (herein cargo refers to one or more RNAs and/or vectors encoding the same, and may include additional components, carriers and/or excipients) to provide particles of an optimal size for delivery for any in vitro, ex vivo and/or in vivo application of the present invention. In certain preferred embodiments, particle dimension (e.g., diameter) characterization is based on measurements using dynamic light scattering (DLS). [00299] Particle delivery systems within the scope of the present invention may be provided in any form, including but not limited to solid, semi-solid, emulsion, or colloidal particles. As such, any of the delivery systems described herein, including but not limited to, e.g., lipid-based systems, liposomes, micelles, microvesicles, exosomes, or gene gun may be provided as particle delivery
systems within the scope of the present invention. [00300] In terms of this invention, it is preferred to have one or more components of the system delivered using nanoparticles or lipid envelopes. CRISPR enzyme mRNA and guide RNA may be delivered simultaneously using nanoparticles or lipid envelopes. Other delivery systems or vectors may be used in conjunction with the nanoparticle aspects of the invention. [00301] In general, a “nanoparticle” refers to any particle having a diameter of less than 1000 nm. In certain preferred embodiments, nanoparticles of the invention have a greatest dimension (e.g., diameter) of 500 nm or less. In other preferred embodiments, nanoparticles of the invention have a greatest dimension ranging between 25 nm and 200 nm. In other preferred embodiments, nanoparticles of the invention have a greatest dimension of 100 nm or less. In other preferred embodiments, nanoparticles of the invention have a greatest dimension ranging between 35 nm and 60 nm. [00302] Nanoparticles encompassed in the present invention may be provided in different forms, e.g., as solid nanoparticles (e.g., metal such as silver, gold, iron, titanium; non-metal; lipid-based solids; polymers), suspensions of nanoparticles, or combinations thereof. Metal, dielectric, and semiconductor nanoparticles may be prepared, as well as hybrid structures (e.g., core-shell nanoparticles). Nanoparticles made of semiconducting material may also be labeled quantum dots if they are small enough (typically sub 10 nm) that quantization of electronic energy levels occurs. Such nanoscale particles are used in biomedical applications as drug carriers or imaging agents and may be adapted for similar purposes in the present invention. [00303] Semi-solid and soft nanoparticles have been manufactured, and are within the scope of the present invention. A prototype nanoparticle of semi-solid nature is the liposome. 
Various types of liposome nanoparticles are currently used clinically as delivery systems for anticancer drugs and vaccines. Nanoparticles with one half hydrophilic and the other half hydrophobic are termed Janus particles and are particularly effective for stabilizing emulsions. They can self-assemble at water/oil interfaces and act as solid surfactants. [00304] For example, Su X, Fricke J, Kavanagh D G, Irvine D J (“In vitro and in vivo mRNA delivery using lipid-enveloped pH-responsive polymer nanoparticles” Mol Pharm. 2011 Jun. 6; 8(3):774-87. doi: 10.1021/mp100390w. Epub 2011 Apr. 1) describes biodegradable core-shell structured nanoparticles with a poly(β-amino ester) (PBAE) core enveloped by a phospholipid bilayer shell. These were developed for in vivo mRNA delivery. The pH-responsive PBAE
component was chosen to promote endosome disruption, while the lipid surface layer was to minimize toxicity of the polycation core. Such are, therefore, preferred for delivering RNA of the present invention. [00305] In one embodiment, nanoparticles based on self-assembling bioadhesive polymers are contemplated, which may be applied to oral delivery of peptides, intravenous delivery of peptides and nasal delivery of peptides, all to the brain. Other embodiments, such as oral absorption and ocular delivery of hydrophobic drugs, are also contemplated. The molecular envelope technology involves an engineered polymer envelope which is protected and delivered to the site of the disease (see, e.g., Mazza, M. et al. ACSNano, 2013. 7(2): 1016-1026; Siew, A., et al. Mol Pharm, 2012. 9(1):14-28; Lalatsa, A., et al. J Contr Rel, 2012. 161(2):523-36; Lalatsa, A., et al., Mol Pharm, 2012. 9(6):1665-80; Lalatsa, A., et al. Mol Pharm, 2012. 9(6):1764-74; Garrett, N. L., et al. J Biophotonics, 2012. 5(5-6):458-68; Garrett, N. L., et al. J Raman Spect, 2012. 43(5):681-688; Ahmad, S., et al. J Royal Soc Interface 2010. 7:S423-33; Uchegbu, I. F. Expert Opin Drug Deliv, 2006. 3(5):629-40; Qu, X., et al. Biomacromolecules, 2006. 7(12):3452-9 and Uchegbu, I. F., et al. Int J Pharm, 2001. 224:185-199). Doses of about 5 mg/kg are contemplated, with single or multiple doses, depending on the target tissue. [00306] In one embodiment, nanoparticles that can deliver RNA to a cancer cell to stop tumor growth, developed by Dan Anderson's lab at MIT, may be used and/or adapted for the CRISPR Cas system of the present invention. In particular, the Anderson lab developed fully automated, combinatorial systems for the synthesis, purification, characterization, and formulation of new biomaterials and nanoformulations. See, e.g., Alabi et al., Proc Natl Acad Sci USA. 2013 Aug. 6; 110(32):12881-6; Zhang et al., Adv Mater. 2013 Sep. 6; 25(33):4641-5; Jiang et al., Nano Lett. 2013 Mar. 
13; 13(3):1059-64; Karagiannis et al., ACS Nano. 2012 Oct. 23; 6(10):8484-7; Whitehead et al., ACS Nano. 2012 Aug. 28; 6(8):6922-9 and Lee et al., Nat Nanotechnol. 2012 Jun. 3; 7(6):389-93. [00307] US patent application 20110293703 relates to lipidoid compounds that are also particularly useful in the administration of polynucleotides, and which may be applied to deliver the CRISPR Cas system of the present invention. In one aspect, the aminoalcohol lipidoid compounds are combined with an agent to be delivered to a cell or a subject to form microparticles, nanoparticles, liposomes, or micelles. The agent to be delivered by the particles, liposomes, or micelles may be in the form of a gas, liquid, or solid, and the agent may be a polynucleotide, protein, peptide, or small molecule.
The aminoalcohol lipidoid compounds may be combined with other compounds, polymers (synthetic or natural), surfactants, cholesterol, carbohydrates, proteins, lipids, etc. to form the particles. These particles may then optionally be combined with a pharmaceutical excipient to form a pharmaceutical composition. [00308] US Patent Publication No. 20110293703 also provides methods of preparing the aminoalcohol lipidoid compounds. One or more equivalents of an amine are allowed to react with one or more equivalents of an epoxide-terminated compound under suitable conditions to form an aminoalcohol lipidoid compound of the present invention. In certain embodiments, all the amino groups of the amine are fully reacted with the epoxide-terminated compound to form tertiary amines. In other embodiments, all the amino groups of the amine are not fully reacted with the epoxide-terminated compound to form tertiary amines, thereby resulting in primary or secondary amines in the aminoalcohol lipidoid compound. These primary or secondary amines are left as is or may be reacted with another electrophile such as a different epoxide-terminated compound. As will be appreciated by one skilled in the art, reacting an amine with less than excess of epoxide-terminated compound will result in a plurality of different aminoalcohol lipidoid compounds with various numbers of tails. Certain amines may be fully functionalized with two epoxide-derived compound tails while other molecules will not be completely functionalized with epoxide-derived compound tails. For example, a diamine or polyamine may include one, two, three, or four epoxide-derived compound tails off the various amino moieties of the molecule resulting in primary, secondary, and tertiary amines. In certain embodiments, all the amino groups are not fully functionalized. In certain embodiments, two of the same types of epoxide-terminated compounds are used. 
In other embodiments, two or more different epoxide-terminated compounds are used. The synthesis of the aminoalcohol lipidoid compounds is performed with or without solvent, and the synthesis may be performed at higher temperatures ranging from 30-100 °C, preferably at approximately 50-90 °C. The prepared aminoalcohol lipidoid compounds may be optionally purified. For example, the mixture of aminoalcohol lipidoid compounds may be purified to yield an aminoalcohol lipidoid compound with a particular number of epoxide-derived compound tails. Or the mixture may be purified to yield a particular stereo- or regioisomer. The aminoalcohol lipidoid compounds may also be alkylated using an alkyl halide (e.g., methyl iodide) or other alkylating agent, and/or they may be acylated. [00309] US Patent Publication No. 20110293703 also provides libraries of aminoalcohol lipidoid
compounds prepared by the inventive methods. These aminoalcohol lipidoid compounds may be prepared and/or screened using high-throughput techniques involving liquid handlers, robots, microtiter plates, computers, etc. In certain embodiments, the aminoalcohol lipidoid compounds are screened for their ability to transfect polynucleotides or other agents (e.g., proteins, peptides, small molecules) into the cell. [00310] US Patent Publication No. 20130302401 relates to a class of poly(beta-amino alcohols) (PBAAs) that has been prepared using combinatorial polymerization. The inventive PBAAs may be used in biotechnology and biomedical applications as coatings (such as coatings of films or multilayer films for medical devices or implants), additives, materials, excipients, non-biofouling agents, micropatterning agents, and cellular encapsulation agents. When used as surface coatings, these PBAAs elicited different levels of inflammation, both in vitro and in vivo, depending on their chemical structures. The large chemical diversity of this class of materials allowed us to identify polymer coatings that inhibit macrophage activation in vitro. Furthermore, these coatings reduce the recruitment of inflammatory cells, and reduce fibrosis, following the subcutaneous implantation of carboxylated polystyrene microparticles. These polymers may be used to form polyelectrolyte complex capsules for cell encapsulation. The invention may also have many other biological applications such as antimicrobial coatings, DNA or siRNA delivery, and stem cell tissue engineering. The teachings of US Patent Publication No. 20130302401 may be applied to the system of the present invention. [00311] In another embodiment, lipid nanoparticles (LNPs) are contemplated. In particular, an antitransthyretin small interfering RNA encapsulated in lipid nanoparticles (see, e.g., Coelho et al., N Engl J Med 2013; 369:819-29) may be applied to the system of the present invention. 
Doses of about 0.01 to about 1 mg per kg of body weight administered intravenously are contemplated. Medications to reduce the risk of infusion-related reactions, such as dexamethasone, acetaminophen, diphenhydramine or cetirizine, and ranitidine, are contemplated. Multiple doses of about 0.3 mg per kilogram every 4 weeks for five doses are also contemplated. Lipids including, but not limited to, DLin-KC2-DMA4, C12-200 and the colipids distearoylphosphatidyl choline, cholesterol, and PEG-DMG may be formulated with RNA instead of siRNA (see, e.g., Novobrantseva, Molecular Therapy—Nucleic Acids (2012) 1, e4; doi:10.1038/mtna.2011.3) using a spontaneous vesicle formation procedure. The component molar ratio may be about 50/10/38.5/1.5 (DLin-KC2-DMA or C12-200/distearoylphosphatidyl
choline/cholesterol/PEG-DMG). The final lipid:siRNA weight ratio may be ~12:1 in the case of DLin-KC2-DMA and C12-200 lipid nanoparticles (LNPs), respectively. The formulations may have mean particle diameters of ~80 nm with >90% entrapment efficiency. A 3 mg/kg dose may be contemplated. [00312] LNPs have been shown to be highly effective in delivering siRNAs to the liver (see, e.g., Tabernero et al., Cancer Discovery, April 2013, Vol. 3, No. 4, pages 363-470) and are therefore contemplated for delivering CRISPR Cas to the liver. A dosage of about four doses of 6 mg/kg of the LNP (or RNA of the CRISPR-Cas) every two weeks may be contemplated. Tabernero et al. demonstrated that tumor regression was observed after the first 2 cycles of LNPs dosed at 0.7 mg/kg, and by the end of 6 cycles the patient had achieved a partial response with complete regression of the lymph node metastasis and substantial shrinkage of the liver tumors. A complete response was obtained after 40 doses in this patient, who has remained in remission and completed treatment after receiving doses over 26 months. Two patients with RCC and extrahepatic sites of disease including kidney, lung, and lymph nodes that were progressing following prior therapy with VEGF pathway inhibitors had stable disease at all sites for approximately 8 to 12 months, and a patient with PNET and liver metastases continued on the extension study for 18 months (36 doses) with stable disease. [00313] However, the charge of the LNP must be taken into consideration. Cationic lipids combine with negatively charged lipids to induce nonbilayer structures that facilitate intracellular delivery. Because charged LNPs are rapidly cleared from circulation following intravenous injection, ionizable cationic lipids with pKa values below 7 were developed (see, e.g., Rosin et al, Molecular Therapy, vol. 19, no. 12, pages 1286-2200, December 2011). 
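The rationale for ionizable lipids with pKa values below 7 can be illustrated with the Henderson-Hasselbalch relation, which gives the protonated (positively charged) fraction of an ionizable amine as 1 / (1 + 10^(pH − pKa)). The sketch below is not taken from the cited reference: the function name is hypothetical and the pKa of 6.5 is an illustrative assumption, not a measured value for any lipid named in the text.

```python
# Illustrative sketch only. The pKa value used in the example is an
# ASSUMPTION chosen to be "below 7"; it is not a value from the text.

def protonated_fraction(ph, pka):
    """Fraction of an ionizable amine that is protonated at a given pH.

    Henderson-Hasselbalch for a base: fraction = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    assumed_pka = 6.5  # hypothetical ionizable lipid
    for ph in (4.0, 7.4):
        frac = protonated_fraction(ph, assumed_pka)
        print(f"pH {ph}: {frac:.1%} of lipid headgroups protonated")
```

With the assumed pKa, nearly all headgroups carry a positive charge at pH 4 (the loading condition), while only a small minority do at pH 7.4, which is consistent with a low surface charge in circulation.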
Negatively charged polymers such as siRNA oligonucleotides may be loaded into LNPs at low pH values (e.g., pH 4) where the ionizable lipids display a positive charge. However, at physiological pH values, the LNPs exhibit a low surface charge compatible with longer circulation times. Four species of ionizable cationic lipids have been focused upon, namely 1,2-dilineoyl-3-dimethylammonium-propane (DLinDAP), 1,2-dilinoleyloxy-3-N,N-dimethylaminopropane (DLinDMA), 1,2-dilinoleyloxy-keto-N,N-dimethyl-3-aminopropane (DLinKDMA), and 1,2-dilinoleyl-4-(2-dimethylaminoethyl)-[1,3]-dioxolane (DLinKC2-DMA). It has been shown that LNP siRNA systems containing these lipids exhibit remarkably different gene silencing properties in hepatocytes in vivo, with potencies varying according to the series DLinKC2-DMA>DLinKDMA>DLinDMA>>DLinDAP employing a Factor VII gene silencing model (see, e.g., Rosin et al, Molecular Therapy, vol. 19, no. 12, pages 1286-2200, December 2011). A dosage of 1 μg/ml levels may be contemplated, especially for a formulation containing DLinKC2-DMA. Preparation of LNPs and CRISPR Cas encapsulation may be used and/or adapted from Rosin et al, Molecular Therapy, vol. 19, no. 12, pages 1286-2200, December 2011. The cationic lipids 1,2-dilineoyl-3-dimethylammonium-propane (DLinDAP), 1,2-dilinoleyloxy-3-N,N-dimethylaminopropane (DLinDMA), 1,2-dilinoleyloxyketo-N,N-dimethyl-3-aminopropane (DLinK-DMA), 1,2-dilinoleyl-4-(2-dimethylaminoethyl)-[1,3]-dioxolane (DLinKC2-DMA), (3-o-[2″-(methoxypolyethyleneglycol 2000) succinoyl]-1,2-dimyristoyl-sn-glycol (PEG-S-DMG), and R-3-[(w-methoxy-poly(ethylene glycol)2000) carbamoyl]-1,2-dimyristyloxlpropyl-3-amine (PEG-C-DOMG) may be provided by Tekmira Pharmaceuticals (Vancouver, Canada) or synthesized. Cholesterol may be purchased from Sigma (St Louis, Mo.). 
The specific CRISPR Cas RNA may be encapsulated in LNPs containing DLinDAP, DLinDMA, DLinK-DMA, and DLinKC2-DMA (cationic lipid:DSPC:CHOL:PEG-S-DMG or PEG-C-DOMG at 40:10:40:10 molar ratios). When required, 0.2% SP-DiOC18 (Invitrogen, Burlington, Canada) may be incorporated to assess cellular uptake, intracellular delivery, and biodistribution. Encapsulation may be performed by dissolving lipid mixtures comprised of cationic lipid:DSPC:cholesterol:PEG-C-DOMG (40:10:40:10 molar ratio) in ethanol to a final lipid concentration of 10 mmol/l. This ethanol solution of lipid may be added drop-wise to 50 mmol/l citrate, pH 4.0 to form multilamellar vesicles to produce a final concentration of 30% ethanol vol/vol. Large unilamellar vesicles may be formed following extrusion of multilamellar vesicles through two stacked 80 nm Nuclepore polycarbonate filters using the Extruder (Northern Lipids, Vancouver, Canada). Encapsulation may be achieved by adding RNA dissolved at 2 mg/ml in 50 mmol/l citrate, pH 4.0 containing 30% ethanol vol/vol drop-wise to extruded preformed large unilamellar vesicles and incubation at 31 °C for 30 minutes with constant mixing to a final RNA/lipid weight ratio of 0.06/1 wt/wt. Removal of ethanol and neutralization of formulation buffer may be performed by dialysis against phosphate-buffered saline (PBS), pH 7.4 for 16 hours using Spectra/Por 2 regenerated cellulose dialysis membranes. Nanoparticle size distribution may be determined by dynamic light scattering using a NICOMP 370 particle sizer, the vesicle/intensity modes, and Gaussian fitting (Nicomp Particle Sizing, Santa Barbara, Calif.). The particle size for all three LNP systems may be ~70 nm in diameter. siRNA encapsulation efficiency may be
determined by removal of free siRNA using VivaPureD MiniH columns Biotech) from samples collected before and after dialysis. The encapsulated RNA may be extracted from the eluted nanoparticles and quantified at 260 nm. The siRNA to lipid ratio may be determined by measurement of cholesterol content in vesicles using the Cholesterol E enzymatic assay from Wako Chemicals USA (Richmond, Va.). PEGylated liposomes (or LNPs) can also be used for delivery.
[00314] Preparation of large LNPs may be used and/or adapted from Rosin et al., Molecular Therapy, vol. 19, no. 12, pages 1286-2200, December 2011. A lipid premix solution (20.4 mg/ml total lipid concentration) may be prepared in ethanol containing DLinKC2-DMA, DSPC, and cholesterol at 50:10:38.5 molar ratios. Sodium acetate may be added to the lipid premix at a molar ratio of 0.75:1 (sodium acetate:DLinKC2-DMA). The lipids may be subsequently hydrated by combining the mixture with 1.85 volumes of citrate buffer (10 mmol/l, pH 3.0) with vigorous stirring, resulting in spontaneous liposome formation in aqueous buffer containing 35% ethanol. The liposome solution may be incubated at 37° C. to allow for a time-dependent increase in particle size. Aliquots may be removed at various times during incubation to investigate changes in liposome size by dynamic light scattering (Zetasizer Nano ZS, Malvern Instruments, Worcestershire, UK). Once the desired particle size is achieved, an aqueous PEG lipid solution (stock=10 mg/ml PEG-DMG in 35% (vol/vol) ethanol) may be added to the liposome mixture to yield a final PEG molar concentration of 3.5% of total lipid. Upon addition of PEG-lipids, the liposomes should maintain their size, effectively quenching further growth. RNA may then be added to the empty liposomes at an siRNA to total lipid ratio of approximately 1:10 (wt:wt), followed by incubation for 30 minutes at 37° C. to form loaded LNPs.
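The mixing arithmetic in the large-LNP protocol above can be checked with a short calculation. The following is a minimal sketch in Python, assuming ideal volume mixing and ignoring the later PEG-lipid addition when computing RNA mass; the function names are hypothetical and are not part of the cited protocol.

```python
# Illustrative arithmetic only. Numeric inputs come from the protocol text;
# the function names and the ideal-mixing assumption are hypothetical.

# Molar ratios of the lipid premix (DLinKC2-DMA : DSPC : cholesterol).
PREMIX_MOLAR_RATIO = {"DLinKC2-DMA": 50.0, "DSPC": 10.0, "cholesterol": 38.5}


def ethanol_fraction(buffer_volumes: float, premix_volumes: float = 1.0) -> float:
    """Ethanol volume fraction after combining an ethanol premix with aqueous buffer."""
    return premix_volumes / (premix_volumes + buffer_volumes)


def mole_fractions(ratio: dict) -> dict:
    """Normalize a molar ratio so that the fractions sum to 1."""
    total = sum(ratio.values())
    return {name: value / total for name, value in ratio.items()}


def rna_mass_mg(total_lipid_mg: float, rna_to_lipid: float = 0.1) -> float:
    """RNA mass for a given lipid mass at an RNA:lipid weight ratio (1:10 by default)."""
    return total_lipid_mg * rna_to_lipid


if __name__ == "__main__":
    # 1 volume of premix plus 1.85 volumes of citrate buffer gives roughly 35% ethanol.
    print(f"ethanol fraction: {ethanol_fraction(1.85):.3f}")
    for lipid, fraction in mole_fractions(PREMIX_MOLAR_RATIO).items():
        print(f"{lipid}: {fraction:.3f}")
    # 1 ml of premix at 20.4 mg/ml total lipid.
    print(f"RNA for 20.4 mg lipid: {rna_mass_mg(20.4):.2f} mg")
```

Combining one volume of premix with 1.85 volumes of buffer gives 1/2.85, or about 35% ethanol, which agrees with the figure stated in the protocol.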
The mixture may be subsequently dialyzed overnight in PBS and filtered with a 0.45-μm syringe filter.
[00315] Spherical Nucleic Acid (SNA™) constructs and other nanoparticles (particularly gold nanoparticles) are also contemplated as a means to deliver the CRISPR/Cas system to intended targets. Significant data show that AuraSense Therapeutics' Spherical Nucleic Acid (SNA™) constructs, based upon nucleic acid-functionalized gold nanoparticles, are superior to alternative platforms based on multiple key success factors, such as:
[00316] High in vivo stability. Due to their dense loading, a majority of cargo (DNA or siRNA)
remains bound to the constructs inside cells, conferring nucleic acid stability and resistance to enzymatic degradation.
[00317] Deliverability. For all cell types studied (e.g., neurons, tumor cell lines, etc.) the constructs demonstrate a transfection efficiency of 99% with no need for carriers or transfection agents.
[00318] Therapeutic targeting. The unique target binding affinity and specificity of the constructs allow exquisite specificity for matched target sequences (e.g., limited off-target effects).
[00319] Superior efficacy. The constructs significantly outperform leading conventional transfection reagents (Lipofectamine 2000 and Cytofectin).
[00320] Low toxicity. The constructs can enter a variety of cultured cells, primary cells, and tissues with no apparent toxicity.
[00321] No significant immune response. The constructs elicit minimal changes in global gene expression as measured by whole-genome microarray studies and cytokine-specific protein assays.
[00322] Chemical tailorability. Any number of single or combinatorial agents (e.g., proteins, peptides, small molecules) can be used to tailor the surface of the constructs.
[00323] This platform for nucleic acid-based therapeutics may be applicable to numerous disease states, including inflammation and infectious disease, cancer, skin disorders and cardiovascular disease.
[00324] Citable literature includes: Cutler et al., J. Am. Chem. Soc. 2011 133:9254-9257, Hao et al., Small. 2011 7:3158-3162, Zhang et al., ACS Nano. 2011 5:6962-6970, Cutler et al., J. Am. Chem. Soc. 2012 134:1376-1391, Young et al., Nano Lett. 2012 12:3867-71, Zheng et al., Proc. Natl. Acad. Sci. USA. 2012 109:11975-80, Mirkin, Nanomedicine 2012 7:635-638, Zhang et al., J. Am. Chem. Soc. 2012 134:16488-1691, Weintraub, Nature 2013 495:S14-S16, Choi et al., Proc. Natl. Acad. Sci. USA. 2013 110(19):7625-7630, Jensen et al., Sci. Transl. Med. 5, 209ra152 (2013) and Mirkin, et al., Small, doi.org/10.1002/smll.201302143.
[00325] Self-assembling nanoparticles with siRNA may be constructed with polyethyleneimine (PEI) that is PEGylated with an Arg-Gly-Asp (RGD) peptide ligand attached at the distal end of the polyethylene glycol (PEG), for example, as a means to target tumor neovasculature expressing integrins and used to deliver siRNA inhibiting vascular endothelial growth factor receptor-2 (VEGF R2) expression and thereby tumor angiogenesis (see, e.g., Schiffelers et al., Nucleic Acids Research, 2004, Vol. 32, No. 19). Nanoplexes may be prepared by mixing equal volumes of aqueous solutions of cationic polymer and nucleic acid to give a net molar excess of ionizable nitrogen (polymer) to phosphate (nucleic acid) over the range of 2 to 6. The electrostatic interactions between cationic polymers and nucleic acid resulted in the formation of polyplexes with average particle size distribution of about 100 nm, hence referred to here as nanoplexes. A dosage of about 100 to 200 mg of CRISPR Cas is envisioned for delivery in the self-assembling nanoparticles of Schiffelers et al.
[00326] The nanoplexes of Bartlett et al. (PNAS, Sep. 25, 2007, vol. 104, no. 39) may also be applied to the present invention. The nanoplexes of Bartlett et al. are prepared by mixing equal volumes of aqueous solutions of cationic polymer and nucleic acid to give a net molar excess of ionizable nitrogen (polymer) to phosphate (nucleic acid) over the range of 2 to 6. The electrostatic interactions between cationic polymers and nucleic acid resulted in the formation of polyplexes with average particle size distribution of about 100 nm, hence referred to here as nanoplexes. The DOTA-siRNA of Bartlett et al. was synthesized as follows: 1,4,7,10-tetraazacyclododecane-1,4,7,10-tetraacetic acid mono(N-hydroxysuccinimide ester) (DOTA-NHS-ester) was ordered from Macrocyclics (Dallas, Tex.).
The amine modified RNA sense strand with a 100-fold molar excess of DOTA-NHS-ester in carbonate buffer (pH 9) was added to a microcentrifuge tube. The contents were reacted by stirring for 4 h at room temperature. The DOTA-RNAsense conjugate was ethanol-precipitated, resuspended in water, and annealed to the unmodified antisense strand to yield DOTA-siRNA. All liquids were pretreated with Chelex-100 (Bio-Rad, Hercules, Calif.) to remove trace metal contaminants. Tf-targeted and nontargeted siRNA nanoparticles may be formed by using cyclodextrin-containing polycations. Typically, nanoparticles were formed in water at a charge ratio of 3 (+/−) and an siRNA concentration of 0.5 g/liter. One percent of the adamantane-PEG molecules on the surface of the targeted nanoparticles were modified with Tf (adamantane-PEG-Tf). The nanoparticles were suspended in a 5% (wt/vol) glucose carrier solution for injection.
[00327] Davis et al. (Nature, Vol 464, 15 Apr. 2010) conducted an siRNA clinical trial that used a targeted nanoparticle-delivery system (clinical trial registration number NCT00689065). Patients with solid cancers refractory to standard-of-care therapies were administered doses of targeted nanoparticles on days 1, 3, 8 and 10 of a 21-day cycle by a 30-min intravenous infusion. The nanoparticles consist of a synthetic delivery system containing: (1) a linear, cyclodextrin-based polymer (CDP), (2) a human transferrin protein (TF) targeting ligand displayed on the exterior of the nanoparticle to engage TF receptors (TFR) on the surface of the cancer cells, (3) a hydrophilic polymer (polyethylene glycol (PEG) used to promote nanoparticle stability in biological fluids), and (4) siRNA designed to reduce the expression of RRM2 (the sequence used in the clinic was previously denoted siR2B+5). The TFR has long been known to be upregulated in malignant cells, and RRM2 is an established anti-cancer target.
These nanoparticles (clinical version denoted as CALAA-01) have been shown to be well tolerated in multi-dosing studies in non-human primates. Although a single patient with chronic myeloid leukemia has been administered siRNA by liposomal delivery, Davis et al.'s clinical trial is the initial human trial to systemically deliver siRNA with a targeted delivery system and to treat patients with solid cancer. To ascertain whether the targeted delivery system can provide effective delivery of functional siRNA to human tumors, Davis et al. investigated biopsies from three patients from three different dosing cohorts; patients A, B and C, all of whom had metastatic melanoma and received CALAA-01 doses of 18, 24 and 30 mg m−2 siRNA, respectively. Similar doses may also be contemplated for the CRISPR Cas system of the present invention. The delivery of the invention may be achieved with nanoparticles containing a linear, cyclodextrin-based polymer (CDP), a human transferrin protein (TF) targeting ligand displayed on the exterior of the nanoparticle to engage TF receptors (TFR) on the surface of the cancer cells and/or a hydrophilic polymer (for example, polyethylene glycol (PEG) used to promote nanoparticle stability in biological fluids).
[00328] Delivery or administration according to the invention can be performed with liposomes. Liposomes are spherical vesicle structures composed of a uni- or multilamellar lipid bilayer surrounding internal aqueous compartments and a relatively impermeable outer lipophilic phospholipid bilayer. Liposomes have gained considerable attention as drug delivery carriers because they are biocompatible, nontoxic, can deliver both hydrophilic and lipophilic drug molecules, protect their cargo from degradation by plasma enzymes, and transport their load across biological membranes and the blood brain barrier (BBB) (see, e.g., Spuch and Navarro, Journal of Drug Delivery, vol. 2011, Article ID 469679, 12 pages, 2011.
doi:10.1155/2011/469679 for review).
[00329] Liposomes can be made from several different types of lipids; however, phospholipids are most commonly used to generate liposomes as drug carriers. Although liposome formation is spontaneous when a lipid film is mixed with an aqueous solution, it can also be expedited by applying force in the form of shaking by using a homogenizer, sonicator, or an extrusion apparatus (see, e.g., Spuch and Navarro, Journal of Drug Delivery, vol. 2011, Article ID 469679, 12 pages, 2011. doi:10.1155/2011/469679 for review).
[00330] Several other additives may be added to liposomes in order to modify their structure and properties. For instance, either cholesterol or sphingomyelin may be added to the liposomal mixture in order to help stabilize the liposomal structure and to prevent the leakage of the liposomal inner cargo. Further, liposomes may be prepared from hydrogenated egg phosphatidylcholine or egg phosphatidylcholine, cholesterol, and dicetyl phosphate, and their mean vesicle sizes may be adjusted to about 50 and 100 nm (see, e.g., Spuch and Navarro, Journal of Drug Delivery, vol. 2011, Article ID 469679, 12 pages, 2011. doi:10.1155/2011/469679 for review).
[00331] Conventional liposome formulation is mainly comprised of natural phospholipids and lipids such as 1,2-distearoyl-sn-glycero-3-phosphatidyl choline (DSPC), sphingomyelin, egg phosphatidylcholines and monosialoganglioside. Since this formulation is made up of phospholipids only, liposomal formulations have encountered many challenges, one of which is instability in plasma. Several attempts to overcome these challenges have been made, specifically in the manipulation of the lipid membrane. One of these attempts focused on the manipulation of cholesterol.
Addition of cholesterol to conventional formulations reduces rapid release of the encapsulated bioactive compound into the plasma, and addition of 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE) increases the stability (see, e.g., Spuch and Navarro, Journal of Drug Delivery, vol. 2011, Article ID 469679, 12 pages, 2011. doi:10.1155/2011/469679 for review).
[00332] In a particularly advantageous embodiment, Trojan Horse liposomes (also known as Molecular Trojan Horses) are desirable and protocols may be found at cshprotocols.cshlp.org/content/2010/4/pdb.prot5407.long. These particles allow delivery of a transgene to the entire brain after an intravascular injection. Without being bound by limitation, it is believed that neutral lipid particles with specific antibodies conjugated to the surface allow crossing of the blood brain barrier via endocytosis. Applicant postulates utilizing Trojan Horse Liposomes to deliver the CRISPR family of nucleases to the brain via an intravascular injection, which would allow whole brain transgenic animals without the need for embryonic manipulation. About 1-5 g of nucleic acid molecule, e.g., DNA, RNA, may be contemplated for in vivo administration in liposomes.
[00333] In another embodiment, the system may be administered in liposomes, such as a stable
nucleic-acid-lipid particle (SNALP) (see, e.g., Morrissey et al., Nature 23, No. 8, August 2005). Daily intravenous injections of about 1, 3 or 5 mg/kg/day of a specific CRISPR Cas targeted in a SNALP are contemplated. The daily treatment may be over about three days and then weekly for about five weeks. In another embodiment, a specific CRISPR Cas encapsulated in a SNALP and administered by intravenous injection at doses of about 1 or 2.5 mg/kg is also contemplated (see, e.g., Zimmerman et al., Nature Letters, Vol. 441, 4 May 2006). The SNALP formulation may contain the lipids 3-N-[(w-methoxypoly(ethylene glycol) 2000) carbamoyl]-1,2-dimyristyloxy-propylamine (PEG-C-DMA), 1,2-dilinoleyloxy-N,N-dimethyl-3-aminopropane (DLinDMA), 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC) and cholesterol, in a 2:40:10:48 molar percent ratio (see, e.g., Zimmerman et al., Nature Letters, Vol. 441, 4 May 2006).
[00334] In another embodiment, stable nucleic-acid-lipid particles (SNALPs) have proven to be effective delivery molecules to highly vascularized HepG2-derived liver tumors but not in poorly vascularized HCT-116 derived liver tumors (see, e.g., Li, Gene Therapy (2012) 19, 775-780). The SNALP liposomes may be prepared by formulating D-Lin-DMA and PEG-C-DMA with distearoylphosphatidylcholine (DSPC), Cholesterol and siRNA using a 25:1 lipid/siRNA ratio and a 48/40/10/2 molar ratio of Cholesterol/D-Lin-DMA/DSPC/PEG-C-DMA. The resulting SNALP liposomes are about 80-100 nm in size.
[00335] In yet another embodiment, a SNALP may comprise synthetic cholesterol (Sigma-Aldrich, St Louis, Mo., USA), dipalmitoylphosphatidylcholine (Avanti Polar Lipids, Alabaster, Ala., USA), 3-N-[(w-methoxy poly(ethylene glycol)2000)carbamoyl]-1,2-dimyristyloxypropylamine, and cationic 1,2-dilinoleyloxy-3-N,N-dimethylaminopropane (see, e.g., Geisbert et al., Lancet 2010; 375: 1896-905).
A dosage of about 2 mg/kg total CRISPR Cas per dose administered as, for example, a bolus intravenous infusion may be contemplated.
[00336] In yet another embodiment, a SNALP may comprise synthetic cholesterol (Sigma-Aldrich), 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC; Avanti Polar Lipids Inc.), PEG-cDMA, and 1,2-dilinoleyloxy-3-(N,N-dimethyl)aminopropane (DLinDMA) (see, e.g., Judge, J. Clin. Invest. 119:661-673 (2009)). Formulations used for in vivo studies may comprise a final lipid/RNA mass ratio of about 9:1.
[00337] Other cationic lipids, such as amino lipid 2,2-dilinoleyl-4-dimethylaminoethyl-[1,3]-dioxolane (DLin-KC2-DMA) may be utilized to encapsulate CRISPR Cas (see, e.g., Jayaraman, Angew. Chem. Int. Ed. 2012, 51, 8529-8533). A preformed vesicle with the following lipid composition may be contemplated: amino lipid, distearoylphosphatidylcholine (DSPC), cholesterol and (R)-2,3-bis(octadecyloxy) propyl-1-(methoxy poly(ethylene glycol)2000)propylcarbamate (PEG-lipid) in the molar ratio 40/10/40/10, respectively, and a FVII siRNA/total lipid ratio of approximately 0.05 (w/w). To ensure a narrow particle size distribution in the range of 70-90 nm and a low polydispersity index of 0.11±0.04 (n=56), the particles may be extruded up to three times through 80 nm membranes prior to adding the CRISPR Cas RNA. Particles containing the highly potent amino lipid 16 may be used, in which the molar ratio of the four lipid components 16, DSPC, cholesterol and PEG-lipid (50/10/38.5/1.5) may be further optimized to enhance in vivo activity.
[00338] Any element of any suitable CRISPR/Cas gene editing system known in the art can be employed in the systems and methods described herein, as appropriate. CRISPR/Cas gene editing technology is described in detail in, for example, U.S. Patent Application Publication 2014/0068797; U.S.
Patents 8,697,359; 8,771,945; and 8,945,839; US2010/0076057; US2011/0189776; US2011/0223638; US2013/0130248; WO/2008/108989; WO/2010/054108; WO/2012/164565; WO/2013/098244; WO/2013/176772; US20150050699; US20150045546; US20150031134; US20150024500; US20140377868; US20140357530; US20140349400; US20140335620; US20140335063; US20140315985; US20140310830; US20140310828; US20140309487; US20140304853; US20140298547; US20140295556; US20140294773; US20140287938; US20140273234; US20140273232; US20140273231; US20140273230; US20140271987; US20140256046; US20140248702; US20140242702; US20140242700; US20140242699; US20140242664; US20140234972; US20140227787; US20140212869; US20140201857; US20140199767; US20140189896; US20140186958; US20140186919; US20140186843; US20140179770; US20140179006; and US20140170753, incorporated herein by reference.
[00339] Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions, and alterations can be made herein without departing from the spirit and scope of the invention as defined in the appended claims.
[00340] The present invention will be further illustrated in the following Examples which are
given for illustration purposes only and are not intended to limit the invention in any way.
Examples
Example 1 - Materials and Methods
[00341] RecE/T Homolog Screening The RefSeq non-redundant protein database was downloaded from NCBI on October 29, 2019. The database was searched with E. coli Rac prophage RecT (NP_415865.1) and RecE (NP_415866.1) as queries using position-specific iterated (PSI)-BLAST¹ to retrieve protein homologs. Hits were clustered with CD-HIT² and representative sequences were selected from each cluster for multiple alignment with MUSCLE³. Then, FastTree⁴ was used for maximum likelihood tree reconstruction with default parameters. A diverse set of RecET homologs was selected, synthesized by GenScript, and cloned into pMPH_MCP vectors for testing.
[00342] Plasmid construction pX330, pMPH and pU6-(BbsI)_CBh-Cas9-T2A-BFP plasmids were obtained from Addgene. Tested effector DNA fragments were ordered from IDT, Genewiz, and GenScript. The fragments were Gibson assembled into the backbones using NEBuilder HiFi DNA Assembly Master Mix (New England BioLabs). All sgRNAs (Table 3) were inserted into backbones using Golden Gate cloning. All constructs were sequence-verified with Sanger sequencing of prepped plasmids.
Table 3. Sequence for sgRNAs
nsp-HSP90AA1-guide2    HSP90AA1    TCGTCATCTCCTTCAAGGGG (SEQ ID NO:32)
were maintained in Dulbecco's Modified Eagle's Medium (DMEM, Life Technologies), with 10% fetal bovine serum (FBS, HyClone), 100 U/mL penicillin, and 100 μg/mL streptomycin (Life Technologies) at 37 °C with 5% CO2.
[00344] hES-H9 cells were maintained in mTeSR1 medium (StemCell Technologies) at 37 °C with 5% CO2. Culture plates were pre-coated with Matrigel (Corning) 12 hours prior to use, and cells were supplemented with 10 μM Y27632 (Sigma) for the first 24 hours after passaging. Culture media was changed every 24 hours.
[00345] Transfection HEK293T cells were seeded into 96-well plates (Corning) 12-24 hours prior to transfection at a density of 30,000 cells/well, and 250 ng of total DNA was transfected per well. HeLa and HepG2 cells were seeded into 48-well plates (Corning) one day prior to transfection at a density of 50,000 and 30,000 cells/well respectively, and 400 ng of total DNA was transfected per well. Transfections were performed with Lipofectamine 3000 (Life Technologies) following the manufacturer's instructions.
[00346] Electroporation For hES-H9 related transfection experiments, P3 Primary Cell 4D-Nucleofector™ X Kit S (Lonza) was used following the manufacturer's protocol. For each reaction, 300,000 cells were nucleofected with 4 μg total DNA using the DC100 Nucleofector Program.
[00347] Fluorescence-activated cell sorting (FACS) mKate knock-in efficiency was analyzed on a CytoFLEX flow cytometer (Beckman Coulter; Stanford Stem Cell FACS Core). 72 hours after transfection, cells were washed once with PBS and dissociated with TrypLE Express Enzyme (Thermo Fisher Scientific). The cell suspension was then transferred to a 96-well U-bottom plate (Thermo Fisher Scientific) and centrifuged at 300 × g for 5 minutes. After removing the supernatant, pelleted cells were resuspended with 50 μl 4% FBS in PBS, and cells were sorted within 30 minutes of preparation.
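The seeding densities and DNA amounts stated in the Transfection and Electroporation paragraphs above can be tabulated for planning multi-well experiments. The following Python sketch is illustrative only: the cell numbers and DNA masses are taken from the text, while the function names and the 10% pipetting overage are assumptions introduced here.

```python
# Per-well (or per-reaction) values as stated in the text.
# Function names and the default overage are hypothetical.
CONDITIONS = {
    "HEK293T": {"cells": 30_000, "dna_ng": 250},   # 96-well plate, Lipofectamine 3000
    "HeLa": {"cells": 50_000, "dna_ng": 400},      # 48-well plate, Lipofectamine 3000
    "HepG2": {"cells": 30_000, "dna_ng": 400},     # 48-well plate, Lipofectamine 3000
    "hES-H9": {"cells": 300_000, "dna_ng": 4_000}, # one nucleofection reaction
}


def dna_per_thousand_cells(cell_line: str) -> float:
    """DNA mass (ng) delivered per 1,000 seeded cells."""
    condition = CONDITIONS[cell_line]
    return condition["dna_ng"] / (condition["cells"] / 1000)


def master_mix_dna_ng(cell_line: str, n_wells: int, overage: float = 0.10) -> float:
    """Total DNA (ng) to prepare for n_wells, padded by an assumed pipetting overage."""
    return CONDITIONS[cell_line]["dna_ng"] * n_wells * (1 + overage)


if __name__ == "__main__":
    for name in CONDITIONS:
        print(f"{name}: {dna_per_thousand_cells(name):.2f} ng per 1,000 cells")
    print(f"HEK293T, 12 wells: {master_mix_dna_ng('HEK293T', 12):.0f} ng total DNA")
```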
[00348] RFLP HEK293T cells were transfected with plasmid DNA and PCR templates and harvested after 72 hours for genomic DNA using the QuickExtract DNA Extraction Solution (Biosearch Technologies) following the manufacturer's protocol. The target genomic region was amplified using specific primers outside of the homology arms of the PCR template. PCR products
were purified with Monarch PCR & DNA Cleanup Kit (New England BioLabs). 300 ng of product was digested with BsrGI (EMX1, New England BioLabs) or XbaI (VEGFA, NEB), and the digested products were analyzed on a 5% Mini-PROTEAN TBE gel (Bio-Rad).
[00349] Next-Generation Sequencing Library Preparation 72 hours after transfection, genomic DNA was extracted using QuickExtract DNA Extraction Solution (Biosearch Technologies). 200 ng total DNA was used for NGS library preparation. Genes of interest were amplified using specific primers (Table 4) for the first round PCR reaction. Illumina adapters and index barcodes were added to the fragments with a second round PCR using the primers listed in Table 4. Round 2 PCR products were purified by gel electrophoresis on a 2% agarose gel using the Monarch DNA Gel Extraction Kit (NEB). The purified product was quantified with Qubit dsDNA HS Assay Kit (Thermo Fisher) and sequenced on an Illumina MiSeq according to the manufacturer's instructions.
Table 4. Sequence for primers used for PCR template, RFLP and NGS
HSP90AA1-PCR-200bp-F    PCR template    HSP90AA1    TACTGTCTTGAAAGCAGATAGAAACC (SEQ ID NO:46)
EMX-OT1-R    Off Target    EMX1 OT-1    CCTCTCTATGGGCAGTCGGTGATgGCTGACTTTGGGCTCCTTCT (SEQ ID NO:68)
nd merged) sequencing reads were analyzed to determine editing outcomes using CRISPResso2⁵ by aligning sequenced amplicons to reference and expected HDR amplicons. The quantification window was increased to 10 bp surrounding the expected cut site to better capture diverse editing outcomes, but substitutions were ignored to avoid inclusion of sequencing errors. Only reads containing no mismatches to the expected amplicon were considered for HDR quantification; reads containing indels that partially matched the expected amplicons were included in the overall reported indel frequency.
[00351] Statistical Analysis Unless otherwise stated, all statistical analyses and comparisons were performed using a t-test, with a 1% false-discovery rate (FDR) using the two-stage step-up method of Benjamini, Krieger and Yekutieli (Benjamini, Y., et al., Biometrika 93, 491–507 (2006), incorporated herein by reference). All experiments were performed in triplicate unless otherwise
noted to ensure sufficient statistical power in the analysis.
[00352] Determination of editing at predicted Cas9 off-target sites To evaluate RecT/RecE off-target editing activity at known Cas9 off-target sites, the same genomic DNA extracts used for knock-in analysis were used as template for PCR amplification of the top predicted off-target sites (high-scoring sites as predicted by CRISPOR, a web-based analysis tool) for the EMX1 and VEGFA guides; primer sequences are listed in Table 4.
[00353] iGUIDE Off-target Analysis Genome-wide, unbiased off-target analysis was performed following the iGUIDE pipeline (Nobles, C.L., et al. Genome Biol 20, 14 (2019), incorporated herein by reference), which is based on the previously developed GUIDE-seq method (Tsai, S., et al. Nat Biotechnol 33, 187–197 (2015), incorporated herein by reference). HEK293T cells were transfected in 20 μL Lonza SF Cell Line Nucleofector Solution on a Lonza Nucleofector 4-D with program DS-150 according to the manufacturer's instructions. 300 ng of gRNA-Cas9 plasmids (or 150 ng of each gRNA-Cas9n plasmid for the double nickase), 150 ng of the effector plasmids, and 5 pmol of double stranded oligonucleotides (dsODN) were transfected. Cells were harvested after 72 hrs for genomic DNA using the Agencourt DNAdvance reagent kit. 400 ng of purified gDNA was then fragmented to an average of 500 bp and ligated with adaptors using the NEBNext Ultra II FS DNA Library Prep kit following the manufacturer's instructions. Two rounds of nested anchored PCR from the oligo tag to the ligated adaptor sequence were performed to amplify targeted DNA, and the amplified library was purified, size-selected, and sequenced using Illumina MiSeq V2 PE300. Sequencing data was analyzed using the published iGUIDE pipeline, with the addition of a downsampling step which ensures an unbiased comparison across samples.
Example 2
[00354] In contrast to mammals, convenient recombineering-edit tools are available for bacteria, e.g., the phage lambda Red and RecE/T.
Microbial recombineering has two major steps: template DNA is chewed back by exonucleases (Exo), then the single-strand annealing protein (SSAP) supports homology directed repair by the template, optionally facilitated by nuclease inhibitor. A system for RNA-guided targeting of RecE/T recombineering activities was developed and achieved kilobase (kb) human gene-editing without DNA cutting.
[00355] Candidate microbial systems with recombineering activities were surveyed. Two lines
of reasoning guided the search: 1) Orthogonality: prioritizing proteins with minimal similarity to mammalian repair enzymes; 2) Parsimony: focusing on systems with the fewest interdependent components. Three protein families were identified: lambda Red, RecE/T, and phage T7 gp6 (Exo) and gp2.5 (SSAP) recombination machinery. Based on phylogenetic reconstruction, RecE/T proteins were determined to be the most distant from eukaryotic recombination proteins and among the most compact (FIG. 1). Thus, RecE/T systems were utilized for downstream analysis.
[00356] The NCBI protein database was systematically searched for RecE/T homologs. To develop a portable tool, evolutionary relationships and lengths were examined (FIG. 2A). Co-occurrence analysis revealed that most RecE/T systems have only one of the two proteins (FIG. 2B). As prophage integration could be imprecise, the 11% of species harboring both homologs were prioritized as evidence for intact functionality.
[00357] The top 12 candidates were codon-optimized and MS2 coat protein (MCP) fusions were constructed to recruit these RecE/T homologs, hereafter termed “recombinator”, to wild-type Streptococcus pyogenes Cas9 (wtCas9) via MS2 RNA aptamers. To understand their respective molecular effects as Exo and SSAP, each was tested independently (FIG. 2C). Initial results revealed Escherichia coli RecE/T proteins (simplified as RecE and RecT) as promising candidates, as determined by genome knock-in assays (FIG. 2D). While RecT is only 269 amino acids (AA) long, RecE was truncated from AA587 (RecE_587) and the carboxy terminus domain (RecE_CTD) based on functional studies (Muyrers, J.P., Genes Dev. (2000); 14, 1971-1982, incorporated herein by reference).
[00358] To validate RecE/T recombineering in human cells, homology directed repair (HDR) was measured at five genomic sites with two templates.
While the RecE variants (RecE_587, RecE_CTD) demonstrated variable increases in knock-in efficiency, RecT significantly enhanced HDR in all cases, replacing ~16 bp sequences at EMX1 and VEGFA, and knocking in a ~1 kb cassette at HSP90AA1, DYNLT1, and AAVS1 (FIGS. 3A-E, FIG. 4). These results were verified using imaging (FIG. 3F) and junction sites were sequenced using Sanger sequencing to confirm precise insertion (FIG. 3G, FIG. 4G). To test if these activities are truly sequence-specific, a no-recruitment control with the PP7 coat protein (PCP), which recognizes PP7 aptamers but not MS2 aptamers, was employed. RecE had activities without recruitment, whereas RecT showed efficiency increases in a recruitment-dependent manner (FIG. 3H). Without being bound by theory, this may be explained by RecE exonuclease activity acting promiscuously (FIG. 2C). The RecE/T recombineering-edit
(REDIT) tool was termed REDITv1, with REDITv1_RecT as the preferred
Example 3
[00359] Three tests on REDITv1 were performed to explore: 1) activity across cell types, 2) optimal designs of HDR template, and 3) specificity. REDITv1 activity was robust across multiple genomic sites in HEK, A549, HepG2, and HeLa cells (FIGS. 5A-C, FIGS. 6A-C). Notably, in human embryonic stem cells (hESCs), REDITv1 exhibited consistent increases of kilobase knock-in efficiency at HSP90AA1 and OCT4, with up to 3.5-fold improvement relative to Cas9-HDR (FIGS. 5D-E, FIGS. 6D-E). Different template designs were also tested. REDITv1 performed efficient kilobase editing using a homology arm (HA) length as short as 200 bp total, with longer HA supporting higher efficiency. It achieved up to 10% efficiency (without selection) for kb-scale knock-in, a 5-fold increase over Cas9-HDR and significantly higher than the 1~2% typical efficiency (FIG. 7). Lastly, the accuracy of REDITv1 was determined using deep sequencing of predicted off-target sites (OTSs) and GUIDE-seq. Although REDITv1 did not increase off-target effects, detectable OTSs remained at previously reported sites for EMX1 and VEGFA (FIGS. 5F-G, FIG. 8). In short, REDITv1 showcased kilobase-scale genome recombineering but retained the off-target issues, with REDITv1_RecT having the highest efficiency.
Example 4
[00360] To alleviate unwanted edits, a version of REDIT with non-cutting Cas9 nickases (Cas9n) was assessed. A similar strategy was previously employed (Ran, F.A., et al., Cell (2013), 154: 1380-1389, incorporated herein by reference) to address off-target issues but had low HDR efficiency. REDIT was tested to determine if this system could overcome the limitation of endogenous repair and promote nicking-mediated recombination. Indeed, the nickase version demonstrated higher efficiencies, with the best results from Cas9n(D10A) with single- and double-nicking. This Cas9n(D10A) variant was designated REDITv2N (FIG. 9A).
A 5%~10% knock-in efficiency without selection was observed using REDITv2N double-nicking, comparable to REDITv1 using wtCas9 (FIG. 9A, FIG. 10A). Junction sequencing confirmed the precision of knock-in for all targets (FIG. 11). This result represented a 6- to 10-fold improvement over Cas9n-HDR. Even with single-nicking REDITv2N, a ~2% efficiency for 1kb knock-in was observed, a level considerably higher than the 0.46% HDR efficiency in a previous report (Cong, L. et al., Science. 339, 819-823, incorporated herein by reference) using regular single-nicking Cas9n and a less-challenging 12-bp knock-in template (FIG. 9A).
[00361] The off-target activity of REDITv2N was investigated using GUIDE-seq. Results showed minimal off-target cleavage and a reduction of OTSs by ~90% compared to REDITv1 (FIG. 9B). Specifically, for DYNLT1-targeting guides, the most abundant KIF6 OTS was significantly enriched in the REDITv1 group but disappeared when using REDITv2N (FIG. 9C). REDITv2N was highly accurate (FIGS. 9B-C, FIG. 12).
[00362] Another byproduct of HDR editing is on-target insertion-deletions (indels). They could drastically lower yields of gene-editing, especially for long sequences. Indel formation was measured in an EMX1 knock-in experiment using deep sequencing. REDITv2N increased HDR to the same efficiency as its counterpart using wtCas9 (FIG. 12C, top), with a reduction of unwanted on-target indels by 92% (FIG. 12C, bottom).
[00363] Concepts from GUIDE-seq, LAM-PCR, and TLA were used to develop an NGS-based assay to identify genome-wide insertion sites (GIS), or GIS-seq (FIG. 30A). Using GIS-seq, NGS read clusters/peaks representing knock-in insertion sites were obtained (FIG. 30B, showing representative reads from the on-target site). GIS-seq was applied to DYNLT1 and ACTB loci to measure the knock-in accuracy.
Sequencing results indicated that, when considering sites with high confidence based on maximum likelihood estimation, REDIT had fewer off-target insertion sites identified compared with Cas9 (FIG. 30C). Together, the clonal Sanger sequencing of knock-in junctions (FIGS. 9C and 12), GUIDE-seq analysis (FIG. 9B), and GIS-seq results (FIGS. 30A-30C) indicated that REDIT can be an efficient method with the ability to insert kilobase-length sequences with fewer unwanted editing events.
Example 5
[00364] REDIT was examined for long sequence editing ability in the absence of any nicking/cutting of the target DNA. Remarkably, when using catalytically dead Cas9 (dCas9) to construct REDITv2D, an exact genomic knock-in of a kilobase cassette was observed in human cells (FIG. 9D, top, FIG. 13). While REDITv2D had lower efficiency than REDITv2N, it achieved programmable DNA-damage-free editing at kilobase-scale with 1~2% efficiency and no selection (FIG. 9D, FIG. 10B). It was hypothesized that two processes could be contributing to the REDITv2D recombineering. One possibility was via dCas9 unwinding. If dCas9 could unwind
DNA as it induces sequence-specific formation of a loop, a double-binding with two guide RNAs would be expected to promote genome accessibility to RecE/T. However, a significant increase upon delivering two guide RNAs was not observed (FIG. 9D, bottom). Another possibility was that the unwinding of DNA during the cell cycle permitted RecE/T to access the target region mediated by dCas9 binding. A 1kb knock-in was performed with different REDIT tools at varying serum levels (10% regular, 2% reduced, and no serum). As serum starvation arrests cell proliferation, the results indicated that the cell cycle correlated positively with REDITv2D recombineering (FIG. 9E). Upon no-serum treatment, HDR efficiency dropped only in the REDITv2D (dCas9) group, whereas REDITv1 (wtCas9) and REDITv2N (D10A) were not affected (FIG. 9E, FIG. 14), supporting the hypothesis that DNA unwinding permitted RecE/T to access the target region.
Example 6
[00365] Microscopy analysis revealed incomplete nuclear targeting of REDITv1, particularly REDITv1_RecT (FIG. 15). Hence, different designs of protein linkers and nuclear localization signals (NLSs) were tested (FIG. 15A). The extended XTEN-linker with C-terminal SV40-NLS was identified as a preferred configuration, termed REDITv3 (FIG. 16). REDITv3 further achieved a 2- to 3-fold increase of HDR efficiencies over REDITv2 across genome targets and Cas9 variants (wtCas9, Cas9n, dCas9) (FIG. 17).
[00366] Finally, REDITv3 was utilized in hESCs to engineer kilobase knock-in alleles in human stem cells. REDITv3N single- and double-nicking designs resulted in 5-fold and 20-fold increased HDR efficiencies over no-recombinator controls, respectively (FIG. 9F). The efficacy and fidelity were confirmed via a combination of assays described for previous REDIT versions (FIGS. 9F-G, FIG. 18). Additionally, REDITv3 works effectively with Staphylococcus aureus Cas9 (SaCas9), a compact CRISPR system suitable for in vivo delivery (FIG. 19).
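The fold-improvement figures quoted in Examples 3 through 6 are ratios of a knock-in efficiency to a reference efficiency. A minimal Python sketch of that arithmetic follows. The function names `fold_change` and `implied_reference` are hypothetical helpers introduced here for illustration only, and the inputs are percentages quoted in the text. Note that the ~2% and 0.46% values come from different templates and reports, so their ratio is only indicative.

```python
def fold_change(efficiency_pct: float, reference_pct: float) -> float:
    """Return efficiency / reference, both given as percentages."""
    if reference_pct <= 0:
        raise ValueError("reference efficiency must be positive")
    return efficiency_pct / reference_pct


def implied_reference(efficiency_pct: float, fold: float) -> float:
    """Return the reference efficiency implied by an efficiency and a fold increase."""
    if fold <= 0:
        raise ValueError("fold increase must be positive")
    return efficiency_pct / fold


if __name__ == "__main__":
    # ~2% (single-nicking REDITv2N, 1 kb knock-in) vs 0.46% (earlier report)
    print(f"{fold_change(2.0, 0.46):.2f}-fold")
    # 10% at a 5-fold increase implies the Cas9-HDR baseline in percent
    print(f"{implied_reference(10.0, 5.0):.1f}%")
```

Under these inputs, the 10% efficiency and 5-fold increase reported for REDITv1 imply a Cas9-HDR baseline of about 2%, which is consistent with the 1~2% typical efficiency mentioned in Example 3.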
Example 7
[00367] To further investigate RecT and RecE_587 variants, both RecT and RecE_587 were truncated at various lengths as shown in FIG. 20A and FIG. 21A, respectively. The resulting efficiencies were measured using an mKate knock-in assay, with both wildtype SpCas9 and Cas9n(D10A) with single- and double-nicking at the DYNLT1 locus (FIGS. 20B-C and FIGS. 21B-C, respectively). Efficiencies of the no recombination group are shown as the
[00368] The truncated versions of both RecT and RecE_587 retained significant recombineering activity when used with different Cas9s. In particular, compared with the full-length RecT(1-269aa), the new truncated versions such as RecT(93-264aa) are over 30% smaller yet preserved essentially the full activity of RecT in stimulating recombination in eukaryotic cells. Similarly, compared with the full-length RecE(1-280aa), truncated versions such as RecE_587(120-221aa) and RecE_587(120-209aa) are over 60% smaller but still retained high recombination activities in human cells. These truncated versions not only demonstrated the potential to further engineer minimal-functional recombineering enzymes using RecE and RecT protein variants, but also provide valuable compact recombineering tools for human genome editing that are ideal for in vitro, ex vivo, and in vivo delivery given their small size.
[00369] Overall, REDIT combined the specificity of CRISPR genome-targeting with the efficiency of RecE/RecT recombineering. The disclosed high-efficiency, low-error system makes a powerful addition to existing CRISPR toolkits. The balanced efficiency and accuracy of REDITv3N make it an attractive therapeutic option for knock-in of large cassettes in immune and stem cells.
Example 8
[00370] The reconstructed RecE and RecT phylogenetic trees with eukaryotic recombination enzymes from yeast and human (FIGS. 1A and 1B) show the evolutionary distance of the proteins based on sequence homology. The dotted boxes indicate the full-length E. coli RecB and E. coli RecE protein. The catalytic core domain of E. coli RecB and E. coli RecE protein (solid boxes) was used for the comparison.
The gene-editing activities of these families of recombineering proteins were measured using the MS2-MCP recruitment system, where an sgRNA bearing an MS2 stem-loop is used with recombineering proteins fused to the MCP protein via a peptide linker and with nuclear-localization signals.
[00371] Three exonuclease proteins were used: the exonuclease from phage Lambda, the RecE587 core domain of E. coli RecE protein, and the exonuclease (gene name gp6) from phage T7 (FIG. 22A). The gene-editing activity was measured using the mKate knock-in assay at genomic loci (DYNLT1 and HSP90AA1).
[00372] Similar measurements were made testing the genome editing efficiencies of three single-strand DNA annealing proteins (SSAPs) from the same three species of microbes as the exonucleases, namely Bet protein from phage Lambda, RecT protein from E. coli, and SSAP (gene name gp2.5) from phage T7 (FIG. 22B).
[00373] In these experiments, the genome recombineering activities of all three major families of phage/microbial recombination systems were systematically measured and validated in eukaryotic cells (lambda phage exonuclease and beta proteins; E. coli prophage RecE and RecT proteins; T7 phage exonuclease gp6 and single-strand binding gp2.5 proteins). All six proteins from the three systems achieved efficient gene editing to knock in kilobase-long sequences into the mammalian genome across two genomic loci. Overall, the exonucleases showed ~3-fold higher recombination efficiency (up to 4% mKate genome knock-in) when compared with no-recombinator controls. The single-strand annealing proteins (SSAPs) showed higher activities, with 4-fold to 8-fold higher gene-editing activities over the control groups. This demonstrated the general applicability of engineering microbial recombination proteins in the exonuclease and SSAP families via the Cas9-based fusion protein system to achieve highly efficient genome recombination in mammalian cells.
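The size reductions stated for the truncated variants in Example 7 follow from the residue ranges in their names. The Python sketch below assumes that a label such as RecT(93-264aa) denotes 1-based, inclusive residue positions and that the full-length references are 269 aa and 280 aa as quoted in the text. The names `span_length`, `percent_smaller`, and `truncate` are hypothetical helpers for illustration and are not part of the disclosure.

```python
def span_length(start: int, end: int) -> int:
    """Number of residues in a 1-based, inclusive range start..end."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def percent_smaller(truncated_len: int, full_len: int) -> float:
    """Percent reduction in length relative to the full-length protein."""
    return 100.0 * (1.0 - truncated_len / full_len)


def truncate(sequence: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if end > len(sequence):
        raise ValueError("end exceeds sequence length")
    span_length(start, end)  # reuses the range validation
    return sequence[start - 1:end]


if __name__ == "__main__":
    # (start, end, full length) taken from the ranges quoted in Example 7
    variants = {
        "RecT(93-264aa)": (93, 264, 269),
        "RecE_587(120-221aa)": (120, 221, 280),
        "RecE_587(120-209aa)": (120, 209, 280),
    }
    for name, (start, end, full_len) in variants.items():
        n = span_length(start, end)
        print(f"{name}: {n} aa, {percent_smaller(n, full_len):.1f}% smaller")
```

Under these assumptions the spans come to 172, 102, and 90 residues, or roughly 36%, 64%, and 68% smaller than the respective full-length proteins, which agrees with the "over 30%" and "over 60%" statements.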
Example 9
[00374] In order to demonstrate the generalizability of the REDIT protein design, alternative recruitment systems were developed and tested. For a more compact REDIT system, the REDIT recombinator proteins were fused to the N22 peptide and at the same time the sgRNA included boxB, the short cognate sequence of the N22 peptide, replacing MS2 within the sgRNA (FIG. 23A). This boxB-N22 system demonstrated comparable editing efficiencies at the two genomic sites tested, as shown in FIGS. 23B-23E with side-by-side comparisons with the MS2-MCP recruitment system.
[00375] A REDIT system using SunTag recruitment, a protein-based recruitment system, was developed (FIGS. 24A and 27A). Because SunTag is based on a fusion protein design, the sgRNAs or guide RNAs are the same as in the wild-type CRISPR system. Specifically, the REDIT recombinator proteins were fused to the scFV antibody peptide (replacing MCP), and the GCN4 peptide was fused in tandem fashion (10 copies of GCN4 peptide separated by linkers) to the Cas9 protein. Thus, the scFV-REDIT could be recruited to the Cas9 complex via GCN4’s affinity to scFV.
[00376] mKate knock-in experiments (FIGS. 24B and 27B) were used to measure the editing
efficiencies at the DYNLT1 locus and the HSP90AA1 locus, respectively. This REDIT system demonstrated a significant increase of gene-editing knock-in efficiency at the DYNLT1 genomic sites tested. In addition, the SunTag design significantly increased HDR efficiencies to ~2-fold better than Cas9 but did not achieve increases as high as the MS2-aptamer.
Example 10
[00377] In order to demonstrate the generalizability of the REDIT protein design and develop a versatile REDIT system applicable to a range of CRISPR enzymes, a Cpf1/Cas12a-based REDIT system using the SunTag recruitment design was developed (FIG. 25A). Two different Cpf1/Cas12a proteins were tested (Lachnospiraceae bacterium ND2006, LbCpf1 and Acidaminococcus sp. BV3L6) using the mKate knock-in assay as previously shown (FIG. 25B).
[00378] These results showed that the recombination proteins (exonuclease and single-strand annealing proteins) could be engineered using alternative designs such as the SunTag recruitment system to perform genome editing in eukaryotic cells. This protein-based recruitment system does not require the use of RNA aptamers or RNA-binding proteins; instead, it takes advantage of fusion protein domains directly connected to the CRISPR enzymes to recruit REDIT proteins.
[00379] In addition to the flexibility in recruitment system design, these results using Cpf1/Cas12a-type CRISPR enzymes also demonstrated the general adaptability of REDIT proteins to various CRISPR systems for genome recombination. Cpf1/Cas12a enzymes have different catalytic residues and DNA-recognition mechanisms from the Cas9 enzymes. Hence, the REDIT recombination proteins (exonucleases and single-strand annealing proteins) could function independently of the specific choice of the CRISPR enzyme components (Cas9, Cpf1/Cas12a, and others).
This proved the generalizability of the REDIT system and opened up the possibility of using additional CRISPR enzymes (known and unknown) as components of the REDIT system to achieve accurate genome editing in eukaryotic cells.
Example 11
[00380] Fifteen different species of microbes having RecE/RecT proteins were selected for a screen of various RecE and RecT proteins across the microbial kingdom (Table 5). Each protein was codon-optimized and synthesized. As previously described for E. coli RecE/RecT based REDIT
systems, each protein was fused via an E-XTEN linker to the MCP protein with a localization signal. The mKate knock-in gene-editing assay was used to measure efficiencies at the DYNLT1 locus (FIG. 26A, Table 6) and the HSP90AA1 locus (FIG. 26B, Table 6). The homologs demonstrated the ability to enable and enhance precision gene-editing.
Table 5. RecE and RecT protein homologs
Homolog Source Protein
Figure imgf000112_0002
T15 Photobacterium sp. JCM 19050 RecT
E15 Photobacterium sp. JCM 19050 RecE
Table 6. mKate Knock-In Gene-Editing Efficiencies
DYNLT1 HSP90AA1
Figure imgf000113_0002
Homolog_T15 7.8033 0.7075 5.2333 0.2302
Homolog_E15 5.0700 0.5543 6.0500 0.5696
Example 12
[00381] Next, to benchmark the RecT-based REDIT design, it was compared with three categories of existing HDR-enhancing tools (FIGS. 28A and 28B): a fusion of the DNA repair enzyme CtIP with Cas9 (Cas9-HE), a fusion of the functional domain (amino acids 1 to 110) of human Geminin protein with Cas9 (Cas9-Gem), and a small-molecule enhancer of HDR via cell cycle control, Nocodazole. Across endogenous targets tested, the RecT-based REDIT design had favorable performance compared with the three alternative strategies (Figure 3C). Furthermore, the RecT-based REDIT design, which putatively acted independently of the other approaches, might synergize with existing methods. To test this hypothesis, the RecT-based REDIT design was combined with the three different approaches (conveniently through the MS2-aptamer) (FIG. 28A, right). The RecT-based REDIT design could indeed further enhance the HDR-promoting activities of the tested tools (FIG. 28C).
Example 13
[00382] The effect of template HA lengths on the editing efficiency of REDIT was quantified when using the canonical HDR donor bearing HAs of at least 100 bp on each side (FIG. 29A, left). Higher HDR rates were observed for both Cas9 and RecT groups with increasing HA lengths, and REDIT effectively stimulated HDR over Cas9 using HA lengths as short as ∼100 bp on each side. When supplied with a longer template bearing 600–800 bp total HA, RecT achieved over 10% HDR efficiencies for kb-scale knock-in without selection, significantly higher than the 2–3% efficiency when only using Cas9. Recent reports identified that using donor DNAs with shorter HAs (usually between 10 and 50 bp) could significantly stimulate knock-in efficiencies thanks to the high repair activities from the microhomology-mediated end joining (MMEJ) pathway. Knock-in efficiencies of the REDIT-based method were compared with Cas9, using donor DNA with 0 bp (NHEJ-based), 10 bp or 50 bp (MMEJ-based) HAs.
The results demonstrated that short-HA donors leveraging MMEJ mechanisms yielded higher editing efficiencies compared with HDR donors (FIG. 29A, right). At the same time, REDIT was able to enhance the knock-in efficiencies as long as an HA was present (no effect for the 0 bp NHEJ donor). This effect is particularly significant with The 10 bp donors in which there was a significant effect, were chosen for further
characterization and comparison with the HDR donors.
[00383] The knock-in cells were clonally isolated and the target genomic region was amplified using primers binding completely outside of the donor DNAs for colony Sanger sequencing (FIG. 29B). Junction sequencing analysis (∼48 colonies per gene per condition) revealed varying degrees of indels at the 5’- and 3’-knock-in junctions, including at one or both junctions (FIG. 29C). Overall, HDR donors had better precision than MMEJ donors, and REDIT modestly improved the knock-in yield compared with Cas9, though junction indels were still observed.
[00384] Furthermore, the efficiencies of REDIT and Cas9 were compared when making edits of different lengths. For longer edits, 2-kb knock-in cassettes were used (FIG. 29D), and for shorter edits single-stranded oligo donors (ssODN) were used. When the knock-in sequence length was increased to ∼2 kb using a dual-mKate/GFP template, REDIT maintained its HDR-promoting activity compared with Cas9 across endogenous targets tested (FIG. 29D). For ssODN tests, at two well-established loci, EMX1 and VEGFA, REDIT and Cas9 were used to introduce 12–16-bp exogenous sequences. As ssODN templates are short (<100 bp HAs on each side), next-generation sequencing (NGS) was used to quantify the editing events. Comparable levels of indels were observed between Cas9 and REDIT, with improved HDR efficiencies using REDIT.
Example 14
[00385] The sensitivity of REDIT’s ability to promote HDR was tested in the presence or absence of two distinct pharmacological inhibitors of RAD51, B02 and RI-1 (FIG. 31A). As expected, for Cas9-based editing, RAD51 inhibition significantly lowered HDR efficiencies (FIGS. 31B, 31C, and 32A). Intriguingly, RAD51 inhibition decreased REDIT and REDITdn efficiencies only moderately, as both REDIT/REDITdn methods maintained significantly higher knock-in efficiencies compared with Cas9/Cas9dn under RAD51 inhibition.
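The colony-level junction analysis in paragraph [00383] sorts each clone by whether indels appear at the 5’ junction, the 3’ junction, both, or neither. One way to tally such calls is sketched below in Python. The category labels, the function names `classify_clone` and `tally_clones`, and the sample input are illustrative assumptions; the sample list is made-up placeholder data and not results from FIG. 29C.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_clone(indel_5p: bool, indel_3p: bool) -> str:
    """Label one clone from its per-junction indel calls."""
    if indel_5p and indel_3p:
        return "indels at both junctions"
    if indel_5p:
        return "5' junction indel only"
    if indel_3p:
        return "3' junction indel only"
    return "precise at both junctions"


def tally_clones(calls: Iterable[Tuple[bool, bool]]) -> Counter:
    """Count clones per category from (5' indel?, 3' indel?) pairs."""
    return Counter(classify_clone(five, three) for five, three in calls)


if __name__ == "__main__":
    # Placeholder calls for four hypothetical clones, for illustration only.
    example_calls = [(False, False), (True, False), (False, False), (True, True)]
    for category, count in tally_clones(example_calls).items():
        print(f"{category}: {count}")
```

Dividing each count by the number of sequenced colonies (∼48 per gene per condition in the text) would give per-category fractions that can be compared between donor types and between Cas9 and REDIT.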
[00386] Mirin, a potent chemical inhibitor of DSB repair that has also been shown to prevent MRN complex formation and MRN-dependent ATM activation and to inhibit Mre11 exonuclease activity, was also used. When treating cells with Mirin, only the editing efficiencies of Cas9 reference experiments were affected, whereas the REDIT versions were essentially the same as vehicle-treated groups across all genomic targets (FIG. 32A).
[00387] To test if cell cycle inhibition affected recombination, cells were chemically
synchronized at the G1/S boundary using double thymidine block (DTB). REDIT had reduced editing efficiencies under DTB treatment, though it maintained higher editing efficiencies under DNA repair pathway inhibition, compared with Cas9 reference experiments, when Mirin, RI-1, or B02 were combined with DTB treatment (FIG. 32B).
[00388] To validate REDIT in different contexts, REDIT was applied in human embryonic stem cells (hESCs) to test its ability to engineer long sequences in non-transformed human cells. Robust stimulation of HDR was observed across all three genomic sites (HSP90AA1, ACTB, OCT4/POU5F1) using REDIT and REDITdn (FIGS. 31D and 31E). Of note, REDIT and REDITdn editing used donor DNAs with 200-bp HAs on each side and achieved over 5% efficiency for kb-scale gene-editing without selection, compared with ∼1% efficiency using non-REDIT methods. Additionally, REDIT improved knock-in efficiencies in A549 (lung-derived), HepG2 (liver-derived), and HeLa (cervix-derived) cells, demonstrating up to ∼15% kb-scale genomic knock-in without selection. This improvement was up to 4-fold higher than the Cas9 groups, supporting the potential of using REDIT methods in different cell types.
Example 15
[00389] In vivo use of dCas9-EcRecT (SAFE-dCas9) was tested using the cleavage-free dCas9 editor via hydrodynamic tail vein injection. The gene editing vectors and template DNA used are shown in FIG. 33A. A gene editing vector (60 µg) and template DNA (60 µg) were delivered to the mouse via hydrodynamic tail vein injection. Successful gene editing of liver hepatocytes was monitored by transgene-encoded protein expression from the albumin locus. A schematic of the experimental procedure is shown in FIG. 33B.
[00390] At approximately seven days after injection, the perfused mouse livers were dissected. The lobes of the liver were homogenized and processed to extract liver genomic DNA from the primary hepatocytes.
The extracted genomic DNA was used for three different downstream analyses: 1) PCR using knock-in-specific primers and agarose gel electrophoresis (FIG. 34A); 2) Sanger sequencing of the knock-in PCR product (FIG. 34B); 3) high-throughput deep sequencing of the knock-in junction to confirm and quantify the accuracy of gene-editing using SAFE-dCas9 in vivo (FIG. 34C). Each downstream analysis confirmed knock-in success with .
[00391] In addition, in vivo use was tested using adeno-associated virus (AAV) delivery into
LTC mouse lungs. LTC mice carry three genomic alleles: 1) the Lkb1 (flox/flox) allele, which is knocked out upon Cre expression; 2) the R26(LSL-TdTom) allele, which allows detection of AAV-transduced cells via TdTom red fluorescent protein; and 3) the H11(LSL-Cas9) allele, which allows expression of Cas9 in AAV-transduced cells. Schematics of the REDIT gene editing vector and Cas9 control vectors are shown in FIG. 35A. As shown in FIG. 35B, successful gene editing using the gene editing vector leads to Kras alleles that drive tumor growth in the lung of the treated mice.
[00392] Approximately fourteen weeks after the AAV injection, perfused mouse lungs were dissected. Fixed lung tissue was used for imaging analysis to identify tumor formation from successful gene-editing (FIG. 35C). Quantification of the surface tumor number via imaging analysis showed increased gene-editing efficiencies and total number of tumors in the REDIT treated mice (FIG. 35C).
Escherichia coli RecE amino acid sequence (SEQ ID NO:1):
MSTKPLFLLRKAKKSSGEPDVVLWASNDFESTCATLDYLIVKSGKKLSSYFKAVATNFP VVNDLPAEGEIDFTWSERYQLSKDSMTWELKPGAAPDNAHYQGNTNVNGEDMTEIEEN MLLPISGQELPIRWLAQHGSEKPVTHVSRDGLQALHIARAEELPAVTALAVSHKTSLLDP LEIRELHKLVRDTDKVFPNPGNSNLGLITAFFEAYLNADYTDRGLLTKEWMKGNRVSHI TRTASGANAGGGNLTDRGEGFVHDLTSLARDVATGVLARSMDLDIYNLHPAHAKRIEEI IAENKPPFSVFRDKFITMPGGLDYSRAIVVASVKEAPIGIEVIPAHVTEYLNKVLTETDHA NPDPEIVDIACGRSSAPMPQRVTEEGKQDDEEKPQPSGTTAVEQGEAETMEPDATEHHQ DTQPLDAQSQVNSVDAKYQELRAELHEARKNIPSKNPVDDDKLLAASRGEFVDGISDPN DPKWVKGIQTRDCVYQNQPETEKTSPDMNQPEPVVQQEPEIACNACGQTGGDNCPDCG AVMGDATYQETFDEESQVEAKENDPEEMEGAEHPHNENAGSDPHRDCSDETGEVADP VIVEDIEPGIYYGISNENYHAGPGISKSQLDDIADTPALYLWRKNAPVDTTKTKTLDLGT AFHCRVLEPEEFSNRFIVAPEFNRRTNAGKEEEKAFLMECASTGKTVITAEEGRKIELMY QSVMALPLGQWLVESAGHAESSIYWEDPETGILCRCRPDKIIPEFHWIMDVKTTADIQRF KTAYYDYRYHVQDAFYSDGYEAQFGVQPTFVFLVASTTIECGRYPVEIFMMGEEAKLA GQQEYHRNLRTLADCLNTDEWPA IKTLSLPRWAKEYAND
Escherichia coli RecE_587 amino acid sequence (SEQ ID NO:2):
ADPVIVEDIEPGIYYGISNENYHAGPGVSKSQLDDIADTPALYLWRKNAPVDTTKTKTLD
LGTAFHCRVLEPEEFSNRFIVAPEFNRRTNSGKEEEKAFLRECASTGKTVITAEEGRKIEL MYQSVMALPLGQWLVESAGHAESSIYWEDPETAILCRCRPDKIIPEFHWIMDVKTTADI QRFKTAYYDYRYHVQDAFYSDGYEAQFGVQPTFVFLVASTTIECGRYPVEIFMMGEEA KLAGQLEYHRNLRTLADCLNTDEWPAIKTLSLPRWAKEYAND* Escherichia coli CTD_RecE amino acid sequence (SEQ ID NO:3): GISNENYHAGPGVSKSQLDDIADTPALYLWRKNAPVDTTKTKTLDLGTAFHCRVLEPEE FSNRFIVAPEFNRRTNSGKEEEKAFLRECASTGKTVITAEEGRKIELMYQSVMALPLGQW LVESAGHAESSIYWEDPETAILCRCRPDKIIPEFHWIMDVKTTADIQRFKTAYYDYRYHV
Figure imgf000118_0001
ADCLNTDEWPAIKTLSLPRWAKEYAND* Pantoea brenneri RecE amino acid sequence (SEQ ID NO:4): MQPGIYYDISNEDYHRGAGISKSQLDDIAISPAIYQWRKHAPVDEEKTAALDLGTALHCL LLEPDEFSKRFQIGPEVNRRTTAGKEKEKEFIERCEAEGITPITHDDNRKLKLMRDSALAH PIARWMLEAQGNAEASIYWNDRDAGVLSRCRPDKIITEFNWCVDVKSTADIMKFQKDF YSYRYHVQDAFYSDGYESHFHETPTFAFLAVSTSIDCGRYPVQVFIMDQQAKDAGRAE YKRNIHTFAECLSRNEWPGIATLSLPFWAKELRNE Type-F symbiont of Plautia stali RecE amino acid sequence (SEQ ID NO:5): MQPGIYYDISNEDYHGGPGISKSQLDDIAISPAIYQWRKHAPVDEEKTAALDLGTALHCL LLEPDEFSKRFEIGPEVNRRTTAGKEKEKEFMERCEAEGVTPITHDDNRKLRLMRDSAM AHPIARWMLEAQGNAEASIYWNDRDTGVLSRCRPDKIITDFNWCVDVKSTADIIKFQKD FYSYRYHVQDAFYSDGYESHFDETPTFAFLAVSTSIDCGRYPVQVFIMDQQAKDAGRAE YKRNIHTFAECLSRNEWPGIATLSLPYWAKELRNE Providencia sp. MGF014 RecE amino acid sequence (SEQ ID NO:6): MKEGIYYNISNEDYHNGLGISKSQLDLINEMPAEYIWSKEAPVDEEKIKPLEIGTALHCLL LEPDEYHKRYKIGPDVNRRTNVGKEKEKEFFDMCEKEGITPITHDDNRKLMIMRDSALA HPIAKWCLEADGVSESSIYWTDKETDVLCRCRPDRIITAHNYIIDVKSSGDIEKFDYEYYN YRYHVQDAFYSDGYKEVTGITPTFLFLVVSTKIDCGKYPVRTYVMSEEAKSAGRTAYK HNLLTYAECLKTDEWAGIRTLSLPRWAKELRNE Shigella sonnei RecE amino acid sequence (SEQ ID NO:7): DRGLLTKEWRKGNRVSRITRTASGANAGGGNLTDRGEGFVHDLTSLARDIATGVLARS MDVDIYNLHPAHAKRIEEIIAENKPPFSVFRDKFITMPGGLDYSRAIVVASVKEAPIGIEVI PAHVTAYLNKVLTETDHANPDPEIVDIACGRSSAPMPQRVTEEGKQDDEEKLQPSGTTA DEQGEAETMEPDATKHHQDTQPLDAQSQVNSVDAKYQELRAELHEARKNIPSKNPVDA DKLLAASRGEFVDGISDPNDPKWVKGIQTRDSVYQNQPETEKTSPDMKQPEPVVQQEPE IAFNACGQTGGDNCPDCGAVMGDATYQETFDEENQVEAKENDPEEMEGAEHPHNENA GSDPHRDCSDETGEVADPVIVEDIEPGIYYGISNENYHAGPGVSKSQLDDIADTPALYLW RKNAPVDTTKTKTLDLGTAFHCRVLEPEEFSNRFIVAPEFNRRTNAGKEEEKAFLMECA STGKMVITAEEGRKIELMYQSVMALPLGQWLVESAGHAESSIYWEDPETGILCRCRPDK IIPEFHWIMDVKTTADIQRFKTAYYDYRYHVQDAFYSDGYEAQFGVQPTFVFLVASTTIE CGRYPVEIFMMGEEAKLAGQLEYHRNLRTLADCLNTDEWPAIKTLSLPRWAKEYAND Pseudobacteriovorax antillogorgiicola RecE amino acid sequence (SEQ ID NO:8): MSKLSNLKVSNSDVDTLSRIRMKEGVYRDLPIESYHQSPGYSKTSLCQIDKAPIYLKTKV PQKSTKSLNIGTAFHEAMEGVFKDKYVVHPDPGVNKTTKSWKDFVKRYPKHMPLKRSE 
YDQVLAMYDAARSYRPFQKYHLSRGFYESSFYWHDAVTNSLIKCRPDYITPDGMSVIDF KTTVDPSPKGFQYQAYKYHYYVSAALTLEGIEAVTGIRPKEYLFLAVSNSAPYLTALYR ASEKEIALGDHFIRRSLLTLKTCLESGKWPGLQEEILELGLPFSGLKELREEQEVEDEFME LVG Escherichia coli RecT amino acid sequence (SEQ ID NO:9):
Figure imgf000119_0001
TTEIRKVPALGNCDTMSFVSAIVQCSQLGLEPGSALGHAYLLPFGNKNEKSGKKNVQLII GYRGMIDLARRSGQIASLSARVVREGDEFSFEFGLDEKLIHRPGENEDAPVTHVYAVAR LKDGGTQFEVMTRKQIELVRSLSKAGNNGPWVTHWEEMAKKTAIRRLFKYLPVSIEIQR AVSMDEKEPLTIDPADSSVLTGEYSVIDNSEE* Pantoea brenneri RecT amino acid sequence (SEQ ID NO:10): MSNQPPIASADLQKTQQSKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMI RIVTTEIRKTPQLAQCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQ LIIGYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLVHRPGENEDAPITHVYAV ARLKDGGTQFEVMTVKQVEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSI EMQKAVVLDEKAESDVDQDNASVLSAEYSVLESGDEATN Type-F symbiont of Plautia stali RecT amino acid sequence (SEQ ID NO:11): MSNQPPIASADLQKTQQSKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMI RIVTTEIRKTPALATCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQ LIIGYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLIHRPGDNEDAPITHVYAV ARLKDGGTQFEVMTAKQVEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSI EMQKAVVLDEKAESDVDQDNASVLSAEYSVLEGDGGE Providencia sp. MGF014 RecT amino acid sequence (SEQ ID NO:12): MSNPPLAQSDLQKTQGTEVKVKTKDQQLIQFINQPSMKAQLAAALPRHMTPDRMIRIVT TEIRKTPALATCDMQSFVGAVVQCSQLGLEPGNALGHAYLLPFGNGKAKSGQSNVQLII GYRGMIDLARRSNQIISISARTVRQGDNFHFEYGLNEDLTHTPSENEDSPITHVYAVARL KDGGVQFEVMTYNQVEKVRASSKAGQNGPWVSHWEEMAKKTVIRRLFKYLPVSIEMQ KAVVLDEKAEANVDQENATIFEGEYEEVGTDGN Shigella sonnei RecT amino acid sequence (SEQ ID NO:13): MTKQPPIAKADLQKTQENRAPAAIKNNDVISFINQPSMKEQLAAALPRHMTAERMIRIA TTEIRKVPALGNCDTMSFVSAIVQCSQLGLEPGSALGHAYLLPFGNKNEKSGKKNVQLII GYRGMIDLARRSGQIASLSARVVREGDEFNFEFGLDEKLIHRPGENEDAPVTHVYAVAR LKDGGTQFEVMTRRQIELVRSQSKAGNNGPWVTHWEEMAKKTAIRRLFKYLPVSIEIQR AVSMDEKEPLTIDPADSSVLTGEYSVIDNSEE Pseudobacteriovorax antillogorgiicola RecT amino acid sequence (SEQ ID NO:14): MGHLVSKTEQDYIKQHYAKGATDQEFEHFIGVCRARGLNPAANQIYFVKYRSKDGPAK PAFILSIDSLRLIAHRTGDYAGCSEPIFTDGGKACTVTVRRNLKSGETGNFSGMAFYDEQ VQQKNGRPTSFWQSKPRTMLEKCAEAKALRKAFPQDLGQFYIREEMPPQYDEPIQVHK PKALEEPRFSKSDLSRRKGLNRKLSALGVDPSRFDEVATFLDGTPDRELGQKLKLWLKE AGYGVNQ STDU2-42312.601 (S22-113) SV40 NLS amino acid sequence (SEQ ID NO:16):
Figure imgf000120_0001
PKKKRKV Ty1 NLS amino acid sequence (SEQ ID NO:17): NSKKRSLEDNETEIKVSRDTWNTKNMRSLEPPRSKKRIH c-Myc NLS amino acid sequence (SEQ ID NO:18): PAAKRVKLD biSV40 NLS amino acid sequence (SEQ ID NO:19): KRTADGSEFESPKKKRKV Mut NLS amino acid sequence (SEQ ID NO:20): PEKKRRRPSGSVPVLARPSPPKAGKSSCI Template DNA sequences (underlining marks the replaced or inserter editing sequences) EMX1 HDR template sequence (SEQ ID NO:79): CATTCTGCCTCTCTGTATGGAAAAGAGCATGGGGCTGGCCCGTGGGGTGGTGTCCAC TTTAGGCCCTGTGGGAGATCATGGGAACCCACGCAGTGGGTcataggctctctcatttactactcacat ccactctgtgaagaagcgattatgatctctcctctagaaaCTCGTAGAGTCCCATGTCTGCCGGCTTCCAGAG CCTGCACTCCTCCACCTTGGCTTGGCTTTGCTGGGGCTAGAGGAGCTAGGATGCACA GCAGCTCTGTGACCCTTTGTTTGAGAGGAACAGGAAAACCACCCTTCTCTCTGGCCC ACTGTGTCCTCTTCCTGCCCTGCCATCCCCTTCTGTGAATGTTAGACCCATGGGAGCA GCTGGTCAGAGGGGACCCCGGCCTGGGGCCCCTAACCCTATGTAGCCTCAGTCTTCC CATCAGGCTCTCAGCTCAGCCTGAGTGTTGAGGCCCCAGTGGCTGCTCTGGGGGCCT CCTGAGTTTCTCATCTGTGCCCCTCCCTCCCTGGCCCAGGTGAAGGTGTGGTTCCAG AACCGGAGGACAAAGTACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCG AGCAGAAGAAGAAGGGCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAG GCCAATGGGGAGGACATCGATGTCACCTCCAATGACTCGGATGTACACGGTCTGCA ACCACAAACCCACGAGGGCAGAGTGCTGCTTGCTGCTGGCCAGGCCCCTGCGTGGG CCCAAGCTGGACTCTGGCCACTCCCTGGCCAGGCTTTGGGGAGGCCTGGAGTCATGG CCCCACAGGGCTTGAAGCCCGGGGCCGCCATTGACAGAGGGACAAGCAATGGGCTG GCTGAGGCCTGGGACCACTTGGCCTTCTCCTCGGAGAGCCTGCCTGCCTGGGCGGGC CCGCCCGCCACCGCAGCCTCCCAGCTGCTCTCCGTGTCTCCAATCTCCCTTTTGTTTT GATGCATTTCTGTTTTAATTTATTTTCCAGGCACCACTGTAGTTTAGTGATCCCCAGT GTCCCCCTTCCCTATGGGAATAATAAAAGTCTCTCTCTTAATGACACGGGCATCCAG CTCCAGCCCCAGAGCCTGGGGTGGTAGATTCCGGCTCTGAGGGCCAGTGGGGGCTG GTAGAGCAAACGCGTTCAGGGCCTGGGAGCCTGGGGTGGGGTACTGGTGGAGGGGG TCAAGGGTAATTCATTAACTCCTCTCTTTTGTTGGGGGACCCTGGTCTCTACCTCCAG CTCCACAGCAGGAGAAACAGGCTAGACATAGGGAAGGGCCATCCTGTATCTTGAGG GAGGACAGGCCCAGGTCTTTCTTAACGTATTGAGAGGTGGGAATCAGGCCCAGGTA GTTCAATGGG VEGFA HDR template sequence (SEQ ID NO:80): STDU2-42312.601 (S22-113) AGGTTTGAATCATCACGCAGGCCCTGGCCTCCACCCGCCCCCACCAGCCCCCTGGCC 
TCAGTTCCCTGGCAACATCTGGGGTTGGGGGGGCAGCAGGAACAAGGGCCTCTGTC TGCCCAGCTGCCTCCCCCTTTGGGTTTTGCCAGACTCCACAGTGCATACGTGGGCTC CAACAGGTCCTCTTCCCTCCCAGTCACTGACTAACCCCGGAACCACACAGCTTCCCG TTctcagctccacaaacttggtgccaaattcttctcccctgggaagcatccctggacacttcccaaaggaccccagtcactccagcctgttg gctgccgctcactttgatgtctgcaggccagatgagggctccagatggcacattgtcagagggacacactgtggcccctgtgcccagccct gggctctctgtacatgaagcaactccagtcccaaatatgtagctgtttgggaggtcagaaatagggggtccaggagcaaactccccccacc ccctttccaaagcccattccctctttagccagagccggggtgtgcagacggcagtcactagggggcgctcggccaccacagggaagctg ggtgaatggagcgagcagcgtcttcgagagtgaggacgtgtgtgtctgtgtgggtgagtgagtgtgCgcACTCTAGAGgtgtCg Tgttgagggcgttggagcggggagaaggccaggggtcactccaggattccaatagatctgtgtgtccctctccccacccgtccctgtccg gctctccgccttcccctgcccccttcaatattcctagcaaagagggaacggctctcaggccctgtccgcacgtaacctcactttcctgctccct cctcgccaatgccccgcgggcgcgtgtctctggacagagtttccgggggcggatgggtaattttcaggctgtgaaccttggtgggggtcga gcttccccttcattgcggcgggctGCGGGCCAGGCTTCACTGAGCGTCCGCAGAGCCCGGGCCCGA GCCGCGTGTGGAAGGGCTGAGGCTCGCCTGTccccgccccccggggcgggccgggggcggggtcccgg cggggcggAGCCATGCGCCCCCCCCttttttttttAAAAGTCGGCTGGTAGCGGGGAGGATCGC GGAGGCTTGGGGCAGCCGGGTAGCTCGGAGGTCGTGGCGCTGGGGGCTAGCACCAG CGCTCTGTCGGGAGGCGCAGCGGTTAGGTGGACCGGTCAGCGGACTCACCGGCCAG GGCGCTCGGTGCTGGAATTTGATATTCATTGATCCGGGttttatccctcttcttttttcttaaacatttttttttA AAACTGTATTGTTTCTCGTTTTAATTTATTTTTGCTTGCCATTCCCCACTTGAAT DYNLT1 HDR template sequence (SEQ ID NO:81): AGTGACCTGTGTAATTATGCAGAAGAATGGAGCTGGATTACACACAGCAAGTTCCT GCTTCTGGGACAGCTCTACTGACGGTATGATTTTCATTCATGTTTGTGAAGTTTTGTT GTGTGAAATATATGACTGGAAGTTTCCTATCTTTGAATGCAATGCATGTTTATCACCT TTTAAAACATTTAATAATAGACTTGCCAAGGTTCTTTGTGTAGCATAGAGATGGGTA CTTGAATGTTGGCCTTATTGTGAGTAAAACGTCGTCCCCCAGCTTTCCCTGCCGTAAA TGCTGCTCTCTTCCCTCCCGCAGGGAGCTGCACTGTGCGATGGGAGAATAAGACCAT GTACTGCATCGTCAGTGCCTTCGGACTGTCTATTGGAAGCGGAGCTACTAACTTCAG CCTGCTGAAGCAGGCTGGAGACGTGGAGGAGAACCCTGGACCTgccaccatggtgagcgagct gattaaggagaacatgcacatgaagctgtacatggagggcaccgtgaacaaccaccacttcaagtgcacatccgagggcgaaggcaagc 
cctacgagggcacccagaccatgagaatcaaggcggtcgagggcggccctctccccttcgccttcgacatcctggctaccagcttcatgta cggcagcaaaaccttcatcaaccacacccagggcatccccgacttctttaagcagtccttccccgagggcttcacatgggagagagtcacc acatacgaagatgggggcgtgctgaccgctacccaggacaccagcctccaggacggctgcctcatctacaacgtcaagatcagaggggt gaacttcccatccaacggccctgtgatgcagaagaaaacactcggctgggaggcctccaccgagacactgtaccccgctgacggcggcc tggaaggcagagccgacatggccctgaagctcgtgggcgggggccacctgatctgcaaccttaagaccacatacagatccaagaaaccc gctaagaacctcaagatgcccggcgtctactatgtggacaggagactggaaagaatcaaggaggccgacaaagagacatacgtcgagca gcacgaggtggctgtggccagatactgcgacctccctagcaaactggggcacaaacttaattccTAACCaGCtGTCCtGCCT ATGGCCTTTCTCCTTTTGTCTCTAGTTCATCCTCTAACCACCAGCCATGAATTCAGTG AACTCTTTTCTCATTCTCTTTGTTTTGTGGCACTTTCACAATGTAGAGGAAAAAACCA AATGACCGCACTGTGATGTGAATGGCACCGAAGTCAGATGAGTATCCCTGTAGGTC ACCTGCAGCCTGCGTTGCCACTTGTCTTAACTCTGAATATTTCATTTCAAAGGTGCTA AAATCTGAAATCTGCTAGTGTGAAACTTGCTCTACTCTCTGAAATGATTCAAATACA CTAATTTTCCATACTTTATACTTTTGTTAGAATAAATTATTCAAATCTAAAGTCTGTT GTGTTCTTCATAGTCTGCATAGTATCATAAACG HSP90AA1 HDR template sequence (SEQ ID NO:82): STDU2-42312.601 (S22-113) GCAGCAAAGAAACACCTGGAGATAAACCCTGACCATTCCATTATTGAGACCTTAAG GCAAAAGGCAGAGGCTGATAAGAACGACAAGTCTGTGAAGGATCTGGTCATCTTGC TTTATGAAACTGCGCTCCTGTCTTCTGGCTTCAGTCTGGAAGATCCCCAGACACATG CTAACAGGATCTACAGGATGATCAAACTTGGTCTGGGTAAGCCTTATACTATGTAAT GTTAAAAAGAAAATAAACACACGTGACATTGAAGAAAATGGTGAACTTTCAGTTAT CCAAACTTGGAGCACCTTGTCCTGCTTGCTGCTTGGAGGTATTAAAGTATGttttttttAGG GATAAGTAAGGTCTTACAAGAGCAAAGAAATGAAATTGAGACTCATATGTCCTGTA ATACTGTCTTGAAAGCAGATAGAAACCAAGAGTATTACCCTAATAGCTGGCTTTAAG AAATCTTTGTAATATGAGGATTTTATTTTGGAAACAGGTATTGATGAAGATGACCCT ACTGCTGATGATACCAGTGCTGCTGTAACTGAAGAAATGCCACCCCTTGAAGGAGAT GACGACACATCACGCATGGAAGAAGTAGACGGAAGCGGAGCTACTAACTTCAGCCT GCTGAAGCAGGCTGGAGACGTGGAGGAGAACCCTGGACCTgtgagcgagctgattaaggagaaca tgcacatgaagctgtacatggagggcaccgtgaacaaccaccacttcaagtgcacatccgagggcgaaggcaagccctacgagggcac ccagaccatgagaatcaaggcggtcgagggcggccctctccccttcgccttcgacatcctggctaccagcttcatgtacggcagcaaaacc 
ttcatcaaccacacccagggcatccccgacttctttaagcagtccttccccgagggcttcacatgggagagagtcaccacatacgaagatgg gggcgtgctgaccgctacccaggacaccagcctccaggacggctgcctcatctacaacgtcaagatcagaggggtgaacttcccatccaa cggccctgtgatgcagaagaaaacactcggctgggaggcctccaccgagacactgtaccccgctgacggcggcctggaaggcagagc cgacatggccctgaagctcgtgggcgggggccacctgatctgcaaccttaagaccacatacagatccaagaaacccgctaagaacctcaa gatgcccggcgtctactatgtggacaggagactggaaagaatcaaggaggccgacaaagagacatacgtcgagcagcacgaggtggct gtggccagatactgcgacctccctagcaaactggggcacaaacttaattccTAaATCTgTGGCTGAGGGATGACTTA CCTGTTCAGTACTCTACAATTCCTCTGATAATATATTTTCAAGGATGTTTTTCTTTATT TTTGTTAATATTAAAAAGTCTGTATGGCATGACAACTACTTTAAGGGGAAGATAAGA TTTCTGTCTACTAAGTGATGCTGTGATACCTTAGGCACTAAAGCAGAGCTAGTAATG CTTTTTGAGTTTCATGTTGGTTTATTTTCACAGATTGGGGTAACGTGCACTGTAAGAC GTATGTAACATGATGTTAACTTTGTGGTCTAAAGTGTTTAGCTGTCAAGCCGGATGC CTAAGTAGACCAAATCTTGTTATTGAAGTGTTCTGAGCTGTATCTTGATGTTTAGAA AAGTATTCGTTACATCTTGTAGGATCTACTTTTTGAACTTTTCATTCCCTGTAGTTGA CAATTCTGCATGTACTAGTCCTCTAGAAATAGGTTAAACTGAAGCAACTTGATGGAA GGATCTCTCCACAGGGCTTGTTTTCCAAAGAAAAGTATTGTTTGGAGGAGCAAAGTT AAAAGCCTACCTAAGCATATCGTAAAGCTGTTCAAAAATAACTCAGACCCAGTCTTG TGGA AAVS1 HDR template sequence (SEQ ID NO:83): gatgctctttccggagcacttccttctcggcgctgcaccacgtgatgtcctctgagcggatcctccccgtgtctgggtcctctccgggcatctc tcctccctcacccaaccccatgccgtcttcactcgctgggttcccttttccttctccttctggggcctgtgccatctctcgtttcttaggatggcctt ctccgacggatgtctcccttgcgtcccgcctccccttcttgtaggcctgcatcatcaccgtttttctggacaaccccaaagtaccccgtctccct ggctttagccacctctccatcctcttgctttctttgcctggacaccccgttctcctgtggattcgggtcacctctcactcctttcatttgggcagctc ccctaccccccttacctctctagtctgtgctagctcttccagccccctgtcatggcatcttccaggggtccgagagctcagctagtcttcttcctc caacccgggcccctatgtccacttcaggacagcatgtttgctgcctccagggatcctgtgtccccgagctgggaccaccttatattcccagg gccggttaatgtggctctggttctgggtacttttatctgtcccctccaccccacagtggggcaagcttctgacctcttctcttcctcccacaggg cctcgagagatctggcagcggaGGAAGCGGAGCTACTAACTTCAGCCTGCTGAAGCAGGCTGGA GACGTGGAGGAGAACCCTGGACCTgtgagcgagctgattaaggagaacatgcacatgaagctgtacatggagggc 
accgtgaacaaccaccacttcaagtgcacatccgagggcgaaggcaagccctacgagggcacccagaccatgagaatcaaggcggtcg agggcggccctctccccttcgccttcgacatcctggctaccagcttcatgtacggcagcaaaaccttcatcaaccacacccagggcatcccc gacttctttaagcagtccttccccgagggcttcacatgggagagagtcaccacatacgaagatgggggcgtgctgaccgctacccaggac accagcctccaggacggctgcctcatctacaacgtcaagatcagaggggtgaacttcccatccaacggccctgtgatgcagaagaaaaca STDU2-42312.601 (S22-113) ctcggctgggaggcctccaccgagacactgtaccccgctgacggcggcctggaaggcagagccgacatggccctgaagctcgtgggc gggggccacctgatctgcaaccttaagaccacatacagatccaagaaacccgctaagaacctcaagatgcccggcgtctactatgtggac aggagactggaaagaatcaaggaggccgacaaagagacatacgtcgagcagcacgaggtggctgtggccagatactgcgacctcccta gcaaactggggcacaaacttaattccTAaactagggacaggattggtgacagaaaagccccatccttaggcctcctccttcctagtctcct gatattgggtctaacccccacctcctgttaggcagattccttatctggtgacacacccccatttcctggagccatctctctccttgccagaacct ctaaggtttgcttacgatggagccagagaggatcctgggagggagagcttggcagggggtgggagggaagggggggatgcgtgacctg cccggttctcagtggccaccctgcgctaccctctcccagaacctgagctgctctgacgcggctgtctggtgcgtttcactgatcctggtgctg cagcttccttacacttcccaagaggagaagcagtttggaaaaacaaaatcagaataagttggtcctgagttctaactttggctcttcacctttcta gtccccaatttatattgttcctccgtgcgtcagttttacctgtgagataaggccagtagccagccccgtcctggcagggctgtggtgaggagg ggggtgtccgtgtggaaaactccctttgtgagaatggtgcgtcctaggtgttcaccaggtcgtggccgcctctactccctttctctttctccatc cttctttccttaaagagtccccagtgctatctgggacatattcctccgcccagagcagggtcccgcttccctaaggccctgctctgggcttctg ggtttgagtccttggc OCT4 HDR template sequence (SEQ ID NO:84): GCGACTATGCACAACGAGAGGATTTTGAGGCTGCTGGGTCTCCTTTCTCAGGGGGAC CAGTGTCCTTTCCTCTGGCCCCAGGGCCCCATTTTGGTACCCCAGGCTATGGGAGCC CTCACTTCACTGCACTGTACTCCTCGGTCCCTTTCCCTGAGGGGGAAGCCTTTCCCCC TGTCTCCGTCACCACTCTGGGCTCTCCCATGCATTCAAAtGGAAGCGGAGCTACTAAC TTCAGCCTGCTGAAGCAGGCTGGAGACGTGGAGGAGAACCCTGGACCTgccaccatggtga gcgagctgattaaggagaacatgcacatgaagctgtacatggagggcaccgtgaacaaccaccacttcaagtgcacatccgagggcgaa ggcaagccctacgagggcacccagaccatgagaatcaaggcggtcgagggcggccctctccccttcgccttcgacatcctggctaccag 
cttcatgtacggcagcaaaaccttcatcaaccacacccagggcatccccgacttctttaagcagtccttccccgagggcttcacatgggaga gagtcaccacatacgaagatgggggcgtgctgaccgctacccaggacaccagcctccaggacggctgcctcatctacaacgtcaagatc agaggggtgaacttcccatccaacggccctgtgatgcagaagaaaacactcggctgggaggcctccaccgagacactgtaccccgctga cggcggcctggaaggcagagccgacatggccctgaagctcgtgggcgggggccacctgatctgcaaccttaagaccacatacagatcc aagaaacccgctaagaacctcaagatgcccggcgtctactatgtggacaggagactggaaagaatcaaggaggccgacaaagagacat acgtcgagcagcacgaggtggctgtggccagatactgcgacctccctagcaaactggggcacaaacttaattccTAaTGACTAG GAATGGGGGACAGGGGGAGGGGAGGAGCTAGGGAAAGAAAACCTGGAGTTTGTGC CAGGGTTTTTGGGATTAAGTTCTTCATTCACTAAGGAAGGAATTGGGAACACAAAGG GTGGGGGCAGGGGAGTTTGGGGCAACTGGTTGGAGGGAAGGTGAAGTTCAATGATG CTCTTGATTTTAATCCCACATCATGTATCACTTTTTTCTTAAATAAAGAAGCCTGGGA CACAGTAGATAGACACACTT Pantoea stewartii RecT DNA (SEQ ID NO:85): AGCAACCAGCCCCCTATCGCCTCCGCCGATCTGCAGAAGGCCAACACCGGCAAGCA GGTGGCCAATAAGACCCCTGAGCAGACACTGGTGGGCTTCATGAATCAGCCAGCAA TGAAGAGCCAGCTGGCCGCCGCCCTGCCAAGGCACATGACAGCCGATCGGATGATC AGAATCGTGACCACAGAGATCCGCAAGACCCCCGCCCTGGCCACATGCGACCAGAG CTCCTTCATCGGCGCCGTGGTGCAGTGTTCTCAGCTGGGCCTGGAGCCTGGCAGCGC CCTGGGCCACGCCTACCTGCTGCCATTTGGCAACGGCCGGAGCAAGTCCGGACAGT CCAATGTGCAGCTGATCATCGGCTATAGAGGCATGATCGATCTGGCCCGGAGATCTG GCCAGATCGTGTCTCTGAGCGCCAGGGTGGTGCGCGCAGACGATGAGTTCTCCTTTG AGTACGGCCTGGATGAGAACCTGATCCACCGGCCAGGCGAGAATGAGGACGCACCC ATCACCCACGTGTATGCAGTGGCAAGACTGAAGGACGGAGGCACCCAGTTCGAAGT GATGACAGTGAAGCAGATCGAGAAGGTGAAGGCCCAGTCCAAGGCCTCTAGCAACG GACCCTGGGTGACCCACTGGGAGGAGATGGCCAAGAAAACCGTGATCAGGCGCCTG STDU2-42312.601 (S22-113) TTTAAGTACCTGCCCGTGAGCATCGAGATGCAGAAGGCCGTGATCCTGGATGAGAA
Figure imgf000124_0001
GCTGGACGGCTCCTCTGAGGAG Pantoea stewartii RecE DNA (SEQ ID NO:86): CAGCCCGGCGTGTACTATGACATCTCCAACGAGGAGTATCACGCCGGCCCTGGCATC AGCAAGTCCCAGCTGGACGACATCGCCGTGTCCCCAGCCATCTTCCAGTGGAGAAA GTCTGCCCCCGTGGACGATGAGAAAACCGCCGCCCTGGACCTGGGCACAGCCCTGC ACTGCCTGCTGCTGGAGCCTGATGAGTTCTCCAAGAGGTTTATGATCGGCCCAGAGG TGAACCGGAGAACCAATGCCGGCAAGCAGAAGGAGCAGGACTTCCTGGATATGTGC GAGCAGCAGGGCATCACCCCTATCACACACGACGATAACCGGAAGCTGAGACTGAT GAGGGACTCTGCCTTTGCCCACCCAGTGGCCAGATGGATGCTGGAGACAGAGGGCA AGGCCGAGGCCTCTATCTACTGGAATGACAGGGATACACAGATCCTGAGCAGGTGC CGCCCCGACAAGCTGATCACCGAGTTCTCTTGGTGCGTGGACGTGAAGAGCACAGC CGACATCGGCAAGTTCCAGAAGGACTTCTACAGCTATCGCTACCACGTGCAGGACG CCTTCTATTCCGATGGCTACGAGGCCCAGTTTTGCGAGGTGCCAACCTTCGCCTTTCT GGTGGTGAGCTCCTCTATCGATTGTGGCCGGTATCCCGTGCAGGTGTTTATCATGGA CCAGCAGGCAAAGGATGCAGGAAGGGCCGAGTATAAGCGGAACCTGACCACATAC GCCGAGTGCCAGGCAAGGAATGAGTGGCCTGGCATCGCCACACTGAGCCTGCCTTA CTGGGCCAAGGAGATCCGGAATGTG Pantoea brenneri RecT DNA (SEQ ID NO:87): AGCAACCAGCCCCCTATCGCCTCCGCCGATCTGCAGAAAACCCAGCAGTCCAAGCA GGTGGCCAACAAGACCCCTGAGCAGACACTGGTGGGCTTCATGAATCAGCCAGCAA TGAAGAGCCAGCTGGCCGCCGCCCTGCCAAGGCACATGACCGCCGATCGGATGATC AGAATCGTGACCACAGAGATCCGCAAGACACCACAGCTGGCCCAGTGCGACCAGAG CTCCTTCATCGGCGCCGTGGTGCAGTGTTCTCAGCTGGGCCTGGAGCCTGGCAGCGC CCTGGGCCACGCCTACCTGCTGCCATTTGGCAACGGCCGGTCCAAGTCTGGCCAGAG CAATGTGCAGCTGATCATCGGCTATAGAGGCATGATCGATCTGGCCCGGAGATCCG GACAGATCGTGAGCCTGTCCGCCAGGGTGGTGCGCGCAGACGATGAGTTCTCTTTTG AGTACGGCCTGGATGAGAACCTGGTGCACCGGCCAGGCGAGAATGAGGACGCACCC ATCACCCACGTGTATGCAGTGGCAAGACTGAAGGACGGAGGCACCCAGTTCGAAGT GATGACAGTGAAGCAGGTGGAGAAGGTGAAGGCCCAGTCCAAGGCCTCTAGCAATG GCCCCTGGGTGACCCACTGGGAGGAGATGGCCAAGAAAACCGTGATCAGGCGCCTG TTTAAGTACCTGCCCGTGAGCATCGAGATGCAGAAGGCCGTGGTGCTGGATGAGAA GGCCGAGTCTGACGTGGATCAGGACAACGCCTCTGTGCTGAGCGCCGAGTATTCCGT GCTGGAGTCTGGCGACGAGGCCACAAAT Pantoea brenneri RecE DNA (SEQ ID NO:88): CAGCCTGGCATCTACTATGACATCAGCAACGAGGATTATCACAGGGGAGCAGGCAT CAGCAAGTCCCAGCTGGACGACATCGCCATCTCCCCAGCCATCTACCAGTGGAGAA AGCACGCCCCCGTGGACGAGGAGAAAACCGCCGCCCTGGATCTGGGCACAGCCCTG 
CACTGCCTGCTGCTGGAGCCTGACGAGTTCTCTAAGAGGTTTCAGATCGGCCCAGAG GTGAACCGGAGAACCACAGCCGGCAAGGAGAAGGAGAAGGAGTTCATCGAGCGGT GCGAGGCAGAGGGAATCACCCCAATCACACACGACGATAATAGGAAGCTGAAGCT GATGAGGGATTCCGCCCTGGCCCACCCAATCGCAAGGTGGATGCTGGAGGCACAGG GAAACGCAGAGGCCTCTATCTATTGGAATGACAGAGATGCCGGCGTGCTGAGCAGG
Figure imgf000125_0001
AGCCGACATCATGAAGTTCCAGAAGGACTTCTACTCTTACAGATACCACGTGCAGGA CGCCTTCTATTCCGATGGCTACGAGTCTCACTTTCACGAGACACCCACATTCGCCTTT CTGGCCGTGTCTACCAGCATCGACTGCGGCAGGTATCCTGTGCAGGTGTTTATCATG GACCAGCAGGCAAAGGATGCAGGAAGGGCCGAGTACAAGAGAAACATCCACACCT TCGCCGAGTGTCTGAGCAGGAATGAGTGGCCTGGCATCGCCACACTGTCCCTGCCTT TTTGGGCCAAGGAGCTGCGCAATGAG Pantoea dispersa RecT DNA (SEQ ID NO:89): TCCAACCAGCCACCTCTGGCCACCGCAGATCTGCAGAAAACCCAGCAGTCTAACCA GGTGGCCAAGACCCCTGAGCAGACACTGGTGGGCTTCATGAATCAGCCAGCAATGA AGAGCCAGCTGGCCGCCGCCCTGCCAAGGCACATGACCGCCGATCGGATGATCAGA ATCGTGACCACAGAGATCCGCAAGACACCCGCCCTGGCCCAGTGCGACCAGAGCTC CTTCATCGGAGCAGTGGTGCAGTGTAGCCAGCTGGGCCTGGAGCCTGGCTCCGCCCT GGGCCACGCCTACCTGCTGCCATTTGGCAACGGCCGGTCCAAGTCTGGCCAGAGCA ATGTGCAGCTGATCATCGGCTATAGAGGCATGATCGATCTGGCCCGGAGATCCGGA CAGATCGTGAGCCTGTCCGCCAGGGTGGTGCGCGCAGACGATGAGTTCTCTTTTGAG TACGGCCTGGATGAGAACCTGATCCACCGGCCAGGCGACAATGAGTCCGCCCCCAT CACCCACGTGTATGCAGTGGCAAGACTGAAGGACGGAGGCACCCAGTTCGAAGTGA TGACAGCCAAGCAGGTGGAGAAGGTGAAGGCCCAGTCCAAGGCCTCTAGCAACGG ACCCTGGGTGACCCACTGGGAGGAGATGGCCAAGAAAACCGTGATCAGGCGCCTGT TTAAGTACCTGCCCGTGAGCATCGAGATGCAGAAGGCCGTGGTGCTGGACGAGAAG GCCGAGAGCGACGTGGATCAGGACAATGCCTCTGTGCTGAGCGCCGAGTATTCCGT GCTGGAGTCTGGCACAGGCGAG Pantoea dispersa RecE DNA (SEQ ID NO:90): GAGCCAGGCATCTACTATGACATCAGCAACGAGGCCTACCACTCCGGCCCCGGCAT CAGCAAGTCCCAGCTGGACGACATCGCCAGGAGCCCTGCCATCTTCCAGTGGCGCA AGGACGCCCCAGTGGATACCGAGAAAACCAAGGCCCTGGACCTGGGCACCGATTTC CACTGCGCCGTGCTGGAGCCAGAGAGGTTTGCAGACATGTATCGCGTGGGCCCTGA AGTGAATCGGAGAACCACAGCCGGCAAGGCCGAGGAGAAGGAGTTCTTTGAGAAGT GTGAGAAGGATGGAGCCGTGCCCATCACCCACGACGATGCACGGAAGGTGGAGCTG ATGAGAGGCTCCGTGATGGCCCACCCTATCGCCAAGCAGATGATCGCAGCACAGGG ACACGCAGAGGCCTCTATCTACTGGCACGACGAGAGCACAGGCAACCTGTGCCGGT GTAGACCCGACAAGTTTATCCCTGATTGGAATTGGATCGTGGACGTGAAAACCACA GCCGATATGAAGAAGTTCAGGCGCGAGTTTTACGATCTGCGGTATCACGTGCAGGA CGCCTTCTACACCGATGGCTATGCCGCCCAGTTTGGCGAGCGGCCTACCTTCGTGTT TGTGGTGACATCCACCACAATCGACTGCGGCAGATACCCCACCGAGGTGTTCTTTCT GGATGAGGAGACAAAGGCCGCCGGCAGGTCTGAGTACCAGAGCAACCTGGTGACCT 
ATTCCGAGTGTCTGTCTCGCAATGAGTGGCCAGGCATCGCCACACTGTCTCTGCCCC ACTGGGCCAAGGAGCTGAGGAACGTG Type-F symbiont of Plautia stali RecT DNA (SEQ ID NO:91): TCCAACCAGCCCCCTATCGCCTCTGCCGATCTGCAGAAAACCCAGCAGTCTAAGCAG
Figure imgf000126_0001
GAAGTCCCAGCTGGCCGCCGCCCTGCCAAGGCACATGACAGCCGATCGGATGATCA GAATCGTGACCACAGAGATCCGCAAGACCCCCGCCCTGGCCACATGCGACCAGAGC TCCTTCATCGGAGCAGTGGTGCAGTGTAGCCAGCTGGGCCTGGAGCCTGGCTCCGCC CTGGGCCACGCCTACCTGCTGCCATTTGGCAACGGCCGGTCCAAGTCTGGCCAGTCT AATGTGCAGCTGATCATCGGCTATAGAGGCATGATCGACCTGGCCCGGAGAAGCGG ACAGATCGTGAGCCTGTCCGCCAGGGTGGTGCGCGCAGACGATGAGTTCTCCTTTGA GTACGGCCTGGATGAGAACCTGATCCACCGGCCAGGCGATAATGAGGACGCCCCCA TCACCCACGTGTATGCAGTGGCAAGACTGAAGGACGGAGGCACCCAGTTCGAAGTG ATGACAGCCAAGCAGGTGGAGAAGGTGAAGGCCCAGAGCAAGGCCTCTAGCAACG GACCCTGGGTGACCCACTGGGAGGAGATGGCCAAGAAAACCGTGATCAGGCGCCTG TTTAAGTACCTGCCCGTGAGCATCGAGATGCAGAAGGCCGTGGTGCTGGATGAGAA GGCCGAGAGCGACGTGGATCAGGACAATGCCTCTGTGCTGAGCGCCGAGTATTCCG TGCTGGAGGGCGACGGCGGCGAG Type-F symbiont of Plautia stali RecE DNA (SEQ ID NO:92): CAGCCTGGCATCTACTATGACATCAGCAACGAGGATTATCACGGCGGCCCTGGCATC AGCAAGTCCCAGCTGGACGACATCGCCATCTCCCCAGCCATCTACCAGTGGAGGAA GCACGCCCCCGTGGACGAGGAGAAAACCGCCGCCCTGGATCTGGGCACAGCCCTGC ACTGCCTGCTGCTGGAGCCTGACGAGTTCTCTAAGAGATTTGAGATCGGCCCAGAGG TGAACCGGAGAACCACAGCCGGCAAGGAGAAGGAGAAGGAGTTCATGGAGAGGTG TGAGGCAGAGGGAGTGACCCCTATCACACACGACGATAATCGGAAGCTGAGACTGA TGAGGGATAGCGCAATGGCCCACCCAATCGCCAGATGGATGCTGGAGGCACAGGGA AACGCAGAGGCCTCTATCTATTGGAATGACAGGGATACCGGCGTGCTGAGCAGGTG CCGCCCCGACAAGATCATCACCGACTTCAACTGGTGCGTGGACGTGAAGTCCACAG CCGACATCATCAAGTTCCAGAAGGACTTTTACTCTTATCGCTACCACGTGCAGGACG CCTTCTATTCCGATGGCTACGAGTCTCACTTTGACGAGACACCAACATTCGCCTTTCT GGCCGTGTCTACAAGCATCGATTGCGGCCGGTATCCCGTGCAGGTGTTCATCATGGA CCAGCAGGCAAAGGATGCAGGAAGGGCCGAGTACAAGCGGAACATCCACACCTTTG CCGAGTGTCTGAGCCGCAATGAGTGGCCTGGCATCGCCACACTGTCCCTGCCTTACT GGGCCAAGGAGCTGCGGAATGAG Providencia stuartii RecT DNA (SEQ ID NO:93): AGCAACCCACCTCTGGCCCAGGCAGACCTGCAGAAAACCCAGGGCACAGAGGTGAA GGAGAAAACCAAGGATCAGATGCTGGTGGAGCTGATCAATAAGCCTTCCATGAAGG CACAGCTGGCCGCCGCCCTGCCAAGGCACATGACACCCGACCGGATGATCAGAATC GTGACCACAGAGATCAGAAAGACCCCCGCCCTGGCCACATGCGATATGCAGAGCTT CGTGGGAGCAGTGGTGCAGTGTTCCCAGCTGGGCCTGGAGCCTGGCAACGCCCTGG GACACGCCTACCTGCTGCCTTTTGGCAACGGCAAGTCTAAGAGCGGCCAGTCTAATG 
TGCAGCTGATCATCGGCTATCGGGGCATGATCGACCTGGCCCGGAGAAGCGGCCAG ATCGTGTCCATCTCTGCCAGGACCGTGCGCCAGGGCGATAACTTCCACTTTGAGTAC GGCCTGAACGAGAATCTGACCCACGTGCCTGGCGAGAATGAGGACTCTCCAATCAC ACACGTGTACGCAGTGGCAAGGCTGAAGGATGGAGGCGTGCAGTTCGAAGTGATGA CCTATAACCAGATCGAGAAGGTGCGCGCCAGCTCCAAGGCAGGACAGAATGGACCC TGGGTGAGCCACTGGGAGGAGATGGCCAAGAAAACCGTGATCAGGCGCCTGTTCAA GTACCTGCCCGTGTCTATCGAGATGCAGAAGGCCGTGATCCTGGACGAGAAGGCCG
Figure imgf000127_0001
GGCACAGACGGCAAG Providencia stuartii RecE DNA (SEQ ID NO:94): GAGGGCATCTACTATAACATCAGCAATGAGGACTACCACAACGGCCTGGGCATCTC CAAGTCTCAGCTGGATCTGATCAATGAGATGCCTGCCGAGTATATCTGGTCCAAGGA GGCCCCCGTGGACGAGGAGAAGATCAAGCCTCTGGAGATCGGCACCGCCCTGCACT GCCTGCTGCTGGAGCCAGACGAGTACCACAAGAGATATAAGATCGGCCCCGATGTG AACCGGAGAACAAATGCCGGCAAGGAGAAGGAGAAGGAGTTCTTTGATATGTGCGA GAAGGAGGGCATCACCCCCATCACACACGACGATAACCGGAAGCTGATGATCATGA GAGACTCTGCCCTGGCCCACCCTATCGCCAAGTGGTGTCTGGAGGCCGATGGCGTGA GCGAGAGCTCCATCTACTGGACCGACAAGGAGACAGATGTGCTGTGCAGGTGTCGC CCAGACCGCATCATCACCGCCCACAACTACATCGTGGATGTGAAGTCTAGCGGCGA CATCGAGAAGTTCGATTACGAGTACTACAACTACAGATACCACGTGCAGGACGCCTT TTACTCCGATGGCTATAAGGAGGTGACCGGCATCACCCCTACATTCCTGTTTCTGGT GGTGTCTACCAAGATCGACTGCGGCAAGTACCCCGTGCGGACCTACGTGATGAGCG AGGAGGCAAAGTCCGCCGGAAGGACCGCCTACAAGCACAACCTGCTGACCTATGCC GAGTGTCTGAAAACCGATGAGTGGGCCGGCATCAGGACACTGTCTCTGCCCAGATG GGCAAAGGAGCTGCGGAATGAG Providencia sp. MGF014 RecT DNA (SEQ ID NO:95): TCTAACCCCCCTCTGGCCCAGAGCGACCTGCAGAAAACCCAGGGCACAGAGGTGAA GGTGAAAACCAAGGATCAGCAGCTGATCCAGTTCATCAATCAGCCTTCTATGAAGG CACAGCTGGCCGCCGCCCTGCCAAGGCACATGACACCCGACCGGATGATCAGAATC GTGACCACAGAGATCAGAAAGACCCCCGCCCTGGCCACATGCGATATGCAGTCCTT CGTGGGCGCCGTGGTGCAGTGTTCTCAGCTGGGCCTGGAGCCTGGCAACGCCCTGG GACACGCCTACCTGCTGCCTTTTGGCAACGGCAAGGCCAAGTCCGGCCAGTCTAATG TGCAGCTGATCATCGGCTATCGGGGCATGATCGACCTGGCCCGGAGATCCAACCAG ATCATCTCTATCAGCGCCAGGACCGTGCGCCAGGGCGATAACTTCCACTTTGAGTAC GGCCTGAATGAGGACCTGACCCACACACCTAGCGAGAATGAGGATTCCCCAATCAC CCACGTGTACGCAGTGGCAAGGCTGAAGGACGGAGGCGTGCAGTTTGAAGTGATGA CATATAACCAGGTGGAGAAGGTGCGCGCCAGCTCCAAGGCAGGACAGAATGGACCC TGGGTGAGCCACTGGGAGGAGATGGCCAAGAAAACCGTGATCAGGCGCCTGTTCAA GTACCTGCCCGTGTCCATCGAGATGCAGAAGGCAGTGGTGCTGGACGAGAAGGCAG AGGCCAACGTGGATCAGGAGAATGCCACCATCTTTGAGGGCGAGTATGAGGAAGTG GGCACAGATGGCAAT Providencia sp. 
MGF014 RecE DNA (SEQ ID NO:96): AAGGAGGGCATCTACTATAACATCAGCAATGAGGACTACCACAACGGCCTGGGCAT CTCCAAGTCTCAGCTGGATCTGATCAATGAGATGCCTGCCGAGTATATCTGGTCCAA GGAGGCCCCCGTGGACGAGGAGAAGATCAAGCCTCTGGAGATCGGCACCGCCCTGC ACTGCCTGCTGCTGGAGCCAGACGAGTACCACAAGAGATATAAGATCGGCCCCGAT GTGAACCGGAGAACAAATGTGGGCAAGGAGAAGGAGAAGGAGTTCTTTGATATGTG CGAGAAGGAGGGCATCACCCCCATCACACACGACGATAACCGGAAGCTGATGATCA TGAGAGACTCTGCCCTGGCCCACCCTATCGCCAAGTGGTGTCTGGAGGCCGATGGCG TGAGCGAGAGCTCCATCTACTGGACCGACAAGGAGACAGATGTGCTGTGCAGGTGT
Figure imgf000128_0001
GACATCGAGAAGTTCGATTACGAGTACTACAACTACAGATACCACGTGCAGGACGC CTTTTACTCCGATGGCTATAAGGAGGTGACCGGCATCACCCCTACATTCCTGTTTCTG GTGGTGTCTACCAAGATCGACTGCGGCAAGTACCCCGTGCGGACCTACGTGATGAG CGAGGAGGCAAAGTCCGCCGGAAGGACCGCCTACAAGCACAACCTGCTGACCTATG CCGAGTGTCTGAAAACCGATGAGTGGGCCGGCATCAGGACACTGTCTCTGCCCAGA TGGGCAAAGGAGCTGCGGAATGAG Shewanella putrefaciens RecT DNA (SEQ ID NO:97): CAGACCGCACAGGTGAAGCTGAGCGTGCCCCACCAGCAGGTGTACCAGGACAACTT CAATTATCTGAGCTCCCAGGTGGTGGGCCACCTGGTGGATCTGAACGAGGAGATCG GCTACCTGAACCAGATCGTGTTTAATTCTCTGAGCACCGCCTCTCCCCTGGACGTGG CAGCACCTTGGAGCGTGTACGGCCTGCTGCTGAACGTGTGCCGGCTGGGCCTGTCCC TGAATCCAGAGAAGAAGCTGGCCTATGTGATGCCCTCCTGGTCTGAGACAGGCGAG ATCATCATGAAGCTGTACCCCGGCTATAGGGGCGAGATCGCCATCGCCTCTAACTTC AATGTGATCAAGAACGCCAATGCCGTGCTGGTGTATGAGAACGATCACTTCCGCATC CAGGCAGCAACCGGCGAGATCGAGCACTTTGTGACAAGCCTGTCCATCGACCCTAG GGTGCGCGGAGCATGCAGCGGAGGCTACTGTCGGTCCGTGCTGATGGATAATACAA TCCAGATCTCTTATCTGAGCATCGAGGAGATGAACGCCATCGCCCAGAATCAGATCG AGGCCAACATGGGCAATACCCCTTGGAACTCCATCTGGCGGACAGAGATGAATAGA GTGGCCCTGTACCGGAGAGCAGCAAAGGACTGGAGGCAGCTGATCAAGGCCACCCC AGAGATCCAGTCCGCCCTGTCTGATACAGAGTAT Shewanella putrefaciens RecE DNA (SEQ ID NO:98): GGCACCGCCCTGGCCCAGACAATCAGCCTGGACTGGCAGGATACCATCCAGCCAGC ATACACAGCCTCCGGCAAGCCTAACTTCCTGAATGCCCAGGGCGAGATCGTGGAGG GCATCTACACCGATCTGCCTAATTCCGTGTATCACGCCCTGGACGCACACAGCTCCA CCGGCATCAAGACATTCGCCAAGGGCCGCCACCACTACTTTCGGCAGTATCTGTCTG ACGTGTGCCGGCAGAGAACAAAGCAGCAGGAGTACACCTTCGACGCCGGCACCTAC GGCCACATGCTGGTGCTGGAGCCAGAGAACTTCCACGGCAACTTCATGAGGAACCC CGTGCCTGACGATTTTCCAGACATCGAGCTGATCGAGAGCATCCCACAGCTGAAGG CCGCCCTGGCCAAGAGCAACCTGCCCGTGTCCGGAGCAAAGGCCGCCCTGATCGAG AGACTGTACGCCTTCGACCCATCCCTGCCCCTGTTTGAGAAGATGAGGGAGAAGGC CATCACCGACTATCTGGATCTGCGCTACGCCAAGTATCTGCGGACCGACGTGGAGCT GGATGAGATGGCCACATTCTACGGCATCGATACCTCTCAGACACGGGAGAAGAAGA TCGAGGAGATCCTGGCCATCTCTCCTAGCCAGCCAATCTGGGAGAAGCTGATCAGCC AGCACGTGATCGACCACATCGTGTGGGACGATGCCATGAGGGTGGAGAGATCCACC AGGGCCCACCCTAAGGCAGACTGGCTGATCTCTGATGGCTATGCCGAGCTGACAAT CATCGCAAGGTGCCCAACCACCGGCCTGCTGCTGAAGGTGCGGTTTGACTGGCTGA 
GGAATGATGCCATCGGCGTGGACTTCAAGACCACACTGTCTACCAACCCCACAAAG TTTGGCTACCAGATCAAGGACCTGCGGTATGATCTGCAGCAGGTGTTCTACTGTTAT GTGGCCAATCTGGCCGGCATCCCTGTGAAGCACTTCTGCTTTGTGGCCACCGAGTAC AAGGACGCCGATAACTGTGAGACATTTGAGCTGTCTCACAAGAAAGTGATCGAGAG CACCGAGGAGATGTTCGACCTGCTGGATGAGTTTAAGGAGGCCCTGACCTCCGGCA ATTGGTATGGCCACGACAGGTCCCGCTCTACATGGGTCATCGAGGTG
Figure imgf000129_0001
Bacillus sp. MUM 116 RecT DNA (SEQ ID NO:99): AGCAAGCAGCTGACCACAGTGAATACCCAGGCCGTGGTGGGCACATTCTCCCAGGC CGAGCTGGATACCCTGAAGCAGACAATCGCCAAGGGCACCACAAACGAGCAGTTCG CCCTGTTTGTGCAGACCTGCGCCAACTCTAGGCTGAATCCATTTCTGAACCACATCC ACTGTATCGTGTATAACGGCAAGGAGGGCGCCACCATGAGCCTGCAGATCGCAGTG GAGGGCATCCTGTACCTGGCACGCAAGACAGACGGCTATAAGGGCATCGAGTGCCA GCTGATCCACGAGAATGACGAGTTCAAGTTTGATGCCAAGTCCAAGGAGGTGGATC ACCAGATCGGATTCCCCAGGGGCAACGTGATCGGAGGATATGCAATCGCAAAGAGG GAGGGCTTTGACGATGTGGTGGTGCTGATGGAGTCTAACGAGGTGGACCACATGCT GAAGGGCCGGAATGGCCACATGTGGAGAGACTGGTTCAACGATATGTTTAAGAAGC ACATCATGAAGCGGGCCGCCAAGCTGCAGTACGGCATCGAGATCGCAGAGGACGAG ACAGTGAGCAGCGGACCTAGCGTGGATAATATCCCAGAGTATAAGCCACAGCCCCG GAAGGACATCACACCCAACCAGGACGTGATCGATGCCCCCCCTCAGCAGCCTAAGC AGGACGATGAGGCCGCCAAGCTGAAGGCCGCCAGATCTGAGGTGAGCAAGAAGTTC AAGAAGCTGGGCATCGTGAAGGAGGATCAGACCGAGTACGTGGAGAAGCACGTGC CTGGCTTCAAGGGCACACTGTCCGACTTTATCGGCCTGTCTCAGCTGCTGGATCTGA ATATCGAGGCCCAGGAGGCCCAGTCCGCCGACGGCGATCTGCTGGAC Bacillus sp. MUM 116 RecE DNA (SEQ ID NO:100): ACCTACGCCGCCGACGAGACACTGGTGCAGCTGCTGCTGTCCGTGGATGGCAAGCA GCTGCTGCTGGGAAGGGGCCTGAAGAAGGGCAAGGCCCAGTACTATATCAATGAGG TGCCATCTAAGGCCAAGGAGTTCGAGGAGATCCGGGACCAGCTGTTTGACAAGGAT CTGTTCATGTCCCTGTTTAACCCCTCTTACTTCTTTACCCTGCACTGGGAGAAGCAGA GGGCCATGATGCTGAAGTATGTGACAGCCCCCGTGTCTAAGGAGGTGCTGAAGAAT CTGCCTGAGGCCCAGTCCGAGGTGCTGGAGAGATACCTGAAGAAGCACTCTCTGGT GGATCTGGAGAAGATCCACAAGGACAACAAGAATAAGCAGGATAAGGCCTATATCT CTGCCCAGAGCAGGACCAACACACTGAAGGAGCAGCTGATGCAGCTGACCGAGGA GAAGCTGGACATCGATTCCATCAAGGCCGAGCTGGCCCACATCGACATGCAGGTCA TCGAGCTGGAGAAGCAGATGGATACAGCCTTCGAGAAGAACCAGGCCTTTAATCTG CAGGCCCAGATCAGGAATCTGCAGGACAAGATCGAGATGAGCAAGGAGCGGTGGC CCTCCCTGAAGAACGAAGTGATCGAGGATACCTGCCGGACATGCAAGCGGCCCCTG GACGAGGATAGCGTGGAGGCCGTGAAGGCCGACAAGGATAATCGGATCGCCGAGT ACAAGGCCAAGCACAACTCCCTGGTGTCTCAGAGAAATGAGCTGAAGGAGCAGCTG AACACCATCGAGTATATCGACGTGACAGAGCTGAGAGAGCAGATCAAGGAGCTGGA TGAGTCCGGACAGCCTCTGAGGGAGCAGGTGCGCATCTACAGCCAGTATCAGAATC TGGACACCCAGGTGAAGTCCGCCGAGGCAGACGAGAACGGCATCCTGCAGGATCTG 
AAGGCCTCTATCTTCATCCTGGATAGCATCAAGGCCTTTAGGGGCAAGGAGGCCGA GATGCAGGCCGAGAAGGTGCAGGCCCTGTTCACCACACTGAGCGTGCGCCTGTTTA AGCAGAATAAGGGCGACGGCGAGATCAAGCCAGATTTCGAGATCGAGATGAACGA CAAGCCCTATCGGACCCTGAGCCTGTCCGAGGGCATCCGGGCAGGCCTGGAGCTGC GGGACGTGCTGAGCCAGCAGTCCGAGCTGGTGACCCCTACATTCGTGGATAATGCC GAGTCTATCACCAGCTTCAAGCAGCCAAACGGCCAGCTGATCATCAGCCGGGTGGT GGCAGGACAGGAGCTGAAGATCGAGGCCGTGAGCGAG Shigella sonnei RecT DNA (SEQ ID NO:101):
Figure imgf000130_0001
CACCAGCAGCCATCAAGAACAATGATGTGATCTCCTTTATCAATCAGCCCTCTATGA AGGAGCAGCTGGCCGCCGCCCTGCCTAGGCACATGACCGCCGAGAGGATGATCCGC ATCGCCACCACAGAGATCCGCAAGGTGCCTGCCCTGGGCAACTGCGACACAATGAG CTTCGTGAGCGCCATCGTGCAGTGTAGCCAGCTGGGCCTGGAGCCAGGCTCCGCCCT GGGCCACGCCTACCTGCTGCCCTTCGGCAACAAGAATGAGAAGTCCGGCAAGAAGA ATGTGCAGCTGATCATCGGCTATAGGGGCATGATCGATCTGGCCCGGAGATCTGGC CAGATCGCCTCTCTGAGCGCCAGAGTGGTGCGGGAGGGCGACGAGTTCAACTTTGA GTTCGGCCTGGATGAGAAGCTGATCCACCGGCCTGGCGAGAATGAGGACGCCCCAG TGACCCACGTGTACGCAGTGGCCAGACTGAAGGATGGCGGCACCCAGTTTGAAGTG ATGACAAGGCGCCAGATCGAGCTGGTGAGGTCCCAGTCTAAGGCCGGCAACAATGG CCCTTGGGTGACCCACTGGGAGGAGATGGCCAAGAAAACCGCCATCCGGAGACTGT TCAAGTACCTGCCAGTGTCTATCGAGATCCAGCGCGCCGTGAGCATGGACGAGAAG GAGCCACTGACCATCGACCCCGCCGATAGCTCCGTGCTGACAGGCGAGTATTCTGT GATCGATAACAGCGAGGAG Shigella sonnei RecE DNA (SEQ ID NO:102): GATCGCGGCCTGCTGACAAAGGAGTGGAGGAAGGGAAACCGGGTGAGCCGGATCA CCAGGACAGCCAGCGGAGCAAACGCAGGAGGAGGAAATCTGACCGACAGAGGCGA GGGCTTCGTGCACGATCTGACAAGCCTGGCCCGCGACATCGCAACCGGCGTGCTGG CCCGGAGCATGGACGTGGACATCTACAACCTGCACCCTGCCCACGCCAAGAGGATC GAGGAGATCATCGCCGAGAATAAGCCCCCTTTCAGCGTGTTTAGAGACAAGTTTATC ACAATGCCAGGCGGCCTGGACTACTCCAGGGCCATCGTGGTGGCCTCTGTGAAGGA GGCCCCAATCGGCATCGAAGTGATCCCCGCCCACGTGACCGCCTATCTGAACAAGG TGCTGACCGAGACAGACCACGCCAATCCAGATCCCGAGATCGTGGACATCGCATGC GGCAGAAGCTCCGCCCCTATGCCACAGAGGGTGACCGAGGAGGGCAAGCAGGACG ATGAGGAGAAGCTGCAGCCTTCTGGCACCACAGCAGATGAGCAGGGAGAGGCAGA GACAATGGAGCCAGACGCCACAAAGCACCACCAGGATACCCAGCCTCTGGACGCCC AGAGCCAGGTGAACAGCGTGGATGCCAAGTATCAGGAGCTGAGAGCCGAGCTGCAC GAGGCCAGGAAGAACATCCCTTCCAAGAATCCAGTGGACGCAGATAAGCTGCTGGC CGCCTCTCGCGGCGAGTTCGTGGACGGCATCAGCGACCCAAACGATCCCAAGTGGG TGAAGGGCATCCAGACACGGGATTCCGTGTACCAGAATCAGCCTGAGACAGAGAAA ACCAGCCCCGACATGAAGCAGCCAGAGCCTGTGGTGCAGCAGGAGCCTGAGATCGC CTTCAACGCCTGCGGACAGACCGGCGGCGACAATTGCCCAGATTGTGGCGCCGTGA TGGGCGATGCCACCTATCAGGAGACATTTGACGAGGAGAACCAGGTGGAGGCCAAG GAGAATGATCCTGAGGAGATGGAGGGCGCCGAGCACCCACACAACGAGAATGCCG GCAGCGACCCCCACAGAGACTGTTCCGATGAGACAGGCGAGGTGGCCGATCCCGTG ATCGTGGAGGACATCGAGCCTGGCATCTACTATGGCATCAGCAACGAGAATTACCA 
CGCAGGCCCCGGCGTGTCCAAGTCTCAGCTGGACGACATCGCCGACACACCTGCCCT GTATCTGTGGAGGAAGAACGCCCCAGTGGATACCACAAAGACCAAGACACTGGACC TGGGCACCGCATTCCACTGCCGCGTGCTGGAGCCAGAGGAGTTCAGCAATCGGTTTA TCGTGGCCCCCGAGTTCAACCGGAGAACAAATGCCGGCAAGGAGGAGGAGAAGGC CTTTCTGATGGAGTGTGCCTCCACAGGCAAGATGGTCATCACCGCCGAGGAGGGCA GAAAGATCGAGCTGATGTACCAGTCTGTGATGGCACTGCCACTGGGACAGTGGCTG GTGGAGAGCGCCGGACACGCAGAGTCTAGCATCTATTGGGAGGACCCCGAGACAGG CATCCTGTGCAGGTGTCGCCCCGACAAGATCATCCCTGAGTTCCACTGGATCATGGA
Figure imgf000131_0001
ATCACGTGCAGGATGCCTTCTACTCCGACGGCTATGAGGCCCAGTTTGGCGTGCAGC CCACCTTCGTGTTTCTGGTGGCCTCTACCACAATCGAGTGCGGCAGATACCCCGTGG AGATCTTTATGATGGGAGAGGAGGCAAAGCTGGCCGGACAGCTGGAGTATCACCGC AACCTGCGGACACTGGCCGATTGTCTGAATACCGACGAGTGGCCAGCCATCAAGAC CCTGTCCCTGCCCAGATGGGCAAAGGAGTACGCCAACGAC Salmonella enterica RecT DNA (SEQ ID NO:103): ACCAAGCAGCCCCCTATCGCCAAGGCCGACCTGCAGAAAACCCAGGGAAACAGGGC ACCTGCAGCAGTGAATGACAAGGATGTGCTGTGCGTGATCAACAGCCCTGCCATGA AGGCACAGCTGGCCGCCGCCCTGCCAAGGCACATGACCGCCGAGAGGATGATCCGC ATCGCCACCACAGAGATCAGGAAGGTGCCAGAGCTGCGCAACTGCGACAGCACCAG CTTCATCGGCGCCATCGTGCAGTGTTCTCAGCTGGGCCTGGAGCCCGGCAGCGCCCT GGGCCACGCCTACCTGCTGCCTTTTGGCAATGGCAAGGCCAAGAACGGCAAGAAGA ATGTGCAGCTGATCATCGGCTATCGGGGCATGATCGATCTGGCCCGGAGATCTGGCC AGATCATCTCCCTGAGCGCCAGAGTGGTGCGGGAGTGTGACGAGTTCTCCTACGAGC TGGGCCTGGATGAGAAGCTGGTGCACCGGCCAGGCGAGAACGAGGACGCACCCATC ACCCACGTGTATGCCGTGGCCAAGCTGAAGGATGGCGGCGTGCAGTTTGAAGTGAT GACCAAGAAGCAGGTGGAGAAGGTGAGAGATACACACTCCAAGGCCGCCAAGAAT GCCGCCTCTAAGGGCGCCAGCTCCATCTGGGACGAGCACTTCGAGGATATGGCCAA GAAAACCGTGATCCGGAAGCTGTTTAAGTACCTGCCCGTGAGCATCGAGATCCAGA GAGCCGTGAGCATGGACGGCAAGGAGGTGGAGACAATCAACCCAGACGACATCAG CGTGATCGCCGGCGAGTATTCCGTGATCGATAATCCCGAGGAG Salmonella enterica RecE DNA (SEQ ID NO:104): GATCGCGGCCTGCTGACAAAGGAGTGGAGGAAGGGAAACCGGGTGAGCCGGATCA CCAGGACAGCCAGCGGAGCAAACGCAGGAGGAGGAAATCTGACCGACAGAGGCGA GGGCTTCGTGCACGATCTGACAAGCCTGGCCCGCGACGTGGCAACCGGCGTGCTGG CCCGGAGCATGGACGTGGACATCTACAACCTGCACCCTGCCCACGCCAAGAGGGTG GAGGAGATCATCGCCGAGAATAAGCCCCCTTTCAGCGTGTTTAGAGACAAGTTTATC ACAATGCCTGGCGGCCTGGACTACTCCAGGGCCATCGTGGTGGCCTCTGTGAAGGA GGCCCCTATCGGCATCGAAGTGATCCCAGCCCACGTGACCGAGTATCTGAACAAGG TGCTGACCGAGACAGACCACGCCAATCCAGATCCCGAGATCGTGGACATCGCATGC GGCAGAAGCTCCGCCCCTATGCCACAGAGGGTGACCGAGGAGGGCAAGCAGGACG ATGAGGAGAAGCCCCAGCCTTCTGGAGCTATGGCCGACGAGCAGGCAACCGCAGAG ACAGTGGAGCCAAACGCCACAGAGCACCACCAGAATACCCAGCCCCTGGATGCCCA GAGCCAGGTGAACTCCGTGGACGCCAAGTATCAGGAGCTGAGAGCCGAGCTGCAGG AGGCCAGGAAGAACATCCCCTCCAAGAATCCTGTGGACGCAGATAAGCTGCTGGCC 
GCCTCTCGCGGCGAGTTCGTGGATGGCATCAGCGACCCTAACGATCCAAAGTGGGT GAAGGGCATCCAGACACGGGATTCCGTGTACCAGAATCAGCCCGAGACAGAGAAG ATCTCTCCTGACGCCAAGCAGCCAGAGCCCGTGGTGCAGCAGGAGCCCGAGACAGT GTGCAACGCCTGTGGACAGACCGGCGGCGACAATTGCCCTGATTGTGGCGCCGTGA TGGGCGACGCCACATATCAGGAGACATTCGGCGAGGAGAATCAGGTGGAGGCCAAG GAGAAGGACCCCGAGGAGATGGAGGGAGCAGAGCACCCTCACAACGAGAATGCCG GCAGCGACCCACACAGAGACTGTTCCGATGAGACAGGCGAGGTGGCCGATCCAGTG ATCGTGGAGGACATCGAGCCTGGCATCTACTATGGCATCAGCAACGAGAATTACCA
Figure imgf000132_0001
TGTATCTGTGGAGGAAGAACGCCCCTGTGGATACCACAAAGACCAAGACACTGGAC CTGGGCACCGCATTCCACTGCCGCGTGCTGGAGCCTGAGGAGTTCAGCAATCGGTTT ATCGTGGCCCCAGAGTTCAACCGGAGAACAAATGCCGGCAAGGAGGAGGAGAAGG CCTTTCTGATGGAGTGTGCCTCCACCGGCAAGACAGTGATCACCGCCGAGGAGGGC AGAAAGATCGAGCTGATGTACCAGTCTGTGATGGCACTGCCTCTGGGACAGTGGCT GGTGGAGAGCGCCGGACACGCAGAGTCTAGCATCTATTGGGAGGACCCCGAGACAG GCATCCTGTGCAGGTGTCGCCCAGACAAGATCATCCCCGAGTTCCACTGGATCATGG ACGTGAAAACCACAGCCGACATCCAGCGGTTCAAGACAGCCTACTATGATTACAGG TATCACGTGCAGGATGCCTTCTACTCCGACGGCTATGAGGCCCAGTTTGGCGTGCAG CCAACCTTCGTGTTTCTGGTGGCCTCTACCACAGTGGAGTGCGGCAGATACCCCGTG GAGATCTTTATGATGGGAGAGGAGGCAAAGCTGGCCGGACAGCAGGAGTATCACCG CAACCTGCGGACACTGGCCGATTGTCTGAATACCGACGAGTGGCCTGCCATCAAGA CCCTGTCCCTGCCACGGTGGGCCAAGGAGTACGCCAACGAC Acetobacter RecT DNA (SEQ ID NO:105): AACGCCCCCCAGAAGCAGAATACCAGAGCCGCCGTGAAGAAGATCAGCCCTCAGGA GTTCGCCGAGCAGTTTGCCGCCATCATCCCACAGGTGAAGTCCGTGCTGCCCGCCCA CGTGACCTTCGAGAAGTTTGAGCGGGTGGTGAGACTGGCCGTGCGGAAGAACCCTG ACCTGCTGACATGCTCCCCAGCCTCTCTGTTCATGGCATGTATCCAGGCAGCCTCCG ACGGCCTGCTGCCTGATGGAAGGGAGGGAGCAATCGTGAGCCGGTGGAGCTCCAAG AAGAGCTGCAACGAGGCCTCCTGGATGCCAATGGTGGCCGGCCTGATGAAGCTGGC CCGGAACAGCGGCGACATCGCCAGCATCTCTAGCCAGGTGGTGTTCGAGGGCGAGC ACTTTAGAGTGGTGCTGGGCGACGAGGAGAGGATCGAGCACGAGCGCGATCTGGGC AAGACCGGCGGCAAGATCGTGGCAGCCTACGCCGTGGCAAGGCTGAAGGACGGCA GCGATCCAATCCGCGAGATCATGTCCTGGGGCCAGATCGAGAAGATCAGAAACACA AATAAGAAGTGGGAGTGGGGACCCTGGAAGGCCTGGGAGGACGAGATGGCCAGAA AGACCGTGATCCGGAGACTGGCCAAGAGACTGCCCATGTCTACAGATAAGGAGGGA GAGAGGCTGCGCAGCGCCATCGAGAGGATCGACTCCCTGGTGGACATCTCTGCCAA CGTGGACGCACCTCAGATCGCAGCAGACGATGAGTTTGCCGCCGCCGCCCACGGCG TGGAGCCACAGCAGATCGCAGCACCTGACCTGATCGGCCGCCTGGCCCAGATGCAG TCCCTGGAGCAGGTGCAGGACATCGAGCCCCAGGTGTCTCACGCCATCCAGGAGGC CGACAAGAGGGGCGACAGCGATACAGCCAATGCCCTGGATGCCGCCCTGCAGAGCG CCCTGTCCCGCACCTCTACAGCCAAGGAGGAGGTGCCTGCC Acetobacter RecE DNA (SEQ ID NO:106): GTGATCTCTAAGAGCGGCATCTACGACCTGACCAACGAGCAGTATCACGCCGATCCT TGCCCAGAGATGTCCCTGAGCTCCTCTGGAGCCAGGGACCTGCTGAGCTCCTGTCCT GCCAAGTTCATCGCCGCCAAGCAGCTGCCACAGCAGAATAAGAGGTGCTTTGACAT 
CGGCTCTGCCGGACACCTGATGGTGCTGGAGCCACACCTGTTCGACCAGAAGGTGT GCGAGATCAAGCACCCTGATTGGCGCACAAAGGCAGCAAAGGAGGAGCGGGACGC CGCCTACGCCGAGGGAAGAATCCCCCTGCTGAGCCGCGAGGTGGAGGACATCAGGG CAATGCACTCCGTGGTGTGGAGAGATTCTCTGGGAGCCAGGGCCTTCAGCGGAGGC AAGGCAGAGCAGTCCCTGGTGTGGCGCGACGAGGAGTTTGGCATCTGGTGCCGGCT GCGGCCCGATTACGTGCCTAACAATGCCGTGCGGATCTTCGACTATAAGACCGCCAC AAACGGCTCCCCCGATGCCTTTATGAAGGAGATCTACAATCGGGGCTATCACCAGC
Figure imgf000133_0001
TTCTGGTTTGTGGTGCAGGAGAAAACCGCCCCCTTCCTGCTGTCTTTCTTTCAGATGG ATGAGATGAGCCTGGAGATCGGCCGGACCCTGAACAGACAGGCCAAGGGCATCTTT GCCTGGTGCCTGCGCAACAATTGTTGGCCAGGCTATCAGCCCGAGGTGGATGGCAA GGTGAGATTCTTTACCACATCTCCCCCTGCCTGGCTGGTGAGGGAGTACGAGTTTAA GAATGAGCACGGCGCCTATGAGCCACCCGAGATCAAGCGGAAGGAGGTGGCC Salmonella enterica subsp. enterica serovar Javiana str. 10721 RecT DNA (SEQ ID NO:107): CCAAAGCAGCCCCCTATCGCCAAGGCAGACCTGCAGAAAACCCAGGGAGCACGGAC CCCAACAGCAGTGAAGAACAATAACGATGTGATCTCCTTTATCAATCAGCCTTCTAT GAAGGAGCAGCTGGCCGCCGCCCTGCCAAGGCACATGACCGCCGAGCGGATGATCA GAATCGCCACCACAGAGATCAGGAAGGTGCCCGCCCTGGGCGACTGCGATACAATG TCTTTTGTGAGCGCCATCGTGCAGTGTAGCCAGCTGGGCCTGGAGCCTGGCGGCGCC CTGGGCCACGCCTACCTGCTGCCTTTCGGCAATCGGAACGAGAAGTCCGGCAAGAA GAATGTGCAGCTGATCATCGGCTATAGAGGCATGATCGACCTGGCCCGGAGATCCG GACAGATCGCCAGCCTGTCCGCCAGGGTGGTGCGCGAGGGCGACGATTTCTCTTTTG AGTTCGGCCTGGAGGAGAAGCTGGTGCACAGGCCAGGCGAGAACGAGGACGCCCC CGTGACCCACGTGTACGCAGTGGCACGCCTGAAGGATGGAGGCACCCAGTTTGAAG TGATGACACGGAAGCAGATCGAGCTGGTGAGAGCCCAGTCTAAGGCCGGCAATAAC GGCCCTTGGGTGACCCACTGGGAGGAGATGGCCAAGAAAACCGCCATCAGGCGCCT GTTCAAGTACCTGCCCGTGAGCATCGAGATCCAGAGGGCCGTGAGCATGGATGAGA AGGAGACACTGACAATCGACCCAGCCGATGCCAGCGTGATCACCGGCGAGTATTCC GTGGTGGAGAATGCCGGCGTGGAGGAGAACGTGACAGCC Salmonella enterica subsp. enterica serovar Javiana str. 
10721 RecE DNA (SEQ ID NO:108): TACTATGACATCCCAAACGAGGCCTACCACGCAGGCCCCGGCGTGTCTAAGAGCCA GCTGGACGACATCGCCGATACCCCCGCCATCTATCTGTGGCGGAAGAATGCCCCTGT GGACACCGAGAAAACCAAGTCCCTGGATACCGGCACAGCCTTCCACTGCAGGGTGC TGGAGCCAGAGGAGTTCAGCAAGCGGTTCATCATCGCCCCCGAGTTCAACCGGAGA ACCTCCGCCGGCAAGGAGGAGGAGAAAACCTTCCTGGAGGAGTGTACCCGGACAG GCAGAACCGTGCTGACAGCCGAGGAGGGCAGGAAGATCGAGCTGATGTACCAGTC CGTGATGGCACTGCCACTGGGACAGTGGCTGGTGGAGTCTGCCGGCTACGCCGAGA GCTCCGTGTATTGGGAGGACCCTGAGACAGGCATCCTGTGCCGGTGTAGACCCGAT AAGATCATCCCTGAGTTCCACTGGATCATGGACGTGAAAACCACAGCCGACATCCA GAGGTTTCGCACCGCCTACTATGACTACAGATACCACGTGCAGGACGCCTTCTACTC TGATGGCTATAGAGCCCAGTTTGGCGAGATCCCTACATTCGTGTTTCTGGTGGCCAG CACCACAGCAGAGTGCGGCAGATACCCCGTGGAGATCTTTATGATGGGAGAGGACG CAAAGCTGGCCGGACAGCGCGAGTATAGGCGCAATCTGCAGACCCTGGCCGAGTGT CTGAACAATGATGAGTGGCCTGCCATCAAGACACTGTCTCTGCCACGGTGGGCCAA GGAGAACGCCAATGCC Pseudobacteriovorax antillogorgiicola RecT DNA (SEQ ID NO:109): GGCCACCTGGTGAGCAAGACCGAGCAGGATTACATCAAGCAGCACTATGCCAAGGG CGCCACAGACCAGGAGTTCGAGCACTTTATCGGCGTGTGCAGGGCCAGAGGCCTGA ACCCAGCCGCCAATCAGATCTACTTCGTGAAGTATCGGTCCAAGGATGGACCAGCA STDU2-42312.601 (S22-113) AAGCCAGCCTTTATCCTGTCTATCGACAGCCTGAGGCTGATCGCACACCGCACCGGC
Figure imgf000134_0001
GACAGTGCGGAGAAACCTGAAGAGCGGCGAGACAGGCAATTTCTCCGGCATGGCCT TTTATGACGAGCAGGTGCAGCAGAAGAACGGCCGGCCTACCTCCTTTTGGCAGTCTA AGCCAAGAACAATGCTGGAGAAGTGTGCAGAGGCAAAGGCCCTGAGGAAGGCCTTC CCTCAGGATCTGGGCCAGTTTTACATCAGAGAGGAGATGCCCCCTCAGTATGACGAG CCTATCCAGGTGCACAAGCCAAAGGCCCTGGAGGAGCCCAGGTTCAGCAAGTCCGA TCTGTCCAGGCGCAAGGGCCTGAACAGGAAGCTGTCTGCCCTGGGAGTGGACCCCA GCCGCTTCGATGAGGTGGCCACCTTTCTGGACGGCACACCTGATCGCGAGCTGGGCC AGAAGCTGAAGCTGTGGCTGAAGGAGGCCGGCTACGGCGTGAATCAG Pseudobacteriovorax antillogorgiicola RecE DNA (SEQ ID NO:110): AGCAAGCTGTCCAACCTGAAGGTGTCTAATAGCGACGTGGATACACTGAGCCGGAT CAGAATGAAGGAGGGCGTGTATCGGGACCTGCCAATCGAGAGCTACCACCAGTCCC CCGGCTATTCTAAGACCAGCCTGTGCCAGATCGATAAGGCCCCTATCTACCTGAAAA CCAAGGTGCCACAGAAGTCCACAAAGTCTCTGAACATCGGCACCGCCTTCCACGAG GCTATGGAGGGCGTGTTTAAGGACAAGTATGTGGTGCACCCCGATCCTGGCGTGAAT AAGACCACAAAGTCTTGGAAGGACTTCGTGAAGAGGTATCCTAAGCACATGCCACT GAAGCGCAGCGAGTACGACCAGGTGCTGGCCATGTACGATGCCGCCCGGTCTTATA GACCTTTTCAGAAGTACCACCTGAGCCGGGGCTTCTACGAGAGCTCCTTTTATTGGC ACGATGCCGTGACAAACAGCCTGATCAAGTGCAGACCCGACTATATCACCCCTGAT GGCATGAGCGTGATCGACTTCAAGACCACAGTGGACCCCAGCCCCAAGGGCTTTCA GTACCAGGCCTACAAGTATCACTACTACGTGAGCGCCGCCCTGACCCTGGAGGGAA TCGAGGCAGTGACCGGCATCAGGCCAAAGGAGTACCTGTTCCTGGCCGTGTCCAATT CTGCCCCATACCTGACCGCCCTGTATCGCGCCTCTGAGAAGGAGATCGCCCTGGGCG ACCACTTTATCCGGCGGAGCCTGCTGACCCTGAAAACCTGTCTGGAGTCTGGCAAGT GGCCCGGCCTGCAGGAGGAGATCCTGGAGCTGGGCCTGCCTTTCTCCGGCCTGAAG GAGCTGAGAGAGGAGCAGGAGGTGGAGGATGAGTTTATGGAGCTGGTGGGC Photobacterium sp. 
JCM 19050 RecT DNA (SEQ ID NO:111): AACACCGACATGATCGCCATGCCCCCTTCTCCAGCCATCAGCATGCTGGACACAAGC AAGCTGGATGTGATGGTGCGGGCAGCAGAGCTGATGTCCCAGGCCGTGGTCATGGT GCCCGACCACTTCAAGGGCAAGCCAGCCGATTGCCTGGCAGTGGTCATGCAGGCAG ACCAGTGGGGCATGAACCCCTTTACCGTGGCCCAGAAAACCCACCTGGTGAGCGGC ACCCTGGGATACGAGTCCCAGCTGGTGAATGCCGTGATCAGCTCCTCTAAGGCCATC AAGGGCCGGTTCCACTATGAGTGGTCTGATGGCTGGGAGAGACTGGCCGGCAAGGT GCAGTACGTGAAGGAGTCTCGGCAGAGAAAGGGCCAGCAGGGCAGCTATCAGGTG ACCGTGGCCAAGCCAACATGGAAGCCAGAGGACGAGCAGGGCCTGTGGGTGCGGT GTGGAGCCGTGCTGGCCGGAGAGAAGGACATCACATGGGGCCCTAAGCTGTACCTG GCCAGCGTGCTGGTGCGGAACAGCGAGCTGTGGACCACAAAGCCCTACCAGCAGGC CGCCTATACCGCCCTGAAGGATTGGTCCCGCCTGTATACACCTGCCGTGATGCAGGG CTCTATGACCGGCAAGAGCTGGTCCCTGACAGGCAGGCTGATCAGCCCCCGC Photobacterium sp. JCM 19050 RecE DNA (SEQ ID NO:112): GCCGAGCGGGTGAGAACCTATCAGCGGGACGCCGTGTTCGCACACGAGCTGAAGGC CGAGTTTGATGAGGCCGTGGAGAACGGCAAGACCGGCGTGACACTGGAGGACCAGG STDU2-42312.601 (S22-113) CCAGGGCCAAGAGGATGGTGCACGAGGCCACCACAAACCCCGCCTCTCGGAATTGG
Figure imgf000135_0001
GGAGGCAGGCCTGGTGCTGAAGGCCAGGCCTGACAAGGAGATCGGCAACAATCTGA TCGATGTGAAGTCCATCGAGGTGCCAACCGACGTGTGCGCCTGTGATCTGAACGCCT ATATCAATCGGCAGATCGAGAAGAGAGGCTACCACATCTCCGCCGCCCACTATCTGT CTGGCACAGGCAAGGACCGCTTCTTTTGGATCTTCATCAATAAGGTGAAGGGCTACG AGTGGGTGGCAATCGTGGAGGCCTCTCCCCTGCACATCGAGCTGGGCACCTATGAG GTGCTGGAGGGCCTGCGGAGCATCGCCAGCTCCACAAAGGAGGCAGATTACCCAGC ACCTCTGTCCCACCCTGTGAACGAGAGAGGCATCCCACAGCCCCTGATGTCTAATCT GAGCACATACGCCATGAAGAGGCTGGAGCAGTTTCGCGAGCTG Providencia alcalifaciens DSM 30120 RecT DNA (SEQ ID NO:113): AAGGCACAGCTGGCCGCCGCCCTGCCTAAGCACATCACCAGCGACCGGATGATCAG AATCGTGTCCACCGAGATCAGAAAGACCCCATCTCTGGCCAACTGCGACATCCAGA GCTTCATCGGCGCCGTGGTGCAGTGTTCTCAGCTGGGCCTGGAGCCAGGCAACGCCC TGGGACACGCCTACCTGCTGCCCTTTGGCAATGGCAAGTCCGACAACGGCAAGTCTA ATGTGCAGCTGATCATCGGCTATCGGGGCATGATCGATCTGGCCCGGAGAAGCGGC CAGATCATCTCTATCAGCGCCAGGACCGTGCGCCAGGGCGACAACTTCCACTTTGAG TACGGCCTGAACGAGAATCTGACCCACATCCCCGAGGGCAATGAGGACTCCCCTAT CACACACGTGTACGCAGTGGCACGGCTGAAGGATGAGGGCGTGCAGTTCGAAGTGA TGACATATAACCAGATCGAGAAGGTGAGAGATAGCTCCAAGGCCGGCAAGAATGGC CCCTGGGTGACCCACTGGGAGGAGATGGCCAAGAAAACCGTGATCAGGCGCCTGTT TAAGTACCTGCCCGTGAGCATCGAGATGCAGAAGGCCGTGATCCTGGACGAGAAGG CCGAGGCCAATATCGAGCAGGATCACTCCGCCATCTTCGAGGCCGAGTTTGAGGAG GTGGACTCTAACGGCAAT Providencia alcalifaciens DSM 30120 RecE DNA (SEQ ID NO:114): AACGAGGGCATCTACTATGACATCTCTAATGAGGACTATCACCACGGCCTGGGCATC TCTAAGAGCCAGCTGGATCTGATCGACGAGAGCCCCGCCGATTTCATCTGGCACCGG GATGCCCCTGTGGACAACGAGAAAACCAAGGCCCTGGATTTTGGCACAGCCCTGCA CTGCCTGCTGCTGGAGCCAGACGAGTTCCAGAAGAGGTTTCGCATCGCCCCCGAGGT GAACCGGAGAACAAATGCCGGCAAGGAGCAGGAGAAGGAGTTCCTGGAGATGTGC GAGAAGGAGAATATCACCCCCATCACAAACGAGGATAATAGGAAGCTGTCTCTGAT GAAGGACAGCGCAATGGCCCACCCTATCGCCCGCTGGTGTCTGGAGGCCAAGGGCA TCGCCGAGAGCTCCATCTATTGGAAGGACAAGGATACAGACATCCTGTGCCGGTGT AGACCAGACAAGCTGATCGAGGAGCACCACTGGCTGGTGGATGTGAAGTCCACCGC CGACATCCAGAAGTTCGAGCGGTCTATGTACGAGTATAGATACCACGTGCAGGATTC CTTTTATTCTGACGGCTACAAGAGCCTGACAGGCGAGATGCCCGTGTTCGTGTTCCT GGCCGTGTCCACCGTGATCAACTGCGGCAGATACCCCGTGCGGGTGTTCGTGCTGGA 
CGAGCAGGCAAAGTCCGTGGGACGGATCACCTATAAGCAGAATCTGTTTACATACG CCGAGTGTCTGAAAACCGACGAGTGGGCCGGCATCAGAACCCTGAGCCTGCCCTCC TGGGCAAAGGAGCTGAAGCACGAGCACACCACAGCCTCT
Pantoea stewartii RecT Protein (SEQ ID NO:115):
MSNQPPIASADLQKANTGKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMI
Figure imgf000136_0001
LIIGYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLIHRPGENEDAPITHVYAV ARLKDGGTQFEVMTVKQIEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSIE MQKAVILDEKAESDVDQDNASVLSAEYSVLDGSSEE Pantoea stewartii RecE Protein (SEQ ID NO:116): MQPGVYYDISNEEYHAGPGISKSQLDDIAVSPAIFQWRKSAPVDDEKTAALDLGTALHC LLLEPDEFSKRFMIGPEVNRRTNAGKQKEQDFLDMCEQQGITPITHDDNRKLRLMRDSA FAHPVARWMLETEGKAEASIYWNDRDTQILSRCRPDKLITEFSWCVDVKSTADIGKFQK DFYSYRYHVQDAFYSDGYEAQFCEVPTFAFLVVSSSIDCGRYPVQVFIMDQQAKDAGR AEYKRNLTTYAECQARNEWPGIATLSLPYWAKEIRNV Pantoea brenneri RecT Protein (SEQ ID NO:117): MSNQPPIASADLQKTQQSKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMI RIVTTEIRKTPQLAQCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQ LIIGYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLVHRPGENEDAPITHVYAV ARLKDGGTQFEVMTVKQVEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSI EMQKAVVLDEKAESDVDQDNASVLSAEYSVLESGDEATN Pantoea brenneri RecE Protein (SEQ ID NO:118): MQPGIYYDISNEDYHRGAGISKSQLDDIAISPAIYQWRKHAPVDEEKTAALDLGTALHCL LLEPDEFSKRFQIGPEVNRRTTAGKEKEKEFIERCEAEGITPITHDDNRKLKLMRDSALAH PIARWMLEAQGNAEASIYWNDRDAGVLSRCRPDKIITEFNWCVDVKSTADIMKFQKDF YSYRYHVQDAFYSDGYESHFHETPTFAFLAVSTSIDCGRYPVQVFIMDQQAKDAGRAE YKRNIHTFAECLSRNEWPGIATLSLPFWAKELRNE Pantoea dispersa RecT Protein (SEQ ID NO:119): MSNQPPLATADLQKTQQSNQVAKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMIR IVTTEIRKTPALAQCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQLI IGYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLIHRPGDNESAPITHVYAVAR LKDGGTQFEVMTAKQVEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSIEM QKAVVLDEKAESDVDQDNASVLSAEYSVLESGTGE Pantoea dispersa RecE Protein (SEQ ID NO:120): MEPGIYYDISNEAYHSGPGISKSQLDDIARSPAIFQWRKDAPVDTEKTKALDLGTDFHCA VLEPERFADMYRVGPEVNRRTTAGKAEEKEFFEKCEKDGAVPITHDDARKVELMRGSV MAHPIAKQMIAAQGHAEASIYWHDESTGNLCRCRPDKFIPDWNWIVDVKTTADMKKFR REFYDLRYHVQDAFYTDGYAAQFGERPTFVFVVTSTTIDCGRYPTEVFFLDEETKAAGR SEYQSNLVTYSECLSRNEWPGIATLSLPHWAKELRNV Type-F symbiont of Plautia stali RecT Protein (SEQ ID NO:121): MSNQPPIASADLQKTQQSKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMI RIVTTEIRKTPALATCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQ 
LIIGYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLIHRPGDNEDAPITHVYAV ARLKDGGTQFEVMTAKQVEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSI EMQKAVVLDEKAESDVDQDNASVLSAEYSVLEGDGGE
Figure imgf000137_0001
Type-F symbiont of Plautia stali RecE Protein (SEQ ID NO:122): MQPGIYYDISNEDYHGGPGISKSQLDDIAISPAIYQWRKHAPVDEEKTAALDLGTALHCL LLEPDEFSKRFEIGPEVNRRTTAGKEKEKEFMERCEAEGVTPITHDDNRKLRLMRDSAM AHPIARWMLEAQGNAEASIYWNDRDTGVLSRCRPDKIITDFNWCVDVKSTADIIKFQKD FYSYRYHVQDAFYSDGYESHFDETPTFAFLAVSTSIDCGRYPVQVFIMDQQAKDAGRAE YKRNIHTFAECLSRNEWPGIATLSLPYWAKELRNE Providencia stuartii RecT Protein (SEQ ID NO:123): MSNPPLAQADLQKTQGTEVKEKTKDQMLVELINKPSMKAQLAAALPRHMTPDRMIRIV TTEIRKTPALATCDMQSFVGAVVQCSQLGLEPGNALGHAYLLPFGNGKSKSGQSNVQLI IGYRGMIDLARRSGQIVSISARTVRQGDNFHFEYGLNENLTHVPGENEDSPITHVYAVAR LKDGGVQFEVMTYNQIEKVRASSKAGQNGPWVSHWEEMAKKTVIRRLFKYLPVSIEM QKAVILDEKAEANIDQENATIFEGEYEEVGTDGK Providencia stuartii RecE Protein (SEQ ID NO:124): EGIYYNISNEDYHNGLGISKSQLDLINEMPAEYIWSKEAPVDEEKIKPLEIGTALHCLLLE PDEYHKRYKIGPDVNRRTNAGKEKEKEFFDMCEKEGITPITHDDNRKLMIMRDSALAHP IAKWCLEADGVSESSIYWTDKETDVLCRCRPDRIITAHNYIVDVKSSGDIEKFDYEYYNY RYHVQDAFYSDGYKEVTGITPTFLFLVVSTKIDCGKYPVRTYVMSEEAKSAGRTAYKH NLLTYAECLKTDEWAGIRTLSLPRWAKELRNE Providencia sp. MGF014 RecT Protein (SEQ ID NO:125): MSNPPLAQSDLQKTQGTEVKVKTKDQQLIQFINQPSMKAQLAAALPRHMTPDRMIRIVT TEIRKTPALATCDMQSFVGAVVQCSQLGLEPGNALGHAYLLPFGNGKAKSGQSNVQLII GYRGMIDLARRSNQIISISARTVRQGDNFHFEYGLNEDLTHTPSENEDSPITHVYAVARL KDGGVQFEVMTYNQVEKVRASSKAGQNGPWVSHWEEMAKKTVIRRLFKYLPVSIEMQ KAVVLDEKAEANVDQENATIFEGEYEEVGTDGN Providencia sp. 
MGF014 RecE Protein (SEQ ID NO:126): MKEGIYYNISNEDYHNGLGISKSQLDLINEMPAEYIWSKEAPVDEEKIKPLEIGTALHCLL LEPDEYHKRYKIGPDVNRRTNVGKEKEKEFFDMCEKEGITPITHDDNRKLMIMRDSALA HPIAKWCLEADGVSESSIYWTDKETDVLCRCRPDRIITAHNYIIDVKSSGDIEKFDYEYYN YRYHVQDAFYSDGYKEVTGITPTFLFLVVSTKIDCGKYPVRTYVMSEEAKSAGRTAYK HNLLTYAECLKTDEWAGIRTLSLPRWAKELRNE Shewanella putrefaciens RecT Protein (SEQ ID NO:127): MQTAQVKLSVPHQQVYQDNFNYLSSQVVGHLVDLNEEIGYLNQIVFNSLSTASPLDVA APWSVYGLLLNVCRLGLSLNPEKKLAYVMPSWSETGEIIMKLYPGYRGEIAIASNFNVIK NANAVLVYENDHFRIQAATGEIEHFVTSLSIDPRVRGACSGGYCRSVLMDNTIQISYLSIE EMNAIAQNQIEANMGNTPWNSIWRTEMNRVALYRRAAKDWRQLIKATPEIQSALSDTE Y Shewanella putrefaciens RecE Protein (SEQ ID NO:128): STDU2-42312.601 (S22-113) MGTALAQTISLDWQDTIQPAYTASGKPNFLNAQGEIVEGIYTDLPNSVYHALDAHSSTGI
Figure imgf000138_0001
DFPDIELIESIPQLKAALAKSNLPVSGAKAALIERLYAFDPSLPLFEKMREKAITDYLDLR YAKYLRTDVELDEMATFYGIDTSQTREKKIEEILAISPSQPIWEKLISQHVIDHIVWDDAM RVERSTRAHPKADWLISDGYAELTIIARCPTTGLLLKVRFDWLRNDAIGVDFKTTLSTNP TKFGYQIKDLRYDLQQVFYCYVANLAGIPVKHFCFVATEYKDADNCETFELSHKKVIES TEEMFDLLDEFKEALTSGNWYGHDRSRSTWVIEV Bacillus sp. MUM 116 RecT Protein (SEQ ID NO:129): MSKQLTTVNTQAVVGTFSQAELDTLKQTIAKGTTNEQFALFVQTCANSRLNPFLNHIHCI VYNGKEGATMSLQIAVEGILYLARKTDGYKGIECQLIHENDEFKFDAKSKEVDHQIGFP RGNVIGGYAIAKREGFDDVVVLMESNEVDHMLKGRNGHMWRDWFNDMFKKHIMKR AAKLQYGIEIAEDETVSSGPSVDNIPEYKPQPRKDITPNQDVIDAPPQQPKQDDEAAKLK AARSEVSKKFKKLGIVKEDQTEYVEKHVPGFKGTLSDFIGLSQLLDLNIEAQEAQSADG DLLD Bacillus sp. MUM 116 RecE Protein (SEQ ID NO:130): MTYAADETLVQLLLSVDGKQLLLGRGLKKGKAQYYINEVPSKAKEFEEIRDQLFDKDLF MSLFNPSYFFTLHWEKQRAMMLKYVTAPVSKEVLKNLPEAQSEVLERYLKKHSLVDLE KIHKDNKNKQDKAYISAQSRTNTLKEQLMQLTEEKLDIDSIKAELAHIDMQVIELEKQM DTAFEKNQAFNLQAQIRNLQDKIEMSKERWPSLKNEVIEDTCRTCKRPLDEDSVEAVKA DKDNRIAEYKAKHNSLVSQRNELKEQLNTIEYIDVTELREQIKELDESGQPLREQVRIYS QYQNLDTQVKSAEADENGILQDLKASIFILDSIKAFRGKEAEMQAEKVQALFTTLSVRLF KQNKGDGEIKPDFEIEMNDKPYRTLSLSEGIRAGLELRDVLSQQSELVTPTFVDNAESITS FKQPNGQLIISRVVAGQELKIEAVSE Shigella sonnei RecT Protein (SEQ ID NO:131): MTKQPPIAKADLQKTQENRAPAAIKNNDVISFINQPSMKEQLAAALPRHMTAERMIRIA TTEIRKVPALGNCDTMSFVSAIVQCSQLGLEPGSALGHAYLLPFGNKNEKSGKKNVQLII GYRGMIDLARRSGQIASLSARVVREGDEFNFEFGLDEKLIHRPGENEDAPVTHVYAVAR LKDGGTQFEVMTRRQIELVRSQSKAGNNGPWVTHWEEMAKKTAIRRLFKYLPVSIEIQR AVSMDEKEPLTIDPADSSVLTGEYSVIDNSEE Shigella sonnei RecE Protein (SEQ ID NO:132): DRGLLTKEWRKGNRVSRITRTASGANAGGGNLTDRGEGFVHDLTSLARDIATGVLARS MDVDIYNLHPAHAKRIEEIIAENKPPFSVFRDKFITMPGGLDYSRAIVVASVKEAPIGIEVI PAHVTAYLNKVLTETDHANPDPEIVDIACGRSSAPMPQRVTEEGKQDDEEKLQPSGTTA DEQGEAETMEPDATKHHQDTQPLDAQSQVNSVDAKYQELRAELHEARKNIPSKNPVDA DKLLAASRGEFVDGISDPNDPKWVKGIQTRDSVYQNQPETEKTSPDMKQPEPVVQQEPE IAFNACGQTGGDNCPDCGAVMGDATYQETFDEENQVEAKENDPEEMEGAEHPHNENA GSDPHRDCSDETGEVADPVIVEDIEPGIYYGISNENYHAGPGVSKSQLDDIADTPALYLW RKNAPVDTTKTKTLDLGTAFHCRVLEPEEFSNRFIVAPEFNRRTNAGKEEEKAFLMECA 
STGKMVITAEEGRKIELMYQSVMALPLGQWLVESAGHAESSIYWEDPETGILCRCRPDK IIPEFHWIMDVKTTADIQRFKTAYYDYRYHVQDAFYSDGYEAQFGVQPTFVFLVASTTIE CGRYPVEIFMMGEEAKLAGQLEYHRNLRTLADCLNTDEWPAIKTLSLPRWAKEYAND
Salmonella enterica RecT Protein (SEQ ID NO:133):
Figure imgf000139_0001
ATTEIRKVPELRNCDSTSFIGAIVQCSQLGLEPGSALGHAYLLPFGNGKAKNGKKNVQLII GYRGMIDLARRSGQIISLSARVVRECDEFSYELGLDEKLVHRPGENEDAPITHVYAVAKL KDGGVQFEVMTKKQVEKVRDTHSKAAKNAASKGASSIWDEHFEDMAKKTVIRKLFKY LPVSIEIQRAVSMDGKEVETINPDDISVIAGEYSVIDNPEE Salmonella enterica RecE Protein (SEQ ID NO:134): DRGLLTKEWRKGNRVSRITRTASGANAGGGNLTDRGEGFVHDLTSLARDVATGVLAR SMDVDIYNLHPAHAKRVEEIIAENKPPFSVFRDKFITMPGGLDYSRAIVVASVKEAPIGIE VIPAHVTEYLNKVLTETDHANPDPEIVDIACGRSSAPMPQRVTEEGKQDDEEKPQPSGA MADEQATAETVEPNATEHHQNTQPLDAQSQVNSVDAKYQELRAELQEARKNIPSKNPV DADKLLAASRGEFVDGISDPNDPKWVKGIQTRDSVYQNQPETEKISPDAKQPEPVVQQE PETVCNACGQTGGDNCPDCGAVMGDATYQETFGEENQVEAKEKDPEEMEGAEHPHNE NAGSDPHRDCSDETGEVADPVIVEDIEPGIYYGISNENYHAGPGVSKSQLDDIADTPALY LWRKNAPVDTTKTKTLDLGTAFHCRVLEPEEFSNRFIVAPEFNRRTNAGKEEEKAFLME CASTGKTVITAEEGRKIELMYQSVMALPLGQWLVESAGHAESSIYWEDPETGILCRCRP DKIIPEFHWIMDVKTTADIQRFKTAYYDYRYHVQDAFYSDGYEAQFGVQPTFVFLVAST TVECGRYPVEIFMMGEEAKLAGQQEYHRNLRTLADCLNTDEWPAIKTLSLPRWAKEYA ND Acetobacter RecT Protein (SEQ ID NO:135): MNAPQKQNTRAAVKKISPQEFAEQFAAIIPQVKSVLPAHVTFEKFERVVRLAVRKNPDL LTCSPASLFMACIQAASDGLLPDGREGAIVSRWSSKKSCNEASWMPMVAGLMKLARNS GDIASISSQVVFEGEHFRVVLGDEERIEHERDLGKTGGKIVAAYAVARLKDGSDPIREIM SWGQIEKIRNTNKKWEWGPWKAWEDEMARKTVIRRLAKRLPMSTDKEGERLRSAIERI DSLVDISANVDAPQIAADDEFAAAAHGVEPQQIAAPDLIGRLAQMQSLEQVQDIEPQVS HAIQEADKRGDSDTANALDAALQSALSRTSTAKEEVPA Acetobacter RecE Protein (SEQ ID NO:136): MVISKSGIYDLTNEQYHADPCPEMSLSSSGARDLLSSCPAKFIAAKQLPQQNKRCFDIGS AGHLMVLEPHLFDQKVCEIKHPDWRTKAAKEERDAAYAEGRIPLLSREVEDIRAMHSV VWRDSLGARAFSGGKAEQSLVWRDEEFGIWCRLRPDYVPNNAVRIFDYKTATNGSPDA FMKEIYNRGYHQQAAWYLDGYEAVTGHRPREFWFVVQEKTAPFLLSFFQMDEMSLEIG RTLNRQAKGIFAWCLRNNCWPGYQPEVDGKVRFFTTSPPAWLVREYEFKNEHGAYEPP EIKRKEVA Salmonella enterica subsp. enterica serovar Javiana str. 
10721 RecT Protein (SEQ ID NO:137): MPKQPPIAKADLQKTQGARTPTAVKNNNDVISFINQPSMKEQLAAALPRHMTAERMIRI ATTEIRKVPALGDCDTMSFVSAIVQCSQLGLEPGGALGHAYLLPFGNRNEKSGKKNVQL IIGYRGMIDLARRSGQIASLSARVVREGDDFSFEFGLEEKLVHRPGENEDAPVTHVYAVA RLKDGGTQFEVMTRKQIELVRAQSKAGNNGPWVTHWEEMAKKTAIRRLFKYLPVSIEI QRAVSMDEKETLTIDPADASVITGEYSVVENAGVEENVTA Salmonella enterica subsp. enterica serovar Javiana str. 10721 RecE Protein (SEQ ID NO:138): STDU2-42312.601 (S22-113) MYYDIPNEAYHAGPGVSKSQLDDIADTPAIYLWRKNAPVDTEKTKSLDTGTAFHCRVLE
Figure imgf000140_0001
QWLVESAGYAESSVYWEDPETGILCRCRPDKIIPEFHWIMDVKTTADIQRFRTAYYDYR YHVQDAFYSDGYRAQFGEIPTFVFLVASTTAECGRYPVEIFMMGEDAKLAGQREYRRN LQTLAECLNNDEWPAIKTLSLPRWAKENANA Pseudobacteriovorax antillogorgiicola RecT Protein (SEQ ID NO:139): MGHLVSKTEQDYIKQHYAKGATDQEFEHFIGVCRARGLNPAANQIYFVKYRSKDGPAK PAFILSIDSLRLIAHRTGDYAGCSEPIFTDGGKACTVTVRRNLKSGETGNFSGMAFYDEQ VQQKNGRPTSFWQSKPRTMLEKCAEAKALRKAFPQDLGQFYIREEMPPQYDEPIQVHK PKALEEPRFSKSDLSRRKGLNRKLSALGVDPSRFDEVATFLDGTPDRELGQKLKLWLKE AGYGVNQ Pseudobacteriovorax antillogorgiicola RecE Protein (SEQ ID NO:140): MSKLSNLKVSNSDVDTLSRIRMKEGVYRDLPIESYHQSPGYSKTSLCQIDKAPIYLKTKV PQKSTKSLNIGTAFHEAMEGVFKDKYVVHPDPGVNKTTKSWKDFVKRYPKHMPLKRSE YDQVLAMYDAARSYRPFQKYHLSRGFYESSFYWHDAVTNSLIKCRPDYITPDGMSVIDF KTTVDPSPKGFQYQAYKYHYYVSAALTLEGIEAVTGIRPKEYLFLAVSNSAPYLTALYR ASEKEIALGDHFIRRSLLTLKTCLESGKWPGLQEEILELGLPFSGLKELREEQEVEDEFME LVG Photobacterium sp. JCM 19050 RecT Protein (SEQ ID NO:141): MNTDMIAMPPSPAISMLDTSKLDVMVRAAELMSQAVVMVPDHFKGKPADCLAVVMQ ADQWGMNPFTVAQKTHLVSGTLGYESQLVNAVISSSKAIKGRFHYEWSDGWERLAGK VQYVKESRQRKGQQGSYQVTVAKPTWKPEDEQGLWVRCGAVLAGEKDITWGPKLYL ASVLVRNSELWTTKPYQQAAYTALKDWSRLYTPAVMQGSMTGKSWSLTGRLISPR Photobacterium sp. 
JCM 19050 RecE Protein (SEQ ID NO:142): MAERVRTYQRDAVFAHELKAEFDEAVENGKTGVTLEDQARAKRMVHEATTNPASRN WFRYDGELAACERSYFWRDEEAGLVLKARPDKEIGNNLIDVKSIEVPTDVCACDLNAYI NRQIEKRGYHISAAHYLSGTGKDRFFWIFINKVKGYEWVAIVEASPLHIELGTYEVLEGL RSIASSTKEADYPAPLSHPVNERGIPQPLMSNLSTYAMKRLEQFREL Providencia alcalifaciens DSM 30120 RecT Protein (SEQ ID NO:143): MKAQLAAALPKHITSDRMIRIVSTEIRKTPSLANCDIQSFIGAVVQCSQLGLEPGNALGH AYLLPFGNGKSDNGKSNVQLIIGYRGMIDLARRSGQIISISARTVRQGDNFHFEYGLNEN LTHIPEGNEDSPITHVYAVARLKDEGVQFEVMTYNQIEKVRDSSKAGKNGPWVTHWEE MAKKTVIRRLFKYLPVSIEMQKAVILDEKAEANIEQDHSAIFEAEFEEVDSNGN Providencia alcalifaciens DSM 30120 RecE Protein (SEQ ID NO:144): MNEGIYYDISNEDYHHGLGISKSQLDLIDESPADFIWHRDAPVDNEKTKALDFGTALHCL LLEPDEFQKRFRIAPEVNRRTNAGKEQEKEFLEMCEKENITPITNEDNRKLSLMKDSAM AHPIARWCLEAKGIAESSIYWKDKDTDILCRCRPDKLIEEHHWLVDVKSTADIQKFERS MYEYRYHVQDSFYSDGYKSLTGEMPVFVFLAVSTVINCGRYPVRVFVLDEQAKSVGRI TYKQNLFTYAECLKTDEWAGIRTLSLPSWAKELKHEHTTAS STDU2-42312.601 (S22-113) Mouse Albumin knock-in sense template (SEQ ID NO:160)
Figure imgf000141_0001
GTGGAAACAGGGAGAGAAAAACCACACAACATATTTAAAGATTGATGAAGACAACT AACTGTAATATGCTGCTTTTTGTTCTTCTCTTCACTGAGCTTTTCGAACTGCGGGTGG CTCCAGGATcctgtgggaggaagagaagaggtcagCTACTCCCTGAAGATGCCAGTTCCCGATCGT TACAGGAAAATCTGAAGGTG
(SEQ ID NO:162)
ACTTTGAGTGTAGCAGAGAGGAACCATTGCCACCTTCAGATTTTCCTGTAACGATCG GGAACTGGCATCTTCAGGGAGTAGCTGACCTCTTCTCTTCCTCCCACAGGATCCTGG AGCCACC
Example 16
[00393] The structure of E. coli RecT (EcRecT) alone (FIG. 36A) and with bound single-strand DNA (FIGS. 36B and 36C) was predicted. The contact interface is consistent with truncation data (Example 7, FIG. 20A). Predicted interactions of EcRecT SSAP amino acids with DNA are shown in FIGS. 37A and 37B.
Example 17
[00394] 322 SSAP proteins were identified from sequence data, synthesized and screened for activity with Cas9 and dCas9. Gene editing activities are shown below in Table 7, followed by amino acid sequences of the proteins.
Table 7. Cas9 Activity with SSAP
Figure imgf000141_0002
WP_074846740.1 0.558 0.254 0.626 0.863 0.407
UPI0008EA8633 0.551 0.397 0.895 0.705 1.131
Figure imgf000142_0001
UPI00078ED021 -0.491 0.218 0.538 -1.201 -0.336
WP_016998679.1 -2.985 -4.713 -0.683 -1.257 -0.422
Figure imgf000143_0001
UPI00025CF49A -0.927 0.813 0.564 -2.667 -0.795
WP_020007369.1 -3.455 -4.226 -0.744 -2.685 -0.421
Figure imgf000144_0001
OAB27843.1 -1.809 0.282 0.572 -3.899 -0.575
WP_045958294.1 -2.219 -0.504 -0.093 -3.935 -0.892
Figure imgf000145_0001
WP_055284109.1 -1.184 2.314 1.852 -4.682 -0.968
WP_128520904.1 -1.789 1.105 2.418 -4.682 -0.680
Figure imgf000146_0001
SCQ72869.1 -6.118 -6.279 -0.994 -5.957 -0.867
WP_006845711.1 -5.195 -4.425 -0.688 -5.964 -1.008
Figure imgf000147_0001
461 WP_076170610.1 -6.169 -4.432 -0.752 -7.905 -1.546
399 WP_142511229.1 -3.700 0.548 0.321 -7.949
Figure imgf000148_0001
Figure imgf000148_0002
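The Table 7 rows that survive in the extracted text pair a protein accession with a run of numeric values. The following is a minimal sketch, not part of the application, of how such rows might be collected for further analysis. The function name `parse_rows` and both regular expressions are hypothetical and are based only on the identifier shapes visible in the rows above. The column headings are not recoverable from the extracted text, so the values are kept as an unlabeled list.

```python
import re

# Assumed identifier shapes, inferred from the visible rows only:
#   "WP_074846740.1", "OAB27843.1"  -> letters, optional underscore, digits, ".version"
#   "UPI0008EA8633"                 -> "UPI" followed by 10 hexadecimal characters
ACCESSION = re.compile(r"^(?:[A-Z]{2,3}_?\d+\.\d+|UPI[0-9A-F]{10})$")
# Table values are signed decimals such as "-6.169" or "0.548".
NUMBER = re.compile(r"^-?\d+\.\d+$")


def parse_rows(text):
    """Group whitespace-separated tokens into rows of accession plus values.

    Tokens that are neither an accession nor a decimal number (page footers,
    integer row indices) are skipped. A truncated row simply ends up with
    fewer values than its neighbours.
    """
    rows = []
    current = None
    for token in text.split():
        if ACCESSION.match(token):
            current = {"accession": token, "values": []}
            rows.append(current)
        elif NUMBER.match(token) and current is not None:
            current["values"].append(float(token))
    return rows


if __name__ == "__main__":
    sample = (
        "461 WP_076170610.1 -6.169 -4.432 -0.752 -7.905 -1.546 "
        "399 WP_142511229.1 -3.700 0.548 0.321 -7.949"
    )
    for row in parse_rows(sample):
        print(row["accession"], row["values"])
```

Keeping the values unlabeled avoids guessing which column corresponds to which measurement; a row with fewer values than the others signals that part of it was lost in extraction.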
UPI0000010203 (SEQ ID NO:172) ATNESLKNQLSTKKETGLGSAGNTIKGLMNSPAIKKRFEEVLKQRAPQYMSSIVNLVNS DINLKKCDQMSVVASCMVAATLDLPVDKNLGYAWVVPYGNKAQFQLGYKGYVQLAL RTGQYKSINVIEIHEGELIDWNPLTEELKIDFSKKESDAVIGYAGYFELLNGFKKSTYWTK EQITKHKNKFSKSDFGWKKDFDAMARKTVLRNMLSKWGILSIEMQNAYTADQGIIKNEI IETGEVKENIEYIEADFESYEDNSIEEGGANE UPI00000105D3 (SEQ ID NO:173) ATNESLKNQLTTKKETGLGSAGNTIKGLMNSPAIKKRFEEVLKQRAPQYMSSIVNLVNS DINLKKCDQMSVVASCMVAATLDLPVDKNLGYAWVVPYGNKAQFQLGYKGYVQLAL RTGQYKSINVIEIHEGELIDWNPLTEELKIDFSKKESDAVIGYAGYFELLNGFKKSTYWTK EQITKHKNKFSKSDFGWKKDFDAMARKTVLRNMLSKWGILSIEMQNAYTADQGIIKNEI METGEVKENIEYIEADFESYEDNSIEEGGANE UPI0000030D3A / HAW2682705.1 RecT [Escherichia coli] (SEQ ID NO:167) TKQPPIAKADLQKTQGNRAPAAVKNSDVISFINQPSMKEQLAAALPRHMTAERMIRIAT TEIRKVPALGNCDTMSFVSAIVQCSQLGLEPGSALGHAYLLPFGNKNEKSGKKNVQLIIG STDU2-42312.601 (S22-113) YRGMIDLARRSGQIASLSARVVREGDEFSFEFGLDEKLIHRPGENEDAPVTHVYAVARL
Figure imgf000149_0001
VSMDEKEPLTIDPADSSVLTGEYSVIDNSEE UPI0000030D3E (SEQ ID NO:166) STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDNESCTCRIYRKDRNHPICV TEWMDECRREPFKTREGREITGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAERI VENTAYTAERQPERDITPVNDETMQEINTLLIALDKTWDDDLLPLCSQIFRRDIRASSELT QAEAVKALGFLKQKAAEQKVAA UPI000009AF52 (SEQ ID NO:174) ANELGIFSVDNLNMTTIKQYLDGGGKASDAELVLLINLCKQNNMNPFMKEVYFIKYGN QPAQIVVSRDFYRKRAFQNPNFVGIEVGVIVLNKDGVLEHNEGTFKTHEQELVGAWAR VHLKNTEIPVYVAVSYDEYVQMKDGHPNKMWTNKPCTMLGKVAESQALRMAFPAEFS GTYGEEEYPEPEKEPREVNGVKEPDRAQIESFDKEDYAAKKIEELKEKAQPQKEVVEET GEVIDEEPLEGF UPI000009B019 (SEQ ID NO:175) STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDNESCTCRIYRKDRNHPICV TEWMDECRREPFKTREGREITGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAERI VENTAYTAERQPERDITPVNDETMQEINTLLIALDKTWDDDLLPLCSQIFRRDIRASSELT QAEAVKALGFLKQKATEQKVAA UPI000009B628 (SEQ ID NO:176) STNDELKNKLANKQNGGQVASAQSLGLKGLLEAPTMRKKFESVLDKKAPQFLTSLLNL YNGDDYLQKTDPMTVVTSAMVAATLDLPIDKNLGYAWIVPYKGRAQFQLGYKGYIQL ALRTGQYKSINVIEVRDGELLKWNRLTEEIELDLDNNTSEKVIGYCGYFQLINGFEKTVY WTRKEIEAHKKKFSKSDFGWKKDYDAMAKKTVLRNMLSKWGILSIDMQTAVTEDEAE PRERKDVTEDESIPDIIDAPITPSDTLEAGSEVQGSMI UPI000009BC15 (SEQ ID NO:177) STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDNESCTCRIYRKDRNHPICV TEWMDECRREPFKTREGREITGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAERI VENTAYTAERQPERDITPVNDETMQEINTLLIALDKTWDDDLLPLCSQIFRRDIRASSELT QAEAVKVLGFLKQKASEQKVAA UPI00000B3F97 Bet [Gammaproteobacteria] (SEQ ID NO:178) EKPKLIQRFAERFSVDPNKLFDTLKATAFKQRDGSAPTNEQMMALLVVADQYGLNPFT KEIFAFPDKQAGIIPVVGVDGWSRIINQHDQFDGMEFKTSENKVSLDGAKECPEWMECII STDU2-42312.601 (S22-113) YRRDRSHPVKITEYLDEVYRPPFEGNGKNGPYRVDGPWQTHTKRMLRHKSMIQCSRIAF
Figure imgf000150_0001
ANEHFQGVELTFAKQEIFNAQQQAAKALTQPLAS UPI000019AB49 Bet [Escherichia coli] (SEQ ID NO:179) STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDNESCTCRIYRKDRNHPICV TEWMDECRREPFKTREGREITGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAERI VENTAYTAERQPERDITPVNNETMQEINTLLIALDKTWDDDLLPLCSQIFRRDIRASSELT QAEAVKALGFLKQKATEQKVAA UPI000034E66D Bet [Lactococcus phage phiLC3] (SEQ ID NO:180) ANEIDIYDAKNLNTATVKKFLKGGGQASDEELAMLLAISRNQNMNPFMKEVYFIKYGS AAAQIVVSRDFYRKRAFQNPNFAGIEVGVIVLNKDGVLEHNEGTFKTKDQELVGAWAR VHLKNTEIPVYVAVSYDEYVQMKNGQPNSMWTNKPCTMLGKVAESQALRMAFPAEFS GTYGEEEYPEPEKEPREVNGVKEPDRAQIESFDKENYAARKIEELKEKAQPQKEFVEEIG EAIDEITAEDF UPI00005F0A78 (SEQ ID NO:181) STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDNESCTCRIYRKDRNHPICV TEWMDECRREPFKTREGREITGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAERI VENTAYTAERQPERDITPVNDETMQEINTLLIALDKTWDDDLLPLCSQIFRRDIRASSELT QAEAVKALGFLKQKASEQKVAA UPI000150D6AC (SEQ ID NO:182) ANELGIFSVDNLNMTTIKQYLDGGGKASDAELVLLINLCKQNNMNPFMKEVYFIKYGN QPAQIVVSRDFYRKRAFQNPNFVGIEVGVIVLNKDGVLEHNEGTFKTHEQELVGAWAR VHLKNTEIPVYVAVSYDEYVQMKDGHPNKMWTNKPCTMLGKVAESQALRMAFPAEFS GTYGEEEFPEPEKEPREVNGVKEPDRAQIESFDKEDYAAKKIEELKEKAQPQKEVVEETG EVIDEEPLEGF UPI0001594E53 (SEQ ID NO:183) TTALQTLTNKLAERFEMGDGSGLVETLKSSAFAGATVSDAQMIALLVIANQYQLNPWT KEIYAFPGGNGGLTPIVGVDGWVRIINREPQYDGMEFHFTDDYSACTCTIYRKDRSKPIV VTEFMGECKKSSPAWNSHPKRMLRHKAMIQCARLAFGFTGIYDQDEAERIAENEKPPK NITPQNNVVETTAVELISEEQLSQIRQLMQVTGTEEAKILAYIGVQALNQIPKSQAEAVIK KLNLTLDKQNAEKADNGESVGEEIPL UPI00015968D7 (SEQ ID NO:184) AKNELAKGSYLTDLQKLDGNTLRDFVDPKHQASPQELQALLAIVKGRNLNPFTKEVYFI KYGSAPAQIVVSKEAIMKRAEENPDFDGFEAGIVVETKDGAIERLTGTIVPKSATLRGGW STDU2-42312.601 (S22-113) CKVYRKDRSHAIEADADFAYYTTSKNLWQKMPALMIRKVAIVSAFREAFSESVGGLYT
Figure imgf000151_0001
TVEEPTQDGNLEW UPI00015C01AE (SEQ ID NO:185) MAKENYSDPNGKLLNSITTFEVNGEEVKLSGNIIRDYLVSGNAEVTDQEIIMFLQLCKYQ KLNPFLNEAYLVKFKNTKGPDKPAQIIVSKEAFMKRAETHEQYDGFEAGVIVERGGEIIE LEGAVSLASDKLLGGWAKVFRKDRNRPVSVRISEKEFNKRQSTWNTMPLTMMRKTAV VNAMREAFPDNLGAMYTEEEQGSLQNTETSVQQEIKQNANAEVLDIPSQQNEVPDFKE VREPEHVEMPPIYGEQQSTPPARPY UPI00015C02E0 (SEQ ID NO:186) ATNDELKNQLANKQNGGQVASAQSLDLKGLLEAPTMRKKFEKVLDKKAPQFLTSLLNL YNGDDYLQKTDPMTVVTSAMVAATLDLPIDKNLGYAWIVPYKGRAQFQLGYKGYIQL ALRTGQYKSINVIEVRDGELLKWNRLTEEIELDLDNNTSEKVIGYCGYFQLINGFEKTVY WTRKEIEAHKKKFSKSDFGWKKDYDAMAKKTVLRNMLSKWGILSIDMQTAVTEDEAE PRERKDVTEDESIPDIIDAPITPSDTLEAGSEVQGSMI UPI00019E1F9A (SEQ ID NO:187) PQEIAKVEYTAADGQEVRLTPGVIAKYIVSGNGLASEKDIYSFMARCQARGLNPLAGDA YMTVYQGKDGNTSSSVIVSKDYFVRTATAQDSFDGMEAGVTVLNGQGQIQKREGCEFF PSLGEKLLGGWAKVHVKDREHPSKAAVTMDEYDQHRSLWKSKPATMIRKVAIVQALR EAYPGQFGGVYDRDEMPPSQEPQQVPVEVYEAPEAYETPDNQNRATEEF UPI0001BEF484 (SEQ ID NO:188) NTPTMKRKFEEVLHENANAFMSNVMTLVSNDSYLAESEPMSILSGALTAATLNLGLDK NLGYAYLVPFNTKNKQTGKWERKAQFILGYKGYIQLAQRSGKYKALNVIEVYEGELLS WNRLTEEFEFDPNGRQSDDVIGYVGYFELLNGFKKTVYWTKQEIEAHRIANSKDKEKTK LSGVWATDYNAMARKTVLRNMLSKWGILSIEMQEATTSDEKVQQMQEDGNIISETEVE ENTTMKTAEVINEADSDSLNQTDLFDTKNPPLE UPI0001CE597A CK3_26380 [butyrate-producing bacterium SS3/4] (SEQ ID NO:189) ENATAVQQAESQGTQDFSAPVKHNTDFSLGIFGSSDNFLMATQMAKAFASSTIVPKEYQ GNFANGLVAMDIANRLKTSPFMVMQNLDVIQGRPAWRATFLIAMINRSKKYDIELQFEE KRDKNGKPYSCTCWTTKDGRKVTGIEVTMDMAEAEGWTKKNGSKWITMPQVMLRYR AASFFSRMNCPELSNGLYTTDEVYEMADSEYKVYNLEDEVKRDLAQNANKEEFVAPPN ETAPESESKGSEPLDPAVENQKSGDTPDWMKPETM UPI0001D2DF22 RecT [Cellulosilyticum lentocellum] (SEQ ID NO:190) SDKKELVLKETHSRLNQLLATKMEAMPKDFNQTRFLQNCMTVLQDTKGIENCHPVSIA RTLLKGAFLGLDFFQRECYAIPYGGELQFQTDYKGETKMAKKYSIRDIKDIYAKVVRKG DEFKEEIVAGQQVVDFKPLPFNDAEIIGAFAVVLYQDGGMEYETMSTKQIEGIRDNFSK STDU2-42312.601 (S22-113) MKNGLMWTKTPEEAYKKTVLRRLTKKIEKDFASIDQAKAYEESSDMQFKQDEQKQDA KDPFADAVDVEFTEETEGQVRLDGEADGAK
Figure imgf000152_0001
UPI0001E0C499 (SEQ ID NO:191) SNELMTKAVTYEVNNEEVKLSGQIVKQYLTSGQAVTDQEVTMFIQLCRYQHLNPFLNE AYLVKFNGKPAQIITSKEAFMKRAESNPNYAGLKAGCIVERNGELIYTEGAFTLKTDNIL GAWADVIRKDRREPTHVEISMDEFSKSQATWKSMPATMIRKTAIVNALREAFPQDLGA LYTEDDKNPNEATQTTYKQEPEVNTTKTADVLAKKFSGAPQIKSVENVQESEEESNNAS NHGEATEPVNNVEEPTATAEVEQGQLL UPI0001E2AFC1 (SEQ ID NO:192) TNNQLATQIKRDITTDPSLLTGADIKKYFDPQNLLTEKQVGQALALCKGRNLNPFANEV YIVAYQNNSGTEFSLIVSKEAFMKRAERCEGYDGFEAGITVMRNGEMIEIEGSLKLPEDIL IGGWAVVYRKDRSHRYKVTVDFNEYVKTDRNGNPRSTWKSMPATMIRKTALVQTLRE AFPDELGNMYTDIDGGDTFDAIKDVTPQESREDVVARKMAQIDQFNKEQEANHADPEP TQNEDPIQGELLDGELEY UPI0001E35ACE (SEQ ID NO:193) TNNQIVEAKGDFLTNPQLLNSGIIRKYLDPQGKASDEELAYFIAQAKAQNLNPFTKEIYFI KYGTQPAQIVTAKSAFEKKADSHPQFDGKEAGVIYLMDGEIKYSKGAFIPKGAEILGGW AKVYRKDRTYPTETEVSFEEYDNSKIRARVKELTQQGKDVTYPVMNSYGKPIGENNWD TMPCVMIRKVALVSAYREAFPAELGASYEADEIQLDNTPKDVTPQESREDVVARKMAEI EQFNKEQEANHADPEPTQNEDPIQGELLDGELEY UPI00020BA2E0 (SEQ ID NO:194) NNEVMEKSVEYEVNGNSVKLTPNMIKQFITKGNADVTDQEAIMFMKLAEQQQLNPFLN EVYLIKFKGKPAQNIVAKEAFMKRAEKHSEYDGLEAGIIVQRGEEIKELPGAVCLPTDNL LGGWARVYRKDRKNPFYVQLDFKEFSKGQATWNQMPKNMIRKTAIVNALREAFPEAL GAMYTEDDARLEEVKTAEPIKEKAETTQILENKFKELSENGQTEVGDEQTNESTEPEPTA KQEQLL UPI000212F382 (SEQ ID NO:195) TVQLVQPRNSDEYDFDQTKLDLIKRTICKGATNDELQLFIHACKRTGLDPFMRQIFAVKR WDSSTKKEIMTIQTGIDGYRLIADRTGKYAPGKDTEFGYDNKGNIRWAKAYIKKMTPD GQWHEISAIAFWEEYVQTTREGKSTLFWLKKSHIMLSKCTEALALRKTFPAERSGIYTKE EMAQEFSPLEEHLVERIAASRNDQGRS UPI00022F8B4D (SEQ ID NO:196) SNNQLSTQQAKRDIAIDTSVWTFQDVKRYFDPQNLLTEKQVGQALSLIKGRNLNPLANE VYIVAYKKKTGGTEFSLIVSKEAFLKRAAQNPNYEGFEAGVVTVDTDGVMHERKGALM LPGDTLVGGWARVYRKNFKVPVEIFVSREEYDKKQSTWNAMPATMIRKTALVNALRE STDU2-42312.601 (S22-113) AFPEDLGNMYTEDDGGETFDRIKQAEPVESREDVMARKMAQIEQMKQEQAQRQIDTSY PTDDVIDPDDEPAQGELLEDLEY
Figure imgf000153_0001
UPI0002314B74 (SEQ ID NO:197) AITPNPIPAQDGSPIPSPDDIVGELARRKIYAGIPDDDVALALALCQKYGFDPLLKHLVLL ATKDRDETTGQGQKHYNAYVTRDGLLHVAHTSGMLDGLETIQGKDDLGEWAEAVVY RKDMSRPFRYRVYLSEYVREAKGVWKTHPQAMLTKTAEVFALRRAFDVALTPFEEMG FDNQNIAGDTGPSPKTGFTEKAGFTGNTDFSAEASLPGKARFSTEAGLTDMTVIPPNRVT GSIPETSRLNTSAGSTGRQRRQLF UPI00025CAD2E (SEQ ID NO:198) STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDDESCTCRIYRKDRNHPICV TEWMDECRRAPFKTREGREITGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAER IVENTAYTTERQPERDITPVNEETMSEINALLTSMEKTWDDDLLPLCSQIFRRYIRASSEL SQAEAEKVLGFLKQKATEQKVAA UPI00025CF49A (SEQ ID NO:199) EKPKLIQRFAERFSVDPNKLFDTLKATAFKQRDGSAPTNEQMMALLVVADQYGLNPFT KEIFAFPDKQAGIIPVVGVDGWSRIINQHDQFDGMEFKTSETKVSLDGAKECPEWMECII YRRDRSHPVKITEYLDEVYRPPFEGNGKNGPYRVDGPWQTHTKRMLRHKSMIQCSRIAF GFVGIFDQDEAERIIEGQATHVVEPSVIPPEQVDDRTRGLVYKLIERAEASNAWNSALEY ANEHFQGVELTFAKQEIFNAQQQAAKALTQPLAS UPI0002AD92E7 (SEQ ID NO:200) TTVNQTELKNKLAEKAKTPAKTGNTVFDLIRKMEPEIKRALPKQISPERFARIAMTAVRN TPKLQACEPISFIAALMQSAQLGLEPNTPLGQAYLIPYGKEVQFQLGYQGMLTLAYRTGE YQSIYAMPVYANDEFEYEYGLNEKLVHKPAPDPEGEPIYYYAVYKLKNGGHGFVVMSR QQIERHRDKYSPSAKQGKFSPWNTDFDSMAKKTVLKQLLKYAPKSVEFATQIAQDETIK TEIAEDMTEVQGIEVEYEATDDQENQENQEQED UPI0002B78771 (SEQ ID NO:201) EFETDEEEKEMSNNQLSTQQAKRDIAIDTSVWTFQDVKRYFDPQNLLTEKQVGQALSLI KGRNLNPLANEVYIVAYKKKTGGTEFSLIVSKEAFLKRAAQNPNYEGFEAGVVTVDTD GVMHERKGALMLPGDTLVGGWARVYRKNFKVPVEIFVSREEYDKKQSTWNAMPATM IRKTALVNALREAFPEDLGNMYTEDDGGETFDRIKQAEPVESREDVMARKMAQIEQMK QEQAQRQIDTSYPTDDVIDPDDEPAQGELLEDLEY UPI0002B78B34 (SEQ ID NO:202) TTNQVVTHKNFFNAPNVQKSFDDVWKGAGVQFATSILSVIQGNASLKSASNESIMTSAM KAAVLNLPIEPSLGRAYLVPYKGQVQFQLGYKGLIELAQRSGKYKSINAGPVYKSQFVS YDPLFEELTLDFTQPQDEVIGYFASFSLLNGFRKLTYWTKAEVEAHGKKFSKTFGNGPW STDU2-42312.601 (S22-113) KTDFDAMARKTVLKHILSIYGPLSVEMQTGMQNDESENDNATRDIKTAEPVNADQQLL EDLMNVDTETGEILEEVSELKDNGELDLKYEDPNAR
Figure imgf000154_0001
UPI0002B884F0 / WP_003158887.1 Bet [Pseudomonas aeruginosa] (SEQ ID NO:203) GTALTPLLTKFATRYEMGTTPEEVANTLKQTCFKGQVNDSQMVALLIVADQYKLNPFT KELYAFPDKNNGIVPVVGVDGWARIINENPQFDGMEFSMDQQGTECTCKIYRKDRSHAI SATEYMAECKRNTQPWQSHPRRMLRHKAMIQCARLAFGFAGIYDQDEAERIVERDVTP AEQYEDVSEAICLIKDSPTMEDLQSAFSNAWKAYKTKGARDQLTAAKDQRKKELLDAP IDVEFEETGDDRAA UPI0002CB4A67 / WP_010792303.1 Bet [Pseudomonas aeruginosa] (SEQ ID NO:204) GTALTPLLTKFATRYEMGTTPEEVANTLKQTCFKGQVNDSQMVALLIVADQYKLNPFT KELYAFPDKNNGIVPVVGVDGWARIINENPQFDGMEFSMDQQGTECTCKIYRKDRSHAI SATEYMAECKRNTQPWQSHPRRMLRHKAMIQCARLAFGFAGIYDQDEAERIVERDVTP AEQYEDVSEAVCLIKDSPTMEDLQSAFSNAWKAYKTKGARDQLTAAKDQRKKELLDA PIDVEFEETGDDRAA UPI0002E4C0BF (SEQ ID NO:205) SSIAAAAESAEVTPASIINKYRDDIATVLPPKLRERIDRWIRLAIGAVNSNPELISRVRADQ GASMMQALMKCAALGHEPGSGLFHLVPKGSRIEGWEDYKGILQRIDRSGVYARTVIGV VYANDEYSYDQNVDERPRHVRATGDRGEPISSYAYAVYPSGAITTVAEATPEQIASSKS KARGADNAASPWRAPGAPMHRKVAVRLLEKHVATSAEDRREPISRSAANDVVIDATA DYYQEP UPI0003282677 (SEQ ID NO:206) TNELTQTKGAYLTDLQKLDGATLRNFVDPKHQASPQELQTLLAIVKNRNLNPFTKEVYF IKYGNNPAQIVVSKDAFMKRAEQNQNYDGFESGIIYEDASGELKNKKGVILPKNCTLIGG WCEVYRKDRTRPVYREVELSAYNTGKNWWQKAPGQMIEKVAIVAAVRDSFSEDVGGL YTSEEMEQAAPIDVTPQESQEEVRTRKMAQIEEMKREQEKHQSSAYPEDEIPNFEDEPLQ GELLEEMEY UPI00033853AF (SEQ ID NO:207) NERTNLQYAPAPVERFKECLNSHEIKARLKNSLKNNWTQFQTSMLDLYSGDAYLQKCD PMAVALECVKAATLDLPISKSLGFAYVVPYNNVPTFTLGYKGLIQLAQRTGQYRTINAD VVYEGEIRGADKLSGMVDLSGERTGDEVVGYFAYFKLINGFEKMIYMTRAEAEKWRD DYSPSAKSKYSPWRTDFDKMALKTCIRRLISKYGIMSVEMQGVMTEEAEPRAAAAAKR AEETVQANANSKVIDIDAAPPAANESPAEAAPQPDF UPI0003427695 (SEQ ID NO:208) DYVTKIQEVLNRLLDAKHDALPSGFKKTRFSENCRAYVKEYTDLQKYDEEEVALVLFK GAVLGLDFLAKECHVITEGSALRFQTDYKGEMALVKKYSVRPILDIYAKNVREGDVFRE EISEGKPLIHFNPLAFNNSQIIGSFAVALFSDGGMVYETMPAEEIESIRRNYGKNPGSDTW STDU2-42312.601 (S22-113) EKSQGEMYKRTVLRRLCKTIEIDFDAEQSLAYEAGSSFEFNREPQPKKRSPFNPPEVEESE VLSDDGTSEAE
Figure imgf000155_0001
UPI000353091F (SEQ ID NO:209) SNALTITQDQTEFTPKQLSVLENLGVQGAAPQEVAMFFDYCQRTGLSPWARQIYMIGR WDRNLGRKKYAVQVSIDGQRLVAERSGVYEGQTAPQWCGPDGQWVDVWLANEPPQA ARVGVWRKSFREPAYGVARLSSYMPVTRDGKPQGLWGTMPDVMLAKCAESLALRKA FPLELSGLYTSEEMQQADAPRTEPAPVDEDVVDAEIVDDEERMQWVEAIQAAETTDVL RKMWADIKTCPDALQAELRELIPARAKELAA UPI000386D631 (SEQ ID NO:210) IECAKLGLEPNNILGQAYLVPVCVDGVNKVEFQLGYKGLIELAYRSGKIKSLYANEVFE KDEFHIDYGLDQKLIHKPFLGGDRGEVIGYYAVYQMDNRGASFVFMTRDEILGHSRKYS RSFGCDLWESEFDAMAKKTVIKKLLKYAPLSIELQKSVSVDESVKGIGCIGVI UPI0003E3D237 (SEQ ID NO:211) GTALTPLLTKFATRYEMGTTPEEVANTLKQTCFKGQVNDSQMVALLIVADQYKLNPFT KELYAFPDKNNGIVPVVGVDGWARIINENPQFDGMEFSMDQQGTECTCKIYRKDRSHAI SATEYMAECKRNTQPWQSHPRRMLRHKAMIQCARLAFGFAGIYDQDEAERIVERDVTP GEPVEDVTEALSLINSAPTMDDLQAAFSDAWKAYKSKGARDQLTVAKDQRKKELLEAP IDVEFEETGDDRAA UPI00044F7143 (SEQ ID NO:212) STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDNESCTCRIYRKDRNHPICV TEWMDECRREPFKTREGKEIIGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAERI VENTACTAEHQPERDITPVNDETMQEINTLLIALDKTWDDDLLPLCSQIFRRDIRASSELT QAEAVKALGFLKQKATEQKVAA UPI0004995B90 (SEQ ID NO:213) TNNQLATQTKRNITTDPSLLTGADIKKYFDPQNLLTEKQVGQALALCKGRNLNPFANEV YIVAYTNRNGGKEFSLIVSKEAFLKRAAQCKDYEGFEAGVVVVDSEGVMHERKGAIML PEDTLIGGWARVHRKNFKVPVEIFVSREEYDKKQSTWNTMPATMIRKVALVNALREAF PEDLGNMYTEDDGGETFDRIKDVTPQESREDVVARKMAQIEQFNKEQEANHADPEPAQ TEETIQGELLDGELEY UPI00051F5876 (SEQ ID NO:214) SNREIEVIRACSKAGNNGGSSPWDSFPDEMARKAIVKRASKYWPRRDRLDTAIDYLNTQ GGEGIILNADHIPERDVTPASDEIINEITQAITEINKTWDDLLPLCSKTFRRTIASHEYLSQE EAVKTLDFVKKKAARNKATAEAKIHATTENNSEAVS UPI000588C848 (SEQ ID NO:215) STDU2-42312.601 (S22-113) ATNDELKNKLANKQNGGQVASAQSLDLKGLLEAPTMRKKFEKVLDKKAPQFLTSLLNL
Figure imgf000156_0001
ALRTGQYKSINVIEVREGELLKWNRLTEEIELDLDNNTSEKVVGYCGYFQLINGFEKTVY WTRKEIEAHKKKFSKSDFGWKKDYDAMAKKTVLRNMLSKWGILSIDMQTAVTEDEAE PRERKDVTDDESIPDIIEAPITPSDTLEAGSVVQGSMI UPI000598CD40 (SEQ ID NO:216) TNNQLATQTKRNITTDPSLLTGADIKKYFDPQNLLTEKQVGQALALCKGRNLNPFANEV YIVAYTNRNGGKEFSLIVSKEAFLKRAAQCKDYEGFEAGVVVVDSEGVMHERKGAIML PEDTLIGGWARVHRKNFKVPVEIFVSREEYDKKQSTWNTMPATMIRKVALVNALREAF PEDLGNMYTEDDGGETFDRIKDVTPQESREDVVARKMAQIEQFNKEQEANHADPEPAQ TEEPIQGELLDGELEY UPI0005DCEBAD (SEQ ID NO:217) TNNQLATQTKRNITTDPSLLTGADIKKYFDPQNLLTEKQVGQALALCKGRNLNPFANEV YIVAYTNRNGGKEFSLIVSKEAFLKRAAQCKDYEGFEAGVVVVDSEGVMHERKGAIML PEDTLIGGWARVHRKNFKVPVEIFVSREEYDKKQSTWNTMPATMIRKVALVNALREAF PEDLGNMYTEDDGGETFDRIKDVTPQESREDVVARKMAEIEQFNKEQEANHADPEPAQ TEESIQGELLDGELEY UPI0005E4CB74 (SEQ ID NO:218) TNNQLATQTKRNITTDPSLLTGADIKKYFDPQNLLTEKQVGQALALCKGRNLNPFANEV YIVAYTNRNGGKEFSLIVSKEAFLKRAAQCKDYEGFEAGVVVVDSEGVMHERKGAIML PEDTLIGGWARVHRKNFKVPVEIVVSREEYDKKQSTWNTMPATMIRKVALVNALREAF PEDLGNMYTEDDGGETFDRIKDVTPQESREDVVARKMAEIEQFNKEQETNHADPEPAQ TEETIQGELLDGELEY UPI0005FEB4B0 (SEQ ID NO:219) NEIQAYDKINDRDGMEMLGAAIQRSGMFGAETKEQGIILALQCMVEKKPPLEMAKNYH IIQGKLSKRADAMLADFRKAGGKFIFADLKNPTVQKAKVTFEDYKDFDVEYSIDDAKTA GVYNAKGAWVKYPGAMLRARLVSETLRAIAPEIVTGVYTPEELETPINAKPELKCAQPV KAKPEPKKAQPDVIEATVCESELDAKLVELIGDREQIVNLYWEKKGLIDGLDTTWRDLN DDTKRKMIDQFDQFMDAAQRKAAQ UPI00062002D2 (SEQ ID NO:220) AENEKQALLQEENKSENVVSTVKRTALATNPFSDTDQFNNIFKMAQLISQSDMIPATYK GKPMNCVIALEQANRMGVSPLMVMQNLYVVKGVPSWSGQGCMMIIQGCGKFRDVDY VYSGEKGTDSRSCKVVATRISDGKRIEGTEITMQMVKSEGWISNTKWKNMPEQMLGYR AATFFARMYCPNELNGFATEGEAEDMNHKPQRIEAINVLGDTAHE UPI00064B44C1 (SEQ ID NO:221) STDU2-42312.601 (S22-113) TIMDLLNDPKMKSQIQRALPNGMSAERIARIALTALRMNPQLQECSPQSFAAALMTSAQ LGLEPNTPLGHAWLIPRKNHGKMEVQFELGYKGMLDLVRRSGMITAIFAEEVREKDEFE FEYGTNPYLKHKPYLGGDRGKVLFYYAVATFKDGGYAFKVMSIPEIEEARKLSQSANSP YSPWNRFYDEMAKKTVLKRLCKYLPLSIEVQRNLAQDETIRTQIEADDILDLPNENEFEV VEVEEIPGEEEKEEAKEGPFPNKALRESPTPLT UPI00064D5E13 (SEQ ID NO:222) STALTTLTSQLSQRFKLDGGEELLTTLKQTAFKGQVTDAQMTALLIVANQFGLNPWTKE 
IYAFPDKNGGIVPVVGVDGWARIINEHPQFDGMDFEMDGEQSCTCVIYRKDRTRPIRITE YMAECKKTGGGPWQSHPRRMLRHKAMIQCARMAFGFGGISDEDDAERIREKDITPQAE VVPKALEPYPADKFEENFEQWKSLIESGRRSADDVIAKIKSRNTMTDEQETRLRACGGE EGKTYENA UPI00065C2D47 Bet [Pseudomonas phage PS-1] (SEQ ID NO:223) SNVATIKPSSLSARMAERFGVDPNEMMATLKATAFKGQVSDAQMQALLIVADQYGLNP WTKEIYAFPDKGGIVPVVGVDGWSRIINENGAFDGMDFQQDDESCTCIIYRKDRNHPIK VTEWMAECKRNTQPWQSHPKRMLRHKAMIQCARLAFGYTGIFDEDEAQRIVEKDVTP AVNEPDITPALEAIKNASSMEELHAAFKAAWNQHPSARARLTAVKDERKKALSEPIEGE LVENEDGPAQQ UPI00067A7349 RecT [Streptococcus phage APCM01] (SEQ ID NO:224) AKNELVKGEYLTDLQKLDGNTLRNFVDPKHRASPQELQALLAIVKNRNLNPFTKEVYFI KYGSAPAQIVVSKEAIMKRAEENPNFDGFEAGIVIETKSGSIERLTGTIAPKRAELRGGWC KVYRKDRSHAIEADADFAYYTTGKNLWQKMPALMIRKVAIVSAFREAFSESVGGLYTA DEMEQNNTQETQEEVRARRMKQAYEEKLRLLTEMEAKSYKKVEDESASKEIEAAKTTK NTKEVEVIEETEVTEEPTQEDSLEW UPI0006CE3F5D (SEQ ID NO:225) STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDNESCTCRIYRKDRNHPICV TEWMDECRREPFKTRDGREITGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAER IVENTTYTTDRQPERDITPVSDETMREINDLLITMNKTWDDDLLPLCSQIFRRDIGASSDL TQIEAVKALGFLKQKAAEQKVEA UPI00078E90BE RecT [Pirellula sp. SH-Sr6A] (SEQ ID NO:226) SEATKEAKPETAIAKKPNGIKDWLKSDALKTQIASVAPKHMAPERVMRIALNAVSRTPK LADCTPESFMRCLLDLSSWGLVPDGRHAHLIPYGTECTLVLDYKGLVTLAYRSGWVKKI HADVVFEGDIFVYSLGTVCQHIPWEFRDDANKPEHKGHFRAAYCVVTMADGIEKHEV MTASEIDAIKAKSRSGNSGPWKDHYTEMAKKTVYRRASKWLPLSPEQADAMERDDDRI IDAVSVAVTQRLSKAAMPLIGANETGDTE UPI00078EBE91 RecT [Pirellula sp. SH-Sr6A] (SEQ ID NO:227) STDU2-42312.601 (S22-113) SEATKEAKPETAIAKKPNGIKDWLKSDALKTQIASVAPKHMAPERVMRIALNAVSRTPK
Figure imgf000158_0001
HADVVFEGDIFVYSLGTVCQHIPWEFRDDANKPEHKGHFRAAYCVVTMADGIEKHEV MTASEIDAIKAKSRSGNSGPWKDHYTEMAKKTVYRRASKWLPLSPEQADAMERDDDRI IDAVSVAVTQRLSKAAMPLIGANETGDTE UPI00078ED021 (SEQ ID NO:228) SEIQQQAEAQTQAHPTAVLDDYRGAIASVAPPGTNIDLFIRMTKSNVNRSDEIVAAVKR NPGLFMQAVMDSAALGHIPGSEYYYLTPRRDGISGIESWKGVAKRIFNTGRYQRIVCEV VYEGEQWEFQPGEDLKPKHVIDWDARQVGSKVRFTYAYAVDFEGNPSTVAVCTKLDL DKAQKQSRGKVWDQWYEQMAKKTAIKRLEDFVDTSAVDLRADGSSRRHSAEVAE UPI000795D815 (SEQ ID NO:229) ASKNEAIEVSPAEIASVKEKPASIVKAEKAKKEPCALVKYEDAEGREVVLTREDIINTISS NPRITDKEIKLFIELARAQKLNPFTREIFITKYGDYPATFIVGKDVFTKRAQSNPLFKGMQ AGIIVQRGNAVDQREGSATFGDEMLIGGWCKVYVQGYDVPIYDSVSFNEYAARKTDGT LNAMWASKPATMIRKVAIVHALREAFPSDFQGLYDQSEMGLSGQGGE UPI00079B135B (SEQ ID NO:230) ATSLKRAVTGDGKPATVQQLLTNPKIKSQIALALPQHLTPERLTRIVLTEIRRTPALAKCK PESLLAAVMQCAQLGLEPGGSLGHAWLMPFKNEVQFIIGYRGMIDLARRSGQVLSIEAR GVYESDTFHVSFGLEPDLTHQPDWDPADRGKLAFVYAVARLKDGGFQFDVMSRAEVE KIRAQSPAGKSGPWVTHFEEMAKKTVIRRLFKYLPVSVEMARAVGLDEAAERGEQSDA IDADCVIESEEEATPEEKGDSAA UPI0007B45EC7 (SEQ ID NO:231) SEISKAVATQQNPLAVVARYKRELGTVLPTVLRQDPDRWLMAAENAARKNPDIMAVT KADQGASYMRALVECARLGHEPGSKDFHFIKRGNAISGEESYRGIIKRVLNSGFYRSVV ARTVFSNDTYSFDPLTDIVPNHVPAQGDRGKPLSAYAFAVHWDGTPSTVAEATPERIAT AKAKSFASDKPTSPWQLPTGVMYRKTAIRELEPYVHVAPEPQPRRHLDGTVGGIPATDF DVDDGDVLDITADQLAEAGEIV UPI0007B642FE (SEQ ID NO:232) SELQQAAQGQADAGPVQVIYSHAKEIQNVLAKGTDMDRWLQMARLAVMRDPNLVNA AKRDPGSLMQAMLDCAEKGHIPGTEDYYLVPRKGGIQGMESWKGIAKRIMRSGRYQSI VAEVVYEGEDFDFNPNTMDRPVHQIKYMARTSGQPVLSYAYAVDHEGKPSTIAVADPR YIAKVKANSKGTVWADWDEAMYKKTAVKMLVDYVDTSSTDRRGVSTVQVDGPVGTF IDGVLEIEGGDQ UPI0007B64693 (SEQ ID NO:233) SELQQAAQGQQSNNPVSFIYSHAKDIQNVLTKGTDMDRWLQMARLAVMRDQNLVASA KRDPGSLMQALLDCAEKGHVPGTEDYYLVPRKGGIQGMESWKGIAKRIMRSGRYQSIV STDU2-42312.601 (S22-113) NEVVYEGETFEFNPNTMDRPVHNINYMTRTSGKPVMSYAYALDHDGKPSSVAIADPRYI
DDTFTAREAGE UPI0007BCAEAB (SEQ ID NO:234) TQDLATAIADQQPAQRRTAFDLVESMRGELHKALPEHASIDNFLRLALTELKMNPQLGN CSGESLLGALMTAARVGLEVGGPLGQFYLTPRRLKRDGWAVVPIVGYRGLITLARRAG VGQVNAVVVHEGDTFREGASSERGFFFDWEPAVERGKPVGALAAARLAGGDVQHRYL SLAEVHERRDRGGFKDGSNSPWATDYDAMVRKTALRALVPLLPQSTALSFAVQADEQ VQRYDAGDIDIPALDETDTEDTK UPI0007F13B78 (SEQ ID NO:235) TNQLAHKDFFNTPAVKQKFQEVLNGNERQFTASLLSIVNNNKLLARASNTSIMTAAMK AAVLNLPIEPSLGFAYIVPYGQDAQFQLGYKGLIQLAIRSGQFKAINSGKVYKAQFKSYD PLFETLDIDFTQPEDEVYGYFATFELVNGFKKLTFWTKEQAESHGKRFSKTYAKGPWST DFDAMAQKTVLKSILSKYAPLSTEMQEGLISDNQTEEVETDPIDVTPKNEDTQTLLSDLM SDEAESETEKV UPI000865F43D (SEQ ID NO:236) TSQQLDTTHTINQQVTTFRHTLVQMKNEIAAALPAHMTGDRFLRLILTEVRKNPELAECS TESIFGGILTAAALGLEPGLNGECWLIPRKVGKGPGSRKEATFQVGYKGIIKLFWQNPLA SYLDTGVVYANDAWKFRKGLDPILEHTPATGDRGAVRGYYAVVGLTTGARIFDFFTPK QISALRGTAGPNGGISDPEHWMERKTALLQVMKMAPKSTDLASAASVDGTVQTVEAA AQVAAASTGPVNPTTGEVLEAEPVEGGAA UPI000865FB15 (SEQ ID NO:237) TQQMPIEAQGEPTKELQQKAAVDRFNATLHQMQNEIARALPKHMTGDRFVRIVLTEVR KDPTLALCDPLTMFGSLLTAAALGLEPGLNGECWLVPRKNHGTLEAQLQVGYRGVVKL FWQNPAAAYLDTGYVCERDHFRFAKGLNPILEHTPAEGDRGKVVRYYAVAGLNTGAR VFDVFTPAQIKTLRGGKVGSNGDIPDPEHWMERKTALLQVLKLMPKSTQLAAVPAADG RAHTISDAQQIFGGVDTSTGEVLEAEPVEGDAA UPI0008D18539 (SEQ ID NO:238) ETIDIKQELASQAQTDSKKEVKLTKAMSIAEMIKAMMPEIKRALPSMITPERFTRIALSAL NNTPELQACTPMSFISALLNAAQLGLEINSPLGHAYLIPYKNKGVLECQFQIGYLGLIALA YRNELMQTIQAQCVYENDEFLYEYGLNPKLVHRPATSDRGEPVFFYGLFKMINSGFGFC VMSKQEMDEFARTYSKGLASSFSPWKTSYNEMAKKTVIKQALKYAPIKTDFQKALSTD ESIKYAISEDMTEAVNEIVSQNTEVA UPI0008D990CB (SEQ ID NO:239) SNLKNQLANKAGGTATKKQPQTMQDWIKVMEPQIKKALPSVITAERFTRMALTAISTNP KLAECTPESFMGALMNAAQLGLEPNTPLGQAYLIPYGKSVQFQVGYKGLMELAQRSGQ STDU2-42312.601 (S22-113) FKSIYAHTVYENDEFEVEYGLTQNIVHKPNFDDRGKPIGFYAVYKLTNGGENFVFMTQR
EDMTEVPEEMVEAEYEVVEQNTMAEDADLKGTPFETK UPI0008E12231 (SEQ ID NO:240) SNNELLAKPVEFEVNGEAVKLTGKTVKNFLVSGNGEVSDQEVVMFINLCKYQKLNPFL NEAYLVKFKSKSGPDKPAQVIVSKEAFMKRAEKHPNYEGFEAGIIVERDGQLVDIEGAIK LTNDKLVGGWARVYRSDRQKPITTRISLSEFSKGQSTWNSMPLTMIRKSAIVNAQREAF PETLGALYTEDDAKLDTTSSHDQEQVIEQEIKTKANQEVIDVEYTEESEQKSPQQEQTET TQAGPGF UPI0008EA8633 (SEQ ID NO:241) ATNSSLKNQLSKKENVTIGNTMQGLLNNPKMKKRFEEILDKKAPQYMSSILNLYNGDTS LQKCEPMSVLSSSMIAATMDLPVDKNLGYAWIVPYKNKAQFQMGYKGYIQLALRTGQ YKHINAIEIHEGELVNWNPLTEELEIDFTKKESDKIIGYAGYFELLNGFKKSTYWTKTQIE NHRKKFSKSDYGWNKDFDAMAIKTVIRNMLSKWGILSIEMQNAYTADENIIKDSFIDDS ENVSANIEDLVEADYTVNQDSLESKEEFEGTPLE UPI00091F1EB0 (SEQ ID NO:242) KKMTVMKTSAPLCYADVAEVKCEEFYEDQYKAGAEELFDNTSYDRLKVYLEKHGGLE GVHADVVRAGDTFVYRPGVIRRHGYVPGEQRGQVYAVYAKAHIKGGATRCVILARHE VEIDMDAKHGGNPDGDWENLAKVVALRSLAEALPLPSAVLQSCRTWSAK UPI000958E115 (SEQ ID NO:243) SNPPLAQADLQKTQGTEVKTKTKDQQLIHFINQPSMKAQLAAALPRHMTPDRMIRIVTT EIRKTPALANCDMQSFIGAVVQCSQLGLEPGNALGHAYLLPFGNGKAKSGQSNVQLIIG YRGMIDLARRSNQIISISARTVRQGDSFHFEYGLNENLTHVPGENEDSPITHVYAVARLK DGGVQFEVMTHNQIEKVRASSKAGQNGPWVSHWEEMAKKTVIRRLFKYLPVSIEMQK AVILDEKAEANVDQENASVFEGEFEEVSQSA UPI0009805C1D (SEQ ID NO:244) ANELGIFSVDNLNMTTIKQYLDGGGKASDAELVLLINLCKQNNMNPFMKEVYFIKYGN QPAQIVVSRDFYRKRAFQNPNFAGIEVGVIVLNKDGVLEHNEGTFKTHEQELVGAWAR VHLKNTEIPVYVAVSYDEYVQMKDGHPNKMWTNKPCTMLGKVAESQALRMAFPAEFS GTYGEEELPEPEKEPREVNGVKEPDRAQIESFDKEDYAAKKIEELKEKAQPQKEVVKET GEVIDKITAEDF UPI0009805F63 (SEQ ID NO:245) ANELGIFSVDNLNMTTIKQYLDGGGKASDAELVLLINLCKQNNMNPFMKEVYFIKYGN QPAQIVVSRDFYRKRAFQNPNFAGIEVGVIVLNKDGVLEHNEGTFKTHEQELVGAWAR VHLKNTEIPVYVAVSYDEYVQMKDGHPNKMWTNKPCTMLGKVAESQALRMAFPAEFS STDU2-42312.601 (S22-113) GTYGEEELPEPEKEPREVNGVKEPDRAQIESFDKEDYAARKIEELKEKAQPQKEVVKET GEVIDEITAEDF
UPI0009880690 (SEQ ID NO:246) TNNQLVEAKGDFLTNPQLLNSGIIRKYLDPQGKASDEELAYFIAQAKAQNLNPFTKEIYFI KYGTQPAQIVTAKSAFEKKADSHPQFDGKEAGVIYLLDGEIKYSKGAFIPKGAEILGGW AKVYRKDRTYPTETEVSFEEYDNSKIRARVKELTQQGKDVTYPVMNSYGKPIGENNWD TMPCVMIRKVALVSAYREAFPAELGASYEADEIQLDNTPKDITPQENREDVIARKMAQIE QFNKEQAHTDPEPTQTEEPIQGELLDGELEY UPI0009F5E532 (SEQ ID NO:247) RTDGTKEAGAAATAPTEGKAPAKAHKPADTIGAMIEKLKPQIERALPKHVTPDRMARM ALTAIRNNPKLGQAEAVSLMGSIIQASQLGLEPNTPLGQCYIIPYNSKNGMQAQFQMGY KGIVDLAHRSGQYRQLTAHPVDEADEFRYSYGLNPDLVHVPAEKPSGKITHYYAVYHL TNGGFDFRVWSREKVEAHAKQYSKSFSSGPWQTNFDQMACKTVMIDLLRYAPKSVEIA KATSADNRTHTINPEDPDLNIDTIDGDFELEGEER UPI0009F8F604 (SEQ ID NO:248) EALLLRRWQMGNLTKTTGFALAPQNLEQAMQLATMICNSQLAPNNYKGKPEDTLVAM MMGHELGLNPLQSIQNIAVINGRPSIYGDALLALVQNSPAFGGIQESFDEDTMTATCTV WRKGGEKHTQHYSKDDADTAGLWGKQGPWKQHPKRMLAMRARGFAVRNQFADALA GLVTREEAEDMEKEINPTPAPQAQSKRIGQKQSRTQYSESDFNENFPKWKAAVESGKKT SEQIISMVSTKGDLTQGMIEAIESIEAGEPA UPI000A08A794 (SEQ ID NO:249) GHLVSKTEQDYIKQHYAKGATDQEFEHFIGVCRARGLNPAANQIYFVKYRSKDGPAKP AFILSIDSLRLIAHRTGDYAGCSEPIFTDGGKACTVTVRRNLKSGETGNFSGMAFYDEQV QQKNGRPTSFWQSKPRTMLEKCAEAKALRKAFPQDLGQFYIREEMPPQYDEPIQVHKP KALEEPRFSKSDLSRRKGLNRKLSALGVDPSRFDEVATFLDGTPDRELGQKLKLWLKEA GYGVNQ UPI000B36BD3F (SEQ ID NO:250) TDVKQELERKVGKQDSTAVRLTKNMSIPDMIKALEPEIRRALPAVLTPERFLRMALSAV NNTPKLAECTPMSFIAAMMNAAQLGLEPNTPLGQAYLIPYKNKNQLECQFQIGYKGIIDL AYRTGQMQMIQAHAVHEFDDFEYEYGLNPKLIHRPGDGNRGEITYFYGLFKLVNGGFG FEVMNREAMEAFAQQYSQSYGSQYSPWVKNFEDMAKKTVIKKALKYGPVKAEFQKAI SMDETIKTEIAVDMTEVQNEESE UPI000B38B374 (SEQ ID NO:251) ENEVMTQDQAYEVASPFGSSENFQKLFDIGKMFASSSLVPDRYRGKPMDCTIAVDMAN RMGVSPMMVMQQLYVVKGNPQWSGQACMSLIRGSSEYKNVRPVYTGKKGEDSWGC STDU2-42312.601 (S22-113) YIEAEKKKTGEIVKGTEVTIAMAKAEGWYSKKDKYGNETSKWQTMPELMLAYRAAAF FARVYIPNALMGCAVEGEAEDIMKRAITAEDPFKEDAK
UPI000B49B5D9 (SEQ ID NO:252) TLQAVCPTQDKAVESQLDQTKFELIKRTICKGTTDDEFQLFIHACKRTGLDPFMRQIFAV KRWDSAERREVMTIQTGIDGYRLIADRTGRYAPGRDAEFGYDAHGGLRWAKAYVKKM TPDGHWHEISATAFWTEYVQTTKDGRPTVFWMKKGHVMLSKCAEALALRKTFPAELS GIYTQEEMAQTMSLPDTKGDSQTIGSDKAYEIERSIDNDPEFKTQLLTRLQRAFGCKSFS DLPQDQFKNVKKVIENHQIKEKIA UPI000B4BEFE6 / WP_088258624.1 Bet [Fimbriiglobus ruber] (SEQ ID NO:253) TDIAHRSYSAPQLSLIRRTVAKDTNQDEFDLFIEICKQQGLDPFKKQIFAQVYNKDKADK RQIVIVTSIDGYRAKAQRCGDYRPAEEETRFEADAALKIRRQPMGSFVRSCGLQVRPGQ GVVSRVGEARWDEFAPLDDAEFDWVDTGETWPDTGKPKKKKVAKSAKKTLKEGNWK NMPHVMLGKCAEAQALGGGGRKRSAACTSKRRWTRSTWT UPI000B5661AA (SEQ ID NO:254) TASKQTDIFSFVSGGEDITITLADIKNYFCANATDQECVLFGQLCKANGLNPWLKEAYLI KYDKNAPAAMVTGKDAYMKRANEHPAFDGYEAGVKVYLPDVGQVEYREGTAYYEDL GEQLIGGYAKVYRKDRSRPYYEEVPLKEYDTKQSKWKTSPATMIRKVALVHALREAFP TNIQGMYDADETPYAADYEGSFREMDDPTPAPSMRGRIAPAPVADPLEDLEADVIEAGD VE UPI000B94B1D1 (SEQ ID NO:255) ADLTKTANGADLAAAIGGKQAETGRATAFDLVKSMEAEFAKALPRHVPVEQFMRTAV TELRQNADLQRSTSESLLGAFLTAARLGLEVGGPMGEFYLTPRFAKLPGQDQKAWQVV PIVGYRGLVKLARNAGVGAVKAWVVYEGDHFVEGANSERGPFFDFHPVPGDPAGRKE VGVLAVARLSGGDVQHTYLTIEQVEKRKARGSAGDKGPWATDRAAMIRKSGIRALAGE LPQSTLLALARVVDEEVQTYVPGSLVDVGTGELEA UPI000BD04ECE (SEQ ID NO:256) NTELETMNNVYDNLQSVIMQQGIAALLPAQVTPEQFTRTAATALIENVDLQNADKQSLV LALTRCAKDGLMPDGREAALVVRSTKVNKQFVKKAVYMPMVDGVIKRARQSGQVANI IAKVVYSQDEFEYVIDENGEHLTHRPAFVDGDDIVKVYAFAKLNSGELVVEVMSRAGV EKIRDTVQSAKYDSSPWVKWFDRMALKTVIHRLARRLPCASELFSLFEVYEDANSTEKT LRMAPASFKRLSIN WP_032686941.1 RecT [Raoultella planticola] (SEQ ID NO:257) TKQPPIAKADLQKTQGTRVSSPKGNNDVISFINQPSMKEQLAAALPRHMTAERMIRIATT EIRKVPALASCDTMSFVSAIVQCSQLGLEPGGALGHAYLLPFGNKNEKSGKKNVQLIIGY RGMIDLARRSGQIASLSARVVREGDEFSYEFGLEEKLTHRPGENEDAPVTHVYAVARLK STDU2-42312.601 (S22-113) DGGTQFEVLTSKQIELVRSQSKAANSGPWVTHWEEMAKKTAIRRLFKYLPVSIEIQRAV SIDEKEALTIDPADTSVLTGEYSVINSESEE
WP_069728515.1 RecT [Pantoea brenneri] (SEQ ID NO:258) SNQPPIASADLQKTQQSKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMIRI VTTEIRKTPQLAQCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQLII GYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLVHRPGENEDAPITHVYAVAR LKDGGTQFEVMTVKQVEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSIEM QKAVVLDEKAESDVDQDNASVLSAEYSVLESGDEATN WP_045958294.1 RecT [Xenorhabdus poinarii] (SEQ ID NO:259) TNTPPLAQADLQKAQPQTKVAATKDQALIQFINKPSMKAQLAAALPRHMAPDRMIRIVT TEIRKTPALANCDMQSFVGAVVQCSQLGLEPGSALGHAYLLPFGNGKSKTGQSNVQLII GYRGMIDLARRSGQIVSISARTVRDGDQFHYEYGLNENLTHIPGENEDAPITHVYAVARL QDGGVQFEVMTRKQVEKVREKSSAGNNGPWVTHWEEMAKKTVIRRLFKYLPVSIEMQ KAVILDEKADANIDQDNAAIFEGEFEEVGNDG WP_102086779.1 RecT [Proteus mirabilis] (SEQ ID NO:260) SNPPLAQADLQKTQGTEVKTKTKDQQLIHFINQPSMKTQLAAALPRHMTPDRMIRIVTT EIRKTPALANCDMQSFIGAVVQCSQLGLEPGNALGHAYLLPFGNGKAKSGQSNVQLIIG YRGMIDLARRSNQIISISARTVRQGDSFHFEYGLNENLTHVPGENEDSPITHVYAVARLK DGGVQFEVMTHNQIEKVRASSKAGQNGPWVSHWEEMAKKTVIRRLFKYLPVSIEMQK AVILDEKAEANVDQENASVFEGEFEEVGSNGN WP_109615067.1 RecT [Edwardsiella piscicida] (SEQ ID NO:261) TNNQQPPIATADLQKAQSQAPAVKPDQKLINFINQPSMKGQIAAALPRHMAPDRMIRIIT TEIRKTPALATCDMQSFIGSVVQCSQLGLEPGGALGHAYLLPFGNGKAKSGQSNVQLIIG YRGMIDLARRSGQIVSISARTVRDGDQFHYEYGLDETLKHVPGDNESSPITHVYAVAKL KDGGVQFEVMTFNQIEKVRGQSKAGNNGPWQTHWEEMAKKTVIRRLFKYLPVSIEMQ KAVILDEKAEANIDQENASVISAEFSVVED WP_124537594.1 RecT [Morganella morganii] (SEQ ID NO:262) SNPPIAQADLQKAQGTAVKEKTKDQQLIQFINQPGMKAQLAAALPRHITPDRMIRIVTTE IRKTPSLATCDMQSFIGAVVQCSQLGLEPGNALGHAYLLPFGNGKAASGQSNVQLIIGYR GMIDLARRSGQIISISARTVREGDSFHFEYGLNEDLTHVPGENDSGPITHVYAVARLKEG GVQFEVMSFSQIEKVRDSSKAGKNGPWVSHWEEMAKKTVIRRLFKYLPVSIEMQRAVIL DEKAEANVDQEHASIFEGEYETVSPE WP_006657622.1 RecT [Providencia alcalifaciens] (SEQ ID NO:263) STPPLAKSDLQKTQGTEVKIKTNEQKLVEFINQPGMKAQLAAALPKHITSDRMIRIVSTEI RKTPSLANCDIQSFIGAVVQCSQLGLEPGNALGHAYLLPFGNGKSDNGQQNVQLIIGYR GMIDLARRSGQIISISARTVRQGDNFHFEYGLNENLTHIPEGNEDSPITHVYAVARLKDG STDU2-42312.601 (S22-113) GVQFEVMTYNQIEKVRNLSKAGKNGPWVTHWEEMAKKTVIRRLFKYLPVSIEMQKAVI 
LDEKAEANIEQEHSAIFEAEFEEVDSNGN WP_109401438.1 RecT [Proteus terrae] (SEQ ID NO:264) SNPPLAQADLQKTQGTEVREKTKDQMLVEFINKPNMKAQLAAALPRHMAPDRMIRIVT TEIRKTPELANCDMQSFVGAVVQCSQLGLEPGNALGHAYILPFEKKRKQGNQWVTVRT DAQLIIGYRGMIDLARRSGQIVSISARTVRQGDSFHFEYGLNENLTHVPGENEDSPITHVY AVARLKDGGVQFEVMTHNQIEKVRTSSKAGQNGPWVSHWEEMAKKTVIRRLFKYLPV SIEMQKAVILDEKAEANVDQENSSVFEGEFEEVGQGA WP_115149784.1 RecT [Plesiomonas shigelloides] (SEQ ID NO:265) SNQRPPIATADLQKAQSQPPAAKPEQNLINFINQPSMKSQIAAALPRHMAPERMIRIITTEI RKTPKLATCDVQSFIGAVVQCSQLGLEPGGGLGHAYLLPFGNGKAESGKPNVQLIIGYR GMIDLARRSGQIVSISSRIVREGDQFHYEYGLNETLKHVPGDNESAPITHVYAVAKLKDG GTQFEVMSFNEIEKIRGQSKAGNDGPWIKHWEEMAKKTVIRRLFKYLPVSIEMQRAVIL DEKAEADIEQDNASIIGAEYSVVENAA WP_034910107.1 RecT [Gilliamella apicola] (SEQ ID NO:266) SEQNQPPIAKSDLEKTQLTNQDKKPATLAELVNSPKIKNQLAMALPKHMNPDRMARIVT TEIRKTPALADSNIQSFLGAVVQCSQLGLEPGGALGHAYLLPFGNGKAKDGKSNVQLIIG YRGMIDLARRSGQIISISARTVREGDDFHYEYGLNEDLKHTPKADESAPITYVYAVARLK DGGSQFEVMTFNQIESVRKQSKAGDKGPWITHWEEMAKKTVIRRLFKYLPVSIEIQQAVI LDEKAEAGISQDNEMILDADFSVVEA WP_016979878.1 RecT [Pseudomonas fluorescens] (SEQ ID NO:267) NSTAETATPFSSQDLEKTQPTKAQSKTGSLASLLASPKMKSQFAAALPKHMTPERMARI VTTEIRKNPELVKCEQHSFLGAVIQCAQLGLEPGNTLGHAYILPYGKQAQLIIGYRGMID LARRSGQIISISARTVREGDYFEYEFGLDENLIHRPVETTQPGAVTHVYAVARLKDGGRQ FEVMSRAQIEEVRVQSKAAKSGPWVTHWEEMAKKTVIRRVFKYLPVSVEIQRAVMLDE KAEAGVCQENECVFDGDFEVITDTEE WP_080977968.1 RecT [Pseudomonas stutzeri] (SEQ ID NO:268) STENVAPFSQKDMQQATGQQVKPRSPADSLAAMLASPKMKAQFAAALPKHMTAERM ARIVTTEIRKTPALVKCDQHSFLGSVIQCAQLGLEPGNSLGHAYLLPYGNQVQLIIGYRG MIDLARRSGQIVSLSARTVREHDEFDYQLGLHEDLTHKPFEGEHAGEITHVYAVARLQG GGVQFEVMSKAQVEAVRAQSKAGKSGPWVSHWEEMAKKTVIRRLFKYLPVSVEIQRA VTLDEAAEAGLPQGNEYVFDGDFEVVNDASGAQQ KXJ39364.1 AXA67_02205 [Methylothermaceae bacteria B42] (SEQ ID NO:269) ATSLKRAVTGDGKPATVQQLLTNPKIKSQIALALPQHLTPERLTRIVLTEIRRTPALAKCK PESLLAAVMQCAQLGLEPGGSLGHAWLMPFKNEVQFIIGYRGMIDLARRSGQVLSIEAR GVYESDTFHVSFGLEPDLTHQPDWDPADRGKLAFVYAVARLKDGGFQFDVMSRAEVE STDU2-42312.601 (S22-113) 
KIRAQSPAGKSGPWVTHFEEMAKKTVIRRLFKYLPVSVEMARAVGLDEAAERGEQSDA IDADCVIESEEEATPEEKGDSAA
WP_106478153.1 RecT [Halomonadaceae bacterium R4HLG17] (SEQ ID NO:270) SEVATQDTLGKELQQHSGQQKKPMPTTIQGMLKDPRFTSQIARALPKHITPDRITRIALTE VNKTPALGKCDPVTLFGSIIQSAQLGLELGGALGHAYLVPYGNQAQFIIGYRGMIDLARR SGQMVSLQAHTVHDNDEFDFEYGLDEKLRHVPARGDRGPMVAVYSVAKLVGGGHQIE VMWKEDVDAIRSKSKAGNSGPWRDHYEEMAKKTAIRRLFKYLPVSVEMQKAVALDEQ AEAGVQDNNVFDGEFSYGEAE WP_129141488.1 RecT [Halomonas coralii] (SEQ ID NO:271) TDQATAEPQEDLGKQLQQHSQRKPMPTTIQGMLKDDRFTGQIARALPKHITPDRISRIAL TEVNKTPALGKCDPMSLFGSIIQSAQLGLELGGALGHAYLVPYKDQAQFIIGYRGMIDLA RRSGQMVSLQAHTVHENDDFEFEYGLDEKLRHVPARGQRGPMIAVYAVAKLTGGGHQ IEVMWKEDVDAIRQQSKAGNSGPWRDHYEEMAKKTAIRRLFKYLPVSVEMQKAVSLD EQAEAGVQDNNVFDGEFSYQEPE WP_084261900.1 RecT [Zymobacter palmae] (SEQ ID NO:272) TNTVQQQAPQQDQLAQQLQQASGNTPQKKPMPSTIQGMLKDDRFKTQIARALPKHVTP ERIMRIALTEINKTPKLKECDPIGLFGSIVQSAQLGLELGGALGHAYLVPYGKQAQFIIGY RGMIDLARRSGQMVSLQAHTVHENDEFNFEYGLNENLRHVPARGERGPMIAVYAVAK LVGGGHQIEVMWKEDVDAVRKSSKAGGSGPWRDHYEEMAKKTAIRRLFKYLPVSVEM QRAVSLDEQAEEGVQDNNVFDGDYTVAEH WP_020007369.1 RecT [Salinicoccus albus] (SEQ ID NO:273) STNESLKNQVATNQKNEVSNGNKPKTIGDYIDQMAPAMAQALPKHMSVERMTRMATT VIRTTPQLKEADVASLLGAVMQSAQLGLEPGPMGHCYFLPFKNNKKGTTEVTFIIGYKG MIDLARRSGHISTIYAHAVYENDEFEYELGLHADLKHKPSEDERGAFKGAYAVAHFKD GGYQFEYMPKSDIDKRRSRSKAGNSNYSPWATDYEEMAKKTVIRHMWKYLPVSVEMQ QAVAHDEGTGKDIKDVTPDEDSFVDMPEYIADVPAEGEGE WP_131521405.1 RecT [unclassified Lysinibacillus] (SEQ ID NO:274) ATTTDLKAQMQQAPATQQKPKTIDDYLKQMAPAMAQALPKHMDVDRLMRLAMTTIR TTPALKDADVSSLLGAVMQAAQLGLEPGLMGHCYLLPFKNNKKGITEVQFIIGYKGMID LARRSGHIQSIYAHAVYQKDEFEYELGLDPKLKHKPCMDEDKGNFVGAYAVAHFKDG GYQFEFMSKAEIEKRKGRSKAANSTYSPWATDYEEMAKKTVVRHMWKYLPISVEMQQ QVAYDEGTAPKREMKDITPETEFFVDAPEIEVEVVNE WP_132769795.1 RecT [Tepidibacillus fermentans] (SEQ ID NO:275) ATNEKVKTQLANRANGQAPTPTPEQTIAAYMKKMAPRFAEVLPKHMDIDRMTRIALTTI RTNPKLLEASVPSLLGAIMQAAQLGLEPGLVGHCYLVPFKNGKTGQTEVQFIIGYKGMI DLARRSGNIESIYAHAVYENDTFEYEYGLHPKLVHKPAMTDRGEFIGAYAVAHFKDGG STDU2-42312.601 (S22-113) YQFEFMPKEEIEKRRNRSKTANGGPWVTDYEEMAKKTVVRHMWKYLPISIEIQQAAAQ 
DEVIRKDVTSEPEFVDDVIDISTEIEEQSVEVEGEEAQ
WP_120191052.1 RecT [Ammoniphilus oxalaticus] (SEQ ID NO:276) STKATSNELKNQLANRQGNNAATNNNPANTIAAYLKRMAPEIEKALPAHMDADRLARI ALTTIRTTPKLLECTIPSLMGAVMQSAQLGLEPGLIGHCYIIPYGKEATFIIGYKGMIDLAR RSGNIESIYAHAVYKNDEFEYEYGLKPNLVHKPAMSDQGDFIGAYAVAHFKDGGYQFE FMPKEEIDKRRNRSAASKGGPWVTDYEEMAKKTVVRHMWKYLPISIEIQQAATQDEVV RKDITEDPMPVDVLDIPFEASDAEETSEEGEINFD WP_066790810.1 RecT [Rummeliibacillus stabekisii] (SEQ ID NO:277) ATTTELKEQMKQQAPAQTKKPKTIEDYMKQMAPAMAEALPKHMSVDRLTRLAMTTIR TTPALRQADVSSLLGAVMQAAQLGLEPGLLGQCYLLPFKNKKKAITEVQFIIGYKGMID LARRSGHIQSIYSHAVFENDVFEYELGLEPKLKHTPTMSTDKGAFIGAYAVAHFKDGGH QFEFMSKADIEKRKGRSKAANSDYSPWLTDYEEMAKKTVIRHMWKYLPISVEMQEQV AYDEGVGRSIKDVTPEEDVFVQAPDEILEAEATEA WP_098408280.1 RecT [Bacillus] (multispecies) (SEQ ID NO:278) TQAEKLKNDIAKQEQKNEVAQDDKPKTILDVMMQHKESFEMALPKHLDADRLIRLAVT EFRKNPMLKECTPESLLGAVMQAAQVGLEPDALGSAYLVPYYNKNKNVKEVQLQIGY KGLIELVRRSGQVTSIVANEVYENDEFDFEYGINEKLYHKPTMDADRGKLKCFYAYARF KDGGHAFTVMSVEQINQIRDKFSKSQKNGKHFGPWADHYESMAKKTVIKQLVKYMPIS VEIQNQITRDETVHSSFKEEPKPIYAFEESPDIIDAPIEN WP_047150996.1 RecT [Aneurinibacillus tyrosinisolvens] (SEQ ID NO:279) SDLKEKLEKRANETEAAPPSPAQTIAAYLKRMEPEIARALPKHMDVERLTRIALTTIRTN PRLLECTVPSLLGAVMQAAQLGLEPGLLGQCYIIPHGREATFIIGYKGMIDLARRSGNIKS IYAHDVRENDEFEYEYGLHPFLKHRPAMTDRGKFIGVYAVAHFNDGGYQFEFMPYEEIE RRKLRSRSYKNGPWVTDYEEMAKKTVIRHMFKYLPLSVEIMRSAAQDETVRPDLTSDP VSIYERPIEGKIITAEDVQPEEIPNVPDAEQGDV WP_018705791.1 RecT [Siminovitchia fordii] (SEQ ID NO:280) ATNQDIKNQLANKANGNKPASPANTIAAYLKKMGPEIEKALPKHMDADRLARIALTTIR TTPKLLECNISSLMGAVMQSAQLGLEPGLIGHCYIIPYGKEATFIIGYKGMIDLARRSGQI QNIYAHAVFENDEFDYALGLHPKLEHKPAGSNRGEIIGAYAVAHFKDGGYQFEYMAKE DIEKRKSRSAAARSKHSPWATDYEEMAKKTVIRHMWKYLPISVEIQQQAIQDEVVRKD VTSEPEFIDMEDMPEVEEGQSEESEQVEAPFD WP_035430909.1 RecT [Bacillus sp. 
UNC322MFChir4.1] (SEQ ID NO:281) ATNKDVKNQLANRKENKPATPEQKVEAYMTAMAPRFAEVLPKHMSMDRMSRIALTTI RTNPKLLECSVPSLMGAVMQAVQLGLEPGLLGHCYILPYKSEATFIIGYKGMIDLARRSG HIQSIYAHAVYENDEFDYELGLHPKLTHKPSFGERGEFIGAYAVAHFKDGGHQMEFMPK SEIEKRRSRSASGNSSYSPWKSDYEEMAKKTVVRYMFKYLPISIEVQSQAQHDEVVRKD ITEEPEFIEMDSIEVAEASEGDGQKEFVIEE
RDC50983.1 RecT [Acinetobacter sp. RIT592] (SEQ ID NO:282) ADLKNKLANKAAGTVTKTSPNAGMKQLMKSMSKEIEAALPSHMSSERFQRVALTAFG NNPKLMNCDPMSFIAAMMDSAQLGLEPNTPLGQAYLIPYGTKVQFQVGYKGLLELALR SGKIKTLYAHEVRENDTFEVKYGLHQDLIHEPVLKGNRGEVIGYYAVYHLDTGGHSFVF MTKDEVLEHAKGKSKTFNNGPWQTDFDAMAKKTVIKQLLKYAPLSIEMQKAVSSDET VKSKIDEDMSLVVDESDSIEANFEIKEDEDGQLDVYVK WP_150051132.1 RecT [Methylomonas rhizoryzae] (SEQ ID NO:283) SELLSALNAPETQKPQTLPAMLKQHQPRFKAIAPRDVDVTRFSAALMADVRSNQKLAE CNPMTVLGAFIRSTQLGLEPGSQLGQAYFVPFKGECQLVIGYRGMIELAYRSGKVASISA RTVYENDVFEWELGTDERITHKPATGDRGALVAVYAMAKLTTGGIHFEVLDLAEIEKA KRASKSSSFGPWKDHFEEMAKKTAIRRLFKYLPVGTDLTRAVALDEKAESGSQQNDIEA ETVLDGEFYPAGGGNDG WP_097006457.1 [Lacrimispora amygdalina] (SEQ ID NO:284) AVDVKNELERKASGQNSQVKLTKSMTIADMVKALEPEIKRALPAVLTPERFTRMALSAI NNTPELAGCTPMSFIAAMMNAAQLGLEPNTPLGQAYLIPYKNKGVLECQFQIGYKGMID LAYRTGQIQVIQGQAVREFDYFEYQYGLDPKLVHRPGEEERGEITFIYGLFRLSNGGYGF EVSNKADMDAFAAKYSKSFGSKYSPWTENYEDMAKKTVIKRALKYAPVSVDFQKAMS MDETIKTEISVDMSEIRNECPEISENGEAA WP_087225255.1 RecT [Lachnoclostridium sp. An14] (SEQ ID NO:285) TDVKQELERKVGKQDSTAVRLTKNMSIPDMIKALEPEIRRALPAVLTPERFLRMALSAV NNTPKLAECTPMSFIAAMMNAAQLGLEPNTPLGQAYLIPYKNKNQLECQFQIGYKGIIDL AYRTGQMQMIQAHAVHEFDDFEYEYGLNPKLIHRPGDGNRGEITYFYGLFKLVNGGFG FEVMNREAMEAFAQQYSQSYGSQYSPWVKNFEDMAKKTVIKKALKYGPVKAEFQKAI SMDETIKTEIAVDMTEVQNEESE WP_002566991.1 RecT [Enterocloster bolteae] (SEQ ID NO:286) GVNVKHELEQRAAGQGASVRLTKNMTIVDMVKALEPEIRRALPAVLTPERFTRMALSSI NNTPELAECTPMSFIAALLNAAQLGLEPNTPLGQAYLIPYKNKGKLECQFQLGYKGLIDL AYRTGQVQIIQAQVVREFDSFEYQYGLDSKLVHKPGEGARGEITYVYGLFKLSNGGYGF EVSNKTEMDTFAARYSKSFGSKYSPWTEDYESMAKKTVIKRVLKYAPISSDFQKALSM DETIKTGIAVDMSEIRNECLPEEAGSEAA WP_132412730.1 RecT [Kribbella albertanoniae] (SEQ ID NO:287) ATADSVREELARSKEVERTQPKASNADNVIGLINRSLPEIAKALPGHVKPERIARIATTAV RVTPKLADCTQASFLGALLTAAQLGLEPNTPTGEAYLLPFGRNVQLIIGYRGYIKLANQS GQVRNIMAMTVYENDHFDYKYGSNPFLEHTPTLGQDPGPVKCWYACATFTNGGTNFV STDU2-42312.601 (S22-113) VLDKFKVEGYRARARSKDDGPWVTDYDAMARKTCIRRLAPYLPMSVELAQAMQVDE EVTAFTPGVSDPEVLATLAGVDTGTGEVQQ
WP_130067396.1 RecT [Bacillus albus] (SEQ ID NO:288) ATNEKLKNQLANRKESAPATPEQTVEAYMKKMAPKMAEVLPKHMDMGRMSRMALTT MRTSPKLLNCTVSSLMGAVMQAVQLGLEPGLLGHCYILPYKGEATFIIGYKGMIDLARR SGHIQSIYAHAVHENDEFDYELGLHPKLEHKPVHGDRGAFVGAYAVAHFKDGGYQME FMPKSEIEKRRKRSASANSSFSPWKSDYEEMAKKTVIRYIFKYLPISIEVQLLAAQDEVVR KDITEEPEFIEADPIDVEQPTTEGDGQQEFSIEE WP_087099033.1 RecT [Bacillus cytotoxicus] (SEQ ID NO:289) ATNEKIKNQLANRKANASLSPEQTVEAYMKKMAPRFAEVLPKHMDMDRMSRIALTTIR TNPKLLECNVPSLMGAVMQAVQLGLEPGLLGHCYILPYKGEATFIIGYKGMIDLARRSG HIQSIYAHAVYENDEFEYELGLNPQLKHKPSFGDRGEFIGAYAVAHFKDGGHQMEFMP KSEIEKRRKRSASANSNYSPWKSDYEEMAKKTVVRYMFKYLPISIEVQSQAQHDEVVR KDITEEPQFIEADSVEVEETPTEGTNQEEFVIEE WP_149216302.1 RecT [Bacillus sp. JAS24-2] (SEQ ID NO:290) ATNKDVKNQLANRKASAPVTTEQTVEAYMKKMGPKMAEVLPKHMDMDRMSRIALTT IRTNPKLLECSVPSLMGAVMSAVQLGLEPGLLGHCYILPYKSEATFIIGYKGMIDLARRS GHIQSIYAHAVYENDEFEYELGLHPQLKHKPSFGDRGEFIGAYAVAHFKDGGHQMEFM PKSEIEKRRGRSASANSNYSPWKTDYEEMAKKTVVRYMFKYLPISIEVQSQAQQDEVVR KDITEEPEFIEVEQQTEGDGQGDFVIEGE WP_125141636.1 RecT [Clostridium transplantifaecale] (SEQ ID NO:291) TDVKEELARKAGNTGKQEIRLNKNMSIPDMVKVLEPEIKRALPSVLTPERFTRMALSAIN NTPKLAECSPMSFIAALMNAAQLGLEPNTPLGQAYLIPYKNKNQLECQFQIGYKGYIDL AYRTGQVQMIQAQAVHEFDYFEYEYGLTPKLVHRPGEGERGEITYFYGLFKMINGGFGF EVMNRAAMDAFAKQYSQSINSKYSPWNSQYEEMAKKTIIKKALKYGPVKSDFQKAISM DESIKTELSIDMSEVRNEDLIDGEFEEAA WP_120055566.1 RecT [Lachnoclostridium pacaense] (SEQ ID NO:292) TDVKQELEKRAGSSNQAIKLTKSMTIVDMVKALEPEIKRALPAVLTPERFTRMALSAINS TPKLAECTPMSFIAALMNAAQLGLEPNTPLGQAYLLPYKNKGVLECQFQIGYKGVIDLA YRTGQIQMIQAQAVRESDYFEYQYGLEPKLVHRPGDGARGEVTFIYGMFRLTNGGYGF EVSNKADMDAFAEKYSKSYGSRYSPWTENYEDMAKKTVIKRALKYAPISSDLQKALSS DETIKTVLSVDMSEINNECQIDEVIQEDAA WP_118246619.1 RecT [Clostridium sp. 
AM58-1XD] (SEQ ID NO:293) SVDVKNELEKRAAGTVNPAVKLTKNMTIVDMVRALEPEIKRALPTILTPERFMRMALSA INNTPELADCTPMSFIAALMNAAQLGMEPNTPLGQAYLIPYKNKGTLECQFQIGYKGLID LAYRTGLIQVIQAQTVREFDSFEYQYGLDSRLTHRPGDGERGEITYIYGLFKLTNGGYGF EVSNKADMDAFAEKYSKSFGSRFSPWKENYEDMAKKTVIKRALKYAPVSSDFQKALS MDETIKSELSIDMSEIRNECQVEASGQEGAA
WP_025114396.1 RecT [Lysinibacillus fusiformis] (SEQ ID NO:294) ATTNELKAKSQNQVQQNVTPEQSLNTLLKRMGPQIQRALPKHMDADRIARIALTAVRA TPKLLECDQMSFVAALMQSAQLGVEPNTGLGQAYLIPYGKQVQFQLGYKGLIDLAVRS GQYKAIYAHEVYKEDEFSFAYGLHKDLVHVPSTNPEGEPIGYYAVYHLKNGGYDFVYW TRERIDKHAHEFSQAVKKGWTSPWKTNYDAMAKKTVLKEVLKYAPKSIELQKVVEAD ETIKTEVSEDMSDVIDVTDYSVIEDESAQEELIIEQ WP_083048409.1 RecT [Marispirochaeta aestuarii] (SEQ ID NO:295) RTDGTKEAGAAATAPTEGKAPAKAHKPADTIGAMIEKLKPQIERALPKHVTPDRMARM ALTAIRNNPKLGQAEAVSLMGSIIQASQLGLEPNTPLGQCYIIPYNSKNGMQAQFQMGY KGIVDLAHRSGQYRQLTAHPVDEADEFRYSYGLNPDLVHVPAEKPSGKITHYYAVYHL TNGGFDFRVWSREKVEAHAKQYSKSFSSGPWQTNFDQMACKTVMIDLLRYAPKSVEIA KATSADNRTHTINPEDPDLNIDTIDGDFELEGEER WP_099424140.1 RecT [Solibacillus sp. R5-41] (SEQ ID NO:296) ATSNELKKQAQGQVTAKPTTPEGSLNALLKKMGPEIQRALPKHMDADRIARIALTAVRT TPKLLECDQLSFVAALMQSAQLGVEPNTGLGQAYLIPYGGKVQFQLGYKGLIDLAVRSG QYKAIYAHEVYADDEFSFAYGLHKDLVHVPSANPSGDPIGYYAVYHLKNGGYDFVYW TRERIDIHSKAFSQAVQKGWTSPWKTNYDAMAKKTVLKEVLKYAPKSIEMQKVVEADE TIKNEVAPDMSNVIDVTDYSILEDPQDVTDAQ WP_076065282.1 RecT [Viridibacillus sp. 
FSL H8-0123] (SEQ ID NO:297) ATNNALKEQMKQAPSKEVKPEQSLNTLLKRMGPEIQRALPKHMDADRIARIALTAVRN TPKLLDCDQMSFVAALMQSAQLGVEPNTGLGQAYLIPYGKQVQFQLGYKGLIDLAVRS GQYKAIYAHEVYEDDEFSFAYGLHKDLVHVPAPNPTGEPIGYYAVYHLQNGGYDFVY WTRERIDQHAHKFSMAVQKGWTSPWKTNFDAMAKKTVLKEVLKYAPKSIEMQKVVD ADETVKTDVSDDMSNVIDVTDYTVMDQEQETIQEPTK WP_024292388.1 RecT [Lacrimispora indolis] (SEQ ID NO:298) SDVKQELEKRAAGGGGQSQSVRLTKNMTIVDMVKALEPEIKRALPSILTPERFTRMALS AINNTPKLGECTPMSFIAALMNAAQLGLEPNTPLGQAYLIPYKNKGVLECQFQLGYRGLI DLAYRNERMQSVEAQVVYENDEFSYELGLHPSLIHRPSFDEPGEIRAFYAIFRLDNGGFR FEVMSKSYVDAYAARYSKAFTSDFSPWKSNYEGMAKKTVIKQLLKYAPMKSEFQKAV TMDETIKTELSVDMSEVSNQEVIDRELTEQVA WP_009524931.1 RecT [Peptoanaerobacter stomatis] (SEQ ID NO:299) GAKELIQKKQENKQISPTSNMNMLLQSMAGAIKKALPAQINSERFQRVALTAFSSNQKL QQCDPISFLAAMMQSAQLGLEPNTPLGQAYLIPYGKQVQFQVGYKGLLELAQRSGQFK SIYSHEVRENDEFEMEYGLNQKLVHKPNLKQERGEVIGYYACYHLTNGGESMFFMTKD STDU2-42312.601 (S22-113) EIINFGKSKSKTFNNGPWQTDFDAMAKKTVLKQLLKYAPLSIESQKFMSMDETVKSDIS ANMDEINNDTVDFEVDIQTGEVINDIVVENTNEDEAN WP_015358111.1 RecT [Thermoclostridium stercorarium] (SEQ ID NO:300) TTVNQTELKNKLAEKAKTPAKTGNTVFDLIRKMEPEIKRALPKQISPERFARIAMTAVRN TPKLQACEPISFIAALMQSAQLGLEPNTPLGQAYLIPYGKEVQFQLGYQGMLTLAYRTGE YQSIYAMPVYANDEFEYEYGLNEKLVHKPAPDPEGEPIYYYAVYKLKNGGHGFVVMSR QQIERHRDKYSPSAKQGKFSPWNTDFDSMAKKTVLKQLLKYAPKSVEFATQIAQDETIK TEIAEDMTEVQGIEVEYEATDDQENQENQEQED WP_002595146.1 RecT [Enterocloster clostridioformis] (SEQ ID NO:301) GIDVKHELEKRAAGQDKPVKLTRNMTIADMVKALEPEIKRALPAILTPERFTRMALSAV NNTPELANCTPMSFIAALMNAAQLGLEPNTPLGQAYLIPYKNKGTLECQFQLGYKGLID LAYRTGQIQIIQAQAVREFDYFEYQYGLDSRLVHKPGNEERGQITFIYGLFKLSNGGYGF EVSNKAEMDAFAAKYSKSFGSKYSPWTEDYESMAKKTVIKRALKYAPVSSDFQKALSL DETVKSEIAVDMSEIRNDCIPADMGTEAA WP_100306418.1 RecT [Lacrimispora celerecrescens] (SEQ ID NO:302) SDVKQELEKRAAGGGSQSQSVKLTKNMTIVDMVKALEPEIKRALPSILTPERFTRMALS AINNTPKLGECTPMSFIAALMNAAQLGLEPNTPLGQAYLIPYKNKGVLECQFQLGYRGLI DLAYRNDRMQSIEAQVVYENDEFSYELGLHPSLTHRPSFDEPGEIRAFYAIFRLDNGGFR FEVMSKSYVDAYATKYSKAFTSDFSPWKNNYEGMAKKTVIKQLLKYAPIKSDFQKAIT 
LDETVKTQLSIDMSEIRNECLPDTSENSEVA WP_071062796.1 RecT [Andreesenia angusta] (SEQ ID NO:303) SNLKNQLANKAGGTATKKQPQTMQDWIKVMEPQIKKALPSVITAERFTRMALTAISTNP KLAECTPESFMGALMNAAQLGLEPNTPLGQAYLIPYGKSVQFQVGYKGLMELAQRSGQ FKSIYAHTVYENDEFEVEYGLTQNIVHKPNFDDRGKPIGFYAVYKLTNGGENFVFMTQR EVEEFGKAKSKTFNNGPWKTDFEAMAKKTVLKQLLKYAPIKVEFQREIAQDATIKTEIA EDMTEVPEEMVEAEYEVVEQNTMAEDADLKGTPFETK SFO83314.1 RecT [Amycolatopsis arida] (SEQ ID NO:304) HGTALNPERFTRVALTVIRQSADLQRCRPESLLGALMTSAQLGLEPGPLGEAYLVPYGD QVTFIPGYRGLIKLAWQSGQLRHISARVVHEGDRFSYSYGLHPDLIHQPTRGDRGPITDV YAAATLIDGGVEFEVLDVATVETIRARSRAGRKGPWVTDWEAMARKTAIRQLAKWLP MATVMSRAIAAEGTVRTDLDADALDDLTADPGPEVLDADPAWDGPEPPGDQARNQEP TTQGDA WP_110092637.1 RecT [Corynebacterium striatum] (SEQ ID NO:305) GTNLEQRMAANNAPAKQNRPVTLADQIRSMESQFQLAMPKGMEAQQLVRDALTCLRQ TPKLAECTPQSVLGGLMTCSQLGLRPGVLGHAYLLPFWDRKQGGMVAQLVVGYRGLV ELAHRSGQIQSLIARTVYENDHFDVDYGLDDKLVHKPCMNGPKGNPIAYYAVAKFTTG GHSFIVMSKDEMLAYRDEFAKAKNKQGEVFGPWADNFDAMAHKTCVRQLAKWMPSS TDLDRGIAADETVRVDLSESALDYPQHVDGEVVDSKPAEDEAA
WP_129692339.1 RecT [Gottfriedia acidiceleris] (SEQ ID NO:306) ATPAELKNLLAAKPKGEVKLTPDQQVSSYLKAYEGTFRQIAPKHFNTERFQRIALSEIRK NPKLLDCNLPSLMSAVLQSVKLGLEPGLFGQAYLIPYGKEVQFQIGYKGLIELAQRSGRI AKIQAREVYEHDEFEVSYGIDDTIIHKPKLDGDRGDVRLYYAVAWFKDGAAQFEIMSKS DVENHRDKFSKTKNYGPWKENFDAMARKTVLKKLVNQLPMDVEFHEAVQEDETVRK TINDEPEVIAAEYEIIDAPEVVEGNE WP_118016648.1 RecT [unclassified Coprococcus] (multispecies) (SEQ ID NO:307) ANNIDLKQELAEQASKVPAKKDEEVKLTKSMTIPDMVKAMMPEIKKALPAVMTPERFT RIALSALNTTPALNQCTPMSFLAALMNAAQLGLEPNTPLGQAYLIPYKNHGTLECQFQIG YKGLIELAYRSGQMQTIQAQTVYENDEFAYQYGLEPVLVHRPAYSDRGEVKYFYGIFKT VNGGYGMAVMSRAEMDLYAKTYSKAYDSSYSPWKSNYEDMAKKTVIKQALKYAPIK TDFQRALSFDETIKKEISLDMSTVKNELLDVA WP_051200279.1 RecT [Butyrivibrio sp. FCS006] (SEQ ID NO:308) PYLFGGQMKEQEIKNQLAAKAVETTNPKLSKNMNIADLIKAIEPEIKKALPTVITPERFTR IALSALNTTPKLAECSQMSFLAALMNAAQLGLEVNSPLGQAYLIPYNNKGKLECQFQIG YKGMLGLAYRNPEIQTIQAQVVYENDDFKYELGLDSKLYHKPSLSDRGKVRCYYALYK LRNGGYGFEVMSRRDVEEYAKRYSKVTDSLYSPWANNFDSMAKKTVIKQLLKYAPLR TDLEKAMSMDESIKTRVSVDMSEVENEETFDAEVEV WP_107514794.1 RecT [Staphylococcus equorum] (SEQ ID NO:309) ATNETLKQKVVERKPNGVKEQSPKTQLNHLLKKMAPEIQRALPKHMDSDRMARIAMT AVSNTPKLLECDQMSFIAALMQASQLGVEPNTGLGQAYLIPYAGKVQFQLSYKGLIDLA TRSGQYKSIYAHEVYTNDEFEYRYGLFKDLIHIPSQEPEGNPIGYYAVYHLKNGGYDFV YWTRERVDKHAKEFSQAVQKGWTSPWITNYDAMAKKTVLKEVLKYAPKSIEMNKAV ENDSTIKEEIDKDMSTVIDVTDYSEVEEQESLETGGQTSK WP_117624242.1 RecT [Hungatella hathewayi] (SEQ ID NO:310) RRDRNVTAVKQELEKKAAGTSQAVKLTKNMTIVDMVKALEPEIKRALPSILTPERFTRM ALSAINNTPKLAECTPMSFIAALMNAAQLGLEPNTPLGQAYLIPYKNKGVLECQFQLGY RGMIDLAYRNERMQSIEAQTVYEHDEFFYELGLHPALVHRPTFEDRGEIRAFYAIFRLDN GGYRFEVMSKSYVDAYAMRYSKAFTSEFSPWKSNYEGMAKKTVIKQLLKYAPVKSEF QKAITLDETVKTELSVDMSEVQNEDLSETLTAESAA WP_118771779.1 RecT [Roseburia intestinalis] (SEQ ID NO:311) GDIRSELAKKAEQTQGNTKLTKSMSIADLIKAMEPEIQKALPSVITPERFTRMALSALNTT PKLQECTPMSFLAALMNAAQLGLEPNTPLGQAYLIPYKNKNVLECQFQLGYRGMIDLA YRNGHMQSIEAQAVYENDVFSYALGLHPELVHKPTLEEKGALKAFYAIFRLDNGGFRFE STDU2-42312.601 (S22-113) 
VMGKTYIDWYANRYSKAFTSEFSPWKSNYEGMAKKTVIKQLLKYAPLKTEFQRALSTD ETIKNSLNVDMGEVLSEDIIDMPCEEVA
WP_107378794.1 RecT [Staphylococcus chromogenes] (SEQ ID NO:312) ANAKEFKKQMNSKNEVAETNNAPQKAKGPRQQVSDLLDRMAPEIQKALPNNMSAERM ARIAMTAVSSNPKLLECDPKSFIGALMQASQIGLEPNTALGQAYLIPYGNQVQLQLSYLG LIELATRTGQYKAIYAHEVYKDDEFSYEYGLYKNLIHKPVDDPNGEPIGYYAVYHLMNG GYDFAYWTRKKVEAHAQQYSKAVQQGWNSPWKSDFNAMAKKTVLKDLLKYAPKAIE VSQAIGSDSKVSEINDEGEIIDVTDYSQEEEK WP_094369469.1 RecT [Romboutsia weinsteinii] (SEQ ID NO:313) TNLKNTLKNKEAKGNNLAINPSYAMKQLMIKMKGEITSALPKELCSERFQRVALTAFNS NPKLQNCAPMTFIAAMMQSDQLGLEPNTPLGQAYLIPYKVKGIDKVQFQIGYKGLLELA HRSGRLKTLYAHEVRENDEFDIDYGLEQRLIHKPLLKGNRGEVIGYYAVYHLEHNGYSF VFMTYDEVLEHGKKYSKSFEGGIWEKEFDSMAKKTVIKKLLKYAPLSIEIQKAINFDESV KGSIDSDMLLVDKADESIDVEGNVLNQRGIKYGCI CDF42377.1 [Roseburia sp. CAG:182] (SEQ ID NO:314) DVKEELAKMAEEKPTKKLTKSMSIQDMIKVIEPEIKKALPSVLTPERFTRMALSAINNTP KLAECSQISFLAALMNAAQLGLEPNTPLGQAYLIPFQNKGKLECQFQIGYKGIIELVYRN PLIQTIQAQVVYENDEFEYELGLNSRLFHRPALYDRGETVLFYALFKMSNGGYGFEVLS KQDMDAYAKRYSKGISSEYSPWKSNYEEMAKKTMIKKVLKYAPIRTDFQKAVSMDESI KKELSVDMSEVSNENIIDMEEITQEEE WP_123609006.1 RecT [Mobilisporobacter senegalensis] (SEQ ID NO:315) KDIKSALEKKVDKQDVKLTKSMSITDMIKALEPEIKKALPSVITPERFTRMALSAVNNTP KLAECSQMSFLAALMNAAQLGLEPNTSLGQAYLIPYQNKGKLECQFQLGFKGMIDLVY RNEKVQTIQAHCVYEEDYFEYELGLDSKLAHKPALANRGKMILVYAFFKLENGGFGFE VMSKEDIDIHALKYSKGYSSQYSPWKSNYEDMAKKTVIKKVLKYAPLKIDFQRAISVDE TVKAEISIDMSEVQNEEIIDGQCTDVGEIEEK WP_115856892.1 RecT [Staphylococcus felis] (SEQ ID NO:316) ANANSFKEQVSNKNEVSENNNTPQQKTKGPRQQVSDLLERMAPEIQKALPSHMSAERM ARIAMTAISSNTQLLECNPRSLIGALLQASQIGLEPNTALGQAYLIPYYNRNKGEFEAQLQ LSYLGLIELATRTGQYKAIYAHEVYKEDEFYYEYGLHKNLVHKPVDDPKSEPIGYYAVY HLQNGGYDFSFWTRNKVELHSGQYSKAVQKGWNSPWKTDFNAMAKKTVLKDLLKYA PKSVEVSRAVGTDSKVSEISQNGEIIDVTDYSKEEE WP_108404827.1 RecT [Corynebacterium liangguodongii] (SEQ ID NO:317) KDLETRMAANQQPAQQRPTTLADQIRGMEQQFALAMPKGAEASQLVRDALTALRQAP KLAQCTPQSVLGSLMTCAQLGLRPGVLGHAYLIPFYDRRAGGLVAQLVIGYQGLVELA HRSGQIKSLIARTVYENDVFDVDYGLEDKLVHKPYMGGDKGQPIAYYAVAKFTTGGHA STDU2-42312.601 (S22-113) FYVMSHPEMLDYRARFAKSAERGPWVDNFEAMALKTCVRQLSKWMPKSTELATAIAA 
DESVRVDLTPDAINYPEHVDGEVVDAQGTTEDTAGEGEQSA
WP_021747387.1 RecT [unclassified Oscillibacter] (multispecies) (SEQ ID NO:318) KEGLIQGTQSAQAAKKGPATMQDYIKKMQGEIAKALPSVLTPERFTRITLSALSTNPKLA QTTPKSFLGAMMTAAQLGMEPNTPLGQAYLIPFKNHGVLECQFQLGYKGLIDLAYRSG EVSTIQAQTVYENDEFEYELGLEPKLHHVPAKGERGEPVYFYAVFRTKDGGYGFEVMS VDDVRTHAKKYSKAYSNGPWQTNFEEMAKKTVLKKALKYAPLKTEFMRGLTSDETIK TEISEDMYSVPDETVIEAEGYEVDGDTGEVIERPADGQ WP_103110615.1 RecT [Brevibacillus reuszeri] (SEQ ID NO:319) SNKLAQRAGQQTQPVKPDQQISALLKRMEPEIARALPKHLTSDRLARIAMTSIRQNPKLL ACDQMSLLAGVMQSAQLGLEPNTPLGEAYLIPYGKEAQFQVGYKGIISLAHRTGEYQAI YAHEVFKNDEFSYSYGLDKTLNHKPADEPEGDPIYFYAVYRLKNGGFDFVVWSTKKID AHAKKYSQAYQKGWTTPWKTDFVAMAKKTVLKEVLKYAPKSAEMAKALVMDETVK NEISEDMSEVPGMVIDIEADAANVEETAGGGASE WP_016998679.1 RecT [Mammaliicoccus] (SEQ ID NO:320) ATNESIKNQVASRKKNEVQNKSPKTQLNDLLIKMGPEIQRALPKHMDADRMARIAMTA VSTTPKLLECDQMSFIGALMQASQLGVEPNTGLGQAYLIPYGGKVQFQLSYKGLIDLAT RSGQYKAIYAHEVFPNDEFNYQYGLFKNLEHIPSQEPEGEPIGYYAVYHLKNGGYDFVY WTRERVDKHAKDFSQAVQKGWTSPWKTNYDAMAKKTVLKEVLKYAPKSIEMNKAVN SDSTIKDEINEDMSSVIDITDYEEVNDQQEEKKEESK WP_147540090.1 RecT [Clostridiaceae bacterium] (SEQ ID NO:321) SNLKKALKTNETKGNSVTVSKAYAMKQLMIKMKGEITSALPTNLSSERFEKVALTAFNS NPKLQKCDPRTFIAAMMQSAQLGLEPNTALGLAYLIPYEVKGINKVQFQIGYKGLLELA NRSGKLKTLYAHEVRENDEFDIDYGLEQKLIHKPLLKGNRGNVIGYYAVYHLEPSGYNF VFMTYDEVLEHGKKYSKSFEGGVWEKEFDSMAKKTVIKKLLKYAPLSIEMQKAIVFDE SVKGSIDSDMLLVDKEDESIEGSELN WP_019168122.1 RecT [Staphylococcus intermedius] (SEQ ID NO:322) ANANSFKEQVSKNEVQETNNEKPKGPRQQVSDLLERMAPEIQKALPSHMSAERMARIA MTAISSNPQLLECNPRSLIGALLQASQIGLEPNTALGQAYLIPYYNHKKKEFEAQLQLSYL GLIELATRTGQYKAIYAHEVYKEDEFYYEYGLHKNLVHKPVDDPNGEPVGYYAVYHLQ NGGFDFAYWTKNKIELHAGNYSKAVQKGWNSPWKTDFNAMAKKTVLKDLLKYAPKSI EISQAVGSDSKVTEINKQGEIIDITEYGQEALEG WP_148820236.1 RecT [Corynebacterium urealyticum] (SEQ ID NO:323) AKNLEARMQQSTNAPARADKPLSLPDQIRQMEDQFRLAMPKGAEATQLVRDALTCLR QTPQLAQCTPASVLGGLMTCAQLGLRPGVLGHAYLIPFNDRRSGNSVAQLVIGYQGLVE LAHRSGQIKALIARTVYENDHFDVDYGLEDKLVHKPHMGADKGNPVAYYAVVKFTTG STDU2-42312.601 (S22-113) 
GHAFYVMSHPEMLQYRDKNAKSPKRGPWVDNFEAMAHKTCVRQLAKWMPKSTEFSQ ALATDESIRLDVTPDAINYPDHPAEGEVIDGEVEQDGGQQ WP_096823857.1 RecT [Staphylococcus nepalensis] (SEQ ID NO:324) ATQNQFKNQLTQKKENNNQPQQKAVGPKQEISNLLDRMAPQIQKALPQHMSAERMARI AMTAVSSTPKLLECDPKSLIGALMQSSQIGLEPNTNLGQAYLIPYGKEVQLQVSYLGMIE LANRSKQYKAIYAHEVYPEDYFEYQYGLQKDLIHKPADNPQSEPIGYYAVYHLLNGGY DFVYWSKAKIDDHARQFSKAVQKGWQSPWKTNFNAMAKKTVLKDLLKFAPKSIEMN NAVSSDSKAQQIDDDGNIIDVTDYSQVNDEPEQLQEGQ WP_098170605.1 RecT [Bacillus sp. AFS017336] (SEQ ID NO:325) ATNESLKNQITNKKTGEVPLTPAQQVSSYLKAYEGTFQQIAPKHFNTERFQRIALSEIRKN PKLLECSVPSLMSAVLQSVKLGLEPGLFGQAYLIPYGKEVQFQIGYRGLIELSQRSGRILK IQAREVYENDEFEVSYGIDDNIIHKPALDVDRGKVRLYYAVAWFKDGGAQFELMSISDV EKHRDKFSKTAKFGPWKDHFDEMAKKTVLKKLVKQLPMDVEFQEAVQEDETVRKTIT DEPEILQAEFEIVDQPEISVE WP_087290962.1 RecT [Pseudoflavonifractor sp. An184] (SEQ ID NO:326) ATEKAIQRATGRAPALENRPALQQYIKQMSGEIKKALPSVMTPERFTRIVLSALSTNPKL AETTPQSFLGAMMTAAQLGLEPNTPLGQAYLLPYWNSKANAYECQFQLGYKGLLDLA YRSGEISVIQAHVVYSEDQFSYSFGLKPELKHIPAGEERGEPVYVYAIFHTKDGGYGFEV CSIDDIRAHAQRYSKSFQNGPWQTNFEEMAKKTVLKRVLKYAPLKSEFLRGLAQDETIK QEISEDMYMVEAAYAEPDVSSAEND WP_051264703.1 RecT [Nakamurella lactea] (SEQ ID NO:327) ASNLAARAAEQVEQQTAPNRPPTIKEQIGRMESQFALAMPRGSEAAQLVRDAITAINTN PQLAECTPASVLGALMTCAQLGLRPGVLGHAWVLPFRSKGVMQAQLVIGYQGLVELA HRTGQVASLIAREVHERDHFDVDYGLADSLIHKPLLNGDRGPVTGYYAIVKFKGGGHSF IYASKADVEAHRDKFSKMKSFGPWVDNFDSMALKTVVRMLAKWMPKSTEFANAISAD EGVRVDYSPTADVAQATEYVQPQLEEAPVEGVVVSEGGES CCZ61365.1 [Clostridium hathewayi CAG:224] (SEQ ID NO:328) ANDIRGELARRASGTETQAVKLTKNMSIPDMIKALEPEIKRALPTILTPERFTRIALSAINN TPKLAECSPMSFIAALMNAAQLGLEPNTPLGQAFLIPYKVKGSLECQFQIGYRGMIDLAY RNERVQSIEAHTVYENDVFEYELGLNPRLVHIPTMEEPGDPIAFYGIFRLDNGGFRFEVM NKNAIDAYAARYSKAYDSASSPWKNNYESMACKTVLKQLLKYSPMKSEFQKAVSMDE SVKTELSVDMSEVQNVNLIEETQEDAA WP_068720576.1 RecT [Veillonellaceae bacterium DNF00626] (SEQ ID NO:329) KTTGGLQQQQQQQAQALQNGGTTLKGYLQAMMPEIKKALPTVMTPERFTRIVMTTIST NPALQNCTPQSFLGAVMQAAQLGVEPNTPLGQAYLIPYGNQVQFQLGYKGLIDLAYRS 
GEVQSLQAHEVYQNDTFEYELGLNPKLKHIPALTNRGDVILYYAVIKFKNGGEGFEVMS KEDVEAFAKSKSKTYGRGPWQTDFDEMAKKTVLKKVLKYAPMKTDFIRAVATDETVK SSVAEQMADLPDETVTIDTEAQVVVDKETGEVKS
WP_037404193.1 RecT [Solobacterium moorei] (SEQ ID NO:330) TEIKAAKAPATVAKAGVSTQNKTIKDYITIMKPEIEKALPSTITPERFTRITLSAVSNNPKL QACSPSTFLSAMMQSAQLGLEPNTPLGQAYLIPYGNSCQFQLGYKGLLQLAYNSGQIKTI RTETVYENDEFKYELGLHSDLVHVPAMSNRGNPTAYYAVIEYTNGGYGFEVMSHDDVL EHAKKFSKTFNNGPWQSDFESMAKKTVLKQALKYAPLSTELVSKINTDETVKSSISDHM EEVKNDIDLSQIIDAETGEIHE WP_027347470.1 RecT [Helcococcus sueciensis] (SEQ ID NO:331) AQAKELLENKTNNTVKKSEKQTMENLLTLMADEIKKALPENVKSERFRRIALTAFNGN KDLQQCEPTSFLAAMMQSAQLGLEPNTPLGQAYLIPYNNSKKNIKEVQFQVGYKGMLD LAHRTNQYKNIQANIVYEKDEFDIEYGLNPKLKHIPNMKEDRGQAIGYYAVYNLINGGQ GFEYMTRAEVEKHAQKFSKTYRNGPWQTDFDEMAKKTVLKKVLKYAPMSTELQEATA IDERVVNEENIKSKNEDKFVDVDWSYVDDVEEDVIE WP_072526012.1 RecT [Clostridium sp. Marseille-P3244] (SEQ ID NO:332) AARATNSVKEELAKKAETKAVGEKKLTRSMSIADLIKAMAPEIKKALPEVITPERFTRM ALSALNTTPKLQECTQMSFLAALMNAAQLGLEPNTPLGQAYLIPFNNKGTMECQFQIGY KGLIDLGYRNPQMQIISAQAVYENDEFEYELGLNPKLEHRPALHDRGELRLFYGLFKLV NGGFGFEVMSKEAVDAYAKEYSKSFDSSFSPWKTNYEAMAKKTVIKQALKYAPIKADF RKALSTDETIKNEIAEDMSEIHGEDIFDAEYTEQTA WP_092453396.1 RecT [Clostridium fimetarium] (SEQ ID NO:333) ETIDIKQELASQAQTDSKKEVKLTKAMSIAEMIKAMMPEIKRALPSMITPERFTRIALSAL NNTPELQACTPMSFISALLNAAQLGLEINSPLGHAYLIPYKNKGVLECQFQIGYLGLIALA YRNELMQTIQAQCVYENDEFLYEYGLNPKLVHRPATSDRGEPVFFYGLFKMINSGFGFC VMSKQEMDEFARTYSKGLASSFSPWKTSYNEMAKKTVIKQALKYAPIKTDFQKALSTD ESIKYAISEDMTEAVNEIVSQNTEVA WP_027295741.1 RecT [Robinsoniella sp. KNHs210] (SEQ ID NO:334) TTRTGNIKEELAKKAEGTNGDTRLTKAMSIADLIKAMEPEIKKALPEVITPERFTRMALS ALNTTPKLRECTQISFLAAMMNAAQLGLEPNTPLGQAYLIPFNNKGTMECQFQIGYKGM IDLSYRNPQMQMISAQAVYENDEFKYELGLNPTLIHRPVLRGRGEVILFYGLFKLTNGGY GFEVMSKEEMDAYAKAYSKAIDSSFSPWKSNYNGMAKKTVIKQVLKYAPIKADFRKAL SSDETIKNEISENMSEIHGEIIFDTDYMEESA WP_117768035.1 RecT [Blautia sp. 
OF03-15BH] (SEQ ID NO:335) NVKEELAQKAEITQKEVKLKKSMSISDMIRALQPEIKKALPSVVTPERFIRMALSALNTT PKLAECSQISVLAALMNAAQLGLEPNTPMGQAYLIPFNNKGKMECQFQIGYKGLLELVY RNPAIQIIQAQTVYENDYFEYELGLNSRLIHRPELEDRGEIRLFYGLFKMVNGGYGFEVM SRQEMDQYAARYSKSFASGFSPWENNYEDMAKKTMIKRVLKYAPVKIETARALINDESI KLHLSEDMSEVENETVVDGQAEEKAA
SCJ42694.1 [Ruminococcus sp.] (SEQ ID NO:336) GTSIQKNVENNALQKEKMPTMQAYIKKMEGEIKKALPSVMTPERFTRITLSALSTNPKL AATTPGSFLGAMMTAAQLGLEPNTPLGQAYLIPYSNKGKLECQFQIGYKGLIDLAYRSG SISVIQAHTVYENDDFEYELGLDPKLKHIPSKSADKGNPAWFYAVFKTKDGGYGFEVMS IEDIRSHAAKYSQSYNSAYSPWKTNFEEMAKKTVLKKALKYAPLKSDFVRQISTDETIKT KLSDDMFSVPAETIEVEGIEVDTETGEITEVDHA WP_092724975.1 RecT [Romboutsia lituseburensis] (SEQ ID NO:337) SNLKNVLKNQEDKGQGITVNPTYAMKQLMIKMKNDIDLALPKNLSSERFQKVSMSAFN NNEKLQNCEPTTFIAAMMQSAQLGLEPNTPLGQVYLIPHNLNGVDKVQFQVGYKGLLQ LAHRSGKLKTLYAHEVKENDEFEIDYGLEQKLIHKPLLKGNRGDVIGYYAVYHLEPSGY SFEFMTYDEVAKHGKKYSKDFEGGIWEKDFDSMAKKTVIKKLLKYAPLSIEMQKAVAF DESVKSSIDSDMLLVESIGE KKZ74881.1 VO63_05385 [Streptomyces showdoensis] (SEQ ID NO:338) TSDARNAVARRAANVGQVEQAGEQPKPTMAQQIERMKPEIARALPKHMDADRIARIAL TLIRKNPDLANCTTESFLGALMTCSQLGFEPGSPTQEAYIIPRKGQAEFQLGYQGMVTLF YQHPMASSVKVETVRENDYFEHEEGLEERLIHRPFADGPRGKAIAYYSVARLINGGRTF KVMYPAEIEERRQKLPSKNSPAWRDNYDEMAKKTVLRNHFKALPKSAELARALAHDG TVRTDWQPDAIDVPPEYLSEPQRPELEAGAQ WP_055284109.1 RecT [Dorea longicatena] (SEQ ID NO:339) TVGKTDEIKQELARKVENTKAGTKLKKSMSIADMIKVMEPQIKKALPEVITPERFTRMA LSALNTTPKLNECTPMSFLAALMNAAQLGLEPNTPLGQAFLIPYNNKGKMECQFQLGY KGLIDLSYRNPNMQIITAHTVYENDEFEYELGLNPCLDHRPTLGERGEIRLFYGLFKLTN GGFGFEVMSKTAMDDFAKEYSKAFDSSFSPWRTNYESMALKTIIKKALKYAPLKSEFRN ALSTDETIKNEIGADMSEINSENIFDTVYQEECA SDL28883.1 RecT [Streptomyces indicus] (SEQ ID NO:340) STDARNAVARRAETVGQVEQQAQQQPTLAQQIERMKPEMERALPKHMSADRMARIAL TLIRKNPDLATCNTQSFLGALMTCSQLGFEPGSPTQEAYIIPRKGNAEFQLGYQGMVTLF YQHPMASSIKVETVRENDYFEHEEGLEERLVHRPCATGPRGRAIAYYSVARLINGGRTF KVMYPDEIEERRQKLPSKNSPAWRDNYDEMAKKTVLRNHFKALPKSAQLARALAHDG TVRTDATADVIDVAPEYPQRPELEAGPTA WP_145458209.1 RecT [Staphylococcus pettenkoferi] (SEQ ID NO:341) ATQKDFKNQISQKETQQKQEVQKKKKGPRQQVSDLLDRMAPEIEKALPNHLSADRMAR VAMTAVSSNPKLLECDPKSFIGAVMQSAQLGLEPNTALGEAYLVPYAGKVNFQLSYLG LINLATRSGQYKAIYAHEVYAEDEFRYQYGLHKDLIHKPVDNPKGKPIGYYAVYHLLNG STDU2-42312.601 (S22-113) GYDFVYWTTERIQKHAKKYSFAVQKGYQSPWNDEFDAMAKKTVLKDLLKYAPKSIEM NNAVRSDDKQSELSDEGVVIDVTNYDEENGEEK
WP_117787252.1 RecT [Tyzzerella nexilis] (SEQ ID NO:342) AGVKEELAKKAESTKGETKLTKSMSIADLIKAMEPEIKKALPEVITPERFTRMALSALNT TPKLRECTQMSFLAALMNAAQLGLEPNTPLGQAYLIPYNNKGVMECQFQIGYKGLIDLS YRNPQMQIISAQAVYENDDFSYELGLNPKLEHCPTLGERGEVRLFYGFFKLVNGGFGFE VMSKTAMDEYAKEYSKAFDSSFSPWKSNYIGMAKKTVIKQALKYAPLKTDFRKALSND ETIKTELSDDMSDIHGEEIWDVEYQEKTA WP_073112630.1 RecT [Hespellia stercorisuis] (SEQ ID NO:343) ADIKEELAKKVAEGTEDKKKLTKSMSIADLIKAMEPEIKKALPEVITPERFTRMALSALN TTPKLKECTQTSFLTALMNAAQLGLEPNTPLGQAYLIPYKNKGNLECQFQIGYKGLIDLS YRNRQMQIIQAQAVYENDEFEYELGLNPVLVHRPALQNRGAVKLFYGIFKLTNGGFGFE VMSKADMDAYAKEYSKAFDSSFSPWKSNYIGMAKKTVIKQAIKYAPLKTDFRKALSTD ETIKTEFCEDMSEVQCKDIWDTEYKERSA CDD36322.1 [Roseburia sp. CAG:309] (SEQ ID NO:344) DVKNELAKKAENTGKVKLTKSMSIADMIKTLEPEIARALPSVITPERFTRMALNALNNTP KLAECTQMSFLAALMNAAQLGLEPNTPLGQAYLIPYKNKGVLECQFQIGYKGMLDLVY RNEMVQTVQAQVVYQNDEFHYALGLTGRLEHIPTLRDRGEPYAFYALFKLENGGYGFE VMSKTDMDAFALQYSKGISSEYSPWKTNYIDMAKKTVIKKVLKYAPLKTEFQRALSND ETIKTHFAVDMSEVEPETVIDMEEGELLESAS WP_128520904.1 RecT [Absicoccus porci] (SEQ ID NO:345) TTTNQQGMITKKANNSVAKKTNRTMKDYITMYQGEIAKALPSVMTPERFVRIATTAVT NTPKLASCTPQSFIGALLNAAQLGLEPNTPLGQAYLIPYGNQCQFQIGYLGMVELAQRA GTNVDAHVVYANDEFDYSLGLHPDIKHVPAMKDRGEAIAYYAVWHNGENFGFEVMSR EDVEKHMKKYSKTYSNGPWKTEFDEMAKKTVLKRALKYAPKKTDLARAVMQDETIK QFNPKADNDMADAKNDFFDVEYDEVDENTDPVTGEVK GAK01483.1 RecT [Geomicrobium sp. JCM 19055] (SEQ ID NO:346) GYKGMIDLARRSGHIKSIYAHTVHANDEFEYELGLEPKLVHKPATGDRGNMEYAYAVA HFVDGGYQFEVFSHHDIEQVKKRSKAGNFGPWKTDYEEMAKKTVVRRMFKYLPISIEIQ QHASQDETVRRDITEEAEKVDNIIDLPNYEDPNNIDVPDEEQDEQKDEKQKQQGSAEEIA LDFK WP_135329961.1 RecT [Streptomyces sp. 
MZ04] (SEQ ID NO:347) STNLAARVEARRQNPTTKQPARRGKAAQQPTLVQFVQSMRGEIARALPSHVASPERIAR IALTELRRVDHLAECTQESFGGALMTCAALGLEPGGVGGEAYLLPFWNKKVRAYEVTL VIGYQGMVRLFWQHPAAAGLAAHTVHEGDEFDFEYGLEPFLRHKPARTGRGKPTDYY STDU2-42312.601 (S22-113) AVAKMANGGSAFVVMNVEDIEAIRHRSKARDAGPWSTDYGLRRHGAQDLHSAVVQV AAEVC WP_079588582.1 RecT [Acetoanaerobium noterae] (SEQ ID NO:348) SNLKNELAKKANNSVTDGNKEPQTIKDWIKVMEPAIKKALPSVITPERFTRMALTAISVN PKLAECTPKSFMGSLMNAAQLGLEPNTPLGQAYLIPYKNKGNMEVQFQIGYKGLIELAY RSGEFANIYAKEVFENDEFEYEFGLEPVLKHKPASGNRGEVIAYYAVFKLTNGGFGFEV MSKEDITNHAKTYSQAYSSSYSPWSKNFDEMAKKTVLKKVLKYAPIKVEFVKQIVQDS TIKTEINSDMTEVESQNVFEAEETDYEVIDQEETK WP_107635892.1 RecT [Staphylococcus haemolyticus] (SEQ ID NO:349) ATQNEFKNQLAKKEDKGNTNAPTQTKSTNPRTIAQNYLAKMKPEIEKALPAHMSHERM TRIALSAVNSNPELTEVILNNPTSFLGALMQSAQLGLEPNTNLGHAYLIPYYDKNSGKKI VNLQLGYMGLLDLAHRSGMYQKIFAMPVYKDDYFEYQYGTNEKLNHVPAQVSKGEPI GYYAFYKLTNGGVHFVYWSRQKMQMHKDRYTRRGSVWNNNFDAMALKTVIKDVLK YAPKSVEMGEAVQSDENNFEFNEDSKVIDVTDYETEENK WP_107638953.1 RecT [Staphylococcus hominis] (SEQ ID NO:350) ATANDFKNQVTKKESDNTKESSNKKTELATTSPRQVAQNYLEKMKPEIAKALPAHMSH ERMTRIALSAVNSNPQLTEVILNNPTSFLGALMQSAQLGLEPNTSLGHAYLIPYNFKGKK IVNLQLGYMGLLELAHRSGLYKKIFAMPVFKDDFFEYQYGTNEKLNHIPAQVQNGDAV GYYAFYQLTNGGVHFVYWSRQKMERHKDLYTRKGSVWNTNFDAMALKTVIKDVLKY APKSVEMSSAVQSDNSNFEFSEDSSTVIDVTDYETEDNK SUY49750.1 RecT [Lacrimispora sphenoides] (SEQ ID NO:351) ADVKQELEKRAAGSGGQSVKLTKNMTIVDMVKALEPEIKRALPCILTPERFSRMALSAI NNTPKLGECTPMSFIAALMNAAQLGLEPNTPLGQAYLIPYKNKGVLECQFQLGYRGLID LAYRNERMQSIEAQVVYDNDEFSYELGLHPSLIHRPTFDEPGEIQAFYAIFRLDNGGFRFE VMSKNYVDSLCHALFKSIYFRFQSLEK CDE68291.1 [Clostridium sp. 
CAG:277] (SEQ ID NO:352) DFKEELAAKAEVAATTKKSDGVKLTKNMSIVDMIKALEPEIKRALPSVLTPERFVRMAL TAVNNTPALAQCTPMSFIAALMNAAQLGLEPNTPLGQAYLIPYKNKKKGVVECQFQIGY KGMIDLVYRNDNVQTIQAHIVRENDHFEYELGLESKLRHIPAMEGRGEMMYVYALFKL TNGGYGFEVLNKEAVIAHAERYSPSYDGFSPWKTDFESKGLELFLILDLSSKQSGK WP_060905391.1 RecT [Streptomyces scabiei] (SEQ ID NO:353) DADRMARIALTLIRKNPDLATCSGESFLGALMTCSQLGFEPGSPTQEAFIVPYKGEATFQ LGYQGMVTLFYQHPMASSVKVETVRENDYFEHEEGLEEKLVHRPCKTGPRGKAIAYYS VARLINGGRTFKVMYPAEIEERREKLPSKNSPAWRNSYDEMAKKTVLRNHFKALPKSA ELARAMAHDGTVRTDWQPDAIDVPPEYLSEPQRPELGTGSTQ WP_146678271.1 RecT [Pirellula sp. SH-Sr6A] (SEQ ID NO:354)
SEATKEAKPETAIAKKPNGIKDWLKSDALKTQIASVAPKHMAPERVMRIALNAVSRTPK LADCTPESFMRCLLDLSSWGLEPDGRHAHLIPYGTECTLVLDYKGLVTLAYRSGWVKKI HADVVFEGDIFVYSLGTVCQHIPWEFRDDANKPEHKGHFRAAYCVVTMADGIEKHEV MTASEIDAIKAKSRSGNSGPWKDHYTEMAKKTVYRRASKWLPLSPEQADAMERDDDRI IDAVSVAVTQRLSKAAMPLIGANETGDTE WP_126032909.1 RecT [Bifidobacterium castoris] (SEQ ID NO:355) GALATTAKNNELTTMNTMGDIHALIRGRRAQIESVMSGVLTPERLYSLLQSAVSHEPKL LQCTPESIVACCMKCAVLGLEPSNVDGLGKAYILPYGNKNYQTGQVEATFILGYKGMIE LARRSGEIKSLNVTPVFEDDGIKLFMDEAGQPYIKAGEVNPLANHTPDKLMFVFLNAEF TNGGHYRTYMTRAEIDAAKKRSSAGDRGPWKTDYVAMARKTVVRRAFPYLPVSTEAQ SAAVEDETTPHFDFLDRNTTPVGEPSDVMQEATA WP_114599505.1 RecT [Staphylococcus warneri] (SEQ ID NO:356) ATQNDFKNQITDKKENKPQQSTNPRQVASDLLERMKPEIAKALPAHMSQDRMTRIALS AVNSNPKLSEVILNNPTSFLGALMQSAQLGLEPNTNLGHAYLIPYGNIVQLQLGYLGLLE LAYRSGKYQKIMAMPVYKDDFFEYQYGTDEKLNHIPAQQQTGDAVGYYAFYKLTNGG THFVYWSRQKMNMHQQQYSKGGNVWRNNFDAMALKTVIKDVLKYAPKSIEMGEAVT SDNNNFDFKDGGDIIDVTDYETEEN SCQ72869.1 RecT protein [Propionibacterium freudenreichii] (SEQ ID NO:357) TQQMPIKAQGEPTKELQQKAAVDRFNATLHQMQNEIARALPKHMTGDRFVRIVLTEVR KNPTLALCDPLTMFGSLLTAAALGLEPGLNGECWLVPRKNHGTLEAQLQVGYRGVVKL FWQNPAATYLDTGYVCERDEFRFAKGLNPILEHTPAEGDRGKVVRYYAVAGLNTGAR VFDVFTPAQIKTLRGGKVGSNGDIPDPEHWMERKTALLQVLKLMPKSTQLAAVPAADG RAHTISDAQQIFGGVDPTTGEVLDAEPVEDGAA WP_127100780.1 RecT [Asaia sp. 
W19] (SEQ ID NO:358) SNALATPTEKLRTQITSMTGEFRNALPSHIKPEKFQRVVMTVVQQNQGLMNADRKSLLA SCLKCAADGLIPDGREAALVMFGQQVQYMPMLAGIQKRIRNSGEIASIQAHVIYENDHFI WHQGIDASIEHRPLFPGDRGKAIGAYAVAKFKDGSDPQFEVMDVAAIEKVRAVSRAGK SGPWVQWWDEMARKTVFRRLSKWLPMDTEAEDLMRRDDENDAQDVAAPTIRVEAEA PSKLDALEHDDDGVVLEETRELEGSAA EIC09117.1 RecT protein [Microbacterium laevaniformans OR221] (SEQ ID NO:359) TDLSTVAAAAKQNPTMKDLVEAQLPAIERQLGGTMNSDAFVRAVLSEITKSPDLMQAD PKTLLGGVMLAAQLRLEIGSGLGEFYLTPRKDHGRMICLPIVGFQGMVKLALRSEFVTN VQAFIVREGDDFTYGANAERGMFYDWTPKDFEEKRPMVGVVATARMKQGGTTWAYL TREQVEDRRPSYWQKTPWGSHPDEMAKKTAVRALAKYLPKATDLGRAIEADEQKVQH VKGLDEVTVTRLDDEPETVVVQETTDAWAATPVAEVQP STDU2-42312.601 (S22-113) WP_136046271.1 RecT [Microbacterium sp. K41] (SEQ ID NO:360)
SKDLSTAAAAAKSQPTMKDLVEAQLPAIERQLGGAMNSDAFVRAVLSEIGKSPDLMNA DPKTLLGGVMLAAQLRLEIGSGLGEFYLTPRKDRGRQICLPIIGYQGMIKLALRSEYVLN VQAFLVREGDDFTYGGNSERGMFYDWTPKDFEESRPWIGVVATAKMRGGGTTWVYLT RTQVIDRRPSYWASTPWKTNEDEMVKKTAVRALAKFLPKSTDLGRALEADEAKVQHL KGVDEVQVTRLDDDAETFVVQEQDPMSRTPEEQAEDEANR WP_136309287.1 RecT [Streptococcus pyogenes] (SEQ ID NO:361) SDLSVAAAAAKTQPTMKDLVEAQLPAIERQLGGAMNSAAFVRAVLSEIGKSPDLMAAD PKTLLGGVMLAAQLRLEIGSGLGEFYLTPRKERGRQICLPIIGYQGLIKLALRSEFVMNV QAFLVRQGDQFSYGANAERGMFYDWVPQDFEETRDWIGVVATARMRSGGTTWVYLT RTQVIDRRPSYWNSTPWKTNEDEMVKKTAVRALAKFLPKSTDLGRALEADEAKVQTLR GLDEVEVTRLDDEADTVVVQEQNPMSRTPEEQAEDAEAQR WP_110990907.1 RecT [Mesotoga sp. TolDC] (SEQ ID NO:362) KASEIASMVKKEDERRNHKPDPLAGIVKNLTSIKGEIANALPDAGITPERMIRIVVTLLRQ NKSLAEAAMQNPASLLGAVMMAAQLGLDPTNGLDQCALVPRKGKVCFDIMYEGLVEL GYRSDRMESIVARTVYEKDTFSLKYGLNEELVHIPYLDGDPGESKGYYMVGKLKGGGN IIVYMTKEQVHKIRDRYSVAYKAGLSGSRKDSPWFTSEDRMGEKTVVKAGFRWIPKSPII RTALALDETAREASRLPMRN WP_109196224.1 RecT [Streptomyces sp. CS014] (SEQ ID NO:363) TENTVTAAVAVRDTGPAAQIEAYRDEYAALVPSHINADQWVRLAVGAIRGNEDLTNAA RTDIGVFLRELKTAARLGLEPGTEQFYLTPRKSKAHRGQKIIKGIVGYQGIIELIYRAGAV SSVIVESVRANDTFRYVIGRDERPVHEIDWFGGDRGDLVGVYAYATMKDGATSKVVVL NHAQVMQIRAKSDSKHSEYSPWNTNPESMWLKSAVRQLMKWVPTSAEYMREQLRAQ AEVAAEQPPAADLPPMPSVELNDEDEAVDAELVDEEA WP_068202759.1 RecT [Isoptericola dokdonensis] (SEQ ID NO:364) TQDLATAIADQQPAQRRTAFDLVESMRGELHKALPEHASIDNFLRLALTELKMNPQLGN CSGESLLGALMTAARVGLEVGGPLGQFYLTPRRLKRDGWAVVPIVGYRGLITLARRAG VGQVNAVVVHEGDTFREGASSERGFFFDWEPAVERGKPVGALAAARLAGGDVQHRYL SLAEVHERRDRGGFKDGSNSPWATDYDAMVRKTALRALVPLLPQSTALSFAVQADEQ VQRYDAGDIDIPALDETDTEDTK WP_114797327.1 RecT [Gaiella occulta] (SEQ ID NO:365) STAVARRDPVAEVCTTIASKEFEAKIVQALPDGVTPARFVRTTLTAIQQNPDVVKGTRQS LYNAVIRCAQDGLLPDGREAALVVFRAKGTDVVQYLPMIGGLRKIAAEYGIKIETAVVY ERDKFEWELGFEPRVLHVPPALGEDRGEPIGAYAVATDKLGRKYVEVMSRQEIEEVRK VSRAATSEYGPWVKWWAEMARKTVGRRLFKQLPLHDLDERGERVISASDAEISFSPSG LDSLPHVDPSEPEEVLTGDVMDDDDDDGIPFGEPAA STDU2-42312.601 (S22-113) PAV10712.1 CBG25_01455 [Arsenophonus sp. 
ENCA] (SEQ ID NO:366)
NTELETMNNVYDNLQSVIMQQGIAALLPAQVTPEQFTRTAATALIENVDLQNADKQSLV LALTRCAKDGLMPDGREAALVVRSTKVNKQFVKKAVYMPMVDGVIKRARQSGQVANI IAKVVYSQDEFEYVIDENGEHLTHRPAFVDGDDIVKVYAFAKLNSGELVVEVMSRAGV EKIRDTVQSAKYDSSPWVKWFDRMALKTVIHRLARRLPCASELFSLFEVYEDANSTEKT LRMAPASFKRLSIN WP_147981944.1 RecT [Streptomyces sp. ms191] (SEQ ID NO:367) VEHYKADLAQVMPSHVKPDTFIRLAVGVLRRDRNLAQAAQNNPAALMGALMDAAQL GLTPGTEQFYLVPRKKAGRLEVQGIRGYQGEIELIYRAGAVSSVIVEVVRQADTFRYSPG RDERPEHEIDWDAEDRGPLRLVYAYAVMKDGATSKVVVLNRAQVMKAKAMSQGSDS AYSPWQKHEEAMWMKTAAHRLTKWVPTSAEYMREQLRAAAEVAAEHRPTPVAAAPG MPSVPGEDEAIEAEFVDEDEVA BAQ93806.1 phage RecT family (TIGR00616) [uncultured Mediterranean phage uvMED] (SEQ ID NO:368) TSSITPLVAMQGTLEKMADKFTEALPRQMDVNKFISVAKLTLNKNPRLLQADKTSLMQ TFMKAAQDGLYLDGKEAAAVQYGQSVQYIPMVEGIIKVLHNSGLIKTISAEVVYENDFF DYELGTAPKITHKPLIVGDRGKPMCVYAVAITTNDGEYYEVMNMDQINQCRQVSKASS SPHSPWVKWFDQMAKKTVIHRIAKRLPKNDAINSVVTVDDEPNFQQAVNVTPSEPKDS LSRLRDSIGMEGKDVEQAANDLLEKYNKEAREE WP_061405262.1 RecT [Streptomyces] (multispecies) (SEQ ID NO:369) SQISNALATRDQGPAAQIEQYRDEYAALVPSHVNADQWVRLAVGAVRGDEKLMEAAQ NDIGLFLREMKTAARLGLEPGTEQFYLTPRKSKPHGGRKVIKGIVGYQGIVELIYRAGAA STVIVEAVRENDTFRYVPGRDDRPVHEIDWFANDRGPLVGVYAYAVMKDGAVSKVVV LNRSRVMEFKAKSDSKHSEYSPWNTNEEAMWLKSAVRQLAKWVPTSAEYRRDQLLAH TETADSVVASVSTAPLPPQPSALDDADPDDDGPIDAELVD WP_114014965.1 RecT [Streptomyces reniochalinae] (SEQ ID NO:370) SQISNAVAKRDNSPGAMVQQYKADFSTVLPDHVKPDTWVRLAQGVLRRDKNLAQAAE RNPGSLMTALLDCARLGHEPGTESFYLVPFGGEVQGIEGYRGVVERMYRAGAIASVKA EVVCQGDDFDYQPDMDKPRHRVDWFGDRGPIVGAYAYAIFKDGSTSRVAVINRAYIDK VKKESKGSDRATSPWMKWEEQMVLKTVAKRLEPWVPTSNEWRREQLRAAREVANEP TPPTTPAPPAPEQVDPDTGEVIDGELVDDTPTQ WP_027699748.1 RecT [Weissella oryzae] (SEQ ID NO:371) SNNLTSAQYFNAPNIKGKFEEVLGKNANGYVTSLLSVINGSQQLQRAEPSSIMVAAMKA ATLNLPIESSLGFAYIVPYGNNAQFQIGYKGLIQLALRSGQIKGLNSGVVYETQFISYDPL FEELEIDFKKPAEGKIAGYFASMKLTNGFSKVVYWTKEQVEQHRDRFSKGKNNGPWKS DFDAMAQKTVMKAMISKYAPLNQEMQQAIVEDSESELTVPRDVTTSNEAAELNSLLTT PKVQEGANTDLSEPFPNAEETQLFDDLASVTGD STDU2-42312.601 (S22-113) SYW13692.1 Phage RecT family 
protein [Oenococcus oeni] (SEQ ID NO:372)
SNELKTILNAPTTKEKFDEVLGRNAQGYINSVLNAVGNSKLLQNASPNSILSGAMKAAT LNLSIDPNLGYAYFVPYGHEAQLQIGYQGLIQLAQRSGQIKILNAAPIYDEQFKSLDPVTG KLTLNKKIVPDTNKKPTGYVAYLKTVTGFEHTEFMSYADIEKFAKRFSKSFNSSTSPWK TDFNAMAKKTLIKQVLKYAPMSIDLQTAVSADNDDIEPKDITPDEDKETVDKISNLISDN KQDDTLSQLEEVANANN WP_141158250.1 RecT [Pseudarthrobacter sp. NIBRBAC000502771] (SEQ ID NO:373) TSQLAEATAAKAVEQRKNPTARDLIQAQQAAIETQLAGAMNSAAFVRAAISSVSASPQL QQATPASLLGGIMLAAQLKLEIGPALGHFHLTPRMVSKKDGDNWVEVWTCLPIIGYQG YIELAYRSGRIEKIESLLVRKGDKFDHGANSERGRFFDWAPADYEETREWTGVIALAKIK GAGTVWAYLPKEKVIARRPDRWEKTPWATNEEEMARKSGIRALAPYLPKSTELGKALE ADEHKVEHIAGVHDLVVSKAEDEPLEEPTA TAK04183.1 EPO34_03495 [Patescibacteria group bacterium] (SEQ ID NO:374) TNQPTTHVATTPNQRPATTLEQFRHQLVGDYQKQVLNYFNGQKEKAMKFMSAVVYSA QKNPALLECDRTTLLHAFMACAEYQLYPSSVSGEAYVIPYKGKAQFQLGYQGIITLLYR AGVEAVNAQIICENDAFEYEEGLEPNLVHKPNVLKDRGKPIGVYAIAAINGHKLFKVLSE AEVMKFKGFSQSKNSEYTPWNPDNDPELWMWRKTAIKQLSKLLPKNDALQKAISEDN QDSVIEARRSTLDAGGPAVGRALHDPNASNEPEGK WP_092601202.1 RecT [Actinopolyspora xinjiangensis] (SEQ ID NO:375) TGQTIGTAVAKKDDENPSAIIATNRADLARVMPSHVRTDSWVRIAQGIVRRDKNLAHAA RQSPGTLMVALMEAARLGLEPGTEQYYLTPRKNKGKPEVLGIPGYQGLIELMYRSGAV SSVVVETVRENDTFQWAPGRMERPEHEADWFAINGERGQLRGVYAYAIMSNGATSKV VVLNRNDIARARDSAQGADSEHSPWKNHEEAMWLKTAARRLAKWVPTSTEDRRIVQG VAERSDQPTEAPLDLTDEPDTDQPIEGELVDEEATQ WP_067024969.1 RecT [Mycobacterium sp. 
1245499.0] (SEQ ID NO:376) TTQPEYKPVAQGADKQMTTGKLLKMLEPEIGRALPKGMDPDRICRLVMTEVRKNPMLT QCTQESFAGALLTASALGLEPGVNGEAWLVPYRDRKRGIVECQFIMGYNGVAKLFWQS PHADRLDAQLVCANDHFRYVKGLSPILEHVESDGDRGDPIAYYAIVGVKGAQPMWDVF KPEAIAQLRGGRVGTKGDIDDPQRWMERKTALKQVLKLAPKSTRLDLAIRADERSGSDL YKSQGMEVHAIEPGFIETEAEPETQEQ WP_075737485.1 RecT [Streptomyces acidiscabies] (SEQ ID NO:377) TDNAISNAIATRDNGPEAIVQQHRDDLTLVLPAHHKGETWMRLATGALRRDANLRQTA ARNPGSLMNALLECARLGHEPGTESFYLVPFGNEVQGIEGYRGIVERIYRAGAVKAVKA EVVYENDHFRYHPGMDRPEHEPDYFADRGRIIGAYAYGVFQDGSTSRVVVINRAYIDK VKKESKGSDRASSPWVKWEEGMVLKTVARRLEPWVPTAVEWRTEPTPASAAEATAPV GDGVKAIAAPAPTSPYDDEGPIEGEFVDEYDGGAA STDU2-42312.601 (S22-113) AKT73182.1 RecT (prophage associated) [Yersinia pestis] (SEQ ID NO:378)
NQVATLESIHADLSSALTRQGIQSLLPSHVSPEQFTRTAATALVADPELQNADRQSLVMS LIRCAQDGLVPDGREAAMVVYNTKQGDQWVKKAQYLPMVDGVLKRARQSGQVANIT GKVVHMADKFDYWVDENGEHIEHRPAFENHGEIRLVYAFAKLTSGEIVVEVMSRSEVE KVRDATAKKDRDGKPKVPAVWQKWFDRMALKTVLHRLARRLPCASELYSLLDVNQIA DEAEKPAECGAQRESSTTAA WP_123127078.1 RecT [Rufibacter latericius] (SEQ ID NO:379) SNQLQVAREQVISAQKSFKNVPNNKLDFEREAGFAMQMIQSNPFLASMDANSIRNCIVN VALTGLTLNPVLKLAYLVPRKGKLILDPSYMGLINVLVTSGAAKKIEADVVCENDFFDY EKGTNGFIKHKPSLSSRGEIIAAYAIAHLPNGEVQFEIMNREELEKVRKSSEAAKKGSSPY DGWASEMMRKAPIRRLFKYLPKHNIPDQVINTLSLDEQNNGVDFSAQKQEAFKGKAAD FFEDEPANTVDADYTDMSHEEADNELAA WP_093587584.1 RecT [unclassified Streptomyces] (multispecies) (SEQ ID NO:380) SQIGNAIATRDEGPAAQIEVYRDEYAALVPSHVNADQWVRLAVGAIRGNDDLLKAAGN DIGLFLREMKTAARLGLEPGTEQFYLTPRKSKAHGGRPVIKGIVGYQGIVELIYRAGAAS TVIVEAVRQNDVFRYVPGRDDRPVHEIDWFGQDRGPLVGVYAYAVMKDGAVSKVVVL NKARVMELKAKSDSKNSPYSPWNTNEEAMWLKSGVRQLAKWVPTSAEFRRDQLLAHT DTADGVIASVSAPPLPPQSAALEDLDPDDEGPIDGELLDD WP_030975214.1 RecT [Streptomyces sp. NRRL S-1824] (SEQ ID NO:381) SEISNAIATRDQGPAAQIEAYRDEYAALVPSHINVDQWVRLAAGAIRGNEDLMEAARND IGVFLRELKTAARLGLEPGTEQFYLTARKSKAHGYALIIKGIVGYQGIVELIYRAGAVSSV IVEAVRANDTFSYVPGRDDRPIHEIDWFGGDRGPLVGVYAYAVMKDGAVSKVVVMNH KRVMEIKARSDSKNSQYSPWNTDEESMWLKSAIRQLAKWVPTSAEYKSEQLRAHAEAI GELASVASAPLPPQPSVLDDVDPDDEGPIEGELVD RKT60104.1 RecT [Agromyces sp. OV415] (SEQ ID NO:382) STTVALPAQKAEAVIQQVTGAANGFAAALADRIGPDRFVRAAVTSIRTSPQLAQCEPLSI LGGLFVAAQLALEVGGPRGLAYLVPYGREAQLIVGYRGYVELFYRAGARKVEWFIVRD GDTFRQWSTGRGGRDYEWTPLDDDSNRRPIGAVAQIQGAHGEFQFEHMTVDQINERRP KRATSGPWVDWYEEMALKTVMRQLAKTARQSTDDLAFAAANDGAVITQVEGGQARV VHPATSEPEQPLSLDALERTPGELAEETNP WP_017415747.1 RecT [Clostridium tunisiense] (SEQ ID NO:383) TTKANVTSVKNALKEQIQVQQVAAQTDTSFQGVLTKQLQHQFKAIQSLVPKHVTPERLC RIGINAASRNPQLMNCTPETIVGAIVNCATLGLEPNLLGHAYIVPFYNNKTGKMEAQFQV GYKGALDLIRRTGAVSTLSAHEVYGPRSIFWTQYFY RYE05836.1 EOP33_01060 [Rickettsiaceae bacterium] (SEQ ID NO:384) STDU2-42312.601 (S22-113) TNSIETNIEDLSPGNQTKTQISETNAPIVSEIRTIKRDGVYDLCSSRREKVLPFLGNNSQKF
KGWLQLLWGSKLITNAYSCAVYQGDQFEYELGLNPNIKHIPQHKSINDVNELIATYGVI KLKSNEVQLRVCWRDELEKSKKSSKSNGREDSPWNRHFEAMALIVPMRKMAKNLALA LRAEDFDDEDYVNENNNQGMA WP_052399147.1 RecT [Francisella sp. FSC1006] (SEQ ID NO:385) SNLVVAKQCLASAEKSFIGISGDEKKYKRECNFAVQSLQANSYMLQQANANPNSLRNAI INLASMGLTLNPAEKKAYLVPYGGKNPRVDLQISYMGLIDLAISDGAIMWAQAKVVRQ NDLFNITGVDTPPEHKYNPFDSEQTRGDVIGVYCVAKFPNSDYITEIMSITDINSIKSRSSG VKSGNTTPWDTDFDEMAKKTVIKRASKYWKGSSKLSKAIDFLNNENNEGINFNKQEEK PKQNINDLMNDDVVDIDSEVGDE WP_067349107.1 RecT [Streptomyces noursei] (SEQ ID NO:386) TSPIRAAVARRAGDPAALISQYTADFAAVLPSHIKPATFVRLAQGILRRDEKLAQAAAND PGQFMSVLLDAARLGLEPGTEAYYLVPFKGRVQGIVGWQGEIELMYRAGAISSVIVETV REDDVFVWTPGLVDRETPPRWEGPMSYPFHEVEWAGDRGPLRLVYAYAVMKGGATS KVVVLNAQDIERAKKTSQGADSPSSPWRQHEAAMWSKTAVHRLAKYVPTSAEYITAQ VRAVRQADALSAPPVEEVVDVELVGDGQEQEARR WP_143887802.1 RecT [Streptococcus lutetiensis] (SEQ ID NO:387) ANQLQMSHKDFFNRPAVKNKFSEVVSGKSDQFITSLLSVVNNNKLLSKADNNSILNAA MKAATLNLPIEPSLGSAYIVPFKGQAQFQLGYKGLIELAQRSGQYKSINAGVVYKAQFK SYDPLFETLDLDFNQPQDEVIGYFACFELLNGFRKITYWTREEVYNHGKRFSKSFNNGP WKTDFDAMAKKTLLKSIIGTYGPKSVDMQEAITDDNKTEYEKAEPIDVTPQEENLTDLI GETPQEELPIANPETGEIQEEQTALFNQLGDLTDD WP_073793143.1 RecT [Streptomyces uncialis] (SEQ ID NO:388) SQISTAIATRDNGPAAVVEQYRESLALVMPSHLQQRVGAWIRNTQGLLRRDSKLMEAA QNDVGQFVAVLMDAARLGLEPGTEHYYLVPRWNNKKRATEVTGVRGYQGEIELMYR AGAVSSVIVEVVHTQDQFRFRPGRDARPVHDIDWDLEDRGSLRLVYAYAVMKDGATS KVVVLNRQHIAAARAKSDSAAKDWSPWNTDEEAMWLKTAAHRLTKWVPTSAEYLRE QIRAQVAVESEQRPEPLPVAPPPAPGTVDADPDDEGPIDGELVD WP_116200709.1 RecT [Amycolatopsis circi] (SEQ ID NO:389) ISQTVTTAVAQQKDSSPAALVRKYRTDFATVLPSHIKPETWLRIATGALRRSPQLANAA KRNPSSLLVALLEAARKGLEPGTEQFYLVPRKGKNGPEVLGITGYQGEVELMYRAGAV SSVKVEVVREHDTFAYNPGEHDRPVHEIDWRADRGDLVLTYAYAVMRDGATSNVVVL SADDIAVILKKADGADSPFSPWQWNPKAMWLKSAARQLAKWVPTSAEYVRLPDVPLE SLPPAKPLDLPRVDDVVDAEIVEDWPTAPDDTADGAR WP_020135111.1 RecT [Streptomyces sp. 351MFTsu5.1] (SEQ ID NO:390) STDU2-42312.601 (S22-113) SQISNAIEKRDQGPGAVIEQYKQELALVAASHVKVDTFARLAVGALRQNPKLAAAAQS
VVEVVRANDQFNYVPGLHERPVHNVDWFGDRGDLVGVYAYAVMAGGATSKVVVLSR THINRAKAKSDGADSDYSPWRTDEEAMWLKTAARRLGKWVPTSAERLTMPAERTDTV LPVGSAAPALDAADPDEDEGPVDGELEPAGGWPETAQPPQ WP_099421180.1 RecT [Streptococcus macedonicus] (SEQ ID NO:391) ANQMQVSHKDFFNSPAVKNKLSEVVGGKSDRFIASLLSILNNNKLLSSADNNSILTAAM KAATLNLPIEPSLGFAYIVPYKRQAQFQLGYKGLIQLAIRSGQIKSINSGVIYKAQFKSYD PLFETLEVDFSQPEDEVAGYFATIELLNGFKKLIYWTKERAYNHGKRFSKSFGNSPWQT DFDAMAQKTLLKQIISKYAPLSVELQEAITADNENEDEKAAPIDVTPQEESLSDLIGEAA QEELPAADPETGEIQEEQTALFEQLGDLTDD WP_141925904.1 RecT [Haloactinospora alba] (SEQ ID NO:392) GQSVTNAVAQRDTSPSGMVGKYRDDFAQVMPSHVNGAGWVRIAQGILRRDAKLAEAA RNAPQSLMSALMDAAQQGLTPGTTEFYLVPRKRKGSLEVQGITGYQGEIELIYRAGAVA SVVAEIVHEHDTFEWIPGKHERPIHEADWFGNRGTMVGAYAYAVMNSGSTSKVVILNQ HDIEKARAMSDGADSSYSPWQKWPESMWLKTAAHRLAKWVPTSAEYRHEQERARAR SEDTEIPASPDSDVVHAEIVEENDDEQAT WP_136710836.1 RecT [Clostridium tyrobutyricum] (SEQ ID NO:393) SDKKMVVLGESHKALSKLLETKQEALPKDFNKARFLQNCMTVLQDTKDIDKCQPISVA RTMLKGAFLGLDFFNRECYAIPYGGNLQFQTDYKGEIKLAKKYSFNSIKDIYAKIVREGD DFQESIEDGRQTINFKPLPFNNGEIIGAFAVCLFQDGSMLYETMTKQEIEDIRNNFSKAKN SPAWVKTPGEMYKKTVLRRLCKLIELDFDSVETKKTYDETSEFEFGSANHEVSNFDKDD SNIIEADAEIQDDVQEGDGEDE WP_132110073.1 RecT [Actinocrispum wychmicini] (SEQ ID NO:394) SQTVTAAVAQRDNGAQALIAKYRTDFAQVLPSHLRPTTFVRLSQGLLRRNVKLAEAAE RNPASFLAALLECARLGHDPGTDQFALVPFNDRKRNTVEVVGIEQYQGVIERMYRAGA VRSVKAEVVRAADPFEYAPDVMDRPGHKPNWFADRGELIGVYAYAEFFDGSTSRVVM MNRETVMAHKAKSRGATSEDSPWQAWEESMWLKTAVHELEKWVPTSSEDRRAARDG TADPAPVEVPRVADEVLDADLVEDDHADHPTATPTGDVR WP_125769509.1 RecT [Companilactobacillus furfuricola] (SEQ ID NO:395) VNNLAKLPIQTLVKEPKIVEKFESVLGNKSAQFVTSLINVVNSNQSLKNVDQMSVVASA MVAASLDLPINQDLGYMWLVPYGGKAQPQMGYKGYIQLAQRTGQYKHLNAVAVYED EFQSYNPLTEQLDYEPHFKDRDSSEKPVGYVGYFELTSGFEKTVYWTRKQIDDHRQSFS KMSGKSKPSGVWATNFDAMALKTVLRNLISKWGPMSVEMQKAYESDEHATTISANDI KDIEVQEQEPATDVSQLINGSATEVNVNDSTTNSKDSE WP_004234437.1 RecT [Streptococcus parauberis] (SEQ ID NO:396) STDU2-42312.601 (S22-113) ANQLTVVNTLQSDAVKEKFEAVMGEKANGFVSSVLSVVTNNNILSKADFNSVYTSAMK 
AAVLDLPVEPSLGMAHIVPYKGKAQFQIGYKGLIQLALRSGQVVGLNAGKVYEGQFKS FNALTEKLDIVDIYNPKKDEPIVGYFAYMKLSNGFEKTTYWTKEQVEEHGKKYSQSYDS KFSPWQTNFDAMARKTVLKSILSTYAPLTIEMQNANDFDNGKNTGIEPLEVKDVTPETD NESLLTDLLEDEPSVNTETGEIIEDTELDLDYGQINAK WP_006845711.1 RecT [Weissella koreensis] (SEQ ID NO:397) ANELVKQLKSEKVAAQFETTAGKNAAAFASEVAISVMGNKALENASLSSVVVEATKAS ALGLSLLPTVGEAYLVPYKGQAQFQLGYKGLVQLAMRSGQMKSFGTVKVYEGEHPRW DKYSQELHTDGDETGEVVGYYAQFTLINGFKKADYWTKSAVEEHRSRFSKSKSGPWST DFDAMAQKTVLKSILQYAPKSSEMTRAMASEDMNGDISEGTAKPIDITPETETPKVEEA NQNQQIDTNEMVDEIKEYAKETNEAPKEQTVSAADEFFK WP_073846185.1 RecT [Amycolatopsis sp. CB00013] (SEQ ID NO:398) TTQTVTSAVAQQDSSPAALVRKYRTDFATVLPSHIKPETWLRIATGALRRSSQLAHAAE KNPTSLLVALLDAARKGLEPGTEQYYLVPRKTKRGPEVLGITGYQGEVELMYRAGAVS SVKVEGVREHDTFAYNPGEHDRPVHEINWRANRGDLVLAYAYAYAKMRDGATSNVS VLSADDIAVILSKAEGADSPFSPWQWNPKAMWLKSAAHQLAKWVPTSAERVWQPDGP PLEAPPATPVTLPTVEDVVDAEVVEDWPTTPADTADSEQ WP_142511229.1 RecT [Leuconostoc pseudomesenteroides] (SEQ ID NO:399) ANEITLAKQLSSDKVVEQFAATAGESAKSFAKEVALTISGNPALQHAKLGTVIVEATKAS ALGLSLLPTVGEAYLVPYKGDAQFQLGYKGIVQLAMRSGQMKSFGAESVYEGENPKW DKYNQELVTDGEETGKIIGYYAFFTLVNGFKMAAYWPKEKVEAHRDRFSKSKKGPWST DFDAMAKKTVLKSILQYAPKSSEMKRALAEDTQAEYVQAGIQDVTPEPANIEAPIETAN APEINAQEESLFGELSDVDKETAPNPFAQNLGGDN WP_023055804.1 RecT [Peptoniphilus sp. 
BV3C26] (SEQ ID NO:400) TNIQKQENRALSPVNQMKNLLANQGMQNLFADALKENKDRFIASIIDLYNGDNYLQNC DPKEVAMEALKAATLNLPINKSLGYAYIVPFKNKGKLTPQFQIGYKGYIQMAQRSGQYK ALNAGIMYEGMEIKRDFLRGTFEIVGEPKSDKAIGYFAYFQLLNGYEKALYMSKEDITD HAKRYSQSFGSDFSPWKNQFDEMAQKTVLRRLLTKYGVLTTEFQEAAKREEDEEVLKA TEENAMIEMNSQEETIAVDPKTGEIIEETEAPF PCR98661.1 RecT [Lactococcus fujiensis JCM 16395] (SEQ ID NO:401) KSAPVQARFQEVLGKKSSGFVSSLLTVVNNNNLLKRATPDSIMTAAMKAATLDLPIEPS LGFAYIIPYGQEAQFQIGYKGLIQLALRSGQITGLNSGIVYKSQFISYDPLFEELEIDFMQP EDEVVGYFASMKLSNGFMKVVYWTKARVENHKKRFSKAGAKSPWATDFDAMAQKT VLKAMISKFAPLSQEMQIAVIADNESETLEPKDVTPEQPLISIDEPKENENSQSQISIPEDQ APQQENEEFVEELFPVGQA WP_106316803.1 RecT [Actinoplanes italicus] (SEQ ID NO:402) STDU2-42312.601 (S22-113) PETIANAVAQRDQSPTALVADYRNDFAAVLPSHLPPATFVRLAQGVLRRDQNLMRTAM
EVVRENDFYEYEEGMPHPIHRYERFASPEQRGPLLGVWAYAVMLDGGMSRPVEMGRE EVLAHRDMNPSNNRSDSPWKKWERSMWLKCAVHELEKWVPSSTEYRREIARMSAPQP AAAAAPVTYVPPQVGQRDAIEGEVAEDWPEPAEVPGGAQ WP_013655830.1 RecT [Cellulosilyticum lentocellum] (SEQ ID NO:403) SDKKELVLKETHSRLNQLLATKMEAMPKDFNQTRFLQNCMTVLQDTKGIENCHPVSIA RTLLKGAFLGLDFFQRECYAIPYGGELQFQTDYKGETKMAKKYSIRDIKDIYAKVVRKG DEFKEEIVAGQQVVDFKPLPFNDAEIIGAFAVVLYQDGGMEYETMSTKQIEGIRDNFSK MKNGLMWTKTPEEAYKKTVLRRLTKKIEKDFASIDQAKAYEESSDMQFKQDEQKQDA KDPFADAVDVEFTEETEGQVRLDGEADGAK WP_148001988.1 RecT [Streptomyces sp. adm13(2018)] (SEQ ID NO:404) SQIGNEIARQSHSPAAIIEQHKADLAVVAASHVRVDTFARLAVGVLRQNEKLAAAAANN PGSLMSALMTAARLGLEPGTEQFYLRPIKRKGQLEVQGIVGYQGIVELIYNAGAAQSVV VEVVRARDEFAWTPGALDEHRPPRWPGAMKQPHHKVDWFGDRGPLVGAYAYAVMQ GGAISKVVVLNRDHIARAKAKSDGADTDYSPWRTDEEAMWLKTAARRLGKWVPTSAE KRTGVIERLDTPPAPLNEIDPDEDDEPIDGELVD WP_011988985.1 RecT [Clostridium kluyveri] (SEQ ID NO:405) PDKKMMVLSESHKALNKLLETKKEALPKDFNKSRFLQNCMTVLQDTKDIDKCQPISVA RTMLKGAFLGLDFFNRECYAIPYNGNLQFQTDYKGEIKLAKKYSINPIKDIYAKVVRKG DEFQESIVNGHQTVNFKPLPFNNDEIIGAFAVCLFQDGSMIYETMTKQEIEDIRNNFSKAK NSPAWVKTPGEMYKKTVLRRLCKFIELDFNSIESKKTYDEASDFQFEHEPNKEVSNFDK GSIDEDKTVEADTETEAKEDNREYAFKESE GAC42786.1 recombinational DNA repair protein [Paenibacillus popilliae ATCC 14706] (SEQ ID NO:406) STSHLLTIHNNLEKLIDSKREAMPKSFNKTRFLQNCMTVLQDTKDVGKCDPQSVARTLL KGAFLGLDFFNKECYAITYGGSVQFQTDYKGEKKLAKKYSVRPVKDIYAKLVREGDEFI EEIKDGQPTVQFKPLPFNDSEIKGAFAVSLFEDDGLAYEVMSVAEIELTRKNYSKQPNGQ AWVKSKGEMYKKTVLRRLCKNIELDFDTIEQAQAFEDSSDFEFNKEPKQAQQSPLNPQA TVIDAEYEEVKEESDNETNQE OBR91022.1 RecT [Clostridium ragsdalei P11] (SEQ ID NO:407) LDKQANGFITSLLNLKQDKLKGCNDMTVLGSALKAAPLKLPIDPNLGFAWIIPFKNHGK LEAQFQVGYRGFIQMAQRSAQYKKLNVTEIYEGQLKSFNPLTEELELDLDNKQSDEVVG YAAYFRRLNGFEKMVYWSKEKVTAHARRFSKSFGNGPWKTDFDAMARKTVLKNMLS TWGILSIDMQEAITSDSKIIKTTEDDYELLEEGTEDESNANVTDVEYTESDESGKEEDGK DPYEGTPFSENNTES SEI77195.1 RecT [Paenibacillus polymyxa] (SEQ ID NO:408) STDU2-42312.601 (S22-113) PDKLLVIHDNLNKMLDEKSEAMPTSFNKTRFLQNCMAVLQDTKDIEQCDAKSVARTML
YAKLVRDGDDFREEIESGQPTINFRPLPFNDGIIRGAFAVALFEDGGMIYETMSLKEIEKT RDDYSKQSTGKAWTKSPGEMYKKTVLRRLCKNIELDFDTIEQAQAFEDSSDFDMNKEL KPQQQSPLNPNTTIIDAEYEEIKEEPADGPEQE KKT72154.1 RecT [Candidatus Collierbacteria bacterium GW2011_GWB1_44_6] (SEQ ID NO:409) SNQIQIKSEVDLKMILANQYMKQINNFFGNEKQAMKFLSSVMSAVQRIPELLNCEPKSLI NSFMTMAQLGLMPSEVSGEAYVLPYNNKNGKVAQFQLGYQGLVTLFFRAGGQKIRAEI VRKNDEVSYVNGEIKHTIDIFKSNEERGEAVGAYAVATINGQEVCKYMNATDILAFGSR FSKSWTTSFTPWKEANDPELNMWKKTVLKQLGKMLPKNESINLAIAEDNKDSIISDRLL PAVEESKNLTMGSIVKTEEPVIEVEPEEIKQ WP_125777163.1 RecT [Antribacter gilvus] (SEQ ID NO:410) SADVVIRQHATELTSVLPSHLAEKGDGWLNAAVAAVRKDRNLWNAANSDPGAVMNA LAEAARLGLQPGSKEYYLTVRGGKVLGIVGYQGEIELMYRAGAVSSVIVEPVFERDGFE YTPGVDDRPKHRIDWDADDRGPIRLAYAYAVMKDGAVSKVVVVNKTRIRRAKDASAT AGKSHSPWTSDEVAMWMKTAAHDLAKWVPTSAEYIREQLRAVKEVEAEPARASDPRP EPVHIVEAQILDEDPFPNAPEDGAA WP_130123223.1 RecT [Lactococcus sp. S-13] (SEQ ID NO:411) SNQITKTQQTLKSPEVKAKFEEVLGKKADGFVASLLSVVGNSNLKTVEANSVMTAAMK AATLDLPIEPSLGFAYVIPYGREAQFQIGYKGFIQLALRSGQLTGLNCGIVYESQFVSYDP LFEELELDFTQQASGDAVGYFASMKLANGFKKVTYWSKEQVLAHKKKFVKSANGPWR DHFDAMAQKTVLKAMLTKYAPASIESKMIQTAITEDDSERFENAKDVTPDEPVISIDEPV TSEVSQNESSAESQEQFPEDEVEELFPIGKS WP_147265819.1 RecT [Nocardia puris] (SEQ ID NO:412) AESISSEVARQASPLAVVARYRSELAGSLPAAVRHDVDRWLMVAEMAVRRSPDLMEIV RRDQGASLMRALIECARLGHEPGSPEFYLIPRGGIVSGEESYRGIIKRILNSGEYQRVVAR VVHERDRFSFDPRIDEIPDHRPAEGERGAPARAYAFAVRWDGTPSTVGEATPERIIAAKA KARGVDRKDSPWNSPTGVMYRKTAIRELASYVHTSAEPRPRPAAPTEPPAVDEVSTVY DAEVIDEVDVLDITAEPTA TCP18101.1 RecT [Nicoletella semolina] (SEQ ID NO:413) LKNADPQSVFNAACMAATLNLPIQNGLGFAYIVPYQNKKEKKTEAQFQLGYKGLIQLA QRSGQFKRLVAVPVYEKQLIAEDPINGFEFDWKQKPENGEKPIGYYAYFKLLNDFTAEL YMTTHEVDEHAQRYSQTYRTYLDKKSKGQWASSVWADNFEAMALKTVMKLLLSKQA PLSVEMQQAVLADQAVVKNVETNEFSYVDNQIEEAEYTELKVSTDIFEKCKQSILNKET TLQELCDSGYEFSQEQYAELEKLEVE OAB27843.1 recombinase [Paenibacillus macquariensis subsp. 
defensor] (SEQ ID NO:414) STDU2-42312.601 (S22-113) SDKLLVIHKNLENLLDSKREAMPSNFNKTRFLQNCMTVLQDTRDIDKCDATSVARTML KGAFLGLDFFNKECYAITYAGAVQFQTDYKGEKKLAKKYSVRPVRDIYAKLVKEGDDF KEEVKDGQQTIQFAPKPFNDGEVLGAFAVALFEDGGLVYEVMSKVDIETTRKNYSKQA NGQAWTKSPGEMYKKTVLRRLCKNIELDFDTIEQAQAFEDSGDMDLNKEVKPPQVSPL NATVIDAEYTEIREGDPNATNQE WP_019417330.1 RecT [Anoxybacillus] (SEQ ID NO:415) ATTQSLKNQIAKKQNSNIQQGVTLKQLLNSESMKKRFEEVLGKRAQQFATSILNLYNSE KMLQKCEPMSIISSAMVAASLDLPVDKNLGYMYIVPYGTTATPIMGYRGYIQLALRTGQ YKHINVIEVYEGELQKWDRLTEEFEMDSKQKKSDVVVGYAAYFELINGFRKTVYWTRE QIEAHRKKYSKSDFGWKNDFDAMAKKTVLKSLLSKWGILSIEMQNAFNEDEKEVDTKE VKDITSEVQEAEYIEAEAFEVPIETETPQQEEIVFDAQ CDA71469.1 phage RecT family [Ruminococcus sp. CAG:579] (SEQ ID NO:416) NERTNLQYAPAPVERFKECLNSHEIKARLKNSLKNNWTQFQTSMLDLYSGDAYLQKCD PMAVALECVKAATLDLPISKSLGFAYVVPYNNVPTFTLGYKGLIQLAQRTGQYRTINAD VVYEGEIRGADKLSGMVDLSGERTGDEVVGYFAYFKLINGFEKMIYMTRAEAEKWRD DYSPSAKSKYSPWRTDFDKMALKTCIRRLISKYGIMSVEMQGVMTEEAEPRAAAAAKR AEETVQANANSKVIDIDAAPPAANESPAEAAPQPDF WP_019108121.1 RecT [Peptoniphilus senegalensis] (SEQ ID NO:417) TNQIARKPVNEIKNVLSVPSVRNLFDNALADNAGAFVSSLIDLYGGDSYLQNCEPKDVV MEALKAATLKLPINKNLGFGYVVPFKNKNGKLVPTFIIGYKGLIQLAMRTGQYKAINSGI IYEGMEIKEDVLRGTLEIKGSKQSEKIKGYFAYFQLINGFEKALYMDVEEAADWGRKYS KSFAKGPWTTEFDAQAQKTCLRRLLSKYGVLSTEMQRLEKTEEDVDIAVGTIENNAVEE LNIPSSQADYIVDEETGEILDDEEIVAPF AFH22576.1 RecT family protein [environmental Halophage eHP-30] (SEQ ID NO:418) TEQNQTPAKTESKSPIKAQLYKDNVQQRFQELLGERASAFMTSVMSVVKDNDQLSQAE PSSVLNAAMTAATLDMPIDNNLGMAYIVPYKDGKSGKTYAQFQLGYKGFIQLAQRSGQ FKTISATPVRQGQIVTADPLRGYEFDFTQGQDKEVVGYAAYFALLNGFEKTLYMSKAE MEQHAASYAAGYKKGYSNWNRKFDEMALKTVIKQLLSKYAPLSVDMQKAQQTDQTV SVEEPNAIEQQEAAPEIDASSNNNQNQ WP_138067957.1 RecT [Streptococcus pseudoporcinus] (SEQ ID NO:419) ANQLTVVNTLQSDAVKEKFEAVMGEKANGFVSSVLSVVTNNNLLAKADFNSVYTSAM KAAVLDLPVEPSLGMAYIVPYKGKAQFQIGYKGLIQLAQRSGKVTKLNSGKIYKGQFKS YNALSEELDIDDIYTPKEDEEVVGYFGYMKLSNGFEKITYWTKERVEKHGKKYSQSYDS KFSPWQTNFDAMAEKTVLKSILSTYAPLTIEMQNANDFDNGKNTGIEPLEVKDVTPEND 
NESLLSDLLEDEPSVDAETGEIMENTELDLDYGSINAK WP_072904346.1 RecT [Hathewaya proteolytica] (SEQ ID NO:420) ADSKKELILKESYSVLDRLIETKISAMPKDFNRTRFLQNCMTVLQDTKDIEKCQPISVART
FEEEIKEGQQFVNFKPIPFSDKPIIGAFAVVLYQDGGMEYETMSKTQIEGIRDNFSKMKN GLMWTKTPEEAYKKTVLRRLTKKIEKDFDTIEQAKTYEETSDSEFKKEEKCNEKSVFDV EYSEVESEELEQQTMLENSPFGGEQ GAE17732.1 RecT [Bacteroides pyogenes DSM 20611 = JCM 6294] (SEQ ID NO:421) QVADPQSVLNSAVIAATLDLPINPNLGFAAIVPYNDRKSGKCIAQFQLMYKGLVELCLRS GQFASLIDEVVYEGQIVKKNKFTGEYIFDEDAKTSNKVIGYMAYFRLVNGFEKTFYMTS EEVTAHAKAYSQSFKSGYGVWKDNFDIMARKTVLKLLLSKYAPKSIEMQRAITFDQAA VKGDLTETNVDEAEIEYIDNESGSDKIKQAAEDAVIQSQQKTLL CDF09406.1 [Eubacterium sp. CAG:76] (SEQ ID NO:422) AERKQITTKEYLAEVKGGLENELNLNAKALPENFNQSRFVLNCISLIKSNLSNYNNITPES VYLALAKGAYLGLDFFNGECYAIPYSGEVNFQTDYKGEIKLAKTYSRNPIKDIYAKNVR DGDFFEEIIESGKQSVNFRPVPFSDKKIIGTFAVVLFKDGSMMYDTMSVKEIEEVRNNFS KAKNSKAWAATPGEMYKKTVLRRLCKLIDLDFNSQQRLAYEDAGDFDKEKADEPVAD DTVNVFDAEFKEVEPENKDAAIIEEMGLEEA WP_099299656.1 RecT [Pediococcus pentosaceus] (SEQ ID NO:423) MNDISKVPMKVLVQQDKVQRMLENTLKGKTRQFTTSLINVVNSNQSLADVDQMSVIKS AMVAASLDLPIDQNLGFMWLVPYKGMATPQIGYKGYIQLALRTGQYKKLNTIVVHEGE MKYWNPLTEDFEYDPKGKESDEVIGYLGYLRMINGFEKTVYWTKQNIEDHRMKFSKM SGKAKPSGVWASNYDAMALKTVMRNLLSKWGIMSIEMQQAVVQDEKAPETDVRDVT PTETNSIDSLLAPEPKGEPINDSNEATVPTNAE WP_118227047.1 RecT [Bacteroides eggerthii] (SEQ ID NO:424) GTVTTVPQLKSMLANENVKSRFKEILGKKAPGFISSIVAVANSNTLLQKAEPQSIMNAAV IAATLDLPINPNLGFAYIIPYGNQASFQIGYKGMTQLAMRSGQYKTINVTEVYEGEIKSEN RFTGEYTFGERKSDKIVGYMAYFSLTNGFEKYMYMSREECEKHGKKFSQTYKRGGGL WATDFDSMSKKTVLKMLISKYGILSIDMQRAQTFDQAVVKDDLVEKNIDEAEVSYEDN PTNADVRRNAMKEALEEAEVVDETTGEIFNQPAQ WP_094754495.1 RecT [Criibacterium bergeronii] (SEQ ID NO:425) EVNNMNNQMQQTATQVTPINQMKNLLANKGINQMFEQALKMNAGAFISSLIDLYNSD GYLQKCEPKDVAMEALKAATLNLPINKGLGFAYIVPYGKAPQFQIGYKGYIQLAMRTG QYKHINAGAVYEGEEVKENRLAGTVEILGDKKNDNETGYFAYFKLTNGFEKCLYMSKQ EMTTHAQRFSKAFKNGPWQSDFSAMATKTVLRLLLSKYGVLSTQMQEAIAKENDDELQ QQINQNANKEVIDIEKIDNKNVIDIEAIDAADDDIEAPF WP_045553720.1 RecT [Listeria] (multispecies) (SEQ ID NO:426) STDU2-42312.601 (S22-113) ATNDELKNQLANKQNGGQVASAQSLDLKGLLEAPTMRKKFEKVLDKKAPQFLTSLLNL
ALRSGQYKSINVIEVREGELLKWNRLTEEIELDLDNNTSEKVVGYCGYFQLINGFEKTVY WTRKEIEAHKQKFSKSDFGWKKDYDAMAKKTVLRNMLSKWGILSIDMQTAVTEDEAE PRERKDVTEDESIPDIIDAPITPSDTLEAGSEVQGSMI WP_106024518.1 RecT [Clostridium thermopalmarium] (SEQ ID NO:427) ATVNELKNEIATKKETGVGSAGNTIKGLINSPAIKKRFEDVLNKKAPQYMSSIVNLVNGD TNLKKCDQMSVIASCMVAATLDLPIDKNLGYAWIVPYGNRAQFQLGYKGYVQLALRT GQYKAINVIEVHEGELIEWNPLTEELKIDFSQKKSDAIIGYAGYFELLNGFKKSTYWTKE QIIRHKNKFSKSDFGWKKDFDAMAKKTVLRNMLSKWGILSIEMQNAYTADQATIRPEA VETGDIKGNVDYVEADFEENYEGTPFEEVEEGGVNE WP_073010654.1 RecT [Virgibacillus chiguensis] (SEQ ID NO:428) ATNSSLKNQIANKGNGNQNTPQGYTVKQLMSASSVKNRFEETLGKKAPQFMASVINLV NGDTNLQKCDQMSVVSSAMVAAALDLPIDKNLGYAWVVPYGNKATFQMGYKGYIQL ALRTGQYKNINVIEVYEGEVKSFNRLTEEIELEFEGKESDKVIGYVGYFELINGFRKTVY WSKDEIERHKKRFSKTGFAWKDNYDAMAKKTVIRNMLNKWGILSIDMQTAVTTDGNA VTQDFEQEDSGLVIDAEFSEVNEASEGQQEIKFENADA WP_111921306.1 RecT [Clostridium cochlearium] (SEQ ID NO:429) ATNESLKNQLATKKETGIGSAGNTIKSLINSPVIKKRFEEVLDKRAPQYMSSIVNLVNSDT NLKKCDQMSVIASCMVAATMDLPVDKNLGYAWIVPYGNKAQFQMGYKGYVQLALRT GQYKSINVIEVHEGELEEWNPLTEELKIDFSKKESDAVIGYAGYFELLNGFKKSTYWTKE QITKHKNKFSKSDFGWKKDFDAMAKKTVLRNMLSKWGILSIEMQNAYTADHGVIKNEI METGEVKENVEYIEADFESYEGTSIEEGGSNE WP_019125538.1 RecT [Peptoniphilus grossensis] (SEQ ID NO:430) TNIQKQENRALSPVNQMKNLLKNQGMQNLFADALKENKDRFLASIIDLYNGDTSLQDC NPKEVVMEALKAATLNLPINKNLGYAYIVPYNSKGTTRPQFQIGYKGYIQMAQRSGQY KALNAGILYEGMEVKRDFLRGTFEIIGEPKSDKVMGYFAYFQLLNGYEKAIYMTKDEVT EHAERYSQSYGSKYSPWKKQFDEMGQKTVIKKLLSKYGVLTTEFQDAVKEEEDREVLR ATENNAMLEMTNPDEEEETIEVNPETGEIIEDDVKAPF ERL63827.1 YqaK [Schleiferilactobacillus shenzhenensis LY-73] (SEQ ID NO:431) SAVSESKDLQHVDQLSVLNSAMTAASLNLPINQNLGFFYLVPYKGIAQAQMGYKGYIQ LAQRSGQYQRLNAIPVYADEFGSWNPLTEELDYTPHFEDRKASDKPVGYVGFFKLANG FEKTVYWSRKQIEAHRDRFSKSSKSSASPWNTDFDAMALKTVLRNLITKWGPMTTDIQR ANDADEGDYKNDLSTDTSEPKDVTPGASLEQFLGETDQQQKPATKPAPKKKAEEAKPN DLKPDVTHDPNEHTEQTSLSDDDLPFD WP_051267408.1 RecT [Gulosibacter molinativorax] (SEQ ID NO:432) STDU2-42312.601 (S22-113) 
TDLTEKIATKAVAVKKDPKIADLMKSYEPQFARSLGKSMDAAKFGQDALTAIKQTPKLL
Figure imgf000192_0001
YSKVGAFTVHANDHFRTGANSERGEFYDYERATGDRGELTGVIGYAKVKGFDESSFVY LDAATVRERHRPKFWDKTPWASDEGEMFRKTAIRVLQKYLPKSIEAAPLALAAQADQA TVRRVDGVDDLQIDHEDIAIAEVIEDD WP_112330076.1 RecT [Cereibacter johrii] (SEQ ID NO:433) TENTAQAPAAARQLTPIQAISQTLESDAFAPKISASLDGTGISPARFKRAALACLSRPEAS YLVEKCDRGSIFTAVMNAAAAELELHPALGQAYIVPRGGQAVLQVGYKGFIALASRAG LAVEADVIYAGDRFSIRKGTNPDVSVEPELDPAKRGEWVAVYVITHYASGAKTLTFMTR AEVEAIRNRYSDAYKRGGAGAKTWNESPEEMAKKTCIRRASKLWPISVPGGGDDDGGE VIEADPAPVPAPRMRDVTPGGGLDRLAASL WP_063601171.1 RecT [Clostridium coskatii] (SEQ ID NO:434) SDKKMVVLNESHTMLNKLLETKQEALPKDFNKARFLQNCMTVLQDTKGIEQCQPITVA RTMLKGAFLGLDFFNKECYAIPYKDNLQFQTDYKGEIKLAKKYSFNPIKDIYAKIVRQG DDFQEAIINGQQTINFTPVPFNNGEIIGAFAVCLFQDGSMLYETMAKQEIENTRKNFSKAP NSPAWTKTPGEMYKKTVLRRLCKLIELDFDSVECKKVYNETSDFEFENQQHEVSNFDK KDIDEDKIVEADVEVQDDNENNVPEDGE WP_118206945.1 RecT [Bacteroides stercoris] (SEQ ID NO:435) STITTIPQLKSMLANDNVKARFKEILGKKAPGFISSIVAVANSNTLLQKAEPQSIMNAAVV AATLDLPINPNLGFAYVVPYGNQAQFQMGWRGFVQLAMRSGQYKTINVNEIYEGEIKK SNRFTGEYEFGERASDKIVGYMAYFSLINGFEKFLYMSKEDCEKHGRKFSQTYKRGTGI WSTDFDSMAKKTVLKMLLSKFGILSIEMQRAQTFDQAIIKDNLAETDIDEAEVSYNDNP DNEEARRNAMKEALQEAEVVDENTGELFNTETK WP_099840029.1 RecT [Clostridium combesii] (SEQ ID NO:436) ANTKAIVLQETANNLNTLLKAKVKALPKGFNETRFLQNCMTVLQDTRNIEKCNSVSVA RTMLKGAFLGLDFFSKECYAIPYNDYKTGKCHLEFQTDYKGERKLMKQYSVRPIKDIYA KVVREGDKFEEIIEKGIPTINFRPKPFSNEKIIGVFAVVLFEDGGLLYETMSVEDVEKIKVG FAKRDKEGNYSKAWTATPEEMYKKTVIRRLRKSVELEFDSVEQQKTYEEASEFDVKRD EEVKEEASPFENVDFEEAEEGNTIEAKQE WP_069686512.1 RecT [Oceanobacillus sp. E9] (SEQ ID NO:437) ATNDSLKNQLSSKQGNQNTPSGYTIKQLMGAESVKKRFEEMLDSKASQFMASVINLVN GDTNLQKCDQMSVVSSAMVAATLDLPIDRNLGYAWVIPYGNQATFQLGYKGYIQLALR SGQYRNINVIEVYEGELQSFNRLTEEIELDFEKRTSDKVIGYTGFFELINGFRKTVYWSKA EIEKHKNKFSKSGFGWKNDWDAMAKKTVVRNMLNKWGILSIDMQKAYVEETKDPSEP NGEVIDLNLTEDELTAAQEQFSDENANE RMD50745.1 [Candidatus Parcubacteria bacterium] (SEQ ID NO:438) STDU2-42312.601 (S22-113) TEYKRPQQPEQTKMLSAKLNQAGAPNKVSSFDVQLRDWFKKHSRKMQTLAGSKEEAN
Figure imgf000193_0001
GLCKLAYNSGVVRSIATEVVYANDLFEFELGTNAYLRHVPTLSDNRGERIAAWCVVKT THGEVIIVKPISFIEGIRKRSPAGNKKDSPWNTSDDDYDAMARKTVLKQALKTIPKSSDL AAAIQVDNAVESGSVDNVVTHIEPVTDPTPEETKE WP_061413958.1 RecT [Lactococcus sp. DD01] (SEQ ID NO:439) ANLTPTQTVLKSDAAKRKFEEVLGKKTNGFVGSLLSLVGSTNLKNVDSNSVMTAAMK AATLDLPIEPSLGFAYVIPYGREAQFQIGYKGLIQLAIRSGQVTKLNAGPVYENQFIKYDS LFEELEIDFSMPQGVEIAGYFASMELANGFRKIIYWDKEKVTAHGKRFSKSFNRSSSPWQ TDFDAMATKTVLKAMLSTYAPLSTEMQQAIVADNESATPKDATPVTDDLVLEAVEDSK QIEENEIINDQVASENYQEPQGEPEVLDLEL WP_147129628.1 RecT [Nocardia ninae] (SEQ ID NO:440) AESISKEVARQANPLAVVAKYQNELGKSMPAAIRGDVGRWMMVAEMAVRKNPKLLSI VQADQGASLMRALIECARLGHEPGTKYFYLVPRGNQISGEEGYHGIIKRVLNSGHYQKV LARTVFERDEYSFDPLTDQLPTHVPASGERGKPVSAYAFALHWDGTPSTVAEASPERIA AAKAKSYGTDRKDSPWQSVTGVMYRKTAIRELEPYVHTSAEPQPRQDNAGSRGAVMD PSTYDDAEPLDADVLDITADQIAEHDGEGAL WP_074846740.1 RecT [Clostridium cadaveris] (SEQ ID NO:441) ATNSSLKNQLSKKENVTIGNTMQGLLNNPKMKKRFEEILDKKAPQYMSSILNLYNGDTS LQKCEPMSVLSSSMIAATMDLPVDKNLGYAWIVPYKNKAQFQMGYKGYIQLALRTGQ YKHINAIEIHEGELVNWNPLTEELEIDFTKKESDKIIGYAGYFELLNGFKKSTYWTKTQIE NHRKKFSKSDYGWNKDFDAMAIKTVIRNMLSKWGILSIEMQNAYTADENIIKDSFIDDS ENVSANIEDLVEADYTVNQDSLESKEEFEGTPLE WP_038246219.1 RecT [Virgibacillus] (multispecies) (SEQ ID NO:442) ATNDSVKNQIANKNQGSNQVNPNNLGLKQLLSTPTMRKKFDEVLDKKAPQFMSSLLNL YSNDSYLQKAEPMSVVTSALVAATLDLPIDKNLGYAWIVPYGGKAQFQLGYKGYIQLA LRTGQYRNINVIEVYEGELKSFNRLTEEMELDFEQKQSDKVIGYTGYFELINGFRKTVY WSKEEIEKHKKRFSKSDFGWKKDWDAMAKKTVIRNMLNKWGILSIDMQKGIVEDNKD PIEKANEFDEQDIIEADFSEVNDDQEIDFSDAQ WP_106064284.1 RecT [Clostridium liquoris] (SEQ ID NO:443) TTASELKNQLATRKETGVGSAGNTVKGLLESPAIKKRFEEVLKQRAPQYMSSIVNLVNG DANLKKCDQMSVIASCMVAATLDLPVDKNLGYAWVVPYGNKAQFQLGYKGYVQLAL RTGQYKSINVIEIHEGELIEWNPLTEELRIDFEKKKSDAIIGYAGYFELINGFRKSTYWTKE QITKHKNKFSKSDFGWKKDFDAMAKKTVLRNMLSKWGILSIEMQNAYTADQETIKSEV LETGNIKENVEYVEADFDVDFEGTPFEEGVTNE WP_028562280.1 RecT [Paenibacillus pinihumi] (SEQ ID NO:444) STDU2-42312.601 (S22-113) ADANKLLVINEKLIKLIESKQDAMPKSFNKTRFIQNCMAVLQDTDEIDKCDATSVARTLL
Figure imgf000194_0001
KEVKDGQRTIQHKPPEGFNDGKVIGAFAIVLYKDGGMDCESMSVAEIETTRKNYSKQA NGPSWTKSPGEMQKKTVLRRLCKTIQLDFDTIEAKEAFEDGGDFDFKQDPKPQQQSPFD KNATVVDAEYEEVEEEDQSESAT WP_068672306.1 RecT [Oceanobacillus sp. Castelsardo] (SEQ ID NO:445) ATNSTLKNQISNKKQGNNQVGKTQGTTMKQLLASPAVMNRFEEVLGKRANQFTASILG LYNSEKMLQKAEPMSVISSAMIAATLDLPVDKNLGYAWIVPYGGKAQFQMGYKGYIQL ALRTGQYRNINVIEVYEGELKKWDRLTEEIELDFESRTSDKVIGYTGYFELINGFRKTVY WSKEDVEKHKKRFSKSDFGWKNDWDAMARKTVIRNMLNKWGILSIDMQKGMVEDSK DPVEVNEEFSSDVIDADYEVVGENEQQDFTVEENA WP_067592792.1 RecT [Nocardia terpenica] (SEQ ID NO:446) SSIAAAAESAEVTPASIINKYRDDIATVLPPKLRERIDRWIRLAIGAVNSNPELISRVRADQ GASMMQALMKCAALGHEPGSGLFHLVPKGSRIEGWEDYKGILQRIDRSGVYARTVIGV VYANDEYSYDQNVDERPRHVRATGDRGEPISSYAYAVYPSGAITTVAEATPEQIASSKS KARGADNAASPWRAPGAPMHRKVAVRLLEKHVATSAEDRREPISRSAANDVVIDATA DYYQEP WP_079708113.1 RecT [Paraliobacillus ryukyuensis] (SEQ ID NO:447) ATNDTLKNQISNKKNNQVAEGKQGTTMKGLLNSPAVMKRFEEVLGKRANQFTASILSL YNNEKTLQKSEPMSVISSAMIAATLDLPIDKNLGYAWIVPYGNKAQFQLGYKGYIQLAL RTGQYRNINVIEVYEGELVKWNRLTEELELDFEQKKSDKVIGYTGYFELINGFRKTVYW SKADIEKHKQKFSKSNFGWSNDWDAMAKKTVIRNMLNKWGILSIDMQKAYSTDEIEQE QESNDFIDGEWAEVSEDDITEAMNEV OLA20462.1 BHW17_09115 [Dorea sp. 42_8] (SEQ ID NO:448) AVNNSLAKRDQSMKLSVYLQNDAVKKQINQVVGGKNGTRFISSIVSAVQSTPALQECTS PSIVNAALLGEALNLSPSPQLGQFYMVPFDNRKKGCKEAQFQLGYKGYIQLAERSGYYK KLNVLAIKEGELIRYDPLDEEIEVELIDDDVIREETPAMGYYAMFEYENGFCIQQKWRSE DFGTFRAGQNSGKGSLEVFFFLVQRF WP_058906805.1 RecT [Lactiplantibacillus plantarum] (SEQ ID NO:449) SNELAHMPMKQLVKQDAIQQMLSRTLADKASQFSTSLINLVNGNQSLAKVDQMSVIQS AMVAATLNLPIDQNLGYMWLVPYKGRATPQIGYKGYIQLAQRTGQYLAMNAIAIHSGE LKGWNPLTEDFQFDPMGRTSDEVIGYVGYFKLTNGFEKTVFWTKASMEEHRMSFSKMS GGKTPQGVWASNYDAMAIKTVLRNMLSKWGPMSIEMEQALANDETAPQTPLNVEAEE SASETTDNMLDKFRQQQGEVNTSDQEHNTEDQGDPRDQS RZT66774.1 RecT [Leucobacter luti] (SEQ ID NO:450) STDU2-42312.601 (S22-113) SDLSQAAVAVKKSKTVEDYLTEYEPQFQRALGKSMDAAKFSQDALTAIKQTPQLGQAD
Figure imgf000195_0001
GAFLIYEKDYFDEGANSERGEFYDFKKSRGDRGPVVGVIAYVKLKGFDESQYVFLDAD TIRSRHRPRYWEKTPWGSDEGEMFKKTGVRVLQKLLPKSVEAAPLALAADADQATVR KVDGIEDLTIQHDVVDAEVVPDGVPV WP_087916041.1 RecT [Paenibacillus donghaensis] (SEQ ID NO:451) SNTQLATIHNNLERLIDSKRDAMPSSFNKTRFLQNCMTVLQDTYGIEKADPVSIARTMLK GAFLGLDFFNKECYAIIYGGKVEFMTDYKGEVKLAKKYSIKRIKDIYAKVVRAGDEFEE TIEGGNQSINFKPLPFNDGEVLGAFAVVVYEDGSMNYDTMSVKEIESIKENFSKKSKDTG QFSKAWVVTTSEMYRKTVLRRLCKNIELDFDTIEAKQAFEDGGDFEFNKDKKPAQESPL NPKSTVIDGEFTAVGEGAADGTE WP_009411480.1 RecT [Capnocytophaga sp. oral taxon 324] (SEQ ID NO:452) ETQVLQKQSLANFLNKSDKFLEQNLGAKKSEFVSNLLALSDSNKELSQCEPADLMKCA MNATALNLPLNKNLGYAYVIPYFDGKTNRTIPQFQMGYKGFVQLAIRSGQYKTINTCEI REGEIKRNKVTGHIDFLGENPSGAVIGYLAYIELLNGFQQSLFMTIEEVQAHARKYSKIY AKTNRGLWKDEFDLMAKKTVLKLLLNRYGVLSVEMQKAIEKDQADNEGNYIDNPQGR YIQDAEVIEQNEPTENAQPVQPVTSEEPNKVDFKDV WP_116232802.1 RecT [Paenibacillus sp. VMFN-D1] (SEQ ID NO:453) AKALLENKLQERAAGASTPSTQGTSLKALLNSPAIKKRFDELLDKRSAQYMTSIVNLYN SDAMLQKAEPMSVISSCIVAATLDLPVDKNLGYAWIVPYSGKAQFQLGYKGYIQLALRT GQYKAINVIEVYEGELVKWNPLTEALELDFEKRKSDAVIGYAGYFELINGFRKSVYWTR EQIESHRKKFSKSDFGWKKDYDAMAKKTIIRNMLSKWGILSIEMQDAYSKEIEAIPPLNN ENEEDPPIDLTPEDYRVGDEPQDGKEQGEMNFE WP_123849158.1 RecT [Chitinophaga lutea] (SEQ ID NO:454) SNVNAPAAPVKSKIEVLKDIMNAPSVQEQFQNALRENSGVFVASVIDLFNSDTYLQNCE PKQVVMECLKAATLKLPINKNLGFAYVVPYKSNGKQIPQFQIGYKGYIQLAMRTGQYRI INADKVYEGEYRTKNKLTGEFDLSGTATSETVVGYFAHIEMLNGFAKTLYMTKEKVAA HAKKYSKSFGKETSPWHTEFDAMALKTVLRNLLSHYGYLSVEMMGAMNADIESDQVG SEVSQTINDKANKQEMTFDDAEVVDDDEKEQNPI WP_078410260.1 RecT [Priestia abyssalis] (SEQ ID NO:455) ATNQSLKNQLQSRQSAGTPAQQSNSLKALLSSPTVKKRFEEVLDKRSAQFMTSIVNLYN SEKMLQKCEPMSVISSAMVAATLDLPVDKNLGYAWIVPYKNTASFQLGYKGYIQLALR TSQYRFINVTPVHEGELMKWNPLTEEIEIDFDARQSDVIIGYAAYFELLNGFRKTVYWTK NQVEKHRKKFAKSDFGWKNDYDAMAMKTVLKAMLSKWGILSIEMQKAYSEDEEPREL KDITEEAQEVDYIEAEVIDVPAEEKASAFDQENFHIE AAT90028.1 phage recombination protein [Leifsonia xyli subsp. xyli str. 
CTCB07] (SEQ ID NO:456) STDU2-42312.601 (S22-113) AVKKNPTIEDYLIKYEPEFQRALGASMDAAKFAQDALTAIKQNPKIGHSDPRSLFGALFL AAQLKLPVGGPLAQFHLTTRTVKGNLTVVPIVGYGGYVQLIMNTGLYSRVSAFLIHAGD YFVTGANSERGEFYDFRRADSDRGEVKGVIAYAKVKGHNESSWVYIDAETMRAKHRP KYWESTPWADDAGEMFKKTGIRVLQKYLPKSVESLNVALAASADQAIVRKVDGVPDL DIQHDRDTETVAVPEQPVSVPQPGDET WP_080022455.1 RecT [Clostridium thermobutyricum] (SEQ ID NO:457) QSTGDIVFPQNYNYSNALKSAQLILAETVDRNKVPVLQSCSKPSICNALLDMVIQGLSPA KKQCYFVPYGGKLQLMKSYLGNIAATKRLKGVKDVFANVIYEGDVFEYKLNLNTGLIEI EKHEQKFENISKKILGAYAVVVRENQNNYVEVMNIEQIKNAWNQGAAKGNSQAHKNF AEEMAKKTVINRACKRFVNTSDDSDTLIESINRTNEYKEEDIIETTKSEVGEEIKENANTE NLGLEDTEVVEAEVIENIEFEGDK WP_081759639.1 RecT [Clostridium jeddahense] (SEQ ID NO:458) LGERTPQFISSIVSLVNADANLQRAFYDAPVTVIQSALKVATFNLPIDPNLGYAYIVPFNN TVKNPDGSIRKRIEASFIMGYKGMNQLALRTGVYKTINVVDVREGELKSYNRLTEDIEL DFVEDDEEREKLPIIGWVGYYRLINGTEKTIYMTRKQIETHEKKNHKGQYMGKGWRED FDSMAMETVFRRLIGKWCLMSIDYQRANPGTLAAADALAHGQFDDEDPLPDAVPLQAE AQEVNPETGEVQS WP_089281299.1 RecT [Anaerovirgula multivorans] (SEQ ID NO:459) DAKHLTVVHQNLNTLLKAKADALPKGFNQTRFLQNCMTVLQDTKDIESVEPKSVARTM LKGAFLGLDFFNKECYAIVYNKKAGNSWIKTLEFQTDYKGEIKLAKKYSINTIKDIYAKL VREGDEFEEGVKDGKQVINFKPKPFNNNKILGAFAVAYYENGSMIYDTMSVEEIESVKK AYAKADKEGKYSKAWIESTGEMYKKTVLRRLCKLIELDFDTIEQKQAFDEGSGMEFKQ EGKTDKPKSSLEAEFVEAEYEEVEESETSEVVEE RDI65706.1 phage RecT family recombinase [Nocardia pseudobrasiliensis] (SEQ ID NO:460) SSIANAANASELTPASIVNRYRDDIAAVLPPKLQARIDRWLRLAIGAVNSNADLVDRVR ADQGASMMQTLMKCAALGHEPGSGLFHLVPKGPRIEGWEDYKGVLQRIDRSGVYARV VVEVVHANDDYAYDPNLDDRPQHKRAAADRGEPVSAYAYAVYPNGAVTAVAEATPE LIAASKAKARGADNASSPWRAPGAPMHRKVAIRQLEKFVATSAEDMREVAVRNAAPD VEDAPADYYQEP WP_076170610.1 RecT [Paenibacillus rhizosphaerae] (SEQ ID NO:461) SSKLVEINSKLDSFLDAQHKAMPKGFNKTRFLQNSMSVLRDIEGLEQCDPKSVALVMLK GAFLGLDFFNKECYPVVYAGKVEFQTDYKGEVKLVKKYSTKPVREIYAKLVRQGDDFS EEIVAGSQTINFKPLPFNNGEIVGAFAVVNYVDGTMQYDTMSTEEIEKIKVNFSRKSKKT NEYSKAWVVTPGEMYKKTVLRRLCKTIDLDFDTIEQAQAFEDAADMDFNQDSKPQQQS PLNPMVIDVEYEEVKEEQADAAEQE WP_106833617.1 RecT [Brevibacillus 
porteri] (SEQ ID NO:462) STDU2-42312.601 (S22-113) ADQNKLVVIYNNLEKLLDSKREAMPTSFNKTRFLQNCMTVLQETKDIELCNPTTVART
Figure imgf000197_0001
DEFAEEINSGNQTINFRPKPFNNEEILGAFAVVNYMDGTMAYDIMSKEEIEKIKENFSRKS KQTGEYSKAWVVTPGEMYKKTVLRRLCKNIDLDFDTIEQRQAFEDAGDVDFNQEVKPA QQSPLNSTVIEAEFEEVSEEQTNAAEQE RDE19343.1 RecT [Parageobacillus thermoglucosidasius] (SEQ ID NO:463) AKQADLKNKLANKNSTNPTAYLKNLVYAPTVQQKFKEVLKEKAAHFLTSLISLVDSSPD LQKCNPMTIIASAMKAATLELPIDKNLGYAWIVPYKNVATFQIGYKGYIQLALRTGLYR SINVIEVYEGELRKWNRLTEELDIDEGARKSDHVIGYAGYFELTNGFIKRVYWSKEDIER HRKKFAKSDFGWENNYDAMAKKTVLRNMLSKWGILSIDMQRAYVNDIDDPEQTKEVI DVEWSEIIEEANVANSPEQQEIVFEQ WP_138600901.1 RecT [Pseudoalteromonas] (multispecies) (SEQ ID NO:464) SLSLQEYQNLLYGKLTACKGQFDACLSENGYKLDFNTELNYVYQIVMSGLNVEYSFPYT PVESVITSFLKAAKIGLSLCPTEQLCFLKTEYSESSGQYVTQLGLGYKGILKLAYRSGKV KQINANVFYEKDNFQYNGVNSKVTHTTTVLSKAMRGQLAGGYCQTELIDGSFKTTVMP PEEILAIEEQGKAMGNEAWLSVHVDQMREKTLIKRHWKTLCPCIYRDSVMNDPMLFDD QDCQHSSNQQAYEEQFESAYSREAY WP_082209600.1 RecT [Peptostreptococcaceae bacterium VA2] (SEQ ID NO:465) QPFLVQRYPHLDVVLNDQVHVLKSFFFQNHIILYLYKYIECLQIFHKPLLKGDRGKVIGY YAVYHLEPNGYNFVFMTYDEVKNHGKKYSKNFEGGIWEKEFDSMAKKTVIKKLLKYA PLSIEMQKAVTFDESVKGSIDNDMLLVESIEDVEEIQLDTNI WP_026627303.1 RecT [Dysgonomonas capnocytophagoides] (SEQ ID NO:466) STQQVQQQTKPLSLANFLNAPSTANFLKETLAEKKSEFVSNLIALCDADPKLAQCDPAQ LMKCAMNATSLNLPLNKNLGYAYVIAYKGVPSFQIGYKGLIQLAIRTGQYKFINATEIRE GEIRHNKITGEVIFNGEKPDAPIVGYMSYLELVNGFTASLYMTEEQIEQHALRFSQTYKN DKQYRSSTSKWSDPLARPTMCKKTVLKLLLGTYGLMTTEFAKALDSDSDDEVSTSGHR FEEAEIVQQGEPNEEQSDEPKRMEI WP_109523733.1 RecT [Nocardia aurea] (SEQ ID NO:467) SESISAAADAQKVTPRIVLDRHRDAFAQVLPPTINLDRWLRLAESAINASAGLLDIFRRD RGASALKALMKCAQLGHEPGSGLFHLVPKGQAIEGWEDYKGILQRILRSALYAKVVVA PVYANDEYAFDVNVDERPRHKQAAGDRGEPVRAYAYAVHRDGSTSTIAEATPAMIAG AKAKGHKTDASTSPWQNPRAPMHQKVAVRELERFVSTSAVDLRVTGDVTDLIIEEP GAE09585.1 [Paenibacillus sp. 
JCM 10914] (SEQ ID NO:468) TMDYVTKIQDALDRELDAKHDALPSGFKKTRFSENCRAYVKDYKDLQKYDAEEVASV LFKGAVLGLDFLAKECHVITEGSALRFQTDYKGEMTLVKKYSVRPILDIYAKNVREGDD FREEISGGKPLIHFNPRAFNNSKITGSFAVALFTDGGMVYETMPAEEIESIREHYGKNPGS STDU2-42312.601 (S22-113) DTWEKSQGEMYKRTVLRRLCKTIEIDFDAEQSLAYEAGSSFEFDREQQPKKRSPFNPPEV EESEVLSNDGITETQ
Figure imgf000198_0001
RRG08833.1 RecT [Lactobacillus sp.] (SEQ ID NO:469) NSLSGALNSRNQAGSPTSMIKNLMRSDSIKNRFDEVMGAKAPQFMASITNLVNSNQDLQ HVDAMSVVASAMVAATLDLPIDPNLGYMYIVPYRGQAQPQMGYKGYIQLALRTGQYK HINALPVYDDEVKSWNPLTEELEYESSGTSHDNQTPAGYVGYFQLINGFEKTTYWTYDQ INSHRQKFSKMSSKTDPTGVWKSNFDAMALKTVLRNLISKWGIMSIEMQQAFVKDERP QEFDHETGEIQDVQEVEAEEENVAPETQGSTDKKEE GEA30849.1 CDIOL_17720 [Clostridium diolis] (SEQ ID NO:470) ATNSSLKNQLIEKEQSTVNVQETIFKNLINSDEIKSKFTEVLKDKAFEYINSIINLVKEIPVP NALGASDSHQSADLGSLLIECEPRSIIDACMIAASLDLSIDKNLEYVWIIPYKKKSNFQLG YKGYIQLLLRTGEYKAINVIEVYEGQLKSWNPLTEEFDIDVSAKKSDAVIGYAGYFEMV NGFRKYVYWSKDNMDAFRNNSFKGDPRWNNDYKAMAKRTVMRNMLSKWGRLSAE MQRAYLEDINTDKFINGN WP_077867213.1 RecT [Clostridium saccharobutylicum] (SEQ ID NO:471) ATNSSLKNQLIEKEQTTVNVQETMFKNLINSDDVKSKFTEVLKDKAIQYINSIINLVNSDK DLIECEPKSIIDACMSAVSLDLSVDKNLEYVEIIPYKKKANFQLGYRGYIQLLLRTGEYKS VNIIEVYEGQLKSWNQLAEEFDIDFTYKKSDAVIGYAGYFEMLNGFRKSVYWSKENMD ALRENSFKSDTRWNNDYKAMAKRAVIRNMISKWGSLSIEMEKAYCEDLNTDKFVNGN WP_132305216.1 RecT [Paenibacillus sp. BK033] (SEQ ID NO:472) AANTQLITIHNNLEKLIEAKKDAMPQGFNKTRFIQNCMTVLQDTYGIEKCEPTTVARTLL KGAFLGLDFFNKECYAIPYGASMNFQTDYKGERKLAKKYSVRKVKDIYAKLVRAGDVF EENITDGQQTIQFAPVPFNNGDIVGAFAVVLFHDGGMLYETMSIAEMEHIKENYSKKSK DTGKFSKAWEVSTGEMYKKTVLRRLCKNIELDFDTIEQARAFEDAADVDFNKKTAPQQ TSPLNVVEAEYEVVNDGSATEAQSE RPI78794.1 EHM45_05245 [Desulfobacteraceae bacterium] (SEQ ID NO:473) ATPNTPTTTDAGDFLKKSEKSLKNYAVRKYDFTSFLKSAMIAINDNTTLSECLRTEAGK KSLFNAMRYAATTGLSLNPQEGKAALIGYKNKAGEMVLNYQIMKNGLIDLALSSGKVE FVTADLVRANDEFSIKKSASGDDYSFSPAIRDRGEVIGFVAALKLKGSATYVKWMSTEE VAEFRDKYSSMYKNRPDASPWTHSFNGMGIKTVMKALLRSVSISPDVDAAVKSDDYIE AEFTVHGTTADDAVTQLQTPSKPVKAEEGQGELL WP_051624047.1 RecT [Clostridium akagii] (SEQ ID NO:474) ATSESLKNQLVNKETRPPKDPFKALVYSAGIKKRFEDMLDKQANGFITSLLNLKQDKLK SCDDFTVLGSALKAAALKLPIDPNLGFAWIIPFKNHGKLEAQFQIGYKGFIQMAQRSGQY KKLNVTEIYEGQLKSFNPLTEEIVLDLDNIKSDLRKINKRYLIVMRMNLLALHLRKISKG STDU2-42312.601 (S22-113) WP_081735325.1 RecT [Paenibacillus gorillae] (SEQ ID NO:475)
Figure imgf000199_0001
LEAKHDALPSGFNAVRFVQNCKAYLPEVRNFERFNPDEIALQFLKGAILGLDFLAKECH VITEGSAARFQTDYKGEMKLAMKHSVRPLLNIYAKNVREGDVFRESVVEGRPVVSFDP LPFNNSKIIGSFAVAQFNDGGMDYESMSSTEIESIRTHYGKNPGSDTWEKSQGEMYKRT ALRRLCKTIEIDFDAEQRLAFDAGSSFEFNREPRPQQQSPLNLESEVLTDEVEQG WP_084505057.1 RecT [Acetobacterium dehalogenans] (SEQ ID NO:476) CLRSWTRSFSNSVPLKIRFRLLYTAFLSQGSPLSSVNTTQSADKGSPIFFYAMFKTKDGG YGFEVMSVEDVRAHAKKYSQSFSSAYSPWSKNFEEMAKKTVLKKALKYAPLKSDFVR GIVVDETIKREISEDMYAAPSIEIEYEVDEDGVIQDEPTSNELTEAEK AGF93134.1 RecT protein [uncultured organism] (SEQ ID NO:477) SNELQNIKPEVFGEVEDKLGSLADNNGIDLPENYSARNALKQAYLKLQSKDEPVFDKYK DETIYNALLDTLTQGLNPGKDQVYYIGYGNHLTAQKSYFGNIALAKRMAGVQEVSSNVI LEGDEVDISIERGQQVIESHDRNFDSMDGQVKGAYAVISFEDERKDKYEIMTLKELKQA WAQGKSFGGNGKSPHHKFTKEMAKKTVINRALKPLIKASDDSGLIKEKPKLEKLKDGQ QERTEGEKIEEVDVDKEEVVEVDYDV WP_076079849.1 RecT [Paenibacillus sp. FSL R7-0333] (SEQ ID NO:478) TVAIELQVQETLDRILDSKHDALPSDFNKKRFSENCKAYVADEKDLHKYSPEEIAANLFK GAVLGLDFLAKECHLISGGVELKFQTDYKGEMKLTKKYSVRPLLDVYAKNVREGDEFR EEVIEGRPVIHFAPLPFNASSIIGSFAVALFQDGGMVYESIPAGEIEEIRKNYGKSLGDAW DKSQGEMYKRTVLRRLCKTIETDFDAEQRLIYDAGGAFEFTKQPARSRQQSPFNPPEESE VTQDDRVAETDQG WP_119800346.1 RecT [Paenibacillus sp. 1011MAR3C5] (SEQ ID NO:479) ATEQIISSLEALLEAKHDALPSGFNPTRFVQNCIAYLPEIRNWDRFNAEDLAIQFFKGAVL GLDFLAKECHIIAEGSGVRFQTDYKGEMKLAMKHSVRPLLTIYAKNVREGDCIEEAVIE GRPVINFNPLPFNNSSISGSFAVAQYTDGGMVYETMSAEEIEAVRTNYGKNPGSDTWDK SKGEMYKRTVLRRLCKTIEIDFDAEQRLAFEAGSEFDFSKQPRPQQRSPFEEKEVGPDEV EQG WP_025706233.1 RecT [Paenibacillus graminis] (SEQ ID NO:480) TVAIETQVQETLDRILDSKHDALPSDFKKKRFSENCKVYVAEEKDLHKYTIDDIVANLFK GAVLGLDFLAKECHLITGGVDLKFQTDYKGEMKLTKKYSVRPMLDVYAKNVREGDIFR EQIIEGRPAIHFDPLPFNASKIIGSFAIALFQDGGMVYESIPAGEIEEIRKNYGKSLGDAWE KSQGEMYKRTVLRRLCKTIETDFDAEQRLIYEMGGAFEFTKQPTRSRQQSPFNPPEESEV IQNDRAAETDQG OIO76374.1 AUJ88_06865 [Gallionellaceae bacterium CG1_02_56_997] (SEQ ID NO:481) STDU2-42312.601 (S22-113) GRKEVERIRDGSRGYQAAKKYKKESTWDTDFVAMGLKTAIRRICKFLPKSPELATALA
Figure imgf000200_0001
AALRDANSVEALDEIYIRAEGDLDDANLEIAMREYRKCKDAISNSLI WP_131535536.1 RecT [Pedobacter nototheniae] (SEQ ID NO:482) STEQSQQQTAARVPAKFQEGTVDSILKRVSDFQNTGELVLPANYIPENAVRAAWLMLM ETTDRNDKPAIEVCTKESIANAFLEMVTKGLSVVKKQCYFVVYGNKLSLEDSYIGKIAIA KREAGVKEVNAVTIYEGDIFKYENDIETGRKRILEHKQELKNINPDKIVGA WP_028113352.1 RecT [Ferrimonas kyonanensis] (SEQ ID NO:483) NQMINEPDFVTALKDSRETYIDLTQNGGFNLNYGLEAGWAHQQIEASRYQNLDLTCSEP GSIMQAFCEAARLGLSFDPRKKHIYLMGQKDVQSGRTITILYVGYKGMIALACRTGFMI GGHADLVFEEDTFTYRSGTQLPVHEHDGRPNHERGRLKCGYVVAHQPGGMVKTLLVP KEVLLEAASNGLNAGGSNNTWCGPYMEMMYQKTCWRYAFNAWYSELEAVGMTQAQ LESATTAVSYQ WP_100916003.1 RecT [Pseudoalteromonas spongiae] (SEQ ID NO:484) NKFQHLQTELSSQLLSTKERFNELNNKNNLKVNFEEEYNFFYHLVTSSFYNINGIATCTF SSLKEAFLNIAKYGLSINPKLNLCYIRTEQSCAQANVNIAVYDFGYKGLLKLITRTGKVKI VTADVFYENDNFEFRGTREPVKHSTKTLSAAARGAMAGGYCSSELVAGGVVTTIMTPE ELREIESICQSTGNEAWNSVFIDELRRKTLIKRHWKTLMQVIEEQNLSVPIEETYQCDFAN GGY WP_125711747.1 RecT [Companilactobacillus kedongensis] (SEQ ID NO:485) MKDLARIPVKELVRSDTIKSKFNDVLGKRAPQFISSIVNIVNSNQDLKNVDQTSVISSALV AASLDLPINQSFGYMYLVPYSGKAQPQMGYKGYIQLAQRSGQYKRLNAISVSKEKVPD KMVIFIPDYRMEEAETQIDMYQDHIEDVKAGRVEPTRCGKCDYCKSTAKLGKIVSMDD LID WP_002845682.1 RecT [Peptostreptococcus anaerobius] (SEQ ID NO:486) SNQVTESKKGYVAEKNITDSALNAINKYMNDGVLHLPKTYSVENAMKSAYLTLSQAKD KNGKSVLESCTKESIYQSLLDMAVQGLTPAKNQCYFIPYGSKLTMSRSYLGTIAVTKSA VPEVKDVKGYAIYDKDVFETEFDYNTGCIKIKKFERNFDSIDTNSIKGAFALIIGEHGVLH TEVMNMAQIRNAWSMGATNGKSKAHNQFTDQMAIRTVINRACKFYINTSDDTSVLFAD SYANSDEDTSSEREVEIVDENVREKK WP_115407185.1 RecT [Shewanella morhuae] (SEQ ID NO:487) QTAQVKLSVPHQQVYQDNFNYLSSQVVGHLVDLNEEIGYLNQIVFNSLSTASPLDVAAP WSVYGLLLNVCRLGLSLNPEKKLAYVMPSWSETGEIIMKLYPGYRGEIAIASNFNVIKN ANAVLVYENDHFRIQAATGEIEHFVTSLSIDPRVRGACSGGYCRSVLMDNTIQISYLSIEE MNAIAQNQIEANMGNTPWNSIWRTEMNRVALYRRAAKDWRQLIKATPEIQSALSDTEY STDU2-42312.601 (S22-113) WP_081955873.1 RecT [Helicobacter trogontum] (SEQ ID NO:488)
Figure imgf000201_0001
SNITTIQRKNEALALLENKEIQERLCALCGNEASKDKFKASLLNIALDSNLSACSMQSIVK ASLDIAGLKLSLNKNLGKAYIVPRKVKIGNDYITEARIDIGYKGWLELAKRSKLSVKAHS VFDCDDFVYSVDGVDEYMKLTPNFELRQEHDSAWVKEHLKGIVVGIKDLKSGDSEVKF VSKGTLLKIMQKNDSVKNGKYSAYTDWLHEMLLAKAIKSCLSKTAMSEDTFYLIISNNK LFI WP_064664300.1 RecT [Pseudoalteromonas sp. MQS005] (SEQ ID NO:489) SISQQDYENLLYSKLYECESQYQAYLAEHNEKLNFNAELNYMYKAVMSGVGIEGGFPY TPLESIVESFLKAAKLGLSLDPSEQFCFLRSQYDHSTGLYHTELGLGYKGVLHLAYRSGK VKQIVSNVFYNKDNFQFNGPNSKVTHTMTVLSTSARGNLAGGYCQTELVDGSFIVTVM PPEEILAIEEQGKSVGNPAWLSAHVNQMREKTLILRHWKTLYPAIYSSSLLDSAQIFDDE CEEFPFSSPSQGFSESQTIGSY WP_069455496.1 RecT [Shewanella xiamenensis] (SEQ ID NO:490) QTAQVKLSVPHQQVFQDNFNYLSSQIVGHQVDLNEEIGYLNQIVFNSLATTSPLDVAAS WSVYRLLLNVCRLGLSLDPEKKLAYVIPSLSETGEKIMKLYPGYRGEIAIASNANVLKNA NAVLVYENDHFRIQAATGEIEHFVTSLSIDPRVRGACSGGYCRSVLMDGSVLMSYLSIEE MDSIAQHQIEANMGNTPWNSIWRTEMNRVALYRRAAKDWRQLIKATPEMQSSLLDTEF RTL04618.1 EKK58_09925 [Candidatus Dependentiae bacterium] (SEQ ID NO:491) CHVLNFQTDYKGEIKLAHKYSVRKIIDIYAKVVRDGDVLEIRVENGSQIVNFNPKVFNDG KIIGAFAVVKFVDGSLLYETMSKSEIDHTRVTFSKMPNGMAWKDSEGEMCRKTVLRRIC KLIDLHFDSVEQEQAWNDGSDADLTKNEPVKPEIQNPFPTKAVEAVIVTEEEKLRKQLK DKDPTLQDWQIDALVREHKEANQ

Example 18

[00395] Exemplified by CRISPR-Cas9 systems, gene editing has become a powerful tool for probing the mechanisms of human health and disease. Cas9 editing can cause DNA damage at on- and off-target sites and relies on endogenous DNA repair mechanisms that are error-prone. These features often lead to unwanted mutations and safety concerns, which can be exacerbated when long sequences are altered. Building on prior studies showing that mammalian genomic DNA becomes transiently accessible upon dCas9 DNA-unwinding and R-loop formation, Applicants hypothesized that single-strand annealing proteins (SSAPs) could stimulate DNA strand exchange for gene editing when coupled to a dCas9-guideRNA complex. Thus, Applicants developed a cleavage-free gene-editing tool that uses the catalytically dead dCas9 to knock in long sequences.
Applicants’ data demonstrated that this dCas9-based editor had very low editing errors at target loci, minimal detectable off-target effects, and higher overall accuracy than Cas9 editors. Meanwhile, the dCas9-SSAP editor had efficiencies comparable to those of Cas9 editors, with robust performance across human cell lines and stem cells. The dCas9-SSAP editor was able to insert sequences of variable lengths, up to kilobase scale. In experiments where Applicants chemically inhibited DNA repair enzymes, dCas9-SSAP editing demonstrated notable independence from endogenous mammalian repair pathways. For convenient viral delivery of the dCas9-SSAP editor to challenging cell types, Applicants performed truncation and aptamer engineering to minimize its size so that it fits into a single AAV vector for future applications. Overall, this tool opens opportunities towards safer genome engineering in mammalian cells.

[00396] Since the initial demonstration of CRISPR-Cas9 gene editing, significant efforts have improved and expanded gene-editing technologies for studying genome function, modeling biological processes, and developing gene therapies. New generations of gene-editing tools, such as base editing and prime editing, have substantially improved the efficiency and fidelity of gene editing and are powerful for altering relatively short sequences. Most gene-editing tools work by cleaving genomic DNA to induce single-strand nicks (SSNs) or double-stranded breaks (DSBs) that facilitate targeted editing. These DNA modifications are often repaired by error-prone endogenous pathways such as non-homologous end-joining (NHEJ) (12). This process often leads to unwanted mutations and off-target effects, which could result in toxicity and raise safety concerns. Such editing errors and off-target effects become increasingly, and sometimes prohibitively, severe when engineering long genomic sequences (≥100 bp). These unwanted effects limit the application of gene editing to large-scale genomic knock-in and in vivo gene editing.
Mention is made of WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, WO2020191171 and WO2021/226558 that involve what is known as prime editing and twin prime editing. Each of WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, WO2020191171 and WO2021/226558 is hereby incorporated herein by reference. RTs of WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, WO2020191171 and WO2021/226558 can be used in the practice of the present invention. Linkers or ways to functionally link of WO2020/191241, WO2020/191153, WO2020/191245, WO2020/191243, WO2020/191233, WO2020/191246, WO2020/191249, WO2020/191239, WO2020/191234, WO2020/191242, WO2020/191248, WO2020191171 and can be used in the practice of the present invention.

[00397] Available CRISPR-based methods for long-sequence editing, such as homology-directed repair (HDR) or microhomology-mediated end-joining (MMEJ), rely on Cas9 cutting and often trigger random indel formation within the genome. Many recent efforts, such as chemical enhancers, fusion of enhancement domains, and modified donor DNAs, have enhanced precise long-sequence editing. Nicking-based HDR has been shown to reduce editing errors but could lead to lower efficiency. Thus, there remains a need for efficient, safer CRISPR editing tools for long-sequence alterations.

[00398] Bacteriophages evolved enzymes that take advantage of accessible replicating genomic DNA to perform precise recombination. Applicants reasoned that the key enzyme for microbial recombination, namely the single-strand annealing protein (SSAP), could be useful for gene editing in mammalian cells, and that it would neither cleave DNA explicitly nor rely on the error-prone pathway required by Cas9 editing. Motivated by this hypothesis and by Applicants’ prior work showing the ability of SSAPs to stimulate genomic recombination, Applicants developed a gene-editing tool using the deactivated Cas9 (dCas9, or catalytically dead Cas9) and microbial SSAPs. This dCas9 editor uses the SSAP for knock-in editing when supplied with a donor DNA, without the need for genomic DNA cleavage. Applicants termed it the dCas9-SSAP editor (dCas9-SSAP).

[00399] To optimize dCas9-SSAP, Applicants performed a metagenomic search of SSAPs focusing on RecT homologs, and identified EcRecT as the most efficient one for human genome knock-in. For validation, Applicants conducted a series of genome engineering and chemical perturbation experiments. Applicants’ data showed that dCas9-SSAP had knock-in efficiencies comparable to those of wild-type Cas9 references, and significantly higher than those of Cas9 nickase editors.
dCas9-SSAP achieved up to 12% knock-in efficiency without selection, across multiple genomic targets and cell lines, for kilobase-scale sequence editing. More importantly, Applicants’ data showed that this new tool generates nearly zero on- and off-target errors. In an assay for 1-kb sequence knock-in, dCas9-SSAP had less than 0.3% editing errors across all cells, while Cas9 editors had similar yields but an additional 10%-16% incorrectly edited cells. Across loci tested, dCas9-SSAP had 90%-99.6% editing accuracies, while the accuracies of Cas9 editors ranged from 10% to 38% (FIG. 39F).

[00400] Further, Applicants probed the mechanism of dCas9-SSAP editing by inhibiting several DNA repair enzymes and performing cell cycle synchronization. In these experiments, dCas9-SSAP demonstrated less dependence on the endogenous DNA repair pathways than Cas9 editing. Results of Applicants’ cell cycle assays supported the hypothesized mechanism of the dCas9 editor; they are consistent with the known biophysical and biochemical properties of dCas9.

[00401] Finally, to help with delivery of dCas9-SSAP for future applications, Applicants optimized its molecular design using structure-guided truncation and obtained a minimized dSaCas9-mSSAP, achieving over 50% reduction in size while retaining similar levels of efficiency. This minimal dCas9 editor would allow convenient delivery using viral vectors such as adeno-associated virus (AAV), potentially useful for hard-to-transfect cell types or in vivo applications. Overall, the dCas9-SSAP editor is capable of efficient, accurate knock-in genome engineering. With room for further improvement, it has potential research and therapeutic value as a cleavage-free gene-editing tool for mammalian cells.

[00402] Using phage SSAPs for dCas9 knock-in gene editing.
Most CRISPR-based editors capable of long-sequence knock-in require SSNs or DSBs, which can trigger the competing, error-prone NHEJ pathways, resulting in variable efficiency and accuracy. In contrast, bacteriophages evolved DNA-modifying enzymes, e.g., Lambda Red, to integrate themselves into the genomes of host bacteria via sequence homology. Such precise phage integration relies on a major homology-directed step: recombination between genomic and donor DNA is stimulated by SSAPs, e.g., Lambda Bet or its functional homolog, RecT. From prior studies, Applicants reasoned that phage SSAPs may not rely on DNA cleavage thanks to their unusual ATP-independent activity, in contrast to the ATP-dependent RAD51 protein in human cells. The high affinity of phage SSAPs for single- and double-stranded DNAs may allow attachment to donor templates when multiple SSAPs are recruited to genomic targets via RNA-guided dCas9. The SSAPs could then promote genomic-donor DNA exchange without cleavage, as target DNA strands become transiently accessible during dCas9-mediated DNA-unwinding and R-loop formation.

[00403] Based on this hypothesis, Applicants designed a system to recruit SSAPs to catalytically dead Cas9 (dCas9) (FIG. 38A). The dCas9 protein cannot cut DNA but retains the ability to unwind target sites and form an R-loop, rendering the non-target strand putatively accessible for SSAP-stimulated homologous recombination. To test this, Applicants engineered and evaluated three major types of microbial SSAPs: lambda Bet protein (lambda bet); E. coli Rac prophage RecT (Rac RecT); and phage T7 gp2.5 (T7 gp2.5). Applicants recruited these SSAPs to the deactivated version of S. pyogenes Cas9 (dSpCas9, simplified as dCas9 hereafter) via an RNA aptamer MS2 stem-loop (FIGS. 38A, 38C).
This MS2 aptamer was inserted into the sgRNA scaffold, and the candidate SSAPs were fused to an N-terminal MS2 coat protein (MCP) that binds specifically to the MS2 aptamer, thus allowing multiple SSAPs to form a complex with dCas9-guideRNA. To measure their gene-editing activity in human cells, Applicants generated knock-in donors with an 800-bp transgene encoding a fluorescent protein (FP) cassette flanked by homology arms (HAs), which allow in-frame insertion of the FP into housekeeping genes, e.g., DYNLT1, HSP90AA1, ACTB (FIG. 38B, left). Upon precise knock-in, Applicants measured the percentage of FP-expressing cells to quantify the gene-editing efficiency (FIGS. 38B-38D). Applicants’ initial test identified that RecT has higher knock-in editing activity than the other SSAPs in human cells, whereas no editing above background was observed with dCas9-only or non-target controls (FIGS. 38C, 38D). Applicants validated this knock-in editing using gel electrophoresis and sequencing (FIG. 44). This provided evidence that coupling an SSAP to dCas9 via an RNA aptamer enables knock-in gene editing.

[00404] Development of dCas9-SSAP as a mammalian gene-editing tool. Applicants conducted metagenomic mining to identify the best SSAP for mammalian gene editing. Applicants focused on RecT homologs and sought to maximize evolutionary diversity via a phylogenetic analysis. Applicants systematically searched the NCBI non-redundant sequence database for RecT homologs and identified 2,071 initial candidates. Applicants then built phylogenetic trees, filtered out proteins with high sequence homology, and subsampled the evolutionary branches, obtaining 16 highly diverse SSAP candidates (FIG. 44).

[00405] Applicants examined the SSAP candidates by knock-in screening, evaluating their editing efficiencies across three genomic loci: HSP90AA1, DYNLT1, and ACTB (FIG. 38E).
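By way of illustration only, the redundancy-filtering step described in paragraph [00404] can be sketched as a greedy pass that keeps a candidate only if it is sufficiently dissimilar from every candidate already kept. This is a minimal sketch under stated assumptions and is not the Applicants' pipeline: the function names, the 0.9 default threshold, and the toy sequences are hypothetical, and the string-similarity ratio from Python's standard library is used as a crude stand-in for an alignment-based sequence identity and for the phylogenetic subsampling actually described.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in (assumption) for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_redundant(candidates: Dict[str, str], max_similarity: float = 0.9) -> List[str]:
    """Greedily keep candidates whose similarity to every kept one is below the threshold.

    `candidates` maps a hypothetical identifier to a protein sequence.
    Iteration follows insertion order, so earlier entries take precedence.
    """
    kept: List[str] = []
    for name, seq in candidates.items():
        if all(similarity(seq, candidates[k]) < max_similarity for k in kept):
            kept.append(name)
    return kept


# Toy sequences for illustration only (not taken from the sequence listing).
toy = {
    "cand_a": "MKVLAAGIVG",
    "cand_a_like": "MKVLAAGIVA",  # differs from cand_a by one residue
    "cand_b": "QQQPPPWWWY",
}
diverse = filter_redundant(toy, max_similarity=0.8)
```

In this toy input, `cand_a_like` is dropped as a near-duplicate of `cand_a`, while `cand_b` is retained. A practical analysis would replace `similarity` with identities derived from a multiple sequence alignment or with distances on the phylogenetic tree.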
Among all candidates, EcRecT demonstrated the highest efficiency for dCas9 editing: it achieved genomic knock-in of a kilobase cassette with up to ~6% efficiency in human cells. This was significantly higher than dCas9 controls without SSAP, which were comparable to the no-donor controls, suggesting that dCas9 alone cannot perform genomic knock-in (FIG. 38E). To measure possible background insertion of donor DNA, Applicants included non-target controls using guideRNAs that do not recognize the genomic targets and observed activity comparable to that of the no-donor negative control (FIG. 38E). Applicants also tested SSAP with a non-target control, confirming that expressing SSAP alone is not sufficient for knock-in (FIG. 38E). Lastly, Applicants tested the new editor with different donor DNA designs (FIG. 38F). The results suggested that SSAP-mediated editing is more efficient with HDR donors than with MMEJ donors, and that longer homology arms generally increase editing efficiency (FIG. 38F). This is consistent with prior reports that MMEJ relies on DNA breaks, which are absent in dCas9 editing. Taken together, the proposed dCas9 editor enabled efficient knock-in editing in human cells, with EcRecT as the top SSAP. In what follows, Applicants focus on this top design, referred to as dCas9-SSAP.

[00406] Characterizing the accuracy of dCas9-SSAP gene-editing. The motivation for developing dCas9-SSAP is to perform potentially safer, cleavage-free dCas9 editing with the help of SSAP. Thus, Applicants experimentally evaluated the accuracy of dCas9-SSAP for knock-in editing where the target sequence is ~1 kb in length. Applicants measured the on-target error, off-target insertion, cell fitness effect, and editing yields of dCas9-SSAP, in comparison with Cas9 references.

[00407] On-target error analysis. There are two types of on-target errors: (1) on-target indel formation, whose occurrence means that knock-in is unsuccessful; and (2) knock-in errors, in which knock-in happens but is imperfect and junction indels occur.

[00408] To evaluate (1), Applicants used deep sequencing to measure the on-target indel formation of the dCas9 editor. Applicants used a nested PCR design with an initial primer binding outside the donor DNA to avoid template contamination (FIG. 39A, FIG. 46). Deep sequencing of on-target sites showed that the dCas9 editor’s level of on-target error is as low as that of negative controls, in contrast to the high levels of indel formation observed for the Cas9 editor (FIG. 39A).

[00409] To evaluate (2), Applicants benchmarked the knock-in errors of dCas9-SSAP and measured junction indels.
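By way of illustration only, the deep-sequencing readout of on-target indel formation described in paragraph [00408] can be sketched as the fraction of aligned reads that carry an insertion or deletion within a window around the target site. This is a minimal sketch under stated assumptions and is not the Applicants' analysis pipeline: it assumes reads have already been aligned to the reference amplicon and are available as (0-based start position, CIGAR string) pairs, and the function names, window coordinates, and example reads are hypothetical.

```python
import re
from typing import Iterable, Tuple

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel_in_window(pos: int, cigar: str, win_start: int, win_end: int) -> bool:
    """Return True if the read has an insertion or deletion inside [win_start, win_end].

    `pos` is the 0-based leftmost reference position of the alignment.
    """
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "I":
            # An insertion consumes no reference; it sits at the current position.
            if win_start <= ref <= win_end:
                return True
        elif op == "D":
            # A deletion spans reference positions ref .. ref + n - 1.
            if ref <= win_end and ref + n - 1 >= win_start:
                return True
            ref += n
        elif op in "MN=X":
            ref += n  # these operations consume reference bases
        # S, H and P consume no reference bases
    return False


def indel_fraction(reads: Iterable[Tuple[int, str]], win_start: int, win_end: int) -> float:
    """Fraction of reads with at least one indel inside the window (0.0 if no reads)."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(has_indel_in_window(p, c, win_start, win_end) for p, c in reads)
    return hits / len(reads)


# Hypothetical reads: unedited, deletion in window, insertion in window,
# and a deletion outside the window.
example_reads = [(100, "50M"), (100, "20M2D28M"), (100, "25M3I22M"), (100, "5M1D44M")]
fraction = indel_fraction(example_reads, win_start=115, win_end=130)
```

Restricting the count to a window around the target site is a common design choice because it reduces the contribution of sequencing errors elsewhere in the amplicon; the appropriate window size is an assumption that would need to be set for each assay.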
Applicants clonally isolated edited cells and amplified the knock-in genomic loci using a similar two-step nested PCR design to avoid contamination (FIG. 39B, FIG. 46). Applicants then assessed the edited genomic alleles via Sanger sequencing. The long-read Sanger sequencing allowed Applicants to thoroughly examine the entire knock-in junctions. Applicants’ results indicated that, while MMEJ donors are more efficient than HDR donors when using Cas9, they also led to a significantly higher percentage of editing errors (FIG. 39B). More importantly, dCas9-SSAP outperformed Cas9-HDR and Cas9-MMEJ in terms of the percentage of clones with no knock-in errors (FIG. 39B, FIGS. 47-48). At one locus, dCas9-SSAP achieved 100% knock-in success (within the limit of assay sensitivity, see Methods).
[00410] Off-target error analysis. Applicants evaluated the off-target knock-in error of dCas9-SSAP editing via a genome-wide transgene insertion assay (FIGS. 39C-39E, FIG. 49). Briefly, Applicants isolated high-molecular-weight genomic DNA, followed by fragmentation and UMI-adapter ligation, and then used transgene-specific primers for unbiased identification of insertion sites within the genome (FIG. 39C). Through a previously validated analysis pipeline modified from Cas9 genome-wide off-target work (Methods), Applicants were able to identify enriched peaks of reads that represent high-abundance transgene insertion sites (FIG. 39D). For this analysis, Applicants also performed down-sampling to ensure that all groups have the same sequencing depth/coverage. Considering insertion sites with >1% of total aligned reads, Applicants’ results confirmed that dCas9-SSAP had no detected off-target insertion site, while Cas9 references led to a significant number of off-target error sites (FIG. 39E). Notably, when considering all sites with at least one UMI aligned, all dCas9-SSAP samples had significantly fewer off-target sites than the Cas9 editor (FIG. 49). This result suggests that dCas9-SSAP could help to address the off-target issues that are prominent for long-sequence knock-in.
[00411] Cell fitness effect and editing yield analysis. Applicants also compared the fitness of cells that went through Cas9/dCas9-based editing. Applicants experimented with two target sites, and the data suggest that dCas9 editing in general leads to higher cell fitness, defined by the normalized percentage of live cells after editing, than Cas9 editing (FIGS. 39F, 39G).
[00412] For the full picture, Applicants summarized editing yields for dCas9-SSAP in comparison with Cas9 references. Applicants tabulated the percentage of accurate knock-ins, the percentage of knock-ins with errors, and the percentage of on-target indels without knock-ins, where the sum of the latter two is the total on-target errors (FIG. 39H). Applicants also measured the overall accuracy rate of editing, which is defined as the ratio between successful knock-in cells and total edited cells (FIG. 39H). In this analysis, Applicants observed that Cas9 editors suffered from frequent errors for long-sequence editing, where the percentage of erroneous edits is significantly higher than the yields, and their accuracy rate ranges from 10% to 38%. While dCas9-SSAP had knock-in yields similar to those of the best Cas9 references, it had minimal error and achieved a 90%-99% accuracy rate across genomic loci.
[00413] Benchmarking the efficiency of dCas9-SSAP editing with Cas9 editing. Having
established that dCas9-SSAP has higher accuracy for knock-in editing, Applicants next validated its efficiency and usage. Applicants benchmarked its editing efficiency across different cell lines. For benchmarks, Applicants experimented with both wild-type and nicking-based Cas9 (nCas9) editors, including three HDR-enhancing tools. Applicants examined their 1-kb knock-in activities across the three genome targets in human HEK293T cells. Results from this comparison demonstrated that dCas9-SSAP achieved higher efficiencies than the Cas9, nCas9, and nCas9-hRAD51 nickase editors, and efficiencies comparable to Cas9-HE and Cas9-GEM, two published HDR-enhancing editors (FIG. 40A). Additionally, Applicants’ data showed that a single-guide dCas9-SSAP editor was sufficient for effective knock-in, with minor improvement when using two guideRNAs (FIGS. 50C-50D). Thus, Applicants concluded that dCas9-SSAP had levels of efficiency similar to those of the Cas9-based editors.
[00414] Next, Applicants evaluated the editing efficiencies of dCas9-SSAP with different donor DNA designs (FIG. 40C). Applicants’ results indicated that SSAP-mediated editing is more efficient when using HDR than MMEJ donors and that longer HAs generally result in a higher editing efficiency. Applicants also evaluated the editing efficiency of dCas9-SSAP when the knock-in sequence has variable length, up to 2 kb for dual-FP knock-in (FIG. 40D). Applicants’ data showed that dCas9-SSAP performed consistently, with efficiencies comparable to and often higher than Cas9 references across the transgene lengths tested (FIG. 40D).
[00415] Lastly, Applicants tested whether the dCas9-SSAP editor has robust activity across genomic targets, and whether it is applicable in more challenging cases beyond one model cell line. Applicants selected four additional endogenous loci from house-keeping genes (BCAP31, HIST1H2BK, CLTA, RAB11A) in addition to the three previously tested ones (DYNLT1, HSP90AA1, ACTB) (FIG. 40E).
Across all genomic sites, the dCas9-SSAP editor demonstrated efficiencies of up to 12% without selection, comparable to and often slightly higher than Cas9 references using the same donors (FIG. 40E).
[00416] Further, Applicants applied dCas9-SSAP to three cell lines with distinct tissue origins (cervix-derived HeLa cells, liver-derived HepG2 cells, and bone-derived U-2OS cells). Applicants observed consistent knock-in efficiencies comparable to Cas9 references in all three lines (FIG. 42). Finally, Applicants used the dCas9-SSAP editor in human embryonic stem cells (hESCs) to engineer sequences in a more therapeutically relevant setting. Applicants observed robust knock-in editing activity across all three genomic sites tested (FIGS. 40F, 40G). Of note, dCas9-SSAP editing used short ~200-bp HAs and achieved up to ~3% efficiency for kb-scale editing without selection, comparable to and often higher than the Cas9 references in human stem cells (FIG. 40G, FIG. 52).
[00417] Chemical perturbations suggest dCas9-SSAP gene-editing has less dependence on endogenous DNA repair pathways. Recall Applicants’ model that dCas9-SSAP performs gene editing without DNA cleavage or dependence on an endogenous repair pathway. To better understand the nature of dCas9-SSAP editing, Applicants used three orthogonal chemical perturbations to probe its mechanism (FIG. 41).
[00418] First, Applicants investigated whether dCas9-SSAP editing depends on the DSB repair pathway as Cas9 editing does (FIG. 41A). In Cas9-mediated knock-in, the recognition of DSBs by the Mre11-Rad50-Nbs1 (MRN) complex is a necessary step for downstream HDR repair. Applicants leveraged Mirin, a potent chemical inhibitor of DSB repair, which has been shown to prevent MRN complex formation, ATM activation, and Mre11 exonuclease activity. Applicants treated cells with Mirin and tested the editing efficiencies of dCas9-SSAP and Cas9 references on these cells.
Across all genomic targets, Applicants observed that the dCas9-SSAP efficiencies were nearly unaffected by the Mirin treatment and essentially the same as those of vehicle-treated groups (FIG. 41B, Mirin). Meanwhile, Cas9 references demonstrated substantially reduced editing efficiencies under the Mirin treatment, which suggests that Cas9 editing depends on DSB repair (FIG. 41B, Mirin).
[00419] Second, Applicants investigated the dependence of dCas9-SSAP on the HDR pathways. Applicants used two small-molecule inhibitors of the HDR enzyme RAD51, RI-1 and B02, to block this rate-limiting step. Applicants’ data showed that blocking RAD51 activity via these two inhibitors significantly reduced Cas9 editing efficiencies at all genomic targets, but did not have a significant effect on dCas9-SSAP editing (FIG. 41B, RI-1 and B02). These two repair-modulating experiments generated consistent results: dCas9-SSAP showed significantly less dependence on the endogenous DNA repair mechanisms than Cas9 references. They suggest that dCas9-SSAP acts through the activity of SSAP when recruited by the dCas9-guideRNA complex and differs from Cas9 editing.
[00420] Third, Applicants investigated how cell cycling affects the dCas9-SSAP editor. Cell cycling has been shown to facilitate the accessibility of mammalian genomes. More specifically, the genome replication (during S phase) may provide a favorable environment for the dCas9 to
unwind DNA and allow SSAP-mediated recombination (FIG. 41C). To test this, Applicants synchronized cells at the G1/S boundary using the double thymidine block (DTB). DTB treatment indeed reduced dCas9-SSAP editing efficiencies (FIG. 41D). Nonetheless, when Applicants combined Mirin, RI-1, or B02 with DTB treatment, dCas9-SSAP maintained higher editing efficiencies than Cas9 references across the genomic loci tested (FIG. 41D). This further supported that the dCas9-SSAP editor had less dependence on endogenous repair pathways.
[00421] Taken together, Applicants’ data supported the hypothetical mechanism of dCas9-SSAP editing: RNA-guided dCas9 binds to genomic targets and makes them accessible to the SSAP, so that SSAP promotes homology-directed recombination without generating any DNA break (FIG. 38A). A deeper understanding of this process will require further investigation, e.g., biophysical analysis of the dCas9-SSAP complex as it performs gene-editing, or additional assays to perturb mammalian genome accessibility. Applicants hope the results will lead to helpful insights for further developing dCas9 editing approaches.
[00422] Minimization of dCas9-SSAP gene-editing tool for convenient delivery. Finally, to optimize the dCas9-SSAP editor for potential future applications, Applicants sought to develop a minimal version compatible with the size limitations of viral vectors such as AAV. Applicants designed 14 different truncated EcRecT variants based on its secondary structure prediction (FIG. 42A, FIG. 53), and tested all constructs for their gene-editing activities alongside full-length dCas9-SSAP controls. From the optimization results, Applicants identified a short RecT variant (~200 aa in length) that had efficiencies comparable to the original full-length RecT-based design (FIG. 42B).
[00423] Applicants next integrated this short RecT variant with the more compact SaCas9 system and the smaller N22-BoxB aptamer design to build a minimal functional dSaCas9-mSSAP editor (FIG. 42C). This allowed Applicants to fit the dSaCas9-mSSAP into a single AAV and employ a >=4-kb donor AAV for long-sequence editing (FIG. 42C). Applicants tested the dSaCas9-mSSAP editor via delivery of AAV2 particles, and confirmed that it had efficiencies comparable to the full-length version in HEK293T cells (FIG. 42D). This design, while needing further in vivo validation, could provide a convenient option for the delivery of this dCas9 knock-in editor.
[00424] Overall, the dCas9-SSAP editor harmonizes the RNA-guided programmability of CRISPR genome-targeting with the SSAP activity of the phage enzyme RecT. It enables long-sequence editing with minimal DNA damage and provides research and therapeutic possibilities
for addressing some of the currently intractable diseases involving large disease- delivering therapeutic genes in vivo where selection methods are limited, or minimizing undesirable modifications during gene-editing. Compared with other long-sequence editing methods that depend on endogenous repair pathways following DNA cleavage, dCas9-SSAP and its mini-version facilitate homology-mediated gene editing via non-cutting dCas9s. This efficient, low-error technology offers a new and complementary approach to existing CRISPR editing tools.
[00425] Plasmid construction. Human codon-optimized DNA fragments were ordered from GenScript, Genewiz and IDT DNA. The fragments encoding the recombination enzymes were Gibson assembled into backbones (Addgene plasmid #61423) using Q5® High-Fidelity 2X Master Mix (New England BioLabs). The amino acid sequences for these SSAPs can be found in Table 10. All sgRNAs were inserted into backbones (dCas9-SSAP and dSaCas9-SSAP plasmids) using Golden Gate cloning. dCas9-SSAP plasmids bearing BbsI (dSpCas9) and BsaI (dSaCas9) sites as gRNA backbones were sequence-verified (Eton and Genewiz). The sgRNA sequences used in this research can be found in Table 8. All dCas9-SSAP plasmids will be deposited to Addgene for open access.
[00426] Cell culture. Human Embryonic Kidney (HEK) 293T, HeLa, HepG2 and U2OS cells were maintained in Dulbecco’s Modified Eagle’s Medium (DMEM, Life Technologies) with 10% fetal bovine serum (FBS, BenchMark), 100 U/mL penicillin, and 100 µg/mL streptomycin (Life Technologies) at 37 ºC with 5% CO2. HEK 293T, HeLa, HepG2 and U2OS cells were obtained from American Type Culture Collection (ATCC). The identities of the cell lines were authenticated regularly by short tandem repeat (STR) assay, and the lines were routinely tested for the presence of Mycoplasma using a qPCR assay.
[00427] hES-H9 cells were maintained in mTeSR1 medium (StemCell Technologies) at 37 ºC with 5% CO2.
Culture plates were pre-coated with Matrigel (Corning) 12 hours prior to use. 10 µM Rho Kinase inhibitor Y27632 (Sigma) was added for the first 24 hours after each passaging. Culture medium was changed every 24 hours.
[00428] Transfection. HEK293T, HeLa, HepG2 and U2OS cells were seeded into 96-well plates (Corning) 12-24 hours prior to transfection at a density of 30,000 cells/well, and 250 ng of total DNA was transfected per well. Cells were transfected with Lipofectamine 3000 (Life Technologies) following the manufacturer’s instructions when the cells were ~70% confluent. In brief, Applicants used 250 ng of total DNA and 0.4 µl of Lipofectamine 3000 reagent, mixed with 10 µl of Opti-MEM per well. For the 250 ng of DNA, Applicants used 160 ng of dCas9-SSAP guideRNA plasmids (for the double-sgRNA design, equal amounts of the two guideRNA plasmids, e.g., 80 ng each), 60 ng of pMCP-RecT or GFP control plasmid (Addgene #64539) and 30 ng of PCR template DNA (the PCR primers can be found in Table 9, and the template sequences in Supplementary Sequences). Three days later, the cells were analyzed using FACS.
[00429] Electroporation. For hES-H9 transfection, the P3 Primary Cell 4D-Nucleofector™ X Kit S (Lonza) was used following the manufacturer’s protocol. In brief, the hES-H9 cells were resuspended using Accutase (Innovative Cell Technology) and washed with PBS twice before the electroporation. For each reaction, 300,000 cells were nucleofected with 4 µg of total DNA mixed in 20 µl of electroporation buffer using the DC100 Nucleofector Program. For the 4 µg of DNA, Applicants used 2.6 µg of dCas9-SSAP guideRNA plasmids (for the double-sgRNA design, equal amounts of the two guideRNA plasmids, e.g., 1.3 µg each), 1 µg of pMCP-RecT or GFP control plasmid and 0.4 µg of PCR template DNA (the PCR primers can be found in Table 9, and the template sequences in Supplementary Sequences).
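The per-reaction DNA amounts stated above for lipofection (250 ng total) and nucleofection (4 µg total) can be scaled to several reactions with simple arithmetic. The following Python sketch is illustrative only and is not part of the protocol: the names `LIPOFECTION_96_WELL`, `NUCLEOFECTION` and `dna_mix`, and the optional `overage` parameter for pipetting excess, are assumptions introduced here.

```python
# Per-reaction DNA masses quoted in the text, in ng.
LIPOFECTION_96_WELL = {"guide_plasmid": 160.0, "ssap_plasmid": 60.0, "template": 30.0}
NUCLEOFECTION = {"guide_plasmid": 2600.0, "ssap_plasmid": 1000.0, "template": 400.0}


def dna_mix(recipe, n_reactions=1, n_guides=1, overage=0.0):
    """Return ng of each DNA component for a master mix.

    recipe      -- per-reaction masses (ng), e.g. LIPOFECTION_96_WELL
    n_reactions -- number of wells or cuvettes to prepare
    n_guides    -- 1 or 2; with 2, the guide plasmid mass is split equally
    overage     -- hypothetical fractional excess (0.1 = 10% extra), default none
    """
    if n_guides not in (1, 2):
        raise ValueError("n_guides must be 1 or 2")
    if n_reactions < 1 or overage < 0:
        raise ValueError("n_reactions must be >= 1 and overage >= 0")

    scale = n_reactions * (1.0 + overage)
    per_guide = recipe["guide_plasmid"] / n_guides

    mix = {}
    for i in range(n_guides):
        mix[f"guide_plasmid_{i + 1}"] = per_guide * scale
    mix["ssap_plasmid"] = recipe["ssap_plasmid"] * scale
    mix["template"] = recipe["template"] * scale
    return mix


if __name__ == "__main__":
    # One 96-well lipofection with two guideRNA plasmids.
    for name, ng in dna_mix(LIPOFECTION_96_WELL, n_guides=2).items():
        print(f"{name}: {ng:.0f} ng")
```

With two guides, the sketch reproduces the split given in the text (80 ng each for lipofection, 1.3 µg each for nucleofection), and the component masses sum to the stated totals.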
After electroporation, the cells were seeded into 12-well plates with 1 mL of mTeSR1 medium supplemented with 10 µM Y27632. Culture medium was changed every 24 hours. Four days later, the cells were analyzed using FACS.
[00430] Fluorescence-activated cell analysis (FACS). mKate knock-in efficiency was analyzed on a CytoFLEX flow cytometer (Beckman Coulter; Stanford Stem Cell FACS Core). 72 hours after transfection or 96 hours after electroporation, cells were washed twice with PBS and dissociated with TrypLE Express Enzyme (Thermo Fisher Scientific). The cell suspension was then transferred to a 96-well U-bottom plate (Thermo Fisher Scientific) and centrifuged at 300g for 5 minutes. After removing the supernatant, pelleted cells were resuspended in 50 µl of 4% FBS in PBS, and cells were analyzed within 30 minutes after preparation.
[00431] Sanger Sequencing and NGS of knock-in junctions. HEK293T cells transfected with plasmid DNA and HDR templates were harvested 72 hours after transfection. The genomic DNA of these cells was extracted using the QuickExtract DNA Extraction Solution (Biosearch Technologies) following the manufacturer’s protocol. The target genomic region was amplified using specific primers outside of the homology arms of the HDR template. The primers used for Sanger sequencing or NGS analysis can be found in Table 9. PCR products were purified with Monarch PCR & DNA Cleanup Kit (New England BioLabs). 100 ng of purified product was
sent for Sanger sequencing with target-specific primers (EtonBio or Genewiz).
[00432] Treatment with HR and cell cycle inhibitors. All inhibitors were ordered from Sigma-Aldrich. For the different inhibitor assays, the cells were pretreated with Mirin (Sigma, M9948-5MG, 25 µM), B02 (Sigma, SML0364, 10 µM) or RI-1 (Sigma, 553514-10MG-M, 1 µM) for 16 hours. For the cell cycle test, the cells were pretreated with thymidine (Sigma, T9250-1G, 2 mM) for 18 hours; the thymidine was then removed and the cells were cultured in normal D10 medium without thymidine for 9 hours, after which a second round of thymidine was added to a final concentration of 2 mM for another 18 hours. After the inhibitor and thymidine treatments, the cells were transfected with dCas9-SSAP using Lipofectamine 3000 following the manufacturer’s instructions. Three days later, the cells were analyzed on a CytoFLEX flow cytometer, and genomic DNA was also harvested for sequencing validation as above.
[00433] Next-Generation Sequencing Library Preparation. 72 hours after transfection, genomic DNA was extracted using QuickExtract DNA Extraction Solution (Biosearch Technologies). 200 ng of total DNA was used for NGS library preparation. Genes of interest were amplified using specific primers (Table 9) in the first-round PCR reaction. Illumina adapters and index barcodes were added to the fragments in a second-round PCR using the primers listed in Table 9. Round 2 PCR products were purified by gel electrophoresis on a 2% E-gel using the Monarch DNA Gel Extraction Kit (New England BioLabs). The purified product was quantified with the Qubit dsDNA HS Assay Kit (Thermo Fisher) and sequenced on an Illumina MiSeq system using paired-end PE300 kits. All sequencing data will be deposited to the NCBI SRA archive.
[00434] TOPO cloning experiment. A total of 250 ng of genomic DNA was used for the TOPO cloning experiments.
The knock-in events were amplified using specific TA colony primers targeting the DYNLT1 or HSP90AA1 locus (Table 9) with Phusion Flash High-Fidelity PCR Master Mix (ThermoScientific, F-548L). The targeted PCR products were purified using a gel extraction kit (New England BioLabs, T1020L) following the manufacturer’s instructions. An A-tail was added to the PCR products using Taq polymerase (New England BioLabs, M0273S) by incubation at 72 ºC for 30 minutes. The TOPO cloning reaction and transformation were set up following the manufacturer’s instructions (Thermo Scientific, K457501). The colony plates were sent for RCA/colony sequencing using M13F (5´-GTAAAACGACGGCCAG-3´ (SEQ ID NO:623)) and M13R (5´-CAGGAAACAGCTATGAC-3´ (SEQ ID NO:624)) primers. The sequence results were analyzed
using SnapGene software.
[00435] High-throughput Sequencing Data Analysis. Processed (demultiplexed, trimmed, and merged) sequencing reads were analyzed to determine editing outcomes using CRISPResso2 by aligning sequenced amplicons to reference and expected HDR amplicons. The quantification window was increased to 10 bp surrounding the expected cut site to better capture diverse editing outcomes, but substitutions were ignored to avoid inclusion of sequencing errors. Only reads containing no mismatches to the expected amplicon were considered for HDR quantification; reads containing indels that partially matched the expected amplicons were included in the overall reported indel frequency. The computation work was supported by the SCG cluster hosted by the Genetics Bioinformatics Service Center (GBSC) at the Department of Genetics of Stanford. All customized scripts for data analysis will be deposited to Github under Cong Lab and made available for download.
[00436] Insertion site mapping and analysis. Applicants adapted a previously developed process (GIS-seq) for the genome-wide, unbiased off-target analysis of mKate knock-in, following a protocol similar to that of Applicants’ previous study. Briefly, Applicants harvested the HEK293T cells 3 days after transfection. The genomic DNA was size-selected via the DNAdvance genomic DNA kit (A48705, Beckman Coulter) to avoid template contamination in the following step. 400 ng of purified genomic DNA was fragmented to an average of 500 bp using NEB Fragmentase, ligated with adaptors, and size-selected using the NEBNext Ultra II FS DNA Library Prep kit following the manufacturer’s instructions. Two rounds of nested anchored PCR were then performed to amplify the targeted DNA (from the end of the knock-in sequence to the ligated adaptor sequence), followed by a size-selection purification according to the NEBNext Ultra II FS DNA Library Prep kit protocol. The libraries were sequenced using Illumina MiSeq V3 PE600 kits.
Sequencing data was analyzed to determine off-target insertion events with all analysis code deposited to Github (github.com/cong-lab).
[00437] Statistical Analysis. Unless otherwise stated, all statistical analyses and comparisons were performed using a t-test, with a 1% false-discovery rate (FDR) using the two-stage step-up method of Benjamini, Krieger and Yekutieli. All experiments were performed in triplicate unless otherwise noted to ensure sufficient statistical power in the analysis.
[00438] SSAP mining process. For the initial SSAP screening, Applicants identified the three major families of phage recombination enzymes, from bacteriophage lambda, E. coli Rac prophage, and bacteriophage T7, and extracted the primary enzyme sequences as listed in the Supplementary Sequences.
[00439] For RecT-like SSAP mining, the RefSeq non-redundant protein database was downloaded from NCBI on October 29, 2019. Applicants systematically searched the NCBI non-redundant sequence database for RecT homologs. Applicants’ search followed two guidelines: 1) closely related candidates are less likely to have differential activities; 2) it is difficult to predict which microbial enzymes function well when heterologously expressed in eukaryotic cells, so sampling diverse evolutionary branches of RecT homologs would be ideal. After identifying a large set of 2,071 candidates, Applicants built phylogenetic trees and selected representative candidates after filtering out proteins with high sequence homology. Then, Applicants used a threshold of at least 10% sequence divergence and sizes of up to 300 aa (to avoid extremely large proteins that are hard to synthesize and less portable) to refine the hits, and randomly sampled the evolutionary branches to obtain a final list of 16 SSAPs (FIG. 38E, FIG. 44).
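The refinement step above (at least 10% sequence divergence, sizes of up to 300 aa) can be illustrated with a short Python sketch. This is not Applicants’ pipeline: the text does not state how divergence was computed, so the sketch assumes that candidates are supplied as rows of one multiple sequence alignment, measures divergence as the percentage of differing aligned columns, and keeps candidates greedily in input order. The function names `percent_divergence`, `ungapped_length` and `filter_candidates` are hypothetical.

```python
def percent_divergence(row_a, row_b):
    """Percent of aligned columns that differ between two alignment rows.

    Assumption: both rows come from the same multiple sequence alignment,
    use '-' for gaps, and columns where both rows are gaps are ignored.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    columns = [(a, b) for a, b in zip(row_a, row_b) if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    differing = sum(1 for a, b in columns if a != b)
    return 100.0 * differing / len(columns)


def ungapped_length(row):
    """Protein length in residues, ignoring alignment gaps."""
    return len(row.replace("-", ""))


def filter_candidates(aligned, min_divergence=10.0, max_length=300):
    """Greedy filter: drop oversized proteins and near-duplicates.

    aligned -- dict mapping candidate name to its alignment row.
    A candidate is kept only if it diverges by at least min_divergence
    percent from every candidate already kept.
    """
    kept = {}
    for name, row in aligned.items():
        if ungapped_length(row) > max_length:
            continue
        if all(percent_divergence(row, other) >= min_divergence for other in kept.values()):
            kept[name] = row
    return list(kept)


if __name__ == "__main__":
    # Toy alignment (made-up sequences); max_length lowered to fit the toy data.
    toy = {
        "a": "MKTAYIAKQR--",
        "b": "MKTAYIAKQR--",  # identical to "a", filtered out
        "c": "MKTAYIAKQW--",  # 1 of 10 columns differs (10%), kept
        "d": "MKTAYIAKQRLL",  # 12 residues, exceeds max_length=10
    }
    print(filter_candidates(toy, max_length=10))
```

A greedy pass depends on input order; sampling across phylogenetic branches, as described in the text, is a separate step that this sketch does not attempt.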
Overall, the SSAP candidates have significant evolutionary and sequence heterogeneity, while retaining conserved regions that have been previously suggested to be important for their biochemical activities.
[00440] The multiple sequence alignment of RecT homologs was performed using the online tool T-Coffee (tcoffee.crg.cat/apps/tcoffee/do:regular).
[00441] Donor design test comparing Cas9 HDR, Cas9 MMEJ, and dCas9-SSAP. As shown in FIG. 38F, Applicants tested the new editor with different donor DNA designs. Applicants considered three major types of donor DNAs with different homology arm (HA) length designs. Specifically, Applicants synthesized: 1) HDR donors bearing long HAs (>=100 bp), a standard format for long-sequence engineering and transgene knock-in; 2) MMEJ donors with typically short HAs (<=50 bp), which have been shown to improve editing efficiencies for DSB-mediated knock-in; 3) NHEJ donors without HAs (0 bp), which could help gauge the levels of donor integration due to Cas9-induced DSBs. Applicants’ results from these tests revealed two characteristics of dCas9-SSAP that are distinct from Cas9 gene-editing.
[00442] Firstly, for the NHEJ donors without any HAs (highlighted box in FIG. 38F), Applicants observed knock-in cassette expression when using the Cas9 editor but not the dCas9-SSAP editor (FIG. 38F). This is consistent with previous reports that Cas9-mediated DSBs can induce NHEJ-mediated donor DNA insertion, whereas this integration is minimal when using the non-cutting dCas9-SSAP (FIG. 38F, dCas9-SSAP with NHEJ 0bp donor).
[00443] Secondly, dCas9-SSAP benefited from successively longer HAs within the donor, regardless of whether the HAs are of the HDR type or the MMEJ type, in contrast to the Cas9 editor, which showed a boost in knock-in efficiencies when using the MMEJ donors (FIG. 38F, HDR and MMEJ donors).
This is consistent with the assumption that the enhancing effect of MMEJ donors depends on Cas9 cleavage of target genomic sites.
[00444] Further, while the focus of this work is long-sequence engineering, Applicants also tested dCas9-SSAP for shorter sequence editing (FIG. 45) and observed precise knock-in of a 16-bp sequence into the EMX1 locus in human HEK293T cells. This experiment allowed Applicants to verify, using deep sequencing, the minimal indel formation when using dCas9-SSAP compared with the Cas9-based editor (FIG. 45B).
[00445] In summary, dCas9-SSAP editing is most efficient when using HDR donors, and longer homology arms generally increase editing efficiency.
[00446] Step-by-step gene-editing protocol using dCas9-SSAP plasmids.
A. Design of guideRNA sequences at target genomic loci
[00447] This step is the same as in standard Cas9 experiments. Briefly, based on the Cas9 enzyme used, a target sequence (usually 20 bp) near the knock-in or editing site can be selected next to the protospacer adjacent motif (PAM). For SpCas9 use “NGG” and for SaCas9 use “NNGRRT”. Applicants usually append an extra “G” base to the beginning of the guide sequence to facilitate U6/Pol-III transcription initiation if the first base of the guide sequence is not “G”. Two DNA oligos can be ordered based on the selected guides, with Golden Gate cloning overhangs, as shown below.
5’ –CACCGNNNNNNNNNNNNNNNNNNN –3’
3’ –CNNNNNNNNNNNNNNNNNNNCAAA –5’
N denotes the guide sequence. Standard desalted oligos are sufficient for this cloning. The two oligos above will be annealed to form the insert fragment in the next step.
[00448] B. Annealing of two DNA oligos for each guideRNA target. Perform phosphorylation and annealing of each pair of oligos via the reaction setup below.
oligo1 Top (100uM) 1ul
Figure imgf000217_0001
oligo2 Bottom (100uM) 1ul g the following parameters:
Figure imgf000217_0002
37C 30 min
95C 5 min and then ramp down to 25C at 5C/min
[00450] C1. Golden Gate Cloning of annealed oligos into sgRNA/dspCas9 (dCas9-SSAP) plasmid
[00451] For wild-type Cas9 test, one guide RNA is needed and the backbone vectors for the cloning will bear BbsI cloning sites matching the annealed oligos from Step B. The wild-type Cas9 plasmids for this step will be: pCas9-MS2-BB_BbsI (see list of plasmids at end of protocol)
Item Volume Note
Figure imgf000217_0003
as needed. After setting up the Golden Gate reaction (on ice), immediately move the reaction into a thermocycler and perform the Golden Gate reaction using the following parameters:
37C 5 min
16C 5 min
cycle for ~25 cycles; additional cycles, up to 50, could be used to maximize efficiency
65C 5 min
4C hold
[00453] After the reaction, perform bacterial transformation as per the standard protocol of the competent cells used in the lab.
[00454] C2. Golden Gate Cloning of annealed oligos into sgRNA/dspCas9 (dCas9-SSAP) plasmid
[00455] For dCas9-SSAP using dSpCas9, one or two guide RNAs can be used, with double guideRNAs providing slightly better editing efficiency. The backbone vectors for the cloning will bear BbsI cloning sites matching the annealed oligos from Step B. The dCas9-SSAP plasmids for this step will be: pdCas9-SSAP-MS2-BB_BbsI (see list of plasmids at end of protocol)
Item Volume Note
Figure imgf000218_0001
e.
[00457] D. Preparation of HDR templates
[00458] Please refer to Supplementary Sequences for the templates used in the study; examples of template designs are illustrated in FIG. 38. Applicants recommend using a dsDNA template with at least 200 bp of homology arms on each end of the insertion/replacement sequence (the edited portion of the template). Applicants suggest cloning the template into a simple plasmid such as pUC19; restriction digestion of the plasmid or standard PCR (using primers such as those listed in Table 9) can then be employed to generate large amounts of dsDNA template.
[00459] E. Perform gene-editing via delivery of dCas9-SSAP plasmids and template DNA
[00460] With previous steps, the three components of dCas9-SSAP editing method are ready
Figure imgf000219_0001
for experiments: the guideRNA/Cas9 plasmid (cloned in steps A-C), the template (from step D), and the SSAP plasmid (pMCP-RecT, can be obtained from Addgene). For delivery into cells in vitro, routine transfection or electroporation can be performed following the conditions recommended by the reagent or equipment manufacturer and selected based on the cell type. For HEK293T cells as an example, a typical transfection condition is described below:
[00461] 1. One day before transfection, seed 3E4 HEK293T/HeLa/HepG2/U2OS cells in each well of a 96-well plate; the cell density should be around 70% on the next day at the time of transfection.
[00462] 2. For Lipofectamine 3000 as the transfection reagent, use a total of 250 ng DNA + 0.4 ul Lip3000 reagents (ea.) and perform the reagent setup using 10 ul of Opti-MEM per well, as in the manufacturer's protocol.
[00463] 3. Transfection material: dCas9-SSAP guideRNA plasmids, 160ng (for the double-sgRNA design, use equal amounts of the two guideRNA plasmids, e.g., 80ng each); pMCP-RecT or GFP control plasmid, 60ng; template DNA, up to 30ng.
[00464] 4. Mix plasmids with template DNA and perform transfection according to the manufacturer's protocol for HEK293T/HeLa/HepG2/U2OS cells.
[00465] 5. 12-24 hours after transfection, switch to fresh media if applicable.
[00466] 6. At least 3 days post transfection, cells can be harvested or used for downstream experiments or analysis as needed.
List of Plasmids (available at Addgene via plasmid ID)
Figure imgf000219_0002
STDU2-42312.601 (S22-113) p-dCas9-SSAP-MS2-BB_BbsI pU6-MS2-gRNA-backbone(BbsI)-CBH-
Figure imgf000220_0001
es
Figure imgf000220_0002
are: guides starting with sp indicate SpCas9 guide RNA targets, and guides starting with dsp indicate dSpCas9 guide RNA targets.
Table 8. Sequence for gRNAs
Figure imgf000220_0003
dSa-AAVS1-guide1 AAVS1 CACAGTGGGGCCACTAGGGA (SEQ ID NO:496)
Figure imgf000221_0001
Figure imgf000221_0002
[00469] Sequences for primers used for DNA template generation, targeted sequencing, and NGS assays are listed below. All NGS adapter sequences are shown underlined color. Table 9. Primer Sequences
Figure imgf000221_0003
RecT-264aa-R Truncation RecT AATCACAGAGTACTCGCCGGTCAG (SEQ ID NO:510)
Figure imgf000222_0001
mKate-PCR-0-F PCR template DYNLT1 GGAAGCGGAGCTACTAACTT (SEQ ID NO:518)
Figure imgf000223_0001
STDU2-42312.601 (S22-113) HSP90AA PCR HSP90A AGAAGTAGACGGAAGCGG 1-PCR- templat A1 (SEQ ID NO:524) T A
Figure imgf000224_0001
STDU2-42312.601 (S22-113) Junction Junction mKate CCATCTCATCCCTGCGTGTCTCCGAGGCCGACAAA NGS-3’ NGS GAGACA G C
Figure imgf000225_0001
STDU2-42312.601 (S22-113) CLTA- PCR CLTA CCAGAAGCACTCAAACATGCTG PCR-F templat (SEQ ID NO:545)
Figure imgf000226_0001
Table 10 - SSAP Sequences
Figure imgf000226_0002
MGTALTPLLTKFATRYEMGTTPEEVANTLKQTCFKGQVNDSQMVALLI VADQYKLNPFTKELYAFPDKNNGIVPVVGVDGWARIINENPQFDGMEFS
Figure imgf000227_0001
MPKQPPIAKADLQKTQGARTPTAVKNNNDVISFINQPSMKEQLAAALPR HMTAERMIRIATTEIRKVPALGDCDTMSFVSAIVQCSQLGLEPGGALGH
Figure imgf000228_0001
Figure imgf000229_0001
Figure imgf000229_0003
es are detailed below with each of the templates. Unless otherwise noted, when different homology arms are used in the Example, Applicants used primers listed in Table 9 to obtain templates with different homology arm lengths.
DYNLT1 P2A-mKate knock-in HDR template sequence (SEQ ID NO:548)
Left Homology Arm (italicized)-Insertion Sequence-Right Homology Arm (bold)
Figure imgf000229_0002
(Underlined is the inserted mKate fluorescent protein sequence; the preceding non-underlined part is the P2A peptide sequence)
AGTGACCTGTGTAATTATGCAGAAGAATGGAGCTGGATTACACACAGCAAGTTCCT TAAATGCTGCTCTCTTCCCTCCCGCAGGGAGCTGCACTGTGCGATGGGAGAATAA GACCATGTACTGCATCGTCAGTGCCTTCGGACTGTCTATTGGAAGCGGAGCTACTA
Figure imgf000230_0001
GTATCCCTGTAGGTCACCTGCAGCCTGCGTTGCCACTTGTCTTAACTCTGAATATT TCATTTCAAAGGTGCTAAAATCTGAAATCTGCTAGTGTGAAACTTGCTCTACTCTC TGAAATGATTCAAATACACTAATTTTCCATACTTTATACTTTTGTTAGAATAAATTA TTCAAATCTAAAGTCTGTTGTGTTCTTCATAGTCTGCATAGTATCATAAACG
HSP90AA1 P2A-mKate knock-in HDR template sequence (SEQ ID NO:549)
Left Homology Arm (italicized)-Insertion Sequence-Right Homology Arm (bold) (Underlined is the inserted mKate fluorescent protein sequence; the preceding non-underlined part is the P2A peptide sequence)
Figure imgf000230_0002
GCAGCAAAGAAACACCTGGAGATAAACCCTGACCATTCCATTATTGAGACCTTAAG
ATCCCCGACTTCTTTAAGCAGTCCTTCCCCGAGGGCTTCACATGGGAGAGAGTCA CCACATACGAAGATGGGGGCGTGCTGACCGCTACCCAGGACACCAGCCTCCAGG
Figure imgf000231_0001
TGTTTTTCTTTATTTTTGTTAATATTAAAAAGTCTGTATGGCATGACAACTACTTTA AGGGGAAGATAAGATTTCTGTCTACTAAGTGATGCTGTGATACCTTAGGCACTAA AGCAGAGCTAGTAATGCTTTTTGAGTTTCATGTTGGTTTATTTTCACAGATTGGGG TAACGTGCACTGTAAGACGTATGTAACATGATGTTAACTTTGTGGTCTAAAGTGTT TAGCTGTCAAGCCGGATGCCTAAGTAGACCAAATCTTGTTATTGAAGTGTTCTGA GCTGTATCTTGATGTTTAGAAAAGTATTCGTTACATCTTGTAGGATCTACTTTTTGA ACTTTTCATTCCCTGTAGTTGACAATTCTGCATGTACTAGTCCTCTAGAAATAGGT TAAACTGAAGCAACTTGATGGAAGGATCTCTCCACAGGGCTTGTTTTCCAAAGAA AAGTATTGTTTGGAGGAGCAAAGTTAAAAGCCTACCTAAGCATATCGTAAAGCTG TTCAAAAATAACTCAGACCCAGTCTTGTGGA
AAVS1 P2A-mKate knock-in HDR template sequence (SEQ ID NO:550)
Left Homology Arm (italicized)-Insertion Sequence-Right Homology Arm (bold) (Underlined is the inserted mKate fluorescent protein sequence; the preceding non-underlined part is the P2A peptide sequence)
GATGCTCTTTCCGGAGCACTTCCTTCTCGGCGCTGCACCACGTGATGTCCTCTGA
ACAACGTCAAGATCAGAGGGGTGAACTTCCCATCCAACGGCCCTGTGATGCAGAA GAAAACACTCGGCTGGGAGGCCTCCACCGAGACACTGTACCCCGCTGACGGCGG
Figure imgf000232_0001
CTCTCTCCTTGCCAGAACCTCTAAGGTTTGCTTACGATGGAGCCAGAGAGGATCC TGGGAGGGAGAGCTTGGCAGGGGGTGGGAGGGAAGGGGGGGATGCGTGACCT GCCCGGTTCTCAGTGGCCACCCTGCGCTACCCTCTCCCAGAACCTGAGCTGCTC TGACGCGGCTGTCTGGTGCGTTTCACTGATCCTGGTGCTGCAGCTTCCTTACACT TCCCAAGAGGAGAAGCAGTTTGGAAAAACAAAATCAGAATAAGTTGGTCCTGAG TTCTAACTTTGGCTCTTCACCTTTCTAGTCCCCAATTTATATTGTTCCTCCGTGCGT CAGTTTTACCTGTGAGATAAGGCCAGTAGCCAGCCCCGTCCTGGCAGGGCTGTG GTGAGGAGGGGGGTGTCCGTGTGGAAAACTCCCTTTGTGAGAATGGTGCGTCCT AGGTGTTCACCAGGTCGTGGCCGCCTCTACTCCCTTTCTCTTTCTCCATCCTTCTT TCCTTAAAGAGTCCCCAGTGCTATCTGGGACATATTCCTCCGCCCAGAGCAGGGT CCCGCTTCCCTAAGGCCCTGCTCTGGGCTTCTGGGTTTGAGTCCTTGGC
OCT4 P2A-mKate knock-in HDR template sequence (SEQ ID NO:551)
Left Homology Arm (italicized)-Insertion Sequence-Right Homology Arm (bold) (Underlined is the inserted mKate fluorescent protein sequence; the preceding non-underlined part is the P2A peptide sequence)
Figure imgf000232_0002
GCGACTATGCACAACGAGAGGATTTTGAGGCTGCTGGGTCTCCTTTCTCAGGGGG
ATTGGGAACACAAAGGGTGGGGGCAGGGGAGTTTGGGGCAACTGGTTGGAGGG AAGGTGAAGTTCAATGATGCTCTTGATTTTAATCCCACATCATGTATCACTTTTTT CTTAAATAAAGAAGCCTGGGACACAGTAGATAGACACACTT
ACTB P2A-mKate knock-in HDR template sequence (SEQ ID NO:552)
Left Homology Arm (italicized)-Insertion Sequence-Right Homology Arm (bold) (Underlined is the inserted mKate fluorescent protein sequence; the preceding non-underlined part is the P2A peptide sequence)
TGTGGTGTGTGGGGAGCTGTCACATCCAGGGTCCTCACTGCCTGTCCCCTTCCCT CCTCAGATCATTGCTCCTCCTGAGCGCAAGTACTCCGTGTGGATCGGCGGCTCCA
Figure imgf000233_0001
TGGCTTGACTCAGGATTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTG GAGCGAGCATCCCCCAAAGTTCACAATGTGGCCGAGGACTTTGATTGCACATTG TTGTTTTTTTAATAGTCATTCCAAATATGAGATGCGTTGTTACAGGAAGTCCCTTG CCATCCTAAAAGCCACCCCACTTCTCTCTAAGGAGAATGGCCCAGTCCTCTCCCA AGTCCACACAGGGGAGGTGATAGCATTGCTTTCGTGTAA
EMX1 HDR template sequence (SEQ ID NO:553)
Left Homology Arm (italicized)-Insertion Sequence-Right Homology Arm (bold) (Underlined is the inserted BsrGI restriction site, e.g., TGTACA)
Figure imgf000233_0002
CATTCTGCCTCTCTGTATGGAAAAGAGCATGGGGCTGGCCCGTGGGGTGGTGTCC TGCCATCCCCTTCTGTGAATGTTAGACCCATGGGAGCAGCTGGTCAGAGGGGACC CCGGCCTGGGGCCCCTAACCCTATGTAGCCTCAGTCTTCCCATCAGGCTCTCAGC
Figure imgf000234_0001
CCCACAGGGCTTGAAGCCCGGGGCCGCCATTGACAGAGGGACAAGCAATGGGC TGGCTGAGGCCTGGGACCACTTGGCCTTCTCCTCGGAGAGCCTGCCTGCCTGGG CGGGCCCGCCCGCCACCGCAGCCTCCCAGCTGCTCTCCGTGTCTCCAATCTCCC TTTTGTTTTGATGCATTTCTGTTTTAATTTATTTTCCAGGCACCACTGTAGTTTAGT GATCCCCAGTGTCCCCCTTCCCTATGGGAATAATAAAAGTCTCTCTCTTAATGAC ACGGGCATCCAGCTCCAGCCCCAGAGCCTGGGGTGGTAGATTCCGGCTCTGAG GGCCAGTGGGGGCTGGTAGAGCAAACGCGTTCAGGGCCTGGGAGCCTGGGGTG GGGTACTGGTGGAGGGGGTCAAGGGTAATTCATTAACTCCTCTCTTTTGTTGGGG GACCCTGGTCTCTACCTCCAGCTCCACAGCAGGAGAAACAGGCTAGACATAGGG AAGGGCCATCCTGTATCTTGAGGGAGGACAGGCCCAGGTCTTTCTTAACGTATTG AGAGGTGGGAATCAGGCCCAGGTAGTTCAATGGG
DYNLT1 mKate-T2A-EGFP HDR template (SEQ ID NO:554)
Left Homology Arm (italicized)-mKate-T2A linker-EGFP-Right Homology Arm (bold) (Underlined are the inserted mKate/EGFP fluorescent protein sequences, with the connecting non-underlined T2A peptide sequence)
TGCCGTAAATGCTGCTCTCTTCCCTCCCGCAGGGAGCTGCACTGTGCGATGGGAG
Figure imgf000234_0002
GGAGAATCCTGGCCCAGGTGGTTCTGCCGGTGGCTCCGGTTCTGGCTCCAGCGG TGGCAGCTCTGGTGCGTCCGGCACGGGTACTGCGGGTGGCACTGGCAGCGGTT CCGGTACTGGCTCTGGCGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGC CCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCG GCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCA
Figure imgf000235_0001
TCTCTTTGTTTTGTGGCACTTTCACAATGTAGAGGAAAAAACCAAATGACCGCAC TGTGATGTGAATGGCACCGAAGTCAGATGAGTATCCCTGTAGGTCACCTGCAGC CTGCGTTGCCACTTGTCTT
HSP90AA1 mKate-T2A-EGFP HDR template (SEQ ID NO:555)
Left Homology Arm (italicized)-mKate-T2A linker-EGFP-Right Homology Arm (bold) (Single underlined are the inserted mKate/EGFP fluorescent protein sequences, with the connecting non-underlined T2A peptide sequence)
TACTGTCTTGAAAGCAGATAGAAACCAAGAGTATTACCCTAATAGCTGGCTTTAAGA
GCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCT GCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTT
GTGCTGTGACTTCCTGTTAGAACTTGTTGAAAGCCTATTGTGTCACGTGTACTTTC CACCATGTAATGGCGTTCTAACGTGAG
BCAP31 P2A-mKate knock-in HDR template sequence (SEQ ID NO:557)
Left Homology Arm (italicized)-Insertion Sequence-Right Homology Arm (bold) (Underlined is the inserted mKate fluorescent protein sequence; the preceding non-underlined part is the P2A peptide sequence)
AGGCCTTTGGGTGCAGCTGGGGAGGGGGCCCCTTGTTCACTTGAATAGCTGTTGT TAGGAGAGAGGGGAACCGAGGTGGACCTCTGGGGCATGGGGCTGGAGGTGGCA
Figure imgf000237_0001
CATTACAGGGGACCTGATTGCTACACGTTCAGAATGCGTTTGCTGTCATCCTGCT TGGCCTGGCCAGGCCTGGCACAGCCTTGGCTTCCACGCCTGAGCGTGGAGAGC ACGAGTTAGTTGTAGTCCGGCTTGCGGTGGGGCTGACTTCCTGTTGGTTTGAGCC CCTTTTTGTTTTGCCCTCTGGGTGTTTTCTTTGGTCCCGCAGGAGGGTGGGTGGA GCAGGTGGACTG
CLTA P2A-mKate knock-in HDR template sequence (SEQ ID NO:558)
Left Homology Arm (italicized)-Insertion Sequence-Right Homology Arm (bold) (Underlined is the inserted mKate fluorescent protein sequence; the preceding non-underlined part is the P2A peptide sequence)
GGGTAGCTCCTGAACCATTGTTGTCCTCTGATTGGTTGTTCCCTTTTCGGCTCTGC
CTGTACATGGAGGGCACCGTGAACAACCACCACTTCAAGTGCACATCCGAGGGCG AAGGCAAGCCCTACGAGGGCACCCAGACCATGAGAATCAAGGCGGTCGAGGGCG
Figure imgf000238_0001
CATCGAGAACGACGAGGCCTTCGCCATCCTGGACGGCGGCGCCCCCGGGCCCC AGCCGCACGGCGAGCCGCCGGGGGGTCCGGGTGAGAGTGCGGGCGCGTTTGG GGCGAGAGGACTTGTCTGGAAACTCGGTCCACAGTGGGTCCGAGAGCTTCTGTG TGACTCGTGCTCCTTGCTGAATTAGGAGGTTAGGGAGCAGTGCAAACAGGAAAC GAGACCCTGGCCCGGTCTTTCAGAAACCTAGGCTCGAGAAGCCTGTTCGGTTCT CAGCATGTTTGAGTGCTTCTGG
RAB11A P2A-mKate knock-in HDR template sequence (SEQ ID NO:559)
Left Homology Arm (italicized)-Insertion Sequence-Right Homology Arm (bold) (Underlined is the inserted mKate fluorescent protein sequence; the preceding non-underlined part is the P2A peptide sequence)
CTGCCGGAAATGGCGCAGCGGCAGGGAGGGGCTCTTCACCCAGTCCGGCAGTTG
GACCCGGGCCACTCCCGGTGGACCCTCGTGCCGGCCACCCCTGCACTGATATAG GCCTCCCTCAGCCCTTCCTTTTTGTGCGGTTCCGTCTCCTAC
Figure imgf000239_0001
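The HDR templates listed above share a Left Homology Arm-Insertion Sequence-Right Homology Arm layout, and templates with different homology arm lengths were obtained with the primers of Table 9. As a minimal in-silico sketch of that layout (not the Applicants' cloning or PCR workflow), the following Python helper trims both arms of a template to a requested length. The class name, function name, and toy sequences are illustrative assumptions and are not taken from SEQ ID NO:548-559.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HdrTemplate:
    """Donor laid out as left homology arm - insertion - right homology arm."""
    left_arm: str
    insertion: str
    right_arm: str

    def sequence(self) -> str:
        # Concatenate the three parts in 5'->3' order.
        return self.left_arm + self.insertion + self.right_arm


def trim_arms(template: HdrTemplate, arm_length: int) -> HdrTemplate:
    """Keep only the arm_length bases of each arm that directly flank the insertion."""
    if arm_length <= 0:
        raise ValueError("arm_length must be positive")
    if arm_length > min(len(template.left_arm), len(template.right_arm)):
        raise ValueError("requested arm is longer than the available homology")
    return HdrTemplate(
        left_arm=template.left_arm[-arm_length:],   # 3' end of the left arm
        insertion=template.insertion,
        right_arm=template.right_arm[:arm_length],  # 5' end of the right arm
    )


# Toy sequences for illustration only (hypothetical, not from the sequence listing).
full = HdrTemplate(left_arm="AACCGGTTAC", insertion="TGTACA", right_arm="GGATCCTTAA")
short = trim_arms(full, 4)
print(short.sequence())
```

Trimming from the insertion outward keeps the bases adjacent to the payload, which mirrors how shorter-arm PCR templates would still flank the same insertion site.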
Example 18
[00472] Optimizing dCas9-SSAP efficiency for robust knock-in editing. Applicants further optimized the dCas9-SSAP editor and tested its activity across a larger panel of genomic targets. Applicants first examined whether adjusting dosage could improve editing efficiency (FIG. 54A, FIG. 55A). Indeed, when Applicants titrated up the amount of SSAP-encoding plasmid, Applicants observed higher editing efficiencies across all targets (FIG. 54A). This correlation further supports that the knock-in editing was driven by the SSAP. In contrast, increasing the donor amount had negligible effects on knock-in efficiency (FIG. 55A), suggesting that donor dosage was not a bottleneck in this setting. In addition to dosage optimization, Applicants extended the donor homology arm (HA) lengths and observed that further extension of the HAs helped to improve knock-in efficiencies, consistent with earlier results (FIG. 55B, 55C).
[00473] Using these optimized parameters, Applicants measured the knock-in efficiencies of dCas9-SSAP at seven endogenous loci (DYNLT1, HSP90AA1, ACTB, BCAP31, HIST1H2BK, CLTA, RAB11A) (FIG. 54B). Of note, Applicants included two loci (CLTA, RAB11A) where the knock-in tag was inserted as a direct fusion at the N-termini of the endogenous proteins, complementing the 2A-peptide designs. Across all targets, dCas9-SSAP demonstrated efficiencies of up to ~20% without selection, comparable to and sometimes moderately higher than the Cas9 references (FIG. 54B).
[00474] To assess the stability of editing mediated by dCas9-SSAP over a long timespan, Applicants next examined the durability of knock-in transgene expression. Applicants sorted mKate+ cells at Day 3 post transfection of dCas9-SSAP and donor DNA, then checked whether the transgene maintained its expression beyond the 3-day window at different genomic loci (FIG. 54C). Consistent with Applicants’ sequencing results showing accurate on-target editing (FIG. 39), Applicants observed that knock-in cassette expression was stable at Day 5, Day 7, and Day 10 after delivery of dCas9-SSAP (FIG. 54C, FIG. 57). The knock-in cell populations had distinct, steady transgene expression compared with controls (FIG. 57C). This supports the utility of dCas9-SSAP for stable knock-in editing in mammalian cells.
[00475] Finally, Applicants sought to functionally validate the ability of the dCas9-SSAP editor to insert diverse payloads at endogenous loci (FIG. 56A). Briefly, Applicants constructed knock-in donors with selectable payloads (Puromycin and Blasticidin resistance cassettes) as fusion proteins with endogenous genes (FIG. 56B, left). Applicants examined the knock-in results from dCas9-SSAP and the Cas9 reference using Western blot. Immunoblotting confirmed the presence and expected sizes of the knock-in fusion proteins using dCas9-SSAP across targets (HSP90AA1, ACTB) and payloads (FIG. 56B). Further, Applicants quantified the relative knock-in efficiencies of the dCas9-SSAP and Cas9 methods using a functional assay (FIG. 56C, FIG. 57C-57E). Applicants employed short-HA donors to insert a resistance cassette at endogenous loci and applied Puromycin to select the knock-in cells. The colony formation assay validated that the dCas9-SSAP editor performed reliably using this protein-function readout (FIG. 56C).
Figure imgf000240_0001
Example 19
[00476] SSAP + Reverse Transcriptase with Cas9
[00477] Editing efficiency was tested for a system combining SSAP with a reverse transcriptase and Cas9.
[00478] SSAP-RT (Prime-Edit), Experiment 1: Test SSAP-RT in HEK293T cells
[00479] HEK293T cells in a 48-well plate; the cell density was 60%.
[00480] For Lipofectamine 2000: 1086 ng DNA + 1 ul Lip2000, mixed in 30 ul Opti-MEM per well.
[00481] Cas9n-RT 600ng + gRNA with RNA template/donor, 200ng + 2nd gRNA with MS2 aptamer 66ng + SSAP 200ng
[00482] Components of the system tested:
[00483] Cas9-RT: pA131, Cas9(H840A) + RT: expressing Cas9 nickase fused to reverse transcriptase (RT)
[00484] guideRNA with RNA template/donor:
[00485] pA132: non-targeting control
[00486] pA132_HEK3_CTT_ins (U6 driving guideRNA fused to RNA template/donor to insert CTT, a 3bp sequence, at the HEK3 genomic site in the human genome)
[00487] pA132_RNF2_GTA_ins (U6 driving guideRNA fused to RNA template/donor to insert GTA, a 3bp sequence, at the RNF2 genomic site in the human genome)
[00488] Second guide RNA with MS2 aptamer to recruit SSAP:
[00489] pA73 (non-targeting control without MS2 aptamer)
[00490] pCK032_HEK3_+90 (U6 promoter driving HEK3 guideRNA with 20bp guide located at +90 position relative to the guideRNA with template/donor above; this scaffold has an MS2 aptamer to recruit SSAP)
Figure imgf000241_0001
[00491] pCK032_RNF2_+41 (U6 promoter driving RNF2 guideRNA with 20bp guide located at +41 position relative to the guideRNA with template/donor above; this guideRNA scaffold has an MS2 aptamer to recruit SSAP)
[00492] SSAP protein:
[00493] pCK904_MCP_GFP: pEF1A-MCP-XTENLinker-GFP-SV40NLS, expressing MCP fusion to GFP protein as control
[00494] pCK904: pEF1A-MCP-XTENLinker-RecT-SV40NLS, expressing MCP fusion with SSAP RecT
[00495] Prime editing with SSAP yielded the highest editing efficiency at the HEK3 locus (FIG. 85, top) and at the RNF2 locus (FIG. 85, bottom).
Example 20
[00496] SSAP-RT for different lengths of genomic edits in HEK293T cells
[00497] SSAP + Reverse Transcriptase with Cas9
[00498] HEK293T cells in a 48-well plate; the cell density was 60%.
[00499] For Lipofectamine 2000: 1086 ng DNA + 1 ul Lip2000, mixed in 30 ul Opti-MEM per well.
[00500] Cas9n-RT 600ng + gRNA with RNA template/donor, 200ng + 2nd gRNA with MS2 aptamer 66ng + SSAP 200ng
[00501] Components of the system tested:
[00502] Cas9-RT:
[00503] pA131, Cas9(H840A) + RT: expressing Cas9 nickase fused to reverse transcriptase (RT)
[00504] guideRNA with RNA template/donor:
[00505] pA132: non-targeting control
[00506] pA132_HEK3_12_ins (U6 driving guideRNA fused to RNA template/donor to insert a 12bp sequence at the HEK3 genomic site in the human genome)
[00507] pA132_HEK3_36_ins (U6 driving guideRNA fused to RNA template/donor to insert a 36bp sequence at the HEK3 genomic site in the human genome)
[00508] pA132_HEK3_108_ins (U6 driving guideRNA fused to RNA template/donor to insert a 108bp sequence at the HEK3 genomic site in the human genome)
Figure imgf000242_0001
[00509] pA132_RNF2_12_ins (U6 driving guideRNA fused to RNA template/donor to insert a 12bp sequence at the RNF2 genomic site in the human genome)
[00510] pA132_RNF2_36_ins (U6 driving guideRNA fused to RNA template/donor to insert a 36bp sequence at the RNF2 genomic site in the human genome)
[00511] pA132_RNF2_108_ins (U6 driving guideRNA fused to RNA template/donor to insert a 108bp sequence at the RNF2 genomic site in the human genome)
[00512] Second guide RNA with MS2 aptamer to recruit SSAP:
[00513] pCK032_HEK3_-17 (U6 promoter driving HEK3 guideRNA with 20bp guide located at -17 position relative to the guideRNA with template/donor above; this guideRNA scaffold has an MS2 aptamer to recruit SSAP)
[00514] pCK032_HEK3_-9 (U6 promoter driving HEK3 guideRNA with 20bp guide located at -9 position relative to the guideRNA with template/donor above; this guideRNA scaffold has an MS2 aptamer to recruit SSAP)
[00515] pCK032_RNF2_+5 (U6 promoter driving RNF2 guideRNA with 20bp guide located at +5 position relative to the guideRNA with template/donor above; this guideRNA scaffold has an MS2 aptamer to recruit SSAP)
[00516] pCK032_RNF2_-19 (U6 promoter driving RNF2 guideRNA with 20bp guide located at -19 position relative to the guideRNA with template/donor above; this guideRNA scaffold has an MS2 aptamer to recruit SSAP)
[00517] SSAP protein:
[00518] pCK904_MCP_GFP: pEF1A-MCP-XTENLinker-GFP-SV40NLS, expressing MCP fusion to GFP protein as control
[00519] pCK904: pEF1A-MCP-XTENLinker-RecT-SV40NLS, expressing MCP fusion with SSAP RecT
[00520] Prime editing with SSAP yielded the highest editing efficiency at the HEK3 locus (FIG. 86, top) and at the RNF2 locus (FIG. 86, bottom). The improvement in editing efficiency was more pronounced in several cases when editing longer sequences using RNA template/donor (36nt insertion, 108nt insertion).
Table 11. Sequences for SSAP-RT recombineering
Figure imgf000242_0002
Name Note Sequence
Cas9n-RT SEQ ID NO:604 MKRTADGSEFESPKKKRKVDKKYSIGLDIGTNSVGWA
Figure imgf000243_0001
PGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALL TAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRR
Figure imgf000244_0001
(SEQ ID NO:606) KDGNPIPSAIAANSGIYSASGGSSGGSSGSETPGTSESAT PESSGGSSGGSGGSTLNIEDEYRLHETSKEPDVSLGSTW
Figure imgf000245_0001
pA19-MS2-dg- MS2-containing dgRNAa with gctgatctgcaccacgtttAagagctaggccAACATGAGGATCACCCA TGTCTGCAGggcctagcaagttTaaataaggctagtccgttatcaacttggccA
Figure imgf000246_0001
pA132_HEK3_36_ins guideRNA fused to RNA GGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAA TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTG
Figure imgf000247_0001
(SEQ ID NO:614)
Figure imgf000248_0001
TTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIF FKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNIL
Figure imgf000249_0001
AKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEV
Figure imgf000250_0001
AKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEV
Figure imgf000251_0001
Example 21
Figure imgf000252_0001
[00521] Arrayed SSAP library screening on endogenous genome targets (ACTB, HSP90AA1) using an mKate knock-in assay.
[00522] SSAP-encoding plasmids were purified and quantified.
[00523] Each SSAP-encoding plasmid was tested in duplicate, alongside a negative control (the same plasmid encoding Flag_HA, which is not expected to promote gene editing). Transfections were performed in 96-well plates, and transfection efficiency was estimated to be 50%.
[00524] Knock-in templates:
1. HSP90AA1: gCK240+241, tm 66.1C, mKate/pCK1451/pCK1452 as PCR template
2. ACTB: gCK115+116, tm 63.6C, mKate/pCK1453/pCK1454 as PCR template
[00525] Three days after transfection, mKate-positive cells and cell viability were quantified across all replicates, along with positive (original RecT SSAP) and negative (Flag-HA control protein) controls. A higher frequency of mKate+ cells indicates that a candidate SSAP is more active (e.g., has a higher ability to mediate precision knock-in editing of the kilobase-scale transgene). At the same time, cell viability was measured by live cell counts via flow cytometry, to help quantify the fitness effect of the SSAP on mammalian cells.
[00526] FIG. 88 shows results of the SSAP array screening, showing editing efficiency as fold over negative control or percent of mKate knock-in, and cell viability, for the ACTB target and the HSP90AA1 target. FIG. 89 shows normalized (89A) and absolute (89B) editing efficiency at HSP90AA1 compared to editing efficiency at ACTB. FIG. 89C shows cell viability, comparing SSAP use for HSP90AA1 knock-ins with ACTB knock-ins. FIG. 90 provides plots comparing cell viability and editing efficiency, normalized (90A) and absolute (90B), over all targets, and bar graphs illustrating normalized (90C) or absolute (90D) editing efficiency at ACTB and HSP90AA1 for each of the SSAP candidates.
[00527] Alignments and phylogenetic trees depicting related proteins, and sequence alignments for several of the top targets, are provided in FIG. 91, 92, and 93. The alignments indicate certain conserved regions and motifs, consistent with regions of predicted 3D structure (e.g., FIG. 36, 37, 44, 53). At least 3 regions are highly conserved: (1) the N-terminal part has an S/N/Y-R/K-F/L/I-rich region resembling a Serine/Tyrosine recombinase motif; (2) the middle part has an M-R/K-R/K-rich region; (3) the C-terminal part includes a D/E-D/E-F/Y region that resembles a transposase-like motif. Some candidate SSAPs may have one or more of these regions. This is also in agreement with the predicted 3D structure of SSAP and the interaction of the SSAP with DNA that promotes homology-based recombination via highly charged amino acids.
Figure imgf000253_0001
[00528] Top scoring SSAP proteins are shown in Table 12. The table shows editing efficiency as the normalized average of two targets (HSP90AA1 and ACTB), absolute editing efficiency, and cell viability. SSAP proteins are identified by UniParc deposit number and SEQ ID NO. Alignment numbers correspond to SSAPs in FIG. 91, 92, and 93.
Table 12 - Top scoring SSAP proteins
Figure imgf000253_0002
SSAP_198 WP_147981944.1 367 250 5.524340663 13.65 0.444354117
Figure imgf000254_0001
SSAP_199 BAQ93806.1 368 266 4.601904797 11.4025
Figure imgf000254_0002
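The screening readout in Example 21 reports editing efficiency as fold over negative control, measured in duplicate, and Table 12 reports a normalized average of two targets. A minimal Python sketch of one way to compute such a score is given below. The exact normalization used by Applicants is not specified in the text, so the formula (mean of duplicates divided by the mean negative-control signal, then averaged across targets), the function names, and all numeric values are assumptions for illustration and are not data from Table 12.

```python
from statistics import mean


def fold_over_control(candidate_wells, control_wells):
    """Mean %mKate+ of the candidate divided by mean %mKate+ of the negative control."""
    control = mean(control_wells)
    if control <= 0:
        raise ValueError("negative-control signal must be positive to normalize")
    return mean(candidate_wells) / control


def normalized_average(per_target):
    """Average the fold-over-control values across targets.

    per_target maps a target name to a tuple of
    (candidate duplicate wells, negative-control duplicate wells).
    """
    return mean(fold_over_control(c, n) for c, n in per_target.values())


# Hypothetical duplicate-well readouts (% mKate+ cells); not measured data.
screen = {
    "HSP90AA1": ([12.0, 14.0], [2.0, 3.0]),
    "ACTB": ([9.0, 11.0], [2.0, 2.0]),
}
score = normalized_average(screen)
print(round(score, 2))
```

Normalizing each target to its own negative control before averaging keeps a locus with high background from dominating the combined score.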
[00529] Additional families of reverse transcriptase and their sequences are contemplated. All sequences are listed in SEQ ID N1-N120 in the tables below.
[00530] SEQ ID No. N11-N20 for the TERT RT design. Telomerase reverse transcriptase (TERT) and related reverse transcriptases can be used. Reference: Autexier C, Lue NF. The structure and function of telomerase reverse transcriptase. Annu Rev Biochem. 2006;75:493-517. doi: 10.1146/annurev.biochem.75.103004.142412. PMID: 16756500.
[00531] SEQ ID No. N21-N120 for the rvt design. Cellular single-copy rvt reverse transcriptase and related reverse transcriptases with similar domain structure can be used. Reference: Gladyshev EA, Arkhipova IR. A widespread class of reverse transcriptase-related cellular genes. Proc Natl Acad Sci U S A. 2011 Dec 20;108(51):20311-6. doi: 10.1073/pnas.1100266108. Epub 2011 Aug 29. PMID: 21876125; PMCID: PMC3251080.
[00532] SEQ ID No. N124-N125 for the R2 element design. R2 element reverse transcriptase and related non-LTR retrotransposable element reverse transcriptases can be used. Reference 1: Wilkinson ME, Frangieh CJ, Macrae RK, Zhang F. Structure of the R2 non-LTR retrotransposon initiating target-primed reverse transcription. Science. 2023 Apr 21;380(6642):301-308. doi: 10.1126/science.adg7883. Epub 2023 Apr 6. PMID: 37023171. Reference 2: Deng P, Tan SQ, Yang QY, Fu L, Wu Y, Zhu HZ, Sun L, Bao Z, Lin Y, Zhang QC, Wang H, Wang J, Liu JG. Structural RNA components supervise the sequential DNA cleavage in R2 retrotransposon. Cell. 2023 Jun 22;186(13):2865-2879.e20. doi: 10.1016/j.cell.2023.05.032. Epub 2023. PMID: 37301196.
Figure imgf000255_0001
[00533] SEQ ID No. N121-N123 for the engineered phage or prokaryotic polymerase design. Reverse transcriptases engineered from phage or prokaryotic DNA polymerases, including metagenomic prokaryotic or phage reverse transcriptases and DNA polymerases and their synthetic or mutated derivatives, can be used. Reference 1: Heller RC, Chung S, Crissy K, Dumas K, Schuster D, Schoenfeld TW. Engineering of a thermostable viral polymerase using metagenome-derived diversity for highly sensitive and specific RT-PCR. Nucleic Acids Res. 2019 Apr 23;47(7):3619-3630. doi: 10.1093/nar/gkz104. PMID: 30767012; PMCID: PMC6468311. Reference 2: Ellefson JW, Gollihar J, Shroff R, Shivram H, Iyer VR, Ellington AD. Synthetic evolutionary origin of a proofreading reverse transcriptase. Science. 2016 Jun 24;352(6293):1590-3. doi: 10.1126/science.aaf5409. PMID: 27339990.
[00534] SEQ ID No. N1-N10 for the chimeric or fusion RT design. Chimeric reverse transcriptases that are engineered by fusion, with or without a peptide linker, between two reverse transcriptases can be used. These chimeric or fusion reverse transcriptases have the N-terminus from one reverse transcriptase and the C-terminus from another reverse transcriptase. Specifically, one type of fusion or chimeric reverse transcriptase consists of the N-terminal polymerase domain of one reverse transcriptase and the C-terminal RNaseH domain of another reverse transcriptase, with the fusion site either before, within, or after the connection domain (originally located between the polymerase domain and the RNaseH domain). Reference: Misra HS, Pandey PK, Pandey VN. An enzymatically active chimeric HIV-1 reverse transcriptase (RT) with the RNase-H domain of murine leukemia virus RT exists as a monomer. J Biol Chem. 1998 Apr 17;273(16):9785-9. doi: 10.1074/jbc.273.16.9785. PMID: 9545316.
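The chimeric design in [00534] joins the N-terminal polymerase domain of one reverse transcriptase to the C-terminal RNaseH domain of another, optionally through a peptide linker, with the fusion site before, within, or after the connection domain. A minimal Python sketch of that sequence-level assembly follows. The function name, the residue indices, the "GGS" linker, and the placeholder strings are hypothetical and chosen only to make the splice visible; they do not correspond to SEQ ID N1-N10.

```python
def build_chimeric_rt(donor_n, donor_c, n_end, c_start, linker=""):
    """Join residues [0, n_end) of donor_n to residues [c_start, end) of donor_c.

    Moving n_end and c_start relative to the connection domain places the
    fusion site before, within, or after that domain.
    """
    if not 0 < n_end <= len(donor_n):
        raise ValueError("n_end must fall inside the N-terminal donor")
    if not 0 <= c_start < len(donor_c):
        raise ValueError("c_start must fall inside the C-terminal donor")
    return donor_n[:n_end] + linker + donor_c[c_start:]


# Placeholder strings standing in for real protein sequences (toy layout):
# P/p = polymerase domain, C/c = connection domain, H/h = RNaseH domain.
rt_a = "PPPPPPCCCHHHH"
rt_b = "ppppppccchhhh"

# Fusion before the connection domain, with a short hypothetical linker.
chimera = build_chimeric_rt(rt_a, rt_b, n_end=6, c_start=9, linker="GGS")
print(chimera)
```

Keeping the two cut positions as separate parameters lets the same helper express linker-free fusions and fusions that retain the connection domain from either parent.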
[00535] Designs that showcase SSAP together with the R2 RNA element, a family of non-long terminal repeat (non-LTR) retrotransposable elements also called R2 retron, are as follows:
[00536] The R2 protein is fused to Cas9 nickase (H840A) or dCas9 (D10A, H840A), and an MS2 aptamer fused to the R2 element RNA is used to recruit SSAP via MS2 coat protein (MCP). Different versions of the fusion protein were designed: SEQ ID N126 for Cas9 nickase and SEQ ID N127 for dCas9. The MS2 aptamer-containing R2 element RNA is designed as SEQ ID N131-N136, with different locations and numbers of aptamers.
[00537] The R2 protein is fused to Cas9 nickase (H840A) or dCas9 (D10A, H840A), and further fused to a GCN4 peptide to allow the use of a single-chain antibody (scFv) to recruit SSAP. Different versions of the fusion protein were designed: SEQ ID N128 for Cas9 nickase and SEQ ID N129 for dCas9. The basic R2 element RNA sequence is designed as SEQ ID N130.
Table 13. - Reverse Transcriptases
Figure imgf000256_0001
Figure imgf000257_0001
Figure imgf000258_0001
Figure imgf000259_0001
Figure imgf000260_0001
Figure imgf000261_0001
Figure imgf000262_0001
Figure imgf000263_0001
Table 14. - Sequences of Reverse Transcriptases
Figure imgf000263_0002
Figure imgf000264_0001
Figure imgf000265_0001
Figure imgf000266_0001
Figure imgf000267_0001
Figure imgf000268_0001
Figure imgf000269_0001
Figure imgf000270_0001
Figure imgf000271_0001
Figure imgf000272_0001
Figure imgf000273_0001
Figure imgf000274_0001
Figure imgf000275_0001
Figure imgf000276_0001
Figure imgf000277_0001
Figure imgf000278_0001
Figure imgf000279_0001
Figure imgf000280_0001
Figure imgf000281_0001
Figure imgf000282_0001
Figure imgf000283_0001
Figure imgf000284_0001
Figure imgf000285_0001
Figure imgf000286_0001
Figure imgf000287_0001
Figure imgf000288_0001
Figure imgf000289_0001
Figure imgf000290_0001
Figure imgf000291_0001
Figure imgf000292_0001
Figure imgf000293_0001
Figure imgf000294_0001
Figure imgf000295_0001
STDU2-42312.601 (S22-113) Table 14. - Sequences of Reverse Transcriptases SEQ DL A E W G K L R R D R N R A A A Q G P T G R KE DL A
Figure imgf000296_0001
STDU2-42312.601 (S22-113) Table 14. - Sequences of Reverse Transcriptases SEQ E W G K L R R D R N S E N Y LL I Q V F A D G N A K IS H DL A E W G K L R R D R N S E
Figure imgf000297_0001
STDU2-42312.601 (S22-113) Table 14. - Sequences of Reverse Transcriptases SEQ N Y LL I Q V F A D G N A K IS H DL A E W G K L R R D R N S E N Y LL I Q V F A D G N A
Figure imgf000298_0001
STDU2-42312.601 (S22-113) Table 14. - Sequences of Reverse Transcriptases SEQ K IS H E V A R DL A E W G K L R R D R N S E N Y LL I Q V F A D G N A K IS H E V A R g gt
Figure imgf000299_0001
STDU2-42312.601 (S22-113) Table 14. - Sequences of Reverse Transcriptases SEQ tt gc cc g cc c gt cg tt a tt tc gg tg g gt tt gc cc g A C gt ct t g ca ct tc ct at g gt tt gc cc g cc c gt cg tt a tt T A
Figure imgf000300_0001
STDU2-42312.601 (S22-113) Table 14. - Sequences of Reverse Transcriptases SEQ tt t g gt tt gc cc g A C gt ct t g ca ct tc G C tt tc g gt tt gc cc g A C gt ct t g ca ct tc ct at aa g gt tt gc cc g
Figure imgf000301_0001
STDU2-42312.601 (S22-113) Table 14. - Sequences of Reverse Transcriptases SEQ
Figure imgf000302_0001
cc c gt cg tt a tt T A tt t aa g gt tt gc cc g A C gt ct t g ca ct tc G C tt tc tc
Figure imgf000302_0002
Example 23
[00538] Screening ERF proteins.
[00539] HEK293T cells were seeded, then transfected with dCas9, guideRNA-MS2, MCP fused to different SSAPs, and template donor DNA (dsDNA for inserting cargo) (FIG. 104A). The cargo contains promoter-less mKate (~1 kb) for in-frame insertion into specific endogenous genomic loci. FIG. 104B through FIG. 104L show the gating strategy for detecting mKate+ knock-ins. Top candidates were then compared for activity using Cas9 nickase and Cas9 wild-type nuclease. FIG. 105 shows candidate SSAPs and D3 phage exonuclease. Table 15 lists ERF family proteins identified in the screen, including SSAPs and exonucleases.
Table 15. - ERF Family Proteins
D3_Orf5 766 MQIFKDLEQGSQEWLDARLGIATCSELDVLMVNGKVQAGFGVGAFTYMDRLIGERITGAEAEPWRGN [remainder of sequence not recoverable from the extracted text]
mv4 779 MQHSESVKEIFGALSKFRAQVKQPAKTAKNPYFNSNYVTLEGVMQSIDAALPGTGLAYCQLVENGDNG [remainder of sequence not recoverable from the extracted text]
YQRQSAQNKSYRGQNANQGNRQTAEQKARHDQASAVMDEVADVKHKAETTFIDSSGTSLLDLCRQSK [sequence fragment; row label and remainder not recoverable from the extracted text]
Example 24
[00540] Enhanced nickase-based prime editor (PE) with SSAP.
[00541] PE-Max was used as a benchmark for comparison with SSAPs using SpCas9-H840A nickase. Two types of edits were compared: a 12-bp insertion using a 40-bp flap donor in HEK3 (see Anzalone et al., Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nature Biotechnology 40, 731-740 (2022)); and a lentiviral reporter to detect splice correction (see Gould et al., High-throughput evaluation of genetic variants with prime editing sensor libraries. Nature Biotechnology (2024) doi: 10.1038/s41587-024-02172-9). The PE-Max system and reporter vectors are depicted in FIG. 106A-FIG. 106C. SSAP vectors are depicted in FIG. 107A-FIG. 107C.
[00542] As shown in FIG. 108 and FIG. 109, SSAPs enhanced PE efficiency in the study by both measures.
Example 25
[00543] Circular ssDNA donor with SSAP for high-efficiency genome insertion.
[00544] HEK293T cells were transfected with Cas9n or dCas9 with an MS2-guideRNA (GGTAGTCGTACTCGTCGTCG (SEQ ID NO:500)). SSAP was recruited using an MS2 aptamer. The donor was circular single-stranded DNA (cssDNA) RAB11A-mCherry including RAB11A flanking homology arms (FIG. 110). cssDNA doses were 90 ng/well, 30 ng/well, and 10 ng/well. Cells were subjected to FACS analysis three days post-transfection.
[00545] Controls were “no donor” and “non-SSAP.” As shown in FIG. 111A - FIG. 111C, the SSAPs tested enhanced efficiency at all cssDNA doses tested.
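For readers working with the guide sequence given in Example 25, the following is a minimal illustrative sketch, not part of the described method, of basic sanity checks on a spacer: its length, its GC fraction, and its reverse complement (the sequence of the opposite DNA strand). It assumes the spacer is written 5' to 3' in DNA letters; the names SPACER, reverse_complement, and gc_fraction are hypothetical and chosen only for this illustration.

```python
# Spacer from Example 25 (SEQ ID NO:500), assumed to be written 5'->3' as DNA.
SPACER = "GGTAGTCGTACTCGTCGTCG"

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    print("length:", len(SPACER))
    print("reverse complement:", reverse_complement(SPACER))
    print("GC fraction:", gc_fraction(SPACER))
```

Checks of this kind only confirm that a sequence is well formed; they say nothing about on-target activity, which the example assesses experimentally by FACS.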
[00546] All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.
[00547] Preferred embodiments of this invention are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those preferred embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for the invention to be practiced otherwise than as specifically described herein. Accordingly, this invention includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.
[00548] The invention is further described by the following numbered paragraphs:
Paragraph 1.
A system or composition comprising:
(i) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence; and
(ii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or,
(iii) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) for expression in vivo in a cell; or,
(iv) vector(s) containing the nucleic acid molecule(s) of (iii) for expression in vivo in a cell.
Paragraph 2. The system or composition of paragraph 1, wherein the system does not comprise a CRISPR protein, or does not comprise a Cas protein, or does not comprise a Cas9 protein, or does not comprise a Cas12a protein.
Paragraph 3. The system or composition of paragraph 1 or 2, further comprising a recruitment system comprising: at least one aptamer; and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
Paragraph 4. The system or composition of paragraph 3, wherein the at least one aptamer comprises an RNA aptamer or a peptide aptamer.
Paragraph 5. The system or composition of paragraph 4, wherein the nucleic acid molecule or nucleic acid molecules comprise two RNA aptamer sequences.
Paragraph 6. The system or composition of paragraph 5, wherein the two RNA aptamer sequences comprise the same sequence.
Paragraph 7. The system or composition of any of paragraphs 3 to 6, wherein the aptamer binding protein comprises an MS2 coat protein, or a functional derivative or variant thereof.
Paragraph 8. The system or composition of any of paragraphs 3 to 6, wherein the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof.
Paragraph 9. The system or composition of paragraph 3 or 4, wherein the at least one aptamer is linked to the guide RNA.
Paragraph 10. The system or composition of paragraph 9, wherein the guide RNA sequence comprises between 1 and 24 aptamer sequences.
Paragraph 11. The system or composition of paragraph 9 or 10, wherein two or more aptamer sequences comprise the same sequence.
Paragraph 12. The system or composition of any of paragraphs 3, 4, or 9 to 11, wherein the aptamer sequence comprises a GCN4 peptide sequence.
Paragraph 13. The system or composition of any of paragraphs 3 to 12, wherein the recombination protein N-terminus is linked to the aptamer binding protein C-terminus.
Paragraph 14. The system or composition of any of paragraphs 3 to 13, further comprising a linker between the recombination protein and the aptamer binding protein.
Paragraph 15. The system or composition of paragraph 14, wherein the linker comprises the amino acid sequence of SEQ ID NO:15.
Paragraph 16. The system or composition of any of paragraphs 1 to 15, including at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is linked to the recombination protein.
Paragraph 17. The system or composition of paragraph 16, wherein the nuclear localization sequence comprises the amino acid sequence of SEQ ID NO:16.
Paragraph 18. The system or composition of paragraph 16 or 17, wherein the nuclear localization sequence is on the recombination protein C-terminus or on the recombination protein N-terminus.
Paragraph 19. The system or composition of any one of paragraphs 1 to 18, wherein the recombination protein comprises a microbial recombination protein or active portion thereof.
Paragraph 20. The system or composition of any one of paragraphs 1 to 18, wherein the recombination protein comprises a mitochondrial recombination protein or active portion thereof.
Paragraph 21. The system or composition of any one of paragraphs 1 to 18, wherein the recombination protein comprises a viral recombination protein or active portion thereof.
Paragraph 22. The system or composition of any one of paragraphs 1 to 18, wherein the recombination protein comprises a eukaryotic recombination protein or active portion thereof.
Paragraph 23. The system or composition of any of paragraphs 1 to 18, wherein the recombination protein comprises a recombination protein of Table 12 or derivative or variant or functional portion thereof.
Paragraph 24.
The system or composition of paragraph 23, wherein the recombination protein, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity or identity to an amino acid sequence of Table 12.
Paragraph 25. The system or composition of paragraph 19, wherein the fusion protein comprises RecE, RecT, or derivative or variant thereof.
Paragraph 26. The system or composition of paragraph 19, wherein the RecE, RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity or identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-14.
Paragraph 27. The system or composition of any of paragraphs 1 to 26, further comprising donor nucleic acid.
Paragraph 28. The system or composition of any of paragraphs 1 to 26, wherein the target DNA sequence is a genomic DNA sequence in a host cell.
Paragraph 29. The system or composition of any of paragraphs 3 to 28, wherein the aptamer binding protein and the recombination protein are functionally linked to each other and comprise a fusion protein.
Paragraph 30. A cell or eukaryotic cell comprising the system or composition of any one of paragraphs 1 to 29.
Paragraph 31. A method of altering a target genomic DNA sequence in a cell comprising a target genomic DNA sequence, comprising introducing the system or composition of any one of paragraphs 1 to 29 into the cell.
Paragraph 32. The cell or eukaryotic cell of paragraph 30 or the method of paragraph 31, wherein the cell or eukaryotic cell is a mammalian cell.
Paragraph 33. The cell or eukaryotic cell or method of paragraph 32, wherein the cell is a human cell.
Paragraph 34. The cell or eukaryotic cell or method of any of paragraphs 30 to 33, wherein the cell or eukaryotic cell or mammalian cell is a stem cell.
Paragraph 35. The method of any one of paragraphs 31 to 34, wherein the target genomic DNA sequence encodes a gene product.
Paragraph 36.
The method of any one of paragraphs 31 to 34, wherein the introducing into a cell comprises administering to a subject.
Paragraph 37. The method of paragraph 36, wherein the subject is a mammalian non-human animal or a human.
Paragraph 38. The method of paragraph 36 or 37, wherein the administering comprises in vivo administration.
Paragraph 39. The method of any one of paragraphs 31 to 34, wherein the cell or eukaryotic cell or mammalian cell is an ex vivo or in vitro cell.
Paragraph 40. The method of paragraph 39, further comprising, after the introducing step, administering to a subject the ex vivo or in vitro cells.
Paragraph 41. The method of paragraph 36, wherein the subject is a mammalian non-human animal or a human.
Paragraph 42. Use of the system or composition of any one of paragraphs 1 to 29 for the alteration of a target DNA sequence in a cell.
Paragraph 43. A system or composition comprising:
(i) a nucleic acid polymerase(s);
(ii) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription, or nucleic acid molecules comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription; and
(iii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or,
(iv) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii) for expression in vivo in a cell; or,
(v) vector(s) containing the nucleic acid molecule(s) of (iv) for expression in vivo in a cell.
Paragraph 44.
The system or composition of paragraph 43, wherein (i), (ii) and (iii) further comprise a Cas protein; or (iv) comprises nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii) and/or a Cas protein for expression in vivo in a cell; or the vector(s) of (v) additionally contains nucleic acid molecule(s) encoding a Cas protein.
Paragraph 45. The system or composition of paragraph 43 or 44, further comprising a recruitment system comprising: at least one aptamer; and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
Paragraph 46. The system or composition of paragraph 45, wherein the at least one aptamer is an RNA aptamer or a peptide aptamer.
Paragraph 47. The system or composition of paragraph 46, wherein the nucleic acid molecule or nucleic acid molecules comprise two RNA aptamers.
Paragraph 48. The system or composition of paragraph 47, wherein the two RNA aptamer sequences comprise the same sequence.
Paragraph 49. The system or composition of any of paragraphs 45 to 47, wherein the aptamer binding protein comprises an MS2 coat protein, or a functional derivative or variant thereof.
Paragraph 50. The system or composition of any of paragraphs 45 to 47, wherein the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof.
Paragraph 51. The system or composition of paragraph 45, wherein the at least one aptamer sequence is linked to the Cas protein.
Paragraph 52. The system or composition of paragraph 45, wherein the at least one aptamer sequence is linked to the guide RNA.
Paragraph 53. The system or composition of paragraph 45, wherein the recruitment system comprises from 1 to 24 aptamers.
Paragraph 54. The system or composition of any one of paragraphs 51 to 53, wherein two or more aptamers comprise the same sequence.
Paragraph 55. The system or composition of any of paragraphs 45, 46 or 51 to 54, wherein the aptamer comprises a GCN4 peptide sequence.
Paragraph 56. The system or composition of any of paragraphs 45 to 55, wherein the recombination protein N-terminus is linked to the aptamer binding protein C-terminus.
Paragraph 57. The system or composition of any of paragraphs 45 to 56, further comprising a linker between the recombination protein and the aptamer binding protein.
Paragraph 58. The system or composition of paragraph 57, wherein the linker comprises the amino acid sequence of SEQ ID NO:15.
Paragraph 59.
The system or composition of any of paragraphs 43 to 58, including at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is/are linked to the recombination protein or the Cas protein or the reverse transcriptase, or at least one NLS is on each of at least two or three of the recombination protein, the reverse transcriptase or the Cas protein.
Paragraph 60. The system or composition of paragraph 59, wherein the nuclear localization sequence comprises the amino acid sequence of SEQ ID NO:16.
Paragraph 61. The system or composition of paragraph 59 or 60, which comprises an NLS on one or more of the recombination protein C-terminus, the recombination protein N-terminus or on the Cas protein.
Paragraph 62. The system or composition of any one of paragraphs 43 to 61, wherein the recombination protein comprises a microbial recombination protein or active portion thereof.
Paragraph 63. The system or composition of any one of paragraphs 43 to 61, wherein the recombination protein comprises a mitochondrial recombination protein or active portion thereof.
Paragraph 64. The system or composition of any one of paragraphs 43 to 61, wherein the recombination protein comprises a viral recombination protein or active portion thereof.
Paragraph 65. The system or composition of any one of paragraphs 43 to 61, wherein the recombination protein comprises a eukaryotic recombination protein or active portion thereof.
Paragraph 66. The system or composition of any of paragraphs 43 to 61, wherein the recombination protein comprises a recombination protein of Table 12 or derivative or variant or functional portion thereof.
Paragraph 67. The system or composition of paragraph 66, wherein the recombination protein, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity or identity to an amino acid sequence of Table 12.
Paragraph 68. The system or composition of paragraph 62, wherein the fusion protein comprises RecE, RecT, or derivative or variant thereof.
Paragraph 69. The system or composition of paragraph 62, wherein the RecE, RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity or identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-14.
Paragraph 70.
The system or composition of any of paragraphs 44 to 69, wherein the Cas protein is catalytically inactive (less than 5% nuclease activity as compared with a wild-type or non-mutated form of the Cas protein) or catalytically dead.
Paragraph 71. The system or composition of any of paragraphs 44 to 70, wherein the Cas protein comprises Cas9 or Cas12a.
Paragraph 72. The system or composition of any of paragraphs 70 to 71, wherein the Cas9 protein comprises wild-type Streptococcus pyogenes Cas9 or a wild-type Staphylococcus aureus Cas9.
Paragraph 73. The system or composition of any of paragraphs 44 to 69, 71, or 72, wherein the
Cas protein comprises a nickase.
Paragraph 74. The system or composition of paragraph 73, wherein the nickase comprises wild-type Streptococcus pyogenes Cas9 with an amino acid substitution at position 10 of D10A.
Paragraph 75. The system or composition of any of paragraphs 43 to 74, further comprising donor nucleic acid.
Paragraph 76. The system or composition of any of paragraphs 43 to 75, wherein the target DNA sequence is a genomic DNA sequence in a host cell.
Paragraph 77. The system or composition of any of paragraphs 43 to 76, wherein the nucleic acid polymerase comprises reverse transcriptase activity.
Paragraph 78. The system or composition of any of paragraphs 43 to 76, wherein the nucleic acid polymerase comprises a retron RT.
Paragraph 79. The system or composition of any one of paragraphs 43 to 78, wherein the nucleic acid polymerase and recombination protein are functionally linked to each other and comprise a fusion protein.
Paragraph 80. The system or composition of any of paragraphs 44 to 78, wherein the nucleic acid polymerase and the Cas protein are functionally linked to each other and comprise a fusion protein.
Paragraph 81. The system or composition of any of paragraphs 45 to 78, wherein the aptamer binding protein and the recombination protein are functionally linked to each other and comprise a fusion protein.
Paragraph 82. The system or composition of any of paragraphs 44 to 78, wherein the recombination protein and the Cas protein are functionally linked to each other and comprise a fusion protein.
Paragraph 83. The system or composition of any of paragraphs 44 to 78, wherein the RT, and the Cas protein, and the recombination protein are functionally linked to each other and comprise a fusion protein.
Paragraph 84. A cell or eukaryotic cell comprising the system or composition of any one of paragraphs 43 to 83.
Paragraph 85.
A method of altering a target genomic DNA sequence in a cell comprising a target genomic DNA sequence, comprising introducing the system or composition of any one of paragraphs 43 to 83 into the cell.
Paragraph 86. The cell or eukaryotic cell of paragraph 84 or the method of paragraph 85, wherein the cell or eukaryotic cell is a mammalian cell.
Paragraph 87. The cell or eukaryotic cell or method of paragraph 86, wherein the cell is a human cell.
Paragraph 88. The cell or eukaryotic cell or method of any of paragraphs 84 to 87, wherein the cell or eukaryotic cell or mammalian cell is a stem cell.
Paragraph 89. The method of any one of paragraphs 85 to 88, wherein the target genomic DNA sequence encodes a gene product.
Paragraph 90. The method of any one of paragraphs 85 to 88, wherein the introducing into a cell comprises administering to a subject.
Paragraph 91. The method of paragraph 90, wherein the subject is a mammalian non-human animal or a human.
Paragraph 92. The method of paragraph 90 or 91, wherein the administering comprises in vivo administration.
Paragraph 93. The method of any one of paragraphs 85 to 88, wherein the cell or eukaryotic cell or mammalian cell is an ex vivo or in vitro cell.
Paragraph 94. The method of paragraph 93, further comprising, after the introducing step, administering to a subject the ex vivo or in vitro cells.
Paragraph 95. The method of paragraph 90, wherein the subject is a mammalian non-human animal or a human.
Paragraph 96. Use of the system or composition of any one of paragraphs 43 to 83 for the alteration of a target DNA sequence in a cell.
Paragraph 97.
A method of recombination, which comprises providing in a cell, a system or composition comprising: (i) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence; wherein the target DNA sequence comprises a genomic DNA sequence in the cell, and (ii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or,
(iii) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) for expression in vivo in a cell; or, (iv) vector(s) containing the nucleic acid molecule(s) of (iii) for expression in vivo in a cell.
Paragraph 98. The method of paragraph 97, wherein (i) and (ii) further comprises a Cas protein or a reverse transcriptase (RT) or a Cas protein and RT; or (iii) comprises nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or a Cas protein and/or a nucleic acid polymerase for expression in vivo in the cell; or the vector(s) of (iv) additionally contains nucleic acid molecule(s) encoding a Cas protein and/or RT.
Paragraph 99. The method of paragraph 97 or 98, wherein the target DNA sequence comprises a genomic sequence of albumin (ALB), AAVS1, HSP90AA1, DYNLT1, ACTB, BCAP31, HIST1H2BK, CLTA, or RAB11A.
Paragraph 100. The method of any of paragraphs 97 to 99, further comprising a recruitment system comprising: at least one aptamer; and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
Paragraph 101. The method of paragraph 0, wherein the at least one aptamer is an RNA aptamer or a peptide aptamer.
Paragraph 102. The method of paragraph 101, wherein the nucleic acid molecule or nucleic acid molecules comprises two RNA aptamers.
Paragraph 103. The method of paragraph 102, wherein the two RNA aptamer sequences comprise the same sequence.
Paragraph 104. The method of any of paragraphs 100 to 102, wherein the aptamer binding protein comprises a MS2 coat protein, or a functional derivative or variant thereof.
Paragraph 105. The method of any of paragraphs 100 to 102, wherein the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof.
Paragraph 106. The method of paragraph 100, wherein the at least one aptamer is linked to the Cas protein.
Paragraph 107. The method of paragraph 100, wherein the at least one aptamer is linked to the guide RNA.
Paragraph 108. The method of paragraph 106, wherein the recruitment system comprises from 1 to 24 aptamers.
Paragraph 109. The method of paragraph 106 or 108, wherein two or more aptamers comprise the same sequence.
Paragraph 110. The method of any of paragraphs 100, 101 or 106 to 109, wherein the aptamer comprises a GCN4 peptide.
Paragraph 111. The method of any of paragraphs 100 to 110, wherein the recombination protein N-terminus is linked to the aptamer binding protein C-terminus.
Paragraph 112. The method of any of paragraphs 100 to 111, further comprising a linker between the recombination protein and the aptamer binding protein.
Paragraph 113. The method of paragraph 112, wherein the linker comprises the amino acid sequence of SEQ ID NO:15.
Paragraph 114. The method of any of paragraphs 97 to 113, including at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is/are linked to the recombination protein or the Cas protein or the reverse transcriptase or at least one NLS on each of at least two or three of the recombination protein, the reverse transcriptase or the Cas protein.
Paragraph 115. The method of paragraph 114, wherein the nuclear localization sequence comprises the amino acid sequence of SEQ ID NO:16.
Paragraph 116. The method of paragraph 114 or 115, wherein the system comprises a NLS on one or more of the recombination protein C-terminus, the recombination protein N-terminus or on the Cas protein.
Paragraph 117. The method of any one of paragraphs 97 to 116, wherein the recombination protein comprises a microbial recombination protein or active portion thereof.
Paragraph 118. The method of any one of paragraphs 97 to 116, wherein the recombination protein comprises a mitochondrial recombination protein or active portion thereof.
Paragraph 119. The method of any one of paragraphs 97 to 116, wherein the recombination protein comprises a viral recombination protein or active portion thereof.
Paragraph 120.
The method of any one of paragraphs 97 to 116, wherein the recombination protein comprises a eukaryotic recombination protein or active portion thereof.
Paragraph 121. The method of any of paragraphs 97 to 116, wherein the recombination protein comprises a recombination protein of Table 12 or derivative or variant or functional portion thereof.
Paragraph 122. The method of paragraph 121, wherein the recombination protein, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity or identity to an amino acid sequence of Table 12.
Paragraph 123. The method of paragraph 117, wherein the fusion protein comprises RecE, RecT, or derivative or variant thereof.
Paragraph 124. The method of paragraph 117, wherein the RecE, RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity or identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-14.
Paragraph 125. The method of any of paragraphs 98 to 124, wherein the Cas protein is catalytically inactive (less than 5% nuclease activity as compared with a wild-type or non-mutated form of the Cas protein) or catalytically dead.
Paragraph 126. The method of any of paragraphs 98 to 125, wherein the Cas protein comprises Cas9 or Cas12a.
Paragraph 127. The method of any of paragraphs 98 to 126, wherein the Cas9 protein comprises wild-type Streptococcus pyogenes Cas9 or a wild-type Staphylococcus aureus Cas9.
Paragraph 128. The method of any of paragraphs 98 to 124, 126, or 127, wherein the Cas protein comprises a nickase.
Paragraph 129. The method of paragraph 128, wherein the nickase comprises wild-type Streptococcus pyogenes Cas9 with an amino acid substitution at position 10 of D10A.
Paragraph 130. The method of any of paragraphs 97 to 129, further comprising donor nucleic acid.
Paragraph 131.
The method of any of paragraphs 97 to 130, wherein the target DNA sequence is a genomic DNA sequence in a host cell.
Paragraph 132. The method of any of paragraphs 98 to 131, wherein the nucleic acid polymerase comprises reverse transcriptase activity.
Paragraph 133. The method of any of paragraphs 98 to 131, wherein the nucleic acid polymerase comprises a retron RT.
Paragraph 134. The method of any one of paragraphs 98 to 133, wherein the RT and recombination protein are functionally linked to each other and comprise a fusion protein.
Paragraph 135. The method of any of paragraphs 100 to 134, wherein the aptamer binding protein
and the recombination protein are functionally linked to each other and comprise a fusion protein.
Paragraph 136. The method of any of paragraphs 98 to 135, wherein the RT and the Cas protein are functionally linked to each other and comprise a fusion protein.
Paragraph 137. The method of any of paragraphs 98 to 136, wherein the recombination protein and the Cas protein are functionally linked to each other and comprise a fusion protein.
Paragraph 138. The method of any of paragraphs 98 to 137, wherein the RT, and the Cas protein, and the recombination protein are functionally linked to each other and comprise a fusion protein.
Paragraph 139. The method of any of paragraphs 97 to 138, wherein the target genomic DNA sequence encodes a gene product.
Paragraph 140. The method of any of paragraphs 97 to 139, wherein the cell or eukaryotic cell is a mammalian cell.
Paragraph 141. The method of any of paragraphs 97 to 140, wherein the cell is a human cell.
Paragraph 142. The method of any of paragraphs 97 to 141, wherein the cell or eukaryotic cell or mammalian cell or human cell is a stem cell.
Paragraph 143. A recombinant cell or organism produced by the method of any of claims 97 to 142.
* * *
[00549] Having thus described in detail preferred embodiments of the present invention, it is to be understood that the invention defined by the above paragraphs is not to be limited to particular details set forth in the above description as many apparent variations thereof are possible without departing from the spirit or scope of the present invention.

Claims

PATENT Docket No. STDU2-42312.601
WHAT IS CLAIMED IS:
1. A system or composition comprising: (i) a Cas protein; (ii) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence; and (iii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iv) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii) for expression in vivo in a cell; or, (v) vector(s) containing the nucleic acid molecule(s) of (iii) for expression in vivo in a cell; wherein the recombination protein comprises an amino acid sequence with at least 70% similarity or 70% identity to SEQ ID NO:779; SEQ ID NO:766; SEQ ID NO:767; SEQ ID NO:768; SEQ ID NO:769; SEQ ID NO:770; SEQ ID NO:771; SEQ ID NO:772; SEQ ID NO:773; SEQ ID NO:774; SEQ ID NO:775; SEQ ID NO:776; SEQ ID NO:777; SEQ ID NO:778; SEQ ID NO:780; SEQ ID NO:781; SEQ ID NO:782; SEQ ID NO:783; SEQ ID NO:784; SEQ ID NO:785; or SEQ ID NO:786, or derivative or variant or functional portion thereof.
2. The system or composition of claim 1 or 97, further comprising a recruitment system comprising: at least one aptamer; and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
3. The system or composition of claim 2, wherein the at least one aptamer comprises an RNA aptamer or a peptide aptamer.
4. The system or composition of claim 3, wherein the nucleic acid molecule or nucleic acid molecules comprise two RNA aptamer sequences.
5. The system or composition of claim 4, wherein the two RNA aptamer sequences comprise the same sequence.
6. The system or composition of any one of claims 2-5, wherein the aptamer binding protein comprises a MS2 coat protein, or a functional derivative or variant thereof.
7.
The system or composition of any one of claims 2-5, wherein the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof.
8. The system or composition of claim 2 or 3, wherein the at least one aptamer is linked to the guide RNA.
9. The system or composition of claim 8, wherein the guide RNA sequence comprises between 1 and 24 aptamer sequences.
10. The system or composition of claim 8 or 9, wherein two or more aptamer sequences comprise the same sequence.
11. The system or composition of any one of claims 2, 3 or 8-10, wherein the aptamer sequence comprises a GCN4 peptide sequence.
12. The system or composition of any one of claims 2-11, wherein the recombination protein N-terminus is linked to the aptamer binding protein C-terminus.
13. The system or composition of any one of claims 2-12, further comprising a linker between the recombination protein and the aptamer binding protein.
14. The system or composition of claim 13, wherein the linker comprises the amino acid sequence of SEQ ID NO:15.
15. The system or composition of any one of claims 1-14, including at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is linked to the recombination protein.
16. The system or composition of claim 15, wherein the nuclear localization sequence comprises the amino acid sequence of SEQ ID NO:16.
17. The system or composition of claim 15 or 16, wherein the nuclear localization sequence is on the recombination protein C-terminus or on the recombination protein N-terminus.
18. The system of any one of claims 1-17, wherein the Cas protein is catalytically dead.
19. The system of any one of claims 1-17, wherein the Cas protein comprises a nickase.
20. The system of any one of claims 1-17, wherein the Cas protein comprises Cas9 or Cas12a.
21.
The system of any one of claims 1-17, wherein the Cas9 protein comprises a wild-type Streptococcus pyogenes Cas9 or a wild-type Staphylococcus aureus Cas9.
22. The system or composition of any one of claims 1-21, further comprising donor nucleic acid.
23. The system or composition of claim 22, wherein the donor nucleic acid comprises a single stranded nucleic acid.
24. The system or composition of any one of claims 1-23, wherein the target DNA sequence is a genomic DNA sequence in a host cell.
25. The system or composition of any one of claims 2-24, wherein the aptamer binding protein and the recombination protein are functionally linked to each other and comprise a fusion protein.
26. A cell or eukaryotic cell comprising the system or composition of any one of claims 1-25.
27. A method of altering a target genomic DNA sequence in a cell comprising a target genomic DNA sequence, comprising introducing the system or composition of any one of claims 1-25 into the cell.
28. The cell or eukaryotic cell of claim 26 or the method of claim 27, wherein the cell or eukaryotic cell is a mammalian cell.
29. The cell or eukaryotic cell or method of claim 28, wherein the cell is a human cell.
30. The cell or eukaryotic cell or method of any one of claims 26-29, wherein the cell or eukaryotic cell or mammalian cell is a stem cell.
31. The method of any one of claims 27-30, wherein the target genomic DNA sequence encodes a gene product.
32. The method of any one of claims 27-30, wherein the introducing into a cell comprises administering to a subject.
33. The method of claim 32, wherein the subject is a mammalian non-human animal or a human.
34. The method of claim 32 or 33, wherein the administering comprises in vivo administration.
35. The method of any one of claims 27-30, wherein the cell or eukaryotic cell or mammalian cell is an ex vivo or in vitro cell.
36.
The method of claim 35, further comprising, after the introducing step, administering to a subject the ex vivo or in vitro cells.
37. The method of claim 32, wherein the subject is a mammalian non-human animal or a human.
38. Use of the system or composition of any one of claims 1-25 for the alteration of a target DNA sequence in a cell.
39. A system or composition comprising: (i) a nucleic acid polymerase(s); (ii) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription, or nucleic acid molecules comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription; and (iii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iv) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii) for expression in vivo in a cell; or, (v) vector(s) containing the nucleic acid molecule(s) of (iv) for expression in vivo in a cell.
40. The system or composition of claim 39, wherein the recombination protein comprises an amino acid sequence having at least 70% similarity or 70% identity to SEQ ID NO:779; SEQ ID NO:766; SEQ ID NO:767; SEQ ID NO:768; SEQ ID NO:769; SEQ ID NO:770; SEQ ID NO:771; SEQ ID NO:772; SEQ ID NO:773; SEQ ID NO:774; SEQ ID NO:775; SEQ ID NO:776; SEQ ID NO:777; SEQ ID NO:778; SEQ ID NO:780; SEQ ID NO:781; SEQ ID NO:782; SEQ ID NO:783; SEQ ID NO:784; SEQ ID NO:785; or SEQ ID NO:786, or derivative or variant or functional portion thereof.
41. The system or composition of claim 40, wherein the nucleic acid polymerase comprises a reverse transcriptase.
42.
The system or composition of claim 40, wherein the nucleic acid polymerase comprises a reverse transcriptase, wherein the reverse transcriptase comprises an amino acid sequence having at least 70% similarity or 70% identity to a reverse transcriptase of any one of SEQ ID NO:627 to SEQ ID NO:755.
43. The system or composition of claim 39, wherein the nucleic acid polymerase comprises a reverse transcriptase, wherein the reverse transcriptase comprises an amino acid sequence having at least 70% similarity or 70% identity to a reverse transcriptase of any one of SEQ ID NO:627 to SEQ ID NO:755.
44. The system or composition of any one of claims 39 to 43, wherein (i), (ii) and (iii), further comprises a Cas protein; or (iv) comprises nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii) and/or a Cas protein for expression in vivo in a cell; or the vector(s) of (v) additionally contains nucleic acid molecule(s) encoding a Cas protein.
45. The system or composition of any one of claims 39 to 43, which comprises a prime editor.
46. The system or composition of any one of claims 39 to 44, further comprising a recruitment system comprising: at least one aptamer; and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
47. The system or composition of claim 46, wherein the at least one aptamer is an RNA aptamer or a peptide aptamer.
48. The system or composition of claim 47, wherein the nucleic acid molecule or nucleic acid molecules comprises two RNA aptamers.
49. The system or composition of claim 48, wherein the two RNA aptamer sequences comprise the same sequence.
50. The system or composition of any of claims 46-48, wherein the aptamer binding protein comprises a MS2 coat protein, or a functional derivative or variant thereof.
51.
The system or composition of any of claims 46-48, wherein the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof.
52. The system or composition of claim 46, wherein the at least one aptamer sequence is linked to the Cas protein.
53. The system or composition of claim 46, wherein the at least one aptamer sequence is linked to the guide RNA.
54. The system or composition of claim 46, wherein the recruitment system comprises from 1 to 24 aptamers.
55. The system or composition of any one of claims 52 to 54, wherein two or more aptamers comprise the same sequence.
56. The system or composition of any of claims 46, 47 or 52-55, wherein the aptamer comprises a GCN4 peptide sequence.
57. The system or composition of any of claims 46-56, wherein the recombination protein N-terminus is linked to the aptamer binding protein C-terminus.
58. The system or composition of any of claims 46-57, further comprising a linker between the recombination protein and the aptamer binding protein.
59. The system or composition of claim 58, wherein the linker comprises the amino acid sequence of SEQ ID NO:15.
60. The system or composition of any of claims 39-59, including at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is/are linked to the recombination protein or the Cas protein or the reverse transcriptase or at least one NLS on each of at least two or three of the recombination protein, the reverse transcriptase or the Cas protein.
61. The system or composition of claim 60, wherein the nuclear localization sequence comprises the amino acid sequence of SEQ ID NO:16.
62. The system or composition of claim 60 or 61, which comprises a NLS on one or more of the recombination protein C-terminus, the recombination protein N-terminus or on the Cas protein.
63.
The system or composition of any one of claims 39-62, wherein the recombination protein comprises a microbial recombination protein or active portion thereof.
64. The system or composition of any one of claims 39-62, wherein the recombination protein comprises a mitochondrial recombination protein or active portion thereof.
65. The system or composition of any one of claims 39-62, wherein the recombination protein comprises a viral recombination protein or active portion thereof.
66. The system or composition of any one of claims 39-62, wherein the recombination protein comprises a eukaryotic recombination protein or active portion thereof.
67. The system or composition of claim 63, wherein the fusion protein comprises RecE, RecT, or derivative or variant thereof.
68. The system or composition of claim 63, wherein the RecE, RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity or 70% identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-14.
69. The system or composition of any of claims 44-68, wherein the Cas protein is catalytically inactive (less than 5% nuclease activity as compared with a wild-type or non-mutated form of the Cas protein) or catalytically dead.
70. The system or composition of any of claims 44-69, wherein the Cas protein comprises Cas9 or Cas12a.
71. The system or composition of any of claims 69-70, wherein the Cas9 protein comprises wild-type Streptococcus pyogenes Cas9 or a wild-type Staphylococcus aureus Cas9.
72. The system or composition of any of claims 44-68, 70 or 71, wherein the Cas protein comprises a nickase.
73. The system or composition of claim 72, wherein the nickase comprises wild-type Streptococcus pyogenes Cas9 with an amino acid substitution at position 10 of D10A.
74. The system or composition of any of claims 39-73, further comprising donor nucleic acid.
75.
The system or composition of any of claims 39-74, wherein the target DNA sequence is a genomic DNA sequence in a host cell.
76. The system or composition of any of claims 39-75, wherein the nucleic acid polymerase comprises reverse transcriptase activity.
77. The system or composition of any of claims 39-75, wherein the nucleic acid polymerase comprises a retron RT.
78. The system or composition of any one of claims 39-77, wherein the nucleic acid polymerase and recombination protein are functionally linked to each other and comprise a fusion protein.
79. The system or composition of any of claims 43-77, wherein the nucleic acid polymerase and the Cas protein are functionally linked to each other and comprise a fusion protein.
80. The system or composition of any of claims 46-77, wherein the aptamer binding protein and the recombination protein are functionally linked to each other and comprise a fusion protein.
81. The system or composition of any of claims 43-77, wherein the recombination protein and the Cas protein are functionally linked to each other and comprise a fusion protein.
82. The system or composition of any of claims 43-77, wherein the RT, and the Cas protein, and the recombination protein are functionally linked to each other and comprise a fusion protein.
83. A cell or eukaryotic cell comprising the system or composition of any one of claims 39-82.
84. A method of altering a target genomic DNA sequence in a cell comprising a target genomic DNA sequence, comprising introducing the system or composition of any one of claims 39-82 into the cell.
85. The cell or eukaryotic cell of claim 83 or the method of claim 84, wherein the cell or eukaryotic cell is a mammalian cell.
86. The cell or eukaryotic cell or method of claim 85, wherein the cell is a human cell.
87. The cell or eukaryotic cell or method of any of claims 83-86, wherein the cell or eukaryotic cell or mammalian cell is a stem cell.
88.
The method of any one of claims 84-87, wherein the target genomic DNA sequence encodes a gene product.
89. The method of any one of claims 84-87, wherein the introducing into a cell comprises administering to a subject.
90. The method of claim 89, wherein the subject is a mammalian non-human animal or a human.
91. The method of claim 89 or 90, wherein the administering comprises in vivo administration.
92. The method of any one of claims 84-87, wherein the cell or eukaryotic cell or mammalian cell is an ex vivo or in vitro cell.
93. The method of claim 92, further comprising, after the introducing step, administering to a subject the ex vivo or in vitro cells.
94. The method of claim 89, wherein the subject is a mammalian non-human animal or a human.
95. Use of the system or composition of any one of claims 39-82 for the alteration of a target DNA sequence in a cell.
96. A system or composition comprising: (i) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence; and (ii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iii) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) for expression in vivo in a cell; or, (iv) vector(s) containing the nucleic acid molecule(s) of (iii) for expression in vivo in a cell; wherein the recombination protein comprises an amino acid sequence with at least 70% similarity or 70% identity to SEQ ID NO:779; SEQ ID NO:766; SEQ ID NO:767; SEQ ID NO:768; SEQ ID NO:769; SEQ ID NO:770; SEQ ID NO:771; SEQ ID NO:772; SEQ ID NO:773; SEQ ID NO:774; SEQ ID NO:775; SEQ ID NO:776; SEQ ID NO:777; SEQ ID NO:778; SEQ ID NO:780; SEQ ID NO:781; SEQ ID NO:782; SEQ ID NO:783; SEQ ID NO:784; SEQ ID NO:785; or SEQ ID NO:786, or derivative or variant or functional portion thereof.
97.
The system or composition of claim 96, wherein the system does not comprise a CRISPR protein, or does not comprise a Cas protein, or does not comprise a Cas9 protein, or does not comprise a Cas12a protein.
PCT/US2024/042871 2023-08-17 2024-08-19 Rna-guided genome recombineering at kilobase scale Pending WO2025038989A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363533192P 2023-08-17 2023-08-17
US63/533,192 2023-08-17

Publications (1)

Publication Number Publication Date
WO2025038989A1 WO2025038989A1 (en) 2025-02-20

Family

ID=94632736

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/042871 Pending WO2025038989A1 (en) 2023-08-17 2024-08-19 Rna-guided genome recombineering at kilobase scale

Country Status (1)

Country Link
WO (1) WO2025038989A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190390179A1 (en) * 2016-04-12 2019-12-26 Solis Biodyne Oü Synthetic reverse transcriptases and uses thereof
WO2020191246A1 (en) * 2019-03-19 2020-09-24 The Broad Institute, Inc. Methods and compositions for editing nucleotide sequences
WO2023034925A1 (en) * 2021-09-01 2023-03-09 The Board Of Trustees Of The Leland Stanford Junior University Rna-guided genome recombineering at kilobase scale
WO2023064858A1 (en) * 2021-10-13 2023-04-20 Apellis Pharmaceuticals, Inc. Compositions and methods for genome editing the neonatal fc receptor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DATABASE GenBank 19 January 2010 (2010-01-19), "Lactobacillus phage mv4 early region, complete sequence", XP093283503, Database accession no. AF182207.1 *

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 24855035; Country of ref document: EP; Kind code of ref document: A1