WO2023242225A1 - Ribosomal protein s15 in crispr transposon mediated sequence engineering - Google Patents
Ribosomal protein S15 in CRISPR transposon mediated sequence engineering
- Publication number
- WO2023242225A1 (PCT/EP2023/065861)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- protein
- nucleic acid
- sequence
- cas12k
- dna
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
Definitions
- the present invention relates to vectors and methods related to use of the RNA-programmable, CRISPR-associated Tn7-like transposition complex.
- the invention provides the bacterial ribosomal protein S15 as an integral part of this complex.
- Bacterial CRISPR-associated transposons leverage RNA-guided CRISPR machineries to target specific genomic sites and recruit a Tn7-like transposition complex, or transpososome, that mediates insertion of transposon DNA at a fixed distance downstream of the target site specified by the CRISPR machinery. Sequence-specific, RNA-guided targeting by the CRISPR-Cas machinery coupled with efficient transposase-mediated DNA integration makes these systems the first truly programmable, site-specific gene insertion machineries discovered to date. CRISPR-associated transposons hold high potential as site-specific DNA insertion vectors, but transplantation of these systems from the bacterial domain into higher order organisms for genome engineering applications has not been established to date.
- the inventors previously investigated the RNA-guided DNA insertion pathway of type V CRISPR-associated transposons, which relies on the pseudonuclease Cas12k, the ATPase TnsC, the transposase TnsB and the zinc-finger protein TniQ. They identified the molecular function of each of these proteins biochemically and analyzed the CRISPR-Cas12k machinery and the transposon components TnsC and TniQ at high resolution using structural biology methods (Querques et al. Target site selection and remodelling by type V CRISPR-transposon systems. Nature 599, 497-502 (2021). https://doi.org/10.1038/s41586-021-04030-z).
- the objective of the present invention is to provide means and methods to facilitate use of Tn7-like transposition in cells in which it has so far not been possible to use.
- This objective is attained by the subject-matter of the independent claims of the present specification, with further advantageous embodiments described in the dependent claims, examples, figures and general description of this specification.
- ribosomal protein S15 is provided as a single protein or as a fusion protein together with the Cas12k effector to ensure stabilization of the RNA component of the CRISPR-Cas machinery by means of specific protein-RNA interactions that we have identified at high-resolution.
- a first aspect of the invention relates to an engineered nucleic acid targeting system for insertion of a donor polynucleotide sequence into a target DNA strand.
- the system comprises: one or more CRISPR-associated transposase proteins or functional fragments thereof, or a nucleic acid sequence encoding such transposase proteins or functional fragments thereof; a Cas protein, or a nucleic acid sequence encoding such Cas protein; a guide RNA molecule capable of complexing with the Cas protein and directing sequence specific binding of the guide-Cas protein complex to a target sequence of a target DNA strand; and a ribosomal protein S15, or a nucleic acid sequence encoding such ribosomal protein S15.
- a further aspect of the invention relates to a method for inserting a cargo DNA nucleic acid sequence into a target nucleic acid DNA sequence.
- the method comprising the steps of contacting the target nucleic acid sequence with an engineered nucleic acid targeting system according to the invention.
- the minimal set of functionalities necessary to be provided consists of the Cas protein, particularly Cas12k, the set of CRISPR-associated transposase proteins or functional fragments thereof, consisting of the group of TnsB, TnsC, and TniQ, and the ribosomal protein S15, in addition to the cargo DNA.
- Another aspect of the invention relates to the use of a recombinant ribosomal protein S15, or of a nucleic acid sequence encoding said recombinant ribosomal protein S15, in a method for inserting a donor polynucleotide sequence into a target polynucleotide sequence, said method for inserting being facilitated by a CRISPR-associated transposase system.
- references to “about” a value or parameter herein include (and describe) variations that are directed to that value or parameter per se. For example, description referring to “about X” includes description of “X.”
- polypeptide in the context of the present specification relates to a molecule consisting of 50 or more amino acids that form a linear chain wherein the amino acids are connected by peptide bonds.
- the amino acid sequence of a polypeptide may represent the amino acid sequence of a whole (as found physiologically) protein or fragments thereof.
- polypeptides and protein are used interchangeably herein and include proteins and fragments thereof. Polypeptides are disclosed herein as amino acid residue sequences.
- peptide in the context of the present specification relates to a molecule consisting of up to 50 amino acids, in particular 8 to 30 amino acids, more particularly 8 to 15 amino acids, that form a linear chain wherein the amino acids are connected by peptide bonds.
- Amino acid residue sequences are given from amino to carboxyl terminus.
- Capital letters for sequence positions refer to L-amino acids in the one-letter code (Stryer, Biochemistry, 3rd ed. p. 21).
- Lower case letters for amino acid sequence positions refer to the corresponding D- or (2R)- amino acids. Sequences are written left to right in the direction from the amino to the carboxy terminus.
- amino acid residue sequences are denominated by either a three letter or a single letter code as indicated as follows: Alanine (Ala, A), Arginine (Arg, R), Asparagine (Asn, N), Aspartic Acid (Asp, D), Cysteine (Cys, C), Glutamine (Gln, Q), Glutamic Acid (Glu, E), Glycine (Gly, G), Histidine (His, H), Isoleucine (Ile, I), Leucine (Leu, L), Lysine (Lys, K), Methionine (Met, M), Phenylalanine (Phe, F), Proline (Pro, P), Serine (Ser, S), Threonine (Thr, T), Tryptophan (Trp, W), Tyrosine (Tyr, Y), and Valine (Val, V).
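The three-letter and one-letter codes listed above can be held in a simple lookup table. The following is a minimal Python sketch; the names `THREE_TO_ONE` and `to_one_letter` are illustrative and do not come from the specification.

```python
# Standard three-letter to one-letter codes for the 20 amino acids listed above.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues):
    """Convert three-letter codes (amino to carboxyl terminus) to a one-letter string."""
    return "".join(THREE_TO_ONE[r] for r in residues)


print(to_one_letter(["Met", "Gly", "Ser"]))  # MGS
```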
- engineered when used to characterize proteins or nucleic acids in the context of the present specification relates to components of the Tn7 system that are changed by sequence, or by context of expression (a modified or unmodified protein being expressed in an organism where it does not occur naturally) in comparison to the natural origin of the protein or nucleic acid thus characterized.
- an engineered system may be comprised of different components that are each unchanged from their natural sequence, but do not occur together.
- PAM in the context of the present specification relates to protospacer adjacent motif.
- NTS in the context of the present specification relates to non-target strand, the strand that has the same sequence as the guide crRNA, as opposed to the TS (target strand), which is base-paired to the guide RNA.
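The relationship between the two strands can be sketched in a few lines of Python. This is only an illustration: the helper names and the example sequence are hypothetical, and the guide is assumed to match the NTS with uracil in place of thymine.

```python
# Watson-Crick complement table for DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def target_strand(nts):
    """Return the target strand (TS), written 5'->3', for an NTS segment written 5'->3'."""
    return nts.translate(COMPLEMENT)[::-1]


def guide_spacer(nts):
    """Return the RNA spacer that has the same sequence as the NTS (T replaced by U)."""
    return nts.replace("T", "U")


nts = "GATTACA"  # hypothetical example sequence
print(target_strand(nts))  # TGTAATC
print(guide_spacer(nts))   # GAUUACA
```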
- amino acid linker or peptide linker refers to a polypeptide of variable length that is used to connect two polypeptides in order to generate a single chain polypeptide.
- linkers useful for practicing the invention specified herein are oligopeptide chains consisting of 1, 2, 3, 4, 5, 10, 20, 30, 40 or 50 amino acids.
- the linker consists of amino acids selected from the group of G, S, A and D.
- Important characteristics of the conjugate peptide linkers as specified above are low immunogenicity, and a peptide length that allows the domains which are joined by the linker to interact to form a functional entity as disclosed herein.
- the sequences are primarily made up of stretches of small, polar amino acids such as glycine (G) and serine (S).
- peptide linker is >15 amino acids in length, particularly 15 to 30 amino acids in length, wherein the amino acids are selected from G, S, A and D.
- amino acid linker is a monomer or di-, tri- or tetramer of a peptide motif composed of three or four glycine and one serine.
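A glycine/serine linker of the kind described above can be generated as follows. This is a minimal sketch: the function name is hypothetical, and placing the serine at the end of each motif (GGGS or GGGGS) is an assumption, since the text only states the composition of the motif.

```python
def gly_ser_linker(n_gly, repeats):
    """Build a linker from 1 to 4 repeats of a motif with 3 or 4 glycines and one serine."""
    if n_gly not in (3, 4):
        raise ValueError("motif must contain three or four glycines")
    if repeats not in (1, 2, 3, 4):
        raise ValueError("linker must be a monomer, dimer, trimer or tetramer")
    # Assumption: serine closes each motif, e.g. GGGGS.
    return ("G" * n_gly + "S") * repeats


linker = gly_ser_linker(4, 3)
print(linker, len(linker))  # GGGGSGGGGSGGGGS 15
```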
- sequences similar or homologous are also part of the invention.
- the sequence identity at the amino acid level can be about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher.
- the sequence identity can be about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher.
- substantial identity exists when the nucleic acid segments will hybridize under selective hybridization conditions (e.g., very high stringency hybridization conditions), to the complement of the strand.
- the nucleic acids may be present in whole cells, in a cell lysate, or in a partially purified or substantially pure form.
- sequence identity and percentage of sequence identity refer to a single quantitative parameter representing the result of a sequence comparison determined by comparing two aligned sequences position by position.
- Methods for alignment of sequences for comparison are well-known in the art. Alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the global alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Nat. Acad. Sci.
- sequence identity values refer to the value obtained using the BLAST suite of programs (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) using the above identified default parameters for protein and nucleic acid comparison, respectively.
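The position-by-position comparison of two aligned sequences can be illustrated with a short Python sketch. It is not the BLAST procedure referred to above: it assumes the alignment has already been computed, and the choice of denominator (all aligned columns except those where both sequences carry a gap) is an assumption made for illustration. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1  # identical residues; a gap never equals a residue here
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Four identical positions out of six aligned columns.
print(round(percent_identity("MKT-LV", "MKTALI"), 1))  # 66.7
```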
- the term having substantially the same biological activity in the context of the present invention relates to the function of a ribosomal S15 protein in reconstituting CRISPR-associated Tn7-like transposon activity in a system containing activity of Cas12k, TnsB, TnsC, and TniQ as well as an appropriate guide RNA, but no native S15 activity.
- gene refers to a polynucleotide containing at least one open reading frame (ORF) that is capable of encoding a particular polypeptide or protein after being transcribed and translated.
- a polynucleotide sequence can be used to identify larger fragments or full-length coding sequences of the gene with which they are associated. Methods of isolating larger fragment sequences are known to those of skill in the art.
- transgene in the context of the present specification relates to a gene or genetic material that has been transferred from one organism to another.
- the term may also refer to transfer of the natural or physiologically intact variant of a genetic sequence into tissue of a patient where it is missing. It may further refer to transfer of a natural encoded sequence the expression of which is driven by a promoter absent or silenced in the targeted tissue.
- a recombinant in the context of the present specification relates to a nucleic acid, which is the product of one or several steps of cloning, restriction and/or ligation and which is different from the naturally occurring nucleic acid.
- a recombinant virus particle comprises a recombinant nucleic acid.
- gene expression or expression may refer to either of, or both of, the processes - and products thereof - of generation of nucleic acids (RNA) or the generation of a peptide or polypeptide, also referred to as transcription and translation, respectively, or any of the intermediate processes that regulate the processing of genetic information to yield polypeptide products.
- the term gene expression may also be applied to the transcription and processing of an RNA gene product, for example a regulatory RNA or a structural (e.g. ribosomal) RNA. If an expressed polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell. Expression may be assayed both on the level of transcription and translation, in other words mRNA and/or protein product.
- a first aspect of the invention relates to an engineered nucleic acid targeting system for insertion of a donor polynucleotide sequence into a target DNA strand.
- the system comprises: one or more CRISPR-associated transposase proteins or functional fragments thereof, or a nucleic acid sequence encoding such transposase proteins or functional fragments thereof; a Cas protein, or a nucleic acid sequence encoding such Cas protein; a guide RNA molecule capable of complexing with the Cas protein and directing sequence specific binding of the guide-Cas protein complex to a target sequence of a target DNA strand; and a ribosomal protein S15, or a nucleic acid sequence encoding such ribosomal protein S15.
- the Cas protein is a Type V Cas protein.
- the invention relates to improvements of Type V CRISPR related transposase systems that enable a copy-paste-type insertion of “cargo” DNA sequences into a target sequence.
- the Type V Cas protein is a Type V-K Cas protein.
- the Type V-K Cas protein is Cas12.
- the Cas protein is Cas12k.
- a Cas12k protein that is useful in practicing the invention is the Scytonema hofmanni protein Cas12k (WP_029636312.1).
- Cas12k mediates RNA-guided DNA integration into the target DNA site. Both strands of the DNA will contain the transposon DNA in ligated/integrated form.
- the inventors predict that the use of ribosomal protein S15 is not limited to Cas12k alone but may be of use to facilitate use of any system in which the Cas12k tracrRNA plays a role.
- the interaction of S15 is mediated by the tracrRNA. As far as the inventors have been able to determine, there isn’t any specific sequence that mediates this interaction, as the interaction depends also on the specific arrangement in space of the tracrRNA and the interactions are with the RNA backbone, not with specific nucleotides.
- the guide RNA molecule shown in the examples is a single guide comprising the tracrRNA sequence mediating interaction with the Cas12k protein, and the sequence-specific target interacting RNA (CRISPR RNA-crRNA). It is, however, also possible to use a dual system comprising a tracrRNA (trans-activating RNA) and a crRNA (CRISPR RNA).
- the Cas protein lacks nuclease activity.
- the Cas protein’s activity is in binding the DNA, but not cleaving it as the Cas protein used in the examples, Cas12k, is naturally mutated in residues involved in DNA cleavage. It is only the DNA binding activity that is required for transposon insertion, and not the nuclease activity. Integration is mediated by the transposon components, TnsB in the case of the Tn7-like machinery employed in the examples.
- the coding sequence may include a nuclear localization sequence peptide in applications that involve expression of the protein component of the system in a eukaryotic cell.
- the transposase protein complex is composed of CRISPR-associated transposase proteins.
- the transposase protein complex is a Tn7 transposase complex.
- the one or more (CRISPR-associated) transposase proteins or functional fragments thereof may be an engineered transposase system configured for replicative (copy-and-paste) transposition.
- the transposase proteins consist of a Tn7-like transposase system.
- the transposase proteins consist of the group composed of TnsB, TnsC, and TniQ.
- TnsB is a transposase that cleaves the 3’ termini of the transposon ends and performs the DNA ligation steps required for transposon integration.
- One non-limiting example is the Scytonema hofmanni protein, TnsB (WP_084763316.1).
- TnsC is an AAA+ ATPase involved in target DNA recognition and transposase activation.
- One non-limiting example is the Scytonema hofmanni protein, TnsC (WP_029636336.1).
- TniQ is a zinc-finger protein that regulates transposition by interaction with the other components.
- One non-limiting example is the Scytonema hofmanni protein, TniQ (WP_029636334.1).
- the ribosomal protein S15 is ribosomal protein S15 from E. coli (AP009048.1) or ribosomal protein S15 from S. hofmanni (WP_029633173.1).
- the inventors have successfully employed ribosomal proteins S15 from E. coli and from the original organism Scytonema hofmanni (WP_029633173.1), but expect that other prokaryote-derived S15 proteins, as well as mitochondrial S15 proteins, will also be functional.
- the inventors predict that the S15 complementation will work with any CRISPR-Cas transposon system in which the tracrRNA of the system has an architecture that is structurally conserved with respect to the one observed for the Cas12k-associated tracrRNA, which is characterized by interaction of S15 with the specific 3D arrangement of the tracrRNA and RNA-DNA duplex.
- ribosomal protein S15 herein refers to prokaryotic S15, in the strictest sense to the E. coli or S. hofmanni ribosomal protein S15.
- the ribosomal protein S15 may be present as a recombinant S15 protein, or a vector from which the S15 may be expressed.
- the recombinant S15 protein may only consist of the S15 sequence found in nature, or a polypeptide sequence having the biological activity (measured as the ability to complement Tn7 transposase activity in a system as shown in the examples of the present specification), and characterized by at least 50% identity in comparison to S15 from S. hofmanni (WP_029633173.1).
- the recombinant S15 protein is characterized by > 60% identity in comparison to S15 from S. hofmanni. In certain particular embodiments, the recombinant S15 protein is characterized by > 70% identity in comparison to S15 from S. hofmanni. In certain more particular embodiments, the recombinant S15 protein is characterized by > 80% identity in comparison to S15 from S. hofmanni. In certain even more particular embodiments, the recombinant S15 protein is characterized by > 85% identity in comparison to S15 from S. hofmanni. In certain yet even more particular embodiments, the recombinant S15 protein is characterized by > 95% identity in comparison to S15 from S. hofmanni.
- S15 polypeptide as used in the invention may also be present as a fusion protein sequence, joined to the Cas protein or one of the transposase proteins present in the system.
- the system is used for inserting a donor polynucleotide sequence into a target sequence.
- the target sequence will determine the guide RNA molecule sequence, as the guide RNA hybridizes to a sequence adjacent to the target.
- the minimal set of components of the system according to the invention are, as CRISPR components, one Cas protein (particularly Cas12k) and a guide RNA (either fused sgRNA or a dual tracrRNA + crRNA guide).
- the S15 protein must be present; it can either be fused, for example to Cas12k (to make an engineered Cas12k-S15 fusion protein) or be provided in trans.
- the minimal transposon components are Tn7-like proteins TniQ, TnsC and TnsB.
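The minimal component set described above can be written down as a small checklist. The sketch below is illustrative only: the component labels and the convention of writing a fusion as hyphen-joined names (for example `Cas12k-S15`) are assumptions introduced here, not terminology fixed by the specification.

```python
# Minimal components named in the text: CRISPR part, S15, and Tn7-like proteins.
REQUIRED = {"Cas12k", "guide_RNA", "S15", "TniQ", "TnsC", "TnsB"}


def missing_components(provided):
    """Return the required components that are absent from `provided`.

    A fusion such as "Cas12k-S15" (hypothetical label) counts as each of its parts.
    """
    present = set()
    for item in provided:
        present.update(item.split("-"))
    return sorted(REQUIRED - present)


# S15 supplied as a fusion with Cas12k; TniQ was forgotten.
print(missing_components(["Cas12k-S15", "guide_RNA", "TnsB", "TnsC"]))  # ['TniQ']
```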
- the engineered nucleic acid targeting system further comprises a donor polynucleotide (DNA) sequence that is to be integrated.
- the donor polynucleotide (DNA) sequence comprises a recognition site for the recombinase and a cargo nucleic acid sequence flanked by at least one transposon end sequence.
- the cargo nucleic acid sequence is flanked by a right end sequence element and a left end sequence element.
- the donor polynucleotide or donor sequence consists of a “cargo” sequence, in other words the sequence to be inserted net of sequence elements that are present in order to facilitate the insertion.
- the cargo sequence is flanked by the left and right transposon ends.
- in type V systems, the entire donor (including the backbone DNA) is expected to be integrated.
- in type I systems, or in an engineered version of the type V system, only the cargo and the terminal transposon ends are integrated.
- the donor is a piece of DNA (linear or circular) that contains transposon left/right end sequences and the cargo to be inserted in between.
- the cargo nucleic acid sequence can range in size from 100 bases to 30 kb in length of double stranded DNA.
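The donor layout (left end, cargo, right end) and the stated cargo size range can be sketched as follows. The end sequences used in the example are short placeholders, not real transposon ends, and the function and constant names are hypothetical.

```python
MIN_CARGO = 100      # lower bound of the stated cargo size range
MAX_CARGO = 30_000   # 30 kb upper bound of the stated cargo size range


def build_donor(left_end, cargo, right_end):
    """Assemble a linear donor: left transposon end, cargo, right transposon end."""
    if not MIN_CARGO <= len(cargo) <= MAX_CARGO:
        raise ValueError(f"cargo length {len(cargo)} outside {MIN_CARGO}-{MAX_CARGO}")
    return left_end + cargo + right_end


# Placeholder ends for illustration only.
donor = build_donor("ACGT", "A" * 100, "TGCA")
print(len(donor))  # 108
```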
- terminal sequences differ for different transposase family members that may be employed in the course of practicing the invention as laid out herein. But the designation “right end sequence element” and “left end sequence element” are known in the art to generally refer to all the terminal sequences of any transposons, in particular also of Tn7 transposons.
- the right end sequences / left end sequences are the termini of the transposon (they mark the boundaries of the transposon and are part of the transposon itself).
- Certain commercial applications of the invention as laid out herein may provide the protein components of the nucleic acid targeting system including the Cas protein, transposase proteins and S15 ribosomal polypeptide, either as isolated proteins (single polypeptides, or certain components being fused to each other), or as vectors, supplied by a commercial provider.
- the polynucleotide sequences (the cargo / donor sequence to be integrated into a target sequence, and the RNA component comprising the tracrRNA interacting with the Cas protein and the S15 protein) may be provided by a second party, or the user of the system.
- a further aspect of the invention relates to a method for inserting a cargo DNA nucleic acid sequence into a target nucleic acid DNA sequence.
- the method comprising the steps of contacting the target nucleic acid sequence with an engineered nucleic acid targeting system according to the invention.
- the minimal set of functionalities necessary to be provided consists of the Cas protein, particularly Cas12k, and guide RNA, the set of CRISPR-associated transposase proteins or functional fragments thereof, consisting of the group of TnsB, TnsC, and TniQ, and the ribosomal protein S15, in addition to the cargo DNA.
- Any one of the protein components of the set of minimal functionalities, or any combination thereof, including all protein components may be provided as nucleic acid encoded components.
- the cargo polynucleotide is inserted at a position between 40 and 100 bases 3’-terminal of a PAM sequence in the target polynucleotide.
- the cargo is inserted at a position between 40 and 100 bases downstream of a protospacer adjacent motif (PAM) sequence in the target polynucleotide.
- the PAM sequence is on the 5’ side of the target site.
- the PAM is located on the non-target strand (NTS) on the 5’ side of the target, the transposon insertion site is located on the 3’ side.
- the PAM (5’-3’) is the sequence of the NTS just upstream of the crRNA:TS_DNA duplex. This is the point of orientation. In case of the system used herein, the PAM sequence is always on the 5’ side of the target. In certain embodiments, the PAM comprises the sequence NGTN. In certain particular embodiments, the PAM is RGTR, VGTD, or VGTR. N: any nucleotide; G: guanine; T: thymine; R: purine (A/G); V: not T (A/G/C); D: not C (A/G/T).
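The degenerate PAM motifs above can be matched against a non-target strand with a regular expression built from the nucleotide codes defined in the text. The Python sketch below is illustrative: the function names and example sequence are hypothetical, and measuring the 40 to 100 base window from the 3’ end of the PAM is an assumption about the coordinate convention.

```python
import re

# Degenerate codes as defined in the text: N any, R purine, V not T, D not C.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "V": "[ACG]", "D": "[AGT]"}


def find_pams(nts, pam="NGTN"):
    """Return 0-based start positions of all (possibly overlapping) PAM matches on the NTS."""
    pattern = re.compile("(?=(" + "".join(CODES[c] for c in pam) + "))")
    return [m.start() for m in pattern.finditer(nts.upper())]


def insertion_window(pam_start, pam_len=4, min_offset=40, max_offset=100):
    """Candidate insertion range downstream of the PAM (assumed measured from the PAM 3' end)."""
    pam_end = pam_start + pam_len
    return pam_end + min_offset, pam_end + max_offset


nts = "AAGTACCGTT"  # hypothetical example sequence
print(find_pams(nts, "NGTN"))  # [1, 6]
print(find_pams(nts, "RGTR"))  # [1]
print(insertion_window(1))     # (45, 105)
```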
- the method of the invention is directed at inserting a cargo nucleic acid sequence into a target nucleic acid sequence inside a cell, and comprises contacting the target nucleic acid sequence inside the cell, particularly inside a eukaryotic cell, with an engineered nucleic acid targeting system according to the invention.
- the method is applied to a cell that does not express ribosomal protein S15, or a homologue or orthologue of S15, prior to the cell having been contacted with the engineered nucleic acid targeting system.
- One key aspect of the invention is that it supplies a key functionality that is a constitutively expressed protein in E. coli and thereby went unrecognized as a key component of the Type V transposase complex prior to the present invention.
- the cell is a eukaryotic cell.
- the cell is a mammalian cell. In certain particular embodiments, the cell is a primate cell. In certain more particular embodiments, the cell is a human cell.
- the cell is a stem cell.
- the method according to the invention in any combination of the particular embodiments of its components mentioned herein, is practiced ex-vivo.
- the engineered nucleic acid targeting system is delivered into the cell by one or more expression vectors.
- An expression vector in the broadest sense is a polynucleotide sequence encoding one or more of the components of the engineered nucleic acid targeting system according to the invention.
- Each coding sequence is understood to be under control of a promoter operable in the target cell.
- the promoters may be inducible or constitutive.
- Protein expression in eukaryotes is usually driven by RNA polymerase II promoters, generating mRNA having the appropriate 5’ cap and 3’ poly-A tags. Expression of RNA components may also be driven from other RNA polymerases, such as RNA polymerase III.
- Expression vectors include “naked” (closed circular plasmid or linear) DNA vectors that may be delivered enclosed in liposomes, or associated to particles. Expression vectors also include RNA molecules, from which ribosomes may translate the protein components directly. RNA can be delivered by liposomes. The extraordinary success of RNA vector-mediated SARS-CoV-2 vaccination has highlighted the potential of this technology.
- the components of the engineered nucleic acid targeting system according to the invention may be delivered by viral vectors.
- DNA viruses and positive-strand, negative-strand or double-stranded RNA viruses have all been employed in experimental therapeutic gene transfer approaches.
- the one or more expression vectors are selected from the group consisting of viral vectors, DNA vectors and RNA vectors.
- a viral vector is used, selected from the group consisting of an adeno-associated virus, an adenovirus, a herpesvirus, and a lentivirus.
- the cell is a eukaryotic cell, particularly a mammalian cell, and at least one, particularly all of the Cas protein, the transposase proteins and the S15 protein carry a nuclear localization sequence peptide.
- Another aspect of the invention relates to the use of a recombinant ribosomal protein S15, or of a nucleic acid sequence encoding said recombinant ribosomal protein S15, in a method for inserting a donor polynucleotide sequence into a target polynucleotide sequence, said method for inserting being facilitated by a CRISPR-associated transposase system.
- the S15 protein may carry an NLS for transport to the nucleus when expressed inside a eukaryotic cell.
- the recombinant ribosomal protein is ribosomal protein S15 from E. coli (AP009048.1) or ribosomal protein S15 from S. hofmanni (WP_029633173.1), or a ribosomal protein having at least 85% sequence identity to ribosomal protein S15 from S. hofmanni (WP_029633173.1) and at least 80% of the biological activity of WP_029633173.1.
- the S15 recombinant ribosomal protein is selected from the list of prokaryotic proteins in Table 1.
- the S15 recombinant ribosomal protein is selected from the list of eukaryotic proteins in Table 1.
- the CRISPR-associated transposase system comprises Cas12k.
- the CRISPR-associated transposase system comprises Tn7-like transposase proteins.
- the Tn7-like transposase proteins comprise, particularly consist of, TnsB, TnsC, and TniQ.
- the insertion of the cargo sequence into the cell’s genome may effect a number of corrections or changes. It may introduce one or more mutations to the target sequence, for example in order to correct an existing error to revert to a wild type, or to increase the genetic diversity (for example, in a library). It may correct, or may introduce, a stop codon in the target sequence, for example to restore protein function in the case of a premature stop codon, or to abrogate a certain protein function.
- the changes made by introduction of the cargo sequence may also disrupt, restore or introduce a splicing site. Alternatively, it may insert a gene or gene fragment at one or both alleles of a target.
- Mutations introduced by the donor sequence may comprise substitutions, deletions, insertions, or a combination thereof.
- the mutations may cause a shift in an open reading frame on the target polynucleotide.
- An engineered nucleic acid targeting system for insertion of a donor polynucleotide sequence into a target DNA strand, the system comprising: a. one or more transposase proteins or functional fragments thereof, or a nucleic acid sequence encoding such transposase proteins or functional fragments thereof; b. a Cas protein, or a nucleic acid sequence encoding such Cas protein; c. a guide RNA molecule capable of complexing with the Cas protein and directing sequence specific binding of the guide-Cas protein complex to a target sequence of a target DNA strand; d. a ribosomal protein S15, or a nucleic acid sequence encoding such ribosomal protein S15.
- transposase proteins consist of the group composed of TnsB, TnsC, and TniQ.
- ribosomal protein S15 is ribosomal protein S15 from E. coli (AP009048.1) or ribosomal protein S15 from S. hofmanni (WP_029633173.1).
- the engineered nucleic acid targeting system according to any one of the preceding items, further comprising a donor polynucleotide (DNA) sequence to be integrated, wherein the donor polynucleotide (DNA) sequence comprises a recognition site for the recombinase and a cargo nucleic acid sequence flanked by at least one transposon end sequence.
- a method for inserting a cargo nucleic acid sequence into a target nucleic acid sequence comprising contacting the target nucleic acid sequence with an engineered nucleic acid targeting system according to any one of items 1 to 7.
- a method for inserting a cargo nucleic acid sequence into a target nucleic acid sequence inside a cell comprising contacting the target nucleic acid sequence with an engineered nucleic acid targeting system according to any one of items 1 to 7.
- the cell is a mammalian cell, particularly a primate cell, more particularly a human cell.
- the cell is a stem cell, particularly wherein the method is practiced ex-vivo.
- the one or more expression vectors are selected from the group consisting of viral vectors, DNA vectors and RNA vectors.
- a viral vector selected from the group consisting of an adeno-associated virus, an adenovirus, a herpesvirus, and a lentivirus.
- the cell is a eukaryotic cell, particularly a mammalian cell, and wherein at least one, particularly all of the members of the group consisting of Cas protein, the transposase proteins and the S15 protein carry a nuclear localization sequence peptide.
- ribosomal protein S15 or of a nucleic acid sequence encoding said recombinant ribosomal protein S15, in a method for inserting a donor polynucleotide sequence into a target polynucleotide sequence, said method for inserting being facilitated by a CRISPR-associated transposase system.
- Tn7-like transposase proteins comprise, particularly consist of, TnsB, TnsC, and TniQ.
- An isolated recombinant S15 protein comprising a bacterial S15 protein sequence, a purification tag and a peptide sequence for translocation of the protein into the eukaryotic nucleus, particularly wherein the isolated recombinant S15 protein is characterized by SEQ ID NO 001 or SEQ ID NO 003.
- kits comprising DNA sequences encoding the recombinant proteins:
- TnsB, each of the proteins comprising a purification tag and a peptide sequence for translocation of the protein into the eukaryotic nucleus, particularly wherein the recombinant proteins are characterized by SEQ ID NO 005, SEQ ID NO 007, SEQ ID NO 009, and SEQ ID NO 011.
- the invention further encompasses the following items:
- Item 1A An engineered nucleic acid targeting system for insertion of a donor polynucleotide sequence into a target DNA strand, the system comprising: a. a first nucleic acid sequence, or a plurality of first nucleic acids, encoding transposase proteins TnsB, TnsC, and TniQ; b. a second nucleic acid sequence encoding a Cas12k protein; c. a third nucleic acid sequence encoding a ribosomal protein S15; d. a fourth nucleic acid sequence encoding an RNA consisting of a sgRNA and a crRNA segment, with an optional linker segment separating the sgRNA from the crRNA segment.
- Item 2A The engineered nucleic acid targeting system according to Item 1A, wherein said Cas12k protein is encoded as a Cas12k fusion polypeptide containing a nuclear localization signal peptide fused to said Cas12k protein.
- Item 2B The engineered nucleic acid targeting system according to Item 2A, wherein said Cas12k protein is encoded as a polypeptide containing two nuclear localization signal peptides fused to said Cas12k protein.
- Item 2C The engineered nucleic acid targeting system according to Item 2A or 2B, wherein a peptide linker is present between the Cas12k protein and the nuclear localization signal peptide, particularly wherein the linker is 2 to 10 amino acids in length.
- Item 2D The engineered nucleic acid targeting system according to Item 2A, 2B or 2C, wherein the Cas12k protein is situated N-terminally relative to the nuclear localization signal peptide on said Cas12k fusion polypeptide.
- Item 2E The engineered nucleic acid targeting system according to Item 2A, 2B, 2C or 2D, wherein the Cas12k protein is Scytonema hofmanni Cas12k.
- Item 3A The engineered nucleic acid targeting system according to any one of the preceding Items, wherein the ribosomal protein S15 is encoded as an S15 fusion polypeptide containing a nuclear localization signal peptide fused to said S15 protein.
- Item 3B The engineered nucleic acid targeting system according to Item 3A, wherein the S15 protein is situated C-terminally relative to the nuclear localization signal peptide on said S15 fusion polypeptide.
- Item 3C The engineered nucleic acid targeting system according to Item 3A or 3B, wherein the S15 protein is separated from the nuclear localization signal peptide by a peptide linker, particularly wherein the linker is 2 to 10 amino acids in length.
- Item 3D The engineered nucleic acid targeting system according to Item 3A, 3B or 3C, wherein the S15 protein is Scytonema hofmanni S15 or Escherichia coli S15.
- Item 4A The engineered nucleic acid targeting system according to any one of the preceding Items, wherein the TniQ protein is encoded as a TniQ fusion polypeptide containing a nuclear localization signal peptide fused to said TniQ protein.
- Item 4B The engineered nucleic acid targeting system according to Item 4A, wherein the TniQ protein is situated N-terminally relative to the nuclear localization signal peptide on said TniQ fusion polypeptide.
- Item 4C The engineered nucleic acid targeting system according to Item 4A or 4B, wherein the TniQ protein is separated from the nuclear localization signal peptide by a peptide linker, particularly wherein the linker is 2 to 10 amino acids in length.
- Item 4D The engineered nucleic acid targeting system according to Item 4A, 4B or 4C, wherein the TniQ protein is Scytonema hofmanni TniQ.
- Item 5A The engineered nucleic acid targeting system according to any one of the preceding Items, wherein the TnsB protein is encoded as a TnsB fusion polypeptide containing a nuclear localization signal peptide fused to said TnsB protein.
- Item 5B The engineered nucleic acid targeting system according to Item 5A, wherein the TnsB protein is situated C-terminally relative to the nuclear localization signal peptide on said TnsB fusion polypeptide.
- Item 5C The engineered nucleic acid targeting system according to Item 5A or 5B, wherein the TnsB protein is separated from the nuclear localization signal peptide by a peptide linker, particularly wherein the linker is 2 to 10 amino acids in length.
- Item 5D The engineered nucleic acid targeting system according to Item 5A, 5B or 5C, wherein the TnsB protein is Scytonema hofmanni TnsB.
- Item 6A The engineered nucleic acid targeting system according to any one of the preceding Items, wherein the TnsC protein is encoded as a TnsC fusion polypeptide containing a nuclear localization signal peptide fused to said TnsC protein.
- Item 6B The engineered nucleic acid targeting system according to Item 6A, wherein the TnsC protein is situated C-terminally relative to the nuclear localization signal peptide on said TnsC fusion polypeptide.
- Item 6C The engineered nucleic acid targeting system according to Item 6A or 6B, wherein the TnsC protein is separated from the nuclear localization signal peptide by a peptide linker, particularly wherein the linker is 2 to 10 amino acids in length.
- Item 6D The engineered nucleic acid targeting system according to Item 6A, 6B or 6C, wherein the TnsC protein is Scytonema hofmanni TnsC.
- Item 7A The engineered nucleic acid targeting system according to any one of the preceding Items, wherein the Cas12k fusion polypeptide, the S15 fusion polypeptide, the TniQ fusion polypeptide, the TnsB fusion polypeptide and the TnsC fusion polypeptide are expressed under control of a constitutive RNA polymerase II promoter operable in a human cell.
- Item 8A The engineered nucleic acid targeting system according to any one of the preceding Items, wherein the Cas12k fusion polypeptide, the S15 fusion polypeptide, the TniQ fusion polypeptide, the TnsB fusion polypeptide and the TnsC fusion polypeptide are each expressed under control of a separate promoter sequence.
- Item 9A The engineered nucleic acid targeting system according to any one of the preceding Items 1A to 7A, wherein the Cas12k fusion polypeptide, the S15 fusion polypeptide, the TniQ fusion polypeptide, the TnsB fusion polypeptide and the TnsC fusion polypeptide are together expressed as a single polypeptide under control of a single promoter sequence.
- Item 9B The engineered nucleic acid targeting system according to Item 9A, wherein the Cas12k fusion polypeptide, the S15 fusion polypeptide, the TniQ fusion polypeptide, the TnsB fusion polypeptide and the TnsC fusion polypeptide are separated from one another by a self-cleaving peptide sequence.
- Item 9C The engineered nucleic acid targeting system according to Item 9B, wherein the self-cleaving peptide sequence is the Thosea self-cleaving peptide sequence.
- Item 10 The engineered nucleic acid targeting system according to any one of the preceding Items, wherein the promoter is the CMV immediate early promoter.
- Item 11A The engineered nucleic acid targeting system according to any one of the preceding Items, wherein expression of the fourth nucleic acid sequence encoding an RNA consisting of a sgRNA and a crRNA segment is under control of an RNA polymerase III promoter operable in a human cell.
- Item 11B The engineered nucleic acid targeting system according to Item 11A, wherein the RNA polymerase III promoter is a U6 promoter.
- Item 12B The engineered nucleic acid targeting system according to any one of the preceding Items, further comprising a DNA comprising a donor sequence to be inserted, flanked by a left transposon end and a right transposon end, respectively.
- Fig. 1 Cryo-EM structure of the Cas12k-TnsC transposon recruitment complex
- a Schematic of the type V-K CRISPR-associated transposon system from Scytonema hofmanni and the S15 gene.
- LE RE - left and right transposon ends
- b Electron density map of the Cas12k-transposon recruitment complex.
- Density for nine TnsC protomers (TnsC1-TnsC9) is shown.
- c Schematic of the sgRNA structure and R-loop architecture indicating interactions between nucleic acids and protein components of the complex (see SEQ ID NO 021 to SEQ ID NO 023).
- d Structural model of the Cas12k-transposon recruitment complex (surface and cartoon representations).
- TS target DNA strand
- NTS non-target DNA strand.
- Fig. 2 R-loop completion upon complex assembly a-b, Structural comparison and detailed views of the R-loop structure comprising the crRNA portion of the single guide RNA (red cartoon backbone), the target DNA strand (TS, blue cartoon backbone) and the non-target DNA strand (NTS, dark grey cartoon backbone) bound to the Cas12k-transposon recruitment complex (a) and the Cas12k-sgRNA-target DNA complex (PDB ID:7PLA) (b). Only the REC (recognition lobe) and RuvC domains and the bridging helix (BH) of Cas12k are shown in surface representation.
- REC recognition lobe
- Fig. 3 TniQ recognizes tracrRNA and completed R-loop.
- a Overview of the interactions established by TniQ with the tracrRNA scaffold (yellow cartoon backbone) and the RNA:DNA heteroduplex formed by the crRNA portion of the single guide RNA (red cartoon backbone) and the target DNA strand (TS, blue cartoon backbone). NTS (non-target DNA strand, dark grey cartoon backbone).
- NTS non-target DNA strand, dark grey cartoon backbone
- the N-terminus (N) and C-terminus (C) of TniQ are indicated,
- b Close-up views of key tracrRNA-interacting residues of TniQ.
- c Detailed views of R-loop recognition by TniQ.
- d-e Site-specific transposition activity in E.
- Fig. 4 TniQ primes TnsC oligomerization. a, Detailed view of the interactions between TniQ (cartoon representation) and two protomers of the TnsC filament (TnsC1 and TnsC2, surface representation).
- the N-terminus (N) and C-terminus (C) of TniQ are indicated. b-c, Close-up views of key interactions between TniQ and TnsC1 (b) and TnsC2 (c).
- Fig. 5 TnsC assembly on PAM distal end of R-loop DNA a, Overview of the DNA duplex - target DNA strand (TS, blue cartoon backbone) and the non-target DNA strand (NTS, dark grey cartoon backbone) - bound to the Cas12k-transposon recruitment complex. Only the crRNA portion of the single guide RNA (red cartoon backbone) is shown. Residues 132-254 of Cas12k and the TnsC2 and TnsC3 protomers are hidden for better visualization of the DNA binding channel formed by the components of the complex, (b) Zoomed-in view of target DNA binding residues of TniQ.
- Fig. 6 S15 promotes Cas12k-TnsC transposon recruitment complex assembly. (a) Zoomed-in view of S15 binding in the Cas12k-recruitment complex. (b) Co-precipitation of TnsC and TniQ in presence or absence of ShS15 (Scytonema hofmanni S15) or EcS15 (Escherichia coli S15) by immobilized StrepII-fused Cas12k-sgRNA-target DNA complex. (c) In vitro transposition activity of purified ShCAST components in the absence or presence of E. coli and S. hofmanni ribosomal S15 proteins, as determined by droplet digital PCR (ddPCR) analysis. Data are presented as mean ± s.d. (n = 3 independent replicates).
- Fig. 7 Mechanism of RNA-mediated assembly in type V CRISPR-associated transposons.
- Schematic diagram depicting the recruitment of the transposition machinery by the RNA-guided Cas12k complex in type V-K CRISPR-associated transposons.
- Cas12k in association with a crRNA-tracrRNA dual guide RNA recognizes target DNA sequences to form a partial R-loop structure.
- Full R-loop formation occurs upon recruitment of S15, TniQ and TnsC.
- TniQ recognizes specific regions of the tracrRNA and primes oligomerization of TnsC filaments by holding together the first two TnsC protomers.
- TnsC forms extended protein filaments around the target DNA in presence of ATP by establishing discrete interactions with the backbone of the TS (target) and NTS (non-target) DNA strands and thereby remodeling the underlying duplex.
- the ribosomal protein S15 assists productive complex assembly by interacting with the tracrRNA and Cas12k. Altogether the assembly provides a recruitment platform for TnsB, which promotes transposon DNA insertion at the target site.
- Fig. 8 Cryo-EM data processing workflow for the Cas12k-TnsC transposon recruitment complex
- a Representative negative stain EM micrographs (at 98,000x magnification) and cryo-EM micrographs (at 130,000x magnification) of the ShCas12k-transposon recruitment complex after reconstitution and cryo-EM image processing workflow
- b Angular distribution plotted on the density map.
- c Final electron density map colored according to the local resolution
- d Fourier Shell Correlations (FSC) of the reconstruction from two independently refined half-maps.
- e Validation of ShCas12k-transposon recruitment complex structure model.
- Fig. 9 Structural comparison between the Cas12k-transposon recruitment complex and the Cas12k-TnsC non-productive complex: Electron density maps (a) and structural models (b) of the Cas12k-transposon recruitment complex and the Cas12k-TnsC non-productive complex. Side views and structural superpositions are shown. Proteins are shown in surface representation.
- Fig. 10 Structural rearrangements in Cas12k and the sgRNA upon R-loop completion. a, Structural models of the Cas12k-sgRNA-target DNA complex (top) and the same assembly in the Cas12k-transposon recruitment complex (bottom). Domain organization of Cas12k is shown for both models at the bottom. REC, recognition lobe. WED, wedge domain. PI, PAM interacting domain. BH, bridging helix.
- TS target DNA strand
- NTS non-target DNA strand
- b Detailed views of the RuvC and BH arrangements in the two superposed structural models
- c Structural comparison of overlaid tracrRNA in the Cas12k-sgRNA-target DNA and the Cas12k-transposon recruitment complex.
- Fig. 11 Interactions and conservation of the ribosomal protein S15.
- Fig. 12 Examples of construct designs for the expression of type V CAST components (including S15 protein) in mammalian/human cells.
- CRISPR-associated (Cas) protein effector complexes that mediate CRISPR RNA (crRNA)-guided recognition of target nucleic acids and their subsequent nucleolytic degradation (Koonin et al., 2017; Sorek et al., 2013).
- Tn7-like transposons have co-opted RNA-guided type I-F, I-B and V-K CRISPR-Cas systems to direct transposon DNA insertion into specific target sites (Faure et al., 2019; Klompe et al., 2019; Petassi et al., 2020; Peters et al., 2017; Rybarski et al., 2021; Saito et al., 2021; Strecker et al., 2019).
- the CRISPR-Cas DNA targeting machineries, either a nuclease-deficient multisubunit Cascade complex in type I systems (Klompe et al., 2019) or a single catalytically inactive Cas12k protein in type V-K systems (Strecker et al., 2019), are encoded between the left and right ends of the transposon together with CRISPR arrays and recruit the transposon machinery to target sites specified by the crRNA, ultimately resulting in transposon DNA insertion at a fixed distance downstream of the selected target DNA sequence.
- crRNAs encoded from the CRISPR arrays guide transposon insertion preferentially into other mobile genetic elements
- delocalized atypical crRNAs target integration into chromosomal sites for transposon homing (Saito et al., 2021 ).
- CRISPR-associated transposons encode CRISPR arrays (Faure et al., 2019; Peters et al., 2017; Rybarski et al., 2021 ) and additional defense systems as cargos (Klompe et al., 2022)
- these elements have been hypothesized to mediate horizontal gene transfer of host defense systems within bacterial populations by using other transposons and plasmids as shuttle vectors.
- CRISPR-associated transposons provide programmable, targeted DNA integration machineries that have been repurposed as site-specific, homology-independent DNA insertion tools to engineer bacterial hosts (Vo et al., 2021 b; Zhang et al., 2020) and communities (Rubin et al., 2022).
- their application in eukaryotic cells has been so far hindered by our limited knowledge on the underlying mechanisms.
- RNA-guided transposition in bacteria requires the concerted activities of the CRISPR effector complex built of the pseudonuclease Cas12k, the crRNA and a trans-activating RNA (tracrRNA) and three transposon proteins: the AAA+ ATPase TnsC, the transposase TnsB and the zinc-finger protein TniQ (Strecker et al., 2019).
- Upon binding to a 5’-GTN protospacer adjacent motif (PAM), Cas12k initiates guide RNA hybridization that nevertheless leads to incomplete R-loop formation, suggesting that further rearrangements occurring upon recruitment of the downstream transposon machinery are required to elicit further guide RNA-target DNA hybridization and ultimately lead to recognition of ~24 bp long targets (Strecker et al., 2019).
- PAM protospacer adjacent motif
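The PAM and target-length rule described above can be sketched as a simple sequence scan. This is only an illustrative sketch: the function name `find_gtn_targets` and the `spacer_len` default are hypothetical, the scan covers only the strand given as input, and it assumes the protospacer lies immediately 3’ of the 5’-GTN PAM on that strand.

```python
import re


def find_gtn_targets(seq: str, spacer_len: int = 24):
    """Return (pam_start, pam, protospacer) for each 5'-GTN PAM on the given strand.

    Assumption (not from the source): the protospacer is taken as the
    `spacer_len` nucleotides immediately 3' of the PAM on the input strand.
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping PAM occurrences are all reported.
    for match in re.finditer(r"(?=(GT[ACGT]))", seq):
        pam_start = match.start()
        proto_start = pam_start + 3
        protospacer = seq[proto_start:proto_start + spacer_len]
        # Skip PAMs that are too close to the end of the sequence.
        if len(protospacer) == spacer_len:
            hits.append((pam_start, match.group(1), protospacer))
    return hits
```

Positions are 0-based. A real design tool would also scan the reverse complement and filter candidates for off-target matches, which this sketch does not attempt.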
- TnsC (Park et al., 2021; Querques et al., 2021) revealed that the transposon protein assembles into DNA- and ATP-dependent helical filaments that recognize and remodel the underlying target DNA duplex and, at the same time, help tether TnsB to the target site by direct protein-protein interactions. Based on homology to the prototypical E.
- Tn7 transposon (Peters and Craig, 2001) and analysis of transposase-mediated integration events (Vo et al., 2021a), the DDE-type TnsB transposase has been postulated to catalyse the 3’-DNA strand breakage and transfer reactions required for the replicative transposition pathway, resulting in transposon end DNA nicking at a donor locus and ligation to a target DNA at sites located 60-66 bp downstream of the PAM (Strecker et al., 2019). TnsC is thought to be involved in these events by activating the transposase upon recruitment to the target DNA (Peters and Craig, 2001).
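The 60-66 bp distance stated above translates into a small coordinate calculation. The sketch below is a hedged illustration: the function name `insertion_window` is hypothetical, and it assumes that the distance is counted from the first position after a 3-nt PAM. The source does not specify the exact reference point, so the result should be read as approximate.

```python
def insertion_window(pam_start: int, pam_len: int = 3,
                     min_offset: int = 60, max_offset: int = 66):
    """Return the (first, last) 0-based positions of the expected insertion window.

    Assumption (not from the source): offsets are measured from the first
    nucleotide following the PAM on the PAM-containing strand.
    """
    pam_end = pam_start + pam_len  # index of the first nucleotide after the PAM
    return pam_end + min_offset, pam_end + max_offset
```

For a PAM starting at position 2, this gives the window 65 to 71 under the stated convention.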
- TnsB triggers TnsC filament disassembly by stimulating the ATP hydrolysis activity of TnsC (Park et al., 2021 ; Querques et al., 2021 ), thereby preventing insertion of multiple transposon copies into the same target site.
- This phenomenon, known as target immunity, is a common feature of both canonical (Skelding et al., 2003) and CRISPR-associated Tn7 elements (Klompe et al., 2019; Strecker et al., 2019) and is conserved in homologous transposons harboring coupled transposase/ATPase systems (Adzuma and Mizuuchi, 1988; Greene and Mizuuchi, 2002).
- a structure of an ADP-bound TnsC revealed that a single ring of the ATPase engages with target DNA upon ATP hydrolysis, suggesting this intermediate complex to bridge between the Cas12k-DNA targeting complex and the transposase (Park et al., 2021 ).
- the structural details of a complete CRISPR-transposon assembly remain elusive.
- the function of the zinc-finger protein TniQ is also presently unclear.
- type V-K systems a monomeric, compact TniQ directly interacts with the TnsC filament and has been implicated in the regulation of its polymerization (Park et al., 2021 ; Querques et al., 2021 ).
- TniQ establishes critical interactions with the trans-activating crRNA and the R-loop at the Cas12k-proximal end of the complex and, at the same time, primes nucleation of the TnsC filament in a discrete, productive orientation distally to Cas12k.
- the host-encoded protein S15 as a bona fide component of the crRNA-guided transposition machinery that promotes complex assembly and activity by a mechanism reminiscent of its function in cellular translation.
- sgRNA single-molecule guide RNA
- tracrRNA trans activating crRNA
- the resulting atomic model of the complex comprises Cas12k and the sgRNA guide bound to the target site in the DNA, together with a single TniQ molecule and an emerging right-handed TnsC filament assembled on the PAM-distal region of the target DNA.
- Cas12k binds the DNA in a guide RNA- and PAM-dependent manner, as observed previously in the structure of the Cas12k-sgRNA-target DNA complex (Querques et al., 2021; Xiao et al., 2021).
- TniQ makes direct contacts with the tracrRNA part of the sgRNA and two TnsC protomers, thereby bridging Cas12k and the TnsC filament without directly interacting with Cas12k.
- the TnsC filament is assembled with the TnsC C-terminal domains pointing away from Cas12k.
- the reconstruction contains additional proteinaceous density, which we were able to assign to a single copy of the Escherichia coli ribosomal S15 protein that was serendipitously co-purified with TniQ, as verified by mass-spectrometric analysis of the TniQ sample.
- S15 makes extensive contacts with both Cas12k and the tracrRNA part of the sgRNA (Figure 1C,D).
- crRNA-TS DNA hybridization beyond the 17th base pair is prevented by TniQ binding to the complete R-loop, leaving seven unpaired nucleotides at the 3’ end of the crRNA spacer sequence.
- This observation is consistent with previous studies showing that 3’-terminally truncated crRNAs comprising 17-nucleotide spacer segments supported type V CRISPR-associated transposon activity in vivo (Saito et al., 2021 ).
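The 17 + 7 split of the spacer can be made explicit with a short helper. This is a minimal sketch: `split_spacer` is a hypothetical name, and it assumes the spacer is written 5’ to 3’ with the PAM-proximal nucleotide first, so that the unpaired nucleotides sit at the 3’ end as described above.

```python
def split_spacer(spacer: str, paired_len: int = 17):
    """Split a crRNA spacer into its TS-hybridized part and its unpaired 3' tail.

    Assumption: `spacer` is given 5'->3', PAM-proximal end first.
    """
    if len(spacer) < paired_len:
        raise ValueError("spacer is shorter than the hybridized segment")
    return spacer[:paired_len], spacer[paired_len:]
```

For a 24-nt spacer the first element has 17 nucleotides and the second has 7. The first element also corresponds to the 3’-terminally truncated 17-nucleotide spacer format mentioned above.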
- the DNA adopts a bent conformation, with the PAM-distal DNA exiting Cas12k at a 122° angle relative to the PAM-proximal DNA duplex (Figure 2A).
- the backbone of the displaced NTS can be completely traced as it wraps around Cas12k, passing through a gap between the REC lobe and the RuvC domain ( Figure 2A).
- Completion of the R-loop occurs by TniQ interacting with the PAM-distal end of the sgRNA-TS heteroduplex, and is enabled by conformational rearrangements within Cas12k ( Figure 2A,B).
- the Cas12k bridge helix, which precludes full R-loop formation in the structure of the Cas12k-sgRNA-target DNA complex (Querques et al., 2021; Xiao et al., 2021), is repositioned to expose the binding cleft for the PAM-distal part of the sgRNA-TS DNA heteroduplex (Figure 10A,B). This is accompanied by structural ordering of the REC lobe motifs, whereby the REC1 domain (residues 12-239 of Cas12k) interacts with the unpaired NTS while the REC2 domain (residues 240-278 of Cas12k) contacts the extended crRNA-TS heteroduplex.
- TniQ recognizes tracrRNA and R-loop
- TniQ is confined by the tracrRNA rooftop loop and the PAM-distal end of the crRNA-TS DNA heteroduplex on one side and the TnsC filament on the other ( Figure 3A).
- the rooftop loop (nucleotides 167-171 of the tracrRNA), which is structurally disordered in the Cas12k-sgRNA-target DNA complex (Querques et al., 2021; Xiao et al., 2021), now assumes a well-defined pentaloop conformation whose shape is read out by hydrogen bonding contacts with side chains of Gln93, Arg98, Lys128, Lys132 and Gln137 of TniQ, in addition to a π-π stacking interaction between rA169 of the tracrRNA and Trp120 of TniQ (Figure 3B).
- the terminal base pair of the crRNA-TS heteroduplex is contacted by TniQ residue Asn59 at the minor groove edge and capped by a π-π stacking interaction with TniQ residue His57, which is in turn hydrogen bonded to TniQ residue His94 (Figure 3D).
- Mutations of TniQ residues directly interacting with the crRNA-TS DNA duplex substantially reduced transposition activity in vivo ( Figure 3E).
- these results confirm the critical role of the tracrRNA rooftop loop as a TniQ interaction site, and its significance for TnsC recruitment to support transposition activity of type V CRISPR-associated transposon.
- the observed interactions of TniQ with the PAM-distal end of the guide RNA-TS DNA heteroduplex suggest that R-loop completion is facilitated by TniQ recruitment.
- TniQ nucleates TnsC filament formation
- the single TniQ molecule in the Cas12k-transposon recruitment complex straddles two TnsC protomers at the Cas12k- proximal end of the TnsC filament ( Figure 4A).
- the C-terminal zinc finger domain (ZnF2) of TniQ contacts the terminal TnsC protomer (TnsC1 ), mostly via electrostatic interactions ( Figure 4B).
- the N-terminal HTH domain of TniQ interacts extensively with the next TnsC protomer (TnsC2) in the filament.
- the N-terminal tail of TniQ inserts into a cleft in the α/β AAA+ domain of TnsC2, with the aromatic side chain of Trp10 of TniQ sandwiched by hydrophobic interactions with Tyr115 and Pro86 of TnsC2 (Figure 4C).
- Glutamate substitution of the TnsC1-interacting residue Arg155 of TniQ reduced transposition activity in vivo by ~50%, suggesting that the interaction of TniQ with TnsC1 contributes to transposon recruitment.
- three copies of TniQ assemble at the end of the TnsC filament, each bridging a TnsC dimer within the terminal hexameric helical turn of the filament.
- the interactions between the TniQ and TnsC dimers are highly similar to the interactions observed in the Cas12k-transposon recruitment complex.
- binding of Cas12k-gRNA to a TnsC filament fully capped by TniQ would not be compatible due to steric clashes with two of the three TniQ copies.
- TnsC assembles at distal end of TniQ-bound R-loop
- the TS bends away from the crRNA-TS heteroduplex and exits through a narrow channel formed by the Cas12k RuvC domain and TniQ to immediately rehybridize with the NTS (Figure 5A).
- the first base pair of the reformed TS-NTS duplex stacks against the aromatic side chains of Tyr570 and Phe567 of Cas12k, while TS nucleotides at positions 19 and 20 make backbone interactions with TniQ by hydrogen bonding with Ser36 and Ser38 of TniQ (Figure 5B).
- The terminal TnsC protomer (TnsC1) interacts with NTS nucleotides at positions 22 and 23 via Thr121, Lys103 and Lys150 of TnsC1 (Figure 5C).
- The next TnsC protomer (TnsC2) interacts with the minor groove of the duplex, contacting both TS and NTS, while TnsC3 and subsequent protomers interact mostly with the NTS.
- the interactions of the TnsC filament with the DNA involve the same residues (Thr121, Lys103 and Lys150 of TnsC) as previously observed in the structures of dsDNA-bound TnsC filaments and validated by mutational analysis in vivo (Park et al., 2021; Querques et al., 2021).
- the PAM-distal DNA duplex is distorted from the canonical B-form geometry to match the helical symmetry of the TnsC filament.
- the TnsC filament in the Cas12k-TnsC recruitment complex tracks the DNA strand with the opposite polarity, i.e. the NTS, as compared with the TnsC-only filament (Figure 5D).
- Ribosomal S15 protein supports Cas12k-transposon complex assembly
- the E. coli ribosomal protein S15 (EcS15) captured in the Cas12k-transposon recruitment complex is wedged between the Cas12k REC2 domain and the tracrRNA connector duplex and contacts the ribose-phosphate backbone of the crRNA in the PAM-distal part of the crRNA-TS DNA heteroduplex (Figure 6A, Figure 11A-C).
- EcS15 adopts a four-helix bundle fold and interacts with the tracrRNA and the heteroduplex in a manner that mimics its interactions with 16S rRNA.
- EcS15 makes extensive shape-complementary interactions with the tracrRNA rooftop loop via electrostatic interactions mediated by Arg72, Lys73 and Arg77 of EcS15, and π-π stacking between Tyr69 of EcS15 and rA171 of the tracrRNA, suggesting that it stabilizes the tracrRNA rooftop loop in a conformation that supports TniQ recruitment.
- TniQ was re-purified according to a stringent protocol that minimized EcS15 contamination. Both EcS15 and ShS15 were efficiently co-precipitated by the Cas12k-sgRNA-DNA complex. TniQ co-precipitation was markedly enhanced in the presence of EcS15 or ShS15 and TnsC, suggesting that S15 proteins facilitate the cooperative assembly of TniQ with TnsC and Cas12k-sgRNA on target DNA.
- CRISPR-associated transposons relies on the concerted activities of the RNA-guided CRISPR effector and transposase modules. In type V CRISPR-associated transposons, this is thought to involve interactions of the AAA+ ATPase transposon regulator TnsC and the RNA-guided effector Cas12k at the target site but the mechanistic details have remained elusive thus far.
- Our structural analysis of the Cas12k-transposon recruitment complex shows that the tracrRNA component of the Cas12k guide RNA and TniQ play key roles in the process by bridging Cas12k and the ATP-dependent TnsC filament assembled on the target DNA.
- TniQ was previously shown to cap TnsC filaments and restrict their polymerization on free linear dsDNA in vitro (Park et al., 2021; Querques et al., 2021). Based on these observations, we proposed a mechanistic model in which we placed TniQ at the Cas12k-distal end of the TnsC filament, restricting its polymerization to the vicinity of Cas12k (Querques et al., 2021). An alternative model posited that TnsC polymerization initiates randomly on target DNA and is selectively stabilized by interactions with target site-bound Cas12k and TniQ (Park et al., 2021).
- TniQ is an integral component of the Cascade complex (Halpin-Healy et al., 2019; Jia et al., 2020; Li et al., 2020).
- TniQ does not form a stable complex on its own with Cas12k-sgRNA and is instead recruited cooperatively together with TnsC (and S15, as discussed below).
- TniQ may also interact with TnsC in a Cas12k-independent manner, which might lead to off-target transposon recruitment and thus explain why type V CRISPR-transposon systems appear to be less specific than type I systems and more prone to off-target transposon insertion (Strecker et al., 2019; Vo et al., 2021a).
- the previously characterized distance between the Cas12k target site and the transposon insertion site implies that formation of four complete helical turns of the TnsC filament is required for TnsB recruitment and transposon insertion.
- TnsC assembles into hexameric rings on dsDNA in the presence of ADP (Park et al., 2021 )
- TnsB-stimulated disassembly of TnsC filaments would result in a TnsC hexamer remaining bound to the Cas12k-TniQ R-loop complex, and the physical footprint of the resulting complex would explain the distance requirement for insertion site selection.
- modeling the ADP-bound TnsC hexamer onto the Cas12k-TnsC transposon recruitment complex suggests that the insertion site would be located approximately 40-46 bp from the edge of the TnsC hexamer.
- bacterial ribosomal protein S15 is an integral component of the type V CRISPR-transposon system, as it allosterically stimulates the assembly of TniQ and, indirectly, TnsC in the Cas12k-transposon recruitment complex, thereby enhancing RNA-guided transposition in vitro.
- the involvement of a host-encoded “housekeeping” factor in the activity of a CRISPR effector complex is so far unprecedented.
- Type V systems have not yet been demonstrated to support RNA-guided transposition in eukaryotic cells and it is conceivable that this is because the functional parts list of type V CRISPR-transposons has hitherto been incomplete. In sum, this work sheds light on a fundamental step in the biological mechanism of CRISPR-associated transposon systems and lays the mechanistic foundation for their development as next-generation genome engineering technologies.
- CAST CRISPR-associated transposon
- the components of the system can be delivered in several formats. These include delivery of ribonucleoprotein (RNP) complexes comprising recombinant CAST proteins and synthetic or in vitro transcribed guide RNAs, delivered into cells alongside a DNA vector encoding the cargo sequence to be inserted, flanked by left-end (LE) and right-end (RE) transposon DNA sequences.
- RNP ribonucleoprotein
- the protein and guide RNA components of the CAST system can be delivered and expressed in the form of in vitro transcribed mRNA.
- the components can be delivered in the form of DNA expression vectors, including DNA plasmids or viral vectors such as lentiviral, adenoviral or adenovirus-associated viral vectors.
- the protein-coding sequences of the individual components are codon-optimized.
- Figure 12 provides a non-exhaustive list of examples of expression constructs for the individual components of the type V CRISPR-transposon system.
- the expression of CAST protein-coding sequences is driven by a eukaryotic promoter, for example the human cytomegalovirus (CMV) promoter (Figure 12, constructs m1-m5 and p).
- CMV human cytomegalovirus
- Expression of the guide RNA is driven by an RNA polymerase III promoter such as the U6 promoter (Figure 12, r1 and r2).
- the RNA expression construct contains a 3’-terminal sequence encoding the hepatitis delta virus (HDV) self-cleaving ribozyme to facilitate transcriptional termination and proper processing of the transcribed RNA (Figure 12, r1).
- HDV hepatitis delta virus
- the individual protein components of the CAST system are delivered and expressed from monocistronic constructs (Figure 12, m1-m5).
- the components are expressed from a polycistronic DNA construct in which the protein-coding DNA sequences are linked with sequences encoding self-cleaving peptides such as the T2A peptide (Figure 12, p).
- the proteins are expressed as fusion proteins in which the native polypeptide sequence is fused to a eukaryotic nuclear localization signal (NLS) peptide.
- NLS nuclear localization signal
- the NLS is fused to the amino- (N-) terminus of the native polypeptide sequence.
- the NLS is fused to the carboxy- (C-) terminus.
- the donor DNA containing the sequence of interest to be inserted (e.g. coding sequences for therapeutic proteins, chimeric antigen receptors) is designed so that the insert sequence is flanked by appropriate left-end (LE) and right-end (RE) transposon DNA sequences recognized by the TnsB transposase of the CAST system (Figure 12, d).
- LE left-end
- RE right-end
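
The donor layout described above (cargo flanked by LE and RE transposon ends) can be sketched in a few lines. This is only an illustration: `LEFT_END` and `RIGHT_END` are hypothetical placeholder strings, not the actual transposon end sequences, and `build_donor` is a name introduced here.

```python
# Hypothetical placeholders; the real LE/RE sequences are NOT reproduced here.
LEFT_END = "ACGTACGTACGT"
RIGHT_END = "TTGCATTGCA"


def build_donor(cargo: str, left_end: str = LEFT_END, right_end: str = RIGHT_END) -> str:
    """Return a donor sequence laid out as LE + cargo + RE (5'->3')."""
    cargo = cargo.upper()
    unexpected = set(cargo) - set("ACGT")
    if unexpected:
        raise ValueError(f"cargo contains non-DNA characters: {sorted(unexpected)}")
    return left_end + cargo + right_end


def extract_cargo(donor: str, left_end: str = LEFT_END, right_end: str = RIGHT_END) -> str:
    """Recover the cargo from a donor, checking that both flanks are present."""
    if not (donor.startswith(left_end) and donor.endswith(right_end)):
        raise ValueError("donor is not flanked by the expected LE/RE sequences")
    return donor[len(left_end):len(donor) - len(right_end)]
```

A check such as `extract_cargo(build_donor(cargo)) == cargo.upper()` is a cheap way to confirm that a designed donor still carries both flanks after editing.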
- the ShCas12k gene was inserted into the 1B (Addgene 29653) and 1R (Addgene 29664) plasmids using ligation-independent cloning (LIC), resulting in constructs carrying an N-terminal hexahistidine tag followed by a tobacco etch virus (TEV) protease cleavage site and an N-terminal hexahistidine-StrepII tag followed by a TEV cleavage site, respectively.
- LIC ligation-independent cloning
- the ShTnsC gene was inserted into the 1S (Addgene 29659) plasmid to produce a construct carrying an N-terminal hexahistidine and hexahistidine-SUMO tag followed by a TEV cleavage site, and the ShTniQ gene was cloned into the 1C (Addgene 29659) vector, generating a construct carrying a hexahistidine-maltose binding protein (6xHis-MBP) tag followed by a TEV cleavage site.
- the EcS15 and ShS15 genes were inserted into the 1C vector.
- Point mutations were introduced by Gibson assembly using gBlock gene fragments synthesized by IDT or annealed oligonucleotides provided by Sigma as inserts.
- the pDonor (Addgene 127921), pHelper (Addgene 127924), and pTarget (Addgene 127926) plasmids used in droplet digital PCR experiments were sourced from Addgene.
- the PSP1-targeting spacer was cloned into pHelper by Gibson assembly, yielding pHelper-PSP1. Mutations in the sgRNA or in the sequence of the ShCas12k and ShTnsC genes were introduced into the pHelper plasmid by Gibson assembly.
- Plasmids were cloned and propagated in Mach1 cells (Thermo Fisher Scientific) with the exception of pHelper, which was grown in One Shot PIR1 cells (Thermo Fisher Scientific). Plasmids were purified using the GeneJET plasmid miniprep kit (Thermo Fisher Scientific) and verified by Sanger sequencing.
- Amino acid sequences and DNA sequences, optimized for eukaryotic expression of the recombinant bacterial proteins with nuclear localisation sequences and purification tags, are given in SEQ ID NO 001 to 012.
- Sequences for sgRNA and full polycistronic expression constructs are given in SEQ ID NO 013 to 020.
- For expression of ShCas12k constructs, hexahistidine-StrepII-tagged and hexahistidine-tagged ShCas12k proteins were expressed in E. coli BL21 Star (DE3) cells. Cell cultures were grown at 37 °C shaking at 100 rpm until reaching an OD600 of 0.6, and protein expression was induced with 0.4 mM IPTG (isopropyl-β-D-thiogalactopyranoside) and continued for 16 h at 18 °C.
- IPTG isopropyl-β-D-thiogalactopyranoside
- the column was washed with 100 mL of 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 5 mM imidazole before elution with 50 mL of 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 250 mM imidazole. Elution fractions were pooled and dialyzed overnight against 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM EDTA, 1 mM DTT in the presence of tobacco etch virus (TEV) protease.
- TEV tobacco etch virus
- Dialyzed proteins were loaded onto a 5 mL HiTrap Heparin HP column (GE Healthcare) and eluted with a linear gradient of 20 mM HEPES-KOH pH 7.5, 1 M KCl. Elution fractions were pooled, concentrated using 30,000 molecular weight cut-off centrifugal filters (Merck Millipore) and further purified by size exclusion chromatography using a Superdex 200 (16/600) column (GE Healthcare) in 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM DTT, yielding pure, monodisperse proteins. Purified proteins were concentrated to 10-15 mg mL⁻¹, flash frozen in liquid nitrogen and stored at -80 °C until further use.
- the cell suspension was lysed by ultrasonication and the lysate was cleared by centrifugation at 40,000 x g for 40 min. Cleared lysate was applied to a 5 mL Ni-NTA cartridge (Qiagen).
- the column was washed in two steps with lysis buffer supplemented with 25 mM and 100 mM imidazole, and bound proteins were eluted with 25 mL of the same buffer supplemented with 500 mM imidazole pH 7.5. Eluted proteins were dialysed overnight against 20 mM Tris-HCl pH 7.5, 250 mM NaCl, 5% glycerol, 1 mM DTT in the presence of TEV protease.
- the protein was further purified using a 5 mL HiTrap HP Heparin column (GE Healthcare) and eluted with a buffer containing 20 mM Tris-HCl pH 7.5, 700 mM NaCl, 5% glycerol and 1 mM DTT.
- the eluted fraction was concentrated and further purified by size exclusion chromatography using an S200 increase (10/300 GL) column (GE Healthcare) in 20 mM Tris-HCl pH 7.5, 500 mM NaCl, 1 mM DTT, yielding pure, monodisperse proteins.
- Purified ShTnsC was concentrated to 1-2 mg mL⁻¹ using 30,000 Da molecular weight cut-off centrifugal filters (Merck Millipore) and flash-frozen in liquid nitrogen.
- Eluted protein was dialysed overnight against 20 mM Tris-HCl pH 7.8, 500 mM NaCl, 1 mM DTT in the presence of TEV protease. Dialysed protein was passed through a 5 mL MBPTrap column (GE Healthcare). The flow-through fraction was concentrated and further purified by size exclusion chromatography using a Superdex 200 (16/600) column (GE Healthcare) in 20 mM Tris-HCl pH 7.8, 250 mM NaCl, 1 mM DTT. Purified ShTniQ was concentrated to 10 mg mL⁻¹ using 10,000 Da molecular weight cut-off centrifugal filters (Merck Millipore) and flash-frozen in liquid nitrogen.
- the protocol was adjusted as follows. Cells were harvested and resuspended in 20 mM Tris-HCl pH 7.5, 500 mM NaCl, 5% glycerol and 5 mM imidazole supplemented with EDTA-free protease inhibitor (Roche) and lysed by ultrasonication. The lysate was cleared by centrifugation at 40,000 x g for 30 min at 4 °C and applied to two 5 mL Ni-NTA cartridges connected in tandem.
- the column was washed with 150 mL of 20 mM Tris-HCl pH 7.5, 500 mM NaCl, 5 mM imidazole, 5% glycerol before elution with 50 mL of 20 mM Tris-HCl pH 7.5, 500 mM NaCl, 500 mM imidazole, 5% glycerol into two 5 mL MBP-trap cartridges (GE Healthcare) connected in tandem before removal of the Ni-NTA cartridges.
- the MBP-trap column was washed with 50 mL of 20 mM Tris-HCl pH 7.5, 500 mM NaCl, 500 mM imidazole, 5% glycerol before elution with 50 mL of 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 500 mM imidazole, 5% glycerol, 10 mM maltose. Elution fractions were pooled and dialyzed overnight against 20 mM HEPES-KOH pH 7.5, 100 mM KCl, 1 mM DTT in the presence of tobacco etch virus (TEV) protease.
- Dialyzed proteins were loaded onto two 5 mL HiTrap Heparin HP columns (GE Healthcare) connected in tandem, and eluted with a linear gradient of 20 mM HEPES-KOH pH 7.5, 1 M KCl. Elution fractions were pooled, concentrated using 3,000 molecular weight cut-off centrifugal filters (Merck Millipore) and further purified by size exclusion chromatography using a Superdex 75 (16/600) column (GE Healthcare) in 20 mM Tris-HCl pH 7.5, 250 mM NaCl, 1 mM DTT, yielding pure, monodisperse proteins.
- EcS15 and ShS15 hexahistidine-MBP-tagged proteins were expressed in E. coli BL21 Rosetta2 (DE3) cells. Cell cultures were grown at 37 °C shaking at 100 rpm until reaching an OD600 of 0.6, and protein expression was induced with 0.4 mM IPTG (isopropyl-β-D-thiogalactopyranoside) and continued for 16 h at 18 °C.
- the column was washed with 100 mL of 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 5 mM imidazole before elution with 50 mL of 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 250 mM imidazole. Elution fractions were pooled and dialyzed overnight against 20 mM HEPES-KOH pH 7.5, 150 mM KCl, 1 mM DTT in the presence of tobacco etch virus (TEV) protease.
- Dialyzed proteins were loaded onto a 5 mL HiTrap Heparin HP column (GE Healthcare) and eluted with a linear gradient of 20 mM HEPES-KOH pH 7.5, 1 M KCl. Elution fractions were pooled, concentrated using 3,000 molecular weight cut-off centrifugal filters (Merck Millipore) and further purified by size exclusion chromatography using a Superdex 200 (16/600) column (GE Healthcare) in 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM DTT, yielding pure, monodisperse proteins. Purified proteins were concentrated to 2-5 mg mL⁻¹, flash frozen in liquid nitrogen and stored at -80 °C until further use.
- Samples resulting from purification of TniQ following Protocol 2 were directly diluted in digestion buffer (10 mM Tris, 2 mM CaCl2, pH 8.2), reduced with 2 mM TCEP (tris(2-carboxyethyl)phosphine) and alkylated with 15 mM chloroacetamide at 30 °C for 30 min. Digestion was performed for all samples using the same procedure: 500 ng of Sequencing Grade Trypsin (Promega) were added for digestion carried out in a microwave instrument (Discover System, CEM) for 30 min at 5 W and 60 °C. The samples were dried to completeness and re-solubilized in 20 µL of MS sample buffer (3% acetonitrile, 0.1% formic acid).
- LC-MS/MS analysis was performed on a Q Exactive HF mass spectrometer (Thermo Scientific) equipped with a Digital PicoView source (New Objective) and coupled to an M-Class UPLC (Waters). Solvent composition at the two channels was 0.1% formic acid for channel A and 0.1% formic acid, 99.9% acetonitrile for channel B.
- 80 ng of peptides were loaded on a commercial ACQUITY UPLC M-Class Symmetry C18 Trap Column (100 Å, 5 µm, 180 µm × 20 mm, Waters) connected to an ACQUITY UPLC M-Class HSS T3 Column (100 Å, 1.8 µm, 75 µm × 250 mm, Waters).
- the peptides were eluted at a flow rate of 300 nL/min. After a 3 min initial hold at 5% B, a gradient from 5 to 35% B in 60 min and 35 to 40% B in an additional 5 min was applied. The column was cleaned after the run by increasing to 95% B and holding 95% B for 10 min prior to re-establishing loading conditions.
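
The elution programme above can be written as a piecewise function of time, which makes it easy to check the solvent composition at any point in the run. This is a minimal sketch that assumes each gradient segment is linear and that time zero is the start of the 5% B hold; the wash and re-equilibration steps are left out because their ramp times are not given. The function name `percent_b` is introduced here for illustration.

```python
def percent_b(t_min: float) -> float:
    """Percent solvent B at time t_min (minutes) during the analytical gradient.

    Segments, as stated in the text:
      0-3 min   hold at 5% B
      3-63 min  linear ramp 5% -> 35% B
      63-68 min linear ramp 35% -> 40% B
    """
    if t_min < 0 or t_min > 68:
        raise ValueError("time outside the 0-68 min analytical gradient")
    if t_min <= 3:
        return 5.0
    if t_min <= 63:
        return 5.0 + (t_min - 3) * (35.0 - 5.0) / 60.0
    return 35.0 + (t_min - 63) * (40.0 - 35.0) / 5.0
```

Under these assumptions the main ramp rises by 0.5% B per minute, so a peptide eluting 30 min into the ramp (33 min after injection) sees 20% B.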
- the mass spectrometer was operated in data-dependent mode (DDA), funnel RF level at 60 %, and heated capillary temperature at 275 °C.
- Full-scan MS spectra (350-1500 m/z) were acquired at a resolution of 120’000 for the sample from Protocol 2 and of 70’000 for the sample from Protocol 1 at 200 m/z after accumulation to a target value of 3’000’000, followed by HCD (higher-energy collisional dissociation) fragmentation on the twelve most intense signals per cycle.
- Ions were isolated with a 1.2 m/z isolation window and fragmented by higher-energy collisional dissociation (HCD) using a normalized collision energy of 28% for the sample from Protocol 2 and of 25% for the sample from Protocol 1.
- HCD higher-energy collisional dissociation
- HCD spectra were acquired at a resolution of 30’000 and a maximum injection time of 50 ms for the sample from Protocol 2 and at a resolution of 35’000 and a maximum injection time of 120 ms for the sample from Protocol 1.
- the automatic gain control (AGC) was set to 100’000 ions. Charge state screening was enabled and singly and unassigned charge states were rejected. Precursor masses previously selected for MS/MS measurement were excluded from further selection for 30 s, and the exclusion window tolerance was set at 10 ppm.
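
The dynamic exclusion rule described above (30 s exclusion, 10 ppm tolerance) can be illustrated with a small bookkeeping class. This is a sketch of the logic only, not the instrument firmware; `DynamicExclusion` and `ppm_error` are names introduced here, and treating the tolerance as a symmetric window around the stored precursor is an assumption.

```python
def ppm_error(mz_observed: float, mz_reference: float) -> float:
    """Absolute mass deviation in parts per million relative to mz_reference."""
    return abs(mz_observed - mz_reference) / mz_reference * 1e6


class DynamicExclusion:
    """Track precursors already fragmented and exclude them for a fixed time."""

    def __init__(self, duration_s: float = 30.0, tol_ppm: float = 10.0):
        self.duration_s = duration_s
        self.tol_ppm = tol_ppm
        self._entries = []  # list of (m/z, time of selection in seconds)

    def is_excluded(self, mz: float, t_s: float) -> bool:
        # Drop entries whose exclusion period has elapsed.
        self._entries = [(m, t0) for m, t0 in self._entries if t_s - t0 < self.duration_s]
        return any(ppm_error(mz, m) <= self.tol_ppm for m, _ in self._entries)

    def select(self, mz: float, t_s: float) -> bool:
        """Return True and record the precursor if it may be fragmented now."""
        if self.is_excluded(mz, t_s):
            return False
        self._entries.append((mz, t_s))
        return True
```

With these settings a precursor at m/z 500.004 is rejected 10 s after m/z 500.000 was selected (8 ppm apart), whereas m/z 500.010 (about 20 ppm apart) is accepted.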
- the samples were acquired using internal lock mass calibration on m/z 371.1010 and 445.1200.
- the mass spectrometry proteomics data were handled using the local laboratory information management system (LIMS).
- LIMS local laboratory information management system
- the acquired raw MS data were processed by MaxQuant (version 1.6.2.3), followed by protein identification using the integrated Andromeda search engine. Spectra were searched against a Uniprot E. coli proteome database (UP000000625, taxonomy 83333, version from 2021-07-12), concatenated to its reversed decoyed fasta database and the sequences of the proteins of interest. Acetyl (Protein N-term), oxidation (M) and deamidation (NQ) were set as variable modifications. Enzyme specificity was set to trypsin/P, allowing a minimal peptide length of 7 amino acids and a maximum of two missed cleavages. MaxQuant Orbitrap default search settings were used.
- FDR false discovery rate
- Peptide identifications were accepted if they achieved a false discovery rate (FDR) of less than 0.1% by the Scaffold Local FDR algorithm.
- Protein identifications were accepted if they achieved an FDR of less than 1.0% and contained at least 2 identified peptides.
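
The enzyme settings above (trypsin/P, minimum length 7, up to two missed cleavages) correspond to a simple in silico digestion rule. The sketch below illustrates that rule and is not MaxQuant's implementation; `tryptic_peptides` is a name introduced here, and the test sequences are toy strings rather than real proteins.

```python
def tryptic_peptides(protein: str, max_missed: int = 2, min_len: int = 7) -> list:
    """In silico trypsin/P digest: cleave C-terminal to K or R, also before P."""
    protein = protein.upper()
    # Cleavage positions (index just after each K/R), excluding the C-terminus.
    sites = [i + 1 for i, aa in enumerate(protein) if aa in "KR" and i + 1 < len(protein)]
    bounds = [0] + sites + [len(protein)]
    peptides = set()
    for i in range(len(bounds) - 1):
        # A peptide spanning bounds[i]..bounds[j] contains (j - i - 1) missed cleavages.
        last = min(i + 1 + max_missed, len(bounds) - 1)
        for j in range(i + 1, last + 1):
            peptide = protein[bounds[i]:bounds[j]]
            if len(peptide) >= min_len:
                peptides.add(peptide)
    return sorted(peptides)
```

The "/P" suffix is what allows the cut between K and P in a toy sequence such as `AAAAAAKPGGGGGGR`; strict trypsin rules would skip that site.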
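
The two acceptance criteria above can be combined into a single filter. This is a hedged sketch: the `ProteinHit` structure and its field names are hypothetical, FDR values are expressed as fractions, and counting only peptides that pass the peptide-level threshold towards the two-peptide minimum is an assumption about how the criteria interact.

```python
from dataclasses import dataclass
from typing import List

PEPTIDE_FDR_MAX = 0.001   # 0.1 %
PROTEIN_FDR_MAX = 0.01    # 1.0 %
MIN_PEPTIDES = 2


@dataclass
class ProteinHit:
    name: str
    protein_fdr: float        # fraction, e.g. 0.005 for 0.5 %
    peptide_fdrs: List[float]  # one value per matched peptide


def accepted_peptide_count(hit: ProteinHit) -> int:
    """Number of peptides that pass the peptide-level FDR threshold."""
    return sum(1 for fdr in hit.peptide_fdrs if fdr < PEPTIDE_FDR_MAX)


def is_accepted(hit: ProteinHit) -> bool:
    """Apply the protein-level FDR threshold and the minimum peptide count."""
    return hit.protein_fdr < PROTEIN_FDR_MAX and accepted_peptide_count(hit) >= MIN_PEPTIDES
```

A hit supported by a single confident peptide is rejected under this filter even if its protein-level FDR is very low.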
- DNA sequence encoding the T7 RNA polymerase promoter upstream of the ShCas12k-sgRNA was sourced as a gBlock (IDT), cloned into a pUC19 plasmid using restriction digest with BamHI and EcoRI, and confirmed by Sanger sequencing.
- the sequence encoding the T7 RNA polymerase promoter and sgRNA was amplified by PCR and purified by ethanol precipitation for use as template for in vitro transcription with T7 RNA polymerase as described previously.
- the transcribed RNA was gel purified, precipitated with 70 % (v/v) ethanol, dried and dissolved in nuclease-free water.
- the ShCas12k-transposon recruitment complex was generated by stepwise assembly on Strep-Tactin matrix as follows. First, the sgRNA was mixed with hexahistidine-StrepII-tagged ShCas12k (Strep-Cas12k) in assembly buffer (20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT) and incubated 20 min at 37 °C to allow formation of a binary Strep-Cas12k-sgRNA complex.
- assembly buffer: 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT
- a target dsDNA duplex was added to the reaction in a Strep-Cas12k:sgRNA:dsDNA molar ratio of 1:1.2:2 and incubated for 20 min at 37 °C, yielding an assembled Strep-Cas12k-sgRNA-target DNA complex.
- the final 30 µL reaction contained 20 µg (at a concentration of 8.4 µM, 0.7 mg/mL) Strep-Cas12k, 10.1 µM sgRNA, and 16.9 µM DNA in assembly buffer.
- the resulting sample was then mixed with 25 µL Strep-Tactin beads (iba) equilibrated in pull-down wash buffer 1 (20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT, 0.05% Tween20) and incubated 30 min at 4 °C on a rotating wheel.
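
The concentrations quoted for the assembly reaction follow from the stated 1:1.2:2 molar ratio and the protein mass and volume. The short sketch below reproduces that arithmetic; the function names are introduced here for illustration, and small differences from the reported values (for example 16.8 versus 16.9 µM DNA) presumably reflect rounding of the protein concentration.

```python
def partner_concentrations(protein_um: float, ratio: tuple) -> list:
    """Concentrations (µM) of the other components for a given molar ratio.

    ratio is (protein, component_1, component_2, ...), e.g. (1, 1.2, 2).
    """
    protein_part, *others = ratio
    return [protein_um * part / protein_part for part in others]


def mass_concentration_mg_per_ml(mass_ug: float, volume_ul: float) -> float:
    """µg per µL is numerically identical to mg per mL."""
    return mass_ug / volume_ul


sgrna_um, dna_um = partner_concentrations(8.4, (1, 1.2, 2))
protein_mg_per_ml = mass_concentration_mg_per_ml(20, 30)
```

Dividing the mass concentration by the molar concentration gives an implied molecular weight of roughly 79 kDa for the tagged protein, a back-calculated figure that is not stated in the text.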
- the beads were washed three times with pull-down wash buffer 1 to remove excess nucleic acids.
- the beads were resuspended in 250 µL of pull-down wash buffer 1 before ShTniQ (purified following Protocol
- the eluted sample was analyzed by SDS-PAGE using Any kDa gradient polyacrylamide gels (Bio-Rad) stained with Coomassie Brilliant Blue and by denaturing PAGE on a 10% polyacrylamide-7 M urea gel upon proteinase K digestion for 15 min at 37 °C prior to preparation of cryo-EM grids.
- recombinantly produced S15 was not added to the sample subjected to structural analysis.
- S15 was identified as a component of the Cas12k-transposon recruitment complex after structure determination and model building and was confirmed to have co-purified with TniQ using mass spectrometry as described below. Prior to structural analysis by cryo-EM, complex homogeneity was assessed by negative stain electron microscopy.
- Negative stain EM grids were prepared as described above using a sample that was 40 times diluted as compared to the one used for cryo-EM. For preparation of cryo-EM grids, 2.5 µL of sample was applied to glow-discharged 200-mesh copper
- cryo-EM grids Quantifoil Micro Tools
- blotted 3 s at 75% humidity, 4 °C, plunge frozen in liquid ethane (using a Vitrobot Mark IV plunger, FEI) and stored in liquid nitrogen.
- Cryo-EM data collection was performed on a FEI Titan Krios G3i microscope (University of Zurich, Switzerland) operated at 300 kV equipped with a Gatan K3 direct electron detector in super-resolution counting mode. A total of 16,165 movies were recorded at a calibrated magnification of 130,000 x, resulting in a super-resolution pixel size of 0.325 Å.
- Each movie comprised 36 subframes with a total dose of 67.68 e⁻ Å⁻².
- Data acquisition was performed with EPU Automated Data Acquisition Software for Single Particle Analysis (ThermoFisher) with three shots per hole at -1.0 µm to -2.4 µm defocus (0.2 µm steps).
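
A few derived acquisition parameters follow directly from the numbers above. The sketch below computes the dose per subframe and enumerates the nominal defocus targets; it assumes the dose is spread evenly over the subframes and that the physical pixel is twice the super-resolution pixel, neither of which is stated explicitly in the text.

```python
TOTAL_DOSE_E_PER_A2 = 67.68   # total dose per movie, e-/Å^2
N_SUBFRAMES = 36
SUPER_RES_PIXEL_A = 0.325     # Å per super-resolution pixel

# Assumption: dose is distributed evenly across subframes.
dose_per_frame = TOTAL_DOSE_E_PER_A2 / N_SUBFRAMES

# Assumption: super-resolution mode samples at half the physical pixel size.
physical_pixel_a = 2 * SUPER_RES_PIXEL_A


def defocus_series(start_um: float = -1.0, stop_um: float = -2.4, step_um: float = 0.2) -> list:
    """Nominal defocus targets from start to stop (inclusive), in µm."""
    n_steps = round((start_um - stop_um) / step_um)
    return [round(start_um - i * step_um, 1) for i in range(n_steps + 1)]


defocus_targets = defocus_series()
```

Under these assumptions each subframe receives 1.88 e⁻ Å⁻² and the defocus range covers eight nominal settings.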
- the structures of the Cas12k-sgRNA-target DNA (PDB: 7PLA) and TnsC-TniQ-DNA were docked in the new density and used as starting model to complete the Cas12k-transposon recruitment complex.
- the model building revealed a well resolved extra density between tracrRNA and Cas12k.
- a de-novo built template model resulting from this extra density was subjected to a DALI search and identified the prokaryotic ribosomal S15 protein as closely related.
- the E. coli ribosomal S15 protein was confirmed by mass spectrometry to have co-purified with TniQ and was built into the extra density with a good fit.
- the model was refined in Coot using restraints for the nucleic acids calculated with the LibG script (base pair, stacking plane and sugar pucker restraints) in ccp4 and finally refined using Phenix.
- Real space refinement was performed with the global minimization and atomic displacement parameter (ADP) refinement options selected. Secondary structure restraints, side chain rotamer restraints, and Ramachandran restraints were used. Key refinement statistics are listed in Table 2.
- the final atomic model includes Cas12k residues 1-142, 174-636, sgRNA nucleotides 5-250, 41 nucleotides of each TS and NTS DNA, TniQ residues 9-167, S15 residues 3-87, seven TnsC molecules (each with residues 17-276), two zinc and seven magnesium cations and seven ATP molecules.
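
The residue ranges listed above can be tallied to get a sense of the size of the model. This is a simple bookkeeping sketch; the dictionary layout and names are introduced here, and the ranges are treated as inclusive with no internal gaps beyond those stated.

```python
# Inclusive residue (or nucleotide) ranges as listed for the final model.
MODELLED_RANGES = {
    "Cas12k": [(1, 142), (174, 636)],
    "TniQ": [(9, 167)],
    "S15": [(3, 87)],
    "TnsC": [(17, 276)],
    "sgRNA": [(5, 250)],
}
COPIES = {"TnsC": 7}  # all other chains are present once


def count_positions(ranges) -> int:
    """Number of positions covered by a list of inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


per_chain = {name: count_positions(r) for name, r in MODELLED_RANGES.items()}
total_protein_residues = sum(
    per_chain[name] * COPIES.get(name, 1) for name in ("Cas12k", "TniQ", "S15", "TnsC")
)
```

By this count the seven TnsC protomers account for about two thirds of the modelled protein residues.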
- the quality of the atomic model including basic protein and DNA geometry, Ramachandran plots, clash analysis and model cross-validation, was assessed with MolProbity and the validation tools in Phenix. Structural superposition was performed in Coot using the secondary structure matching (SSM) function. Figure preparation for maps and models and calculation of map contour levels was performed using UCSF ChimeraX.
- pHelper plasmids (20 ng) were then introduced in a new transformation reaction by heat shock, and after recovering cells in fresh LB medium at 37 °C for 1 h, cells were plated on triple antibiotic LB-agar plates containing 100 µg mL⁻¹ carbenicillin, 50 µg mL⁻¹ kanamycin, and 33 µg mL⁻¹ chloramphenicol. After overnight growth at 37 °C for 16 h, colonies were harvested from the plates, resuspended in 15 µL lysis buffer (TE with 0.1% Triton X-100) and heated for 5 min at 95 °C. 60 µL of water were added to the samples before centrifugation for 10 min at 16,000 x g.
- ddPCR droplet digital PCR
- sgRNA was first mixed with hexahistidine-StrepII-tagged ShCas12k (Strep-Cas12k) in assembly buffer (20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT), and incubated 20 min at RT to allow complex formation.
- assembly buffer: 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT
- a dsDNA target was then added to the reaction in a Strep-Cas12k:sgRNA:dsDNA molar ratio of 1:1.2:1.5 and incubated for 20 min at RT.
- the final 20 µL reaction contained 5 µg (at a concentration of 3.2 µM) Strep-Cas12k, 3.8 µM sgRNA, and 4.7 µM DNA in assembly buffer.
- Samples were mixed with 12.5 µL Strep-Tactin beads (iba) equilibrated in pull-down wash buffer 1 (20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT, 0.05% Tween20) and incubated 30 min at 4 °C on a rotating wheel.
- the beads were washed three times with pull-down wash buffer 2 (20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT, 1 mM ATP, 0.05% Tween20) to remove excess nucleic acids.
- the beads were resuspended in 150 µL of pull-down wash buffer 2 and EcS15 and/or ShS15 and/or ShTniQ (purified according to Protocol 2 and thus S15-free) was added in 10-fold molar excess and/or ShTnsC was added in 12-fold molar excess.
Abstract
The invention relates to an engineered nucleic acid targeting system for insertion of a donor polynucleotide sequence into a target DNA strand, comprising: - one or more CRISPR-associated transposase proteins or functional fragments thereof; - a Cas protein; - a guide RNA molecule capable of complexing with the Cas protein and directing sequence specific binding of the guide-Cas protein complex to a target sequence of a target DNA strand; and - a ribosomal protein S15. Any of the protein components may be provided encoded by a nucleic acid sequence. A further aspect of the invention relates to a method for inserting a cargo DNA nucleic acid sequence into a target nucleic acid DNA sequence by the system of the invention. Another aspect of the invention relates to the use of a recombinant ribosomal protein S15, in a method for inserting a donor polynucleotide sequence into a target polynucleotide sequence, said method for inserting being facilitated by a CRISPR-associated transposase system.
Description
Ribosomal Protein S15 in CRISPR Transposon Mediated Sequence Engineering
This application claims the benefit of EP applications 22178728.6, filed 13 June 2022, and 22178728.6, filed 18 November 2022, both of which are incorporated herein by reference.
Field
The present invention relates to vectors and methods related to use of the RNA-programmable, CRISPR-associated Tn7-like transposition complex. The invention provides the bacterial ribosomal protein S15 as an integral part of this complex.
Background
Bacterial CRISPR-associated transposons leverage RNA-guided CRISPR machineries to target specific genomic sites and recruit a Tn7-like transposition complex, or transpososome, that mediates insertion of transposon DNA at a fixed distance downstream of the target site specified by the CRISPR machinery. Sequence-specific, RNA-guided targeting by the CRISPR-Cas machinery coupled with efficient transposase-mediated DNA integration makes these systems the first truly programmable, site-specific gene insertion machineries discovered to date. CRISPR-associated transposons hold high potential as site-specific DNA insertion vectors, but transplantation of these systems from the bacterial domain into higher order organisms for genome engineering applications has not been established to date.
The inventors previously investigated the RNA-guided DNA insertion pathway of type V CRISPR-associated transposons that relies on the pseudonuclease Cas12k, the ATPase TnsC, the transposase TnsB and the zinc-finger protein TniQ, identified the molecular function of each of these proteins biochemically, and analyzed the CRISPR-Cas12k machinery and the transposon components TnsC and TniQ at high resolution using structural biology methods (Querques et al. Target site selection and remodelling by type V CRISPR-transposon systems. Nature 599, 497-502 (2021). https://doi.org/10.1038/s41586-021-04030-z).
Based on the above-mentioned state of the art, the objective of the present invention is to provide means and methods to facilitate use of Tn7-like transposition in cells in which it has so far not been possible to use. This objective is attained by the subject-matter of the independent claims of the present specification, with further advantageous embodiments described in the dependent claims, examples, figures and general description of this specification.
Summary of the Invention
We have identified an unprecedented interaction between the E. coli ribosomal protein S15 and the RNA component of the CRISPR-Cas12k machinery and we have visualized by cryo-electron microscopy the assembly of a CRISPR-transposon machinery comprising Cas12k, TniQ, TnsC and, unexpectedly, the bacterial protein S15, indicating that S15 is a bona fide, functional component of the RNA-guided DNA insertion machinery. Based on these findings, our invention relies on the identification of the ribosomal protein S15 as a factor that is directly involved in the assembly and function of the CRISPR-transposon machinery and that thus needs to be provided together with the previously identified components to reconstitute efficient RNA-guided DNA insertion for technological applications. S15 is provided as a single protein or as a fusion protein together with the Cas12k effector to ensure stabilization of the RNA component of the CRISPR-Cas machinery by means of specific protein-RNA interactions that we have identified at high resolution.
A first aspect of the invention relates to an engineered nucleic acid targeting system for insertion of a donor polynucleotide sequence into a target DNA strand. The system comprises: one or more CRISPR-associated transposase proteins or functional fragments thereof, or a nucleic acid sequence encoding such transposase proteins or functional fragments thereof; a Cas protein, or a nucleic acid sequence encoding such Cas protein; a guide RNA molecule capable of complexing with the Cas protein and directing sequence specific binding of the guide-Cas protein complex to a target sequence of a target DNA strand; and a ribosomal protein S15, or a nucleic acid sequence encoding such ribosomal protein S15.
A further aspect of the invention relates to a method for inserting a cargo DNA nucleic acid sequence into a target nucleic acid DNA sequence. The method comprises the step of contacting the target nucleic acid sequence with an engineered nucleic acid targeting system according to the invention. The minimal set of functionalities necessary to be provided consists of the Cas protein, particularly Cas12k, the set of CRISPR-associated transposase proteins or functional fragments thereof, consisting of the group of TnsB, TnsC, and TniQ, and the ribosomal protein S15, in addition to the cargo protein.
Another aspect of the invention relates to the use of a recombinant ribosomal protein S15, or of a nucleic acid sequence encoding said recombinant ribosomal protein S15, in a method for inserting a donor polynucleotide sequence into a target polynucleotide sequence, said method for inserting being facilitated by a CRISPR-associated transposase system.
Terms and definitions
For purposes of interpreting this specification, the following definitions will apply and whenever appropriate, terms used in the singular will also include the plural and vice versa. In the event that any definition set forth below conflicts with any document incorporated herein by reference, the definition set forth shall control.
The terms “comprising”, “having”, “containing”, and “including”, and other similar forms, and grammatical equivalents thereof, as used herein, are intended to be equivalent in meaning and to be open-ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. For example, an article “comprising” components A, B, and C can consist of (i.e., contain only) components A, B, and C, or can contain not only components A, B, and C but also one or more other components. As such, it is intended and understood that “comprises” and similar forms thereof, and grammatical equivalents thereof, include disclosure of embodiments of “consisting essentially of” or “consisting of.”
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit, unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure.
Reference to “about” a value or parameter herein includes (and describes) variations that are directed to that value or parameter per se. For example, description referring to “about X” includes description of “X.”
As used herein, including in the appended claims, the singular forms “a”, “or” and “the” include plural referents unless the context clearly dictates otherwise.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art (e.g., in cell culture, molecular genetics, nucleic acid chemistry, hybridization techniques and biochemistry). Standard techniques are used for molecular, genetic, and biochemical methods (see generally, Sambrook et al., Molecular Cloning: A Laboratory Manual, 4th ed. (2012) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. and Ausubel et al., Short Protocols in Molecular Biology (2002) 5th Ed, John Wiley & Sons, Inc.) and chemical methods.
General Biochemistry: Peptides, Amino Acid Sequences
The term polypeptide in the context of the present specification relates to a molecule consisting of 50 or more amino acids that form a linear chain wherein the amino acids are connected by peptide bonds. The amino acid sequence of a polypeptide may represent the amino acid sequence of a whole (as found physiologically) protein or fragments thereof. The terms "polypeptide" and "protein" are used interchangeably herein and include proteins and fragments thereof. Polypeptides are disclosed herein as amino acid residue sequences.
The term peptide in the context of the present specification relates to a molecule consisting of up to 50 amino acids, in particular 8 to 30 amino acids, more particularly 8 to 15 amino acids, that form a linear chain wherein the amino acids are connected by peptide bonds.
Amino acid residue sequences are given from amino to carboxyl terminus. Capital letters for sequence positions refer to L-amino acids in the one-letter code (Stryer, Biochemistry, 3rd ed. p. 21). Lower case letters for amino acid sequence positions refer to the corresponding D- or (2R)-amino acids. Sequences are written left to right in the direction from the amino to the carboxy terminus. In accordance with standard nomenclature, amino acid residue sequences are denominated by either a three letter or a single letter code as indicated as follows: Alanine (Ala, A), Arginine (Arg, R), Asparagine (Asn, N), Aspartic Acid (Asp, D), Cysteine (Cys, C), Glutamine (Gln, Q), Glutamic Acid (Glu, E), Glycine (Gly, G), Histidine (His, H), Isoleucine (Ile, I), Leucine (Leu, L), Lysine (Lys, K), Methionine (Met, M), Phenylalanine (Phe, F), Proline (Pro, P), Serine (Ser, S), Threonine (Thr, T), Tryptophan (Trp, W), Tyrosine (Tyr, Y), and Valine (Val, V).
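The nomenclature above can be expressed as a simple lookup table. The following is a minimal Python sketch, not part of the specification, that renders a one-letter sequence in three-letter code; the function name `to_three_letter` and the "D-" prefix used for lower-case letters are illustrative assumptions.

```python
# One-letter to three-letter amino acid codes, as listed in the text above.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(sequence: str) -> str:
    """Render a one-letter sequence (amino to carboxyl terminus) in three-letter code.

    Upper-case letters denote L-amino acids. Lower-case letters are rendered
    with a "D-" prefix (an illustrative choice for marking D-amino acids).
    """
    residues = []
    for letter in sequence:
        code = THREE_LETTER.get(letter.upper())
        if code is None:
            raise ValueError(f"Unknown amino acid letter: {letter!r}")
        residues.append(code if letter.isupper() else f"D-{code}")
    return "-".join(residues)
```

For example, `to_three_letter("GS")` yields `Gly-Ser`.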
The term “engineered” when used to characterize proteins or nucleic acids in the context of the present specification relates to components of the Tn7 system that are changed by sequence, or by context of expression (a modified or unmodified protein being expressed in an organism where it does not occur naturally) in comparison to the natural origin of the protein or nucleic acid thus characterized. In certain embodiments, an engineered system may be comprised of different components that are each unchanged from their natural sequence, but do not occur together.
The term PAM in the context of the present specification relates to protospacer adjacent motif.
The term NTS in the context of the present specification relates to non-target strand, the strand that has the same sequence as the guide crRNA, as opposed to the TS (target strand), which is base-paired to the guide RNA.
In the context of the present specification, the term amino acid linker or peptide linker refers to a polypeptide of variable length that is used to connect two polypeptides in order to generate a single chain polypeptide. Exemplary embodiments of linkers useful for practicing the invention specified herein are oligopeptide chains consisting of 1, 2, 3, 4, 5, 10, 20, 30, 40 or 50 amino acids.
There is no constraint on the amino acid composition of the linker. In certain embodiments, the linker consists of amino acids selected from the group of G, S, A and D. Important characteristics of the conjugate peptide linkers as specified above are low immunogenicity, and a peptide length that allows the domains which are joined by the linker to interact to form a functional entity as disclosed herein. In particularly desirable embodiments of the domain peptide linkers specified above, the sequences are primarily made up of stretches of small, polar amino acids such as glycine (G) and serine (S).
In certain embodiments, the peptide linker is >15 amino acids in length, particularly 15 to 30 amino acids in length, wherein the amino acids are selected from G, S, A and D.
A non-limiting example of an amino acid linker is a monomer or di-, tri- or tetramer of a peptide motif composed of three or four glycine and one serine.
Any embodiments relating to peptide linkers as disclosed herein encompass structures in which amino acids with similar characteristics are exchanged, for example, the amino acids V, L, I, P, S, C, or M may replace G, S, or S, and D may be replaced by E.
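The repeat-motif linker described above (a monomer or di-, tri- or tetramer of a motif of three or four glycines and one serine) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and placing the serine after the glycines within each motif is an assumption, since the text does not fix the order.

```python
def build_linker(glycines: int = 4, repeats: int = 3) -> str:
    """Build a linker as `repeats` copies of a (Gly)n-Ser motif.

    Assumption: the motif is written as glycines followed by one serine.
    The text allows three or four glycines and one to four repeats.
    """
    if glycines not in (3, 4):
        raise ValueError("motif uses three or four glycines")
    if repeats not in (1, 2, 3, 4):
        raise ValueError("motif is repeated one to four times")
    return ("G" * glycines + "S") * repeats


def uses_allowed_residues(linker: str, allowed: str = "GSAD") -> bool:
    """Check that a non-empty linker contains only residues from `allowed`."""
    return len(linker) > 0 and all(residue in allowed for residue in linker)
```

With these defaults, `build_linker()` gives a 15-residue linker, which falls at the lower bound of the 15 to 30 amino acid range mentioned above.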
Sequences
Sequences similar or homologous (e.g., at least about 70% sequence identity) to the sequences disclosed herein are also part of the invention. In some embodiments, the sequence identity at the amino acid level can be about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher. At the nucleic acid level, the sequence identity can be about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher. Alternatively, substantial identity exists when the nucleic acid segments will hybridize under selective hybridization conditions (e.g., very high stringency hybridization conditions), to the complement of the strand. The nucleic acids may be present in whole cells, in a cell lysate, or in a partially purified or substantially pure form.
In the context of the present specification, the terms sequence identity and percentage of sequence identity refer to a single quantitative parameter representing the result of a sequence comparison determined by comparing two aligned sequences position by position. Methods for alignment of sequences for comparison are well-known in the art. Alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the global alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Nat. Acad. Sci. 85:2444 (1988) or by computerized implementations of these algorithms, including, but not limited to: CLUSTAL, GAP, BESTFIT, BLAST, FASTA and TFASTA. Software for performing BLAST analyses is publicly available, e.g., through the National Center for Biotechnology Information (http://blast.ncbi.nlm.nih.gov/).
One example for comparison of amino acid sequences is the BLASTP algorithm that uses the default settings: Expect threshold: 10; Word size: 3; Max matches in a query range: 0; Matrix: BLOSUM62; Gap Costs: Existence 11, Extension 1; Compositional adjustments: Conditional compositional score matrix adjustment. One such example for comparison of nucleic acid sequences is the BLASTN algorithm that uses the default settings: Expect threshold: 10; Word size: 28; Max matches in a query range: 0; Match/Mismatch Scores: 1, -2; Gap costs: Linear. Unless stated otherwise, sequence identity values provided herein refer to the value obtained using the BLAST suite of programs (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) using the above identified default parameters for protein and nucleic acid comparison, respectively.
Reference to identical sequences without specification of a percentage value implies 100% identical sequences (i.e. the same sequence).
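To illustrate the position-by-position comparison described above, the following minimal Python sketch computes a percentage identity from two sequences that have already been aligned. It is not a substitute for BLAST and does not perform the alignment itself. The function name, the use of "-" as the gap character, and the choice of the full alignment length (gap columns included) as denominator are assumptions; other conventions for the denominator exist.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue.

    Both inputs must be pre-aligned and of equal length, with "-" marking gaps.
    Assumption: the denominator is the alignment length, gap columns included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)
```

Two identical aligned sequences return 100.0, consistent with the statement that identical sequences are 100% identical.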
The term having substantially the same biological activity in the context of the present invention relates to the function of a ribosomal S15 protein in reconstituting CRISPR-associated Tn7-like transposon activity in a system containing activity of Cas12k, TnsB, TnsC, and TniQ as well as an appropriate guide RNA, but no native S15 activity.
General Molecular Biology: Nucleic Acid Sequences, Expression
The term gene refers to a polynucleotide containing at least one open reading frame (ORF) that is capable of encoding a particular polypeptide or protein after being transcribed and translated. A polynucleotide sequence can be used to identify larger fragments or full-length coding sequences of the gene with which they are associated. Methods of isolating larger fragment sequences are known to those of skill in the art.
The term transgene in the context of the present specification relates to a gene or genetic material that has been transferred from one organism to another. In the present context, the term may also refer to transfer of the natural or physiologically intact variant of a genetic sequence into tissue of a patient where it is missing. It may further refer to transfer of a natural encoded sequence the expression of which is driven by a promoter absent or silenced in the targeted tissue.
The term recombinant in the context of the present specification relates to a nucleic acid, which is the product of one or several steps of cloning, restriction and/or ligation and which is different from the naturally occurring nucleic acid. A recombinant virus particle comprises a recombinant nucleic acid.
The terms gene expression or expression, or alternatively the term gene product, may refer to either of, or both of, the processes - and products thereof - of generation of nucleic acids (RNA) or the generation of a peptide or polypeptide, also referred to as transcription and translation, respectively, or any of the intermediate processes that regulate the processing of genetic information to yield polypeptide products. The term gene expression may also be applied to the transcription and processing of an RNA gene product, for example a regulatory RNA or a structural (e.g. ribosomal) RNA. If an expressed polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell. Expression may be assayed both on the level of transcription and translation, in other words mRNA and/or protein product.
Detailed Description of the Invention
A first aspect of the invention relates to an engineered nucleic acid targeting system for insertion of a donor polynucleotide sequence into a target DNA strand. The system comprises: one or more CRISPR-associated transposase proteins or functional fragments thereof, or a nucleic acid sequence encoding such transposase proteins or functional fragments thereof; a Cas protein, or a nucleic acid sequence encoding such Cas protein; a guide RNA molecule capable of complexing with the Cas protein and directing sequence specific binding of the guide-Cas protein complex to a target sequence of a target DNA strand; and
a ribosomal protein S15, or a nucleic acid sequence encoding such ribosomal protein S15.
The Cas protein
In certain embodiments, the Cas protein is a Type V Cas protein. In general terms, the invention relates to improvements of Type V CRISPR related transposase systems that enable a copy-paste-type insertion of “cargo” DNA sequences into a target sequence.
In certain embodiments, the Type V Cas protein is a Type V-K Cas protein.
In certain embodiments, the Type V-K Cas protein is Cas12.
In certain particular embodiments, the Cas protein is Cas12k.
One non-limiting example of a Cas12k protein that is useful in practicing the invention is the Scytonema hofmanni protein, Cas12k (WP_029636312.1).
Cas12k mediates RNA-guided DNA integration into the target DNA site. Both strands of the DNA will contain the transposon DNA in ligated/integrated form. The inventors predict that the use of ribosomal protein S15 is not limited to Cas12k alone but may be of use to facilitate use of any system in which the Cas12k tracrRNA plays a role. The interaction of S15 is mediated by the tracrRNA. As far as the inventors have been able to determine, there is no specific sequence that mediates this interaction, as the interaction also depends on the specific arrangement in space of the tracrRNA, and the interactions are with the RNA backbone, not with specific nucleotides. The guide RNA molecule shown in the examples is a single guide comprising the tracrRNA sequence mediating interaction with the Cas12k protein, and the sequence-specific target interacting RNA (CRISPR RNA, crRNA). It is, however, also possible to use a dual system comprising a tracrRNA (trans-activating RNA) and a crRNA (CRISPR RNA).
In certain embodiments, the Cas protein lacks nuclease activity. The Cas protein’s activity lies in binding the DNA but not cleaving it, as the Cas protein used in the examples, Cas12k, is naturally mutated in residues involved in DNA cleavage. It is only the DNA binding activity that is required for transposon insertion, and not the nuclease activity. Integration is mediated by the transposon components, TnsB in the case of the Tn7-like machinery employed in the examples.
Any of the components mentioned herein may be expressed from a nucleic acid expression vector as set forth herein. The coding sequence may include a nuclear localization sequence peptide in applications that involve expression of the protein component of the system in a eukaryotic cell.
The transposase protein complex
In certain embodiments, the transposase protein complex is composed of CRISPR-associated transposase proteins.
In particular embodiments, the transposase protein complex is a Tn7 transposase complex.
In certain embodiments, the one or more (CRISPR-associated) transposase proteins or functional fragments thereof may be an engineered transposase system configured for replicative (copy-and-paste) transposition.
In certain particular embodiments, the transposase proteins consist of a Tn7-like transposase system.
In certain particular embodiments, the transposase proteins consist of the group composed of TnsB, TnsC, and TniQ.
TnsB is a transposase that cleaves the 3’ termini of the transposon ends and performs the DNA ligation steps required for transposon integration. One non-limiting example is the Scytonema hofmanni protein, TnsB (WP_084763316.1).
TnsC is an AAA+ ATPase involved in target DNA recognition and transposase activation. One non-limiting example is the Scytonema hofmanni protein, TnsC (WP_029636336.1).
TniQ is a zinc-finger protein that regulates transposition by interaction with the other components. One non-limiting example is the Scytonema hofmanni protein, TniQ (WP_029636334.1).
In certain embodiments, the ribosomal protein S15 is ribosomal protein S15 from E. coli (AP009048.1) or ribosomal protein S15 from S. hofmanni (WP_029633173.1).
The inventors have successfully employed ribosomal proteins S15 from E. coli and the original organism Scytonema hofmanni (WP_029633173.1), but expect that other prokaryote-derived S15 proteins, as well as mitochondrial S15 proteins, will also be functional.
The inventors predict that the S15 complementation will work with any CRISPR-Cas transposon system in which the tracrRNA of the system has an architecture that is structurally conserved with respect to the one observed for the Cas12k-associated tracrRNA, which is characterized by interaction of S15 with the specific 3D arrangement of the tracrRNA and RNA-DNA duplex.
Mention of ribosomal protein S15 herein refers to prokaryotic S15, in the strictest sense to the E. coli or S. hofmanni ribosomal protein S15.
The ribosomal protein S15 may be present as a recombinant S15 protein, or a vector from which the S15 may be expressed. The recombinant S15 protein may only consist of the S15 sequence found in nature, or a polypeptide sequence having the biological activity (measured as the ability to complement Tn7 transposase activity in a system as shown in the examples of the present specification), and characterized by at least 50% identity in comparison to S15 from S. hofmanni (WP_029633173.1).
In certain embodiments, the recombinant S15 protein is characterized by > 60% identity in comparison to S15 from S. hofmanni. In certain particular embodiments, the recombinant S15 protein is characterized by > 70% identity in comparison to S15 from S. hofmanni. In certain more particular embodiments, the recombinant S15 protein is characterized by > 80% identity in comparison to S15 from S. hofmanni.
In certain even more particular embodiments, the recombinant S15 protein is characterized by > 85% identity in comparison to S15 from S. hofmanni. In certain yet even more particular embodiments, the recombinant S15 protein is characterized by > 95% identity in comparison to S15 from S. hofmanni.
An S15 polypeptide as used in the invention may also be present as a fusion protein sequence, joined to the Cas protein or one of the transposase proteins present in the system.
A non-exhaustive list of S15-like proteins from prokaryotes, which the inventors predict will be useful in practicing the invention, is included in Table 1.
The system is used for inserting a donor polynucleotide sequence into a target sequence. The target sequence will determine the guide RNA molecule sequence, as the guide RNA hybridizes to a sequence adjacent to the target.
The minimal set of components of the system according to the invention are, as CRISPR components, one Cas protein (particularly Cas12k) and a guide RNA (either fused sgRNA or a dual tracrRNA + crRNA guide). In addition, the S15 protein must be present; it can either be fused, for example to Cas12k (to make an engineered Cas12k-S15 fusion protein) or be provided in trans. The minimal transposon components are Tn7-like proteins TniQ, TnsC and TnsB.
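The minimal component set listed above lends itself to a simple completeness check, for example when assembling expression constructs. The following Python sketch is purely illustrative; the component labels and the function name `missing_components` are hypothetical conventions chosen for this example.

```python
# Minimal components named in the text: Cas12k, a guide RNA, S15,
# and the Tn7-like proteins TniQ, TnsC and TnsB.
REQUIRED_COMPONENTS = frozenset(
    {"Cas12k", "guide RNA", "S15", "TniQ", "TnsC", "TnsB"}
)


def missing_components(provided) -> list:
    """Return the required components absent from `provided`, sorted by name.

    A fused protein (e.g. a Cas12k-S15 fusion) should be listed under both
    of its constituent labels by the caller.
    """
    return sorted(REQUIRED_COMPONENTS - set(provided))
```

A system lacking S15, as in the situation the invention addresses, would be reported as `["S15"]`.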
Any of the components mentioned herein may be expressed from a nucleic acid expression vector as exemplarily set forth herein. The coding sequence may include a nuclear localization sequence peptide in applications that involve expression of the protein component of the system in a eukaryotic cell.
The inserted polynucleotide
In certain embodiments, the engineered nucleic acid targeting system according to the invention further comprises a donor polynucleotide (DNA) sequence that is to be integrated. The donor polynucleotide (DNA) sequence comprises a recognition site for the recombinase and a cargo nucleic acid sequence flanked by at least one transposon end sequence.
In certain embodiments, the cargo nucleic acid sequence is flanked by a right end sequence element and a left end sequence element.
The donor polynucleotide or donor sequence consists of a “cargo” sequence, in other words the sequence to be inserted net of sequence elements that are present in order to facilitate the insertion. The cargo sequence is flanked by the left and right transposon ends. In type V systems, all the donor (including the backbone DNA) is expected to be integrated. In type I systems (or in an engineered version of the type V system), only the cargo and the terminal transposon ends are integrated.
In other words, the donor is a piece of DNA (linear or circular) that contains transposon left/right end sequences and the cargo to be inserted in between.
In particular embodiments, the cargo nucleic acid sequence can range in size from 100 bases to 30 kb in length of double stranded DNA.
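The donor layout described above (transposon left end, cargo, transposon right end) and the cargo size range can be captured in a small data structure. The following Python sketch is an illustration under stated assumptions: the class and field names are hypothetical, "30 kb" is taken as 30,000 bases, and no actual transposon end sequences are implied.

```python
from dataclasses import dataclass

MIN_CARGO_LENGTH = 100      # bases, lower bound stated in the text
MAX_CARGO_LENGTH = 30_000   # bases, assuming 30 kb = 30,000 bases


@dataclass(frozen=True)
class Donor:
    """A donor polynucleotide: cargo flanked by left and right transposon ends."""
    left_end: str
    cargo: str
    right_end: str

    def cargo_in_range(self) -> bool:
        """True if the cargo length lies within the stated size range."""
        return MIN_CARGO_LENGTH <= len(self.cargo) <= MAX_CARGO_LENGTH

    def sequence(self) -> str:
        """Left end, cargo and right end joined in order."""
        return self.left_end + self.cargo + self.right_end
```

In use, the end sequences would be supplied for the specific transposase family employed, since the structural requirements of the termini differ between family members.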
The specific structural requirements of the terminal sequences differ for different transposase family members that may be employed in the course of practicing the invention as laid out herein. But the designations “right end sequence element” and “left end sequence element” are known in the art to generally refer to all the terminal sequences of any transposons, in particular also of Tn7 transposons. The right end sequences / left end sequences are the termini of the transposon (they mark the boundaries of the transposon and are part of the transposon itself).
Certain commercial applications of the invention as laid out herein may provide the protein components of the nucleic acid targeting system including the Cas protein, transposase proteins and S15 ribosomal polypeptide, either as isolated proteins (single polypeptides, or certain components being fused to each other), or as vectors, supplied by a commercial provider. The polynucleotide sequences (the cargo / donor sequence to be integrated into a target sequence, and the RNA component comprising the tracrRNA interacting with the Cas protein and the S15 protein) may be provided by a second party, or the user of the system.
A further aspect of the invention relates to a method for inserting a cargo DNA sequence into a target DNA sequence. The method comprises the step of contacting the target nucleic acid sequence with an engineered nucleic acid targeting system according to the invention. The minimal set of functionalities necessary to be provided consists of the Cas protein, particularly Cas12k, and guide RNA, the set of CRISPR-associated transposase proteins or functional fragments thereof, consisting of the group of TnsB, TnsC, and TniQ, and the ribosomal protein S15, in addition to the cargo nucleic acid sequence. Any one of the protein components of the set of minimal functionalities, or any combination thereof, including all protein components, may be provided as nucleic acid encoded components.
In the course of practicing the method according to the invention, the cargo polynucleotide is inserted at a position between 40 and 100 bases 3’-terminal of a PAM sequence in the target polynucleotide. The cargo is inserted at a position between 40 and 100 bases downstream of a protospacer adjacent motif (PAM) sequence in the target polynucleotide.
The PAM sequence is on the 5’ side of the target site.
The PAM is located on the non-target strand (NTS) on the 5’ side of the target; the transposon insertion site is located on the 3’ side.
In the terminology used in context of describing type V CRISPR Cas systems, the PAM (5’-3’) is the sequence of the NTS just upstream of the crRNA:TS_DNA duplex. This is the point of orientation. In case of the system used herein, the PAM sequence is always on the 5’ side of the target.
In certain embodiments, the PAM comprises the sequence NGTN. In certain particular embodiments, the PAM is RGTR, VGTD, or VGTR. N: any nucleotide; G: guanine; T: thymine; R: purine (A/G); V: not T (A/G/C); D: not C (A/G/T).
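The degenerate PAM motifs above can be searched for with a regular expression built from the nucleotide codes defined in the text. The following Python sketch is illustrative only: the function names are hypothetical, positions are 0-based indices on the non-target strand read 5’ to 3’, and measuring the 40 to 100 base insertion window from the 3’ end of the PAM is an assumption, since the text does not state the exact reference point.

```python
import re
from typing import List, Tuple

# Degenerate nucleotide codes as defined in the text above.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",  # any nucleotide
    "R": "[AG]",    # purine
    "V": "[ACG]",   # not T
    "D": "[AGT]",   # not C
}


def find_pams(nts: str, pattern: str = "NGTN") -> List[int]:
    """Return 0-based start positions of all PAM matches on the non-target strand.

    A lookahead is used so that overlapping matches are all reported.
    """
    regex = re.compile("(?=" + "".join(CODES[c] for c in pattern) + ")")
    return [m.start() for m in regex.finditer(nts.upper())]


def insertion_window(pam_start: int, pam_length: int = 4,
                     near: int = 40, far: int = 100) -> Tuple[int, int]:
    """Return (first, last) 0-based positions of the expected insertion window.

    Assumption: distances are counted from the first base 3' of the PAM.
    """
    pam_end = pam_start + pam_length
    return pam_end + near, pam_end + far
```

A candidate target can then be screened by calling `find_pams` with the more restrictive motifs (`"RGTR"`, `"VGTD"`, `"VGTR"`) and checking whether the desired insertion site falls inside the window returned for a given PAM.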
In certain embodiments, the method of the invention is directed at inserting a cargo nucleic acid sequence into a target nucleic acid sequence inside a cell, and comprises contacting the target nucleic acid sequence inside the cell, particularly inside a eukaryotic cell, with an engineered nucleic acid targeting system according to the invention.
The method is applied to a cell that does not express ribosomal protein S15, or a homologue or orthologue of S15, prior to the cell having been contacted with the engineered nucleic acid targeting system. One key aspect of the invention is that it supplies a key functionality that is a constitutively expressed protein in E. coli and thereby went unrecognized as a key component of the Type V transposase complex prior to the present invention.
In certain embodiments, the cell is a eukaryotic cell.
In certain embodiments, the cell is a mammalian cell. In certain particular embodiments, the cell is a primate cell. In certain more particular embodiments, the cell is a human cell.
In certain embodiments, the cell is a stem cell. In certain particular embodiments, the method according to the invention in any combination of the particular embodiments of its components mentioned herein, is practiced ex-vivo.
In certain embodiments, the engineered nucleic acid targeting system is delivered into the cell by one or more expression vectors.
An expression vector in the broadest sense is a polynucleotide sequence encoding one or more of the components of the engineered nucleic acid targeting system according to the invention. Each coding sequence is understood to be under control of a promoter operable in the target cell. The promoters may be inducible or constitutive. Protein expression in eukaryotes is usually driven by RNA polymerase II promoters, generating mRNA having the appropriate 5’ cap and 3’ poly-A tails. Expression of RNA components may also be driven from other RNA polymerases, such as RNA polymerase III.
Expression vectors include “naked” (closed circular plasmid or linear) DNA vectors that may be delivered enclosed in liposomes, or associated to particles. Expression vectors also include RNA molecules, from which ribosomes may translate the protein components directly. RNA can be delivered by liposomes. The extraordinary success of RNA vector-mediated SARS-CoV-2 vaccination has highlighted the potential of this technology.
Alternatively, the components of the engineered nucleic acid targeting system according to the invention may be delivered by viral vectors. DNA viruses, as well as positive-strand, negative-strand and double-stranded RNA viruses, have all been employed in experimental therapeutic gene transfer approaches.
In certain embodiments, the one or more expression vectors are selected from the group consisting of viral vectors, DNA vectors and RNA vectors.
In certain embodiments, a viral vector is used, selected from the group consisting of an adeno-associated virus, an adenovirus, a herpesvirus, and a lentivirus.
In certain embodiments, the cell is a eukaryotic cell, particularly a mammalian cell, and at least one, particularly all of the Cas protein, the transposase proteins and the S15 protein carry a nuclear localization sequence peptide.
Another aspect of the invention relates to the use of a recombinant ribosomal protein S15, or of a nucleic acid sequence encoding said recombinant ribosomal protein S15, in a method for inserting a donor polynucleotide sequence into a target polynucleotide sequence, said method for inserting being facilitated by a CRISPR-associated transposase system. Again, the S15 protein may carry an NLS for transport to the nucleus when expressed inside a eukaryotic cell.
In certain embodiments of this use according to the invention, the recombinant ribosomal protein is ribosomal protein S15 from E. coli (AP009048.1) or ribosomal protein S15 from S. hofmanni (WP_029633173.1), or a ribosomal protein having at least 85% sequence identity to ribosomal protein S15 from S. hofmanni (WP_029633173.1) and at least 80% of the biological activity of WP_029633173.1. In certain embodiments, the S15 recombinant ribosomal protein is selected from the list of prokaryotic proteins in Table 1. In certain embodiments, the S15 recombinant ribosomal protein is selected from the list of eukaryotic proteins in Table 1.
In certain embodiments of this use according to the invention, the CRISPR-associated transposase system comprises Cas12k.
In certain embodiments of this use according to the invention, the CRISPR-associated transposase system comprises Tn7-like transposase proteins.
In certain embodiments of this use according to the invention, the Tn7-like transposase proteins comprise, particularly consist of, TnsB, TnsC, and TniQ.
The insertion of the cargo sequence into the cell’s genome may effect a number of corrections or changes. It may introduce one or more mutations to the target sequence, for example in order to correct an existing error to revert to a wild type, or to increase the genetic diversity (for example, in a library). It may correct, or may introduce, a stop codon in the target sequence, for example to restore protein function in the case of a premature stop codon, or to abrogate a certain protein function. The changes made by introduction of the cargo sequence may also disrupt, restore or introduce a splicing site. Alternatively, it may insert a gene or gene fragment at one or both alleles of a target.
Mutations introduced by the donor sequence may comprise substitutions, deletions, insertions, or a combination thereof. Alternatively, the mutations may cause a shift in an open reading frame on the target polynucleotide.
The invention further encompasses the following items:
1. An engineered nucleic acid targeting system for insertion of a donor polynucleotide sequence into a target DNA strand, the system comprising: a. one or more transposase proteins or functional fragments thereof, or a nucleic acid sequence encoding such transposase proteins or functional fragments thereof; b. a Cas protein, or a nucleic acid sequence encoding such Cas protein; c. a guide RNA molecule capable of complexing with the Cas protein and directing sequence specific binding of the guide-Cas protein complex to a target sequence of a target DNA strand; d. a ribosomal protein S15, or a nucleic acid sequence encoding such ribosomal protein S15.
2. The engineered nucleic acid targeting system according to item 1, wherein the Cas protein is Cas12k.
3. The engineered nucleic acid targeting system according to item 1 or 2, wherein the transposase proteins consist of a Tn7-like transposase system.
4. The engineered nucleic acid targeting system according to item 3, wherein the transposase proteins consist of the group composed of TnsB, TnsC, and TniQ.
5. The engineered nucleic acid targeting system according to any one of the preceding items, wherein the ribosomal protein S15 is ribosomal protein S15 from E. coli (AP009048.1) or ribosomal protein S15 from S. hofmanni (WP_029633173.1).
6. The engineered nucleic acid targeting system according to any one of the preceding items, further comprising a donor polynucleotide (DNA) sequence to be integrated, wherein the donor polynucleotide (DNA) sequence comprises a recognition site for the recombinase and a cargo nucleic acid sequence flanked by at least one transposon end sequence.
7. The engineered nucleic acid targeting system according to item 6, wherein the cargo nucleic acid sequence is flanked by a right end sequence element and a left end sequence element.
8. A method for inserting a cargo nucleic acid sequence into a target nucleic acid sequence, the method comprising contacting the target nucleic acid sequence with an engineered nucleic acid targeting system according to any one of items 1 to 7.
9. A method for inserting a cargo nucleic acid sequence into a target nucleic acid sequence inside a cell, the method comprising contacting the target nucleic acid sequence with an engineered nucleic acid targeting system according to any one of items 1 to 7.
10. The method according to item 9, wherein the cell does not express ribosomal protein S15, or a prokaryote homologue or prokaryote orthologue of S15, prior to the cell having been contacted with the engineered nucleic acid targeting system.
11. The method according to item 9 or 10, wherein the cell is a eukaryotic cell.
12. The method according to any one of the preceding items 9 to 11, wherein the cell is a mammalian cell, particularly a primate cell, more particularly a human cell.
13. The method according to item 12, wherein the cell is a stem cell, particularly wherein the method is practiced ex-vivo.
14. The method according to any one of items 9 to 13, wherein the engineered nucleic acid targeting system is delivered into the cell by one or more expression vectors.
15. The method according to item 14, wherein the one or more expression vectors are selected from the group consisting of viral vectors, DNA vectors and RNA vectors.
16. The method according to item 15, wherein a viral vector is used, selected from the group consisting of an adeno-associated virus, an adenovirus, a herpesvirus, and a lentivirus.
17. The method according to any one of items 9 to 16, wherein the cell is a eukaryotic cell, particularly a mammalian cell, and wherein at least one, particularly all of the members of the group consisting of Cas protein, the transposase proteins and the S15 protein carry a nuclear localization sequence peptide.
18. Use of a recombinant ribosomal protein S15, or of a nucleic acid sequence encoding said recombinant ribosomal protein S15, in a method for inserting a donor polynucleotide sequence into a target polynucleotide sequence, said method for inserting being facilitated by a CRISPR-associated transposase system.
19. The use according to item 18, wherein the recombinant ribosomal protein is ribosomal protein S15 from E. coli (AP009048.1) or ribosomal protein S15 from S. hofmanni (WP_029633173.1), or a ribosomal protein having at least 85% sequence identity to ribosomal protein S15 from S. hofmanni (WP_029633173.1) and at least 80% of the biological activity of WP_029633173.1.
20. The use according to item 18 or 19, wherein the CRISPR-associated transposase system comprises Cas12k.
21. The use according to any one of items 18 to 20, wherein the CRISPR-associated transposase system comprises Tn7-like transposase proteins.
22. The use according to any one of items 18 to 20, wherein the Tn7-like transposase proteins comprise, particularly consist of, TnsB, TnsC, and TniQ.
23. An isolated recombinant S15 protein comprising a bacterial S15 protein sequence, a purification tag and a peptide sequence for translocation of the protein into the eukaryotic nucleus, particularly wherein the isolated recombinant S15 protein is characterized by SEQ ID NO 001 or SEQ ID NO 003.
24. A DNA sequence encoding the isolated recombinant S15 protein according to item 23, optimized for translation in a eukaryotic cell, particularly in a human cell, particularly wherein the DNA sequence is SEQ ID NO 002 or SEQ ID NO 004.
25. A kit comprising DNA sequences encoding the recombinant proteins:
- Cas12k
- TnsC
- TniQ
- TnsB
each of the proteins comprising a purification tag and a peptide sequence for translocation of the protein into the eukaryotic nucleus, particularly wherein the recombinant proteins are characterized by SEQ ID NO 005, SEQ ID NO 007, SEQ ID NO 009, and SEQ ID NO 011.
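The "at least 85% sequence identity" threshold of item 19 can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned (gaps written as "-") and uses one common convention, identical residues divided by aligned columns; this convention and the function name `percent_identity` are assumptions made for illustration and may differ from the definition intended in this document. The example strings are toy sequences, not real S15 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumed convention (not taken from the source): identical residues
    divided by the number of alignment columns, ignoring columns in
    which both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep every column except those that are gap-versus-gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column counts as a match only if both residues are equal and not gaps.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 85.0) -> bool:
    """True if the aligned pair reaches the identity threshold (85% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy, hypothetical aligned peptides: 7 identical positions out of 8 columns.
print(percent_identity("MSLST-EK", "MSLSTAEK"))
print(meets_threshold("MSLST-EK", "MSLSTAEK"))
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch only covers the counting step.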
The invention further encompasses the following items:
Item 1A: An engineered nucleic acid targeting system for insertion of a donor polynucleotide sequence into a target DNA strand, the system comprising: a. a first nucleic acid sequence, or a plurality of first nucleic acids, encoding transposase proteins TnsB, TnsC, and TniQ; b. a second nucleic acid sequence encoding a Cas12k protein; c. a third nucleic acid sequence encoding a ribosomal protein S15; d. a fourth nucleic acid sequence encoding an RNA consisting of a sgRNA and a crRNA segment, with an optional linker segment separating the sgRNA from the crRNA segment.
Item 2A: The engineered nucleic acid targeting system according to Item 1A, wherein said Cas12k protein is encoded as a Cas12k fusion polypeptide containing a nuclear localization signal peptide fused to said Cas12k protein.
Item 2B: The engineered nucleic acid targeting system according to Item 2A, wherein said Cas12k protein is encoded as a polypeptide containing two nuclear localization signal peptides fused to said Cas12k protein.
Item 2C: The engineered nucleic acid targeting system according to Item 2A or 2B, wherein a peptide linker is present between the Cas12k protein and the nuclear localization signal peptide, particularly wherein the linker is 2 to 10 amino acids in length.
Item 2D: The engineered nucleic acid targeting system according to Item 2A, 2B or 2C, wherein the Cas12k protein is situated N-terminally relative to the nuclear localization signal peptide on said Cas12k fusion polypeptide.
Item 2E: The engineered nucleic acid targeting system according to Item 2A, 2B, 2C or 2D, wherein the Cas12k protein is Scytonema hofmanni Cas12k.
Item 3A: The engineered nucleic acid targeting system according to any one of the preceding Items, wherein the ribosomal protein S15 is encoded as an S15 fusion polypeptide containing a nuclear localization signal peptide fused to said S15 protein.
Item 3B: The engineered nucleic acid targeting system according to Item 3A, wherein the S15 protein is situated C-terminally relative to the nuclear localization signal peptide on said S15 fusion polypeptide.
Item 3C: The engineered nucleic acid targeting system according to Item 3A or 3B, wherein the S15 protein is separated from the nuclear localization signal peptide by a peptide linker, particularly wherein the linker is 2 to 10 amino acids in length.
Item 3D: The engineered nucleic acid targeting system according to Item 3A, 3B or 3C, wherein the S15 protein is Scytonema hofmanni S15 or Escherichia coli S15.
Item 4A: The engineered nucleic acid targeting system according to any one of the preceding Items, wherein the TniQ protein is encoded as a TniQ fusion polypeptide containing a nuclear localization signal peptide fused to said TniQ protein.
Item 4B: The engineered nucleic acid targeting system according to Item 4A, wherein the TniQ protein is situated N-terminally relative to the nuclear localization signal peptide on said TniQ fusion polypeptide.
Item 4C: The engineered nucleic acid targeting system according to Item 4A or 4B, wherein the TniQ protein is separated from the nuclear localization signal peptide by a peptide linker, particularly wherein the linker is 2 to 10 amino acids in length.
Item 4D: The engineered nucleic acid targeting system according to Item 4A, 4B or 4C, wherein the TniQ protein is Scytonema hofmanni TniQ.
Item 5A: The engineered nucleic acid targeting system according to any one of the preceding Items, wherein the TnsB protein is encoded as a TnsB fusion polypeptide containing a nuclear localization signal peptide fused to said TnsB protein.
Item 5B: The engineered nucleic acid targeting system according to Item 5A, wherein the TnsB protein is situated C-terminally relative to the nuclear localization signal peptide on said TnsB fusion polypeptide.
Item 5C: The engineered nucleic acid targeting system according to Item 5A or 5B, wherein the TnsB protein is separated from the nuclear localization signal peptide by a peptide linker, particularly wherein the linker is 2 to 10 amino acids in length.
Item 5D: The engineered nucleic acid targeting system according to Item 5A, 5B or 5C, wherein the TnsB protein is Scytonema hofmanni TnsB.
Item 6A: The engineered nucleic acid targeting system according to any one of the preceding Items, wherein the TnsC protein is encoded as a TnsC fusion polypeptide containing a nuclear localization signal peptide fused to said TnsC protein.
Item 6B: The engineered nucleic acid targeting system according to Item 6A, wherein the TnsC protein is situated C-terminally relative to the nuclear localization signal peptide on said TnsC fusion polypeptide.
Item 6C: The engineered nucleic acid targeting system according to Item 6A or 6B, wherein the TnsC protein is separated from the nuclear localization signal peptide by a peptide linker, particularly wherein the linker is 2 to 10 amino acids in length.
Item 6D: The engineered nucleic acid targeting system according to Item 6A, 6B or 6C, wherein the TnsC protein is Scytonema hofmanni TnsC.
Item 7A: The engineered nucleic acid targeting system according to any one of the preceding Items, wherein the Cas12k fusion polypeptide, the S15 fusion polypeptide, the TniQ fusion polypeptide, the TnsB fusion polypeptide and the TnsC fusion polypeptide are expressed under control of a constitutive RNA polymerase II promoter operable in a human cell.
Item 8A: The engineered nucleic acid targeting system according to any one of the preceding Items, wherein the Cas12k fusion polypeptide, the S15 fusion polypeptide, the TniQ fusion polypeptide, the TnsB fusion polypeptide and the TnsC fusion polypeptide are each expressed under control of a separate promoter sequence.
Item 9A: The engineered nucleic acid targeting system according to any one of the preceding Items 1A to 7A, wherein the Cas12k fusion polypeptide, the S15 fusion polypeptide, the TniQ fusion polypeptide, the TnsB fusion polypeptide and the TnsC fusion polypeptide are together expressed as a single polypeptide under control of a single promoter sequence.
Item 9B: The engineered nucleic acid targeting system according to Item 9A, wherein the Cas12k fusion polypeptide, the S15 fusion polypeptide, the TniQ fusion polypeptide, the TnsB fusion polypeptide and the TnsC fusion polypeptide are separated from one another by a self-cleaving peptide sequence.
Item 9C: The engineered nucleic acid targeting system according to Item 9B, wherein the self-cleaving peptide sequence is the Thosea self-cleaving peptide sequence.
Item 10: The engineered nucleic acid targeting system according to any one of the preceding Items, wherein the promoter is the CMV immediate early promoter.
Item 11A: The engineered nucleic acid targeting system according to any one of the preceding Items, wherein expression of the fourth nucleic acid sequence encoding an RNA consisting of a sgRNA and a crRNA segment is under control of an RNA polymerase III promoter operable in a human cell.
Item 11B: The engineered nucleic acid targeting system according to Item 11A, wherein the RNA polymerase III promoter is a U6 promoter.
Item 12B: The engineered nucleic acid targeting system according to any one of the preceding Items, further comprising a DNA comprising a donor sequence to be inserted, flanked by a left transposon end and a right transposon end, respectively.
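The single-promoter arrangement of Items 9A to 9C, combined with the NLS orientations of Items 2D, 3B, 4B, 5B and 6B, can be pictured with a short Python sketch. It only assembles a labelled part list, not actual sequences. The gene order, the part labels ("CMV", "T2A", "linker", "NLS") and the function names are illustrative assumptions; the NLS side used for TnsC assumes that Item 6B refers to the TnsC protein.

```python
# Side of each protein on which the NLS sits, read from the Items:
# "C" means the NLS follows the protein (protein is N-terminal to the NLS),
# "N" means the NLS precedes the protein (protein is C-terminal to the NLS).
NLS_SIDE = {
    "Cas12k": "C",  # Item 2D
    "S15": "N",     # Item 3B
    "TniQ": "C",    # Item 4B
    "TnsB": "N",    # Item 5B
    "TnsC": "N",    # Item 6B (assumed to refer to TnsC)
}


def fusion_parts(protein: str) -> list[str]:
    """Ordered parts (N- to C-terminus) of one NLS fusion with a peptide linker."""
    parts = [protein, "linker", "NLS"]
    if NLS_SIDE[protein] == "N":
        parts.reverse()  # place the NLS at the N-terminal side
    return parts


def polycistron(order: list[str], promoter: str = "CMV",
                separator: str = "T2A") -> list[str]:
    """One promoter followed by the fusions, separated by a self-cleaving peptide."""
    layout = [promoter]
    for index, protein in enumerate(order):
        if index > 0:
            layout.append(separator)
        layout.extend(fusion_parts(protein))
    return layout


# Hypothetical gene order, chosen only for illustration.
print(" - ".join(polycistron(["Cas12k", "S15", "TniQ", "TnsB", "TnsC"])))
```

The separate-promoter alternative of Item 8A would correspond to calling `polycistron` once per protein with a single-element list.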
Wherever alternatives for single separable features are laid out herein as “embodiments”, it is to be understood that such alternatives may be combined freely to form discrete embodiments of the invention disclosed herein. Thus, any of the embodiments of S15, the transposase proteins, or the Cas protein component, may be combined for the system, method and use aspects of this invention.
The invention is further illustrated by the following examples and figures, from which further embodiments and advantages can be drawn. These examples are meant to illustrate the invention but not to limit its scope.
Description of the Figures
Fig. 1 Cryo-EM structure of the Cas12k-TnsC transposon recruitment complex. a, Schematic of the type V-K CRISPR-associated transposon system from Scytonema hofmanni and the S15 gene. LE, RE - left and right transposon ends. b, Electron density map of the Cas12k-transposon recruitment complex. Density for nine TnsC protomers (TnsC1-TnsC9) is shown. c, Schematic of the sgRNA structure and R-loop architecture indicating interactions between nucleic acids and protein components of the complex (see SEQ ID NO 021 to SEQ ID NO 023). d, Structural model of the Cas12k-transposon recruitment complex (surface and cartoon representations). TS, target DNA strand; NTS, non-target DNA strand.
Fig. 2 R-loop completion upon complex assembly. a-b, Structural comparison and detailed views of the R-loop structure comprising the crRNA portion of the single guide RNA (red cartoon backbone), the target DNA strand (TS, blue cartoon backbone) and the non-target DNA strand (NTS, dark grey cartoon backbone) bound to the Cas12k-transposon recruitment complex (a) and the Cas12k-sgRNA-target DNA complex (PDB ID: 7PLA) (b). Only the REC (recognition lobe) and RuvC domains and the bridging helix (BH) of Cas12k are shown in surface representation.
Fig. 3 TniQ recognizes tracrRNA and completed R-loop. a, Overview of the interactions established by TniQ with the tracrRNA scaffold (yellow cartoon backbone) and the RNA:DNA heteroduplex formed by the crRNA portion of the single guide RNA (red cartoon backbone) and the target DNA strand (TS, blue cartoon backbone). NTS (non-target DNA strand, dark grey cartoon backbone). The N-terminus (N) and C-terminus (C) of TniQ are indicated. b, Close-up views of key tracrRNA-interacting residues of TniQ. c, Detailed views of R-loop recognition by TniQ. d-e, Site-specific transposition activity in E. coli of structure-based tracrRNA mutants and mutants at the tracrRNA-interacting interface (d) as well as in the R-loop recognition pocket of TniQ (e), as determined by droplet digital PCR (ddPCR) analysis. Data are presented as mean ± s.d. (n=3 biologically independent replicates).
Fig. 4 TniQ primes TnsC oligomerization. a, Detailed view of the interactions between TniQ (cartoon representation) and two protomers of the TnsC filament (TnsC1 and TnsC2, surface representation). The N-terminus (N) and C-terminus (C) of TniQ are indicated. b-c, Close-up views of key interactions between TniQ and TnsC1 (b) and TnsC2 (c). d, Site-specific transposition activity in E. coli of CAST systems containing mutations in the TnsC binding interface of TniQ, as determined by droplet digital PCR (ddPCR) analysis. Data are presented as mean ± s.d. (n=3 biologically independent replicates).
Fig. 5 TnsC assembly on PAM distal end of R-loop DNA: a, Overview of the DNA duplex - target DNA strand (TS, blue cartoon backbone) and the non-target DNA strand (NTS, dark grey cartoon backbone) - bound to the Cas12k-transposon recruitment complex. Only the crRNA portion of the single guide RNA (red cartoon backbone) is shown. Residues 132-254 of Cas12k and the TnsC2 and TnsC3 protomers are hidden for better visualization of the DNA binding channel formed by the components of the complex. (b) Zoomed-in view of target DNA binding residues of TniQ. (c) Zoomed-in view of the DNA binding residues in TnsC1 protomer. (d) Comparison of DNA binding modes of consecutive TnsC protomers (TnsC1-TnsC3) in the Cas12k-transposon recruitment complex (left) and in the TniQ-capped TnsC filament (right).
Fig. 6 S15 promotes Cas12k-TnsC transposon recruitment complex assembly. (a) Zoomed-in view of S15 binding in the Cas12k-recruitment complex. (b) Co-precipitation of TnsC and TniQ in presence or absence of ShS15 (Scytonema hofmanni S15) or EcS15 (Escherichia coli S15) by immobilized StrepII-fused Cas12k-sgRNA-target DNA complex. (c) In vitro transposition activity of purified ShCAST components in the absence or presence of E. coli and S. hofmanni ribosomal S15 proteins, as determined by droplet digital PCR (ddPCR) analysis. Data are presented as mean ± s.d. (n=3 independent replicates).
Fig. 7 Mechanism of RNA-mediated assembly in type V CRISPR-associated transposons. Schematic diagram depicting the recruitment of the transposition machinery by the RNA-guided Cas12k complex in type V-K CRISPR-associated transposons. Cas12k in association with a crRNA-tracrRNA dual guide RNA recognizes target DNA sequences to form a partial R-loop structure. Full R-loop formation occurs upon recruitment of S15, TniQ and TnsC. TniQ recognizes specific regions of the tracrRNA and primes oligomerization of TnsC filaments by holding together the first two TnsC protomers. TnsC forms extended protein filaments around the target DNA in presence of ATP by establishing discrete interactions with the backbone of the TS (target) and NTS (non-target) DNA strands and thereby remodeling the underlying duplex. The ribosomal protein S15 assists productive complex assembly by interacting with the tracrRNA and Cas12k. Altogether the assembly provides a recruitment platform for TnsB, which promotes transposon DNA insertion at the target site.
Fig. 8: Cryo-EM data processing workflow for the Cas12k-TnsC transposon recruitment complex. a, Representative negative stain EM micrographs (at 98,000x magnification) and cryo-EM micrographs (at 130,000x magnification) of the ShCas12k-transposon recruitment complex after reconstitution and cryo-EM image processing workflow. b, Angular distribution plotted on the density map. c, Final electron density map colored according to the local resolution. d, Fourier Shell Correlations (FSC) of the reconstruction from two independently refined half-maps. The gold-standard cutoff (FSC = 0.143) is marked with a black dotted line. e, Validation of ShCas12k-transposon recruitment complex structure model.
Fig. 9: Structural comparison between the Cas12k-transposon recruitment complex and the Cas12k-TnsC non-productive complex: Electron density maps (a) and structural models (b) of the Cas12k-transposon recruitment complex and the Cas12k-TnsC non-productive complex. Side views and structural superpositions are shown. Proteins are shown in surface representation.
Fig. 10: Structural rearrangements in Cas12k and the sgRNA upon R-loop completion. a, Structural models of the Cas12k-sgRNA-target DNA complex (top) and the same assembly in the Cas12k-transposon recruitment complex (bottom). Domain organization of Cas12k is shown for both models at the bottom. REC, recognition lobe. WED, wedge domain. PI, PAM interacting domain. BH, bridging helix. TS, target DNA strand; NTS, non-target DNA strand. b, Detailed views of the RuvC and BH arrangements in the two superposed structural models. c, Structural comparison of overlaid tracrRNA in the Cas12k-sgRNA-target DNA and the Cas12k-transposon recruitment complex.
Fig. 11 Interactions and conservation of the ribosomal protein S15. (a) Zoomed-in view of S15 interactions with the tracrRNA and crRNA:TS-DNA duplex. (b) Zoomed-in view of S15 contacts with the Cas12k REC2 domain. (c) Sequence alignment of the ribosomal proteins EcS15, ShS15 and HsS13; see SEQ ID NO 024 to 026. (d) Structural models of the Cas12k-transposon recruitment complex (left), CasX-sgRNA-target DNA complex (middle, PDB: 6NY2) and a superposition of both focused on S15 (right). (e) Zoomed-in views of EcS15 interactions with tracrRNA in the Cas12k-transposon recruitment complex (left), 16S rRNA in the E. coli ribosome (middle, PDB ID: 6Q97 (Rae et al., 2019)) and a superposition of both focused on S15 (right).
Fig. 12 Examples of construct designs for the expression of type V CAST components (including S15 protein) in mammalian/human cells.
Examples - Publication
Introduction
The canonical function of prokaryotic CRISPR-Cas systems is to provide adaptive immunity against invading mobile genetic elements - including transposons, plasmids and phages. This relies on CRISPR-associated (Cas) protein effector complexes that mediate CRISPR RNA (crRNA)-guided recognition of target nucleic acids and their subsequent nucleolytic degradation (Koonin et al., 2017; Sorek et al., 2013). Several Tn7-like transposons have co-opted RNA-guided type I-F, I-B and V-K CRISPR-Cas systems to direct transposon DNA insertion into specific target sites (Faure et al., 2019; Klompe et al., 2019; Petassi et al., 2020; Peters et al., 2017; Rybarski et al., 2021; Saito et al., 2021; Strecker et al., 2019). In CRISPR-associated transposons, the CRISPR-Cas DNA targeting machineries, either a nuclease-deficient multisubunit Cascade complex in type I systems (Klompe et al., 2019) or a single catalytically inactive Cas12k protein in type V-K systems (Strecker et al., 2019), are encoded between the left and right ends of the transposon together with CRISPR arrays and recruit the transposon machinery to target sites specified by the crRNA, ultimately resulting in transposon DNA insertion at a fixed distance downstream of the selected target DNA sequence. While crRNAs encoded from the CRISPR arrays guide transposon insertion preferentially into other mobile genetic elements, delocalized atypical crRNAs target integration into chromosomal sites for transposon homing (Saito et al., 2021). As CRISPR-associated transposons encode CRISPR arrays (Faure et al., 2019; Peters et al., 2017; Rybarski et al., 2021) and additional defense systems as cargos (Klompe et al., 2022), these elements have been hypothesized to mediate horizontal gene transfer of host defense systems within bacterial populations by using other transposons and plasmids as shuttle vectors.
CRISPR-associated transposons provide programmable, targeted DNA integration machineries that have been repurposed as site-specific, homology-independent DNA insertion tools to engineer bacterial hosts (Vo et al., 2021b; Zhang et al., 2020) and communities (Rubin et al., 2022). However, their application in eukaryotic cells has so far been hindered by our limited knowledge of the underlying mechanisms. For type V-K systems, reconstitution of RNA-guided transposition in bacteria requires the concerted activities of the CRISPR effector complex built of the pseudonuclease Cas12k, the crRNA and a trans-activating RNA (tracrRNA) and three transposon proteins: the AAA+ ATPase TnsC, the transposase TnsB and the zinc-finger protein TniQ (Strecker et al., 2019). First insights into the molecular basis of crRNA-directed transposition mostly derive from biochemical and structural studies of a CRISPR-associated transposon system from Scytonema hofmanni (ShCAST) (Park et al., 2021; Querques et al., 2021; Xiao et al., 2021). Structural reports by us (Querques et al., 2021) and others (Xiao et al., 2021) revealed that Cas12k-mediated DNA targeting depends on the intricate architecture of the tracrRNA that serves as a scaffold to correctly position Cas12k and the crRNA guide for the recognition of complementary targets. Upon binding to a 5’-GTN protospacer adjacent motif (PAM), Cas12k initiates guide RNA hybridization that nevertheless leads to only incomplete R-loop formation, suggesting that further rearrangements occurring upon recruitment of the downstream transposon machinery are required to elicit further guide RNA-target DNA hybridization and ultimately lead to recognition of ~24 bp long targets (Strecker et al., 2019).
In turn, structural and biochemical studies of TnsC (Park et al., 2021; Querques et al., 2021) revealed that the transposon protein assembles into DNA- and ATP-dependent helical filaments that recognize and remodel the underlying target DNA duplex and, at the same time, help tether TnsB to the target site by direct protein-protein interactions. Based on homology to the prototypical E. coli Tn7 transposon (Peters and Craig, 2001) and analysis of transposase-mediated integration events (Vo et al., 2021a), the DDE-type TnsB transposase has been postulated to catalyse the 3’-DNA strand breakage and transfer reactions required for the replicative transposition pathway, resulting in transposon end DNA nicking at a donor locus and ligation to a target DNA at sites located 60-66 bp downstream of the PAM (Strecker et al., 2019). TnsC is thought to be involved in these events by activating the transposase upon recruitment to the target DNA (Peters and Craig, 2001). Concurrently, TnsB triggers TnsC filament disassembly by stimulating the ATP hydrolysis activity of TnsC (Park et al., 2021; Querques et al., 2021), thereby preventing insertion of multiple transposon copies into the same target site. This phenomenon, known as target immunity, is a common feature of both canonical (Skelding et al., 2003) and CRISPR-associated Tn7 elements (Klompe et al., 2019; Strecker et al., 2019) and is conserved in homologous transposons harboring coupled transposase/ATPase systems (Adzuma and Mizuuchi, 1988; Greene and Mizuuchi, 2002). A structure of an ADP-bound TnsC revealed that a single ring of the ATPase engages with target DNA upon ATP hydrolysis, suggesting that this intermediate complex bridges the Cas12k-DNA targeting complex and the transposase (Park et al., 2021). However, the structural details of a complete CRISPR-transposon assembly remain elusive. The function of the zinc-finger protein TniQ is also presently unclear. In type V-K systems, a monomeric, compact TniQ directly interacts with the TnsC filament and has been implicated in the regulation of its polymerization (Park et al., 2021; Querques et al., 2021). Recruitment of TniQ by target DNA-bound Cas12k has been reported to occur only cooperatively with binding to TnsC in vitro (Querques et al., 2021). Conversely, in type I-F3 CRISPR-transposon systems a dimer of the homologous, yet larger TniQ has been identified as an integral component of the DNA targeting complex together with the Cascade (Halpin-Healy et al., 2019; Jia et al., 2020; Li et al., 2020).
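The targeting geometry summarized above (a 5’-GTN PAM, a target of about 24 bp, and insertion 60-66 bp downstream of the PAM) can be sketched in Python as a simple sequence scan. The coordinate conventions are assumptions made for illustration: the PAM is read 5’ to 3’ on the strand supplied, the protospacer is taken to start immediately 3’ of the PAM, and the insertion distance is counted from the last PAM base. The function name `find_candidate_sites` is hypothetical.

```python
PAM_LEN = 3                        # 5'-GTN
SPACER_LEN = 24                    # approximate target length given in the text
INSERT_MIN, INSERT_MAX = 60, 66    # bp downstream of the PAM, per the text


def find_candidate_sites(seq: str) -> list[dict]:
    """Scan one strand (5'->3') for GTN PAMs and report the expected insertion window.

    Assumptions (not stated in the source): the protospacer begins directly
    3' of the PAM, and distances are measured from the end of the PAM.
    Only sites whose whole insertion window lies within `seq` are returned.
    """
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - PAM_LEN - INSERT_MAX + 1):
        if seq[i:i + 2] == "GT" and seq[i + 2] in "ACGT":
            pam_end = i + PAM_LEN  # 0-based index of the first protospacer base
            sites.append({
                "pam_start": i,
                "pam": seq[i:pam_end],
                "protospacer": seq[pam_end:pam_end + SPACER_LEN],
                "insertion_window": (pam_end + INSERT_MIN, pam_end + INSERT_MAX),
            })
    return sites


# Toy sequence with a single GTC PAM at index 5 (hypothetical, for illustration).
toy = "A" * 5 + "GTC" + "A" * 70
for site in find_candidate_sites(toy):
    print(site["pam_start"], site["pam"], site["insertion_window"])
```

A real design workflow would also scan the reverse-complement strand and filter candidate targets for off-target similarity, neither of which is attempted here.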
Despite the available information on each of the factors orchestrating RNA-guided transposition in type V-K systems, our understanding of their functional interplay is still very limited. Here, we analysed by cryo-electron microscopy (cryo-EM) a functional Cas12k-transposon assembly from ShCAST (Strecker et al., 2019). The structure visualizes how the guide RNA-bound Cas12k, TniQ, TnsC and, unexpectedly, the ribosomal protein S15 interact with each other to mediate DNA targeting. Transposon protein recruitment by the Cas12k effector complex induces complete crRNA:target DNA hybridization and is accompanied by a severe bend of the underlying target DNA duplex. We find that TniQ establishes critical interactions with the trans-activating crRNA and the R-loop at the Cas12k-proximal end of the complex and, at the same time, primes nucleation of the TnsC filament in a discrete, productive orientation distally to Cas12k. We further identify the host-encoded protein S15 as a bona fide component of the crRNA-guided transposition machinery that promotes complex assembly and activity by a mechanism reminiscent of its function in cellular translation. Altogether, our work elucidates how a CRISPR effector recruits its cognate transposon complex to initiate RNA-guided transposition and will inform reconstitution of this machinery for programmable site-specific DNA insertion in genome engineering applications.
Results
Reconstitution and structure of Cas12k-transposon recruitment complex
To obtain structural insights into the interaction between Cas12k and the transposon components in type V CRISPR-associated transposons, we sought to reconstitute guide RNA-programmed Cas12k together with TnsC and TniQ on target DNA. To this end, we first bound a single-molecule guide RNA (sgRNA), comprising sequences corresponding to the crRNA guide and a trans-activating crRNA (tracrRNA), and an internally unpaired target DNA oligonucleotide duplex to Cas12k immobilized on a solid support, and subsequently incubated the resulting complex with TnsC and TniQ in the presence of ATP to trigger TnsC filament assembly. After extensive washing and elution, the sample was vitrified and imaged using cryo-EM for single-particle analysis. By stringent selection of Cas12k particles displaying adjacent filament-like densities, and subsequent 2-D/3-D classification, we were able to obtain a reconstruction of the Cas12k-TnsC transposon recruitment complex at a resolution of 3.3 Å (Figure 1, Figure 8, Table 2).
The resulting atomic model of the complex comprises Cas12k and the sgRNA guide bound to the target site in the DNA, together with a single TniQ molecule and an emerging right-handed TnsC filament assembled on the PAM-distal region of the target DNA. Cas12k binds the DNA in a guide RNA- and PAM-dependent manner, as observed previously in the structure of the Cas12k-sgRNA-target DNA complex (Querques et al., 2021; Xiao et al., 2021). TniQ makes direct contacts with the tracrRNA part of the sgRNA and two TnsC protomers, thereby bridging Cas12k and the TnsC filament without directly interacting with Cas12k. The TnsC filament is assembled with the TnsC C-terminal domains pointing away from Cas12k. The reconstruction contains additional proteinaceous density, which we were able to assign to a single copy of the Escherichia coli ribosomal S15 protein that was serendipitously co-purified with TniQ, as verified by mass-spectrometric analysis of the TniQ sample. Notably, S15 makes extensive contacts with both Cas12k and the tracrRNA part of the sgRNA (Figure 1C,D).
Within the same cryo-EM sample, we were additionally able to identify a separate population of particles comprising Cas12k bound to DNA together with TnsC oligomers assembled in the reverse orientation (i.e. with TnsC C-terminal domains pointing towards Cas12k), obtaining a reconstruction at a resolution of 4.1 Å (not shown). Notably, the R-loop in this assembly remained in its incomplete form and both TniQ and S15 are absent from this reconstruction (Figure 9). Furthermore, there are very few direct intermolecular contacts between Cas12k-sgRNA and the proximal end of the TnsC filament, suggesting that this molecular assembly represents a non-productive complex in which TnsC spontaneously oligomerized on the target DNA in a Cas12k-independent manner.
R-loop completion occurs upon TniQ and TnsC binding
Previous structures of the Cas12k-guide RNA-target DNA complexes revealed incomplete hybridization of the crRNA sequence and the target strand (TS) of the DNA, resulting in a nine-base pair (bp) duplex (Querques et al., 2021; Xiao et al., 2021). In the present structure of the Cas12k-transposon recruitment complex, the crRNA and the target DNA form a complete R-loop structure comprising 17 base pairs, beyond which the TS and the non-target strand (NTS) rehybridize. crRNA-TS DNA hybridization beyond the 17th base pair is prevented by TniQ binding to the complete R-loop, leaving seven unpaired nucleotides at the 3’ end of the crRNA spacer sequence. This observation is consistent with previous studies showing that 3’-terminally truncated crRNAs comprising 17-nucleotide spacer segments supported type V CRISPR-associated transposon activity in vivo (Saito et al., 2021). Overall, the DNA adopts a bent conformation, with the PAM-distal DNA exiting Cas12k at a 122° angle relative to the PAM-proximal DNA duplex (Figure 2A). The backbone of the displaced NTS can be completely traced as it wraps around Cas12k, passing through a gap between the REC lobe and the RuvC domain (Figure 2A). Completion of the R-loop occurs by TniQ interacting with the PAM-distal end of the sgRNA-TS heteroduplex, and is enabled by conformational rearrangements within Cas12k (Figure 2A,B). The Cas12k bridge helix, which precludes full R-loop formation in the structure of the Cas12k-sgRNA-target DNA complex (Querques et al., 2021; Xiao et al., 2021), is repositioned to expose the binding cleft for the PAM-distal part of the sgRNA-TS DNA heteroduplex (Figure 10A,B). This is accompanied by structural ordering of the REC lobe motifs, whereby the REC1 domain (residues 12-239Cas12k) interacts with the unpaired NTS while the REC2 domain (residues 240-278Cas12k) contacts the extended crRNA-TS heteroduplex. Further conformational rearrangements occur in the RuvC domain, where an alpha-helical hairpin (residues 548-590Cas12k) is repositioned to contact the NTS and the tracrRNA scaffold (Figure 10B,C).
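The base-pair bookkeeping described above (17 crRNA-TS base pairs in the completed R-loop, with seven spacer nucleotides left unpaired at the 3’ end) can be made explicit with a small Python sketch. The spacer sequence and the function name `split_spacer` are hypothetical and serve only to illustrate the arithmetic.

```python
PAIRED_BP = 17  # crRNA-TS base pairs in the completed R-loop, per the text


def split_spacer(spacer: str, paired_bp: int = PAIRED_BP) -> tuple[str, str]:
    """Split a crRNA spacer (written 5'->3') into its paired and unpaired parts.

    The first `paired_bp` nucleotides are treated as hybridized to the
    target strand; the remainder is the unpaired 3' tail.
    """
    if len(spacer) < paired_bp:
        raise ValueError("spacer is shorter than the paired region")
    return spacer[:paired_bp], spacer[paired_bp:]


# Hypothetical 24-nt spacer, used for illustration only.
spacer = "GACUUCGAAUCCGGAUUACGGCAU"
paired, unpaired = split_spacer(spacer)
print(len(spacer), len(paired), len(unpaired))
```

Under this bookkeeping, a guide truncated at its 3’ end to 17 nucleotides retains the entire paired segment, which is in line with the reported activity of 17-nucleotide spacers.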
TniQ recognizes tracrRNA and R-loop
In the Cas12k-transposon recruitment complex, TniQ is confined by the tracrRNA rooftop loop and the PAM-distal end of the crRNA-TS DNA heteroduplex on one side and the TnsC filament on the other (Figure 3A). The rooftop loop (nucleotides 167-171 tracrRNA), which is structurally disordered in the Cas12k-sgRNA-target DNA complex (Querques et al., 2021; Xiao et al., 2021), now assumes a well-defined pentaloop conformation whose shape is read out by hydrogen bonding contacts with side chains of Gln93TniQ, Arg98TniQ, Lys128TniQ, Lys132TniQ and Gln137TniQ, in addition to a π-stacking interaction between rA169tracrRNA and Trp120TniQ (Figure 3B). To validate the observed interactions, we tested the effect of TniQ and tracrRNA mutations on the transposition activity of ShCAST in vivo using quantitative droplet-digital PCR analysis. Mutations of a subset of interacting residues substantially reduced transposition activity. In turn, substitution of the rooftop pentaloop with a GAAA tetraloop or adenine substitution of rU168tracrRNA led to complete loss of transposition, while individual substitutions of other pentaloop nucleotides substantially reduced transposition activity (Figure 3C). TniQ further interacts with the PAM-distal end of the R-loop. Here, the terminal base pair of the crRNA-TS heteroduplex is contacted by Asn59TniQ at the minor groove edge and capped by a π-π stacking interaction with His57TniQ, which is in turn hydrogen bonded to His94TniQ (Figure 3D). Mutations of TniQ residues directly interacting with the crRNA-TS DNA duplex substantially reduced transposition activity in vivo (Figure 3E). Together with our structural observations, these results confirm the critical role of the tracrRNA rooftop loop as a TniQ interaction site, and its significance for TnsC recruitment to support transposition activity of type V CRISPR-associated transposons. Furthermore, the observed interactions of TniQ with the PAM-distal end of the guide RNA-TS DNA heteroduplex suggest that R-loop completion is facilitated by TniQ recruitment.
TniQ nucleates TnsC filament formation
Positioned by interactions with the tracrRNA rooftop loop and the R-loop, the single TniQ molecule in the Cas12k-transposon recruitment complex straddles two TnsC protomers at the Cas12k- proximal end of the TnsC filament (Figure 4A). The C-terminal zinc finger domain (ZnF2) of TniQ
contacts the terminal TnsC protomer (TnsC1 ), mostly via electrostatic interactions (Figure 4B). In turn, the N-terminal HTH domain of TniQ interacts extensively with the next TnsC protomer (TnsC2) in the filament. Notably, the N-terminal tail of TniQ inserts into a cleft in the a/0 AAA+ domain of TnsC2, with the aromatic side chain of Trp10TniQ sandwiched by hydrophobic interactions with Tyr1 15TnsC2 and Pro86TnsC2 (Figure 4C). Glutamate substitution of TnsC1 -interacting residue Arg155TniQ reduced in vivo by -50%, suggesting that the interaction of TniQ with TnsC1 contributes to transposon recruitment. In contrast, N-terminal truncation of TniQ to remove residues 1-12 resulted in complete loss of transposition in vivo, while alanine substitution of Trp10TniQ resulted in >90% reduction (Figure 4D). Together, these results validate the critical role of the TniQ N-terminal tail for the interaction with the TnsC filament and suggest its involvement in filament nucleation.
Previous studies have shown that in the absence of Cas12k and guide RNA, TniQ caps TnsC filaments assembled on dsDNA (Park et al., 2021), thereby restricting TnsC polymerization in vitro (Park et al., 2021; Querques et al., 2021). We extended these findings by docking a previously determined crystal structure of TniQ (Querques et al., 2021) into a 3.5 Å-resolution cryo-EM reconstruction of a TniQ-capped TnsC filament obtained by single-particle analysis (Table 2). Altogether, three copies of TniQ assemble at the end of the TnsC filament, each bridging a TnsC dimer within the terminal hexameric helical turn of the filament. Overall, the interactions between the TniQ and TnsC dimers are highly similar to the interactions observed in the Cas12k-transposon recruitment complex. However, binding of Cas12k-gRNA to a TnsC filament fully capped by TniQ would not be possible because of steric clashes with two of the three TniQ copies.
TnsC assembles at distal end of TniQ-bound R-loop
At the PAM-distal end of the R-loop within the Cas12k-transposon recruitment complex, the TS bends away from the crRNA-TS heteroduplex and exits through a narrow channel formed by the Cas12k RuvC domain and TniQ to immediately rehybridize with the NTS (Figure 5A). The first base pair of the reformed TS-NTS duplex (position 18) stacks against the aromatic side chains of Tyr570Cas12k and Phe567Cas12k, while TS nucleotides at positions 19 and 20 make backbone interactions with TniQ by hydrogen bonding with Ser36TniQ and Ser38TniQ (Figure 5B). This positions the TS-NTS duplex for entry into the TnsC helical filament. The terminal TnsC protomer (TnsC1) interacts with NTS nucleotides at positions 22 and 23 via Thr121TnsC1, Lys103TnsC1 and Lys150TnsC1 (Figure 5C). The next TnsC protomer (TnsC2) interacts with the minor groove of the duplex, contacting both TS and NTS, while TnsC3 and subsequent protomers interact mostly with the NTS. Overall, the interactions of the TnsC filament with the DNA involve the same residues (Thr121TnsC, Lys103TnsC and Lys150TnsC) as previously observed in the structures of dsDNA-bound TnsC filaments and validated by mutational analysis in vivo (Park et al., 2021; Querques et al., 2021). Similarly, the PAM-distal DNA duplex is distorted from the canonical B-form geometry to match the helical symmetry of the TnsC filament. Crucially, the TnsC filament in the Cas12k-TnsC recruitment complex tracks the DNA strand with the opposite polarity, i.e. the NTS, as compared with the TnsC-only filament (Figure 5D).
Ribosomal S15 protein supports Cas12k-transposon complex assembly
The E. coli ribosomal protein S15 (EcS15) captured in the Cas12k-transposon recruitment complex is wedged between the Cas12k REC2 domain and the tracrRNA connector duplex and contacts the ribose-phosphate backbone of the crRNA in the PAM-distal part of the crRNA-TS DNA heteroduplex (Figure 6A, Figure 11A-C). As within the small ribosomal subunit, EcS15 adopts a four-helix bundle fold and interacts with the tracrRNA and the heteroduplex in a manner that mimics its interactions with 16S rRNA. The combined fold of EcS15 and the Cas12k REC2 domain is highly similar to the helical fold found in the REC2 (Helical II) domain of the distantly related Cas12e (CasX) nuclease (Liu et al., 2019). EcS15 makes extensive shape-complementary interactions with the tracrRNA rooftop loop via electrostatic interactions mediated by Arg72EcS15, Lys73EcS15 and Arg77EcS15, and π-π stacking between Tyr69EcS15 and rA171tracrRNA, suggesting that it stabilizes the tracrRNA rooftop loop in a conformation that supports TniQ recruitment. Notably, the E. coli and S. hofmanni S15 (ShS15) proteins share 58% sequence identity and the residues involved in contacting the tracrRNA and Cas12k are nearly invariant between the orthologs (Figure 11C). This suggests that ShS15 might act as a bona fide component of the transposon recruitment complex to promote the interactions of Cas12k with TniQ, thereby contributing to TnsC recruitment. To test this hypothesis, we used pull-down experiments to observe the co-precipitation of TniQ and TnsC with immobilized StrepII-tagged Cas12k-sgRNA-target DNA complex in the presence of purified EcS15 or ShS15 proteins (Figure 6B). For these experiments, TniQ was re-purified according to a stringent protocol that minimized EcS15 contamination. Both EcS15 and ShS15 were efficiently co-precipitated by the Cas12k-sgRNA-DNA complex.
TniQ co-precipitation was markedly enhanced in the presence of EcS15 or ShS15 and TnsC, suggesting that S15 proteins facilitate the cooperative assembly of TniQ with TnsC and Cas12k-sgRNA on target DNA. To further test the effect of EcS15 and ShS15 proteins on Cas12k-dependent transposition, we performed in vitro transposition assays using donor and acceptor plasmids, sgRNA and purified recombinant proteins, monitoring transposition efficiency by droplet-digital PCR analysis. In the absence of S15 proteins, only very low levels of sgRNA-dependent transposition could be detected. Transposition efficiency was substantially enhanced by the addition of EcS15 or ShS15 (Figure 6C). These results indicate that S15 promotes transposition by facilitating the assembly of the Cas12k-transposon recruitment complex, suggesting that it may function as an integral part of the type V CRISPR-associated transposon machinery.
Discussion
The molecular function of CRISPR-associated transposons relies on the concerted activities of the RNA-guided CRISPR effector and transposase modules. In type V CRISPR-associated transposons, this is thought to involve interactions of the AAA+ ATPase transposon regulator TnsC and the RNA-guided effector Cas12k at the target site but the mechanistic details have remained elusive thus far. Our structural analysis of the Cas12k-transposon recruitment complex shows that the tracrRNA component of the Cas12k guide RNA and TniQ play key roles in the process by
bridging Cas12k and the ATP-dependent TnsC filament assembled on the target DNA. Our findings provide evidence that the formation of a complete R-loop structure within Cas12k occurs upon TniQ and TnsC recruitment, and thus serves as a structural checkpoint for transposon recruitment complex assembly, which is likely to have mechanistic parallels in the type I CRISPR-associated transposon systems, as hinted by previous structural analysis of the type I-F Cascade-TniQ complex (Halpin-Healy et al., 2019; Jia et al., 2020; Li et al., 2020).
TniQ was previously shown to cap TnsC filaments and restrict their polymerization on free linear dsDNA in vitro (Park et al., 2021; Querques et al., 2021). Based on these observations, we proposed a mechanistic model in which we placed TniQ at the Cas12k-distal end of the TnsC filament, restricting its polymerization to the vicinity of Cas12k (Querques et al., 2021). An alternative model posited that TnsC polymerization initiates randomly on target DNA and is selectively stabilized by interactions with target site-bound Cas12k and TniQ (Park et al., 2021). Our Cas12k-transposon recruitment complex structure reveals that a single TniQ copy bridges two TnsC protomers at the Cas12k-proximal end of the TnsC filament. Based on these findings and their functional validation, we thus propose a revised model in which TniQ neither caps TnsC filaments nor stabilizes filaments randomly nucleated on DNA (Figure 7). Instead, the cooperative assembly of Cas12k, guide RNA and TniQ directly nucleates TnsC filament formation starting at the PAM-distal end of the Cas12k R-loop. Although we cannot conclusively rule out a mechanism based on randomly initiated TnsC polymerization followed by specific capture and stabilization by DNA-bound Cas12k, a parsimonious interpretation of the structural and functional data points to site-specific TnsC filament nucleation at the target site defined by the Cas12k guide RNA. This is also supported by the observation that the polarity of the tracking DNA strand on which the TnsC filament assembles in the context of the Cas12k-transposon recruitment complex (the NTS strand) appears to be opposite to the one observed in the standalone TnsC filament assembled on dsDNA (Park et al., 2021; Querques et al., 2021).
Overall, the transposon recruitment mechanism of type V CRISPR-associated transposons is thus likely to be analogous to that of type I CRISPR-associated transposons in that they both likely rely on TniQ-dependent assembly of TnsC. However, in type I systems TniQ is an integral component of the Cascade complex (Halpin-Healy et al., 2019; Jia et al., 2020; Li et al., 2020). In type V, TniQ does not form a stable complex on its own with Cas12k-sgRNA and is instead recruited cooperatively together with TnsC (and S15, as discussed below). It is nevertheless possible that TniQ may also interact with TnsC in a Cas12k-independent manner, which might lead to off-target transposon recruitment and thus explain why type V CRISPR-transposon systems appear to be less specific than type I systems and more prone to off-target transposon insertion (Strecker et al., 2019; Vo et al., 2021a).
In view of the transposon recruitment complex structure, the previously characterized distance between the Cas12k target site and the transposon insertion site (60-66 bp downstream from the PAM) implies that formation of four complete helical turns of the TnsC filament is required for TnsB
recruitment and transposon insertion. As TnsC assembles into hexameric rings on dsDNA in the presence of ADP (Park et al., 2021), it is thought that TnsB-stimulated disassembly of TnsC filaments would result in a TnsC hexamer remaining bound to the Cas12k-TniQ R-loop complex, and the physical footprint of the resulting complex would explain the distance requirement for insertion site selection. However, modeling the ADP-bound TnsC hexamer onto the Cas12k-TnsC transposon recruitment complex suggests that the insertion site would be located approximately 40-46 bp from the edge of the TnsC hexamer. It is possible that the discrepancy might be due to the large physical footprint of the TnsB-transposon complex, as indicated by a recent structural study of the Tn7 TnsB bound to transposon end DNA (Kaczmarska et al., 2022). The molecular ruler mechanism determining the distance between the Cas12k target and transposon insertion sites thus remains unclear.
Finally, our structural and functional data indicate that the bacterial ribosomal protein S15 is an integral component of the type V CRISPR-transposon system, as it allosterically stimulates the assembly of TniQ and, indirectly, TnsC in the Cas12k-transposon recruitment complex, thereby enhancing RNA-guided transposition in vitro. The involvement of a host-encoded “housekeeping” factor in the activity of a CRISPR effector complex is so far unprecedented. These findings have important implications for the genome engineering application of CRISPR-associated transposons. Type V systems have not yet been demonstrated to support RNA-guided transposition in eukaryotic cells, and it is conceivable that this is because the functional parts list of type V CRISPR-transposons has hitherto been incomplete. In sum, this work sheds light on a fundamental step in the biological mechanism of CRISPR-associated transposon systems and lays the mechanistic foundation for their development as next-generation genome engineering technologies.
Formats
For implementation of type V CRISPR-associated transposon (CAST) systems for genome editing in mammalian/human (or more generally eukaryotic) cells, the components of the system can be delivered in several formats. These include delivery of ribonucleoprotein (RNP) complexes comprising recombinant CAST proteins and synthetic or in vitro transcribed guide RNAs, delivered into cells alongside a DNA vector encoding the cargo sequence to be inserted, flanked by left-end (LE) and right-end (RE) transposon DNA sequences. Alternatively, the protein and guide RNA components of the CAST system can be delivered and expressed in the form of in vitro transcribed mRNA. Finally, the components can be delivered in the form of DNA expression vectors, including DNA plasmids or viral vectors such as lentiviral, adenoviral or adeno-associated viral vectors.
For both mRNA- and DNA-based delivery into the target cell, e.g. a human cell, the protein-coding sequences of the individual components are codon-optimized.
Figure 12 provides a non-exhaustive list of examples of expression constructs for the individual components of the type V CRISPR-transposon system.
In DNA-based vectors, the expression of CAST protein-coding sequences is driven by a eukaryotic promoter, for example the human cytomegalovirus (CMV) promoter (Figure 12, constructs m1-m5 and p). Expression of the guide RNA is driven by an RNA polymerase III promoter such as the U6 promoter (Figure 12, r1 and r2). In certain embodiments, the RNA expression construct contains a 3’-terminal sequence encoding the hepatitis delta virus (HDV) self-cleaving ribozyme to facilitate transcriptional termination and proper processing of the transcribed RNA (Figure 12, r1).
In certain embodiments, the individual protein components of the CAST system are delivered and expressed from monocistronic constructs (Figure 12, m1-m5). In other embodiments, the components are expressed from a polycistronic DNA construct in which the protein-coding DNA sequences are linked with sequences encoding self-cleaving peptides such as the T2A peptide (Figure 12, p).
To enable nuclear localization of the protein components, the proteins are expressed as fusion proteins in which the native polypeptide sequence is fused to a eukaryotic nuclear localization signal (NLS) peptide. For some protein components, e.g. TnsB, TnsC and S15, the NLS is fused to the amino- (N-) terminus of the native polypeptide sequence. For other proteins, e.g. Cas12k and TniQ, the NLS is fused to the carboxy- (C-) terminus.
The donor DNA containing the sequence of interest to be inserted (e.g. coding sequences for therapeutic proteins or chimeric antigen receptors) is designed so that the insert sequence is flanked by appropriate left-end (LE) and right-end (RE) transposon DNA sequences recognized by the TnsB transposase of the CAST system (Figure 12, d).
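The construct layouts described above can be summarised as ordered lists of elements. The following is only an illustrative sketch: the NLS placement per protein, the CMV promoter, the T2A linker and the LE/RE flanks come from the text, whereas the function names and the gene order in the polycistronic example are hypothetical and are not taken from Figure 12.

```python
# NLS placement per protein, as stated in the text above.
NLS_TERMINUS = {"TnsB": "N", "TnsC": "N", "S15": "N", "Cas12k": "C", "TniQ": "C"}


def orf_elements(protein):
    """Return the ordered parts of one NLS-fused open reading frame."""
    if NLS_TERMINUS[protein] == "N":
        return ["NLS", protein]
    return [protein, "NLS"]


def monocistronic(protein, promoter="CMV"):
    """One eukaryotic promoter driving a single NLS-fused protein."""
    return [promoter] + orf_elements(protein)


def polycistronic(proteins, promoter="CMV", linker="T2A"):
    """One promoter driving several ORFs joined by self-cleaving peptide linkers."""
    parts = [promoter]
    for index, protein in enumerate(proteins):
        if index > 0:
            parts.append(linker)
        parts.extend(orf_elements(protein))
    return parts


def donor(cargo):
    """Cargo flanked by the transposon left-end and right-end sequences."""
    return ["LE", cargo, "RE"]


# Hypothetical gene order, chosen only for illustration.
GENE_ORDER = ["Cas12k", "TnsB", "TnsC", "TniQ", "S15"]
print(" | ".join(polycistronic(GENE_ORDER)))
print(" | ".join(donor("cargo")))
```

Keeping the NLS orientation in a single lookup table means the monocistronic and polycistronic layouts cannot disagree about which terminus carries the NLS.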
Experimental Methods
Plasmid DNA constructs
The DNA sequences of Scytonema hofmanni Cas12k (WP_029636312.1), TnsC (WP_029636336.1), TniQ (WP_029636334.1), S15 (WP_029633173.1) and Escherichia coli S15 (AP009048.1) proteins were codon optimized for heterologous expression in Escherichia coli (E. coli) and synthesized by GeneArt (Thermo Fisher Scientific). The ShCas12k gene was inserted into the 1B (Addgene 29653) and 1R (Addgene 29664) plasmids using ligation-independent cloning (LIC), resulting in constructs carrying an N-terminal hexahistidine tag followed by a tobacco etch virus (TEV) protease cleavage site and an N-terminal hexahistidine-StrepII tag followed by a TEV cleavage site, respectively. The ShTnsC gene was inserted into the 1S (Addgene 29659) plasmid to produce a construct carrying an N-terminal hexahistidine and hexahistidine-SUMO tag followed by a TEV cleavage site, and the ShTniQ gene was cloned into the 1C (Addgene 29659) vector, generating a construct carrying a hexahistidine-maltose binding protein (6xHis-MBP) tag followed by a TEV cleavage site. The EcS15 and ShS15 genes were inserted into the 1C vector. Point mutations were introduced by Gibson assembly using gBlock gene fragments synthesized by IDT or annealed oligonucleotides provided by Sigma as inserts. The pDonor (Addgene 127921), pHelper (Addgene 127924), and pTarget (Addgene 127926) plasmids used in droplet digital PCR experiments were sourced from Addgene. The PSP1-targeting spacer was cloned into pHelper by Gibson assembly, yielding pHelper-PSP1. Mutations in the sgRNA or in the sequence of the ShCas12k and ShTnsC genes were introduced into the pHelper plasmid by Gibson assembly. Plasmids were cloned and propagated in Mach1 cells (Thermo Fisher Scientific) with the exception of pHelper, which was grown in One Shot PIR1 cells (Thermo Fisher Scientific). Plasmids were purified using the GeneJET plasmid miniprep kit (Thermo Fisher Scientific) and verified by Sanger sequencing.
Amino acid sequences and DNA sequences, optimized for eukaryotic expression of the recombinant bacterial proteins with nuclear localisation sequences and purification tags, are given in SEQ ID NO 001 to 012. Sequences for sgRNA and full polycistronic expression constructs are given in SEQ ID NO 013 to 020.
Protein expression and purification
For expression of ShCas12k constructs, hexahistidine-StrepII-tagged and hexahistidine-tagged ShCas12k proteins were expressed in E. coli BL21 Star (DE3) cells. Cell cultures were grown at 37 °C shaking at 100 rpm until reaching an OD600 of 0.6 and protein expression was induced with 0.4 mM IPTG (isopropyl-β-D-thiogalactopyranoside) and continued for 16 h at 18 °C. Harvested cells were resuspended in 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 5 mM imidazole, 1 µg/mL pepstatin, 200 µg/mL AEBSF and lysed in a Maximator cell homogenizer at 2,000 bar and 4 °C. The lysate was cleared by centrifugation at 40,000 × g for 30 min at 4 °C and applied to two 5 mL Ni-NTA cartridges connected in tandem. The column was washed with 100 mL of 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 5 mM imidazole before elution with 50 mL of 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 250 mM imidazole. Elution fractions were pooled and dialyzed overnight against 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM EDTA, 1 mM DTT in the presence of tobacco etch virus (TEV) protease. Dialyzed proteins were loaded onto a 5 mL HiTrap Heparin HP column (GE Healthcare) and eluted with a linear gradient of 20 mM HEPES-KOH pH 7.5, 1 M KCl. Elution fractions were pooled, concentrated using 30,000 molecular weight cut-off centrifugal filters (Merck Millipore) and further purified by size exclusion chromatography using a Superdex 200 (16/600) column (GE Healthcare) in 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM DTT, yielding pure, monodisperse proteins. Purified proteins were concentrated to 10-15 mg mL⁻¹, flash frozen in liquid nitrogen and stored at -80 °C until further use.
Expression of wild-type ShTnsC and ShTniQ was performed in E. coli BL21 Rosetta2 (DE3) cells. Cells were grown in LB medium until reaching an OD600 of 0.6 and expression was induced by addition of 0.4 mM IPTG. Proteins were expressed at 18 °C for 16 h. For ShTnsC, the cells were harvested and resuspended in lysis buffer containing 20 mM Tris-HCl pH 7.5, 500 mM NaCl, 5% glycerol and 10 mM imidazole supplemented with EDTA-free protease inhibitor (Roche). The cell suspension was lysed by ultrasonication and the lysate was cleared by centrifugation at 40,000 × g for 40 min. Cleared lysate was applied to a 5 mL Ni-NTA cartridge (Qiagen). The column was washed in two steps with lysis buffer supplemented with 25 mM and 100 mM imidazole, and bound proteins were eluted with 25 mL of the same buffer supplemented with 500 mM imidazole pH 7.5. Eluted proteins were dialysed overnight against 20 mM Tris-HCl pH 7.5, 250 mM NaCl, 5% glycerol, 1 mM DTT in the presence of TEV protease. The protein was further purified using a 5 mL HiTrap Heparin HP column (GE Healthcare) and eluted with a buffer containing 20 mM Tris-HCl pH 7.5, 700 mM NaCl, 5% glycerol and 1 mM DTT. The eluted fraction was concentrated and further purified by size exclusion chromatography using a Superdex 200 Increase (10/300 GL) column (GE Healthcare) in 20 mM Tris-HCl pH 7.5, 500 mM NaCl, 1 mM DTT, yielding pure, monodisperse proteins. Purified ShTnsC was concentrated to 1-2 mg mL⁻¹ using 30,000 Da molecular weight cut-off centrifugal filters (Merck Millipore) and flash-frozen in liquid nitrogen.
For ShTniQ, cells were harvested and resuspended in 20 mM Tris-HCl pH 7.8, 500 mM NaCl, 5% glycerol and 5 mM imidazole supplemented with EDTA-free protease inhibitor (Roche) and lysed by ultrasonication. The cleared lysate was applied to a 5 mL Ni-NTA cartridge (Qiagen) and the column was washed in two steps with lysis buffer supplemented with 25 and 50 mM imidazole. The protein was eluted with lysis buffer supplemented with 300 mM imidazole. Eluted protein was dialysed overnight against 20 mM Tris-HCl pH 7.8, 500 mM NaCl, 1 mM DTT in the presence of TEV protease. Dialysed protein was passed through a 5 mL MBPTrap column (GE Healthcare). The flow-through fraction was concentrated and further purified by size exclusion chromatography using a Superdex 200 (16/600) column (GE Healthcare) in 20 mM Tris-HCl pH 7.8, 250 mM NaCl, 1 mM DTT. Purified ShTniQ was concentrated to 10 mg mL⁻¹ using 10,000 Da molecular weight cut-off centrifugal filters (Merck Millipore) and flash-frozen in liquid nitrogen. To produce recombinant ShTniQ protein free of the E. coli S15 contaminant, the protocol was adjusted as follows. Cells were harvested and resuspended in 20 mM Tris-HCl pH 7.5, 500 mM NaCl, 5% glycerol and 5 mM imidazole supplemented with EDTA-free protease inhibitor (Roche) and lysed by ultrasonication. The lysate was cleared by centrifugation at 40,000 × g for 30 min at 4 °C and applied to two 5 mL Ni-NTA cartridges connected in tandem. The column was washed with 150 mL of 20 mM Tris-HCl pH 7.5, 500 mM NaCl, 5 mM imidazole, 5% glycerol before elution with 50 mL of 20 mM Tris-HCl pH 7.5, 500 mM NaCl, 500 mM imidazole, 5% glycerol into two 5 mL MBPTrap cartridges (GE Healthcare) connected in tandem before removal of the Ni-NTA cartridges. The MBPTrap column was washed with 50 mL of 20 mM Tris-HCl pH 7.5, 500 mM NaCl, 500 mM imidazole, 5% glycerol before elution with 50 mL of 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 500 mM imidazole, 5% glycerol, 10 mM maltose.
Elution fractions were pooled and dialyzed overnight against 20 mM HEPES-KOH pH 7.5, 100 mM KCl, 1 mM DTT in the presence of tobacco etch virus (TEV) protease. Dialyzed proteins were loaded onto two 5 mL HiTrap Heparin HP columns (GE Healthcare) connected in tandem, and eluted with a linear gradient of 20 mM HEPES-KOH pH 7.5, 1 M KCl. Elution fractions were pooled, concentrated using 3,000 molecular weight cut-off centrifugal filters (Merck Millipore) and further purified by size exclusion chromatography using a Superdex 75 (16/600) column (GE Healthcare) in 20 mM Tris-HCl pH 7.5, 250 mM NaCl, 1 mM DTT, yielding pure, monodisperse proteins. Purified proteins were concentrated to 1.5-5.0 mg mL⁻¹, flash frozen in liquid nitrogen and stored at -80 °C until further use. The absence of E. coli S15 in the purified ShTniQ produced using the revised protocol was confirmed by mass spectrometry as described below.
For expression of EcS15 and ShS15, hexahistidine-MBP-tagged proteins were expressed in E. coli BL21 Rosetta2 (DE3) cells. Cell cultures were grown at 37 °C shaking at 100 rpm until reaching an OD600 of 0.6 and protein expression was induced with 0.4 mM IPTG (isopropyl-β-D-thiogalactopyranoside) and continued for 16 h at 18 °C. Harvested cells were resuspended in 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 5 mM imidazole, 1 µg/mL pepstatin, 200 µg/mL AEBSF and lysed by ultrasonication at 4 °C. The lysate was cleared by centrifugation at 40,000 × g for 30 min at 4 °C and applied to two 5 mL Ni-NTA cartridges connected in tandem. The column was washed with 100 mL of 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 5 mM imidazole before elution with 50 mL of 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 250 mM imidazole. Elution fractions were pooled and dialyzed overnight against 20 mM HEPES-KOH pH 7.5, 150 mM KCl, 1 mM DTT in the presence of tobacco etch virus (TEV) protease. Dialyzed proteins were loaded onto a 5 mL HiTrap Heparin HP column (GE Healthcare) and eluted with a linear gradient of 20 mM HEPES-KOH pH 7.5, 1 M KCl. Elution fractions were pooled, concentrated using 3,000 molecular weight cut-off centrifugal filters (Merck Millipore) and further purified by size exclusion chromatography using a Superdex 200 (16/600) column (GE Healthcare) in 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM DTT, yielding pure, monodisperse proteins. Purified proteins were concentrated to 2-5 mg mL⁻¹, flash frozen in liquid nitrogen and stored at -80 °C until further use.
Mass spectrometry analysis
Samples resulting from purification of TniQ following Protocol 1 were treated as follows prior to digestion. Proteins were precipitated with trichloroacetic acid (TCA; Sigma-Aldrich) at a final concentration of 5% and washed twice with ice-cold acetone. The protein pellet was resuspended in 45 µL of digestion buffer (10 mM Tris, 2 mM CaCl2, pH 8.2). Samples resulting from purification of TniQ following Protocol 2 were directly diluted in digestion buffer (10 mM Tris, 2 mM CaCl2, pH 8.2), reduced with 2 mM TCEP (tris(2-carboxyethyl)phosphine) and alkylated with 15 mM chloroacetamide at 30 °C for 30 min. Digestion was performed for all samples using the same procedure: 500 ng of Sequencing Grade Trypsin (Promega) were added for digestion carried out in a microwave instrument (Discover System, CEM) for 30 min at 5 W and 60 °C. The samples were dried to completeness and re-solubilized in 20 µL of MS sample buffer (3% acetonitrile, 0.1% formic acid).
LC-MS/MS analysis was performed on a Q Exactive HF mass spectrometer (Thermo Scientific) equipped with a Digital PicoView source (New Objective) and coupled to an M-Class UPLC (Waters). Solvent composition at the two channels was 0.1% formic acid for channel A and 0.1% formic acid, 99.9% acetonitrile for channel B. For each sample, 80 ng of peptides were loaded on a commercial ACQUITY UPLC M-Class Symmetry C18 Trap Column (100 Å, 5 µm, 180 µm × 20 mm, Waters) connected to an ACQUITY UPLC M-Class HSS T3 Column (100 Å, 1.8 µm, 75 µm × 250 mm, Waters). The peptides were eluted at a flow rate of 300 nL/min. After a 3 min initial hold at 5% B, a gradient from 5 to 35% B in 60 min and 35 to 40% B in an additional 5 min was applied. The column was cleaned after the run by increasing to 95% B and holding 95% B for 10 min prior to re-establishing the loading condition.
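The elution gradient described above is piecewise linear and can be written down as a small interpolation function. This is a minimal sketch, assuming that the 3 min hold starts at t = 0 and leaving out the 95% B wash, because the text does not state how quickly the column is ramped to 95% B; the names `BREAKPOINTS` and `percent_b` are illustrative only.

```python
# Gradient breakpoints as (time in min, % solvent B), taken from the text above.
# Assumption: the initial 3 min hold starts at t = 0. The 95 % B wash step is
# omitted because the ramp time to 95 % B is not given.
BREAKPOINTS = [(0.0, 5.0), (3.0, 5.0), (63.0, 35.0), (68.0, 40.0)]


def percent_b(t_min):
    """Linearly interpolate % B at time t_min within the analytical gradient."""
    if not BREAKPOINTS[0][0] <= t_min <= BREAKPOINTS[-1][0]:
        raise ValueError("time outside the analytical gradient")
    for (t0, b0), (t1, b1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for t in (0.0, 3.0, 33.0, 63.0, 68.0):
    print(f"t = {t:5.1f} min -> {percent_b(t):.1f} % B")
```

Under these assumptions the analytical part of the run lasts 68 min, with the slope dropping from 0.5% B per minute in the main segment to 1% B per minute over the final 5 min.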
The mass spectrometer was operated in data-dependent mode (DDA), with the funnel RF level at 60% and the heated capillary temperature at 275 °C. Full-scan MS spectra (350-1500 m/z) were acquired at a resolution of 120’000 for the sample from Protocol 2 and of 70’000 for the sample from Protocol 1 at 200 m/z after accumulation to a target value of 3’000’000, followed by higher-energy collisional dissociation (HCD) fragmentation on the twelve most intense signals per cycle. Ions were isolated with a 1.2 m/z isolation window and fragmented by HCD using a normalized collision energy of 28% for the sample from Protocol 2 and of 25% for the sample from Protocol 1. HCD spectra were acquired at a resolution of 30’000 and a maximum injection time of 50 ms for the sample from Protocol 2 and at a resolution of 35’000 and a maximum injection time of 120 ms for the sample from Protocol 1. The automatic gain control (AGC) was set to 100’000 ions. Charge state screening was enabled, and singly charged and unassigned charge states were rejected. Precursor masses previously selected for MS/MS measurement were excluded from further selection for 30 s, and the exclusion window tolerance was set at 10 ppm. The samples were acquired using internal lock mass calibration on m/z 371.1010 and 445.1200. The mass spectrometry proteomics data were handled using the local laboratory information management system (LIMS).
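A tolerance given in ppm is relative to the m/z value, so its absolute width grows with m/z. The sketch below shows the conversion and a simple exclusion check. The function names are hypothetical, and the two lock-mass m/z values are reused here only as convenient example inputs; the text applies the 10 ppm tolerance to dynamic exclusion of precursors, not to the lock masses.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds of a symmetric ppm tolerance window."""
    half_width = mz * tol_ppm / 1e6
    return (mz - half_width, mz + half_width)


def is_excluded(candidate_mz, previous_mz, tol_ppm=10.0):
    """True if candidate_mz falls inside the window around a previous precursor."""
    low, high = ppm_window(previous_mz, tol_ppm)
    return low <= candidate_mz <= high


# Example inputs only: the two lock-mass values quoted in the text.
for mz in (371.1010, 445.1200):
    low, high = ppm_window(mz, 10.0)
    print(f"m/z {mz:.4f}: 10 ppm window = {low:.4f} to {high:.4f}")
```

At m/z 445.12 the window is about ±0.0045 m/z wide, which illustrates why a ppm tolerance is preferred over a fixed absolute tolerance across a 350-1500 m/z scan range.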
The acquired raw MS data were processed by MaxQuant (version 1.6.2.3), followed by protein identification using the integrated Andromeda search engine. Spectra were searched against a Uniprot E. coli proteome database (UP000000625, taxonomy 83333, version from 2021-07-12), concatenated to its reversed decoy FASTA database and the sequences of the proteins of interest. Acetyl (Protein N-term), oxidation (M) and deamidation (NQ) were set as variable modifications. Enzyme specificity was set to trypsin/P, allowing a minimal peptide length of 7 amino acids and a maximum of two missed cleavages. MaxQuant Orbitrap default search settings were used. The maximum false discovery rate (FDR) was set to 0.01 for peptides and 0.05 for proteins. Peptide identifications were accepted if they achieved an FDR of less than 0.1% by the Scaffold Local FDR algorithm. Protein identifications were accepted if they achieved an FDR of less than 1.0% and contained at least 2 identified peptides.
sgRNA preparation
A DNA sequence encoding the T7 RNA polymerase promoter upstream of the ShCas12k-sgRNA was sourced as a gBlock (IDT), cloned into a pUC19 plasmid using restriction digest with BamHI and EcoRI, and confirmed by Sanger sequencing. The sequence encoding the T7 RNA polymerase promoter and sgRNA was amplified by PCR and purified by ethanol precipitation for use as a template for in vitro transcription with T7 RNA polymerase as described previously. The transcribed RNA was gel purified, precipitated with 70% (v/v) ethanol, dried and dissolved in nuclease-free water.
Cryo-EM sample preparation and data collection
The ShCas12k-transposon recruitment complex was generated by stepwise assembly on Strep-Tactin matrix as follows. First, the sgRNA was mixed with hexahistidine-StrepII-tagged ShCas12k (Strep-Cas12k) in assembly buffer (20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT) and incubated for 20 min at 37 °C to allow formation of a binary Strep-Cas12k-sgRNA complex. Next, a target dsDNA duplex was added to the reaction in a Strep-Cas12k:sgRNA:dsDNA molar ratio of 1:1.2:2 and incubated for 20 min at 37 °C, yielding an assembled Strep-Cas12k-sgRNA-target DNA complex. The final 30 µL reaction contained 20 µg (at a concentration of 8.4 µM, 0.7 mg/mL) Strep-Cas12k, 10.1 µM sgRNA, and 16.9 µM DNA in assembly buffer. The resulting sample was then mixed with 25 µL Strep-Tactin beads (iba) equilibrated in pull-down wash buffer 1 (20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT, 0.05% Tween20) and incubated for 30 min at 4 °C on a rotating wheel. The beads were washed three times with pull-down wash buffer 1 to remove excess nucleic acids. The beads were resuspended in 250 µL of pull-down wash buffer 1 before ShTniQ (purified following Protocol 1 and thus containing co-purifying E. coli S15) and ShTnsC were added in 30-fold and 10-fold molar excess, respectively. ATP was added to a final concentration of 1 mM. The sample was incubated for 30 min at 37 °C, then washed three times with pull-down wash buffer 2 (20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT, 1 mM ATP, 0.05% Tween20) and eluted with pull-down elution buffer (20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT, 1 mM ATP, 5 mM desthiobiotin). The eluted sample was analyzed by SDS-PAGE using Any kDa gradient polyacrylamide gels (Bio-Rad) stained with Coomassie Brilliant Blue and by denaturing PAGE on a 10% polyacrylamide-7 M urea gel upon proteinase K digestion for 15 min at 37 °C prior to preparation of cryo-EM grids. Of note, recombinantly produced S15 was not added to the sample subjected to structural analysis. S15 was identified as a component of the Cas12k-transposon recruitment complex after structure determination and model building and was confirmed to have co-purified with TniQ using mass spectrometry as described above. Prior to structural analysis by cryo-EM, complex homogeneity was assessed by negative stain electron microscopy. Negative stain EM grids were prepared as described above using a sample that was diluted 40-fold compared to the one used for cryo-EM. For preparation of cryo-EM grids, 2.5 µL of sample was applied to glow-discharged 200-mesh copper 2 nm C R1.2/1.3 cryo-EM grids (Quantifoil Micro Tools), blotted for 3 s at 75% humidity and 4 °C, plunge frozen in liquid ethane (using a Vitrobot Mark IV plunger, FEI) and stored in liquid nitrogen. Cryo-EM data collection was performed on a FEI Titan Krios G3i microscope (University of Zurich, Switzerland) operated at 300 kV and equipped with a Gatan K3 direct electron detector in super-resolution counting mode. A total of 16,165 movies were recorded at a calibrated magnification of 130,000 ×, resulting in a super-resolution pixel size of 0.325 Å. Each movie comprised 36 subframes with a total dose of 67.68 e⁻/Å². Data acquisition was performed with EPU Automated Data Acquisition Software for Single Particle Analysis (ThermoFisher) with three shots per hole at -1.0 µm to -2.4 µm defocus (0.2 µm steps).
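The assembly reaction at the start of this section quotes both a molar ratio and absolute concentrations, and the two can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the molecular weight it prints is merely the value implied by the quoted mass, volume and molarity, not a figure stated in the text.

```python
# Values quoted in the text above.
CAS12K_UM = 8.4          # Strep-Cas12k concentration, µM
CAS12K_MASS_UG = 20.0    # Strep-Cas12k mass in the reaction, µg
REACTION_UL = 30.0       # final reaction volume, µL
RATIO = {"Strep-Cas12k": 1.0, "sgRNA": 1.2, "dsDNA": 2.0}


def concentrations_from_ratio(reference_um, ratio):
    """Scale a reference concentration (µM) by each component's molar ratio."""
    return {name: reference_um * factor for name, factor in ratio.items()}


def implied_mw_da(mass_ug, volume_ul, conc_um):
    """Molecular weight (Da) implied by a mass, a volume and a molar concentration."""
    grams_per_litre = mass_ug / volume_ul  # µg/µL is numerically equal to g/L
    return grams_per_litre / (conc_um * 1e-6)


for name, conc in concentrations_from_ratio(CAS12K_UM, RATIO).items():
    print(f"{name}: {conc:.2f} µM")
print(f"Mass concentration: {CAS12K_MASS_UG / REACTION_UL:.2f} mg/mL")
print(f"Implied molecular weight: {implied_mw_da(CAS12K_MASS_UG, REACTION_UL, CAS12K_UM):.0f} Da")
```

The ratio gives 10.08 µM sgRNA and 16.8 µM dsDNA, which agrees with the quoted 10.1 µM and 16.9 µM to within the rounding of the 8.4 µM reference value.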
Image processing and model building
Images were processed using the RELION and SPHIRE software packages. All movies were motion-corrected and dose-weighted with MotionCor2 (RELION). Aligned, non-dose-weighted micrographs were then used to estimate the contrast transfer function (CTF) with Gctf. Motion-corrected movies were linked to SPHIRE and the corresponding image shift information was converted to enable drift assessment in SPHIRE (0-60 Å overall drift selected). Single particles were picked with crYOLO (gmodel_phosnet_202005_N63_c17.h5) on JANNI-denoised micrographs (gmodel_janni_20190703.h5). Particle coordinates (3.1 million) were linked back to RELION, extracted with a box size of 480 pixels (binned 4-fold) and 2D-classified. All classes representing particles were selected for a 3D classification using a loosely masked volume of the Cas12k-sgRNA-dsDNA complex (PDB: 7PLA, EMDB: EMD-13486) as reference to force orientational alignment on the Cas12k-sgRNA part. A 3D class showing features extending beyond the input density was subclassified without a mask to allow for identification of sub-populations. Sub-populations with new features were used as input (masked 3D classification) to identify all corresponding particles. Classes were inspected visually before unbinned re-extraction and 3D refinement (RELION) in real scale. Two rounds (one round for the ‘artificial’ complex) of Bayesian particle polishing (RELION) and CTF refinement (RELION) prior to final refinements (RELION) resulted in a reconstruction of the full target recognition complex from 75 k particles at an overall resolution of 3.3 Å (133 k particles at an overall resolution of 4.1 Å for the ‘artificial’ complex). The local resolution was calculated based on the resulting map using the local resolution functionality (RELION) and plotted on the map using UCSF Chimera. The structure model for the Cas12k-transposon recruitment complex was built in Coot. The structures of the Cas12k-sgRNA-target DNA (PDB: 7PLA) and TnsC-TniQ-DNA were docked in the new density and used as starting model to complete the Cas12k-transposon recruitment complex. The model building revealed a well-resolved extra density between tracrRNA and Cas12k. A template model built de novo into this extra density was subjected to a DALI search, which identified the prokaryotic ribosomal protein S15 as closely related. The E. coli ribosomal protein S15 was confirmed by mass spectrometry to have co-purified with TniQ and was built into the extra density with a good fit.
The model was refined in Coot using restraints for the nucleic acids calculated with the LibG script (base pair, stacking plane and sugar pucker restraints) in CCP4 and finally refined using Phenix. Real space refinement was performed with the global minimization and atomic displacement parameter (ADP) refinement options selected. Secondary structure restraints, side chain rotamer restraints, and Ramachandran restraints were used. Key refinement statistics are listed in Table 2. The final atomic model includes Cas12k residues 1-142 and 174-636, sgRNA nucleotides 5-250, 41 nucleotides of each TS and NTS DNA, TniQ residues 9-167, S15 residues 3-87, seven TnsC molecules (each with residues 17-276), two zinc and seven magnesium cations and seven ATP molecules. The quality of the atomic model, including basic protein and DNA geometry, Ramachandran plots, clash analysis and model cross-validation, was assessed with MolProbity and the validation tools in Phenix. Structural superposition was performed in Coot using the secondary structure matching (SSM) function. Figure preparation for maps and models and calculation of map contour levels was performed using UCSF ChimeraX.
Transposition assay and droplet digital PCR analysis
All transposition experiments were performed in One Shot PIR1 E. coli cells (Thermo Fisher Scientific). The strain was first co-transformed with 20 ng each of pDonor and pTarget, and transformants were isolated by selective plating on double antibiotic LB-agar plates. Liquid cultures were then inoculated from single colonies, and the resulting strains were made chemically competent using standard methods, aliquoted and snap frozen. pHelper plasmids (20 ng) were then introduced in a new transformation reaction by heat shock, and after recovering cells in fresh LB medium at 37 °C for 1 h, cells were plated on triple antibiotic LB-agar plates containing 100 µg mL⁻¹ carbenicillin, 50 µg mL⁻¹ kanamycin, and 33 µg mL⁻¹ chloramphenicol. After growth at 37 °C for 16 h, colonies were harvested from the plates, resuspended in 15 µL lysis buffer (TE with 0.1 % Triton X-100) and heated for 5 min at 95 °C. 60 µL of water was added to the samples before centrifugation for 10 min at 16,000 × g. The supernatant was then transferred and the nucleic acid concentration adjusted to 0.3 ng µL⁻¹. 0.75 ng of template DNA was used for subsequent analysis by droplet digital PCR (ddPCR). A mixture of five primers (900 nM final concentration each), two probes (250 nM final concentration each) and template DNA was combined with ddPCR Supermix for Probes (No dUTP) (BioRad) in a final 20 µL reaction volume. Droplets were generated with 70 µL of Droplet Generation Oil for Probes (BioRad) using the QX200 Droplet Generator (BioRad). 40 µL of final sample was transferred to 96-well plates for amplification by PCR (1 cycle, 95 °C, 10 min; 40 cycles, 94 °C, 30 s, 58 °C, 1 min; 1 cycle, 98 °C, 10 min; 4 °C hold). Samples were read with a QX200 Droplet Reader, and data were analysed using the QuantaSoft software to determine the concentration of inserts and template in each reaction (Abs counting mode). On-target transposition efficiencies were calculated as inserts/(inserts+targets). All ddPCR measurements presented in the text and figures were determined from three independent biological replicates and measured in technical duplicates.
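The efficiency calculation described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the averaging scheme (mean of technical duplicates, then mean and standard deviation across biological replicates) is an assumption about how the replicates might be combined, and the numbers in the usage example are placeholders rather than measured data.

```python
from statistics import mean, stdev


def transposition_efficiency(inserts, targets):
    """On-target efficiency for one reaction: inserts / (inserts + targets)."""
    total = inserts + targets
    if total <= 0:
        raise ValueError("no insert or target molecules detected")
    return inserts / total


def summarize(replicates):
    """Combine replicate measurements.

    replicates: one entry per biological replicate, each a list of
    (inserts, targets) concentration pairs from the technical duplicates.
    Returns (mean, sample standard deviation) across biological replicates.
    """
    per_biological = [
        mean([transposition_efficiency(i, t) for i, t in technical])
        for technical in replicates
    ]
    return mean(per_biological), stdev(per_biological)


# Template volume implied by the text: 0.75 ng at 0.3 ng/uL.
template_volume_ul = 0.75 / 0.3

if __name__ == "__main__":
    # Placeholder concentrations (arbitrary units), not experimental values.
    example = [
        [(25.0, 75.0), (25.0, 75.0)],
        [(50.0, 50.0), (50.0, 50.0)],
        [(75.0, 25.0), (75.0, 25.0)],
    ]
    avg, sd = summarize(example)
    print(f"efficiency: {avg:.2%} +/- {sd:.2%}")
    print(f"template volume: {template_volume_ul:.1f} uL")
```

Because the efficiency is a ratio of two concentrations from the same reaction, it does not depend on the exact amount of template loaded, only on both species being counted in the same droplets.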
Pull-down experiments
For pull-down experiments using ShCas12k as bait, sgRNA was first mixed with hexahistidine-StrepII-tagged ShCas12k (Strep-Cas12k) in assembly buffer (20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT), and incubated for 20 min at RT to allow complex formation. A dsDNA target was then added to the reaction in a Strep-Cas12k:sgRNA:dsDNA molar ratio of 1:1.2:1.5 and incubated for 20 min at RT. The final 20 µL reaction contained 5 µg (at a concentration of 3.2 µM) Strep-Cas12k, 3.8 µM sgRNA, and 4.7 µM DNA in assembly buffer. Samples were mixed with 12.5 µL Strep-Tactin beads (iba) equilibrated in pull-down wash buffer 1 (20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT, 0.05 % Tween20) and incubated for 30 min at 4 °C on a rotating wheel. The beads were washed three times with pull-down wash buffer 2 (20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT, 1 mM ATP, 0.05 % Tween20) to remove excess nucleic acids. The beads were resuspended in 150 µL of pull-down wash buffer 2, and EcS15 and/or ShS15 and/or ShTniQ (purified according to Protocol 2 and thus S15-free) was added in 10-fold molar excess and/or ShTnsC was added in 12-fold molar excess. Samples were incubated for 20 min at RT, then washed three times with pull-down wash buffer 2 and eluted with pull-down elution buffer (20 mM HEPES-KOH pH 7.5, 250 mM KCl, 10 mM MgCl2, 1 mM DTT, 1 mM ATP, 5 mM desthiobiotin). Eluted samples were analyzed by SDS-PAGE using Any kDa gradient polyacrylamide gels (Bio-Rad) and stained with Coomassie Brilliant Blue.
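The stoichiometry of the assembly reaction can be checked with a short calculation. The sketch below is illustrative only; the helper names are hypothetical and the inputs are the bait concentration, molar ratio and reaction volume quoted above.

```python
def component_concentrations(bait_um, ratio):
    """Scale a molar ratio (bait listed first) to micromolar concentrations."""
    bait_part = ratio[0]
    return [bait_um * part / bait_part for part in ratio]


def picomoles(conc_um, volume_ul):
    """Amount of substance in pmol; 1 uM corresponds to 1 pmol per uL."""
    return conc_um * volume_ul


if __name__ == "__main__":
    # Strep-Cas12k : sgRNA : dsDNA = 1 : 1.2 : 1.5, bait at 3.2 uM in 20 uL.
    cas12k, sgrna, dsdna = component_concentrations(3.2, (1, 1.2, 1.5))
    print(f"Strep-Cas12k {cas12k:.2f} uM, sgRNA {sgrna:.2f} uM, dsDNA {dsdna:.2f} uM")
    print(f"Strep-Cas12k on beads: {picomoles(cas12k, 20):.0f} pmol")
```

Scaling the ratio from 3.2 µM bait gives 3.84 µM sgRNA and 4.8 µM dsDNA, close to the 3.8 µM and 4.7 µM stated above; the small difference presumably reflects rounding of the quoted bait concentration. The same helper can be applied to the 10-fold and 12-fold excesses used in the later binding step.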
REFERENCES
Adzuma, K., and Mizuuchi, K. (1988). Target immunity of Mu transposition reflects a differential distribution of Mu B protein. Cell 53, 257-266.
Afonine et al. (2018). Real-space refinement in PHENIX for cryo-EM and crystallography. Acta Crystallogr D Struct Biol 74, 531-544.
Anders et al. (2015). In Vitro Reconstitution and Crystallization of Cas9 Endonuclease Bound to a Guide RNA and a DNA Target. Methods Enzymol 558, 515-537.
Brown et al. (2015). Tools for macromolecular model building and refinement into electron cryo-microscopy reconstructions. Acta Crystallogr D Biol Crystallogr 71, 136-153.
Chen et al. (2010). MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr 66, 12-21.
Cox et al. (2008). MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26, 1367-1372.
Emsley et al. (2010). Features and development of Coot. Acta Crystallogr D Biol Crystallogr 66, 486-501.
Faure et al. (2019). CRISPR-Cas in mobile genetic elements: counter-defence and beyond. Nat Rev Microbiol 17, 513-525.
Greene et al. (2002). Target immunity during Mu DNA transposition. Transpososome assembly and DNA looping enhance MuA-mediated disassembly of the MuB target complex. Mol Cell 10, 1367-1378.
Halpin-Healy et al. (2019). Structural basis of DNA targeting by a transposon-encoded CRISPR-Cas system. bioRxiv, 706143.
Holm, L. (2022). Dali server: structural unification of protein families. Nucleic Acids Res.
Jia et al. (2020). Structure-function insights into the initial step of DNA integration by a CRISPR-Cas-Transposon complex. Cell Res 30, 182-184.
Kaczmarska et al. (2022). Structural basis of transposon end recognition explains central features of Tn7 transposition systems. Mol Cell.
Klompe et al. (2022). Evolutionary and mechanistic diversity of Type I-F CRISPR-associated transposons. Mol Cell.
Klompe et al. (2019). Transposon-encoded CRISPR-Cas systems direct RNA-guided DNA integration. Nature 571, 219-225.
Koonin et al. (2017). Evolutionary Genomics of Defense Systems in Archaea and Bacteria. Annu Rev Microbiol 71, 233-261.
Krissinel, E., and Henrick, K. (2004). Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D Biol Crystallogr 60, 2256-2268.
Li, Z., Zhang, H., Xiao, R.J., and Chang, L.F. (2020). Cryo-EM structure of a type I-F CRISPR RNA guided surveillance complex bound to transposition protein TniQ. Cell Res 30, 179-181.
Liebschner et al. (2019). Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr D Struct Biol 75, 861-877.
Liu et al. (2019). CasX enzymes comprise a distinct family of RNA-guided genome editors. Nature 566, 218-223.
Moriya et al. (2017). High-resolution Single Particle Analysis from Electron Cryo-microscopy Images Using SPHIRE. J Vis Exp.
Park et al. (2021). Structural basis for target site selection in RNA-guided DNA transposition systems. Science 373, 768-774.
Petassi, M.T., Hsieh, S.C., and Peters, J.E. (2020). Guide RNA Categorization Enables Target Site Choice in Tn7-CRISPR-Cas Transposons. Cell 183, 1757-1771 e1718.
Peters, J.E., and Craig, N.L. (2001). Tn7: smarter than we thought. Nat Rev Mol Cell Biol 2, 806-814.
Peters, J.E., Makarova, K.S., Shmakov, S., and Koonin, E.V. (2017). Recruitment of CRISPR-Cas systems by Tn7-like transposons. Proc Natl Acad Sci U S A 114, E7358-E7366.
Pettersen et al. (2004). UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem 25, 1605-1612.
Pettersen et al. (2021). UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Sci 30, 70-82.
Prisant et al. (2020). New tools in MolProbity validation: CaBLAM for CryoEM backbone, UnDowser to rethink "waters," and NGL Viewer to recapture online 3D graphics. Protein Sci 29, 315-329.
Punjani, A., Rubinstein, J.L., Fleet, D.J., and Brubaker, M.A. (2017). cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat Methods 14, 290-296.
Querques, I., Schmitz, M., Oberli, S., Chanez, C., and Jinek, M. (2021). Target site selection and remodelling by type V CRISPR-transposon systems. Nature 599, 497-502.
Rae, C.D., Gordiyenko, Y., and Ramakrishnan, V. (2019). How a circularized tmRNA moves through the ribosome. Science 363, 740-744.
Rubin et al. (2022). Species- and site-specific genome editing in complex bacterial communities. Nat Microbiol 7, 34-47.
Rybarski, J.R., Hu, K., Hill, A.M., Wilke, C.O., and Finkelstein, I.J. (2021). Metagenomic discovery of CRISPR-associated transposons. Proc Natl Acad Sci U S A 118.
Saito et al. (2021). Dual modes of CRISPR-associated transposon homing. Cell 184, 2441-2453 e2418.
Skelding et al. (2003). Alternative interactions between the Tn7 transposase and the Tn7 target DNA binding protein regulate target immunity and transposition. EMBO J 22, 5904-5917.
Sorek, R., Lawrence, C.M., and Wiedenheft, B. (2013). CRISPR-mediated adaptive immune systems in bacteria and archaea. Annu Rev Biochem 82, 237-266.
Strecker et al. (2019). RNA-guided DNA insertion with CRISPR-associated transposases. Science 365, 48-53.
Turker et al. (2011). Life sciences data and application integration with B-fabric. J Integr Bioinform 8, 159.
Vo, P.L.H., Acree, C., Smith, M.L., and Sternberg, S.H. (2021a). Unbiased profiling of CRISPR RNA-guided transposition products by long-read sequencing. Mob DNA 12, 13.
Vo et al. (2021b). CRISPR RNA-guided integrases for high-efficiency, multiplexed bacterial genome engineering. Nat Biotechnol 39, 480-489.
Xiao et al. (2021). Structural basis of target DNA recognition by CRISPR-Cas12k for RNA-guided DNA transposition. Mol Cell 81, 4457-4466 e4455.
Zhang, K. (2016). Gctf: Real-time CTF determination and correction. J Struct Biol 193, 1-12.
Zhang et al. (2020). Multicopy Chromosomal Integration Using CRISPR-Associated Transposases. ACS Synth Biol 9, 1998-2008.
Zheng et al. (2017). MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy. Nat Methods 14, 331-332.
Table 2 (S1) | Cryo-EM data collection, refinement and validation statistics for the ShCas12k-sgRNA-dsDNA-S15-TniQ-TnsC complex structure and the TnsC-ATP-dsDNA-TniQ structure
| | Cas12k-sgRNA-dsDNA-S15-TniQ-TnsC (EMDB-xxxx) (PDB xxx) | TnsC-ATP-dsDNA-TniQ (EMDB-yyyy) (PDB yyy) |
|---|---|---|
| Data collection and processing | | |
| Magnification | 130,000 | 130,000 |
| Voltage (kV) | 300 | 300 |
| Electron exposure (e⁻/Å²) | 67.68 | 66.036 |
| Defocus range (µm) | -1.0 to -2.4 (0.2 steps) | -1.0 to -2.4 (0.2 steps) |
| Pixel size (Å) | 0.325 | 0.325 |
| Symmetry imposed | C1 | C1 |
| Initial particle images (no.) | 16,165 | 10,436 |
| Final particle images (no.) | 15,296 | 10,176 |
| Map resolution (Å) | 3.28 | 3.48 |
| FSC threshold | 0.143 | 0.143 |
| Map resolution range (Å) | 2.9-8.0 | 3.4-7.5 |
| Refinement | | |
| Initial model used (PDB code) | 7PLA & 7PLH | 7OXD & 7PLH |
| Model resolution (Å) | 3.5 | 3.5 |
| FSC threshold | 0.5 | 0.143 |
| Map sharpening B factor (Å²) | -66.35 | -47.3 |
| Model composition | | |
| Non-hydrogen atoms | 28381 | 18984 |
| Protein residues | 2669 | 2285 |
| Nucleotide residues | 328 | 32 |
| Ligands | Zn: 2, Mg: 7, ATP: 7 | Zn: 6, Mg: 7, ATP: 7 |
| B factors (Å²), min/max/mean | | |
| Protein | 0.00/449.71/194.31 | 30.00/428.64/201.00 |
| Nucleotide | 17.06/522.32/138.45 | 30.00/30.00/30.00 |
| Ligand | 126.42/309.10/243.23 | 30.00/299.56/234.94 |
| R.m.s. deviations | | |
| Bond lengths (Å) | 0.008 | 0.005 |
| Bond angles (°) | 0.692 | 1.054 |
| Validation | | |
| MolProbity score | 1.84 | 2.05 |
| Clashscore | 16.21 | 30.40 |
| Poor rotamers (%) | 0.18 | 0.64 |
| Ramachandran plot | | |
| Favored (%) | 97.36 | 97.62 |
| Allowed (%) | 2.64 | 2.38 |
| Disallowed (%) | 0.00 | 0.00 |
| Ramachandran Z score | | |
| whole | -0.24 | -0.08 |
| helix | 0.56 | 0.57 |
| sheet | -1.58 | -1.90 |
| loop | -1.09 | -0.68 |
Claims
1. An engineered nucleic acid targeting system for insertion of a donor polynucleotide sequence into a target DNA strand, the system comprising: a. one or more transposase proteins comprising a Tn7-like transposase system, or functional fragments thereof, or a nucleic acid sequence encoding such transposase proteins or functional fragments thereof; b. a Cas12k protein, or a nucleic acid sequence encoding such Cas protein; c. a guide RNA molecule capable of complexing with the Cas protein and directing sequence specific binding of the guide-Cas protein complex to a target sequence of a target DNA strand; d. a ribosomal protein S15, or a nucleic acid sequence encoding such ribosomal protein S15.
2. The engineered nucleic acid targeting system according to claim 1, wherein the transposase proteins consist of the group composed of TnsB, TnsC, and TniQ.
3. The engineered nucleic acid targeting system according to any one of the preceding claims, wherein the ribosomal protein S15 is ribosomal protein S15 from E. coli (AP009048.1) or ribosomal protein S15 from S. hofmanni (WP_029633173.1).
4. The engineered nucleic acid targeting system according to any one of the preceding claims, further comprising a donor polynucleotide (DNA) sequence to be integrated, wherein the donor polynucleotide (DNA) sequence comprises a recognition site for the recombinase and a cargo nucleic acid sequence flanked by at least one transposon end sequence.
5. A method for inserting a cargo nucleic acid sequence into a target nucleic acid sequence inside a cell, the method comprising contacting the target nucleic acid sequence with an engineered nucleic acid targeting system according to any one of claims 1 to 4.
6. The method according to claim 5, wherein the cell does not express ribosomal protein S15, or a prokaryote homologue or orthologue of S15, prior to the cell having been contacted with the engineered nucleic acid targeting system.
7. The method according to claim 5 or 6, wherein the cell is a eukaryotic cell.
8. The method according to claim 7, wherein the cell is a mammalian stem cell, particularly wherein the method is practiced ex-vivo.
9. The method according to any one of claims 9 to 13, wherein the engineered nucleic targeting system is delivered into the cell by one or more expression vectors.
10. The method according to claim 9, wherein a viral vector is used, selected from the group consisting of an adeno-associated virus, an adenovirus, a herpesvirus, and a lentivirus.
11. The method according to any one of claims 5 to 10, wherein the cell is a eukaryotic cell, particularly a mammalian cell, and wherein at least one, particularly all of the Cas protein, the transposase proteins and the S15 protein carry a nuclear localization sequence peptide.
12. Use of a recombinant ribosomal protein S15, or of a nucleic acid sequence encoding said recombinant ribosomal protein S15, in a method for inserting a donor polynucleotide sequence into a target polynucleotide sequence, said method for inserting being facilitated by a CRISPR-associated transposase system.
13. The use according to claim 12, wherein the recombinant ribosomal protein is ribosomal protein S15 from E. coli (AP009048.1) or ribosomal protein S15 from S. hofmanni (WP_029633173.1), or a ribosomal protein having at least 85% sequence identity to ribosomal protein S15 from S. hofmanni (WP_029633173.1) and at least 80% of the biological identity of WP_029633173.1.
14. The use according to claim 12 or 13, wherein the CRISPR-associated transposase system comprises Cas12k and Tn7-like transposase proteins TnsB, TnsC, and TniQ.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP22178728.6 | 2022-06-13 | ||
| EP22178728 | 2022-06-13 | ||
| EP22208421.2 | 2022-11-18 | ||
| EP22208421 | 2022-11-18 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023242225A1 (en) | 2023-12-21 |
Family
ID=86896039
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2023/065861 Ceased WO2023242225A1 (en) | 2022-06-13 | 2023-06-13 | Ribosomal protein s15 in crispr transposon mediated sequence engineering |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2023242225A1 (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20200190487A1 (en) * | 2018-12-17 | 2020-06-18 | The Broad Institute, Inc. | Crispr-associated transposase systems and methods of use thereof |
| WO2023102176A1 (en) * | 2021-12-03 | 2023-06-08 | The General Hospital Corporation | Crispr-associated transposases and methods of use thereof |
Non-Patent Citations (53)
| Title |
|---|
| ADZUMA, K.MIZUUCHI, K.: "Target immunity of Mu transposition reflects a differential distribution of Mu B protein", CELL, vol. 53, 1988, pages 257 - 266, XP085762483, DOI: 10.1016/0092-8674(88)90387-X |
| AFONINE ET AL.: "Real-space refinement in PHENIX for cryo-EM and crystallography", ACTA CRYSTALLOGR D STRUCT BIOL, vol. 74, 2018, pages 531 - 544 |
| ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403 - 410 |
| ANDERS ET AL.: "In Vitro Reconstitution and Crystallization of Cas9 Endonuclease Bound to a Guide RNA and a DNA Target", METHODS ENZYMOL, vol. 558, 2015, pages 515 - 537 |
| BROWN ET AL.: "Tools for macromolecular model building and refinement into electron cryo-microscopy reconstructions", ACTA CRYSTALLOGR D BIOL CRYSTALLOGR, vol. 71, 2015, pages 136 - 153 |
| CHEN ET AL.: "MolProbity: all-atom structure validation for macromolecular crystallography", ACTA CRYSTALLOGR D BIOL CRYSTALLOGR, vol. 66, 2010, pages 12 - 21 |
| COX ET AL.: "MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification", NAT BIOTECHNOL, vol. 26, 2008, pages 1367 - 1372, XP055527588, DOI: 10.1038/nbt.1511 |
| EMSLEY ET AL.: "Features and development of Coot", ACTA CRYSTALLOGR D BIOL CRYSTALLOGR, vol. 66, 2010, pages 486 - 501, XP055950447, DOI: 10.1107/S0907444910007493 |
| FAURE ET AL.: "CRISPR-Cas in mobile genetic elements: counter-defence and beyond", NAT REV MICROBIOL, vol. 17, 2019, pages 513 - 525, XP036835511, DOI: 10.1038/s41579-019-0204-7 |
| GREENE ET AL.: "Target immunity during Mu DNA transposition. Transpososome assembly and DNA looping enhance MuA-mediated disassembly of the MuB target complex", MOL CELL, vol. 10, 2002, pages 1367 - 1378 |
| HALPIN-HEALY ET AL.: "Structural basis of DNA targeting by a transposon-encoded CRISPR-Cas system", BIORXIV, 2019, pages 706143 |
| HOLM, L.: "Dali server: structural unification of protein families", NUCLEIC ACIDS RES, 2022 |
| JIA ET AL.: "Structure-function insights into the initial step of DNA integration by a CRISPR-Cas-Transposon complex", CELL RES, vol. 30, 2020, pages 182 - 184, XP037005164, DOI: 10.1038/s41422-019-0272-2 |
| JOSEPH E. PETERS ET AL: "Recruitment of CRISPR-Cas systems by Tn7-like transposons", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 114, no. 35, 15 August 2017 (2017-08-15), pages E7358 - E7366, XP055547696, ISSN: 0027-8424, DOI: 10.1073/pnas.1709035114 * |
| KACZMARSKA ET AL.: "Structural basis of transposon end recognition explains central features of Tn7 transposition systems", MOL CELL, 2022 |
| KLOMPE ET AL.: "Evolutionary and mechanistic diversity of Type I-F CRISPR-associated transposons", MOL CELL, 2022 |
| KLOMPE ET AL.: "Transposon-encoded CRISPR-Cas systems direct RNA-guided DNA integration", NATURE, vol. 571, 2019, pages 219 - 225, XP036831898, DOI: 10.1038/s41586-019-1323-z |
| KOONIN ET AL.: "Evolutionary Genomics of Defense Systems in Archaea and Bacteria", ANNU REV MICROBIOL, vol. 71, 2017, pages 233 - 261 |
| KRISSINEL, E.HENRICK, K.: "Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions", ACTA CRYSTALLOGR D BIOL CRYSTALLOGR, vol. 60, 2004, pages 2256 - 2268 |
| LI, Z.ZHANG, H.XIAO, R.JCHANG, L.F.: "Cryo-EM structure of a type I-F CRISPR RNA guided surveillance complex bound to transposition protein TniQ", CELL RES, vol. 30, 2020, pages 179 - 181, XP037005161, DOI: 10.1038/s41422-019-0268-y |
| LIEBSCHNER ET AL.: "Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix", ACTA CRYSTALLOGR D STRUCT BIOL, vol. 75, 2019, pages 861 - 877, XP072456730, DOI: 10.1107/S2059798319011471 |
| LIU ET AL.: "CasX enzymes comprise a distinct family of RNA-guided genome editors", NATURE, vol. 566, 2019, pages 218 - 223, XP036746431, DOI: 10.1038/s41586-019-0908-x |
| MORIYA ET AL.: "High-resolution Single Particle Analysis from Electron Cryo-microscopy Images Using SPHIRE", J VIS EXP, 2017 |
| NEEDLEMANWUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443 |
| PARK ET AL.: "Structural basis for target site selection in RNA-guided DNA transposition systems", SCIENCE, vol. 373, 2021, pages 768 - 774 |
| PARK JUNG-UN ET AL: "Structures of the holo CRISPR RNA-guided transposon integration complex", vol. 613, no. 7945, 28 November 2022 (2022-11-28), London, pages 775 - 782, XP093073286, ISSN: 0028-0836, Retrieved from the Internet <URL:https://www.nature.com/articles/s41586-022-05573-5> DOI: 10.1038/s41586-022-05573-5 * |
| PEARSONLIPMAN, PROC. NAT. ACAD. SCI., vol. 85, 1988, pages 2444 |
| PETASSI, M.T.HSIEH, S.C.PETERS, J.E.: "Guide RNA Categorization Enables Target Site Choice in Tn7-CRISPR-Cas Transposons", CELL, vol. 183, 2020, pages 1757 - 1771 |
| PETERS, J.E.CRAIG, N.L.: "Tn7: smarter than we thought", NAT REV MOL CELL BIOL, vol. 2, 2001, pages 806 - 814, XP055735463, DOI: 10.1038/35099006 |
| PETERS, J.E.MAKAROVA, K.S.SHMAKOV, S.KOONIN, E.V.: "Recruitment of CRISPR-Cas systems by Tn7-like transposons", PROC NATL ACAD SCI U S A, vol. 114, 2017, pages E7358 - E7366, XP055547696, DOI: 10.1073/pnas.1709035114 |
| PETTERSEN ET AL.: "UCSF Chimera--a visualization system for exploratory research and analysis", J COMPUT CHEM, vol. 25, 2004, pages 1605 - 1612 |
| PETTERSEN ET AL.: "UCSF ChimeraX: Structure visualization for researchers, educators, and developers", PROTEIN SCI, vol. 30, 2021, pages 70 - 82 |
| PRISANT ET AL.: "New tools in MolProbity validation: CaBLAM for CryoEM backbone, UnDowser to rethink ''waters,'' and NGL Viewer to recapture online 3D graphics", PROTEIN SCI, vol. 29, pages 315 - 329 |
| PUNJANI, A.RUBINSTEIN, J.L.FLEET, D.J.BRUBAKER, M.A.: "cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination", NAT METHODS, vol. 14, 2017, pages 290 - 296, XP055631965, DOI: 10.1038/nmeth.4169 |
| QUERQUES IRMA ET AL: "Target site selection and remodelling by type V CRISPR-transposon systems", NATURE, NATURE PUBLISHING GROUP UK, LONDON, vol. 599, no. 7885, 10 November 2021 (2021-11-10), pages 497 - 502, XP037621527, ISSN: 0028-0836, [retrieved on 20211110], DOI: 10.1038/S41586-021-04030-Z * |
| QUERQUES, I.SCHMITZ, M.OBERLI, S.CHANEZ, C.JINEK, M.: "Target site selection and remodelling by type V CRISPR-transposon systems", NATURE, vol. 599, 2021, pages 497 - 502, XP037621527, DOI: 10.1038/s41586-021-04030-z |
| RAE, C.D.GORDIYENKO, Y.RAMAKRISHNAN, V.: "How a circularized tmRNA moves through the ribosome.", SCIENCE, vol. 363, 2019, pages 740 - 744 |
| RUBIN ET AL.: "Species- and site-specific genome editing in complex bacterial communities", NAT MICROBIOL, vol. 7, 2022, pages 34 - 47, XP037655810, DOI: 10.1038/s41564-021-01014-7 |
| RYBARSKI, J.R.HU, K.HILL, A.M.WILKE, C.O.FINKELSTEIN, I.J.: "Metagenomic discovery of CRISPR-associated transposons", PROC NATL ACAD SCI U S A, 2021, pages 118 |
| SAITO ET AL.: "Dual modes of CRISPR-associated transposon homing", CELL, vol. 184, 2021, pages 2441 - 2453 |
| SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 2012, COLD SPRING HARBOR LABORATORY PRESS |
| SCHMITZ MICHAEL ET AL: "Structural basis for the assembly of the type V CRISPR-associated transposon complex", CELL, ELSEVIER, AMSTERDAM NL, vol. 185, no. 26, 25 November 2022 (2022-11-25), pages 4999, XP087235864, ISSN: 0092-8674, [retrieved on 20221125], DOI: 10.1016/J.CELL.2022.11.009 * |
| SKELDING ET AL.: "Alternative interactions between the Tn7 transposase and the Tn7 target DNA binding protein regulate target immunity and transposition", EMBO J, vol. 22, 2003, pages 5904 - 5917 |
| SOREK, R.LAWRENCE, C.M.WIEDENHEFT, B.: "CRISPR-mediated adaptive immune systems in bacteria and archaea", ANNU REV BIOCHEM, vol. 82, 2013, pages 237 - 266, XP055957785, DOI: 10.1146/annurev-biochem-072911-172315 |
| STRECKER ET AL.: "RNA-guided DNA insertion with CRISPR-associated transposases", SCIENCE, vol. 365, 2019, pages 48 - 53, XP055737954, DOI: 10.1126/science.aax9181 |
| TOU CONNOR J. ET AL: "Precise cut-and-paste DNA insertion using engineered type V-K CRISPR-associated transposases", vol. 41, no. 7, 2 January 2023 (2023-01-02), New York, pages 968 - 979, XP093071638, ISSN: 1087-0156, Retrieved from the Internet <URL:https://www.nature.com/articles/s41587-022-01574-x> DOI: 10.1038/s41587-022-01574-x * |
| TURKER ET AL.: "Life sciences data and application integration with B-fabric", J INTEGR BIOINFORM, vol. 8, 2011, pages 159 |
| VO ET AL.: "CRISPR RNA-guided integrases for high-efficiency, multiplexed bacterial genome engineering", NAT BIOTECHNOL, vol. 39, 2021, pages 480 - 489, XP037421720, DOI: 10.1038/s41587-020-00745-y |
| VO, P.L.H.ACREE, C.SMITH, M.L.STERNBERG, S.H.: "Unbiased profiling of CRISPR RNA-guided transposition products by long-read sequencing", MOB DNA, vol. 12, 2021, pages 13 |
| XIAO ET AL.: "Structural basis of target DNA recognition by CRISPR-Cas12k for RNA-guided DNA transposition", MOL CELL, vol. 81, 2021, pages 4457 - 4466 |
| ZHANG, K.: "Gctf: Real-time CTF determination and correction", J STRUCT BIOL, vol. 193, 2016, pages 1 - 12, XP029369162, DOI: 10.1016/j.jsb.2015.11.003 |
| ZHANGET: "Multicopy Chromosomal Integration Using CRISPR-Associated Transposases", ACS SYNTH BIOL, vol. 9, 2020, pages 1998 - 2008, XP055901921, DOI: 10.1021/acssynbio.0c00073 |
| ZHENG ET AL.: "MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy", NAT METHODS, vol. 14, 2017, pages 331 - 332 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23732549; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 23732549; Country of ref document: EP; Kind code of ref document: A1 |