WO2024192274A2 - On cas template synthesis (ocats) systems and uses thereof - Google Patents
On Cas Template Synthesis (OCATS) systems and uses thereof
- Publication number
- WO2024192274A2 (PCT/US2024/019986)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- nucleic acid
- protein
- dna
- effector protein
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y207/00—Transferases transferring phosphorus-containing groups (2.7)
- C12Y207/07—Nucleotidyltransferases (2.7.7)
- C12Y207/07049—RNA-directed DNA polymerase (2.7.7.49), i.e. telomerase or reverse-transcriptase
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/85—Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
- C12N15/86—Viral vectors
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2750/00—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA ssDNA viruses
- C12N2750/00011—Details
- C12N2750/14011—Parvoviridae
- C12N2750/14111—Dependovirus, e.g. adenoassociated viruses
- C12N2750/14141—Use of virus, viral particle or viral elements as a vector
- C12N2750/14143—Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector
Definitions
- The present disclosure relates generally to compositions and methods for genome editing.
- systems and compositions comprise an RNA guided effector protein (e.g., CRISPR associated (Cas) protein) that functions with one or more additional proteins (e.g., reverse transcriptase, DNA repair proteins, endonuclease) to produce a donor DNA at the site of target DNA cleavage.
- Homologous recombination is a notoriously inefficient process for genome editing.
- In addition to homologous recombination being downregulated in many relevant cell types, the donor DNA must also be exogenously supplied.
- Strategies such as homology independent targeted integration (HITI) presumably rely on non-homologous end joining (NHEJ) or other non-homology dependent repair mechanisms to incorporate DNA at the site of a break. These methods, too, are often inefficient.
- One potential solution to increasing the rates of homologous repair (HR) or NHEJ mediated insertions is to recruit donors to the sites of the breaks.
- OCATS: On Cas Template Synthesis
- a Cas protein and guide nucleic acid complex recruits an RT to a cleavage site, where the RT reverse transcribes a template sequence, which is contained in an extended crRNA (for certain type V Cas proteins) or extended intermediary RNA (certain type II Cas proteins), to produce a single stranded donor (ss donor) reverse-transcribed DNA (RT-DNA) that creates a free 3’ end available for DNA repair or other chemistry.
- ss donor: single stranded donor reverse-transcribed DNA
- RT-DNA: reverse-transcribed DNA
- a required primer is formed from the annealed, but non-extended intermediary RNA (for type V Cas proteins) or crRNA (certain type II Cas proteins). Synthesis occurs directly on the intermediary RNA or crRNA. See FIGS. 1 to 3.
- donor creation at the site of a Cas induced DNA break may increase the efficiency of non-homology based donor DNA insertion, or increase the likelihood of homology dependent forms of repair.
- Systems and methods disclosed herein may be used for precise, templated repair, lowering the amount of unwanted indels while increasing the level of desired genetic change.
- Retrons work by reverse transcribing an RNA sequence flanked by two secondary structural elements.
- the resulting reverse transcribed DNA contains unwanted retron elements flanking the 5’ and 3’ of the RT-DNA. While acceptable for some applications, this limits the ability of the Cas- retron RT-DNA to be used for NHEJ or non-homology mediated applications.
- systems and methods disclosed herein employ an RT-Cas protein to synthesize an RT-DNA, attached to a crRNA or intermediary RNA, without such secondary structural elements, leaving the RT-DNA's 3’ terminus available for NHEJ or other non-homology based forms of integration.
- Cas-retron systems with compact Cas nucleases deliverable by a single AAV vector are also described.
- FIG. 1 depicts an exemplary system for modifying a dsDNA target nucleic acid.
- a Type V Cas effector protein forms a ribonucleoprotein (RNP) complex with an extended crRNA and intermediary RNA.
- the RNP complex binds a target sequence of the target nucleic acid, forms an R loop at the target sequence, and cleaves at least one strand of the target sequence.
- a reverse transcriptase, which may be fused to the Type V Cas effector protein, generates an RT-DNA at the 3’ end of the intermediary RNA that is complementary to the 5’ extension sequence of the extended crRNA.
- the extension sequence may be a template sequence.
- the RT-DNA may be incorporated into the genome through cellular DNA repair machinery and/or exogenously added factors.
- FIG. 2 depicts an exemplary system for modifying a dsDNA target nucleic acid.
- a Type V Cas effector protein forms an RNP complex with an extended crRNA and intermediary RNA.
- the RNP complex binds a target sequence of the target nucleic acid, forms an R loop at the target sequence, and cleaves at least one strand of the target sequence.
- the Type V Cas effector protein may remove a portion of a single strand of the R loop (ssDNA removal) through its own enzymatic activity, by recruiting an endonuclease, or via a separate strategy.
- a reverse transcriptase, which may be fused to the Type V Cas effector protein, generates an RT-DNA at the 3’ end of the intermediary RNA that is complementary to a template sequence at the 5’ end of the extended crRNA.
- the RT may be fused to a binding moiety (e.g., antibody fragment, peptide) that interacts with a DNA repair factor such as a ligase to aid in the incorporation of the RT-DNA and/or DNA break repair or may be directly fused to the DNA repair factor itself (not shown).
- FIG. 3 depicts an exemplary system for modifying a dsDNA target nucleic acid.
- a Type V Cas effector protein forms an RNP complex with an extended crRNA and in some cases, an extended intermediary RNA.
- the extended crRNA comprises, in order of 5’ to 3’, a template sequence, a repeat sequence, and a spacer sequence that hybridizes to the target sequence on the target strand.
- the RNP complex binds a target sequence of the target nucleic acid, forms an R loop at the target sequence, and cleaves both strands of the R-loop.
- the Type V Cas effector protein or recruited endonuclease removes a portion of the displaced non-target strand of the R loop (ssDNA removal) and nicks the target strand.
- a reverse transcriptase, which may be fused to the Type V Cas effector protein, generates an RT-DNA at the 3’ end of the intermediary RNA that is complementary to a template sequence at the 5’ end of the extended crRNA.
- the RT may be fused to a binding moiety (e.g., antibody fragment, peptide) that interacts with a synthesis dependent strand annealing (SDSA) factor that incorporates a new fragment off the 3’ end of the cut target strand.
- SDSA: synthesis dependent strand annealing
- FIG. 4 depicts the results of a PAGE electrophoresis demonstrating that CasM.265466 was able to cleave target DNA in the presence of extended crRNAs and intermediary RNAs.
- FIG. 5 depicts the results of a PAGE electrophoresis demonstrating that RTs were able to synthesize DNA in the presence of CasM.265466.
- FIG. 6 depicts the results of a PAGE electrophoresis demonstrating that RTs were able to synthesize DNA in the absence of CasM.265466.
- FIG. 7 depicts the results of a PAGE electrophoresis demonstrating that CasM.265466 was still able to cleave target DNA after DNA synthesis on the intermediary RNA (intRNA).
- FIG. 8 depicts the results of a PAGE electrophoresis demonstrating that trans activity of CasM.265466 did not affect the RNA-DNA hybrid.
- FIGS. 9A and 9B depict the results of a PAGE electrophoresis demonstrating that RT was able to synthesize without the need for a longer intermediary RNA (intRNA) and in the presence of CasM.19952.
- FIG. 10 depicts the results of a PAGE electrophoresis demonstrating that trans activity of CasM.19952 did not affect the RNA-DNA hybrid.
- FIGS. 11A and 11B depict the results of a PAGE electrophoresis demonstrating that trans activity of CasM.19952 did not affect the RNA-DNA hybrid.
- FIG. 12 depicts the results of a PAGE electrophoresis demonstrating that RT was able to synthesize in the absence of spCas9.
- FIG. 13 depicts the results of a PAGE electrophoresis demonstrating that RT was able to synthesize in the presence of spCas9.
- FIG. 14 depicts the results of a PAGE electrophoresis demonstrating that spCas9 was still able to cleave target DNA after DNA synthesis on the crRNA.
- FIGS. 15A and 15B depict a schematic and the result of PAGE electrophoresis demonstrating that CasM.265466 can cleave target DNA in the presence of a normal length (non-extended) crRNA and an extended crRNA, respectively.
- FIGS. 16A and 16B depict the results of a PAGE electrophoresis demonstrating that an extended crRNA remains intact after target DNA cleavage; non-target (NT) DNA was included as a control.
- FIGS. 17A and 17B depict a schematic and the result of PAGE electrophoresis demonstrating that RT was able to synthesize RT-DNA in the presence of intRNAs and CasM.265466.
- FIG. 18 depicts the results of a PAGE electrophoresis demonstrating that a stepwise decrease in DNA length for RNase digested samples was observed due to the differing lengths of the crRNA (longer intRNA substrate, shorter cDNA produced). RNase digestion post synthesis was used to remove RNA for better visualization of DNA products.
- FIGS. 19A and 19B depict the results of a PAGE electrophoresis demonstrating that synthesized cDNA was not affected by any trans DNase activity. Addition of target DNA post synthesis did not result in trans DNase cleavage of cDNA products, and the synthesis product did not prevent proper target DNA cleavage.
- FIGS. 20A, 20B, and 20C depict a schematic and the result of PAGE electrophoresis demonstrating that RT was able to synthesize without the need for a longer intermediary RNA (intRNA) and in the presence of CasM.19952.
- intRNA: intermediary RNA
- FIG. 21 depicts the results of a PAGE electrophoresis demonstrating that a stepwise decrease in DNA length for RNase digested samples was observed due to the differing lengths of the crRNA (longer intRNA substrate, shorter cDNA produced). RNase digestion post synthesis was used to remove RNA for better visualization of DNA products.
- FIGS. 22A and 22B depict the results of a PAGE electrophoresis demonstrating that synthesized cDNA was not affected by any trans DNase activity. Addition of target DNA post synthesis did not result in trans DNase cleavage of cDNA products.
- FIGS. 23A, 23B, and 23C depict a schematic and the result of PAGE electrophoresis demonstrating that trans activity of CasM.19952 did not affect the RNA-DNA hybrid.
- FIGS. 24A and 24B depict a schematic and the result of PAGE electrophoresis demonstrating that a stepwise decrease in DNA length for RNase digested samples was observed due to the differing lengths of the crRNA (longer intRNA substrate, shorter cDNA produced). RNase digestion post synthesis was used to remove RNA for better visualization of DNA products.
- FIGS. 25A and 25B depict a schematic and the result of PAGE electrophoresis demonstrating that SpCas9 can cleave dsDNA with a dual guide system of crRNA and intRNA (intRNA 4242) and with a dual guide system containing an extended crRNA and an extended intRNA.
- FIGS. 26A and 26B depict a schematic and the result of PAGE electrophoresis demonstrating that RT was able to synthesize in the presence of spCas9.
- FIG. 27 depicts the results of a PAGE electrophoresis demonstrating that spCas9 was still able to cleave target DNA after DNA synthesis on the crRNA.
- % identical refers to the percent of residues that are identical between respective positions of two sequences when the two sequences are aligned for maximum sequence identity.
- the % identity is calculated by dividing the number of the residues that are identical between the respective positions of the at least two sequences by the total number of the aligned residues and multiplying by 100.
- computer programs can be employed for such calculations. Illustrative programs that compare and align pairs of sequences, include ALIGN (Myers and Miller, Comput Appl Biosci.
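The % identity calculation can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned by some external program and are supplied as equal-length strings, with `-` marking gaps; the function name and the choice to exclude gapped columns from the count of aligned residues are assumptions made for illustration, not something stated in the source.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumption: a column counts as "aligned" only when neither sequence
    has a gap character at that position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences contribute a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("no aligned residues to compare")
    identical = sum(1 for a, b in columns if a == b)
    # identical residues / aligned residues, expressed as a percentage
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    print(percent_identity("ACGTAC-T", "ACGAACGT"))
```

Other gap conventions (for example, counting gapped columns in the denominator) would give different values, so the convention should be stated whenever a % identity is reported.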
- % complementary refers to the percent of nucleotides in two nucleotide sequences in said nucleic acid molecules of equal length that can undergo cumulative base pairing at two or more individual corresponding positions in an antiparallel orientation. Accordingly, the terms include nucleic acid sequences that are not completely complementary over their entire length, which indicates that the two or more nucleic acid molecules include one or more mismatches. A “mismatch” is present at any position in the two opposed nucleotides that are not complementary.
- the % complementary is calculated by dividing the total number of the complementary residues by the total number of the nucleotides in one of the equal length sequences, and multiplying by 100.
- Complete or total complementarity describes nucleotide sequences in which 100% of the residues of a nucleotide sequence are complementary to residues in a reference nucleotide sequence.
- Partial complementarity describes nucleotide sequences in which at least 20%, but less than 100%, of the residues of a nucleotide sequence are complementary to residues in a reference nucleotide sequence. In some embodiments, at least 50%, but less than 100%, of the residues of a nucleotide sequence are complementary to residues in a reference nucleotide sequence.
- In some embodiments, at least 70%, 80%, 90% or 95%, but less than 100%, of the residues of a nucleotide sequence are complementary to residues in a reference nucleotide sequence.
- “Non-complementary” describes nucleotide sequences in which less than 20% of the residues of a nucleotide sequence are complementary to residues in a reference nucleotide sequence.
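The % complementary calculation and the complete / partial / non-complementary categories can be sketched as follows. This is an illustrative Python example only: the function names are hypothetical, both sequences are assumed to be written 5' to 3' and to have equal length, and only standard Watson-Crick pairs are counted.

```python
# Standard Watson-Crick pairs for DNA and RNA (G/U pairs are not counted here).
WATSON_CRICK = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def percent_complementary(seq_a: str, seq_b: str) -> float:
    """Percent of positions that base pair when seq_b is read antiparallel to seq_a."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    # Antiparallel orientation: position i of seq_a faces position (n - 1 - i) of seq_b.
    paired = sum(1 for a, b in zip(seq_a, reversed(seq_b)) if (a, b) in WATSON_CRICK)
    return 100.0 * paired / len(seq_a)


def classify_complementarity(pct: float) -> str:
    """Map a percentage onto the three categories defined in the text."""
    if pct == 100.0:
        return "complete"
    if pct >= 20.0:
        return "partial"
    return "non-complementary"


if __name__ == "__main__":
    pct = percent_complementary("AACG", "CGTA")
    print(pct, classify_complementarity(pct))
```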
- percent similarity refers to a value that is calculated by dividing a similarity score by the length of the alignment.
- the similarity of two amino acid sequences can be calculated by using a BLOSUM62 similarity matrix (Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA., 89: 10915-10919 (1992)) that is transformed so that any value ≥ 1 is replaced with +1 and any value ≤ 0 is replaced with 0.
- an Ile (I) to Leu (L) substitution is scored at +2.0 by the BLOSUM62 similarity matrix, which in the transformed matrix is scored at +1.
- a multilevel consensus sequence (or PROSITE motif sequence) can be used to identify how strongly each domain or motif is conserved.
- the second and third levels of the multilevel sequence are treated as equivalent to the top level.
- +1 point is assigned. For example, given the multilevel consensus sequence: RLG and YCK,
- the test sequence QIQ would receive three points. This is because in the transformed BLOSUM62 matrix, each combination is scored as: Q-R: +1; Q-Y: +0; I-L: +1; I-C: +0; Q-G: +0; Q-K: +1 For each position, the highest score is used when calculating similarity.
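The worked QIQ example can be reproduced with a short Python sketch. It is illustrative only: the dictionary holds just the six raw BLOSUM62 entries needed for this example rather than the full matrix, the function names are hypothetical, and the consensus levels RLG and YCK are read off the residue pairs listed in the example.

```python
# Raw BLOSUM62 scores for only the residue pairs used in the worked example.
BLOSUM62_SUBSET = {
    frozenset("QR"): 1,
    frozenset("QY"): -1,
    frozenset("IL"): 2,
    frozenset("IC"): -1,
    frozenset("QG"): -2,
    frozenset("QK"): 1,
}


def transformed_score(a: str, b: str) -> int:
    """Transformed score: raw values of 1 or more become 1, values of 0 or less become 0."""
    raw = BLOSUM62_SUBSET[frozenset((a, b))]
    return 1 if raw >= 1 else 0


def consensus_similarity(test: str, levels: list[str]) -> int:
    """Sum, over positions, of the highest transformed score against any consensus level."""
    score = 0
    for i, residue in enumerate(test):
        score += max(transformed_score(residue, level[i]) for level in levels)
    return score


if __name__ == "__main__":
    score = consensus_similarity("QIQ", ["RLG", "YCK"])
    # Percent similarity: similarity score divided by the alignment length.
    print(score, score / len("QIQ"))
```

A full implementation would load the complete BLOSUM62 matrix (including identical-residue pairs, which this subset omits) before applying the same transformation.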
- an accessory protein refers to a polypeptide that is capable of recruiting one or more repair factors, such as enzymes, that promote, increase or enable nucleic acid repair mechanisms, or to a polypeptide that is capable of promoting, increasing or enabling nucleic acid repair mechanisms.
- an accessory protein comprises a domain that is capable of recruiting one or more repair factors, such as enzymes, that promote, increase or enable a repair mechanism.
- an accessory protein comprises a polypeptide that is capable of promoting, increasing or enabling nucleic acid repair mechanisms.
- nucleic acid repair mechanisms include synthesis dependent strand annealing (SDSA), homology directed repair (HDR), or non-homologous end joining (NHEJ).
- SDSA: synthesis dependent strand annealing
- HDR: homology directed repair
- NHEJ: non-homologous end joining
- Repair mechanisms can be a native or endogenous repair mechanism, which is intended to encompass all repair mechanisms naturally existing inside a target cell, or an exogenous repair mechanism, which is intended to encompass repair mechanisms that are delivered to the target cell by systems, mechanisms, or methods described herein.
- bind refers to a non-covalent interaction between macromolecules (e.g., between two polypeptides, between a polypeptide and a nucleic acid; between a polypeptide/guide nucleic acid complex and a target nucleic acid; and the like). While in a state of noncovalent interaction, the macromolecules are said to be “associated” or “interacting” or “binding” (e.g., when a molecule X is said to interact with a molecule Y, it is meant the molecule X binds to molecule Y in a non-covalent manner).
- Non-limiting examples of non-covalent interactions are ionic bonds, hydrogen bonds, van der Waals and hydrophobic interactions. Not all components of a binding interaction need be sequence-specific (e.g., contacts with phosphate residues in a DNA backbone), but some portions of a binding interaction may be sequence-specific.
- cleavage refers to cleavage (hydrolysis of a phosphodiester bond) of a target nucleic acid by a complex of an effector protein and a guide nucleic acid (e.g., an RNP complex), wherein at least a portion of the guide nucleic acid is hybridized to at least a portion of the target nucleic acid. Cleavage may occur within or directly adjacent to the portion of the target nucleic acid that is hybridized to the portion of the guide nucleic acid.
- complementary and complementarity, in the context of a nucleic acid molecule or nucleotide sequence, refer to the characteristic of a polynucleotide having nucleotides that can undergo cumulative base pairing with their Watson-Crick counterparts (C with G; or A with T) in a reference nucleic acid in antiparallel orientation. For example, when every nucleotide in a polynucleotide or a specified portion thereof forms a base pair with every nucleotide in an equal length sequence of a reference nucleic acid, that polynucleotide is said to be 100% complementary to the sequence of the reference nucleic acid.
- the upper (sense) strand sequence is, in general, understood as going in the direction from its 5'- to 3'-end, and the complementary sequence is thus understood as the sequence of the lower (antisense) strand in the same direction as the upper strand.
- the reverse sequence is understood as the sequence of the upper strand in the direction from its 3'- to its 5'-end, while the “reverse complement” sequence or the “reverse complementary” sequence is understood as the sequence of the lower strand in the direction of its 5'- to its 3'-end.
- Each nucleotide in a double stranded DNA or RNA molecule that is paired with its Watson-Crick counterpart can be referred to as its complementary nucleotide.
- the complementarity of modified or artificial base pairs can be based on other types of hydrogen bonding and/or hydrophobicity of bases and/or shape complementarity between bases.
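The strand conventions above (complementary, reverse, and reverse complement sequences) can be made concrete with a small Python sketch. The helper names are hypothetical, and only unmodified DNA bases (A, C, G, T) are handled; modified or artificial bases would need their own pairing rules.

```python
# Translation table mapping each DNA base to its Watson-Crick counterpart.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(upper: str) -> str:
    """Lower-strand sequence read in the same direction as the upper strand."""
    return upper.translate(DNA_COMPLEMENT)


def reverse(upper: str) -> str:
    """Upper-strand sequence read from its 3'-end to its 5'-end."""
    return upper[::-1]


def reverse_complement(upper: str) -> str:
    """Lower-strand sequence read from its own 5'-end to its 3'-end."""
    return complement(upper)[::-1]


if __name__ == "__main__":
    upper = "ATGC"  # upper (sense) strand, written 5' to 3'
    print(complement(upper), reverse(upper), reverse_complement(upper))
```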
- codon optimized refers to a mutation of a nucleotide sequence encoding a polypeptide, such as a nucleotide sequence encoding an effector protein, to mimic the codon preferences of the intended host organism or cell while encoding the same polypeptide. Thus, the codons can be changed, but the encoded polypeptide remains unchanged. For example, if the intended target cell was a human cell, a human codon-optimized nucleotide sequence encoding an effector protein could be used. As another non-limiting example, if the intended host cell were a mouse cell, then a mouse codon-optimized nucleotide sequence encoding an effector protein could be generated.
- if the intended host cell were a eukaryotic cell, then a eukaryote codon-optimized nucleotide sequence encoding an effector protein could be generated.
- if the intended host cell were a prokaryotic cell, then a prokaryote codon-optimized nucleotide sequence encoding an effector protein could be generated. Codon usage tables are readily available, for example, at the “Codon Usage Database” available at www.kazusa.or.jp/codon.
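As a rough illustration of codon optimization, the Python sketch below swaps each codon for a preferred synonymous codon and checks that the encoded polypeptide is unchanged. Everything here is a simplifying assumption: the preferred-codon table is a hand-written placeholder (it is not taken from the Codon Usage Database), only the handful of codons used in the example are included, and the function names are hypothetical.

```python
# Placeholder preferred-codon table (amino acid -> codon); illustrative only.
PREFERRED_CODON = {"M": "ATG", "L": "CTG", "K": "AAG", "*": "TAA"}

# Slice of the standard genetic code covering just the codons used below.
CODON_TO_AA = {
    "ATG": "M",
    "CTA": "L", "CTG": "L", "TTA": "L",
    "AAA": "K", "AAG": "K",
    "TAA": "*", "TGA": "*",
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds: str) -> str:
    """Replace every codon with the preferred synonymous codon for its amino acid."""
    return "".join(PREFERRED_CODON[aa] for aa in translate(cds))


if __name__ == "__main__":
    original = "ATGCTAAAATGA"
    optimized = codon_optimize(original)
    # The codons change, but the encoded polypeptide must stay the same.
    assert translate(original) == translate(optimized)
    print(original, optimized, translate(optimized))
```

Practical codon optimization also weighs factors such as GC content and repeated motifs, which this sketch ignores.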
- cleavage assay refers to an assay designed to visualize, quantitate or identify cleavage of a nucleic acid.
- the cleavage activity may be cis cleavage activity.
- the cleavage activity may be trans cleavage activity.
- nucleic acid cleavage may be assessed by gel electrophoresis.
- cleave in the context of a nucleic acid molecule or nuclease activity of an effector protein, refer to the hydrolysis of a phosphodiester bond of a nucleic acid molecule that results in breakage of that bond.
- the result of this breakage can be a nick (hydrolysis of a single phosphodiester bond on one side of a double-stranded molecule), single strand break (hydrolysis of a single phosphodiester bond on a single-stranded molecule) or double strand break (hydrolysis of two phosphodiester bonds on both sides of a double-stranded molecule) depending upon whether the nucleic acid molecule is single-stranded (e.g., ssDNA or ssRNA) or double-stranded (e.g., dsDNA) and the type of nuclease activity being catalyzed by the effector protein.
- a nick: hydrolysis of a single phosphodiester bond on one side of a double-stranded molecule
- single strand break: hydrolysis of a single phosphodiester bond on a single-stranded molecule
- double strand break: hydrolysis of two phosphodiester bonds on both sides of a double-stranded molecule
- CRISPR: clustered regularly interspaced short palindromic repeats
- conservative amino acid substitution refers to the replacement of one amino acid for another such that the replacement takes place within a family of amino acids that are related in their side chains. In some embodiments, a conservative amino acid substitution in a protein does not change the activity of the protein. Conversely, the term “non-conservative amino acid substitution” as used herein refers to the replacement of one amino acid residue for another that does not have a related side chain.
- Genetically encoded amino acids can be divided into four families having related side chains: (1) acidic (negatively charged): Asp (D), Glu (E); (2) basic (positively charged): Lys (K), Arg (R), His (H); (3) non-polar (hydrophobic): Cys (C), Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Met (M), Trp (W), Gly (G), Tyr (Y), with non-polar also being subdivided into: (i) strongly hydrophobic: Ala (A), Val (V), Leu (L), Ile (I), Met (M), Phe (F); and (ii) moderately hydrophobic: Gly (G), Pro (P), Cys (C), Tyr (Y), Trp (W); and (4) uncharged polar: Asn (N), Gln (Q), Ser (S), Thr (T).
- Amino acids may be related by aliphatic side chains: Gly (G), Ala (A), Val (V), Leu (L), Ile (I), Ser (S), Thr (T), with Ser (S) and Thr (T) optionally being grouped separately as aliphatic-hydroxyl. Amino acids may be related by aromatic side chains: Phe (F), Tyr (Y), Trp (W). Amino acids may be related by amide side chains: Asn (N), Gln (Q). Amino acids may be related by sulfur-containing side chains: Cys (C) and Met (M).
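The four-family grouping can be turned into a simple check for whether a substitution is conservative. The Python sketch below is illustrative: it uses only the four primary families (acidic, basic, non-polar, uncharged polar) and ignores the alternative groupings (aliphatic, aromatic, amide, sulfur-containing), and the function names are hypothetical.

```python
# One-letter codes grouped by the four side-chain families listed in the text.
FAMILIES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "non-polar": set("CAVLIPFMWGY"),
    "uncharged polar": set("NQST"),
}


def family_of(residue: str) -> str:
    """Return the family name for a one-letter amino acid code."""
    for name, members in FAMILIES.items():
        if residue in members:
            return name
    raise ValueError(f"unknown amino acid: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """A substitution is treated as conservative when both residues share a family."""
    return family_of(original) == family_of(replacement)


if __name__ == "__main__":
    print(is_conservative("D", "E"), is_conservative("D", "K"))
```

Using one of the alternative groupings instead would change some calls; for example, Ser and Thr fall in the aliphatic group under that scheme.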
- CRISPR RNA and “crRNA,” as used herein, refer to a type of guide nucleic acid that is RNA comprising a first sequence that is capable of hybridizing to a target sequence of a target nucleic acid and a second sequence that is capable of interacting with an effector protein either directly (by being bound by an effector protein) or indirectly (e.g., by hybridization with a second nucleic acid molecule that can be bound by an effector).
- the first sequence and the second sequence are directly connected to each other or by a linker.
- donor nucleic acid refers to a nucleic acid that is (designed or intended to be) incorporated into a target nucleic acid or target sequence.
- a reverse-transcribed DNA (RT-DNA) as described herein may be used as a donor nucleic acid in systems, compositions and methods described herein.
- a donor nucleic acid may be a single stranded donor nucleic acid and is referred to herein as a ss donor nucleic acid.
- a donor nucleic acid may be a double stranded donor nucleic acid and is referred to herein as a ds donor nucleic acid.
- the term “dual nucleic acid system” as used herein refers to a system that uses an intRNA-crRNA duplex complexed with one or more polypeptides described herein, wherein the complex is capable of interacting with a target nucleic acid in a sequence selective manner.
- effector protein refers to a protein, polypeptide, or peptide that is capable of interacting with a nucleic acid, such as a guide nucleic acid, to form a complex (e.g., an RNP complex), wherein the complex interacts with a target nucleic acid.
- edited target nucleic acid refers to a target nucleic acid that has undergone a change to its nucleotide sequence, for example, after contact with an OCATS system described herein.
- the edited target nucleic acid comprises an insertion, deletion, or substitution of one or more nucleotides compared to the unedited target nucleic acid.
- edited target nucleic acid or edited DNA is the genomic DNA of a target.
- extended guide nucleic acid refers to a guide nucleic acid engineered to comprise an extension sequence on a 5’ or 3’ end that can be used for production of a donor nucleic acid.
- an extended guide nucleic acid, or a portion thereof, is capable of serving as a scaffold for the activity of effector proteins, fusion partners, fusion proteins, and/or accessory proteins described herein.
- an extension sequence comprises a template sequence described herein.
- fidelity refers to the accuracy of template-mediated nucleotide synthesis of a polymerase.
- fidelity of an RNA polymerase depends on the error rate of the transcription of DNA to RNA by the RNA polymerase.
- fidelity of a reverse transcriptase depends on the error rate of the reverse transcription of RNA to DNA by the reverse transcriptase.
- fused refers to at least two sequences that can be connected together, such as by a linker, or by conjugation (e.g., chemical conjugation or enzymatic conjugation). The term “fused” includes a linker.
- fusion protein refers to a protein comprising at least two heterologous polypeptides.
- the fusion protein may comprise one or more effector protein and fusion partner.
- an effector protein and fusion partner are not found connected to one another, such as in a native protein or complex that occurs together in nature.
- fusion partner refers to a protein, polypeptide or peptide that is fused, or linked by a linker, to one or more effector protein.
- the fusion partner can impart some function to the fusion protein that is not provided by the effector protein.
- genetic disease refers to a disease, disorder, condition, or syndrome associated with or caused by one or more mutations in the DNA of an organism having the genetic disease.
- guide nucleic acid refers to a nucleic acid that, when in a complex with one or more polypeptides described herein (e.g., an RNP complex) can impart sequence selectivity to the complex when the complex interacts with a target nucleic acid.
- a guide nucleic acid may be referred to interchangeably as a guide RNA, however it is understood that guide nucleic acids may comprise deoxyribonucleotides (DNA), ribonucleotides (RNA), a combination thereof (e.g., RNA with a thymine base), biochemically or chemically modified nucleobases (e.g., one or more engineered modifications described herein), or combinations thereof.
- heterologous refers to at least two different polypeptide sequences that are not found similarly connected to one another in a native nucleic acid or protein.
- a protein that is heterologous to the effector protein is a protein that is not covalently linked by an amide bond to the effector protein in nature.
- a heterologous protein is not encoded by a species that encodes the effector protein.
- a guide nucleic acid may comprise “heterologous” sequences, which means that it includes a first sequence and a second sequence, wherein the first sequence and the second sequence are not found covalently linked by a phosphodiester bond in nature.
- the first sequence is considered to be heterologous with the second sequence, and the guide nucleic acid may be referred to as a heterologous guide nucleic acid.
- hybridize refers to a nucleotide sequence that is able to noncovalently interact, i.e., form Watson-Crick base pairs and/or G/U base pairs, or anneal, to another nucleotide sequence in a sequence-specific, antiparallel manner (i.e., a nucleotide sequence specifically interacts with a complementary nucleotide sequence) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength.
- Standard Watson-Crick base-pairing includes: adenine (A) pairing with thymine (T), adenine (A) pairing with uracil (U), and guanine (G) pairing with cytosine (C) for both DNA and RNA.
- in RNA molecules (e.g., dsRNA), guanine (G) can also base pair with uracil (U).
- G/U base-pairing is at least partially responsible for the degeneracy (i.e., redundancy) of the genetic code in the context of tRNA anti-codon base-pairing with codons in mRNA.
- a guanine (G) can be considered complementary to both a uracil (U) and to an adenine (A).
- when a G/U base-pair can be made at a given nucleotide position, the position is not considered to be non-complementary, but is instead considered to be complementary. While hybridization typically occurs between two nucleotide sequences that are complementary, mismatches between bases are possible.
- nucleotide sequences need not be 100% complementary to be specifically hybridizable, hybridizable, partially hybridizable, or for hybridization to occur.
- a nucleotide sequence may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a bulge, a loop structure or hairpin structure, etc.).
- the conditions appropriate for hybridization between two nucleotide sequences depend on the length of the sequence and the degree of complementarity, variables which are well known in the art. For hybridizations between nucleic acids with short stretches of complementarity (e.g.
- the position of mismatches may become important (see Sambrook et al., supra, 11.7-11.8).
- the length for a hybridizable nucleic acid is 8 nucleotides or more (e.g., 10 nucleotides or more, 12 nucleotides or more, 15 nucleotides or more, 20 nucleotides or more, 22 nucleotides or more, 25 nucleotides or more, or 30 nucleotides or more). Any suitable in vitro assay may be utilized to assess whether two sequences “hybridize”.
- One such assay is a melting point analysis where the greater the degree of complementarity between two nucleotide sequences, the greater the value of the melting temperature (Tm) for hybrids of nucleic acids having those sequences.
- the conditions of temperature and ionic strength determine the “stringency” of the hybridization. Temperature, wash solution salt concentration, and other conditions may be adjusted as necessary according to factors such as length of the region of complementation and the degree of complementation.
- Hybridization and washing conditions are well known and exemplified in Sambrook, J. and Russell, W., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001); and in Green, M. and Sambrook, J., Molecular Cloning: A Laboratory Manual, Fourth Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2012).
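By way of non-limiting illustration, the base-pairing rules described above (Watson-Crick pairs plus the G/U pair, evaluated in an antiparallel orientation) can be sketched in Python. The function name `paired_fraction` and the restriction to equal-length, ungapped strands are assumptions made for this sketch; it does not model bulges, loops, temperature, or ionic strength.

```python
# Watson-Crick pairs plus the G/U pair described in the text.
PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def paired_fraction(strand_a: str, strand_b: str) -> float:
    """Fraction of positions that pair when two equal-length strands,
    both written 5'->3', are aligned antiparallel without gaps."""
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("this sketch only handles non-empty, equal-length strands")
    a = strand_a.upper()
    b = strand_b.upper()[::-1]  # reverse so that strand_b is read 3'->5'
    paired = sum((x, y) in PAIRS for x, y in zip(a, b))
    return paired / len(a)


print(paired_fraction("AUG", "CAU"))    # every position pairs
print(paired_fraction("AAAA", "UUUC"))  # one mismatch out of four
```

A value below 1.0 does not by itself rule out hybridization, since, as noted above, sequences need not be 100% complementary to hybridize.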
- indel refers to an insertion-deletion or indel mutation, which is a type of genetic mutation that results from the insertion and/or deletion of one or more nucleotides in a target nucleic acid.
- An indel can vary in length (e.g., 1 to 1,000 nucleotides in length) and be detected by any suitable method, including sequencing.
- the term, “indel percentage,” as used herein, refers to a percentage of sequencing reads that show at least one nucleotide has been edited from the insertion and/or deletion of nucleotides regardless of the size of insertion or deletion, or number of nucleotides edited. For example, if there is at least one nucleotide deletion detected in a given target nucleic acid, it counts towards the percent indel value. As another example, if one copy of the target nucleic acid has one nucleotide deleted, and another copy of the target nucleic acid has 10 nucleotides deleted, they are counted the same. This number reflects the percentage of target nucleic acids that are edited by a given effector protein.
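As a minimal sketch of the indel percentage calculation described above, the following Python function counts a read as edited if it shows any inserted or deleted nucleotide, regardless of how many. The function name and the input format (one integer per sequencing read giving the number of inserted and/or deleted nucleotides) are assumptions made for illustration.

```python
def indel_percentage(indel_sizes_per_read):
    """Percentage of reads with at least one inserted or deleted nucleotide.

    indel_sizes_per_read: iterable with one integer per sequencing read,
    the number of inserted and/or deleted nucleotides (0 means unedited).
    """
    reads = list(indel_sizes_per_read)
    if not reads:
        raise ValueError("at least one read is required")
    # A 1-nucleotide deletion and a 10-nucleotide deletion count the same.
    edited = sum(1 for size in reads if size > 0)
    return 100.0 * edited / len(reads)


print(indel_percentage([0, 1, 10, 0]))  # two of four reads are edited
```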
- intermediary RNA and “intRNA,” as used herein, refer to the component of a dual nucleic acid system comprising a first sequence that is capable of interacting with an effector protein by being non-covalently bound by the effector protein and a second sequence that is capable of hybridizing to a repeat sequence of a crRNA. The first sequence and the second sequence are linked.
- in vitro describes something that is outside an organism.
- An in vitro system, composition or method may take place in a container for holding laboratory reagents such that it is separated from the biological source from which a material in the container is obtained.
- In vitro assays can encompass cell-based assays in which living or dead cells are employed.
- In vitro assays can also encompass a cell-free assay in which no intact cells are employed.
- the term “in vivo” is used to describe an event that takes place within an organism.
- ex vivo is used to describe an event that takes place in a cell that has been obtained from an organism. An ex vivo assay is not performed on a subject. Rather, it is performed upon a sample separate from a subject.
- length and “linked,” as used herein, refer to a nucleic acid (polynucleotide) or polypeptide; the length of a nucleic acid may be expressed as “kilobases” (kb) or “base pairs” (bp). Thus, a length of 1 kb refers to a length of 1000 linked nucleotides, and a length of 500 bp refers to a length of 500 linked nucleotides. Similarly, a protein having a length of 500 linked amino acids may also be simply described as having a length of 500 amino acids.
- linker refers to a molecule that links a first polypeptide to a second polypeptide (e.g., by an amide bond) or a first nucleic acid to a second nucleic acid (e.g., by a phosphodiester bond).
- mutation refers to an alteration that changes an amino acid residue or a nucleotide as described herein. Such an alteration can include, for example, deletions, insertions, and/or substitutions.
- the mutation can refer to a change in structure of an amino acid residue or nucleotide relative to the starting or reference residue or nucleotide.
- a mutation of an amino acid residue includes, for example, deletions, insertions and substituting one amino acid residue for a structurally different amino acid residue.
- substitutions can be a conservative amino acid substitution, a non-conservative amino acid substitution, a substitution to a specific sub-class of amino acids, or a combination thereof as described herein.
- a mutation of a nucleotide includes, for example, changing one naturally occurring base for a different naturally occurring base, such as changing an adenine to a thymine or a guanine to a cytosine or an adenine to a cytosine or a guanine to a thymine.
- a mutation of a nucleotide base may result in a structural and/or functional alteration of the encoding peptide, polypeptide or protein by changing the encoded amino acid residue of the peptide, polypeptide or protein.
- a mutation of a nucleotide base may not result in an alteration of the amino acid sequence or function of encoded peptide, polypeptide or protein, also known as a silent mutation. Methods of mutating an amino acid residue or a nucleotide are well known.
- mutation associated with a disease and “mutation associated with a genetic disorder,” as used herein, refer to the co-occurrence of a mutation and the phenotype of a disease.
- the mutation may occur in a gene, wherein transcription or translation products from the gene occur at a significantly abnormal level or in an abnormal form in a cell or subject harboring the mutation as compared to a nondisease control subject not having the mutation.
- nickase refers to an enzyme that possesses catalytic activity for single stranded nucleic acid cleavage of a double stranded nucleic acid.
- nickase activity refers to catalytic activity that results in single stranded nucleic acid cleavage of a double stranded nucleic acid.
- when referring to a molecule, such as but not limited to a nucleic acid, nucleotide, protein, polypeptide, peptide or amino acid, the term refers to a modification of that molecule (e.g., chemical modification, nucleotide sequence, or amino acid sequence) that is not present in the naturally occurring molecule.
- a composition or system described herein refer to a composition or system having at least one component that is not naturally associated with the other components of the composition or system.
- a composition may include an effector protein and a guide nucleic acid that do not naturally occur together.
- an effector protein or guide nucleic acid that is “natural,” “naturally-occurring,” or “found in nature” includes an effector protein and a guide nucleic acid from a cell or organism that have not been genetically modified by the hand of man.
- nuclease and “endonuclease” as used herein, refer to an enzyme which possesses catalytic activity for nucleic acid cleavage.
- nuclease activity refers to catalytic activity that results in nucleic acid cleavage (e.g., ribonuclease activity (ribonucleic acid cleavage), or deoxyribonuclease activity (deoxyribonucleic acid cleavage), etc.).
- nucleic acid refers to a polymer of nucleotides.
- a nucleic acid may comprise ribonucleotides, deoxyribonucleotides, combinations thereof, and modified versions of the same.
- a nucleic acid may be single-stranded or double-stranded, unless specified.
- Non-limiting examples of nucleic acids are double stranded DNA (dsDNA), single stranded DNA (ssDNA), messenger RNA, genomic DNA, cDNA, DNA-RNA hybrids, and a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. Accordingly, nucleic acids as described herein may comprise one or more mutations, one or more engineered modifications, or both.
- nucleic acid expression vector refers to a plasmid that can be used to express a nucleic acid of interest.
- nuclear localization signal refers to an entity (e.g., peptide) that facilitates localization of a nucleic acid, protein, or small molecule to the nucleus, when present in a cell that contains a nuclear compartment.
- nucleotide(s) and nucleoside(s), in the context of a nucleic acid molecule having multiple residues, describe the sugar and base of the residue contained in the nucleic acid molecule.
- nucleosides as used in the context of a nucleic acid having multiple linked residues, are interchangeable and describe linked sugars and bases of residues contained in a nucleic acid molecule.
- nucleobase(s) or linked nucleobase, as used in the context of a nucleic acid molecule, it can be understood as describing the base of the residue contained in the nucleic acid molecule, for example, the base of a nucleotide, nucleosides, or linked nucleotides or linked nucleosides.
- nucleotides, nucleosides, and/or nucleobases would also understand the differences between RNA and DNA (generally the exchange of uridine for thymidine or vice versa) and the presence of nucleoside analogs, such as modified uridines, do not contribute to differences in identity or complementarity among polynucleotides as long as the relevant nucleotides (such as thymidine, uridine, or modified uridine) have the same complement (e.g., adenosine for all of thymidine, uridine, or modified uridine; another example is cytosine and 5-methylcytosine, both of which have guanosine or modified guanosine as a complement).
- sequence 5'-AXG, where X is any modified uridine, such as pseudouridine, N1-methyl pseudouridine, or 5-methoxyuridine, is considered 100% identical to AUG in that both are perfectly complementary to the same sequence (5'-CAU).
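The identity rule above (two sequences are treated as identical when they share the same complement) can be illustrated with a short Python sketch. Representing any modified uridine with the single letter `X`, and the function names used here, are assumptions for illustration only.

```python
# Complement of each residue, written as RNA. Thymidine (T), uridine (U)
# and modified uridine (X, a placeholder letter assumed for this sketch)
# all have adenosine as their complement.
COMPLEMENT = {"A": "U", "U": "A", "T": "A", "X": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def identical_by_complement(seq_a: str, seq_b: str) -> bool:
    """True when both sequences are perfectly complementary to the same sequence."""
    return reverse_complement(seq_a) == reverse_complement(seq_b)


print(reverse_complement("AXG"))              # CAU
print(identical_by_complement("AXG", "AUG"))  # True
```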
- polypeptide and “protein,” as used herein, refer to a polymeric form of amino acids.
- a polypeptide may include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones. Accordingly, polypeptides as described herein may comprise one or more mutations, one or more engineered modifications, or both. It is understood that when describing coding sequences of polypeptides described herein, said coding sequences do not necessarily require a codon encoding an N-terminal Methionine (M) or a Valine (V) as described for the effector proteins described herein.
- a start codon could be replaced or substituted with a start codon that encodes for an amino acid residue sufficient for initiating translation in a host cell.
- a heterologous peptide such as a fusion partner protein, protein tag or NLS
- a start codon for the heterologous peptide serves as a start codon for the effector protein as well.
- the natural start codon encoding an amino acid residue sufficient for initiating translation e.g., Methionine (M) or a Valine (V)
- promoter and “promoter sequence,” as used herein, refer to a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a downstream (3’ direction) coding or non-coding sequence.
- Eukaryotic promoters will often, but not always, contain “TATA” boxes and “CAT” boxes.
- Various promoters, including inducible promoters may be used to drive expression by the various vectors of the present disclosure.
- protein binding sequence in a context of a dual nucleic acid system, refers to a nucleotide sequence in an intermediary RNA, wherein the protein binding sequence is capable of, at least partially, being non-covalently bound to an effector protein to form a complex (e.g., an RNP complex).
- PAM protospacer adjacent motif
- a PAM is required for a complex of an effector protein and a guide nucleic acid (e.g., an RNP complex) to hybridize to and edit the target nucleic acid.
- the complex does not require a PAM to edit the target nucleic acid.
- nucleic acids refers to proteins, polypeptides, peptides and nucleic acids that are products of various combinations of cloning, restriction, and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems.
- regulatory element refers to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate transcription of a non-coding sequence (e.g., a guide nucleic acid) or a coding sequence (e.g., effector proteins, fusion proteins, and the like) and/or regulate translation of an encoded polypeptide.
- repeat hybridization sequence, in the context of a dual nucleic acid system, refers to a sequence of nucleotides of an intRNA that is capable of hybridizing to a repeat sequence of a guide nucleic acid.
- repeat sequence refers to a sequence of nucleotides in a guide nucleic acid that is capable of, at least partially, interacting with an effector protein and/or another guide nucleic acid (e.g., hybridizes to a portion of an intermediary RNA).
- reverse transcriptase refers to an enzyme that possesses catalytic activity for reverse transcription of an RNA strand into a DNA strand without the use of secondary structural elements associated with retrons.
- the reverse transcriptase comprises one or more activities selected from RNA-dependent DNA polymerase activities, ribonuclease activities, and DNA-dependent DNA polymerase activities.
- reverse transcriptase activity refers to the catalytic activity that results in the reverse of normal transcription in which a sequence of nucleotides is copied from an RNA template during the synthesis of a molecule of DNA.
- reverse-transcribed DNA or “RT-DNA” or “RT-DNA molecule” as used herein refers to a DNA strand synthesized by reverse transcriptase activity and/or by the activity of a reverse transcriptase from a template sequence.
- ribonucleotide protein complex and “RNP” as used herein, refer to a complex of one or more nucleic acids and one or more polypeptides described herein. While the term utilizes “ribonucleotides” it is understood that the one or more nucleic acid may comprise deoxyribonucleotides (DNA), ribonucleotides (RNA), a combination thereof (e.g., RNA with a thymine base), biochemically or chemically modified nucleobases (e.g., one or more engineered modifications described herein), or combinations thereof.
- R-Loop refers to a three-stranded nucleic acid structure comprising a DNA:RNA hybrid and a displaced strand of DNA.
- an R-Loop can be formed upon hybridization of a guide nucleic acid as described herein to a target sequence of a target nucleic acid.
- the target strand of the R-Loop is that to which a spacer sequence hybridizes.
- Non-limiting examples of an R-Loop are depicted in FIGS. 1-3.
- single guide nucleic acid refers to a guide nucleic acid, wherein the guide nucleic acid is a single polynucleotide chain having all the required sequence for a functional complex with an effector protein (e.g., being bound by an effector protein, including in some embodiments activating the effector protein, and hybridizing to a target nucleic acid, without the need for a second nucleic acid molecule).
- an sgRNA can have two or more linked guide nucleic acid components (e.g., an intermediary sequence, a repeat sequence, a spacer sequence and optionally a linker).
- single nucleic acid system refers to a system that uses a guide nucleic acid complexed with one or more polypeptides described herein, wherein the complex is capable of interacting with a target nucleic acid in a sequence specific manner, and wherein the guide nucleic acid is capable of non-covalently interacting with the one or more polypeptides described herein, and wherein the guide nucleic acid is capable of hybridizing with a target sequence of the target nucleic acid.
- a single nucleic acid system lacks a duplex of a guide nucleic acid as hybridized to a second nucleic acid, wherein in such a duplex the second nucleic acid, and not the guide nucleic acid, is capable of interacting with the effector protein.
- spacer sequence refers to a nucleotide sequence in a guide nucleic acid that is capable of, at least partially, hybridizing to an equal length portion of a sequence (e.g., a target sequence) of a target nucleic acid.
- target nucleic acid refers to a nucleic acid that is selected as the nucleic acid for editing, binding, hybridization or any other activity of or interaction with a nucleic acid, protein, polypeptide, or peptide described herein.
- a target nucleic acid may comprise RNA, DNA, or a combination thereof.
- a target nucleic acid may be single-stranded (e.g., single-stranded RNA or single-stranded DNA, referred to herein as a ssRNA or ssDNA respectively) or double-stranded (e.g., double-stranded DNA, referred to herein as a dsDNA).
- target sequence in the context of a target nucleic acid, refers to a nucleotide sequence found within a target nucleic acid. Such a nucleotide sequence can, for example, hybridize to a respective length portion of a guide nucleic acid.
- variant refers to a form or version of a protein that differs from the wild-type protein.
- a variant may have a different function or activity relative to the wild-type protein.
- viral vector refers to a nucleic acid to be delivered into a host cell by a recombinantly produced virus or viral particle.
- an RNP complex comprising a Cas effector protein and guide nucleic acid recognizes and cleaves a target sequence of a target nucleic acid, and a reverse transcriptase generates a reverse-transcribed DNA (RT-DNA) (e.g., a single stranded donor (ss donor)) on an extended portion of the guide nucleic acid (e.g., crRNA, intRNA) referred to as an extension sequence, and in some embodiments, comprising a template sequence.
- System components may be provided in separate compositions or in a single composition.
- nucleic acids of the systems are provided in separate compositions, plasmids, or expression vectors.
- nucleic acids of the systems are provided in a single composition, plasmid, or expression vector.
- the extended portion of the crRNA or intRNA is not required to be complementary to the target sequence (e.g., of either the target strand (TS) or the non-target strand (NTS)) or portion thereof (e.g., after strand cleavage).
- RT-DNA is generated at the 3’ end of a guide nucleic acid, crRNA or intRNA.
- the RT-DNA sequence which can be used as a ss donor nucleic acid, and template sequence can be different from the target sequence.
- neither of the RT-DNA (e.g., ss donor nucleic acid) and template sequence can hybridize to the target sequence.
- the RT-DNA (e.g., the ss donor nucleic acid) and template sequence are less than 90%, less than 85%, less than 80%, less than 75%, less than 70%, less than 65%, or less than 50% identical to the target sequence.
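For illustration only, the identity thresholds above can be checked with a simple Python sketch. The disclosure does not specify how percent identity is computed; this sketch assumes the two sequences have already been aligned to equal length without gaps and compares them position by position. The function names and the example sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Position-by-position identity of two pre-aligned, equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("this sketch requires non-empty sequences of equal length")
    same = sum(x == y for x, y in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * same / len(seq_a)


def below_identity_threshold(candidate: str, target: str,
                             threshold_percent: float = 90.0) -> bool:
    """True when the candidate is less than threshold_percent identical to the target."""
    return percent_identity(candidate, target) < threshold_percent


# Hypothetical sequences differing at the last two of ten positions.
print(percent_identity("ACGTACGTAC", "ACGTACGTTT"))
print(below_identity_threshold("ACGTACGTAC", "ACGTACGTTT", 90.0))
```

A production comparison would normally use a sequence alignment tool rather than this ungapped comparison.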
- systems comprise a Type V Cas effector protein or a nucleic acid encoding the Type V effector protein.
- systems comprise a reverse transcriptase (RT) or a nucleic acid that encodes the RT.
- systems comprise an intermediary RNA or DNA molecule encoding the same.
- the intermediary RNA comprises from 5’ to 3’: a protein binding sequence and a repeat hybridization sequence.
- systems comprise an extended crRNA or DNA molecule encoding the same.
- the crRNA comprises from 5’ to 3’: a template sequence, a repeat sequence that hybridizes to the repeat hybridization sequence of the intermediary RNA, and a spacer sequence that hybridizes to a target sequence of a target strand of a dsDNA target nucleic acid. While components of the intermediary RNA and crRNA are described in order of 5’ to 3’ in the foregoing description and throughout, these components need not be directly linked. One of skill in the art understands that such guide nucleic acids may comprise linker nucleotides and additional elements.
- the Type V Cas effector protein forms a ribonucleotide protein (RNP) complex with the intermediary RNA and crRNA; the RNP complex binds the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid; the RNP complex cleaves at least one strand of the R-loop to produce a cut site; and the RT reverse transcribes the template sequence to produce an RT-DNA (e.g., a ss donor nucleic acid) at the 3’ end of the intermediary RNA.
- the RT-DNA is inserted into the non-target strand at the cut site.
- a DNA strand complementary to the RT-DNA (e.g., the ss donor nucleic acid) is polymerized to produce DNA complementary to the RT-DNA, which is directly attached to the non-target or target strand at the cut site.
- systems comprise a Type II effector protein, or a nucleic acid encoding the same.
- systems comprise a reverse transcriptase (RT) or a nucleic acid that encodes the RT.
- systems comprise an extended intermediary RNA or DNA molecule encoding the same.
- the intermediary RNA comprises from 5’ to 3’: a template sequence, a repeat hybridization sequence, and a protein binding sequence.
- systems comprise a crRNA or DNA molecule encoding the same.
- the crRNA comprises from 5’ to 3’: a spacer sequence that hybridizes to a target sequence of a target strand of a dsDNA target nucleic acid, and a repeat sequence that hybridizes to the repeat hybridization sequence of the intermediary RNA. While components of the intermediary RNA and crRNA are described in order of 5’ to 3’ in the foregoing description and throughout, these components need not be directly linked. One of skill in the art understands that such guide nucleic acids may comprise linker nucleotides and additional elements.
- the Type II Cas effector protein forms a complex with the intermediary RNA and crRNA; the complex binds the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid; the complex cleaves at least one strand of the R-loop to produce a cut site; and the RT reverse transcribes the template sequence to produce RT-DNA (e.g., the ss donor nucleic acid) at the 3’ end of the crRNA.
- the RT-DNA is inserted into the non-target strand at the cut site.
- a DNA strand complementary to the RT-DNA (e.g., the ss donor nucleic acid) is polymerized to produce a dsDNA donor nucleic acid which is inserted into the non-target strand at the cut site.
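The 5’ to 3’ component orders recited above for the Type V and Type II systems can be summarized in a small Python sketch. The dictionary keys, the function name `assemble`, and the placeholder sequences in the usage lines are hypothetical and serve only to show the ordering; as stated above, the components need not be directly linked, which the optional `linker` argument stands in for.

```python
# 5'->3' component order of each guide molecule, as recited in the text.
GUIDE_LAYOUTS = {
    ("Type V", "intermediary RNA"): (
        "protein binding sequence", "repeat hybridization sequence"),
    ("Type V", "crRNA"): (
        "template sequence", "repeat sequence", "spacer sequence"),
    ("Type II", "intermediary RNA"): (
        "template sequence", "repeat hybridization sequence",
        "protein binding sequence"),
    ("Type II", "crRNA"): (
        "spacer sequence", "repeat sequence"),
}


def assemble(system: str, molecule: str, parts: dict, linker: str = "") -> str:
    """Join the named parts in the 5'->3' order recited for the given molecule."""
    order = GUIDE_LAYOUTS[(system, molecule)]
    missing = [name for name in order if name not in parts]
    if missing:
        raise KeyError(f"missing components: {missing}")
    return linker.join(parts[name] for name in order)


# Placeholder sequences, for illustration only.
parts = {"spacer sequence": "GGG", "template sequence": "AAA", "repeat sequence": "CCC"}
print(assemble("Type V", "crRNA", parts))   # template, repeat, spacer
print(assemble("Type II", "crRNA", parts))  # spacer, repeat
```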
- nucleotides at the 5’ end of the template sequence are complementary to the 3’ end of the intermediary RNA (in the case of Type V systems) or the 3’ end of the crRNA (in the case of Type II systems).
- at least 2, at least 3, at least 4, at least 5, or at least 10 nucleotides are complementary.
- production of an intermediary RNA is driven by a U6 promoter and terminated by a heptaT sequence, which adds multiple uracils to the 3’ end of the intermediary RNA.
- the 3’ end of the intermediary RNA hybridizes to the 5’ end of the template sequence of the crRNA, which may comprise three to seven adenosines.
- the RT is fused to the Cas effector protein.
- the RT comprises an aptamer binding moiety, and the intermediary RNA or crRNA comprises an aptamer that recruits the RT to a target sequence.
- the RT is fused to a first peptide/protein and the Cas effector protein comprises a second peptide/protein, wherein the first peptide/protein interacts with the second peptide/protein.
- the effector protein removes a portion of a cleaved strand of the R-loop. In some embodiments, the effector protein removes a portion of a non-target strand (NTS) of a target sequence. This may also be referred to as ssDNA removal.
- systems comprise an endonuclease that removes the portion of the NTS.
- the endonuclease may be an endogenous nuclease in a cell.
- the endonuclease may be an exogenous nuclease introduced to a cell.
- systems comprise an accessory protein that recruits an endonuclease to the RNP complex.
- the endonuclease or accessory protein is fused to the effector protein or the RT.
- systems comprise a domain that recruits or is itself a protein involved in synthesis dependent strand annealing (SDSA) or nucleic acid encoding the same.
- systems comprise an accessory protein, wherein the accessory protein comprises a domain that recruits factors for or is itself a protein involved in SDSA.
- the SDSA proteins enable synthesis of a complementary strand on the ss donor RT-DNA to produce a strand that contains the edit and is directly linked to the genomic DNA.
- the SDSA protein is a DNA polymerase.
- the length of the portion of the non-target strand that is removed is at least 5, at least 10, at least 15, at least 20, or at least 25 nucleotides.
- systems comprise an accessory protein, wherein the accessory protein comprises one or more repair proteins that promote, increase or enable nucleic acid repair mechanisms.
- systems comprise a DNA repair protein that promotes, increases or enables DNA repair at the cut site.
- the DNA repair protein recruits other NHEJ proteins to the cut site.
- the DNA repair protein comprises a ligase.
- the DNA repair protein is fused to or interacts with any one of an effector protein, RT, and an accessory protein.
- compositions, systems, and methods comprising an effector protein or a use thereof.
- effector proteins interact with a guide nucleic acid to form a complex.
- an interaction between the complex and a target nucleic acid comprises one or more of: recognition of a protospacer adjacent motif (PAM) sequence within the target nucleic acid by the effector protein, hybridization of the guide nucleic acid to the target nucleic acid, and optionally modification of the target nucleic acid and/or the non-target nucleic acid.
- effector proteins have some detectable catalytic activity.
- the catalytic activity is nuclease activity (e.g., cleaving a strand of a nucleic acid, breaking a phosphodiester bond). In some embodiments, the catalytic activity is nickase activity.
- effector proteins are CRISPR associated (Cas) proteins.
- the Cas protein is a Class 1 Cas protein.
- the Cas protein is a Class 2 protein.
- the Cas protein is selected from a Type I, Type II, Type III, Type IV, and Type V Cas protein.
- the Cas protein is a Type V Cas protein.
- the Cas protein is a Cas12 protein.
- the Cas protein is a Cas14 protein.
- the Cas protein is a Type VU protein.
- the Cas protein is a Type VU-3 protein.
- the Cas protein is a Type VU-4 protein.
- the effector protein comprises an engineered variant of any of the Cas proteins described herein.
- the effector protein comprises transposase activity.
- the effector protein comprises integrase activity.
- the effector protein comprises an IscB protein or engineered variant thereof.
- effector proteins described herein comprise one or more functional domains.
- Effector protein functional domains can include a protospacer adjacent motif (PAM)-interacting domain, an oligonucleotide-interacting domain, one or more recognition domains, a non-target strand interacting domain, and a RuvC domain.
- a PAM interacting domain can be a target strand PAM interacting domain (TPID) or a non-target strand PAM interacting domain (NTPID).
- a PAM interacting domain, such as a TPID or a NTPID, on an effector protein describes a region of an effector protein that interacts with target nucleic acid.
- effector proteins described herein comprise one or more recognition domain (REC domain) with a binding affinity for a guide nucleic acid or for a guide nucleic acid-target nucleic acid heteroduplex.
- An effector protein described herein may comprise a zinc finger domain.
- effector proteins comprise an HNH domain.
- the effector protein is a Type V effector protein.
- a Type V protein comprises a RuvC domain and an HNH domain.
- the effector protein is a Type II effector protein.
- the Type II effector protein is a Cas9 protein.
- a Type II protein comprises a RuvC domain and an HNH domain.
- An effector protein may have a length of at least about 200, at least about 225, at least about 250, at least about 275, at least about 300, at least about 325, at least about 350, at least about 375, at least about 400, at least about 425, at least about 450, at least about 475, at least about 500, at least about 525, at least about 550, at least about 575, at least about 600, at least about 625, at least about 650, at least about 675, at least about 700, at least about 725, at least about 750, at least about 775, at least about 800, at least about 825, at least about 850, at least about 875, at least about 900, at least about 925, at least about 950, at least about 975, at least about 1,000, or more linked amino acids.
- the length of an effector is less than 1,000 linked amino acids. In some embodiments, the length of the effector protein is 300 to 500, 300 to 600, 300 to 700, 300 to 800, 300 to 900, 400 to 500, 400 to 600, 400 to 700, 400 to 800, or 400 to 900 linked amino acids.
- TABLE 1 provides illustrative amino acid sequences of effector proteins that are useful in the compositions, systems and methods described herein.
- compositions, systems, and methods described herein comprise an effector protein, or a nucleic acid encoding the effector protein, wherein the amino acid sequence of the effector protein comprises at least about 200 contiguous amino acids or more of any one of the sequences recited in TABLE 1.
- the amino acid sequence of an effector protein provided herein comprises at least about 200 contiguous amino acids, at least about 225 contiguous amino acids, at least about 250 contiguous amino acids, at least about 275 contiguous amino acids, at least about 300 contiguous amino acids, at least about 325 contiguous amino acids, at least about 350 contiguous amino acids, at least about 375 contiguous amino acids, at least about 400 contiguous amino acids, or more of any one of the sequences of TABLE 1.
- compositions, systems, and methods described herein comprise an effector protein or a nucleic acid encoding the effector protein, wherein the effector protein comprises a portion of any one of the sequences recited in TABLE 1.
- the effector protein comprises a portion of any one of the sequences recited in TABLE 1, wherein the portion does not comprise at least the first 10 amino acids, at least the first 20 amino acids, at least the first 40 amino acids, at least the first 60 amino acids, at least the first 80 amino acids, at least the first 100 amino acids, at least the first 120 amino acids, at least the first 140 amino acids, at least the first 160 amino acids, at least the first 180 amino acids, or at least the first 200 amino acids of any one of the sequences recited in TABLE 1.
- the effector protein comprises a portion of any one of the sequences recited in TABLE 1, wherein the portion does not comprise the last 10 amino acids, the last 20 amino acids, the last 40 amino acids, the last 60 amino acids, the last 80 amino acids, the last 100 amino acids, the last 120 amino acids, the last 140 amino acids, the last 160 amino acids, the last 180 amino acids, or the last 200 amino acids of any one of the sequences recited in TABLE 1.
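The contiguous-residue and truncation statements above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names `truncate` and `longest_shared_run` are hypothetical, and the toy strings in the usage note stand in for real TABLE 1 sequences, which are not reproduced here.

```python
def truncate(reference: str, n_term: int = 0, c_term: int = 0) -> str:
    """Return the portion of `reference` lacking its first `n_term`
    and its last `c_term` residues."""
    if n_term < 0 or c_term < 0:
        raise ValueError("truncation lengths must be non-negative")
    if n_term + c_term >= len(reference):
        raise ValueError("truncation would remove the entire sequence")
    return reference[n_term:len(reference) - c_term]


def longest_shared_run(candidate: str, reference: str) -> int:
    """Length of the longest contiguous stretch of residues that occurs
    in both sequences (dynamic-programming longest common substring)."""
    best = 0
    prev = [0] * (len(reference) + 1)
    for a in candidate:
        curr = [0] * (len(reference) + 1)
        for j, b in enumerate(reference, start=1):
            if a == b:
                # Extend the run that ended at the previous residue pair.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def has_contiguous_portion(candidate: str, reference: str, minimum: int) -> bool:
    """True if `candidate` contains at least `minimum` contiguous residues
    of `reference`."""
    return longest_shared_run(candidate, reference) >= minimum
```

For example, `truncate("ABCDEFGHIJ", n_term=2, c_term=3)` gives `"CDEFG"`, and `has_contiguous_portion(candidate, reference, 200)` would correspond to the 200-residue threshold recited above.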
- compositions, systems, and methods described herein comprise an effector protein, or a nucleic acid encoding the effector protein, wherein the effector protein comprises an amino acid sequence that is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of the sequences as set forth in TABLE 1.
- an effector protein provided herein comprises an amino acid sequence that is at least 65% identical to any one of the sequences as set forth in TABLE 1.
- an effector protein provided herein comprises an amino acid sequence that is at least 70% identical to any one of the sequences as set forth in TABLE 1.
- an effector protein provided herein comprises an amino acid sequence that is at least 75% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 80% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 85% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 90% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 95% identical to any one of the sequences as set forth in TABLE 1.
- an effector protein provided herein comprises an amino acid sequence that is at least 97% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 98% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 99% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is identical to any one of the sequences as set forth in TABLE 1.
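As a rough illustration of the percent-identity thresholds recited above, the following sketch scores two sequences that have already been aligned. The helper names and the choice of denominator (alignment columns, ignoring columns where both sequences have a gap) are assumptions made for this example; this passage does not prescribe a particular alignment tool or identity convention.

```python
# Identity thresholds recited in the passage above, in percent.
THRESHOLDS = (65, 70, 75, 80, 85, 90, 95, 97, 98, 99, 100)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns in which both sequences have a gap
    are ignored. The denominator (remaining alignment columns) is an
    assumption; other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def highest_threshold_met(identity: float):
    """Largest recited threshold satisfied by `identity`, or None."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None
```

A caller would first align the candidate against a TABLE 1 sequence with an alignment tool of their choice and then pass the two aligned strings to `percent_identity`.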
- compositions, systems, and methods described herein comprise an effector protein, or a nucleic acid encoding the effector protein, wherein the effector protein comprises an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% similar to any one of the sequences as set forth in TABLE 1.
- an effector protein provided herein comprises an amino acid sequence that is at least 80% similar to any one of the sequences as set forth in TABLE 1.
- an effector protein provided herein comprises an amino acid sequence that is at least 85% similar to any one of the sequences as set forth in TABLE 1.
- an effector protein provided herein comprises an amino acid sequence that is at least 90% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 95% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 97% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 98% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 99% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is 100% similar to any one of the sequences as set forth in TABLE 1.
- compositions, systems, and methods described herein comprise an effector protein, or a nucleic acid encoding the effector protein, wherein the effector protein comprises one or more amino acid alterations relative to any one of the sequences recited in TABLE 1.
- the effector protein comprising one or more amino acid alterations is a variant of an effector protein described herein. It is understood that any reference to an effector protein herein also refers to an effector protein variant as described herein.
- the one or more amino acid alterations comprises conservative substitutions, non-conservative substitutions, conservative deletions, non-conservative deletions, or combinations thereof.
- an effector protein or a nucleic acid encoding the effector protein comprises 1 amino acid alteration, 2 amino acid alterations, 3 amino acid alterations, 4 amino acid alterations, 5 amino acid alterations, 6 amino acid alterations, 7 amino acid alterations, 8 amino acid alterations, 9 amino acid alterations, 10 amino acid alterations or more relative to any one of the sequences recited in TABLE 1.
- effector proteins are described in WO2021247924, WO2023141590, WO2022240858, WO2024006824, WO2023092136, WO2022221581, WO2020223634, WO2023028444, WO2023102329, and WO2023092132.
- compositions, systems, and methods described herein comprise an effector protein, or a nucleic acid encoding the effector protein, wherein the effector protein comprises an amino acid sequence that is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of the sequences described in these publications.
- compositions, systems, and methods described herein comprise a nucleic acid encoding the effector protein, wherein the nucleic acid encoding the effector protein comprises RNA or messenger RNA (mRNA).
- effector proteins described herein have been modified (also referred to as an engineered protein).
- a modification of the effector proteins may include addition of one or more amino acids, deletion of one or more amino acids, substitution of one or more amino acids, or combinations thereof relative to a naturally occurring sequence.
- effector proteins disclosed herein are engineered proteins. Unless otherwise indicated, reference to effector proteins throughout the present disclosure include engineered proteins thereof.
- effector proteins may comprise one or more modifications that may provide altered activity as compared to a naturally-occurring counterpart. For example, effector proteins may comprise one or more modifications that may provide increased activity as compared to a naturally-occurring counterpart.
- effector proteins may provide increased catalytic activity (e.g., nickase, nuclease, binding activity) as compared to a naturally-occurring counterpart.
- Effector proteins may provide enhanced nucleic acid binding activity (e.g., enhanced binding of a guide nucleic acid, and/or target nucleic acid) as compared to a naturally-occurring counterpart.
- An effector protein may have a 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 120%, 140%, 160%, 180%, 200%, or more, increase of the activity of a naturally-occurring counterpart.
- effector proteins may comprise one or more modifications that reduce the activity of the effector proteins relative to a naturally occurring nuclease, or nickase.
- An effector protein may have a 100%, 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20%, 10%, 5%, 1%, or less, decrease of the activity of a naturally occurring counterpart. Decreased activity may be decreased catalytic activity (e.g., nickase, nuclease, binding activity) as compared to a naturally-occurring counterpart.
- activity of effector proteins described herein can be measured relative to a naturally-occurring effector protein or compositions containing the same in a cleavage assay.
- effector proteins described herein can be modified with the addition of one or more heterologous peptides or heterologous polypeptides (referred to collectively herein as a heterologous polypeptide).
- an effector protein modified with the addition of one or more heterologous peptides or heterologous polypeptides may be referred to herein as a fusion protein.
- heterologous polypeptides described herein are fused to effector protein(s).
- a fusion protein comprises heterologous polypeptide(s) and effector protein(s).
- a heterologous peptide or heterologous polypeptide comprises a subcellular localization signal.
- a subcellular localization signal can be a nuclear localization signal (NLS).
- the NLS facilitates localization of a nucleic acid, protein, or small molecule to the nucleus, when present in a cell that contains a nuclear compartment.
- TABLE 2 lists exemplary NLS sequences.
- the subcellular localization signal is a nuclear export signal (NES), a sequence to keep an effector protein retained in the cytoplasm, a mitochondrial localization signal for targeting to the mitochondria, a chloroplast localization signal for targeting to a chloroplast, an ER retention signal, and the like.
- an effector protein described herein is not modified with a subcellular localization signal so that the polypeptide is not targeted to the nucleus, which can be advantageous depending on the circumstance (e.g., when the target nucleic acid is an RNA that is present in the cytosol).
- the heterologous polypeptide is a cell penetrating peptide (CPP), also known as a Protein Transduction Domain (PTD).
- CPP or PTD is a polypeptide, polynucleotide, carbohydrate, or organic or inorganic compound that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane.
- heterologous polypeptides include, but are not limited to, proteins (or fragments/domains thereof) that are boundary elements (e.g., CTCF), proteins and fragments thereof that provide periphery recruitment (e.g., Lamin A, Lamin B, etc.), and protein docking elements (e.g., FKBP/FRB, Pil1/Aby1, etc.).
- a heterologous peptide or heterologous polypeptide comprises a protein tag.
- the protein tag is referred to as a purification tag or a fluorescent protein.
- the protein tag may be detectable for use in detection of the effector protein and/or purification of the effector protein.
- compositions, systems and methods comprise a protein tag or use thereof. Any suitable protein tag may be used depending on the purpose of its use.
- Non-limiting examples of protein tags include a fluorescent protein, a histidine tag, e.g., a 6XHis tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and maltose binding protein (MBP).
- the protein tag is a portion of MBP that can be detected and/or purified.
- fluorescent proteins include green fluorescent protein (GFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent protein (CFP), mCherry, and tdTomato.
- a heterologous polypeptide may be located at or near the amino terminus (N-terminus) of the effector protein disclosed herein.
- a heterologous polypeptide may be located at or near the carboxy terminus (C-terminus) of the effector proteins disclosed herein.
- a heterologous polypeptide is located internally in an effector protein described herein (i.e., is not at the N- or C- terminus of an effector protein described herein) at a suitable insertion site.
- effector proteins described herein are encoded by a codon optimized nucleic acid.
- a nucleic acid sequence encoding an effector protein described herein is codon optimized.
- effector proteins described herein may be codon optimized for expression in a specific cell, for example, a bacterial cell, a plant cell, a eukaryotic cell, an animal cell, a mammalian cell, or a human cell.
- the effector protein is codon optimized for a human cell.
- compositions, systems, and methods comprise a reverse transcriptase or a use thereof.
- the reverse transcriptase is fused to an effector protein.
- the reverse transcriptase is recruited to the effector protein or to a target nucleic acid to be modified by the effector protein.
- a fusion partner imparts some function or activity to a fusion protein that is not provided by an effector protein, including but not limited to reverse transcriptase activity, nuclease activity, ligase activity, or combinations thereof.
- the compositions, systems and methods provided herein comprise one or more fusion partners.
- the fusion partners described herein comprise one or more subcellular localization signals described herein.
- the one or more fusion partners comprise at least one, at least two, at least three, at least four, at least five, or more fusion partners.
- the one or more fusion partners comprise one, two, three, four, five, or more fusion partners.
- the fusion partners described herein function to repair DNA single-strand breaks or DNA double-strand breaks.
- the repair can be with or without insertion of the donor nucleic acids.
- the fusion partner comprises one or more proteins associated with the NHEJ and/or HDR mechanism (e.g., repair factors described herein).
- the HDR mechanism is governed by a homology between a donor DNA (e.g., a donor nucleic acid) and an acceptor DNA (e.g., target nucleic acid).
- the HDR mechanism comprises an abbreviated homologous recombination, a single-strand annealing or a breakage-induced replication.
- the homology in abbreviated HDR mechanism is greater than the homology in breakage-induced replication repair mechanism.
- a fusion partner comprises reverse transcriptase activity.
- a fusion partner comprises a reverse transcriptase.
- fusions of effector proteins and reverse transcriptases described herein are referred to as fusion proteins.
- fusion proteins comprise a reverse transcriptase fused to an effector protein by a linker, such as a linker described herein.
- fusion proteins comprise a reverse transcriptase directly fused to an effector protein.
- fusion proteins comprise a reverse transcriptase that is not fused to an effector protein.
- reverse transcriptases are localized to systems described herein, effector proteins described herein, RNP complexes described herein, or to target nucleic acids described herein.
- Cas-RT fusions comprise a reverse transcriptase fused to an effector protein by a linker, such as a linker described herein.
- Cas-RT fusions comprise a reverse transcriptase directly fused to an effector protein.
- Cas-RT fusions comprise a reverse transcriptase that is not fused to an effector protein.
- reverse transcriptases are localized to an effector protein, to an RNP complex, or to a target nucleic acid.
- fusion proteins or effector proteins described herein leverage reverse transcriptase activity of reverse transcriptases (e.g., as fusion partners or as separate entities) to edit or modify target nucleic acids as described herein.
- reverse transcriptases are capable of using an RNA sequence contained in a crRNA or intRNA to reverse transcribe to produce a reverse-transcribed DNA (RT-DNA).
- reverse transcriptases do not rely upon use of secondary structural elements (e.g., hairpin loops) to produce a reverse-transcribed DNA (RT-DNA).
- a reverse transcriptase catalyzes the transcription of an RNA sequence into a reverse-transcribed DNA (RT-DNA).
- the RNA sequence is a template sequence as described herein.
- a template sequence is extended from a nucleic acid described herein (e.g., an extended guide nucleic acid, such as an extended crRNA, or an extended intRNA).
- a RT-DNA comprises a nucleic acid that can serve as a donor nucleic acid (e.g., a ss donor nucleic acid) that is incorporated into a target nucleic acid or genome.
- a RT-DNA comprises a nucleic acid that can serve as a template to generate a donor nucleic acid that is incorporated into a target nucleic acid or genome.
- a RT-DNA comprises a nucleic acid that is capable of serving as a substrate for the activity of systems, compositions, and/or methods described herein.
- a RT-DNA that serves as a substrate as described herein is capable of facilitating the introduction of a DNA sequence modification (e.g., an insertion, a deletion, a substitution, or combinations thereof) into a locus by homologous recombination using nucleic acid-guided nucleases, such as a ligase.
- a reverse transcriptase is capable of reverse transcribing an RNA strand (e.g., a template sequence) without the use of a homologous structural scaffold.
- reverse transcriptase activity on an RNA strand, such as the activity of a bacterial retron, requires a homologous structural scaffold on the RNA strand to support the reverse transcriptase activity as it moves down the 3’ end of the RNA strand.
- reverse transcriptases described herein utilize guide nucleic acids described herein as a structural scaffold for reverse transcriptase activity.
- guide nucleic acids described herein comprise a nucleotide sequence heterologous to the reverse transcriptase.
- reverse transcriptases described herein are viral reverse transcriptases. Exemplary reverse transcriptases are set forth in TABLE 1.1.
- systems and methods comprise a reverse transcriptase, nucleic acid encoding the reverse transcriptase, or a use thereof, wherein the reverse transcriptase is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to a reverse transcriptase set forth in TABLE 1.1.
- a reverse transcriptase is an RNA-dependent DNA polymerase (RDDP).
- the RDDP is an RDDP described in WO 2024/040202.
- compositions, systems, and methods comprise one or more accessory proteins or uses thereof.
- systems described herein recruit one or more accessory proteins.
- reverse transcriptase activity, cleavage activity, or both, of the effector proteins, fusion partners and/or fusion proteins described herein recruit one or more accessory proteins.
- the one or more accessory proteins are exogenous.
- one or more accessory proteins comprise one or more nucleases (e.g., endonucleases), polymerases, ligases, or combinations thereof. In some embodiments, the one or more accessory proteins comprise one or more nucleases to resect damaged DNA. In some embodiments, the one or more accessory proteins comprise one or more polymerases to fill in new DNA. In some embodiments, the one or more accessory proteins comprise one or more ligases to restore integrity to the DNA strands.
- the one or more accessory proteins comprise Ku70/80, DNA-PKcs, Artemis, Pol μ, Pol λ, XRCC4, ligase IV, XLF, or a combination thereof, wherein the one or more accessory proteins function to repair DNA by NHEJ.
- the one or more accessory proteins comprise BRCA1, BRCA2, CtIP, EXO1, BLM, MRE11, Nbs1, PALB2, RAD50, RAD51 (e.g., RAD51B, RAD51C, and RAD51D), XRCC2, XRCC3, RAD52, RAD54B, replication protein A (RPA), SWSAP1, or a combination thereof, wherein the one or more accessory proteins function to repair DNA by HDR.
- compositions, systems, and methods comprise a DNA repair protein or a use thereof. In some embodiments, compositions, systems, and methods comprise a ligase or a use thereof. In some embodiments, compositions, systems, and methods comprise an endonuclease or a use thereof. In some embodiments, DNA repair proteins, ligases and endonucleases are endogenous to a cell or subject. In some embodiments, DNA repair proteins, ligases and endonucleases are exogenous factors provided with the system or composition.
- systems comprise a first polypeptide and a second polypeptide connected by a linker.
- an effector protein may be connected to a fusion partner protein, e.g., a reverse transcriptase, a ligase, an SDSA, or an accessory protein via a linker.
- the linker may comprise or consist of a covalent bond.
- the linker may comprise or consist of a chemical group.
- the linker comprises an amino acid.
- a peptide linker comprises at least two amino acids linked by an amide bond. In general, the linker connects a terminus of the first polypeptide to a terminus of the second polypeptide.
- carboxy terminus of the first polypeptide is linked to the amino terminus of the second polypeptide. In some embodiments, carboxy terminus of the second polypeptide is linked to the amino terminus of the first polypeptide. In some embodiments, the first polypeptide and the second polypeptide are directly linked by a covalent bond.
- linkers comprise one or more amino acids.
- the linker is a protein.
- a terminus of the effector protein is linked to a terminus of the fusion partner through an amide bond.
- a terminus of the effector protein is linked to a terminus of the fusion partner through a peptide bond.
- linkers comprise an amino acid.
- linkers comprise a peptide.
- an effector protein is coupled to a fusion partner by a linker protein.
- the linker may have any of a variety of amino acid sequences.
- the linker may comprise a region of rigidity (e.g., beta sheet, alpha helix), a region of flexibility, or any combination thereof.
- the linker comprises small amino acids, such as glycine and alanine, that impart high degrees of flexibility.
- design of a peptide conjugated to any desired element may include linkers that are all or partially flexible, such that the linker may include a flexible linker as well as one or more portions that confer less flexible structure.
- Suitable linkers include proteins of 4 linked amino acids to 40 linked amino acids in length, or between 4 linked amino acids and 25 linked amino acids in length.
- linked amino acids described herein comprise at least two amino acids linked by an amide bond.
- Linkers may be produced by using synthetic, linker-encoding oligonucleotides to couple proteins, or may be encoded by a nucleic acid sequence encoding a fusion protein (e.g., an effector protein coupled to a fusion partner).
- the linker is from 1 to 100 amino acids in length. In some embodiments, the linker is more than 100 amino acids in length. In some embodiments, the linker is from 10 to 27 amino acids in length.
- linker proteins include glycine polymers (G)n, glycine-serine polymers (including, for example, (GS)n, GSGGSn (SEQ ID NO: 72), GGSGGSn (SEQ ID NO: 73), and GGGSn (SEQ ID NO: 74), where n is an integer of at least one), glycine-alanine polymers, and alanine-serine polymers.
- linkers may comprise amino acid sequences including, but not limited to, GGSG (SEQ ID NO: 75), GGSGG (SEQ ID NO: 76), GSGSG (SEQ ID NO: 77), GSGGG (SEQ ID NO: 78), GGGSG (SEQ ID NO: 68), and GSSSG (SEQ ID NO: 69).
- the linker comprises one or more repeats of a tri-peptide GGS.
- the linker is an XTEN linker.
- the XTEN linker is an XTEN80 linker.
- the XTEN linker is an XTEN20 linker.
- the XTEN20 linker has an amino acid sequence of GSGGSPAGSPTSTEEGTSESATPGSG (SEQ ID NO: 70).
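The peptide linkers described above lend themselves to a small illustration. The sketch below builds repeat-unit linkers such as (GS)n or (GGS)n and joins two polypeptides through a linker; the XTEN20 sequence is copied from the passage above, while the function names and any effector or fusion-partner sequences passed in are hypothetical placeholders rather than sequences from this disclosure.

```python
# XTEN20 linker sequence as recited above (SEQ ID NO: 70).
XTEN20 = "GSGGSPAGSPTSTEEGTSESATPGSG"


def repeat_linker(unit: str, n: int) -> str:
    """Build a linker from `n` repeats of a unit such as 'GS' or 'GGS'."""
    if n < 1:
        raise ValueError("n must be an integer of at least one")
    return unit * n


def fuse(n_terminal: str, linker: str, c_terminal: str) -> str:
    """Join two polypeptides so that the carboxy terminus of `n_terminal`
    is connected through `linker` to the amino terminus of `c_terminal`.
    An empty linker models a direct fusion."""
    return n_terminal + linker + c_terminal


def linker_length_ok(linker: str, low: int = 10, high: int = 27) -> bool:
    """Check a linker against a length range; the defaults mirror the
    10 to 27 amino acid range mentioned above."""
    return low <= len(linker) <= high
```

Swapping the order of the first and last arguments of `fuse` models the alternative orientation, in which the fusion partner sits at the amino terminus.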
- linkers do not comprise an amino acid. In some embodiments, linkers do not comprise a peptide. In some embodiments, linkers comprise a nucleotide, a polynucleotide, a polymer, or a lipid.
- the linker may be a polyethylene glycol (PEG), polypropylene glycol (PPG), co-poly(ethylene/propylene) glycol, polyoxyethylene (POE), polyurethane, polyphosphazene, polysaccharides, dextran, polyvinyl alcohol, polyvinylpyrrolidones, polyvinyl ethyl ether, polyacrylamide, polyacrylate, polycyanoacrylates, lipid polymers, chitins, hyaluronic acid, heparin, or an alkyl linker.
- compositions, systems, and methods of the present disclosure may comprise a guide nucleic acid or a use thereof.
- compositions, systems, and methods comprising guide nucleic acids or uses thereof, as described herein and throughout include DNA molecules, such as expression vectors, that encode a guide nucleic acid.
- compositions, systems, and methods of the present disclosure comprise a guide nucleic acid or a nucleotide sequence encoding the guide nucleic acid.
- the guide nucleic acid comprises a nucleotide sequence.
- a nucleotide sequence may be described as a nucleotide sequence of either DNA or RNA; however, no matter the form in which the sequence is described, it is readily understood that such nucleotide sequences can be revised to be RNA or DNA, as needed, for describing a sequence within a guide nucleic acid itself or the sequence that encodes a guide nucleic acid.
- disclosure of the nucleotide sequences described herein also discloses a complementary nucleotide sequence, a reverse nucleotide sequence, and the reverse complement nucleotide sequence, any one of which can be a nucleotide sequence for use in a guide nucleic acid.
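The relationship between a disclosed nucleotide sequence and its complement, reverse, reverse complement, and DNA or RNA forms can be sketched as follows. The function names are illustrative only, and the sketch assumes unambiguous bases (A, C, G, T or U) with no modified nucleotides.

```python
# Translation table for complementing a DNA sequence.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(sequence: str) -> str:
    """Write a sequence in DNA form (U replaced by T)."""
    return sequence.upper().replace("U", "T")


def to_rna(sequence: str) -> str:
    """Write a sequence in RNA form (T replaced by U)."""
    return sequence.upper().replace("T", "U")


def complement(sequence: str) -> str:
    """Base-by-base complement, returned in DNA form."""
    return to_dna(sequence).translate(_DNA_COMPLEMENT)


def reverse(sequence: str) -> str:
    """The same bases read in the opposite order."""
    return sequence[::-1]


def reverse_complement(sequence: str) -> str:
    """Complement read in the opposite order, returned in DNA form."""
    return reverse(complement(sequence))


def related_forms(sequence: str) -> dict:
    """Collect the four related sequences for one input, in DNA form."""
    dna = to_dna(sequence)
    return {
        "sequence": dna,
        "complement": complement(dna),
        "reverse": reverse(dna),
        "reverse_complement": reverse_complement(dna),
    }
```

Applying `to_rna` to any entry of `related_forms(...)` yields the corresponding RNA form, for example for use within a guide nucleic acid rather than in the DNA that encodes it.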
- a guide nucleic acid sequence(s) comprises one or more nucleotide alterations at one or more positions in any one of the sequences described herein.
- Alternative nucleotides can be any one or more of A, C, G, T or U, or a deletion, or an insertion.
- compositions, systems, and methods of the present disclosure may comprise a guide nucleic acid, a nucleic acid encoding the guide nucleic acid, or a use thereof.
- Guide nucleic acids are also referred to herein as “guide RNA.”
- a guide nucleic acid, as well as any components thereof may comprise one or more deoxyribonucleotides, ribonucleotides, biochemically or chemically modified nucleotides (e.g., one or more engineered modifications as described herein), or any combinations thereof.
- guide nucleic acids disclosed herein are not naturally occurring.
- a guide nucleic acid may comprise a non-naturally occurring sequence, wherein the sequence of the guide nucleic acid, or any portion thereof, may be different from the sequence of a naturally occurring guide nucleic acid.
- a guide nucleic acid comprises two naturally occurring sequences that do not occur in nature together.
- a guide nucleic acid of the present disclosure may comprise an engineered modification that makes it different from a nucleic acid that occurs in nature.
- a guide nucleic acid may be chemically synthesized or recombinantly produced by any suitable methods.
- guide nucleic acids described herein comprise one or more engineered modifications.
- Non-limiting examples of engineered modifications include: 2’-O-methyl modified nucleotides (e.g., 2’-O-Methyl (2’OMe) sugar modifications); 2’ fluoro modified nucleotides (e.g., 2’-fluoro (2’-F) sugar modifications); locked nucleic acid (LNA) modified nucleotides; peptide nucleic acid (PNA) modified nucleotides; nucleotides with phosphorothioate linkages; a 5’ cap (e.g., a 7-methylguanylate cap (m7G)), phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates, 5'-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino
- a guide nucleic acid comprises a first region that is not complementary to a target nucleic acid (FR) and a second region that is complementary to the target nucleic acid (SR), wherein the FR and the SR are heterologous to each other.
- FR is located 5’ to SR (FR-SR).
- SR is located 5’ to FR (SR-FR).
- the FR comprises one or more repeat sequences, intermediary sequences, or combinations thereof.
- at least a portion of the FR interacts or binds to an effector protein.
- the SR comprises a spacer sequence, wherein the spacer sequence can interact in a sequence-specific manner with (e.g., has complementarity with, or can hybridize to a target sequence in) a target nucleic acid.
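The two-region guide layout described above (FR-SR or SR-FR) can be sketched in a few lines. This is a simplified illustration: the function names are hypothetical, the spacer is assumed to be fully complementary to the targeted strand, and any repeat or target sequences supplied by a caller are placeholders rather than sequences from this disclosure.

```python
# Maps each DNA base of the targeted strand to its pairing RNA base.
_DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def spacer_for_target(target_dna: str) -> str:
    """RNA spacer (written 5' to 3') that is fully complementary to the
    given DNA target strand (also written 5' to 3')."""
    paired = target_dna.upper().translate(_DNA_TO_RNA_COMPLEMENT)
    return paired[::-1]


def assemble_guide(fr: str, sr: str, orientation: str = "FR-SR") -> str:
    """Concatenate the first region (FR, not complementary to the target)
    and the second region (SR, containing the spacer), 5' to 3'.

    `orientation` is 'FR-SR' when FR is located 5' to SR, or 'SR-FR'
    when SR is located 5' to FR.
    """
    if orientation == "FR-SR":
        return fr + sr
    if orientation == "SR-FR":
        return sr + fr
    raise ValueError("orientation must be 'FR-SR' or 'SR-FR'")
```

In use, a caller would derive the SR with `spacer_for_target` from the chosen target strand and then call `assemble_guide` with a repeat-containing FR in whichever orientation the effector protein requires.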
- the first region, the second region, or both may be about 8 nucleic acids, about 10 nucleic acids, about 12 nucleic acids, about 14 nucleic acids, about 16 nucleic acids, about 18 nucleic acids, about 20 nucleic acids, about 22 nucleic acids, about 24 nucleic acids, about 26 nucleic acids, about 28 nucleic acids, about 30 nucleic acids, about 32 nucleic acids, about 34 nucleic acids, about 36 nucleic acids, about 38 nucleic acids, about 40 nucleic acids, about 42 nucleic acids, about 44 nucleic acids, about 46 nucleic acids, about 48 nucleic acids, or about 50 nucleic acids long.
- the first region, the second region, or both may be from about 8 to about 12, from about 8 to about 16, from about 8 to about 20, from about 8 to about 24, from about 8 to about 28, from about 8 to about 30, from about 8 to about 32, from about 8 to about 34, from about 8 to about 36, from about 8 to about 38, from about 8 to about 40, from about 8 to about 42, from about 8 to about 44, from about 8 to about 48, or from about 8 to about 50 nucleic acids long.
- the first region, the second region, or both may comprise a GC content of about 1%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, or about 99%.
- the first region, the second region, or both may comprise a GC content of from about 1% to about 95%, from about 5% to about 90%, from about 10% to about 80%, from about 15% to about 70%, from about 20% to about 60%, from about 25% to about 50%, or from about 30% to about 40%.
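For reference, the GC content of a region is the fraction of its bases that are G or C. A minimal sketch of that calculation follows; the function names are illustrative, and the range check simply mirrors the kind of percentage ranges recited above.

```python
def gc_content(sequence: str) -> float:
    """GC content of a DNA or RNA sequence, as a percentage."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    invalid = set(seq) - set("ACGTU")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def gc_in_range(sequence: str, low: float, high: float) -> bool:
    """True if the GC content falls within [low, high] percent."""
    return low <= gc_content(sequence) <= high
```

For instance, `gc_in_range(region, 30.0, 40.0)` would test a first or second region against the narrowest range listed above.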
- a guide nucleic acid comprises about: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 linked nucleotides.
- a guide nucleic acid comprises at least: 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 linked nucleotides.
- the length of a guide nucleic acid is about 30 to about 200 linked nucleotides.
- the length of a guide nucleic acid is about 40 to about 150, about 40 to about 120, about 40 to about 100, about 40 to about 90, about 40 to about 80, about 40 to about 70, about 40 to about 60, about 40 to about 50, about 50 to about 90, about 50 to about 80, about 50 to about 70, or about 50 to about 60 linked nucleotides.
- the length of a guide nucleic acid is about 40, about 45, about 50, about 55, about 60, about 65, about 70 or about 75 linked nucleotides.
- the length of a guide nucleic acid is greater than about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70 or about 75 linked nucleotides. In some embodiments, the length of a guide nucleic acid is not greater than about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 105, about 110, about 115, about 120, or about 125 linked nucleotides.
- a guide nucleic acid comprises at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 contiguous nucleotides that are complementary to a eukaryotic sequence.
- a eukaryotic sequence is a nucleotide sequence that is present in a host eukaryotic cell.
- Such a nucleotide sequence is distinguished from nucleotide sequences present in other host cells, such as prokaryotic cells, or viruses.
- Said sequences present in a eukaryotic cell can be located in a gene, an exon, an intron, a non-coding (e.g., promoter or enhancer) region, a selectable marker, tag, signal, and the like.
- a target sequence is a eukaryotic sequence.
- the guide nucleic acid comprises a nucleotide sequence that is capable of hybridizing to a target sequence in a target nucleic acid, wherein the target nucleic acid is any one of: a naturally occurring eukaryotic sequence, a naturally occurring prokaryotic sequence, a naturally occurring viral sequence, a naturally occurring bacterial sequence, a naturally occurring fungal sequence, an engineered eukaryotic sequence, an engineered prokaryotic sequence, an engineered viral sequence, an engineered bacterial sequence, an engineered fungal sequence, a fragment of a naturally occurring sequence, a fragment of an engineered sequence, and combinations thereof.
- compositions, systems and methods described herein comprise a dual guide nucleic acid system (or simply, “dual guide system”) comprising a crRNA or a nucleotide sequence encoding the crRNA, and an intermediary RNA or a nucleotide sequence encoding the intermediary RNA, wherein the crRNA and the intermediary RNA are separate, unlinked molecules, wherein a repeat hybridization region of the intermediary RNA is capable of hybridizing with an equal length portion of the crRNA to form an intRNA-crRNA duplex, and wherein a spacer sequence of the crRNA is capable of hybridizing to a target sequence of the target nucleic acid.
- An intermediary RNA and/or intRNA-crRNA duplex may form a secondary structure that facilitates the binding of an effector protein to a target nucleic acid.
- the crRNA is linked to the intermediary sequence to form a single guide nucleic acid (sgRNA).
- a guide nucleic acid comprises a crRNA.
- the guide nucleic acid is the crRNA.
- a crRNA comprises a first region (FR) and a second region (SR), wherein the FR of the crRNA comprises a repeat sequence, and the SR of the crRNA comprises a spacer sequence.
- the repeat sequence and the spacer sequence are directly connected to each other (e.g., by a covalent bond, such as a phosphodiester bond).
- the repeat sequence and the spacer sequence are connected by a linker.
- Type V Cas effector proteins function with a crRNA, wherein the FR is located 5’ of the SR.
- Type II Cas effector proteins function with a crRNA, wherein the FR is located 3’ of the SR.
- the FR may be immediately 5’ of the SR (in the case of Type V).
- the FR may be immediately 3’ of the SR (in the case of Type II).
- the FR may be separated from the SR by one or more nucleotides. Non-limiting examples of repeat sequences are provided in TABLE 3.
- the crRNA comprises an extension sequence.
- a crRNA comprising an extension sequence is an extended crRNA.
- systems comprise a Type V Cas effector protein and an extended crRNA, wherein the extended crRNA comprises an extension sequence at the 5’ end of the crRNA.
- the extension sequence comprises a template sequence.
- the extension sequence comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides.
- the template sequence comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides.
- the extension sequence is not greater than 200, 500 or 1000 nucleotides.
- the extension sequence is not greater than 10 kb or not greater than 20 kb. In some embodiments, the template sequence cannot hybridize to the target sequence or the reverse complement thereof. In some embodiments, the template sequence is less than 100%, less than 99%, less than 98%, less than 95%, less than 90%, less than 80%, less than 70%, less than 60%, or less than 50% complementary to the target sequence or the reverse complement thereof.
- the crRNA comprises a homology sequence. In some embodiments, the crRNA comprises a homology sequence, wherein the effector protein is a type II Cas protein. In some embodiments, a crRNA comprising a homology sequence does not comprise an extension sequence. In some embodiments, the homology sequence is located at the 5’ end of the crRNA. In some embodiments, the homology sequence is located at the 3’ end of the crRNA. In some embodiments, the homology sequence comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides. In some embodiments, the homology sequence is not greater than 100 nucleotides.
- a crRNA comprising a homology sequence is part of a dual nucleic acid system having an intermediary RNA, wherein the intermediary RNA comprises an extension sequence.
- the extension sequence of the intermediary RNA comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides.
- the extension sequence of the intermediary RNA is not greater than 100 nucleotides.
- the homology sequence has 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more mismatched nucleotides to the extension sequence.
- a crRNA may include deoxyribonucleosides, ribonucleosides, chemically modified nucleosides, or any combination thereof.
- a crRNA comprises about: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 linked nucleotides.
- a crRNA comprises at least: 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 linked nucleotides.
- the length of the crRNA is about 20 to about 120 linked nucleotides. In some embodiments, the length of a crRNA is about 20 to about 100, about 30 to about 100, about 40 to about 100, about 40 to about 90, about 40 to about 80, about 40 to about 70, about 40 to about 60, about 40 to about 50, about 50 to about 90, about 50 to about 80, about 50 to about 70, or about 50 to about 60 linked nucleotides. In some embodiments, the length of a crRNA is about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70 or about 75 linked nucleotides.
- the repeat sequence is between 5 and 10, 10 and 50, 12 and 48, 14 and 46, 16 and 44, or 18 and 42 nucleotides in length.
- a spacer sequence comprises at least 5 to about 50 linked nucleotides.
- a spacer sequence comprises at least 5 to about 50, at least 5 to about 25, at least about 10 to at least about 25, or at least about 15 to about 25 linked nucleotides.
- the spacer sequence comprises 15-28 linked nucleotides.
- a spacer sequence comprises 15-26, 15-24, 15-22, 15-20, 15-18, 16-28, 16-26, 16-24, 16-22, 16-20, 16-18, 17-26, 17-24, 17-22, 17-20, 17-18, 18-26, 18-24, or 18-22 linked nucleotides.
- the spacer sequence comprises 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides.
- the spacer sequence need not be 100% complementary to a target sequence of a target nucleic acid to hybridize or hybridize specifically to the target sequence.
- the spacer sequence may comprise at least one alteration, such as a substituted or modified nucleotide, that is not complementary to the corresponding nucleotide of the target sequence.
- a spacer sequence comprises at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, or at least 15 contiguous nucleotides that are complementary to a target sequence in a target nucleic acid.
- the spacer sequence comprises at least 10 contiguous nucleotides that are complementary to the target sequence in the target nucleic acid. In some embodiments, the spacer sequence is at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% complementary to the target sequence.
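The complementarity thresholds above (percent complementarity and a minimum run of contiguous complementary nucleotides) can be evaluated as sketched below. This is an illustrative sketch only: it assumes the spacer and the target sequence are the same length, both written 5’ to 3’, and paired antiparallel with Watson-Crick pairs only (G-U wobble pairs are not counted). The function names are hypothetical.

```python
# Watson-Crick pairs between an RNA or DNA spacer and an RNA or DNA target.
_PAIRS = {("A", "T"), ("A", "U"), ("T", "A"), ("U", "A"), ("G", "C"), ("C", "G")}


def _pairing(spacer: str, target: str) -> list:
    """Return a list of booleans, one per spacer position (5' to 3')."""
    spacer, target = spacer.upper(), target.upper()
    if not spacer or len(spacer) != len(target):
        raise ValueError("spacer and target must be non-empty and equal in length")
    # Antiparallel pairing: spacer position i faces target position n-1-i.
    return [(s, t) in _PAIRS for s, t in zip(spacer, reversed(target))]


def percent_complementary(spacer: str, target: str) -> float:
    """Percentage of spacer positions that pair with the target sequence."""
    paired = _pairing(spacer, target)
    return 100.0 * sum(paired) / len(paired)


def longest_complementary_run(spacer: str, target: str) -> int:
    """Length of the longest stretch of contiguous complementary nucleotides."""
    best = run = 0
    for ok in _pairing(spacer, target):
        run = run + 1 if ok else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    target = "ATGCATGCAT"          # target sequence, 5' to 3'
    spacer = "AUGCCUGCAU"          # RNA spacer with one mismatch
    print(percent_complementary(spacer, target))
    print(longest_complementary_run(spacer, target))
```

A spacer could then be screened against a chosen threshold, for example at least 80% complementarity together with at least 10 contiguous complementary nucleotides.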
- an intermediary RNA comprises a protein binding sequence that can form a secondary structure (e.g., hairpin, stem-loop), wherein the secondary structure can be recognized and bound by an effector protein.
- an intermediary RNA comprises a repeat hybridization sequence that hybridizes to at least a portion of a repeat sequence of a crRNA.
- a repeat hybridization sequence is at the 3’ end of an intermediary RNA.
- Type V Cas proteins bind an intermediary RNA with a repeat hybridization sequence at the 3’ end of the intermediary RNA.
- a repeat hybridization sequence is at the 5’ end of an intermediary RNA.
- Type II Cas proteins bind an intermediary RNA with a repeat hybridization sequence at the 5’ end of the intermediary RNA.
- the length of the repeat hybridization sequence is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, or 20 linked nucleotides.
- the length of the repeat hybridization sequence is 1 to 10, 10-20, or 20-30 linked nucleotides.
- the 3’ end of the intermediary RNA (in the case of Type V systems) or the 3’ end of the crRNA (in the case of Type II systems) is complementary or hybridizable to the 5’ end of a template sequence described herein.
- at least 2, at least 3, at least 4, at least 5, or at least 10 nucleotides are complementary.
- production of an intermediary RNA for a Type V effector protein is driven by a U6 promoter which adds three uracils to the 3’ end of the intermediary RNA.
- the 5’ end of the template sequence of the crRNA, to which the 3’ end of the intermediary RNA hybridizes, may comprise three adenosines.
- the intermediary RNA comprises an extension sequence.
- an intermediary RNA comprising an extension sequence is an extended intRNA.
- systems comprise a Type II Cas effector protein and an extended intermediary RNA, wherein the extended intermediary RNA comprises an extension sequence at the 5’ end of the intermediary RNA.
- the intermediary RNA does not comprise an extension sequence at the 3’ end of the intermediary RNA.
- the extension sequence comprises a template sequence.
- the extension sequence comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides.
- the template sequence comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides.
- the extension sequence is not greater than 200, 500, or 1000 nucleotides.
- the extension sequence is not greater than 10 kb or not greater than 20 kb.
- the template sequence cannot hybridize to the target sequence or the reverse complement thereof.
- the template sequence is less than 100%, less than 99%, less than 98%, less than 95%, less than 90%, less than 80%, less than 70%, less than 60%, or less than 50% complementary to the target sequence or the reverse complement thereof.
- the intermediary RNA does not comprise an extension sequence on the 3’ end.
- the intermediary RNA consists essentially of a protein binding sequence, optionally a linker sequence, and a repeat hybridization sequence.
- the intermediary RNA comprises an extension sequence at its 3’ end that is not complementary to a portion of a target sequence.
- the portion of the target sequence comprises at least 2, 3, 4, 5, 6, 7, 8, 9 or 10 nucleotides.
- the intermediary RNA comprises a homology sequence. In some embodiments, the intermediary RNA comprises a homology sequence, wherein the effector protein is a type V Cas protein. In certain embodiments, an intermediary RNA comprising a homology sequence does not comprise an extension sequence. In some embodiments, the homology sequence is located at the 5’ end of the intermediary RNA. In some embodiments, the homology sequence is located at the 3’ end of the intermediary RNA. In some embodiments, the homology sequence comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides. In some embodiments, the homology sequence is not greater than 100 nucleotides.
- an intermediary RNA comprising a homology sequence is part of a dual nucleic acid system wherein the crRNA comprises an extension sequence.
- the extension sequence comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides. In some embodiments, the extension sequence is not greater than 100 nucleotides. In some embodiments, the homology sequence has 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more mismatched nucleotides to the extension sequence.
- a length of the protein binding sequence is at least 30, 50, 70, 90, 110, 130, 150, 170, 190, or 210 linked nucleotides. In some embodiments, a length of the protein binding sequence is not greater than 30, 50, 70, 90, 110, 130, 150, 170, 190, or 210 linked nucleotides. In some embodiments, the length of the protein binding sequence is about 30 to about 210, about 60 to about 210, about 90 to about 210, about 120 to about 210, about 150 to about 210, about 180 to about 210, about 30 to about 180, about 60 to about 180, about 90 to about 180, about 120 to about 180, or about 150 to about 180 linked nucleotides.
- guide nucleic acids comprise one or more linkers connecting different nucleotide sequences as described herein.
- a linker may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more nucleotides.
- a linker may be any suitable linker, examples of which are described herein.
- a linker is a degradable linker or a cleavable linker. In some embodiments, a linker is a self-cleavable linker. Examples of self-cleavable polypeptide linkers include T2A
- Effector proteins of the present disclosure may cleave or nick a target nucleic acid within or near a protospacer adjacent motif (PAM) sequence of the target nucleic acid.
- the target nucleic acid is a double stranded nucleic acid comprising a target strand and a non-target strand.
- cleavage occurs within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides of a 5’ or 3’ terminus of a PAM sequence.
- effector proteins described herein recognize a PAM sequence.
- recognizing a PAM sequence comprises interacting with a sequence adjacent to the PAM.
- a target nucleic acid comprises a target sequence that is adjacent to a PAM sequence.
- the effector protein does not require a PAM to bind and/or cleave a target nucleic acid.
- PAMs are provided in TABLE 6.
- a target nucleic acid is a double stranded nucleic acid comprising a target strand and a non-target strand.
- the PAM sequence is located on the target strand.
- the PAM sequence is located on the non-target strand.
- the PAM sequence described herein is adjacent (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides) to the target sequence on the target strand or the non-target strand.
- the PAM sequence is located 5’ of the target sequence on the non-target strand.
- such a PAM described herein is directly adjacent to the target sequence on the target strand or the non-target strand.
- an RNP cleaves the target strand or the non-target strand. In some embodiments, the RNP cleaves both the target strand and the non-target strand. In some embodiments, an RNP recognizes the PAM sequence and hybridizes to a target sequence of the target nucleic acid. In some embodiments, the RNP cleaves the target nucleic acid, wherein the RNP has recognized the PAM sequence and is hybridized to the target sequence. In some embodiments, cleavage occurs within 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 nucleotides of a 5’ or 3’ terminus of a PAM sequence.
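For the arrangement in which the PAM is directly adjacent to, and 5’ of, the target-adjacent sequence on the non-target strand, candidate sites can be located by a simple pattern search. The sketch below is illustrative only: the PAM pattern `TTTN` and the protospacer length are placeholders chosen for the example and are not taken from TABLE 6, and the function name is hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_sites(non_target_strand: str, pam: str, protospacer_len: int) -> list:
    """Find PAMs that are immediately 5' of a protospacer on the given strand.

    Returns (start_index, pam_sequence, protospacer_sequence) tuples, where
    start_index is the 0-based position of the first PAM nucleotide.
    """
    seq = non_target_strand.upper()
    pam_re = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead lets overlapping candidate sites be reported as well.
    pattern = re.compile(f"(?=({pam_re})([ACGT]{{{protospacer_len}}}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    strand = "GGTTTACCGTACGATCGATCGG"   # non-target strand, 5' to 3'
    for site in find_sites(strand, pam="TTTN", protospacer_len=5):
        print(site)
```

Searching the reverse complement of the same strand with the same function would cover sites on the opposite strand.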
- compositions, systems and methods for modifying a target nucleic acid are disclosed herein.
- the target nucleic acid is a double stranded DNA molecule.
- Non-limiting examples of target nucleic acids are provided in TABLE 6.
- target nucleic acids described herein comprise a mutation.
- a composition, system or method described herein can be used to edit a target nucleic acid comprising a mutation such that the mutation is edited to be the wild-type nucleotide or nucleotide sequence.
- a composition, system or method described herein can be used to detect a target nucleic acid comprising a mutation.
- a mutation may result in the insertion of at least one amino acid in a protein encoded by the target nucleic acid.
- a mutation may result in the deletion of at least one amino acid in a protein encoded by the target nucleic acid.
- a mutation may result in the substitution of at least one amino acid in a protein encoded by the target nucleic acid.
- a mutation that results in the deletion, insertion, or substitution of one or more amino acids of a protein encoded by the target nucleic acid may result in misfolding of a protein encoded by the target nucleic acid.
- a mutation may result in a premature stop codon, thereby resulting in a truncation of the encoded protein.
- Non-limiting examples of mutations are insertion-deletion (indel), a point mutation, single nucleotide polymorphism (SNP), a chromosomal mutation, a copy number mutation or variation, and frameshift mutations.
- an indel mutation is an insertion or deletion of one or more nucleotides.
- a point mutation comprises a substitution, insertion, or deletion.
- a frameshift mutation occurs when the number of nucleotides in the insertion/deletion is not divisible by three, and it occurs in a protein coding region.
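The frameshift condition stated above (a net insertion or deletion whose length is not divisible by three, within a protein coding region) reduces to a one-line check. The following is a minimal sketch; the function name and the reference/alternate allele representation are assumptions made for illustration.

```python
def is_frameshift(ref_allele: str, alt_allele: str, in_coding_region: bool) -> bool:
    """Return True if an indel shifts the reading frame.

    The net indel length is the difference in length between the alternate
    and reference alleles; a frameshift requires that this length is nonzero,
    not divisible by three, and that the indel lies in a coding region.
    """
    net_len = abs(len(alt_allele) - len(ref_allele))
    return in_coding_region and net_len != 0 and net_len % 3 != 0


if __name__ == "__main__":
    print(is_frameshift("A", "AT", in_coding_region=True))     # 1-nt insertion
    print(is_frameshift("A", "ATGC", in_coding_region=True))   # 3-nt insertion
```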
- a chromosomal mutation can comprise an inversion, a deletion, a duplication, or a translocation of one or more nucleotides.
- a copy number variation can comprise a gene amplification or an expanding trinucleotide repeat.
- an SNP is associated with a phenotype of the sample or a phenotype of the organism from which the sample was taken.
- an SNP is associated with altered phenotype from wild type phenotype.
- the SNP is a synonymous substitution or a nonsynonymous substitution.
- the nonsynonymous substitution is a missense substitution or a nonsense point mutation.
- the synonymous substitution is a silent substitution.
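The distinction drawn above between synonymous (silent), missense, and nonsense substitutions can be illustrated by translating the reference and altered codons. The sketch below assumes the standard genetic code and DNA codons written 5’ to 3’; the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-nucleotide substitution within a codon."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError("codons must be three nucleotides long")
    if sum(a != b for a, b in zip(ref, alt)) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if ref_aa == alt_aa:
        return "synonymous"      # silent: encoded amino acid unchanged
    if alt_aa == "*":
        return "nonsense"        # premature stop codon
    return "missense"            # different amino acid encoded


if __name__ == "__main__":
    print(classify_substitution("GAA", "GAG"))
    print(classify_substitution("GAA", "GTA"))
    print(classify_substitution("TGG", "TGA"))
```

A substitution that converts a stop codon into a sense codon is reported as missense by this sketch; a fuller classifier might treat that case separately.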
- compositions, systems, and methods described herein comprise a nucleic acid expression vector or a use thereof.
- the nucleic acid of interest comprises a nucleotide sequence that encodes one or more components of the composition or system described herein.
- a vector may be part of a vector system.
- the vector system may comprise a library of vectors each encoding one or more component of a composition or system described herein.
- components described herein (e.g., an effector protein, a guide nucleic acid, a reverse transcriptase, an intermediary RNA, an extended crRNA, or a combination thereof) are each encoded by different vectors of the system.
- a vector may comprise or encode one or more regulatory elements. Regulatory elements may refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate transcription of a non-coding sequence or a coding sequence and/or regulate translation of an encoded polypeptide.
- a vector may comprise or encode for one or more additional elements, such as, for example, replication origins, antibiotic resistance (or a nucleic acid encoding the same), a tag (or a nucleic acid encoding the same), selectable markers, and the like.
- a vector comprises or encodes for one or more elements, such as, for example, ribosome binding sites, and RNA splice sites.
- Vectors described herein generally encode a promoter - a regulatory region on a nucleic acid, such as a DNA sequence, capable of initiating transcription of a downstream (3' direction) coding or non-coding sequence.
- a promoter can be linked at its 3' terminus to a nucleic acid, the expression or transcription of which is desired, and extends upstream (5' direction) to include bases or elements necessary to initiate transcription or induce expression, which could be measured at a detectable level.
- a promoter can comprise a nucleotide sequence, referred to herein as a “promoter sequence”.
- the promoter sequence can include a transcription initiation site, and one or more protein binding domains responsible for the binding of transcription machinery, such as RNA polymerase.
- When eukaryotic promoters are used, such promoters can contain “TATA” boxes and “CAT” boxes.
- Various promoters, including inducible promoters, may be used to drive expression, i.e., transcriptional activation, of the nucleic acid of interest. Accordingly, in some embodiments, the nucleic acid of interest can be operably linked to a promoter.
- Promoters may be any suitable type of promoter envisioned for the compositions, systems, and methods described herein. Examples include constitutively active promoters (e.g., CMV promoter), inducible promoters (e.g., heat shock promoter, tetracycline-regulated promoter, steroid-regulated promoter, metal-regulated promoter, estrogen receptor-regulated promoter, etc.), spatially restricted and/or temporally restricted promoters (e.g., a tissue specific promoter, a cell type specific promoter, etc.), etc.
- Suitable promoters include, but are not limited to: SV40 early promoter; mouse mammary tumor virus long terminal repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter; a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE); a Rous sarcoma virus (RSV) promoter; a human U6 small nuclear promoter (U6); an enhanced U6 promoter; and a human H1 promoter (H1).
- vectors used for providing a nucleic acid that, when transcribed, produces a guide nucleic acid and/or a nucleic acid that encodes an effector protein to a cell may include nucleic acid sequences that encode for selectable markers in the target cells, so as to identify cells that have taken up the guide nucleic acid and/or the effector protein.
- vectors provided herein comprise at least one promoter or a combination of promoters driving expression or transcription of one or more genome editing tools described herein.
- the vector comprises a nucleotide sequence of a promoter.
- the vector comprises two promoters.
- the vector comprises three promoters.
- a length of the promoter is less than about 500, less than about 400, less than about 300, or less than about 200 linked nucleotides.
- a length of the promoter is at least 100, at least 200, at least 300, at least 400, or at least 500 linked nucleotides.
- Non-limiting examples of promoters include CMV, 7SK, EF1a, RPBSA, hPGK, EFS, SV40, PGK1, Ubc, human beta actin, TRE, UAS, Ac5, Polyhedrin, CaMKIIa, GAL1-10, H1, TEF1, GDS, ADH1, CaMV35S, HSV TK, Ubi, U6, MNDU3, MSCV, MND, and CAG.
- the promoter is a constitutive promoter. In some embodiments, the promoter is an inducible promoter. In some embodiments, the inducible promoter only drives expression of its corresponding coding sequence (e.g., polypeptide or guide nucleic acid) when a signal is present, e.g., a hormone, a small molecule, a peptide.
- Non-limiting examples of inducible promoters are the T7 RNA polymerase promoter, the T3 RNA polymerase promoter, the isopropyl-beta-D-thiogalactopyranoside (IPTG)-regulated promoter, a lactose induced promoter, a heat shock promoter, a tetracycline-regulated promoter (tetracycline-inducible or tetracycline-repressible), a steroid regulated promoter, a metal-regulated promoter, and an estrogen receptor-regulated promoter.
- the promoters are prokaryotic promoters (e.g., drive expression of a gene in a prokaryotic cell).
- the promoters are eukaryotic promoters, (e.g., drive expression of a gene in a eukaryotic cell).
- the promoter is EF1a.
- the promoter is ubiquitin.
- vectors are bicistronic or polycistronic vectors (e.g., having or involving two or more loci responsible for generating a protein) having an internal ribosome entry site (IRES) for translation initiation in a cap-independent manner.
- a vector described herein is a nucleic acid expression vector. In some embodiments, a vector described herein is a recombinant expression vector. In some embodiments, a vector described herein is a messenger RNA. In some embodiments, a vector comprises the recombinant nucleic acid as described herein, wherein the vector is a viral vector, an adeno-associated viral (AAV) vector, a retroviral vector, or a lentiviral vector. In some embodiments, a vector described herein or a recombinant nucleic acid described herein is comprised in a cell. In some embodiments, a recombinant nucleic acid is integrated into a genomic DNA sequence of the cell, wherein the cell is a eukaryotic cell or a prokaryotic cell.
- a vector described herein is a delivery vector.
- the delivery vector is a eukaryotic vector, a prokaryotic vector (e.g., a bacterial vector), a viral vector, or any combination thereof.
- the delivery vehicle is a non-viral vector.
- the delivery vector is a plasmid.
- the plasmid comprises DNA.
- the plasmid comprises RNA.
- the plasmid comprises circular double-stranded DNA.
- the plasmid is linear.
- the plasmid comprises one or more coding sequences of interest and one or more regulatory elements.
- the plasmid comprises a bacterial backbone containing an origin of replication and an antibiotic resistance gene or other selectable marker for plasmid amplification in bacteria.
- the plasmid is a minicircle plasmid.
- the plasmid contains one or more genes that provide a selective marker to induce a target cell to retain the plasmid.
- the plasmids are engineered through synthetic or other suitable means known in the art.
- the genetic elements are assembled by restriction digest of the desired genetic sequence from a donor plasmid or organism to produce DNA ends that are then readily ligated to another genetic sequence.
- vectors comprise an enhancer.
- Enhancers are nucleotide sequences that have the effect of enhancing promoter activity.
- enhancers augment transcription regardless of the orientation of their sequence.
- enhancers activate transcription from a distance of several kilobase pairs.
- enhancers are located optionally upstream or downstream of a gene region to be transcribed, and/or located within the gene, to activate the transcription.
- Exemplary enhancers include, but are not limited to, WPRE; CMV enhancers; the R-U5' segment in LTR of HTLV-I.
- a vector described herein comprises a viral vector.
- the viral vector comprises a nucleic acid to be delivered into a host cell by a recombinantly produced virus or viral particle.
- the nucleic acid may be single-stranded or double stranded, linear or circular, segmented or non-segmented.
- the nucleic acid may comprise DNA, RNA, or a combination thereof.
- the vector is an adeno-associated viral vector.
- viral vectors are associated with various types of viruses, including but not limited to retroviruses (e.g., lentiviruses and γ-retroviruses), adenoviruses, arenaviruses, alphaviruses, adeno-associated viruses (AAVs), baculoviruses, vaccinia viruses, herpes simplex viruses and poxviruses.
- the vector is an adeno-associated viral (AAV) vector.
- the viral vector is a recombinant viral vector.
- the vector is a retroviral vector.
- the retroviral vector comprises gamma-retroviral vector.
- a viral vector provided herein may be derived from or based on any such virus.
- the gamma-retroviral vector is derived from a Moloney Murine Leukemia Virus (MoMLV, MMLV, MuLV, or MLV) or a Murine Stem cell Virus (MSCV) genome.
- the lentiviral vector is derived from the human immunodeficiency virus (HIV) genome.
- the viral vector is a chimeric viral vector.
- the chimeric viral vector comprises viral portions from two or more viruses.
- the viral vector corresponds to a virus of a specific serotype.
- a viral vector is an adeno-associated viral vector (AAV vector).
- a viral particle that delivers a viral vector described herein is an AAV.
- the AAV comprises any AAV known in the art.
- the viral vector corresponds to a virus of a specific AAV serotype.
- the AAV serotype is selected from an AAV1 serotype, an AAV2 serotype, an AAV3 serotype, an AAV4 serotype, an AAV5 serotype, an AAV6 serotype, an AAV7 serotype, an AAV8 serotype, an AAV9 serotype, an AAV10 serotype, an AAV11 serotype, an AAV12 serotype, an AAV-rh10 serotype, and any combination, derivative, or variant thereof.
- the AAV vector is a recombinant vector, a hybrid AAV vector, a chimeric AAV vector, a self-complementary AAV (scAAV) vector, a single-stranded AAV, or any combination thereof.
- scAAV genomes are generally known in the art and contain both DNA strands which can anneal together to form double-stranded DNA.
- an AAV vector described herein is a chimeric AAV vector.
- the chimeric AAV vector comprises an exogenous amino acid or an amino acid substitution, or capsid proteins from two or more serotypes.
- a chimeric AAV vector may be genetically engineered to increase transduction efficiency, selectivity, or a combination thereof.
- an AAV vector described herein comprises two inverted terminal repeats (ITRs).
- the viral vector provided herein comprises two inverted terminal repeats of AAV.
- a nucleotide sequence between the ITRs of an AAV vector provided herein comprises a sequence encoding genome editing tools.
- the genome editing tools comprise a nucleic acid encoding one or more effector proteins, a nucleic acid encoding one or more fusion proteins (e.g., a nuclear localization signal (NLS), polyA tail), one or more guide nucleic acids, a nucleic acid encoding the one or more guide nucleic acids, respective promoter(s), one or more donor nucleic acids, or any combinations thereof.
- viral vectors provided herein comprise at least one promoter or a combination of promoters driving expression or transcription of one or more genome editing tools described herein.
- a coding region of the AAV vector forms an intramolecular double-stranded DNA template thereby generating the AAV vector that is a self-complementary AAV (scAAV) vector.
- the scAAV vector comprises the sequence encoding genome editing tools that has a length of about 2 kb to about 3 kb.
- the AAV vector provided herein is a self-inactivating AAV vector.
- the AAV vector provided herein comprises a modification, such as an insertion, deletion, chemical alteration, or synthetic modification, relative to a wild-type AAV vector.
- methods of producing AAV delivery vectors herein comprise packaging a nucleic acid encoding an effector protein and a guide nucleic acid, or a combination thereof, into an AAV vector.
- methods of producing the delivery vector comprise: (a) contacting a cell with at least one nucleic acid encoding: (i) a guide nucleic acid; (ii) a Replication (Rep) gene; and (iii) a Capsid (Cap) gene that encodes an AAV capsid protein; (b) expressing the AAV capsid protein in the cell; (c) assembling an AAV particle; and (d) packaging an effector encoding nucleic acid into the AAV particle, thereby generating an AAV delivery vector.
- promoters, stuffer sequences, and any combination thereof may be packaged in the AAV vector.
- the AAV vector may package 1, 2, 3, 4, or 5 guide nucleic acids or copies thereof.
- the AAV vector comprises inverted terminal repeats, e.g., a 5’ inverted terminal repeat and a 3’ inverted terminal repeat.
- the AAV vector comprises a mutated inverted terminal repeat that lacks a terminal resolution site.
- a hybrid AAV vector is produced by transcapsidation, e.g., packaging an inverted terminal repeat (ITR) from a first serotype into a capsid of a second serotype, wherein the first and second serotypes may be different.
- the Rep gene and ITR from a first AAV serotype e.g., AAV2
- a second AAV serotype e.g., AAV9
- a hybrid AAV serotype comprising the AAV2 ITRs and AAV9 capsid protein may be indicated AAV2/9.
- the hybrid AAV delivery vector comprises an AAV2/1, AAV2/2, AAV2/4, AAV2/5, AAV2/8, or AAV2/9 vector.
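The hybrid naming convention described above (ITR serotype first, capsid serotype second, so that AAV2 ITRs in an AAV9 capsid are written AAV2/9) can be sketched as a small helper. This is only an illustration: the function name `hybrid_aav_name` and the expected input format ("AAV" followed by a serotype suffix) are assumptions and do not come from the disclosure.

```python
import re

# Hypothetical input format: "AAV" followed by a serotype suffix, e.g. "AAV2".
_SEROTYPE = re.compile(r"^AAV(?P<suffix>\w+)$")


def hybrid_aav_name(itr_serotype: str, capsid_serotype: str) -> str:
    """Label a transcapsidated vector as AAV<ITR serotype>/<capsid serotype>."""
    itr = _SEROTYPE.match(itr_serotype)
    cap = _SEROTYPE.match(capsid_serotype)
    if itr is None or cap is None:
        raise ValueError("serotypes must look like 'AAV2' or 'AAV9'")
    return f"AAV{itr['suffix']}/{cap['suffix']}"


# AAV2 ITRs packaged into an AAV9 capsid.
print(hybrid_aav_name("AAV2", "AAV9"))
```
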
- AAV particles described herein are recombinant AAV (rAAV).
- rAAV particles are generated by transfecting AAV producing cells with an AAV-containing plasmid carrying the sequence encoding the genome editing tools, a plasmid that carries viral encoding regions, i.e., Rep and Cap gene regions; and a plasmid that provides the helper genes such as E1A, E1B, E2A, E4ORF6 and VA.
- the AAV producing cells are mammalian cells.
- host cells for rAAV viral particle production are mammalian cells.
- a mammalian cell for rAAV viral particle production is a COS cell, a HEK293T cell, a HeLa cell, a KB cell, a variant thereof, or a combination thereof.
- rAAV virus particles can be produced in the mammalian cell culture system by providing the rAAV plasmid to the mammalian cell.
- producing rAAV virus particles in a mammalian cell comprises transfecting vectors that express the rep protein, the capsid protein, and the gene-of-interest expression construct flanked by the ITR sequence on the 5’ and 3’ ends.
- rAAV is produced in a non-mammalian cell.
- rAAV is produced in an insect cell.
- the insect cell for producing rAAV viral particles comprises a Sf9 cell.
- production of rAAV virus particles in insect cells may comprise baculovirus.
- production of rAAV virus particles in insect cells may comprise infecting the insect cells with three recombinant baculoviruses, one carrying the cap gene, one carrying the rep gene, and one carrying the gene-of-interest expression construct enclosed by an ITR on both the 5’ and 3’ end.
- rAAV virus particles are produced by the One Bac system.
- rAAV virus particles can be produced by the Two Bac system.
- the rep gene and the cap gene of the AAV are integrated into one baculovirus genome, and the ITR sequence and the gene-of-interest expression construct are integrated into another baculovirus genome.
- an insect cell line that expresses both the rep protein and the capsid protein is established and infected with a baculovirus whose genome carries the integrated ITR sequence and gene-of-interest expression construct. Details of such processes are provided in, for example, Smith et al., (1983), Mol. Cell.
- compositions and systems provided herein comprise a lipid particle.
- a lipid particle is a lipid nanoparticle (LNP).
- LNPs are a non-viral delivery system for delivery of the composition and/or system components described herein. LNPs are particularly effective for delivery of nucleic acids. Beneficial properties of LNPs include ease of manufacture, low cytotoxicity and immunogenicity, high efficiency of nucleic acid encapsulation and cell transfection, multi-dosing capabilities and flexibility of design (Kulkarni et al., (2016) Nucleic Acid Therapeutics, 28(3): 146-157).
- compositions and methods comprise a lipid, polymer, nanoparticle, or a combination thereof, or use thereof, to introduce one or more effector proteins, one or more guide nucleic acids, one or more donor nucleic acids, or any combinations thereof to a cell.
- lipids and polymers are cationic polymers, cationic lipids, ionizable lipids, or bio-responsive polymers.
- the ionizable lipids exploit chemical-physical properties of the endosomal environment (e.g., pH), offering improved delivery of nucleic acids.
- the ionizable lipids are neutral at physiological pH.
- the ionizable lipids are protonated under acidic pH.
- the bio-responsive polymer exploits chemical-physical properties of the endosomal environment (e.g., pH) to preferentially release the genetic material in the intracellular space.
- a LNP comprises an outer shell and an inner core.
- the outer shell comprises lipids.
- the lipids comprise modified lipids.
- the modified lipids comprise pegylated lipids.
- the lipids comprise one or more of cationic lipids, anionic lipids, ionizable lipids, and non-ionic lipids.
- the LNP comprises one or more of N1,N3,N5-tris(3-(didodecylamino)propyl)benzene-1,3,5-tricarboxamide (TT3), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoethanolamine (POPE), 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), cholesterol (Chol), 1,2-dimyristoyl-sn-glycerol, methoxypolyethylene glycol (DMG-PEG2000), and derivatives, analogs, or variants thereof.
- the LNP has a negative net overall charge prior to complexation with one or more of a guide nucleic acid, a nucleic acid encoding the one or more guide nucleic acid, a nucleic acid encoding the effector protein, and/or a donor nucleic acid.
- the inner core is a hydrophobic core.
- the one or more of a guide nucleic acid, the nucleic acid encoding the one or more guide nucleic acid, the nucleic acid encoding the effector protein, and/or the donor nucleic acid forms a complex with one or more of the cationic lipids and the ionizable lipids.
- the nucleic acid encoding the effector protein or the nucleic acid encoding the guide nucleic acid is self-replicating.
- a LNP comprises one or more of cationic lipids, ionizable lipids, and modified versions thereof.
- the ionizable lipid comprises TT3 or a derivative thereof.
- the LNP comprises one or more of TT3 and pegylated TT3.
- the publication WO2016187531 is hereby incorporated by reference in its entirety, which describes representative LNP formulations in Table 2 and Table 3, and representative methods of delivering LNP formulations in Example 7.
- a LNP comprises a lipid composition targeting to a specific organ.
- the lipid composition comprises lipids having a specific alkyl chain length that controls accumulation of the LNP in the specific organ (e.g., liver or spleen).
- the lipid composition comprises a biomimetic lipid that controls accumulation of the LNP in the specific organ (e.g., brain).
- the lipid composition comprises lipid derivatives (e.g., cholesterol derivatives) that control accumulation of the LNP in a specific cell (e.g., liver endothelial cells, Kupffer cells, hepatocytes).
- administration of a non-viral vector comprises contacting a cell, such as a host cell, with the non-viral vector.
- a physical method or a chemical method is employed for delivering the vector into the cell.
- Exemplary physical methods include electroporation, gene gun, sonoporation, magnetofection, or hydrodynamic delivery.
- Exemplary chemical methods include delivery of the recombinant polynucleotide by liposomes, such as cationic lipids or neutral lipids; lipofection; dendrimers; lipid nanoparticles (LNPs); or cell-penetrating peptides.
- methods for modifying comprise cleaving a target nucleic acid and inserting a donor nucleic acid at the cut site of the target nucleic acid.
- Methods of modifying may comprise contacting a target nucleic acid with one or more components of the systems described herein.
- Methods may comprise contacting a cell that comprises a target nucleic acid with one or more components of the systems described herein.
- Methods may comprise delivering to a subject one or more components of the systems described herein.
- a cleaved target nucleic acid is repaired by homologous recombination (e.g., homology directed repair (HDR)) or non-homologous end joining (NHEJ).
- a double-stranded break in the target nucleic acid may be repaired (e.g., by NHEJ or HDR) after insertion of a donor nucleic acid.
- a nucleotide insertion and/or deletion, sometimes referred to as an indel, occurs at a cleavage site.
- An indel may vary in length (e.g., 1 to 1,000 nucleotides in length) and be detected using methods well known in the art, including sequencing.
- Indel percentage is the percentage of sequencing reads that show at least one nucleotide mutated as a result of the insertion and/or deletion of nucleotides, regardless of the size of the insertion or deletion or the number of nucleotides mutated. For example, if at least one nucleotide deletion is detected in a given target nucleic acid, it counts towards the percent indel value.
- As another example, if one copy of the target nucleic acid has one nucleotide deleted, and another copy of the target nucleic acid has 10 nucleotides deleted, they are counted the same. This number reflects the percentage of target nucleic acids that are edited by a given effector protein.
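The indel percentage defined above can be illustrated with a short calculation. The sketch below assumes a hypothetical input format, one integer per sequencing read giving the total number of inserted plus deleted nucleotides in that read. The function name and input format are illustrative and are not specified in the disclosure.

```python
from typing import Sequence


def indel_percentage(indel_sizes_per_read: Sequence[int]) -> float:
    """Percentage of reads with at least one inserted or deleted nucleotide.

    Each entry is the total indel size observed in one read (0 = unedited).
    The size of the indel does not matter: any read with size >= 1 counts once.
    """
    if not indel_sizes_per_read:
        raise ValueError("no reads supplied")
    edited = sum(1 for size in indel_sizes_per_read if size >= 1)
    return 100.0 * edited / len(indel_sizes_per_read)


# A 1-nt deletion and a 10-nt deletion are counted the same; two reads are unedited.
reads = [1, 10, 0, 0]
print(indel_percentage(reads))  # prints 50.0
```
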
- TABLE 1 provides illustrative amino acid sequences of effector proteins that are useful in the compositions, systems and methods described herein.
- TABLE 1.1 provides illustrative reverse transcriptases that are useful in the compositions, systems and methods described herein.
- TABLE 1.1 Exemplary Reverse Transcriptases
- TABLE 2 provides illustrative sequences of exemplary heterologous polypeptides useful in the compositions, systems and methods described herein.
- TABLE 3 provides illustrative repeat sequences for use in guide nucleic acids that are useful in the compositions, systems and methods described herein.
- TABLE 4 provides illustrative protein binding sequences for use in guide nucleic acids that are useful in the compositions, systems and methods described herein. TABLE 4. Exemplary Protein Binding Sequences for use in Guide Nucleic Acids
- TABLE 5 provides illustrative PAM sequences that are useful in the compositions, systems and methods described herein.
- TABLE 6 provides illustrative target nucleic acids that are useful in the compositions, systems and methods described herein.
- CA4 CACNA1A, CAH1, CAPN3, CASR, CBS, CCNB1, CC2D2A, CCR5, CD1, CD2, CD3, CD3D, CD3Z, CD4, CD5, CD6, CD7, CD8A, CD8B, CD9, CD14, CD18, CD19, CD21, CD22, CD23, CD27, CD28, CD30, CD33, CD34, CD36, CD38, CD40, CD40L, CD44, CD46, CD47, CD48, CD52, CD55, CD57, CD58, CD59, CD68, CD69, CD72, CD73, CD74, CD79A, CD80, CD81, CD83, CD84, CD86, CD90, CD93, CD96, CD99, CD100, CD123, CD160, CD163, CD164, CD164L2, CD166, CD200, CD204, CD207, CD209, CD226, CD244, CD247, CD274, CD276, CD300, CD320, CDC73, CDH1, CD
- CIDEB, CIITA, CLN3, CLN5, CLN6, CLN8, CLRN1, CLTA, CMT1A, CNBP, CNGB1, CNGB3, COL1A1, COL1A2, COL27A1, COL4A3, COL4A4, COL4A5, COL6A1, COL6A2, COL6A3, COL7A1, CPS1, CPT1A, CPT2, CRB1, CREBBP, CRX, CRYAA, CTNNA1, CTNNB1, CTNND2, CTNS, CTSK, CXCL12, CYBA, CYBB, CYP11B1, CYP11B2, CYP17A1, CYP19A1, CYP21A2, CYP27A1, DBT, DCC, DCLRE1C, DERL2.
- TABLE 7 provides illustrative diseases and syndromes for compositions, systems and methods described herein.
- Example 1 Type V OCATS
- a fusion protein comprising a reverse transcriptase (RT) fused to the N terminus or C terminus of a Type V Cas effector protein is generated by constructing a plasmid that encodes the fusion protein.
- Nonlimiting examples of the RT are MLV-RT, a Group I intron RT, and a Group II intron RT.
- Non-limiting examples of Type V Cas effector proteins are CasM.265446 (SEQ ID NO: 1) and CasM.19952 (SEQ ID NO: 2).
- the plasmid also encodes an intermediary RNA and an extended crRNA.
- the intermediary RNA comprises from 5’ to 3’: a repeat hybridization sequence, and a protein binding sequence.
- the extended crRNA comprises from 5’ to 3’: a template sequence, a repeat sequence that hybridizes to the repeat hybridization sequence of the intermediary RNA, and a spacer sequence that hybridizes to a target sequence of a target strand of a dsDNA target nucleic acid.
- Eukaryotic cells are transfected with the plasmid.
- the plasmid is an AAV vector.
- the intermediary RNA and crRNA are synthesized or encoded by a second plasmid, and then complexed with the translated protein for delivery to the cells.
- the Type V Cas effector protein forms a ribonucleoprotein (RNP) complex with the intermediary RNA and crRNA.
- the RNP complex binds the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid.
- the RNP complex cleaves at least one strand of the R-loop to produce a cut site.
- the RT reverse transcribes the template sequence to produce a RT-DNA to be used as a ss donor nucleic acid at the 3’ end of the intermediary RNA.
- This edit is incorporated into the DNA via native repair mechanisms. Activity is confirmed by genomic sequencing of the incorporated edit encoded by the donor nucleic acid.
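The 5’ to 3’ layouts of the intermediary RNA and the extended crRNA in Example 1 can be sketched as simple string assembly. All sequences below are short made-up placeholders, not sequences from the disclosure, and the function names are illustrative. The sketch also assumes that the repeat hybridization sequence is the perfect reverse complement of the repeat, which is a simplification.

```python
def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A/U/G/C only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def build_extended_crrna(template: str, repeat: str, spacer: str) -> str:
    """Type V layout, 5' to 3': template, repeat, spacer."""
    return template + repeat + spacer


def build_intermediary_rna(repeat: str, protein_binding: str) -> str:
    """5' to 3': repeat hybridization sequence, protein binding sequence.

    Assumption: the hybridization sequence pairs perfectly with the repeat.
    """
    return reverse_complement_rna(repeat) + protein_binding


# Placeholder sequences (hypothetical, for illustration only).
template = "AUGGCUAGC"
repeat = "GUCUAACGACC"
spacer = "GGAUCCAUUGCAGUACGAUC"
protein_binding = "CUUCGGAAGCUU"

extended_crrna = build_extended_crrna(template, repeat, spacer)
intermediary_rna = build_intermediary_rna(repeat, protein_binding)
print(extended_crrna)
print(intermediary_rna)
```

In this layout the template sits 5’ of the repeat on the extended crRNA, so the annealed intermediary RNA can serve as the primer from which the RT copies the template.
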
- a plasmid encoding a fusion protein, intermediary RNA and extended crRNA is constructed as described in Example 1.
- the plasmid additionally encodes a domain capable of recruiting an endogenous DNA ligase expressed in eukaryotic cells.
- the plasmid encodes a DNA ligase.
- the protein binding domain or ligase is fused as part of a multi-domain fusion consisting of a Cas effector, an RT, and a recruitment domain or a ligase. This fusion can be in any order. Domains can be separated by linkers or can be recruited by domain-aptamer fusions, that recognize RNA sequences added to the intermediary RNA or crRNA.
- Eukaryotic cells are transfected with the plasmid.
- the plasmid is an AAV vector.
- the Type V Cas effector protein forms an RNP complex with the intermediary RNA and crRNA.
- the RNP complex binds the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid.
- the RNP complex cleaves at least one strand of the R-loop to produce a cut site.
- the RT reverse transcribes the template sequence to produce a RT-DNA to be used as a ss donor nucleic acid at the 3’ end of the intermediary RNA.
- the ligase is recruited to the cut site where it ligates insertion of the ss donor nucleic acid into the cut site. Resolution of the complex is enabled by DNA repair. Activity is confirmed by genomic sequencing of the incorporated edit encoded by the donor nucleic acid.
- Example 3 Type V OCATS with SDSA
- a plasmid encoding a fusion protein, intermediary RNA and extended crRNA is constructed as described in Example 1.
- the plasmid additionally encodes a peptide capable of interacting with protein(s) involved in synthesis dependent strand annealing (SDSA) which are endogenously expressed in eukaryotic cells.
- the peptide is fused as part of a multi-domain fusion consisting of a Cas effector, an RT, and a recruitment domain. This fusion can be in any order.
- Eukaryotic cells are transfected with the plasmid.
- the plasmid is an AAV vector.
- the Type V Cas effector protein forms a RNP complex with the intermediary RNA and crRNA.
- the complex binds the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid.
- the complex cleaves at least one strand of the R-loop to produce a cut site.
- the RT reverse transcribes the template sequence to produce a RT-DNA to be used as a ss donor nucleic acid at the 3’ end of the intermediary RNA.
- the SDSA related proteins are recruited to the cut site where the protein generates a second strand of DNA that is complementary to the ss donor nucleic acid to produce edited DNA that is directly attached to the genomic DNA.
- This edit is incorporated into the DNA via native repair mechanisms. Activity is confirmed by genomic sequencing of the incorporated edit encoded by the donor nucleic acid.
- a fusion protein comprising a reverse transcriptase (RT) fused to the N terminus or C terminus of a Type II Cas effector protein is generated by constructing a plasmid that encodes the fusion protein.
- Nonlimiting examples of the RT are MLV-RT, a Group I intron RT, and a Group II intron RT.
- a non-limiting example of a Type II Cas effector protein is SpCas9.
- the plasmid also encodes an extended intermediary RNA and a crRNA.
- the extended intermediary RNA comprises from 5’ to 3’: a template sequence, a repeat hybridization sequence, and a protein binding sequence.
- the crRNA comprises from 5’ to 3’: a spacer sequence that hybridizes to a target sequence of a target strand of a dsDNA target nucleic acid and a repeat sequence that hybridizes to the repeat hybridization sequence of the intermediary RNA.
- Eukaryotic cells are transfected with the plasmid.
- the plasmid is an AAV vector.
- the intermediary RNA and crRNA are synthesized or encoded by a second plasmid, and then complexed with the translated protein for delivery to the cells.
- the Type II Cas effector protein forms a RNP complex with the intermediary RNA and crRNA.
- the RNP complex binds the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid.
- the RNP complex cleaves at least one strand of the R-loop to produce a cut site.
- the RT reverse transcribes the template sequence to produce a RT-DNA to be used as a ss donor nucleic acid at the 3’ end of the crRNA. Activity is confirmed by genomic sequencing of the incorporated edit encoded by the donor nucleic acid.
- An AAV vector is constructed to express a fusion protein comprising a retron fused to the N terminus or C terminus of a Type V Cas effector protein.
- Non-limiting examples of retrons are the Ec86 retron and the Sal63 retron.
- Non-limiting examples of Type V Cas effector proteins are CasM.265446 (SEQ ID NO: 1) and CasM.19952 (SEQ ID NO: 2).
- the AAV vector additionally encodes an intermediary RNA, a crRNA, and a template sequence encoding RNA flanked by two secondary structures.
- Eukaryotic cells are transfected with the AAV vector.
- the Type V Cas effector protein forms a RNP complex with the intermediary RNA and crRNA.
- the complex binds the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid.
- the complex cleaves at least one strand of the R-loop to produce a cut site.
- the retron reverse transcribes the donor encoding RNA, thereby producing an RT-DNA to be used as a donor nucleic acid comprising secondary elements at the 5’ and 3’ ends of the donor.
- Activity is confirmed by genomic sequencing of the incorporated edit encoded by the donor nucleic acid.
- Example 6 CasM.265466 cleaves DNA with an extended crRNA and RT synthesizes RT-DNA to be used as a donor nucleic acid
- RT-DNA synthesis from intRNA-20, intRNA-22, and intRNA-23 was observed in the presence of CasM.265466, see FIG. 5.
- Expected product size was 230-250 nucleotides. Difference in observed size may be due to the resulting product being a DNA/RNA hybrid which will run differently than pure RNA on the gel.
- RT-DNA synthesis was also observed before cutting.
- FIG. 6 shows that RT is able to synthesize RT-DNA from the intRNA in the absence of CasM.265466.
- Expected product size was 230-250 nt. Difference in observed size may be due to the resulting product being a DNA/RNA hybrid which will run differently than pure RNA on the gel.
- RT-DNA is synthesized in presence of CasM.19952 and extended crRNA.
- Ribonucleoprotein (RNP) complexes were produced by incubating CasM.19952 with various intRNAs and extended crRNAs for 20 min at room temperature. Reverse transcriptase (RT) was added to the RNPs and incubated for 30 min at 37°C for OCAT synthesis. Synthesis occurred for intRNA-4 and intRNA-13 in the presence of CasM.19952, see FIG. 9A and FIG. 9B. IntRNA numbers in FIG. 9A and FIG. 9B correspond to the length of the intRNA. Expected product size was 230-250 nt.
- RT-DNA synthesis occurred in the presence of CasM.19952. Synthesis occurred for intRNA-5 and intRNA-6 in the presence of CasM.19952 (~300 nt). See FIG. 11A and FIG. 11B. RT was able to synthesize without the need for a longer intRNA. IntRNA-9 and intRNA-10 were negative controls due to lack of complementarity to crRNA-492F (with RT, left panel; without RT, right panel).
- RT-DNA is synthesized in presence of Cas9 and extended intRNA.
- SpCas9 SEQ ID NO: 5
- Ribonucleoprotein (RNP) complexes were formed by incubating CasM.265466 protein (200 nM) with intRNA (100 nM) and crRNA (100 nM) together for 20 min at room temperature. crRNAs and extended intRNAs used for this experiment are provided in TABLE 8 below; the extended portions of the intRNAs are lowercase. 1 µl of reverse transcriptase enzyme (RT; 200 units/µl) was added to the RNA and incubated for 30 min at 37°C for intRNA synthesis of RT-DNA. Target DNA was added after RT-DNA synthesis to test for cleavage and potential trans DNAse activity. Samples were run on a 15% denaturing polyacrylamide gel for analysis.
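The reaction setup above fixes the stoichiometry of the RNP components (200 nM protein, 100 nM intRNA, 100 nM crRNA) and the amount of RT added (1 µl at 200 units/µl). The arithmetic can be sketched as follows. The function names are illustrative assumptions, and the reaction volume is not stated in the text, so only ratios and total enzyme units are computed.

```python
from typing import Tuple


def molar_ratio(protein_nm: float, int_rna_nm: float, cr_rna_nm: float) -> Tuple[float, float, float]:
    """Protein : intRNA : crRNA ratio, normalized to the least abundant component."""
    smallest = min(protein_nm, int_rna_nm, cr_rna_nm)
    if smallest <= 0:
        raise ValueError("concentrations must be positive")
    return (protein_nm / smallest, int_rna_nm / smallest, cr_rna_nm / smallest)


def rt_units_added(volume_ul: float, units_per_ul: float) -> float:
    """Total reverse transcriptase units delivered by the added volume."""
    return volume_ul * units_per_ul


print(molar_ratio(200, 100, 100))  # protein is in 2-fold molar excess over each RNA
print(rt_units_added(1, 200))      # 200 units of RT per reaction
```
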
- Synthesis products were generated using crRNA 1471 F and intRNAs with homology sequence lengths of 13, 21 and 29 nucleotides.
- the homology sequence may allow reverse transcriptase access to the RNA for synthesis of RT-DNA (e.g., the longer the intRNA, the less the chance of the Cas nuclease blocking the RT for synthesis of RT-DNA).
- These homology sequences contained 1 nucleotide mismatch relative to the crRNA (with the exception of int-20, which has perfect homology to the crRNA).
- the expected synthesis product size was 250 nucleotides, but the actual size was closer to 300 nucleotides.
- FIG. 18 shows RNase digestion post synthesis to remove RNA for better visualization of DNA products; the expected DNA lengths post digestion are about 175 nt.
- a stepwise decrease in DNA length for RNAse digested samples was observed due to the differing lengths of the crRNA (longer intRNA substrate, shorter cDNA produced).
- FIG. 19A and FIG. 19B show that synthesized cDNA was not affected by any trans DNAse activity. Addition of target DNA post synthesis did not result in trans DNAse cleavage of cDNA products, and the synthesis product did not prevent proper target DNA cleavage.
- Ribonucleoprotein (RNP) complexes were formed by incubating CasM.19952 protein (200 nM) with intRNA (100 nM) and crRNA (100 nM) together for 20 min at room temperature. crRNAs and intRNAs used for this experiment are provided in TABLE 9 below (the repeat sequence of the crRNA is shown in bold font and the spacer sequence is shown in italicized font; the extended portions of the intRNAs are lowercase). 1 µl of reverse transcriptase enzyme (RT; 200 units/µl) was added to the RNA and incubated for 30 min at 37°C for intRNA synthesis of RT-DNA. Target DNA was added after synthesis to test for cleavage and potential trans DNAse activity. Samples were run on a 15% denaturing polyacrylamide gel for analysis.
- FIG. 20A represents various intRNAs that were tested with homology sequence lengths of 3 to 28 nucleotides.
- FIG. 20B and FIG. 20C show that RT was able to synthesize RT-DNA in the presence of intRNAs and CasM.19952. Synthesis products were generated using crRNA 480 F and intRNAs with homology sequence lengths of 10, 19 and 28 nucleotides. These homology sequences contained 1 nucleotide mismatch relative to the crRNA. The expected synthesis product size was 250 nucleotides, but the actual size was closer to 300 nucleotides. The difference in observed size may be due to the synthesis product being a DNA/RNA hybrid which will run differently than pure RNA on the gel. This was confirmed with RNA digestion. After RT synthesis, samples were heated at 95°C for 10 minutes, and RNase A was added, allowing digestion to occur at 37°C for 30 min.
- FIG. 21 shows RNase digestion post synthesis to remove RNA for better visualization of DNA products; expected DNA lengths post digestion are about 175 nt. A stepwise decrease in DNA length for RNase digested samples was observed due to the differing lengths of the crRNA (longer intRNA substrate, shorter cDNA produced).
- FIG. 22A and FIG. 22B show that synthesized cDNA was not affected by any trans DNAse activity. Addition of target DNA post synthesis did not result in trans DNAse cleavage of cDNA products. Additional guide designs were tested with similar results.
- FIG. 23A represents various intRNAs that were tested, and FIG. 23B and FIG. 23C show that RT was able to synthesize RT-DNA in the presence of intRNAs and CasM.19952. RNA digestion confirmed the synthesized DNA products (see FIG. 24A and FIG. 24B).
- Example 11 RT synthesizes RT-DNA in the presence of intRNA and SpCas9
- Ribonucleoprotein (RNP) complexes were formed by incubating SpCas9 protein (200 nM) with intRNA (100 nM) and crRNA (100 nM) together for 20 min at room temperature.
- crRNAs and intRNAs used for this experiment are provided in TABLE 10 below, (the repeat sequence of the crRNA is shown in bold font and the spacer sequence is shown in italicized font; the extended portion of the intRNAs are italicized; the int portion of the intRNA italicized and the extension in bold font).
- 1 µl of reverse transcriptase enzyme (RT; 200 units/µl) was added to the RNA and incubated for 30 min at 37°C for intRNA synthesis of RT-DNA.
- Target DNA was added after synthesis to test for cleavage and potential trans DNAse activity. Samples were run on a 15% denaturing polyacrylamide gel for analysis.
- FIG. 25A shows that SpCas9 can cleave dsDNA with a dual guide system of crRNA and intRNA (intRNA 4242).
- FIG. 25B shows that SpCas9 can still cleave dsDNA with a dual guide system containing an extended crRNA and an extended intRNA.
- Modified crRNAs (crRNA-36 to crRNA-39) had slightly decreased cleavage activity compared to WT crRNA (crRNA-40) and sgRNA.
- FIG. 26A shows a representation of different crRNAs that were tested with homology sequence lengths of 10, 17 and 20 nucleotides that were complementary to the 5’ end of the intRNA extension sequence.
- FIG. 26B shows that RT was able to synthesize RT-DNA in the presence of crRNAs and SpCas9. Synthesis products were generated using intRNA-4242R and crRNAs with homology sequence lengths of 10 and 20 nucleotides. The expected synthesis product size was 250 nucleotides, but the observed size was closer to 300 nucleotides. The difference in observed size may be due to the synthesized product being a DNA/RNA hybrid which will run differently than pure RNA on the gel.
- Target DNA was added after synthesized product was generated.
- FIG. 27 shows that the cDNA synthesis product does not affect the ability of SpCas9 to cleave target DNA.
Abstract
Provided herein are compositions, systems, and methods comprising effector proteins that function with one or more additional proteins (e.g., reverse transcriptases, or accessory proteins such as DNA repair proteins or endonucleases) to produce a donor DNA at the site of target DNA cleavage and uses thereof. These effector proteins may be characterized as CRISPR-associated (Cas) proteins. Various compositions, systems, and methods of the present disclosure may leverage the activities of these effector proteins for such applications as genome editing.
Description
ON CAS TEMPLATE SYNTHESIS (OCATS) SYSTEMS AND USES THEREOF
CROSS-REFERENCE
[1] This application claims the benefit of priority of U.S. Provisional Application No. 63/490,490, filed on March 15, 2023, and U.S. Provisional Application No. 63/502,909, filed on May 17, 2023, each of which is incorporated herein by reference in its entirety.
INCORPORATION BY REFERENCE OF SEQUENCE LISTING
[2] The instant application contains a Sequence Listing, which has been submitted via Patent Center. The Sequence Listing titled 203477-77660 l_PCT_SL.xml, which was created on March 7, 2024 and is 72,458 bytes in size, is hereby incorporated by reference in its entirety.
FIELD
[3] The present disclosure relates generally to compositions and methods for genome editing. In general, systems and compositions comprise an RNA guided effector protein (e.g., CRISPR associated (Cas) protein) that functions with one or more additional proteins (e.g., reverse transcriptase, DNA repair proteins, endonuclease) to produce a donor DNA at the site of target DNA cleavage. Methods of using such compositions and systems are also described.
SUMMARY
[4] Homologous recombination is a notoriously inefficient process for genome editing. In addition to homologous recombination being downregulated in many relevant cell types, the donor DNA must be exogenously supplied. Further, strategies such as homology independent targeted integration (HITI) presumably rely on non-homologous end joining (NHEJ) or other non-homology dependent repair mechanisms to incorporate DNA at the site of a break. These methods, too, are often inefficient. One potential solution for increasing the rates of homologous recombination (HR) or NHEJ mediated insertions is to recruit donors to the sites of the breaks.
[5] Disclosed herein are systems and methods that increase the rates of HR, other homology dependent forms of repair, or NHEJ mediated insertions by producing donor DNA at a Cas-mediated DNA cleavage site. These systems and methods employ a process that can be referred to as On Cas Template Synthesis (OCATS). These systems and methods provide a means to synthesize a donor DNA on an extended guide nucleic acid, resulting in the production of donor DNA at a Cas mediated DNA cleavage site. A Cas protein and guide nucleic acid complex recruits an RT to a cleavage site, where the RT reverse transcribes a template sequence, which is contained in an extended crRNA (for certain type V Cas proteins) or extended intermediary RNA (for certain type II Cas proteins), to produce a single stranded donor (ss donor) reverse-transcribed DNA (RT-DNA) that creates a free 3’ end available for DNA repair or other chemistry. A required primer is formed from the annealed, but non-extended intermediary RNA (for type V Cas proteins) or crRNA (for certain type II Cas proteins). Synthesis occurs directly on the intermediary RNA or crRNA. See FIGS. 1 to 3. Without being bound by theory, donor creation at the site of a Cas induced DNA break may increase the efficiency of non-homology based donor DNA insertion, or increase the likelihood of homology dependent forms of repair. Systems and methods disclosed herein may be used for precise, templated repair, lowering the amount of unwanted indels while increasing the level of desired genetic change.
[6] Strategies that others have developed to recruit target donor DNA to the site of a break rely on a Cas-retron fusion and covalent or noncovalent fusions of donor to a preformed RNP complex, prior to transfection. Retrons work by reverse transcribing an RNA sequence flanked by two secondary structural elements. The resulting reverse transcribed DNA (RT-DNA) contains unwanted retron elements flanking the 5’ and 3’ ends of the RT-DNA. While acceptable for some applications, this limits the ability of the Cas-retron RT-DNA to be used for NHEJ or non-homology mediated applications. In contrast, systems and methods disclosed herein employ an RT-Cas protein to synthesize a RT-DNA, attached to a crRNA or intermediary RNA, without such secondary structural elements, leaving the RT-DNA’s 3’ terminus available for NHEJ or other non-homology based forms of integration. Furthermore, Cas-retron systems with compact Cas nucleases deliverable by a single AAV vector are also described.
INCORPORATION BY REFERENCE
[7] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
[8] The features of the disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure are utilized, and the accompanying drawings of which:
[9] FIG. 1 depicts an exemplary system for modifying a dsDNA target nucleic acid. A Type V Cas effector protein forms a ribonucleoprotein (RNP) complex with an extended crRNA and intermediary RNA. The RNP complex binds a target sequence of the target nucleic acid, forms an R loop at the target sequence, and cleaves at least one strand of the target sequence. A reverse transcriptase, which may be fused to the Type V Cas effector protein, generates a RT-DNA at the 3’ end of the intermediary RNA that is complementary to the 5’ extension sequence of the extended crRNA. The extension sequence may be a template sequence. The RT-DNA may be incorporated into the genome through cellular DNA repair machinery and/or exogenously added factors.
[10] FIG. 2 depicts an exemplary system for modifying a dsDNA target nucleic acid. A Type V Cas effector protein forms an RNP complex with an extended crRNA and intermediary RNA. The RNP complex binds a target sequence of the target nucleic acid, forms an R loop at the target sequence, and cleaves at least one strand of the target sequence. The Type V Cas effector protein may remove a portion of a single strand of the R loop (ssDNA removal) through its own enzymatic activity, by recruiting an endonuclease, or via a separate strategy. A reverse transcriptase, which may be fused to the Type V Cas effector protein, generates a RT-DNA at the 3’ end of the intermediary RNA that is complementary to a template sequence at the 5’ end of the extended crRNA. The RT may be fused to a binding moiety (e.g., antibody fragment, peptide) that interacts with a DNA repair factor such as a ligase to aid in the incorporation of the RT-DNA and/or DNA break repair or may be directly fused to the DNA repair factor itself (not shown).
[11] FIG. 3 depicts an exemplary system for modifying a dsDNA target nucleic acid. A Type V Cas effector protein forms an RNP complex with an extended crRNA and in some cases, an extended intermediary RNA. The extended crRNA comprises, in order of 5’ to 3’, a template sequence, a repeat sequence, and a spacer sequence that hybridizes to the target sequence on the target strand. The RNP complex binds a target sequence of the target nucleic acid, forms an R loop at the target sequence, and cleaves both strands of the R loop. The Type V Cas effector protein or recruited endonuclease removes a portion of the displaced non-target strand of the R loop (ssDNA removal) and nicks the target strand. A reverse transcriptase, which may be fused to the Type V Cas effector protein, generates a RT-DNA at the 3’ end of the intermediary RNA that is complementary to a template sequence at the 5’ end of the extended crRNA. The RT may be fused to a binding moiety (e.g., antibody fragment, peptide) that interacts with a synthesis dependent strand annealing (SDSA) factor that incorporates a new fragment off the 3’ end of the cut target strand.
[12] The use of CasM.265466 in FIGS. 1-3 is exemplary and representative of Type V Cas proteins. It is contemplated and described herein that additional Type V Cas proteins may be employed in a similar fashion without the requirement for dimerization.
[13] FIG. 4 depicts the results of a PAGE electrophoresis demonstrating that CasM.265466 was able to cleave target DNA in the presence of extended crRNAs and intermediary RNAs.
[14] FIG. 5 depicts the results of a PAGE electrophoresis demonstrating that RTs were able to synthesize DNA in the presence of CasM.265466.
[15] FIG. 6 depicts the results of a PAGE electrophoresis demonstrating that RTs were able to synthesize DNA in the absence of CasM.265466.
[16] FIG. 7 depicts the results of a PAGE electrophoresis demonstrating that CasM.265466 was still able to cleave target DNA after DNA synthesis on the intermediary RNA (intRNA).
[17] FIG. 8 depicts the results of a PAGE electrophoresis demonstrating that trans activity of CasM.265466 did not affect the RNA-DNA hybrid.
[18] FIGS. 9A and 9B depict the results of a PAGE electrophoresis demonstrating that RT was able to synthesize without the need for a longer intermediary RNA (intRNA) and in the presence of CasM.19952.
[19] FIG. 10 depicts the results of a PAGE electrophoresis demonstrating that trans activity of CasM.19952 did not affect the RNA-DNA hybrid.
[20] FIGS. 11A and 11B depict the results of a PAGE electrophoresis demonstrating that trans activity of CasM.19952 did not affect the RNA-DNA hybrid.
[21] FIG. 12 depicts the results of a PAGE electrophoresis demonstrating that RT was able to synthesize in the absence of spCas9.
[22] FIG. 13 depicts the results of a PAGE electrophoresis demonstrating that RT was able to synthesize in the presence of spCas9.
[23] FIG. 14 depicts the results of a PAGE electrophoresis demonstrating that spCas9 was still able to cleave target DNA after DNA synthesis on the crRNA.
[24] FIGS. 15A and 15B depict a schematic and the result of PAGE electrophoresis demonstrating that CasM.265466 can cleave target DNA in the presence of a normal-length (non-extended) crRNA and an extended crRNA, respectively.
[25] FIGS. 16A and 16B depict the results of a PAGE electrophoresis demonstrating that an extended crRNA remains intact after target DNA cleavage; non-target (NT) DNA was included as a control.
[26] FIGS. 17A and 17B depict a schematic and the result of PAGE electrophoresis demonstrating that RT was able to synthesize RT-DNA in the presence of intRNAs and CasM.265466.
[27] FIG. 18 depicts the results of a PAGE electrophoresis demonstrating that a stepwise decrease in DNA length for RNase-digested samples was observed due to the differing lengths of the crRNA (longer intRNA substrate, shorter cDNA produced). RNase digestion post synthesis was used to remove RNA for better visualization of DNA products.
[28] FIGS. 19A and 19B depict the results of a PAGE electrophoresis demonstrating that synthesized cDNA was not affected by any trans DNase activity. Addition of target DNA post synthesis did not result in trans DNase cleavage of cDNA products, and the synthesis product did not prevent proper target DNA cleavage.
[29] FIGS. 20A, 20B, and 20C depict a schematic and the result of PAGE electrophoresis demonstrating that RT was able to synthesize without the need for a longer intermediary RNA (intRNA) and in the presence of CasM.19952.
[30] FIG. 21 depicts the results of a PAGE electrophoresis demonstrating that a stepwise decrease in DNA length for RNase-digested samples was observed due to the differing lengths of the crRNA (longer intRNA substrate, shorter cDNA produced). RNase digestion post synthesis was used to remove RNA for better visualization of DNA products.
[31] FIGS. 22A and 22B depict the results of a PAGE electrophoresis demonstrating that synthesized cDNA was not affected by any trans DNase activity. Addition of target DNA post synthesis did not result in trans DNase cleavage of cDNA products.
[32] FIGS. 23A, 23B, and 23C depict a schematic and the result of PAGE electrophoresis demonstrating that trans activity of CasM.19952 did not affect the RNA-DNA hybrid.
[33] FIGS. 24A and 24B depict a schematic and the result of PAGE electrophoresis demonstrating that a stepwise decrease in DNA length for RNase-digested samples was observed due to the differing lengths of the crRNA (longer intRNA substrate, shorter cDNA produced). RNase digestion post synthesis was used to remove RNA for better visualization of DNA products.
[34] FIGS. 25A and 25B depict a schematic and the result of PAGE electrophoresis demonstrating that SpCas9 can cleave dsDNA with a dual guide system of crRNA and intRNA (intRNA 4242) and with a dual guide system containing an extended crRNA and an extended intRNA.
[35] FIGS. 26A and 26B depict a schematic and the result of PAGE electrophoresis demonstrating that RT was able to synthesize in the presence of spCas9.
[36] FIG. 27 depicts the results of a PAGE electrophoresis demonstrating that spCas9 was still able to cleave target DNA after DNA synthesis on the crRNA.
DETAILED DESCRIPTION
[37] It is to be understood that both the foregoing general description and the following detailed description are exemplary, and explanatory only, and are not restrictive of the disclosure.
[38] The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.
[39] All documents, or portions of documents, cited in this application, including, but not limited to, patents, patent applications, articles, books, and treatises, are hereby expressly incorporated by reference in their entirety for any purpose.
Definitions
[40] Unless otherwise indicated, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Unless otherwise indicated or obvious from context, the following terms have the following meanings:
[41] The terms, “a,” “an,” and “the,” as used herein, include plural references unless the context clearly dictates otherwise.
[42] The terms, “or” and “and/or,” as used herein, include any and all combinations of one or more of the associated listed items.
[43] The terms, “including,” “includes,” “included,” and other forms, are not limiting.
[44] The terms, “comprise” and its grammatical equivalents, as used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[45] The term, “about,” as used herein in reference to a number or range of numbers, is understood to mean the stated number and numbers +/- 10% thereof, or 10% below the lower listed limit and 10% above the higher listed limit for the values listed for a range.
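By way of non-limiting illustration, the bounds implied by “about” can be sketched in Python as follows. The function names `about` and `about_range` are hypothetical and are introduced only for this example.

```python
def about(value):
    """Return the (low, high) bounds implied by "about" a single stated number."""
    margin = abs(value) / 10  # 10% of the stated number
    return (value - margin, value + margin)


def about_range(lower, upper):
    """Return the bounds implied by "about" a stated range.

    The lower bound is 10% below the lower listed limit and the upper
    bound is 10% above the higher listed limit.
    """
    return (lower - abs(lower) / 10, upper + abs(upper) / 10)


print(about(100))           # (90.0, 110.0)
print(about_range(10, 20))  # (9.0, 22.0)
```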
[46] The terms, “% identical,” “% identity,” “percent identity,” and grammatical equivalents thereof, as used herein, in the context of an amino acid sequence or nucleotide sequence, refer to the percent of residues that are identical between respective positions of two sequences when the two sequences are aligned for maximum sequence identity. The % identity is calculated by dividing the number of the residues that are identical between the respective positions of the at least two sequences by the total number of the aligned residues and multiplying by 100. Generally, computer programs can be employed for such calculations. Illustrative programs that compare and align pairs of sequences include ALIGN (Myers and Miller, Comput Appl Biosci. 1988 Mar;4(1):11-7), FASTA (Pearson and Lipman, Proc Natl Acad Sci U S A. 1988 Apr;85(8):2444-8; Pearson, Methods Enzymol. 1990;183:63-98) and gapped BLAST (Altschul et al., Nucleic Acids Res. 1997 Sep 1;25(17):3389-402), BLASTP, BLASTN, or GCG (Devereux et al., Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):387-95).
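By way of non-limiting illustration, a percent identity calculation on two sequences that have already been aligned can be sketched in Python as follows. The function name `percent_identity` is hypothetical. The sketch assumes that gaps are written as "-" and that only columns in which both sequences have a residue count as aligned residues. It takes the number of identical positions, divides by the number of aligned positions, and multiplies by 100. It does not perform the alignment itself, for which programs such as those listed above may be used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: a column counts as "aligned" only when neither sequence
    has a gap character at that position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    aligned = 0    # columns where both sequences have a residue
    identical = 0  # aligned columns where the residues match
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1

    if aligned == 0:
        raise ValueError("no aligned residues to compare")
    return 100.0 * identical / aligned


# Five aligned columns (one gapped column is skipped), four identical.
print(percent_identity("ACGT-A", "ACCTGA"))  # 80.0
```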
[47] The terms, “% complementary”, “% complementarity”, “percent complementary”, “percent complementarity” and grammatical equivalents thereof, as used interchangeably herein, in the context of two or more nucleic acid molecules, refer to the percent of nucleotides in two nucleotide sequences in said nucleic acid molecules of equal length that can undergo cumulative base pairing at two or more individual corresponding positions in an antiparallel orientation. Accordingly, the terms include nucleic acid sequences that are not completely complementary over their entire length, which indicates that the two or more nucleic acid molecules include one or more mismatches. A “mismatch” is present at any position in which the two opposed nucleotides are not complementary. The % complementary is calculated by dividing the total number of the complementary residues by the total number of the nucleotides in one of the equal length sequences, and multiplying by 100. Complete or total complementarity describes nucleotide sequences in which 100% of the residues of a nucleotide sequence are complementary to residues in a reference nucleotide sequence. “Partial complementarity” describes nucleotide sequences in which at least 20%, but less than 100%, of the residues of a nucleotide sequence are complementary to residues in a reference nucleotide sequence. In some embodiments, at least 50%, but less than 100%, of the residues of a nucleotide sequence are complementary to residues in a reference nucleotide sequence. In some embodiments, at least 70%, 80%, 90% or 95%, but less than 100%, of the residues of a nucleotide sequence are complementary to residues in a reference nucleotide sequence. “Non-complementary” describes nucleotide sequences in which less than 20% of the residues of a nucleotide sequence are complementary to residues in a reference nucleotide sequence.
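By way of non-limiting illustration, the percent complementarity calculation and the complete, partial, and non-complementary categories can be sketched in Python as follows. The function names and the example sequences are hypothetical. The sketch assumes that both sequences are written 5' to 3', are of equal length, and are compared in antiparallel orientation using standard Watson-Crick pairs only.

```python
# Standard Watson-Crick pairs (DNA and RNA alphabets).
WATSON_CRICK = {
    ("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def percent_complementarity(seq_a, seq_b):
    """Percent of positions that base pair when seq_b is read antiparallel to seq_a."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be of equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Reversing seq_b places its 3' end opposite the 5' end of seq_a.
    paired = sum((x, y) in WATSON_CRICK for x, y in zip(seq_a, reversed(seq_b)))
    return 100.0 * paired / len(seq_a)


def classify(percent):
    """Map a percent complementarity value to the categories defined above."""
    if percent == 100.0:
        return "complete"
    if percent >= 20.0:
        return "partial"
    return "non-complementary"


full = percent_complementarity("AUGGC", "GCCAU")
one_mismatch = percent_complementarity("AUGGC", "GCCAA")
print(full, classify(full))                  # 100.0 complete
print(one_mismatch, classify(one_mismatch))  # 80.0 partial
```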
[48] The term, “percent similarity,” or “% similarity,” as used herein, in the context of an amino acid sequence, refers to a value that is calculated by dividing a similarity score by the length of the alignment. The similarity of two amino acid sequences can be calculated by using a BLOSUM62 similarity matrix (Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA., 89: 10915-10919 (1992)) that is transformed so that any value ≥ 1 is replaced with +1 and any value ≤ 0 is replaced with 0. For example, an Ile (I) to Leu (L) substitution is scored at +2.0 by the BLOSUM62 similarity matrix, which in the transformed matrix is scored at +1. This transformation allows the calculation of percent similarity, rather than a similarity score. Alternately, when comparing two full protein sequences, the proteins can be aligned using pairwise MUSCLE alignment. Then, the % similarity can be scored at each residue and divided by the length of the alignment. For determining % similarity over a protein domain or motif, a multilevel consensus sequence (or PROSITE motif sequence) can be used to identify how strongly each domain or motif is conserved. In calculating the similarity of a domain or motif, the second and third levels of the multilevel sequence are treated as equivalent to the top level. Additionally, if a substitution could be treated as conservative with any of the amino acids in that position of the multilevel consensus sequence, +1 point is assigned. For example, given the multilevel consensus sequence: RLG and YCK, the test sequence QIQ would receive three points. This is because in the transformed BLOSUM62 matrix, each combination is scored as: Q-R: +1; Q-Y: +0; I-L: +1; I-C: +0; Q-G: +0; Q-K: +1. For each position, the highest score is used when calculating similarity. The % similarity can also be calculated using commercially available programs, such as the Geneious Prime software given the parameters matrix = BLOSUM62 and threshold ≥ 1.
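By way of non-limiting illustration, the multilevel consensus example (RLG and YCK against the test sequence QIQ) can be reproduced with the following Python sketch. The function names are hypothetical. Only the six raw BLOSUM62 scores needed for this example are included, rather than the full matrix, and the transformation is assumed to map raw scores of 1 or more to +1 and raw scores of 0 or less to 0, which is the reading required for Q-R and Q-K (raw score 1) to be scored at +1.

```python
# Raw BLOSUM62 scores for the residue pairs used in this example only.
RAW_BLOSUM62 = {
    ("I", "L"): 2, ("Q", "R"): 1, ("Q", "K"): 1,
    ("Q", "Y"): -1, ("I", "C"): -1, ("Q", "G"): -2,
}


def transformed_score(a, b):
    """Transformed score: +1 for raw scores of 1 or more, otherwise 0."""
    raw = RAW_BLOSUM62.get((a, b), RAW_BLOSUM62.get((b, a)))
    if raw is None:
        raise KeyError(f"pair {a}-{b} is not in the example table")
    return 1 if raw >= 1 else 0


def consensus_points(test_seq, consensus_levels):
    """Sum, over positions, of the highest transformed score across all levels."""
    points = 0
    for i, residue in enumerate(test_seq):
        points += max(transformed_score(residue, level[i]) for level in consensus_levels)
    return points


def percent_similarity(test_seq, consensus_levels):
    """Similarity points divided by the alignment length, times 100."""
    return 100.0 * consensus_points(test_seq, consensus_levels) / len(test_seq)


print(consensus_points("QIQ", ["RLG", "YCK"]))    # 3
print(percent_similarity("QIQ", ["RLG", "YCK"]))  # 100.0
```

Taking the maximum at each position reflects the statement that lower levels of the multilevel sequence are treated as equivalent to the top level.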
[49] The term, “accessory protein,” as used herein, refers to a polypeptide that is capable of recruiting one or more repair factors, such as enzymes, that promote, increase, or enable nucleic acid repair mechanisms, or to a polypeptide that is capable of promoting, increasing, or enabling nucleic acid repair mechanisms. In some embodiments, an accessory protein comprises a domain that is capable of recruiting one or more repair factors, such as enzymes, that promote, increase, or enable a repair mechanism. In some embodiments, an accessory protein comprises a polypeptide that is capable of promoting, increasing, or enabling nucleic acid repair mechanisms. Examples of nucleic acid repair mechanisms include synthesis dependent strand annealing (SDSA), homology directed repair (HDR), or non-homologous end joining (NHEJ). A repair mechanism can be a native or endogenous repair mechanism, which is intended to encompass all repair mechanisms naturally existing inside a target cell, or an exogenous repair mechanism, which is intended to encompass repair mechanisms that are delivered to the target cell by systems, mechanisms, or methods described herein.
[50] The terms, “bind,” “binding,” “interact” and “interacting,” as used herein, refer to a non-covalent interaction between macromolecules (e.g., between two polypeptides, between a polypeptide and a nucleic acid; between a polypeptide/guide nucleic acid complex and a target nucleic acid; and the like). While in a state of noncovalent interaction, the macromolecules are said to be “associated” or “interacting” or “binding” (e.g., when a molecule X is said to interact with a molecule Y, it is meant the molecule X binds to molecule Y in a non-covalent manner). Non-limiting examples of non-covalent interactions are ionic bonds, hydrogen bonds, van der Waals and hydrophobic interactions. Not all components of a binding interaction need be sequence-specific (e.g., contacts with phosphate residues in a DNA backbone), but some portions of a binding interaction may be sequence-specific.
[51] The term, “cis cleavage,” as used herein, refers to cleavage (hydrolysis of a phosphodiester bond) of a target nucleic acid by a complex of an effector protein and a guide nucleic acid (e.g., an RNP complex), wherein at least a portion of the guide nucleic acid is hybridized to at least a portion of the target nucleic acid. Cleavage may occur within or directly adjacent to the portion of the target nucleic acid that is hybridized to the portion of the guide nucleic acid.
[52] The terms, “complementary” and “complementarity,” as used herein, in the context of a nucleic acid molecule or nucleotide sequence, refer to the characteristic of a polynucleotide having nucleotides that can undergo cumulative base pairing with their Watson-Crick counterparts (C with G; or A with T) in a reference nucleic acid in antiparallel orientation. For example, when every nucleotide in a polynucleotide or a specified portion thereof forms a base pair with every nucleotide in an equal length sequence of a reference nucleic acid, that polynucleotide is said to be 100% complementary to the sequence of the reference nucleic acid. In a double stranded DNA or RNA sequence, the upper (sense) strand sequence is, in general, understood as going in the direction from its 5'- to 3'-end, and the complementary sequence is thus understood as the sequence of the lower (antisense) strand in the same direction as the upper strand. Following the same logic, the reverse sequence is understood as the sequence of the upper strand in the direction from its 3'- to its 5'-end, while the “reverse complement” sequence or the “reverse complementary” sequence is understood as the sequence of the lower strand in the direction of its 5'- to its 3'-end. Each nucleotide in a double stranded DNA or RNA molecule that is paired with its Watson-Crick counterpart can be referred to as its complementary nucleotide. The complementarity of modified or artificial base pairs can be based on other types of hydrogen bonding and/or hydrophobicity of bases and/or shape complementarity between bases.
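By way of non-limiting illustration, the complementary, reverse, and reverse complement sequences of an upper (sense) DNA strand written 5' to 3' can be derived as in the following Python sketch. The function names and the example sequence are hypothetical, and the sketch assumes an unmodified DNA alphabet (A, C, G, T).

```python
# Translation table mapping each DNA base to its Watson-Crick counterpart.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(upper_strand):
    """Lower strand read in the same direction as the upper strand."""
    return upper_strand.translate(_DNA_COMPLEMENT)


def reverse(upper_strand):
    """Upper strand read from its 3' end to its 5' end."""
    return upper_strand[::-1]


def reverse_complement(upper_strand):
    """Lower strand read from its 5' end to its 3' end."""
    return complement(upper_strand)[::-1]


upper = "ATGCC"
print(complement(upper))          # TACGG
print(reverse(upper))             # CCGTA
print(reverse_complement(upper))  # GGCAT
```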
[53] The term, “codon optimized,” as used herein, refers to a mutation of a nucleotide sequence encoding a polypeptide, such as a nucleotide sequence encoding an effector protein, to mimic the codon preferences of the intended host organism or cell while encoding the same polypeptide. Thus, the codons can be changed, but the encoded polypeptide remains unchanged. For example, if the intended target cell were a human cell, a human codon-optimized nucleotide sequence encoding an effector protein could be used. As another non-limiting example, if the intended host cell were a mouse cell, then a mouse codon-optimized nucleotide sequence encoding an effector protein could be generated. As another non-limiting example, if the intended host cell were a eukaryotic cell, then a eukaryote codon-optimized nucleotide sequence encoding an effector protein could be generated. As another non-limiting example, if the intended host cell were a prokaryotic cell, then a prokaryote codon-optimized nucleotide sequence encoding an effector protein could be generated. Codon usage tables are readily available, for example, at the “Codon Usage Database” available at www.kazusa.or.jp/codon.
[54] The term, “cleavage assay,” as used herein, refers to an assay designed to visualize, quantitate or identify cleavage of a nucleic acid. In some embodiments, the cleavage activity may be cis cleavage activity. In some embodiments, the cleavage activity may be trans cleavage activity. By way of non-limiting example, nucleic acid cleavage may be assessed by gel electrophoresis.
[55] The terms, “cleave,” “cleaving” and “cleavage,” as used herein, in the context of a nucleic acid molecule or nuclease activity of an effector protein, refer to the hydrolysis of a phosphodiester bond of a nucleic acid molecule that results in breakage of that bond. The result of this breakage can be a nick (hydrolysis of a single phosphodiester bond on one side of a double-stranded molecule), single strand break (hydrolysis of a single phosphodiester bond on a single-stranded molecule) or double strand break (hydrolysis of two phosphodiester bonds on both sides of a double-stranded molecule) depending upon whether the nucleic acid molecule is single-stranded (e.g., ssDNA or ssRNA) or double-stranded (e.g., dsDNA) and the type of nuclease activity being catalyzed by the effector protein.
[56] The term, “clustered regularly interspaced short palindromic repeats (CRISPR),” as used herein, refers to a segment of DNA found in the genomes of certain prokaryotic organisms, including some bacteria and archaea, that includes repeated short sequences of nucleotides interspersed at regular intervals between unique sequences of nucleotides derived from another organism.
[57] The term, “conservative amino acid substitution,” as used herein, refers to the replacement of one amino acid for another such that the replacement takes place within a family of amino acids that are related in their side chains. In some embodiments, a conservative amino acid substitution in a protein does not change the activity of the protein. Conversely, the term “non-conservative amino acid substitution” as used herein refers to the replacement of one amino acid residue for another that does not have a related side chain. Genetically encoded amino acids can be divided into four families having related side chains: (1) acidic (negatively charged): Asp (D), Glu (E); (2) basic (positively charged): Lys (K), Arg (R), His (H); (3) non-polar (hydrophobic): Cys (C), Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Met (M), Trp (W), Gly (G), Tyr (Y), with non-polar also being subdivided into: (i) strongly hydrophobic: Ala (A), Val (V), Leu (L), Ile (I), Met (M), Phe (F); and (ii) moderately hydrophobic: Gly (G), Pro (P), Cys (C), Tyr (Y), Trp (W); and (4) uncharged polar: Asn (N), Gln (Q), Ser (S), Thr (T). Amino acids may be related by aliphatic side chains: Gly (G), Ala (A), Val (V), Leu (L), Ile (I), Ser (S), Thr (T), with Ser (S) and Thr (T) optionally being grouped separately as aliphatic-hydroxyl. Amino acids may be related by aromatic side chains: Phe (F), Tyr (Y), Trp (W). Amino acids may be related by amide side chains: Asn (N), Gln (Q). Amino acids may be related by sulfur-containing side chains: Cys (C) and Met (M).
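By way of non-limiting illustration, a substitution can be checked against the four side-chain families listed above with the following Python sketch. The names `FAMILIES`, `family_of`, and `is_conservative` are hypothetical. The sketch uses only the four-family grouping and does not apply the alternative aliphatic, aromatic, amide, or sulfur-containing groupings.

```python
# The four families of genetically encoded amino acids, by one-letter code.
FAMILIES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "non-polar": set("CAVLIPFMWGY"),
    "uncharged polar": set("NQST"),
}


def family_of(residue):
    """Return the name of the family containing the given residue."""
    for name, members in FAMILIES.items():
        if residue in members:
            return name
    raise ValueError(f"unknown amino acid: {residue!r}")


def is_conservative(original, replacement):
    """True when both residues fall within the same side-chain family."""
    return family_of(original) == family_of(replacement)


print(is_conservative("D", "E"))  # True: both acidic
print(is_conservative("L", "I"))  # True: both non-polar
print(is_conservative("K", "D"))  # False: basic versus acidic
```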
[58] The terms, “CRISPR RNA” and “crRNA,” as used herein, refer to a type of guide nucleic acid that is RNA comprising a first sequence that is capable of hybridizing to a target sequence of a target nucleic acid and a second sequence that is capable of interacting with an effector protein either directly (by being bound by an effector protein) or indirectly (e.g., by hybridization with a second nucleic acid molecule that can be bound by an effector). The first sequence and the second sequence are directly connected to each other or by a linker.
[59] The term, “donor nucleic acid,” as used herein, refers to a nucleic acid that is (designed or intended to be) incorporated into a target nucleic acid or target sequence. In some embodiments, a reverse-transcribed DNA (RT-DNA) as described herein may be used as a donor nucleic acid in systems, compositions and methods described herein. In some embodiments, a donor nucleic acid may be a single stranded donor nucleic acid and is referred to herein as a ss donor nucleic acid. In some embodiments, a donor nucleic acid may be a double stranded donor nucleic acid and is referred to herein as a ds donor nucleic acid.
[60] The term “dual nucleic acid system” as used herein refers to a system that uses an intRNA-crRNA duplex complexed with one or more polypeptides described herein, wherein the complex is capable of interacting with a target nucleic acid in a sequence selective manner.
[61] The term, “effector protein,” as used herein, refers to a protein, polypeptide, or peptide that is capable of interacting with a nucleic acid, such as a guide nucleic acid, to form a complex (e.g., a RNP complex), wherein the complex interacts with a target nucleic acid.
[62] The term, “edited target nucleic acid,” or “edited DNA” as used herein, refers to a target nucleic acid that has undergone a change to its nucleotide sequence, for example, after contact with an OCATS system described herein. In some instances, the edited target nucleic acid comprises an insertion, deletion, or substitution of one or more nucleotides compared to the unedited target nucleic acid. In some instances, edited target nucleic acid or edited DNA is the genomic DNA of a target.
[63] The term, “extended guide nucleic acid,” as used herein, refers to a guide nucleic acid engineered to comprise an extension sequence on a 5’ or 3’ end that can be used for production of a donor nucleic acid. In some embodiments, an extended guide nucleic acid, or a portion thereof, is capable of serving as a scaffold for the activity of effector proteins, fusion partners, fusion proteins, and/or accessory proteins described herein. In some embodiments, an extension sequence comprises a template sequence described herein.
[64] The term “fidelity,” as used herein, refers to the accuracy of template-mediated nucleotide synthesis of a polymerase. For example, fidelity of an RNA polymerase depends on the error rate of the transcription of DNA to RNA by the RNA polymerase. Likewise, fidelity of a reverse transcriptase depends on the error rate of the reverse transcription of RNA to DNA by the reverse transcriptase.
[65] The term, “fused,” as used herein, refers to at least two sequences that can be connected together, such as by a linker, or by conjugation (e.g., chemical conjugation or enzymatic conjugation). The term “fused” includes a linker.
[66] The term, “fusion protein,” as used herein, refers to a protein comprising at least two heterologous polypeptides. The fusion protein may comprise one or more effector proteins and fusion partners. In some embodiments, an effector protein and fusion partner are not found connected to one another, such as in a native protein or complex that occurs together in nature.
[67] The term, “fusion partner,” as used herein, refers to a protein, polypeptide or peptide that is fused, or linked by a linker, to one or more effector protein. The fusion partner can impart some function to the fusion protein that is not provided by the effector protein.
[68] The term, “genetic disease,” as used herein, refers to a disease, disorder, condition, or syndrome associated with or caused by one or more mutations in the DNA of an organism having the genetic disease.
[69] The term, “guide nucleic acid,” as used herein, refers to a nucleic acid that, when in a complex with one or more polypeptides described herein (e.g., an RNP complex) can impart sequence selectivity to the complex when the complex interacts with a target nucleic acid. A guide nucleic acid may be referred to interchangeably as a guide RNA, however it is understood that guide nucleic acids may comprise deoxyribonucleotides (DNA), ribonucleotides (RNA), a combination thereof (e.g., RNA with a thymine base), biochemically or chemically modified nucleobases (e.g., one or more engineered modifications described herein), or combinations thereof.
[70] The term, “heterologous,” as used herein, refers to at least two different polypeptide sequences that are not found similarly connected to one another in a native nucleic acid or protein. A protein that is heterologous to the effector protein is a protein that is not covalently linked by an amide bond to the effector protein in nature. In some embodiments, a heterologous protein is not encoded by a species that encodes the effector protein. A guide nucleic acid may comprise “heterologous” sequences, which means that it includes a first sequence and a second sequence, wherein the first sequence and the second sequence are not found covalently linked by a phosphodiester bond in nature. Thus, the first sequence is considered to be heterologous with the second sequence, and the guide nucleic acid may be referred to as a heterologous guide nucleic acid.
[71] The term, “homology sequence,” as used herein, refers to an extension of a guide nucleic acid that is at least partially homologous to an extension sequence, but is not a template sequence.
[72] The terms, “hybridize,” “hybridizable” and grammatical equivalents thereof, refer to a nucleotide sequence that is able to noncovalently interact, i.e., form Watson-Crick base pairs and/or G/U base pairs, or anneal, to another nucleotide sequence in a sequence-specific, antiparallel, manner (i.e., a nucleotide sequence specifically interacts to a complementary nucleotide sequence) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength. Standard Watson-Crick base-pairing includes: adenine (A) pairing with thymidine (T), adenine (A) pairing with uracil (U), and guanine (G) pairing with cytosine (C) for both DNA and RNA. In addition, for hybridization between two RNA molecules (e.g., dsRNA), and for hybridization of a DNA molecule with an RNA molecule (e.g., when a DNA target nucleic acid base pairs with a guide RNA, etc.) guanine (G) can also base pair with uracil (U). For example, G/U base-pairing is at least partially responsible for the degeneracy (i.e., redundancy) of the genetic code in the context of tRNA anti-codon base-pairing with codons in mRNA. Thus, a guanine (G) can be considered complementary to both a uracil (U) and to an adenine (A). Accordingly, when a G/U base-pair can be made at a given nucleotide position, the position is not considered to be non-complementary, but is instead considered to be complementary. While hybridization typically occurs between two nucleotide sequences that are complementary, mismatches between bases are possible. It is understood that two nucleotide sequences need not be 100% complementary to be specifically hybridizable, hybridizable, partially hybridizable, or for hybridization to occur. Moreover, a nucleotide sequence may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a bulge, a loop structure or hairpin structure, etc.). The conditions appropriate for hybridization between two nucleotide sequences depend on the length of the sequence and the degree of complementarity, variables which are well known in the art. For hybridizations between nucleic acids with short stretches of complementarity (e.g., complementarity over 35 or less, 30 or less, 25 or less, 22 or less, 20 or less, or 18 or less nucleotides) the position of mismatches may become important (see Sambrook et al., supra, 11.7-11.8).
Typically, the length for a hybridizable nucleic acid is 8 nucleotides or more (e.g., 10 nucleotides or more, 12 nucleotides or more, 15 nucleotides or more, 20 nucleotides or more, 22 nucleotides or more, 25 nucleotides or more, or 30 nucleotides or more). Any suitable in vitro assay may be utilized to assess whether two sequences “hybridize”. One such assay is a melting point analysis where the greater the degree of complementarity between two nucleotide sequences, the greater the value of the melting temperature (Tm) for hybrids of nucleic acids having those sequences. The conditions of temperature and ionic strength determine the “stringency” of the hybridization. Temperature, wash solution salt concentration, and other conditions may be adjusted as necessary according to factors such as length of the region of complementation and the degree of complementation. Hybridization and washing conditions are well known and exemplified in Sambrook, J. and Russell, W., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001); and in Green, M. and Sambrook, J., Molecular Cloning: A Laboratory Manual, Fourth Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2012).
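By way of illustration only, the base-pairing rules described above can be sketched in code. The following is a minimal sketch, assuming two equal-length strands aligned antiparallel with no bulges or loops; the function name `percent_complementarity` and the `allow_gu` parameter are hypothetical and are not part of the disclosure.

```python
# Illustrative sketch only: position-by-position complementarity between two
# antiparallel strands, with optional G/U pairing as described in the text.

WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
                ("G", "C"), ("C", "G")}
GU_PAIRS = {("G", "U"), ("U", "G")}


def percent_complementarity(strand_a, strand_b, allow_gu=True):
    """Percentage of positions in strand_a (5'->3') that can pair with
    strand_b (5'->3') when the strands are aligned antiparallel.

    Assumption: equal-length strands, no bulges, loops or hairpins.
    """
    a = strand_a.upper()
    b = strand_b.upper()[::-1]  # reverse so position i of a faces position i of b
    if len(a) != len(b) or not a:
        raise ValueError("strands must be non-empty and of equal length")
    allowed = (WATSON_CRICK | GU_PAIRS) if allow_gu else WATSON_CRICK
    paired = sum((x, y) in allowed for x, y in zip(a, b))
    return 100.0 * paired / len(a)
```

In this sketch, 5'-AUG and 5'-CAU are fully complementary, and a G facing a U is counted as complementary only when `allow_gu` is true.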
[73] The term, “indel,” as used herein, refers to an insertion-deletion or indel mutation, which is a type of genetic mutation that results from the insertion and/or deletion of one or more nucleotides in a target nucleic acid. An indel can vary in length (e.g., 1 to 1,000 nucleotides in length) and be detected by any suitable method, including sequencing.
[74] The term, “indel percentage,” as used herein, refers to a percentage of sequencing reads that show at least one nucleotide has been edited from the insertion and/or deletion of nucleotides regardless of the
size of insertion or deletion, or number of nucleotides edited. For example, if there is at least one nucleotide deletion detected in a given target nucleic acid, it counts towards the percent indel value. As another example, if one copy of the target nucleic acid has one nucleotide deleted, and another copy of the target nucleic acid has 10 nucleotides deleted, they are counted the same. This number reflects the percentage of target nucleic acids that are edited by a given effector protein.
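The counting rule above can be illustrated with a short sketch. It assumes each sequencing read has already been reduced to a pair of counts (nucleotides inserted, nucleotides deleted); that input format and the function name `indel_percentage` are hypothetical choices made for illustration.

```python
def indel_percentage(reads):
    """Percentage of reads showing at least one inserted or deleted nucleotide.

    reads: iterable of (inserted_nt, deleted_nt) counts, one pair per read.
    A read counts once regardless of the size of its insertion or deletion.
    """
    reads = list(reads)
    if not reads:
        raise ValueError("at least one sequencing read is required")
    edited = sum(1 for inserted, deleted in reads if inserted > 0 or deleted > 0)
    return 100.0 * edited / len(reads)
```

Under this sketch, a read with a 1-nucleotide deletion and a read with a 10-nucleotide deletion each contribute equally to the percent indel value.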
[75] The terms, “intermediary RNA” and “intRNA,” as used herein, refer to the component of a dual nucleic acid system comprising a first sequence that is capable of interacting with an effector protein by being non-covalently bound by the effector protein and a second sequence that is capable of hybridizing to a repeat sequence of a crRNA. The first sequence and the second sequence are linked.
[76] The term, “in vitro,” as used herein, describes something outside an organism. An in vitro system, composition or method may take place in a container for holding laboratory reagents such that it is separated from the biological source from which a material in the container is obtained. In vitro assays can encompass cell-based assays in which living or dead cells are employed. In vitro assays can also encompass a cell-free assay in which no intact cells are employed. The term “in vivo” is used to describe an event that takes place within an organism. The term “ex vivo” is used to describe an event that takes place in a cell that has been obtained from an organism. An ex vivo assay is not performed on a subject. Rather, it is performed upon a sample separate from a subject.
[77] The terms, “length” and “linked,” as used herein in reference to a nucleic acid (polynucleotide) or polypeptide, refer to the number of linked residues, which for a nucleic acid may be expressed as “kilobases” (kb) or “base pairs” (bp). Thus, a length of 1 kb refers to a length of 1000 linked nucleotides, and a length of 500 bp refers to a length of 500 linked nucleotides. Similarly, a protein having a length of 500 linked amino acids may also be simply described as having a length of 500 amino acids.
[78] The term, “linker,” as used herein, refers to a molecule that links a first polypeptide to a second polypeptide (e.g., by an amide bond) or a first nucleic acid to a second nucleic acid (e.g., by a phosphodiester bond).
[79] The term, “mutation,” as used herein, refers to an alteration that changes an amino acid residue or a nucleotide as described herein. Such an alteration can include, for example, deletions, insertions, and/or substitutions. The mutation can refer to a change in structure of an amino acid residue or nucleotide relative to the starting or reference residue or nucleotide. A mutation of an amino acid residue includes, for example, deletions, insertions and substituting one amino acid residue for a structurally different amino acid residue. Such substitutions can be a conservative amino acid substitution, a non-conservative amino acid substitution, a substitution to a specific sub-class of amino acids, or a combination thereof as described herein. A mutation of a nucleotide includes, for example, changing one naturally occurring base for a different naturally occurring base, such as changing an adenine to a thymine or a guanine to a cytosine or an adenine to a cytosine or a guanine to a thymine. A mutation of a nucleotide base may result in a structural and/or functional alteration of the encoded peptide, polypeptide or protein by changing the encoded amino
acid residue of the peptide, polypeptide or protein. A mutation of a nucleotide base may not result in an alteration of the amino acid sequence or function of encoded peptide, polypeptide or protein, also known as a silent mutation. Methods of mutating an amino acid residue or a nucleotide are well known.
[80] The terms, “mutation associated with a disease” and “mutation associated with a genetic disorder,” as used herein, refer to the co-occurrence of a mutation and the phenotype of a disease. The mutation may occur in a gene, wherein transcription or translation products from the gene occur at a significantly abnormal level or in an abnormal form in a cell or subject harboring the mutation as compared to a non-disease control subject not having the mutation.
[81] The term, “nickase,” as used herein, refers to an enzyme that possesses catalytic activity for single stranded nucleic acid cleavage of a double stranded nucleic acid.
[82] The term, “nickase activity,” as used herein, refers to catalytic activity that results in single stranded nucleic acid cleavage of a double stranded nucleic acid.
[83] The terms, “non-naturally occurring” and “engineered,” as used herein, indicate involvement of the hand of man. The terms, when referring to a molecule, such as but not limited to, a nucleic acid, nucleotide, protein, polypeptide, peptide or amino acid, refer to a modification of that molecule (e.g., chemical modification, nucleotide sequence, or amino acid sequence) that is not present in the naturally occurring molecule. The terms, when referring to a composition or system described herein, refer to a composition or system having at least one component that is not naturally associated with the other components of the composition or system. By way of a non-limiting example, a composition may include an effector protein and a guide nucleic acid that do not naturally occur together. Conversely, and as a non-limiting further clarifying example, an effector protein or guide nucleic acid that is “natural,” “naturally-occurring,” or “found in nature” includes an effector protein and a guide nucleic acid from a cell or organism that have not been genetically modified by the hand of man.
[84] The terms, “nuclease” and “endonuclease” as used herein, refer to an enzyme which possesses catalytic activity for nucleic acid cleavage.
[85] The term, “nuclease activity,” as used herein, refers to catalytic activity that results in nucleic acid cleavage (e.g., ribonuclease activity (ribonucleic acid cleavage), or deoxyribonuclease activity (deoxyribonucleic acid cleavage), etc.).
[86] The term, “nucleic acid,” as used herein, refers to a polymer of nucleotides. A nucleic acid may comprise ribonucleotides, deoxyribonucleotides, combinations thereof, and modified versions of the same. A nucleic acid may be single-stranded or double-stranded, unless specified. Non-limiting examples of nucleic acids are double stranded DNA (dsDNA), single stranded DNA (ssDNA), messenger RNA, genomic DNA, cDNA, DNA-RNA hybrids, and a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. Accordingly, nucleic
acids as described herein may comprise one or more mutations, one or more engineered modifications, or both.
[87] The term, “nucleic acid expression vector,” as used herein, refers to a plasmid that can be used to express a nucleic acid of interest.
[88] The term, “nuclear localization signal (NLS),” as used herein, refers to an entity (e.g., peptide) that facilitates localization of a nucleic acid, protein, or small molecule to the nucleus, when present in a cell that contains a nuclear compartment.
[89] The terms, “nucleotide(s)” and “nucleoside(s)”, as used herein, in the context of a nucleic acid molecule having multiple residues, describe the sugar and base of the residue contained in the nucleic acid molecule. Similarly, a skilled artisan could understand that linked nucleotides and/or linked nucleosides, as used in the context of a nucleic acid having multiple linked residues, are interchangeable and describe linked sugars and bases of residues contained in a nucleic acid molecule. When referring to a “nucleobase(s)”, or linked nucleobase, as used in the context of a nucleic acid molecule, it can be understood as describing the base of the residue contained in the nucleic acid molecule, for example, the base of a nucleotide, nucleosides, or linked nucleotides or linked nucleosides. A person of ordinary skill in the art when referring to nucleotides, nucleosides, and/or nucleobases would also understand that the differences between RNA and DNA (generally the exchange of uridine for thymidine or vice versa) and the presence of nucleoside analogs, such as modified uridines, do not contribute to differences in identity or complementarity among polynucleotides as long as the relevant nucleotides (such as thymidine, uridine, or modified uridine) have the same complement (e.g., adenosine for all of thymidine, uridine, or modified uridine; another example is cytosine and 5-methylcytosine, both of which have guanosine or modified guanosine as a complement). Thus, for example, the sequence 5'-AXG where X is any modified uridine, such as pseudouridine, N1-methyl pseudouridine, or 5-methoxyuridine, is considered 100% identical to AUG in that both are perfectly complementary to the same sequence (5'-CAU).
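The identity convention in the preceding paragraph can be sketched as follows. This is an illustrative sketch only: residues that share the same complement are mapped to a common symbol before comparison. The single-letter codes “X” (any modified uridine, as in the 5'-AXG example) and “M” (5-methylcytosine) are hypothetical labels chosen for this sketch, and the comparison assumes equal-length, ungapped sequences.

```python
# Residues with the same complement are collapsed to one symbol before
# comparison. "X" and "M" are hypothetical one-letter codes for this sketch.
SAME_COMPLEMENT = {
    "T": "U",  # thymidine and uridine both pair with adenosine
    "X": "U",  # X = any modified uridine (e.g., pseudouridine)
    "M": "C",  # M = 5-methylcytosine, which pairs with guanosine
}


def normalize(seq):
    """Map each residue to the representative residue sharing its complement."""
    return "".join(SAME_COMPLEMENT.get(base, base) for base in seq.upper())


def percent_identity(seq_a, seq_b):
    """Position-wise percent identity of two equal-length, ungapped sequences."""
    a, b = normalize(seq_a), normalize(seq_b)
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)
```

With these assumptions, `percent_identity("AXG", "AUG")` evaluates to 100.0, mirroring the example in the text.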
[90] The terms, “polypeptide” and “protein,” as used herein, refer to a polymeric form of amino acids. A polypeptide may include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones. Accordingly, polypeptides as described herein may comprise one or more mutations, one or more engineered modifications, or both. It is understood that when describing coding sequences of polypeptides described herein, said coding sequences do not necessarily require a codon encoding an N-terminal Methionine (M) or a Valine (V) as described for the effector proteins described herein. One skilled in the art would understand that a start codon could be replaced or substituted with a start codon that encodes for an amino acid residue sufficient for initiating translation in a host cell. In some embodiments, when a heterologous peptide, such as a fusion partner protein, protein tag or NLS, is located at the N terminus of the effector protein, a start codon for the heterologous peptide serves as a start codon for the effector protein as well. Thus, the natural start codon
encoding an amino acid residue sufficient for initiating translation (e.g., Methionine (M) or a Valine (V)) of the effector protein may be removed or absent.
[91] The terms, “promoter” and “promoter sequence,” as used herein, refer to a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a downstream (3’ direction) coding or non-coding sequence. A transcription initiation site, as well as protein binding domains responsible for the binding of RNA polymerase, can also be found in a promoter region. Eukaryotic promoters will often, but not always, contain “TATA” boxes and “CAT” boxes. Various promoters, including inducible promoters, may be used to drive expression by the various vectors of the present disclosure.
[92] The term, “protein binding sequence,” as used herein, in a context of a dual nucleic acid system, refers to a nucleotide sequence in an intermediary RNA, wherein the protein binding sequence is capable of, at least partially, being non-covalently bound to an effector protein to form a complex (e.g., an RNP complex).
[93] The terms, “protospacer adjacent motif” and “PAM,” as used herein, refer to a nucleotide sequence found in a target nucleic acid that directs an effector protein to edit the target nucleic acid at a specific location. In some embodiments, a PAM is required for a complex of an effector protein and a guide nucleic acid (e.g., an RNP complex) to hybridize to and edit the target nucleic acid. In some embodiments, the complex does not require a PAM to edit the target nucleic acid.
[94] The term, “recombinant,” as used herein, in the context of proteins, polypeptides, peptides and nucleic acids, refers to proteins, polypeptides, peptides and nucleic acids that are products of various combinations of cloning, restriction, and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems.
[95] The term, “regulatory element,” as used herein, refers to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate transcription of a non-coding sequence (e.g., a guide nucleic acid) or a coding sequence (e.g., effector proteins, fusion proteins, and the like) and/or regulate translation of an encoded polypeptide.
[96] The term, “repeat hybridization sequence,” as used herein, in the context of a dual nucleic acid system, refers to a sequence of nucleotides of an intRNA that is capable of hybridizing to a repeat sequence of a guide nucleic acid.
[97] The term, “repeat sequence,” as used herein, refers to a sequence of nucleotides in a guide nucleic acid that is capable of, at least partially, interacting with an effector protein and/or another guide nucleic acid (e.g., hybridizes to a portion of an intermediary RNA).
[98] The term, “reverse transcriptase,” as used herein, refers to an enzyme that possesses catalytic activity for reverse transcription of an RNA strand into a DNA strand without the use of secondary structural elements associated with retrons. The reverse transcriptase comprises one or more activities selected from
RNA-dependent DNA polymerase activities, ribonuclease activities, and DNA-dependent DNA polymerase activities.
[99] The term, “reverse transcriptase activity,” as used herein, refers to the catalytic activity that results in the reverse of normal transcription in which a sequence of nucleotides is copied from an RNA template during the synthesis of a molecule of DNA.
[100] The terms, “reverse-transcribed DNA,” “RT-DNA” and “RT-DNA molecule,” as used herein, refer to a DNA strand synthesized by reverse transcriptase activity and/or by the activity of a reverse transcriptase from a template sequence.
[101] The terms, “ribonucleotide protein complex” and “RNP” as used herein, refer to a complex of one or more nucleic acids and one or more polypeptides described herein. While the term utilizes “ribonucleotides” it is understood that the one or more nucleic acid may comprise deoxyribonucleotides (DNA), ribonucleotides (RNA), a combination thereof (e.g., RNA with a thymine base), biochemically or chemically modified nucleobases (e.g., one or more engineered modifications described herein), or combinations thereof.
[102] The term, “R-Loop” as used herein, refers to a three-stranded nucleic acid structure comprising a DNA:RNA hybrid and a displaced strand of DNA. For example, an R-Loop can be formed upon hybridization of a guide nucleic acid as described herein to a target sequence of a target nucleic acid. In general, the target strand of the R-Loop is that to which a spacer sequence hybridizes. Non-limiting examples of an R-Loop are depicted in FIGS. 1-3.
[103] The terms, “single guide nucleic acid”, “single guide RNA” and “sgRNA,” as used herein, in the context of a single nucleic acid system, refer to a guide nucleic acid, wherein the guide nucleic acid is a single polynucleotide chain having all the required sequence for a functional complex with an effector protein (e.g., being bound by an effector protein, including in some embodiments activating the effector protein, and hybridizing to a target nucleic acid, without the need for a second nucleic acid molecule). For example, an sgRNA can have two or more linked guide nucleic acid components (e.g., an intermediary sequence, a repeat sequence, a spacer sequence and optionally a linker).
[104] The term, “single nucleic acid system,” as used herein, refers to a system that uses a guide nucleic acid complexed with one or more polypeptides described herein, wherein the complex is capable of interacting with a target nucleic acid in a sequence specific manner, and wherein the guide nucleic acid is capable of non-covalently interacting with the one or more polypeptides described herein, and wherein the guide nucleic acid is capable of hybridizing with a target sequence of the target nucleic acid. A single nucleic acid system lacks a duplex of a guide nucleic acid as hybridized to a second nucleic acid, wherein in such a duplex the second nucleic acid, and not the guide nucleic acid, is capable of interacting with the effector protein.
[105] The term, “spacer sequence,” as used herein, refers to a nucleotide sequence in a guide nucleic acid that is capable of, at least partially, hybridizing to an equal length portion of a sequence (e.g., a target sequence) of a target nucleic acid.
[106] The term, “target nucleic acid,” as used herein, refers to a nucleic acid that is selected as the nucleic acid for editing, binding, hybridization or any other activity of or interaction with a nucleic acid, protein, polypeptide, or peptide described herein. A target nucleic acid may comprise RNA, DNA, or a combination thereof. A target nucleic acid may be single-stranded (e.g., single-stranded RNA or single-stranded DNA, referred to herein as a ssRNA or ssDNA respectively) or double-stranded (e.g., double-stranded DNA, referred to herein as a dsDNA).
[107] The term, “target sequence,” as used herein, in the context of a target nucleic acid, refers to a nucleotide sequence found within a target nucleic acid. Such a nucleotide sequence can, for example, hybridize to a respective length portion of a guide nucleic acid.
[108] The term, “variant,” as used herein, refers to a form or version of a protein that differs from the wild-type protein. A variant may have a different function or activity relative to the wild-type protein.
[109] The term, “viral vector,” as used herein, refers to a nucleic acid to be delivered into a host cell by a recombinantly produced virus or viral particle.
I. On Cas Template Synthesis (OCATS) Systems
[110] In some aspects, disclosed herein are systems and uses thereof for synthesis of a donor nucleic acid at a Cas-mediated DNA cleavage site. In general, an RNP complex comprising a Cas effector protein and guide nucleic acid recognizes and cleaves a target sequence of a target nucleic acid, and a reverse transcriptase generates a reverse-transcribed DNA (RT-DNA) (e.g., a single stranded donor (ss donor)) on an extended portion of the guide nucleic acid (e.g., crRNA, intRNA) referred to as an extension sequence, and in some embodiments, comprising a template sequence. Such systems may be referred to as OCATS systems. System components may be provided in separate compositions or in a single composition. In some embodiments, nucleic acids of the systems are provided in separate compositions, plasmids, or expression vectors. In some embodiments, nucleic acids of the systems are provided in a single composition, plasmid, or expression vector.
[111] In contrast to many prime editing systems described by others, the extended portion of the crRNA or intRNA is not required to be complementary to the target sequence (e.g., of either the target strand (TS) or the non-target strand (NTS)) or portion thereof (e.g., after strand cleavage). Instead of the RT generating new DNA extending from the cut/nicked end of genomic DNA as is seen in prime editing systems, RT-DNA is generated at the 3’ end of a guide nucleic acid, crRNA or intRNA. The RT-DNA sequence, which can be used as a ss donor nucleic acid, and template sequence can be different from the target sequence. In some embodiments, neither of the RT-DNA (e.g., a ss donor nucleic acid) and template sequence can hybridize to the target sequence. In some embodiments, the RT-DNA (e.g., the ss donor nucleic acid) and template sequence are less than 90%, less than 85%, less than 80%, less than 75%, less than 70%, less than 65%, or less than 50% identical to the target sequence.
[112] In some embodiments, systems comprise a Type V Cas effector protein or a nucleic acid encoding the Type V effector protein. In some embodiments, systems comprise a reverse transcriptase (RT) or a nucleic acid that encodes the RT. In some embodiments, systems comprise an intermediary RNA or DNA molecule encoding the same. In some embodiments, the intermediary RNA comprises from 5’ to 3’: a protein binding sequence and a repeat hybridization sequence. In some embodiments, systems comprise an extended crRNA or DNA molecule encoding the same. In some embodiments, the crRNA comprises from 5’ to 3’: a template sequence, a repeat sequence that hybridizes to the repeat hybridization sequence of the intermediary RNA, and a spacer sequence that hybridizes to a target sequence of a target strand of a dsDNA target nucleic acid. While components of the intermediary RNA and crRNA are described in order of 5’ to 3’ in the foregoing description and throughout, these components need not be directly linked. One of skill in the art understands that such guide nucleic acids may comprise linker nucleotides and additional elements. Without being bound by theory, the Type V Cas effector protein forms a ribonucleotide protein (RNP) complex with the intermediary RNA and crRNA; the RNP complex binds the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid; the RNP complex cleaves at least one strand of the R-loop to produce a cut site; and the RT reverse transcribes the template sequence to produce a RT-DNA (e.g., a ss donor nucleic acid) at the 3’ end of the intermediary RNA. In some embodiments, the RT-DNA (e.g., the ss donor nucleic acid) is inserted into the non-target strand at the cut site. In some embodiments, a DNA strand complementary to the RT-DNA (e.g., the ss donor nucleic acid) is polymerized to produce DNA complementary to the RT-DNA, which is directly attached to the non-target or target strand at the cut site.
[113] In some embodiments, systems comprise a Type II effector protein, or a nucleic acid encoding the same. In some embodiments, systems comprise a reverse transcriptase (RT) or a nucleic acid that encodes the RT. In some embodiments, systems comprise an extended intermediary RNA or DNA molecule encoding the same. In some embodiments, the intermediary RNA comprises from 5’ to 3’: a template sequence, a repeat hybridization sequence, and a protein binding sequence. In some embodiments, systems comprise a crRNA or DNA molecule encoding the same. In some embodiments, the crRNA comprises from 5’ to 3’: a spacer sequence that hybridizes to a target sequence of a target strand of a dsDNA target nucleic acid, and a repeat sequence that hybridizes to the repeat hybridization sequence of the intermediary RNA. While components of the intermediary RNA and crRNA are described in order of 5’ to 3’ in the foregoing description and throughout, these components need not be directly linked. One of skill in the art understands that such guide nucleic acids may comprise linker nucleotides and additional elements. Without being bound by theory, the Type II Cas effector protein forms a complex with the intermediary RNA and crRNA; the complex binds the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid; the complex cleaves at least one strand of the R-loop to produce a cut site, and the RT reverse
transcribes the template sequence to produce RT-DNA (e.g., the ss donor nucleic acid) at the 3’ end of the crRNA. In some embodiments, the RT-DNA (e.g., the ss donor nucleic acid) is inserted into the non-target strand at the cut site. In some embodiments, a DNA strand complementary to the RT-DNA (e.g., the ss donor nucleic acid) is polymerized to produce a dsDNA donor nucleic acid which is inserted into the non-target strand at the cut site.
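The 5’ to 3’ component orders recited above for the Type V and Type II embodiments can be summarized in a short sketch. The sketch is illustrative only: the names `GuideLayout`, `COMPONENT_ORDER` and `assemble` are hypothetical, the optional linker argument reflects the statement that components need not be directly linked, and any sequences passed in are placeholders rather than sequences from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideLayout:
    intermediary_rna: tuple  # component names, 5' to 3'
    crrna: tuple             # component names, 5' to 3'
    rt_dna_at_3prime_end_of: str


# Component orders as recited for the two embodiments described above.
COMPONENT_ORDER = {
    "Type V": GuideLayout(
        intermediary_rna=("protein binding sequence",
                          "repeat hybridization sequence"),
        crrna=("template sequence", "repeat sequence", "spacer sequence"),
        rt_dna_at_3prime_end_of="intermediary RNA",
    ),
    "Type II": GuideLayout(
        intermediary_rna=("template sequence",
                          "repeat hybridization sequence",
                          "protein binding sequence"),
        crrna=("spacer sequence", "repeat sequence"),
        rt_dna_at_3prime_end_of="crRNA",
    ),
}


def assemble(parts, order, linker=""):
    """Join named component sequences 5' to 3', optionally separated by a linker."""
    return linker.join(parts[name] for name in order)
```

For example, `assemble(parts, COMPONENT_ORDER["Type V"].crrna)` would concatenate a placeholder template, repeat and spacer in that order to give an extended crRNA.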
[114] In some embodiments, it is advantageous for some nucleotides at the 5’ end of the template sequence to be complementary to the 3’ end of the intermediary RNA (in the case of Type V systems) or the 3’ end of the crRNA (in the case of Type II systems). In some instances, at least 2, at least 3, at least 4, at least 5, or at least 10 nucleotides are complementary. By way of non-limiting example, production of an intermediary RNA is driven by a U6 promoter and terminated by a heptaT sequence, which adds multiple uracils to the 3’ end of the intermediary RNA. In order to ensure that the 3’ end of the intermediary RNA hybridizes to the 5’ end of the template sequence of the crRNA, the 5’ end of the template sequence may comprise three to seven adenosines.
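The end-complementarity described above can be checked with a simple sketch. It assumes RNA sequences written 5’ to 3’, strict Watson-Crick pairing (G/U pairing is ignored for simplicity), and antiparallel alignment so that the first nucleotide of the template faces the last nucleotide of the priming RNA. The function name `complementary_end_length` and the example sequences are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementary_end_length(template, priming_rna):
    """Count consecutive nucleotides, starting at the 5' end of the template,
    that pair with the 3' end of the priming RNA (antiparallel alignment).

    Both sequences are written 5'->3'. Only Watson-Crick pairs are counted.
    """
    t = template.upper()
    p = priming_rna.upper()
    count = 0
    for i in range(min(len(t), len(p))):
        # Template position i (from its 5' end) faces priming RNA
        # position i counted back from its 3' end.
        if COMPLEMENT.get(p[-(i + 1)]) == t[i]:
            count += 1
        else:
            break
    return count
```

In this sketch, a template beginning with four adenosines and a priming RNA ending in four uracils yields a value of 4, which would satisfy an "at least 2, at least 3, or at least 4" criterion.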
[115] In some embodiments, the RT is fused to the Cas effector protein. In some embodiments, the RT comprises an aptamer binding moiety, and the intermediary RNA or crRNA comprises an aptamer that recruits the RT to a target sequence. In some embodiments, the RT is fused to a first peptide/protein and the Cas effector protein comprises a second peptide/protein, wherein the first peptide/protein interacts with the second peptide/protein.
[116] In some embodiments, the effector protein removes a portion of a cleaved strand of the R-loop. In some embodiments, the effector protein removes a portion of a non-target strand (NTS) of a target sequence. This may also be referred to as ssDNA removal. In some embodiments, systems comprise an endonuclease that removes the portion of the NTS. The endonuclease may be an endogenous nuclease in a cell. The endonuclease may be an exogenous nuclease introduced to a cell. In some embodiments, systems comprise an accessory protein that recruits an endonuclease to the RNP complex. In some embodiments, the endonuclease or accessory protein is fused to the effector protein or the RT.
[117] In some embodiments, systems comprise a domain that recruits or is itself a protein involved in synthesis dependent strand annealing (SDSA) or nucleic acid encoding the same. In some embodiments, systems comprise an accessory protein, wherein the accessory protein comprises a domain that recruits factors for or is itself a protein involved in SDSA. In some embodiments, the SDSA proteins enable synthesis of a complementary strand on the ss donor RT-DNA to produce a strand that contains the edit and is directly linked to the genomic DNA. In some embodiments, the SDSA protein is a DNA polymerase. In some embodiments, the length of the portion of the non-target strand that is removed is at least 5, at least 10, at least 15, at least 20, or at least 25 nucleotides. In some embodiments, the length of the portion of the non-target strand that is removed is not greater than 100 nucleotides. In some embodiments, the length of the portion of the non-target strand that is removed is 5-10 nucleotides, 10-15 nucleotides, or 15-30 nucleotides.
[118] In some embodiments, systems comprise an accessory protein, wherein the accessory protein comprises one or more repair proteins that promote, increase or enable nucleic acid repair mechanisms. In some embodiments, systems comprise a DNA repair protein that promotes, increases or enables DNA repair at the cut site. In some embodiments, the DNA repair protein recruits other NHEJ proteins to the cut site. In some embodiments, the DNA repair protein comprises a ligase. In some embodiments, the DNA repair protein is fused to or interacts with any one of an effector protein, RT, and an accessory protein.
A. Effector Proteins
[119] Provided herein are compositions, systems, and methods comprising an effector protein or a use thereof. In general, effector proteins interact with a guide nucleic acid to form a complex. In some embodiments, an interaction between the complex and a target nucleic acid comprises one or more of: recognition of a protospacer adjacent motif (PAM) sequence within the target nucleic acid by the effector protein, hybridization of the guide nucleic acid to the target nucleic acid, and optionally modification of the target nucleic acid and/or the non-target nucleic acid. In some embodiments, effector proteins have some detectable catalytic activity. In some embodiments, the catalytic activity is nuclease activity (e.g., cleaving a strand of a nucleic acid, breaking a phosphodiester bond). In some embodiments, the catalytic activity is nickase activity.
[120] In general, effector proteins are CRISPR associated (Cas) proteins. In some embodiments, the Cas protein is a Class 1 Cas protein. In some embodiments, the Cas protein is a Class 2 protein. In some embodiments, the Cas protein is selected from a Type I, Type II, Type III, Type IV, and Type V Cas protein. In some embodiments, the Cas protein is a Type V Cas protein. In some embodiments, the Cas protein is a Cas12 protein. In some embodiments, the Cas protein is a Cas14 protein. In some embodiments, the Cas protein is a Type VU protein. In some embodiments, the Cas protein is a Type VU-3 protein. In some embodiments, the Cas protein is a Type VU-4 protein. In some embodiments, the effector protein comprises an engineered variant of any of the Cas proteins described herein. In some embodiments, the effector protein comprises transposase activity. In some embodiments, the effector protein comprises integrase activity. In some embodiments, the effector protein comprises an IscB protein or engineered variant thereof.
[121] In some embodiments, effector proteins described herein comprise one or more functional domains. Effector protein functional domains can include a protospacer adjacent motif (PAM)-interacting domain, an oligonucleotide-interacting domain, one or more recognition domains, a non-target strand interacting domain, and a RuvC domain. A PAM interacting domain can be a target strand PAM interacting domain (TPID) or a non-target strand PAM interacting domain (NTPID). In some embodiments, a PAM interacting domain, such as a TPID or a NTPID, on an effector protein describes a region of an effector protein that interacts with target nucleic acid. In some embodiments, effector proteins described herein comprise one or more recognition domain (REC domain) with a binding affinity for a guide nucleic acid or for a guide nucleic acid-target nucleic acid heteroduplex. An effector protein described herein may comprise a zinc
finger domain. In some embodiments, effector proteins comprise an HNH domain.
[122] In some embodiments, the effector protein is a Type V effector protein. In general, a Type V protein comprises a RuvC domain and an HNH domain. In some embodiments, the effector protein is a Type II effector protein. In some embodiments, the Type II effector protein is a Cas9 protein. In general, a Type II protein comprises a RuvC domain and an HNH domain.
[123] An effector protein may have a length of at least about 200, at least about 225, at least about 250, at least about 275, at least about 300, at least about 325, at least about 350, at least about 375, at least about 400, at least about 425, at least about 450, at least about 475, at least about 500, at least about 525, at least about 550, at least about 575, at least about 600, at least about 625, at least about 650, at least about 675, at least about 700, at least about 725, at least about 750, at least about 775, at least about 800, at least about 825, at least about 850, at least about 875, at least about 900, at least about 925, at least about 950, at least about 975, at least about 1,000, or more linked amino acids. In some embodiments, the length of an effector is less than 1,000 linked amino acids. In some embodiments, the length of the effector protein is 300 to 500, 300 to 600, 300 to 700, 300 to 800, 300 to 900, 400 to 500, 400 to 600, 400 to 700, 400 to 800, or 400 to 900 linked amino acids.
[124] TABLE 1 provides illustrative amino acid sequences of effector proteins that are useful in the compositions, systems and methods described herein.
[125] In some embodiments, compositions, systems, and methods described herein comprise an effector protein, or a nucleic acid encoding the effector protein, wherein the amino acid sequence of the effector protein comprises at least about 200 contiguous amino acids or more of any one of the sequences recited in TABLE 1. In some embodiments, the amino acid sequence of an effector protein provided herein comprises at least about 200 contiguous amino acids, at least about 225 contiguous amino acids, at least about 250 contiguous amino acids, at least about 275 contiguous amino acids, at least about 300 contiguous amino acids, at least about 325 contiguous amino acids, at least about 350 contiguous amino acids, at least about 375 contiguous amino acids, at least about 400 contiguous amino acids, or more of any one of the sequences of TABLE 1.
[126] In some embodiments, compositions, systems, and methods described herein comprise an effector protein or a nucleic acid encoding the effector protein, wherein the effector protein comprises a portion of any one of the sequences recited in TABLE 1. In some embodiments, the effector protein comprises a portion of any one of the sequences recited in TABLE 1, wherein the portion does not comprise at least the first 10 amino acids, at least the first 20 amino acids, at least the first 40 amino acids, at least the first 60 amino acids, at least the first 80 amino acids, at least the first 100 amino acids, at least the first 120 amino acids, at least the first 140 amino acids, at least the first 160 amino acids, at least the first 180 amino acids, or at least the first 200 amino acids of any one of the sequences recited in TABLE 1. In some embodiments, the effector protein comprises a portion of any one of the sequences recited in TABLE 1, wherein the
portion does not comprise the last 10 amino acids, the last 20 amino acids, the last 40 amino acids, the last 60 amino acids, the last 80 amino acids, the last 100 amino acids, the last 120 amino acids, the last 140 amino acids, the last 160 amino acids, the last 180 amino acids, or the last 200 amino acids of any one of the sequences recited in TABLE 1.
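By way of illustration only, the terminal truncations described in this paragraph can be sketched in Python. The function name `truncate_termini` and the placeholder sequence are hypothetical and are not taken from TABLE 1; the sketch simply removes a chosen number of residues from the N-terminus, the C-terminus, or both, and is not a statement of how any portion was actually prepared.

```python
def truncate_termini(sequence: str, n_term: int = 0, c_term: int = 0) -> str:
    """Return `sequence` without its first `n_term` and last `c_term` residues."""
    if n_term < 0 or c_term < 0:
        raise ValueError("truncation lengths must be non-negative")
    if n_term + c_term >= len(sequence):
        raise ValueError("truncation would remove the entire sequence")
    return sequence[n_term:len(sequence) - c_term]


# Hypothetical placeholder sequence (20 residues), not a sequence from TABLE 1.
placeholder = "MKTAYIAKQRQISFVKSHFS"

# Portion lacking the first 10 and the last 5 amino acids of the placeholder.
portion = truncate_termini(placeholder, n_term=10, c_term=5)
```

The same helper covers N-terminal-only or C-terminal-only truncation by leaving the other argument at zero.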
[127] In some embodiments, compositions, systems, and methods described herein comprise an effector protein, or a nucleic acid encoding the effector protein, wherein the effector protein comprises an amino acid sequence that is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 65% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 70% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 75% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 80% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 85% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 90% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 95% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 97% identical to any one of the sequences as set forth in TABLE 1. 
In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 98% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 99% identical to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is identical to any one of the sequences as set forth in TABLE 1.
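As a non-limiting sketch, a percent identity threshold of the kind recited above can be checked as follows. This section does not define how identity is calculated, so the convention used here is an assumption: the two sequences are supplied already aligned (equal length, with `-` marking gaps), and identity is the number of matching, non-gap columns divided by the total number of alignment columns. The function names are hypothetical, and other conventions (e.g., dividing by the length of the shorter sequence) would give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: matching non-gap columns / total alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical placeholder sequences, not sequences from TABLE 1.
reference = "MKTAYIAKQR"
variant = "MKTAYIAKQA"  # differs from the reference at one of ten positions
identity = percent_identity(reference, variant)
```

In practice the alignment itself would typically be produced by an established alignment tool before a calculation of this kind is applied.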
[128] In some embodiments, compositions, systems, and methods described herein comprise an effector protein, or a nucleic acid encoding the effector protein, wherein the effector protein comprises an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 80% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 85% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 90% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 95% similar to any
one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 97% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 98% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is at least 99% similar to any one of the sequences as set forth in TABLE 1. In some embodiments, an effector protein provided herein comprises an amino acid sequence that is 100% similar to any one of the sequences as set forth in TABLE 1.
[129] In some embodiments, compositions, systems, and methods described herein comprise an effector protein, or a nucleic acid encoding the effector protein, wherein the effector protein comprises one or more amino acid alterations relative to any one of the sequences recited in TABLE 1. In some embodiments, the effector protein comprising one or more amino acid alterations is a variant of an effector protein described herein. It is understood that any reference to an effector protein herein also refers to an effector protein variant as described herein. In some embodiments, the one or more amino acid alterations comprises conservative substitutions, non-conservative substitutions, conservative deletions, non-conservative deletions, or combinations thereof. In some embodiments, an effector protein or a nucleic acid encoding the effector protein comprises 1 amino acid alteration, 2 amino acid alterations, 3 amino acid alterations, 4 amino acid alterations, 5 amino acid alterations, 6 amino acid alterations, 7 amino acid alterations, 8 amino acid alterations, 9 amino acid alterations, 10 amino acid alterations or more relative to any one of the sequences recited in TABLE 1.
[130] Additional, non-limiting examples of effector proteins are described in WO2021247924, WO2023141590, WO2022240858, WO2024006824, WO2023092136, WO2022221581, WO2020223634, WO2023028444, WO2023102329, and WO2023092132. In some embodiments, compositions, systems, and methods described herein comprise an effector protein, or a nucleic acid encoding the effector protein, wherein the effector protein comprises an amino acid sequence that is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of the sequences described in these publications.
[131] In some embodiments, compositions, systems, and methods described herein comprise a nucleic acid encoding the effector protein, wherein the nucleic acid encoding the effector protein comprises RNA or messenger RNA (mRNA).
[132] In some embodiments, effector proteins described herein have been modified (also referred to as engineered proteins). In some embodiments, a modification of the effector proteins may include addition of one or more amino acids, deletion of one or more amino acids, substitution of one or more amino acids, or combinations thereof relative to a naturally occurring sequence. In some embodiments, effector proteins disclosed herein are engineered proteins. Unless otherwise indicated, reference to effector proteins throughout the present disclosure includes engineered proteins thereof. In some embodiments, effector
proteins may comprise one or more modifications that may provide altered activity as compared to a naturally-occurring counterpart. For example, effector proteins may comprise one or more modifications that may provide increased activity as compared to a naturally-occurring counterpart. As another example, effector proteins may provide increased catalytic activity (e.g., nickase, nuclease, binding activity) as compared to a naturally-occurring counterpart. Effector proteins may provide enhanced nucleic acid binding activity (e.g., enhanced binding of a guide nucleic acid, and/or target nucleic acid) as compared to a naturally-occurring counterpart. An effector protein may have a 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 120%, 140%, 160%, 180%, 200%, or more, increase of the activity of a naturally-occurring counterpart. Alternatively, effector proteins may comprise one or more modifications that reduce the activity of the effector proteins relative to a naturally occurring nuclease or nickase. An effector protein may have a 100%, 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20%, 10%, 5%, 1%, or less, decrease of the activity of a naturally occurring counterpart. Decreased activity may be decreased catalytic activity (e.g., nickase, nuclease, binding activity) as compared to a naturally-occurring counterpart. In some embodiments, activity (e.g., nickase, nuclease, binding activity) of effector proteins described herein can be measured relative to a naturally-occurring effector protein or compositions containing the same in a cleavage assay.
[133] In some embodiments, effector proteins described herein can be modified with the addition of one or more heterologous peptides or heterologous polypeptides (referred to collectively herein as a heterologous polypeptide). In some embodiments, an effector protein modified with the addition of one or more heterologous peptides or heterologous polypeptides may be referred to herein as a fusion protein. In some embodiments, heterologous polypeptides described herein are fused to effector protein(s). In some embodiments a fusion protein comprises heterologous polypeptide(s) and effector protein(s). In some embodiments, a heterologous peptide or heterologous polypeptide comprises a subcellular localization signal. In some embodiments, a subcellular localization signal can be a nuclear localization signal (NLS). In some embodiments, the NLS facilitates localization of a nucleic acid, protein, or small molecule to the nucleus, when present in a cell that contains a nuclear compartment. TABLE 2 lists exemplary NLS sequences. In some embodiments, the subcellular localization signal is a nuclear export signal (NES), a sequence to keep an effector protein retained in the cytoplasm, a mitochondrial localization signal for targeting to the mitochondria, a chloroplast localization signal for targeting to a chloroplast, an ER retention signal, and the like. In some embodiments, an effector protein described herein is not modified with a subcellular localization signal so that the polypeptide is not targeted to the nucleus, which can be advantageous depending on the circumstance (e.g., when the target nucleic acid is an RNA that is present in the cytosol). In some embodiments, the heterologous polypeptide is a cell penetrating peptide (CPP), also known as a Protein Transduction Domain (PTD). 
A CPP or PTD is a polypeptide, polynucleotide, carbohydrate, or organic or inorganic compound that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane. Further suitable heterologous polypeptides include, but are not limited to, proteins (or fragments/domains thereof) that are boundary elements (e.g., CTCF),
proteins and fragments thereof that provide periphery recruitment (e.g., Lamin A, Lamin B, etc.), and protein docking elements (e.g., FKBP/FRB, Pil1/Aby1, etc.).
[134] In some embodiments, a heterologous peptide or heterologous polypeptide comprises a protein tag. In some embodiments, the protein tag is referred to as a purification tag or a fluorescent protein. The protein tag may be detectable for use in detection of the effector protein and/or purification of the effector protein. Accordingly, in some embodiments, compositions, systems and methods comprise a protein tag or use thereof. Any suitable protein tag may be used depending on the purpose of its use. Non-limiting examples of protein tags include a fluorescent protein; a histidine tag, e.g., a 6XHis tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and maltose binding protein (MBP). In some embodiments, the protein tag is a portion of MBP that can be detected and/or purified. Non-limiting examples of fluorescent proteins include green fluorescent protein (GFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), cyan fluorescent protein (CFP), mCherry, and tdTomato.
[135] A heterologous polypeptide may be located at or near the amino terminus (N-terminus) of the effector protein disclosed herein. A heterologous polypeptide may be located at or near the carboxy terminus (C-terminus) of the effector proteins disclosed herein. In some embodiments, a heterologous polypeptide is located internally in an effector protein described herein (i.e., is not at the N- or C-terminus of an effector protein described herein) at a suitable insertion site.
[136] In some embodiments, effector proteins described herein are encoded by a codon optimized nucleic acid. In some embodiments, a nucleic acid sequence encoding an effector protein described herein, is codon optimized. In some embodiments, effector proteins described herein may be codon optimized for expression in a specific cell, for example, a bacterial cell, a plant cell, a eukaryotic cell, an animal cell, a mammalian cell, or a human cell. In some embodiments, the effector protein is codon optimized for a human cell.
B. Reverse Transcriptases & Accessory Proteins
[137] Provided herein are compositions, systems, and methods comprising a reverse transcriptase or a use thereof. In some instances, the reverse transcriptase is fused to an effector protein. In some instances, the reverse transcriptase is recruited to the effector protein or to a target nucleic acid to be modified by the effector protein.
[138] In some embodiments, a fusion partner imparts some function or activity to a fusion protein that is not provided by an effector protein, including but not limited to reverse transcriptase activity, nuclease activity, ligase activity, or combinations thereof. In some embodiments, the compositions, systems and methods provided herein comprise one or more fusion partners. In some embodiments, the fusion partners described herein comprise one or more subcellular localization signals described herein. In some embodiments, the one or more fusion partners comprise at least one, at least two, at least three, at least four, at least five, or more fusion partners. In some embodiments, the one or more fusion partners comprise one, two, three, four, five, or more fusion partners.
[139] In some embodiments, the fusion partners described herein function to repair DNA single-strand breaks or DNA double-strand breaks. In some embodiments, the repair can be with or without insertion of the donor nucleic acids. Accordingly, in some embodiments, the fusion partner comprises one or more proteins associated with the NHEJ and/or HDR mechanism (e.g., repair factors described herein). In some embodiments, the HDR mechanism is governed by a homology between a donor DNA (e.g., a donor nucleic acid) and an acceptor DNA (e.g., target nucleic acid). In some embodiments, the HDR mechanism comprises an abbreviated homologous recombination, a single-strand annealing or a breakage-induced replication. In some embodiments, the homology in abbreviated HDR mechanism is greater than the homology in breakage-induced replication repair mechanism.
[140] In some embodiments, a fusion partner comprises reverse transcriptase activity. In some embodiments, a fusion partner comprises a reverse transcriptase. In some embodiments, fusions of effector proteins and reverse transcriptases described herein are referred to as fusion proteins. In some embodiments, fusion proteins comprise a reverse transcriptase fused to an effector protein by a linker, such as a linker described herein. In some embodiments, fusion proteins comprise a reverse transcriptase directly fused to an effector protein. In some embodiments, fusion proteins comprise a reverse transcriptase that is not fused to an effector protein. In such embodiments, reverse transcriptases are localized to systems described herein, effector proteins described herein, RNP complexes described herein, or to target nucleic acids described herein.
[141] In some embodiments, fusion proteins described herein are also referred to as Cas-RT fusions. In some embodiments, Cas-RT fusions comprise a reverse transcriptase fused to an effector protein by a linker, such as a linker described herein. In some embodiments, Cas-RT fusions comprise a reverse transcriptase directly fused to an effector protein. In some embodiments, Cas-RT fusions comprise a reverse transcriptase that is not fused to an effector protein. In such embodiments, reverse transcriptases are localized to an effector protein, to an RNP complex, or to a target nucleic acid.
[142] In some embodiments, fusion proteins or effector proteins described herein leverage reverse transcriptase activity of reverse transcriptases (e.g., as fusion partners or as separate entities) to edit or modify target nucleic acids as described herein. In some embodiments, reverse transcriptases are capable of using an RNA sequence contained in a crRNA or intRNA to reverse transcribe to produce a reverse-transcribed DNA (RT-DNA). In some embodiments, reverse transcriptases do not rely upon use of secondary structural elements (e.g., hairpin loops) to produce a reverse-transcribed DNA (RT-DNA).
[143] In some embodiments, a reverse transcriptase catalyzes the transcription of an RNA sequence into a reverse-transcribed DNA (RT-DNA). In some embodiments, the RNA sequence is a template sequence as described herein. As described in further detail supra, in some embodiments, a template sequence is extended from a nucleic acid described herein (e.g., an extended guide nucleic acid, such as an extended crRNA, or an extended intRNA). In some embodiments, a RT-DNA comprises a nucleic acid that can serve as a donor nucleic acid (e.g., a ss donor nucleic acid) that is incorporated into a target nucleic acid or
genome. In some embodiments, a RT-DNA comprises a nucleic acid that can serve as a template to generate a donor nucleic acid that is incorporated into a target nucleic acid or genome. In some embodiments, a RT-DNA comprises a nucleic acid that is capable of serving as a substrate for the activity of systems, compositions, and/or methods described herein. In some embodiments, a RT-DNA that serves as a substrate as described herein is capable of facilitating the introduction of a DNA sequence modification (e.g., an insertion, a deletion, a substitution, or combinations thereof) into a locus by homologous recombination using nucleic acid-guided nucleases, such as a ligase.
[144] In some embodiments, a reverse transcriptase is capable of reverse transcribing an RNA strand (e.g., a template sequence) without the use of a homologous structural scaffold. In naturally occurring systems, reverse transcriptase activity on an RNA strand, such as the activity of a bacterial retron, requires a homologous structural scaffold on the RNA strand to support the reverse transcriptase activity as it moves down the 3’ end of the RNA strand. In some embodiments, reverse transcriptases described herein utilize guide nucleic acids described herein as a structural scaffold for reverse transcriptase activity. In such embodiments, guide nucleic acids described herein comprise a nucleotide sequence heterologous to the reverse transcriptase.
[145] In some embodiments, reverse transcriptases described herein are viral reverse transcriptases. Exemplary reverse transcriptases are set forth in TABLE 1.1. In some embodiments, systems and methods comprise a reverse transcriptase, nucleic acid encoding the reverse transcriptase, or a use thereof, wherein the reverse transcriptase is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to a reverse transcriptase set forth in TABLE 1.1. In some embodiments, a reverse transcriptase is an RNA-dependent DNA polymerase (RDDP). In some embodiments, the RDDP is an RDDP described in WO 2024/040202.
[146] In some embodiments, compositions, systems, and methods comprise one or more accessory proteins or uses thereof. In some embodiments, systems described herein recruit one or more accessory proteins. In some embodiments, reverse transcriptase activity, cleavage activity or both, of the effector proteins, fusion partners and/or fusion proteins described herein recruit one or more accessory proteins. In some embodiments, the one or more accessory proteins are exogenous.
[147] In some embodiments, one or more accessory proteins comprise one or more nucleases (e.g., endonucleases), polymerases, ligases, or combinations thereof. In some embodiments, the one or more accessory proteins comprise one or more nucleases to resect damaged DNA. In some embodiments, the one or more accessory proteins comprise one or more polymerases to fill in new DNA. In some embodiments, the one or more accessory proteins comprise one or more ligases to restore integrity to the DNA strands. In some embodiments, the one or more accessory proteins comprise Ku70/80, DNA-PKcs, Artemis, Pol μ, Pol λ, XRCC4, ligase IV, XLF, or a combination thereof, wherein the one or more accessory proteins function to repair DNA by NHEJ. In some embodiments, the one or more accessory
proteins comprise BRCA1, BRCA2, CtIP, EXO1, BLM, MRE11, Nbs1, PALB2, RAD50, RAD51 (e.g., RAD51B, RAD51C, and RAD51D), XRCC2, XRCC3, RAD52, RAD54B, replication protein A (RPA), SWSAP1, or a combination thereof, wherein the one or more accessory proteins function to repair DNA by HDR.
[148] In some embodiments, compositions, systems, and methods comprise a DNA repair protein or a use thereof. In some embodiments, compositions, systems, and methods comprise a ligase or a use thereof. In some embodiments, compositions, systems, and methods comprise an endonuclease or a use thereof. In some embodiments, DNA repair proteins, ligases and endonucleases are endogenous to a cell or subject. In some embodiments, DNA repair proteins, ligases and endonucleases are exogenous factors provided with the system or composition.
C. Protein Linkers
[149] In some embodiments, systems comprise a first polypeptide and a second polypeptide connected by a linker. By way of non-limiting example, an effector protein may be connected to a fusion partner protein, e.g., a reverse transcriptase, a ligase, an SDSA, or an accessory protein via a linker. The linker may comprise or consist of a covalent bond. The linker may comprise or consist of a chemical group. In some embodiments, the linker comprises an amino acid. In some embodiments, a peptide linker comprises at least two amino acids linked by an amide bond. In general, the linker connects a terminus of the first polypeptide to a terminus of the second polypeptide. In some embodiments, the carboxy terminus of the first polypeptide is linked to the amino terminus of the second polypeptide. In some embodiments, the carboxy terminus of the second polypeptide is linked to the amino terminus of the first polypeptide. In some embodiments, the first polypeptide and the second polypeptide are directly linked by a covalent bond.
[150] In some embodiments, linkers comprise one or more amino acids. In some embodiments, linker is a protein. In some embodiments, a terminus of the effector protein is linked to a terminus of the fusion partner through an amide bond. In some embodiments, a terminus of the effector protein is linked to a terminus of the fusion partner through a peptide bond. In some embodiments, linkers comprise an amino acid. In some embodiments, linkers comprise a peptide. In some embodiments, an effector protein is coupled to a fusion partner by a linker protein. In some embodiments, the linker may have any of a variety of amino acid sequences. In some embodiments, the linker may comprise a region of rigidity (e.g., beta sheet, alpha helix), a region of flexibility, or any combination thereof. In some embodiments, the linker comprises small amino acids, such as glycine and alanine, that impart high degrees of flexibility. The ordinarily skilled artisan will recognize that design of a peptide conjugated to any desired element may include linkers that are all or partially flexible, such that the linker may include a flexible linker as well as one or more portions that confer less flexible structure. Suitable linkers include proteins of 4 linked amino acids to 40 linked amino acids in length, or between 4 linked amino acids and 25 linked amino acids in length. In some embodiments, linked amino acids described herein comprise at least two amino acids linked by an amide bond.
[151] Linkers may be produced by using synthetic, linker-encoding oligonucleotides to couple proteins, or may be encoded by a nucleic acid sequence encoding a fusion protein (e.g., an effector protein coupled to a fusion partner). In some embodiments, the linker is from 1 to 100 amino acids in length. In some embodiments, the linker is more than 100 amino acids in length. In some embodiments, the linker is from 10 to 27 amino acids in length. In some embodiments, linker proteins include glycine polymers (G)n, glycine-serine polymers (including, for example, (GS)n, GSGGSn (SEQ ID NO: 72), GGSGGSn (SEQ ID NO: 73), and GGGSn (SEQ ID NO: 74), where n is an integer of at least one), glycine-alanine polymers, and alanine-serine polymers. In some embodiments, linkers may comprise amino acid sequences including, but not limited to, GGSG (SEQ ID NO: 75), GGSGG (SEQ ID NO: 76), GSGSG (SEQ ID NO: 77), GSGGG (SEQ ID NO: 78), GGGSG (SEQ ID NO: 68), and GSSSG (SEQ ID NO: 69). In some embodiments, the linker comprises one or more repeats of a tripeptide GGS. In some embodiments, the linker is an XTEN linker. In some embodiments, the XTEN linker is an XTEN80 linker. In some embodiments, the XTEN linker is an XTEN20 linker. In some embodiments, the XTEN20 linker has an amino acid sequence of GSGGSPAGSPTSTEEGTSESATPGSG (SEQ ID NO: 70).
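As a purely illustrative sketch, a repeat-based peptide linker of the kind described in this paragraph can be assembled and checked against a length range in Python. The helper names are hypothetical. Treating a polymer such as (GS)n or a GGS repeat as n whole copies of the stated unit is an assumption about the notation, and the sketch does not reproduce or validate any SEQ ID NO.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def build_repeat_linker(unit: str, n: int) -> str:
    """Concatenate `n` copies of a peptide `unit` (one-letter amino acid codes)."""
    if n < 1:
        raise ValueError("n must be an integer of at least one")
    if not unit or not set(unit) <= STANDARD_AMINO_ACIDS:
        raise ValueError("unit must contain only standard one-letter amino acid codes")
    return unit * n


def length_in_range(linker: str, minimum: int, maximum: int) -> bool:
    """True if the linker length lies within [minimum, maximum] amino acids."""
    return minimum <= len(linker) <= maximum


# Four repeats of the tripeptide GGS, checked against the 10 to 27 amino acid range.
ggs_linker = build_repeat_linker("GGS", 4)
fits_range = length_in_range(ggs_linker, 10, 27)
```

Varying `n` shifts the linker length, which is one simple way to explore the flexible-linker lengths contemplated above.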
[152] In some embodiments, linkers do not comprise an amino acid. In some embodiments, linkers do not comprise a peptide. In some embodiments, linkers comprise a nucleotide, a polynucleotide, a polymer, or a lipid. In some embodiments, the linker may be a polyethylene glycol (PEG), polypropylene glycol (PPG), co-poly(ethylene/propylene) glycol, polyoxyethylene (POE), polyurethane, polyphosphazene, polysaccharides, dextran, polyvinyl alcohol, polyvinylpyrrolidones, polyvinyl ethyl ether, polyacrylamide, polyacrylate, polycyanoacrylates, lipid polymers, chitins, hyaluronic acid, heparin, or an alkyl linker.
D. Guide Nucleic Acids
[153] The compositions, systems, and methods of the present disclosure may comprise a guide nucleic acid or a use thereof. Unless otherwise indicated, compositions, systems, and methods comprising guide nucleic acids or uses thereof, as described herein and throughout, include DNA molecules, such as expression vectors, that encode a guide nucleic acid. Accordingly, compositions, systems, and methods of the present disclosure comprise a guide nucleic acid or a nucleotide sequence encoding the guide nucleic acid.
[154] In some embodiments, the guide nucleic acid comprises a nucleotide sequence. Such nucleotide sequence may be described as a nucleotide sequence of either DNA or RNA, however, no matter the form the sequence is described, it is readily understood that such nucleotide sequences can be revised to be RNA or DNA, as needed, for describing a sequence within a guide nucleic acid itself or the sequence that encodes a guide nucleic acid. Similarly, disclosure of the nucleotide sequences described herein also discloses a complementary nucleotide sequence, a reverse nucleotide sequence, and the reverse complement nucleotide sequence, any one of which can be a nucleotide sequence for use in a guide nucleic acid. In some embodiments, a guide nucleic acid sequence(s) comprises one or more nucleotide alterations at one or more
positions in any one of the sequences described herein. Alternative nucleotides can be any one or more of A, C, G, T or U, or a deletion, or an insertion.
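The interconversion described in this paragraph (a sequence written as DNA or RNA, and its complement, reverse, and reverse complement) can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes unmodified A, C, G, T/U letters only, and the placeholder sequence is not a guide sequence from this disclosure.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(sequence: str) -> str:
    """Write a nucleotide sequence in DNA form (U replaced by T)."""
    return sequence.upper().replace("U", "T")


def to_rna(sequence: str) -> str:
    """Write a nucleotide sequence in RNA form (T replaced by U)."""
    return sequence.upper().replace("T", "U")


def related_sequences(sequence: str) -> dict:
    """Return the complement, reverse, and reverse complement, in DNA form."""
    dna = to_dna(sequence)
    complement = dna.translate(_DNA_COMPLEMENT)
    return {
        "complement": complement,
        "reverse": dna[::-1],
        "reverse_complement": complement[::-1],
    }


# Hypothetical placeholder sequence supplied in RNA form.
related = related_sequences("AUGC")
rna_reverse_complement = to_rna(related["reverse_complement"])
```

Any of the returned forms can be passed through `to_rna` or `to_dna` depending on whether the guide nucleic acid itself or its encoding sequence is being described.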
[155] The compositions, systems, and methods of the present disclosure may comprise a guide nucleic acid, a nucleic acid encoding the guide nucleic acid, or a use thereof. Unless otherwise indicated, compositions, systems, and methods comprising guide nucleic acids or uses thereof, as described herein and throughout, include DNA molecules, such as expression vectors, that encode a guide nucleic acid. Guide nucleic acids are also referred to herein as “guide RNA.” A guide nucleic acid, as well as any components thereof (e.g., spacer sequence, repeat sequence, linker nucleotide sequence, etc.) may comprise one or more deoxyribonucleotides, ribonucleotides, biochemically or chemically modified nucleotides (e.g., one or more engineered modifications as described herein), or any combinations thereof.
[156] In general, guide nucleic acids disclosed herein are not naturally occurring. A guide nucleic acid may comprise a non-naturally occurring sequence, wherein the sequence of the guide nucleic acid, or any portion thereof, may be different from the sequence of a naturally occurring guide nucleic acid. In some embodiments, a guide nucleic acid comprises two naturally occurring sequences that do not occur in nature together. A guide nucleic acid of the present disclosure may comprise an engineered modification that makes it different from a nucleic acid that occurs in nature. A guide nucleic acid may be chemically synthesized or recombinantly produced by any suitable methods. In some embodiments, guide nucleic acids described herein comprise one or more engineered modifications. Non-limiting examples of engineered modifications include: 2’-O-methyl modified nucleotides (e.g., 2’-O-methyl (2’-OMe) sugar modifications); 2’-fluoro modified nucleotides (e.g., 2’-fluoro (2’-F) sugar modifications); locked nucleic acid (LNA) modified nucleotides; peptide nucleic acid (PNA) modified nucleotides; nucleotides with phosphorothioate linkages; a 5’ cap (e.g., a 7-methylguanylate cap (m7G)); phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates, 5'-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino phosphoramidate and aminoalkyl phosphoramidates, phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity wherein one or more internucleotide linkages is a 3' to 3', 5' to 5' or 2' to 2' linkage; phosphorothioate and/or heteroatom internucleoside linkages, such as -CH2-NH-O-CH2-, -CH2-N(CH3)-O-CH2- (known as a methylene (methylimino) or MMI backbone), -CH2-O-N(CH3)-CH2-, -CH2-N(CH3)-N(CH3)-CH2- and -O-N(CH3)-CH2-CH2- (wherein the native phosphodiester internucleotide linkage is represented as -O-P(=O)(OH)-O-CH2-); morpholino linkages (formed in part from the sugar portion of a nucleoside); morpholino backbones; phosphorodiamidate or other non-phosphodiester internucleoside linkages; siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; riboacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; other backbone modifications having mixed N, O, S and CH2 component parts; and combinations thereof.
[157] In some embodiments, a guide nucleic acid comprises a first region that is not complementary to a target nucleic acid (FR) and a second region that is complementary to the target nucleic acid (SR), wherein the FR and the SR are heterologous to each other. In some embodiments, FR is located 5’ to SR (FR-SR). In some embodiments, SR is located 5’ to FR (SR-FR). In some embodiments, the FR comprises one or more repeat sequences, intermediary sequences, or combinations thereof. In some embodiments, at least a portion of the FR interacts or binds to an effector protein. In some embodiments, the SR comprises a spacer sequence, wherein the spacer sequence can interact in a sequence-specific manner with (e.g., has complementarity with, or can hybridize to a target sequence in) a target nucleic acid. In some embodiments, the first region, the second region, or both may be about 8 nucleotides, about 10 nucleotides, about 12 nucleotides, about 14 nucleotides, about 16 nucleotides, about 18 nucleotides, about 20 nucleotides, about 22 nucleotides, about 24 nucleotides, about 26 nucleotides, about 28 nucleotides, about 30 nucleotides, about 32 nucleotides, about 34 nucleotides, about 36 nucleotides, about 38 nucleotides, about 40 nucleotides, about 42 nucleotides, about 44 nucleotides, about 46 nucleotides, about 48 nucleotides, or about 50 nucleotides long. In some embodiments, the first region, the second region, or both may be from about 8 to about 12, from about 8 to about 16, from about 8 to about 20, from about 8 to about 24, from about 8 to about 28, from about 8 to about 30, from about 8 to about 32, from about 8 to about 34, from about 8 to about 36, from about 8 to about 38, from about 8 to about 40, from about 8 to about 42, from about 8 to about 44, from about 8 to about 48, or from about 8 to about 50 nucleotides long.
In some embodiments, the first region, the second region, or both may comprise a GC content of about 1%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, or about 99%. In some embodiments, the first region, the second region, or both may comprise a GC content of from about 1% to about 95%, from about 5% to about 90%, from about 10% to about 80%, from about 15% to about 70%, from about 20% to about 60%, from about 25% to about 50%, or from about 30% to about 40%.
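As a non-limiting illustration, the GC content of a first region or second region may be computed as the percentage of G and C nucleotides in that region. The following Python sketch uses hypothetical function names and a hypothetical 20-nucleotide sequence, neither of which is taken from the present disclosure.

```python
# Illustrative sketch only: function names and the example region are hypothetical.
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA or RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def within_range(value: float, low: float, high: float) -> bool:
    """True if value lies in the closed interval [low, high]."""
    return low <= value <= high


region = "GACUUCGGAUCCAAGUCGAA"  # hypothetical 20-nucleotide region
gc = gc_content(region)
print(gc)                        # 50.0
print(within_range(gc, 20, 60))  # True
print(within_range(gc, 30, 40))  # False
```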
[158] In some embodiments, a guide nucleic acid comprises about: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 linked nucleotides. In general, a guide nucleic acid comprises at least: 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 linked nucleotides. In some embodiments, the length of a guide nucleic acid is about 30 to about 200 linked nucleotides. In some embodiments, the length of a guide nucleic acid is about 40 to about 150, about 40 to about 120, about 40 to about 100, about 40 to about 90, about 40 to about 80, about 40 to about 70, about 40 to about 60, about 40 to about 50, about 50 to about 90, about 50 to about 80, about 50 to about 70, or about 50 to about 60 linked nucleotides. In some embodiments, the length of a guide nucleic acid is about 40, about 45, about 50, about 55, about 60, about 65, about 70 or about 75 linked nucleotides. In some embodiments, the length of a guide nucleic acid is greater than about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70 or about 75 linked nucleotides. In some embodiments, the length of a guide nucleic acid is not greater than about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 105, about 110, about 115, about 120, or about 125 linked nucleotides.
[159] In some embodiments, a guide nucleic acid comprises at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 contiguous nucleotides that are complementary to a eukaryotic sequence. Such a eukaryotic sequence is a nucleotide sequence that is present in a host eukaryotic cell. Such a nucleotide sequence is distinguished from nucleotide sequences present in other host cells, such as prokaryotic cells, or viruses. Said sequences present in a eukaryotic cell can be located in a gene, an exon, an intron, a non-coding (e.g., promoter or enhancer) region, a selectable marker, tag, signal, and the like. In some embodiments, a target sequence is a eukaryotic sequence. In some embodiments, the guide nucleic acid comprises a nucleotide sequence that is capable of hybridizing to a target sequence in a target nucleic acid, wherein the target nucleic acid is any one of: a naturally occurring eukaryotic sequence, a naturally occurring prokaryotic sequence, a naturally occurring viral sequence, a naturally occurring bacterial sequence, a naturally occurring fungal sequence, an engineered eukaryotic sequence, an engineered prokaryotic sequence, an engineered viral sequence, an engineered bacterial sequence, an engineered fungal sequence, a fragment of a naturally occurring sequence, a fragment of an engineered sequence, and combinations thereof.
[160] In some embodiments, compositions, systems and methods described herein comprise a dual guide nucleic acid system (or simply, “dual guide system”) comprising a crRNA or a nucleotide sequence encoding the crRNA, and an intermediary RNA or a nucleotide sequence encoding the intermediary RNA, wherein the crRNA and the intermediary RNA are separate, unlinked molecules, wherein a repeat hybridization region of the intermediary RNA is capable of hybridizing with an equal length portion of the crRNA to form an intRNA-crRNA duplex, and wherein a spacer sequence of the crRNA is capable of hybridizing to a target sequence of the target nucleic acid. An intermediary RNA and/or intRNA-crRNA duplex may form a secondary structure that facilitates the binding of an effector protein to a target nucleic acid. In some embodiments, the crRNA is linked to the intermediary sequence to form a single guide nucleic acid (sgRNA).
crRNA
[161] In some embodiments, a guide nucleic acid comprises a crRNA. In some embodiments, the guide nucleic acid is the crRNA. In general, a crRNA comprises a first region (FR) and a second region (SR), wherein the FR of the crRNA comprises a repeat sequence, and the SR of the crRNA comprises a spacer sequence. In some embodiments, the repeat sequence and the spacer sequence are directly connected to each other (e.g., by a covalent bond such as a phosphodiester bond). In some embodiments, the repeat sequence and the spacer sequence are connected by a linker. In general, Type V Cas effector proteins function with a crRNA, wherein the FR is located 5’ of the SR. In general, Type II Cas effector proteins function with a crRNA, wherein the FR is located 3’ of the SR. The FR may be immediately 5’ of the SR (in the case of Type V). The FR may be immediately 3’ of the SR (in the case of Type II). The FR may be separated from the SR by one or more nucleotides. Non-limiting examples of repeat sequences are provided in TABLE 3.
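The orientation rule described above (repeat 5’ of the spacer for Type V, repeat 3’ of the spacer for Type II, optionally separated by a linker) can be sketched as a simple string assembly, with all sequences written 5’ to 3’. The function name and the placeholder repeat, spacer, and linker sequences below are hypothetical and are not sequences of TABLE 3.

```python
# Illustrative sketch only: the function name and all sequences are hypothetical.
def assemble_crrna(repeat: str, spacer: str, system_type: str, linker: str = "") -> str:
    """Join repeat (FR) and spacer (SR), written 5'->3', in the general orientation
    described for the given system type ("V" or "II")."""
    if system_type == "V":
        # Type V: FR (repeat) is located 5' of SR (spacer).
        return repeat + linker + spacer
    if system_type == "II":
        # Type II: FR (repeat) is located 3' of SR (spacer).
        return spacer + linker + repeat
    raise ValueError("system_type must be 'V' or 'II'")


repeat = "GUUUC"     # placeholder repeat sequence
spacer = "ACGUACGU"  # placeholder spacer sequence
print(assemble_crrna(repeat, spacer, "V"))        # GUUUCACGUACGU
print(assemble_crrna(repeat, spacer, "II"))       # ACGUACGUGUUUC
print(assemble_crrna(repeat, spacer, "V", "AA"))  # GUUUCAAACGUACGU
```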
[162] In some embodiments, the crRNA comprises an extension sequence. In some embodiments, a crRNA comprising an extension sequence is an extended crRNA. In some embodiments, systems comprise a Type V Cas effector protein and an extended crRNA, wherein the extended crRNA comprises an extension sequence at the 5’ end of the crRNA. In some embodiments, the extension sequence comprises a template sequence. In some embodiments, the extension sequence comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides. In some embodiments, the template sequence comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides. In some embodiments, the extension sequence is not greater than 200, 500 or 1000 nucleotides. In some embodiments, the extension sequence is not greater than 10 kb or not greater than 20 kb. In some embodiments, the template sequence cannot hybridize to the target sequence or the reverse complement thereof. In some embodiments, the template sequence is less than 100%, less than 99%, less than 98%, less than 95%, less than 90%, less than 80%, less than 70%, less than 60%, or less than 50% complementary to the target sequence or the reverse complement thereof.
[163] In some embodiments, the crRNA comprises a homology sequence. In some embodiments, the crRNA comprises a homology sequence, wherein the effector protein is a type II Cas protein. In some embodiments, a crRNA comprising a homology sequence does not comprise an extension sequence. In some embodiments, the homology sequence is located at the 5’ end of the crRNA. In some embodiments, the homology sequence is located at the 3’ end of the crRNA. In some embodiments, the homology sequence comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides. In some embodiments, the homology sequence is not greater than 100 nucleotides. In some embodiments, a crRNA comprising a homology sequence is part of a dual nucleic acid system having an intermediary RNA, wherein the intermediary RNA comprises an extension sequence. In such embodiments, the extension sequence of the intermediary RNA comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides. In some embodiments, the extension sequence of the intermediary RNA is not greater than 100 nucleotides. In some embodiments, the homology sequence has 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more mismatched nucleotides to the extension sequence.
[164] A crRNA may include deoxyribonucleosides, ribonucleosides, chemically modified nucleosides, or any combination thereof. In some embodiments, a crRNA comprises about: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 linked nucleotides. In some embodiments, a crRNA comprises at least: 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 linked nucleotides. In some embodiments, the length of the crRNA is about 20 to about 120 linked nucleotides. In some embodiments, the length of a crRNA is about 20 to about 100, about 30 to about 100, about 40 to about 100, about 40 to about 90, about 40 to about 80, about 40 to about 70, about 40 to about 60, about 40 to about 50, about 50 to about 90, about 50 to about 80, about 50 to about 70, or about 50 to about 60 linked nucleotides. In some embodiments, the length of a crRNA is about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70 or about 75 linked nucleotides. In some embodiments, the repeat sequence is between 5 and 10, 10 and 50, 12 and 48, 14 and 46, 16 and 44, or 18 and 42 nucleotides in length. In some embodiments, a spacer sequence comprises at least 5 to about 50 linked nucleotides. In some embodiments, a spacer sequence comprises at least 5 to about 50, at least 5 to about 25, at least about 10 to at least about 25, or at least about 15 to about 25 linked nucleotides. In some embodiments, the spacer sequence comprises 15-28 linked nucleotides. In some embodiments, a spacer sequence comprises 15-26, 15-24, 15-22, 15-20, 15-18, 16-28, 16-26, 16-24, 16-22, 16-20, 16-18, 17-26, 17-24, 17-22, 17-20, 17-18, 18-26, 18-24, or 18-22 linked nucleotides. In some embodiments, the spacer sequence comprises 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides.
[165] It is understood that a spacer sequence need not be 100% complementary to a target sequence of a target nucleic acid to hybridize or hybridize specifically to the target sequence. For example, the spacer sequence may comprise at least one alteration, such as a substituted or modified nucleotide, that is not complementary to the corresponding nucleotide of the target sequence. In some embodiments, a spacer sequence comprises at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, or at least 15 contiguous nucleotides that are complementary to a target sequence in a target nucleic acid. In some embodiments, the spacer sequence comprises at least 10 contiguous nucleotides that are complementary to the target sequence in the target nucleic acid. In some embodiments, the spacer sequence is at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% complementary to the target sequence.
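For illustration, percent complementarity and the longest run of contiguous complementary nucleotides between a spacer sequence and a target sequence might be evaluated as in the following Python sketch. The sketch rests on stated assumptions that are not taken from the present disclosure: both sequences are written 5’ to 3’ and have equal length, pairing is antiparallel, and only Watson-Crick pairs are counted (G-U wobble pairs and modified nucleotides are ignored). All names and sequences are hypothetical.

```python
# Illustrative sketch only: names, sequences, and the pairing rules are assumptions.
_PAIRS = {("A", "U"), ("U", "A"), ("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def pairing_profile(spacer: str, target: str) -> list:
    """For each spacer position (5'->3'), report whether it pairs with the
    antiparallel position of the target (both sequences written 5'->3')."""
    if not spacer or len(spacer) != len(target):
        raise ValueError("spacer and target must be non-empty and of equal length")
    target_3_to_5 = target.upper()[::-1]
    return [(s, t) in _PAIRS for s, t in zip(spacer.upper(), target_3_to_5)]


def percent_complementarity(spacer: str, target: str) -> float:
    """Percentage of spacer positions that are Watson-Crick paired."""
    profile = pairing_profile(spacer, target)
    return 100.0 * sum(profile) / len(profile)


def longest_complementary_run(spacer: str, target: str) -> int:
    """Length of the longest stretch of contiguous paired positions."""
    best = run = 0
    for paired in pairing_profile(spacer, target):
        run = run + 1 if paired else 0
        best = max(best, run)
    return best


target = "ATGCCGTTAG"           # hypothetical DNA target sequence
perfect = "CUAACGGCAU"          # RNA reverse complement of the target
one_mismatch = "CUAAAGGCAU"     # same spacer with one substituted nucleotide
print(percent_complementarity(perfect, target))         # 100.0
print(percent_complementarity(one_mismatch, target))    # 90.0
print(longest_complementary_run(one_mismatch, target))  # 5
```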
Intermediary RNA
[166] In general, an intermediary RNA comprises a protein binding sequence that can form a secondary structure (e.g., hairpin, stem-loop), wherein the secondary structure can be recognized and bound by an effector protein. In general, an intermediary RNA comprises a repeat hybridization sequence that hybridizes to at least a portion of a repeat sequence of a crRNA. Non-limiting examples of protein binding sequences are provided in TABLE 4.
[167] In some embodiments, a repeat hybridization sequence is at the 3’ end of an intermediary RNA. In general, Type V Cas proteins bind an intermediary RNA with a repeat hybridization sequence at the 3’ end of the intermediary RNA. In some embodiments, a repeat hybridization sequence is at the 5’ end of an intermediary RNA. In general, Type II Cas proteins bind an intermediary RNA with a repeat hybridization sequence at the 5’ end of the intermediary RNA. In some embodiments, the length of the repeat hybridization sequence is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, or 20 linked nucleotides. In some embodiments, the length of the repeat hybridization sequence is 1 to 10, 10-20, or 20-30 linked nucleotides.
[168] In some embodiments, the 3’ end of the intermediary RNA (in the case of Type V systems) or the 3’ end of the crRNA (in the case of Type II systems) is complementary or hybridizable to the 5’ end of a template sequence described herein. In some instances, at least 2, at least 3, at least 4, at least 5, or at least 10 nucleotides are complementary. By way of non-limiting example, production of an intermediary RNA for a Type V effector protein is driven by a U6 promoter which adds three uracils to the 3’ end of the intermediary RNA. In order to ensure that the 3’ end of the intermediary RNA hybridizes to the 5’ end of the template sequence, the template sequence of the crRNA may comprise three adenosines at its 5’ end.
[169] In some embodiments, the intermediary RNA comprises an extension sequence. In some embodiments, an intermediary RNA comprising an extension sequence is an extended intRNA. In some embodiments, systems comprise a Type II Cas effector protein and an extended intermediary RNA, wherein the extended intermediary RNA comprises an extension sequence at the 5’ end of the intermediary RNA. In some embodiments, the intermediary RNA does not comprise an extension sequence at the 3’ end of the intermediary RNA. In some embodiments, the extension sequence comprises a template sequence. In some embodiments, the extension sequence comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides. In some embodiments, the template sequence comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides. In some embodiments, the extension sequence is not greater than 200, 500, or 1000 nucleotides. In some embodiments, the extension sequence is not greater than 10 kb or not greater than 20 kb. In some embodiments, the template sequence cannot hybridize to the target sequence or the reverse complement thereof. In some embodiments, the template sequence is less than 100%, less than 99%, less than 98%, less than 95%, less than 90%, less than 80%, less than 70%, less than 60%, or less than 50% complementary to the target sequence or the reverse complement thereof.
[170] In some embodiments, the intermediary RNA does not comprise an extension sequence on the 3’ end. In some embodiments, the intermediary RNA consists essentially of a protein binding sequence, optionally a linker sequence, and a repeat hybridization sequence. In some embodiments, the intermediary RNA comprises an extension sequence at its 3’ end that is not complementary to a portion of a target sequence. In some embodiments, the portion of the target sequence comprises at least 2, 3, 4, 5, 6, 7, 8, 9 or 10 nucleotides.
[171] In some embodiments, the intermediary RNA comprises a homology sequence. In some embodiments, the intermediary RNA comprises a homology sequence, wherein the effector protein is a Type V Cas protein. In certain embodiments, an intermediary RNA comprising a homology sequence does not comprise an extension sequence. In some embodiments, the homology sequence is located at the 5’ end of the intermediary RNA. In some embodiments, the homology sequence is located at the 3’ end of the intermediary RNA. In some embodiments, the homology sequence comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides. In some embodiments, the homology sequence is not greater than 100 nucleotides. In some embodiments, an intermediary RNA comprising a homology sequence is part of a dual nucleic acid system wherein the crRNA comprises an extension sequence. In some embodiments, the extension sequence comprises at least 5, at least 10, at least 20, at least 30, at least 40, or at least 50 nucleotides. In some embodiments, the extension sequence is not greater than 100 nucleotides. In some embodiments, the homology sequence has 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more mismatched nucleotides to the extension sequence.
[172] In some embodiments, a length of the protein binding sequence is at least 30, 50, 70, 90, 110, 130, 150, 170, 190, or 210 linked nucleotides. In some embodiments, a length of the protein binding sequence is not greater than 30, 50, 70, 90, 110, 130, 150, 170, 190, or 210 linked nucleotides. In some embodiments, the length of the protein binding sequence is about 30 to about 210, about 60 to about 210, about 90 to about 210, about 120 to about 210, about 150 to about 210, about 180 to about 210, about 30 to about 180, about 60 to about 180, about 90 to about 180, about 120 to about 180, or about 150 to about 180 linked nucleotides.
[173] In some embodiments, guide nucleic acids comprise one or more linkers connecting different nucleotide sequences as described herein. A linker may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more nucleotides. A linker may be any suitable linker, examples of which are described herein.
[174] In some embodiments, a linker is a degradable linker or a cleavable linker. In some embodiments, a linker is a self-cleavable linker. Examples of self-cleavable polypeptide linkers include T2A
E. Protospacer Adjacent Motif (PAM) Sequences
[175] Effector proteins of the present disclosure may cleave or nick a target nucleic acid within or near a protospacer adjacent motif (PAM) sequence of the target nucleic acid. In some embodiments, the target nucleic acid is a double stranded nucleic acid comprising a target strand and a non-target strand. In some embodiments, cleavage occurs within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides of a 5’ or 3’ terminus of a PAM sequence. In some embodiments, effector proteins described herein recognize a PAM sequence. In some embodiments, recognizing a PAM sequence comprises interacting with a sequence adjacent to the PAM. In some embodiments, a target nucleic acid comprises a target sequence that is adjacent to a PAM sequence. In some embodiments, the effector protein does not require a PAM to bind and/or cleave a target nucleic acid. Non-limiting examples of PAMs are provided in TABLE 6.
[176] In some embodiments, a target nucleic acid is a double stranded nucleic acid comprising a target strand and a non-target strand. In some embodiments, the PAM sequence is located on the target strand. In some embodiments, the PAM sequence is located on the non-target strand. In some embodiments, the PAM sequence described herein is adjacent (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides) to the target sequence on the target strand or the non-target strand. In some embodiments, the PAM sequence is located 5’ of the target sequence on the non-target strand. In some embodiments, such a PAM described herein is directly adjacent to the target sequence on the target strand or the non-target strand. In some embodiments, an RNP cleaves the target strand or the non-target strand. In some embodiments, the RNP cleaves both the target strand and the non-target strand. In some embodiments, an RNP recognizes the PAM sequence, and hybridizes to a target sequence of the target nucleic acid. In some embodiments, the RNP cleaves the target nucleic acid, wherein the RNP has recognized the PAM sequence and is hybridized to the target sequence. In some embodiments, cleavage occurs within 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 nucleotides of a 5’ or 3’ terminus of a PAM sequence.
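As a minimal sketch of the embodiment in which the PAM sequence is located directly 5’ of the target sequence on the non-target strand, the following Python code scans one strand for a PAM immediately followed by a stretch of a chosen length. The PAM "TTTN", the strand sequence, the stretch length, and all names are hypothetical examples and are not drawn from the tables of the present disclosure; only a few IUPAC degenerate codes are handled.

```python
# Illustrative sketch only: the PAM, sequences, and names are hypothetical.
import re

# Partial IUPAC table (an assumption of this sketch; extend as needed).
_IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
          "N": "[ACGT]", "R": "[AG]", "Y": "[CT]", "V": "[ACG]"}


def find_sites_with_5prime_pam(non_target_strand: str, pam: str, length: int) -> list:
    """Return (PAM start index, PAM, adjacent sequence) for every PAM that is
    directly followed (3' side) by `length` nucleotides on the given strand."""
    pam_pattern = "".join(_IUPAC[code] for code in pam.upper())
    # A lookahead is used so that overlapping candidate sites are all reported.
    regex = re.compile(rf"(?=({pam_pattern})([ACGT]{{{length}}}))")
    strand = non_target_strand.upper()
    return [(m.start(), m.group(1), m.group(2)) for m in regex.finditer(strand)]


strand = "GGTTTACCGTAGGTTTCAAGTC"  # hypothetical non-target strand, 5'->3'
for start, pam, adjacent in find_sites_with_5prime_pam(strand, "TTTN", 5):
    print(start, pam, adjacent)
# 2 TTTA CCGTA
# 13 TTTC AAGTC
```

The sequences reported are those on the scanned strand; the strand to which a spacer sequence hybridizes is the complementary target strand.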
F. Target Nucleic Acids
[177] Disclosed herein are compositions, systems and methods for modifying a target nucleic acid. In general, the target nucleic acid is a double stranded DNA molecule. Non-limiting examples of target nucleic acids are provided in TABLE 6.
[178] In some embodiments, target nucleic acids described herein comprise a mutation. In some embodiments, a composition, system or method described herein can be used to edit a target nucleic acid comprising a mutation such that the mutation is edited to be the wild-type nucleotide or nucleotide sequence. In some embodiments, a composition, system or method described herein can be used to detect a target nucleic acid comprising a mutation. A mutation may result in the insertion of at least one amino acid in a protein encoded by the target nucleic acid. A mutation may result in the deletion of at least one amino acid in a protein encoded by the target nucleic acid. A mutation may result in the substitution of at least one amino acid in a protein encoded by the target nucleic acid. A mutation that results in the deletion, insertion, or substitution of one or more amino acids of a protein encoded by the target nucleic acid may result in misfolding of a protein encoded by the target nucleic acid. A mutation may result in a premature stop codon, thereby resulting in a truncation of the encoded protein.
[179] Non-limiting examples of mutations are insertion-deletion (indel), a point mutation, single nucleotide polymorphism (SNP), a chromosomal mutation, a copy number mutation or variation, and frameshift mutations. In some embodiments, an indel mutation is an insertion or deletion of one or more nucleotides. In some embodiments, a point mutation comprises a substitution, insertion, or deletion. In some embodiments, a frameshift mutation occurs when the number of nucleotides in the insertion/deletion is not divisible by three, and it occurs in a protein coding region. In some embodiments, a chromosomal mutation can comprise an inversion, a deletion, a duplication, or a translocation of one or more nucleotides. In some embodiments, a copy number variation can comprise a gene amplification or an expanding trinucleotide repeat. In some embodiments, an SNP is associated with a phenotype of the sample or a phenotype of the organism from which the sample was taken. In some embodiments, an SNP is associated with altered phenotype from wild type phenotype. In some embodiments, the SNP is a synonymous substitution or a nonsynonymous substitution. In some embodiments, the nonsynonymous substitution is a missense substitution or a nonsense point mutation. In some embodiments, the synonymous substitution is a silent substitution.
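The frameshift criterion stated above (an insertion or deletion in a protein coding region whose length is not divisible by three) can be expressed as a short Python check. The function name and the convention of passing a signed net length (positive for insertions, negative for deletions) are assumptions made for this sketch.

```python
# Illustrative sketch only: the function name and sign convention are assumptions.
def is_frameshift(net_indel_length: int, in_coding_region: bool = True) -> bool:
    """True if an indel of the given net length (inserted minus deleted
    nucleotides) shifts the reading frame of a protein coding region."""
    return in_coding_region and abs(net_indel_length) % 3 != 0


print(is_frameshift(1))                           # True: 1-nt insertion
print(is_frameshift(-2))                          # True: 2-nt deletion
print(is_frameshift(3))                           # False: one whole codon inserted
print(is_frameshift(4, in_coding_region=False))   # False: outside a coding region
```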
G. Nucleic Acid Expression Vectors
[180] Compositions, systems, and methods described herein comprise a nucleic acid expression vector or a use thereof. In some embodiments, the nucleic acid of interest comprises a nucleotide sequence that encodes one or more components of the composition or system described herein. In some embodiments, a vector may be part of a vector system. The vector system may comprise a library of vectors each encoding one or more components of a composition or system described herein. In some embodiments, components described herein (e.g., an effector protein, a guide nucleic acid, a reverse transcriptase, an intermediary RNA, an extended crRNA, or a combination thereof) are encoded by the same vector. In some embodiments, components described herein (e.g., an effector protein, a guide nucleic acid, a reverse transcriptase, an intermediary RNA, an extended crRNA, or a combination thereof) are each encoded by different vectors of the system.
[181] In some embodiments, a vector may comprise or encode one or more regulatory elements. Regulatory elements may refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate transcription of a non-coding sequence or a coding sequence and/or regulate translation of an encoded polypeptide. In some embodiments, a vector may comprise or encode for one or more additional elements, such as, for example, replication origins, antibiotic resistance (or a nucleic acid encoding the same), a tag (or a nucleic acid encoding the same), selectable markers, and the like. In some embodiments, a vector comprises or encodes for one or more elements, such as, for example, ribosome binding sites, and RNA splice sites.
[182] Vectors described herein generally encode a promoter - a regulatory region on a nucleic acid, such as a DNA sequence, capable of initiating transcription of a downstream (3' direction) coding or non-coding sequence. A promoter can be linked at its 3' terminus to a nucleic acid, the expression or transcription of which is desired, and extends upstream (5' direction) to include bases or elements necessary to initiate transcription or induce expression, which could be measured at a detectable level. A promoter can comprise a nucleotide sequence, referred to herein as a “promoter sequence”. The promoter sequence can include a transcription initiation site, and one or more protein binding domains responsible for the binding of transcription machinery, such as RNA polymerase. When eukaryotic promoters are used, such promoters can contain “TATA” boxes and “CAT” boxes. Various promoters, including inducible promoters, may be used to drive expression, i.e., transcriptional activation, of the nucleic acid of interest. Accordingly, in some embodiments, the nucleic acid of interest can be operably linked to a promoter.
[183] Promoters may be any suitable type of promoter envisioned for the compositions, systems, and methods described herein. Examples include constitutively active promoters (e.g., CMV promoter), inducible promoters (e.g., heat shock promoter, tetracycline-regulated promoter, steroid-regulated promoter, metal-regulated promoter, estrogen receptor-regulated promoter, etc.), spatially restricted and/or temporally restricted promoters (e.g., a tissue specific promoter, a cell type specific promoter, etc.), etc. Suitable promoters include, but are not limited to: SV40 early promoter, mouse mammary tumor virus long terminal repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter, a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE), a Rous sarcoma virus (RSV) promoter, a human U6 small nuclear promoter (U6), an enhanced U6 promoter, and a human H1 promoter (H1). By transcriptional activation, it is intended that transcription will be increased above basal levels in the target cell by 2 fold, 5 fold, 10 fold, 50 fold, by 100 fold, 500 fold, or by 1000 fold, or more. In addition, vectors used for providing a nucleic acid that, when transcribed, produces a guide nucleic acid and/or a nucleic acid that encodes an effector protein to a cell may include nucleic acid sequences that encode for selectable markers in the target cells, so as to identify cells that have taken up the guide nucleic acid and/or the effector protein.
[184] In general, vectors provided herein comprise at least one promoter or a combination of promoters driving expression or transcription of one or more genome editing tools described herein. In some embodiments, the vector comprises a nucleotide sequence of a promoter. In some embodiments, the vector comprises two promoters. In some embodiments, the vector comprises three promoters. In some embodiments, a length of the promoter is less than about 500, less than about 400, less than about 300, or less than about 200 linked nucleotides. In some embodiments, a length of the promoter is at least 100, at least 200, at least 300, at least 400, or at least 500 linked nucleotides. Non-limiting examples of promoters include CMV, 7SK, EF1a, RPBSA, hPGK, EFS, SV40, PGK1, Ubc, human beta actin, TRE, UAS, Ac5, Polyhedrin, CaMKIIa, GAL1-10, H1, TEF1, GDS, ADH1, CaMV35S, HSV TK, Ubi, U6, MNDU3, MSCV, MND, and CAG.
[185] In some embodiments, the promoter is a constitutive promoter. In some embodiments, the promoter is an inducible promoter. In some embodiments, the inducible promoter only drives expression of its corresponding coding sequence (e.g., polypeptide or guide nucleic acid) when a signal is present, e.g., a hormone, a small molecule, a peptide. Non-limiting examples of inducible promoters are the T7 RNA polymerase promoter, the T3 RNA polymerase promoter, the Isopropyl-beta-D-thiogalactopyranoside (IPTG)-regulated promoter, a lactose induced promoter, a heat shock promoter, a tetracycline-regulated promoter (tetracycline-inducible or tetracycline-repressible), a steroid regulated promoter, a metal-regulated promoter, and an estrogen receptor-regulated promoter.
[186] In some embodiments, the promoters are prokaryotic promoters (e.g., drive expression of a gene in a prokaryotic cell). In some embodiments, the promoters are eukaryotic promoters (e.g., drive expression of a gene in a eukaryotic cell). In some embodiments, the promoter is EF1a. In some embodiments, the promoter is ubiquitin. In some embodiments, the vector is a bicistronic or polycistronic vector (e.g., having or involving two or more loci responsible for generating a protein) having an internal ribosome entry site (IRES) for translation initiation in a cap-independent manner.
[187] In some embodiments, a vector described herein is a nucleic acid expression vector. In some embodiments, a vector described herein is a recombinant expression vector. In some embodiments, a vector described herein is a messenger RNA. In some embodiments, a vector comprises the recombinant nucleic acid as described herein, wherein the vector is a viral vector, an adeno-associated viral (AAV) vector, a retroviral vector, or a lentiviral vector. In some embodiments, a vector described herein or a recombinant nucleic acid described herein is comprised in a cell. In some embodiments, a recombinant nucleic acid is integrated into a genomic DNA sequence of the cell, wherein the cell is a eukaryotic cell or a prokaryotic cell.
[188] In some embodiments, a vector described herein is a delivery vector. In some embodiments, the delivery vector is a eukaryotic vector, a prokaryotic vector (e.g., a bacterial vector), a viral vector, or any combination thereof. In some embodiments, the delivery vector is a non-viral vector. In some embodiments, the delivery vector is a plasmid. In some embodiments, the plasmid comprises DNA. In some embodiments, the plasmid comprises RNA. In some embodiments, the plasmid comprises circular double-stranded DNA. In some embodiments, the plasmid is linear. In some embodiments, the plasmid comprises one or more coding sequences of interest and one or more regulatory elements. In some embodiments, the plasmid comprises a bacterial backbone containing an origin of replication and an antibiotic resistance gene or other selectable marker for plasmid amplification in bacteria. In some embodiments, the plasmid is a minicircle plasmid. In some embodiments, the plasmid contains one or more genes that provide a selective marker to induce a target cell to retain the plasmid. In some examples, the plasmids are engineered through synthetic or other suitable means known in the art. For example, in some embodiments, the genetic elements are assembled by restriction digest of the desired genetic sequence from a donor plasmid or organism to produce DNA ends that are then readily ligated to another genetic sequence.
[189] In some embodiments, vectors comprise an enhancer. Enhancers are nucleotide sequences that have the effect of enhancing promoter activity. In some embodiments, enhancers augment transcription regardless of the orientation of their sequence. In some embodiments, enhancers activate transcription from a distance of several kilobase pairs. Furthermore, enhancers are optionally located upstream or downstream of a gene region to be transcribed, and/or located within the gene, to activate the transcription. Exemplary enhancers include, but are not limited to, WPRE; CMV enhancers; and the R-U5' segment in LTR of HTLV-I.
[190] In some embodiments, a vector described herein comprises a viral vector. In some embodiments, the viral vector comprises a nucleic acid to be delivered into a host cell by a recombinantly produced virus or viral particle. The nucleic acid may be single-stranded or double-stranded, linear or circular, segmented or non-segmented. The nucleic acid may comprise DNA, RNA, or a combination thereof. In some embodiments, the vector is an adeno-associated viral vector. There are a variety of viral vectors that are associated with various types of viruses, including but not limited to retroviruses (e.g., lentiviruses and γ-retroviruses), adenoviruses, arenaviruses, alphaviruses, adeno-associated viruses (AAVs), baculoviruses, vaccinia viruses, herpes simplex viruses and poxviruses. In some embodiments, the vector is an adeno-associated viral (AAV) vector. In some embodiments, the viral vector is a recombinant viral vector. In some embodiments, the vector is a retroviral vector. In some embodiments, the retroviral vector is a lentiviral vector. In some embodiments, the retroviral vector comprises a gamma-retroviral vector. A viral vector provided herein may be derived from or based on any such virus. For example, in some embodiments, the gamma-retroviral vector is derived from a Moloney Murine Leukemia Virus (MoMLV, MMLV, MuLV, or MLV) or a Murine Stem cell Virus (MSCV) genome. In some embodiments, the lentiviral vector is derived from the human immunodeficiency virus (HIV) genome. In some embodiments, the viral vector is a chimeric viral vector. In some embodiments, the chimeric viral vector comprises viral portions from two or more viruses. In some embodiments, the viral vector corresponds to a virus of a specific serotype.
[191] In some embodiments, a viral vector is an adeno-associated viral vector (AAV vector). In some embodiments, a viral particle that delivers a viral vector described herein is an AAV. In some embodiments, the AAV comprises any AAV known in the art. In some embodiments, the viral vector corresponds to a virus of a specific AAV serotype. In some embodiments, the AAV serotype is selected from an AAV1 serotype, an AAV2 serotype, an AAV3 serotype, an AAV4 serotype, an AAV5 serotype, an AAV6 serotype, an AAV7 serotype, an AAV8 serotype, an AAV9 serotype, an AAV10 serotype, an AAV11 serotype, an AAV12 serotype, an AAV-rh10 serotype, and any combination, derivative, or variant thereof. In some embodiments, the AAV vector is a recombinant vector, a hybrid AAV vector, a chimeric AAV vector, a self-complementary AAV (scAAV) vector, a single-stranded AAV, or any combination thereof. scAAV genomes are generally known in the art and contain both DNA strands, which can anneal together to form double-stranded DNA.
[192] In some embodiments, an AAV vector described herein is a chimeric AAV vector. In some embodiments, the chimeric AAV vector comprises an exogenous amino acid or an amino acid substitution, or capsid proteins from two or more serotypes. In some examples, a chimeric AAV vector may be genetically engineered to increase transduction efficiency, selectivity, or a combination thereof.
[193] In some embodiments, an AAV vector described herein comprises two inverted terminal repeats (ITRs). Accordingly, in some embodiments, the viral vector provided herein comprises two inverted terminal repeats of AAV. A nucleotide sequence between the ITRs of an AAV vector provided herein comprises a sequence encoding genome editing tools. In some embodiments, the genome editing tools comprise a nucleic acid encoding one or more effector proteins, a nucleic acid encoding one or more fusion proteins (e.g., a nuclear localization signal (NLS), polyA tail), one or more guide nucleic acids, a nucleic acid encoding the one or more guide nucleic acids, respective promoter(s), one or more donor nucleic acids, or any combinations thereof. In some embodiments, viral vectors provided herein comprise at least one promoter or a combination of promoters driving expression or transcription of one or more genome editing tools described herein. In some embodiments, a coding region of the AAV vector forms an intramolecular double-stranded DNA template, thereby generating an AAV vector that is a self-complementary AAV (scAAV) vector. In some embodiments, the scAAV vector comprises the sequence encoding genome editing tools that has a length of about 2 kb to about 3 kb. In some embodiments, the AAV vector provided herein is a self-inactivating AAV vector. In some embodiments, the AAV vector provided herein comprises a modification, such as an insertion, deletion, chemical alteration, or synthetic modification, relative to a wild-type AAV vector.
[194] In some embodiments, methods of producing AAV delivery vectors herein comprise packaging a nucleic acid encoding an effector protein and a guide nucleic acid, or a combination thereof, into an AAV vector. In some embodiments, methods of producing the delivery vector comprise: (a) contacting a cell with at least one nucleic acid encoding: (i) a guide nucleic acid; (ii) a Replication (Rep) gene; and (iii) a Capsid (Cap) gene that encodes an AAV capsid protein; (b) expressing the AAV capsid protein in the cell; (c) assembling an AAV particle; and (d) packaging an effector encoding nucleic acid into the AAV particle, thereby generating an AAV delivery vector. In some embodiments, promoters, stuffer sequences, and any combination thereof may be packaged in the AAV vector. In some examples, the AAV vector may package 1, 2, 3, 4, or 5 guide nucleic acids or copies thereof. In some embodiments, the AAV vector comprises inverted terminal repeats, e.g., a 5’ inverted terminal repeat and a 3’ inverted terminal repeat. In some embodiments, the AAV vector comprises a mutated inverted terminal repeat that lacks a terminal resolution site.
[195] In some embodiments, a hybrid AAV vector is produced by transcapsidation, e.g., packaging an inverted terminal repeat (ITR) from a first serotype into a capsid of a second serotype, wherein the first and second serotypes may not be the same. In some examples, the Rep gene and ITR from a first AAV serotype (e.g., AAV2) may be used in a capsid from a second AAV serotype (e.g., AAV9), wherein the first and second AAV serotypes may not be the same. As a non-limiting example, a hybrid AAV serotype comprising the AAV2 ITRs and AAV9 capsid protein may be indicated AAV2/9. In some examples, the hybrid AAV delivery vector comprises an AAV2/1, AAV2/2, AAV2/4, AAV2/5, AAV2/8, or AAV2/9 vector.
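As a non-limiting illustration of the hybrid naming convention described above (ITR serotype before the slash, capsid serotype after it), the following Python sketch splits a designation such as AAV2/9 into its two parts. The names `HybridAAV`, `parse_hybrid_aav`, and `is_transcapsidated` are hypothetical helpers introduced only for this sketch and are not part of the disclosure.

```python
import re
from typing import NamedTuple


class HybridAAV(NamedTuple):
    itr_serotype: str     # serotype supplying the ITRs (e.g., AAV2)
    capsid_serotype: str  # serotype supplying the capsid (e.g., AAV9)


def parse_hybrid_aav(name: str) -> HybridAAV:
    """Split a designation such as 'AAV2/9' into ITR and capsid serotypes."""
    match = re.fullmatch(r"AAV\s*(\w+)/(\w+)", name.strip())
    if match is None:
        raise ValueError(f"not a hybrid AAV designation: {name!r}")
    return HybridAAV(f"AAV{match.group(1)}", f"AAV{match.group(2)}")


def is_transcapsidated(vector: HybridAAV) -> bool:
    """True when the ITRs and the capsid come from different serotypes."""
    return vector.itr_serotype != vector.capsid_serotype


if __name__ == "__main__":
    for designation in ("AAV2/1", "AAV2/2", "AAV2/9"):
        vector = parse_hybrid_aav(designation)
        print(designation, vector, is_transcapsidated(vector))
```

Under this reading, AAV2/2 parses to the same serotype on both sides, so the helper reports it as not transcapsidated, while AAV2/9 is reported as a true hybrid.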
[196] In some embodiments, AAV particles described herein are recombinant AAV (rAAV). In some embodiments, rAAV particles are generated by transfecting AAV producing cells with an AAV-containing plasmid carrying the sequence encoding the genome editing tools, a plasmid that carries viral encoding regions, i.e., Rep and Cap gene regions; and a plasmid that provides the helper genes such as E1A, E1B, E2A, E4ORF6 and VA. In some embodiments, the AAV producing cells are mammalian cells. In some embodiments, host cells for rAAV viral particle production are mammalian cells. In some embodiments, a mammalian cell for rAAV viral particle production is a COS cell, a HEK293T cell, a HeLa cell, a KB cell, a variant thereof, or a combination thereof. In some embodiments, rAAV virus particles can be produced in the mammalian cell culture system by providing the rAAV plasmid to the mammalian cell. In some embodiments, producing rAAV virus particles in a mammalian cell comprises transfecting vectors that
express the rep protein, the capsid protein, and the gene-of-interest expression construct flanked by the ITR sequence on the 5’ and 3’ ends. Methods of such processes are provided in, for example, Naso et al., BioDrugs, 2017 Aug;31(4):317-334 and Benskey et al., (2019), Methods Mol Biol., 1937:3-26, each of which is incorporated by reference in its entirety.
[197] In some embodiments, rAAV is produced in a non-mammalian cell. In some embodiments, rAAV is produced in an insect cell. In some embodiments, the insect cell for producing rAAV viral particles comprises an Sf9 cell. In some embodiments, production of rAAV virus particles in insect cells may comprise baculovirus. In some embodiments, production of rAAV virus particles in insect cells may comprise infecting the insect cells with three recombinant baculoviruses, one carrying the cap gene, one carrying the rep gene, and one carrying the gene-of-interest expression construct enclosed by an ITR on both the 5’ and 3’ end. In some embodiments, rAAV virus particles are produced by the One Bac system. In some embodiments, rAAV virus particles can be produced by the Two Bac system. In some embodiments, in the Two Bac system, the rep gene and the cap gene of the AAV are integrated into one baculovirus genome, and the ITR sequence and the gene-of-interest expression construct are integrated into another baculovirus genome. In some embodiments, in the One Bac system, an insect cell line that expresses both the rep protein and the capsid protein is established and infected with a baculovirus integrated with the ITR sequence and the gene-of-interest expression construct. Details of such processes are provided in, for example, Smith et al., (1983), Mol. Cell. Biol., 3(12):2156-65; Urabe et al., (2002), Hum. Gene Ther., 1;13(16):1935-43; and Benskey et al., (2019), Methods Mol Biol., 1937:3-26, each of which is incorporated by reference in its entirety.
H. Lipid Particles and Non-Viral Vectors
[198] In some embodiments, compositions and systems provided herein comprise a lipid particle. In some embodiments, a lipid particle is a lipid nanoparticle (LNP). In some embodiments, a lipid or a lipid nanoparticle can encapsulate an expression vector as described herein. LNPs are a non-viral delivery system for delivery of the composition and/or system components described herein. LNPs are particularly effective for delivery of nucleic acids. Beneficial properties of LNPs include ease of manufacture, low cytotoxicity and immunogenicity, high efficiency of nucleic acid encapsulation and cell transfection, multi-dosing capabilities and flexibility of design (Kulkarni et al., (2018) Nucleic Acid Therapeutics, 28(3):146-157). In some embodiments, compositions and methods comprise a lipid, polymer, nanoparticle, or a combination thereof, or use thereof, to introduce one or more effector proteins, one or more guide nucleic acids, one or more donor nucleic acids, or any combinations thereof to a cell. Non-limiting examples of lipids and polymers are cationic polymers, cationic lipids, ionizable lipids, or bio-responsive polymers. In some embodiments, the ionizable lipids exploit chemical-physical properties of the endosomal environment (e.g., pH), offering improved delivery of nucleic acids. In some embodiments, the ionizable lipids are neutral at physiological pH. In some embodiments, the ionizable lipids are protonated under acidic pH. In some embodiments, the bio-responsive polymer exploits chemical-physical properties of the endosomal environment (e.g., pH) to preferentially release the genetic material in the intracellular space.
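The pH dependence described above (largely neutral at physiological pH, protonated under acidic pH) can be illustrated with the Henderson-Hasselbalch relationship for a single ionizable amine. The sketch below is an illustration only: the pKa of 6.5 and the endosomal pH of 5.5 are assumed values chosen for the example, not values taken from this disclosure, and real lipids in a particle may behave differently from an isolated group.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Fraction of an ionizable amine that is protonated (positively charged).

    Henderson-Hasselbalch for a base: [BH+] / ([B] + [BH+]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumed, illustrative values (not from the disclosure).
HYPOTHETICAL_PKA = 6.5
CONDITIONS = {
    "physiological pH": 7.4,
    "acidic endosome (assumed pH)": 5.5,
}

if __name__ == "__main__":
    for label, ph in CONDITIONS.items():
        fraction = protonated_fraction(ph, HYPOTHETICAL_PKA)
        print(f"{label} ({ph}): {fraction:.2f} protonated")
```

With these assumed numbers, the protonated fraction is small at pH 7.4 and close to one at pH 5.5, which is the qualitative behavior the paragraph attributes to ionizable lipids.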
[199] In some embodiments, a LNP comprises an outer shell and an inner core. In some embodiments, the outer shell comprises lipids. In some embodiments, the lipids comprise modified lipids. In some embodiments, the modified lipids comprise pegylated lipids. In some embodiments, the lipids comprise one or more of cationic lipids, anionic lipids, ionizable lipids, and non-ionic lipids. In some embodiments, the LNP comprises one or more of N1,N3,N5-tris(3-(didodecylamino)propyl)benzene-1,3,5-tricarboxamide (TT3), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoethanolamine (POPE), 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), cholesterol (Chol), 1,2-dimyristoyl-sn-glycerol, and methoxypolyethylene glycol (DMG-PEG2000), derivatives, analogs, or variants thereof. In some embodiments, the LNP has a negative net overall charge prior to complexation with one or more of a guide nucleic acid, a nucleic acid encoding the one or more guide nucleic acid, a nucleic acid encoding the effector protein, and/or a donor nucleic acid. In some embodiments, the inner core is a hydrophobic core. In some embodiments, the one or more of a guide nucleic acid, the nucleic acid encoding the one or more guide nucleic acid, the nucleic acid encoding the effector protein, and/or the donor nucleic acid forms a complex with one or more of the cationic lipids and the ionizable lipids. In some embodiments, the nucleic acid encoding the effector protein or the nucleic acid encoding the guide nucleic acid is self-replicating.
[200] In some embodiments, a LNP comprises one or more of cationic lipids, ionizable lipids, and modified versions thereof. In some embodiments, the ionizable lipid comprises TT3 or a derivative thereof. Accordingly, in some embodiments, the LNP comprises one or more of TT3 and pegylated TT3. The publication WO2016187531 is hereby incorporated by reference in its entirety, which describes representative LNP formulations in Table 2 and Table 3, and representative methods of delivering LNP formulations in Example 7.
[201] In some embodiments, a LNP comprises a lipid composition targeting a specific organ. In some embodiments, the lipid composition comprises lipids having a specific alkyl chain length that controls accumulation of the LNP in the specific organ (e.g., liver or spleen). In some embodiments, the lipid composition comprises a biomimetic lipid that controls accumulation of the LNP in the specific organ (e.g., brain). In some embodiments, the lipid composition comprises lipid derivatives (e.g., cholesterol derivatives) that control accumulation of the LNP in a specific cell (e.g., liver endothelial cells, Kupffer cells, hepatocytes).
[202] In some embodiments, administration of a non-viral vector comprises contacting a cell, such as a host cell, with the non-viral vector. In some embodiments, a physical method or a chemical method is employed for delivering the vector into the cell. Exemplary physical methods include electroporation, gene gun, sonoporation, magnetofection, or hydrodynamic delivery. Exemplary chemical methods include delivery of the recombinant polynucleotide by liposomes, such as cationic lipids or neutral lipids; lipofection; dendrimers; lipid nanoparticle (LNP); or cell-penetrating peptides.
II. Methods of Modifying a Nucleic Acid
[203] Provided herein are methods for modifying (e.g., editing) target nucleic acids. In general, methods comprise cleaving a target nucleic acid and inserting a donor nucleic acid at the cut site of the target nucleic acid. Methods of modifying may comprise contacting a target nucleic acid with one or more components of the systems described herein. Methods may comprise contacting a cell that comprises a target nucleic acid with one or more components of the systems described herein. Methods may comprise delivering to a subject one or more components of the systems described herein.
[204] In some embodiments, a cleaved target nucleic acid is repaired by homologous recombination (e.g., homology directed repair (HDR)) or non-homologous end joining (NHEJ). In some embodiments, a double-stranded break in the target nucleic acid may be repaired (e.g., by NHEJ or HDR) after insertion of a donor nucleic acid. In some embodiments, a nucleotide insertion and/or deletion, sometimes referred to as an indel, occurs at a cleavage site. An indel may vary in length (e.g., 1 to 1,000 nucleotides in length) and be detected using methods well known in the art, including sequencing. If the number of nucleotides in the insertion/deletion is not divisible by three, and it occurs in a protein coding region, it is also a frameshift mutation. Indel percentage is the percentage of sequencing reads that show at least one nucleotide mutation resulting from the insertion and/or deletion of nucleotides, regardless of the size of the insertion or deletion or the number of nucleotides mutated. For example, if there is at least one nucleotide deletion detected in a given target nucleic acid, it counts towards the percent indel value. As another example, if one copy of the target nucleic acid has one nucleotide deleted, and another copy of the target nucleic acid has 10 nucleotides deleted, they are counted the same. This number reflects the percentage of target nucleic acids that are edited by a given effector protein.
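As a non-limiting illustration of the indel percentage and frameshift definitions above, the following Python sketch counts every read carrying an indel equally, whatever its size, and flags an indel whose length is not divisible by three. It assumes a simplified input in which each sequencing read has already been reduced to one signed indel size (0 for no indel, positive for an insertion, negative for a deletion); that representation, the function names, and the example reads are assumptions made for this sketch.

```python
from typing import Sequence


def indel_percentage(indel_sizes: Sequence[int]) -> float:
    """Percentage of reads that carry any insertion or deletion.

    Each entry describes one sequencing read: 0 means no indel, a positive
    value is the number of inserted nucleotides, a negative value is the
    number of deleted nucleotides. Size is ignored; a 1-nt and a 10-nt
    deletion each count as one edited read.
    """
    if not indel_sizes:
        raise ValueError("no reads supplied")
    edited_reads = sum(1 for size in indel_sizes if size != 0)
    return 100.0 * edited_reads / len(indel_sizes)


def is_frameshift(indel_size: int) -> bool:
    """True when an indel in a coding region would shift the reading frame."""
    return indel_size != 0 and abs(indel_size) % 3 != 0


if __name__ == "__main__":
    # Hypothetical reads: four unedited, deletions of 1 and 10 nt,
    # insertions of 3 and 2 nt.
    reads = [0, -1, -10, 0, 3, 0, 0, 2]
    print(f"indel percentage: {indel_percentage(reads):.1f}%")
    print("frameshift indels:", [s for s in reads if is_frameshift(s)])
```

In this hypothetical set, four of eight reads carry an indel, so the indel percentage is 50%; the 3-nucleotide insertion counts toward that percentage but is not a frameshift.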
SEQUENCES AND TABLES
[205] The following tables are referenced herein and throughout. The information in the following tables is exemplary and is not intended to limit the scope of the claims.
[206] TABLE 1 provides illustrative amino acid sequences of effector proteins that are useful in the compositions, systems and methods described herein.
[207] TABLE 1.1 provides illustrative reverse transcriptases that are useful in the compositions, systems and methods described herein.
TABLE 1.1. Exemplary Reverse Transcriptases
[208] TABLE 2 provides illustrative sequences of exemplary heterologous polypeptides useful in the compositions, systems and methods described herein.
[209] TABLE 3 provides illustrative repeat sequences for use in guide nucleic acids that are useful in the compositions, systems and methods described herein.
[210] TABLE 4 provides illustrative protein binding sequences for use in guide nucleic acids that are useful in the compositions, systems and methods described herein.
TABLE 4. Exemplary Protein Binding Sequences for use in Guide Nucleic Acids
[211] TABLE 5 provides illustrative PAM sequences that are useful in the compositions, systems and methods described herein.
[212] TABLE 6 provides illustrative target nucleic acids that are useful in the compositions, systems and methods described herein.
TABLE 6. Exemplary Target Nucleic Acids
ANGPTL3, ANGPTL4, APC, Apo(a), APOCIII (APOC3), APOEe4, APOL1, APP, AQP2, AR, ARFRP1, ARG1, ARH, ARL13B, ARL6, ARSA, ARSB, ASL, ASNS, ASPA, ASS1, ATM, ATP6V1B1, ATP7A, ATP7B, ATRX, ATXN1, ATXN10, ATXN2, ATXN3, ATXN7, ATXN8OS, AXIN1, AXIN2, B2M, BACE-1, BAK1, BAP1, BARD1, BAX2, BBS1, BBS10, BBS12, BBS2, BCKDHA, BCKDHB, BCL2L2, BCS1L, BEST1, BLM, BMPR1A, BRAF, BRAFV600E, BRCA1, BRCA2, BRIP1, BSND, C9ORF72, CA4, CACNA1A, CAH1, CAPN3, CASR, CBS, CCNB1, CC2D2A, CCR5, CD1, CD2, CD3, CD3D, CD3Z, CD4, CD5, CD6, CD7, CD8A, CD8B, CD9, CD14, CD18, CD19, CD21, CD22, CD23, CD27, CD28, CD30, CD33, CD34, CD36, CD38, CD40, CD40L, CD44, CD46, CD47, CD48, CD52, CD55, CD57, CD58, CD59, CD68, CD69, CD72, CD73, CD74, CD79A, CD80, CD81, CD83, CD84, CD86, CD90, CD93, CD96, CD99, CD100, CD123, CD160, CD163, CD164, CD164L2, CD166, CD200, CD204, CD207, CD209, CD226, CD244, CD247, CD274, CD276, CD300, CD320, CDC73, CDH1, CDH23, CDK11, CDK4, CDKN1A, CDKN1B, CDKN1C, CDKN2A, CDKN2B, CEBPA, CELA3B, CEP290, CERKL, CEB, CFTR, CHCHD10, CHEK2, CHM, CHRNE, CIDEB, CIITA, CLN3, CLN5, CLN6, CLN8, CLRN1, CLTA, CMT1A, CNBP, CNGB1, CNGB3, COL1A1, COL1A2, COL27A1, COL4A3, COL4A4, COL4A5, COL6A1, COL6A2, COL6A3, COL7A1, CPS1, CPT1A, CPT2, CRB1, CREBBP, CRX, CRYAA, CTNNA1, CTNNB1, CTNND2, CTNS, CTSK, CXCL12, CYBA, CYBB, CYP11B1, CYP11B2, CYP17A1, CYP19A1, CYP21A2, CYP27A1, DBT, DCC, DCLRE1C, DERL2,
DFNA36, DFNB31, DGAT2, DHCR7, DHDDS, DICER1, DIS3L2, DLD, DMD, DMPK, DNAH5, DNAI1, DNAI2, DNM2, DNMT1, DPC4, DUX4, DYSF, EDA, EDN3, EDNRB, EGFR, EIF2B5, EMC2, EMC3, EMD, EMX1, EN1, EPCAM, ERCC6, ERCC8, ESCO2, ETFA, ETFDH, ETHE1, EVC, EVC2, EYS, F5, F9, FXI, FAH, FAM161A, FANCA, FANCB, FANCC, FANCD1, FANCD2, FANCE, FANCF, FANCG, FANCI, FANCJ, FANCL, FANCM, FANCN, FANCP, FANCS, FBN1, FGF14, FGFR2, FGFR3, FGA, FGB, FGG, FH, FHL1, FIX, FKRP, FKTN, FLCN, FMR1, FOXP3, FSCN2, FSHD1, FUS, FUT8, FVIII, FXII, FX, G6PC, GAA, GALC, GALK1, GALT, GAMT, GATA2, GATA-4, GBA, GBE1, GCDH, GCGR, GDNF, GFAP, GFM1, GHR, GJB1, GJB2, GLA, GLB1, GLDC, GLE1, GNE, GNPTAB, GNPTG, GNS, GPAM, GPC3, GPR98, GREM1, GRHPR, GRIN2B, H2AFX, H2AX, HADHA, HAX1, HBA1, HBA2, HBB, HBV cccDNA, HER2, HEXA, HEXB, HFE, HGSNAT, HLCS, HMGCL, HAO1, HOGA1, HOXB13, HPRPF3, HPRT1, HPS1, HPS3, HRAS, HRD1, HSD3B2, HSD17B4, HSD17B13, HTT, HUS1, HYAL1, HYLS1, IDS, IDUA, IFITM5, IKBKAP, IL2RG, IL7R, IMPDH1, INPP5E, IRF4, ITGB2, ITPR1, IVD, JAG1, JAK1, JAK3, KCNC3, KCND3, KCNJ11, KLKB1, KLHL7, KRAS, LAMA1, LAMA2, LAMA3, LAMB3, LAMC2, LCA5, LDHA, LDLR, LDLRAP1, LHX3, LIFR, LIPA, LMNA, LMOD3, LOR, LOXHD1, LPA, LPL, LRAT, LRP6, LRPPRC, LRRK2, MADR2, MAN2B1, MAPT, MARC1, MAX, MCM6, MCOLN1, MECP2, MED17, MEFV, MEN1, MERTK, MESP2, MET, METex14, MFN2, MFSD8, MIA3, MITF, MKL2, MKS1, MLC1, MLH1, MLH3, MMAA, MMAB, MMACHC, MMADHC, MMD, MPI, MPL, MPV17, MSH2, MSH3, MSH6, MTHFD1L, MTHFR, MTM1, MTRR, MTTP, MUT, MUTYH, MYC, MYH7, MYO7A, NAGLU, NAGS, NBN, NDRG1, NDUFAF5, NDUFS6, NEB, NF1, NF2, NKX2-5, NOG, NOTCH1, NOTCH2, NPC1, NPC2, NPHP1, NPHS1, NPHS2, NRAS, NR2E3, NTHL1, NTRK, NTRK1, OAT, OCT4, OFD1, OPA3, OTC, PAH, PALB2, PAQR8, PAX3, PC, PCCA, PCCB, PCDH15, PCSK9, PD1, PDCD1, PDE6B, PDGFRA, PDHA1, PDHB, PEX1, PEX10, PEX12, PEX13, PEX14, PEX16, PEX19, PEX2, PEX26, PEX3, PEX5, PEX6, PEX7, PFKM, PHGDH, PHOX2B, PKD1, PKD2, PKHD1, PKK, PLEKHG4, PMM2, PMP22, PMS1, PMS2, PNPLA3, POLD1, POLE, POMGNT1, POT1, POU5F1, PPM1A, PPP2R2B, PPT1,
PRCD, PRKAG2, PRKAR1A, PRKCG, PRNP, PROM1, PROP1, PRPF31, PRPF8, PRPH2, PRPS1, PSAP, PSD3, PSD95, PSEN1, PSEN2, PSRC1, PTCH1, PTEN, PTS, PUS1, PYGM, RAB23, RAD50, RAD51C, RAD51D, RAG1, RAG2, RAPSN, RARS2, RB1, RDH12, RECQL4, RET, RHO, RICTOR, RMRP, ROS1, RPL, RP2, RPE65, RPGR, RPGRIP1L, RPL32P3, RS1, RTCA, RTEL1, RUNX1, SACS, SAMHD1, SCN1A, SCN2A, SDHA, SDHAF2, SDHB, SDHC, SDHD, SEL1L, SEPSECS, SERPINA1, SERPINC1, SERPING1, SGCA, SGCB, SGCG, SGSH, SIRT1, SLC12A3, SLC12A6, SLC17A5, SLC22A5, SLC25A13, SLC25A15, SLC26A2, SLC26A4, SLC35A3, SLC35B4, SLC37A4, SLC39A4, SLC4A11, SLC6A8, SLC7A7, SMAD3, SMAD4, SMARCA4, SMARCAL1, SMARCB1, SMARCE1, SMN1, SMN2, SMPD1, SNAI2, SNCA, SNRNP200, SOD1, SOX10, SPARA7, SPTBN2, STAR, STAT3, STK11, SUFU, SUMF1, SYNE1, SYNE2, SYS1, TARDBP, TAT, TBK1, TBP, TCIRG1, TCTN3, TECPR2, TERC, TERT, TFR2, TGFBR2, TGM1, TH, TLE3, TMEM127, TMEM138, TMEM216, TMEM43, TMEM67, TMPRSS6, TNNI2, TNNT1, TNNT3, TOP1, TOPORS, TP53, TPM2, TPM3, TPP1, TRAC, TRMU, TSC1, TSC2, TSFM, TSPAN14, TTBK2, TTC8, TTPA, TTR, TULP1, TYMP, UBE2G2, UBE2J1, UBE3A, USH1C, USH1G, USH2A,
[213] TABLE 7 provides illustrative diseases and syndromes for compositions, systems and methods described herein.
TABLE 7. Diseases and Syndromes
muscular dystrophies; encephalomyopathic mtDNA depletion syndrome; encephalitis; enzymatic diseases; EPCAM-associated congenital tufting enteropathy; epidermolysis bullosa with pyloric atresia; epilepsy; Fabry disease; facioscapulohumeral muscular dystrophy; Factor V Leiden thrombophilia; Faisalabad histiocytosis; familial atypical mycobacteriosis; familial capillary malformation-arteriovenous; Familial Creutzfeldt-Jakob disease; familial esophageal achalasia; familial glomuvenous malformation; familial hemophagocytic lymphohistiocytosis; familial Mediterranean fever; familial megacalyces; familial schwannomatosis; familial spina bifida; familial splenic asplenia/hypoplasia; familial thrombotic thrombocytopenic purpura; Fanconi disease (Fanconi anemia); Feingold syndrome; FENIB; fibrodysplasia ossificans progressiva; FKTN; Fragile X syndrome; Francois-Neetens fleck corneal dystrophy; Frasier syndrome; Friedreich’s ataxia; FTDP-17;
Fuchs corneal dystrophy; fucosidosis; G6PD deficiency; galactosialidosis; Galloway syndrome;
Gardner syndrome; Gaucher disease; Gitelman syndrome; GLUT1 deficiency; GM2-gangliosidoses (e.g., Tay Sachs Disease, Sandhoff Disease); glycogen storage disease type 1b; glycogen storage disease type 2; glycogen storage disease type 3; glycogen storage disease type 4; glycogen storage disease type 9a; glycogen storage diseases; GM1-gangliosidosis; Greenberg syndrome; Greig cephalopolysyndactyly syndrome; hair genetic diseases; hairy cell leukemia; HANAC syndrome; harlequin type ichthyosis congenita; HDR syndrome; hearing loss; hemochromatosis type 3; hemochromatosis type 4; hemolytic anemia; hemolytic uremic syndrome; hemophilia A; hemophilia B; hereditary angioedema type 3; hereditary angioedemas; hereditary hemorrhagic telangiectasia; hereditary hypofibrinogenemia; hereditary intraosseous vascular malformation; hereditary leiomyomatosis and renal cell cancer; hereditary neuralgic amyotrophy; hereditary sensory and autonomic neuropathy type; Hermansky-Pudlak disease; HHH syndrome; HHT2; hidrotic ectodermal dysplasia type 1; hidrotic ectodermal dysplasias; histiocytic sarcoma; HNF4A-associated hyperinsulinism; HNPCC; homozygous familial hypercholesterolemia; human immunodeficiency with microcephaly; Human monkeypox (MPX); human papilloma virus (HPV) infection; Huntington’s disease; hyper-IgD syndrome; hyperinsulinism-hyperammonemia syndrome; hypercholesterolemia; hypertrophy of the retinal pigment epithelium; hypochondrogenesis; hypohidrotic ectodermal dysplasia; ICF syndrome; idiopathic congenital intestinal pseudo-obstruction; immunodeficiency 13; immunodeficiency 17; immunodeficiency 25; immunodeficiency with hyper-IgM type 1; immunodeficiency with hyper-IgM type 3; immunodeficiency with hyper-IgM type 4; immunodeficiency with hyper-IgM type 5; immunoglobulin alpha deficiency; inborn errors of thyroid metabolism; infantile myofibromatosis; infantile visceral myopathy; infantile X-linked spinal muscular atrophy; intrahepatic cholestasis of pregnancy; IPEX
syndrome; IRAK4 deficiency; isolated congenital asplenia; Jeune syndrome; Johanson-Blizzard syndrome; Joubert syndrome; JP-HHT syndrome; juvenile hemochromatosis; juvenile hyaline fibromatosis; juvenile nephronophthisis; Kabuki mask syndrome; Kallmann syndromes; Kartagener syndrome; KCNJ11-associated hyperinsulinism; Kearns-Sayre syndrome; Kostmann disease; Kozlowski type of spondylometaphyseal dysplasia; Krabbe disease; LADD syndrome; late infantile-onset neuronal ceroid lipofuscinosis; LCK deficiency; LDHCP syndrome; Leber Congenital Amaurosis Type 10; Legius syndrome; Leigh syndrome; lethal congenital contracture syndrome 2; lethal congenital contracture syndromes; lethal contractural syndrome type 3; lethal neonatal CPT deficiency type 2; lethal osteosclerotic bone dysplasia; leukocyte adhesion deficiency; Li Fraumeni syndrome; LIG4 syndrome; limb girdle muscular dystrophies (LGMD1B, LGMD2A, LGMD2B); lipodystrophy; lissencephaly type 1; lissencephaly type 3; Loeys-Dietz syndrome; low phospholipid-associated cholelithiasis; Lynch Syndrome; lysinuric protein intolerance; a lysosomal storage disease (e.g., Hunter syndrome, Hurler syndrome); macular dystrophy; Maffucci syndrome; Majeed syndrome; mannose-binding protein deficiency; mantle cell lymphoma; Marfan disease; Marshall syndrome; MASA syndrome; mastocytosis; MCAD deficiency; McCune-Albright syndrome; MCKD2; Meckel syndrome; MECP2 Duplication Syndrome; Meesmann corneal dystrophy; megacystis-microcolon-intestinal hypoperistalsis; megaloblastic anemia type 1; MEHMO; MELAS; Melnick-Needles syndrome; MEN2s; meningitis; Menkes disease; metachromatic leukodystrophies; methylmalonic acidemia due to transcobalamin receptor defect; methylmalonic acidurias; methylvalonic aciduria; microcoria-congenital nephrosis syndrome; microvillous atrophy; migraine; mitochondrial neurogastrointestinal encephalomyopathy; monilethrix; monosomy X; mosaic trisomy 9 syndrome; Mowat-Wilson syndrome; mucolipidosis type 2; mucolipidosis
type IIIa; mucolipidosis type IV; mucopolysaccharidoses; mucopolysaccharidosis type 3A; mucopolysaccharidosis type 3C; mucopolysaccharidosis type 4B; multiminicore disease; multiple acyl-CoA dehydrogenation
EXAMPLES
[214] The following examples are included for illustrative purposes only and are not intended to limit the scope of the disclosure.
Example 1: Type V OCATS
[215] A fusion protein comprising a reverse transcriptase (RT) fused to the N terminus or C terminus of a Type V Cas effector protein is generated by constructing a plasmid that encodes the fusion protein. Non-limiting examples of the RT are MLV-RT, a Group I intron RT, and a Group II intron RT. Non-limiting examples of Type V Cas effector proteins are CasM.265446 (SEQ ID NO: 1) and CasM.19952 (SEQ ID NO: 2). The plasmid also encodes an intermediary RNA and an extended crRNA. The intermediary RNA comprises from 5’ to 3’: a repeat hybridization sequence, and a protein binding sequence. The extended crRNA comprises from 5’ to 3’: a template sequence, a repeat sequence that hybridizes to the repeat hybridization sequence of the intermediary RNA, and a spacer sequence that hybridizes to a target sequence of a target strand of a dsDNA target nucleic acid.
[216] Eukaryotic cells are transfected with the plasmid. In some instances, the plasmid is an AAV vector. In other instances, the intermediary RNA and crRNA are synthesized or encoded by a second plasmid, and then complexed with the translated protein for delivery to the cells. Without being bound by theory, the Type V Cas effector protein forms a ribonucleoprotein (RNP) complex with the intermediary RNA and crRNA. The RNP complex binds the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid. The RNP complex cleaves at least one strand of the R-loop to produce a cut site. The RT reverse transcribes the template sequence to produce a RT-DNA to be used as a ss donor nucleic acid at the 3’ end of the intermediary RNA. This edit is incorporated into the DNA via native repair mechanisms. Activity is confirmed by genomic sequencing of the incorporated edit encoded by the donor nucleic acid.
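The 5’ to 3’ ordering of the two RNAs in Example 1 can be illustrated with a short sketch. This is a minimal model under stated assumptions, not a design tool: the class names, the helper `repeats_hybridize`, and every sequence below are hypothetical placeholders that do not appear in the disclosure, and hybridization is reduced to strict Watson-Crick complementarity.

```python
from dataclasses import dataclass

# Watson-Crick complement table for RNA bases.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5' to 3'."""
    return seq.upper().translate(_RNA_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class ExtendedCrRNA:
    """Type V extended crRNA, 5' to 3': template, repeat, spacer."""
    template: str
    repeat: str
    spacer: str

    def sequence(self) -> str:
        return self.template + self.repeat + self.spacer


@dataclass(frozen=True)
class IntermediaryRNA:
    """Intermediary RNA, 5' to 3', in the order given in Example 1."""
    repeat_hybridization: str
    protein_binding: str

    def sequence(self) -> str:
        return self.repeat_hybridization + self.protein_binding


def repeats_hybridize(cr: ExtendedCrRNA, intrna: IntermediaryRNA) -> bool:
    """True if the repeat hybridization sequence fully complements the repeat."""
    return intrna.repeat_hybridization.upper() == reverse_complement_rna(cr.repeat)


# Hypothetical placeholder sequences, for illustration only.
cr = ExtendedCrRNA(
    template="AUGGCC",
    repeat="GUUGCAGAAC",
    spacer="ACGUACGUACGUACGUACGU",
)
intrna = IntermediaryRNA(
    repeat_hybridization=reverse_complement_rna(cr.repeat),
    protein_binding="GGCUUCGGCC",
)
```

Calling `repeats_hybridize(cr, intrna)` on these placeholders returns `True` because the repeat hybridization sequence was built as the reverse complement of the repeat. Real guide pairs may tolerate mismatches or wobble pairs, which this sketch does not model.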
Example 2: Type V OCATS with Ligase
[217] A plasmid encoding a fusion protein, intermediary RNA and extended crRNA is constructed as described in Example 1. The plasmid additionally encodes a domain capable of recruiting an endogenous DNA ligase expressed in eukaryotic cells. Alternatively, the plasmid encodes a DNA ligase. When the plasmid is expressed, the protein binding domain or ligase is fused as part of a multi-domain fusion consisting of a Cas effector, an RT, and a recruitment domain or a ligase. This fusion can be in any order. Domains can be separated by linkers or can be recruited by domain-aptamer fusions that recognize RNA sequences added to the intermediary RNA or crRNA.
[218] Eukaryotic cells are transfected with the plasmid. In some instances, the plasmid is an AAV vector. Without being bound by theory, the Type V Cas effector protein forms an RNP complex with the intermediary RNA and crRNA. The RNP complex binds the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid. The RNP complex cleaves at least one strand of the R-loop to produce a cut site. The RT reverse transcribes the template sequence to produce a RT-DNA to be used as a ss donor nucleic acid at the 3’ end of the intermediary RNA. The ligase is recruited to the cut site where it ligates insertion of the ss donor nucleic acid into the cut site. Resolution of the complex is enabled by DNA repair. Activity is confirmed by genomic sequencing of the incorporated edit encoded by the donor nucleic acid.
Example 3: Type V OCATS with SDSA
[219] A plasmid encoding a fusion protein, intermediary RNA and extended crRNA is constructed as described in Example 1. The plasmid additionally encodes a peptide capable of interacting with protein(s) involved in synthesis dependent strand annealing (SDSA) which are endogenously expressed in eukaryotic cells. When the plasmid is expressed, the peptide is fused as part of a multi-domain fusion consisting of a Cas effector, an RT, and a recruitment domain. This fusion can be in any order.
[220] Eukaryotic cells are transfected with the plasmid. In some instances, the plasmid is an AAV vector. Without being bound by theory, the Type V Cas effector protein forms an RNP complex with the intermediary RNA and crRNA. The complex binds the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid. The complex cleaves at least one strand of the R-loop to produce a cut site. The RT reverse transcribes the template sequence to produce an RT-DNA to be used as an ss donor nucleic acid at the 3’ end of the intermediary RNA. The SDSA-related proteins are recruited to the cut site, where they generate a second strand of DNA that is complementary to the ss donor nucleic acid to produce edited DNA that is directly attached to the genomic DNA. This edit is incorporated into the DNA via native repair mechanisms. Activity is confirmed by genomic sequencing of the incorporated edit encoded by the donor nucleic acid.
Example 4: Type II OCATS
[221] A fusion protein comprising a reverse transcriptase (RT) fused to the N terminus or C terminus of a Type II Cas effector protein is generated by constructing a plasmid that encodes the fusion protein. Non-limiting examples of the RT are MLV-RT, a Group I intron RT, and a Group II intron RT. A non-limiting example of a Type II Cas effector protein is SpCas9. The plasmid also encodes an extended intermediary RNA and a crRNA. The extended intermediary RNA comprises from 5’ to 3’: a template sequence, a repeat hybridization sequence, and a protein binding sequence. The crRNA comprises from 5’ to 3’: a spacer sequence that hybridizes to a target sequence of a target strand of a dsDNA target nucleic acid and a repeat sequence that hybridizes to the repeat hybridization sequence of the intermediary RNA.
[222] Eukaryotic cells are transfected with the plasmid. In some instances, the plasmid is an AAV vector. In other instances, the intermediary RNA and crRNA are synthesized or encoded by a second plasmid, and then complexed with the translated protein for delivery to the cells. Without being bound by theory, the Type II Cas effector protein forms a RNP complex with the intermediary RNA and crRNA. The RNP complex binds the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid. The RNP complex cleaves at least one strand of the R-loop to produce a cut site. The RT reverse transcribes the template sequence to produce a RT-DNA to be used as a ss donor nucleic acid at the 3’ end of the crRNA. Activity is confirmed by genomic sequencing of the incorporated edit encoded by the donor nucleic acid.
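The reverse transcription step in Example 4, in which the RT copies the template sequence of the extended intermediary RNA onto the 3’ end of the crRNA, can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and sequences are hypothetical, the whole template is assumed to be copied, and RT processivity and fidelity are ignored.

```python
# Maps each RNA template base to the DNA base that an RT would pair with it.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(template_rna: str) -> str:
    """Return the cDNA (5' to 3') complementary to an RNA template (5' to 3').

    The RT reads the template 3' to 5', so the cDNA is the reverse complement
    of the template, written with DNA bases.
    """
    return "".join(RNA_TO_CDNA[base] for base in reversed(template_rna.upper()))


def primed_product(primer_rna: str, template_rna: str) -> tuple[str, str]:
    """Model the chimeric product as (RNA primer, DNA extension).

    Assumption: the primer 3' end is annealed immediately 3' of the template,
    so the entire template is copied into DNA.
    """
    return primer_rna.upper(), reverse_transcribe(template_rna)


# Hypothetical placeholder sequences, for illustration only.
crrna = "ACGUACGUACGUACGUACGU" + "GUUUUAGAGC"   # spacer + repeat, 5' to 3'
template = "AAGUCCGAUU"                          # 5' portion of the extended intRNA

rna_part, dna_part = primed_product(crrna, template)
total_length = len(rna_part) + len(dna_part)
```

Because the product in this model is RNA joined to DNA, its total length is the primer length plus the template length, which is one reason a gel band can sit above the size expected for the DNA portion alone.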
Example 5: Type V Retron System
[223] An AAV vector is constructed to express a fusion protein comprising a retron fused to the N terminus or C terminus of a Type V Cas effector protein. Non-limiting examples of retrons are Ec86 retron and Sa163 retron. Non-limiting examples of Type V Cas effector proteins are CasM.265446 (SEQ ID NO: 1) and CasM.19952 (SEQ ID NO: 2). The AAV vector additionally encodes an intermediary RNA, a crRNA, and a template sequence encoding RNA flanked by two secondary structures.
[224] Eukaryotic cells are transfected with the AAV vector. Without being bound by theory, the Type V Cas effector protein forms an RNP complex with the intermediary RNA and crRNA. The complex binds the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid. The complex cleaves at least one strand of the R-loop to produce a cut site. The retron reverse transcribes the donor encoding RNA, thereby producing an RT-DNA to be used as a donor nucleic acid comprising secondary elements at the 5’ and 3’ ends of the donor. Activity is confirmed by genomic sequencing of the incorporated edit encoded by the donor nucleic acid.
Example 6: CasM.265466 cleaves DNA with an extended crRNA and RT synthesizes RT-DNA to be used as a donor nucleic acid
[225] Ribonucleoprotein (RNP) complexes were produced by incubating CasM.265466 with various intermediary RNAs and extended crRNAs for 20 min at room temp. Reverse transcriptase (RT) was added to the RNPs and incubated for 30 min at 37°C for OCAT synthesis. Target DNA was added after RT synthesis to test for cleavage and potential trans DNase activity. Samples were run on a denaturing polyacrylamide gel. Controls were: no intRNA (Cas + crRNA only); and crRNA + intRNA with no Cas. Results demonstrated that CasM.265466 was able to cleave target DNA in the presence of extended crRNAs and intermediary RNAs. Results are shown in FIG. 4. Intermediary RNA numbers in FIG. 4 correspond to the length of the intermediary RNA.
[226] Furthermore, RT-DNA synthesis from intRNA-20, intRNA-22, and intRNA-23 was observed in the presence of CasM.265466, see FIG. 5. Expected product size was 230-250 nucleotides. Difference in observed size may be due to the resulting product being a DNA/RNA hybrid which will run differently than pure RNA on the gel.
[227] RT-DNA synthesis was also observed before cutting. FIG. 6 shows that RT is able to synthesize on the intRNA in the absence of CasM.265466. Expected product size was 230-250 nt. Difference in observed size may be due to the resulting product being a DNA/RNA hybrid which will run differently than pure RNA on the gel.
[228] CasM.265466 was still able to cleave target DNA after DNA synthesis on the intRNA, see FIG. 7.
[229] Any trans cleavage activity that CasM.265466 might possess did not affect the resulting product. DNA-RNA hybrid product was still present for intRNA-20, intRNA-22, and intRNA-23 even after addition and cleavage of target DNA. See FIG. 8.
Example 7: RT-DNA is synthesized in presence of CasM.19952 and extended crRNA.
[230] Ribonucleoprotein (RNP) complexes were produced by incubating CasM.19952 with various intRNAs and extended crRNAs for 20 min at room temp. Reverse transcriptase (RT) was added to the RNPs and incubated for 30 min at 37°C for OCAT synthesis. Synthesis occurred for intRNA-4 and intRNA-13 in presence of CasM.19952, see FIG. 9A and FIG. 9B. IntRNA numbers in FIG. 9A and FIG. 9B correspond to the length of the intRNA. Expected product size was 230-250 nt. Difference in observed size may be due to the resulting product being a DNA/RNA hybrid which will run differently than pure RNA on the gel. RT was able to synthesize without the need for a longer intRNA. IntRNA-15 and intRNA-16 were negative controls due to lack of complementarity to crRNA-480F.
[231] Any trans cleavage activity that CasM.19952 might possess did not affect the resulting product. Synthesized product was still present for intRNA-4/13 even after addition of target DNA. See FIG. 10.
[232] Synthesis of the RT-DNA occurred in the presence of CasM.19952. Synthesis occurred for intRNA-5 and intRNA-6 in the presence of CasM.19952 (~300 nt). See FIG. 11A and FIG. 11B. RT was able to synthesize without the need for a longer intRNA. IntRNA-9 and intRNA-10 were negative controls due to lack of complementarity to crRNA-492F (with RT left panel; without RT right panel).
Example 8. RT-DNA is synthesized in presence of Cas9 and extended intRNA.
[233] Ribonucleoprotein (RNP) complexes were produced by incubating SpCas9 (SEQ ID NO: 5) with various extended intRNAs and crRNAs for 20 min at room temp. Reverse transcriptase (RT) was added to the RNPs and incubated for 30 min at 37°C for OCAT synthesis. Synthesis occurred for crRNA-38, crRNA-39, and crRNA-40 in the absence of SpCas9. Expected product size was 230-250 nt. See FIG. 12. Difference in observed size may be due to the resulting product being a DNA/RNA hybrid which will run differently than pure RNA on the gel.
[234] Synthesis occurred for crRNA-38 and crRNA-39 in the presence of SpCas9. Expected product size was 230-250 nt. See FIG. 13. Difference in observed size may be due to the resulting product being a DNA/RNA hybrid which will run differently than pure RNA on the gel. Without being bound by theory, lack of synthesis for crRNA-40 may indicate that SpCas9 requires a longer crRNA to allow for RT access. RT synthesis did not affect DNA cleavage by SpCas9, see FIG. 14.
Example 9. RT synthesizes RT-DNA in presence of crRNA and CasM.265466
[235] Ribonucleoprotein (RNP) complexes were formed by incubating CasM.265466 protein (200 nM) with intRNA (100 nM) and crRNA (100 nM) together for 20 min at room temperature. crRNAs and extended intRNAs used for this experiment are provided in TABLE 8 below; the extended portion of the intRNAs is lowercase. 1 µl of reverse transcriptase enzyme (RT; 200 units/µl) was added to the RNA and incubated for 30 min at 37°C for intRNA synthesis of RT-DNA. Target DNA was added after RT-DNA synthesis to test for cleavage and potential trans DNase activity. Samples were run on a 15% denaturing polyacrylamide gel for analysis.
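The reagent amounts in paragraph [235] reduce to simple arithmetic, sketched below. The final concentrations (200 nM protein, 100 nM each RNA) and the RT amount (1 µl) come from the text. The reaction volume and stock concentrations are hypothetical values chosen only to make the calculation concrete, and the 200 units/µl figure is read here as the RT stock concentration, which is an assumption.

```python
def stock_volume_ul(final_nM: float, reaction_ul: float, stock_nM: float) -> float:
    """Volume of stock needed to reach final_nM in reaction_ul (C1*V1 = C2*V2)."""
    if stock_nM <= 0:
        raise ValueError("stock concentration must be positive")
    return final_nM * reaction_ul / stock_nM


# Final concentrations stated in the example.
PROTEIN_FINAL_NM = 200.0
INTRNA_FINAL_NM = 100.0
CRRNA_FINAL_NM = 100.0

# Hypothetical values, not given in the source.
REACTION_UL = 20.0          # assumed reaction volume
PROTEIN_STOCK_NM = 2000.0   # assumed 2 uM protein stock
RNA_STOCK_NM = 1000.0       # assumed 1 uM RNA stocks

protein_ul = stock_volume_ul(PROTEIN_FINAL_NM, REACTION_UL, PROTEIN_STOCK_NM)
intrna_ul = stock_volume_ul(INTRNA_FINAL_NM, REACTION_UL, RNA_STOCK_NM)
crrna_ul = stock_volume_ul(CRRNA_FINAL_NM, REACTION_UL, RNA_STOCK_NM)

# Molar ratio of protein to each RNA species, independent of reaction volume.
protein_to_rna_ratio = PROTEIN_FINAL_NM / CRRNA_FINAL_NM

# RT added: 1 ul at an assumed stock of 200 units/ul.
rt_units = 1.0 * 200.0
```

Whatever the actual reaction volume, the stated concentrations give a 2:1 molar excess of protein over each RNA.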
[236] FIG. 15A shows that CasM.265466 can cleave target DNA in the presence of a normal length (non-extended) crRNA and various intRNAs. FIG. 15B shows that CasM.265466 can still cleave target DNA in the presence of an extended crRNA and various intRNAs. FIG. 16A and FIG. 16B show that an extended crRNA remains intact after target DNA cleavage (non-target (NT) DNA was included as a control). FIG. 17A represents various intRNAs that were tested with homology sequence lengths of 5 to 29 nucleotides to the crRNA. FIG. 17B shows that RT was able to synthesize RT-DNA in the presence of intRNAs and CasM.265466. Synthesis products were generated using crRNA 1471 F and intRNAs with homology sequence lengths of 13, 21 and 29 nucleotides. Without being bound by theory, the homology sequence may allow reverse transcriptase access to the RNA for synthesis of RT-DNA (e.g., the longer the intRNA, the less the chance of the Cas nuclease blocking the RT for synthesis of RT-DNA). These homology sequences contained 1 nucleotide mismatch relative to the crRNA (with the exception of int-20, which has perfect homology to the crRNA). The expected synthesis product size was 250 nucleotides, but the actual size was closer to 300 nucleotides. The difference in observed size may be due to the extended product being a DNA/RNA hybrid which will run differently than pure RNA on the gel. This was confirmed with RNA digestion. After RT synthesis, samples were heated at 95°C for 10 minutes, and RNase A was added, allowing for digestion to occur at 37°C for 30 min. FIG. 18 shows RNase digestion post synthesis to remove RNA for better visualization of DNA products; the expected DNA lengths post digestion are about 175 nt. A stepwise decrease in DNA length for RNase digested samples was observed due to the differing lengths of the crRNA (longer intRNA substrate, shorter cDNA produced). FIG. 19A and FIG. 19B show that synthesized cDNA was not affected by any trans DNase activity. Addition of target DNA post synthesis did not result in trans DNase cleavage of cDNA products, and the synthesis product did not prevent proper target DNA cleavage.
Lowercase = extension sequence; Bold uppercase = repeat sequence; Italicized uppercase = spacer sequence
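Example 9 describes intRNA homology sequences that carry one nucleotide mismatch relative to the crRNA. A minimal sketch of how such mismatches could be counted is given below. It assumes the homology sequence pairs antiparallel with an equal-length crRNA region and scores only strict Watson-Crick pairs, so a G-U wobble is counted as a mismatch. The function names and sequences are hypothetical and are not taken from TABLE 8.

```python
# Watson-Crick complement table for RNA bases.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5' to 3'."""
    return seq.upper().translate(_RNA_COMPLEMENT)[::-1]


def count_mismatches(homology: str, crrna_region: str) -> int:
    """Count positions where homology fails to pair with crrna_region.

    Both sequences are written 5' to 3' and must have the same length.
    """
    if len(homology) != len(crrna_region):
        raise ValueError("sequences must be the same length")
    perfect = reverse_complement_rna(crrna_region)
    return sum(a != b for a, b in zip(homology.upper(), perfect))


# Hypothetical 13-nt crRNA region, for illustration only.
crrna_region = "AUGGCUAGCUUAC"

# A perfectly complementary homology sequence.
perfect_homology = reverse_complement_rna(crrna_region)

# The same sequence with a single substitution at position 5 (index 4).
swap = "A" if perfect_homology[4] != "A" else "C"
one_mismatch_homology = perfect_homology[:4] + swap + perfect_homology[5:]
```

In this sketch `count_mismatches(perfect_homology, crrna_region)` is 0 and `count_mismatches(one_mismatch_homology, crrna_region)` is 1, mirroring the perfect-homology and single-mismatch designs described above.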
Example 10. RT extends intRNA in presence of CasM.19952
[237] Ribonucleoprotein (RNP) complexes were formed by incubating CasM.19952 protein (200 nM) with intRNA (100 nM) and crRNA (100 nM) together for 20 min at room temperature. crRNAs and intRNAs used for this experiment are provided in TABLE 9 below (the repeat sequence of the crRNA is shown in bold font and the spacer sequence is shown in italicized font; the extended portion of the intRNAs is lowercase). 1 µl of reverse transcriptase enzyme (RT; 200 units/µl) was added to the RNA and incubated for 30 min at 37°C for intRNA synthesis of RT-DNA. Target DNA was added after synthesis to test for cleavage and potential trans DNase activity. Samples were run on a 15% denaturing polyacrylamide gel for analysis.
[238] FIG. 20A represents various intRNAs that were tested with homology sequence lengths of 3 to 28 nucleotides. FIG. 20B and FIG. 20C show that RT was able to synthesize RT-DNA in the presence of intRNAs and CasM.19952. Synthesis products were generated using crRNA 480 F and intRNAs with homology sequence lengths of 10, 19 and 28 nucleotides. These homology sequences contained 1 nucleotide mismatch relative to the crRNA. The expected synthesis product size was 250 nucleotides, but the actual size was closer to 300 nucleotides. The difference in observed size may be due to the synthesis product being a DNA/RNA hybrid which will run differently than pure RNA on the gel. This was confirmed with RNA digestion. After RT synthesis, samples were heated at 95°C for 10 minutes, and RNase A was added, allowing for digestion to occur at 37°C for 30 min.
[239] FIG. 21 shows RNase digestion post synthesis to remove RNA for better visualization of DNA products; expected DNA lengths post digestion are about 175 nt. A stepwise decrease in DNA length for RNase digested samples was observed due to the differing lengths of the crRNA (longer intRNA substrate, shorter cDNA produced).
[240] FIG. 22A and FIG. 22B show that synthesized cDNA was not affected by any trans DNAse activity. Addition of target DNA post synthesis did not result in trans DNAse cleavage of cDNA products.
[241] Additional guide designs were tested with similar results. FIG. 23A represents various intRNAs that were tested, and FIG. 23B and FIG. 23C show that RT was able to synthesize RT-DNA in the presence of intRNAs and CasM.19952. RNA digestion confirmed the synthesized DNA products (see FIG. 24A and FIG. 24B).
Lowercase = extension sequence; Bold uppercase = repeat sequence; Italicized uppercase = spacer sequence
Example 11. RT synthesizes RT-DNA in the presence of intRNA and SpCas9
[242] Ribonucleoprotein (RNP) complexes were formed by incubating SpCas9 protein (200 nM) with intRNA (100 nM) and crRNA (100 nM) together for 20 min at room temperature. crRNAs and intRNAs used for this experiment are provided in TABLE 10 below (the repeat sequence of the crRNA is shown in bold font and the spacer sequence is shown in italicized font; the extended portion of the intRNAs is italicized; the int portion of the intRNA is italicized and the extension is in bold font). 1 µl of reverse transcriptase enzyme (RT; 200 units/µl) was added to the RNA and incubated for 30 min at 37°C for intRNA synthesis of RT-DNA. Target DNA was added after synthesis to test for cleavage and potential trans DNase activity. Samples were run on a 15% denaturing polyacrylamide gel for analysis.
[243] FIG. 25A shows that SpCas9 can cleave dsDNA with a dual guide system of crRNA and intRNA (intRNA 4242). FIG. 25B shows that SpCas9 can still cleave dsDNA with a dual guide system containing an extended crRNA and an extended intRNA. Modified crRNA (crRNA-36-39) had slightly decreased cleavage activity compared to WT crRNA (crRNA-40) and sgRNA.
[244] FIG. 26A shows a representation of different crRNAs that were tested with homology sequence lengths of 10, 17 and 20 nucleotides that were complementary to the 5’ end of the intRNA extension sequence. FIG. 26B shows that RT was able to synthesize RT-DNA in the presence of crRNAs and SpCas9. Synthesis products were generated using intRNA-4242R and crRNAs with homology sequence lengths of 10 and 20 nucleotides. The expected synthesis product size was 250 nucleotides, but the observed size was closer to 300 nucleotides. The difference in observed size may be due to the synthesized product being a DNA/RNA hybrid which will run differently than pure RNA on the gel.
[245] Target DNA was added after synthesized product was generated. FIG. 27 shows that the cDNA synthesis product does not affect the ability of SpCas9 to cleave target DNA.
Lowercase = extension sequence; Bold uppercase = repeat sequence; Italicized uppercase = spacer sequence; Uppercase (non-bold, non-italicized) = intermediary sequence
Claims
1. A system comprising: a) an effector protein, or a nucleic acid encoding the effector protein, wherein the effector protein is a Type V Cas protein; b) a reverse transcriptase (RT) or a nucleic acid that encodes the RT; c) an intermediary RNA or DNA molecule encoding the same, the intermediary RNA comprising from 5’ to 3’: i) a protein binding sequence, and ii) a repeat hybridization sequence, and d) an extended crRNA or DNA molecule encoding the same, the extended crRNA comprising from 5’ to 3’: i) a template sequence, ii) a repeat sequence that is capable of hybridizing to the repeat hybridization sequence of the intermediary RNA, and iii) a spacer sequence that is capable of hybridizing to a target sequence of a target strand of a double-stranded DNA (dsDNA) target nucleic acid, wherein the Type V Cas effector protein is capable of forming a complex (RNP complex) with the intermediary RNA and the extended crRNA, wherein the RNP complex is capable of binding the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid, wherein the RNP complex is capable of cleaving at least one strand of the R-loop to produce a cut site, wherein the RT is capable of reverse transcribing the template sequence to produce a single stranded donor (ss donor) reverse-transcribed DNA (RT-DNA) at the 3’ end of the intermediary RNA, and wherein the system is capable of incorporating the ss donor RT-DNA into a non-target strand at the cut site.
2. A system comprising: a) an effector protein, or a nucleic acid encoding the effector protein, wherein the effector protein is a Type V Cas protein; b) a reverse transcriptase (RT) or a nucleic acid that encodes the RT; c) an intermediary RNA or DNA molecule encoding the same, the intermediary RNA comprising from 5’ to 3’: i) a protein binding sequence, and ii) a repeat hybridization sequence, and d) an extended crRNA or DNA molecule encoding the same, the extended crRNA comprising from 5’ to 3’: i) a template sequence, ii) a repeat sequence that is capable of hybridizing to the repeat hybridization sequence of the intermediary RNA, and iii) a spacer sequence that is capable of hybridizing to a target sequence of a target strand of a dsDNA target nucleic acid.
3. The system of claim 1 or 2, wherein the RT is fused to the effector protein.
4. The system of any one of claims 1-3, further comprising a domain that is capable of recruiting proteins involved in synthesis dependent strand annealing (SDSA) or nucleic acid encoding the same, wherein the domain is further capable of synthesizing a complementary strand on the ss donor nucleic acid to produce an edited DNA that is directly attached to a genomic DNA.
5. The system of any one of claims 1-4, wherein the system is capable of repairing a dsDNA target nucleic acid by a native DNA repair mechanism after insertion of the ss donor nucleic acid.
6. The system of any one of claims 1-5, further comprising:
(a) an endonuclease that is capable of removing a portion of the cleaved strand of the R-loop; or
(b) an accessory protein that is capable of recruiting the same, optionally wherein the endonuclease or accessory protein is fused to the effector protein, RT, or combination thereof; or a nucleic acid encoding the same.
7. The system of any one of claims 1-6, further comprising a ligase or nucleic acid encoding the same, optionally wherein the ligase is fused to or interacts with any one of the effector protein, RT, and accessory protein.
8. The system of any one of claims 1-7, wherein the length of the Type V Cas effector protein is 300-400, 400-500, 500-600, 600-700, or 700-800 linked amino acids.
9. The system of any one of claims 1-8, wherein the Type V Cas effector protein homodimerizes.
10. The system of any one of claims 1-9, wherein the Type V Cas effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 2.
11. The system of any one of claims 1-10, wherein the Type V Cas effector protein comprises an amino acid sequence that is at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or 100% identical to SEQ ID NO: 1.
12. The system of any one of claims 1-11, wherein the template sequence is not capable of hybridizing to the target sequence or the reverse complement thereof.
13. The system of any one of claims 1-12, wherein at least two nucleotides of the template sequence are complementary to at least two nucleotides at the 3’ end of the intermediary RNA.
14. The system of any one of claims 1-13, wherein the intermediary RNA does not comprise an extended sequence on the 3’ end.
15. The system of any one of claims 1-14, wherein the intermediary RNA does not comprise a sequence at its 3’ end that is complementary to a portion of the template sequence or target sequence.
16. The system of any one of claims 1-15, wherein the system comprises a single effector protein.
17. A system comprising: a) an effector protein, or a nucleic acid encoding the effector protein, wherein the effector protein is a Type II Cas protein; b) a reverse transcriptase (RT) or a nucleic acid that encodes the RT; c) an extended intermediary RNA or DNA molecule encoding the same, the extended intermediary RNA comprising from 5’ to 3’: i) a template sequence, ii) a repeat hybridization sequence, and iii) a protein binding sequence; and d) a crRNA or DNA molecule encoding the same, the crRNA comprising from 5’ to 3’: i) a spacer sequence that is capable of hybridizing to a target sequence of a target strand of a dsDNA target nucleic acid, and ii) a repeat sequence that is capable of hybridizing to the repeat hybridization sequence of the intermediary RNA, and wherein the Type II Cas effector protein is capable of forming a complex with the intermediary RNA and crRNA (RNP complex), wherein the RNP complex is capable of binding the target nucleic acid, thereby forming an R-loop in the dsDNA target nucleic acid, wherein the complex is capable of cleaving at least one strand of the R-loop to produce a cut site, wherein the RT is capable of reverse transcribing the template sequence to produce a RT-DNA for use as a single stranded donor nucleic acid (ss donor nucleic acid) at the 3’ end of the crRNA, and wherein the system is capable of inserting the ss donor nucleic acid into a non-target strand at the cut site.
18. The system of claim 17, wherein the RT is fused to the effector protein.
19. The system of claim 17 or 18, comprising a domain that is capable of recruiting proteins involved in synthesis dependent strand annealing (SDSA) or nucleic acid encoding the same, wherein the domain is further capable of synthesizing a complementary strand on the ss donor nucleic acid to produce an edited DNA that is directly attached to a genomic target nucleic acid DNA.
20. The system of any one of claims 17-19, wherein the system is capable of repairing the dsDNA target nucleic acid by a native DNA repair mechanism after insertion of the ss donor nucleic acid.
21. The system of any one of claims 17-20, wherein the system comprises an endonuclease that is capable of removing a portion of the cleaved strand of the R-loop, or an accessory protein that is capable of recruiting the same, optionally wherein the endonuclease or accessory protein is fused to the effector protein, RT, or combination thereof; or a nucleic acid encoding the same.
22. The system of any one of claims 17-21, wherein the system further comprises a ligase or nucleic acid encoding the same, optionally wherein the ligase is fused to or interacts with any one of the effector protein, RT, and accessory protein.
23. The system of any one of claims 17-22, wherein the template sequence cannot hybridize to the target sequence or the reverse complement thereof.
24. The system of any one of claims 17-23, wherein at least two nucleotides of the donor sequence are complementary to at least two nucleotides at the 3’ end of the intermediary RNA.
25. The system of any one of claims 17-24, wherein the intermediary RNA does not comprise an extended sequence on the 3’ end.
26. The system of claim 17, wherein the intermediary RNA does not comprise a sequence at its 3’ end that is complementary to a portion of the template sequence or target sequence.
27. The system of claim 17, wherein the system comprises a single effector protein.
28. The system of any one of claims 1-16, wherein the intermediary RNA comprises a homology sequence attached to the 3’ end of the intermediary RNA.
29. The system of claim 28, wherein the homology sequence has a length from 10 to 50 nucleotides.
30. The system of claim 28 or 29, wherein the homology sequence has 1 or more nucleotide mismatches to the extended crRNA sequence.
31. The system of any one of claims 17-27, wherein the crRNA comprises a homology sequence attached to the 5’ end of the crRNA.
32. The system of claim 31, wherein the homology sequence has a length from 10 to 50 nucleotides.
33. The system of claim 31 or 32, wherein the homology sequence has 1 or more nucleotide mismatches to the extended intermediary RNA sequence.
34. The system of any one of claims 1-16, wherein the nucleic acid encoding the effector protein, the nucleic acid that encodes the RT, the DNA molecule encoding the intermediary RNA, the DNA molecule encoding the extended crRNA, or any combination thereof, are all present in a nucleic acid expression vector.
35. The system of claim 34, wherein the nucleic acid expression vector is an adeno-associated viral vector.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363490490P | 2023-03-15 | 2023-03-15 | |
| US63/490,490 | 2023-03-15 | | |
| US202363502909P | 2023-05-17 | 2023-05-17 | |
| US63/502,909 | 2023-05-17 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2024192274A2 (en) | 2024-09-19 |
| WO2024192274A3 (en) | 2024-10-31 |
Family
ID=92756054
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/019986 Pending WO2024192274A2 (en) | 2023-03-15 | 2024-03-14 | On cas template synthesis (ocats) systems and uses thereof |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024192274A2 (en) |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2021183850A1 (en) * | 2020-03-13 | 2021-09-16 | The Regents Of The University Of California | Compositions and methods for modifying a target nucleic acid |
| US20240263173A1 (en) * | 2021-08-11 | 2024-08-08 | The Board Of Trustees Of The Leland Stanford Junior University | High-throughput precision genome editing in human cells |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2024192274A3 (en) | 2024-10-31 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24771757; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |