WO2025201481A1 - Crispr-cas systems - Google Patents
Crispr-cas systems
- Publication number
- WO2025201481A1 (PCT/CN2025/085450)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- stem
- loop
- crrna
- crispr
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
- C12N15/902—Stable introduction of foreign DNA into chromosome using homologous recombination
- C12N15/907—Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/70—Vectors or expression systems specially adapted for E. coli
- C12N15/88—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation using microencapsulation, e.g. using amphiphile liposome vesicle
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
Definitions
- the present disclosure provides novel CRISPR-Cas systems and uses thereof.
- a CRISPR-Cas system is characterized by elements that promote the formation of a CRISPR-Cas complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR-Cas system).
- target sequence refers to a sequence that a guide molecule is designed to target, e.g., a sequence to which the guide molecule has complementarity, where hybridization between the target sequence and a sequence of the guide molecule promotes the formation of a CRISPR-Cas complex.
- a target sequence may comprise any polynucleotide, such as a DNA or RNA polynucleotide, and is comprised within a target locus of interest. In some embodiments, a target sequence is located in the nucleus or cytoplasm of a cell.
- the action of the CRISPR-Cas system is usually divided into three stages: (1) adaptation or spacer integration, (2) processing of the primary transcript of the CRISPR locus (pre-crRNA) and maturation of the crRNA which includes the spacer and variable regions corresponding to 5’ and 3’ fragments of CRISPR repeats, and (3) DNA (or RNA) interference.
- pre-crRNA primary transcript of the CRISPR locus
- protein factors such as Cas1 and Cas2 that are present in the great majority of the known CRISPR-Cas systems are sufficient for the insertion of spacers into the CRISPR cassettes.
- CRISPR guide molecule of a CRISPR-Cas system, in particular a Type V CRISPR-Cas system.
- DR direct repeat
- the guide molecule design disclosed herein is largely independent of the spacer sequence and tolerates variations in the DR sequence, demonstrating broad adaptability of the present guide molecule design to various CRISPR-Cas systems, including its use with various Type V CRISPR effector proteins, such as many subclasses of the Cas12 proteins.
- a CRISPR RNA comprising, in the 5’ -to-3’ direction, a first stem-loop sequence, a connector region, and a second stem-loop sequence, wherein the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 3-12 base pairs and a first loop of about 3-10 nucleotides; wherein the connector comprises about 3-10 nucleotides; and wherein the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 3-12 base pairs and a second loop of about 3-15 nucleotides.
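The layout recited above (floater, first stem-loop, connector, second stem-loop) can be checked mechanically once a folding tool has produced a dot-bracket structure for the crRNA. The following is a minimal Python sketch, not the applicant's method: it assumes the input is a dot-bracket string (the folding program is not specified in the text) in which each stem-loop is a simple hairpin without branches, and it treats "about" as a hard bound. `find_hairpins`, `check_crrna_layout`, and the example structure are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hairpin:
    start: int    # index of the first 5' stem nucleotide
    end: int      # index of the last 3' stem nucleotide
    stem_bp: int  # base pairs in the stem (bulged nucleotides are not counted)
    loop_nt: int  # unpaired nucleotides enclosed by the innermost base pair


def find_hairpins(db: str) -> list[Hairpin]:
    """Split a dot-bracket string into top-level hairpin domains (no multi-branch loops)."""
    hairpins: list[Hairpin] = []
    stack: list[int] = []
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            if not stack:  # this ')' closes a top-level domain
                domain = db[j:i + 1]
                inner_open = domain.rfind("(")
                inner_close = domain.find(")", inner_open)
                hairpins.append(Hairpin(j, i, domain.count("("), inner_close - inner_open - 1))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return hairpins


# Ranges recited for the crRNA; "about" is treated as a hard bound here (an assumption).
RANGES = {
    "first_stem_bp": (3, 12),
    "first_loop_nt": (3, 10),
    "connector_nt": (3, 10),
    "second_stem_bp": (3, 12),
    "second_loop_nt": (3, 15),
}


def check_crrna_layout(db: str) -> dict[str, int]:
    """Measure floater, stem-loop 1, connector and stem-loop 2, then check the ranges."""
    hairpins = find_hairpins(db)
    if len(hairpins) < 2:
        raise ValueError("expected at least two stem-loops")
    first, second = hairpins[0], hairpins[1]
    features = {
        "floater_nt": first.start,
        "first_stem_bp": first.stem_bp,
        "first_loop_nt": first.loop_nt,
        "connector_nt": second.start - first.end - 1,
        "second_stem_bp": second.stem_bp,
        "second_loop_nt": second.loop_nt,
    }
    for name, (low, high) in RANGES.items():
        if not low <= features[name] <= high:
            raise ValueError(f"{name}={features[name]} is outside {low}-{high}")
    return features


# Hypothetical structure: 2-nt floater, 7-bp stem + 4-nt loop, 5-nt connector,
# 5-bp stem + 6-nt loop, then a 20-nt unpaired spacer region.
db = ".." + "(" * 7 + "...." + ")" * 7 + "....." + "(" * 5 + "......" + ")" * 5 + "." * 20
print(check_crrna_layout(db))
```

Working from a dot-bracket string keeps the sketch independent of any particular folding library; bulges simply reduce the base-pair count rather than breaking the parse.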
- crRNA CRISPR RNA
- the crRNA further comprises a floater region 5’ to the first stem-loop sequence, wherein the floater is at least about 1 nucleotide, 2 nucleotides, or 3 nucleotides.
- the second stem is about 5 base pairs, and the second loop is about 5, 6, or 7 nucleotides.
- the first stem is about 7 or 8 base pairs. In some embodiments, the first loop is about 4 nucleotides. In some embodiments, the first stem is about 7 or 8 base pairs, and the first loop is about 4 nucleotides. In some embodiments, the first stem is 7 base pairs, and the first loop is about 4 nucleotides. In some embodiments, the first loop comprises the sequence 5’-GAAA-3’.
- one or more nucleotides of the crRNA molecule are methylated. In some embodiments, one or more nucleotides from the 5’ end of the crRNA molecule to the last nucleotide at the 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, all nucleotides from the 5’ end of the crRNA molecule to the last nucleotide at the 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, no nucleotide from the last nucleotide at the 3’ end of the loop structure of the first stem-loop to the 3’ end of the first stem-loop is methylated. In some embodiments, one or more nucleotides at the 3’ end of the crRNA molecule are methylated. In some embodiments, up to 3 nucleotides at the 3’ end of the crRNA molecule are methylated.
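Several of the methylation embodiments above can be combined into a per-nucleotide mask. In this Python sketch the particular combination (every nucleotide from the 5’ end through the first loop, the 3’ arm of the first stem left unmodified, plus up to three 3’-terminal nucleotides) is an assumption that merges separate embodiments; the text does not specify the chemistry of the methyl group, and the `m` prefix produced by the hypothetical `render` helper is only an illustrative notation.

```python
def methylation_mask(length: int, first_loop_end: int, n_3prime: int = 3) -> list[bool]:
    """Per-nucleotide methylation flags for a crRNA of `length` nucleotides.

    first_loop_end: 0-based index of the last (3'-most) loop nucleotide of the
    first stem-loop. Positions 0..first_loop_end are flagged; the 3' arm of the
    first stem and the rest of the molecule stay unflagged, except for the last
    `n_3prime` (at most 3) nucleotides at the 3' end.
    """
    if not 0 <= first_loop_end < length:
        raise ValueError("first_loop_end must lie inside the crRNA")
    if not 0 <= n_3prime <= 3:
        raise ValueError("the text recites up to 3 methylated 3'-end nucleotides")
    mask = [i <= first_loop_end for i in range(length)]
    for i in range(max(first_loop_end + 1, length - n_3prime), length):
        mask[i] = True
    return mask


def render(seq: str, mask: list[bool]) -> str:
    """Hypothetical notation: prefix each methylated nucleotide with 'm'."""
    return " ".join(("m" + nt) if flag else nt for nt, flag in zip(seq, mask))


print(render("ACGUACGUAC", methylation_mask(10, first_loop_end=3, n_3prime=2)))
# mA mC mG mU A C G U mA mC
```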
- the spacer region is about 20 to 40 nucleotides.
- the first stem-loop sequence has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 97% to a first stem-loop sequence of any of SEQ ID NOs: 2 and 478-494; optionally wherein the first stem-loop sequence consists of a first stem-loop sequence of any of SEQ ID NOs: 2 and 478-494.
- the first stem-loop sequence has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 97% to a first stem-loop sequence set forth in Table 1; optionally wherein the first stem-loop sequence consists of a first stem-loop sequence set forth in Table 1.
- the crRNA is a single-stranded polynucleotide, wherein the single-stranded polynucleotide comprises a sequence of any of SEQ ID NOs: 73-120, 456-476, and 547-631. In some embodiments, the crRNA is a single-stranded polynucleotide, wherein the single-stranded polynucleotide comprises a sequence set forth in Table 3.
- a modified Type V CRISPR RNA comprising at least one stem-loop sequence connected to the 5’ end of a naturally-existing Type V crRNA or a functional derivative thereof, wherein the stem-loop sequence is capable of forming a stem-loop structure having a stem of about 3-12 base pairs and a loop of about 3-10 nucleotides.
- the stem-loop sequence has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 97% to a sequence set forth in Table 1; optionally wherein the stem-loop sequence comprises a sequence set forth in Table 1; optionally wherein the stem-loop sequence consists of a sequence set forth in Table 1.
- the stem-loop sequence is connected to the 5’ end of the naturally-existing Type V crRNA or the functional derivative thereof via a connector sequence, and wherein the connector sequence comprises about 3-10 nucleotides.
- the naturally existing Type V crRNA is processed from a CRISPR array located 3’ to a Type V CRISPR-Cas locus.
- the naturally existing Type V crRNA comprises a sequence set forth in any of SEQ ID NOs: 18-70; optionally wherein the naturally existing Type V crRNA or functional derivative thereof consists of a sequence set forth in any of SEQ ID NOs: 18-70.
- the naturally existing Type V crRNA comprises a sequence set forth in Table 2; optionally wherein the naturally existing Type V crRNA or functional derivative thereof consists of a sequence set forth in Table 2.
- CRISPR Clustered Regularly Interspaced Short Palindromic Repeat
- a guide molecule or a polynucleotide encoding the guide molecule wherein the guide molecule comprises the crRNA provided herein, or the modified Type V crRNA provided herein.
- the CRISPR effector protein comprises a RuvC-like endonuclease domain.
- the RuvC-like endonuclease domain comprises one or more RuvC motifs selected from a RuvC I motif, a RuvC II motif, and a RuvC III motif.
- the RuvC I motif comprises the amino acid sequence X1X2X3DX4X5X6X7, wherein X1 is L, I, V, or M; X2 is G, S, or A; X3 is I, V, or L; X4 is L or R; X5 is G or N; X6 is E, Q, I, or L; X7 is R, T, K, or N.
- the RuvC III motif comprises the amino acid sequence X1X2DXX3X4X5XX6X7X8, wherein X1 is D, N, or H; X2 is A, S, R, or G; X is any amino acid; X3 is N, V, I, or E; X4 is A, G, S, or K; X5 is A or S; X6 is N, H, V, or G; X7 is I, L, or V; X8 is A, G, or L.
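The degenerate RuvC I and RuvC III motifs translate directly into regular expressions: each constrained X position becomes a character class and each unconstrained X becomes a wildcard. A minimal Python sketch, assuming one-letter amino-acid codes; the constant and function names and the example peptides are made up for illustration and are not taken from the sequence listing. A lookahead wraps each pattern so that overlapping hits are also reported.

```python
import re

# X1X2X3DX4X5X6X7 and X1X2DXX3X4X5XX6X7X8 as character classes; "X" (any residue) becomes ".".
RUVC_I = re.compile(r"(?=([LIVM][GSA][IVL]D[LR][GN][EQIL][RTKN]))")
RUVC_III = re.compile(r"(?=([DNH][ASRG]D.[NVIE][AGSK][AS].[NHVG][ILV][AGL]))")


def find_motif(pattern: re.Pattern, protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched peptide) for every, possibly overlapping, hit."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein.upper())]


# Toy peptides, not real Cas12 sequences.
print(find_motif(RUVC_I, "MKLGIDLGERTAA"))      # [(2, 'LGIDLGER')]
print(find_motif(RUVC_III, "PPDADANGSAHIAPP"))  # [(2, 'DADANGSAHIA')]
```

A regex scan like this only flags candidate motifs; whether a hit actually lies in a RuvC-like domain would still need alignment or structural evidence.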
- the CRISPR effector protein comprises a zinc-finger protein domain; optionally, the zinc-finger domain is inserted in the RuvC-like endonuclease domain.
- the CRISPR effector protein further comprises a wedge (WED) domain.
- WED wedge
- the CRISPR effector protein is less than about 1400 amino acids in length. In some embodiments, the CRISPR effector protein is less than about 1300 amino acids in length. In some embodiments, the CRISPR effector protein is less than about 1200 amino acids in length. In some embodiments, the CRISPR effector protein is less than about 1100 amino acids in length. In some embodiments, the CRISPR effector protein is more than about 175 amino acids in length. In some embodiments, the CRISPR effector protein is more than about 200 amino acids in length. In some embodiments, the CRISPR effector protein is more than about 225 amino acids in length. In some embodiments, the CRISPR effector protein is more than about 250 amino acids in length.
- the CRISPR effector protein is a Type V CRISPR effector protein.
- the Type V CRISPR effector protein is selected from Cas12a (Cpf1), Cas12b1 (C2c1), Cas12b2, Cas12c (C2c3), Cas12d (CasY), Cas12e (CasX), Cas12f1 (Cas14a), Cas12f2 (Cas14b), Cas12f3 (Cas14c), Cas12g, Cas12h, Cas12i, Cas12j (CasΦ-2), Cas12k (C2c5), Cas12l, C2c4, C2c8, C2c9, and C2c10, or a functional derivative thereof.
- the Type V CRISPR effector protein is selected from Cas12a (Cpf1), Cas12d (CasY), Cas12f2 (Cas14b), Cas12f3 (Cas14c), Cas12h, Cas12i, Cas12j (CasΦ-2), Cas12k (C2c5), Cas12l, C2c4, C2c8, C2c9, and C2c10, or a functional derivative thereof.
- the CRISPR effector protein is fused to a deaminase catalytic domain, a DNA methylation catalytic domain, a DNA demethylation catalytic domain, a histone residue modification domain, a nuclease catalytic domain, a fluorescent protein, or a transcription modification factor; optionally wherein the deaminase catalytic domain is selected from the group consisting of an adenosine deaminase catalytic domain and a cytidine deaminase catalytic domain.
- the CRISPR effector protein is fused to a reverse transcriptase, and wherein the CRISPR-Cas system further comprises a donor template nucleic acid, wherein optionally the donor template nucleic acid is a DNA or RNA.
- the CRISPR effector protein is fused to a DNA methylation domain, a histone residue modification domain, a localization factor, a transcription modification factor, a light-gated control factor, a chemically inducible factor, or a chromatin visualization factor.
- the polynucleotide encoding the CRISPR effector protein and/or the polynucleotide encoding the guide molecule are mRNA molecules.
- the mRNA encoding the CRISPR effector protein and the mRNA encoding the guide molecule are present in a delivery system selected from the group consisting of a lipid nanoparticle, a liposome, an exosome, a microvesicle, and a gene gun.
- the polynucleotide encoding the CRISPR effector protein and/or the polynucleotide encoding the guide molecule are operably linked to a promoter.
- the polynucleotide encoding the CRISPR effector protein and/or the polynucleotide encoding the guide molecule are in a vector selected from a retroviral vector, a lentiviral vector, a phage vector, an adenoviral vector, an adeno-associated vector, and a herpes simplex vector.
- the system lacks a tracrRNA.
- the CRISPR effector protein and the guide molecule form a complex that associates with the target nucleic acid, thereby modifying the target nucleic acid.
- the spacer region is between about 15 and about 50 nucleotides in length.
- a cell comprising the system provided herein.
- the cell is a eukaryotic cell.
- the cell is a prokaryotic cell.
- provided herein is a method of targeting and nicking a non-spacer complementary strand of a double-stranded target nucleic acid upon recognition of a spacer complementary strand of the double-stranded target nucleic acid, the method comprising contacting the double-stranded target DNA with a system provided herein.
- provided herein is a method of targeting and cleaving a double-stranded target nucleic acid, comprising contacting the double-stranded target DNA with a system provided herein.
- both strands of target DNA are cleaved at different sites, resulting in a staggered cut. In some embodiments, both strands of target DNA are cleaved at the same site, resulting in a blunt double-strand break.
- provided herein is a method of targeting and cleaving a single-stranded target DNA, the method comprising contacting the target nucleic acid with a system provided herein.
- a method of detecting a target nucleic acid in a sample comprising:
- the method further comprises comparing a level of the detectable signal with a reference signal level, and determining an amount of target nucleic acid in the sample based on the level of the detectable signal.
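One common way to turn a detectable signal into an amount is to interpolate it against reference reactions with known target amounts. The Python sketch below assumes a linear, strictly increasing relationship between adjacent reference points; the text does not specify the calibration model, and real read-outs may need log-linear or nonlinear fits instead. `estimate_amount` and the example values are hypothetical.

```python
from bisect import bisect_left


def estimate_amount(signal: float, standards: list[tuple[float, float]]) -> float:
    """Interpolate a target amount from (amount, signal) reference reactions.

    Assumes the signal increases strictly with amount and is linear between
    neighbouring reference points (a simplifying assumption).
    """
    if len(standards) < 2:
        raise ValueError("need at least two reference reactions")
    points = sorted(standards)  # order reference reactions by amount
    amounts = [a for a, _ in points]
    signals = [s for _, s in points]
    if any(s1 >= s2 for s1, s2 in zip(signals, signals[1:])):
        raise ValueError("reference signals must increase with amount")
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal is outside the calibrated range")
    k = bisect_left(signals, signal)  # first reference signal >= measured signal
    if signals[k] == signal:
        return amounts[k]
    a0, a1 = amounts[k - 1], amounts[k]
    s0, s1 = signals[k - 1], signals[k]
    return a0 + (a1 - a0) * (signal - s0) / (s1 - s0)


# Hypothetical reference reactions: (target copies per reaction, signal units).
standards = [(0.0, 100.0), (10.0, 600.0), (100.0, 5100.0)]
print(estimate_amount(350.0, standards))   # 5.0
print(estimate_amount(2850.0, standards))  # 55.0
```

Refusing to extrapolate outside the calibrated range is a deliberate choice: a signal above the highest reference gives no reliable amount under this model.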
- the measuring is performed using gold nanoparticle detection, fluorescence polarization, colloid phase transition/dispersion, electrochemical detection, or semiconductor-based sensing.
- a method of specifically editing a double-stranded nucleic acid comprising contacting, under sufficient conditions and for a sufficient amount of time,
- a method of editing a double-stranded nucleic acid comprising contacting, under sufficient conditions and for a sufficient amount of time,
- a fusion protein comprising a CRISPR effector protein and a protein domain with DNA modifying activity and a guide molecule targeting the double-stranded nucleic acid
- the CRISPR effector protein of the fusion protein is modified to nick a non-target strand of the double-stranded nucleic acid.
- a method of inducing genotype-specific or transcriptional-state-specific cell death or dormancy in a cell comprising contacting a cell with a system provided herein, wherein the guide molecule hybridizing to the target DNA causes a collateral DNase activity-mediated cell death or dormancy.
- the cell is a prokaryotic cell.
- the cell is a eukaryotic cell.
- the cell is a mammalian cell.
- the cell is a cancer cell.
- the cell is an infectious cell or a cell infected with an infectious agent.
- the cell is a cell infected with a virus, a cell infected with a prion, a fungal cell, a protozoan, or a parasite cell.
- a method of treating a condition or disease in a subject in need thereof comprising administering to the subject a system provided herein, wherein the spacer sequence is complementary to at least 15 nucleotides of a target nucleic acid associated with the condition or disease; wherein the CRISPR effector protein associates with the guide molecule to form a complex; wherein the complex binds to a target nucleic acid sequence that is complementary to the at least 15 nucleotides of the spacer sequence; and wherein upon binding of the complex to the target nucleic acid sequence the CRISPR effector protein cleaves the target nucleic acid, thereby treating the condition or disease in the subject.
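The requirement that the spacer be complementary to at least 15 nucleotides of the target can be screened by reverse-complementing the spacer and finding the longest exact match in the target strand. This Python sketch assumes contiguous, Watson-Crick-only complementarity and a target strand written 5’→3’; mismatch tolerance, PAM placement, and strand selection are not modeled. The function names and the 20-nt example spacer are hypothetical.

```python
# Complement table for DNA or RNA input; output is written in the DNA alphabet.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp_dna(seq: str) -> str:
    """Reverse complement of a DNA/RNA sequence, returned as DNA (5'->3')."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_stretch(spacer: str, target: str) -> tuple[int, int]:
    """Length and target start index of the longest contiguous target region that
    pairs perfectly (antiparallel) with part of the spacer; start is -1 if none."""
    probe = revcomp_dna(spacer)  # what a fully complementary target strand reads 5'->3'
    t = target.upper()
    best_len, best_end = 0, 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(probe) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if probe[i - 1] == t[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the diagonal run of matches
                if cur[j] > best_len:
                    best_len, best_end = cur[j], j
        prev = cur
    return best_len, (best_end - best_len) if best_len else -1


def spacer_matches_target(spacer: str, target: str, min_len: int = 15) -> bool:
    """True if at least `min_len` contiguous target nucleotides pair with the spacer."""
    return longest_complementary_stretch(spacer, target)[0] >= min_len


spacer = "UCAGGAUCCAUGCAAGCUUA"                  # hypothetical 20-nt RNA spacer
target = "GGGG" + revcomp_dna(spacer) + "GGGG"   # hypothetical target strand
print(longest_complementary_stretch(spacer, target))  # (20, 4)
print(spacer_matches_target(spacer, target))          # True
```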
- the condition or disease is a cancer or an infectious disease.
- the condition or disease is selected from the group consisting of Cystic Fibrosis, Duchenne Muscular Dystrophy, Becker Muscular Dystrophy, Alpha-1-antitrypsin Deficiency, Pompe Disease, Myotonic Dystrophy, Huntington Disease, Fragile X Syndrome, Friedreich's ataxia, Amyotrophic Lateral Sclerosis, Frontotemporal Dementia, Hereditary Chronic Kidney Disease, Hyperlipidemia, Hypercholesterolemia, Leber Congenital Amaurosis, Sickle Cell Disease, Beta Thalassemia, Familial Hypercholesterolemia (FH), Transthyretin Amyloidosis (ATTR), Primary Hyperoxaluria (PH1), Hereditary Angioedema (HAE), and Atherosclerotic Cardiovascular Disease (ASCVD).
- the condition or disease is cancer
- the cancer is selected from the group consisting of Wilms' tumor, Ewing sarcoma, a neuroendocrine tumor, a glioblastoma, a neuroblastoma, a melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, renal cancer, pancreatic cancer, lung cancer, biliary cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid carcinoma, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphoblastic leukemia, acute myelogenous leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, and urinary bladder cancer.
- the condition or disease is infectious, and wherein the infectious agent is selected from the group consisting of human immunodeficiency virus (HIV), herpes simplex virus-1 (HSV1), herpes simplex virus-2 (HSV2), and Hepatitis B.
- HIV human immunodeficiency virus
- HSV1 herpes simplex virus-1
- HSV2 herpes simplex virus-2
- the system provided herein or the cell provided herein is for use as a medicament.
- the system provided herein or the cell provided herein is for use in the treatment or prevention of a cancer or an infectious disease.
- the cancer is selected from the group consisting of Wilms' tumor, Ewing sarcoma, a neuroendocrine tumor, a glioblastoma, a neuroblastoma, a melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, renal cancer, pancreatic cancer, lung cancer, biliary cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid carcinoma, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphoblastic leukemia, acute myelogenous leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, and urinary bladder cancer.
- provided herein is use of the system provided herein or cell provided herein for an in vitro or ex vivo method of:
- provided herein is use of the system provided herein or the cell provided herein in a method of:
- the method does not comprise a process for modifying the germ line genetic identity of a human being and does not comprise a method of treatment of the human or animal body.
- cleaving the target DNA or target nucleic acid results in the formation of an indel.
- cleaving the target DNA or target nucleic acid results in the insertion of a nucleic acid sequence.
- a eukaryotic cell comprising a modified target locus of interest, wherein the target locus of interest has been modified according to a method or via use of a composition of any one of the preceding claims.
- the modification of the target locus of interest results in:
- the eukaryotic cell comprising altered expression of at least one gene product, wherein the expression of the at least one gene product is increased;
- the eukaryotic cell comprising altered expression of at least one gene product, wherein the expression of the at least one gene product is decreased;
- the eukaryotic cell comprises a mammalian cell.
- the mammalian cell comprises a human cell.
- FIG. 6 shows the editing efficiency of CasY7 and LbCpf1 in E. coli.
- FIG. 7 shows the plasmid map of the PHK09T vector.
- FIG. 11 shows an exemplary DR sequence and secondary structure corresponding to Cas12a.
- FIG. 12 shows an exemplary DR sequence and secondary structure corresponding to Cas12i.
- FIGs. 13A-13N show exemplary first stem-loop sequences and secondary structures.
- FIGs. 14A-14ZZ show exemplary DR sequences and secondary structures.
- FIGs. 15A-15F show additional exemplary first stem-loop sequences and secondary structures.
- FIGs. 16A-16F show the statistical results of the editing efficiency mediated by various crRNAs.
- FIGs. 17A and 17B show exemplary methylation patterns of the stem-loop sequences.
- manipulating secondary structural features of the guide molecule, including adding one or more stem-loop structures to the 5’ end of the DR region of the guide molecule, can significantly enhance the target recognition and cleavage efficiency of a CRISPR-Cas system that uses such a modified guide molecule.
- the term “about” or “approximately” means an acceptable error for a particular value as determined by one of ordinary skill in the art, which depends in part on how the value is measured or determined. In certain embodiments, the term “about” or “approximately” means within 1, 2, 3, or 4 standard deviations. In certain embodiments, the term “about” or “approximately” means within 20%, 15%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.05%, or less of a given value or range.
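To make the percentage form of the "about" definition concrete, a relative-tolerance check is sketched below in Python. It implements only the percentage form (the standard-deviation form would need measurement data), and the function name and the 10% default are arbitrary choices for illustration; the text allows tolerances from 20% down to 0.05% or less.

```python
def is_about(value: float, reference: float, rel_tol: float = 0.10) -> bool:
    """True if `value` lies within rel_tol (0.10 = 10%) of `reference`."""
    if rel_tol < 0:
        raise ValueError("rel_tol must be non-negative")
    return abs(value - reference) <= rel_tol * abs(reference)


# Which integer stem lengths count as "about 7 base pairs" under a 20% tolerance?
print([n for n in range(1, 20) if is_about(n, 7, rel_tol=0.20)])  # [6, 7, 8]
```

With a 10% tolerance the same query returns only 7, which shows how strongly the reading of "about" depends on the tolerance chosen.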
- nucleic acid includes a nucleotide sequence described as having a “percent complementarity” to a specified second nucleotide sequence.
- a nucleotide sequence may have 80%, 90%, or 100% complementarity to a specified second nucleotide sequence, indicating that 8 of 10, 9 of 10, or 10 of 10 nucleotides of the sequence are complementary to the specified second nucleotide sequence.
- the nucleotide sequence 3’-TCGA-5’ is 100% complementary to the nucleotide sequence 5’-AGCT-3’.
- the nucleotide sequence 3’-TCGA-5’ is 100% complementary to a region of the nucleotide sequence 5’-TTAGCTGG-3’.
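The percent-complementarity arithmetic and the TCGA/AGCT examples above can be reproduced with a short Python sketch. It assumes both sequences are entered 5’→3’ (so 3’-TCGA-5’ is entered as `AGCT`), pairs them antiparallel, and counts only Watson-Crick DNA pairs; the function names are hypothetical.

```python
# Watson-Crick pairs for DNA, as in the TCGA/AGCT examples.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(a: str, b: str) -> float:
    """Percent of positions that pair when a and b (both 5'->3', equal length)
    are aligned antiparallel, i.e. a's 5' end faces b's 3' end."""
    a, b = a.upper(), b.upper()
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum((x, y) in PAIRS for x, y in zip(a, reversed(b)))
    return 100.0 * matches / len(a)


def best_complementary_window(probe: str, longer: str) -> tuple[int, float]:
    """Start index (in `longer`) and score of the best-pairing window for `probe`."""
    n = len(probe)
    if not 0 < n <= len(longer):
        raise ValueError("probe must be non-empty and no longer than the target")
    start = max(range(len(longer) - n + 1),
                key=lambda i: percent_complementarity(probe, longer[i:i + n]))
    return start, percent_complementarity(probe, longer[start:start + n])


# 3'-TCGA-5' is written 5'->3' as AGCT.
print(percent_complementarity("AGCT", "AGCT"))              # 100.0
print(best_complementary_window("AGCT", "TTAGCTGG"))        # (2, 100.0)
print(percent_complementarity("AAAAAAAAAA", "TTTTTTTTGG"))  # 80.0 (8 of 10 pair)
```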
- reverse complementary means that two nucleic acid sequences are complementary to each other when read in opposite directions.
- a pair of reverse complementary sequences can be in separate nucleic acid molecules or in different regions of a single nucleic acid molecule. In the latter case, the nucleic acid molecule is considered “self-complementary.”
- a “self-complementary” nucleic acid molecule can have at least two regions that are complementary or substantially complementary to each other when read in opposite directions. Under suitable conditions, a pair of reverse-complementary regions can base-pair with each other to form a double-stranded duplex, and the sequence between the reverse-complementary regions bends into an unpaired loop. The resulting structure is referred to as a “stem-loop,” a “hairpin,” or a “hairpin loop,” which is a secondary structure found in many self-complementary molecules.
- stem-loop sequence refers to a single-stranded polynucleotide sequence having at least two regions that are complementary or substantially complementary to each other when read in opposite directions, and thus capable of base-pairing with each other to form at least one double helix (referred to herein as a “stem”) and an unpaired loop.
- the resulting structure is known as a stem-loop structure, a hairpin, or a hairpin loop, which is a secondary structure found in many RNA molecules.
- in some embodiments, stem-loop structures do not require precise base pairing in the stem region.
- for example, the stem may comprise one or more base mismatches (“bulges”).
- in other embodiments, stem-loop structures require precise base pairing.
- in such embodiments, the stem base pairing does not include any mismatches.
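Whether a candidate sequence can close a stem of a given length, and whether that stem pairs precisely or with mismatches, can be checked by pairing nucleotides from the two ends inward. A minimal Python sketch, assuming equal-length stem arms with no bulged nucleotides; counting G-U wobble pairs as paired is an added assumption (common in RNA, but not stated in the text) that can be switched off. The helper names and the example hairpin are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def stem_mismatches(seq: str, stem_len: int, allow_wobble: bool = True) -> int:
    """Count unpaired positions when the first and last `stem_len` nucleotides of
    `seq` fold back on each other (equal arms, no bulges; DNA T is read as U)."""
    s = seq.upper().replace("T", "U")
    if stem_len < 1 or 2 * stem_len >= len(s):
        raise ValueError("stem must be at least 1 bp and leave room for a loop")
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    return sum((s[i], s[-1 - i]) not in allowed for i in range(stem_len))


def is_stem_loop(seq: str, stem_len: int, max_mismatches: int = 0,
                 allow_wobble: bool = True) -> bool:
    """Precise pairing when max_mismatches == 0; imperfect stems otherwise."""
    return stem_mismatches(seq, stem_len, allow_wobble) <= max_mismatches


# Hypothetical 7-bp stem closing a GAAA loop: ACGGAUC + GAAA + GAUCCGU
print(stem_mismatches("ACGGAUCGAAAGAUCCGU", 7))                  # 0
print(is_stem_loop("CCGGAUCGAAAGAUCCGU", 7))                     # False (one C-U mismatch)
print(is_stem_loop("CCGGAUCGAAAGAUCCGU", 7, max_mismatches=1))   # True
```

The `max_mismatches` parameter maps onto the two embodiments above: zero for stems that require precise base pairing, a positive value for stems that tolerate mismatches.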
- “duplexed,” “double-stranded,” or “hybridized” as used herein refer to multiple nucleic acid molecules or a region of a single nucleic acid molecule (e.g., the stem region in a stem-loop structure) formed by hybridization of two single strands of nucleic acids containing complementary sequences. As described herein, a pair of complementary sequences can be fully complementary or partially complementary.
- “hybridization” and “hybridizes” refer to pairing and binding of complementary nucleic acids. Hybridization occurs to varying extents between two nucleic acids depending on factors such as the degree of complementarity of the nucleic acids, the melting temperature (Tm) of the nucleic acids, and the stringency of hybridization conditions, as is well known in the art.
- stringency of hybridization conditions refers to conditions of temperature, ionic strength, and composition of a hybridization medium with respect to particular common additives such as formamide and Denhardt's solution. Determination of particular hybridization conditions relating to a specified nucleic acid is routine and is well known in the art, for instance, as described in J. Sambrook and D. W.
- High stringency hybridization conditions are those that allow hybridization of only substantially complementary nucleic acids. Typically, nucleic acids having about 85-100% complementarity are considered highly complementary and hybridize under high stringency conditions. Intermediate stringency conditions are exemplified by conditions under which nucleic acids having intermediate complementarity (about 50-84%), as well as those having a high degree of complementarity, hybridize. In contrast, low stringency hybridization conditions are those under which nucleic acids having a low degree of complementarity hybridize.
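The stringency tiers above map complementarity ranges onto hybridization conditions. The following Python sketch encodes that mapping literally, returning the most stringent tier under which hybridization would be expected; the 85% and 50% thresholds come from the text, while the function name and the treatment of values between 84% and 85% (as intermediate) are assumptions.

```python
def stringency_tier(percent_complementary: float) -> str:
    """Most stringent tier under which hybridization is expected:
    about 85-100% -> high, about 50-84% -> intermediate, lower -> low."""
    if not 0 <= percent_complementary <= 100:
        raise ValueError("percent complementarity must be between 0 and 100")
    if percent_complementary >= 85:
        return "high"
    if percent_complementary >= 50:
        return "intermediate"
    return "low"


for pct in (100, 90, 70, 30):
    print(pct, stringency_tier(pct))
```

Because lower-stringency conditions also permit more complementary pairs to hybridize, a "high" result implies hybridization under intermediate and low stringency as well.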
- a “modification” of an amino acid residue/position refers to a change of a primary amino acid sequence as compared to a starting amino acid sequence, wherein the change results from a sequence alteration involving said amino acid residue/position.
- typical modifications include substitution of the residue with another amino acid (e.g., a conservative or substantial substitution) , insertion of one or more (e.g., generally fewer than 5, 4, or 3) amino acids adjacent to said residue/position, and/or deletion of said residue/position.
- derivative refers to a peptide or polypeptide that comprises an amino acid sequence of the peptide or polypeptide, or a fragment of a peptide or polypeptide, which has been altered by the introduction of amino acid residue substitutions, deletions, or additions.
- derivative also refers to a peptide or polypeptide, or a fragment of a peptide or polypeptide, which has been chemically modified, e.g., by the covalent attachment of any type of molecule to the polypeptide.
- a peptide or polypeptide or a fragment of the peptide or polypeptide may be chemically modified, e.g., by glycosylation, acetylation, pegylation, phosphorylation, amidation, derivatization by known protecting/blocking groups, proteolytic cleavage, chemical cleavage, formulation, metabolic synthesis of tunicamycin, linkage to a cellular ligand or other protein, etc.
- the derivatives are modified in a manner that is different from naturally occurring or starting peptide or polypeptides, either in the type or location of the molecules attached. Derivatives further include deletion of one or more chemical groups which are naturally present on the peptide or polypeptide.
- a functional derivative of a peptide or polypeptide described herein shares at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity with respect to the starting (e.g., wild-type) peptide or polypeptide.
- sequence identity refers to a relationship between the sequences of two or more biological molecules (e.g., a pair of polynucleotides or multiple polypeptides) , as determined by aligning and comparing the respective sequences. “Percent (%) amino acid sequence identity” with respect to a reference amino acid sequence (e.g., a reference polypeptide) is defined as the percentage of amino acid residues in a candidate sequence that are identical with the amino acid residues in the reference amino acid sequence, after aligning the two sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity.
- Alignment for purposes of determining percent amino acid sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN, or MEGALIGN (DNAStar, Inc. ) software. Those skilled in the art can determine appropriate parameters for aligning sequences, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared. Exemplary parameters for determining relatedness of two or more sequences using the BLAST algorithm, for example, can be as set forth below.
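Once an alignment has been produced (for example, with BLAST), the percentage defined above reduces to counting. The minimal Python sketch below scores an existing gapped alignment; it does not perform the alignment itself. The denominator (residues in the candidate) is an assumed reading of the definition, since tools differ on this point, and the toy sequences are illustrative only.

```python
def percent_identity(aligned_candidate: str, aligned_reference: str) -> float:
    """Percent identity of a candidate against a reference, given a gapped alignment.

    Gaps are written as '-'. The denominator is the number of residues in the
    candidate (an assumption; other conventions divide by the reference length
    or by the alignment length instead).
    """
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    candidate_residues = sum(1 for c in aligned_candidate if c != "-")
    if candidate_residues == 0:
        raise ValueError("candidate contains no residues")
    # Count positions where the candidate has a residue identical to the reference.
    identical = sum(
        1
        for c, r in zip(aligned_candidate, aligned_reference)
        if c != "-" and c == r
    )
    return 100.0 * identical / candidate_residues


# Toy alignment (not from this document): one gap and one substitution (L vs I).
print(round(percent_identity("MKT-LLV", "MKTALIV"), 1))  # 83.3
```

Here 5 of the candidate's 6 residues match, giving about 83.3%; the same function can be applied to nucleotide alignments.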
- “CRISPR-associated protein,” “Cas protein,” and “CRISPR effector protein” are used interchangeably herein to refer to any of the proteins presented in, and/or meeting the criteria of, the classification of CRISPR-Cas systems (see P. Mohanraju et al., “Diverse evolutionary roots and mechanistic variations of the CRISPR-Cas systems,” Science. 2016 Aug 5; 353 (6299): aad5147; Makarova et al., “Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants,” Nat Rev Microbiol. 2020 Feb; 18 (2): 67-83).
- Cpf1 CRISPR-associated protein Cpf1, subtype PREFRAN
- Cpf1 is a large protein (about 1300 amino acids) that contains a RuvC-like nuclease domain homologous to the corresponding domain of Cas9 along with a counterpart to the characteristic arginine-rich cluster of Cas9.
- Cpf1 lacks the HNH nuclease domain that is present in all Cas9 proteins, and the RuvC-like domain is contiguous in the Cpf1 sequence, in contrast to Cas9 where it contains long inserts including the HNH domain.
- the CRISPR-Cas protein described herein comprises a RuvC-like nuclease domain and lacks an HNH nuclease domain.
- Type V CRISPR effector proteins include Cas12a (Cpf1), Cas12b1 (C2c1), Cas12b2, Cas12c (C2c3), Cas12d (CasY), Cas12e (CasX), Cas12f1 (Cas14a), Cas12f2 (Cas14b), Cas12f3 (Cas14c), Cas12g, Cas12h, Cas12i, Cas12j (CasΦ-2), Cas12k (C2c5), Cas12l, C2c4, C2c8, C2c9, and C2c10.
- Type V CRISPR effector proteins also encompass functional derivatives of an endogenously encoded Type V CRISPR effector protein, or artificially designed proteins, that (i) retain the function of a Cas protein (e.g., the ability to form a binary complex with crRNA and/or a ternary complex with crRNA and a target sequence, to recognize PAM signatures, and to nick or cleave the target sequence, etc.) and (ii) meet the criteria for Type V classification as known in the art.
- a Type V CRISPR effector protein has a RuvC-like nuclease domain but lacks an HNH domain.
- the RuvC-like endonuclease domain comprises one or more RuvC motifs selected from a RuvC I motif, a RuvC II motif, and a RuvC III motif.
- the RuvC I motif comprises the amino acid sequence X1X2X3DX4X5X6X7, wherein X1 is L, I, V, or M; X2 is G, S, or A; X3 is I, V, or L; X4 is L or R; X5 is G or N; X6 is E, Q, I, or L; and X7 is R, T, K, or N.
- the RuvC I motif comprises the amino acid sequence X1XDXNX6X7XXXX11, wherein X1 is A, G, or S; X is any amino acid; X6 is Q or I; X7 is T, S, or V; and X11 is T or A.
- the RuvC II motif comprises the amino acid sequence X1X2X3EX4X5, wherein X1 is I, V, or L; X2 is V or A; X3 is L, I, M, F, or V; X4 is D, N, K, or S; and X5 is L, A, or D.
- the RuvC-like nuclease domain comprises one or more RuvC motifs selected from a RuvC I motif: X1X2X3DX4X5X6X7, wherein X1 is L, I, V, or M; X2 is G, S, or A; X3 is I, V, or L; X4 is L or R; X5 is G or N; X6 is E, Q, I, or L; X7 is R, T, K, or N; a RuvC II motif: X1X2X3EX4X5, wherein X1 is I, V, or L; X2 is V or A; X3 is L, I, M, F, or V; X4 is D, N, K, or S; X5 is L, A, or D; and a RuvC III motif: X1X2DXX3X4X5XX6X7X8, wherein X1 is D, N, or H
- the RuvC-like nuclease domain comprises one or more RuvC motifs selected from a RuvC I motif: X1XDXNX6X7XXXX11, wherein X1 is A, G, or S, X is any amino acid, X6 is Q or I, X7 is T, S, or V, and X11 is T or A; a RuvC II motif: X1X2X3E, wherein X1 is C, F, I, L, M, P, V, W, or Y, X2 is C, F, I, L, M, P, R, V, W, or Y, and X3 is C, F, G, I, L, M, P, V, W, or Y; and a RuvC III motif: X1SHX4DX6X7, wherein X1 is S or T, X4 is Q or L, X6 is P or S, and X7 is F or L.
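Degenerate motifs of this kind translate directly into regular expressions, which makes it easy to scan a candidate protein for them. The Python sketch below is a hypothetical helper, not a method from this document: it transcribes one embodiment of each motif (RuvC I as X1X2X3DX4X5X6X7, RuvC II as X1X2X3EX4X5, RuvC III as X1SHX4DX6X7), and the test sequence is synthetic. A match only shows consistency with a pattern, not that a RuvC-like domain is present.

```python
import re

# Character classes transcribed from the motif definitions above.
RUVC_PATTERNS = {
    "RuvC I": "[LIVM][GSA][IVL]D[LR][GN][EQIL][RTKN]",
    "RuvC II": "[IVL][VA][LIMFV]E[DNKS][LAD]",
    "RuvC III": "[ST]SH[QL]D[PS][FL]",
}


def find_ruvc_motifs(protein: str) -> list[tuple[str, int, str]]:
    """Return (motif name, 0-based start, matched peptide) for every hit."""
    hits = []
    for name, pattern in RUVC_PATTERNS.items():
        # A zero-width lookahead lets overlapping matches be reported.
        for m in re.finditer(f"(?=({pattern}))", protein.upper()):
            hits.append((name, m.start(), m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Synthetic sequence with one instance of each motif embedded in alanine filler.
print(find_ruvc_motifs("AAAIGIDLGERAAAIVLEDLAAATSHQDPFAA"))
# [('RuvC I', 3, 'IGIDLGER'), ('RuvC II', 14, 'IVLEDL'), ('RuvC III', 23, 'TSHQDPF')]
```

The truncated RuvC III variant (X1X2DXX3X4X5XX6X7X8) is deliberately not encoded, since its full definition is not given here.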
- a Type V CRISPR effector protein has a zinc-finger protein domain.
- the RuvC-like nuclease domain is split into discontinuous segments in the amino acid sequence of a Type V CRISPR effector protein, with the zinc-finger protein domain sequence placed between the RuvC-like nuclease domain sequences.
- a Type V CRISPR effector protein has a Wedge (WED) domain. In some embodiments, a Type V CRISPR effector protein has a REC domain.
- a Type V CRISPR effector protein is less than about 1400, 1300, 1200, 1100, 1000, 900, 800, 750, 700, 650, 600, 550, 500, 450, 400, 350, 300, 250, or 200 amino acids in length. In some embodiments, a Type V CRISPR effector protein is more than about 400, 350, 325, 300, 275, 250, 225, 200, or 175 amino acids in length.
- tracr sequence refers to trans-activating CRISPR RNA.
- tracrRNA includes any polynucleotide sequence that has sufficient complementarity with a crRNA sequence to hybridize.
- CRISPR array refers to a nucleic acid (e.g., DNA) fragment comprising CRISPR repeats and spacers, which begins from the first nucleotide of the first CRISPR repeat and ends at the last nucleotide of the last (terminal) CRISPR repeat. Typically, each spacer in the CRISPR array is located between two repeats.
- “CRISPR repeat,” “CRISPR direct repeat,” or “direct repeat” refers to a plurality of short direct repeat sequences that exhibit very little or no sequence variation in a CRISPR array. Type V CRISPR direct repeats may form a stem-loop structure.
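Because a CRISPR array begins and ends with a repeat and places each spacer between two repeats, recovering the spacers is a matter of splitting on the repeat. The Python sketch below assumes the repeats are identical (the "no sequence variation" case above); the repeat and spacer sequences are hypothetical, not entries from Table 2.

```python
def extract_spacers(array: str, repeat: str) -> list[str]:
    """Split a CRISPR array into the spacers lying between identical repeats.

    Assumes the array starts and ends with the repeat and that the repeats
    carry no sequence variation (arrays with variant repeats need fuzzy matching).
    """
    if not (array.startswith(repeat) and array.endswith(repeat)):
        raise ValueError("array must begin and end with the repeat")
    # split() yields '' before the first repeat and after the last one.
    return array.split(repeat)[1:-1]


REPEAT = "GTTTCAAC"  # hypothetical repeat, not a Table 2 sequence
ARRAY = REPEAT + "ACGTACGTAA" + REPEAT + "TTGGCCAATT" + REPEAT
print(extract_spacers(ARRAY, REPEAT))  # ['ACGTACGTAA', 'TTGGCCAATT']
```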
- a direct repeat sequence can be naturally existing or non-naturally existing (e.g., artificially engineered or synthesized) .
- crRNA is used interchangeably with guide molecule, gRNA, and guide RNA, and refers to nucleic acid-based molecules, including but not limited to RNA-based molecules, that are capable of forming complexes with Cas proteins (e.g., any of the Cas12 proteins described herein) (e.g., via the direct repeat, DR) and that comprise sequences (e.g., spacers) sufficiently complementary to a target nucleic acid sequence to hybridize to the target nucleic acid sequence and guide sequence-specific binding of the complex to it.
- a crRNA can be naturally existing or non-naturally existing (e.g., artificially engineered or synthesized) .
- secondary structures e.g., stem-loop structures
- “Type V CRISPR RNA” or “Type V crRNA” are used interchangeably to refer to a crRNA that is capable of forming CRISPR-Cas complexes with Type V CRISPR effector proteins.
- Type V crRNA encompasses artificially designed guide RNA molecules, as well as guide RNA molecules endogenously produced from a CRISPR array located adjacent to (e.g., to the 3’ end of) a Type V CRISPR-Cas locus.
- non-naturally existing refers to “not found in nature. ”
- a non-naturally existing nucleic acid molecule as described herein is intended to mean that the nucleic acid molecule is not found in nature.
- a non-naturally occurring nucleic acid encoding a peptide or protein contains at least one genetic alteration or chemical modification not normally found in nature.
- a functional derivative refers to a derivative that retains one or more functions or activities of the naturally occurring or starting peptide or polypeptide from which it was derived.
- a functional derivative of a starting peptide or polypeptide has an amino acid sequence with at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the starting peptide or polypeptide.
- targeting refers to the ability of a complex including a CRISPR-associated protein and a guide molecule, such as a crRNA, to preferentially or specifically bind to, e.g., hybridize to, a specific target nucleic acid compared to other nucleic acids that do not have the same or similar sequence as the target nucleic acid.
- target nucleic acid refers to a specific nucleic acid substrate that contains a nucleic acid sequence complementary to the entirety or a part of the spacer in a guide molecule.
- the target nucleic acid comprises a gene or a sequence within a gene.
- the target nucleic acid comprises a non-coding region (e.g., a promoter). In some embodiments, the target nucleic acid is single-stranded. In some embodiments, the target nucleic acid is double-stranded.
- nick or its grammatical variants such as “nicking” refers to the creation of a break in only one strand of a double-stranded nucleic acid molecule.
- cleave or its grammatical variant such as “cleaving” refers to the creation of breaks in both strands of a double-stranded nucleic acid molecule.
- a cleaving event is the result of two sequential nicking events in the two strands, respectively.
- the term “donor template nucleic acid,” as used herein, refers to a nucleic acid molecule that can be used by one or more cellular proteins to alter the structure of a target nucleic acid after a CRISPR enzyme described herein has altered the target nucleic acid.
- the donor template nucleic acid is a double-stranded nucleic acid.
- the donor template nucleic acid is a single-stranded nucleic acid.
- the donor template nucleic acid is linear.
- the donor template nucleic acid is circular (e.g., a plasmid).
- the donor template nucleic acid is an exogenous nucleic acid molecule.
- the donor template nucleic acid is an endogenous nucleic acid molecule (e.g., a chromosome).
- At least 5 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, at least 4 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, at least 3 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated.
- up to 12 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 11 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 10 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated.
- up to 9 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 8 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 7 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated.
- up to 6 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 5 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 4 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated.
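Locating the region these methylation embodiments refer to can be sketched in Python. Assumptions not taken from the text: the first stem-loop is given on its own in dot-bracket notation as a single simple hairpin, the window starts at and includes the last loop nucleotide, and methylation is supplied as a per-nucleotide boolean mask (the chemical type of methylation is not modeled). The function name and example are hypothetical.

```python
def unmethylated_in_window(structure: str, methylated: list[bool]) -> int:
    """Count unmethylated nucleotides from the last loop nucleotide to the 3' end.

    `structure` is dot-bracket notation for the first stem-loop alone;
    `methylated[i]` flags whether nucleotide i carries a methylation.
    """
    if len(structure) != len(methylated):
        raise ValueError("structure and methylation mask must have equal length")
    first_close = structure.find(")")
    if first_close < 1 or structure[first_close - 1] != ".":
        raise ValueError("expected a hairpin whose loop precedes the 3' stem arm")
    last_loop = first_close - 1  # last unpaired loop position before the 3' arm
    return sum(1 for flag in methylated[last_loop:] if not flag)


# Hypothetical 12-nt hairpin (4-bp stem, 4-nt loop) methylated at positions 0, 1 and 11.
mask = [True, True] + [False] * 9 + [True]
print(unmethylated_in_window("((((....))))", mask))  # 4
```

The count can then be compared with the "at least" or "up to" thresholds above; if the window is meant to exclude the last loop nucleotide, start the slice one position later.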
- the crRNA further comprises a floater region (e.g. any floater region as described in Section 5.2.1 (a) (iv) (Floater Region)) 5’ to the first stem-loop sequence.
(ii) Second Stem Loop (Direct Repeat)
- the crRNA described herein comprises in the 5’ -to-3’ direction, a first stem-loop sequence, a connector region, and a second stem-loop sequence.
- the second stem-loop sequence is capable of forming a second stem-loop structure.
- the second stem-loop sequence comprises at least about 5 nucleotides, such as 5, 6, 7, 8, 9, 10, or more nucleotides. In some embodiments, the second stem-loop sequence comprises at most about 33 nucleotides, such as 33, 32, 31, 30, 29, or fewer nucleotides. In some embodiments, the second stem-loop sequence comprises about 5 nucleotides to about 30 nucleotides. In some embodiments, the second stem-loop sequence comprises about 6 nucleotides to about 24 nucleotides. In some embodiments, the second stem-loop sequence comprises about 10 nucleotides to about 20 nucleotides.
- the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 4-11 base pairs. In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 4-10 base pairs. In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 4-9 base pairs. In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 4-8 base pairs.
- the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 5-10 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 5-9 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 5-8 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 5 nucleotides.
- the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 6 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 7 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 8 nucleotides.
- the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 3-12 base pairs and a second loop of about 3-15 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 4-6 base pairs and a second loop of about 5-8 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 5 base pairs and a second loop of about 5 nucleotides.
- the second stem-loop structure comprises 5’ -X1X2X3X4X5NNnNNX6X7X8X9X10-3’ , wherein X1, X2, X3, X4, X5, X6, X7, X8, X9, and X10 can be any base, n can be any base or deletion, and N can be any base; wherein X1X2X3X4X5 and X6X7X8X9X10 can hybridize to each other to form a stem and make NNnNN form a loop.
- 5’ -X1X2X3X4X5-3’ and 5’ -X6X7X8X9X10-3’ are reverse complementary sequences.
- the stem formed by X1X2X3X4X5 and X6X7X8X9X10 does not contain base mismatches.
- the stem formed by X1X2X3X4X5 and X6X7X8X9X10 contains about 5%-10% base mismatches (percentage calculated based on the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem).
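The pairing rule for the 5-bp stem (X1 with X10, X2 with X9, and so on) can be checked mechanically. The Python sketch below counts stem positions that fail to pair; it returns a raw count rather than the percentage defined above. Treating G-U wobble pairs as optional is an assumption exposed as a flag, and the sequences are hypothetical.

```python
# Watson-Crick pairs; G-U wobble is optional because the text speaks of
# reverse complementary arms without mentioning wobble pairs.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def stem_mismatches(seq: str, stem_len: int, allow_wobble: bool = False) -> int:
    """Count unpaired positions in the stem of a single hairpin.

    The first `stem_len` nucleotides are paired against the last `stem_len`
    read 3'->5' (first with last, second with second-to-last, ...); whatever
    lies in between is treated as the loop.
    """
    if len(seq) < 2 * stem_len + 1:
        raise ValueError("sequence too short for this stem plus a loop")
    seq = seq.upper()
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    five_arm = seq[:stem_len]
    three_arm = seq[-stem_len:]
    return sum(
        1 for a, b in zip(five_arm, reversed(three_arm)) if (a, b) not in allowed
    )


# Hypothetical 5-bp stem GCAUC/GAUGC around a 5-nt loop UUAUU.
print(stem_mismatches("GCAUCUUAUUGAUGC", 5))  # 0
print(stem_mismatches("GCAUCUUAUUGAAGC", 5))  # 1
```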
- the second stem-loop structure comprises 5’-X1X2X3X4X5X6NNnNNX7X8X9X10X11X12-3’, wherein X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, and X12 can be any base, n can be any base or deletion, and N can be any base; wherein X1X2X3X4X5X6 and X7X8X9X10X11X12 can hybridize to each other to form a stem and make NNnNN form a loop.
- 5’-X1X2X3X4X5X6-3’ and 5’-X7X8X9X10X11X12-3’ are reverse complementary sequences.
- the stem formed by X1X2X3X4X5X6 and X7X8X9X10X11X12 does not contain base mismatches.
- the stem formed by X1X2X3X4X5X6 and X7X8X9X10X11X12 contains about 5%-30% base mismatches (percentage calculated based on the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem).
- the second stem-loop structure comprises 5’-X1X2X3X4X5X6X7NNnNNX8X9X10X11X12X13X14-3’, wherein X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, and X14 can be any base, n can be any base or deletion, and N can be any base; wherein X1X2X3X4X5X6X7 and X8X9X10X11X12X13X14 can hybridize to each other to form a stem and make NNnNN form a loop.
- the stem formed by X1X2X3X4X5X6X7X8X9X10 and X11X12X13X14X15X16X17X18X19X20 contains about 5%-20% base mismatches (percentage calculated based on the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem).
- the second stem loop sequence is derived from a naturally-existing Type V crRNA or a functional derivative thereof.
- the naturally-existing Type V crRNA is also referred to as a Direct Repeat (DR).
- Exemplary DR sequences are provided in Table 2.
- Exemplary DR sequences and secondary structures are provided in FIGs. 11 and 12. More exemplary DR sequences and secondary structures are provided in FIGs. 14A-14ZZ.
- the naturally existing Type V crRNA is processed from a CRISPR array located 3’ to a Type V CRISPR-Cas locus.
- the Direct Repeat is processed from a CRISPR array located 3’ to a Type V CRISPR-Cas locus.
- the Type V crRNA or functional derivative thereof comprises a stem-loop structure for binding by a Type V Cas protein.
- the naturally existing Type V crRNA comprises a second stem loop sequence and a connector region sequence (or part of the connector region sequence) .
- the naturally existing Type V crRNA comprises any second stem loop sequence as described herein and a connector region sequence (or part of the connector region sequence) (e.g. any connector region sequence as described in Section 5.2.1 (a) (iii) (Connector Region)).
- the DR sequence comprises a second stem loop sequence and a connector region sequence (or part of the connector region sequence) .
- the DR sequence comprises any second stem loop sequence as described herein and a connector region sequence (or part of the connector region sequence) (e.g. any connector region sequence as described in Section 5.2.1 (a) (iii) (Connector Region) ) .
- the crRNA described herein comprises a first stem-loop sequence (e.g. any first stem-loop as described in Section 5.2.1 (a) (i) (First stem loop) ) connected to the 5’ end of a naturally-existing Type V crRNA or a functional derivative thereof.
- the crRNA described herein comprises at least one stem-loop sequence (e.g. any first stem-loop as described in Section 5.2.1 (a) (i) (First stem loop) ) connected to the 5’ end of a naturally-existing Type V crRNA or a functional derivative thereof.
- the naturally existing Type V crRNA has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% to a sequence set forth in Table 2.
- the naturally existing Type V crRNA comprises a nucleotide sequence set forth in Table 2.
- the naturally existing Type V crRNA consists of a nucleotide sequence set forth in Table 2. Exemplary naturally existing Type V crRNA sequences and secondary structures are provided in FIGs. 11 and 12. More exemplary naturally existing Type V crRNA sequences and secondary structures are provided in FIGs. 14A-14ZZ.
- the naturally existing Type V crRNA has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% to a sequence of any of SEQ ID NOs: 18-70.
- the naturally existing Type V crRNA comprises a nucleotide sequence of any of SEQ ID NOs: 18-70.
- the naturally existing Type V crRNA consists of a nucleotide sequence of any of SEQ ID NOs: 18-70.
- the DR has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% to a sequence set forth in Table 2.
- the DR comprises a nucleotide sequence set forth in Table 2.
- the DR consists of a nucleotide sequence set forth in Table 2. Exemplary DR sequences and secondary structures are provided in FIGs. 11 and 12. More exemplary DR sequences and secondary structures are provided in FIGs. 14A-14ZZ.
- the DR has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% to a sequence of any of SEQ ID NOs: 18-70.
- the DR comprises a nucleotide sequence of any of SEQ ID NOs: 18-70.
- the DR consists of a nucleotide sequence of any of SEQ ID NOs: 18-70.
- the crRNA described herein comprises in the 5’ -to-3’ direction, a first stem-loop sequence (e.g. any first stem-loop sequence as described in Section 5.2.1 (a) (i) (First Stem Loop) ) , a connector region sequence (e.g. any connector region sequence as described in Section 5.2.1 (a) (iii) (Connector Region) ) , and any second stem-loop sequence described herein.
- the crRNA further comprises a spacer region (e.g. any spacer region as described in Section 5.2.1 (a) (v) (Spacer Region) ) 3’ to the second stem-loop sequence.
- the crRNA further comprises a floater region (e.g. any floater region as described in Section 5.2.1 (a) (iv) (Floater Region) ) 5’ to the first stem-loop sequence.
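The 5’-to-3’ order of regions described above (optional floater, first stem-loop, connector, second stem-loop, spacer) can be captured in a small data structure that concatenates the parts and reports their coordinates. The Python sketch below is hypothetical: `CrRNALayout` and its methods are illustrative names, the floater (CAU) and connector (AUU) are sequences listed in this section, and the stem-loop and spacer sequences are placeholders, not Table 2 entries.

```python
from dataclasses import dataclass


@dataclass
class CrRNALayout:
    """Hypothetical container for the 5'-to-3' crRNA regions described above."""

    first_stem_loop: str
    connector: str
    second_stem_loop: str
    spacer: str = ""
    floater: str = ""  # optional; empty string when the floater is absent

    def _parts(self) -> list[tuple[str, str]]:
        # Regions in 5'-to-3' order.
        return [
            ("floater", self.floater),
            ("first_stem_loop", self.first_stem_loop),
            ("connector", self.connector),
            ("second_stem_loop", self.second_stem_loop),
            ("spacer", self.spacer),
        ]

    def assemble(self) -> str:
        seq = "".join(part for _, part in self._parts()).upper()
        if set(seq) - set("ACGU"):
            raise ValueError("crRNA regions must use the RNA alphabet ACGU")
        return seq

    def regions(self) -> dict[str, tuple[int, int]]:
        """0-based, end-exclusive coordinates of each non-empty region."""
        coords: dict[str, tuple[int, int]] = {}
        pos = 0
        for name, part in self._parts():
            if part:
                coords[name] = (pos, pos + len(part))
                pos += len(part)
        return coords


layout = CrRNALayout(
    floater="CAU",                       # floater sequence listed in the text
    first_stem_loop="GCGCUUCGGCGC",      # hypothetical hairpin
    connector="AUU",                     # connector sequence listed in the text
    second_stem_loop="GCAUCUUAUUGAUGC",  # hypothetical, not a Table 2 DR
    spacer="ACGUACGUACGUACGUACGU",       # hypothetical 20-nt spacer
)
print(len(layout.assemble()))         # 53
print(layout.regions()["connector"])  # (15, 18)
```

Leaving `floater` empty models the embodiments in which the floater region is absent; the remaining coordinates then start at 0.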
- the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence (e.g. any first stem-loop sequence as described in Section 5.2.1 (a) (i) (First Stem Loop)) and a naturally-existing Type V crRNA or a functional derivative thereof.
- the crRNA further comprises a spacer region (e.g. any spacer region as described in Section 5.2.1 (a) (v) (Spacer Region) ) 3’ to the naturally-existing Type V crRNA or a functional derivative thereof.
- the crRNA further comprises a floater region (e.g. any floater region as described in Section 5.2.1 (a) (iv) (Floater Region)) 5’ to the first stem-loop sequence.
(iii) Connector region
- the connector region comprises at least about 3 nucleotides, such as 3, 4, 5, 6, or more nucleotides. In some embodiments, the connector region comprises at most about 25 nucleotides, such as 25, 24, 23, 22, 21, 20, or fewer nucleotides. In some embodiments, the connector region comprises about 3-23 nucleotides. In some embodiments, the connector region comprises about 3-20 nucleotides. In some embodiments, the connector region comprises about 4-19 nucleotides. In some embodiments, the connector region comprises about 5-19 nucleotides. In some embodiments, the connector region comprises 4 nucleotides. In some embodiments, the connector region comprises 5 nucleotides. In some embodiments, the connector region comprises 6 nucleotides. In some embodiments, the connector region comprises 7 nucleotides. In some embodiments, the connector region comprises 8 nucleotides.
- the connector region comprises a nucleotide sequence of any one of the connector regions as indicated in Table 1 and Table 2.
- the connector region comprises a nucleotide sequence having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the nucleotide sequence of AUU.
- the connector region comprises the nucleotide sequence of AUU.
- the connector region consists of the nucleotide sequence of AUU.
- the connector region comprises a nucleotide sequence having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the nucleotide sequence of AGAAAU.
- the connector region comprises the nucleotide sequence of AGAAAU.
- the connector region consists of the nucleotide sequence of AGAAAU.
- the connector region comprises a nucleotide sequence having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the nucleotide sequence of UCUGCU.
- the connector region comprises the nucleotide sequence of UCUGCU.
- the connector region consists of the nucleotide sequence of UCUGCU.
- the connector region comprises a nucleotide sequence having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the nucleotide sequence of AAUUUUU.
- the connector region comprises the nucleotide sequence of AAUUUUU.
- the connector region consists of the nucleotide sequence of AAUUUUU.
- the connector region comprises a nucleotide sequence having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the nucleotide sequence of GUUUAAA.
- the connector region comprises the nucleotide sequence of GUUUAAA.
- the connector region consists of the nucleotide sequence of GUUUAAA.
- the connector region comprises a nucleotide sequence having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the nucleotide sequence of CCCACAAUACCUGAGAAAU (SEQ ID NO: 71).
- the connector region comprises the nucleotide sequence of CCCACAAUACCUGAGAAAU (SEQ ID NO: 71) .
- the connector region consists of the nucleotide sequence of CCCACAAUACCUGAGAAAU (SEQ ID NO: 71) .
- the connector region comprises a nucleotide sequence having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the nucleotide sequence of GUUGCAAAACCCAAGAAAU (SEQ ID NO: 72).
- the connector region comprises the nucleotide sequence of GUUGCAAAACCCAAGAAAU (SEQ ID NO: 72) .
- the connector region consists of the nucleotide sequence of GUUGCAAAACCCAAGAAAU (SEQ ID NO: 72) .
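A quick cross-check relates the listed connector sequences to the length ranges stated for the connector region (about 3-23, 3-20, 4-19, and 5-19 nucleotides). In the Python sketch below the ranges are treated as strict inclusive bounds, which ignores the "about" qualifier, so this is an approximation rather than a claim interpretation.

```python
# Connector sequences listed above and the stated length ranges (inclusive, in nt).
CONNECTORS = [
    "AUU",
    "AGAAAU",
    "UCUGCU",
    "AAUUUUU",
    "GUUUAAA",
    "CCCACAAUACCUGAGAAAU",  # SEQ ID NO: 71
    "GUUGCAAAACCCAAGAAAU",  # SEQ ID NO: 72
]
LENGTH_RANGES = [(3, 23), (3, 20), (4, 19), (5, 19)]


def ranges_satisfied(connector: str) -> list[tuple[int, int]]:
    """Length ranges (treated as strict bounds) that contain this connector."""
    n = len(connector)
    return [(lo, hi) for lo, hi in LENGTH_RANGES if lo <= n <= hi]


for seq in CONNECTORS:
    print(f"{seq:<20} {len(seq):>2} nt  {ranges_satisfied(seq)}")
```

Run as written, it shows that the 3-nt AUU connector falls within the 3-23 and 3-20 nt ranges but not the 4-19 or 5-19 nt ones, while the two SEQ ID NO connectors are 19 nt long, exactly the upper bound of the latter two ranges.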
- the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence (e.g. any first stem-loop sequence as described in Section 5.2.1 (a) (i) (First Stem Loop)), any connector region sequence as described herein, and a second stem-loop sequence (e.g. any second stem-loop sequence as described in Section 5.2.1 (a) (ii) (Second Stem Loop (Direct Repeat))).
- the crRNA further comprises a spacer region (e.g. any spacer region as described in Section 5.2.1 (a) (v) (Spacer Region) ) 3’ to the second stem-loop sequence.
- the crRNA further comprises a floater region (e.g. any floater region as described in Section 5.2.1 (a) (iv) (Floater Region)) 5’ to the first stem-loop sequence.
(iv) Floater region
- the floater region is absent. In some embodiments, the floater region comprises at least about 1 nucleotide, such as 1, 2, 3 or more nucleotides. In some embodiments, the floater region comprises 1 nucleotide. In some embodiments, the floater region comprises 2 nucleotides. In some embodiments, the floater region comprises 3 nucleotides. In some embodiments, the floater region comprises 4 nucleotides. In some embodiments, the floater region comprises 5 nucleotides. In some embodiments, the floater region comprises 6 nucleotides. In some embodiments, the floater region comprises 7 nucleotides. In some embodiments, the floater region comprises 8 nucleotides. In some embodiments, the floater region comprises 9 nucleotides. In some embodiments, the floater region comprises 10 nucleotides.
- the floater region comprises the nucleotide sequence of CAU. In some embodiments, the floater region consists of the nucleotide sequence of CAU.
- the floater region comprises the nucleotide sequence of U. In some embodiments, the floater region consists of the nucleotide sequence of U.
- the floater region comprises the nucleotide sequence of GG. In some embodiments, the floater region consists of the nucleotide sequence of GG.
- the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence (e.g. any first stem-loop sequence as described in Section 5.2.1 (a) (i) (First Stem Loop)), a connector region sequence (e.g. any connector region sequence as described in Section 5.2.1 (a) (iii) (Connector Region)), and a second stem-loop sequence (e.g. any second stem-loop sequence as described in Section 5.2.1 (a) (ii) (Second Stem Loop (Direct Repeat))), wherein the crRNA further comprises any floater region as described herein 5’ to the first stem-loop sequence.
- the crRNA further comprises a spacer region (e.g. any spacer region as described in Section 5.2.1 (a) (v) (Spacer Region)) 3’ to the second stem-loop sequence.
(v) Spacer region
- the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence, a connector region, and a second stem-loop sequence, wherein the crRNA further comprises a spacer region 3’ to the second stem-loop sequence.
- the spacer region comprises at least about 5 nucleotides, such as 5, 15, 20, 25, 30, or more nucleotides. In some embodiments, the spacer region comprises about 5-75 nucleotides. In some embodiments, the spacer region comprises about 20-40 nucleotides.
- the spacer region comprises about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75 or more nucleotides in length.
- the spacer is at least about 50%, at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% complementary to a target sequence.
- the degree of complementarity between the spacer and the target sequence is 100%.
- there are at least about 15 base pairs (e.g., at least about any of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 or more base pairs) between the spacer sequence and the target sequence of the target nucleic acid (e.g., DNA).
- the cleavage efficiency of the Type V CRISPR effector protein, as mediated by the crRNA, can be adjusted by introducing one or more mismatches (e.g., 1 or 2 mismatches) between the spacer sequence and the target sequence, including by selecting the positions of the mismatches along the spacer/target sequence. Mismatches, such as double mismatches, have a greater impact on cleavage efficiency when they are located more centrally in the spacer (i.e., not at the 3′ or 5′ end of the spacer).
- the cleavage efficiency of the Type V CRISPR effector protein can be tuned. For example, if less than 100% cleavage of the target sequence is desired (e.g., in a population of cells), 1 or 2 mismatches between the spacer sequence and the target sequence can be introduced into the spacer sequence.
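As a rough illustration of the mismatch positions discussed above, the sketch below compares a spacer with the protospacer (the strand whose sequence the spacer matches, so matching it is equivalent to pairing with the target strand) written 5′-to-3′ in the same orientation, lists mismatch positions, and flags those in the central part of the spacer. The helper names, the example sequences, and the 3-nt end window used to define "central" are hypothetical; the text does not define a cutoff.

```python
def mismatch_positions(spacer: str, protospacer: str) -> list[int]:
    """1-based positions where the spacer (RNA) differs from the protospacer
    (DNA), both written 5'->3' in the same orientation; U and T are equivalent."""
    s = spacer.upper().replace("U", "T")
    p = protospacer.upper().replace("U", "T")
    if len(s) != len(p):
        raise ValueError("spacer and protospacer must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(s, p)) if a != b]


def central_mismatches(positions: list[int], length: int, end_window: int = 3) -> list[int]:
    """Mismatches lying more than `end_window` nt from either end of the spacer.
    The default window of 3 nt is a hypothetical cutoff, not a value from the text."""
    return [pos for pos in positions if end_window < pos <= length - end_window]


spacer = "UACGAUCGAUCGAUCGAUCG"       # hypothetical 20-nt spacer
protospacer = "TACGATCGATGGATCGATCC"  # hypothetical target with 2 mismatches
mm = mismatch_positions(spacer, protospacer)
print(mm)                                   # [11, 20]
print(central_mismatches(mm, len(spacer)))  # [11]
```

Under the statement above, the central mismatch at position 11 would be expected to affect cleavage more than the terminal one at position 20.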
- mutations can be introduced into the spacer so that the Type V CRISPR-Cas system can distinguish between target and off-target sequences that have greater than 80%, 85%, 90%, or 95% complementarity.
- the degree of complementarity is from 80% to 95%, e.g., about 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% (for example, distinguishing a target of 18 nucleotides from an 18-nucleotide off-target having 1, 2, or 3 mismatches). Accordingly, in some embodiments, the degree of complementarity between the spacer sequence in a guide molecule and its corresponding target sequence is greater than 94.5%, 95%, 95.5%, 96%, 96.5%, 97%, 97.5%, 98%, 98.5%, 99%, 99.5%, or 99.9%. In some embodiments, the degree of complementarity is 100%.
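The 18-nucleotide example reduces to simple arithmetic: assuming no insertions or deletions, k mismatches in an 18-nt duplex leave (18 - k)/18 of the positions paired. That gives roughly 94.4%, 88.9%, and 83.3% for 1, 2, and 3 mismatches, which is why a threshold above 94.5% excludes even a single mismatch at this length. The helper name below is illustrative only.

```python
def percent_complementarity(length: int, mismatches: int) -> float:
    """Percentage of paired positions in an ungapped duplex of `length` nt."""
    return 100.0 * (length - mismatches) / length


for k in (1, 2, 3):
    # 18-nt target vs. an 18-nt off-target carrying k mismatches
    print(k, round(percent_complementarity(18, k), 1))
# 1 94.4
# 2 88.9
# 3 83.3
```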
- the extent of base-pairing between the spacer and a target sequence can modulate nuclease activity of a Type V CRISPR effector protein.
- the spacer has no more than about 18 continuous base pairs with the target sequence.
- the spacer has no more than 18, no more than 17, no more than 16, or no more than 15 continuous base pairs with the target sequence.
- the spacer has at least about 18 continuous base pairs with the target sequence.
- the spacer has at least 18, at least 19, at least 20 or more continuous base pairs with the target sequence.
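The "continuous base pairs" criteria above can be checked by scanning for the longest uninterrupted run of paired positions. This sketch assumes an ungapped comparison of the spacer against the protospacer written 5′-to-3′ in the same orientation, treats U and T as equivalent, and uses hypothetical 24-nt sequences with a single mismatch at position 21. The resulting 20-bp run satisfies "at least 18" but not "no more than 18". The function name is hypothetical.

```python
def longest_contiguous_match(spacer: str, protospacer: str) -> int:
    """Length of the longest uninterrupted run of matched positions between a
    spacer (RNA) and a protospacer (DNA) of equal length, same orientation.
    Matching the protospacer is equivalent to pairing with the target strand."""
    s = spacer.upper().replace("U", "T")
    p = protospacer.upper().replace("U", "T")
    if len(s) != len(p):
        raise ValueError("sequences must have the same length")
    best = run = 0
    for a, b in zip(s, p):
        run = run + 1 if a == b else 0  # reset the run at every mismatch
        best = max(best, run)
    return best


spacer = "UACGAUCGAUCGAUCGAUCGAUCG"       # hypothetical 24-nt spacer
protospacer = "TACGATCGATCGATCGATCGTTCG"  # mismatch at position 21
print(longest_contiguous_match(spacer, protospacer))  # 20
```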
- the crRNA described herein comprises in the 5’-to-3’ direction, a first stem-loop sequence (e.g. any first stem-loop sequence as described in Section 5.2.1 (a) (i) (First Stem Loop)), a connector region sequence (e.g. any connector region sequence as described in Section 5.2.1 (a) (iii) (Connector region)), and a second stem-loop sequence (e.g. any second stem-loop sequence as described in Section 5.2.1 (a) (ii) (Second Stem Loop (Direct Repeat))), wherein the crRNA further comprises any spacer region as described herein 3’ to the second stem-loop sequence.
- the crRNA further comprises a floater region (e.g. any floater region as described in Section 5.2.1 (a) (iv) (Floater region)) 5’ to the first stem-loop sequence.
(vi) Stem-Loop Modified Direct Repeat (SLDR) Sequences
- the crRNA described herein has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% to a sequence set forth in Table 3.
- the crRNA described herein comprises a nucleotide sequence set forth in Table 3.
- the crRNA described herein consists of a nucleotide sequence set forth in Table 3.
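The text does not state how sequence identity is computed. One common approach, assumed here, is a Needleman-Wunsch global alignment followed by counting identical aligned positions. The scoring values (match +1, mismatch -1, gap -1) and the use of alignment length as the denominator are illustrative assumptions, and published tools differ on both. The sequences are hypothetical, not entries from Table 3.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns the two aligned strings, with '-' marking gaps."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions divided by alignment length (gaps included)."""
    x, y = global_align(a.upper(), b.upper())
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


print(percent_identity("ACGUACGU", "ACGAACGU"))  # 87.5
```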
- RNA is amenable to both 5′ and 3′ end conjugations with a variety of functional moieties including fluorescent dyes, polyethylene glycol, or proteins.
- modifying an oligonucleotide with a 2′-OMe to improve nuclease resistance can change the binding energy of Watson-Crick base pairing.
- a 2′-OMe modification can affect how the oligonucleotide interacts with transfection reagents, proteins or any other molecules in the cell. The effects of these modifications can be determined by empirical testing.
- the crRNA includes one or more phosphorothioate modifications. In some embodiments, the crRNA includes one or more locked nucleic acids for the purpose of enhancing base pairing and/or increasing nuclease resistance.
- one or more nucleotides of the crRNA molecule are methylated.
- the sequences and the lengths of the crRNAs described herein can be optimized.
- the optimized length of crRNA can be determined by identifying the processed form of the crRNAs, or by empirical length studies of the crRNAs.
- the crRNAs can also include one or more aptamer sequences.
- Aptamers are oligonucleotide or peptide molecules that can bind to a specific target molecule.
- the aptamers can be specific to gene effectors, gene activators, or gene repressors.
- the aptamers can be specific to a protein, which in turn is specific to and recruits/binds to specific gene effectors, gene activators, or gene repressors.
- the effectors, activators, or repressors can be present in the form of fusion proteins.
- the crRNA has two or more aptamer sequences that are specific to the same adaptor protein.
- the two or more aptamer sequences are specific to different adaptor proteins.
- the adaptor proteins can include, e.g., MS2, PP7, Qβ, F2, GA, fr, JP501, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, φCb5, φCb8r, φCb12r, φCb23r, 7s, and PRR1.
- the aptamer is selected from aptamers that specifically bind any one of the adaptor proteins as described herein.
- the aptamer sequence is an MS2 loop.
- cDNA nucleic acid sequences encoding Type V CRISPR effector protein variants that have been codon-optimized for expression in bacteria (e.g., E. coli) and in human cells are disclosed herein.
- the codon-optimized sequences for human cells can be generated by replacing codons in the nucleotide sequence that occur at lower frequency in human cells with synonymous codons that occur at higher frequency in human cells.
- the frequency of occurrence for codons can be computationally determined by methods known in the art.
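The codon substitution described above can be sketched as a "most frequent synonymous codon" replacement. The `CODON_USAGE` table below is a hypothetical, truncated placeholder (four amino acids, illustrative fractions rather than measured human frequencies), and `optimize_cds` is a hypothetical helper. Practical optimization tools usually weigh further factors, such as GC content or repeated sequences, rather than always picking the top codon.

```python
# Hypothetical codon-usage fractions per amino acid ("*" = stop).
# Illustrative placeholders only; real values come from a host-specific codon usage table.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.40, "AAG": 0.60},
    "L": {"CTG": 0.40, "CTC": 0.20, "TTG": 0.13, "CTT": 0.13, "TTA": 0.07, "CTA": 0.07},
    "*": {"TGA": 0.52, "TAA": 0.28, "TAG": 0.20},
}


def optimize_cds(cds: str, usage: dict[str, dict[str, float]]) -> str:
    """Replace every codon with the most frequent synonymous codon in `usage`."""
    codon_to_aa = {c: aa for aa, codons in usage.items() for c in codons}
    best = {aa: max(codons, key=codons.get) for aa, codons in usage.items()}
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        aa = codon_to_aa[cds[i:i + 3]]  # KeyError if the codon is absent from the table
        out.append(best[aa])
    return "".join(out)


print(optimize_cds("ATGAAACTATAA", CODON_USAGE))  # ATGAAGCTGTGA
```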
- Calculated codon frequencies for various host cells (e.g., E. coli, yeast, insect, C. elegans, D. melanogaster, human, mouse, rat, pig, P. pastoris, A. thaliana, maize, and tobacco) have been published or made available by sources such as the Codon Usage Frequency Table Tool.
5.2.2 Cas proteins
(a) Type V CRISPR Effector Proteins
- the present application provides Type V CRISPR effector proteins that have single-stranded nucleic acid nicking activity or double-stranded nucleic acid cleavage activity.
- the Type V CRISPR effector protein forms a complex with a crRNA (e.g. a crRNA of Section 5.2.1 (crRNA) ) .
- the Type V CRISPR effector protein comprises a RuvC-like endonuclease domain.
- the RuvC-like endonuclease domain comprises one or more RuvC motifs selected from a RuvC I motif, a RuvC II motif, and a RuvC III motif.
- the Type V CRISPR effector protein comprises a RuvC I motif, a RuvC II motif and/or a RuvC III motif.
- the Type V CRISPR effector protein comprises a RuvC I motif.
- the Type V CRISPR effector protein comprises a RuvC II motif.
- the Type V CRISPR effector protein comprises a RuvC III motif.
- the Type V CRISPR effector protein comprises a RuvC I motif and a RuvC II motif. In some embodiments, the Type V CRISPR effector protein comprises a RuvC I motif and a RuvC III motif. In some embodiments, the Type V CRISPR effector protein comprises a RuvC II motif and a RuvC III motif. In some embodiments, the Type V CRISPR effector protein comprises a RuvC I motif, a RuvC II motif, and a RuvC III motif. In some embodiments, the RuvC-like endonuclease domain comprises one or more RuvC motifs selected from a RuvC I motif, a RuvC II motif and RuvC III motif.
- the RuvC I motif comprises the amino acid sequence of X1X2X3DX4X5X6X7, wherein X1 is L, I, V, or M; X2 is G, S, or A; X3 is I, V, or L; X4 is L or R; X5 is G or N; X6 is E, Q, I, or L; X7 is R, T, K, or N.
- the RuvC I motif comprises the amino acid sequence of X1XDXNX6X7XXXX11, wherein X1 is A or G or S, X is any amino acid, X6 is Q or I, X7 is T or S or V, and X11 is T or A.
- the RuvC II motif comprises the amino acid sequence of X1X2X3EX4X5, wherein X1 is I, V, or L; X2 is V or A; X3 is L, I, M, F, or V; X4 is D, N, K, or S; X5 is L, A, or D.
- the RuvC II motif comprises the amino acid sequence of X1X2X3E, wherein X1 is C or F or I or L or M or P or V or W or Y, X2 is C or F or I or L or M or P or R or V or W or Y, and X3 is C or F or G or I or L or M or P or V or W or Y.
- the RuvC III motif comprises the amino acid sequence of X1X2DXX3X4X5XX6X7X8, wherein X1 is D, N, or H; X2 is A, S, R, or G; X is any amino acid; X3 is N, V, I, or E; X4 is A, G, S, or K; X5 is A or S; X6 is N, H, V, or G; X7 is I, L, or V; X8 is A, G, or L.
- the RuvC III motif comprises the amino acid sequence of X1SHX4DX6X7, wherein X1 is S or T, X4 is Q or L, X6 is P or S, and X7 is F or L.
- the RuvC-like nuclease domain comprises one or more RuvC motifs selected from a RuvC I motif: X1X2X3DX4X5X6X7, wherein X1 is L, I, V, or M; X2 is G, S, or A; X3 is I, V, or L; X4 is L or R; X5 is G or N; X6 is E, Q, I, or L; X7 is R, T, K, or N; a RuvC II motif: X1X2X3EX4X5, wherein X1 is I, V, or L; X2 is V or A; X3 is L, I, M, F, or V; X4 is D, N, K, or S; X5 is L, A, or D; and a RuvC III motif: X1X2DXX3X4X5XX6X7X8, wherein X1 is D, N, or H
- the RuvC-like nuclease domain comprises one or more RuvC motifs selected from a RuvC I motif: X1XDXNX6X7XXXX11, wherein X1 is A or G or S, X is any amino acid, X6 is Q or I, X7 is T or S or V, and X11 is T or A; a RuvC II motif: X1X2X3E, wherein X1 is C or F or I or L or M or P or V or W or Y, X2 is C or F or I or L or M or P or R or V or W or Y, and X3 is C or F or G or I or L or M or P or V or W or Y; and a RuvC III motif: X1SHX4DX6X7, wherein X1 is S or T, X4 is Q or L, X6 is P or S, and X7 is F or L.
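The degenerate RuvC motifs above translate directly into regular expressions: each position's allowed residues become a character class, and "X" (any amino acid) becomes `.`. The sketch below scans a protein for either set of motif definitions. The dictionary names, the function, and the synthetic test peptide are hypothetical. A pattern hit alone does not establish a functional RuvC domain, since short degenerate motifs can also match by chance.

```python
import re

# First set of motif definitions (one character class per position; "." = any residue).
RUVC_MOTIFS = {
    "RuvC I": "[LIVM][GSA][IVL]D[LR][GN][EQIL][RTKN]",  # X1X2X3DX4X5X6X7
    "RuvC II": "[IVL][VA][LIMFV]E[DNKS][LAD]",  # X1X2X3EX4X5
    "RuvC III": "[DNH][ASRG]D.[NVIE][AGSK][AS].[NHVG][ILV][AGL]",  # X1X2DXX3X4X5XX6X7X8
}

# Alternative motif definitions.
RUVC_MOTIFS_ALT = {
    "RuvC I": "[AGS].D.N[QI][TSV]...[TA]",  # X1XDXNX6X7XXXX11
    "RuvC II": "[CFILMPVWY][CFILMPRVWY][CFGILMPVWY]E",  # X1X2X3E
    "RuvC III": "[ST]SH[QL]D[PS][FL]",  # X1SHX4DX6X7
}


def find_motifs(protein: str, motifs: dict[str, str] = RUVC_MOTIFS) -> dict[str, list[tuple[int, str]]]:
    """Return (1-based start, matched peptide) for every hit of every motif.
    The lookahead (?=...) lets overlapping hits be reported as well."""
    hits = {}
    for name, pattern in motifs.items():
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(protein.upper())]
    return hits


# Synthetic peptide built to contain one hit of each first-set motif.
peptide = "MKLGIDLGERAAAIVLEDLAAADADKNASKHIAK"
print(find_motifs(peptide))
# {'RuvC I': [(3, 'LGIDLGER')], 'RuvC II': [(14, 'IVLEDL')], 'RuvC III': [(23, 'DADKNASKHIA')]}
```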
- the Type V CRISPR effector protein comprises a PAM interacting (PI) domain.
- the PI domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 176-263 aa of Cas12i_2.
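Residue ranges such as "176-263 aa of Cas12i_2" appear to be 1-based and inclusive: each sequence given later in this section has exactly the length of the range it is paired with (for example, SEQ ID NO: 124, paired with 1-18 aa of Cas12i_2, is 18 residues long, and SEQ ID NO: 148, paired with 621-659 aa of Cas12b_8, is 39 residues long). The hypothetical helpers below make that reading explicit, which matters when converting to 0-based, end-exclusive string slices. The full Cas12i_2 sequence is not reproduced in this section, so only SEQ ID NO: 124 is used.

```python
def domain(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq, reading the range as 1-based and
    inclusive (as in "176-263 aa")."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range lies outside the sequence")
    return seq[start - 1:end]


def range_length(start: int, end: int) -> int:
    """Number of residues in an inclusive range."""
    return end - start + 1


# SEQ ID NO: 124, paired in this section with 1-18 aa of Cas12i_2.
SEQ_124 = "MATKTIVRPYTSNLSPNA"

print(len(SEQ_124), range_length(1, 18))  # 18 18
print(range_length(176, 263))             # 88
print(domain(SEQ_124, 1, 5))              # MATKT
```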
- the PI domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
- the PI domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 662-762 aa of Cas12a_Cpf1_8.
- the PI domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
- the PI domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 1-57 aa of cas12j (CasΦ-2).
- the PI domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
- the Type V CRISPR effector protein comprises an oligonucleotide-binding domain (OBD) .
- the OBD domain is also referred to as the wedge (WED) domain.
- the Type V CRISPR effector protein comprises two OBD sub-domains.
- the Type V CRISPR effector protein comprises three OBD sub-domains.
- the OBD domain comprises two OBD sub-domains.
- the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 1-18 aa of Cas12i_2.
- the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 433-577 aa of Cas12i_2.
- the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of MATKTIVRPYTSNLSPNA (SEQ ID NO: 124)
- the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
- the OBD domain comprises two OBD sub-domains.
- the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 1-13 aa of Cas12b_8, and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 390-508 aa of Cas12b_8.
- the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of MAVKSIKVKLRLD (SEQ ID NO: 126)
- the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
- the OBD domain comprises two OBD sub-domains.
- the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 1-15 aa of ISDra2_TnpB
- the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 115-183 aa of ISDra2_TnpB.
- the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of MIRNKAFVVRLYPNA (SEQ ID NO: 128)
- the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
- the OBD domain comprises two OBD sub-domains.
- the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 57-73 aa of cas12j
- the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 196-363 aa of cas12j
- the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of NFQPPAKCHVVTKSRDF (SEQ ID NO: 132)
- the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
- the OBD domain comprises two OBD sub-domains.
- the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 1-22 aa of cas12o
- the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 358-474 aa of cas12o
- the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of MAKYDPSNVEVTSAFNAPVRLE (SEQ ID NO: 136)
- the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
- the NUC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 867-990 aa of Cas12i_2.
- the NUC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
- the first NUC sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of AAFSSRFDARTGAPGIRCRRVPARCTQEHNPEPFPWWLNKFVVEHTLDACPLRADDLIPTGEGEIFVSPFSAEEGDFHQIH (SEQ ID NO: 139)
- the second NUC sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
- the NUC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
- the Type V CRISPR effector protein comprises a bridge helix (BH) domain.
- the BH domain is also referred to as the helical domain or the helical hairpin (HH) domain.
- the BH domain is inserted in the REC domain (e.g. REC2 domain) .
- the BH domain is inserted between the REC domain and the RuvC domain.
- the BH domain is inserted between the REC2 domain and the RuvC-II domain.
- the BH domain is inserted in the RuvC domain.
- the BH domain is inserted between the RuvC-I domain and the RuvC-II domain.
- the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of FDSDLFKLGECLSEKRVNKREERANRIVSSVLQICSRLNV (SEQ ID NO: 147).
- the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 621-659 aa of Cas12b_8.
- the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of KLPGETESKDLRAIREERQRTLRQLRTQLAYLRLLVRCG (SEQ ID NO: 148).
- the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 953-971 aa of Cas12a_Cpf1_8.
- the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of YHDKLAAIEKDRDSARKDW (SEQ ID NO: 149).
- the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 348-415 aa of Cas12f_16.
- the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
- the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 565-606 aa of cas12o
- the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of AKNIPVEDIRKIDKVTNMAKSVKSLIGYARQHLAAIKAKKFG (SEQ ID NO: 151) .
- the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of AAKKAMLDESFKFFDHAYTVFFSVFIKLWGGVKPTQVALVENDTNKIDAICSILWFRLQTKTDSTNITLQSAEERIRRFKEYAQHDPSPLALSYLTGNLDPEKHEWVDCRELYQNWCAELKCDLATDIETMINHNLLPISAKQEYNCYSSFSNLFGEAE (SEQ ID NO: 152)
- the second REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least
- the Type V CRISPR effector protein comprises two REC domains.
- the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 13-390 aa of Cas12b_8, and the second REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 659-821 aa of Cas12b_8.
- the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of DDMPEIRAGLWKLHKEVNAGVRYYTEWLSLLRQENLYRRSPNGDGEQECDKTAEECKAELLERLRARQVENGHRGPAGSDDELLQLARQLYELLVPQAIGAKGDAQQIARKFLSPLADKDAVGGLGIAKAGNKPRWVRMREAGEPGWEEEKEKAETRKSADRTADVLRALADFGLKPLMRVYTDSEMSSVEWKPLRKGQAVRTWDRDMFQQAIERMMSWESWNQRVGQEYAKLVEQKNRFEQKNFVGQEHLVHLVNQLQDMKEASPGLESKEQTAHYVT
- the REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 15-115 aa of ISDra2_TnpB.
- the REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
- the REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 19-192 aa of Cas12f_16.
- the REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
- the Type V CRISPR effector protein comprises two REC domains.
- the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 22-358 aa of cas12o
- the second REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 606-764 aa of cas12o
- the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of EGEEVLIDFIRNEILPAADKLLELLLFFRGKPFCLSGVNYSESDVDQKLKEIYNSVSIVPEKAKRFGVKDASDFAFDQFKDEAQKLYKFFIGEESPDDGNKIKQAATSFYAIFFAKATGNRITRNIPSICSSSLFPIASFANCNLGASITAEVERKIKSFEELQKLRNEEYTKLNNAGDHNPDGEDDGSETIFASAVVDVRRFCQSLYENSKTYGFKEFGKENIKSVSEFLSENVEQLRSIFAEKGGNFSFEDEADLSRHKIVTGYKANFVNAIYSDFDYVWKSRPDV
- the Type V CRISPR effector protein does not contain an HNH-like domain.
- the HNH domain shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 775-909 aa of SpCas9.
- the HNH domain shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 196-296 aa of OgeuIscB1.
- the HNH domain shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of KNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLS (SEQ ID NO: 166).
- the CRISPR effector protein of the present invention can recognize PAM (protospacer adjacent motif, protospacer adjacent motif) to act on the target sequence.
- the PAM comprises the nucleic acid sequence of 5’ -TTN-3’ or 5’ -NTN-3’ , wherein N is selected from A, T, C, G, and U.
- the PAM consists of the nucleic acid sequence of 5’ -TTN-3’ or 5’ -NTN-3’ , wherein N is selected from A, T, C, G, and U.
- the PAM comprises 5’ -TTA-3’ , 5’ -TTT-3’ , 5’ -TTG-3’ , 5-TTC-3’ , 5’ -ATA-3’ , or 5’ -ATG-3’ .
- the PAM consists of 5’ -TTA-3’ , 5’ -TTT-3’ , 5’ -TTG-3’ , 5-TTC-3’ , 5’ -ATA-3’ , or 5’ -ATG-3’ .
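To illustrate the 5’-TTN-3’ and 5’-NTN-3’ PAM definitions above, the Python sketch below scans one strand of a DNA sequence for PAM matches and returns the protospacer that immediately follows each one. It assumes a 5′ PAM (typical of Type V effectors), a hypothetical 20-nt protospacer length, and scans only the strand it is given. The function names are illustrative and do not come from the disclosure.

```python
import re

# Allowed bases for each PAM symbol; N matches any nucleotide (U included for RNA input).
PAM_SYMBOLS = {"A": "A", "C": "C", "G": "G", "T": "T", "U": "U", "N": "[ACGTU]"}


def pam_pattern(pam: str) -> re.Pattern:
    """Compile a PAM such as 'TTN' into a regex; the lookahead also reports overlapping hits."""
    body = "".join(PAM_SYMBOLS[symbol] for symbol in pam.upper())
    return re.compile(f"(?=({body}))")


def find_pam_sites(seq: str, pam: str = "TTN", spacer_len: int = 20) -> list[tuple[int, str]]:
    """Return (pam_start, protospacer) pairs for every PAM followed by a full-length protospacer.

    Assumes a 5' PAM: the protospacer begins immediately 3' of the PAM on the same strand.
    """
    seq = seq.upper()
    sites = []
    for match in pam_pattern(pam).finditer(seq):
        proto_start = match.start() + len(pam)
        proto_end = proto_start + spacer_len
        if proto_end <= len(seq):  # skip PAMs too close to the 3' end of the input
            sites.append((match.start(), seq[proto_start:proto_end]))
    return sites
```

Because 5’-TTN-3’ is a special case of 5’-NTN-3’, a search with `pam="NTN"` returns a superset of the `"TTN"` hits.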
- the Type V CRISPR effector protein is selected from Cas12a (Cpf1), Cas12b1 (C2c1), Cas12b2, Cas12c (C2c3), Cas12d (CasY), Cas12e (CasX), Cas12f1 (Cas14a), Cas12f2 (Cas14b), Cas12f3 (Cas14c), Cas12g, Cas12h, Cas12i, Cas12j (CasΦ-2), Cas12k (C2c5), Cas12l, C2c4, C2c8, C2c9, and C2c10, or a functional derivative thereof.
- the Type V CRISPR effector protein is selected from Cas12a (Cpf1), Cas12d (CasY), Cas12f2 (Cas14b), Cas12f3 (Cas14c), Cas12h, Cas12i, Cas12j (CasΦ-2), Cas12k (C2c5), Cas12l, C2c4, C2c8, C2c9, and C2c10, or a functional derivative thereof.
- the Type V CRISPR effector protein is a Cas12a (Cpf1) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12b1 (C2c1) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12b2 or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12c (C2c3) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12d (CasY) or a functional derivative thereof.
- the Type V CRISPR effector protein is a Cas12e (CasX) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12f1 (Cas14a) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12f2 (Cas14b) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12f3 (Cas14c) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12g or a functional derivative thereof.
- the Type V CRISPR effector protein is a Cas12h or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12i or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12j (CasΦ-2) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12k (C2c5) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12l or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a C2c4 or a functional derivative thereof.
- the Type V CRISPR effector protein is a C2c8 or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a C2c9 or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a C2c10 or a functional derivative thereof.
- a functional derivative of a Cas12 protein comprises an amino acid sequence having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to such Cas12 protein.
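Whether a sequence meets a percent-identity threshold depends on how identity is computed, and this section does not say. The minimal sketch below assumes the two sequences have already been aligned (gaps written as "-") and counts identical residues over every column that is not a gap in both sequences. Other conventions, such as dividing by the length of the reference protein, give different numbers. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Identical residues are counted over every column that is not a gap in both sequences
    (one common convention; others divide by the reference length instead).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no residue columns")
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if identity is at least `threshold` percent (e.g., 90.0)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `percent_identity("MKT-LV", "MKTALI")` counts 4 identical columns out of 6, which is about 66.7%.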
- the functional derivative of a Cas12 protein described herein comprises one or more conservative amino acid substitutions.
- Conservative amino acid substitutions are ones in which the amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been generally defined in the art, including basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g.
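The side-chain families listed above can be encoded directly to check whether a single substitution is conservative. This sketch uses one-letter amino acid codes and includes only the families spelled out in the text. The aromatic family is left out because its member list is cut off in this excerpt. The names are illustrative.

```python
# Side-chain families as listed in the text (one-letter codes). The aromatic family is
# omitted because its member list is truncated in this excerpt.
SIDE_CHAIN_FAMILIES = {
    "basic": frozenset("KRH"),
    "acidic": frozenset("DE"),
    "uncharged_polar": frozenset("GNQSTYC"),
    "nonpolar": frozenset("AVLIPFMW"),
    "beta_branched": frozenset("TVI"),
}


def families_of(residue: str) -> set[str]:
    """Names of every family containing the residue (a residue may belong to several)."""
    return {name for name, members in SIDE_CHAIN_FAMILIES.items() if residue.upper() in members}


def is_conservative(original: str, replacement: str) -> bool:
    """True if the substitution keeps the residue within at least one shared side-chain family."""
    if original.upper() == replacement.upper():
        return False  # identical residues: not a substitution at all
    return bool(families_of(original) & families_of(replacement))
```

Threonine-to-valine counts as conservative here only through the shared beta-branched family, because the two residues fall in different polarity families.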
- Amino acid substitutions may be introduced into a polypeptide of interest and the products screened for a desired activity of interest, e.g., the ability of a Cas12 protein variant to retain or improve target gene editing efficiency in a reporter cell line; methods for measuring such activity are well known in the art.
- a derivative of a polypeptide can be prepared using methods well known in the art, e.g., by modifying the corresponding nucleic acid molecules encoding the derivative.
- derivatives may result from a substitution, deletion, or insertion of one or more codons encoding the polypeptide that changes the amino acid sequence relative to the wild-type sequence of the polypeptide.
- the derivatives can be made using methods well-known in the art such as DNA synthesis, oligonucleotide-mediated (site-directed) mutagenesis, alanine scanning, and PCR mutagenesis. Site-directed mutagenesis (see, e.g., Carter, 1986, Biochem J.
- a functional derivative of a polypeptide comprises one or more modifications to one or more predicted non-essential amino acid residues in its sequence.
- modifications made to non-essential amino acid residues can be a conservative substitution as described herein.
- modifications made to non-essential amino acid residues can be a substantial substitution as described herein.
- modifications made to non-essential amino acid residues can be a deletion of the non-essential amino acid residue.
- one or more modifications can be made to one or more predicted essential amino acid residues in its sequence.
- the modifications made to essential amino acid residues in a protein sequence can be a conservative substitution as described herein.
- Methods well-known in the art can be used to analyze a protein (e.g., a Cas12 protein) sequence to identify essential and non-essential amino acid residues of the protein.
- an amino acid residue of a protein that is not conserved among orthologous gene products is predicted to be a non-essential amino acid residue
- another amino acid residue that is conserved among orthologous gene products is predicted to be an essential amino acid residue.
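The conservation heuristic in the preceding lines can be sketched as a scan over a multiple sequence alignment of orthologs. A column in which every ortholog carries the same residue is flagged as predicted essential, and every other column as predicted non-essential. The sketch assumes a precomputed alignment with gaps written as "-". It uses strict identity for simplicity, whereas practical analyses often use graded conservation scores. The function name is hypothetical.

```python
def classify_columns(alignment: list[str], gap: str = "-") -> tuple[list[int], list[int]]:
    """Split alignment columns (0-based) into predicted-essential and non-essential positions.

    A column is predicted essential only if every ortholog has the same residue and no gap.
    """
    if not alignment or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("need a non-empty alignment of equal-length sequences")
    essential, non_essential = [], []
    for col in range(len(alignment[0])):
        residues = {seq[col].upper() for seq in alignment}
        if len(residues) == 1 and gap not in residues:
            essential.append(col)
        else:
            non_essential.append(col)
    return essential, non_essential
```

The returned indices refer to alignment columns. Mapping them back to residue numbers in a particular protein would also require skipping that protein's gap positions.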
- functional derivatives of the polypeptide can be identified by testing the resulting derivatives for activity exhibited by the original sequence.
- nucleic acid molecules encoding the derivative polypeptides can be delivered into a population of in vitro cultured cells in the presence of a suitable guide molecule that targets a reporter gene.
- assays can be conducted to detect and/or measure editing (e.g., knocking down) of the target gene in the testing cell population, and those derivatives that induce the gene editing phenotype in the testing cell population at a comparable level to that of the control population can be selected as functional derivatives.
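The selection step can be written as a filter over measured editing efficiencies. The text does not define "a comparable level", so this sketch assumes a hypothetical cutoff: a derivative is kept when its efficiency is at least 80% of the control's. Both the cutoff and the function name are illustrative.

```python
def select_functional_derivatives(
    efficiencies: dict[str, float], control_efficiency: float, min_ratio: float = 0.8
) -> list[str]:
    """Return names of derivatives whose editing efficiency is >= min_ratio x control.

    min_ratio=0.8 is a hypothetical threshold for "comparable"; it is not from the source.
    """
    if control_efficiency <= 0:
        raise ValueError("control efficiency must be positive")
    return sorted(
        name for name, eff in efficiencies.items() if eff / control_efficiency >= min_ratio
    )
```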
- the Type V CRISPR effector protein provided herein has at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to an amino acid sequence set forth in Table 4.
- the Type V CRISPR effector protein comprises an amino acid sequence set forth in Table 4.
- the Type V CRISPR effector protein consists of an amino acid sequence set forth in Table 4.
- The term “functional domain” is used in its broadest sense and includes proteins such as enzymes or factors themselves, or specific functional fragments (domains) thereof.
- the functional domain may be a transcription activation domain. In some embodiments, the functional domain is a transcription repression domain. In some embodiments, the functional domain is an epigenetic modification domain such that an epigenetic modification enzyme is provided. In some embodiments, the functional domain is an activation domain. In some embodiments, the Type V CRISPR effector protein is associated with one or more functional domains and contains one or more mutations within the RuvC domain, and the resulting CRISPR complex can deliver epigenetic modifiers or transcriptional or translational activation or repression signals.
- the functional domain exhibits activity to modify a target DNA or proteins associated with the target DNA, wherein the activity is one or more selected from the group consisting of nuclease activity (e.g., HNH nuclease, RuvC nuclease, Trex1 nuclease, Trex2 nuclease) , methylation activity, demethylation activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer formation activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, glycosylase activity, acetyl transferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitination activity, adenylation activity, deadeny
- the Type V CRISPR effector protein may be fused to adenosine deaminase or cytidine deaminase for base editing purposes.
- the term “adenosine deaminase” or “adenosine deaminase protein” refers to a protein, polypeptide, or one or more functional domains of a protein or polypeptide that can catalyze hydrolytic deamination reaction to convert adenine (or the adenine portion of a molecule) to hypoxanthine (or the hypoxanthine portion of a molecule) , as shown below.
- the adenine-containing molecule is adenosine (A) and the hypoxanthine-containing molecule is inosine (I) .
- the adenine-containing molecule may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) .
- the adenosine deaminase is derived from one or more metazoan species, including but not limited to mammals, birds, frogs, squid, fish, flies, and worms. In some embodiments, the adenosine deaminase is human, squid, or drosophila adenosine deaminase.
- the adenosine deaminase is TadA protein, such as E. coli TadA. See Kim et al., Biochemistry 45: 6407-6416 (2006) ; Wolf et al., EMBO J. 21: 3841-3851 (2002) .
- the adenosine deaminase is mouse ADA. See Grunebaum et al., Curr. Opin. Allergy Clin. Immunol. 13: 630-638 (2013) .
- the adenosine deaminase is human ADAT2. See Fukui et al., J. Nucleic Acids 2010: 260512 (2010) .
- the deaminase is one or more of those described in: Cox et al., Science. Nov. 24, 2017; 358 (6366): 1019-1027; Komor et al., Nature. May 19, 2016; 533 (7603): 420-4; and Gaudelli et al., Nature. Nov. 23, 2017; 551 (7681): 464-471.
- the adenosine deaminase protein recognizes one or more target adenosine residues in a double-stranded nucleic acid substrate and converts them to inosine residues.
- the double-stranded nucleic acid substrate is an RNA-DNA heteroduplex.
- the adenosine deaminase protein recognizes a binding window on a double-stranded substrate.
- the binding window comprises at least one target adenosine residue.
- the binding window is in the range of about 3 bp to about 100 bp. In some embodiments, the binding window is in the range of about 5 bp to about 50 bp.
- the binding window is in the range of about 10 bp to about 30 bp. In some embodiments, the binding window is about 1 bp, 2 bp, 3 bp, 5 bp, 7 bp, 10 bp, 15 bp, 20 bp, 25 bp, 30 bp, 40 bp, 45 bp, 50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 85 bp, 90 bp, 95 bp, or 100 bp.
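As a toy model of deamination within a binding window, the sketch below converts every target base inside a half-open window [start, end) to its deamination product. By default it models A-to-I editing by an adenosine deaminase. Passing `target="C", product="U"` models the cytidine deaminase case described further on. Real editing is probabilistic and depends on sequence context, so this is only a bookkeeping sketch under the simplifying assumption that every target base in the window is edited. The function name is hypothetical.

```python
def deaminate_window(
    seq: str, window_start: int, window_end: int, target: str = "A", product: str = "I"
) -> tuple[str, list[int]]:
    """Convert every `target` base in seq[window_start:window_end] to `product`.

    Returns the edited sequence and the 0-based positions that were changed.
    Assumes complete editing of all target bases in the window (a simplification).
    """
    if not 0 <= window_start <= window_end <= len(seq):
        raise ValueError("window must lie within the sequence")
    edited = list(seq)
    changed = []
    for i in range(window_start, window_end):
        if edited[i] == target:
            edited[i] = product
            changed.append(i)
    return "".join(edited), changed
```

For example, `deaminate_window("GAAGCA", 1, 4)` edits the two adenosines at positions 1 and 2 and leaves the adenosine at position 5, which lies outside the window.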
- the adenosine deaminase protein comprises one or more deaminase domains.
- the deaminase domain is used to recognize one or more target adenosine (A) residues contained in a double-stranded nucleic acid substrate and convert them to inosine (I) residues.
- the deaminase domain comprises an active center.
- the active center comprises zinc ions.
- amino acid residues in or near the active center interact with one or more nucleotides 5′ of the target adenosine residue. In some embodiments, amino acid residues in or near the active center interact with one or more nucleotides 3′ of the target adenosine residue. In some embodiments, amino acid residues in or near the active center further interact with nucleotides complementary to the target adenosine residues on the opposite strand. In some embodiments, the amino acid residue forms a hydrogen bond with the 2′ hydroxyl group of the nucleotide.
- the adenosine deaminase comprises human ADAR2 whole protein (hADAR2) or deaminase domain (hADAR2-D) thereof. In some embodiments, the adenosine deaminase is a member of the ADAR family homologous to hADAR2 or hADAR2-D.
- the homologous ADAR protein is human ADAR1 (hADAR1) or deaminase domain (hADAR1-D) thereof.
- glycine 1007 of hADAR1-D corresponds to glycine 487 of hADAR2-D.
- glutamic acid 1008 of hADAR1-D corresponds to glutamic acid 488 of hADAR2-D.
- the adenosine deaminase comprises the wild-type amino acid sequence of hADAR2-D. In some embodiments, the adenosine deaminase comprises one or more mutations in the hADAR2-D sequence such that the editing efficiency and/or substrate editing preference of hADAR2-D are changed as desired.
- the adenosine deaminase is TadA8e.
- the Type V CRISPR effector protein described herein is fused to TadA8e or functional fragment thereof (i.e., capable of A-to-I single base editing) .
- the deaminase is cytidine deaminase.
- the term “cytidine deaminase” or “cytidine deaminase protein” refers to a protein, polypeptide, or one or more functional domains of a protein or polypeptide that can catalyze hydrolytic deamination reaction to convert cytosine (or the cytosine portion of a molecule) to uracil (or the uracil portion of a molecule) , as shown below.
- the cytosine-containing molecule is cytidine (C) and the uracil-containing molecule is uridine (U) .
- the cytosine-containing molecule may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) .
- cytidine deaminases that can be used in combination with the present disclosure include, but are not limited to, members of the enzyme family known as apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC) deaminases, activation-induced deaminase (AID), or cytidine deaminase 1 (CDA1); in specific embodiments, the deaminase is selected from APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3E, APOBEC3F, APOBEC3G, APOBEC3H, and APOBEC4 deaminases.
- the cytidine deaminase is capable of targeting cytosines in a DNA single strand.
- the cytidine deaminase can edit a single strand present outside of the binding component (e.g., bound Cas13).
- the cytidine deaminase may edit at localized bubbles, such as those formed at target editing sites as a result of guide sequence mismatches.
- the cytidine deaminase may comprise mutations that help focus its activity, such as those described in Kim et al., Nature Biotechnology (2017) 35 (4): 371-377 (doi: 10.1038/nbt.3803).
- the cytidine deaminase is derived from one or more metazoan species, including but not limited to mammals, birds, frogs, squid, fish, flies, and worms. In some embodiments, the cytidine deaminase is human, primate, bovine, canine, rat, or mouse cytidine deaminase.
- the cytidine deaminase is human APOBEC, including hAPOBEC1 or hAPOBEC3. In some embodiments, the cytidine deaminase is human AID.
- the cytidine deaminase protein recognizes one or more target cytosine residues in a single-stranded bubble of an RNA duplex and converts them to uracil residues. In some embodiments, the cytidine deaminase protein recognizes a binding window on a single-stranded bubble of an RNA duplex. In some embodiments, the binding window comprises at least one target cytosine residue. In some embodiments, the binding window is in the range of about 3 bp to about 100 bp. In some embodiments, the binding window is in the range of about 5 bp to about 50 bp. In some embodiments, the binding window is in the range of about 10 bp to about 30 bp.
- the binding window is about 1 bp, 2 bp, 3 bp, 5 bp, 7 bp, 10 bp, 15 bp, 20 bp, 25 bp, 30 bp, 40 bp, 45 bp, 50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 85 bp, 90 bp, 95 bp, or 100 bp.
- the cytidine deaminase protein comprises one or more deaminase domains.
- deaminase domains are used to recognize one or more target cytosine (C) residues contained in a single-stranded bubble of an RNA duplex and convert them to uracil (U) residues.
- the deaminase domain comprises an active center.
- the active center comprises zinc ions.
- amino acid residues in or near the active center interact with one or more nucleotides 5′ of the target cytosine residue.
- amino acid residues in or near the active center interact with one or more nucleotides 3′ of the target cytosine residue.
- the cytidine deaminase comprises human APOBEC1 whole protein (hAPOBEC1) or its deaminase domain (hAPOBEC1-D) or its C-terminal truncated form (hAPOBEC-T) .
- the cytidine deaminase is a member of the APOBEC family homologous to hAPOBEC1, hAPOBEC1-D, or hAPOBEC-T.
- the cytidine deaminase comprises human AID1 whole protein (hAID) or its deaminase domain (hAID-D) or its C-terminal truncated form (hAID-T) .
- the cytidine deaminase comprises the wild-type amino acid sequence of cytosine deaminase. In some embodiments, the cytidine deaminase comprises one or more mutations in the cytosine deaminase sequence such that the editing efficiency and/or substrate editing preference of the cytosine deaminase are changed as desired.
- the CRISPR-Cas systems described herein have a wide variety of utilities including modifying (e.g., deleting, inserting, translocating, inactivating, or activating) a target polynucleotide in a multiplicity of cell types.
- the CRISPR-Cas systems have a broad spectrum of applications in, e.g., DNA/RNA detection (e.g., specific high sensitivity enzymatic reporter unlocking (SHERLOCK)), tracking and labeling of nucleic acids, enrichment assays (extracting a desired sequence from background), detecting circulating tumor DNA, preparing next-generation sequencing libraries, drug screening, disease diagnosis and prognosis, and treating various genetic disorders.
- Genome editing system refers to an engineered CRISPR-Cas system of the present disclosure having RNA-guided DNA editing activity.
- Genome editing systems of the present disclosure include at least two components of the CRISPR-Cas systems described above: a crRNA and a Type V CRISPR effector protein. As described above, these two components form a complex that is capable of associating with a specific nucleic acid sequence and editing the DNA in or around that nucleic acid sequence, for instance by making one or more of a single strand break (an SSB or nick), a double strand break (a DSB), a nucleobase modification, a DNA methylation or demethylation, a chromatin modification, etc.
- Genome editing systems of the present disclosure when introduced into cells, may alter (a) endogenous genomic DNA (gDNA) including, without limitation, DNA encoding e.g., a gene target of interest, an exonic sequence of a gene, an intronic sequence of a gene, a regulatory element of a gene or group of genes, etc. ; (b) endogenous extra-genomic DNA such as mitochondrial DNA (mtDNA) ; and/or (c) exogenous DNA such as a non-integrated viral genome, a plasmid, an artificial chromosome, etc.
- these DNA substrates are referred to as “target DNA.”
- alterations caused by the system may take the form of short DNA insertions or deletions, which are collectively referred to as “indels.”
- These indels may be formed within or proximate to a predicted cleavage site that is typically proximate to the PAM sequence and/or within a region of complementarity to the spacer sequence, though in some cases indels may occur outside of such predicted cleavage site. Without wishing to be bound by any theory, it is believed that indels are often the result of the repair of an SSB or DSB by “error-prone” DNA damage repair pathways, such as non-homologous end joining (NHEJ) .
- a genome editing system is used to generate two DSBs within 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1250, 1500, 1750, or 2000 base pairs of one another, which results in one or more outcomes, including the formation of indels at one or both cleavage sites, as well as deletion or inversion of the DNA sequence disposed between the DSBs.
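To make the deletion and inversion outcomes concrete, the sketch below computes both alleles from two cut positions. It assumes blunt cuts, with a cut at p lying between seq[p-1] and seq[p], and perfect end joining with no indels at the junctions. In the inversion, the excised segment is re-inserted as its reverse complement, because it flips end to end and its two strands swap. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def two_cut_outcomes(seq: str, cut1: int, cut2: int) -> dict[str, str]:
    """Deletion and inversion alleles for two blunt DSBs, assuming indel-free rejoining."""
    left, right = sorted((cut1, cut2))
    if not 0 < left < right < len(seq):
        raise ValueError("cuts must be distinct positions inside the sequence")
    segment = seq[left:right]
    return {
        "deletion": seq[:left] + seq[right:],
        "inversion": seq[:left] + reverse_complement(segment) + seq[right:],
    }
```

For example, cutting `"AAAACCGTTTT"` at positions 4 and 7 excises `"CCG"`. This gives `"AAAATTTT"` on deletion and `"AAAACGGTTTT"` on inversion.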
- genome editing systems of this disclosure may alter target DNA via integration of new sequences.
- These new sequences may be distinct from the existing sequence of the target DNA (as a non-limiting example, integrated by NHEJ by ligation of blunt ends) or may correspond to a DNA template having one or more regions that are homologous to a region of the targeted DNA. Integration of templated homologous sequences is also referred to as “homology-directed repair” or “HDR.”
- Template DNA for HDR may be endogenous to the cell, including without limitation in the form of a homologous sequence located on another copy of the same chromosome as the target DNA, a homologous sequence from the same gene cluster as the target DNA, etc.
- the template DNA may be provided exogenously, including without limitation as a free linear or circular DNA, as a DNA bound (covalently or non-covalently) to one or more genome editing system components, or as part of a vector genome.
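The sketch below gives a simplified picture of an exogenous HDR template: an insert flanked by homology arms copied from the target sequence on either side of the cut. The arm length and the function name are hypothetical. Practical donor design usually involves further considerations, such as preventing re-cleavage of the edited allele, that this sketch ignores.

```python
def design_hdr_donor(target: str, cut: int, insert: str, arm_len: int = 50) -> str:
    """Build a donor: left homology arm + insert + right homology arm.

    The cut lies between target[cut - 1] and target[cut]; each arm copies arm_len bases
    of the target immediately flanking the cut (arm_len=50 is a hypothetical default).
    """
    if not arm_len <= cut <= len(target) - arm_len:
        raise ValueError("cut site too close to the sequence end for the requested arm length")
    left_arm = target[cut - arm_len:cut]
    right_arm = target[cut:cut + arm_len]
    return left_arm + insert + right_arm
```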
- editing comprises a temporary or permanent silencing of a gene by CRISPR-mediated interference, as described by Matthew H. Larson et al. “CRISPR interference (CRISPRi) for sequence-specific control of gene expression, ” Nature Protocols 8, 2180-2196 (2013) , which is incorporated by reference in its entirety and for all purposes.
- Genome editing systems may include other components, including without limitation one or more heterologous functional domains which mediate site specific nucleobase modification, DNA methylation or demethylation, or chromatin modification.
- the heterologous functional domain is covalently bound to a Type V CRISPR effector protein, for instance by means of a direct peptide bond or an intervening peptide linker.
- the heterologous functional domain is covalently bound to the crRNA, for instance by means of a chemical cross-link.
- one or more functional domains may be non-covalently associated with a Type V CRISPR effector protein and/or a crRNA.
- the CRISPR-Cas system described herein can be used in DNA/RNA detection by DNA sensing.
- Single effector RNA-guided DNases can be reprogrammed with RNA guides to provide a platform for specific single-stranded DNA (ssDNA) sensing.
- an activated Type V CRISPR effector protein engages in “collateral” cleavage of nearby ssDNA with no sequence similarity to the target sequence. This RNA-programmed collateral cleavage activity allows the CRISPR systems to detect the presence of a specific DNA by nonspecific degradation of labeled ssDNA.
- the collateral ssDNase activity can be combined with a reporter in DNA detection applications such as a method called the DNA Endonuclease-Targeted CRISPR trans reporter (DETECTR) method, which when combined with amplification achieves attomolar sensitivity for DNA detection (see, e.g., Chen et al., Science, 360 (6387) : 436-439, 2018) , which is incorporated herein by reference in its entirety.
- One application of using the enzymes described herein is to degrade non-target ssDNA in an in vitro environment.
- a “reporter” ssDNA molecule linking a fluorophore and a quencher can also be added to the in vitro system, along with an unknown sample of DNA (either single-stranded or double-stranded) .
- the surveillance complex containing the Type V CRISPR effector protein cleaves the reporter ssDNA, resulting in a fluorescent readout.
- the CRISPR-Cas systems described herein can be used to detect a target DNA in a sample (e.g., a clinical sample, a cell, or a cell lysate) .
- the collateral DNase activity of the Type V CRISPR effector proteins described herein is activated when the effector proteins bind to a target nucleic acid.
- the effector protein cleaves a labeled detector ssDNA to generate or change a signal (e.g., an increased signal or a decreased signal) thereby allowing for the qualitative and quantitative detection of the target DNA in the sample.
- the specific detection and quantification of DNA in the sample allows for a multitude of applications including diagnostics.
- the methods include a) contacting a sample with: (i) a crRNA and/or a nucleic acid encoding the crRNA, wherein the crRNA comprises a first stem-loop sequence, a second stem-loop sequence and a spacer sequence capable of hybridizing to the target DNA; (ii) a Type V CRISPR effector protein and/or a nucleic acid encoding the effector protein; and (iii) a labeled detector ssDNA; wherein the effector protein associates with the crRNA to form a complex; wherein the complex hybridizes to the target DNA; and wherein upon binding of the complex to the target DNA, the effector protein exhibits collateral DNase activity and cleaves the labeled detector ssDNA; and b) measuring a detectable signal produced by cleavage of the labeled detector ssDNA, wherein said measuring provides for detection of the target DNA in the sample.
- the methods further include comparing the detectable signal with a reference signal and determining the amount of target DNA in the sample.
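One way to compare a detectable signal against a reference and convert it into an amount is to use a linear standard curve built from known target concentrations. The sketch assumes that signal scales linearly with target amount over the calibrated range. It treats any signal at or below the reference (for example, a no-target control) as not detected. The function names are illustrative.

```python
from typing import Optional


def fit_standard_curve(concs: list[float], signals: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of signal = slope * concentration + intercept."""
    n = len(concs)
    if n < 2 or n != len(signals):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(concs) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in concs)
    if sxx == 0:
        raise ValueError("calibration concentrations must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concs, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(signal: float, reference: float, slope: float, intercept: float) -> Optional[float]:
    """Estimated target amount, or None if the signal does not exceed the reference."""
    if signal <= reference:
        return None  # not detected relative to the reference signal
    return (signal - intercept) / slope
```

A simple linear fit keeps the arithmetic transparent. Assays with saturating kinetics may call for a nonlinear calibration model instead.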
- the measuring is performed using gold nanoparticle detection, fluorescence polarization, colloid phase transition/dispersion, electrochemical detection, or semiconductor-based sensing.
- the labeled detector ssDNA includes a fluorescence-emitting dye pair, a fluorescence resonance energy transfer (FRET) pair, or a quencher/fluorophore pair.
- an amount of detectable signal produced by the labeled detector ssDNA is decreased or increased.
- the labeled detector ssDNA produces a first detectable signal prior to cleavage by the effector protein and a second detectable signal after cleavage by the effector protein.
- a detectable signal is produced when the labeled detector ssDNA is cleaved by the effector protein.
- the labeled detector ssDNA includes a modified nucleobase, a modified sugar moiety, a modified nucleic acid linkage, or a combination thereof.
- the methods include the multi-channel detection of multiple independent target DNAs in a sample (e.g., two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, thirty, forty, or more target DNAs) by using multiple CRISPR-Cas systems, each including a distinct orthologous Type V CRISPR effector protein and corresponding crRNAs, allowing for the differentiation of multiple target DNAs in the sample.
- the methods include the multi-channel detection of multiple independent target DNAs in a sample, with the use of multiple instances of CRISPR-Cas systems, each containing an orthologous Type V CRISPR effector protein with differentiable collateral ssDNase substrates.
- RNA targeting effector proteins can for instance be used to target probes to selected RNA sequences.
- the CRISPR-Cas systems described herein can be used in tandem such that two Cas12i nicking enzymes, or one Cas12i enzyme and one other CRISPR Cas enzyme with nicking activity, targeted by a pair of crRNAs to opposite strands of a target locus, can generate a double-strand break with overhangs.
- This method may reduce the likelihood of off-target modifications, because a double-strand break is expected to occur only at loci where both enzymes generate a nick, thereby increasing genome editing specificity.
- This method is referred to as a ‘double nicking’ or ‘paired nickase’ strategy and is described, e.g., in Ran et al., “Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity, ” Cell, 2013 Sep. 12; 154 (6) : 1380-1389, and in Mali et al., “CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering, ” Nature Biotechnology, 2013 Aug. 1; 31: 833-838, which are both incorporated herein by reference in their entireties.
- paired nickases demonstrated the utility of this strategy in mammalian cell lines.
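The geometry of a paired-nick break can be worked out from the two nick positions alone. Positions are given in top-strand coordinates, with a nick at p lying between bases p-1 and p. The sketch assumes that the short duplex between the nicks dissociates, leaving a DSB. When the bottom-strand nick lies 3′ of the top-strand nick (to its right), each end carries a 5′ overhang. The reverse arrangement gives 3′ overhangs. The function name is hypothetical.

```python
def paired_nick_overhang(top_nick: int, bottom_nick: int) -> tuple[str, int]:
    """Overhang type and length for nicks on opposite strands (top-strand coordinates).

    Assumes the duplex between the nicks melts, yielding a DSB with staggered ends.
    """
    offset = bottom_nick - top_nick
    if offset > 0:
        return ("5'", offset)  # bottom-strand nick downstream: 5' overhangs
    if offset < 0:
        return ("3'", -offset)  # bottom-strand nick upstream: 3' overhangs
    return ("blunt", 0)  # coincident nicks behave like a blunt DSB
```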
- Applications of paired nickases have been described in the model plant Arabidopsis (e.g., in Fauser et al., “Both CRISPR/Cas-based nucleases and nickases can be used efficiently for genome engineering in Arabidopsis thaliana,” The Plant Journal 79 (2): 348-59 (2014), and Schiml et al., “The CRISPR/Cas system can be used as nuclease for in planta gene targeting and as paired nickases for directed mutagenesis in Arabidopsis resulting in heritable progeny,” The Plant Journal 80 (6): 1139-50 (2014); in crops such as in rice (e.g., in Mikami et al., “Precision Targeted Mutagenesis via Cas9 Paired Nickases in Rice,” Plant and Cell Physiology 57 (5): 1058-68
- CRISPR-Cas systems described herein can also be used as paired nickases to detect splice junctions as described, e.g., in Santo & Paik, “A splice junction-targeted CRISPR approach (spJCRISPR) reveals human FOXO3B to be a protein-coding gene,” Gene 673: 95-101 (2016).
- the CRISPR-Cas systems described herein can also be used as paired nickases to insert DNA molecules into target loci as described in, e.g., Wang et al., “Therapeutic Genome Editing for Myotonic Dystrophy Type 1 Using CRISPR/Cas9,” Molecular Therapy 26 (11): 2617-2630 (2018).
- the CRISPR systems described herein can also be used as single nickases to insert genes as described in, e.g., Gao et al., “Single Cas9 nickase induced generation of NRAMP1 knockin cattle with reduced off-target effects,” Genome Biology 18 (1): 13 (2017).
5.3.5 Enhancing Base Editing Using CRISPR Nickases
- the CRISPR-Cas systems described herein can be used to augment the efficiency of CRISPR base editing.
- base editing a protein domain with DNA nucleotide modifying activity (e.g., cytidine deamination) is fused to a programmable Type V CRISPR effector protein that has been deactivated by mutation so as to no longer possess double-strand DNA cleavage activity.
- a nickase as the programmable Cas protein has been shown to improve the efficiency of base editing as described e.g., in Komor et al., “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage, ” Nature 533: 420-424 (2016) , and Nishida et al., “Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems, ” Science 353 (6305) : aaf8729 (2016) , both of which are incorporated herein by reference in their entirety.
- a nickase that nicks the non-edited strand of the target locus is hypothesized to stimulate endogenous DNA repair pathways-such as mismatch repair or long-patch base excision repair, which preferentially resolves a mismatch generated by base editing to a desired allele-or to provide better accessibility of the catalytic editing domain to the target DNA.
- the CRISPR-Cas systems described herein can be used in conjunction with proteins that act on nicked DNA.
- One such class of proteins is nick-translating DNA polymerases, such as E. coli DNA polymerase I or Taq DNA polymerase.
- In some embodiments, the CRISPR-Cas system (e.g., a CRISPR nickase) can be fused to an error-prone DNA polymerase I.
- This fusion protein can be targeted with a crRNA to generate a nick at a target DNA site.
- The DNA polymerase then initiates DNA synthesis at the nick, displacing the downstream nucleotides; because an error-prone polymerase is used, the target locus is mutagenized.
- Polymerase variants with differing processivity, fidelity, and misincorporation biases may be used to shape the characteristics of the mutants that are generated.
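The nick-initiated, error-prone synthesis described above can be illustrated with a toy simulation. This is a hypothetical sketch, not the EvolvR implementation: it assumes the polymerase resynthesizes a fixed-length window downstream of the nick and that each resynthesized base is misincorporated with one uniform probability. Real polymerase variants differ in processivity (here, `window`), fidelity (here, `error_rate`), and substitution bias (not modelled).

```python
import random

BASES = "ACGT"


def error_prone_resynthesis(seq: str, nick: int, window: int,
                            error_rate: float, rng: random.Random) -> str:
    """Toy model of nick-initiated, error-prone strand resynthesis.

    `nick` is the index of the first base synthesized after the nick.
    Misincorporations are drawn uniformly from the three other bases,
    so no substitution bias is modelled. Bases outside the window are kept.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    out = list(seq)
    for pos in range(nick, min(len(seq), nick + window)):
        if rng.random() < error_rate:
            out[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(out)


if __name__ == "__main__":
    target = "ACGT" * 10
    mutant = error_prone_resynthesis(target, nick=8, window=15,
                                     error_rate=0.2, rng=random.Random(0))
    # Every difference falls inside positions 8..22, the modelled window.
    print([i for i, (a, b) in enumerate(zip(target, mutant)) if a != b])
```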
- This method, called EvolvR, is described in detail, e.g., in Halperin et al., “CRISPR-guided DNA polymerases enable diversification of all nucleotides in a tunable window,” Nature 560:248-252 (2018), which is incorporated herein by reference in its entirety.
- In some embodiments, a CRISPR nickase can be used in a nick translation DNA labeling protocol, in which the nickase, rather than a nonspecific nicking enzyme, introduces the nicks.
- Nick translation, first described by Rigby et al. in 1977, involves incubating DNA with a DNA-nicking enzyme, such as DNase I, which creates one or more nicks in the DNA molecule.
- A nick-translating DNA polymerase, such as DNA polymerase I, is then used to incorporate labeled nucleotides at the nicked sites.
- The CRISPR-Cas systems described herein can be used for preparing next-generation sequencing (NGS) libraries.
- In some embodiments, the CRISPR-Cas systems can be used to disrupt the coding sequence of a target gene, and the clones transfected with the CRISPR enzyme can be screened simultaneously by next-generation sequencing (e.g., on the Ion Torrent PGM system).
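Screening edited clones by sequencing amounts to comparing reads from the targeted amplicon with the unedited reference. The following is a deliberately crude, hypothetical sketch (`classify_reads` is an illustrative name): it assumes each read already spans exactly the amplicon and skips alignment, so it infers indels only from length differences. Production analysis pipelines align reads to the reference instead.

```python
from collections import Counter


def classify_reads(reference: str, reads: list[str]) -> Counter:
    """Bin amplicon reads by a length-based heuristic (no alignment)."""
    counts: Counter = Counter()
    for read in reads:
        delta = len(read) - len(reference)
        if delta == 0:
            counts["wild_type" if read == reference else "substitution"] += 1
        elif delta % 3 != 0:
            # A net indel that is not a multiple of 3 shifts the reading frame,
            # the outcome usually sought when disrupting a coding sequence.
            counts["frameshift_indel"] += 1
        else:
            counts["in_frame_indel"] += 1
    return counts


if __name__ == "__main__":
    ref = "ATGGCCAAGTAA"
    reads = [ref, ref[:5] + ref[6:], ref[:3] + ref[6:], "ATGGCTAAGTAA"]
    print(classify_reads(ref, reads))
```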
- A detailed description of how to prepare NGS libraries can be found, e.g., in Bell et al., “A high-throughput screening strategy for detecting CRISPR-Cas9 induced mutations using next-generation sequencing,” BMC Genomics 15(1):1002 (2014), which is incorporated herein by reference in its entirety.
5.3.7 Engineered Microorganisms
- E. coli, yeast, and microalgae are widely used in synthetic biology.
- Synthetic biology has wide utility, including various clinical applications.
- In some embodiments, the programmable CRISPR-Cas systems described herein can be used to split toxic protein domains for targeted cell death, e.g., using a cancer-linked RNA as the target transcript.
- In some embodiments, pathways involving protein-protein interactions can be influenced in synthetic biological systems, e.g., with fusion complexes carrying appropriate effectors such as kinases or other enzymes.
- In some embodiments, crRNA sequences that target phage sequences can be introduced into the microorganism.
- Thus, the disclosure also provides methods of vaccinating a microorganism (e.g., a production strain) against phage infection.
- In some embodiments, the CRISPR-Cas systems provided herein can be used to engineer microorganisms, e.g., to improve yield or fermentation efficiency.
- For example, the CRISPR-Cas systems described herein can be used to engineer microorganisms, such as yeast, to generate biofuels or biopolymers from fermentable sugars, or to degrade plant-derived lignocellulose from agricultural waste as a source of fermentable sugars.
- In some embodiments, the methods described herein can be used to modify the expression of endogenous genes required for biofuel production and/or to modify endogenous genes that may interfere with biofuel synthesis.
- In some embodiments, the CRISPR-Cas systems described herein can be used to engineer microorganisms that have defective repair pathways, such as the mesophilic cellulolytic bacterium Clostridium cellulolyticum, a model organism for bioenergy research.
- In some embodiments, a CRISPR nickase can be used to introduce single nicks at a target locus, which may result in insertion of an exogenously provided DNA template by homologous recombination.
- In some embodiments, the CRISPR-Cas systems provided herein can be used to induce death or dormancy of a cell (e.g., a microorganism such as an engineered microorganism).
- These methods can be used to induce dormancy or death of a multitude of cell types, including prokaryotic and eukaryotic cells, including, but not limited to, mammalian cells (e.g., cancer cells or tissue culture cells), protozoans, fungal cells, cells infected with a virus, cells infected with an intracellular bacterium, cells infected with an intracellular protozoan, cells infected with a prion, bacteria (e.g., pathogenic and non-pathogenic bacteria), and unicellular and multicellular parasites.
- The systems described herein can also be used in applications where it is desirable to kill or control a specific microbial population (e.g., a bacterial population).
- In some embodiments, the systems described herein include a crRNA that targets a nucleic acid (e.g., a DNA) that is genus-, species-, or strain-specific, and can be delivered to the cell.
- The nuclease activity of the Type V CRISPR effector protein disrupts essential functions within the microorganism, ultimately resulting in dormancy or death.
- In some embodiments, the methods comprise contacting the cell with a system described herein that includes a Type V CRISPR effector protein or a nucleic acid encoding the effector protein, and a crRNA or a nucleic acid encoding the crRNA, wherein the spacer sequence is complementary to at least 15 nucleotides (e.g., 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50 or more nucleotides) of a target nucleic acid.
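The ‘at least 15 nucleotides’ complementarity condition can be checked computationally. A minimal sketch, assuming both sequences are written 5'→3', that `target_strand` is the DNA strand the crRNA spacer base-pairs with, and that only perfect, contiguous Watson-Crick pairing counts (PAM requirements, mismatches, and bulges are ignored); all function names are hypothetical. Because the strands pair antiparallel, the check reduces to the longest common substring of the spacer's reverse complement and the target strand.

```python
DNA_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Reverse complement written as DNA (an RNA U pairs with A)."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest shared contiguous run (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def longest_complementary_stretch(spacer: str, target_strand: str) -> int:
    """Longest contiguous run of the spacer that base-pairs with the target."""
    return longest_common_substring(reverse_complement(spacer), target_strand.upper())


def meets_complementarity(spacer: str, target_strand: str, min_nt: int = 15) -> bool:
    return longest_complementary_stretch(spacer, target_strand) >= min_nt


if __name__ == "__main__":
    spacer = "GAUCCAUGGACUUAGCAGCU"  # hypothetical 20-nt spacer
    target = "TTTT" + reverse_complement(spacer) + "GGGG"
    print(longest_complementary_stretch(spacer, target))  # 20
```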
- In some embodiments, the nuclease activity of the Type V CRISPR effector protein can induce programmed cell death, cell toxicity, apoptosis, necrosis, necroptosis, cell death, cell cycle arrest, cell anergy, a reduction of cell growth, or a reduction in cell proliferation.
- In some embodiments, the cleavage of DNA by the Type V CRISPR effector protein can be bacteriostatic or bactericidal.
5.3.8 Application in Plants
- The CRISPR-Cas systems described herein have a wide variety of uses in plants.
- In some embodiments, the CRISPR-Cas systems can be used to engineer plant genomes (e.g., to improve production, to make products with desired post-translational modifications, or to introduce genes for producing industrial products).
- In some embodiments, the CRISPR-Cas systems can be used to introduce a desired trait into a plant (e.g., with or without heritable modifications to the genome) or to regulate expression of endogenous genes in plant cells or whole plants.
- Gene drive is the phenomenon in which the inheritance of a particular gene or set of genes is favorably biased.
- The CRISPR-Cas systems described herein can be used to build gene drives.
- For example, the CRISPR-Cas systems can be designed to target and disrupt a particular allele of a gene, causing the cell to repair the break by copying the second allele as a template. Through this copying, the first allele is converted to the second allele, increasing the chance that the second allele is transmitted to the offspring.
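The inheritance bias created by allele conversion can be quantified with a standard, simplified population-genetics model. This is an illustrative sketch under stated assumptions, not a model from this disclosure: random mating, no fitness cost, no resistance alleles, and conversion in heterozygotes with a hypothetical efficiency e, so a heterozygote transmits the drive allele with probability (1 + e)/2 instead of the Mendelian 1/2. The drive-allele frequency q then updates as q' = q + e*q*(1 - q).

```python
def next_drive_frequency(q: float, e: float) -> float:
    """One generation of the simplified drive model.

    Homozygotes (frequency q**2) always transmit the drive allele;
    heterozygotes (frequency 2q(1 - q)) transmit it with probability (1 + e) / 2.
    With e = 0 this reduces to Mendelian inheritance (q stays constant).
    """
    if not (0.0 <= q <= 1.0 and 0.0 <= e <= 1.0):
        raise ValueError("q and e must lie in [0, 1]")
    q_next = q * q + 2 * q * (1 - q) * (1 + e) / 2
    return min(1.0, q_next)  # guard against floating-point overshoot


def drive_trajectory(q0: float, e: float, generations: int) -> list[float]:
    """Drive-allele frequency in each generation, starting from q0."""
    freqs = [q0]
    for _ in range(generations):
        freqs.append(next_drive_frequency(freqs[-1], e))
    return freqs


if __name__ == "__main__":
    # Hypothetical release at 1% frequency with 90% conversion efficiency.
    for gen, q in enumerate(drive_trajectory(0.01, 0.9, 12)):
        print(gen, round(q, 3))
```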
- In gene disruption, a nuclease-induced event (such as a targeted double-strand break) generally activates the endogenous non-homologous end joining (NHEJ) DNA repair mechanism of the target cell, yielding indels that often result in a loss-of-function mutation intended to benefit the patient.
- In gene correction, the nuclease activity is used to induce alternative DNA repair pathways (such as homology-directed repair, or HDR) with the help of a template DNA (whether endogenous or exogenous, single-stranded or double-stranded).
- In some embodiments, the CRISPR systems described herein can be used to treat the following diseases: cystic fibrosis, by targeting CFTR (WO2015157070A2); Duchenne muscular dystrophy and Becker muscular dystrophy, by targeting dystrophin (DMD) (WO2016161380A1); alpha-1-antitrypsin deficiency, by targeting alpha-1-antitrypsin (A1AT) (WO2017165862A1); lysosomal storage disorders such as Pompe disease (also known as glycogen storage disease type II), by targeting acid alpha-glucosidase (GAA); myotonic dystrophy, by targeting DMPK; Huntington disease, by targeting HTT; Fragile X syndrome, by targeting FMR1; Friedreich's ataxia, by targeting frataxin; amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD), by targeting C9orf72; hereditary chronic kidney disease, by targeting ApoL1; cardiovascular disease and hyperlipidemia, by
- In some embodiments, the condition or disease is cancer.
- In some embodiments, the cancer is selected from the group consisting of Wilms' tumor, Ewing sarcoma, a neuroendocrine tumor, a glioblastoma, a neuroblastoma, a melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, renal cancer, pancreatic cancer, lung cancer, biliary cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid carcinoma, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphoblastic leukemia, acute myelogenous leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, and urinary bladder cancer.
- In some embodiments, the condition or disease is infectious, and the infectious agent is selected from the group consisting of human immunodeficiency virus (HIV), herpes simplex virus-1 (HSV1), herpes simplex virus-2 (HSV2), and hepatitis B virus.
- In some embodiments, immune cells can also be edited.
- In cancer immunotherapy, one therapeutic mode is to modify immune cells such as T cells to recognize and fight cancer, as described in WO2015161276A2.
- To increase efficacy and availability while decreasing cost, the creation of ‘off-the-shelf’ allogeneic T-cell therapies is attractive, and gene editing has the potential to modify surface antigens to minimize immunological side effects (Jung et al., Mol Cell. 2018 Aug. 31).
- The CRISPR-Cas systems described herein can be engineered to enable additional functions that use an enzymatically inactive effector protein as a chassis to which protein domains can be attached to confer activities such as transcriptional activation, repression, base editing, and methylation/demethylation.
- a polynucleotide encoding the CRISPR effector protein (e.g., a Type V CRISPR effector protein of Section 5.2.2 (Cas Proteins));
- a polynucleotide encoding the guide molecule (e.g., a crRNA of Section 5.2.1 (crRNA)).
- In some embodiments, the polynucleotide encoding the CRISPR effector protein is an mRNA molecule.
- In some embodiments, the polynucleotide encoding the guide molecule is an mRNA molecule.
- In some embodiments, the polynucleotide encoding the CRISPR effector protein is a circular RNA molecule. In some embodiments, the polynucleotide encoding the guide molecule is a circular RNA molecule.
5.4.1 Lipid Nanoparticles
- In some embodiments, the mRNA encoding the CRISPR effector protein and the mRNA encoding the guide molecule are present in a delivery system selected from the group consisting of a lipid nanoparticle, a liposome, an exosome, a microvesicle, and a gene gun.
- In some embodiments, the nucleic acid molecules described herein are formulated for in vitro and in vivo delivery.
- In some embodiments, the nucleic acid molecule is formulated into a lipid-containing composition.
- In some embodiments, the lipid-containing composition forms lipid nanoparticles that enclose the nucleic acid molecule within a lipid shell.
- The lipid shells protect the nucleic acid molecules from degradation.
- The lipid nanoparticles also facilitate transport of the enclosed nucleic acid molecules into intracellular compartments and/or machinery to exert an intended therapeutic or prophylactic function.
- In some embodiments, nucleic acids, when present in the lipid nanoparticles, are resistant in aqueous solution to degradation by a nuclease.
- Lipid nanoparticles comprising nucleic acids and methods for their preparation are known in the art, such as those disclosed in, e.g., U.S. Patent Publication No. 2004/0142025, U.S. Patent Publication No. 2007/0042031, PCT Publication No. WO 2017/004143, PCT Publication No. WO 2015/199952, PCT Publication No. WO 2013/016058, and PCT Publication No. WO 2013/086373, the full disclosures of each of which are herein incorporated by reference in their entirety for all purposes.
- In some embodiments, the largest dimension of a nanoparticle composition provided herein is 1 μm or shorter (e.g., ≤1 μm, ≤900 nm, ≤800 nm, ≤700 nm, ≤600 nm, ≤500 nm, ≤400 nm, ≤300 nm, ≤200 nm, ≤175 nm, ≤150 nm, ≤125 nm, ≤100 nm, ≤75 nm, ≤50 nm, or shorter), such as when measured by dynamic light scattering (DLS), transmission electron microscopy, scanning electron microscopy, or another method.
- In some embodiments, the lipid nanoparticle provided herein has at least one dimension in the range of about 40 nm to about 200 nm. In one embodiment, the at least one dimension is in the range of about 40 nm to about 100 nm.
- Nanoparticle compositions that can be used in connection with the present disclosure include, for example, lipid nanoparticles (LNPs), nanolipoprotein particles, liposomes, lipid vesicles, and lipoplexes.
- In some embodiments, nanoparticle compositions are vesicles including one or more lipid bilayers.
- In some embodiments, a nanoparticle composition includes two or more concentric bilayers separated by aqueous compartments. Lipid bilayers may be functionalized and/or crosslinked to one another. Lipid bilayers may include one or more ligands, proteins, or channels.
- In some embodiments, the nanoparticle compositions described herein comprise a lipid component including at least one lipid, such as a compound according to one of Formulae (I) to (IV) (and sub-formulas thereof) as described herein.
- For example, a nanoparticle composition may include a lipid component including one of the compounds provided herein.
- Nanoparticle compositions may also include one or more other lipid or non-lipid components, as described below.
- Exemplary charged or ionizable lipids that can form part of the present nanoparticle composition include, but are not limited to, ((4-hydroxybutyl)azanediyl)bis(hexane-6,1-diyl)bis(2-hexyldecanoate) (ALC-0315), 3-(didodecylamino)-N1,N1,4-tridodecyl-1-piperazineethanamine (KL10), N1-[2-(didodecylamino)ethyl]-N1,N4,N4-tridodecyl-1,4-piperazinediethanamine (KL22), 14,25-ditridecyl-15,18,21,24-tetraaza-octatriacontane (KL25), 1,2-dilinoleyloxy-N,N-dimethylaminopropane (DLinDMA), 2,2-dilinoleyl-4-dimethyl
- Additional exemplary charged or ionizable lipids that can form part of the present nanoparticle composition include the lipids (e.g., lipid 5) described in Sabnis et al., “A Novel Amino Lipid Series for mRNA Delivery: Improved Endosomal Escape and Sustained Pharmacology and Safety in Non-human Primates,” Molecular Therapy 26(6) (2018), the entirety of which is incorporated herein by reference.
- In some embodiments, suitable cationic lipids include N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium chloride (DOTMA); N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium chloride (DOTAP); 1,2-dioleoyl-sn-glycero-3-ethylphosphocholine (DOEPC); 1,2-dilauroyl-sn-glycero-3-ethylphosphocholine (DLEPC); 1,2-dimyristoyl-sn-glycero-3-ethylphosphocholine (DMEPC); 1,2-dimyristoleoyl-sn-glycero-3-ethylphosphocholine (14:1); N1-[2-((1S)-1-[(3-aminopropyl)amino]-4-[di(3
- In some embodiments, suitable cationic lipids include lipids with headgroups that are charged at physiological pH, such as primary amines (e.g., DODAG, N',N'-dioctadecyl-N-4,8-diaza-10-aminodecanoylglycine amide) and guanidinium headgroups (e.g., bis-guanidinium-spermidine-cholesterol (BGSC), bis-guanidiniumtren-cholesterol (BGTC), PONA, and (R)-5-guanidinopentane-1,2-diyl dioleate hydrochloride (DOPen-G)).
- In some embodiments, the cationic lipid is (R)-5-(dimethylamino)pentane-1,2-diyl dioleate hydrochloride (DODAPen-Cl).
- In some embodiments, the cationic lipid is a particular enantiomer or the racemic form, and includes the various salt forms of a cationic lipid as above (e.g., chloride or sulfate).
- In some embodiments, the cationic lipid is N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium chloride (DOTAP-Cl) or N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium sulfate (DOTAP-Sulfate).
- In some embodiments, the cationic lipid is an ionizable cationic lipid such as, e.g., dioctadecyldimethylammonium bromide (DDAB); 1,2-dilinoleyloxy-3-dimethylaminopropane (DLinDMA); 2,2-dilinoleyl-4-(2-dimethylaminoethyl)-[1,3]-dioxolane (DLin-KC2-DMA); heptatriaconta-6,9,28,31-tetraen-19-yl 4-(dimethylamino)butanoate (DLin-MC3-DMA); 1,2-dioleoyloxy-3-dimethylaminopropane (DODAP); 1,2-dioleyloxy-3-dimethylaminopropane (DODMA); or morpholinocholesterol (Mo-CHOL).
- In some embodiments, the charged or ionizable lipid that can form part of the present nanoparticle composition is a lipid including a cyclic amine group.
- Additional cationic lipids that are suitable for the formulations and methods disclosed herein include those described in WO2015199952, WO2016176330, WO2015011633, and WO2018/081480, the entire contents of each of which are hereby incorporated by reference.
- In some embodiments, the lipid component of a nanoparticle composition can include one or more polymer conjugated lipids, such as PEGylated lipids (PEG lipids).
- Including a polymer conjugated lipid component in a nanoparticle composition can improve colloidal stability and/or reduce protein adsorption to the nanoparticles.
- Exemplary polymer conjugated lipids that can be used in connection with the present disclosure include, but are not limited to, 2-[(polyethylene glycol)-2000]-N,N-ditetradecylacetamide (ALC-0159), PEG-modified phosphatidylethanolamines, PEG-modified phosphatidic acids, PEG-modified ceramides, PEG-modified dialkylamines, PEG-modified diacylglycerols, PEG-modified dialkylglycerols, and mixtures thereof.
- For example, a PEG lipid may be PEG-c-DOMG, PEG-DMG, PEG-DLPE, PEG-DMPE, PEG-DPPC, PEG-DSPE, Ceramide-PEG2000, or Chol-PEG2000.
- In some embodiments, the polymer conjugated lipid is a pegylated lipid.
- For example, some embodiments include a pegylated diacylglycerol (PEG-DAG) such as 1-(monomethoxy-polyethyleneglycol)-2,3-dimyristoylglycerol (PEG-DMG), a pegylated phosphatidylethanolamine (PEG-PE), a PEG succinate diacylglycerol (PEG-S-DAG) such as 4-O-(2',3'-di(tetradecanoyloxy)propyl-1-O-(ω-methoxy(polyethoxy)ethyl)butanedioate (PEG-S-DMG), a pegylated ceramide (PEG-cer), or a PEG dialkoxypropylcarbamate such as ω-methoxy(polyethoxy)ethy
- In some embodiments, the polymer conjugated lipid is present in a concentration ranging from 1.0 to 2.5 molar percent. In one embodiment, the polymer conjugated lipid is present in a concentration of about 1.7 molar percent. In one embodiment, the polymer conjugated lipid is present in a concentration of about 1.5 molar percent.
- In some embodiments, the molar ratio of cationic lipid to the polymer conjugated lipid ranges from about 35:1 to about 25:1. In one embodiment, the molar ratio of cationic lipid to polymer conjugated lipid ranges from about 100:1 to about 20:1.
- In some embodiments, the pegylated lipid has the following formula:
- In some embodiments, the lipid component of a nanoparticle composition can include one or more phospholipids, such as one or more (poly)unsaturated lipids.
- Phospholipids may assemble into one or more lipid bilayer structures.
- In some embodiments, the neutral lipid is 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC).
- In some embodiments, the neutral lipid is selected from DSPC, DPPC, DMPC, DOPC, POPC, DOPE, and SM.
- Also provided herein is a nanoparticle composition comprising a cationic or ionizable lipid compound provided herein, a nucleic acid molecule, and one or more excipients.
- In some embodiments, the cationic or ionizable lipid compound comprises one or more ionizable lipid compounds described herein.
- In some embodiments, the one or more excipients are selected from phospholipids, steroids, and polymer conjugated lipids.
- In some embodiments, the therapeutic agent is encapsulated within or associated with the lipid nanoparticle.
- In some embodiments, provided herein is a nanoparticle composition comprising:
- In some embodiments, the ionizable lipid is ((4-hydroxybutyl)azanediyl)bis(hexane-6,1-diyl)bis(2-hexyldecanoate) (ALC-0315).
- In some embodiments, the phospholipid is DSPC.
- In some embodiments, the steroid is cholesterol.
- In some embodiments, the PEG conjugated lipid is 2-[(polyethylene glycol)-2000]-N,N-ditetradecylacetamide (ALC-0159).
- In some embodiments, provided herein is a nanoparticle composition comprising:
- As used herein, mol percent refers to a component’s molar percentage relative to the total moles of all lipid components in the LNP (i.e., the total moles of the cationic lipid(s), the neutral lipid, the steroid, and the polymer conjugated lipid).
- In some embodiments, the lipid nanoparticle comprises from 41 to 49 mol percent, from 41 to 48 mol percent, from 42 to 48 mol percent, from 43 to 48 mol percent, from 44 to 48 mol percent, from 45 to 48 mol percent, from 46 to 48 mol percent, or from 47.2 to 47.8 mol percent of the cationic lipid. In one embodiment, the lipid nanoparticle comprises about 47.0, 47.1, 47.2, 47.3, 47.4, 47.5, 47.6, 47.7, 47.8, 47.9, or 48.0 mol percent of the ionizable lipid.
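The mol percent definition can be applied directly to the moles of each lipid in a formulation. A minimal sketch; the function name and the component amounts in the example are hypothetical, chosen only so that the cationic lipid lands inside the 47.2 to 47.8 mol percent range recited above.

```python
import math


def mol_percent(moles: dict[str, float]) -> dict[str, float]:
    """Molar percentage of each lipid relative to total lipid moles."""
    total = sum(moles.values())
    if total <= 0:
        raise ValueError("total lipid moles must be positive")
    return {name: 100.0 * n / total for name, n in moles.items()}


if __name__ == "__main__":
    # Hypothetical amounts in micromoles: cationic, neutral, steroid, PEG lipid.
    lipids = {"cationic": 4.75, "neutral": 1.0, "steroid": 4.07, "peg": 0.18}
    pct = mol_percent(lipids)
    print({k: round(v, 2) for k, v in pct.items()})
    print("cationic within 47.2-47.8 mol%:", 47.2 <= pct["cationic"] <= 47.8)
    assert math.isclose(sum(pct.values()), 100.0)
```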
- In some embodiments, the therapeutic agent to lipid ratio in the LNP is expressed as N/P, where N represents the moles of cationic lipid and P represents the moles of phosphate present in the nucleic acid backbone.
- In some embodiments, N/P ranges from 6:1 to 20:1 or from 2:1 to 12:1.
- Exemplary N/P ratios include about 3:1, about 6:1, about 12:1, and about 22:1.
- In some embodiments, provided herein is a lipid nanoparticle comprising:
- i) a cationic lipid having an effective pKa greater than 6.0; ii) from 5 to 15 mol percent of a neutral lipid;
- In some embodiments, mol percent is determined based on the total moles of lipid present in the lipid nanoparticle.
- The cationic lipid can be any of a number of lipid species that carry a net positive charge at a selected pH, such as physiological pH. Exemplary cationic lipids are described herein.
- In some embodiments, the cationic lipid has a pKa greater than 6.25.
- In some embodiments, the cationic lipid has a pKa greater than 6.5.
- In some embodiments, the cationic lipid has a pKa greater than 6.1, greater than 6.2, greater than 6.3, greater than 6.35, greater than 6.4, greater than 6.45, greater than 6.55, greater than 6.6, greater than 6.65, or greater than 6.7.
- In some embodiments, the lipid nanoparticle comprises from 40 to 45 mol percent of the cationic lipid. In one embodiment, the lipid nanoparticle comprises from 45 to 50 mol percent of the cationic lipid.
- In some embodiments, the molar ratio of the cationic lipid to the neutral lipid ranges from about 2:1 to about 8:1. In one embodiment, the lipid nanoparticle comprises from 5 to 10 mol percent of the neutral lipid.
- Exemplary anionic lipids include, but are not limited to, phosphatidylglycerol, dioleoylphosphatidylglycerol (DOPG), dipalmitoylphosphatidylglycerol (DPPG), and 1,2-distearoyl-sn-glycero-3-phospho-(1'-rac-glycerol) (DSPG).
- In some embodiments, the lipid nanoparticle comprises from 1 to 10 mol percent of the anionic lipid. In one embodiment, the lipid nanoparticle comprises from 1 to 5 mol percent of the anionic lipid. In one embodiment, the lipid nanoparticle comprises from 1 to 9 mol percent, from 1 to 8 mol percent, from 1 to 7 mol percent, or from 1 to 6 mol percent of the anionic lipid. In one embodiment, the molar ratio of anionic lipid to neutral lipid ranges from 1:1 to 1:10.
- In some embodiments, the steroid is cholesterol. In one embodiment, the molar ratio of the cationic lipid to cholesterol ranges from about 5:1 to 1:1. In one embodiment, the lipid nanoparticle comprises from 32 to 40 mol percent of the steroid.
- In some embodiments, the sum of the mol percent of neutral lipid and mol percent of anionic lipid ranges from 5 to 15 mol percent. In one embodiment, the sum of the mol percent of neutral lipid and mol percent of anionic lipid ranges from 7 to 12 mol percent.
- In some embodiments, the molar ratio of anionic lipid to neutral lipid ranges from 1:1 to 1:10. In one embodiment, the sum of the mol percent of neutral lipid and mol percent of steroid ranges from 35 to 45 mol percent.
- In some embodiments, the lipid nanoparticle comprises:
- In some embodiments, the lipid nanoparticle comprises from 1.0 to 2.5 mol percent of the polymer conjugated lipid. In one embodiment, the polymer conjugated lipid is present in a concentration of about 1.5 mol percent.
- In some embodiments, the neutral lipid is present in a concentration ranging from 5 to 15 mol percent, 7 to 13 mol percent, or 9 to 11 mol percent. In one embodiment, the neutral lipid is present in a concentration of about 9.5, 10, or 10.5 mol percent. In one embodiment, the molar ratio of the cationic lipid to the neutral lipid ranges from about 4.1:1.0 to about 4.9:1.0, from about 4.5:1.0 to about 4.8:1.0, or from about 4.7:1.0 to 4.8:1.0.
- In some embodiments, the molar ratio of cationic lipid to polymer conjugated lipid ranges from about 100:1 to about 20:1. In one embodiment, the molar ratio of cationic lipid to the polymer conjugated lipid ranges from about 35:1 to about 25:1.
- In some embodiments, the lipid nanoparticle has a mean diameter ranging from 50 nm to 100 nm, or from 60 nm to 85 nm.
- In some embodiments, the composition comprises a cationic lipid provided herein, DSPC, cholesterol, a PEG-lipid, and mRNA.
- In some embodiments, the cationic lipid provided herein, DSPC, cholesterol, and PEG-lipid are at a molar ratio of about 50:10:38.5:1.5.
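To see how the 50:10:38.5:1.5 molar ratio relates to the ranges recited in the embodiments, the ratio can be converted to mol percents and pairwise molar ratios and compared against a subset of those ranges. This is a sketch only: bounds are treated as inclusive, ‘about’ is ignored, the helper names are hypothetical, and each range comes from a separate embodiment, so the check illustrates consistency rather than a requirement that all ranges apply at once.

```python
# (low, high) bounds drawn from individual embodiments recited above; inclusive.
RANGES = {
    "cationic_mol_pct": (45.0, 50.0),   # from 45 to 50 mol percent cationic lipid
    "neutral_mol_pct": (5.0, 15.0),     # from 5 to 15 mol percent neutral lipid
    "steroid_mol_pct": (32.0, 40.0),    # from 32 to 40 mol percent steroid
    "peg_mol_pct": (1.0, 2.5),          # from 1.0 to 2.5 mol percent PEG lipid
    "cationic_to_neutral": (2.0, 8.0),  # about 2:1 to about 8:1
    "cationic_to_steroid": (1.0, 5.0),  # about 5:1 to 1:1 (cationic:cholesterol)
    "cationic_to_peg": (25.0, 35.0),    # about 35:1 to about 25:1
}


def composition_report(cationic: float, neutral: float,
                       steroid: float, peg: float) -> dict[str, float]:
    """Mol percents and pairwise molar ratios for a four-lipid LNP."""
    total = cationic + neutral + steroid + peg
    return {
        "cationic_mol_pct": 100.0 * cationic / total,
        "neutral_mol_pct": 100.0 * neutral / total,
        "steroid_mol_pct": 100.0 * steroid / total,
        "peg_mol_pct": 100.0 * peg / total,
        "cationic_to_neutral": cationic / neutral,
        "cationic_to_steroid": cationic / steroid,
        "cationic_to_peg": cationic / peg,
    }


def check_ranges(report: dict[str, float]) -> dict[str, bool]:
    """True for each metric that falls inside its inclusive range."""
    return {key: lo <= report[key] <= hi for key, (lo, hi) in RANGES.items()}


if __name__ == "__main__":
    report = composition_report(50, 10, 38.5, 1.5)
    for key, ok in check_ranges(report).items():
        print(f"{key:22s} {report[key]:7.2f}  {'within' if ok else 'outside'}")
```

Note that the same 50 mol percent cationic lipid satisfies the 45 to 50 mol percent embodiment but not the 41 to 49 mol percent one, which is why the ranges are read individually rather than combined.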
- Nanoparticle compositions can be designed for one or more specific applications or targets.
- For example, a nanoparticle composition can be designed to deliver a therapeutic and/or prophylactic agent, such as an RNA, to a particular cell, tissue, organ, or system, or group thereof, in a mammal’s body.
- Physicochemical properties of nanoparticle compositions can be altered to increase selectivity for particular bodily targets. For instance, particle sizes can be adjusted based on the fenestration sizes of different organs.
- The therapeutic and/or prophylactic agent included in a nanoparticle composition can also be selected based on the desired delivery target or targets.
- For example, a therapeutic and/or prophylactic agent can be selected for a particular indication, condition, disease, or disorder and/or for delivery to a particular cell, tissue, organ, or system, or group thereof (e.g., localized or specific delivery).
- In some embodiments, a nanoparticle composition can include an mRNA encoding a polypeptide of interest that is capable of being translated within a cell to produce the polypeptide of interest.
- Such a composition can be designed to be specifically delivered to a particular organ.
- In some embodiments, a composition can be designed to be specifically delivered to a mammalian liver.
- The amount of a therapeutic and/or prophylactic agent in a nanoparticle composition can depend on the size, composition, desired target and/or application, or other properties of the nanoparticle composition, as well as on the properties of the therapeutic and/or prophylactic agent.
- For example, the amount of an RNA useful in a nanoparticle composition can depend on the size, sequence, and other characteristics of the RNA.
- The relative amounts of a therapeutic and/or prophylactic agent and other elements (e.g., lipids) in a nanoparticle composition can also vary.
- In some embodiments, a nanoparticle composition includes one or more RNAs, and the one or more RNAs, lipids, and amounts thereof can be selected to provide a specific N:P ratio.
- The N:P ratio of the composition refers to the molar ratio of nitrogen atoms in one or more lipids to phosphate groups in an RNA. In some embodiments, a lower N:P ratio is selected.
- In some embodiments, the one or more RNAs, lipids, and amounts thereof can be selected to provide an N:P ratio from about 2:1 to about 30:1, such as 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 12:1, 14:1, 16:1, 18:1, 20:1, 22:1, 24:1, 26:1, 28:1, or 30:1.
- In some embodiments, the N:P ratio can be from about 2:1 to about 8:1.
- In some embodiments, the N:P ratio is from about 5:1 to about 8:1.
- For example, the N:P ratio may be about 5.0:1, about 5.5:1, about 5.67:1, about 6.0:1, about 6.5:1, or about 7.0:1.
- In some embodiments, the N:P ratio may be about 5.67:1.
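An N:P target such as about 5.67:1 can be turned into a lipid-to-RNA mass calculation. This sketch rests on two labeled assumptions that are not from the source: each ionizable lipid molecule contributes one protonatable nitrogen (`amines_per_lipid=1`, which makes the "moles of cationic lipid" and "nitrogen atoms" definitions coincide), and each nucleotide contributes one phosphate with an average residue mass of roughly 330 g/mol. The lipid molecular weight in the example is a placeholder; the function names are hypothetical.

```python
AVG_NUCLEOTIDE_MASS = 330.0  # g/mol per nucleotide (approximation; assumption)


def n_to_p_ratio(lipid_ug: float, lipid_mw: float, rna_ug: float,
                 amines_per_lipid: int = 1,
                 nt_mass: float = AVG_NUCLEOTIDE_MASS) -> float:
    """Molar ratio of lipid nitrogens (N) to RNA phosphates (P).

    Micrograms divided by g/mol gives micromoles, so units cancel in the ratio.
    """
    n_umol = lipid_ug / lipid_mw * amines_per_lipid
    p_umol = rna_ug / nt_mass
    return n_umol / p_umol


def lipid_mass_for_np(target_np: float, rna_ug: float, lipid_mw: float,
                      amines_per_lipid: int = 1,
                      nt_mass: float = AVG_NUCLEOTIDE_MASS) -> float:
    """Ionizable-lipid mass (ug) needed to reach a target N:P for a given RNA mass."""
    p_umol = rna_ug / nt_mass
    return target_np * p_umol / amines_per_lipid * lipid_mw


if __name__ == "__main__":
    lipid_mw = 750.0  # hypothetical molecular weight, g/mol
    mass = lipid_mass_for_np(5.67, rna_ug=100.0, lipid_mw=lipid_mw)
    print(f"{mass:.1f} ug ionizable lipid per 100 ug RNA")
    print(f"check N:P = {n_to_p_ratio(mass, lipid_mw, 100.0):.2f}")
```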
- The physical properties of a nanoparticle composition can depend on its components.
- For example, a nanoparticle composition including cholesterol as a structural lipid can have different characteristics from a nanoparticle composition that includes a different structural lipid.
- Similarly, the characteristics of a nanoparticle composition can depend on the absolute or relative amounts of its components. For instance, a nanoparticle composition including a higher molar fraction of a phospholipid may have different characteristics than a nanoparticle composition including a lower molar fraction of a phospholipid. Characteristics may also vary depending on the method and conditions of preparation of the nanoparticle composition.
- Nanoparticle compositions may be characterized by a variety of methods. For example, microscopy (e.g., transmission electron microscopy or scanning electron microscopy) may be used to examine the morphology and size distribution of a nanoparticle composition. Dynamic light scattering or potentiometry (e.g., potentiometric titrations) may be used to measure zeta potentials. Dynamic light scattering may also be used to determine particle sizes. Instruments such as the Zetasizer Nano ZS (Malvern Instruments Ltd, Malvern, Worcestershire, UK) may also be used to measure multiple characteristics of a nanoparticle composition, such as particle size, polydispersity index, and zeta potential.
- the mean size of a nanoparticle composition can be between tens of nanometers and hundreds of nanometers.
- the mean size can be from about 40 nm to about 150 nm, such as about 40 nm, 45 nm, 50 nm, 55 nm, 60 nm, 65 nm, 70 nm, 75 nm, 80 nm, 85 nm, 90 nm, 95 nm, 100 nm, 105 nm, 110 nm, 115 nm, 120 nm, 125 nm, 130 nm, 135 nm, 140 nm, 145 nm, or 150 nm.
- the mean size of a nanoparticle composition can be from about 50 nm to about 100 nm, from about 50 nm to about 90 nm, from about 50 nm to about 80 nm, from about 50 nm to about 70 nm, from about 50 nm to about 60 nm, from about 60 nm to about 100 nm, from about 60 nm to about 90 nm, from about 60 nm to about 80 nm, from about 60 nm to about 70 nm, from about 70 nm to about 100 nm, from about 70 nm to about 90 nm, from about 70 nm to about 80 nm, from about 80 nm to about 100 nm, from about 80 nm to about 90 nm, or from about 90 nm to about 100 nm.
- the mean size of a nanoparticle composition can be from about 70 nm to about 100 nm. In some embodiments, the mean size can be about 80
- the zeta potential of a nanoparticle composition can be used to indicate the electrokinetic potential of the composition.
- the zeta potential can describe the surface charge of a nanoparticle composition.
- Nanoparticle compositions with relatively low charges, positive or negative, are generally desirable, as more highly charged species can interact undesirably with cells, tissues, and other elements in the body.
- the efficiency of encapsulation of a therapeutic and/or prophylactic agent describes the amount of therapeutic and/or prophylactic agent that is encapsulated or otherwise associated with a nanoparticle composition after preparation, relative to the initial amount provided.
- the encapsulation efficiency is desirably high (e.g., close to 100%) .
- the encapsulation efficiency can be measured, for example, by comparing the amount of therapeutic and/or prophylactic agent in a solution containing the nanoparticle composition before and after breaking up the nanoparticle composition with one or more organic solvents or detergents. Fluorescence can be used to measure the amount of free therapeutic and/or prophylactic agent (e.g., RNA) in a solution.
- the encapsulation efficiency of a therapeutic and/or prophylactic agent can be at least 50%, for example 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%.
- the encapsulation efficiency can be at least 80%. In certain embodiments, the encapsulation efficiency can be at least 90%.
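The fluorescence comparison described above can be expressed as a short calculation. This is a sketch assuming a dye (for example a RiboGreen-type reagent) whose signal is proportional to accessible RNA and is unaffected by the detergent; the function name and the readings in the example are hypothetical.

```python
def encapsulation_efficiency(f_intact, f_lysed, f_blank=0.0):
    """Percent of RNA encapsulated, from a dye that only reports accessible RNA.

    f_intact: fluorescence of the intact nanoparticle sample (free RNA only)
    f_lysed:  fluorescence after detergent disruption (free + encapsulated RNA)
    f_blank:  background fluorescence of buffer plus dye
    """
    free = f_intact - f_blank
    total = f_lysed - f_blank
    if total <= 0:
        raise ValueError("total RNA signal must be positive")
    # Encapsulated fraction = (total - free) / total
    return 100.0 * (total - free) / total


# Hypothetical plate-reader values, not data from the text.
print(round(encapsulation_efficiency(120.0, 1500.0, f_blank=20.0), 1))
```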
- a nanoparticle composition can optionally comprise one or more coatings.
- a nanoparticle composition can be formulated in a capsule, film, or tablet having a coating.
- a capsule, film, or tablet including a composition described herein can have any useful size, tensile strength, hardness, or density.
5.4.2 Vectors
- the CRISPR-Cas12 systems described herein, or components thereof, nucleic acid molecules thereof, or nucleic acid molecules encoding or providing components thereof, can be delivered by various delivery systems such as vectors, e.g., plasmids, viral delivery vectors, such as adeno-associated viruses (AAV) , lentiviruses, adenoviruses, and other viral vectors, or methods, such as nucleofection or electroporation of ribonucleoprotein complexes consisting of Type V-I effectors and their cognate RNA guide or guides.
- the proteins and one or more RNA guides can be packaged into one or more vectors, e.g., plasmids or viral vectors.
- the vectors can be delivered to the tissue of interest by, e.g., intramuscular injection, intravenous administration, transdermal administration, intranasal administration, oral administration, or mucosal administration.
- Such delivery may be either via a single dose or multiple doses.
- the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector choices, the target cells, organisms, tissues, the general conditions of the subject to be treated, the degrees of transformation/modification sought, the administration routes, the administration modes, the types of transformation/modification sought, etc.
- the delivery is via adeno-associated viruses (AAV), e.g., AAV2, AAV8, or AAV9, which can be administered in a single dose containing at least 1 × 10^5 particles (also referred to as particle units, pu) of adenoviruses or adeno-associated viruses.
- the dose is at least about 1 × 10^6 particles, at least about 1 × 10^7 particles, at least about 1 × 10^8 particles, or at least about 1 × 10^9 particles of the adeno-associated viruses.
- the delivery methods and the doses are described, e.g., in WO 2016205764 and U.S. Pat. No. 8,454,972, both of which are incorporated herein by reference in their entirety.
- the Type V-I CRISPR-Cas effector proteins described herein enable greater versatility in packaging the effector and RNA guides with the appropriate control sequences (e.g., promoters) required for efficient and cell-type-specific expression.
- the delivery is via a recombinant adeno-associated virus (rAAV) vector.
- a modified AAV vector may be used for delivery.
- Modified AAV vectors can be based on one or more of several capsid types, including AAV1, AAV2, AAV5, AAV6, AAV8, AAV8.2, AAV9, AAVrh10, modified AAV vectors (e.g., modified AAV2, modified AAV3, modified AAV6), and pseudotyped AAV (e.g., AAV2/8, AAV2/5, and AAV2/6).
- Exemplary AAV vectors and techniques that may be used to produce rAAV particles are known in the art (see, e.g., Aponte-Ubillus et al. (2016) Appl. Microbiol. Biotechnol. 102(3): 1045-54; Zhong et al. (2012) J. Genet. Syndr. Gene Ther. S1: 008; West et al. (1987) Virology 160: 38-47; Tratschin et al. (1985) Mol. Cell. Biol. 5: 3251-60; U.S. Pat. Nos. 4,797,368 and 5,173,414; and International Publication Nos. WO 2015/054653 and WO 93/24641, each of which is incorporated by reference).
- the delivery is via plasmids.
- the dosage can be a sufficient number of plasmids to elicit a response.
- suitable quantities of plasmid DNA in plasmid compositions can be from about 0.1 to about 2 mg.
- Plasmids will generally include (i) a promoter; (ii) a sequence encoding a nucleic acid-targeting CRISPR effector protein, operably linked to the promoter; (iii) a selectable marker; (iv) an origin of replication; and (v) a transcription terminator downstream of and operably linked to (ii) .
- nucleic acid sequences used in a vector for delivering CRISPR effector proteins are provided in Table 5.
- CRISPR cell penetrating peptides
- a cell penetrating peptide (CPP) is linked to the CRISPR effector proteins.
- the CRISPR effector proteins and/or RNA guides are coupled to one or more CPPs to transport them effectively into cells (e.g., plant protoplasts).
- the CRISPR effector proteins and/or RNA guide(s) are encoded by one or more circular or non-circular DNA molecules that are coupled to one or more CPPs for cell delivery.
- CPPs are short peptides of fewer than 35 amino acids, derived either from proteins or from chimeric sequences, that are capable of transporting biomolecules across the cell membrane in a receptor-independent manner.
- CPPs can be cationic peptides, peptides having hydrophobic sequences, amphipathic peptides, peptides having proline-rich and anti-microbial sequences, and chimeric or bipartite peptides.
- CPPs include, e.g., Tat (a nuclear transcriptional activator protein required for viral replication by HIV type 1), penetratin, Kaposi fibroblast growth factor (FGF) signal peptide sequence, integrin β3 signal peptide sequence, polyarginine peptide Args sequence, guanine-rich molecular transporters, and sweet arrow peptide.
- see, e.g., Hallbrink et al., “Prediction of cell-penetrating peptides,” Methods Mol.
- Delivery of the CRISPR-Cas system as a ribonucleoprotein complex by electroporation or nucleofection, in which purified CRISPR effector protein is pre-incubated with an RNA guide and electroporated (or nucleofected) into cells of interest, is another efficient method of introducing the CRISPR system into cells for gene editing. This approach is particularly useful for ex vivo genome editing and the development of cellular therapies; such methods are described in Roth et al., “Reprogramming human T cell function and specificity with non-viral genome targeting,” Nature, 2018 July; 559(7714): 405-409.
- kits for carrying out the various methods of the disclosure utilizing the CRISPR-Cas systems described herein comprise (a) one or more nucleic acids encoding a CRISPR effector protein and a crRNA, and/or (b) a ribonucleoprotein complex of a CRISPR effector protein and a crRNA.
- the kit comprises a Type V CRISPR effector protein (e.g. a Type V CRISPR effector protein of Section 5.2.2) and a crRNA (e.g. a crRNA of Section 5.2.1) .
- Kits of this disclosure also optionally include additional reagents, including one or more of a reaction buffer, a wash buffer, one or more control materials (e.g., a substrate or a nucleic acid encoding a CRISPR system component) , etc.
- a kit of the present disclosure also optionally includes instructions for performing a method of this disclosure using materials provided in the kit.
- the instructions are provided in physical form, e.g., as a printed document physically packaged with another item of the kit, and/or in digital form, e.g., a digitally published document downloadable from a website or provided on computer readable media.
6.
- the inventors analyzed the genome and metagenome of an uncultured organism and identified a new Cas protein through redundancy removal, protein clustering, and other analyses. BLAST analysis showed that the Cas protein had low sequence similarity to previously reported Cas protein sequences.
- the newly identified Cas protein was named CasY7 in this invention.
- the amino acid sequence of the CasY7 protein is shown as in Table 4 and also provided below, and the nucleotide sequence encoding the CasY7 protein (after human codon optimization) is shown as in Table 5.
- CasY7 amino acid sequence. Nucleotide sequence encoding the CasY7 protein:
- the DNA sequence of the direct repeat (DR) sequence corresponding to the CasY7 protein is: CAAGTTGAATCCGTCTATAACTGACGG (SEQ ID NO: 437) .
- the inventors further analyzed the RNA secondary structure of the DR sequence in the pre-crRNA using RNAfold; the results are shown in FIG. 3. The PAM corresponding to CasY7 was determined to be 5'-TTN, where N represents A, T, C, or G.
- the crRNA sequence of the CasY7 protein comprises spacer sequences and direct repeat (DR) sequences.
- the CasY7 of the present invention belongs to the Cas12 protein family.
6.2
- the TTR gene was selected as the target, and a spacer sequence was designed based on the target sequence of the TTR gene: GCATCTCCCCATTCCATGAG (SEQ ID NO: 435) .
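Spacer selection such as the TTR design above depends on the 5'-TTN PAM reported for CasY7. The sketch below lists candidate protospacers on both strands. It assumes, by analogy with other Cas12a-type effectors rather than from the text, that the PAM lies immediately 5' of the protospacer on the same strand, and it uses the 20-nt spacer length of the TTR example; the flanking bases in the demo are synthetic.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ttn_sites(seq, spacer_len=20):
    """List (strand, pam_index, protospacer) for every 5'-TTN PAM on either strand.

    Assumes a Cas12a-like layout: PAM immediately 5' of the protospacer on the
    same strand. pam_index is 0-based on the strand as scanned, so positions on
    the '-' strand refer to the reverse-complement string.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # Zero-width lookahead so overlapping PAMs (e.g. in TTTN) are all reported.
        for m in re.finditer(r"(?=TT[ACGT])", s):
            start = m.start() + 3
            if start + spacer_len <= len(s):
                hits.append((strand, m.start(), s[start:start + spacer_len]))
    return hits


# Synthetic context: a TTA PAM placed 5' of the TTR target sequence from the text.
demo = "ACTTA" + "GCATCTCCCCATTCCATGAG" + "GG"
for hit in find_ttn_sites(demo):
    print(hit)
```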
- sgRNA sequences targeting the TTR gene were designed as shown in Table 6:
- the T7 promoter and rrnB T2 terminator were added to the 5' and 3' ends, respectively, of the sgRNA sequences for both CasY7 and LbCpf1.
- the sequence for CasY7-TTR-sgRNA1 expression is provided as follows: The single underlined sequence represents the CasY7 DR sequence, the double underlined sequence represents the spacer sequence, the italicized sequence represents the T7 promoter, the wavy underlined sequence represents the rrnB T2 terminator sequence, the spacer sequence and rrnB T2 terminator sequence are separated by a linker sequence, the dashed sequence represents the MfeI enzyme cutting site, and the bold sequence represents the MluI enzyme cutting site.
- the sequence for LbCpf1-TTR-sgRNA1 expression is provided as follows:
- the single underlined sequence represents the LbCpf1 DR sequence
- the double underlined sequence represents the spacer sequence
- the italicized sequence represents the T7 promoter
- the wavy underlined sequence represents the rrnB T2 terminator sequence
- the spacer sequence and rrnB T2 terminator sequence are separated by a linker sequence
- the dashed sequence represents the MfeI enzyme cutting site
- the bold sequence represents the MluI enzyme cutting site.
- AGC was introduced at the 5' end and ATA at the 3' end of both the synthesized CasY7-TTR-sgRNA1 expression fragment sequence and the LbCpf1-TTR-sgRNA1 expression fragment sequence as protective bases.
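The MfeI/MluI cloning step can be checked in silico before synthesis. The sketch below adds the AGC/ATA protective bases (flanking bases are typically added because many restriction enzymes cut poorly right at a DNA end) and simulates the top strand of the double digest. The insert shown is a hypothetical placeholder, not the actual CasY7 or LbCpf1 cassette, and the helper names are illustrative.

```python
MFEI = "CAATTG"  # MfeI recognition site, cuts C^AATTG
MLUI = "ACGCGT"  # MluI recognition site, cuts A^CGCGT
CUT_OFFSET = 1   # both enzymes cut after the first base of the top strand


def add_protective_bases(fragment, five_prime="AGC", three_prime="ATA"):
    """Flank a fragment with the protective bases described in the text."""
    return five_prime + fragment + three_prime


def find_all(seq, site):
    """0-based start positions of every occurrence of site in seq."""
    hits, i = [], seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits


def double_digest_top_strand(seq):
    """Return the top-strand piece released by an MfeI/MluI double digest.

    Both sites are palindromic, so scanning one strand is enough. Raises if a
    site is missing or occurs more than once (an internal site would cut the insert).
    """
    seq = seq.upper()
    cuts = []
    for name, site in (("MfeI", MFEI), ("MluI", MLUI)):
        pos = find_all(seq, site)
        if len(pos) != 1:
            raise ValueError(f"{name}: expected exactly one site, found {len(pos)}")
        cuts.append(pos[0] + CUT_OFFSET)
    lo, hi = sorted(cuts)
    return seq[lo:hi]


# Hypothetical placeholder insert between the two sites.
frag = add_protective_bases(MFEI + "GGGG" + MLUI)
print(double_digest_top_strand(frag))
```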
- sgRNA expression sequences (CasY7-sgRNA expression sequence and LbCpf1-sgRNA expression sequence) as described in step (1) .
- the sgRNA expression sequences were subjected to double enzyme digestion (MfeI/MluI) treatment, and then inserted into the CasY7 recombinant expression plasmid vector, which had undergone the same double enzyme digestion (MfeI/MluI) treatment, to obtain the recombinant expression plasmid for expressing CasY7 and sgRNA, named CasY7+sgRNA expression plasmid.
- the same method was used to construct the LbCpf1+sgRNA expression plasmid.
- araC-pBAD-CCDB sequence fragment (sequence provided below) .
- This araC-pBAD-CCDB fragment was inserted into the pKESK22 plasmid (Addgene, Plasmid #64857) at positions 1284-1300, resulting in the construction of the Target plasmid.
- the sequence of the Target plasmid is described below, and the plasmid map can be found in FIG. 5.
- the Target plasmid was introduced into DH5α competent cells, and streaked onto LB solid medium containing 50 μg/ml kanamycin. The plates were then incubated overnight at 37°C in an incubator. On the next day, single colonies were picked from the plates and inoculated into 4 ml LB liquid medium containing 50 μg/ml kanamycin (provided by SANGON Biotech, A100408-0100), and incubated at 37°C with shaking at 200 rpm overnight. The next day, 4 ml of the culture was transferred to 400 ml of LB liquid medium containing 50 μg/ml kanamycin in a 2 L flask, and cultured at 37°C with shaking at 200 rpm for 2-3 hours.
- the flask was removed from the shaker and placed on ice for 10-15 minutes. Under sterile conditions, the culture medium was transferred to pre-cooled 500 ml centrifuge tubes and centrifuged at 4°C, 3000 rpm for 8 minutes. The supernatant was discarded, and approximately 200 ml of pre-cooled CaCl2 solution was added to the cell pellet. The mixture was gently pipetted to suspend the cells and incubated on ice for 30 minutes. After incubation, the culture was centrifuged again at 4°C, 3000 rpm for 8 minutes, and the supernatant was discarded.
- the CasY7+sgRNA expression plasmid and LbCpf1+sgRNA expression plasmid were separately introduced into the prepared competent cells from step 2.
- the specific procedure was as follows:
- the competent cells were taken out from -80°C and quickly thawed on ice. After approximately 5 minutes, the cell clumps melted, and the CasY7+sgRNA expression plasmid was added. The mixture was gently mixed by flicking the bottom of the centrifuge tube by hand and allowed to stand on ice for 25 minutes.
- LB agar plates containing 30 μg/ml carbenicillin (provided by SANGON Biotech, A100358-0001) (referred to as C-LB medium) and LB agar plates containing 30 μg/ml carbenicillin and 10 mM L-arabinose (provided by SANGON Biotech, A610071-0100) (referred to as CL-LB medium).
- the Target plasmid carries the PBAD promoter inducible by L-arabinose and the CCDB gene regulated by the PBAD promoter.
- the CCDB gene can express the CCDB toxic protein, which acts as a DNA gyrase inhibitor: it traps DNA gyrase in a complex with cleaved double-stranded DNA, preventing the gyrase from functioning and ultimately leading to cell death.
- the inventors designed a method to detect the editing efficiency in Escherichia coli: Under conditions where L-arabinose is present in the medium, if the CasY7 protein or LbCpf1 protein, guided by sgRNA, can specifically target the target sequence of the TTR gene (gcatctcccc attccatgag) on the Target plasmid and exert cleavage action, the regulatory expression pathway of the CCDB toxic protein by the PBAD promoter will be interrupted, and the host cells will survive because they do not produce the ccdB toxic protein.
- conversely, if the target sequence is not cleaved, expression of the CCDB gene from the L-arabinose-induced PBAD promoter produces the CCDB toxic protein, resulting in death of the Escherichia coli host cells.
- the editing efficiency of the CasY7 protein in targeting cleavage of the TTR target gene in Escherichia coli can be calculated based on the ratio of the number of bacterial clones on the CL-LB medium to the number of bacterial clones on the C-LB medium, as shown in step (2) .
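The survival-based readout reduces to a ratio of colony counts. A minimal sketch, assuming equal volumes of the same transformation are plated on C-LB and CL-LB; the function name and replicate counts are hypothetical placeholders, not results from this work.

```python
from statistics import mean


def ccdb_editing_efficiency(cl_lb_colonies, c_lb_colonies):
    """Survival-based editing estimate, in percent.

    cl_lb_colonies: colonies on carbenicillin + L-arabinose (CCDB induced);
                    only cells whose Target plasmid was cleaved should survive.
    c_lb_colonies:  colonies on carbenicillin alone (all transformants).
    """
    if c_lb_colonies <= 0:
        raise ValueError("need at least one colony on C-LB")
    return 100.0 * cl_lb_colonies / c_lb_colonies


# Hypothetical replicate plate counts as (CL-LB, C-LB) pairs.
replicates = [(45, 50), (38, 47), (52, 55)]
per_plate = [ccdb_editing_efficiency(cl, c) for cl, c in replicates]
print([round(x, 1) for x in per_plate])
print(round(mean(per_plate), 1))
```

Counts from separate plates carry sampling noise, which is why the sketch averages replicate plates rather than relying on a single pair.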
- Conventional crRNA sequences (DR-crRNA) were designed based on the target sequence of the hHAO1 gene: AGAAAUCCGUCCAAAGCUGACGG GGACAGAGGGUCAGCAUGCCAA (SEQ ID NO: 402)
- the crRNA sequence (slDR-spacer) is as follows:
- Double underlined sequence represents the stem-loop sequence
- single underlined sequence represents the DR sequence
- the remaining sequence represents the spacer sequence.
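The DR-crRNA above is simply the DR followed by the spacer, and the slDR-spacer design adds a stem-loop at the 5' end. The sketch below assembles these pieces from SEQ ID NO: 402; the stem-loop used for the slDR example is a hypothetical placeholder (a 4-bp GC stem closed by a GAAA loop), not one of the disclosed sl sequences.

```python
def dna_to_rna(seq):
    """Write a DNA sequence in RNA letters (T -> U), e.g. for a DR given as DNA."""
    return seq.upper().replace("T", "U")


def assemble_crrna(dr, spacer, stem_loop=""):
    """5'-[optional stem-loop]-[DR]-[spacer]-3', matching the slDR-spacer layout."""
    return stem_loop + dr + spacer


# DR and spacer portions of the conventional DR-crRNA (SEQ ID NO: 402).
DR = "AGAAAUCCGUCCAAAGCUGACGG"
SPACER = "GGACAGAGGGUCAGCAUGCCAA"
HYPOTHETICAL_SL = "GCGCGAAAGCGC"  # placeholder stem-loop, not from the disclosure

dr_crrna = assemble_crrna(DR, SPACER)
sl_crrna = assemble_crrna(DR, SPACER, stem_loop=HYPOTHETICAL_SL)
print(len(DR), len(SPACER), len(dr_crrna))  # 23 22 45
print(sl_crrna)
```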
- Chemical modifications were applied to the last three bases at both the 3' and 5' ends of the crRNA sequence (as shown in Table 9).
- DR-crRNA and slDR-crRNA were synthesized chemically by Nanjing GenScript.
- the PCR amplification conditions were as follows: Stage 1: 94°C for 2 minutes; Stage 2: (98°C for 10 seconds; 65°C for 30 seconds; 68°C for 3 minutes) repeated for 30 cycles; Stage 3: 68°C for 5 minutes; then hold at 12°C indefinitely.
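For planning, the cycling program can be written as data and its total hold time summed. This sketch ignores ramp times and the open-ended 12°C hold, so real run times will be somewhat longer; the class and function names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    temp_c: float
    seconds: float


@dataclass
class Stage:
    steps: list = field(default_factory=list)
    cycles: int = 1


# Program from the text; the final 12 °C hold is open-ended and not timed.
PROGRAM = [
    Stage([Step(94, 120)]),                                  # Stage 1
    Stage([Step(98, 10), Step(65, 30), Step(68, 180)], 30),  # Stage 2, 30 cycles
    Stage([Step(68, 300)]),                                  # Stage 3
]


def hold_time_seconds(program):
    """Sum of programmed hold times; ramps between temperatures are ignored."""
    return sum(stage.cycles * sum(s.seconds for s in stage.steps) for stage in program)


print(hold_time_seconds(PROGRAM) / 60)  # 117.0
```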
- the ethanol solution of the lipid carrier was mixed with the buffer of mRNA at a ratio of 1:3 (volume/volume) (with a total lipid-to-mRNA mass ratio of 40:1), and nucleic acid lipid nanoparticles were obtained at a flow rate of 12 ml/min using a microfluidic nanoparticle manufacturing system (NanoAssemblr Ignite, Canada).
- the obtained nucleic acid lipid nanoparticles (LNPs) were immediately diluted 40-fold with 1× DPBS buffer.
- the obtained LNPs were transfected into HepG2 cells at different doses (including 1ng, 2ng, 4ng, 5ng, and other doses) , and all cells were collected after 48 hours.
- Genomic DNA was extracted using a genome extraction kit (Tiangen, DP304-03) .
- the samples were sent to a sequencing service for high-throughput sequencing. Based on the sequencing results, the corresponding editing efficiency was obtained by using the ICE Analysis online analysis program on the Synthego website, inputting the corresponding guide sequences and sequencing results.
- the statistical results of the editing efficiency mediated by various crRNAs are shown in FIGs. 16A-16F.
- Cas ⁇ -2) , and Cas12L_16_70731038 were each cloned into the pcDNA3.1 (+) (Invitrogen, V79020) backbone to obtain expression vectors for each Cas protein.
- the crRNAs were cloned into pGL3-U6-sgRNA-EGFP (Addgene, Plasmid #107721) to obtain expression vectors for each crRNA.
- hHAO1 was selected as the target gene, with the target sequence GGACAGAGGGUCAGCAUGCCAA, and sl sequences with different modifications were selected for experiments with CasY6 and SiCas12i in different groups, as shown in Table 15. In Table 15, m represents methylation of the nucleotides and * represents a phosphorothioate modification.
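Strings written in the Table 15 notation can be parsed mechanically. The sketch assumes that m precedes the nucleotide it modifies and that * follows a nucleotide to mark a phosphorothioate linkage to the next one (a common vendor convention in which m usually denotes 2'-O-methyl). The end_modified helper is hypothetical: it generates one pattern that modifies the three terminal nucleotides at each end, as described for Table 9, and does not reproduce the actual Table 15 sequences.

```python
import re

# One token = optional 'm', one RNA base, optional '*'.
TOKEN = re.compile(r"(m?)([ACGU])(\*?)")


def parse_modified(s):
    """Return (plain sequence, methylated indices, indices followed by a PS linkage)."""
    tokens = TOKEN.findall(s)
    if "".join(m + b + p for m, b, p in tokens) != s:
        raise ValueError("unrecognised characters in modified sequence")
    seq = "".join(b for _, b, _ in tokens)
    methylated = [i for i, (m, _, _) in enumerate(tokens) if m]
    ps_after = [i for i, (_, _, p) in enumerate(tokens) if p]
    return seq, methylated, ps_after


def end_modified(seq, n=3):
    """Hypothetical pattern: methylate n terminal nt at each end and make the
    n terminal linkages at each end phosphorothioate."""
    length = len(seq)
    out = []
    for i, base in enumerate(seq):
        tok = ("m" if i < n or i >= length - n else "") + base
        if i < n or (length - 1 - n <= i < length - 1):
            tok += "*"  # linkage from nucleotide i to i + 1
        out.append(tok)
    return "".join(out)


s = end_modified("GGACAGAGGG")
print(s)
print(parse_modified(s))
```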
- LNP lipids were purchased from Aiweito (Shanghai) Pharmaceutical Technology Co., Ltd.: Yoltech Lipid1 (Compound 10), DSPC (a phosphatidylcholine), cholesterol, and PEG-DMG.
- C05440-T5 mRNA and T5-C05440 mRNA were each dissolved together with the hHAO1-targeting hHAO-crRNA (at a mass ratio of 1:1) in 100 mM enzyme-free citrate buffer at pH 4 (RNA concentration of 0.2 mg/mL).
- the ethanol solution of the lipid carrier was mixed with the mRNA buffer at a ratio of 1:3 (volume/volume) (with a mass ratio of total lipids to mRNA of 40:1), and passed through a microfluidic nano-drug manufacturing system (NanoAssemblr Ignite, Canada) at a flow rate of 12 ml/min to obtain nucleic acid lipid nanoparticles.
- the obtained nucleic acid lipid nanoparticles were immediately diluted 40-fold in 1× DPBS buffer.
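The mixing parameters above (1:3 ethanol:aqueous by volume, 40:1 lipid:RNA by mass, 0.2 mg/mL RNA, 12 ml/min total flow, 40-fold dilution) fix the remaining quantities. The sketch below derives them, assuming the 40:1 ratio applies to all RNA in the aqueous phase and that 40-fold dilution means a 40 times larger final volume; the function and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class MixPlan:
    aqueous_ml_min: float           # aqueous (RNA) channel flow rate
    ethanol_ml_min: float           # ethanol (lipid) channel flow rate
    lipid_mg_per_ml_ethanol: float  # total lipid concentration needed in ethanol
    rna_ug_per_ml_final: float      # RNA concentration after dilution
    ethanol_pct_final: float        # residual ethanol (v/v, %) after dilution


def plan_lnp_mix(rna_mg_per_ml=0.2, eth_to_aq=(1, 3), lipid_to_rna_mass=40.0,
                 total_flow_ml_min=12.0, dilution_fold=40.0):
    eth, aq = eth_to_aq
    aq_frac = aq / (aq + eth)
    eth_frac = eth / (aq + eth)
    aq_flow = total_flow_ml_min * aq_frac
    eth_flow = total_flow_ml_min * eth_frac
    # Lipid delivered per minute must equal lipid_to_rna_mass x RNA delivered per minute.
    lipid_conc = lipid_to_rna_mass * rna_mg_per_ml * aq_flow / eth_flow
    rna_final = rna_mg_per_ml * aq_frac / dilution_fold * 1000.0  # mg/mL -> ug/mL
    eth_final = 100.0 * eth_frac / dilution_fold
    return MixPlan(aq_flow, eth_flow, lipid_conc, rna_final, eth_final)


def ul_for_dose(dose_ng, rna_ug_per_ml):
    """Volume (uL) of diluted LNP carrying dose_ng of RNA; 1 ug/mL equals 1 ng/uL."""
    return dose_ng / rna_ug_per_ml


plan = plan_lnp_mix()
print(plan)
print(round(ul_for_dose(1.0, plan.rna_ug_per_ml_final), 3))
```

With the defaults this gives 9 and 3 ml/min channel flows, 24 mg/mL total lipid in ethanol, and 3.75 µg/mL RNA in 0.625% ethanol after dilution.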
- HepG2 cells purchased from ATCC were cultured in DMEM medium (Gibco, 11965092) supplemented with 10% FBS (v/v) and 1% Penicillin Streptomycin (v/v) (Gibco, 15140122) in a 37°C incubator with 5% CO2.
- Yoltech Lipid1 (Compound 10) is synthesized as follows: 5- [ (2-butyl-1-oxooctyl) oxy] pentanoic acid-7-butyl-21- (10-butyl-3, 9-dioxo-2, 8-dioxahexadecane-1-yl) -19- [3- (diethylamino) propyl] -8-oxo-19-aza-9-oxadocosane-22-yl ester
- the reaction was quenched by adding 500 mL of water and extracted twice with 500 mL of ethyl acetate; the combined organic phases were washed with saturated brine, dried over anhydrous sodium sulfate, concentrated under reduced pressure, and purified by column chromatography to obtain 5-hydroxypentanoic acid benzyl ester (37.00 g, yield 71.2%).
- the reaction was quenched by adding 200 mL of water and extracted twice with 200 mL of dichloromethane; the combined organic phases were washed with brine, dried over anhydrous sodium sulfate, filtered, concentrated, and purified by column chromatography to obtain 2-butyloctanoic acid-18-butyl-8-(hydroxymethyl)-5,11,17-trioxo-6,10,16-trioxatetracosane-1-yl ester (7.80 g, yield 69.9%).
- compound 1-8 600.0 mg, 0.80 mmol, 1.0 eq
- 3-amino-1-propanol 300.0 mg, 3.99 mmol, 5.0 eq
- potassium carbonate 280.0 mg, 2.00 mmol, 2.5 eq
- potassium iodide 130.0 mg, 0.80 mmol, 1.0 eq
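The equivalents listed above can be cross-checked from the masses. In this sketch, the molecular weights of 3-amino-1-propanol, K2CO3, and KI are standard values supplied here rather than taken from the text, and the molecular weight of compound 1-8 is back-calculated from the stated 600.0 mg = 0.80 mmol.

```python
# (mass in mg, molecular weight in g/mol); mg / (g/mol) = mmol
REAGENTS = {
    "compound 1-8": (600.0, 600.0 / 0.80),   # MW back-calculated from the stated 0.80 mmol
    "3-amino-1-propanol": (300.0, 75.11),    # C3H9NO
    "potassium carbonate": (280.0, 138.21),  # K2CO3
    "potassium iodide": (130.0, 166.00),     # KI
}


def equivalents(reagents, limiting):
    """Return {name: (mmol, equivalents relative to the limiting reagent)}."""
    mmol = {name: mass / mw for name, (mass, mw) in reagents.items()}
    ref = mmol[limiting]
    return {name: (n, n / ref) for name, n in mmol.items()}


for name, (n, eq) in equivalents(REAGENTS, "compound 1-8").items():
    print(f"{name:22s} {n:6.2f} mmol  {eq:4.2f} eq")
```

The computed values (about 3.99 mmol and 5.0 eq, 2.03 mmol and 2.5 eq, 0.78 mmol and 1.0 eq) agree with the listed amounts to within rounding.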
- the reaction mixture was concentrated, diluted with water, and extracted with dichloromethane three times; the combined organic phases were washed with saturated brine, dried over anhydrous sodium sulfate, concentrated, and purified by column chromatography to obtain 5-[(2-butyl-1-oxooctyl)oxy]pentanoic acid-7-butyl-21-(10-butyl-3,9-dioxo-2,8-dioxahexadecane-1-yl)-19-[3-(diethylamino)propyl]-8-oxo-19-aza-9-oxadocosane-22-yl ester (132.8 mg, yield 18.8%).
- the Cas12 system containing the sl sequences disclosed in this invention has higher editing activity and potential development value.
Abstract
Provided are novel systems, methods, and compositions for the manipulation of nucleic acids in a targeted fashion. Also provided herein are non-naturally occurring, engineered CRISPR systems, components, and methods for targeted modification of nucleic acids such as DNA. Each system includes one or more protein components and one or more nucleic acid components that together target nucleic acids.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of priority to International Patent Application No.: PCT/CN2024/084836 filed on March 29, 2024, the content of which is incorporated by reference in its entirety.
SEQUENCE LISTING
This application contains an electronic Sequence Listing which has been submitted in XML file format with this application, the entire content of which is incorporated by reference herein in its entirety. The Sequence Listing XML file submitted with this application is entitled “14816-008-228_SEQLISTING.xml”, was created on March 26, 2025, and is 1,038,865 bytes in size.
The present disclosure provides novel CRISPR-Cas systems and uses thereof.
Clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR-associated (Cas) genes, collectively referred to as CRISPR-Cas or CRISPR/Cas system, are now understood to provide immunity to bacteria and archaea against phage infection. CRISPR-Cas systems of adaptive immunity in prokaryotes consist of extremely diverse effectors, non-coding elements, and locus structures that can be engineered and used for applications such as gene editing, target detection, and disease treatment.
CRISPR-Cas systems can be broadly classified into two classes: Class 1 systems are composed of multiple effector proteins that together form a complex around a crRNA, and Class 2 systems consist of a single effector protein that complexes with the RNA guide to target DNA or RNA substrates. The single-subunit effector composition of Class 2 systems provides a simpler component set for engineering and application translation, and Class 2 systems have thus far been an important source of programmable effectors. Thus, the discovery, engineering, and optimization of novel Class 2 systems may lead to widespread and powerful programmable technologies for genome engineering and beyond.
The characterization and engineering of Class 2 CRISPR-Cas systems, exemplified by CRISPR-Cas9, have paved the way for a diverse array of biotechnology applications in genome editing and beyond. Nevertheless, there remains a need for additional programmable effectors and systems for modifying nucleic acids and polynucleotides (i.e., DNA, RNA, or any hybrid, derivative, or modification) beyond the current CRISPR-Cas systems that enable new applications through their unique properties.
In general, a CRISPR-Cas system is characterized by elements that promote the formation of a CRISPR-Cas complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR-Cas system). In the context of formation of a CRISPR-Cas complex, “target sequence” refers to a sequence that a guide molecule is designed to target, e.g., by having complementarity to it, where hybridization between the target sequence and a sequence of the guide molecule promotes the formation of a CRISPR-Cas complex. The region of the guide molecule whose complementarity to the target sequence is important for cleavage activity is referred to herein as the seed sequence. A target sequence may comprise any polynucleotide, such as a DNA or RNA polynucleotide, and is comprised within a target locus of interest. In some embodiments, a target sequence is located in the nucleus or cytoplasm of a cell.
More than 50 gene families have been identified and characterized in CRISPR-Cas loci, and no gene is strictly universal. Therefore, no single evolutionary tree is feasible, and a multi-pronged approach is needed to identify new families. So far, comprehensive cas gene identification has produced 395 profiles for 93 Cas proteins. Classification relies on signature gene profiles plus signatures of locus architecture. Class 1 includes multisubunit crRNA-effector complexes (Cascade), and Class 2 includes single-subunit crRNA-effector complexes (Cas9-like). FIG. 1 shows the Class 2 (Types II, V, and VI) classification of CRISPR-Cas systems as proposed by Makarova et al. (See Makarova et al., “Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants,” Nat Rev Microbiol. 2020 Feb; 18(2): 67-83.)
The action of the CRISPR-Cas system is usually divided into three stages: (1) adaptation or spacer integration, (2) processing of the primary transcript of the CRISPR locus (pre-crRNA) and maturation of the crRNA which includes the spacer and variable regions corresponding to 5’ and 3’ fragments of CRISPR repeats, and (3) DNA (or RNA) interference. In some CRISPR-Cas families, protein factors, such as Cas1 and Cas2, that are present in the great majority of the known CRISPR-Cas systems are sufficient for the insertion of spacers into the CRISPR cassettes. These two proteins form a complex that is required for this adaptation process; the endonuclease activity of Cas1 is required for spacer integration whereas Cas2 appears to perform a nonenzymatic function. The Cas1-Cas2 complex represents the highly conserved “information processing” module of CRISPR-Cas that appears to be quasi-autonomous from the rest of the system. (See Makarova K S, Koonin E V. “Annotation and Classification of CRISPR-Cas Systems, ” Methods Mol Biol. 2015; 1311: 47-75. )
Provided herein are new designs of a CRISPR guide molecule of a CRISPR-Cas system, particularly a Type V CRISPR-Cas system. Specifically, through screening of direct repeat (DR) variants, the inventors discovered that manipulating secondary structural features of the DR can significantly enhance the target recognition and cleavage efficiency of the CRISPR-Cas systems of the present disclosure. The guide molecule design disclosed herein is largely independent of the spacer sequence and tolerates variations in the DR sequence, demonstrating broad adaptability of the present guide molecule design to various CRISPR-Cas systems, including its use with various Type V CRISPR effector proteins, such as many subclasses of the Cas12 proteins.
In one aspect, provided herein is a CRISPR RNA (crRNA) comprising, in the 5'-to-3' direction, a first stem-loop sequence, a connector region, and a second stem-loop sequence, wherein the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 3-12 base pairs and a first loop of about 3-10 nucleotides; wherein the connector region comprises about 3-10 nucleotides; and wherein the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 3-12 base pairs and a second loop of about 3-15 nucleotides.
In some embodiments, the crRNA further comprises a spacer region 3’ to the second stem-loop sequence, wherein the spacer region is at least about 15 nucleotides in length, optionally wherein the spacer region is about 15-50 nucleotides.
In some embodiments, the crRNA further comprises a floater region 5’ to the first stem-loop sequence, wherein the floater is at least about 1 nucleotide, 2 nucleotides, or 3 nucleotides.
In some embodiments, the second stem is about 5 base pairs, and the second loop is about 5, 6, or 7 nucleotides.
In some embodiments, the connector region is about 4, 5, or 6 nucleotides.
In some embodiments, the first stem is about 7 or 8 base pairs. In some embodiments, the first loop is about 4 nucleotides. In some embodiments, the first stem is about 7 or 8 base pairs, and the first loop is about 4 nucleotides. In some embodiments, the first stem is 7 base pairs, and the first loop is about 4 nucleotides. In some embodiments, the first loop comprises the sequence of 5’ -GAAA-3’ .
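The claimed architecture (first stem of about 3-12 bp with a loop of about 3-10 nt, a connector of about 3-10 nt, and a second stem of about 3-12 bp with a loop of about 3-15 nt) can be checked against a predicted secondary structure in dot-bracket form, such as RNAfold output. This sketch treats the "about" ranges as hard bounds and handles only perfectly stacked stems; the example structure and all function names are hypothetical (3-nt floater, 7-bp stem with 4-nt loop, 5-nt connector, 5-bp stem with 6-nt loop, 20-nt spacer).

```python
def pair_table(db):
    """Map each index of a dot-bracket string to its partner index (or -1)."""
    stack, pt = [], [-1] * len(db)
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced ')' at %d" % i)
            j = stack.pop()
            pt[i], pt[j] = j, i
        elif c != ".":
            raise ValueError("unexpected character %r" % c)
    if stack:
        raise ValueError("unbalanced '('")
    return pt


def hairpins(db):
    """Return (start, end, stem_bp, loop_nt) for each simple hairpin, 5'->3'.

    A hairpin loop is an all-unpaired run closed by a pair (i, j); the stem is
    extended outward while pairs stay perfectly stacked (no bulges).
    """
    pt = pair_table(db)
    found = []
    for i, c in enumerate(db):
        j = pt[i]
        if c == "(" and j > i + 1 and set(db[i + 1:j]) == {"."}:
            stem, a, b = 1, i - 1, j + 1
            while a >= 0 and b < len(db) and pt[a] == b:
                stem, a, b = stem + 1, a - 1, b + 1
            found.append((a + 1, b - 1, stem, j - i - 1))
    return found


def check_two_hairpin_layout(db):
    """True if the structure has exactly two hairpins within the claimed ranges."""
    hp = hairpins(db)
    if len(hp) != 2:
        return False
    (s1, e1, stem1, loop1), (s2, e2, stem2, loop2) = hp
    connector = s2 - e1 - 1
    return (3 <= stem1 <= 12 and 3 <= loop1 <= 10 and 3 <= connector <= 10
            and 3 <= stem2 <= 12 and 3 <= loop2 <= 15)


DEMO = "..." + "(" * 7 + "...." + ")" * 7 + "." * 5 + "(" * 5 + "." * 6 + ")" * 5 + "." * 20
print(hairpins(DEMO))
print(check_two_hairpin_layout(DEMO))
```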
In some embodiments, one or more nucleotides of the crRNA molecule are methylated. In some embodiments, one or more nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, all nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, no nucleotide from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop is methylated. In some embodiments, one or more nucleotides of the 3' end of the crRNA molecule are methylated. In some embodiments, up to 3 nucleotides of the 3' end of the crRNA molecule are methylated.
In some embodiments, the spacer region is about 20 to 40 nucleotides.
In some embodiments, the polynucleotide is an RNA molecule.
In some embodiments, the first stem-loop sequence has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 97% to a first stem-loop sequence of any of SEQ ID NOs: 2 and 478-494; optionally wherein the first stem-loop sequence consists of a first stem-loop sequence of any of SEQ ID NOs: 2 and 478-494. In some embodiments, the first stem-loop sequence has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 97% to a first stem-loop sequence set forth in Table 1; optionally wherein the first stem-loop sequence consists of a first stem-loop sequence set forth in Table 1.
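Percent sequence identity depends on how the two sequences are aligned, and this passage does not fix an algorithm. As an assumption for illustration, the sketch below uses a Needleman-Wunsch global alignment with hypothetical unit scores (match +1, mismatch -1, gap -1) and reports identical aligned positions divided by alignment length; established tools such as BLAST or EMBOSS Needle may use different scoring and denominators and can return different values.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions / alignment length, as a percentage (U treated as T)."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    aln_a, aln_b = global_align(a.upper().replace("U", "T"), b.upper().replace("U", "T"))
    matches = sum(x == y for x, y in zip(aln_a, aln_b) if x != "-")
    return 100.0 * matches / len(aln_a)


print(global_align("ACGU", "AGU"))      # ('ACGU', 'A-GU')
print(percent_identity("ACGU", "AGU"))  # 75.0
```

A gapped global alignment is used rather than a position-by-position comparison so that sequences of different lengths, as may arise for stem-loop variants, can still be compared.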
In some embodiments, the crRNA is a single-stranded polynucleotide, wherein the single-stranded polynucleotide comprises a sequence of any of SEQ ID NOs: 73-120, 456-476, and 547-631. In some embodiments, the crRNA is a single-stranded polynucleotide, wherein the single-stranded polynucleotide comprises a sequence set forth in Table 3.
In one aspect, provided herein is a modified Type V CRISPR RNA (crRNA) comprising at least one stem-loop sequence connected to the 5’ end of a naturally-existing Type V crRNA or a functional derivative thereof, wherein the stem-loop sequence is capable of forming a stem-loop structure having a stem of about 3-12 base pairs and a loop of about 3-10 nucleotides.
In some embodiments, the stem-loop sequence has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 97% to any of SEQ ID NOs: 1-17 and 441; optionally wherein the stem-loop sequence comprises a sequence set forth in any of SEQ ID NOs: 1-17 and 441; optionally wherein the stem-loop sequence consists of a sequence set forth in any of SEQ ID NOs: 1-17 and 441. In some embodiments, the stem-loop sequence has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 97% to a sequence set forth in Table 1; optionally wherein the stem-loop sequence comprises a sequence set forth in Table 1; optionally wherein the stem-loop sequence consists of a sequence set forth in Table 1.
In some embodiments, the stem-loop sequence is connected to the 5’ end of the naturally-existing Type V crRNA or the functional derivative thereof via a connector sequence, wherein the connector sequence comprises about 3-10 nucleotides.
In some embodiments, the naturally existing Type V crRNA is processed from a CRISPR array located 3’ to a Type V CRISPR-Cas locus.
In some embodiments, the naturally existing Type V crRNA comprises a sequence set forth in any of SEQ ID NOs: 18-70; optionally wherein the naturally existing Type V crRNA or functional derivative thereof consists of a sequence set forth in any of SEQ ID NOs: 18-70. In some embodiments, the naturally existing Type V crRNA comprises a sequence set forth in Table 2; optionally wherein the naturally existing Type V crRNA or functional derivative thereof consists of a sequence set forth in Table 2.
In some embodiments, the modified Type V crRNA comprises a sequence having at least about 75%, about 80%, about 85%, about 90%, about 95%, or about 97% sequence identity to a sequence set forth in any of SEQ ID NOs: 73-120, 456-476, and 547-631; optionally wherein the modified Type V crRNA comprises a sequence set forth in any of SEQ ID NOs: 73-120, 456-476, and 547-631; optionally wherein the modified Type V crRNA consists of a sequence set forth in any of SEQ ID NOs: 73-120, 456-476, and 547-631. In some embodiments, the modified Type V crRNA comprises a sequence having at least about 75%, about 80%, about 85%, about 90%, about 95%, or about 97% sequence identity to a sequence set forth in Table 3; optionally wherein the modified Type V crRNA comprises a sequence set forth in Table 3; optionally wherein the modified Type V crRNA consists of a sequence set forth in Table 3.
In some embodiments, one or more nucleotides of the modified Type V crRNA molecule are methylated. In some embodiments, one or more nucleotides from the 5’ end of the modified Type V crRNA molecule to the last (3’-most) nucleotide of the loop of the stem-loop structure are methylated. In some embodiments, all nucleotides from the 5’ end of the modified Type V crRNA molecule to the last nucleotide of the loop of the stem-loop structure are methylated. In some embodiments, no nucleotide between the last nucleotide of the loop of the stem-loop structure and the 3’ end of the stem-loop sequence is methylated. In some embodiments, one or more nucleotides at the 3’ end of the modified Type V crRNA molecule are methylated. In some embodiments, up to 3 nucleotides at the 3’ end of the modified Type V crRNA molecule are methylated.
In one aspect, provided herein is a non-naturally occurring Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas system comprising:
(a) a CRISPR effector protein or a polynucleotide encoding the CRISPR effector protein; and
(b) a guide molecule or a polynucleotide encoding the guide molecule, wherein the guide molecule comprises the crRNA provided herein, or the modified Type V crRNA provided herein.
In some embodiments, the CRISPR effector protein comprises a RuvC-like endonuclease domain.
In some embodiments, the RuvC-like endonuclease domain comprises one or more RuvC motifs selected from a RuvC I motif, a RuvC II motif, and a RuvC III motif. In some embodiments, the RuvC I motif comprises the amino acid sequence X1X2X3DX4X5X6X7, wherein X1 is L, I, V, or M; X2 is G, S, or A; X3 is I, V, or L; X4 is L or R; X5 is G or N; X6 is E, Q, I, or L; and X7 is R, T, K, or N. In some embodiments, the RuvC II motif comprises the amino acid sequence X1X2X3EX4X5, wherein X1 is I, V, or L; X2 is V or A; X3 is L, I, M, F, or V; X4 is D, N, K, or S; and X5 is L, A, or D. In some embodiments, the RuvC III motif comprises the amino acid sequence X1X2DXX3X4X5XX6X7X8, wherein X1 is D, N, or H; X2 is A, S, R, or G; X is any amino acid; X3 is N, V, I, or E; X4 is A, G, S, or K; X5 is A or S; X6 is N, H, V, or G; X7 is I, L, or V; and X8 is A, G, or L.
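The RuvC I, II, and III motif definitions above translate directly into regular expressions, with a character class for each constrained position and “.” for the unconstrained position X. The sketch below is illustrative: the dictionary keys and the toy test sequence are hypothetical, and the scan reports non-overlapping matches only.

```python
import re
from typing import Dict, List, Tuple

# Patterns transcribed from the RuvC I-III motif definitions; "." is X (any residue).
RUVC_MOTIFS = {
    "RuvC-I": re.compile(r"[LIVM][GSA][IVL]D[LR][GN][EQIL][RTKN]"),
    "RuvC-II": re.compile(r"[IVL][VA][LIMFV]E[DNKS][LAD]"),
    "RuvC-III": re.compile(r"[DNH][ASRG]D.[NVIE][AGSK][AS].[NHVG][ILV][AGL]"),
}


def find_ruvc_motifs(protein: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return, per motif, the 0-based start and matched peptide of each hit."""
    seq = protein.upper()
    return {name: [(m.start(), m.group()) for m in pattern.finditer(seq)]
            for name, pattern in RUVC_MOTIFS.items()}


# Invented toy sequence containing one instance of each motif.
toy = "MK" + "LGIDLGER" + "AAAA" + "IVLEDL" + "PP" + "DADWNASWNIA" + "K"
print(find_ruvc_motifs(toy))
```

For the toy sequence the scan reports one hit per motif, at 0-based positions 2, 14, and 22. A motif scan of this kind only flags candidate regions; assigning a RuvC-like domain in a real effector protein would normally also rely on alignment to known Type V proteins.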
In some embodiments, the CRISPR effector protein does not contain an HNH-like domain.
In some embodiments, the CRISPR effector protein comprises a zinc-finger protein domain; optionally wherein the zinc-finger domain is inserted in the RuvC-like endonuclease domain.
In some embodiments, the CRISPR effector protein further comprises a wedge (WED) domain.
In some embodiments, the CRISPR effector protein further comprises a REC domain.
In some embodiments, the CRISPR effector protein is less than about 1400 amino acids in length. In some embodiments, the CRISPR effector protein is less than about 1300 amino acids in length. In some embodiments, the CRISPR effector protein is less than about 1200 amino acids in length. In some embodiments, the CRISPR effector protein is less than about 1100 amino acids in length. In some embodiments, the CRISPR effector protein is more than about 175 amino acids in length. In some embodiments, the CRISPR effector protein is more than about 200 amino acids in length. In some embodiments, the CRISPR effector protein is more than about 225 amino acids in length. In some embodiments, the CRISPR effector protein is more than about 250 amino acids in length.
In some embodiments, the CRISPR effector protein is capable of recognizing a T-rich protospacer adjacent motif (PAM); optionally wherein the T-rich PAM comprises the nucleic acid sequence 5’-TTN-3’ or 5’-NTN-3’, wherein N is selected from A, T, C, G, and U; wherein optionally the PAM is selected from 5’-TTA-3’, 5’-TTT-3’, 5’-TTG-3’, 5’-TTC-3’, 5’-ATA-3’, and 5’-ATG-3’.
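Because the T-rich PAMs above use N as a wildcard, candidate target sites can be located with a regular-expression scan. This sketch assumes, for illustration only, that the PAM lies immediately 5’ of the protospacer on the scanned strand (as is commonly described for Cas12a-family effectors) and that the protospacer is 20 nucleotides; neither the placement nor the length is specified in this paragraph, and only one strand is scanned. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# IUPAC codes needed for the PAMs above; T and U are treated as equivalent.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "[TU]", "U": "[TU]", "N": "[ACGTU]"}


def pam_regex(pam: str) -> str:
    """Convert a PAM such as 'TTN' into a regular-expression fragment."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_pam_sites(seq: str, pam: str = "TTN",
                   protospacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (pam_start, pam_seq, protospacer) on the given strand.

    Assumes a 5' PAM: the protospacer starts immediately 3' of the PAM.
    A zero-width lookahead lets overlapping PAMs (e.g. in TTTT) all be found.
    """
    seq = seq.upper()
    k = len(pam)
    pattern = re.compile(f"(?=({pam_regex(pam)}))")
    sites = []
    for m in pattern.finditer(seq):
        start = m.start()
        protospacer = seq[start + k:start + k + protospacer_len]
        if len(protospacer) == protospacer_len:  # skip sites truncated by the end
            sites.append((start, m.group(1), protospacer))
    return sites


target = "GGTTA" + "CAGCAGCAGCAGCAGCAGCA"  # invented example sequence
print(find_pam_sites(target))         # [(2, 'TTA', 'CAGCAGCAGCAGCAGCAGCA')]
print(find_pam_sites(target, "NTN"))  # two sites, starting at positions 1 and 2
```

Scanning with the broader 5’-NTN-3’ pattern returns a superset of the 5’-TTN-3’ sites, which mirrors the relationship between the two PAM definitions in the paragraph above.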
In some embodiments, the CRISPR effector protein is a Type V CRISPR effector protein.
In some embodiments, the Type V CRISPR effector protein is selected from Cas12a (Cpf1), Cas12b1 (C2c1), Cas12b2, Cas12c (C2c3), Cas12d (CasY), Cas12e (CasX), Cas12f1 (Cas14a), Cas12f2 (Cas14b), Cas12f3 (Cas14c), Cas12g, Cas12h, Cas12i, Cas12j (CasΦ-2), Cas12k (C2c5), Cas12l, C2c4, C2c8, C2c9, and C2c10, or a functional derivative thereof.
In some embodiments, the Type V CRISPR effector protein is selected from Cas12a (Cpf1), Cas12d (CasY), Cas12f2 (Cas14b), Cas12f3 (Cas14c), Cas12h, Cas12i, Cas12j (CasΦ-2), Cas12k (C2c5), Cas12l, C2c4, C2c8, C2c9, and C2c10, or a functional derivative thereof.
In some embodiments, the Type V CRISPR effector protein comprises the amino acid sequence of any one of SEQ ID NOs: 168-381 and 436, or a functional derivative having sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 97% thereto. In some embodiments, the Type V CRISPR effector protein comprises an amino acid sequence selected from Table 4, or a functional derivative having sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or about 97% thereto.
In some embodiments, the CRISPR effector protein is fused to a signal peptide, a nuclear localization signal (NLS), or a nuclear export signal (NES).
In some embodiments, the CRISPR effector protein is fused to a deaminase catalytic domain, a DNA methylation catalytic domain, a DNA demethylation catalytic domain, a histone residue modification domain, a nuclease catalytic domain, a fluorescent protein, or a transcription modification factor; optionally wherein the deaminase catalytic domain is selected from the group consisting of an adenosine deaminase catalytic domain and a cytidine deaminase catalytic domain.
In some embodiments, the CRISPR effector protein is fused to a reverse transcriptase, and wherein the CRISPR-Cas system further comprises a donor template nucleic acid, wherein optionally the donor template nucleic acid is a DNA or RNA.
In some embodiments, the CRISPR effector protein is fused to a DNA methylation domain, a histone residue modification domain, a localization factor, a transcription modification factor, a light-gated control factor, a chemically inducible factor, or a chromatin visualization factor.
In some embodiments, the polynucleotide encoding the CRISPR effector protein and/or the polynucleotide encoding the guide molecule are mRNA molecules.
In some embodiments, the mRNA encoding the CRISPR effector protein and the mRNA encoding the guide molecule are present in a delivery system selected from the group consisting of a lipid nanoparticle, a liposome, an exosome, a microvesicle, and a gene gun.
In some embodiments, the polynucleotide encoding the CRISPR effector protein and/or the polynucleotide encoding the guide molecule are operably linked to a promoter.
In some embodiments, the polynucleotide encoding the CRISPR effector protein and/or the polynucleotide encoding the guide molecule are in a vector selected from a retroviral vector, a lentiviral vector, a phage vector, an adenoviral vector, an adeno-associated viral vector, and a herpes simplex viral vector.
In some embodiments, the system lacks a tracrRNA.
In some embodiments, the system further comprises a target DNA or a nucleic acid encoding the target DNA, wherein the target DNA comprises a sequence that is capable of hybridizing to the spacer region of the guide molecule.
In some embodiments, the CRISPR effector protein and the guide molecule form a complex that associates with the target nucleic acid, thereby modifying the target nucleic acid.
In some embodiments, the spacer region is between about 15 and about 50 nucleotides in length.
In one aspect, provided herein is a cell comprising the system provided herein. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a prokaryotic cell.
In one aspect, provided herein is a method of targeting and nicking a non-spacer complementary strand of a double-stranded target nucleic acid upon recognition of a spacer complementary strand of the double-stranded target nucleic acid, the method comprising contacting the double-stranded target nucleic acid with a system provided herein.
In one aspect, provided herein is a method of targeting and cleaving a double-stranded target nucleic acid, the method comprising contacting the double-stranded target nucleic acid with a system provided herein.
In some embodiments, a non-spacer complementary strand of the double-stranded target nucleic acid is nicked before the spacer complementary strand of the double-stranded target nucleic acid is nicked.
In some embodiments, both strands of target DNA are cleaved at different sites, resulting in a staggered cut. In some embodiments, both strands of target DNA are cleaved at the same site, resulting in a blunt double-strand break.
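Whether a double-strand break is staggered or blunt follows from where each strand is cut. The hypothetical helper below classifies a break from two cut positions expressed in top-strand coordinates (a cut at position p breaks the bond between nucleotides p-1 and p) and reports which end overhangs; the example positions are arbitrary and are not taken from this disclosure.

```python
from typing import Tuple


def classify_cut(top_cut: int, bottom_cut: int) -> Tuple[str, int]:
    """Classify a double-strand break from two strand-specific cut positions.

    Both positions use top-strand coordinates. If the bottom strand is cut
    further 3' (in top-strand terms) than the top strand, the top strand's
    5' end protrudes; if it is cut further 5', the bottom strand's 3' end
    protrudes. Equal positions give a blunt end.
    """
    offset = bottom_cut - top_cut
    if offset == 0:
        return ("blunt", 0)
    if offset > 0:
        return ("5' overhang", offset)
    return ("3' overhang", -offset)


print(classify_cut(10, 10))  # ('blunt', 0)
print(classify_cut(18, 23))  # ("5' overhang", 5)
```

Staggered cuts leave single-stranded overhangs of the reported length on each side of the break, whereas a blunt double-strand break leaves none.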
In one aspect, provided herein is a method of targeting and cleaving a single-stranded target DNA, the method comprising contacting the single-stranded target DNA with a system provided herein.
In one aspect, provided herein is a method of detecting a target nucleic acid in a sample, the method comprising:
(a) contacting the sample with a system provided herein under suitable conditions to form a ternary complex comprising the CRISPR effector protein, the guide molecule, and the target nucleic acid;
(b) contacting the ternary complex with a labeled detector nucleic acid that is single-stranded and does not hybridize with the guide molecule; and
(c) measuring a detectable signal produced by cleavage of the labeled detector nucleic acid by the CRISPR effector protein, thereby detecting the target nucleic acid.
In some embodiments, the method further comprises comparing a level of the detectable signal with a reference signal level, and determining an amount of target nucleic acid in the sample based on the level of the detectable signal. In some embodiments, the measuring is performed using gold nanoparticle detection, fluorescence polarization, colloid phase transition/dispersion, electrochemical detection, or semiconductor-based sensing. In some embodiments, the labeled detector nucleic acid comprises a fluorescence-emitting dye pair, a fluorescence resonance energy transfer (FRET) pair, or a quencher/fluorophore pair, wherein cleavage of the labeled detector nucleic acid by the effector protein results in an increase or a decrease in the amount of signal produced by the labeled detector nucleic acid.
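Determining an amount of target nucleic acid from the detectable signal, as described above, can be done by comparison with a standard curve. The sketch below assumes a linear relationship between signal and target amount over the calibrated range, fits it by ordinary least squares, and uses a hypothetical 3-fold-over-reference detection threshold; the function names and calibration values are invented for illustration and are not data from this disclosure.

```python
from typing import Sequence, Tuple


def fit_standard_curve(amounts: Sequence[float],
                       signals: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of signal = slope * amount + intercept."""
    n = len(amounts)
    if n < 2 or n != len(signals):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    if sxx == 0:
        raise ValueError("calibration amounts must not all be equal")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_amount(signal: float, slope: float, intercept: float) -> float:
    """Invert the linear standard curve to estimate the target amount."""
    if slope == 0:
        raise ValueError("flat standard curve cannot be inverted")
    return (signal - intercept) / slope


def is_detected(signal: float, reference: float, fold: float = 3.0) -> bool:
    """Call a sample positive if its signal is at least `fold` x the reference."""
    return signal >= fold * reference


# Invented calibration: target amount (arbitrary units) vs. fluorescence.
slope, intercept = fit_standard_curve([0, 1, 2, 4], [100, 300, 500, 900])
print(slope, intercept)                        # 200.0 100.0
print(estimate_amount(700, slope, intercept))  # 3.0
print(is_detected(700, reference=100))         # True
```

A linear fit is the simplest choice; trans-cleavage reporter assays are often read kinetically and may saturate, in which case a nonlinear or rate-based calibration would be more appropriate.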
In one aspect, provided herein is a method of specifically editing a double-stranded nucleic acid, the method comprising contacting, under sufficient conditions and for a sufficient amount of time,
(a) a CRISPR effector protein, another enzyme with sequence-specific nicking activity, and a guide molecule that guides the CRISPR effector protein to nick the strand opposite to the strand nicked by the other sequence-specific nickase; and
(b) the double-stranded nucleic acid;
wherein the method results in the formation of a double-stranded break.
In one aspect, provided herein is a method of editing a double-stranded nucleic acid, the method comprising contacting, under sufficient conditions and for a sufficient amount of time,
(a) a fusion protein comprising a CRISPR effector protein and a protein domain with DNA modifying activity and a guide molecule targeting the double-stranded nucleic acid; and
(b) the double-stranded nucleic acid;
wherein the CRISPR effector protein of the fusion protein is modified to nick a non-target strand of the double-stranded nucleic acid.
In one aspect, provided herein is a method of inducing genotype-specific or transcriptional-state-specific cell death or dormancy in a cell, the method comprising contacting a cell with a system provided herein, wherein the guide molecule hybridizing to the target DNA causes a collateral DNase activity-mediated cell death or dormancy. In some embodiments, the cell is a prokaryotic cell. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is a cancer cell. In some embodiments, the cell is an infectious cell or a cell infected with an infectious agent. In some embodiments, the cell is a cell infected with a virus, a cell infected with a prion, a fungal cell, a protozoan, or a parasite cell.
In one aspect, provided herein is a method of treating a condition or disease in a subject in need thereof, the method comprising administering to the subject a system provided herein, wherein the spacer sequence is complementary to at least 15 nucleotides of a target nucleic acid associated with the condition or disease; wherein the CRISPR effector protein associates with the guide molecule to form a complex; wherein the complex binds to a target nucleic acid sequence that is complementary to the at least 15 nucleotides of the spacer sequence; and wherein upon binding of the complex to the target nucleic acid sequence the CRISPR effector protein cleaves the target nucleic acid, thereby treating the condition or disease in the subject.
In some embodiments, the condition or disease is a cancer or an infectious disease.
In some embodiments, the condition or disease is selected from the group consisting of Cystic Fibrosis, Duchenne Muscular Dystrophy, Becker Muscular Dystrophy, Alpha-1-Antitrypsin Deficiency, Pompe Disease, Myotonic Dystrophy, Huntington Disease, Fragile X Syndrome, Friedreich's Ataxia, Amyotrophic Lateral Sclerosis, Frontotemporal Dementia, Hereditary Chronic Kidney Disease, Hyperlipidemia, Hypercholesterolemia, Leber Congenital Amaurosis, Sickle Cell Disease, Beta Thalassemia, Familial Hypercholesterolemia (FH), Transthyretin Amyloidosis (ATTR), Primary Hyperoxaluria (PH1), Hereditary Angioedema (HAE), and Atherosclerotic Cardiovascular Disease (ASCVD).
In some embodiments, the condition or disease is cancer, and wherein the cancer is selected from the group consisting of Wilms' tumor, Ewing sarcoma, a neuroendocrine tumor, a glioblastoma, a neuroblastoma, a melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, renal cancer, pancreatic cancer, lung cancer, biliary cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid carcinoma, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphoblastic leukemia, acute myelogenous leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, and urinary bladder cancer.
In some embodiments, the condition or disease is an infectious disease, and wherein the infectious agent is selected from the group consisting of human immunodeficiency virus (HIV), herpes simplex virus-1 (HSV-1), herpes simplex virus-2 (HSV-2), and hepatitis B virus (HBV).
In one aspect, the system provided herein or the cell provided herein is for use as a medicament.
In one aspect, the system provided herein or the cell provided herein is for use in the treatment or prevention of a cancer or an infectious disease.
In some embodiments, the cancer is selected from the group consisting of Wilms' tumor, Ewing sarcoma, a neuroendocrine tumor, a glioblastoma, a neuroblastoma, a melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, renal cancer, pancreatic cancer, lung cancer, biliary cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid carcinoma, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphoblastic leukemia, acute myelogenous leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, and urinary bladder cancer.
In one aspect, provided herein is use of the system provided herein or cell provided herein for an in vitro or ex vivo method of:
a) targeting and editing a target nucleic acid;
b) non-specifically degrading single-stranded DNA upon recognition of a DNA target nucleic acid;
c) targeting and nicking a non-spacer complementary strand of a double-stranded target DNA upon recognition of a spacer complementary strand of the double-stranded target DNA;
d) targeting and cleaving a double-stranded target DNA;
e) detecting a target nucleic acid in a sample;
f) specifically editing a double-stranded nucleic acid;
g) base editing a double-stranded nucleic acid;
h) inducing genotype-specific or transcriptional-state-specific cell death or dormancy in a cell;
i) creating an indel in a double-stranded target DNA;
j) inserting a sequence into a double-stranded target DNA, or
k) deleting or inverting a sequence in a double-stranded target DNA.
In one aspect, provided herein is use of the system provided herein or the cell provided herein in a method of:
a) targeting and editing a target nucleic acid;
b) non-specifically degrading single-stranded DNA upon recognition of a DNA target nucleic acid;
c) targeting and nicking a non-spacer complementary strand of a double-stranded target DNA upon recognition of a spacer complementary strand of the double-stranded target DNA;
d) targeting and cleaving a double-stranded target DNA;
e) detecting a target nucleic acid in a sample;
f) specifically editing a double-stranded nucleic acid;
g) base editing a double-stranded nucleic acid;
h) inducing genotype-specific or transcriptional-state-specific cell death or dormancy in a cell;
i) creating an indel in a double-stranded target DNA;
j) inserting a sequence into a double-stranded target DNA, or
k) deleting or inverting a sequence in a double-stranded target DNA,
wherein the method does not comprise a process for modifying the germ line genetic identity of a human being and does not comprise a method of treatment of the human or animal body.
In some embodiments of the method provided herein, cleaving the target DNA or target nucleic acid results in the formation of an indel.
In some embodiments of the method provided herein, cleaving the target DNA or target nucleic acid results in the insertion of a nucleic acid sequence.
In some embodiments of the method provided herein, cleaving the target DNA or target nucleic acid comprises cleaving the target DNA or target nucleic acid at two sites, and results in the deletion or inversion of the sequence between the two sites.
In one aspect, provided herein is a eukaryotic cell comprising a modified target locus of interest, wherein the target locus of interest has been modified according to a method or via use of a composition of any one of the preceding claims.
In some embodiments, the modification of the target locus of interest results in:
(i) the eukaryotic cell comprising altered expression of at least one gene product;
(ii) the eukaryotic cell comprising altered expression of at least one gene product, wherein the expression of the at least one gene product is increased;
(iii) the eukaryotic cell comprising altered expression of at least one gene product, wherein the expression of the at least one gene product is decreased; or
(iv) the eukaryotic cell comprising an edited genome.
In some embodiments, the eukaryotic cell comprises a mammalian cell.
In some embodiments, the mammalian cell comprises a human cell.
In one aspect, provided herein is a eukaryotic cell line of or comprising the eukaryotic cell provided herein, or progeny thereof.
In one aspect, provided herein is a multicellular organism comprising one or more cells provided herein.
In one aspect, provided herein is a plant or animal model comprising one or more cells provided herein.
In one aspect, provided herein is a method of producing a plant having a modified trait of interest encoded by a gene of interest, the method comprising contacting a plant cell with a system provided herein, thereby modifying or introducing said gene of interest, and regenerating a plant from the plant cell.
In one aspect, provided herein is a method of identifying a trait of interest in a plant, wherein the trait of interest is encoded by a gene of interest, the method comprising contacting a plant cell with a system provided herein, thereby identifying the gene of interest.
In some embodiments, the method further comprises introducing the identified gene of interest into a plant cell or plant cell line or plant germ plasm and generating a plant therefrom, whereby the plant contains the gene of interest. In some embodiments, the plant exhibits the trait of interest.
4. BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 shows Class 2 (Types II, V, and VI) classification of CRISPR-Cas systems.
FIGs. 2A and 2B show an exemplary conventional DR-crRNA (FIG. 2A) and a stem-loop (sl) modified slDR-crRNA (FIG. 2B) according to the present disclosure.
FIG. 3 shows the secondary structure of Direct Repeat (DR) corresponding to CasY7.
FIGs. 4A and 4B show plasmid maps of the CasY7 recombinant expression plasmid (FIG. 4A) and the LbCpf1 recombinant expression plasmid (FIG. 4B).
FIG. 5 shows the plasmid map of the target plasmid.
FIG. 6 shows the editing efficiency of CasY7 and LbCpf1 in E. coli.
FIG. 7 shows the plasmid map of the PHK09T vector.
FIG. 8 shows the results of editing efficiency of CasY7 and LbCpf1 in 293T cells.
FIG. 9 shows that slDR-crRNA (crRNA2) can significantly increase the editing efficiency of CasY7 at the hHao1 locus (approximately 5- to 10-fold) compared with the conventional crRNA sequence (DR-crRNA; crRNA1).
FIG. 10 shows domain architectures of exemplary Type V CRISPR effector proteins.
FIG. 11 shows an exemplary DR sequence and secondary structure corresponding to Cas12a.
FIG. 12 shows an exemplary DR sequence and secondary structure corresponding to Cas12i.
FIGs. 13A-13N show exemplary first stem-loop sequences and secondary structures.
FIGs. 14A-14ZZ show exemplary DR sequences and secondary structures.
FIGs. 15A-15F show additional exemplary first stem-loop sequences and secondary structures.
FIGs. 16A-16F show the statistical results of the editing efficiency mediated by various crRNAs.
FIGs. 17A and 17B show exemplary methylation patterns of the stem-loop sequences.
Provided herein are new designs of a CRISPR guide molecule for a CRISPR-Cas system, particularly a Type V CRISPR-Cas system. Through screening of direct repeat (DR) variants, the inventors discovered that manipulating secondary structural features of the guide molecule, including adding one or more stem-loop structure(s) to the 5’ end of the DR region of the guide molecule, can significantly enhance the target recognition and cleavage efficiency of a CRISPR-Cas system utilizing such a modified guide molecule. The guide molecule design disclosed herein is largely independent of the spacer sequence and tolerates variations in the DR sequence, demonstrating broad adaptability to various CRISPR-Cas systems, including use with various Type V CRISPR effector proteins, such as many subclasses of Cas12 proteins.
5.1 Definitions
Unless described otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art. For purposes of interpreting this specification, the following description of terms will apply and, whenever appropriate, terms used in the singular will also include the plural and vice versa. All patents, applications, published applications, and other publications are incorporated by reference in their entirety. In the event that any description of terms set forth conflicts with any document incorporated herein by reference, the description of terms set forth below shall control.
As used herein, and unless otherwise indicated, the term “about” or “approximately” means an acceptable error for a particular value as determined by one of ordinary skill in the art, which depends in part on how the value is measured or determined. In certain embodiments, the term “about” or “approximately” means within 1, 2, 3, or 4 standard deviations. In certain embodiments, the term “about” or “approximately” means within 20%, 15%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.05%, or less of a given value or range. As used herein, when “about” is used in connection with a numerical range, the term “about” is meant to apply to both ends of such modified range (e.g., “about 5 to 10” means “about 5 to about 10”). When “about” is used in connection with a series of numerical values, the term “about” is meant to apply to each value in the series (e.g., “about 75%, 80%, or 85%” means “about 75%, about 80%, or about 85%”).
The terms “complement” or “complementary” are determined by Watson-Crick base pairing between nucleotides and specifically refer to nucleotides hydrogen bonded to one another, with thymine or uracil residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds. A nucleotide sequence can be described as having a “percent complementarity” to a specified second nucleotide sequence. For example, a nucleotide sequence may have 80%, 90%, or 100% complementarity to a specified second nucleotide sequence, indicating that 8 of 10, 9 of 10, or 10 of 10 nucleotides of the sequence are complementary to the specified second nucleotide sequence. For instance, the nucleotide sequence 3’-TCGA-5’ is 100% complementary to the nucleotide sequence 5’-AGCT-3’. Further, the nucleotide sequence 3’-TCGA-5’ is 100% complementary to a region of the nucleotide sequence 5’-TTAGCTGG-3’.
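The percent-complementarity examples above can be checked mechanically by pairing two sequences antiparallel and counting Watson-Crick pairs. In this sketch both inputs are written 5’-to-3’, so 3’-TCGA-5’ is passed as "AGCT"; the function names are hypothetical, and only Watson-Crick pairs (not G-U wobble pairs) are counted.

```python
from typing import Tuple

WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(a: str, b: str) -> float:
    """Percent of positions that Watson-Crick pair when a and b are antiparallel.

    Both sequences are written 5'->3' and must have equal length; a[i] is
    paired with b[-1 - i].
    """
    a, b = a.upper(), b.upper()
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((x, y) in WATSON_CRICK for x, y in zip(a, reversed(b)))
    return 100.0 * paired / len(a)


def best_complementary_window(probe: str, target: str) -> Tuple[int, float]:
    """Slide probe along target (both 5'->3'); return (start, percent) of the best window."""
    k = len(probe)
    if k == 0 or k > len(target):
        raise ValueError("probe must be non-empty and no longer than target")
    best = (0, -1.0)
    for start in range(len(target) - k + 1):
        score = percent_complementarity(probe, target[start:start + k])
        if score > best[1]:
            best = (start, score)
    return best


print(percent_complementarity("AGCT", "AGCT"))        # 100.0
print(best_complementary_window("AGCT", "TTAGCTGG"))  # (2, 100.0)
```

The two printed results correspond to the two examples in the paragraph above: 3’-TCGA-5’ fully pairs with 5’-AGCT-3’, and with the AGCT window beginning at the third nucleotide of 5’-TTAGCTGG-3’.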
The term “reverse complementary” means that two nucleic acid sequences are complementary to each other when read in opposite directions. A pair of reverse-complementary sequences can be in separate nucleic acid molecules or in different regions of a single nucleic acid molecule. In the latter case, the nucleic acid molecule is considered “self-complementary.” As used herein, a “self-complementary” nucleic acid molecule can have at least two regions that are complementary or substantially complementary to each other when read in opposite directions. Under suitable conditions, a pair of reverse-complementary regions are capable of base-pairing with each other to form a double-stranded duplex, and the sequence between the reverse-complementary regions is bent into an unpaired loop. The resulting structure is referred to as a “stem-loop,” a “hairpin,” or a “hairpin loop,” which is a secondary structure found in many self-complementary molecules.
As used herein, the term “stem-loop sequence” refers to a single-stranded polynucleotide sequence having at least two regions that are complementary or substantially complementary to each other when read in opposite directions, and thus capable of base-pairing with each other to form at least one double helix (referred to herein as a “stem”) and an unpaired loop. The resulting structure is known as a stem-loop structure, a hairpin, or a hairpin loop, which is a secondary structure found in many RNA molecules. Such structures are well known in the art, and these terms are used in accordance with their commonly known meanings in the art. In some embodiments, stem-loop structures do not require precise base pairing in the stem region. In some embodiments, the stem may comprise one or more base mismatches (“bulges”). In some embodiments, stem-loop structures require precise base pairing. In some embodiments, the stem base pairing does not include any mismatches.
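A stem-loop of the kind defined above can be located by treating each candidate loop as a center and extending the stem outward while the flanking bases pair. The sketch below finds only perfectly paired, bulge-free stems (optionally allowing G-U wobble pairs, an assumption not stated in the definition), uses the first stem-loop length ranges recited earlier as defaults, and is not a substitute for thermodynamic folding tools such as ViennaRNA RNAfold; the function names and example sequence are invented.

```python
from typing import List, NamedTuple

WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE_PAIRS = {("G", "U"), ("U", "G")}


class Hairpin(NamedTuple):
    start: int    # 0-based index of the first 5' stem nucleotide
    stem_bp: int  # contiguous base pairs in the stem
    loop_nt: int  # unpaired nucleotides in the loop


def _pairs(x: str, y: str, allow_gu: bool) -> bool:
    return (x, y) in WC_PAIRS or (allow_gu and (x, y) in WOBBLE_PAIRS)


def find_hairpins(seq: str, min_stem: int = 3, max_stem: int = 12,
                  min_loop: int = 3, max_loop: int = 10,
                  allow_gu: bool = True) -> List[Hairpin]:
    """Enumerate perfectly paired (bulge-free) stem-loops in an RNA/DNA sequence."""
    s = seq.upper().replace("T", "U")
    n = len(s)
    hits: List[Hairpin] = []
    for loop_start in range(1, n):
        for loop_len in range(min_loop, max_loop + 1):
            loop_end = loop_start + loop_len  # first base of the 3' stem strand
            if loop_end >= n:
                break
            # If the loop's end bases can pair, the same hairpin is reported
            # with a smaller loop and a longer stem, so skip this variant.
            if loop_len - 2 >= min_loop and _pairs(s[loop_start], s[loop_end - 1], allow_gu):
                continue
            stem = 0
            while (stem < max_stem
                   and loop_start - 1 - stem >= 0
                   and loop_end + stem < n
                   and _pairs(s[loop_start - 1 - stem], s[loop_end + stem], allow_gu)):
                stem += 1
            if stem >= min_stem:
                hits.append(Hairpin(loop_start - stem, stem, loop_len))
    return hits


# Invented example: 7-bp stem closed by a GAAA loop, flanked by unpaired A's.
example = "AA" + "GCAUCGC" + "GAAA" + "GCGAUGC" + "AA"
print(find_hairpins(example, allow_gu=False))
```

For the example sequence, the scan with Watson-Crick pairs only reports a single hairpin, Hairpin(start=2, stem_bp=7, loop_nt=4), matching the 7-bp stem and 4-nt GAAA loop of the preferred first stem-loop embodiment described earlier.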
The terms “duplexed,” “double-stranded,” or “hybridized” as used herein refer to a structure, involving multiple nucleic acid molecules or a region of a single nucleic acid molecule (e.g., the stem region in a stem-loop structure), that is formed by hybridization of two single strands of nucleic acids containing complementary sequences. As described herein, a pair of complementary sequences can be fully complementary or partially complementary.
The terms “hybridization” and “hybridizes” refer to pairing and binding of complementary nucleic acids. Hybridization occurs to varying extents between two nucleic acids depending on factors such as the degree of complementarity of the nucleic acids, the melting temperature (Tm) of the nucleic acids, and the stringency of hybridization conditions, as is well known in the art. The term “stringency of hybridization conditions” refers to conditions of temperature, ionic strength, and composition of a hybridization medium with respect to particular common additives such as formamide and Denhardt's solution. Determination of particular hybridization conditions relating to a specified nucleic acid is routine and is well known in the art, for instance, as described in J. Sambrook and D. W. Russell, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press; 3rd Ed., 2001; and F. M. Ausubel, Ed., Short Protocols in Molecular Biology, Current Protocols; 5th Ed., 2002. High stringency hybridization conditions are those which only allow hybridization of substantially complementary nucleic acids. Typically, nucleic acids having about 85-100% complementarity are considered highly complementary and hybridize under high stringency conditions. Intermediate stringency conditions are exemplified by conditions under which nucleic acids having intermediate complementarity (about 50-84% complementarity), as well as those having a high degree of complementarity, hybridize. In contrast, low stringency hybridization conditions are those in which nucleic acids having a low degree of complementarity hybridize.
A “modification” of an amino acid residue/position refers to a change of a primary amino acid sequence as compared to a starting amino acid sequence, wherein the change results from a sequence alteration involving said amino acid residue/position. For example, typical modifications include substitution of the residue with another amino acid (e.g., a conservative or substantial substitution), insertion of one or more (e.g., generally fewer than 5, 4, or 3) amino acids adjacent to said residue/position, and/or deletion of said residue/position.
In the context of a peptide or polypeptide, the term “derivative” as used herein refers to a peptide or polypeptide that comprises an amino acid sequence of the peptide or polypeptide, or a fragment of a peptide or polypeptide, which has been altered by the introduction of amino acid residue substitutions, deletions, or additions. The term “derivative” as used herein also refers to a peptide or polypeptide, or a fragment of a peptide or polypeptide, which has been chemically modified, e.g., by the covalent attachment of any type of molecule to the polypeptide. For example, but not by way of limitation, a peptide or polypeptide or a fragment of the peptide or polypeptide may be chemically modified, e.g., by glycosylation, acetylation, pegylation, phosphorylation, amidation, derivatization by known protecting/blocking groups, proteolytic cleavage, chemical cleavage, formylation, metabolic synthesis of tunicamycin, linkage to a cellular ligand or other protein, etc. The derivatives are modified in a manner that is different from naturally occurring or starting peptides or polypeptides, either in the type or location of the molecules attached. Derivatives further include deletion of one or more chemical groups which are naturally present on the peptide or polypeptide. Further, a derivative of a peptide or polypeptide or a fragment of a peptide or polypeptide may contain one or more non-classical amino acids. In specific embodiments, a derivative is a functional derivative of the native or unmodified peptide or polypeptide (e.g., a wild-type protein) from which it was derived.
The term “functional derivative” refers to a derivative that retains one or more functions or activities of the naturally occurring or starting peptide or polypeptide (e.g., a wild-type protein) from which it is derived. For example, in some embodiments, a functional derivative of a Cas protein as described herein (e.g., Cas12i) may retain the activity of a Cas protein, including, e.g., the ability to form a binary complex with crRNA and/or a tertiary complex with crRNA and a target sequence, to recognize PAM signatures, and to nick or cleave the target sequence. In some embodiments, a functional derivative of a peptide or polypeptide described herein shares at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity with respect to the starting (e.g., wild-type) peptide or polypeptide.
The term “sequence identity” refers to a relationship between the sequences of two or more biological molecules (e.g., a pair of polynucleotides or multiple polypeptides), as determined by aligning and comparing the respective sequences. “Percent (%) amino acid sequence identity” with respect to a reference amino acid sequence (e.g., a reference polypeptide) is defined as the percentage of amino acid residues in a candidate sequence that are identical with the amino acid residues in the reference amino acid sequence, after aligning the two sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity. Alignment for purposes of determining percent amino acid sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN, or MEGALIGN (DNAStar, Inc.) software. Those skilled in the art can determine appropriate parameters for aligning sequences, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared. Exemplary parameters for determining relatedness of two or more sequences using the BLAST algorithm, for example, can be as set forth below. Briefly, amino acid sequence alignments can be performed using BLASTP version 2.0.8 (Jan-05-1999) and the following parameters: Matrix: BLOSUM62; gap open: 11; gap extension: 1; x_dropoff: 50; expect: 10.0; wordsize: 3; filter: on. Nucleic acid sequence alignments can be performed using BLASTN version 2.0.6 (Sept-16-1998) and the following parameters: Match: 1; mismatch: -2; gap open: 5; gap extension: 2; x_dropoff: 50; expect: 10.0; wordsize: 11; filter: off. Those skilled in the art will know what modifications can be made to the above parameters to either increase or decrease the stringency of the comparison, for example, and determine the relatedness of two or more sequences.
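The percent-identity definition above can be illustrated with the following minimal Python sketch. It assumes the alignment has already been produced (for example, by BLASTP with the parameters above) and is supplied as two equal-length strings with "-" marking gaps; dividing by the ungapped reference length is an assumed convention, and the function name is hypothetical.

```python
def percent_identity(aligned_candidate: str, aligned_reference: str, gap: str = "-") -> float:
    """Percent of reference residues matched by identical candidate residues.

    Both inputs are rows of an existing pairwise alignment (equal length,
    gaps written as `gap`). The denominator is the number of non-gap
    positions in the reference (an assumed convention).
    """
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    cand, ref = aligned_candidate.upper(), aligned_reference.upper()
    ref_len = sum(r != gap for r in ref)
    if ref_len == 0:
        raise ValueError("reference contains no residues")
    # Count aligned columns where both residues are identical (gaps never count).
    identical = sum(c == r and r != gap for c, r in zip(cand, ref))
    return 100.0 * identical / ref_len
```

For instance, aligning the candidate `MKT-LLV` against the reference `MKTALLI` gives 5 identical residues out of 7 reference residues, about 71.4%.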
The terms “CRISPR-associated protein,” “Cas protein,” and “CRISPR effector protein” are used interchangeably herein to refer to any protein that is presented in, and/or meets the criteria of, the classification of CRISPR-Cas systems (see P. Mohanraju et al., “Diverse evolutionary roots and mechanistic variations of the CRISPR-Cas systems,” Science. 2016 Aug 5; 353(6299): aad5147; Makarova et al., “Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants,” Nat Rev Microbiol. 2020 Feb; 18(2): 67-83).
As used herein, “Type V CRISPR effector proteins” encompass endogenously encoded Cas proteins identified in a Type V CRISPR-Cas locus. Many Type V CRISPR-Cas loci have been well characterized in the art. For example, the first characterized Type V CRISPR-Cas system is a Cas12a (Cpf1)-encoding locus, denoted subtype V-A, which encompasses cas1, cas2, a gene denoted cpf1, and a CRISPR array. Cpf1 (CRISPR-associated protein Cpf1, subtype PREFRAN) is a large protein (about 1300 amino acids) that contains a RuvC-like nuclease domain homologous to the corresponding domain of Cas9 along with a counterpart to the characteristic arginine-rich cluster of Cas9. Cpf1 lacks the HNH nuclease domain that is present in all Cas9 proteins, and the RuvC-like domain is contiguous in the Cpf1 sequence, in contrast to Cas9, where it contains long inserts including the HNH domain (see Zetsche et al., “Cpf1 is a single RNA-guided endonuclease of a Class 2 CRISPR-Cas system,” Cell. 2015 October 22; 163(3): 759–771). Accordingly, in some embodiments, the CRISPR-Cas protein described herein comprises a RuvC-like nuclease domain and lacks an HNH nuclease domain. See also Yan et al., “Functionally diverse type V CRISPR-Cas systems,” Science, doi:10.1126/science.aav7271 (2018), for the Cas12g-encoding loci denoted subtype V-G, the Cas12c-encoding loci denoted subtype V-C, the Cas12i-encoding loci denoted subtype V-I, and the Cas12h-encoding loci denoted subtype V-H. Known Type V CRISPR effector proteins include Cas12a (Cpf1), Cas12b1 (C2c1), Cas12b2, Cas12c (C2c3), Cas12d (CasY), Cas12e (CasX), Cas12f1 (Cas14a), Cas12f2 (Cas14b), Cas12f3 (Cas14c), Cas12g, Cas12h, Cas12i, Cas12j (CasΦ-2), Cas12k (C2c5), Cas12l, C2c4, C2c8, C2c9, and C2c10.
Furthermore, as used herein, Type V CRISPR effector proteins also encompass functional derivatives of an endogenously encoded Type V CRISPR effector protein, or artificially designed proteins, that (i) retain the function of a Cas protein (e.g., the ability to form a binary complex with crRNA and/or a tertiary complex with crRNA and a target sequence, to recognize PAM signatures, and to nick or cleave the target sequence) and (ii) meet the criteria for Type V classification as known in the art. In some embodiments, a Type V CRISPR effector protein has a RuvC-like nuclease domain but lacks an HNH domain. In some embodiments, the RuvC-like endonuclease domain comprises one or more RuvC motifs selected from a RuvC I motif, a RuvC II motif, and a RuvC III motif. In some embodiments, the RuvC I motif comprises the amino acid sequence X1X2X3DX4X5X6X7, wherein X1 is L, I, V, or M; X2 is G, S, or A; X3 is I, V, or L; X4 is L or R; X5 is G or N; X6 is E, Q, I, or L; and X7 is R, T, K, or N. In some embodiments, the RuvC I motif comprises the amino acid sequence X1XDXNX6X7XXXX11, wherein X1 is A, G, or S; X is any amino acid; X6 is Q or I; X7 is T, S, or V; and X11 is T or A. In some embodiments, the RuvC II motif comprises the amino acid sequence X1X2X3EX4X5, wherein X1 is I, V, or L; X2 is V or A; X3 is L, I, M, F, or V; X4 is D, N, K, or S; and X5 is L, A, or D. In some embodiments, the RuvC II motif comprises the amino acid sequence X1X2X3E, wherein X1 is C, F, I, L, M, P, V, W, or Y; X2 is C, F, I, L, M, P, R, V, W, or Y; and X3 is C, F, G, I, L, M, P, V, W, or Y. In some embodiments, the RuvC III motif comprises the amino acid sequence X1X2DXX3X4X5XX6X7X8, wherein X1 is D, N, or H; X2 is A, S, R, or G; X is any amino acid; X3 is N, V, I, or E; X4 is A, G, S, or K; X5 is A or S; X6 is N, H, V, or G; X7 is I, L, or V; and X8 is A, G, or L.
In some embodiments, the RuvC III motif comprises the amino acid sequence X1SHX4DX6X7, wherein X1 is S or T; X4 is Q or L; X6 is P or S; and X7 is F or L. In some embodiments, the RuvC-like nuclease domain comprises one or more RuvC motifs selected from a RuvC I motif: X1X2X3DX4X5X6X7, wherein X1 is L, I, V, or M; X2 is G, S, or A; X3 is I, V, or L; X4 is L or R; X5 is G or N; X6 is E, Q, I, or L; X7 is R, T, K, or N; a RuvC II motif: X1X2X3EX4X5, wherein X1 is I, V, or L; X2 is V or A; X3 is L, I, M, F, or V; X4 is D, N, K, or S; X5 is L, A, or D; and a RuvC III motif: X1X2DXX3X4X5XX6X7X8, wherein X1 is D, N, or H; X2 is A, S, R, or G; X is any amino acid; X3 is N, V, I, or E; X4 is A, G, S, or K; X5 is A or S; X6 is N, H, V, or G; X7 is I, L, or V; X8 is A, G, or L. In some embodiments, the RuvC-like nuclease domain comprises one or more RuvC motifs selected from a RuvC I motif: X1XDXNX6X7XXXX11, wherein X1 is A, G, or S; X is any amino acid; X6 is Q or I; X7 is T, S, or V; X11 is T or A; a RuvC II motif: X1X2X3E, wherein X1 is C, F, I, L, M, P, V, W, or Y; X2 is C, F, I, L, M, P, R, V, W, or Y; and X3 is C, F, G, I, L, M, P, V, W, or Y; and a RuvC III motif: X1SHX4DX6X7, wherein X1 is S or T; X4 is Q or L; X6 is P or S; and X7 is F or L. In some embodiments, the RuvC-like nuclease domain is continuous in the amino acid sequence of the Type V CRISPR effector protein. In some embodiments, the RuvC-like nuclease domain is discontinuous in the amino acid sequence of the Type V CRISPR effector protein.
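The RuvC motif formulas above map directly onto regular expressions: each constrained position Xn becomes a character class and each unconstrained X becomes ".". The following Python sketch is illustrative only; the motif labels and the function name are hypothetical, and a regex hit shows only that a sequence window fits a motif formula, not that it belongs to a functional RuvC-like domain.

```python
import re

# Motif formulas from the text rewritten as regular expressions:
# each constrained position becomes a character class, each free "X" becomes ".".
RUVC_MOTIFS = {
    "RuvC-I": "[LIVM][GSA][IVL]D[LR][GN][EQIL][RTKN]",              # X1X2X3DX4X5X6X7
    "RuvC-I alt": "[AGS].D.N[QI][TSV]...[TA]",                      # X1XDXNX6X7XXXX11
    "RuvC-II": "[IVL][VA][LIMFV]E[DNKS][LAD]",                      # X1X2X3EX4X5
    "RuvC-II alt": "[CFILMPVWY][CFILMPRVWY][CFGILMPVWY]E",          # X1X2X3E
    "RuvC-III": "[DNH][ASRG]D.[NVIE][AGSK][AS].[NHVG][ILV][AGL]",   # X1X2DXX3X4X5XX6X7X8
    "RuvC-III alt": "[ST]SH[QL]D[PS][FL]",                          # X1SHX4DX6X7
}


def find_ruvc_motifs(protein: str) -> dict[str, list[int]]:
    """Return 0-based start positions of every hit for each motif.

    A zero-width lookahead is used so that overlapping hits are all reported.
    """
    seq = protein.upper()
    return {
        name: [m.start() for m in re.finditer(f"(?=({pattern}))", seq)]
        for name, pattern in RUVC_MOTIFS.items()
    }
```

For example, in the made-up sequence `AAAALGIDLGERAAAA` the window `LGIDLGER` fits the first RuvC I formula, so `find_ruvc_motifs(...)["RuvC-I"]` is `[4]`.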
In some embodiments, a Type V CRISPR effector protein has a zinc-finger protein domain. In some embodiments, the RuvC-like nuclease domain is split into discontinuous segments in the amino acid sequence of a Type V CRISPR effector protein, with the zinc-finger protein domain sequence placed between the RuvC-like nuclease domain sequences.
In some embodiments, a Type V CRISPR effector protein has a Wedge (WED) domain. In some embodiments, a Type V CRISPR effector protein has a REC domain.
In some embodiments, a Type V CRISPR effector protein meets the criteria of being capable of forming a binary complex with a crRNA without a tracr sequence. In some embodiments, a Type V CRISPR effector protein meets the criteria of being capable of forming a binary complex with a crRNA, and being guided by the crRNA to complex with a target polynucleotide sequence adjacent to a protospacer adjacent motif (PAM), without a tracr sequence. In some embodiments, a Type V CRISPR effector protein meets the criteria of being able to recognize a 5’-TTN-3’ or 5’-NTN-3’ PAM, where N can be any nucleotide selected from A, T, C, G, and U.
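As an illustration of the 5’-TTN-3’ and 5’-NTN-3’ PAM definitions, the following Python sketch lists candidate PAM sites on one DNA strand together with the adjacent protospacer. Several details are assumptions not stated above: the protospacer is taken to lie immediately 3’ of the PAM on the scanned strand (the arrangement known for Cas12a), its length is a caller-chosen parameter defaulting to a hypothetical 20 nt, and only the strand passed in is scanned. The function names are hypothetical.

```python
import re


def pam_to_regex(pam: str) -> str:
    """Turn a PAM written with N as a wildcard (e.g. "TTN") into a regex."""
    return "".join("[ACGT]" if base == "N" else base for base in pam.upper())


def find_pam_sites(strand: str, pam: str = "TTN",
                   protospacer_len: int = 20) -> list[tuple[int, str]]:
    """Return (pam_start, protospacer) pairs found on one DNA strand (5'->3').

    Assumes the protospacer lies immediately 3' of the PAM on this strand;
    sites too close to the end to hold a full protospacer are skipped.
    """
    seq = strand.upper()
    sites = []
    # Lookahead so overlapping PAMs (e.g. in "TTTT") are all found.
    for m in re.finditer(f"(?={pam_to_regex(pam)})", seq):
        start = m.start() + len(pam)
        protospacer = seq[start:start + protospacer_len]
        if len(protospacer) == protospacer_len:
            sites.append((m.start(), protospacer))
    return sites
```

To scan the opposite strand, pass its reverse complement; the relaxed 5’-NTN-3’ PAM is searched with `pam="NTN"`.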
In some embodiments, a Type V CRISPR effector protein is less than about 1400, 1300, 1200, 1100, 1000, 900, 800, 750, 700, 650, 600, 550, 500, 450, 400, 350, 300, 250, or 200 amino acids in length. In some embodiments, a Type V CRISPR effector protein is more than about 400, 350, 325, 300, 275, 250, 225, 200, or 175 amino acids in length.
As used herein, the term “tracr sequence” refers to trans-activating CRISPR RNA. As used herein, the term “tracrRNA” includes any polynucleotide sequence that has sufficient complementarity with a crRNA sequence to hybridize.
As used herein, the term “CRISPR array” refers to a nucleic acid (e.g., DNA) fragment comprising CRISPR repeats and spacers, which begins at the first nucleotide of the first CRISPR repeat and ends at the last nucleotide of the last (terminal) CRISPR repeat. Typically, each spacer in the CRISPR array is located between two repeats. As used herein, the term “CRISPR repeat,” “CRISPR direct repeat,” or “direct repeat” refers to a plurality of short direct repeat sequences that exhibit very little or no sequence variation in a CRISPR array. Type V CRISPR direct repeats may form a stem-loop structure. As used herein, a direct repeat sequence can be naturally existing or non-naturally existing (e.g., artificially engineered or synthesized).
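Because each spacer sits between two repeats and the array begins and ends with a repeat, spacers can be recovered by splitting the array on the repeat sequence. This minimal Python sketch assumes every repeat matches the supplied direct repeat exactly; real arrays whose repeats vary slightly would need approximate matching instead. The function name and the example sequences are hypothetical.

```python
def extract_spacers(array: str, direct_repeat: str) -> list[str]:
    """Split a CRISPR array into its spacers using one exact repeat sequence.

    The array must begin and end with a full copy of the repeat, matching the
    definition of a CRISPR array given above.
    """
    seq, dr = array.upper(), direct_repeat.upper()
    if not (seq.startswith(dr) and seq.endswith(dr)):
        raise ValueError("array must begin and end with a full repeat")
    # split() yields an empty string before the first and after the last repeat.
    return seq.split(dr)[1:-1]
```

For example, `extract_spacers("GTTGAAAAAGTTGCCCCCGTTG", "GTTG")` returns `["AAAAA", "CCCCC"]`.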
As used herein, the term “crRNA” is used interchangeably with guide molecule, gRNA, and guide RNA, and refers to nucleic acid-based molecules, including but not limited to RNA-based molecules, that are capable of forming complexes with Cas proteins (e.g., any of the Cas12 proteins described herein) (e.g., via a direct repeat, DR) and that comprise sequences (e.g., spacers) sufficiently complementary to a target nucleic acid sequence to hybridize to the target nucleic acid sequence and guide sequence-specific binding of the complex to the target nucleic acid sequence. In the case of a double-stranded target nucleic acid molecule, the strand to which the spacer sequence in a crRNA hybridizes is referred to as the “spacer-complementary strand,” and the other strand is referred to as the “non-spacer complementary strand.” As used herein, a crRNA can be naturally existing or non-naturally existing (e.g., artificially engineered or synthesized). According to the present disclosure, the secondary structures (e.g., stem-loop structures) of the DR play an important role in mediating the efficiency with which CRISPR-Cas systems recognize and cleave their targets.
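The spacer-complementary strand can be identified by looking for the spacer's reverse complement, since an RNA spacer pairs antiparallel with DNA. In this minimal Python sketch (function name hypothetical), the spacer is given as RNA written 5’ to 3’, converted to DNA letters, and only exact matches are considered; a self-complementary spacer would match both strands and is reported as "top".

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def spacer_complementary_strand(top_strand: str, spacer_rna: str) -> str:
    """Report which DNA strand a crRNA spacer hybridizes to.

    `top_strand` is one strand of a double-stranded target written 5'->3'.
    The spacer pairs with the strand carrying its reverse complement; the
    other strand carries the spacer sequence itself (with T in place of U).
    """
    spacer_dna = spacer_rna.upper().replace("U", "T")
    target_site = spacer_dna.translate(COMPLEMENT)[::-1]  # reverse complement
    top = top_strand.upper()
    if target_site in top:
        return "top"     # top strand is the spacer-complementary strand
    if spacer_dna in top:
        return "bottom"  # top strand is the non-spacer complementary strand
    return "none"
```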
As used herein, the terms “Type V CRISPR RNA” and “Type V crRNA” are used interchangeably to refer to a crRNA that is capable of forming CRISPR-Cas complexes with Type V CRISPR effector proteins. As used herein, Type V crRNA encompasses artificially designed guide RNA molecules, as well as guide RNA molecules endogenously produced from a CRISPR array located adjacent to (e.g., at the 3’ end of) a Type V CRISPR-Cas locus.
As used herein, the term “non-naturally existing” refers to “not found in nature.” For example, a non-naturally existing nucleic acid molecule as described herein is intended to mean that the nucleic acid molecule is not found in nature. A non-naturally occurring nucleic acid encoding a peptide or protein contains at least one genetic alteration or chemical modification not normally found in nature.
As used herein, the term “functional derivative” refers to a derivative that retains one or more functions or activities of the naturally occurring or starting peptide or polypeptide from which it was derived. In some embodiments, a functional derivative of a starting peptide or polypeptide has an amino acid sequence that shares at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity with the starting peptide or polypeptide.
As used herein, the term “targeting” refers to the ability of a complex including a CRISPR-associated protein and a guide molecule, such as a crRNA, to preferentially or specifically bind to, e.g., hybridize to, a specific target nucleic acid compared to other nucleic acids that do not have the same or similar sequence as the target nucleic acid.
As used herein, the term “target nucleic acid” refers to a specific nucleic acid substrate that contains a nucleic acid sequence complementary to the entirety or a part of the spacer in a guide molecule. In some embodiments, the target nucleic acid comprises a gene or a sequence within a gene. In some embodiments, the target nucleic acid comprises a non-coding region (e.g., a promoter). In some embodiments, the target nucleic acid is single-stranded. In some embodiments, the target nucleic acid is double-stranded.
As used herein, the term “nick” or its grammatical variants such as “nicking” refers to the creation of a break in only one strand of a double-stranded nucleic acid molecule. The term “cleave” or its grammatical variants such as “cleaving” refers to the creation of breaks in both strands of a double-stranded nucleic acid molecule. In some embodiments, a cleaving event is the result of two sequential nicking events in the two strands, respectively.
As used herein, the term “donor template nucleic acid” refers to a nucleic acid molecule that can be used by one or more cellular proteins to alter the structure of a target nucleic acid after a CRISPR enzyme described herein has altered the target nucleic acid. In some embodiments, the donor template nucleic acid is a double-stranded nucleic acid. In some embodiments, the donor template nucleic acid is a single-stranded nucleic acid. In some embodiments, the donor template nucleic acid is linear. In some embodiments, the donor template nucleic acid is circular (e.g., a plasmid). In some embodiments, the donor template nucleic acid is an exogenous nucleic acid molecule. In some embodiments, the donor template nucleic acid is an endogenous nucleic acid molecule (e.g., a chromosome).
As used herein, the term “non-naturally occurring” when used in reference to a nucleic acid, a polypeptide, or a biological system (e.g., a CRISPR-Cas system) as described herein is intended to mean that such nucleic acid, polypeptide, or biological system is not found in nature. In some embodiments, a non-naturally occurring nucleic acid or polypeptide contains at least one genetic alteration or chemical modification not normally found in a naturally occurring counterpart, including a wild-type nucleic acid or polypeptide. In some embodiments, genetic alterations include, for example, modifications to a nucleic acid or polypeptide sequence. In some embodiments, chemical modifications include, for example, one or more functional nucleotide analogs as described herein. In some embodiments, a non-naturally occurring biological system can contain a non-naturally occurring nucleic acid (e.g., a CRISPR guide molecule), a non-naturally occurring polypeptide (e.g., a Cas protein), or a combination of a nucleic acid and a polypeptide that is not found in nature.
The term “operably linked” as used herein refers to a nucleic acid sequence in functional relationship with a second nucleic acid sequence. The term “operably linked” encompasses functional connection of two or more nucleic acid sequences, such as a nucleic acid to be transcribed and a regulatory element. The term “regulatory element” as used herein refers to a nucleotide sequence which controls some aspect of the expression of an operably linked nucleic acid coding sequence.
5.2 CRISPR-Cas system
The present disclosure provides a non-naturally occurring Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)–Cas system comprising: (a) a CRISPR effector protein or a polynucleotide encoding the CRISPR effector protein (e.g., a CRISPR effector protein of Section 5.2.2 (Cas proteins)); and (b) a guide molecule or a polynucleotide encoding the guide molecule, wherein the guide molecule comprises a crRNA (e.g., a crRNA of Section 5.2.1 (crRNA)). In some embodiments, the CRISPR-Cas system described herein has significantly enhanced target recognition and cleavage efficiency compared to a naturally occurring CRISPR-Cas system.
5.2.1 crRNA
In one aspect, the CRISPR RNA (crRNA) described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence, a connector region sequence, and a second stem-loop sequence. In some embodiments, the crRNA further comprises a spacer region 3’ to the second stem-loop sequence. In some embodiments, the crRNA further comprises a floater region 5’ to the first stem-loop sequence. In some embodiments, the crRNA forms a complex with Type V CRISPR effector proteins (e.g., a CRISPR effector protein of Section 5.2.2 (Cas proteins)). In some embodiments, the spacer directs the complex to a target nucleic acid that is complementary to the spacer for sequence-specific binding. In some embodiments, the crRNA described herein is a single-stranded polynucleotide.
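The 5’-to-3’ order of the crRNA regions described above can be captured in a small data structure. The following Python sketch is hypothetical: the class name, field names, and example sequences are illustrative only, and it records region order and coordinates without checking whether any region actually folds into a stem-loop.

```python
from dataclasses import dataclass


@dataclass
class CrRNA:
    """Hypothetical container for the crRNA regions described above."""
    first_stem_loop: str
    connector: str
    second_stem_loop: str
    spacer: str = ""   # optional region 3' of the second stem-loop
    floater: str = ""  # optional region 5' of the first stem-loop

    def regions(self) -> list[tuple[str, str]]:
        # 5'-to-3' order: floater, first stem-loop, connector, second stem-loop, spacer.
        return [
            ("floater", self.floater),
            ("first_stem_loop", self.first_stem_loop),
            ("connector", self.connector),
            ("second_stem_loop", self.second_stem_loop),
            ("spacer", self.spacer),
        ]

    def sequence(self) -> str:
        """Full single-stranded crRNA, 5' to 3'."""
        return "".join(seq for _, seq in self.regions())

    def coordinates(self) -> dict[str, tuple[int, int]]:
        """0-based, half-open (start, end) coordinates of each non-empty region."""
        coords, pos = {}, 0
        for name, seq in self.regions():
            if seq:
                coords[name] = (pos, pos + len(seq))
            pos += len(seq)
        return coords
```

For example, `CrRNA("GGAUC", "AA", "CCGG", spacer="UUUU", floater="A").sequence()` gives `"AGGAUCAACCGGUUUU"`.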
(a) Secondary Structure
In some embodiments, the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence, a connector region, and a second stem-loop sequence, wherein the first stem-loop sequence is capable of forming a first stem-loop structure, and wherein the second stem-loop sequence is capable of forming a second stem-loop structure. In some embodiments, the crRNA further comprises a spacer region 3’ to the second stem-loop sequence. In some embodiments, the crRNA further comprises a floater region 5’ to the first stem-loop sequence. In some embodiments, the crRNA is capable of forming a complex with the Type V CRISPR effector proteins. In some embodiments, the spacer directs the binary complex containing the crRNA and the Type V CRISPR effector protein to a target nucleic acid that is complementary to the spacer for sequence-specific binding. In some embodiments, the binary complex comprising the crRNA and the Type V CRISPR effector protein is capable of binding and nicking a target nucleic acid. In some embodiments, the binary complex comprising the crRNA and the Type V CRISPR effector protein is capable of binding and cleaving a target nucleic acid. An example of the crRNA secondary structure described herein is shown in FIG. 2B.
(i) First Stem Loop
In some embodiments, the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence, a connector region, and a second stem-loop sequence.
In some embodiments of the crRNA described herein, the first stem-loop sequence is capable of forming a first stem-loop structure.
In some embodiments of the crRNA described herein, the first stem-loop sequence comprises at least about 9 nucleotides, such as 9, 10, 11, 12, 13, 14, or more nucleotides. In some embodiments, the first stem-loop sequence comprises at most about 28 nucleotides, such as 28, 27, 26, 25, 24, or fewer nucleotides. In some embodiments, the first stem-loop sequence comprises about 9 nucleotides to about 28 nucleotides. In some embodiments, the first stem-loop sequence comprises about 14 nucleotides to about 28 nucleotides. In some embodiments, the first stem-loop sequence comprises about 17 nucleotides to about 26 nucleotides. In some embodiments, the first stem-loop sequence comprises about 20 to about 24 nucleotides. In some embodiments, the first stem-loop sequence comprises about 20 nucleotides. In some embodiments, the first stem-loop sequence comprises about 21 nucleotides. In some embodiments, the first stem-loop sequence comprises about 22 nucleotides. In some embodiments, the first stem-loop sequence comprises about 23 nucleotides. In some embodiments, the first stem-loop sequence comprises about 24 nucleotides.
In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure. In some embodiments of the crRNA described herein, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of at least about 3 base pairs, such as 3, 4, 5, 6, or more base pairs. In some embodiments of the crRNA described herein, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of at most about 12 base pairs, such as 12, 11, 10, 9, or fewer base pairs. In some embodiments of the crRNA described herein, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 3-12 base pairs. In some embodiments of the crRNA described herein, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 4-11 base pairs. In some embodiments of the crRNA described herein, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 5-11 base pairs. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 6-11 base pairs. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 4 base pairs. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 5 base pairs. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 6 base pairs. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 7 base pairs. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 8 base pairs.
In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 9 base pairs. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 10 base pairs.
In some embodiments, the first stem comprises one or more nucleotide mismatches. In some embodiments, the first stem comprises 5%-40% nucleotide mismatches (percentage calculated based on the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem). In some embodiments, the first stem comprises 5%-30% nucleotide mismatches (percentage calculated based on the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem). In some embodiments, the first stem comprises 5%-20% nucleotide mismatches (percentage calculated based on the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem). In some embodiments, the first stem may comprise 5%-10% base mismatches (percentage calculated based on the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem). In some embodiments, the first stem may comprise 1 or 2 bulges. In some embodiments, the first stem may comprise 1 bulge. In some embodiments, the first stem may comprise 2 bulges. In some embodiments, the first stem-loop structure contains precise base pairing. In some embodiments, the first stem does not contain any bulges.
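The mismatch percentage defined above (non-paired nucleotides divided by matched pairings) can be computed for a bulge-free stem as in the following minimal Python sketch. The counting convention is an assumption: each mismatched position is counted once, a reading under which a single mismatch stays within the ranges stated in this section (e.g., one mismatch in a 7-bp stem gives 1/6, about 16.7%, within the 5%-20% range given for 7-bp stems); counting both nucleotides of a mismatched position would double the value. Treating G·U wobble pairs as matched is also an assumption, and the function name is hypothetical.

```python
# Watson-Crick pairs plus the G-U wobble; counting G-U as matched is an
# assumption, since the text does not say how wobble pairs are treated.
RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_mismatch_percentage(five_arm: str, three_arm: str) -> float:
    """Non-paired positions / matched pairings x 100 for a bulge-free stem.

    Both arms are written 5'->3' and must be the same length; five_arm[i]
    faces three_arm[-1 - i]. Each mismatched position is counted once
    (an assumed convention). DNA letters (T) are converted to RNA (U).
    """
    a = five_arm.upper().replace("T", "U")
    b = three_arm.upper().replace("T", "U")
    if len(a) != len(b) or not a:
        raise ValueError("arms must be non-empty and of equal length")
    matched = sum((x, y) in RNA_PAIRS for x, y in zip(a, reversed(b)))
    if matched == 0:
        raise ValueError("stem has no matched pairings")
    mismatched = len(a) - matched
    return 100.0 * mismatched / matched
```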
In some embodiments of the crRNA described herein, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of at least about 3 nucleotides, such as 3, 4, 5, 6, or more nucleotides. In some embodiments of the crRNA described herein, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of at most about 10 nucleotides, such as 10, 9, 8, 7, or fewer nucleotides. In some embodiments of the crRNA described herein, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of 3-10 nucleotides. In some embodiments of the crRNA described herein, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of about 3-9 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of about 3-8 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of about 3-7 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of about 3-6 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of about 3 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of about 4 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of about 5 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of about 6 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of about 7 nucleotides.
In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of about 8 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of about 9 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first loop of about 10 nucleotides.
In some embodiments of the crRNA described herein, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 3-12 base pairs and a first loop of about 3-10 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 6-11 base pairs and a first loop of about 3-6 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 4 base pairs and a first loop of about 6 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 5 base pairs and a first loop of about 4 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 5 base pairs and a first loop of about 10 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 6 base pairs and a first loop of about 4 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 7 base pairs and a first loop of about 4 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 7 base pairs and a first loop of about 5 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 8 base pairs and a first loop of about 4 nucleotides. In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 9 base pairs and a first loop of about 6 nucleotides. 
In some embodiments, the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 10 base pairs and a first loop of about 4 nucleotides.
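Stem and loop sizes such as those above can be read off a predicted secondary structure written in dot-bracket notation (the format produced by folding tools such as ViennaRNA's RNAfold). This minimal Python sketch assumes the input describes exactly one hairpin; it counts "(" characters as stem base pairs, takes the unpaired positions between the innermost pair as the loop, and checks the broadest ranges stated above (a 3-12 bp stem and a 3-10 nt loop). Dots inside the stem (bulges) are not counted as pairs, and the function names are hypothetical.

```python
def hairpin_stem_and_loop(dot_bracket: str) -> tuple[int, int]:
    """Return (stem base pairs, loop length) for a single-hairpin structure.

    Expects dot-bracket notation with exactly one hairpin, e.g. "((((....))))".
    """
    opens = [i for i, c in enumerate(dot_bracket) if c == "("]
    closes = [i for i, c in enumerate(dot_bracket) if c == ")"]
    if not opens or len(opens) != len(closes) or opens[-1] > closes[0]:
        raise ValueError("expected exactly one hairpin")
    stem_bp = len(opens)
    loop_nt = closes[0] - opens[-1] - 1  # unpaired bases closed by the innermost pair
    return stem_bp, loop_nt


def within_first_stem_loop_ranges(dot_bracket: str) -> bool:
    """Check the broadest ranges given above: stem 3-12 bp, loop 3-10 nt."""
    stem_bp, loop_nt = hairpin_stem_and_loop(dot_bracket)
    return 3 <= stem_bp <= 12 and 3 <= loop_nt <= 10
```

For example, `"((((....))))"` yields a 4-bp stem with a 4-nt loop, one of the stem/loop combinations listed above.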
In some embodiments, the first stem-loop structure comprises 5’-X1X2X3X4NNnNNX5X6X7X8-3’, wherein X1, X2, X3, X4, X5, X6, X7, and X8 can be any base, n can be any base or a deletion, and N can be any base; wherein X1X2X3X4 and X5X6X7X8 can hybridize to each other to form a stem, causing NNnNN to form a loop. In some embodiments, 5’-X1X2X3X4-3’ and 5’-X5X6X7X8-3’ are reverse complement sequences. In some embodiments, the stem formed by X1X2X3X4 and X5X6X7X8 does not contain base mismatches. In alternative embodiments, the stem formed by X1X2X3X4 and X5X6X7X8 contains about 5%-40% base mismatches (percentage calculated based on the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem).
In some embodiments, the first stem-loop structure comprises 5’-X1X2X3X4X5NNnNNX6X7X8X9X10-3’, wherein X1, X2, X3, X4, X5, X6, X7, X8, X9, and X10 can be any base, n can be any base or a deletion, and N can be any base; wherein X1X2X3X4X5 and X6X7X8X9X10 can hybridize to each other to form a stem, so that NNnNN forms a loop. In some embodiments, 5’-X1X2X3X4X5-3’ and 5’-X6X7X8X9X10-3’ are reverse complement sequences. In some embodiments, the stem formed by X1X2X3X4X5 and X6X7X8X9X10 does not contain base mismatches. In alternative embodiments, the stem formed by X1X2X3X4X5 and X6X7X8X9X10 contains about 5%-30% base mismatches (percentage calculated as the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem).
In some embodiments, the first stem-loop structure comprises 5’-X1X2X3X4X5X6NNnNNX7X8X9X10X11X12-3’, wherein X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, and X12 can be any base, n can be any base or a deletion, and N can be any base; wherein X1X2X3X4X5X6 and X7X8X9X10X11X12 can hybridize to each other to form a stem, so that NNnNN forms a loop. In some embodiments, 5’-X1X2X3X4X5X6-3’ and 5’-X7X8X9X10X11X12-3’ are reverse complement sequences. In some embodiments, the stem formed by X1X2X3X4X5X6 and X7X8X9X10X11X12 does not contain base mismatches. In alternative embodiments, the stem formed by X1X2X3X4X5X6 and X7X8X9X10X11X12 contains about 5%-30% base mismatches (percentage calculated as the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem).
In some embodiments, the first stem-loop structure comprises 5’-X1X2X3X4X5X6X7NNnNNX8X9X10X11X12X13X14-3’, wherein X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, and X14 can be any base, n can be any base or a deletion, and N can be any base; wherein X1X2X3X4X5X6X7 and X8X9X10X11X12X13X14 can hybridize to each other to form a stem, so that NNnNN forms a loop. In some embodiments, 5’-X1X2X3X4X5X6X7-3’ and 5’-X8X9X10X11X12X13X14-3’ are reverse complement sequences. In some embodiments, the stem formed by X1X2X3X4X5X6X7 and X8X9X10X11X12X13X14 does not contain base mismatches. In alternative embodiments, the stem formed by X1X2X3X4X5X6X7 and X8X9X10X11X12X13X14 contains about 5%-20% base mismatches (percentage calculated as the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem).
In some embodiments, the first stem-loop structure comprises 5’-X1X2X3X4X5X6X7X8NNnNNX9X10X11X12X13X14X15X16-3’, wherein X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, and X16 can be any base, n can be any base or a deletion, and N can be any base; wherein X1X2X3X4X5X6X7X8 and X9X10X11X12X13X14X15X16 can hybridize to each other to form a stem, so that NNnNN forms a loop. In some embodiments, 5’-X1X2X3X4X5X6X7X8-3’ and 5’-X9X10X11X12X13X14X15X16-3’ are reverse complement sequences. In some embodiments, the stem formed by X1X2X3X4X5X6X7X8 and X9X10X11X12X13X14X15X16 does not contain base mismatches. In alternative embodiments, the stem formed by X1X2X3X4X5X6X7X8 and X9X10X11X12X13X14X15X16 contains about 5%-20% base mismatches (percentage calculated as the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem).
In some embodiments, the first stem-loop structure comprises 5’-X1X2X3X4X5X6X7X8X9NNnNNX10X11X12X13X14X15X16X17X18-3’, wherein X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, and X18 can be any base, n can be any base or a deletion, and N can be any base; wherein X1X2X3X4X5X6X7X8X9 and X10X11X12X13X14X15X16X17X18 can hybridize to each other to form a stem, so that NNnNN forms a loop. In some embodiments, 5’-X1X2X3X4X5X6X7X8X9-3’ and 5’-X10X11X12X13X14X15X16X17X18-3’ are reverse complement sequences. In some embodiments, the stem formed by X1X2X3X4X5X6X7X8X9 and X10X11X12X13X14X15X16X17X18 does not contain base mismatches. In alternative embodiments, the stem formed by X1X2X3X4X5X6X7X8X9 and X10X11X12X13X14X15X16X17X18 contains about 5%-20% base mismatches (percentage calculated as the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem).
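The family of generic formulas above, with stems of 4 to 9 base pairs, can be checked by a single function parameterized by the stem length k. The sketch makes three assumptions: it accepts RNA letters only, it requires strict Watson-Crick reverse complementarity (the perfectly paired embodiments), and it treats the loop NNnNN as 5 nucleotides, or 4 when n is a deletion. The function name is hypothetical, and the test hairpin is an arbitrary placeholder.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """RNA reverse complement using Watson-Crick pairing only."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def matches_stem_loop_pattern(seq: str, k: int) -> bool:
    """Check 5'-X1..Xk NNnNN X(k+1)..X(2k)-3' with a perfectly paired k-bp stem.

    The loop NNnNN is 5 nt, or 4 nt when n is deleted, so len(seq) must be
    2k + 4 or 2k + 5.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = seq.upper()
    if any(b not in COMPLEMENT for b in seq):
        return False  # only A, C, G, U are accepted
    if len(seq) - 2 * k not in (4, 5):
        return False
    five_prime_arm, three_prime_arm = seq[:k], seq[-k:]
    return three_prime_arm == revcomp(five_prime_arm)


hairpin = "GCAUGCA" + "GAAA" + "UGCAUGC"  # placeholder 7-bp stem, 4-nt loop
print(matches_stem_loop_pattern(hairpin, 7))  # True
print(matches_stem_loop_pattern(hairpin, 6))  # False
```

Calling the function with k from 4 to 9 covers each of the first stem-loop formulas. The mismatch-tolerant alternative embodiments would need a tolerance check in place of exact equality.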
In some embodiments of the crRNA described herein, the first stem-loop sequence has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% to a first stem-loop sequence set forth in Table 1. In some embodiments, the first stem-loop sequence comprises a first stem-loop nucleotide sequence set forth in Table 1. In some embodiments, the first stem-loop sequence consists of a first stem-loop nucleotide sequence set forth in Table 1. Exemplary secondary structures of the first stem loop are provided in FIGs. 13A-13N and 15A-15F.
In some embodiments of the crRNA described herein, the first stem-loop sequence has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% to a first stem-loop sequence of any one of SEQ ID NOs: 2 and 478-494. In some embodiments, the first stem-loop sequence comprises a first stem-loop nucleotide sequence of any one of SEQ ID NOs: 2 and 478-494. In some embodiments, the first stem-loop sequence consists of a first stem-loop nucleotide sequence of any one of SEQ ID NOs: 2 and 478-494.
In some embodiments, the crRNA further comprises a floater region (e.g., any floater region as described in Section 5.2.1(a)(iv) (Floater Region)) 5’ to the first stem-loop sequence. Exemplary sequences of the floater region and the first stem loop are provided in Table 1. Exemplary secondary structures are provided in FIGs. 13A, 13C-13N, and 15A-15F.
In some embodiments, the crRNA further comprises a connector region (or part of a connector region) (e.g., any connector region as described in Section 5.2.1(a)(iii) (Connector Region)) 3’ to the first stem-loop sequence. Exemplary sequences of the floater region, the first stem loop, and the connector region (or part of the connector region) are provided in Table 1. Exemplary secondary structures are provided in FIGs. 13D-13F and 13N.
In some embodiments, the crRNA comprises a stem-loop (SL) sequence as provided in Table 1. In some embodiments, the crRNA comprises a stem-loop sequence of any one of SEQ ID NOs: 1-17 and 441.
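The text does not say how sequence identity is computed. One common approach, sketched below purely as an assumption, is to run a Needleman-Wunsch global alignment and then count identical aligned positions over the alignment length, gaps included. The scoring values (match +1, mismatch -1, gap -1) and the denominator are illustrative choices. Other conventions, such as dividing by the length of the reference sequence, can give different percentages for gapped alignments.

```python
def percent_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity from a Needleman-Wunsch global alignment of a and b."""
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal alignment, counting identical and total columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * identical / columns if columns else 100.0


print(percent_identity("ACGUACGUAC", "ACGUACGUAA"))  # 90.0
```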
Table 1. Stem-Loop (SL) Modification Sequences
*The sequences of the first stem loop are represented by plain letters without underlining. The floater region 5’ to the first stem-loop sequence is represented by letters with single underlining. The connector region (or part of the connector region) 3’ to the first stem-loop sequence is represented by letters with dashed underlining.
Without being bound by any theory, it is contemplated that adding the sequence 5’-GAAA-3’ to the loop of the first stem-loop sequence would enhance the editing efficiency of a crRNA-mediated gene editing system. In some embodiments, the first stem-loop sequence comprises a loop having the sequence 5’-GAAA-3’.
Without being bound by any theory, it is contemplated that a first stem of 7 base pairs in the first stem-loop would enhance the editing efficiency of a crRNA-mediated gene editing system. In some embodiments, the first stem-loop comprises a first stem of about 7 base pairs. In some embodiments, the first stem-loop sequence comprises a first stem of 7 base pairs.
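The two preferences above, a 5’-GAAA-3’ loop and a 7-bp first stem, can be combined in a simple design helper. This is a sketch under stated assumptions. The 5' arm `GCAUGCA` is an arbitrary placeholder and does not come from Table 1. The 3' arm is taken as the exact Watson-Crick reverse complement of the 5' arm. The default loop is exactly GAAA, even though the text only requires that the loop comprise it. The function name is hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def design_first_stem_loop(five_prime_arm: str, loop: str = "GAAA") -> str:
    """Return 5' arm + loop + exact reverse complement of the 5' arm.

    The stem length in base pairs equals len(five_prime_arm).
    """
    arm, loop = five_prime_arm.upper(), loop.upper()
    if any(b not in COMPLEMENT for b in arm + loop):
        raise ValueError("only RNA bases A, C, G, U are accepted")
    if "GAAA" not in loop:
        raise ValueError("this sketch expects the loop to contain GAAA")
    three_prime_arm = "".join(COMPLEMENT[b] for b in reversed(arm))
    return arm + loop + three_prime_arm


# Hypothetical 7-nt 5' arm (placeholder, not from Table 1) -> 7-bp stem.
print(design_first_stem_loop("GCAUGCA"))  # GCAUGCAGAAAUGCAUGC
```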
Without being bound by any theory, it is contemplated that methylation at one or more nucleotides near the 5' end of the crRNA molecule can enhance the editing efficiency of a crRNA-mediated gene editing system. Without being bound by any theory, it is contemplated that methylation from the 5' end of the crRNA molecule to the last nucleotide at the 3’ end of the loop structure of the first stem-loop can enhance the editing efficiency of a crRNA-mediated gene editing system; see, e.g., FIGs. 17A and 17B. In some embodiments, at least 50% of the nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, at least 55% of the nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, at least 60% of the nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, at least 65% of the nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, at least 70% of the nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, at least 75% of the nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, at least 80% of the nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, at least 85% of the nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. 
In some embodiments, at least 90% of the nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, at least 95% of the nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, at least 97% of the nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, 100% of the nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, at least 25 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 24 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 23 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 22 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 21 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 20 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 19 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 18 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 17 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 16 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 15 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 14 nucleotides from the 5' end of the crRNA molecule are methylated. 
In some embodiments, at least 13 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 12 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 11 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 10 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 9 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 8 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 7 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 6 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, at least 5 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 25 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 24 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 23 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 22 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 21 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 20 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 19 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 18 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 17 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 16 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 15 nucleotides from the 5' end of the crRNA molecule are methylated. 
In some embodiments, up to 14 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 13 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 12 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 11 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 10 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 9 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 8 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 7 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, up to 6 nucleotides from the 5' end of the crRNA molecule are methylated. In some embodiments, one or more nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated. In some embodiments, all nucleotides from the 5' end of the crRNA molecule to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated.
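The percentage thresholds above refer to the region from the 5' end of the crRNA through the last nucleotide of the first loop. The minimal sketch below makes two assumptions. First, methylated nucleotides are recorded as a set of 1-based positions counted from the 5' end. Second, the position of the last loop nucleotide (`loop_end`) is already known from the secondary structure. The text does not specify the chemistry of the methylation, and the function names are hypothetical.

```python
from __future__ import annotations


def methylated_fraction(methylated_positions: set[int], loop_end: int) -> float:
    """Fraction of nucleotides 1..loop_end that are methylated.

    Positions are 1-based from the crRNA 5' end; loop_end is the position of
    the last nucleotide at the 3' end of the first loop.
    """
    if loop_end < 1:
        raise ValueError("loop_end must be at least 1")
    in_region = {p for p in methylated_positions if 1 <= p <= loop_end}
    return len(in_region) / loop_end


def meets_threshold(methylated_positions: set[int], loop_end: int, minimum: float) -> bool:
    """True if at least `minimum` (e.g. 0.75 for "at least 75%") of the region is methylated."""
    return methylated_fraction(methylated_positions, loop_end) >= minimum


marks = set(range(1, 16))  # hypothetical: nucleotides 1-15 methylated
print(methylated_fraction(marks, 20))  # 0.75
print(meets_threshold(marks, 20, 0.80))  # False
```

For instance, if nucleotides 1-15 of a 20-nt region are methylated, the fraction is 0.75. That satisfies the "at least 75%" embodiment but not the "at least 80%" one.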
Without being bound by any theory, it is contemplated that methylation at one or more nucleotides near the 3' end of the first stem-loop can reduce the editing efficiency of a crRNA-mediated gene editing system. Without being bound by any theory, it is contemplated that methylation at one or more nucleotides from the last nucleotide at the 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop can reduce the editing efficiency of a crRNA-mediated gene editing system. In some embodiments, at least 12 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, at least 11 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, at least 10 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, at least 9 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, at least 8 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, at least 7 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, at least 6 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, at least 5 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. 
In some embodiments, at least 4 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, at least 3 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 12 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 11 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 10 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 9 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 8 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 7 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 6 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 5 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, up to 4 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. 
In some embodiments, up to 3 nucleotides from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop are not methylated. In some embodiments, no nucleotide from the last nucleotide at 3’ end of the loop structure of the first stem-loop to the 3' end of the first stem-loop is methylated.
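Taken together, the two preceding paragraphs imply a preferred pattern: methylation up to the 3' end of the first loop and none in the downstream part of the first stem-loop. This pattern can be written as a per-nucleotide mask. In this sketch, `m` marks a methylated nucleotide and `-` an unmethylated one (an arbitrary convention). The counted 3' segment is assumed to start at the nucleotide immediately 3' of the loop. The example layout (a 3-nt floater, a 7-bp stem, and a 4-nt loop) and the function names are hypothetical.

```python
def methylation_mask(loop_end: int, stem_loop_end: int) -> str:
    """'m' for positions 1..loop_end, '-' for loop_end+1..stem_loop_end (1-based)."""
    if not 1 <= loop_end < stem_loop_end:
        raise ValueError("require 1 <= loop_end < stem_loop_end")
    return "m" * loop_end + "-" * (stem_loop_end - loop_end)


def unmethylated_after_loop(mask: str, loop_end: int) -> int:
    """Count unmethylated nucleotides 3' of the loop through the end of the first stem-loop."""
    return mask[loop_end:].count("-")


# Hypothetical layout: 3-nt floater + 7-nt 5' arm + 4-nt loop -> loop_end = 14;
# the 7-nt 3' arm then ends the first stem-loop at position 21.
mask = methylation_mask(loop_end=14, stem_loop_end=21)
print(mask)  # 14 'm' characters followed by 7 '-' characters
print(unmethylated_after_loop(mask, 14))  # 7
```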
In some embodiments, the crRNA described herein comprises, in the 5’-to-3’ direction, any first stem-loop sequence as described herein, a connector region sequence (e.g., any connector region sequence as described in Section 5.2.1(a)(iii) (Connector Region)), and a second stem-loop sequence (e.g., any second stem-loop sequence as described in Section 5.2.1(a)(ii) (Second Stem Loop (Direct Repeat))). In some embodiments, the crRNA further comprises a spacer region (e.g., any spacer region as described in Section 5.2.1(a)(v) (Spacer Region)) 3’ to the second stem-loop sequence. In some embodiments, the crRNA further comprises a floater region (e.g., any floater region as described in Section 5.2.1(a)(iv) (Floater Region)) 5’ to the first stem-loop sequence.
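The 5'-to-3' arrangement described above can be modeled as an ordered list of regions. The dataclass below is a hypothetical illustration. The class and field names do not come from the source, optional regions default to empty strings, and the example sequences are arbitrary placeholders rather than sequences from the tables.

```python
from __future__ import annotations

from dataclasses import dataclass

# Regions in the 5'-to-3' order described in the text.
REGION_ORDER = ("floater", "first_stem_loop", "connector", "second_stem_loop", "spacer")


@dataclass
class CrRNALayout:
    """Hypothetical container for crRNA regions; empty strings mean 'absent'."""
    first_stem_loop: str
    connector: str
    second_stem_loop: str  # direct repeat
    floater: str = ""      # optional, 5' to the first stem-loop
    spacer: str = ""       # optional, 3' to the second stem-loop

    def assemble(self) -> str:
        """Concatenate the regions 5' to 3'."""
        return "".join(getattr(self, name).upper() for name in REGION_ORDER)

    def region_spans(self) -> dict[str, tuple[int, int]]:
        """1-based inclusive coordinates of each non-empty region in the assembled crRNA."""
        spans: dict[str, tuple[int, int]] = {}
        start = 1
        for name in REGION_ORDER:
            seq = getattr(self, name)
            if seq:
                spans[name] = (start, start + len(seq) - 1)
                start += len(seq)
        return spans


layout = CrRNALayout(
    first_stem_loop="GCAUGCAGAAAUGCAUGC",  # placeholder sequences throughout
    connector="AA",
    second_stem_loop="CCGUAGAAAUACGG",
    floater="GG",
    spacer="ACGUACGUACGUACGUACGU",
)
print(layout.region_spans())
```

Tracking coordinates this way makes it straightforward to locate the end of the first loop, which the methylation embodiments above refer to.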
(ii) Second Stem Loop (Direct Repeat)
In some embodiments, the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence, a connector region, and a second stem-loop sequence.
In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure.
In some embodiments of the crRNA described herein, the second stem-loop sequence comprises at least about 5 nucleotides, such as 5, 6, 7, 8, 9, 10, or more nucleotides. In some embodiments, the second stem-loop sequence comprises at most about 33 nucleotides, such as 33, 32, 31, 30, 29, or fewer nucleotides. In some embodiments, the second stem-loop sequence comprises about 5 nucleotides to about 30 nucleotides. In some embodiments, the second stem-loop sequence comprises about 6 nucleotides to about 24 nucleotides. In some embodiments, the second stem-loop sequence comprises about 10 nucleotides to about 20 nucleotides. In some embodiments, the second stem-loop sequence comprises 16 nucleotides. In some embodiments, the second stem-loop sequence comprises 17 nucleotides. In some embodiments, the second stem-loop sequence comprises 18 nucleotides. In some embodiments, the second stem-loop sequence comprises 19 nucleotides. In some embodiments, the second stem-loop sequence comprises 20 nucleotides.
In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure. In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of at least about 3 base pairs, such as 3, 4, 5, 6, or more base pairs. In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of at most about 12 base pairs, such as 12, 11, 10, 9, or fewer base pairs. In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 3-12 base pairs. In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 4-11 base pairs. In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 4-10 base pairs. In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 4-9 base pairs. In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 4-8 base pairs. In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 4-7 base pairs. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 4-6 base pairs. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 4 base pairs. 
In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 5 base pairs. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 6 base pairs.
In some embodiments, the second stem comprises one or more nucleotide mismatches. In some embodiments, the second stem comprises 5%-40% nucleotide mismatches (percentage calculated as the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem). In some embodiments, the second stem comprises 5%-30% nucleotide mismatches (percentage calculated as the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem). In some embodiments, the second stem comprises 5%-20% nucleotide mismatches (percentage calculated as the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem). In some embodiments, the second stem comprises 5%-10% base mismatches (percentage calculated as the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem). In some embodiments, the second stem comprises 1, 2, 3, or 4 bulges. In some embodiments, the second stem comprises 1 bulge. In some embodiments, the second stem comprises 2 bulges. In some embodiments, the second stem comprises 3 bulges. In some embodiments, the second stem comprises 4 bulges. In some embodiments, the second stem-loop structure contains precise base pairing. In some embodiments, the second stem does not contain any bulges.
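Bulges and mismatches in a stem can be told apart in a dot-bracket structure by comparing the gaps between consecutive base pairs on each strand. The sketch below assumes the standard RNA secondary-structure convention, since the text does not define these terms. Under that convention, unpaired nucleotides on only one strand form a bulge. Unpaired nucleotides on both strands form an internal loop, and a single opposed mismatch counts as the smallest internal loop. The sketch handles a single hairpin only, and the function name is hypothetical.

```python
from __future__ import annotations


def stem_defects(dot_bracket: str) -> dict[str, int]:
    """Count bulges and internal loops between consecutive base pairs of one hairpin."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unmatched '('")
    pairs.sort()  # outermost pair first for a single hairpin
    bulges = internal_loops = 0
    for (i1, j1), (i2, j2) in zip(pairs, pairs[1:]):
        left_gap = i2 - i1 - 1   # unpaired nucleotides on the 5' strand
        right_gap = j1 - j2 - 1  # unpaired nucleotides on the 3' strand
        if left_gap and right_gap:
            internal_loops += 1  # includes a single opposed mismatch
        elif left_gap or right_gap:
            bulges += 1
    return {"bulges": bulges, "internal_loops": internal_loops}


print(stem_defects("(((.((....)))))"))  # {'bulges': 1, 'internal_loops': 0}
```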
In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of at least about 3 nucleotides, such as 3, 4, 5, 6, or more nucleotides. In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of at most about 15 nucleotides, such as 15, 14, 13, 12, or fewer nucleotides. In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of 3-15 nucleotides. In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 4-14 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 5-13 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 5-12 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 5-11 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 5-10 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 5-9 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 5-8 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 5 nucleotides. 
In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 6 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 7 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second loop of about 8 nucleotides.
In some embodiments of the crRNA described herein, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 3-12 base pairs and a second loop of about 3-15 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 4-6 base pairs and a second loop of about 5-8 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 5 base pairs and a second loop of about 5 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 5 base pairs and a second loop of about 6 nucleotides. In some embodiments, the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 5 base pairs and a second loop of about 7 nucleotides.
In some embodiments, the second stem-loop structure comprises 5’-X1X2X3X4X5NNnNNX6X7X8X9X10-3’, wherein X1, X2, X3, X4, X5, X6, X7, X8, X9, and X10 can be any base, n can be any base or a deletion, and N can be any base; wherein X1X2X3X4X5 and X6X7X8X9X10 can hybridize to each other to form a stem, so that NNnNN forms a loop. In some embodiments, 5’-X1X2X3X4X5-3’ and 5’-X6X7X8X9X10-3’ are reverse complementary sequences. In some embodiments, the stem formed by X1X2X3X4X5 and X6X7X8X9X10 does not contain base mismatches. In alternative embodiments, the stem formed by X1X2X3X4X5 and X6X7X8X9X10 contains about 5%-10% base mismatches (percentage calculated as the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem).
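As a worked count, assume the four canonical bases and strict Watson-Crick pairing, with no G·U wobble. In the perfectly paired embodiment of a generic formula with a k-bp stem, the 5' arm is free and fixes the 3' arm. The loop NNnNN has either 4 or 5 free positions, depending on whether n is deleted. The number of distinct sequences is therefore:

$$
N(k) = 4^{k}\,\bigl(4^{4} + 4^{5}\bigr) = 1280 \cdot 4^{k}, \qquad N(5) = 1280 \cdot 1024 = 1{,}310{,}720.
$$

Even the 5-bp formula therefore covers more than a million perfectly paired sequences before the mismatch-tolerant embodiments are taken into account.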
In some embodiments, the second stem-loop structure comprises 5’-X1X2X3X4X5X6NNnNNX7X8X9X10X11X12-3’, wherein X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, and X12 can be any base, n can be any base or a deletion, and N can be any base; wherein X1X2X3X4X5X6 and X7X8X9X10X11X12 can hybridize to each other to form a stem, so that NNnNN forms a loop. In some embodiments, 5’-X1X2X3X4X5X6-3’ and 5’-X7X8X9X10X11X12-3’ are reverse complement sequences. In some embodiments, the stem formed by X1X2X3X4X5X6 and X7X8X9X10X11X12 does not contain base mismatches. In alternative embodiments, the stem formed by X1X2X3X4X5X6 and X7X8X9X10X11X12 contains about 5%-30% base mismatches (percentage calculated as the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem).
In some embodiments, the second stem-loop structure comprises 5’-X1X2X3X4X5X6X7NNnNNX8X9X10X11X12X13X14-3’, wherein X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, and X14 can be any base, n can be any base or a deletion, and N can be any base; wherein X1X2X3X4X5X6X7 and X8X9X10X11X12X13X14 can hybridize to each other to form a stem, so that NNnNN forms a loop. In some embodiments, 5’-X1X2X3X4X5X6X7-3’ and 5’-X8X9X10X11X12X13X14-3’ are reverse complement sequences. In some embodiments, the stem formed by X1X2X3X4X5X6X7 and X8X9X10X11X12X13X14 does not contain base mismatches. In alternative embodiments, the stem formed by X1X2X3X4X5X6X7 and X8X9X10X11X12X13X14 contains about 5%-20% base mismatches (percentage calculated as the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem).
In some embodiments, the second stem-loop structure comprises 5’-X1X2X3X4X5X6X7X8NNnNNX9X10X11X12X13X14X15X16-3’, wherein X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, and X16 can be any base, n can be any base or a deletion, and N can be any base; wherein X1X2X3X4X5X6X7X8 and X9X10X11X12X13X14X15X16 can hybridize to each other to form a stem, so that NNnNN forms a loop. In some embodiments, 5’-X1X2X3X4X5X6X7X8-3’ and 5’-X9X10X11X12X13X14X15X16-3’ are reverse complement sequences. In some embodiments, the stem formed by X1X2X3X4X5X6X7X8 and X9X10X11X12X13X14X15X16 does not contain base mismatches. In alternative embodiments, the stem formed by X1X2X3X4X5X6X7X8 and X9X10X11X12X13X14X15X16 contains about 5%-20% base mismatches (percentage calculated as the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem).
In some embodiments, the second stem-loop structure comprises 5’-X1X2X3X4X5X6X7X8X9NNnNNX10X11X12X13X14X15X16X17X18-3’, wherein X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, and X18 can be any base, n can be any base or a deletion, and N can be any base; wherein X1X2X3X4X5X6X7X8X9 and X10X11X12X13X14X15X16X17X18 can hybridize to each other to form a stem, so that NNnNN forms a loop. In some embodiments, 5’-X1X2X3X4X5X6X7X8X9-3’ and 5’-X10X11X12X13X14X15X16X17X18-3’ are reverse complement sequences. In some embodiments, the stem formed by X1X2X3X4X5X6X7X8X9 and X10X11X12X13X14X15X16X17X18 does not contain base mismatches. In alternative embodiments, the stem formed by X1X2X3X4X5X6X7X8X9 and X10X11X12X13X14X15X16X17X18 contains about 5%-20% base mismatches (percentage calculated as the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem).
In some embodiments, the second stem-loop structure comprises 5’ -X1X2X3X4X5X6X7X8X9X10NNnNNX11X12X13X14X15X16X17X18X19X20-3’ , wherein X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20 can be any base, n can be any base or deletion, and N can be any base; wherein X1X2X3X4X5X6X7X8X9 X10 and X11X12X13X14X15X16X17X18X19X20 can hybridize to each other to form a stem and make NNnNN form a loop. In some embodiments, 5’ -X1X2X3X4X5X6X7X8X9X10-3’ and 5’ -X11X12X13X14X15X16X17X18X19X20-3’ are reverse complement sequences. In some embodiments, the stem formed by X1X2X3X4X5X6X7X8X9X10 and X11X12X13X14X15X16X17X18X19X20 does not contain base mismatches. In alternative embodiments, the stem formed by X1X2X3X4X5X6X7X8X9X10 and X11X12X13X14X15X16 X17X18X19X20 contains about 5%-20%base mismatches (percentage calculated based on the number of non-paired nucleotides in the stem divided by the number of matched pairings in the stem) .
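To make the generic formulas above concrete, the following Python sketch splits a candidate second stem-loop into its 5’ arm (X1...), its NNnNN loop, and its 3’ arm, and scores the stem. It is an illustration under stated assumptions, not part of the disclosed subject matter: the function names are hypothetical, only Watson-Crick pairs (A-U, G-C) are counted as paired (G-U wobble pairs are treated as mismatches), and the mismatch percentage is read as mismatched arm positions divided by matched base pairs, which is one possible reading of the definition given above.

```python
from typing import Tuple

# Watson-Crick pairs in RNA. G-U wobble pairs are counted as mismatches here,
# which is an assumption of this sketch, not something the text specifies.
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def parse_second_stem_loop(seq: str, stem_len: int) -> Tuple[str, str, str]:
    """Split seq into (5' arm, loop, 3' arm) for a stem of stem_len base pairs.

    The NNnNN loop is 4 or 5 nt long (n may be deleted), so the whole
    sequence must be 2 * stem_len + 4 or 2 * stem_len + 5 nt long.
    """
    seq = seq.upper().replace("T", "U")
    loop_len = len(seq) - 2 * stem_len
    if loop_len not in (4, 5):
        raise ValueError(f"loop length {loop_len} is not 4 or 5 nt")
    return seq[:stem_len], seq[stem_len:stem_len + loop_len], seq[stem_len + loop_len:]


def stem_mismatch_percent(arm5: str, arm3: str) -> float:
    """Mismatched positions divided by matched base pairs, as a percentage."""
    if len(arm5) != len(arm3):
        raise ValueError("stem arms must have equal length")
    # X1 pairs with the 3'-most base of the 3' arm, X2 with the next one in, etc.
    pairs = list(zip(arm5.upper(), reversed(arm3.upper())))
    matched = sum((a, b) in WC_PAIRS for a, b in pairs)
    if matched == 0:
        raise ValueError("the arms form no base pairs")
    return 100.0 * (len(pairs) - matched) / matched


if __name__ == "__main__":
    # Hypothetical 7-bp stem (GCAUGCA / UGCAUGC) closing a GAAA loop.
    arm5, loop, arm3 = parse_second_stem_loop("GCAUGCAGAAAUGCAUGC", 7)
    print(arm5, loop, arm3)                         # GCAUGCA GAAA UGCAUGC
    print(stem_mismatch_percent(arm5, arm3))        # 0.0
    print(stem_mismatch_percent(arm5, "UGCAUGA"))   # one G-A mismatch: 1/6, about 16.7
```

Under this reading, a single mismatch in a 7-bp stem gives about 16.7%, inside the 5%-20% range, while a single mismatch in a 10-bp stem gives 1/9, about 11.1%.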
In some embodiments of the crRNA described herein, the second stem-loop sequence has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% to a second stem-loop sequence set forth in Table 2. In some embodiments, the second stem-loop sequence comprises a second stem-loop nucleotide sequence set forth in Table 2. In some embodiments, the second stem-loop sequence consists of a second stem-loop nucleotide sequence set forth in Table 2.
In some embodiments of the crRNA described herein, the second stem-loop sequence has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% to a second stem-loop sequence of any of SEQ ID NOs: 495-546. In some embodiments, the second stem-loop sequence comprises a second stem-loop nucleotide sequence of any of SEQ ID NOs: 495-546. In some embodiments, the second stem-loop sequence consists of a second stem-loop nucleotide sequence of any of SEQ ID NOs: 495-546.
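The identity thresholds used throughout this section (at least about 70%, 75%, ... 99%) can be checked with a short calculation. The Python sketch below is a simplification under a stated assumption: it compares two equal-length sequences position by position without gaps, whereas percent identity is more generally computed from an alignment (for example a BLAST or Needleman-Wunsch alignment) that may contain gaps. Function names are hypothetical; the example reference is the connector sequence of SEQ ID NO: 71 given in this section, and the variant is an invented single substitution.

```python
def percent_identity(query: str, reference: str) -> float:
    """Ungapped, position-by-position percent identity of two nucleotide sequences.

    T and U are treated as the same base so DNA and RNA spellings compare equal.
    """
    q = query.upper().replace("T", "U")
    r = reference.upper().replace("T", "U")
    if not q or len(q) != len(r):
        raise ValueError("this sketch needs two non-empty sequences of equal length")
    matches = sum(a == b for a, b in zip(q, r))
    return 100.0 * matches / len(q)


def meets_threshold(query: str, reference: str, threshold: float) -> bool:
    """True if identity is at least `threshold` percent (e.g. 70, 90 or 95)."""
    return percent_identity(query, reference) >= threshold


if __name__ == "__main__":
    seq71 = "CCCACAAUACCUGAGAAAU"    # connector region, SEQ ID NO: 71
    variant = "GCCACAAUACCUGAGAAAU"  # hypothetical substitution at position 1
    print(round(percent_identity(variant, seq71), 1))  # 94.7
    print(meets_threshold(variant, seq71, 90), meets_threshold(variant, seq71, 95))  # True False
```

A single substitution in a 19-nt sequence gives 18/19, about 94.7% identity, which satisfies the "at least about 90%" embodiments but not the "at least about 95%" ones.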
In some embodiments, the second stem-loop sequence is derived from a naturally-existing Type V crRNA or a functional derivative thereof. The naturally-existing Type V crRNA is also referred to as a Direct Repeat (DR). Exemplary DR sequences are provided in Table 2. Exemplary DR sequences and secondary structures are provided in FIGs. 11 and 12. More exemplary DR sequences and secondary structures are provided in FIGs. 14A-14ZZ.
In some embodiments, the naturally existing Type V crRNA is processed from a CRISPR array located 3’ to a Type V CRISPR-Cas locus. In some embodiments, the Direct Repeat is processed from a CRISPR array located 3’ to a Type V CRISPR-Cas locus. In some embodiments, the Type V crRNA or functional derivative thereof comprises a stem-loop structure for binding by a Type V Cas protein.
In some embodiments, the naturally existing Type V crRNA comprises a second stem-loop sequence and a connector region sequence (or part of the connector region sequence). In some embodiments, the naturally existing Type V crRNA comprises any second stem-loop sequence as described herein and a connector region sequence (or part of the connector region sequence) (e.g., any connector region sequence as described in Section 5.2.1 (a) (iii) (Connector Region)).
In some embodiments, the DR sequence comprises a second stem loop sequence and a connector region sequence (or part of the connector region sequence) . In some embodiments, the DR sequence comprises any second stem loop sequence as described herein and a connector region sequence (or part of the connector region sequence) (e.g. any connector region sequence as described in Section 5.2.1 (a) (iii) (Connector Region) ) .
In some embodiments, the crRNA described herein comprises a first stem-loop sequence (e.g. any first stem-loop as described in Section 5.2.1 (a) (i) (First stem loop) ) connected to the 5’ end of a naturally-existing Type V crRNA or a functional derivative thereof. In some embodiments, the crRNA described herein comprises at least one stem-loop sequence (e.g. any first stem-loop as described in Section 5.2.1 (a) (i) (First stem loop) ) connected to the 5’ end of a naturally-existing Type V crRNA or a functional derivative thereof.
In some embodiments, the crRNA described herein comprises a first stem-loop sequence (e.g. any first stem-loop as described in Section 5.2.1 (a) (i) (First stem loop) ) connected to the 5’ end of a DR or a functional derivative thereof. In some embodiments, the crRNA described herein comprises at least one stem-loop sequence (e.g. any first stem-loop as described in Section 5.2.1 (a) (i) (First stem loop) ) connected to the 5’ end of a DR or a functional derivative thereof.
In some embodiments of the crRNA described herein, the naturally existing Type V crRNA has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% to a sequence set forth in Table 2. In some embodiments, the naturally existing Type V crRNA comprises a nucleotide sequence set forth in Table 2. In some embodiments, the naturally existing Type V crRNA consists of a nucleotide sequence set forth in Table 2. Exemplary naturally existing Type V crRNA sequences and secondary structures are provided in FIGs. 11 and 12. More exemplary naturally existing Type V crRNA sequences and secondary structures are provided in FIGs. 14A-14ZZ.
In some embodiments of the crRNA described herein, the naturally existing Type V crRNA has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% to a sequence of any of SEQ ID NOs: 18-70. In some embodiments, the naturally existing Type V crRNA comprises a nucleotide sequence of any of SEQ ID NOs: 18-70. In some embodiments, the naturally existing Type V crRNA consists of a nucleotide sequence of any of SEQ ID NOs: 18-70.
In some embodiments of the crRNA described herein, the DR has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% to a sequence set forth in Table 2. In some embodiments, the DR comprises a nucleotide sequence set forth in Table 2. In some embodiments, the DR consists of a nucleotide sequence set forth in Table 2. Exemplary DR sequences and secondary structures are provided in FIGs. 11 and 12. More exemplary DR sequences and secondary structures are provided in FIGs. 14A-14ZZ.
In some embodiments of the crRNA described herein, the DR has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% to a sequence of any of SEQ ID NOs: 18-70. In some embodiments, the DR comprises a nucleotide sequence of any of SEQ ID NOs: 18-70. In some embodiments, the DR consists of a nucleotide sequence of any of SEQ ID NOs: 18-70.
Table 2. Direct Repeat (DR) Sequences
*The sequences of the second stem loop are represented by plain letters without underlining. The connector region (or part of the connector region) 5’ to the second stem-loop sequence is represented by letters with single underlining. The spacer region (or part of the spacer region) 3’ to the second stem-loop sequence is represented by letters with dashed underlining.
In some embodiments, the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence (e.g., any first stem-loop sequence as described in Section 5.2.1 (a) (i) (First Stem Loop)), a connector region sequence (e.g., any connector region sequence as described in Section 5.2.1 (a) (iii) (Connector Region)), and any second stem-loop sequence described herein. In some embodiments, the crRNA further comprises a spacer region (e.g., any spacer region as described in Section 5.2.1 (a) (v) (Spacer Region)) 3’ to the second stem-loop sequence. In some embodiments, the crRNA further comprises a floater region (e.g., any floater region as described in Section 5.2.1 (a) (iv) (Floater Region)) 5’ to the first stem-loop sequence.
In some embodiments, the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence (e.g., any first stem-loop sequence as described in Section 5.2.1 (a) (i) (First Stem Loop)) and a naturally-existing Type V crRNA or a functional derivative thereof. In some embodiments, the crRNA further comprises a spacer region (e.g., any spacer region as described in Section 5.2.1 (a) (v) (Spacer Region)) 3’ to the naturally-existing Type V crRNA or functional derivative thereof. In some embodiments, the crRNA further comprises a floater region (e.g., any floater region as described in Section 5.2.1 (a) (iv) (Floater Region)) 5’ to the first stem-loop sequence.
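The 5’-to-3’ arrangement above (optional floater, first stem-loop, connector, second stem-loop or DR, optional spacer) amounts to an ordered concatenation. The Python sketch below builds the sequence and records where each region sits, which can help when annotating designs. It is a hypothetical helper, not a disclosed tool: the stem-loop and spacer sequences in the example are placeholders rather than sequences from Tables 1-3, and only the GG floater and AUU connector are taken from this section. Because a naturally-existing Type V crRNA (DR) may already contain all or part of the connector region, the connector is treated as optional.

```python
from typing import Dict, Tuple

# 5'->3' order of the crRNA regions described in this section. An empty or
# omitted region is treated as absent.
REGION_ORDER = ("floater", "first_stem_loop", "connector", "second_stem_loop", "spacer")
REQUIRED = ("first_stem_loop", "second_stem_loop")


def assemble_crrna(**regions: str) -> Tuple[str, Dict[str, Tuple[int, int]]]:
    """Concatenate regions 5'->3' and return (sequence, region coordinates).

    Coordinates are 0-based, half-open (start, end) offsets into the sequence.
    """
    unknown = set(regions) - set(REGION_ORDER)
    if unknown:
        raise ValueError(f"unknown region(s): {sorted(unknown)}")
    for name in REQUIRED:
        if not regions.get(name):
            raise ValueError(f"missing required region: {name}")
    pieces, coords, pos = [], {}, 0
    for name in REGION_ORDER:
        seq = regions.get(name, "").upper().replace("T", "U")
        if set(seq) - set("ACGU"):
            raise ValueError(f"{name} contains characters other than A/C/G/U")
        if seq:
            coords[name] = (pos, pos + len(seq))
            pieces.append(seq)
            pos += len(seq)
    return "".join(pieces), coords


if __name__ == "__main__":
    rna, coords = assemble_crrna(
        floater="GG",                           # floater from this section
        first_stem_loop="GGACUUCGGUCC",         # hypothetical placeholder
        connector="AUU",                        # connector from this section
        second_stem_loop="GCAUGCAGAAAUGCAUGC",  # hypothetical placeholder
        spacer="ACGUACGUACGUACGUACGU",          # hypothetical 20-nt spacer
    )
    print(rna)
    print(coords)
```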
(iii) Connector region
In some embodiments, the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence, a connector region, and a second stem-loop sequence. In some embodiments, the connector region links the first stem-loop sequence and the second stem-loop sequence.
In some embodiments of the crRNA described herein, the connector region comprises at least about 3 nucleotides, such as 3, 4, 5, 6, or more nucleotides. In some embodiments, the connector region comprises at most about 25 nucleotides, such as 25, 24, 23, 22, 21, 20, or fewer nucleotides. In some embodiments, the connector region comprises about 3-23 nucleotides. In some embodiments, the connector region comprises about 3-20 nucleotides. In some embodiments, the connector region comprises about 4-19 nucleotides. In some embodiments, the connector region comprises about 5-19 nucleotides. In some embodiments, the connector region comprises 4 nucleotides. In some embodiments, the connector region comprises 5 nucleotides. In some embodiments, the connector region comprises 6 nucleotides. In some embodiments, the connector region comprises 7 nucleotides. In some embodiments, the connector region comprises 8 nucleotides.
In some embodiments of the crRNA described herein, the connector region comprises a nucleotide sequence of any one of the connector regions as indicated in Table 1 and Table 2.
In some embodiments of the crRNA described herein, the connector region comprises a nucleotide sequence having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the nucleotide sequence of AUU. In some embodiments, the connector region comprises the nucleotide sequence of AUU. In some embodiments, the connector region consists of the nucleotide sequence of AUU.
In some embodiments of the crRNA described herein, the connector region comprises a nucleotide sequence having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the nucleotide sequence of AGAAAU. In some embodiments, the connector region comprises the nucleotide sequence of AGAAAU. In some embodiments, the connector region consists of the nucleotide sequence of AGAAAU.
In some embodiments of the crRNA described herein, the connector region comprises a nucleotide sequence having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the nucleotide sequence of UCUGCU. In some embodiments, the connector region comprises the nucleotide sequence of UCUGCU. In some embodiments, the connector region consists of the nucleotide sequence of UCUGCU.
In some embodiments of the crRNA described herein, the connector region comprises a nucleotide sequence having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the nucleotide sequence of AAUUUUU. In some embodiments, the connector region comprises the nucleotide sequence of AAUUUUU. In some embodiments, the connector region consists of the nucleotide sequence of AAUUUUU.
In some embodiments of the crRNA described herein, the connector region comprises a nucleotide sequence having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the nucleotide sequence of GUUUAAA. In some embodiments, the connector region comprises the nucleotide sequence of GUUUAAA. In some embodiments, the connector region consists of the nucleotide sequence of GUUUAAA.
In some embodiments of the crRNA described herein, the connector region comprises a nucleotide sequence having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the nucleotide sequence of CCCACAAUACCUGAGAAAU (SEQ ID NO: 71). In some embodiments, the connector region comprises the nucleotide sequence of CCCACAAUACCUGAGAAAU (SEQ ID NO: 71). In some embodiments, the connector region consists of the nucleotide sequence of CCCACAAUACCUGAGAAAU (SEQ ID NO: 71).
In some embodiments of the crRNA described herein, the connector region comprises a nucleotide sequence having at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity to the nucleotide sequence of GUUGCAAAACCCAAGAAAU (SEQ ID NO: 72). In some embodiments, the connector region comprises the nucleotide sequence of GUUGCAAAACCCAAGAAAU (SEQ ID NO: 72). In some embodiments, the connector region consists of the nucleotide sequence of GUUGCAAAACCCAAGAAAU (SEQ ID NO: 72).
In some embodiments, the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence (e.g., any first stem-loop sequence as described in Section 5.2.1 (a) (i) (First Stem Loop)), any connector region sequence as described herein, and a second stem-loop sequence (e.g., any second stem-loop sequence as described in Section 5.2.1 (a) (ii) (Second Stem Loop (Direct Repeat))). In some embodiments, the crRNA further comprises a spacer region (e.g., any spacer region as described in Section 5.2.1 (a) (v) (Spacer Region)) 3’ to the second stem-loop sequence. In some embodiments, the crRNA further comprises a floater region (e.g., any floater region as described in Section 5.2.1 (a) (iv) (Floater Region)) 5’ to the first stem-loop sequence.
(iv) Floater region
In some embodiments, the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence, a connector region, and a second stem-loop sequence, wherein the crRNA further comprises a floater region 5’ to the first stem-loop sequence.
In some embodiments of the crRNA described herein, the floater region is absent. In some embodiments, the floater region comprises at least about 1 nucleotide, such as 1, 2, 3 or more nucleotides. In some embodiments, the floater region comprises 1 nucleotide. In some embodiments, the floater region comprises 2 nucleotides. In some embodiments, the floater region comprises 3 nucleotides. In some embodiments, the floater region comprises 4 nucleotides. In some embodiments, the floater region comprises 5 nucleotides. In some embodiments, the floater region comprises 6 nucleotides. In some embodiments, the floater region comprises 7 nucleotides. In some embodiments, the floater region comprises 8 nucleotides. In some embodiments, the floater region comprises 9 nucleotides. In some embodiments, the floater region comprises 10 nucleotides.
In some embodiments, the floater region comprises the nucleotide sequence of CAU. In some embodiments, the floater region consists of the nucleotide sequence of CAU.
In some embodiments, the floater region comprises the nucleotide sequence of U. In some embodiments, the floater region consists of the nucleotide sequence of U.
In some embodiments, the floater region comprises the nucleotide sequence of GG. In some embodiments, the floater region consists of the nucleotide sequence of GG.
In some embodiments, the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence (e.g., any first stem-loop sequence as described in Section 5.2.1 (a) (i) (First Stem Loop)), a connector region sequence (e.g., any connector region sequence as described in Section 5.2.1 (a) (iii) (Connector Region)), and a second stem-loop sequence (e.g., any second stem-loop sequence as described in Section 5.2.1 (a) (ii) (Second Stem Loop (Direct Repeat))), wherein the crRNA further comprises any floater region as described herein 5’ to the first stem-loop sequence. In some embodiments, the crRNA further comprises a spacer region (e.g., any spacer region as described in Section 5.2.1 (a) (v) (Spacer Region)) 3’ to the second stem-loop sequence.
(v) Spacer region
In some embodiments, the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence, a connector region, and a second stem-loop sequence, wherein the crRNA further comprises a spacer region 3’ to the second stem-loop sequence.
In some embodiments of the crRNA described herein, the spacer region comprises at least about 5 nucleotides, such as 5, 15, 20, 25, 30, or more nucleotides. In some embodiments, the spacer region comprises about 5-75 nucleotides. In some embodiments, the spacer region comprises about 20-40 nucleotides. In some embodiments, the spacer region comprises about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75 or more nucleotides in length.
In some embodiments, the spacer is at least about 50%, at least about 60%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% complementary to a target sequence. In some embodiments, the degree of complementarity between the spacer and the target sequence is 100%. In some embodiments, there are at least about 15 base pairs (e.g., at least about any of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 or more base pairs) between the spacer sequence and the target sequence of the target nucleic acid (e.g., DNA).
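The degree of complementarity and the number of base pairs discussed above can be computed with a short Python sketch. It is an illustration under stated assumptions: the target is given as the DNA strand the spacer hybridizes to, written 5’-to-3’; the two strands are compared antiparallel, position by position, with no gaps or bulges; only Watson-Crick pairs are counted. The function name and example sequences are hypothetical.

```python
from typing import Tuple

# Watson-Crick partner (RNA alphabet) for each base of the DNA target strand.
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def spacer_target_pairing(spacer: str, target: str) -> Tuple[int, float]:
    """Return (number of base pairs, percent complementarity).

    spacer: crRNA spacer written 5'->3'.
    target: DNA strand the spacer hybridizes to, also written 5'->3'.
    The strands are antiparallel, so spacer position i pairs with target
    position len(target) - 1 - i.
    """
    spacer = spacer.upper().replace("T", "U")
    target = target.upper().replace("U", "T")
    if not spacer or len(spacer) != len(target):
        raise ValueError("spacer and target must be non-empty and of equal length")
    expected = "".join(RNA_PARTNER[b] for b in reversed(target))
    paired = sum(s == e for s, e in zip(spacer, expected))
    return paired, 100.0 * paired / len(spacer)


if __name__ == "__main__":
    target = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt target strand
    print(spacer_target_pairing("ACGUACGUACGUACGUACGU", target))  # (20, 100.0)
    print(spacer_target_pairing("ACGUACGUAGGUACGUACGU", target))  # (19, 95.0)
```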
Without being bound by any theory, it is contemplated that complete complementarity is not required for spacers, provided that there is sufficient complementarity for the crRNA to function (i.e., to direct the Type V CRISPR effector protein to the target site). The cleavage efficiency of the Type V CRISPR effector protein mediated by the crRNA can be adjusted by introducing one or more mismatches (e.g., 1 or 2 mismatches between the spacer sequence and the target sequence), taking into account the positions of the mismatches along the spacer/target sequence. Mismatches, such as double mismatches, have a greater impact on cleavage efficiency when they are located more centrally in the spacer (i.e., not at the 3′ or 5′ end of the spacer). Thus, by choosing the positions of mismatches along the spacer sequence, the cleavage efficiency of the Type V CRISPR effector protein can be tuned. For example, if less than 100% cleavage of the target sequence is desired (e.g., in a population of cells), 1 or 2 mismatches between the spacer sequence and the target sequence can be introduced into the spacer sequence.
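The tuning strategy above, introducing one or two mismatches and choosing where they fall, can be explored by enumerating candidate spacer variants. The Python sketch below is a hypothetical helper: it starts from a fully complementary spacer, places every combination of `n_mismatches` substitutions, and labels each variant as central or not. Because the text does not define how far from an end counts as "at the 3′ or 5′ end", the `end_window` cutoff is an assumption, and using the complement as the substituted base is an arbitrary deterministic choice. The sketch only enumerates and labels candidates; it does not predict cleavage efficiency.

```python
from itertools import combinations

# Replacement base used to create a mismatch. Starting from a fully
# complementary spacer, any substitution breaks Watson-Crick pairing; the
# complement is used only to make the output deterministic.
SUBSTITUTE = {"A": "U", "U": "A", "G": "C", "C": "G"}


def mismatch_variants(spacer: str, n_mismatches: int = 1, end_window: int = 3):
    """Yield (positions, variant, is_central) for each placement of n mismatches.

    positions are 1-based from the spacer 5' end. A variant is labelled
    central when no mismatch falls in the first or last `end_window` nt
    (end_window is a hypothetical cutoff, not defined in the text).
    """
    spacer = spacer.upper().replace("T", "U")
    n = len(spacer)
    for idxs in combinations(range(n), n_mismatches):
        bases = list(spacer)
        for i in idxs:
            bases[i] = SUBSTITUTE[bases[i]]
        is_central = all(end_window <= i < n - end_window for i in idxs)
        yield tuple(i + 1 for i in idxs), "".join(bases), is_central


if __name__ == "__main__":
    spacer = "ACGUACGUACGUACGUACGU"  # hypothetical 20-nt spacer
    doubles = list(mismatch_variants(spacer, n_mismatches=2))
    central = [v for v in doubles if v[2]]
    print(len(doubles), len(central))  # 190 91
```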
Without being bound by any theory, it is contemplated that to reduce off-target interactions, e.g., to reduce the guide interacting with a target sequence having low complementarity, mutations can be introduced to the spacer so that the Type V CRISPR-Cas system can distinguish between target and off-target sequences that have greater than 80%, 85%, 90%, or 95% complementarity. In some embodiments, the degree of complementarity is from 80% to 95%, e.g., about 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% (for example, distinguishing between a target having 18 nucleotides and an off-target of 18 nucleotides having 1, 2, or 3 mismatches). Accordingly, in some embodiments, the degree of complementarity between the spacer sequence in a guide molecule and its corresponding target sequence is greater than 94.5%, 95%, 95.5%, 96%, 96.5%, 97%, 97.5%, 98%, 98.5%, 99%, 99.5%, or 99.9%. In some embodiments, the degree of complementarity is 100%.
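As a worked example of the 18-nucleotide case above, assume complementarity is counted over an ungapped alignment of L positions containing m mismatches (an assumption; gapped alignments would be counted differently):

```latex
\[
  C(L, m) = \frac{L - m}{L} \times 100\%, \qquad
  C(18, 1) = \tfrac{17}{18} \approx 94.4\%, \quad
  C(18, 2) = \tfrac{16}{18} \approx 88.9\%, \quad
  C(18, 3) = \tfrac{15}{18} \approx 83.3\%.
\]
```

All three values fall in the 80%-95% window discussed above. Because a single mismatch over 18 nucleotides already gives about 94.4%, a complementarity greater than 94.5% between an 18-nt spacer and its target corresponds to no mismatches at all.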
Without being bound by any theory, it is further contemplated that the extent of base pairing between the spacer and a target sequence can modulate the nuclease activity of a Type V CRISPR effector protein. In some embodiments, to result in a nicking event, the spacer has no more than about 18 continuous base pairs with the target sequence. In some embodiments, to result in a nicking event, the spacer has no more than 18, no more than 17, no more than 16, or no more than 15 continuous base pairs with the target sequence. In some embodiments, to result in a cleaving event, the spacer has at least about 18 continuous base pairs with the target sequence. In some embodiments, to result in a cleaving event, the spacer has at least 18, at least 19, at least 20, or more continuous base pairs with the target sequence.
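Because the nicking/cleavage distinction above is framed in terms of continuous base pairs, a design check needs the length of the longest uninterrupted run of pairing between spacer and target. The hypothetical Python helper below computes it under the same simplifying assumptions as an ungapped antiparallel comparison (target DNA strand written 5’-to-3’, Watson-Crick pairs only). It deliberately returns only the run length: the text gives "no more than about 18" for nicking and "at least about 18" for cleavage, which overlap at 18, so no hard classification is attempted.

```python
# Watson-Crick partner (RNA alphabet) for each base of the DNA target strand.
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def longest_paired_run(spacer: str, target: str) -> int:
    """Length of the longest stretch of consecutive spacer/target base pairs.

    spacer is written 5'->3'; target is the DNA strand it pairs with, also
    5'->3'. The strands are antiparallel and compared without gaps.
    """
    spacer = spacer.upper().replace("T", "U")
    target = target.upper().replace("U", "T")
    if len(spacer) != len(target):
        raise ValueError("spacer and target must have equal length")
    expected = (RNA_PARTNER[b] for b in reversed(target))
    best = current = 0
    for s, e in zip(spacer, expected):
        # Extend the current run on a match, reset it on a mismatch.
        current = current + 1 if s == e else 0
        best = max(best, current)
    return best


if __name__ == "__main__":
    target = "ACGTACGTACGTACGTACGT"   # hypothetical 20-nt target strand
    perfect = "ACGUACGUACGUACGUACGU"  # fully complementary spacer
    split = "ACGUACGUAGGUACGUACGU"    # one mismatch at spacer position 10
    print(longest_paired_run(perfect, target))  # 20
    print(longest_paired_run(split, target))    # 10
```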
In some embodiments, the crRNA described herein comprises, in the 5’-to-3’ direction, a first stem-loop sequence (e.g., any first stem-loop sequence as described in Section 5.2.1 (a) (i) (First Stem Loop)), a connector region sequence (e.g., any connector region sequence as described in Section 5.2.1 (a) (iii) (Connector Region)), and a second stem-loop sequence (e.g., any second stem-loop sequence as described in Section 5.2.1 (a) (ii) (Second Stem Loop (Direct Repeat))), wherein the crRNA further comprises any spacer region as described herein 3’ to the second stem-loop sequence. In some embodiments, the crRNA further comprises a floater region (e.g., any floater region as described in Section 5.2.1 (a) (iv) (Floater Region)) 5’ to the first stem-loop sequence.
(vi) Stem-Loop Modified Direct Repeat (SLDR) Sequences
In some embodiments, the crRNA described herein has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% to a sequence set forth in Table 3. In some embodiments, the crRNA described herein comprises a nucleotide sequence set forth in Table 3. In some embodiments, the crRNA described herein consists of a nucleotide sequence set forth in Table 3.
Table 3. Stem-Loop Modified Direct Repeat (SLDR) Sequences
Stem-loop sequences are underlined.
(b) Chemical Modifications
Chemical modifications can be applied to the crRNA’s phosphate backbone, sugar, and/or base. Backbone modifications such as phosphorothioates modify the charge on the phosphate backbone and aid in the delivery and nuclease resistance of the oligonucleotide (see, e.g., Eckstein, “Phosphorothioates, essential components of therapeutic oligonucleotides, ” Nucl. Acid Ther., 24 (2014) , pp. 374-387) ; modifications of sugars, such as 2′-O-methyl (2′-OMe) , 2′-F, and locked nucleic acid (LNA) , enhance both base pairing and nuclease resistance (see, e.g., Allerson et al. “Fully 2′-modified oligonucleotide duplexes with improved in vitro potency and stability compared to unmodified small interfering RNA, ” J. Med. Chem., 48.4 (2005) : 901-904) . Chemically modified bases such as 2-thiouridine or N6-methyladenosine, among others, can allow for either stronger or weaker base pairing (see, e.g., Bramsen et al., “Development of therapeutic-grade small interfering RNAs by chemical engineering, ” Front. Genet., 2012 Aug. 20; 3: 154) . Additionally, RNA is amenable to both 5′ and 3′ end conjugations with a variety of functional moieties including fluorescent dyes, polyethylene glycol, or proteins.
A wide variety of modifications can be applied to chemically synthesized crRNA molecules. For example, modifying an oligonucleotide with a 2′-OMe to improve nuclease resistance can change the binding energy of Watson-Crick base pairing. Furthermore, a 2′-OMe modification can affect how the oligonucleotide interacts with transfection reagents, proteins or any other molecules in the cell. The effects of these modifications can be determined by empirical testing.
In some embodiments, the crRNA includes one or more phosphorothioate modifications. In some embodiments, the crRNA includes one or more locked nucleic acids for the purpose of enhancing base pairing and/or increasing nuclease resistance.
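Because phosphorothioate modifications sit on the internucleotide linkages while 2′-OMe, 2′-F, and LNA modifications sit on the sugar, a modification plan for a synthetic crRNA is naturally recorded per nucleotide plus per linkage. The Python sketch below shows one such bookkeeping scheme. The class and function names, the choice to modify `n_ends` terminal positions at each end, and the compact output notation (m = 2′-OMe, f = 2′-F, + = LNA, * = phosphorothioate linkage to the next nucleotide) are all hypothetical choices for illustration; they are not a modification pattern taught by this document and should not be read as any particular vendor's ordering syntax.

```python
from dataclasses import dataclass
from typing import List

# Prefix used in the rendered string for each sugar chemistry (ad hoc notation).
SUGAR_PREFIX = {"ribose": "", "2'-OMe": "m", "2'-F": "f", "LNA": "+"}


@dataclass
class Nucleotide:
    base: str
    sugar: str = "ribose"     # sugar chemistry of this nucleotide
    ps_to_next: bool = False  # phosphorothioate linkage to the 3' neighbour


def end_protect(seq: str, n_ends: int, sugar: str = "2'-OMe") -> List[Nucleotide]:
    """Modify the sugars of the n_ends terminal nucleotides at each end and make
    the n_ends terminal linkages at each end phosphorothioates."""
    if sugar not in SUGAR_PREFIX:
        raise ValueError(f"unknown sugar chemistry: {sugar}")
    nts = [Nucleotide(b) for b in seq.upper().replace("T", "U")]
    length = len(nts)
    if 2 * n_ends >= length:
        raise ValueError("sequence too short for the requested end protection")
    for i in [*range(n_ends), *range(length - n_ends, length)]:
        nts[i].sugar = sugar
    # Linkage i joins nucleotide i to nucleotide i + 1 (there are length - 1).
    for i in [*range(n_ends), *range(length - 1 - n_ends, length - 1)]:
        nts[i].ps_to_next = True
    return nts


def render(nts: List[Nucleotide]) -> str:
    """Render nucleotides in the ad hoc notation described above."""
    return "".join(
        SUGAR_PREFIX[nt.sugar] + nt.base + ("*" if nt.ps_to_next else "")
        for nt in nts
    )


if __name__ == "__main__":
    print(render(end_protect("ACGUACGU", n_ends=2)))  # mA*mC*GUAC*mG*mU
```

As L1303 notes, the effect of any such pattern on base pairing and on interactions with the effector protein would still need to be determined empirically.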
In some embodiments, one or more nucleotides of the crRNA molecule are methylated.
A summary of these chemical modifications can be found, e.g., in Kelley et al., “Versatility of chemically synthesized guide RNAs for CRISPR-Cas9 genome editing,” J. Biotechnol. 2016 Sep. 10; 233: 74-83; WO 2016205764; and U.S. Pat. No. 8,795,965 B2; each of which is incorporated by reference in its entirety.
(c) Sequence Modifications
The sequences and the lengths of the crRNAs described herein can be optimized. In some embodiments, the optimized length of crRNA can be determined by identifying the processed form of the crRNAs, or by empirical length studies of the crRNAs.
The crRNAs can also include one or more aptamer sequences. Aptamers are oligonucleotide or peptide molecules that can bind to a specific target molecule. The aptamers can be specific to gene effectors, gene activators, or gene repressors. In some embodiments, the aptamers can be specific to a protein, which in turn is specific to and recruits/binds to specific gene effectors, gene activators, or gene repressors. The effectors, activators, or repressors can be present in the form of fusion proteins. In some embodiments, the crRNA has two or more aptamer sequences that are specific to the same adaptor protein. In some embodiments, the two or more aptamer sequences are specific to different adaptor proteins. The adaptor proteins can include, e.g., MS2, PP7, Qβ, F2, GA, fr, JP501, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, φCb5, φCb8r, φCb12r, φCb23r, 7s, and PRR1. Accordingly, in some embodiments, the aptamer is selected from aptamers that specifically bind any one of the adaptor proteins described herein. In some embodiments, the aptamer sequence is an MS2 loop. A detailed description of aptamers can be found, e.g., in Nowak et al., “Guide RNA engineering for versatile Cas9 functionality,” Nucl. Acid. Res., 2016 Nov. 16; 44 (20): 9555-9564; and WO 2016205764, which are incorporated herein by reference in their entirety.
(d) Codon optimization
The invention contemplates all possible variations of nucleic acids, such as cDNA, that could be made by selecting combinations based on possible codon choices. These combinations are made in accordance with the standard triplet genetic code as applied to the polynucleotide encoding a naturally occurring variant, and all such variations are to be considered as being specifically disclosed. Nucleotide sequences encoding Type V CRISPR effector protein variants that have been codon-optimized for expression in bacteria (e.g., E. coli) and in human cells are disclosed herein. For example, codon-optimized sequences for human cells can be generated by replacing codons that occur at lower frequency in human cells with synonymous codons that occur at higher frequency in human cells. The frequency of occurrence of codons can be determined computationally by methods known in the art. Calculations of these codon frequencies for various host cells (e.g., E. coli, yeast, insect, C. elegans, D. melanogaster, human, mouse, rat, pig, P. pastoris, A. thaliana, maize, and tobacco) have been published or made available by sources such as the Codon Usage Frequency Table Tool.
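The substitution strategy described above, replacing rarer codons with synonymous codons that are more frequent in the host, can be sketched in its simplest form in Python: every codon is swapped for the most frequent synonymous codon listed in a usage table. Many practical tools also consider factors such as GC content and repeated sequences, which this sketch ignores. Function names are hypothetical, and the frequency table in the example is a made-up placeholder, not data from the Codon Usage Frequency Table Tool or any other source.

```python
from typing import Dict, List

# Standard genetic code (NCBI translation table 1); codons are ordered with
# each position cycling through T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AMINO_ACIDS))


def _codons(cds: str) -> List[str]:
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    return "".join(CODE[c] for c in _codons(cds))


def codon_optimize(cds: str, usage: Dict[str, float]) -> str:
    """Replace each codon with the most frequent synonymous codon in `usage`.

    Amino acids (or stops) with no codon listed in `usage` keep their original
    codon, so the encoded protein is unchanged.
    """
    best: Dict[str, str] = {}
    for codon, freq in usage.items():
        aa = CODE[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return "".join(best.get(CODE[c], c) for c in _codons(cds))


if __name__ == "__main__":
    # Placeholder relative frequencies for illustration only; NOT real
    # codon-usage data for any organism.
    usage = {"TTA": 0.1, "CTG": 0.5, "AAA": 0.4, "AAG": 0.6, "GAA": 0.3, "GAG": 0.7}
    cds = "ATGTTAAAAGAATAA"  # Met-Leu-Lys-Glu-stop
    opt = codon_optimize(cds, usage)
    print(opt)                               # ATGCTGAAGGAGTAA
    print(translate(cds) == translate(opt))  # True
```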
5.2.2 Cas proteins
(a) Type V CRISPR Effector Proteins
The present application provides Type V CRISPR effector proteins that have single-stranded nucleic acid nicking activity or double-stranded nucleic acid cleavage activity. In some embodiments, the Type V CRISPR effector protein forms a complex with a crRNA (e.g., a crRNA of Section 5.2.1 (crRNA)). Domain architectures of exemplary Type V CRISPR effector proteins are provided in FIG. 10.
In some embodiments, the Type V CRISPR effector protein comprises a RuvC-like endonuclease domain.
In some embodiments, the RuvC-like endonuclease domain comprises one or more RuvC motifs selected from a RuvC I motif, a RuvC II motif, and a RuvC III motif. In some embodiments, the Type V CRISPR effector protein comprises a RuvC I motif, a RuvC II motif, and/or a RuvC III motif. In some embodiments, the Type V CRISPR effector protein comprises a RuvC I motif. In some embodiments, the Type V CRISPR effector protein comprises a RuvC II motif. In some embodiments, the Type V CRISPR effector protein comprises a RuvC III motif. In some embodiments, the Type V CRISPR effector protein comprises a RuvC I motif and a RuvC II motif. In some embodiments, the Type V CRISPR effector protein comprises a RuvC I motif and a RuvC III motif. In some embodiments, the Type V CRISPR effector protein comprises a RuvC II motif and a RuvC III motif. In some embodiments, the Type V CRISPR effector protein comprises a RuvC I motif, a RuvC II motif, and a RuvC III motif. In some embodiments, the RuvC I motif comprises the amino acid sequence of X1X2X3DX4X5X6X7, wherein X1 is L, I, V, or M; X2 is G, S, or A; X3 is I, V, or L; X4 is L or R; X5 is G or N; X6 is E, Q, I, or L; and X7 is R, T, K, or N. In some embodiments, the RuvC I motif comprises the amino acid sequence of X1XDXNX6X7XXXX11, wherein X1 is A, G, or S; X is any amino acid; X6 is Q or I; X7 is T, S, or V; and X11 is T or A. In some embodiments, the RuvC II motif comprises the amino acid sequence of X1X2X3EX4X5, wherein X1 is I, V, or L; X2 is V or A; X3 is L, I, M, F, or V; X4 is D, N, K, or S; and X5 is L, A, or D. In some embodiments, the RuvC II motif comprises the amino acid sequence of X1X2X3E, wherein X1 is C, F, I, L, M, P, V, W, or Y; X2 is C, F, I, L, M, P, R, V, W, or Y; and X3 is C, F, G, I, L, M, P, V, W, or Y.
In some embodiments, the RuvC III motif comprises the amino acid sequence of X1X2DXX3X4X5XX6X7X8, wherein X1 is D, N, or H; X2 is A, S, R, or G; X is any amino acid; X3 is N, V, I, or E; X4 is A, G, S, or K; X5 is A or S; X6 is N, H, V, or G; X7 is I, L, or V; and X8 is A, G, or L. In some embodiments, the RuvC III motif comprises the amino acid sequence of X1SHX4DX6X7, wherein X1 is S or T, X4 is Q or L, X6 is P or S, and X7 is F or L.
In some embodiments, the RuvC-like nuclease domain comprises one or more RuvC motifs selected from a RuvC I motif: X1X2X3DX4X5X6X7, wherein X1 is L, I, V, or M; X2 is G, S, or A; X3 is I, V, or L; X4 is L, or R; X5 is G or N; X6 is E, Q, I, or L; X7 is R, T, K, or N; a RuvC II motif: X1X2X3EX4X5, wherein X1 is I, V, or L; X2 is V, or A; X3 is L, I, M, F, or V; X4 is D, N, K, or S; X5 is L, A, or D; and a RuvC III motif: X1X2DXX3X4X5XX6X7X8, wherein X1 is D, N, or H; X2 is A, S, R, or G; X is any amino acid; X3 is N, V, I, or E; X4 is A, G, S, or K; X5 is A, or S; X6 is N, H, V, or G; X7 is I, L, or V; X8 is A, G, or L.
In some embodiments, the RuvC-like nuclease domain comprises one or more RuvC motifs selected from a RuvC I motif: X1XDXNX6X7XXXX11, wherein X1 is A or G or S, X is any amino acid, X6 is Q or I, X7 is T or S or V, X11 is T or A; a RuvC II motif: X1X2X3E, wherein X1 is C or F or I or L or M or P or V or W or Y, X2 is C or F or I or L or M or P or R or V or W or Y, and X3 is C or F or G or I or L or M or P or V or W or Y; and a RuvC III motif: X1SHX4DX6X7, wherein X1 is S or T, X4 is Q or L, X6 is P or S, and X7 is F or L.
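Each consensus motif above is a fixed-length run of position-specific residue classes, so it maps directly onto a regular expression. The following is a minimal Python sketch, assuming a plain one-letter protein string. The regex approach, the function names, and the demo peptides are illustrative only and are not part of this application.

```python
import re

# Allowed residues at each motif position, transcribed from the consensus
# definitions above; None stands for "X" (any amino acid).
RUVC_SET_1 = {
    "RuvC I": ["LIVM", "GSA", "IVL", "D", "LR", "GN", "EQIL", "RTKN"],
    "RuvC II": ["IVL", "VA", "LIMFV", "E", "DNKS", "LAD"],
    "RuvC III": ["DNH", "ASRG", "D", None, "NVIE", "AGSK", "AS", None,
                 "NHVG", "ILV", "AGL"],
}
RUVC_SET_2 = {
    "RuvC I": ["AGS", None, "D", None, "N", "QI", "TSV", None, None, None, "TA"],
    "RuvC II": ["CFILMPVWY", "CFILMPRVWY", "CFGILMPVWY", "E"],
    "RuvC III": ["ST", "S", "H", "QL", "D", "PS", "FL"],
}


def motif_to_regex(positions):
    """Build one character class per position; the zero-width lookahead
    lets finditer report overlapping occurrences as well."""
    body = "".join("." if allowed is None else f"[{allowed}]"
                   for allowed in positions)
    return re.compile(f"(?=({body}))")


def find_motifs(protein, motif_set):
    """Return {motif name: [(1-based start, matched peptide), ...]}."""
    return {
        name: [(m.start() + 1, m.group(1))
               for m in motif_to_regex(positions).finditer(protein)]
        for name, positions in motif_set.items()
    }


# Hypothetical peptides (not from this application), built so that each
# motif of the corresponding set occurs once.
demo_1 = "MKIGVDLGERPPIVLEDLPPDADKNASKHIAPP"
demo_2 = "GKDANQTAAATPPFLIEPPTSHQDPF"
print(find_motifs(demo_1, RUVC_SET_1))
print(find_motifs(demo_2, RUVC_SET_2))
```

Running the second motif set over `demo_1` also reports the four-residue RuvC II consensus (IVLE at position 13). This illustrates that a short, permissive consensus matches readily on its own, so such a scan identifies candidates rather than confirming a RuvC-like domain.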
In some embodiments, the Type V CRISPR effector protein comprises a PAM interacting (PI) domain. In some embodiments, the PI domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 176-263 aa of Cas12i_2. In some embodiments, the PI domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the PI domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 662-762 aa of Cas12a_Cpf1_8. In some embodiments, the PI domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the PI domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 1-57 aa of cas12j|CasΦ-2. In some embodiments, the PI domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the Type V CRISPR effector protein comprises an oligonucleotide-binding domain (OBD). The OBD domain is also referred to as the wedge domain (WED). In some embodiments, the Type V CRISPR effector protein comprises two OBD sub-domains. In some embodiments, the Type V CRISPR effector protein comprises three OBD sub-domains.
In some embodiments, the OBD domain comprises two OBD sub-domains. In some embodiments, the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 1-18 aa of Cas12i_2, and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 433-577 aa of Cas12i_2. In some embodiments, the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of MATKTIVRPYTSNLSPNA (SEQ ID NO: 124), and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the OBD domain comprises two OBD sub-domains. In some embodiments, the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 1-13 aa of Cas12b_8, and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 390-508 aa of Cas12b_8. In some embodiments, the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of MAVKSIKVKLRLD (SEQ ID NO: 126), and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the OBD domain comprises two OBD sub-domains. In some embodiments, the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 1-15 aa of ISDra2_TnpB, and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 115-183 aa of ISDra2_TnpB. In some embodiments, the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of MIRNKAFVVRLYPNA (SEQ ID NO: 128), and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the OBD domain comprises two OBD sub-domains. In some embodiments, the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 1-24 aa of Cas12a_Cpf1_8, and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 762-892 aa of Cas12a_Cpf1_8. In some embodiments, the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of MSIYQEFVNKYSLSKTLRFELIPQ (SEQ ID NO: 130), and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the OBD domain comprises two OBD sub-domains. In some embodiments, the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 57-73 aa of cas12j|CasΦ-2, and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 196-363 aa of cas12j|CasΦ-2. In some embodiments, the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of NFQPPAKCHVVTKSRDF (SEQ ID NO: 132), and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the OBD domain comprises two OBD sub-domains. In some embodiments, the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 1-19 aa of Cas12f_16, and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 192-312 aa of Cas12f_16. In some embodiments, the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of MAKNTITKTLKLRIVRPYN (SEQ ID NO: 134), and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the OBD domain comprises two OBD sub-domains. In some embodiments, the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 1-22 aa of cas12o|GCA_012031515.1|JAAUTC010000047.1_1, and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 358-474 aa of cas12o|GCA_012031515.1|JAAUTC010000047.1_1. In some embodiments, the first OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of MAKYDPSNVEVTSAFNAPVRLE (SEQ ID NO: 136), and the second OBD sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the Type V CRISPR effector protein comprises a NUC domain. The NUC domain is also referred to as the TSL (target strand loading) domain, the target nucleic acid-binding (TNB) domain, or the ZnF (zinc-finger) domain. In some embodiments, the Type V CRISPR effector protein comprises two NUC sub-domains.
In some embodiments, the NUC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 867-990 aa of Cas12i_2. In some embodiments, the NUC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the Type V CRISPR effector protein comprises two NUC sub-domains. In some embodiments, the first NUC sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 895-975 aa of Cas12b_8, and the second NUC sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 991-1129 aa of Cas12b_8. In some embodiments, the first NUC sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of AAFSSRFDARTGAPGIRCRRVPARCTQEHNPEPFPWWLNKFVVEHTLDACPLRADDLIPTGEGEIFVSPFSAEEGDFHQIH (SEQ ID NO: 139), and the second NUC sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the NUC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 330-360 aa of ISDra2_TnpB. In some embodiments, the NUC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of LCHDCGFKNPEVKNLAVRTWTCPNCGETHDR (SEQ ID NO: 141).
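The recited residue ranges can be cross-checked against the recited SEQ ID NOs. When each range is read as 1-based and inclusive at both ends, it spans exactly as many residues as the corresponding sequence. For example, 330-360 aa of ISDra2_TnpB covers 31 residues, which is the length of SEQ ID NO: 141. The following is a small Python sketch of that check. The table layout and function names are illustrative assumptions.

```python
# (SEQ ID NO, protein, first residue, last residue, sequence) as recited above.
RECITED = [
    (124, "Cas12i_2", 1, 18, "MATKTIVRPYTSNLSPNA"),
    (126, "Cas12b_8", 1, 13, "MAVKSIKVKLRLD"),
    (128, "ISDra2_TnpB", 1, 15, "MIRNKAFVVRLYPNA"),
    (130, "Cas12a_Cpf1_8", 1, 24, "MSIYQEFVNKYSLSKTLRFELIPQ"),
    (132, "cas12j|CasΦ-2", 57, 73, "NFQPPAKCHVVTKSRDF"),
    (134, "Cas12f_16", 1, 19, "MAKNTITKTLKLRIVRPYN"),
    (136, "cas12o|GCA_012031515.1|JAAUTC010000047.1_1", 1, 22,
     "MAKYDPSNVEVTSAFNAPVRLE"),
    (139, "Cas12b_8", 895, 975,
     "AAFSSRFDARTGAPGIRCRRVPARCTQEHNPEPFPWWLNKFVVEHTLDACPLRADDLIPTGEGEI"
     "FVSPFSAEEGDFHQIH"),
    (141, "ISDra2_TnpB", 330, 360, "LCHDCGFKNPEVKNLAVRTWTCPNCGETHDR"),
]


def inclusive_length(first, last):
    """Residue count of a 1-based range that includes both end points."""
    return last - first + 1


def length_mismatches(records):
    """Return the SEQ ID NOs whose length differs from the recited range."""
    return [seq_id for seq_id, _, first, last, seq in records
            if len(seq) != inclusive_length(first, last)]


print(length_mismatches(RECITED))  # → []
```

A half-open reading (for example, 895-975 covering 80 residues) would flag every entry. The empty result therefore supports reading the ranges in this section as inclusive.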
In some embodiments, the NUC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 1078-1254 aa of Cas12a_Cpf1_8. In some embodiments, the NUC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the NUC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 666-694 aa of cas12j|CasΦ-2. In some embodiments, the NUC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the NUC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 473-508 aa of Cas12f_16. In some embodiments, the NUC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the Type V CRISPR effector protein comprises two NUC sub-domains. In some embodiments, the first NUC sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 818-853 aa of cas12o|GCA_012031515.1|JAAUTC010000047.1_1, and the second NUC sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 874-984 aa of cas12o|GCA_012031515.1|JAAUTC010000047.1_1. In some embodiments, the first NUC sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of DFTSQIDSETGEFGYRDKQNKSRLYMNGPVPLRFCD (SEQ ID NO: 145), and the second NUC sub-domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the Type V CRISPR effector protein comprises a bridge helix (BH) domain. The BH domain is also referred to as the helical domain or the helical hairpin (HH) domain. In some embodiments, the BH domain is inserted in the REC domain (e.g., the REC2 domain). In some embodiments, the BH domain is inserted between the REC domain and the RuvC domain. In some embodiments, the BH domain is inserted between the REC2 domain and the RuvC-II domain. In some embodiments, the BH domain is inserted in the RuvC domain. In some embodiments, the BH domain is inserted between the RuvC-I domain and the RuvC-II domain.
In some embodiments, the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 775-814 aa of Cas12i_2. In some embodiments, the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of FDSDLFKLGECLSEKRVNKREERANRIVSSVLQICSRLNV (SEQ ID NO: 147).
In some embodiments, the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 621-659 aa of Cas12b_8. In some embodiments, the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of KLPGETESKDLRAIREERQRTLRQLRTQLAYLRLLVRCG (SEQ ID NO: 148).
In some embodiments, the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 953-971 aa of Cas12a_Cpf1_8. In some embodiments, the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of YHDKLAAIEKDRDSARKDW (SEQ ID NO: 149).
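The identity tiers recited throughout this section presuppose a way of scoring two aligned sequences. The following is a minimal Python sketch under stated assumptions:

- The pairwise alignment has already been produced, for example by a global aligner.
- The denominator is the number of alignment columns that are not gap-gap. This is one common convention, not a definition taken from this section.
- The word "about" is not modeled, so each tier is treated as an exact cut-off.

The variant sequence below is hypothetical.

```python
# Identity tiers recited in the embodiments, in percent.
TIERS = (15, 20, 30, 40, 50, 60, 70, 80, 85, 90, 95, 98, 100)


def percent_identity(aligned_query, aligned_ref):
    """Percent identity over a pre-computed pairwise alignment.

    Both strings must have equal length, with '-' marking gaps. Identical,
    non-gap columns are divided by all columns that are not gap-gap
    (an assumed convention).
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    columns = [(q, r) for q, r in zip(aligned_query, aligned_ref)
               if not (q == "-" and r == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for q, r in columns if q == r and q != "-")
    return 100.0 * identical / len(columns)


def highest_tier(identity):
    """Largest recited tier that the identity value meets, or None."""
    met = [tier for tier in TIERS if identity >= tier]
    return max(met) if met else None


bh = "YHDKLAAIEKDRDSARKDW"       # SEQ ID NO: 149
variant = "YHDKLAAIEKERDSARKEW"  # hypothetical variant, two substitutions
pid = percent_identity(variant, bh)
print(round(pid, 2), highest_tier(pid))  # → 89.47 85
```

Other denominators, such as the length of the shorter sequence or of the reference region, can give different values for gapped alignments. The convention should therefore be fixed before identities are compared against the tiers.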
In some embodiments, the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 348-415 aa of Cas12f_16. In some embodiments, the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 565-606 aa of cas12o|GCA_012031515.1|JAAUTC010000047.1_1. In some embodiments, the BH domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of AKNIPVEDIRKIDKVTNMAKSVKSLIGYARQHLAAIKAKKFG (SEQ ID NO: 151).
In some embodiments, the Type V CRISPR effector protein comprises a REC lobe. In some embodiments, the REC lobe comprises one or more REC domains selected from a REC1 domain and a REC2 domain. In some embodiments, the Type V CRISPR effector protein comprises a REC1 domain. In some embodiments, the Type V CRISPR effector protein comprises a REC2 domain. In some embodiments, the Type V CRISPR effector protein comprises two REC1 domains. In some embodiments, the Type V CRISPR effector protein comprises two REC2 domains. In some embodiments, the Type V CRISPR effector protein comprises a REC1 domain and a REC2 domain.
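This section gives a complete set of Cas12i_2 coordinates when the OBD, PI, BH, and NUC ranges above are combined with the two REC domain ranges recited for Cas12i_2 in the paragraph that follows (18-176 aa and 263-433 aa). Sorting them into a linear map makes the domain architecture, and the way the boundaries meet, easier to see. The following is a minimal Python sketch. The domain labels and function names are illustrative. Only consecutive domains are compared, which is adequate for non-nested ranges such as these.

```python
# Cas12i_2 domain coordinates (1-based, inclusive) recited in this section.
CAS12I_2 = [
    ("OBD sub-domain 1", 1, 18),
    ("REC domain 1", 18, 176),
    ("PI", 176, 263),
    ("REC domain 2", 263, 433),
    ("OBD sub-domain 2", 433, 577),
    ("BH", 775, 814),
    ("NUC", 867, 990),
]


def boundary_report(domains):
    """Sort domains by first residue and compare each with the next one.

    Returns (overlaps, unassigned): overlaps lists (first, last, a, b) spans
    claimed by two consecutive domains; unassigned lists (first, last) spans
    lying between consecutive domains that neither of them covers.
    """
    ordered = sorted(domains, key=lambda d: d[1])
    overlaps, unassigned = [], []
    for (name_a, _, last_a), (name_b, first_b, _) in zip(ordered, ordered[1:]):
        if first_b <= last_a:
            overlaps.append((first_b, last_a, name_a, name_b))
        elif first_b > last_a + 1:
            unassigned.append((last_a + 1, first_b - 1))
    return overlaps, unassigned


overlaps, unassigned = boundary_report(CAS12I_2)
for first, last, a, b in overlaps:
    print(f"residues {first}-{last} shared by {a} and {b}")
print("not assigned in these coordinates:", unassigned)
```

The map shows one-residue overlaps at residues 18, 176, 263, and 433. Both SEQ ID NO: 124 (1-18 aa, 18 residues) and SEQ ID NO: 152 (18-176 aa, 159 residues) match the inclusive reading, so the boundary residue appears to be counted in both adjacent domains. This section does not state whether that overlap is intended. Residues 578-774 and 815-866 are not assigned to any domain in the coordinates recited here.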
In some embodiments, the Type V CRISPR effector protein comprises two REC domains. In some embodiments, the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 18-176 aa of Cas12i_2, and the second REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 263-433 aa of Cas12i_2. In some embodiments, the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of AAKKAMLDESFKFFDHAYTVFFSVFIKLWGGVKPTQVALVENDTNKIDAICSILWFRLQTKTDSTNITLQSAEERIRRFKEYAQHDPSPLALSYLTGNLDPEKHEWVDCRELYQNWCAELKCDLATDIETMINHNLLPISAKQEYNCYSSFSNLFGEAE (SEQ ID NO: 152), and the second REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the Type V CRISPR effector protein comprises two REC domains. In some embodiments, the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 13-390 aa of Cas12b_8, and the second REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 659-821 aa of Cas12b_8. In some embodiments, the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of DDMPEIRAGLWKLHKEVNAGVRYYTEWLSLLRQENLYRRSPNGDGEQECDKTAEECKAELLERLRARQVENGHRGPAGSDDELLQLARQLYELLVPQAIGAKGDAQQIARKFLSPLADKDAVGGLGIAKAGNKPRWVRMREAGEPGWEEEKEKAETRKSADRTADVLRALADFGLKPLMRVYTDSEMSSVEWKPLRKGQAVRTWDRDMFQQAIERMMSWESWNQRVGQEYAKLVEQKNRFEQKNFVGQEHLVHLVNQLQQDMKEASPGLESKEQTAHYVTGRALRGSDKVFEKWGKLAPDAPFDLYDAEIKNVQRRNTRRFGSHDLFAKLAEPEYQALWREDASFLTRYAVYNSILRKLNHAKMFATFTLPDATAHPI (SEQ ID NO: 154), and the second REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 15-115 aa of ISDra2_TnpB. In some embodiments, the REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the Type V CRISPR effector protein comprises two REC domains. In some embodiments, the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 24-339 aa of Cas12a_Cpf1_8, and the second REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 339-591 aa of Cas12a_Cpf1_8. In some embodiments, the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of QGKTLENIKARGLILDDEKRAKDYKKAKQIIDKYHQFFIEEILSSVCISEDLLQNYSDVYFKLKKSDDDNLQKDFKSAKDTIKKQISEYIKDSEKFKNLFNQNLIDAKKGQESDLILWLKQSKDNGIELFKANSDITDIDEALEIIKSFKGWTTYFKGFHENRKNVYSSNDIPTSIIYRIVDDNLPKFLENKAKYESLKDKAPEAINYEQIKKDLAEELTFDIDYKTSEVNQRVFSLDEVFEIANFNNYLNQSGITKFNTIIGGKFVNGENTKRKGINEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVI (SEQ ID NO: 157), and the second REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the Type V CRISPR effector protein comprises two REC domains. In some embodiments, the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 73-196 aa of cas12j|CasΦ-2, and the second REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 436-578 aa of cas12j|CasΦ-2. In some embodiments, the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of FAEWPIMKASEAIQRYIYALSTTERAACKPGKSSESHAAWFAATGVSNHGYSHVQGLNLIFDHTLGRYDGVLKKVQLRNEKARARLESINASRADEGLPEIKAEEEEVATNETGHLLQPPGINP (SEQ ID NO: 159), and the second REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 19-192 aa of Cas12f_16. In some embodiments, the REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the Type V CRISPR effector protein comprises two REC domains. In some embodiments, the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 22-358 aa of cas12o|GCA_012031515.1|JAAUTC010000047.1_1, and the second REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 606-764 aa of cas12o|GCA_012031515.1|JAAUTC010000047.1_1. In some embodiments, the first REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of EGEEVLIDFIRNEILPAADKLLELLLFFRGKPFCLSGVNYSESDVDQKLKEIYNSVSIVPEKAKRFGVKDASDFAFDQFKDEAQKLYKFFIGEESPDDGNKIKQAATSFYAIFFAKATGNRITRNIPSICSSSLFPIASFANCNLGASITAEVERKIKSFEELQKLRNEEYTKLNNAGDHNPDGEDDGSETIFASAVVDVRRFCQSLYENSKTYGFKEFGKENIKSVSEFLSENVEQLRSIFAEKGGNFSFEDEADLSRHKIVTGYKANFVNAIYSDFDYVWKSRPDVEDVYEDKKRRVRHKCLMTDRRYVKLALDLCKFIKKREINVSRFGDKGGL (SEQ ID NO: 162), and the second REC domain comprises an amino acid sequence that shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the Type V CRISPR effector protein does not contain an HNH-like domain. In some embodiments, the HNH domain shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 775-909 aa of SpCas9. In some embodiments, the HNH domain shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with 196-296 aa of OgeuIscB1.
SpCas9:
OgeuIscB1:
In some embodiments, the HNH domain shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of KNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLS (SEQ ID NO: 166). In some embodiments, the HNH domain shares at least about 15%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or 100% sequence identity with the amino acid sequence of
In some embodiments, the Type V CRISPR effector protein comprises at least about 175 amino acids, such as 175, 200, 225, 250, 275, 300 or more amino acids. In some embodiments, the CRISPR effector protein comprises at most about 1400 amino acids, such as 1400, 1300, 1200, 1100, 1000 or fewer amino acids. In some embodiments, the CRISPR effector protein comprises about 700 amino acids to about 1300 amino acids, such as about 750 amino acids to about 1200 amino acids. In some embodiments, the CRISPR effector protein comprises 1022 amino acids.
In some embodiments, the CRISPR effector protein of the present invention can recognize a PAM (protospacer adjacent motif) in order to act on the target sequence. In some embodiments, the PAM comprises the nucleic acid sequence 5’-TTN-3’ or 5’-NTN-3’, wherein N is selected from A, T, C, G, and U. In some embodiments, the PAM consists of the nucleic acid sequence 5’-TTN-3’ or 5’-NTN-3’, wherein N is selected from A, T, C, G, and U. In some embodiments, the PAM comprises 5’-TTA-3’, 5’-TTT-3’, 5’-TTG-3’, 5’-TTC-3’, 5’-ATA-3’, or 5’-ATG-3’. In some embodiments, the PAM consists of 5’-TTA-3’, 5’-TTT-3’, 5’-TTG-3’, 5’-TTC-3’, 5’-ATA-3’, or 5’-ATG-3’.
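To illustrate how a degenerate PAM such as 5’-TTN-3’ or 5’-NTN-3’ constrains where the effector can act, the following Python sketch scans one strand of a DNA sequence for PAM matches and reports the adjacent candidate protospacer. The function names, the 20-nt default protospacer length, and the assumption that the protospacer begins immediately 3’ of the PAM on the same strand are illustrative choices, not parameters taken from this disclosure; only the given strand is scanned, and N is expanded to A, C, G, or T (DNA only).

```python
import re

# "N" matches any DNA base; other letters match themselves (assumption: DNA input).
_IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def pam_to_regex(pam: str) -> str:
    """Convert a PAM written 5'->3' (e.g. "TTN") into a regex pattern."""
    return "".join(_IUPAC[base] for base in pam.upper())


def find_pam_sites(seq: str, pam: str = "TTN", protospacer_len: int = 20):
    """Return (pam_start, pam_seq, protospacer) tuples on the given strand.

    Hypothetical helper: assumes the protospacer starts right after (3' of)
    the PAM and skips sites too close to the end to hold a full protospacer.
    """
    seq = seq.upper()
    # A zero-width lookahead lets overlapping PAMs (e.g. in "TTTT") all match.
    pattern = re.compile(f"(?=({pam_to_regex(pam)}))")
    sites = []
    for match in pattern.finditer(seq):
        start = match.start()
        pam_seq = match.group(1)
        proto_start = start + len(pam_seq)
        protospacer = seq[proto_start:proto_start + protospacer_len]
        if len(protospacer) == protospacer_len:
            sites.append((start, pam_seq, protospacer))
    return sites
```

For example, `find_pam_sites("GGTTAACGT", pam="TTN", protospacer_len=4)` finds a single TTA site at position 2, whereas the more permissive `"NTN"` pattern also accepts the GTT site at position 1, which shows how a relaxed PAM increases the number of targetable sites.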
In some embodiments, the Type V CRISPR effector protein is selected from Cas12a (Cpf1), Cas12b1 (C2c1), Cas12b2, Cas12c (C2c3), Cas12d (CasY), Cas12e (CasX), Cas12f1 (Cas14a), Cas12f2 (Cas14b), Cas12f3 (Cas14c), Cas12g, Cas12h, Cas12i, Cas12j (CasΦ-2), Cas12k (C2c5), Cas12l, C2c4, C2c8, C2c9, and C2c10, or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is selected from Cas12a (Cpf1), Cas12d (CasY), Cas12f2 (Cas14b), Cas12f3 (Cas14c), Cas12h, Cas12i, Cas12j (CasΦ-2), Cas12k (C2c5), Cas12l, C2c4, C2c8, C2c9, and C2c10, or a functional derivative thereof.
In some embodiments, the Type V CRISPR effector protein is a Cas12a (Cpf1) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12b1 (C2c1) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12b2 or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12c (C2c3) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12d (CasY) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12e (CasX) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12f1 (Cas14a) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12f2 (Cas14b) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12f3 (Cas14c) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12g or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12h or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12i or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12j (CasΦ-2) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12k (C2c5) or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a Cas12l or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a C2c4 or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a C2c8 or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a C2c9 or a functional derivative thereof. In some embodiments, the Type V CRISPR effector protein is a C2c10 or a functional derivative thereof.
In some embodiments, a functional derivative of a Cas12 protein comprises an amino acid sequence having at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity to such Cas12 protein.
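Percent sequence identity thresholds such as those above depend on how identity is counted over an alignment. The Python sketch below computes it from two already-aligned sequences of equal length (gaps written as "-"), dividing the number of identical positions by the number of alignment columns that are not gaps in both sequences. This denominator is only one common convention (others divide by the length of the shorter or of the reference sequence), the function name is hypothetical, and producing the alignment itself is assumed to be done by a separate tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ("-" marks a gap).

    Convention assumed here: identical residues divided by the columns
    that are not gaps in both sequences, times 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no residue from either sequence
        columns += 1
        if a == b:
            identical += 1  # a == b here implies neither is a gap
    if columns == 0:
        raise ValueError("alignment contains no informative columns")
    return 100.0 * identical / columns
```

For instance, aligning "MKT-LV" against "MKTALI" gives four identical positions over six columns, about 66.7% identity, so that pair would satisfy an "at least about 60%" threshold but not "at least about 70%".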
In some embodiments, the functional derivative of a Cas12 protein described herein comprises one or more conservative amino acid substitutions. Conservative amino acid substitutions are ones in which the amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been generally defined in the art, including basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). For example, substitution of a phenylalanine for a tyrosine is a conservative substitution. Alternatively, naturally occurring residues may be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; and (6) aromatic: Trp, Tyr, Phe. Generally, conservative substitutions in the sequences of the peptides or polypeptides of the disclosure do not abrogate the biological activity of interest of the peptide or polypeptide. Amino acid substitutions may be introduced into a polypeptide of interest and the products screened for a desired activity, e.g., retained or improved target gene editing efficiency of a Cas12 protein variant in a reporter cell line; methods for measuring such activity are well known in the art.
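The six-group side-chain classification above can be encoded directly. The following Python sketch, using hypothetical function names, treats a substitution as conservative when the original and replacement residues fall in the same group. Norleucine is omitted because it has no standard one-letter code, and only equal-length variants (pure substitutions) are handled; this is a simplified illustration of the grouping, not a complete substitution-scoring method.

```python
# Six side-chain groups from the text, as one-letter codes (Norleucine omitted).
SIDE_CHAIN_GROUPS = {
    "hydrophobic": set("MAVLI"),
    "neutral_hydrophilic": set("CSTNQ"),
    "acidic": set("DE"),
    "basic": set("HKR"),
    "chain_orientation": set("GP"),
    "aromatic": set("WYF"),
}


def group_of(residue: str) -> str:
    """Return the side-chain group name for a one-letter amino acid code."""
    residue = residue.upper()
    for name, members in SIDE_CHAIN_GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unrecognized residue: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues belong to the same side-chain group."""
    return group_of(original) == group_of(replacement)


def classify_substitutions(wild_type: str, variant: str):
    """List (position, wt, new, conservative?) for each substituted site.

    Positions are 1-based; insertions and deletions are not handled.
    """
    if len(wild_type) != len(variant):
        raise ValueError("only equal-length (substitution) variants are handled")
    return [
        (i, wt, new, is_conservative(wt, new))
        for i, (wt, new) in enumerate(zip(wild_type, variant), start=1)
        if wt != new
    ]
```

Consistent with the example in the text, `is_conservative("Y", "F")` returns `True` (both aromatic), while a Asp-to-Lys change returns `False` (acidic to basic).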
In contrast, substantial modifications in the biological properties of a polypeptide are accomplished by selecting substitutions that differ significantly in their effect on maintaining (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a sheet or helical conformation, (b) the charge or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain.
A derivative of a polypeptide can be prepared using methods well known in the art, e.g., by modifying the corresponding nucleic acid molecules encoding the derivative. For example, derivatives may result from a substitution, deletion, or insertion of one or more codons encoding the polypeptide that changes the amino acid sequence as compared with the wild-type sequence of the polypeptide. The derivatives can be made using methods well known in the art such as DNA synthesis, oligonucleotide-mediated (site-directed) mutagenesis, alanine scanning, and PCR mutagenesis. Site-directed mutagenesis (see, e.g., Carter, 1986, Biochem J. 237: 1-7; and Zoller et al., 1982, Nucl. Acids Res. 10: 6487-500), cassette mutagenesis (see, e.g., Wells et al., 1985, Gene 34: 315-23), or other known techniques can be performed on the cloned DNA to produce DNA encoding the derivatives.
Those skilled in the art can determine the site(s) in an amino acid sequence of a given protein where a modification(s) can be made in order to produce functional derivatives. In some embodiments, a functional derivative of a polypeptide comprises one or more modifications to one or more predicted non-essential amino acid residues in its sequence. In some embodiments, modifications made to non-essential amino acid residues can be a conservative substitution as described herein. In some embodiments, modifications made to non-essential amino acid residues can be a substantial substitution as described herein. In some embodiments, modifications made to non-essential amino acid residues can be a deletion of the non-essential amino acid residue. In alternative embodiments, one or more modifications can be made to one or more predicted essential amino acid residues in its sequence. In particular embodiments, the modifications made to essential amino acid residues in a protein sequence can be a conservative substitution as described herein. Methods well known in the art can be used to analyze a protein (e.g., a Cas12 protein) sequence to identify its essential and non-essential amino acid residues. For example, in some embodiments, an amino acid residue of a protein that is not conserved among orthologous gene products is predicted to be a non-essential amino acid residue, while an amino acid residue that is conserved among orthologous gene products is predicted to be an essential amino acid residue.
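The ortholog-conservation heuristic described above can be sketched as follows. Given a multiple sequence alignment of orthologs (equal-length strings, "-" for gaps, first row being the reference protein), the hypothetical function below labels each reference residue as predicted essential when every ortholog carries the identical residue in that column, and as predicted non-essential otherwise. Requiring strict identity across all orthologs is a simplifying assumption; a graded conservation score with a tunable cutoff is a common alternative.

```python
def predict_essential_residues(alignment):
    """Map reference residue positions (1-based) to predicted essentiality.

    alignment: list of equal-length aligned sequences; alignment[0] is the
    reference protein. Columns where the reference has a gap are skipped.
    Returns {position: (residue, is_predicted_essential)}.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have equal length")
    predictions = {}
    ref_pos = 0
    for col in range(length):
        ref_res = alignment[0][col]
        if ref_res == "-":
            continue  # no reference residue in this column
        ref_pos += 1
        column = {row[col] for row in alignment}
        # Identical in every ortholog -> predicted essential; else non-essential.
        predictions[ref_pos] = (ref_res, column == {ref_res})
    return predictions
```

Positions flagged as non-essential would then be natural first candidates for the conservative, substantial, or deletion modifications described above, while flagged essential positions would typically receive only conservative substitutions.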
In some embodiments, after making one or more modifications to the sequence of a polypeptide (e.g., by making insertions, deletions, or substitutions of amino acids in the original amino acid sequence either systematically, randomly, or at selected sites), functional derivatives of the polypeptide can be identified by testing the resulting derivatives for activity exhibited by the original sequence. For example, to identify functional derivatives of a Cas protein (e.g., a Cas12 protein) as described herein, nucleic acid molecules encoding the derivative polypeptides can be delivered into a population of in vitro cultured cells in the presence of a suitable guide molecule that targets a reporter gene. After incubating the cells under suitable conditions for the derivative Cas protein to be expressed at a sufficient level, assays can be conducted to detect and/or measure editing (e.g., knockdown) of the target gene in the test cell population, and those derivatives that induce the gene editing phenotype in the test cell population at a level comparable to that of the control population can be selected as functional derivatives.
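The final selection step, keeping derivatives whose editing is comparable to that of the control, can be expressed as a simple filter. In the Python sketch below, "comparable" is interpreted as an editing frequency of at least a chosen fraction of the control frequency; the 0.8 default ratio, the function name, and the variant labels are hypothetical placeholders, since the text does not define a numeric cutoff or how editing is quantified.

```python
def select_functional_derivatives(editing, control_editing, min_ratio=0.8):
    """Return names of variants whose editing is >= min_ratio x the control.

    editing: dict mapping variant name -> measured editing frequency (0..1)
    control_editing: editing frequency measured in the control population
    min_ratio: hypothetical threshold standing in for "comparable" activity
    """
    if control_editing <= 0:
        raise ValueError("control editing frequency must be positive")
    return sorted(
        name for name, freq in editing.items()
        if freq / control_editing >= min_ratio
    )
```

For example, with a control editing frequency of 0.45, variants measured at 0.40 and 0.38 would be kept and a variant at 0.10 discarded; in practice, replicate measurements and a statistical test would usually replace a single fixed ratio.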
In some embodiments, the Type V CRISPR effector protein provided herein shares at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity with an amino acid sequence set forth in Table 4. In some embodiments, the Type V CRISPR effector protein comprises an amino acid sequence set forth in Table 4. In some embodiments, the Type V CRISPR effector protein consists of an amino acid sequence set forth in Table 4.
(b) Cas Functional domains
The term “functional domain” is used herein in its broadest sense and includes proteins such as enzymes or factors themselves, as well as specific functional fragments (domains) thereof.
In some embodiments, a Type V CRISPR effector protein described herein is further associated with one or more functional domains selected from the group consisting of a deaminase (e.g., adenosine deaminase or cytidine deaminase) catalytic domain, a DNA methylation catalytic domain, a DNA demethylation catalytic domain, a histone residue modification domain, a nuclease catalytic domain, a fluorescent protein, a transcription modification factor (e.g., a transcription activation catalytic domain or a transcription inhibition catalytic domain), a nuclear localization signal (NLS), a nuclear export signal (NES), a light gating factor, a chemical inducible factor, and a chromatin visualization factor; preferably, the functional domain is an adenosine deaminase catalytic domain or a cytidine deaminase catalytic domain.
In some embodiments, the functional domain may be a transcription activation domain. In some embodiments, the functional domain is a transcription repression domain. In some embodiments, the functional domain is an epigenetic modification domain such that an epigenetic modification enzyme is provided. In some embodiments, the functional domain is an activation domain. In some embodiments, the Type V CRISPR effector protein is associated with one or more functional domains and contains one or more mutations within the RuvC domain, and the resulting CRISPR complex can deliver epigenetic modifiers or transcriptional or translational activation or repression signals.
In some embodiments, the functional domain exhibits activity to modify a target DNA or proteins associated with the target DNA, wherein the activity is one or more selected from the group consisting of nuclease activity (e.g., HNH nuclease, RuvC nuclease, Trex1 nuclease, Trex2 nuclease), methylation activity, demethylation activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer formation activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, glycosylase activity, acetyl transferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitination activity, adenylation activity, deadenylation activity, SUMOylation activity, deSUMOylation activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase), deglycosylation activity, transcription inhibition activity, and transcription activation activity. Target DNA-associated proteins include, but are not limited to, proteins that can bind to target DNA, or proteins that can bind to proteins bound to target DNA, such as histones, transcription factors, Mediator, etc.
The functional domain may be, for example, one or more domains having an activity selected from the group consisting of methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity, DNA cleavage activity, and nucleic acid binding activity, or may be a molecular switch (e.g., a photo-inducible switch). When more than one functional domain is included, the functional domains may be the same or different.
In certain exemplary embodiments, the Type V CRISPR effector protein may be fused to adenosine deaminase or cytidine deaminase for base editing purposes.
As used herein, the term “adenosine deaminase” or “adenosine deaminase protein” refers to a protein, polypeptide, or one or more functional domains of a protein or polypeptide that can catalyze a hydrolytic deamination reaction to convert adenine (or the adenine portion of a molecule) to hypoxanthine (or the hypoxanthine portion of a molecule), as shown below. In some embodiments, the adenine-containing molecule is adenosine (A) and the hypoxanthine-containing molecule is inosine (I). The adenine-containing molecule may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
According to the present disclosure, adenosine deaminases that can be used in combination with the present disclosure include, but are not limited to, members of the enzyme family referred to as adenosine deaminases acting on RNA (ADAR), members of the enzyme family referred to as adenosine deaminases acting on tRNA (ADAT), and other family members comprising an adenosine deaminase domain (ADAD). According to the present disclosure, the adenosine deaminase is capable of targeting adenine in RNA/DNA heteroduplexes and RNA/RNA duplexes. Indeed, Zheng et al. (Nucleic Acids Res. 2017, 45(6): 3369-3377) demonstrated that ADAR can edit adenosine to inosine in RNA/DNA and RNA/RNA duplexes. In specific embodiments, the adenosine deaminase has been modified to increase its ability to edit DNA in an RNA/DNA heteroduplex, as described in detail below.
In some embodiments, the adenosine deaminase is derived from one or more metazoan species, including but not limited to mammals, birds, frogs, squid, fish, flies, and worms. In some embodiments, the adenosine deaminase is human, squid, or Drosophila adenosine deaminase.
In some embodiments, the adenosine deaminase is human ADAR, including hADAR1, hADAR2, and hADAR3. In some embodiments, the adenosine deaminase is Caenorhabditis elegans ADAR protein, including ADR-1 and ADR-2. In some embodiments, the adenosine deaminase is Drosophila ADAR protein, including dAdar. In some embodiments, the adenosine deaminase is squid (Loligo pealeii) ADAR protein, including sqADAR2a and sqADAR2b. In some embodiments, the adenosine deaminase is human ADAT protein. In some embodiments, the adenosine deaminase is Drosophila ADAT protein. In some embodiments, the adenosine deaminase is human ADAD protein, including TENR (hADAD1) and TENRL (hADAD2).
In some embodiments, the adenosine deaminase is a TadA protein, such as E. coli TadA. See Kim et al., Biochemistry 45: 6407-6416 (2006); Wolf et al., EMBO J. 21: 3841-3851 (2002). In some embodiments, the adenosine deaminase is mouse ADA. See Grunebaum et al., Curr. Opin. Allergy Clin. Immunol. 13: 630-638 (2013). In some embodiments, the adenosine deaminase is human ADAT2. See Fukui et al., J. Nucleic Acids 2010: 260512 (2010). In some embodiments, the deaminase (e.g., adenosine or cytidine deaminase) is one or more of those described in: Cox et al., Science. Nov. 24, 2017; 358(6366): 1019-1027; Komor et al., Nature. May 19, 2016; 533(7603): 420-4; and Gaudelli et al., Nature. Nov. 23, 2017; 551(7681): 464-471.
In some embodiments, the adenosine deaminase protein recognizes one or more target adenosine residues in a double-stranded nucleic acid substrate and converts them to inosine residues. In some embodiments, the double-stranded nucleic acid substrate is an RNA-DNA heteroduplex. In some embodiments, the adenosine deaminase protein recognizes a binding window on a double-stranded substrate. In some embodiments, the binding window comprises at least one target adenosine residue. In some embodiments, the binding window is in the range of about 3 bp to about 100 bp. In some embodiments, the binding window is in the range of about 5 bp to about 50 bp. In some embodiments, the binding window is in the range of about 10 bp to about 30 bp. In some embodiments, the binding window is about 1 bp, 2 bp, 3 bp, 5 bp, 7 bp, 10 bp, 15 bp, 20 bp, 25 bp, 30 bp, 40 bp, 45 bp, 50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 85 bp, 90 bp, 95 bp or 100 bp.
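A minimal sketch of what A-to-I editing within a binding window does to a sequence readout: because inosine is read as guanosine by polymerases, an edited adenosine appears as G after sequencing. The Python function below, whose name is hypothetical, lists the adenosines inside a window (0-based, half-open coordinates on the strand being deaminated) and returns the sequence as it would read if every one of them were converted. Editing every adenosine in the window with equal efficiency is a simplifying assumption; real deaminases edit positions within a window at different rates.

```python
def simulate_a_to_i_window(seq: str, start: int, end: int):
    """Return (target_positions, edited_readout) for a binding window.

    seq: nucleotide sequence of the strand being deaminated
    start, end: 0-based, half-open window coordinates
    Every A in [start, end) is assumed edited; the resulting inosine is
    written as "G" because it is read as guanosine.
    """
    if not 0 <= start <= end <= len(seq):
        raise ValueError("window must lie within the sequence")
    seq = seq.upper()
    targets = [i for i in range(start, end) if seq[i] == "A"]
    edited = list(seq)
    for i in targets:
        edited[i] = "G"
    return targets, "".join(edited)
```

For example, a 5-nt window covering positions 2 to 6 of "CCAATGACC" contains three target adenosines (positions 2, 3, and 6), and the predicted readout becomes "CCGGTGGCC"; adenosines outside the window are left unchanged.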
In some embodiments, the adenosine deaminase protein comprises one or more deaminase domains. Without wishing to be bound by a particular theory, it is contemplated that the deaminase domain is used to recognize one or more target adenosine (A) residues contained in a double-stranded nucleic acid substrate and convert them to inosine (I) residues. In some embodiments, the deaminase domain comprises an active center. In some embodiments, the active center comprises zinc ions. In some embodiments, during A-to-I editing, the base pair at the target adenosine residue is disrupted and the target adenosine residue is “flipped” out of the double helix to become accessible to the adenosine deaminase. In some embodiments, amino acid residues in or near the active center interact with one or more nucleotides 5′ of the target adenosine residue. In some embodiments, amino acid residues in or near the active center interact with one or more nucleotides 3′ of the target adenosine residue. In some embodiments, amino acid residues in or near the active center further interact with the nucleotide complementary to the target adenosine residue on the opposite strand. In some embodiments, the amino acid residue forms a hydrogen bond with the 2′ hydroxyl group of the nucleotide.
In some embodiments, the adenosine deaminase comprises human ADAR2 whole protein (hADAR2) or deaminase domain (hADAR2-D) thereof. In some embodiments, the adenosine deaminase is a member of the ADAR family homologous to hADAR2 or hADAR2-D.
In particular, in some embodiments, the homologous ADAR protein is human ADAR1 (hADAR1) or the deaminase domain (hADAR1-D) thereof. In some embodiments, glycine 1007 of hADAR1-D corresponds to glycine 487 of hADAR2-D, and glutamic acid 1008 of hADAR1-D corresponds to glutamic acid 488 of hADAR2-D.
In some embodiments, the adenosine deaminase comprises the wild-type amino acid sequence of hADAR2-D. In some embodiments, the adenosine deaminase comprises one or more mutations in the hADAR2-D sequence such that the editing efficiency and/or substrate editing preference of hADAR2-D are changed as desired.
In some embodiments, the adenosine deaminase is TadA8e. In some embodiments, the Type V CRISPR effector protein described herein is fused to TadA8e or a functional fragment thereof (i.e., a fragment capable of A-to-I single-base editing).
In some embodiments, the deaminase is a cytidine deaminase. As used herein, the term “cytidine deaminase” or “cytidine deaminase protein” refers to a protein, polypeptide, or one or more functional domains of a protein or polypeptide that can catalyze a hydrolytic deamination reaction to convert cytosine (or the cytosine portion of a molecule) to uracil (or the uracil portion of a molecule), as shown below. In some embodiments, the cytosine-containing molecule is cytidine (C) and the uracil-containing molecule is uridine (U). The cytosine-containing molecule may be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
According to the present disclosure, cytidine deaminases that can be used in combination with the present disclosure include, but are not limited to, members of the enzyme family known as apolipoprotein B mRNA editing complex (APOBEC) family deaminases, activation-induced deaminase (AID), or cytidine deaminase 1 (CDA1), and, in specific embodiments, the deaminase is an APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3E, APOBEC3F, APOBEC3G, APOBEC3H, or APOBEC4 deaminase.
In the methods and systems of the invention, the cytidine deaminase is capable of targeting cytosines in a single DNA strand. In certain exemplary embodiments, the cytidine deaminase can edit a single strand present outside of the binding component (e.g., outside of bound Cas13). In other exemplary embodiments, the cytidine deaminase may edit at localized bubbles, such as those formed at target editing sites by mismatches with the guide sequence. In certain exemplary embodiments, the cytidine deaminase may comprise mutations that help focus its activity, such as those described in Kim et al., Nature Biotechnology (2017) 35(4): 371-377 (doi: 10.1038/nbt.3803).
In some embodiments, the cytidine deaminase is derived from one or more metazoan species, including but not limited to mammals, birds, frogs, squid, fish, flies, and worms. In some embodiments, the cytidine deaminase is human, primate, bovine, canine, rat, or mouse cytidine deaminase.
In some embodiments, the cytidine deaminase is human APOBEC, including hAPOBEC1 or hAPOBEC3. In some embodiments, the cytidine deaminase is human AID.
In some embodiments, the cytidine deaminase protein recognizes one or more target cytosine residues in a single-stranded bubble of an RNA duplex and converts them to uracil residues. In some embodiments, the cytidine deaminase protein recognizes a binding window on a single-stranded bubble of an RNA duplex. In some embodiments, the binding window comprises at least one target cytosine residue. In some embodiments, the binding window is in the range of about 3 bp to about 100 bp. In some embodiments, the binding window is in the range of about 5 bp to about 50 bp. In some embodiments, the binding window is in the range of about 10 bp to about 30 bp. In some embodiments, the binding window is about 1 bp, 2 bp, 3 bp, 5 bp, 7 bp, 10 bp, 15 bp, 20 bp, 25 bp, 30 bp, 40 bp, 45 bp, 50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 85 bp, 90 bp, 95 bp or 100 bp.
In some embodiments, the cytidine deaminase protein comprises one or more deaminase domains. Without wishing to be bound by theory, it is contemplated that deaminase domains are used to recognize one or more target cytosine (C) residues contained in a single-stranded bubble of an RNA duplex and convert them to uracil (U) residues. In some embodiments, the deaminase domain comprises an active center. In some embodiments, the active center comprises zinc ions. In some embodiments, amino acid residues in or near the active center interact with one or more nucleotides at 5′ of the target cytosine residue. In some embodiments, amino acid residues in or near the active center interact with one or more nucleotides at 3′ of the target cytosine residue.
In some embodiments, the cytidine deaminase comprises human APOBEC1 whole protein (hAPOBEC1) or its deaminase domain (hAPOBEC1-D) or its C-terminal truncated form (hAPOBEC-T) . In some embodiments, the cytidine deaminase is a member of the APOBEC family homologous to hAPOBEC1, hAPOBEC-D, or hAPOBEC-T. In some embodiments, the cytidine deaminase comprises human AID1 whole protein (hAID) or its deaminase domain (hAID-D) or its C-terminal truncated form (hAID-T) . In some embodiments, the cytidine deaminase is a member of the AID family homologous to hAID, hAID-D, or hAID-T. In some embodiments, hAID-T is hAID with the C-terminus truncated by about 20 amino acids.
In some embodiments, the cytidine deaminase comprises the wild-type amino acid sequence of cytosine deaminase. In some embodiments, the cytidine deaminase comprises one or more mutations in the cytosine deaminase sequence such that the editing efficiency and/or substrate editing preference of the cytosine deaminase are changed as desired.
5.3 Methods of Using CRISPR-Cas Systems
The CRISPR-Cas systems described herein have a wide variety of utilities, including modifying (e.g., deleting, inserting, translocating, inactivating, or activating) a target polynucleotide in a multiplicity of cell types. The CRISPR-Cas systems have a broad spectrum of applications in, e.g., DNA/RNA detection (e.g., specific high-sensitivity enzymatic reporter unlocking (SHERLOCK)), tracking and labeling of nucleic acids, enrichment assays (extracting a desired sequence from background), detecting circulating tumor DNA, preparing next-generation sequencing libraries, drug screening, disease diagnosis and prognosis, and treating various genetic disorders.
5.3.1 Genome Editing Systems Generally
The term “genome editing system” refers to an engineered CRISPR-Cas system of the present disclosure having RNA-guided DNA editing activity. Genome editing systems of the present disclosure include at least two components of the CRISPR-Cas systems described above: a crRNA and a Type V CRISPR effector protein. As described above, these two components form a complex that is capable of associating with a specific nucleic acid sequence and editing the DNA in or around that nucleic acid sequence, for instance by making one or more of a single-strand break (an SSB or nick), a double-strand break (a DSB), a nucleobase modification, a DNA methylation or demethylation, a chromatin modification, etc.
Genome editing systems of the present disclosure, when introduced into cells, may alter (a) endogenous genomic DNA (gDNA) including, without limitation, DNA encoding, e.g., a gene target of interest, an exonic sequence of a gene, an intronic sequence of a gene, a regulatory element of a gene or group of genes, etc.; (b) endogenous extra-genomic DNA such as mitochondrial DNA (mtDNA); and/or (c) exogenous DNA such as a non-integrated viral genome, a plasmid, an artificial chromosome, etc. Throughout this disclosure, these DNA substrates are referred to as “target DNA.”
In instances where a genome editing system operates by generating SSBs or DSBs, alterations caused by the system may take the form of short DNA insertions or deletions, which are collectively referred to as “indels.” These indels may be formed within or proximate to a predicted cleavage site that is typically proximate to the PAM sequence and/or within a region of complementarity to the spacer sequence, though in some cases indels may occur outside of such a predicted cleavage site. Without wishing to be bound by any theory, it is believed that indels are often the result of the repair of an SSB or DSB by “error-prone” DNA damage repair pathways, such as non-homologous end joining (NHEJ).
In some cases, a genome editing system is used to generate two DSBs within 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1250, 1500, 1750, or 2000 base pairs of one another, which results in one or more outcomes, including the formation of indels at one or both cleavage sites, as well as deletion or inversion of the DNA sequence disposed between the DSBs.
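The two structural outcomes of a dual-DSB edit, deletion or inversion of the intervening segment, can be illustrated with a minimal Python sketch. It assumes two blunt cuts at known positions and idealized rejoining with no indels at the junctions, which real repair does not guarantee. The function names, the 0-based cut-position convention, and the `max_span` default (taken from the largest spacing listed above) are hypothetical choices for illustration only.

```python
"""Idealized products of two DSBs flanking a DNA segment (illustrative sketch)."""

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def dual_cut_outcomes(seq: str, cut1: int, cut2: int, max_span: int = 2000) -> dict:
    """Model the deletion and inversion products of two blunt DSBs.

    A cut at position i is assumed to fall between seq[i - 1] and seq[i].
    Both junctions are assumed to be rejoined perfectly (no indels).
    """
    if not 0 < cut1 < cut2 < len(seq):
        raise ValueError("cut positions must satisfy 0 < cut1 < cut2 < len(seq)")
    if cut2 - cut1 > max_span:
        raise ValueError("cuts are farther apart than max_span")
    left, middle, right = seq[:cut1], seq[cut1:cut2], seq[cut2:]
    return {
        "excised_length": len(middle),
        "deletion": left + right,                     # intervening segment lost
        "inversion": left + revcomp(middle) + right,  # segment rejoined in reverse orientation
    }
```

For example, `dual_cut_outcomes("AAAACCCATTTT", 4, 8)` reports a 4-bp excision, a deletion product `AAAATTTT`, and an inversion product `AAAATGGGTTTT`.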
Alternatively, genome editing systems of this disclosure may alter target DNA via integration of new sequences. These new sequences may be distinct from the existing sequence of the target DNA (as a non-limiting example, integrated by NHEJ-mediated ligation of blunt ends) or may correspond to a DNA template having one or more regions that are homologous to a region of the targeted DNA. Integration of templated homologous sequences is also referred to as “homology-directed repair” or “HDR.” Template DNA for HDR may be endogenous to the cell, including without limitation in the form of a homologous sequence located on another copy of the same chromosome as the target DNA, a homologous sequence from the same gene cluster as the target DNA, etc. Alternatively or additionally, the template DNA may be provided exogenously, including without limitation as a free linear or circular DNA, as a DNA bound (covalently or non-covalently) to one or more genome editing system components, or as part of a vector genome.
In some instances, editing comprises temporary or permanent silencing of a gene by CRISPR-mediated interference, as described by Matthew H. Larson et al., “CRISPR interference (CRISPRi) for sequence-specific control of gene expression,” Nature Protocols 8, 2180-2196 (2013), which is incorporated by reference in its entirety and for all purposes.
Genome editing systems may include other components, including without limitation one or more heterologous functional domains that mediate site-specific nucleobase modification, DNA methylation or demethylation, or chromatin modification. In some cases, the heterologous functional domain is covalently bound to a Type V CRISPR effector protein, for instance by means of a direct peptide bond or an intervening peptide linker. In some embodiments, the heterologous functional domain is covalently bound to the crRNA, for instance by means of a chemical cross-link. And in some embodiments, one or more heterologous functional domains may be non-covalently associated with a Type V CRISPR effector protein and/or a crRNA. This may be achieved, variously, by means of an aptamer appended to the crRNA and/or the heterologous functional domain, or by a peptide motif fused to the Type V CRISPR effector protein together with a binding domain, configured to bind such motif, fused to the heterologous functional domain, or vice versa.
Genome editing system designs and genome editing outcomes are described in greater detail elsewhere in this specification.
5.3.2 DNA/RNA Detection
In one aspect, the CRISPR-Cas system described herein can be used in DNA/RNA detection by DNA sensing. Single effector RNA-guided DNases can be reprogrammed with RNA guides to provide a platform for specific single-stranded DNA (ssDNA) sensing. Upon recognition of its DNA target, an activated Type V CRISPR effector protein engages in “collateral” cleavage of nearby ssDNA with no sequence similarity to the target sequence. This RNA-programmed collateral cleavage activity allows the CRISPR systems to detect the presence of a specific DNA by nonspecific degradation of labeled ssDNA.
The collateral ssDNase activity can be combined with a reporter in DNA detection applications, such as the DNA Endonuclease-Targeted CRISPR Trans Reporter (DETECTR) method, which, when combined with amplification, achieves attomolar sensitivity for DNA detection (see, e.g., Chen et al., Science 360(6387): 436-439 (2018), which is incorporated herein by reference in its entirety). One application of the enzymes described herein is to degrade non-target ssDNA in an in vitro environment. A “reporter” ssDNA molecule linking a fluorophore and a quencher can also be added to the in vitro system, along with an unknown sample of DNA (either single-stranded or double-stranded). Upon recognizing the target sequence in the unknown DNA, the surveillance complex containing the Type V CRISPR effector protein cleaves the reporter ssDNA, separating the fluorophore from the quencher and resulting in a fluorescent readout.
In some embodiments, the CRISPR-Cas systems described herein can be used in multiplexed error-robust fluorescence in situ hybridization (MERFISH). These methods are described in, e.g., Chen et al., “Spatially resolved, highly multiplexed RNA profiling in single cells,” Science, 2015 Apr. 24; 348(6233): aaa6090, which is incorporated herein by reference in its entirety.
In some embodiments, the CRISPR-Cas systems described herein can be used to detect a target DNA in a sample (e.g., a clinical sample, a cell, or a cell lysate). The collateral DNase activity of the Type V CRISPR effector proteins described herein is activated when the effector proteins bind to a target nucleic acid. Upon binding to the target DNA of interest, the effector protein cleaves a labeled detector ssDNA to generate or change a signal (e.g., an increased or decreased signal), thereby allowing qualitative and quantitative detection of the target DNA in the sample. The specific detection and quantification of DNA in a sample enables a multitude of applications, including diagnostics.
In some embodiments, the methods include a) contacting a sample with: (i) a crRNA and/or a nucleic acid encoding the crRNA, wherein the crRNA comprises a first stem-loop sequence, a second stem-loop sequence, and a spacer sequence capable of hybridizing to the target DNA; (ii) a Type V CRISPR effector protein and/or a nucleic acid encoding the effector protein; and (iii) a labeled detector ssDNA; wherein the effector protein associates with the crRNA to form a complex; wherein the complex hybridizes to the target DNA; and wherein, upon binding of the complex to the target DNA, the effector protein exhibits collateral DNase activity and cleaves the labeled detector ssDNA; and b) measuring a detectable signal produced by cleavage of the labeled detector ssDNA, wherein said measuring provides for detection of the target DNA in the sample.
In some embodiments, the methods further include comparing the detectable signal with a reference signal and determining the amount of target DNA in the sample. In some embodiments, the measuring is performed using gold nanoparticle detection, fluorescence polarization, colloid phase transition/dispersion, electrochemical detection, or semiconductor-based sensing. In some embodiments, the labeled detector ssDNA includes a fluorescence-emitting dye pair, a fluorescence resonance energy transfer (FRET) pair, or a quencher/fluorophore pair. In some embodiments, upon cleavage of the labeled detector ssDNA by the effector protein, the amount of detectable signal produced by the labeled detector ssDNA is decreased or increased. In some embodiments, the labeled detector ssDNA produces a first detectable signal prior to cleavage by the effector protein and a second detectable signal after cleavage by the effector protein.
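Comparing a measured signal with a reference signal and converting it to an amount of target DNA can be sketched in Python as follows. This is an illustration under stated assumptions, not a procedure taken from the disclosure: it assumes a reporter whose signal increases on cleavage (e.g., fluorophore dequenching), uses a "mean plus three standard deviations of no-target blanks" threshold (a common limit-of-detection convention, chosen here as an assumption), and quantifies by linear interpolation on a hypothetical standard curve. Function names and signal units are hypothetical.

```python
"""Thresholded detection and standard-curve quantification (illustrative sketch)."""
from statistics import mean, stdev


def is_detected(sample_signal: float, blank_signals: list[float], k: float = 3.0) -> bool:
    """Call a sample positive if it exceeds mean(blanks) + k * SD(blanks)."""
    threshold = mean(blank_signals) + k * stdev(blank_signals)
    return sample_signal > threshold


def estimate_amount(sample_signal: float, standards: list[tuple[float, float]]) -> float:
    """Interpolate target amount from (amount, signal) calibration points.

    Assumes the signal increases monotonically with the amount of target DNA.
    """
    points = sorted(standards)  # sort by known amount
    for (a0, s0), (a1, s1) in zip(points, points[1:]):
        if s0 <= sample_signal <= s1:
            if s1 == s0:
                return a0
            return a0 + (a1 - a0) * (sample_signal - s0) / (s1 - s0)
    raise ValueError("signal is outside the calibrated range")
```

With blanks of 100, 102, and 98 (arbitrary units), the threshold is 106, so a reading of 150 is called positive; against standards `[(0, 100), (10, 200), (100, 1100)]`, that reading interpolates to an amount of 5.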
In some embodiments, a detectable signal is produced when the labeled detector ssDNA is cleaved by the effector protein. In some embodiments, the labeled detector ssDNA includes a modified nucleobase, a modified sugar moiety, a modified nucleic acid linkage, or a combination thereof.
In some embodiments, the methods include multi-channel detection of multiple independent target DNAs in a sample (e.g., two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, thirty, forty, or more target DNAs) by using multiple CRISPR-Cas systems, each including a distinct orthologous Type V CRISPR effector protein and corresponding crRNAs, allowing the multiple target DNAs in the sample to be differentiated. In some embodiments, the methods include multi-channel detection of multiple independent target DNAs in a sample using multiple CRISPR-Cas systems, each containing an orthologous Type V CRISPR effector protein with differentiable collateral ssDNase substrates. Methods of detecting a DNA in a sample using CRISPR-associated proteins are described, for example, in U.S. Patent Publication No. 2017/0362644, the entire contents of which are incorporated herein by reference.
5.3.3 Tracking and Labeling of Nucleic Acids
Cellular processes depend on a network of molecular interactions among proteins, RNAs, and DNAs. Accurate detection of protein-DNA and protein-RNA interactions is key to understanding such processes. In vitro proximity labeling techniques employ an affinity tag combined with a reporter group, e.g., a photoactivatable group, to label polypeptides and RNAs in the vicinity of a protein or RNA of interest in vitro. After UV irradiation, the photoactivatable groups react with proteins and other molecules that are in close proximity to the tagged molecules, thereby labeling them. Labeled interacting molecules can subsequently be recovered and identified. The RNA-targeting effector proteins can, for instance, be used to target probes to selected RNA sequences. These applications can also be applied in animal models for in vivo imaging of diseases or difficult-to-culture cell types. Methods of tracking and labeling nucleic acids are described, e.g., in U.S. Pat. No. 8,795,965; WO 2016205764; and WO 2017070605, each of which is incorporated herein by reference in its entirety.
5.3.4 Genome Editing Using Paired CRISPR Nickases
The CRISPR-Cas systems described herein can be used in tandem such that two Cas12i nicking enzymes, or one Cas12i enzyme and one other CRISPR Cas enzyme with nicking activity, targeted by a pair of crRNAs to opposite strands of a target locus, can generate a double-strand break with overhangs. This method may reduce the likelihood of off-target modifications, because a double-strand break is expected to occur only at loci where both enzymes generate a nick, thereby increasing genome editing specificity. This method is referred to as a ‘double nicking’ or ‘paired nickase’ strategy and is described, e.g., in Ran et al., “Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity,” Cell, 2013 Sep. 12; 154(6): 1380-1389, and in Mali et al., “CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering,” Nature Biotechnology, 2013 Aug. 1; 31: 833-838, which are both incorporated herein by reference in their entireties.
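In a paired-nickase design, the offset between the two nicks determines whether the resulting break carries 5′ overhangs, 3′ overhangs, or blunt ends. The following minimal Python sketch classifies the ends from the two nick positions. It assumes both nicks are given in top-strand coordinates and that the short duplex between them melts so that a DSB actually forms; the function name and coordinate convention are hypothetical, and the sketch makes no claim about which offsets work best for Cas12i.

```python
"""Overhang geometry of a DSB formed by two offset nicks (illustrative sketch)."""


def paired_nick_overhang(seq: str, top_nick: int, bottom_nick: int) -> tuple[str, int, str]:
    """Classify the ends left by nicks on opposite strands.

    Both positions use top-strand coordinates: a nick at i cuts the bond
    between base i - 1 and base i on that strand. Returns the overhang type,
    its length, and the top-strand sequence spanning the offset.
    """
    if not (0 < top_nick < len(seq) and 0 < bottom_nick < len(seq)):
        raise ValueError("nick positions must lie inside the sequence")
    lo, hi = sorted((top_nick, bottom_nick))
    span = seq[lo:hi]
    if bottom_nick > top_nick:
        # Each fragment keeps a single-stranded tail ending in a 5' terminus.
        return ("5'", hi - lo, span)
    if bottom_nick < top_nick:
        # Each fragment keeps a single-stranded tail ending in a 3' terminus.
        return ("3'", hi - lo, span)
    return ("blunt", 0, "")
```

For instance, a top-strand nick at position 10 and a bottom-strand nick at position 30 leave complementary 20-nt 5′ overhangs. Swapping the two positions gives 3′ overhangs of the same length.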
The first applications of paired nickases demonstrated the utility of this strategy in mammalian cell lines. Applications of paired nickases have been described in the model plant Arabidopsis (e.g., in Fauser et al., “Both CRISPR/Cas-based nucleases and nickases can be used efficiently for genome engineering in Arabidopsis thaliana,” The Plant Journal 79(2): 348-59 (2014), and Schiml et al., “The CRISPR/Cas system can be used as nuclease for in planta gene targeting and as paired nickases for directed mutagenesis in Arabidopsis resulting in heritable progeny,” The Plant Journal 80(6): 1139-50 (2014)); in crops such as rice (e.g., in Mikami et al., “Precision Targeted Mutagenesis via Cas9 Paired Nickases in Rice,” Plant and Cell Physiology 57(5): 1058-68 (2016)) and wheat (e.g., in et al., “A Multipurpose Toolkit to Enable Advanced Genome Engineering in Plants,” Plant Cell 29: 1196-1217 (2017)); in bacteria (e.g., in Standage-Beier et al., “Targeted Large-Scale Deletion of Bacterial Genomes Using CRISPR-Nickases,” ACS Synthetic Biology 4(11): 1217-25 (2015)); and in primary human cells for therapeutic purposes (e.g., in Dabrowska et al., “Precise Excision of the CAG Tract from the Huntingtin Gene by Cas9 Nickases,” Frontiers in Neuroscience 12: 75 (2018), and in Kocher et al., “Cut and Paste: Efficient Homology-Directed Repair of a Dominant Negative KRT14 Mutation via CRISPR/Cas9 Nickases,” Molecular Therapy 25(11): 2585-2598 (2017)), all of which are incorporated herein by reference in their entireties.
The CRISPR-Cas systems described herein can also be used as paired nickases to detect splice junctions, as described, e.g., in Santo & Paik, “A splice junction-targeted CRISPR approach (spJCRISPR) reveals human FOXO3B to be a protein-coding gene,” Gene 673: 95-101 (2018).
The CRISPR-Cas systems described herein can also be used as paired nickases to insert DNA molecules into target loci, as described, e.g., in Wang et al., “Therapeutic Genome Editing for Myotonic Dystrophy Type 1 Using CRISPR/Cas9,” Molecular Therapy 26(11): 2617-2630 (2018). The CRISPR systems described herein can also be used as single nickases to insert genes, as described, e.g., in Gao et al., “Single Cas9 nickase induced generation of NRAMP1 knockin cattle with reduced off-target effects,” Genome Biology 18(1): 13 (2017).
5.3.5 Enhancing Base Editing Using CRISPR Nickases
The CRISPR-Cas systems described herein can be used to augment the efficiency of CRISPR base editing. In base editing, a protein domain with DNA nucleotide-modifying activity (e.g., cytidine deamination) is fused to a programmable Type V CRISPR effector protein that has been inactivated by mutation so that it no longer possesses double-strand DNA cleavage activity. In some embodiments, the programmable Cas protein is a nickase; using a nickase in this role has been shown to improve the efficiency of base editing, as described, e.g., in Komor et al., “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage,” Nature 533: 420-424 (2016), and Nishida et al., “Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems,” Science 353(6305): aaf8729 (2016), both of which are incorporated herein by reference in their entirety. A nickase that nicks the non-edited strand of the target locus is hypothesized either to stimulate endogenous DNA repair pathways, such as mismatch repair or long-patch base excision repair, that preferentially resolve the mismatch generated by base editing to the desired allele, or to give the catalytic editing domain better access to the target DNA.
(a) Targeted Mutagenesis and DNA Labeling with Nickases and DNA Polymerases
The CRISPR-Cas systems described herein can be used in conjunction with proteins that act on nicked DNA. One such class of proteins is nick-translating DNA polymerases, such as E. coli DNA polymerase I or Taq DNA polymerase.
In some embodiments, the CRISPR-Cas system (e.g., a CRISPR nickase) can be fused to an error-prone DNA polymerase I. This fusion protein can be targeted with a crRNA to generate a nick at a target DNA site. The DNA polymerase then initiates DNA synthesis at the nick, displacing downstream nucleotides, and, because an error-prone polymerase is used, mutagenizes the target locus. Polymerase variants with varying processivity, fidelity, and misincorporation biases may be used to influence the characteristics of the mutants that are generated. This method, called EvolvR, is described in detail, e.g., in Halperin et al., “CRISPR-guided DNA polymerases enable diversification of all nucleotides in a tunable window,” Nature 560, 248-252 (2018), which is incorporated herein by reference in its entirety.
In some embodiments, a CRISPR nickase can be used in a nick translation DNA labeling protocol. Nick translation, first described by Rigby et al. in 1977, involves incubating DNA with a DNA nicking enzyme, such as DNase I, which creates one or more nicks in the DNA molecule. A nick-translating DNA polymerase, such as DNA polymerase I, is then used to incorporate labeled nucleotides at the nicked sites. Methods that harness the programmability of CRISPR nickases to covalently tag telomeric repeats with fluorescent dyes, using a variant of the classical nick translation labeling protocol, are described in detail, e.g., in McCaffrey et al., “High-throughput single-molecule telomere characterization,” Genome Research 27: 1904-1915 (2017), which is incorporated herein by reference in its entirety. This method enables haplotype-resolved analysis of telomere lengths at the single-molecule level.
5.3.6 High-Throughput Screening
The CRISPR-Cas systems described herein can be used for preparing next-generation sequencing (NGS) libraries. For example, to create a cost-effective NGS library, the CRISPR-Cas systems can be used to disrupt the coding sequence of a target gene, and clones transfected with the CRISPR enzyme can be screened simultaneously by next-generation sequencing (e.g., on the Ion Torrent PGM system). A detailed description of how to prepare NGS libraries can be found, e.g., in Bell et al., “A high-throughput screening strategy for detecting CRISPR-Cas9 induced mutations using next-generation sequencing,” BMC Genomics 15(1): 1002 (2014), which is incorporated herein by reference in its entirety.
5.3.7 Engineered Microorganisms
Microorganisms (e.g., E. coli, yeast, and microalgae) are widely used in synthetic biology, which has wide utility, including various clinical applications. For example, the programmable CRISPR-Cas systems described herein can be used to split proteins of toxic domains for targeted cell death, e.g., using a cancer-linked RNA as the target transcript. Further, pathways involving protein-protein interactions can be influenced in synthetic biological systems with, e.g., fusion complexes with the appropriate effectors, such as kinases or enzymes.
In some embodiments, crRNA sequences that target phage sequences can be introduced into the microorganism. Thus, the disclosure also provides methods of vaccinating a microorganism (e.g., a production strain) against phage infection.
In some embodiments, the CRISPR-Cas systems provided herein can be used to engineer microorganisms, e.g., to improve yield or fermentation efficiency. For example, the CRISPR-Cas systems described herein can be used to engineer microorganisms, such as yeast, to generate biofuels or biopolymers from fermentable sugars, or to degrade plant lignocellulose from agricultural waste as a source of fermentable sugars. More particularly, the methods described herein can be used to modify the expression of endogenous genes required for biofuel production and/or to modify endogenous genes that may interfere with biofuel synthesis. These methods of engineering microorganisms are described, e.g., in Verwaal et al., “CRISPR/Cpf1 enables fast and simple genome editing of Saccharomyces cerevisiae,” Yeast, 2017 Sep. 8, doi: 10.1002/yea.3278; and Hlavova et al., “Improving microalgae for biotechnology-from genetics to synthetic biology,” Biotechnol. Adv., 2015 Nov. 1; 33: 1194-203, both of which are incorporated herein by reference in their entirety.
In some embodiments, the CRISPR-Cas systems described herein can be used to engineer microorganisms that have defective repair pathways, such as the mesophilic cellulolytic bacterium Clostridium cellulolyticum, a model organism for bioenergy research. In some embodiments, a CRISPR nickase can be used to introduce single nicks at a target locus, which may result in insertion of an exogenously provided DNA template by homologous recombination. A detailed method for using a CRISPR nickase to edit repair-defective microbes is described, e.g., in Xu et al., “Efficient Genome Editing in Clostridium cellulolyticum via CRISPR-Cas9 Nickase,” Appl. Environ. Microbiol. 81: 4423-4431 (2015), which is incorporated herein by reference in its entirety.
In some embodiments, the CRISPR-Cas systems provided herein can be used to induce death or dormancy of a cell (e.g., a microorganism such as an engineered microorganism). These methods can be used to induce dormancy or death of a multitude of prokaryotic and eukaryotic cell types, including, but not limited to, mammalian cells (e.g., cancer cells or tissue culture cells), protozoans, fungal cells, cells infected with a virus, cells infected with an intracellular bacterium, cells infected with an intracellular protozoan, cells infected with a prion, bacteria (e.g., pathogenic and non-pathogenic bacteria), and unicellular and multicellular parasites. For instance, in the field of synthetic biology, it is highly desirable to have mechanisms for controlling engineered microorganisms (e.g., bacteria) to prevent their propagation or dissemination. The systems described herein can be used as “kill switches” to regulate and/or prevent the propagation or dissemination of an engineered microorganism. Further, there is a need in the art for alternatives to current antibiotic treatments.
The systems described herein can also be used in applications where it is desirable to kill or control a specific microbial population (e.g., a bacterial population). For example, the systems described herein may include a crRNA that targets a nucleic acid (e.g., a DNA) that is genus-, species-, or strain-specific, and can be delivered to the cell. Upon complexing and binding to the target nucleic acid, the nuclease activity of the Type V CRISPR effector protein disrupts essential functions within the microorganism, ultimately resulting in dormancy or death. In some embodiments, the methods comprise contacting the cell with a system described herein including a Type V CRISPR effector protein or a nucleic acid encoding the effector protein, and a crRNA or a nucleic acid encoding the crRNA, wherein the spacer sequence is complementary to at least 15 nucleotides (e.g., 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, or more nucleotides) of a target nucleic acid.
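The requirement that the spacer be complementary to at least 15 nucleotides of the target can be checked computationally. The Python sketch below is a deliberate simplification: it looks only for perfect, contiguous complementarity, ignores the PAM and any mismatch tolerance of the effector, and relies on the fact that a spacer is complementary to one DNA strand exactly where its sequence (with U read as T) matches the other strand, so both strands of a double-stranded target are scanned. All names are hypothetical.

```python
"""Check for a contiguous spacer:target complementarity run (illustrative sketch)."""

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest common substring of a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def spacer_pairs_with_target(spacer: str, target_top: str, min_len: int = 15) -> tuple[bool, int]:
    """Return (meets_threshold, run_length) for a spacer against a dsDNA target.

    target_top is the top strand of the target; the bottom strand is scanned
    via its reverse complement.
    """
    s = spacer.upper().replace("U", "T")
    top = target_top.upper()
    run = max(longest_shared_run(s, top), longest_shared_run(s, revcomp(top)))
    return run >= min_len, run
```

A 20-nt spacer that matches either strand of the target end to end returns `(True, 20)`, whereas a poly-A spacer against an unrelated target falls far below the 15-nt threshold.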
Without wishing to be bound by any particular theory, the nuclease activity of the Type V CRISPR effector proteins can induce programmed cell death, cell toxicity, apoptosis, necrosis, necroptosis, cell death, cell cycle arrest, cell anergy, a reduction of cell growth, or a reduction in cell proliferation. For example, in bacteria, the cleavage of DNA by the Type V CRISPR effector proteins can be bacteriostatic or bactericidal.
5.3.8 Application in Plants
The CRISPR-Cas systems described herein have a wide variety of uses in plants. In some embodiments, the CRISPR-Cas systems can be used to engineer genomes of plants (e.g., improving production, making products with desired post-translational modifications, or introducing genes for producing industrial products). In some embodiments, the CRISPR-Cas systems can be used to introduce a desired trait to a plant (e.g., with or without heritable modifications to the genome), or to regulate expression of endogenous genes in plant cells or whole plants. Plants that can be edited using CRISPR-Cas systems of this disclosure can be monocots or dicots and include, without limitation, safflower, maize, cannabis, rice, sugarcane, canola, sorghum, tobacco, rye, barley, wheat, millet, oats, peanut, potato, switchgrass, turfgrass, soybean, alfalfa, sunflower, cotton, and Arabidopsis. The present disclosure also encompasses a plant having a trait made according to a method of the disclosure and/or utilizing a CRISPR-Cas system of the disclosure.
In some embodiments, the CRISPR-Cas systems can be used to identify, edit, and/or silence genes encoding specific proteins, e.g., allergenic proteins (e.g., allergenic proteins in peanuts, soybeans, lentils, peas, green beans, and mung beans). Methods for identifying, editing, and/or silencing genes encoding such proteins are described, e.g., in Nicolaou et al., “Molecular diagnosis of peanut and legume allergy,” Curr. Opin. Allergy Clin. Immunol. 11(3): 222-8 (2011), and WO 2016205764 A1, both of which are incorporated herein by reference in their entirety.
5.3.9 Gene Drives
Gene drive is the phenomenon in which the inheritance of a particular gene or set of genes is favorably biased. The CRISPR-Cas systems described herein can be used to build gene drives. For example, the CRISPR-Cas systems can be designed to target and disrupt a particular allele of a gene, causing the cell to repair the break by copying the second allele. Because of this copying, the first allele is converted to the second allele, increasing the chance that the second allele is transmitted to the offspring. A detailed method for using the CRISPR systems described herein to build gene drives is described, e.g., in Hammond et al., “A CRISPR-Cas9 gene drive system targeting female reproduction in the malaria mosquito vector Anopheles gambiae,” Nat. Biotechnol., 2016 January; 34(1): 78-83, which is incorporated herein by reference in its entirety.
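The inheritance bias described above can be quantified with a standard, highly simplified population-genetics model of a homing gene drive, offered here only as an illustration and not as the model used in the cited work. It assumes that in heterozygotes a fraction e of wild-type alleles is converted to the drive allele in the germline, so a heterozygote transmits the drive allele with probability (1 + e)/2 instead of 1/2; it further assumes random mating, no fitness cost, and no cleavage-resistant alleles. Under these assumptions the drive-allele frequency obeys p' = p² + 2p(1 - p)(1 + e)/2 = p + e·p(1 - p). Function names are hypothetical.

```python
"""Deterministic allele-frequency model of an idealized homing gene drive."""


def heterozygote_transmission(e: float) -> float:
    """Probability that a heterozygote passes on the drive allele."""
    return (1.0 + e) / 2.0


def next_frequency(p: float, e: float) -> float:
    """Drive-allele frequency after one generation of random mating.

    Equivalent to p**2 + 2 * p * (1 - p) * heterozygote_transmission(e),
    written in the simplified form p + e * p * (1 - p).
    """
    return p + e * p * (1.0 - p)


def simulate(p0: float, e: float, generations: int) -> list[float]:
    """Return the drive-allele frequency trajectory, starting with p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], e))
    return freqs
```

With e = 0 the model reduces to Mendelian inheritance and the frequency stays constant, whereas with e = 0.9 a drive released at 1% frequency approaches fixation within a few dozen generations. Real drives are slowed by fitness costs and resistance alleles, which this sketch ignores.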
5.3.10 Pooled Screening
As described herein, pooled CRISPR-Cas screening is a powerful tool for identifying genes involved in biological mechanisms such as cell proliferation, drug resistance, and viral infection. Cells are transduced in bulk with a library of RNA guide (gRNA)-encoding vectors described herein, and the distribution of gRNAs is measured before and after a selective challenge is applied. Pooled CRISPR screens work well for mechanisms that affect cell survival and proliferation, and they can be extended to measure the activity of individual genes (e.g., by using engineered reporter cell lines). Arrayed CRISPR screens, in which only one gene is targeted at a time, make it possible to use RNA-seq as the readout. In some embodiments, the CRISPR systems described herein can be used in single-cell CRISPR screens. A detailed description of pooled CRISPR screening can be found, e.g., in Datlinger et al., “Pooled CRISPR screening with single-cell transcriptome read-out,” Nat. Methods, 2017 March; 14(3): 297-301, which is incorporated herein by reference in its entirety.
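Comparing the gRNA distribution before and after the selective challenge is commonly done by normalizing read counts and computing log2 fold changes. The Python sketch below shows one simple way to do this (reads-per-million scaling, a pseudocount, and a per-gene median across guides); it is an illustrative assumption, not the analysis of the cited single-cell method, and dedicated screen-analysis tools additionally model count variance. Guide and gene identifiers are hypothetical.

```python
"""Guide-level log2 fold changes for a pooled screen (illustrative sketch)."""
import math
from statistics import median


def log2_fold_changes(before: dict[str, int], after: dict[str, int],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """Compare normalized gRNA abundance after vs. before selection.

    Counts are scaled to reads per million, a pseudocount avoids log(0), and
    guides missing from one sample are treated as having zero reads.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    lfc = {}
    for guide in sorted(set(before) | set(after)):
        b = before.get(guide, 0) / total_before * 1e6 + pseudocount
        a = after.get(guide, 0) / total_after * 1e6 + pseudocount
        lfc[guide] = math.log2(a / b)
    return lfc


def gene_scores(lfc: dict[str, float], guide_to_gene: dict[str, str]) -> dict[str, float]:
    """Summarize each gene by the median log2 fold change of its guides."""
    per_gene: dict[str, list[float]] = {}
    for guide, value in lfc.items():
        per_gene.setdefault(guide_to_gene[guide], []).append(value)
    return {gene: median(values) for gene, values in per_gene.items()}
```

In a proliferation screen, genes whose guides are depleted after selection (negative scores) are candidate fitness genes, and enriched guides (positive scores) point to genes whose loss confers an advantage under the applied challenge.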
5.3.11 Saturation Mutagenesis ( “Bashing” )
The CRISPR-Cas systems described herein can be used for in situ saturating mutagenesis. In some embodiments, a pooled RNA guide library can be used to perform in situ saturating mutagenesis of particular genes or regulatory elements. Such methods can reveal critical minimal features and discrete vulnerabilities of these genes or regulatory elements (e.g., enhancers). These methods are described, e.g., in Canver et al., “BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis,” Nature, 2015 November 12; 527(7577): 192-7, which is incorporated herein by reference in its entirety.
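Designing a pooled library for saturating mutagenesis begins by enumerating every PAM-adjacent spacer across the region of interest on both strands. The Python sketch below assumes a 5′ PAM located immediately upstream of the protospacer, as is typical of Type V effectors. The PAM of the effector proteins of this disclosure is not stated in this section, so the `"TTN"` pattern and the 20-nt spacer length used in the example are placeholders only. The cited Canver et al. work used Cas9, whose PAM lies 3′ of the protospacer, so the geometry there differs. Function and variable names are hypothetical.

```python
"""Enumerate PAM-adjacent spacers across a region for a tiling library (sketch)."""
import re

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_spacers(region: str, pam: str, spacer_len: int = 20) -> list[tuple[str, int, str]]:
    """Return (strand, pam_start, spacer) for every 5'-PAM site on both strands.

    pam_start on the "-" strand is indexed along the reverse complement.
    A zero-width lookahead is used so that overlapping PAM sites are all found.
    """
    pam_regex = re.compile("(?=" + "".join(IUPAC[b] for b in pam.upper()) + ")")
    hits = []
    for strand, seq in (("+", region.upper()), ("-", revcomp(region.upper()))):
        for match in pam_regex.finditer(seq):
            start = match.start() + len(pam)
            spacer = seq[start:start + spacer_len]
            if len(spacer) == spacer_len:  # skip sites too close to the region end
                hits.append((strand, match.start(), spacer))
    return hits
```

Each returned spacer corresponds to one library member. Tiling density across the region is governed by how often the assumed PAM occurs, which is one reason effectors with less restrictive PAMs allow finer-grained dissection of regulatory elements.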
5.3.12 Therapeutic Applications
The CRISPR-Cas systems described herein that have activity in a mammalian cellular context can have a diverse range of therapeutic applications. Moreover, each nuclease ortholog may have unique properties (e.g., size, PAM, etc. ) that render it advantaged for certain targeting, treatment, or delivery modalities, so the ortholog selection is important in allocating the nuclease that provides maximum therapeutic benefit.
Numerous factors influence the suitability of gene editing as a therapeutic for a particular disease. With nuclease-based gene therapies, the primary approaches to therapeutic editing have been gene disruption and gene correction. In gene disruption, an event such as a nuclease-induced, targeted double-stranded break activates the endogenous non-homologous end joining (NHEJ) DNA repair mechanism of the target cell, yielding indels that often result in a loss-of-function mutation intended to benefit the patient. In gene correction, the nuclease activity is used to induce alternative DNA repair pathways (such as homology-directed repair, or HDR) with the help of a template DNA (whether endogenous or exogenous, single-stranded or double-stranded). The template DNA can be used either for the endogenous correction of a disease-causing mutation or for the insertion of a therapeutic transgene into an alternate locus (commonly a safe harbor locus such as AAVS1). Methods of designing exogenous donor template nucleic acids are described, for example, in PCT Publication No. WO 2016094874 A1, the entire contents of which are expressly incorporated herein by reference. A prerequisite of therapies that use either of these editing modalities is an understanding of the genetic modulators of a given disease; the disease need not be monogenic, but insight into how mutations affect disease progression or outcome is important for assessing the potential efficacy of a gene therapy.
Without wishing to be limited, the CRISPR systems described herein can be utilized to treat the following diseases: Cystic fibrosis by targeting CFTR (WO2015157070A2), Duchenne Muscular Dystrophy and Becker Muscular Dystrophy by targeting Dystrophin (DMD) (WO2016161380A1), Alpha-1-antitrypsin deficiency by targeting Alpha-1-antitrypsin (A1AT) (WO2017165862A1), lysosomal storage disorders such as Pompe Disease, also known as Glycogen storage disease type II, by targeting acid alpha-glucosidase (GAA), myotonic dystrophy by targeting DMPK, Huntington disease by targeting HTT, Fragile X by targeting FMR1, Friedreich's ataxia by targeting Frataxin, amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) by targeting C9orf72, hereditary chronic kidney disease by targeting ApoL1, cardiovascular disease and hyperlipidemia by targeting PCSK9, APOC3, ANGPTL3, and LPA (Nature 555, S23-S25 (2018)), and congenital blindness such as Leber Congenital Amaurosis Type 10 (LCA10) by targeting CEP290 (Maeder et al., Nat Med. 2019 February; 25(2): 229-233). Most of the aforementioned diseases are best treated with an in vivo gene editing approach, in which the cell types and tissues involved in the disease must be edited in situ with sufficient dose and efficiency to yield a therapeutic benefit. In general, the smaller gene size of Type V CRISPR effector proteins enables more versatile packaging into viral vectors with payload restrictions, such as adeno-associated viruses (AAVs).
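The packaging advantage of compact Type V effectors can be made concrete with simple arithmetic. The sketch below assumes an approximate AAV packaging limit of about 4.7 kb and uses hypothetical sizes for the promoter, polyadenylation signal, and guide cassette; the 1,000-aa effector is likewise hypothetical, and SpCas9 (1,368 aa) is included only as a familiar size reference.

```python
from typing import Dict, Tuple

# Approximate packaging limit of a single AAV genome between the ITRs
# (commonly cited as ~4.7 kb); an assumption, not a hard cutoff.
AAV_CAPACITY_BP = 4700

# Hypothetical sizes of the non-effector elements, for illustration only.
ELEMENTS: Dict[str, int] = {"promoter": 600, "polyA": 250, "U6_guide_cassette": 300}


def cds_length_bp(n_amino_acids: int) -> int:
    """Coding sequence length: three bases per codon plus one stop codon."""
    return 3 * n_amino_acids + 3


def aav_payload(n_amino_acids: int, other_elements_bp: Dict[str, int],
                capacity_bp: int = AAV_CAPACITY_BP) -> Tuple[int, bool]:
    """Return (total payload in bp, whether it fits under capacity_bp)."""
    total = cds_length_bp(n_amino_acids) + sum(other_elements_bp.values())
    return total, total <= capacity_bp


if __name__ == "__main__":
    print(aav_payload(1000, ELEMENTS))  # hypothetical compact effector -> (4153, True)
    print(aav_payload(1368, ELEMENTS))  # SpCas9-sized effector -> (5257, False)
```

Under these assumptions, the roughly 1.1 kb saved by a smaller effector is what allows the effector and its guide cassette to share a single vector.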
In one aspect, provided herein is a method of treating a condition or disease in a subject in need thereof, the method comprising administering to the subject a system provided herein. In some embodiments, the condition or disease is a cancer or an infectious disease. In some embodiments, the condition or disease is selected from the group consisting of Cystic Fibrosis, Duchenne Muscular Dystrophy, Becker Muscular Dystrophy, Alpha-1-antitrypsin Deficiency, Pompe Disease, Myotonic Dystrophy, Huntington Disease, Fragile X Syndrome, Friedreich's ataxia, Amyotrophic Lateral Sclerosis, Frontotemporal Dementia, Hereditary Chronic Kidney Disease, Hyperlipidemia, Hypercholesterolemia, Leber Congenital Amaurosis, Sickle Cell Disease, Beta Thalassemia, Familial Hypercholesterolemia (FH), Transthyretin Amyloidosis (ATTR), Primary Hyperoxaluria (PH1), Hereditary Angioedema (HAE), and Atherosclerotic Cardiovascular Disease (ASCVD).
In some embodiments, the condition or disease is cancer, wherein the cancer is selected from the group consisting of Wilms' tumor, Ewing sarcoma, a neuroendocrine tumor, a glioblastoma, a neuroblastoma, a melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, renal cancer, pancreatic cancer, lung cancer, biliary cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid carcinoma, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphoblastic leukemia, acute myelogenous leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, and urinary bladder cancer.
In some embodiments, the condition or disease is an infectious disease, wherein the infectious agent is selected from the group consisting of human immunodeficiency virus (HIV), herpes simplex virus-1 (HSV1), herpes simplex virus-2 (HSV2), and hepatitis B virus.
Ex vivo editing, in which cells are removed from the patient's body and edited prior to transplantation back into the patient, presents a prime therapeutic opportunity for gene editing technologies. Manipulating cells outside the body offers multiple advantages, ranging from the ability to use high-efficiency methods for delivering protein, DNA, and RNA into cells, such as electroporation and nucleofection, that are not feasible in vivo, to the ability to evaluate toxicity (such as from off-target effects) and then select and expand successfully edited cells to yield a population that provides a therapeutic advantage. These advantages are counterbalanced by the relatively few cell types and populations that can be successfully harvested, processed, and returned to the body while preserving functionality. Without wishing to be limited, there are nevertheless serious diseases that are amenable to ex vivo genome editing using the systems described herein. For example, sickle cell disease (SCD), as referenced in WO2015148863A2, and beta-thalassemia, as referenced in WO2015148860A1, are both diseases in which an understanding of the pathophysiology has enabled a number of different editing modalities in hematopoietic stem cells for disease treatment. Beta-thalassemia and SCD can both be treated by disrupting the BCL11A erythroid enhancer to increase fetal hemoglobin levels (as illustrated using zinc finger nucleases by Psatha et al., Mol Ther Methods Clin Dev. 2018 September 21). In addition, gene correction methods can be used to reverse the deleterious mutations in SCD and beta-thalassemia. Alternatively, the addition of a beta-globin gene expressed from a safe harbor locus provides another therapeutic strategy for ex vivo gene editing.
As an extension of ex vivo editing of hematopoietic stem cells, immune cells can also be edited. In cancer immunotherapy, one therapeutic mode is to modify immune cells such as T cells to recognize and fight cancer, as referenced in WO2015161276A2. To increase efficacy and availability while decreasing cost, the creation of ‘off-the-shelf’ allogeneic T-cell therapies is attractive, and gene editing has the potential to modify surface antigens to minimize immunological side effects (Jung et al., Mol Cell. 2018 Aug. 31).
In another embodiment, the invention can be used to target viruses or other pathogens that have a double-stranded DNA intermediate stage in their life cycle. In particular, targeting viruses whose initial infection leaves a latent infection that persists permanently would be of significant therapeutic value. For example, the CRISPR-Cas systems can be used to directly target the viral genome (such as with HSV-1, HSV-2, or HIV), or to edit host cells to reduce or eliminate the receptors that enable infection, rendering the cells impervious to the virus (HIV), as referenced for HSV-1 and HSV-2 in WO2015153789A1, WO2015153791A1, and WO2017075475A1, and for HIV in WO2015148670A1 and WO2016183236A1.
In another aspect, the CRISPR-Cas systems described herein can be engineered to enable additional functions by using an enzymatically inactive effector protein as a chassis to which protein domains can be attached to confer activities such as transcriptional activation, repression, base editing, and methylation/demethylation.
Thus, this disclosure provides CRISPR-Cas systems and cells for use in the treatment or prevention of any of the diseases disclosed herein.
5.4 Delivery of CRISPR-Cas Systems
In one aspect, provided herein is a polynucleotide encoding the CRISPR effector protein (e.g., a Type V CRISPR effector protein of Section 5.2.2 (Cas Proteins)). In one aspect, provided herein is a polynucleotide encoding the guide molecule (e.g., a crRNA of Section 5.2.1 (crRNA)). In some embodiments, the polynucleotide encoding the CRISPR effector protein is an mRNA molecule. In some embodiments, the polynucleotide encoding the guide molecule is an mRNA molecule. In some embodiments, the polynucleotide encoding the CRISPR effector protein is a circular RNA molecule. In some embodiments, the polynucleotide encoding the guide molecule is a circular RNA molecule.
5.4.1 Lipid Nanoparticles
In some embodiments of the CRISPR-Cas system provided herein, the mRNA encoding the CRISPR effector protein and the mRNA encoding the guide molecule are present in a delivery system selected from the group consisting of a lipid nanoparticle, a liposome, an exosome, a micro-vesicle, and a gene-gun.
In one aspect, nucleic acid molecules described herein are formulated for in vitro and in vivo delivery. Particularly, in some embodiments the nucleic acid molecule is formulated into a lipid-containing composition. In some embodiments, the lipid-containing composition forms lipid nanoparticles enclosing the nucleic acid molecule within a lipid shell. In some embodiments, the lipid shells protect the nucleic acid molecules from degradation. In some embodiments, the lipid nanoparticles also facilitate transportation of the enclosed nucleic acid molecules into intracellular compartments and/or machinery to exert an intended therapeutic or prophylactic function. In certain embodiments, nucleic acids, when present in the lipid nanoparticles, are resistant in aqueous solution to degradation by a nuclease. Lipid nanoparticles comprising nucleic acids and their method of preparation are known in the art, such as those disclosed in, e.g., U.S. Patent Publication No. 2004/0142025, U.S. Patent Publication No. 2007/0042031, PCT Publication No. WO 2017/004143, PCT Publication No. WO 2015/199952, PCT Publication No. WO 2013/016058, and PCT Publication No. WO 2013/086373, the full disclosures of each of which are herein incorporated by reference in their entirety for all purposes.
In some embodiments, the largest dimension of a nanoparticle composition provided herein is 1 μm or shorter (e.g., ≤1 μm, ≤900 nm, ≤800 nm, ≤700 nm, ≤600 nm, ≤500 nm, ≤400 nm, ≤300 nm, ≤200 nm, ≤175 nm, ≤150 nm, ≤125 nm, ≤100 nm, ≤75 nm, ≤50 nm, or shorter), such as when measured by dynamic light scattering (DLS), transmission electron microscopy, scanning electron microscopy, or another method. In one embodiment, the lipid nanoparticle provided herein has at least one dimension that is in the range of from about 40 to about 200 nm. In one embodiment, the at least one dimension is in the range of from about 40 to about 100 nm.
Nanoparticle compositions that can be used in connection with the present disclosure include, for example, lipid nanoparticles (LNPs), nanolipoprotein particles, liposomes, lipid vesicles, and lipoplexes. In some embodiments, nanoparticle compositions are vesicles including one or more lipid bilayers. In some embodiments, a nanoparticle composition includes two or more concentric bilayers separated by aqueous compartments. Lipid bilayers may be functionalized and/or crosslinked to one another. Lipid bilayers may include one or more ligands, proteins, or channels.
In some embodiments, nanoparticle compositions as described comprise a lipid component including at least one lipid, such as a compound according to one of Formulae (I) to (IV) (and sub-formulas thereof) as described herein. For example, in some embodiments, a nanoparticle composition may include a lipid component including one of compounds provided herein. Nanoparticle compositions may also include one or more other lipid or non-lipid components as described below.
(a) Ionizable Lipids
As described herein, in some embodiments, a nanoparticle composition provided herein comprises one or more charged or ionizable lipids in addition to a cationic lipid. Without being bound by theory, it is contemplated that certain charged or zwitterionic lipid components of a nanoparticle composition resemble the lipid components of the cell membrane and can thereby improve cellular uptake of the nanoparticle. Exemplary charged or ionizable lipids that can form part of the present nanoparticle composition include but are not limited to ((4-hydroxybutyl)azanediyl)bis(hexane-6,1-diyl)bis(2-hexyldecanoate) (ALC-0315), 3-(didodecylamino)-N1,N1,4-tridodecyl-1-piperazineethanamine (KL10), N1-[2-(didodecylamino)ethyl]-N1,N4,N4-tridodecyl-1,4-piperazinediethanamine (KL22), 14,25-ditridecyl-15,18,21,24-tetraaza-octatriacontane (KL25), 1,2-dilinoleyloxy-N,N-dimethylaminopropane (DLinDMA), 2,2-dilinoleyl-4-dimethylaminomethyl-[1,3]-dioxolane (DLin-K-DMA), heptatriaconta-6,9,28,31-tetraen-19-yl 4-(dimethylamino)butanoate (DLin-MC3-DMA), 2,2-dilinoleyl-4-(2-dimethylaminoethyl)-[1,3]-dioxolane (DLin-KC2-DMA), 1,2-dioleyloxy-N,N-dimethylaminopropane (DODMA), 2-({8-[(3β)-cholest-5-en-3-yloxy]octyl}oxy)-N,N-dimethyl-3-[(9Z,12Z)-octadeca-9,12-dien-1-yloxy]propan-1-amine (Octyl-CLinDMA), (2R)-2-({8-[(3β)-cholest-5-en-3-yloxy]octyl}oxy)-N,N-dimethyl-3-[(9Z,12Z)-octadeca-9,12-dien-1-yloxy]propan-1-amine (Octyl-CLinDMA (2R)), (2S)-2-({8-[(3β)-cholest-5-en-3-yloxy]octyl}oxy)-N,N-dimethyl-3-[(9Z,12Z)-octadeca-9,12-dien-1-yloxy]propan-1-amine (Octyl-CLinDMA (2S)), (12Z,15Z)-N,N-dimethyl-2-nonylhenicosa-12,15-dien-1-amine, and N,N-dimethyl-1-{(1S,2R)-2-octylcyclopropyl}heptadecan-8-amine. Additional exemplary charged or ionizable lipids that can form part of the present nanoparticle composition include the lipids (e.g., lipid 5) described in Sabnis et al., “A Novel Amino Lipid Series for mRNA Delivery: Improved Endosomal Escape and Sustained Pharmacology and Safety in Non-human Primates,” Molecular Therapy Vol. 26 No. 6, 2018, the entirety of which is incorporated herein by reference.
In some embodiments, suitable cationic lipids include N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium chloride (DOTMA); N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium chloride (DOTAP); 1,2-dioleoyl-sn-glycero-3-ethylphosphocholine (DOEPC); 1,2-dilauroyl-sn-glycero-3-ethylphosphocholine (DLEPC); 1,2-dimyristoyl-sn-glycero-3-ethylphosphocholine (DMEPC); 1,2-dimyristoleoyl-sn-glycero-3-ethylphosphocholine (14:1); N1-[2-((1S)-1-[(3-aminopropyl)amino]-4-[di(3-amino-propyl)amino]butylcarboxamido)ethyl]-3,4-di[oleyloxy]-benzamide (MVL5); dioctadecylamido-glycylspermine (DOGS); 3β-[N-(N',N'-dimethylaminoethyl)carbamoyl]cholesterol (DC-Chol); dioctadecyldimethylammonium bromide (DDAB); SAINT-2 (N-methyl-4-(dioleyl)methylpyridinium); 1,2-dimyristyloxypropyl-3-dimethylhydroxyethylammonium bromide (DMRIE); 1,2-dioleoyl-3-dimethyl-hydroxyethyl ammonium bromide (DORIE); 1,2-dioleoyloxypropyl-3-dimethylhydroxyethyl ammonium chloride (DORI); di-alkylated amino acid (DILA2) (e.g., C18:1-norArg-C16); dioleyldimethylammonium chloride (DODAC); 1-palmitoyl-2-oleoyl-sn-glycero-3-ethylphosphocholine (POEPC); 1,2-dimyristoleoyl-sn-glycero-3-ethylphosphocholine (MOEPC); (R)-5-(dimethylamino)pentane-1,2-diyl dioleate hydrochloride (DODAPen-Cl); (R)-5-guanidinopentane-1,2-diyl dioleate hydrochloride (DOPen-G); and (R)-N,N,N-trimethyl-4,5-bis(oleoyloxy)pentan-1-aminium chloride (DOTAPen). Also suitable are cationic lipids with headgroups that are charged at physiological pH, such as primary amines (e.g., DODAG, N',N'-dioctadecyl-N-4,8-diaza-10-aminodecanoylglycine amide) and guanidinium head groups (e.g., bis-guanidinium-spermidine-cholesterol (BGSC), bis-guanidiniumtren-cholesterol (BGTC), PONA, and (R)-5-guanidinopentane-1,2-diyl dioleate hydrochloride (DOPen-G)).
In certain embodiments, the cationic lipid is a particular enantiomer or the racemic form, and includes the various salt forms of a cationic lipid as above (e.g., chloride or sulfate). For example, in some embodiments, the cationic lipid is N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium chloride (DOTAP-Cl) or N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium sulfate (DOTAP-Sulfate). In some embodiments, the cationic lipid is an ionizable cationic lipid such as, e.g., dioctadecyldimethylammonium bromide (DDAB); 1,2-dilinoleyloxy-3-dimethylaminopropane (DLinDMA); 2,2-dilinoleyl-4-(2-dimethylaminoethyl)-[1,3]-dioxolane (DLin-KC2-DMA); heptatriaconta-6,9,28,31-tetraen-19-yl 4-(dimethylamino)butanoate (DLin-MC3-DMA); 1,2-dioleoyloxy-3-dimethylaminopropane (DODAP); 1,2-dioleyloxy-3-dimethylaminopropane (DODMA); and morpholinocholesterol (Mo-CHOL). In certain embodiments, a lipid nanoparticle includes a combination of two or more cationic lipids (e.g., two or more cationic lipids as above).
Additionally, in some embodiments, the charged or ionizable lipid that can form part of the present nanoparticle composition is a lipid including a cyclic amine group. Additional cationic lipids that are suitable for the formulations and methods disclosed herein include those described in WO2015199952, WO2016176330, WO2015011633, and WO2018/081480, the entire contents of each of which are hereby incorporated by reference in their entireties.
(b) Polymer Conjugated Lipids
In some embodiments, the lipid component of a nanoparticle composition can include one or more polymer conjugated lipids, such as PEGylated lipids (PEG lipids). Without being bound by theory, it is contemplated that a polymer conjugated lipid component in a nanoparticle composition can improve colloidal stability and/or reduce protein adsorption of the nanoparticles. Exemplary polymer conjugated lipids that can be used in connection with the present disclosure include but are not limited to 2-[(polyethylene glycol)-2000]-N,N-ditetradecylacetamide (ALC-0159), PEG-modified phosphatidylethanolamines, PEG-modified phosphatidic acids, PEG-modified ceramides, PEG-modified dialkylamines, PEG-modified diacylglycerols, PEG-modified dialkylglycerols, and mixtures thereof. For example, a PEG lipid may be PEG-c-DOMG, PEG-DMG, PEG-DLPE, PEG-DMPE, PEG-DPPC, PEG-DSPE, Ceramide-PEG2000, or Chol-PEG2000.
In one embodiment, the polymer conjugated lipid is a pegylated lipid. For example, some embodiments include a pegylated diacylglycerol (PEG-DAG) such as 1-(monomethoxy-polyethyleneglycol)-2,3-dimyristoylglycerol (PEG-DMG), a pegylated phosphatidylethanolamine (PEG-PE), a PEG succinate diacylglycerol (PEG-S-DAG) such as 4-O-(2',3'-di(tetradecanoyloxy)propyl-1-O-(ω-methoxy(polyethoxy)ethyl)butanedioate (PEG-S-DMG), a pegylated ceramide (PEG-cer), or a PEG dialkoxypropylcarbamate such as ω-methoxy(polyethoxy)ethyl-N-(2,3-di(tetradecanoxy)propyl)carbamate or 2,3-di(tetradecanoxy)propyl-N-(ω-methoxy(polyethoxy)ethyl)carbamate.
In one embodiment, the polymer conjugated lipid is present in a concentration ranging from 1.0 to 2.5 molar percent. In one embodiment, the polymer conjugated lipid is present in a concentration of about 1.7 molar percent. In one embodiment, the polymer conjugated lipid is present in a concentration of about 1.5 molar percent.
In one embodiment, the molar ratio of cationic lipid to the polymer conjugated lipid ranges from about 35:1 to about 25:1. In one embodiment, the molar ratio of cationic lipid to polymer conjugated lipid ranges from about 100:1 to about 20:1.
In one embodiment, the pegylated lipid has the following Formula:
or a pharmaceutically acceptable salt, tautomer or stereoisomer thereof, wherein:
R12 and R13 are each independently a straight or branched, saturated or unsaturated alkyl chain containing from 10 to 30 carbon atoms, wherein the alkyl chain is optionally interrupted by one or more ester bonds; and
w has a mean value ranging from 30 to 60.
In one embodiment, R12 and R13 are each independently straight, saturated alkyl chains containing from 12 to 16 carbon atoms. In other embodiments, the average w ranges from 42 to 55, for example, the average w is 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54 or 55. In some specific embodiments, the average w is about 49.
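Because w is the mean number of ethylene oxide repeat units, it maps directly onto the approximate mass of the PEG chain. The sketch below assumes a repeat-unit mass of 44.053 g/mol (C2H4O) and ignores end groups; under that assumption, an average w of about 49 corresponds to roughly 2.16 kDa of PEG, on the order of the 2 kDa PEG chains in lipids such as ALC-0159 and Ceramide-PEG2000.

```python
# Mass of one -CH2-CH2-O- (ethylene oxide) repeat unit, g/mol.
EO_REPEAT_MASS = 44.053


def peg_chain_mass(w: float) -> float:
    """Approximate mass (g/mol) contributed by w repeat units; end groups ignored."""
    return w * EO_REPEAT_MASS


if __name__ == "__main__":
    for w in (30, 42, 49, 55, 60):  # bounds and examples from the ranges above
        print(f"w = {w}: ~{peg_chain_mass(w):.0f} g/mol")
```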
In one embodiment, the pegylated lipid has the following Formula:
wherein the average w is about 49.
(c) Structural Lipids
In some embodiments, the lipid component of a nanoparticle composition can include one or more structural lipids. Without being bound by theory, it is contemplated that structural lipids can stabilize the amphiphilic structure of a nanoparticle, such as but not limited to the lipid bilayer structure of a nanoparticle. Exemplary structural lipids that can be used in connection with the present disclosure include but are not limited to cholesterol, fecosterol, sitosterol, ergosterol, campesterol, stigmasterol, brassicasterol, tomatidine, tomatine, ursolic acid, alpha-tocopherol, and mixtures thereof. In certain embodiments, the structural lipid is cholesterol. In some embodiments, the structural lipid includes cholesterol and a corticosteroid (such as prednisolone, dexamethasone, prednisone, and hydrocortisone), or a combination thereof.
In one embodiment, the lipid nanoparticles provided herein comprise a steroid or steroid analogue. In one embodiment, the steroid or steroid analogue is cholesterol. In one embodiment, the steroid is present in a concentration ranging from 39 to 49 molar percent, 40 to 46 molar percent, from 40 to 44 molar percent, from 40 to 42 molar percent, from 42 to 44 molar percent, or from 44 to 46 molar percent. In one embodiment, the steroid is present in a concentration of 40, 41, 42, 43, 44, 45, or 46 molar percent.
In one embodiment, the molar ratio of cationic lipid to the steroid ranges from 1.0:0.9 to 1.0:1.2, or from 1.0:1.0 to 1.0:1.2. In one embodiment, the molar ratio of cationic lipid to cholesterol ranges from about 5:1 to 1:1. In one embodiment, the steroid is present in a concentration ranging from 32 to 40 mol percent.
(d) Phospholipids
In some embodiments, the lipid component of a nanoparticle composition can include one or more phospholipids, such as one or more (poly)unsaturated lipids. Without being bound by theory, it is contemplated that phospholipids may assemble into one or more lipid bilayer structures. Exemplary phospholipids that can form part of the present nanoparticle composition include but are not limited to 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (DLPC), 1,2-dimyristoyl-sn-glycero-3-phosphocholine (DMPC), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-diundecanoyl-sn-glycero-3-phosphocholine (DUPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1,2-di-O-octadecenyl-sn-glycero-3-phosphocholine (18:0 Diether PC), 1-oleoyl-2-cholesterylhemisuccinoyl-sn-glycero-3-phosphocholine (OChemsPC), 1-hexadecyl-sn-glycero-3-phosphocholine (C16 Lyso PC), 1,2-dilinolenoyl-sn-glycero-3-phosphocholine, 1,2-diarachidonoyl-sn-glycero-3-phosphocholine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphocholine, 1,2-diphytanoyl-sn-glycero-3-phosphoethanolamine (ME 16.0 PE), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinoleoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinolenoyl-sn-glycero-3-phosphoethanolamine, 1,2-diarachidonoyl-sn-glycero-3-phosphoethanolamine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphoethanolamine, 1,2-dioleoyl-sn-glycero-3-phospho-rac-(1-glycerol) sodium salt (DOPG), and sphingomyelin. In certain embodiments, a nanoparticle composition includes DSPC. In certain embodiments, a nanoparticle composition includes DOPE. In some embodiments, a nanoparticle composition includes both DSPC and DOPE.
Additional exemplary neutral lipids include, for example, dipalmitoylphosphatidylglycerol (DPPG), palmitoyloleoyl-phosphatidylethanolamine (POPE), dioleoyl-phosphatidylethanolamine 4-(N-maleimidomethyl)-cyclohexane-1-carboxylate (DOPE-mal), dipalmitoyl phosphatidyl ethanolamine (DPPE), dimyristoylphosphoethanolamine (DMPE), distearoyl-phosphatidylethanolamine (DSPE), 16-O-monomethyl PE, 16-O-dimethyl PE, 18-1-trans PE, 1-stearoyl-2-oleoyl-phosphatidylethanolamine (SOPE), and 1,2-dielaidoyl-sn-glycero-3-phosphoethanolamine (transDOPE). In one embodiment, the neutral lipid is 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC). In one embodiment, the neutral lipid is selected from DSPC, DPPC, DMPC, DOPC, POPC, DOPE, and sphingomyelin (SM).
In one embodiment, the neutral lipid is phosphatidylcholine (PC), phosphatidylethanolamine (PE), phosphatidylserine (PS), phosphatidic acid (PA), or phosphatidylglycerol (PG).
Additional phospholipids that can form part of the present nanoparticle composition include those described in WO2017/112865, the entire content of which is hereby incorporated by reference in its entirety.
(e) Formulation
According to the present disclosure, nanoparticle compositions described herein can include at least one lipid component and one or more additional components, such as a therapeutic and/or prophylactic agent (e.g., the therapeutic nucleic acid described herein) . A nanoparticle composition may be designed for one or more specific applications or targets. The elements of a nanoparticle composition may be selected based on a particular application or target, and/or based on the efficacy, toxicity, expense, ease of use, availability, or other feature of one or more elements. Similarly, the particular formulation of a nanoparticle composition may be selected for a particular application or target according to, for example, the efficacy and toxicity of particular combinations of elements.
The lipid component of a nanoparticle composition may include, for example, a cationic lipid (e.g., ALC-0315), a phospholipid (e.g., an unsaturated phospholipid such as DOPE or a saturated phospholipid such as DSPC), a PEG lipid, and a structural lipid. The elements of the lipid component may be provided in specific fractions.
In one embodiment, provided herein is a nanoparticle composition comprising a cationic or ionizable lipid compound provided herein, a nucleic acid molecule, and one or more excipients. In one embodiment, the cationic or ionizable lipid compound comprises one or more ionizable lipid compounds described herein. In one embodiment, the one or more excipients are selected from phospholipids, steroids, and polymer conjugated lipids. In one embodiment, the therapeutic agent (e.g., the nucleic acid molecule) is encapsulated within or associated with the lipid nanoparticle.
In one embodiment, provided herein is a nanoparticle composition (lipid nanoparticle) comprising:
i) between about 20 and about 60 mol percent of an ionizable lipid;
ii) between about 5 and about 30 mol percent of a phospholipid;
iii) between about 25 and about 70 mol percent of a steroid;
iv) between about 0.2 and about 10 mol percent of a PEG-conjugated lipid.
In one embodiment, provided herein is a nanoparticle composition (lipid nanoparticle) comprising:
i) between about 30 and about 50 mol percent of an ionizable lipid;
ii) between about 5 and about 20 mol percent of a phospholipid;
iii) between about 40 and about 60 mol percent of a steroid;
iv) between about 1 and about 5 mol percent of a PEG-conjugated lipid.
In one embodiment, provided herein is a nanoparticle composition (lipid nanoparticle) comprising:
i) between about 40 and about 50 mol percent of an ionizable lipid;
ii) between about 5 and about 10 mol percent of a phospholipid;
iii) between about 40 and about 50 mol percent of a steroid;
iv) between about 1 and about 2 mol percent of a PEG-conjugated lipid.
In certain embodiments, provided herein is a nanoparticle composition (lipid nanoparticle) comprising:
i) about 46.30 mol percent of an ionizable lipid;
ii) about 9.40 mol percent of a phospholipid;
iii) about 42.70 mol percent of a steroid; and
iv) about 1.60 mol percent of a PEG-conjugated lipid.
In specific embodiments, the ionizable lipid is ((4-hydroxybutyl)azanediyl)bis(hexane-6,1-diyl)bis(2-hexyldecanoate) (ALC-0315). In specific embodiments, the phospholipid is DSPC. In specific embodiments, the steroid is cholesterol. In specific embodiments, the PEG-conjugated lipid is 2-[(polyethylene glycol)-2000]-N,N-ditetradecylacetamide (ALC-0159).
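The molar ratios stated elsewhere in this section can be checked directly against the example composition above. The short Python sketch below treats the mol percent values as relative moles; with them, the ionizable lipid to PEG-conjugated lipid ratio is about 28.9:1 (inside the 35:1 to 25:1 range) and the ionizable lipid to steroid ratio is about 1.0:0.92 (inside the 1.0:0.9 to 1.0:1.2 range). The dictionary keys and function names are illustrative, not terms from this disclosure.

```python
from typing import Dict

# Mol percent values from the embodiment above; keys are illustrative labels.
FORMULATION: Dict[str, float] = {
    "ionizable": 46.30,    # e.g., ALC-0315
    "phospholipid": 9.40,  # e.g., DSPC
    "steroid": 42.70,      # e.g., cholesterol
    "peg_lipid": 1.60,     # e.g., ALC-0159
}


def total_mol_percent(formulation: Dict[str, float]) -> float:
    """Sum of all lipid mol percents; should be 100 for a complete recipe."""
    return sum(formulation.values())


def molar_ratio(formulation: Dict[str, float], numerator: str, denominator: str) -> float:
    """Molar ratio numerator:denominator expressed as X:1 (the totals cancel)."""
    return formulation[numerator] / formulation[denominator]


def within(value: float, low: float, high: float) -> bool:
    return low <= value <= high


if __name__ == "__main__":
    ion_to_peg = molar_ratio(FORMULATION, "ionizable", "peg_lipid")
    steroid_per_ion = molar_ratio(FORMULATION, "steroid", "ionizable")
    print(f"total = {total_mol_percent(FORMULATION):.2f} mol percent")
    print(f"ionizable:PEG lipid = {ion_to_peg:.1f}:1, in 25-35 range: {within(ion_to_peg, 25, 35)}")
    print(f"ionizable:steroid = 1.0:{steroid_per_ion:.2f}, "
          f"in 0.9-1.2 range: {within(steroid_per_ion, 0.9, 1.2)}")
```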
In one embodiment, provided herein is a nanoparticle composition (lipid nanoparticle) comprising:
i) between 40 and 50 mol percent of an ionizable lipid;
ii) a neutral lipid;
iii) a steroid;
iv) a polymer conjugated lipid; and
v) a nucleic acid molecule.
As used herein, “mol percent” refers to a component's molar percentage relative to the total moles of all lipid components in the LNP (i.e., the total moles of cationic lipid(s), the neutral lipid, the steroid, and the polymer conjugated lipid).
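The mol percent definition above can be written as a small function, together with the inverse step of converting a target composition into weighable masses. This is a sketch: the function names are hypothetical, and molecular weights must be supplied by the user (for example, about 386.65 g/mol for cholesterol).

```python
from typing import Dict


def mol_percent(moles: Dict[str, float]) -> Dict[str, float]:
    """Each lipid's share of total lipid moles, in percent.

    Only lipid components belong in `moles`; the nucleic acid is excluded,
    matching the definition of mol percent above.
    """
    total = sum(moles.values())
    return {name: 100.0 * n / total for name, n in moles.items()}


def lipid_masses_ug(target_mol_pct: Dict[str, float], total_lipid_umol: float,
                    mol_weight: Dict[str, float]) -> Dict[str, float]:
    """Mass (ug) of each lipid needed to reach a target composition.

    umol multiplied by g/mol gives ug, so no further unit conversion is needed.
    """
    return {name: total_lipid_umol * pct / 100.0 * mol_weight[name]
            for name, pct in target_mol_pct.items()}
```

For example, `mol_percent({"a": 1.0, "b": 3.0})` returns `{"a": 25.0, "b": 75.0}`; the same dictionary keys are reused when passing molecular weights to `lipid_masses_ug`.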
In one embodiment, the lipid nanoparticle comprises from 41 to 49 mol percent, from 41 to 48 mol percent, from 42 to 48 mol percent, from 43 to 48 mol percent, from 44 to 48 mol percent, from 45 to 48 mol percent, from 46 to 48 mol percent, or from 47.2 to 47.8 mol percent of the cationic lipid. In one embodiment, the lipid nanoparticle comprises about 47.0, 47.1, 47.2, 47.3, 47.4, 47.5, 47.6, 47.7, 47.8, 47.9 or 48.0 mol percent of the ionizable lipid.
In one embodiment, the neutral lipid is present in a concentration ranging from 5 to 15 mol percent, 7 to 13 mol percent, or 9 to 11 mol percent. In one embodiment, the neutral lipid is present in a concentration of about 9.5, 10, or 10.5 mol percent. In one embodiment, the molar ratio of the cationic lipid to the neutral lipid ranges from about 4.1:1.0 to about 4.9:1.0, from about 4.5:1.0 to about 4.8:1.0, or from about 4.7:1.0 to 4.8:1.0.
In one embodiment, the steroid is present in a concentration ranging from 39 to 49 molar percent, 40 to 46 molar percent, from 40 to 44 molar percent, from 40 to 42 molar percent, from 42 to 44 molar percent, or from 44 to 46 molar percent. In one embodiment, the steroid is present in a concentration of 40, 41, 42, 43, 44, 45, or 46 molar percent. In one embodiment, the molar ratio of cationic lipid to the steroid ranges from 1.0:0.9 to 1.0:1.2, or from 1.0:1.0 to 1.0:1.2. In one embodiment, the steroid is cholesterol.
In one embodiment, the ratio of lipid to therapeutic agent in the LNP (i.e., N/P, where N represents the moles of cationic lipid and P represents the moles of phosphate present as part of the nucleic acid backbone) ranges from 2: 1 to 30: 1, for example from 3: 1 to 22: 1. In one embodiment, N/P ranges from 6: 1 to 20: 1 or from 2: 1 to 12: 1. Exemplary N/P ratios include about 3: 1, about 6: 1, about 12: 1 and about 22: 1.
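The N/P definition above can be turned into a small calculation, shown here as a sketch under two stated assumptions: each cationic lipid molecule contributes one ionizable nitrogen, and the RNA phosphate count is approximated as RNA mass divided by an assumed average nucleotide mass of 330 g/mol (one phosphate per nucleotide). Both the constant and the helper names are illustrative, not taken from this disclosure.

```python
AVG_NT_MASS_G_PER_MOL = 330.0  # assumed average mass of one nucleotide residue


def phosphate_moles(rna_mass_g, nt_mass=AVG_NT_MASS_G_PER_MOL):
    """Approximate moles of backbone phosphate, assuming one per nucleotide."""
    return rna_mass_g / nt_mass


def np_ratio(cationic_lipid_mol, rna_mass_g, amines_per_lipid=1):
    """N/P = moles of ionizable nitrogen / moles of RNA phosphate."""
    return cationic_lipid_mol * amines_per_lipid / phosphate_moles(rna_mass_g)


def lipid_needed_for_np(target_np, rna_mass_g, amines_per_lipid=1):
    """Moles of cationic lipid required to reach a target N/P for a given RNA mass."""
    return target_np * phosphate_moles(rna_mass_g) / amines_per_lipid


rna_g = 100e-6  # 100 ug of RNA (illustrative)
lipid_mol = lipid_needed_for_np(6.0, rna_g)
print(f"{lipid_mol * 1e9:.1f} nmol cationic lipid for N/P 6: 1")
print(f"check: N/P = {np_ratio(lipid_mol, rna_g):.2f}")
```

Solving for the lipid amount, rather than only computing N/P after the fact, mirrors how a target ratio such as 6: 1 is typically used when planning a formulation.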
In one embodiment, provided herein is a lipid nanoparticle comprising:
i) a cationic lipid having an effective pKa greater than 6.0;
ii) from 5 to 15 mol percent of a neutral lipid;
iii) from 1 to 15 mol percent of an anionic lipid;
iv) from 30 to 45 mol percent of a steroid;
v) a polymer conjugated lipid; and
vi) a nucleic acid molecule,
wherein the mol percent is determined based on total mol of lipid present in the lipid nanoparticle.
In one embodiment, the cationic lipid can be any of a number of lipid species which carry a net positive charge at a selected pH, such as physiological pH. Exemplary cationic lipids are described herein below. In one embodiment, the cationic lipid has a pKa greater than 6.25. In one embodiment, the cationic lipid has a pKa greater than 6.5. In one embodiment, the cationic lipid has a pKa greater than 6.1, greater than 6.2, greater than 6.3, greater than 6.35, greater than 6.4, greater than 6.45, greater than 6.55, greater than 6.6, greater than 6.65, or greater than 6.7.
In one embodiment, the lipid nanoparticle comprises from 40 to 45 mol percent of the cationic lipid. In one embodiment, the lipid nanoparticle comprises from 45 to 50 mole percent of the cationic lipid.
In one embodiment, the molar ratio of the cationic lipid to the neutral lipid ranges from about 2: 1 to about 8: 1. In one embodiment, the lipid nanoparticle comprises from 5 to 10 mol percent of the neutral lipid.
Exemplary anionic lipids include, but are not limited to, phosphatidylglycerol, dioleoylphosphatidylglycerol (DOPG) , dipalmitoylphosphatidylglycerol (DPPG) or 1, 2-distearoyl-sn-glycero-3-phospho- (1' -rac-glycerol) (DSPG) .
In one embodiment, the lipid nanoparticle comprises from 1 to 10 mole percent of the anionic lipid. In one embodiment, the lipid nanoparticle comprises from 1 to 5 mole percent of the anionic lipid. In one embodiment, the lipid nanoparticle comprises from 1 to 9 mole percent, from 1 to 8 mole percent, from 1 to 7 mole percent, or from 1 to 6 mole percent of the anionic lipid. In one embodiment, the mol ratio of anionic lipid to neutral lipid ranges from 1: 1 to 1: 10.
In one embodiment, the steroid is cholesterol. In one embodiment, the molar ratio of the cationic lipid to cholesterol ranges from about 5: 1 to 1: 1. In one embodiment, the lipid nanoparticle comprises from 32 to 40 mol percent of the steroid.
In one embodiment, the sum of the mol percent of neutral lipid and mol percent of anionic lipid ranges from 5 to 15 mol percent. In one embodiment, the sum of the mol percent of neutral lipid and mol percent of anionic lipid ranges from 7 to 12 mol percent.
In one embodiment, the mol ratio of anionic lipid to neutral lipid ranges from 1: 1 to 1: 10. In one embodiment, the sum of the mol percent of neutral lipid and mol percent steroid ranges from 35 to 45 mol percent.
In one embodiment, the lipid nanoparticle comprises:
i) from 45 to 55 mol percent of the cationic lipid;
ii) from 5 to 10 mol percent of the neutral lipid;
iii) from 1 to 5 mol percent of the anionic lipid; and
iv) from 32 to 40 mol percent of the steroid.
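The ranges in items i) to iv), together with the sum and ratio constraints stated earlier for the neutral and anionic lipids, can be checked mechanically. The sketch below is illustrative only: the `Composition` class and `violations` helper are hypothetical, and the candidate composition is not one disclosed here.

```python
from dataclasses import dataclass


@dataclass
class Composition:
    """Mol percent of each lipid class (hypothetical input)."""
    cationic: float
    neutral: float
    anionic: float
    steroid: float
    polymer_lipid: float


RANGES = {  # inclusive mol percent bounds from items i) to iv)
    "cationic": (45.0, 55.0),
    "neutral": (5.0, 10.0),
    "anionic": (1.0, 5.0),
    "steroid": (32.0, 40.0),
}


def violations(c):
    """Return human-readable violations; an empty list means every check passes."""
    problems = []
    for field, (lo, hi) in RANGES.items():
        value = getattr(c, field)
        if not lo <= value <= hi:
            problems.append(f"{field} = {value} is outside {lo}-{hi} mol percent")
    total = c.cationic + c.neutral + c.anionic + c.steroid + c.polymer_lipid
    if abs(total - 100.0) > 1e-6:
        problems.append(f"components sum to {total}, not 100 mol percent")
    if not 5.0 <= c.neutral + c.anionic <= 15.0:
        problems.append("neutral + anionic is outside 5-15 mol percent")
    # Guard against division by zero; a zero neutral lipid is already flagged above.
    if c.neutral > 0 and not 0.1 <= c.anionic / c.neutral <= 1.0:
        problems.append("anionic:neutral is outside 1:10 to 1:1")
    return problems


candidate = Composition(cationic=50, neutral=7.5, anionic=2.5, steroid=38.5, polymer_lipid=1.5)
print(violations(candidate) or "within the stated embodiment")
```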
In one embodiment, the lipid nanoparticle comprises from 1.0 to 2.5 mol percent of the conjugated lipid. In one embodiment, the polymer conjugated lipid is present in a concentration of about 1.5 mol percent.
In one embodiment, the neutral lipid is present in a concentration ranging from 5 to 15 mol percent, 7 to 13 mol percent, or 9 to 11 mol percent. In one embodiment, the neutral lipid is present in a concentration of about 9.5, 10 or 10.5 mol percent. In one embodiment, the molar ratio of the cationic lipid to the neutral lipid ranges from about 4.1: 1.0 to about 4.9: 1.0, from about 4.5: 1.0 to about 4.8: 1.0, or from about 4.7: 1.0 to 4.8: 1.0.
In one embodiment, the steroid is cholesterol. In some embodiments, the steroid is present in a concentration ranging from 39 to 49 molar percent, 40 to 46 molar percent, from 40 to 44 molar percent, from 40 to 42 molar percent, from 42 to 44 molar percent, or from 44 to 46 molar percent. In one embodiment, the steroid is present in a concentration of 40, 41, 42, 43, 44, 45, or 46 molar percent. In certain embodiments, the molar ratio of cationic lipid to the steroid ranges from 1.0: 0.9 to 1.0: 1.2, or from 1.0: 1.0 to 1.0: 1.2.
In one embodiment, the molar ratio of cationic lipid to steroid ranges from 5: 1 to 1: 1.
In one embodiment, the molar ratio of cationic lipid to polymer conjugated lipid ranges from about 100: 1 to about 20: 1. In one embodiment, the molar ratio of cationic lipid to the polymer conjugated lipid ranges from about 35: 1 to about 25: 1.
In one embodiment, the lipid nanoparticle has a mean diameter ranging from 50 nm to 100 nm, or from 60 nm to 85 nm.
In one embodiment, the composition comprises a cationic lipid provided herein, DSPC, cholesterol, a PEG-lipid, and mRNA. In one embodiment, the cationic lipid, DSPC, cholesterol, and PEG-lipid are present at a molar ratio of about 50: 10: 38.5: 1.5.
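As a worked check, the 50: 10: 38.5: 1.5 molar ratio can be converted to mol percent and compared against the pairwise ratios recited earlier in this section (cationic lipid to neutral lipid of about 2: 1 to 8: 1, cationic lipid to steroid of 5: 1 to 1: 1, and cationic lipid to polymer conjugated lipid of about 35: 1 to 25: 1). The helper name is illustrative.

```python
def ratio_summary(ratio):
    """Convert a molar ratio to mol percent and derive cationic-lipid ratios."""
    total = sum(ratio.values())
    summary = {f"{name} mol%": 100.0 * v / total for name, v in ratio.items()}
    summary["cationic:DSPC"] = ratio["cationic"] / ratio["DSPC"]
    summary["cationic:cholesterol"] = ratio["cationic"] / ratio["cholesterol"]
    summary["cationic:PEG-lipid"] = ratio["cationic"] / ratio["PEG-lipid"]
    return summary


s = ratio_summary({"cationic": 50, "DSPC": 10, "cholesterol": 38.5, "PEG-lipid": 1.5})
for key, value in s.items():
    print(f"{key}: {value:.2f}")
```

Because the ratio terms already sum to 100, each term equals its mol percent directly; the derived ratios come out at 5.0, about 1.3, and about 33.3, all inside the ranges quoted above.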
Nanoparticle compositions can be designed for one or more specific applications or targets. For example, a nanoparticle composition can be designed to deliver a therapeutic and/or prophylactic agent such as an RNA to a particular cell, tissue, organ, or system or group thereof in a mammal’s body. Physicochemical properties of nanoparticle compositions can be altered in order to increase selectivity for particular bodily targets. For instance, particle sizes can be adjusted based on the fenestration sizes of different organs. The therapeutic and/or prophylactic agent included in a nanoparticle composition can also be selected based on the desired delivery target or targets. For example, a therapeutic and/or prophylactic agent can be selected for a particular indication, condition, disease, or disorder and/or for delivery to a particular cell, tissue, organ, or system or group thereof (e.g., localized or specific delivery). In certain embodiments, a nanoparticle composition can include an mRNA encoding a polypeptide of interest capable of being translated within a cell to produce the polypeptide of interest. Such a composition can be designed to be specifically delivered to a particular organ. In certain embodiments, a composition can be designed to be specifically delivered to a mammalian liver.
The amount of a therapeutic and/or prophylactic agent in a nanoparticle composition can depend on the size, composition, desired target and/or application, or other properties of the nanoparticle composition as well as on the properties of the therapeutic and/or prophylactic agent. For example, the amount of an RNA useful in a nanoparticle composition can depend on the size, sequence, and other characteristics of the RNA. The relative amounts of a therapeutic and/or prophylactic agent and other elements (e.g., lipids) in a nanoparticle composition can also vary. In some embodiments, the wt/wt ratio of the lipid component to a therapeutic and/or prophylactic agent in a nanoparticle composition can be from about 5: 1 to about 60: 1, such as about 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1, 20: 1, 22: 1, 25: 1, 30: 1, 35: 1, 40: 1, 45: 1, 50: 1, and 60: 1. For example, the wt/wt ratio of the lipid component to a therapeutic and/or prophylactic agent can be from about 10: 1 to about 40: 1. In certain embodiments, the wt/wt ratio is about 20: 1. The amount of a therapeutic and/or prophylactic agent in a nanoparticle composition can, for example, be measured using absorption spectroscopy (e.g., ultraviolet-visible spectroscopy) .
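A wt/wt lipid-to-RNA ratio fixes only the total lipid mass; splitting that mass among the individual lipids requires a mol percent composition and molecular weights. The sketch below pairs the ALC-0315/DSPC/cholesterol/ALC-0159 lipids named earlier with the 50: 10: 38.5: 1.5 molar ratio purely for illustration. The molecular weights are approximate, and the ALC-0159 value in particular is an assumed average because its PEG-2000 chain is polydisperse.

```python
# Approximate molecular weights (g/mol). 2400 g/mol for ALC-0159 is an assumed
# average for a polydisperse PEG-2000 lipid, not a specification.
MW_G_PER_MOL = {"ALC-0315": 766.3, "DSPC": 790.2, "cholesterol": 386.7, "ALC-0159": 2400.0}


def lipid_masses_mg(rna_mg, lipid_to_rna_wt, mol_pct, mw=MW_G_PER_MOL):
    """Split the total lipid mass implied by a wt/wt ratio into per-lipid masses."""
    total_lipid_mg = lipid_to_rna_wt * rna_mg
    # Mole-fraction-weighted mean molecular weight of the lipid mixture.
    mean_mw = sum(pct / 100.0 * mw[name] for name, pct in mol_pct.items())
    total_mmol = total_lipid_mg / mean_mw  # mg divided by g/mol gives mmol
    return {name: pct / 100.0 * total_mmol * mw[name] for name, pct in mol_pct.items()}


composition = {"ALC-0315": 50.0, "DSPC": 10.0, "cholesterol": 38.5, "ALC-0159": 1.5}
masses = lipid_masses_mg(rna_mg=1.0, lipid_to_rna_wt=20.0, mol_pct=composition)
for name, mg in masses.items():
    print(f"{name}: {mg:.3f} mg")
```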
In some embodiments, a nanoparticle composition includes one or more RNAs, and the one or more RNAs, lipids, and amounts thereof can be selected to provide a specific N: P ratio. The N: P ratio of the composition refers to the molar ratio of nitrogen atoms in one or more lipids to the number of phosphate groups in an RNA. In some embodiments, a lower N: P ratio is selected. The one or more RNA, lipids, and amounts thereof can be selected to provide an N: P ratio from about 2: 1 to about 30: 1, such as 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 12: 1, 14: 1, 16: 1, 18: 1, 20: 1, 22: 1, 24: 1, 26: 1, 28: 1, or 30: 1. In certain embodiments, the N: P ratio can be from about 2: 1 to about 8: 1. In other embodiments, the N: P ratio is from about 5: 1 to about 8: 1. For example, the N: P ratio may be about 5.0: 1, about 5.5: 1, about 5.67: 1, about 6.0: 1, about 6.5: 1, or about 7.0: 1. For example, the N: P ratio may be about 5.67: 1.
The physical properties of a nanoparticle composition can depend on the components thereof. For example, a nanoparticle composition including cholesterol as a structural lipid can have different characteristics compared to a nanoparticle composition that includes a different structural lipid. Similarly, the characteristics of a nanoparticle composition can depend on the absolute or relative amounts of its components. For instance, a nanoparticle composition including a higher molar fraction of a phospholipid may have different characteristics than a nanoparticle composition including a lower molar fraction of a phospholipid. Characteristics may also vary depending on the method and conditions of preparation of the nanoparticle composition.
Nanoparticle compositions may be characterized by a variety of methods. For example, microscopy (e.g., transmission electron microscopy or scanning electron microscopy) may be used to examine the morphology and size distribution of a nanoparticle composition. Dynamic light scattering or potentiometry (e.g., potentiometric titrations) may be used to measure zeta potentials. Dynamic light scattering may also be utilized to determine particle sizes. Instruments such as the Zetasizer Nano ZS (Malvern Instruments Ltd, Malvern, Worcestershire, UK) may also be used to measure multiple characteristics of a nanoparticle composition, such as particle size, polydispersity index, and zeta potential.
In various embodiments, the mean size of a nanoparticle composition can be between 10s of nm and 100s of nm. For example, the mean size can be from about 40 nm to about 150 nm, such as about 40 nm, 45 nm, 50 nm, 55 nm, 60 nm, 65 nm, 70 nm, 75 nm, 80 nm, 85 nm, 90 nm, 95 nm, 100 nm, 105 nm, 110 nm, 115 nm, 120 nm, 125 nm, 130 nm, 135 nm, 140 nm, 145 nm, or 150 nm. In some embodiments, the mean size of a nanoparticle composition can be from about 50 nm to about 100 nm, from about 50 nm to about 90 nm, from about 50 nm to about 80 nm, from about 50 nm to about 70 nm, from about 50 nm to about 60 nm, from about 60 nm to about 100 nm, from about 60 nm to about 90 nm, from about 60 nm to about 80 nm, from about 60 nm to about 70 nm, from about 70 nm to about 100 nm, from about 70 nm to about 90 nm, from about 70 nm to about 80 nm, from about 80 nm to about 100 nm, from about 80 nm to about 90 nm, or from about 90 nm to about 100 nm. In certain embodiments, the mean size of a nanoparticle composition can be from about 70 nm to about 100 nm. In some embodiments, the mean size can be about 80 nm. In other embodiments, the mean size can be about 100 nm.
A nanoparticle composition can be relatively homogeneous. A polydispersity index can be used to indicate the homogeneity of a nanoparticle composition, i.e., the breadth of its particle size distribution. A small (e.g., less than 0.3) polydispersity index generally indicates a narrow particle size distribution. A nanoparticle composition can have a polydispersity index from about 0 to about 0.25, such as 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, or 0.25. In some embodiments, the polydispersity index of a nanoparticle composition can be from about 0.10 to about 0.20.
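To relate a PDI value to a size spread, a widely used approximation treats PDI as the squared ratio of the standard deviation to the mean diameter, PDI ≈ (σ/d)². This is an assumption for interpretation only: DLS instruments such as the Zetasizer derive PDI from a cumulants analysis of the scattering autocorrelation function, not from this formula. The helper names are illustrative.

```python
import math
from statistics import fmean, pstdev


def approx_pdi(diameters_nm):
    """Approximate PDI as (standard deviation / mean)^2 of a set of diameters."""
    mean = fmean(diameters_nm)
    return (pstdev(diameters_nm) / mean) ** 2


def width_from_pdi(pdi, mean_nm):
    """Invert the approximation: standard deviation implied by a PDI and mean size."""
    return mean_nm * math.sqrt(pdi)


# Under this approximation, the 0.10-0.20 PDI window at an 80 nm mean size
# corresponds to a standard deviation of roughly 25-36 nm.
for pdi in (0.10, 0.20):
    print(f"PDI {pdi:.2f} at 80 nm -> sigma ~ {width_from_pdi(pdi, 80):.1f} nm")
```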
The zeta potential of a nanoparticle composition can be used to indicate the electrokinetic potential of the composition. For example, the zeta potential can describe the surface charge of a nanoparticle composition. Nanoparticle compositions with relatively low charges, positive or negative, are generally desirable, as more highly charged species can interact undesirably with cells, tissues, and other elements in the body. In some embodiments, the zeta potential of a nanoparticle composition can be from about -10 mV to about +20 mV, from about -10 mV to about +15 mV, from about -10 mV to about +10 mV, from about -10 mV to about +5 mV, from about -10 mV to about 0 mV, from about -10 mV to about -5 mV, from about -5 mV to about +20 mV, from about -5 mV to about +15 mV, from about -5 mV to about +10 mV, from about -5 mV to about +5 mV, from about -5 mV to about 0 mV, from about 0 mV to about +20 mV, from about 0 mV to about +15 mV, from about 0 mV to about +10 mV, from about 0 mV to about +5 mV, from about +5 mV to about +20 mV, from about +5 mV to about +15 mV, or from about +5 mV to about +10 mV.
The efficiency of encapsulation of a therapeutic and/or prophylactic agent describes the amount of therapeutic and/or prophylactic agent that is encapsulated or otherwise associated with a nanoparticle composition after preparation, relative to the initial amount provided. The encapsulation efficiency is desirably high (e.g., close to 100%) . The encapsulation efficiency can be measured, for example, by comparing the amount of therapeutic and/or prophylactic agent in a solution containing the nanoparticle composition before and after breaking up the nanoparticle composition with one or more organic solvents or detergents. Fluorescence can be used to measure the amount of free therapeutic and/or prophylactic agent (e.g., RNA) in a solution. For the nanoparticle compositions described herein, the encapsulation efficiency of a therapeutic and/or prophylactic agent can be at least 50%, for example 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%. In some embodiments, the encapsulation efficiency can be at least 80%. In certain embodiments, the encapsulation efficiency can be at least 90%.
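The fluorescence-based measurement described above reduces to a short formula, sketched here under stated assumptions: the dye signal is linear in RNA concentration, the dye reaches only unencapsulated RNA in intact particles and all RNA after detergent lysis, and both samples are diluted identically. The function name and the optional blank correction are illustrative.

```python
def encapsulation_efficiency(free_signal, total_signal, blank_signal=0.0):
    """Percent RNA encapsulated, from dye fluorescence of intact vs. lysed particles.

    free_signal:  fluorescence of intact particles (reports unencapsulated RNA)
    total_signal: fluorescence after detergent lysis (reports all RNA)
    blank_signal: dye-only background, subtracted from both readings
    """
    free = free_signal - blank_signal
    total = total_signal - blank_signal
    if total <= 0:
        raise ValueError("background-corrected total signal must be positive")
    return 100.0 * (total - free) / total


# Hypothetical plate-reader values, in arbitrary fluorescence units.
print(f"{encapsulation_efficiency(free_signal=80, total_signal=1000, blank_signal=20):.1f} %")
```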
A nanoparticle composition can optionally comprise one or more coatings. For example, a nanoparticle composition can be formulated in a capsule, film, or tablet having a coating. A capsule, film, or tablet including a composition described herein can have any useful size, tensile strength, hardness, or density.
5.4.2 Vectors
In some embodiments of the CRISPR-Cas system provided herein, the mRNA encoding the CRISPR effector protein and the mRNA encoding the guide molecule are present in a vector.
The CRISPR-Cas12 systems described herein, or components thereof, nucleic acid molecules thereof, or nucleic acid molecules encoding or providing components thereof, can be delivered by various delivery systems such as vectors, e.g., plasmids, viral delivery vectors, such as adeno-associated viruses (AAV), lentiviruses, adenoviruses, and other viral vectors, or methods, such as nucleofection or electroporation of ribonucleoprotein complexes consisting of Type V-I effectors and their cognate RNA guide or guides. The proteins and one or more RNA guides can be packaged into one or more vectors, e.g., plasmids or viral vectors. For bacterial applications, the nucleic acids encoding any of the components of the CRISPR systems described herein can be delivered to the bacteria using a phage. Exemplary phages include, but are not limited to, T4 phage, Mu, λ phage, T5 phage, T7 phage, T3 phage, D29, M13, MS2, Qβ, and ΦX174.
In some embodiments, the vectors, e.g., plasmids or viral vectors, are delivered to the tissue of interest by, e.g., intramuscular injection, intravenous administration, transdermal administration, intranasal administration, oral administration, or mucosal administration. Such delivery may be either via a single dose or multiple doses. One skilled in the art understands that the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector choices, the target cells, organisms, tissues, the general conditions of the subject to be treated, the degrees of transformation/modification sought, the administration routes, the administration modes, the types of transformation/modification sought, etc.
In certain embodiments, the delivery is via adeno-associated viruses (AAV), e.g., AAV2, AAV8, or AAV9, which can be administered in a single dose containing at least 1×10⁵ particles (also referred to as particle units, pu) of adenoviruses or adeno-associated viruses. In some embodiments, the dose is at least about 1×10⁶ particles, at least about 1×10⁷ particles, at least about 1×10⁸ particles, or at least about 1×10⁹ particles of the adeno-associated viruses. The delivery methods and the doses are described, e.g., in WO 2016205764 and U.S. Pat. No. 8,454,972, both of which are incorporated herein by reference in their entirety. Due to the limited genomic payload of recombinant AAV, the smaller size of the Type V-I CRISPR-Cas effector proteins described herein enables greater versatility in packaging the effector and RNA guides with the appropriate control sequences (e.g., promoters) required for efficient and cell-type specific expression.
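The packaging constraint mentioned above can be estimated by summing the lengths of the cassette elements. This is a rough sketch under labeled assumptions: a packaging limit of about 4.7 kb (a commonly quoted figure, treated here as an assumption), two 145-nt AAV2 inverted terminal repeats, and hypothetical promoter, polyA, and effector sizes. Only the 47-bp crRNA coding length (27-nt DR plus 20-nt spacer) comes from Example 2.

```python
AAV_CAPACITY_BP = 4700  # assumed rAAV packaging limit, including ITRs
ITR_BP = 145            # length of one AAV2 inverted terminal repeat


def cds_length(n_amino_acids):
    """Coding length in bp: three bases per codon plus a stop codon."""
    return 3 * n_amino_acids + 3


def cassette_length(element_bp):
    """Total vector genome length: both ITRs plus every internal element."""
    return 2 * ITR_BP + sum(element_bp.values())


def headroom(element_bp):
    """Remaining capacity in bp; a negative value means the cassette is too large."""
    return AAV_CAPACITY_BP - cassette_length(element_bp)


# All element sizes except the crRNA are hypothetical placeholders.
design = {
    "promoter": 600,
    "effector_cds": cds_length(1000),  # hypothetical 1,000-aa effector
    "polyA": 200,
    "U6_promoter": 250,
    "crRNA": 47,
}
print(f"cassette {cassette_length(design)} bp, headroom {headroom(design)} bp")
larger = dict(design, effector_cds=cds_length(1400))  # hypothetical 1,400-aa effector
print(f"larger effector: headroom {headroom(larger)} bp")
```

The comparison illustrates why a smaller effector leaves room for regulatory elements such as cell-type specific promoters.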
In some embodiments, the delivery is via a recombinant adeno-associated virus (rAAV) vector. For example, in some embodiments, a modified AAV vector may be used for delivery. Modified AAV vectors can be based on one or more of several capsid types, including AAV1, AAV2, AAV5, AAV6, AAV8, AAV8.2, AAV9, AAVrh10, modified AAV vectors (e.g., modified AAV2, modified AAV3, modified AAV6) and pseudotyped AAV (e.g., AAV2/8, AAV2/5 and AAV2/6). Exemplary AAV vectors and techniques that may be used to produce rAAV particles are known in the art (see, e.g., Aponte-Ubillus et al. (2018) Appl. Microbiol. Biotechnol. 102(3): 1045-54; Zhong et al. (2012) J. Genet. Syndr. Gene Ther. S1: 008; West et al. (1987) Virology 160: 38-47; Tratschin et al. (1985) Mol. Cell. Biol. 5: 3251-60; U.S. Pat. Nos. 4,797,368 and 5,173,414; and International Publication Nos. WO 2015/054653 and WO 93/24641, each of which is incorporated by reference).
In some embodiments, the delivery is via plasmids. The dosage can be a sufficient number of plasmids to elicit a response. In some cases, suitable quantities of plasmid DNA in plasmid compositions can be from about 0.1 to about 2 mg. Plasmids will generally include (i) a promoter; (ii) a sequence encoding a nucleic acid-targeting CRISPR effector protein, operably linked to the promoter; (iii) a selectable marker; (iv) an origin of replication; and (v) a transcription terminator downstream of and operably linked to (ii) . The plasmids can also encode the RNA components of a CRISPR-Cas system, but one or more of these may instead be encoded on different vectors. The frequency of administration is within the ambit of the medical or veterinary practitioner (e.g., physician, veterinarian) , or a person skilled in the art.
Exemplary nucleic acid sequences used in a vector for delivering CRISPR effector proteins are provided in Table 5.
Table 5 Exemplary Vector Sequences
5.4.3 Other Delivery Systems
A further means of introducing one or more components of the new CRISPR systems into cells is the use of cell penetrating peptides (CPPs). In some embodiments, a cell penetrating peptide is linked to the CRISPR effector proteins. In some embodiments, the CRISPR effector proteins and/or RNA guides are coupled to one or more CPPs to transport them inside cells effectively (e.g., plant protoplasts). In some embodiments, the CRISPR effector proteins and/or RNA guide(s) are encoded by one or more circular or non-circular DNA molecules that are coupled to one or more CPPs for cell delivery.
CPPs are short peptides of fewer than 35 amino acids, derived either from proteins or from chimeric sequences, that are capable of transporting biomolecules across the cell membrane in a receptor-independent manner. CPPs can be cationic peptides, peptides having hydrophobic sequences, amphipathic peptides, peptides having proline-rich and anti-microbial sequences, and chimeric or bipartite peptides. Examples of CPPs include, e.g., Tat (which is a nuclear transcriptional activator protein required for viral replication by HIV type 1), penetratin, Kaposi fibroblast growth factor (FGF) signal peptide sequence, integrin 3 signal peptide sequence, polyarginine peptide Args sequence, Guanine rich-molecular transporters, and sweet arrow peptide. CPPs and methods of using them are described, e.g., in Hallbrink et al., “Prediction of cell-penetrating peptides,” Methods Mol. Biol., 2015; 1324: 39-58; Ramakrishna et al., “Gene disruption by cell-penetrating peptide-mediated delivery of Cas9 protein and guide RNA,” Genome Res., 2014 June; 24(6): 1020-7; and WO 2016205764 A1; each of which is incorporated herein by reference in its entirety.
Delivery of the CRISPR-Cas system as a ribonucleoprotein complex by electroporation or nucleofection, in which purified CRISPR effector protein is pre-incubated with an RNA guide and electroporated (or nucleofected) into cells of interest, is another method of efficiently introducing the CRISPR system to cells for gene editing. This is particularly useful for ex vivo genome editing and the development of cellular therapies, and such methods are described in Roth et al. “Reprogramming human T cell function and specificity with non-viral genome targeting, ” Nature, 2018 July; 559 (7714) : 405-409.
Various delivery methods for the CRISPR systems described herein are also described, e.g., in U.S. Pat. No. 8,795,965, EP 3009511, WO 2016205764, and WO 2017070605; each of which is incorporated herein by reference in its entirety.
5.5 Kit
This disclosure also encompasses kits for carrying out the various methods of the disclosure utilizing the CRISPR-Cas systems described herein. One exemplary kit of the present disclosure comprises (a) one or more nucleic acids encoding a CRISPR effector protein and a crRNA, and/or (b) a ribonucleoprotein complex of a CRISPR effector protein and a crRNA. In some embodiments, the kit comprises a Type V CRISPR effector protein (e.g. a Type V CRISPR effector protein of Section 5.2.2) and a crRNA (e.g. a crRNA of Section 5.2.1) . As described above, a complex of the Type V CRISPR effector protein and crRNA has an editing activity such as SSB formation, DSB formation, CRISPR interference, nucleobase modification, DNA methylation or demethylation, chromatin modification, etc. In certain embodiments, the CRISPR effector protein is a variant, such as a variant having reduced endonuclease activity.
Kits of this disclosure also optionally include additional reagents, including one or more of a reaction buffer, a wash buffer, one or more control materials (e.g., a substrate or a nucleic acid encoding a CRISPR system component) , etc. A kit of the present disclosure also optionally includes instructions for performing a method of this disclosure using materials provided in the kit. The instructions are provided in physical form, e.g., as a printed document physically packaged with another item of the kit, and/or in digital form, e.g., a digitally published document downloadable from a website or provided on computer readable media.
6. EXAMPLES
The following is a description of various methods and materials used in the studies. They are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention, nor are they intended to represent that the experiments below were performed and are all of the experiments that may be performed. It is to be understood that exemplary descriptions written in the present tense were not necessarily performed, but rather that the descriptions can be performed to generate the data and the like associated with the teachings of the present invention. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, percentages, etc. ) , but some experimental errors and deviations should be accounted for.
6.1 Example 1: CRISPR Effector Protein (CasY7) Acquisition
The inventors analyzed the genome and metagenome of an uncultured organism and identified a new Cas protein through redundancy removal, protein clustering, and other analyses. BLAST analysis showed that the Cas protein had low sequence similarity to reported Cas protein sequences. The newly identified Cas protein was named CasY7 in this invention. The amino acid sequence of the CasY7 protein is shown in Table 4 and is also provided below, and the nucleotide sequence encoding the CasY7 protein (after human codon optimization) is shown in Table 5.
CasY7 amino acid sequence:
Nucleotide sequence encoding CasY7 protein:
Analysis of the direct repeat (DR) sequence corresponding to the CasY7 protein was performed. The DNA sequence of the DR corresponding to the CasY7 protein is: CAAGTTGAATCCGTCTATAACTGACGG (SEQ ID NO: 437). The inventors further analyzed the RNA secondary structure of the DR sequence in the pre-crRNA using RNAfold. The analysis results are shown in FIG. 3. It was found that the PAM corresponding to CasY7 is 5'-TTN, where N represents A/T/C/G.
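Candidate target sites for a 5'-TTN PAM can be enumerated by scanning both strands of a sequence. The sketch assumes, as for Cas12a-family effectors, that the PAM lies immediately 5' of the protospacer on the same strand; this placement, the 20-nt protospacer length (taken from the spacer length of SEQ ID NO: 435), and the flanking bases in the demo are assumptions, not disclosed parameters of CasY7.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_ttn_targets(seq, spacer_len=20):
    """List (strand, pam_start, pam, protospacer) for every 5'-TTN PAM.

    The protospacer is the spacer_len bases immediately 3' of the PAM on the
    same strand. For the '-' strand, pam_start indexes the reverse complement.
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        # The lookahead lets overlapping PAMs (e.g. within TTTT) all be reported.
        for m in re.finditer(r"(?=(TT[ACGT]))", s):
            start = m.start()
            protospacer = s[start + 3 : start + 3 + spacer_len]
            if len(protospacer) == spacer_len:
                hits.append((strand, start, m.group(1), protospacer))
    return hits


SPACER = "GCATCTCCCCATTCCATGAG"  # TTR spacer, SEQ ID NO: 435
demo = "ACGTTTG" + SPACER + "CCAGT"  # hypothetical flanking bases
for hit in find_ttn_targets(demo):
    print(hit)
```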
The crRNA sequence of the CasY7 protein comprises spacer sequences and direct repeat (DR) sequences. The CasY7 protein of the present invention belongs to the Cas12 protein family.
6.2 Example 2: Determination of CasY7 Editing Efficiency in Escherichia coli
6.2.1 Plasmid Construction
(1) The TTR gene was selected as the target, and a spacer sequence was designed based on the target sequence of the TTR gene: GCATCTCCCCATTCCATGAG (SEQ ID NO: 435) . According to the DR sequences of CasY7 protein and LbCpf1 protein, sgRNA sequences targeting the TTR target gene were designed as follows in Table 6:
Table 6. CasY7-TTR-sgRNA1 and LbCpf1-TTR-sgRNA1 (coding sequences)
The T7 promoter and rrnB T2 terminator were added to the 5' and 3' ends, respectively, of the sgRNA sequences for both CasY7 and LbCpf1. The sequence for CasY7-TTR-sgRNA1 expression is provided as follows:
The single underlined sequence represents the CasY7 DR sequence, the double underlined sequence represents the spacer sequence, the italicized sequence represents the T7 promoter, the wavy underlined sequence represents the rrnB T2 terminator sequence, the spacer sequence and rrnB T2 terminator sequence are separated by a linker sequence, the dashed sequence represents the MfeI enzyme cutting site, and the bold sequence represents the MluI enzyme cutting site.
The sequence for LbCpf1-TTR-sgRNA1 expression is provided as follows:
The single underlined sequence represents the LbCpf1 DR sequence, the double underlined sequence represents the spacer sequence, the italicized sequence represents the T7 promoter, the wavy underlined sequence represents the rrnB T2 terminator sequence, the spacer sequence and rrnB T2 terminator sequence are separated by a linker sequence, the dashed sequence represents the MfeI enzyme cutting site, and the bold sequence represents the MluI enzyme cutting site.
AGC was introduced at the 5' end and ATA at the 3' end of both the synthesized CasY7-TTR-sgRNA1 expression fragment sequence and the LbCpf1-TTR-sgRNA1 expression fragment sequence as protective bases.
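The core of the CasY7 guide in step (1) is the DR (SEQ ID NO: 437) joined to the TTR spacer (SEQ ID NO: 435). The sketch below assembles that core and its expected RNA transcript, assuming the 5'-DR-spacer-3' order typical of Cas12a-family crRNAs; the full expression fragment shown in Table 6 (T7 promoter, linker, terminator, restriction sites, protective bases) is not reproduced here, and the helper names are illustrative.

```python
DR_DNA = "CAAGTTGAATCCGTCTATAACTGACGG"   # CasY7 DR, SEQ ID NO: 437
TTR_SPACER_DNA = "GCATCTCCCCATTCCATGAG"  # TTR spacer, SEQ ID NO: 435


def crrna_coding(dr, spacer):
    """Coding DNA for a crRNA, assuming the 5'-DR-spacer-3' order."""
    return (dr + spacer).upper()


def transcribe(dna):
    """RNA sequence of the coding strand (T replaced by U)."""
    return dna.upper().replace("T", "U")


def gc_content(seq):
    """GC content as a percentage of sequence length."""
    s = seq.upper()
    return 100.0 * sum(base in "GC" for base in s) / len(s)


coding = crrna_coding(DR_DNA, TTR_SPACER_DNA)
print(f"crRNA coding DNA ({len(coding)} nt): {coding}")
print(f"crRNA: {transcribe(coding)}")
print(f"spacer GC content: {gc_content(TTR_SPACER_DNA):.0f}%")
```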
(2) The nucleotide sequence encoding CasY7 was synthesized by Suzhou Hongxun Biotechnology Co., Ltd., and the nucleotide sequence encoding CasY7 protein was cloned into the ABE8e plasmid (Addgene, Plasmid#138489) at position 466-5160, resulting in the construction of the CasY7 recombinant expression plasmid (plasmid map see FIG. 4A) .
Suzhou Hongxun Biotechnology Co., Ltd., also synthesized the optimized nucleotide sequence (sequence provided below) encoding LbCpf1 (amino acid sequence provided below) , and constructed the LbCpf1 recombinant expression plasmid using the same method (plasmid map see FIG. 4B) .
(3) Suzhou Synbio Biotechnology Co., Ltd., synthesized the sgRNA expression sequences (CasY7-sgRNA expression sequence and LbCpf1-sgRNA expression sequence) as described in step (1) . The sgRNA expression sequences were subjected to double enzyme digestion (MfeI/MluI) treatment, and then inserted into the CasY7 recombinant expression plasmid vector, which had undergone the same double enzyme digestion (MfeI/MluI) treatment, to obtain the recombinant expression plasmid for expressing CasY7 and sgRNA, named CasY7+sgRNA expression plasmid. The same method was used to construct the LbCpf1+sgRNA expression plasmid.
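Directional MfeI/MluI cloning, as in step (3), only works if each site occurs exactly once in the fragment, at its ends, and not inside the guide sequences. A quick scan can confirm this. MfeI recognizes C^AATTG and MluI recognizes A^CGCGT; both sites are palindromic, so scanning one strand finds every occurrence. The fragment layout in the demo (protective bases, MfeI site, DR plus spacer, MluI site) is a hypothetical simplification that omits the promoter, linker, and terminator.

```python
import re

# Recognition sequence and top-strand cut offset (the ^ position).
SITES = {"MfeI": ("CAATTG", 1), "MluI": ("ACGCGT", 1)}


def find_cut_positions(seq):
    """Top-strand cut positions for each enzyme; both sites are palindromic."""
    seq = seq.upper()
    return {
        enzyme: [m.start() + cut for m in re.finditer(f"(?={site})", seq)]
        for enzyme, (site, cut) in SITES.items()
    }


DR = "CAAGTTGAATCCGTCTATAACTGACGG"  # SEQ ID NO: 437
SPACER = "GCATCTCCCCATTCCATGAG"     # SEQ ID NO: 435
# Hypothetical simplified fragment: protective bases, sites, and guide core only.
fragment = "AGC" + "CAATTG" + DR + SPACER + "ACGCGT" + "ATA"

print("internal sites in guide core:", find_cut_positions(DR + SPACER))
print("sites in fragment:", find_cut_positions(fragment))
```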
(4) Construction of Target Plasmid with Targeting Sequence:
Suzhou Synbio Biotechnology Co., Ltd., synthesized a nucleotide sequence containing the TTR target sequence (gcatctccccattccatgag (SEQ ID NO: 435) ) referred to as araC-pBAD-CCDB sequence fragment (sequence provided below) . This araC-pBAD-CCDB fragment was inserted into the pKESK22 plasmid (Addgene, Plasmid #64857) at positions 1284-1300, resulting in the construction of the Target plasmid. The sequence of the Target plasmid is described below, and the plasmid map can be found in FIG. 5.
6.2.2 Preparation and Transformation of Escherichia coli Competent Cells
The Target plasmid was introduced into DH5α competent cells, and streaked onto LB solid medium containing 50 μg/ml kanamycin. The plates were then incubated overnight at 37℃ in an incubator. On the next day, single colonies were picked from the plates and inoculated into 4 ml LB liquid medium containing 50 μg/ml kanamycin (provided by SANGON Biotech, A100408-0100) , and incubated at 37℃ with shaking at 200 rpm overnight. The next day, 4 ml of the culture was transferred to 400 ml of LB liquid medium containing 50 μg/ml kanamycin in a 2 L flask, and cultured at 37℃ with shaking at 200 rpm for 2-3 hours.
When the OD600 of the culture reached 0.3-0.5, the flask was removed from the shaker and placed on ice for 10-15 minutes. Under sterile conditions, the culture was transferred to pre-cooled 500 ml centrifuge tubes and centrifuged at 4℃ and 3000 rpm for 8 minutes. The supernatant was discarded, and approximately 200 ml of pre-cooled CaCl2 solution was added to the cell pellet. The cells were resuspended by gentle pipetting and incubated on ice for 30 minutes. The suspension was then centrifuged again at 4℃ and 3000 rpm for 8 minutes, and the supernatant was discarded. The cell pellet was resuspended in approximately 8 ml of pre-cooled CaCl2 solution, aliquoted into 1.5 ml EP tubes (110 μl of cells per tube), and stored at -80℃ until use.
6.2.3 Determination of Editing Efficiency in Escherichia coli
The CasY7+sgRNA expression plasmid and the LbCpf1+sgRNA expression plasmid were separately introduced into the competent cells prepared in Section 6.2.2. The procedure was as follows:
(1) The competent cells were removed from -80℃ and thawed quickly on ice. After approximately 5 minutes, once the cell clumps had melted, the CasY7+sgRNA expression plasmid was added. The tube was mixed gently by flicking its bottom by hand and left on ice for 25 minutes.
(2) The mixture was heat-shocked in a 42℃ water bath for 45 seconds, returned quickly to ice, and left for 2 minutes. Then 900 μl of sterile, antibiotic-free LB medium was added to the tube and mixed, and the cells were recovered at 37℃ and 220 rpm for 60 minutes. Aliquots of 100 μl of the cell suspension were spread onto LB agar plates containing 30 μg/ml carbenicillin (SANGON Biotech, A100358-0001) (C-LB medium) and onto LB agar plates containing 30 μg/ml carbenicillin and 10 mM L-arabinose (SANGON Biotech, A610071-0100) (CL-LB medium). Both plates were inverted and incubated overnight at 37℃.
(3) Determination of Editing Efficiency in Escherichia coli
As shown in FIG. 5, the Target plasmid carries the L-arabinose-inducible PBAD promoter and the ccdB gene under its control. The ccdB gene encodes the toxic CcdB protein, which acts as a DNA gyrase inhibitor: it locks DNA gyrase in a complex with broken double-stranded DNA, preventing gyrase from functioning and ultimately causing cell death.
Based on this, the inventors designed an assay for editing efficiency in Escherichia coli. When L-arabinose is present in the medium and the CasY7 or LbCpf1 protein, guided by its sgRNA, specifically recognizes and cleaves the TTR target sequence (gcatctccccattccatgag) on the Target plasmid, PBAD-driven expression of ccdB is disrupted. The host cells then produce no toxic CcdB protein and survive. Conversely, if the CasY7 or LbCpf1 protein cannot specifically target the TTR sequence on the Target plasmid, L-arabinose induces PBAD-driven ccdB expression, and the resulting CcdB toxin kills the Escherichia coli host cells.
Therefore, the editing efficiency of the CasY7 protein (or the LbCpf1 protein) in cleaving the TTR target in Escherichia coli can be calculated as the ratio of the number of bacterial colonies on the CL-LB plate to the number of colonies on the C-LB plate prepared in step (2).
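The ratio above can be written as a small helper. This is a minimal sketch under two assumptions: the C-LB plate counts all transformants, and both plates received the same 100 μl volume (if the volumes differ, normalize the counts first). The function name and the colony counts are hypothetical and are not the data behind FIG. 6.

```python
def ecoli_editing_efficiency(cl_lb_colonies: int, c_lb_colonies: int) -> float:
    """Percent of transformants that survive ccdB induction on arabinose.

    cl_lb_colonies: colonies on the carbenicillin + L-arabinose (CL-LB) plate
    c_lb_colonies:  colonies on the carbenicillin-only (C-LB) plate (all transformants)
    """
    if c_lb_colonies <= 0:
        raise ValueError("C-LB count must be positive")
    if cl_lb_colonies < 0:
        raise ValueError("colony counts cannot be negative")
    return 100.0 * cl_lb_colonies / c_lb_colonies


# Hypothetical counts for illustration only.
print(f"{ecoli_editing_efficiency(42, 120):.1f}%")  # prints "35.0%"
```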
The results are shown in FIG. 6. After counting the number of Escherichia coli clones and calculating the ratio, the editing efficiency of the CasY7 protein was determined to be 45.5%. Additionally, the editing efficiency of the LbCpf1 protein was found to be 11.1%. The editing efficiency of the CasY7 protein was significantly higher than that of the LbCpf1 protein.
6.3 Example 3: Determination of CasY7 Editing Efficiency in HEK293T cells
6.3.1 Construction of TTR-sgRNA expression plasmid
(1) A TTR-sgRNA spacer (Target-TTR-spacer2) was designed from the TTR gene target sequence tagaagggatatacaaagtg (SEQ ID NO: 438), and the corresponding oligonucleotides (oligos) were synthesized.
CasY7-TTR-sgRNA2 coding sequence:
The underlined sequence corresponds to the DR sequence; the remaining sequence corresponds to the spacer sequence.
LbCpf1-TTR-sgRNA2 coding sequence:
The underlined sequence corresponds to the DR sequence; the remaining sequence corresponds to the spacer sequence.
(2) A CACC sequence was added to the 5' end of the upstream oligo and an AAAA sequence to the 5' end of the downstream oligo of each TTR-sgRNA, and the oligos were synthesized. The sequences are listed in Table 7:
Table 7. Upstream and downstream primer sequences for CasY7-TTR sgRNA2 and LbCpf1-TTR sgRNA2
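Oligo pairs with the overhangs described in step (2) can be generated programmatically. This is a minimal sketch, assuming that the downstream oligo is the reverse complement of the insert and using only the 20-nt TTR spacer (SEQ ID NO: 438) as the insert. The actual Table 7 oligos are not reproduced here and may also include DR sequence. The function names are hypothetical.

```python
_COMP = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case preserved per base)."""
    return seq.translate(_COMP)[::-1]


def bsmbi_oligo_pair(insert: str) -> tuple[str, str]:
    """Return (upstream, downstream) oligos carrying 5' CACC / AAAA overhangs.

    Assumes the downstream oligo is the reverse complement of the insert, so the
    annealed duplex has a 4-nt 5' overhang on each end.
    """
    insert = insert.upper()
    upstream = "CACC" + insert
    downstream = "AAAA" + reverse_complement(insert)
    return upstream, downstream


up, down = bsmbi_oligo_pair("tagaagggatatacaaagtg")  # TTR spacer, SEQ ID NO: 438
print(up)    # CACCTAGAAGGGATATACAAAGTG
print(down)  # AAAACACTTTGTATATCCCTTCTA
```

In this design the 4-nt overhangs are what make the duplex ligatable into the BsmBI-opened vector. Whether CACC/AAAA match PHK09T's overhangs depends on the vector map in FIG. 7.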
After synthesis, the upstream and downstream oligos for each TTR-sgRNA were annealed using a preset program: 95℃ for 5 minutes, cooling from 95℃ to 85℃ at -2℃/s, cooling from 85℃ to 25℃ at -0.1℃/s, then holding at 4℃. The annealed products were then ligated into the PHK09T vector linearized with BsmBI (NEB, #R0580L). The sequence of the PHK09T vector is provided as SEQ ID NO: 399, and the plasmid map is shown in FIG. 7.
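The duration of the annealing program follows directly from the stated ramp rates. This is a small worked sketch; it assumes the thermocycler can actually achieve the programmed 2℃/s ramp, since real instruments may cap ramp speed. The helper name is hypothetical.

```python
def ramp_seconds(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Time (s) for a linear temperature ramp at the given absolute rate (degC/s)."""
    if rate_c_per_s <= 0:
        raise ValueError("rate must be positive")
    return abs(start_c - end_c) / rate_c_per_s


# Steps of the annealing program described above (final 4 degC hold excluded).
program = [
    ("denature at 95 degC", 5 * 60.0),
    ("ramp 95 -> 85 degC at 2 degC/s", ramp_seconds(95, 85, 2.0)),
    ("ramp 85 -> 25 degC at 0.1 degC/s", ramp_seconds(85, 25, 0.1)),
]
for name, sec in program:
    print(f"{name}: {sec:.0f} s")
total = sum(sec for _, sec in program)
print(f"total before the 4 degC hold: {total / 60:.2f} min")
```

The slow 0.1℃/s segment (about 10 minutes) dominates the run. That slow cooling is what lets the complementary oligos find each other rather than forming hairpins.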
The PHK09T vector was linearized and ligated with the annealed TTR-sgRNA products as follows. First, the PHK09T vector was linearized in the following reaction: 3 μg of PHK09T vector, 6 μL of buffer (NEB, R0539L), 2 μL of BsmBI, and ddH2O to 60 μL. The reaction was incubated overnight at 50℃ for digestion.
For ligation of the annealed TTR-sgRNA products with the linearized vector, the following reaction was used: 1 μL of T4 DNA ligase buffer (NEB, #M0202L), 20 ng of linearized vector, 5 μL of annealed oligo fragments, 0.5 μL of T4 DNA ligase (NEB, #M0202L), and ddH2O to 10 μL. The reaction was incubated overnight at 16℃ for ligation, yielding the CasY7-TTR-sgRNA and LbCpf1-TTR-sgRNA expression plasmids.
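The ddH2O top-ups in the BsmBI digestion and the ligation depend on stock concentrations that the text does not give. This is a minimal sketch of the fill-to-volume arithmetic. The plasmid stock (500 ng/μL) and linearized-vector stock (20 ng/μL) concentrations are hypothetical, as is the helper name.

```python
def fill_volume(total_ul: float, **components_ul: float) -> float:
    """Return the ddH2O volume (uL) needed to bring the listed components to total_ul."""
    used = sum(components_ul.values())
    water = total_ul - used
    if water < 0:
        raise ValueError(f"components ({used} uL) exceed the {total_ul} uL reaction")
    return water


# Hypothetical stock concentration; the text does not give one.
PLASMID_NG_PER_UL = 500.0
plasmid_ul = 3000.0 / PLASMID_NG_PER_UL  # 3 ug of PHK09T

digest_water = fill_volume(60.0, plasmid=plasmid_ul, buffer=6.0, bsmbi=2.0)
print(f"BsmBI digest: {plasmid_ul:.1f} uL plasmid + {digest_water:.1f} uL ddH2O")

# Ligation: 20 ng linearized vector from a hypothetical 20 ng/uL stock.
VECTOR_NG_PER_UL = 20.0
ligation_water = fill_volume(10.0, buffer=1.0, vector=20.0 / VECTOR_NG_PER_UL,
                             oligos=5.0, ligase=0.5)
print(f"Ligation: {ligation_water:.1f} uL ddH2O")
```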
(3) The CasY7-TTR-sgRNA and LbCpf1-TTR-sgRNA expression plasmids obtained in step (2) were separately transformed into Escherichia coli DH5α competent cells (Weidi Biotechnology, DL1001) as follows. DH5α competent cells were taken from the -80℃ freezer and thawed quickly on ice. After about 5 minutes, once the cell clumps had melted, the ligation products were added and mixed gently by tapping the bottom of the tube by hand, and the mixture was incubated on ice for 25 minutes. The cells were then heat-shocked at 42℃ for 45 seconds and returned quickly to ice for 2 minutes. After 700 μl of sterile, antibiotic-free LB medium was added and mixed, the cells were recovered at 37℃ and 200 rpm for 60 minutes. After centrifugation at 3000 rpm for 1 minute, the supernatant was gently removed, leaving approximately 100 μl in which the cell pellet was resuspended. The cells were spread onto LB agar plates containing ampicillin, and the plates were inverted and incubated overnight at 37℃. Single colonies were picked and verified by sequencing. Positive clones were cultured with shaking, and plasmids were extracted (endotoxin-free plasmid extraction kit, TIANGEN, DP120-01). The concentration of the extracted plasmid was measured, and the plasmid was stored at -20℃ until use.
6.3.2 Cell-level editing efficiency detection
(1) HEK293T cell culture
HEK293T cells (purchased from ATCC) were cultured in DMEM medium (Gibco, 11965092) supplemented with 10% FBS (v/v) and 1% penicillin-streptomycin (v/v) (Gibco, 15140122) at 37℃ in a 5% CO2 incubator. The cells were seeded in a 24-well culture plate the day before transfection and were transfected when they reached approximately 80% confluence.
(2) Two plasmid combinations were transfected separately into HEK293T cells, each together with the EGFP-C1 plasmid (Addgene, Plasmid #54759). The first combination was the CasY7 recombinant expression plasmid (plasmid map in FIG. 4A) with the CasY7-TTR-sgRNA expression plasmid. The second was the LbCpf1 recombinant expression plasmid (plasmid map in FIG. 4B) with the LbCpf1-TTR-sgRNA expression plasmid.
The amounts of plasmid transfected per well of the 24-well plate were: 0.3 μg of nuclease expression plasmid (CasY7 or LbCpf1 recombinant expression plasmid), 0.3 μg of sgRNA expression plasmid (CasY7-TTR-sgRNA or LbCpf1-TTR-sgRNA expression plasmid), and 0.3 μg of EGFP-C1 plasmid.
The transfection procedure was as follows. The CasY7 expression plasmid, CasY7-TTR-sgRNA expression plasmid, and EGFP-C1 plasmid were mixed and diluted in 25 μl of Transfection Special Reduced Serum Medium (Yuanpei Biotechnology, L530KJ). Then 2 μl of Lipofectamine 3000 (Invitrogen, L3000015) was added, and the mixture was mixed by gentle pipetting and left to stand for 5 minutes. In parallel, 2 μl of Lipofectamine 3000 transfection reagent (Invitrogen, L3000015) was diluted in 25 μl of the same reduced-serum medium, mixed, and left to stand for 5 minutes. The two mixtures were combined, mixed by pipetting, and left to stand for 20 minutes. The combined reagent was added dropwise to the cells in the 24-well plate, which were returned to the 37℃, 5% CO2 incubator. Six hours after transfection, the medium was replaced with DMEM containing 10% FBS. The LbCpf1 recombinant expression plasmid, LbCpf1-TTR-sgRNA expression plasmid, and EGFP-C1 plasmid were transfected into HEK293T cells by the same method.
(3) Editing efficiency detection
Forty-eight hours after transfection, EGFP fluorescence indicated successful transfection, and EGFP-expressing cells were selected for editing efficiency detection. Genomic DNA was extracted from these cells (genomic DNA extraction kit, TIANGEN, DP304-03).
The primers used for identification, designed according to experimental requirements, are listed in Table 8 below:
Table 8. Primer sequences for TTR-F and TTR-R
Using the genomic DNA as template, the region flanking the target site was amplified by PCR in the following 50 μL reaction: 25 μL of 2× Taq Master Mix (Vazyme, P112-03), 1 μL of Primer-F (TTR-F, 10 pmol/μL), 1 μL of Primer-R (TTR-R, 10 pmol/μL), 1 μL of template, and ddH2O to 50 μL.
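When many genomic DNA samples are amplified, the 50 μL recipe above is usually prepared as a master mix. This is a minimal sketch. The 10% overage and the helper name are assumptions, not part of the text. Template is excluded because each sample receives its own genomic DNA.

```python
PER_REACTION_UL = {
    "2x Taq Master Mix": 25.0,
    "TTR-F (10 pmol/uL)": 1.0,
    "TTR-R (10 pmol/uL)": 1.0,
}
TEMPLATE_UL = 1.0   # added separately to each tube
TOTAL_UL = 50.0


def master_mix(n_reactions: int, overage: float = 0.1) -> dict[str, float]:
    """Scale the per-reaction recipe (minus template) to n reactions plus overage."""
    water = TOTAL_UL - sum(PER_REACTION_UL.values()) - TEMPLATE_UL  # 22 uL per reaction
    recipe = {**PER_REACTION_UL, "ddH2O": water}
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in recipe.items()}


mix = master_mix(8)
for name, vol in mix.items():
    print(f"{name}: {vol} uL")
print(f"dispense {TOTAL_UL - TEMPLATE_UL} uL per tube, then add {TEMPLATE_UL} uL template")
```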
The resulting PCR products were analyzed for editing efficiency by high-throughput deep sequencing (performed by Qiagen Bioinformatics) or Sanger sequencing (performed by BioSune Biotechnology (Shanghai) Co., Ltd.).
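For deep-sequencing data, editing efficiency is commonly reported as the fraction of reads carrying modifications at the target. The text does not describe its read-level analysis. The sketch below is therefore a deliberately crude, alignment-free proxy: a read counts as edited when it lacks an exact copy of the reference target window. It assumes reads are on the same strand as the window. The reads are synthetic and the function name is hypothetical. Alignment-based tools such as CRISPResso2 are the usual choice for real data.

```python
def window_edit_fraction(reads: list[str], window: str) -> float:
    """Fraction of reads lacking an exact copy of the reference window.

    An indel (or substitution) inside the window destroys the exact match, so
    such reads are counted as edited. Sequencing errors are counted too, which
    is why this is only a rough proxy.
    """
    if not reads:
        raise ValueError("no reads")
    window = window.upper()
    edited = sum(1 for r in reads if window not in r.upper())
    return edited / len(reads)


TARGET = "TAGAAGGGATATACAAAGTG"                    # TTR target, SEQ ID NO: 438
wt = "GGCC" + TARGET + "AACC"                      # synthetic unedited read
deletion = "GGCC" + "TAGAAGGGATATAAGTG" + "AACC"   # synthetic 3-nt deletion
insertion = "GGCC" + "TAGAAGGGATATACAAAAGTG" + "AACC"  # synthetic 1-nt insertion

reads = [wt, wt, deletion, insertion]
print(f"{100 * window_edit_fraction(reads, TARGET):.1f}% edited")  # 50.0% edited
```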
The editing efficiencies of CasY7 and LbCpf1 are shown in FIG. 8. In HEK293T cells, the editing efficiency of CasY7 was 33%, whereas that of LbCpf1 was only 22%. CasY7 therefore exhibited significantly higher editing efficiency than LbCpf1.
6.4 Example 4: CRISPR-CAS System with Increased Editing Activity.
Editing of the hHao1 gene showed that adding a stem-loop sequence to the 5' end of the conventional crRNA (DR-crRNA), to give slDR-crRNA, significantly increased editing activity. This example shows that the slDR design significantly enhances cleavage efficiency at the hHao1 target.
A conventional crRNA sequence (DR-crRNA) was designed based on the target sequence of the hHao1 gene: AGAAAUCCGUCCAAAGCUGACGGGGACAGAGGGUCAGCAUGCCAA (SEQ ID NO: 402)
The single-underlined sequence represents the DR sequence, and the remaining sequence represents the spacer sequence.
A sequence capable of forming a stem-loop structure was added to the 5' end of the DR sequence. In this example, the added sequence is 5'-CAUACAUGAGGAUCACCCAUGU-3' (SEQ ID NO: 1).
The crRNA sequence (slDR-spacer) is as follows:
The double-underlined sequence represents the stem-loop sequence, the single-underlined sequence represents the DR sequence, and the remaining sequence represents the spacer sequence. Chemical modifications were applied to the last three bases at both the 3' and 5' ends of each crRNA sequence (Table 9). DR-crRNA and slDR-crRNA were chemically synthesized by Nanjing GenScript.
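The underlining that marks the DR and spacer was lost in this text. The sketch below recovers the split under one assumption: the spacer is the 22-nt hHao1 target sequence stated in Example 5 (SEQ ID NO: 408) at the 3' end of SEQ ID NO: 402. It then prepends SEQ ID NO: 1 as described above. The assembled sequence is an inference, shown without the Table 9 chemical modifications, and may not match the synthesized slDR-crRNA exactly. The function name is hypothetical.

```python
DR_CRRNA = "AGAAAUCCGUCCAAAGCUGACGGGGACAGAGGGUCAGCAUGCCAA"  # SEQ ID NO: 402
SPACER = "GGACAGAGGGUCAGCAUGCCAA"                         # SEQ ID NO: 408 (assumed spacer)
STEM_LOOP = "CAUACAUGAGGAUCACCCAUGU"                       # SEQ ID NO: 1


def split_dr_spacer(crrna: str, spacer: str) -> tuple[str, str]:
    """Split a Cas12-style crRNA (5' DR, 3' spacer) at a known 3'-terminal spacer."""
    if not crrna.endswith(spacer):
        raise ValueError("spacer is not at the 3' end of the crRNA")
    return crrna[: len(crrna) - len(spacer)], spacer


dr, spacer = split_dr_spacer(DR_CRRNA, SPACER)
sl_dr_crrna = STEM_LOOP + dr + spacer  # stem-loop prepended to the 5' end of the DR
print(dr, len(dr))                     # AGAAAUCCGUCCAAAGCUGACGG 23
print(len(sl_dr_crrna))                # 67
```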
Table 9. Chemical modifications of DR-crRNA and slDR-crRNA
m represents methylation of the nucleotides. *represents phosphorothioate modification.
Using the CasY7 mRNA expression plasmid as template, PCR amplification was performed with upstream primer F: 5'-atcctctagagattaatacgactcactataagggga-3' (SEQ ID NO: 404), downstream primer R: 5'-agcttgagcccacactctactcgac-3' (SEQ ID NO: 405), and KOD-Plus-Neo DNA polymerase (Toyobo, KOD-401). After purification with DNA selection magnetic beads, the CasY7 mRNA transcription template was obtained. The PCR conditions were: Stage 1, 94℃ for 2 minutes; Stage 2, 30 cycles of 98℃ for 10 seconds, 65℃ for 30 seconds, and 68℃ for 3 minutes; Stage 3, 68℃ for 5 minutes, then hold at 12℃.
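The forward primer (SEQ ID NO: 404) appears to carry a T7 promoter (TAATACGACTCACTATA) immediately followed by AG. AG is the initiating dinucleotide that the CleanCap AG cap analog used in the next step is designed for. The text does not say this explicitly, so the sketch below is only a consistency check under that reading. The function name is hypothetical.

```python
T7_PROMOTER = "TAATACGACTCACTATA"  # T7 promoter core; transcription starts at the next base (+1)


def t7_start(primer: str) -> str:
    """Return the first two transcribed bases downstream of a T7 promoter in a primer."""
    p = primer.upper()
    i = p.find(T7_PROMOTER)
    if i < 0:
        raise ValueError("no T7 promoter found")
    start = i + len(T7_PROMOTER)
    return p[start:start + 2]


fwd = "atcctctagagattaatacgactcactataagggga"  # SEQ ID NO: 404
print(t7_start(fwd))  # AG
```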
In vitro transcription was performed using the NEB transcription kit (NEB, E2040S) with CleanCap AG (TriLink, N-7113-1), and pseudouridine (TriLink, N-1019-1) was used during transcription. The components and their proportions are shown in Table 10.
Table 10.
After incubation, 2 μl of NEB DNase I (NEB, M0303L) was added to the transcription reaction, which was incubated at 37℃ for a further 0.5 hours to digest the DNA template. CasY7 mRNA was recovered using an RNA recovery kit (Novazyme, RC101-01), dissolved in RNase-free water, diluted to approximately 250 ng/μl, and stored at -80℃. Samples were tested with an R1 clip kit (Houze, C105110) to confirm that the CasY7 mRNA met the requirements.
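Diluting the recovered mRNA to the stated working concentration of about 250 ng/μl is a C1V1 = C2V2 calculation. This is a minimal sketch. The measured stock concentration and volume are hypothetical, as is the function name.

```python
def dilution_water_ul(stock_ng_per_ul: float, stock_ul: float,
                      target_ng_per_ul: float = 250.0) -> float:
    """RNase-free water (uL) to add so that C1*V1 = C2*V2 reaches the target concentration."""
    if target_ng_per_ul <= 0 or stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock must be at least as concentrated as the target")
    final_ul = stock_ng_per_ul * stock_ul / target_ng_per_ul
    return final_ul - stock_ul


# Hypothetical recovery: 60 uL of mRNA measured at 1,150 ng/uL.
print(f"add {dilution_water_ul(1150.0, 60.0):.1f} uL water")  # add 216.0 uL water
```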
HepG2 cells were used for validation, with LNP delivery. MC3, DSPC, cholesterol, and PEG-DMG were dissolved in anhydrous ethanol at a molar ratio of 50:10:38.5:1.5. CasY7 mRNA was combined with either DR-crRNA or slDR-crRNA (mass ratio 1:1) in 100 mM citric acid buffer, pH 4 (RNA concentration 0.2 mg/mL). The ethanolic lipid solution was mixed with the aqueous RNA buffer at a volume ratio of 1:3 (total lipid to mRNA mass ratio of 40:1). Nucleic acid lipid nanoparticles (LNPs) were prepared at a flow rate of 12 ml/min using a microfluidic nanoparticle manufacturing system (NanoAssemblr Ignite, Canada). The resulting LNPs were immediately diluted 40-fold with 1× DPBS buffer.
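The formulation parameters above (molar ratio, 40:1 lipid:RNA mass ratio, 0.2 mg/mL RNA, 1:3 ethanol:aqueous volume ratio, 12 ml/min total flow) determine the lipid masses and phase volumes of a batch. This is a minimal sketch under stated assumptions. The molecular weights are approximate average values that are not in the text (PEG-DMG in particular is polydisperse). The 40:1 ratio is applied to total RNA. The 1 mg batch size and the function name are hypothetical.

```python
# Approximate average molecular weights (g/mol); assumptions, not from the source.
MW = {"MC3": 642.1, "DSPC": 790.15, "cholesterol": 386.65, "PEG-DMG": 2509.2}
MOLAR_RATIO = {"MC3": 50.0, "DSPC": 10.0, "cholesterol": 38.5, "PEG-DMG": 1.5}


def lnp_batch(rna_mg: float, lipid_to_rna_mass: float = 40.0, rna_mg_per_ml: float = 0.2,
              aq_to_eth_volume: float = 3.0, total_flow_ml_min: float = 12.0) -> dict:
    """Lipid masses, phase volumes and flow split for one microfluidic LNP batch."""
    total_lipid_mg = lipid_to_rna_mass * rna_mg
    mol_frac = {k: v / sum(MOLAR_RATIO.values()) for k, v in MOLAR_RATIO.items()}
    mean_mw = sum(mol_frac[k] * MW[k] for k in MW)  # mg per mmol of the lipid mixture
    total_mmol = total_lipid_mg / mean_mw
    lipid_mg = {k: mol_frac[k] * total_mmol * MW[k] for k in MW}
    aqueous_ml = rna_mg / rna_mg_per_ml
    ethanol_ml = aqueous_ml / aq_to_eth_volume
    return {
        "lipid_mg": lipid_mg,
        "aqueous_ml": aqueous_ml,
        "ethanol_ml": ethanol_ml,
        "lipid_mg_per_ml_ethanol": total_lipid_mg / ethanol_ml,
        "aqueous_flow_ml_min": total_flow_ml_min * aq_to_eth_volume / (aq_to_eth_volume + 1),
        "ethanol_flow_ml_min": total_flow_ml_min / (aq_to_eth_volume + 1),
    }


batch = lnp_batch(rna_mg=1.0)  # 1 mg total RNA (mRNA + crRNA), hypothetical batch size
for name, mg in batch["lipid_mg"].items():
    print(f"{name}: {mg:.2f} mg")
print(f"aqueous {batch['aqueous_ml']:.2f} mL, ethanol {batch['ethanol_ml']:.2f} mL")
```

The flow split follows from the 1:3 volume ratio: at 12 ml/min total, about 9 ml/min aqueous and 3 ml/min ethanol.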
The LNPs were transfected into HepG2 cells at 5 ng and 10 ng, and all cells were collected 48 hours later. Genomic DNA was extracted with a genomic DNA extraction kit (Tiangen, DP304-03). Using the extracted genomic DNA as template, PCR amplification was performed with upstream primer F: 5'-GTCGGCAGCGTCAGATGTGTATAAGAGACAGGCTGTATCCAAGGATGCT-3' (SEQ ID NO: 406), downstream primer R: 5'-CTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTTCTGTCCCTGTGGT-3' (SEQ ID NO: 407), and Taq enzyme (Novazyme, P112-02). The amplification conditions were: Stage 1, 95℃ for 3 minutes; Stage 2, 30 cycles of 95℃ for 15 seconds, 53℃ for 15 seconds, and 72℃ for 30 seconds; Stage 3, 72℃ for 5 minutes, then hold at 12℃.
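Both amplicon primers (SEQ ID NOs: 406 and 407) appear to begin with Illumina Nextera-style adapter overhangs ending in the 19-bp Tn5 mosaic end AGATGTGTATAAGAGACAG, with the hHao1-specific sequence at the 3' end. This reading is an assumption; the text does not state it. The sketch below splits off the gene-specific portion, which is the part that anneals to genomic DNA. The function name is hypothetical.

```python
TN5_MOSAIC_END = "AGATGTGTATAAGAGACAG"  # 19-bp mosaic end found in Nextera-style overhangs


def gene_specific_part(primer: str) -> str:
    """Return the 3' target-specific portion that follows the mosaic-end sequence."""
    _, sep, tail = primer.upper().partition(TN5_MOSAIC_END)
    if not sep:
        raise ValueError("no Nextera-style mosaic end found")
    return tail


F = "GTCGGCAGCGTCAGATGTGTATAAGAGACAGGCTGTATCCAAGGATGCT"  # SEQ ID NO: 406
R = "CTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTTCTGTCCCTGTGGT"  # SEQ ID NO: 407
print(gene_specific_part(F))  # GCTGTATCCAAGGATGCT
print(gene_specific_part(R))  # CCTTCTGTCCCTGTGGT
```

Under this reading, the 53℃ annealing step mainly depends on these short 3' portions during the first cycles.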
The amplification products were sent directly to a sequencing company for sequencing. Editing efficiency was calculated from the sequencing results with the ICE Analysis online tool on the Synthego website, by entering the corresponding guide sequence and sequencing results. The editing efficiency statistics are shown in FIG. 9. The analysis shows that slDR-crRNA (crRNA2) increased the editing efficiency of CasY7 at the hHao1 locus approximately 5- to 10-fold compared with the conventional crRNA (DR-crRNA, crRNA1).
6.5 Example 5: Effect of DR Sequence with Other SL Sequences
Multiple stem-loop (sl) sequences and modified variants thereof were designed; their secondary structures are shown in FIGs. 15A-15F. The following effects of the different sl sequences were investigated:
(1) the effect of crRNAs carrying different sl sequences (crRNA2-7) on crRNA-mediated activity;
(2) the effect on crRNA-mediated activity of replacing some nucleotides in the loop of the sl sequence (starting from the crRNA2 sl sequence, yielding the crRNA8-10 sl sequence variants);
(3) the effect of sl sequence length on crRNA-mediated activity (changing the length of the crRNA2 sl sequence gave 5 variants, the crRNA11-15 sl sequences; increasing the number of bases in the crRNA4 sl sequence gave 3 variants, the crRNA19-21 sl sequences);
(4) the effect of methylation at different positions of the sl sequence on crRNA-mediated activity (methylating different positions of the crRNA2 sl sequence gave 3 variants, the crRNA16-18 sl sequences; methylating the crRNA4 sl sequence gave 1 variant, the crRNA22 sl sequence).
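The loop-substitution and stem-length variants in items (2) and (3) can be enumerated systematically. The actual crRNA2-22 sl sequences are in Table 11 and are not reproduced here. This sketch therefore uses a hypothetical 10-nt stem arm and the AUCG loop named for crRNA2 in the results below. It builds perfectly paired hairpins with 5-10 bp stems, plus an AUCG-to-GAAA loop swap, and counts the stem pairs (G-U wobble allowed). Real designs should be checked with a folding tool such as ViennaRNA's RNAfold. All names are hypothetical.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def revcomp_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence."""
    return seq.translate(str.maketrans("ACGU", "UGCA"))[::-1]


def hairpin(stem5: str, loop: str) -> str:
    """Perfectly paired hairpin: 5' stem arm, loop, then the reverse complement of the arm."""
    return stem5 + loop + revcomp_rna(stem5)


def stem_bp(seq: str, loop_len: int) -> int:
    """Count consecutive base pairs from the outer ends inward (G-U wobble allowed)."""
    bp = 0
    for i in range((len(seq) - loop_len) // 2):
        if (seq[i], seq[-1 - i]) not in PAIRS:
            break
        bp += 1
    return bp


BASE_STEM = "CAUGCAGCUC"   # hypothetical 10-nt 5' stem arm (not from the patent)
BASE_LOOP = "AUCG"         # loop motif named in the text for crRNA2

variants = {f"stem{k}_AUCG": hairpin(BASE_STEM[-k:], BASE_LOOP) for k in range(5, 11)}
variants["stem7_GAAA"] = hairpin(BASE_STEM[-7:], "GAAA")  # loop swap as in crRNA8

for name, seq in variants.items():
    print(f"{name}: {seq} ({stem_bp(seq, 4)} bp stem)")
```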
Based on the target sequence of the hHao1 gene (GGACAGAGGGUCAGCAUGCCAA, SEQ ID NO: 408), the corresponding crRNA sequences were designed as shown in the following table:
Table 11
m represents methylation of the nucleotides. *represents phosphorothioate modification.
CasY7 mRNA was obtained by the same method as in Example 4 (Section 6.4), and the slDR-crRNAs in Table 11 were synthesized by Nanjing GenScript.
HepG2 cells were used for experimental verification, with LNP delivery. MC3, DSPC, cholesterol, and PEG-DMG were dissolved in anhydrous ethanol at a molar ratio of 50:10:38.5:1.5. CasY7 mRNA and each slDR-crRNA (mass ratio 1:1) were dissolved in 100 mM citrate buffer, pH 4 (RNA concentration 0.2 mg/mL). The ethanolic lipid solution was mixed with the aqueous RNA buffer at a volume ratio of 1:3 (total lipid to mRNA mass ratio of 40:1). Nucleic acid lipid nanoparticles were prepared using a microfluidic nanoparticle manufacturing system (NanoAssemblr Ignite, Canada) at a flow rate of 12 ml/min. The resulting LNPs were immediately diluted 40-fold with 1× DPBS buffer.
The LNPs were transfected into HepG2 cells at different doses (including 1 ng, 2 ng, 4 ng, 5 ng, and other doses), and all cells were collected after 48 hours. Genomic DNA was extracted with a genomic DNA extraction kit (Tiangen, DP304-03).
The genomic DNA was then amplified by PCR with the following primers: upstream primer F: 5'-GTCGGCAGCGTCAGATGTGTATAAGAGACAGGCTGTATCCAAGGATGCT-3' (SEQ ID NO: 406); downstream primer R: 5'-CTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTTCTGTCCCTGTGGT-3' (SEQ ID NO: 407).
The samples were sent to a sequencing service for high-throughput sequencing. Based on the sequencing results, the corresponding editing efficiency was obtained by using the ICE Analysis online analysis program on the Synthego website, inputting the corresponding guide sequences and sequencing results. The statistical results of the editing efficiency mediated by various crRNAs are shown in FIGs. 16A-16F.
Analysis showed that, compared with slDR-crRNA (crRNA2), crRNA3-4 and crRNA6-7 mediated comparable or higher editing efficiency, whereas crRNA5 mediated relatively lower editing efficiency (FIG. 16A). Modifying the loop of the crRNA2 sl sequence showed that when the 5'-AUCG-3' sequence in the sl sequence was replaced with 5'-GAAA-3', the corresponding crRNA (crRNA8) mediated significantly higher editing efficiency (FIG. 16B).
Tests of stem length in the sl sequences of slDR-crRNA (crRNA2) and crRNA4 showed that stem length affects crRNA function. crRNAs with a 7-base-pair stem (crRNA2 and crRNA4) mediated higher editing efficiency than crRNAs with stems of other lengths (FIGs. 16C and 16D). crRNAs with a 5-base-pair stem mediated relatively low editing efficiency (crRNA12 and crRNA13 in FIG. 16C). Compared with crRNAs with 7-base-pair stems, crRNAs with stems of 6 or 8-10 base pairs mediated slightly lower editing efficiency (FIGs. 16C and 16D). This was still higher than the efficiency mediated by crRNAs with 5-base-pair stems (FIG. 16C).
Methylation experiments on slDR-crRNA (crRNA2) showed that methylating only the nucleotides near the 5' end of the sl sequence maintained or enhanced editing efficiency (FIGs. 16E and 16F). In contrast, methylating nucleotides at the 3' end of the sl sequence significantly reduced the editing efficiency mediated by the corresponding crRNA (FIG. 16E). This held both when only the 3' end was methylated and when both the 5' and 3' ends were methylated.
6.6 Example 6: Effect of SL Sequence on the Cleavage Activity of Additional Cas12 Family Members
To test whether the effect of the sl sequence generalizes, the inventors tested several other proteins from the Cas12 family. The selected Cas proteins and their DR information are shown in Table 12.
Table 12
6.6.1 Evaluation of the Improvement Effect of crRNA with sl Sequence.
hHAO1 was selected as the target gene, with the target sequence GGACAGAGGGUCAGCAUGCCAA. Different forms of stem-loop (sl) sequences were designed and tested with CasY6, LbCpf1, SiCas12i, CasΦ-2 (cas12j|CasΦ-2), and Cas12L_16_70731038, as shown in Table 13.
Table 13
CasY6, LbCpf1, SiCas12i, CasΦ-2 (cas12j|CasΦ-2) , and Cas12L_16_70731038 were each cloned into the pcDNA3.1 (+) (Invitrogen, V79020) backbone to obtain expression vectors for each Cas protein. The crRNAs were cloned into pGL3-U6-sgRNA-EGFP (Addgene, Plasmid #107721) to obtain expression vectors for each crRNA.
HEK293T cells (purchased from ATCC) were cultured in DMEM medium (Gibco, 11965092) supplemented with 10% FBS (v/v) and 1% penicillin-streptomycin (v/v) (Gibco, 15140122) at 37℃ in a 5% CO2 incubator. For transfection, cells were seeded in 24-well culture plates the day before, and transfection was performed when cells reached approximately 80% confluence.
For each well in the 24-well plate, 0.3μg of Cas protein expression plasmid, 0.3μg of crRNA expression plasmid, and 0.3μg of EGFP-C1 plasmid were used for transfection. The specific transfection procedure was as follows:
The Cas protein expression plasmid, crRNA expression plasmid, and EGFP-C1 plasmid were mixed and diluted in 25 μl of transfection-specific reduced serum medium (YuanPei Biological, L530KJ). Then 2 μl of Lipofectamine 3000 (Invitrogen, L3000015) was added and mixed well (Reagent A), and the mixture was left to stand for 5 minutes. In parallel, 2 μl of Lipofectamine 3000 transfection reagent (Invitrogen, L3000015) was diluted in 25 μl of the same reduced serum medium and mixed (Reagent B), then left to stand for 5 minutes.
Reagents A and B were mixed thoroughly and left to stand for 20 minutes. The mixture was then added dropwise to the cells in the 24-well plate, which were returned to a 37℃, 5% CO2 incubator. Six hours after transfection, the medium was changed to DMEM containing 10% FBS.
Forty-eight hours after transfection, EGFP fluorescence indicated successful transfection, and EGFP-positive cells were sorted for editing efficiency detection. Genomic DNA was extracted from these cells (genomic DNA extraction kit, TIANGEN, DP304-03). The targeted genomic region was then amplified by PCR using the genomic DNA as template. The purified PCR products were analyzed by high-throughput deep sequencing (Qingke Biotechnology Co., Ltd.) to determine editing efficiency. Table 14 shows the effect of sl sequences on crRNA-mediated cleavage efficiency. The results indicate that crRNAs with sl sequences mediate higher cleavage activity for various nucleases.
Table 14
6.6.2 Evaluation of the Effect of sl Sequences with Methylation Modifications on crRNA-Mediated Nuclease Activity
(1) The applicant selected hHAO1 as the target gene, with the target sequence GGACAGAGGGUCAGCAUGCCAA, and tested sl sequences carrying different modifications with CasY6 and SiCas12i in separate groups, as follows:
Table 15
m represents methylation of the nucleotides. *represents phosphorothioate modification.
Various crRNAs were synthesized by Nanjing GenScript Biotech, and mRNAs expressing CasY6 and SiCas12i were synthesized through in vitro transcription (IVT) .
Four-component LNP lipids (purchased from Aiweito (Shanghai) Pharmaceutical Technology Co., Ltd.) were used for delivery. Specifically, Yoltech Lipid1 (Compound 10), DSPC, cholesterol, and PEG-DMG were dissolved in anhydrous ethanol at a molar ratio of 50:10:38.5:1.5. C05440-T5 mRNA and T5-C05440 mRNA were each combined with the hHAO1-targeting hHAO-crRNA (mass ratio 1:1) in 100 mM RNase-free citrate buffer, pH 4 (RNA concentration 0.2 mg/mL). The ethanolic lipid solution was mixed with the aqueous RNA buffer at a volume ratio of 1:3 (total lipid to mRNA mass ratio of 40:1). The mixture was passed through a microfluidic nanoparticle manufacturing system (NanoAssemblr Ignite, Canada) at a flow rate of 12 ml/min to obtain nucleic acid lipid nanoparticles, which were immediately diluted 40-fold in 1× DPBS buffer.
HepG2 cells (purchased from ATCC) were cultured in DMEM medium (Gibco, 11965092) supplemented with 10% FBS (v/v) and 1% penicillin-streptomycin (v/v) (Gibco, 15140122) at 37℃ in a 5% CO2 incubator. For transfection, cells were seeded in 96-well culture plates the day before, and LNP transfection was performed when cells reached approximately 80% confluence. LNP@mRNA was added to HepG2 cells at 5, 10, 20, and 40 ng/well. Forty-eight hours after transfection, cells were collected for genomic DNA extraction (TIANGEN, DP304-03). The targeted genomic region was then amplified by PCR using the genomic DNA as template. The purified PCR products were analyzed by high-throughput deep sequencing (Qingke Biotechnology Co., Ltd.) to determine editing efficiency.
The results are shown in Table 16. Methylation at the 3' end of the sl sequence significantly reduced editing activity (crRNAs containing sl16-DR and sl18-DR mediated lower editing activity), whereas methylation at the 5' end of the sl sequence did not significantly affect the mediated editing activity. This is consistent with the results of Example 5 (Section 6.5).
Table 16
m represents methylation of the nucleotides. *represents phosphorothioate modification.
Yoltech Lipid1 (Compound 10), 5-[(2-butyl-1-oxooctyl)oxy]pentanoic acid-7-butyl-21-(10-butyl-3,9-dioxo-2,8-dioxahexadecane-1-yl)-19-[3-(diethylamino)propyl]-8-oxo-19-aza-9-oxadocosane-22-yl ester, is synthesized as follows:
Step 1: Synthesis of compound 1-2
In a 500 mL round-bottom flask, δ-valerolactone (25.00 g, 249.70 mmol, 1.0 eq), distilled water (20 mL), ethanol (200 mL), and sodium hydroxide (10.99 g, 274.67 mmol, 1.1 eq) were combined. After reaction at 70℃ for 3 hours, the solvent was removed under reduced pressure. Acetone (200 mL) was then added slowly to the flask, followed by tetrabutylammonium iodide (4.61 g, 12.48 mmol, 0.05 eq) and benzyl bromide (51.25 g, 299.64 mmol, 1.2 eq), and the reaction was continued overnight at 70℃. The reaction was quenched with 500 mL of water and extracted twice with 500 mL of ethyl acetate. The combined organic phases were washed with saturated brine, dried over anhydrous sodium sulfate, concentrated under reduced pressure, and purified by column chromatography to give 5-hydroxypentanoic acid benzyl ester (37.00 g, yield 71.2%).
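The quantities in Step 1 can be checked with mass/molecular-weight arithmetic. The stated 25.00 g / 249.70 mmol implies a starting material of about 100.1 g/mol. That matches δ-valerolactone (C5H8O2), whose ring opening and benzylation give 5-hydroxypentanoic acid benzyl ester, and the sketch assumes that identity. The molecular weights below are average values computed from molecular formulas; they are not stated in the text.

```python
def mmol(mass_g: float, mw_g_per_mol: float) -> float:
    """Millimoles from grams and molecular weight."""
    return mass_g / mw_g_per_mol * 1000.0


# Average molecular weights (g/mol) from molecular formulas; assumptions.
MW_LACTONE = 100.12        # delta-valerolactone, C5H8O2
MW_PRODUCT = 208.25        # 5-hydroxypentanoic acid benzyl ester, C12H16O3
REAGENTS = {               # name: (grams used, MW)
    "NaOH": (10.99, 40.00),
    "tetrabutylammonium iodide": (4.61, 369.37),
    "benzyl bromide": (51.25, 171.04),
}

limiting = mmol(25.00, MW_LACTONE)
print(f"starting lactone: {limiting:.2f} mmol")
for name, (grams, mw) in REAGENTS.items():
    n = mmol(grams, mw)
    print(f"{name}: {n:.2f} mmol, {n / limiting:.2f} eq")
product = mmol(37.00, MW_PRODUCT)
print(f"product: {product:.2f} mmol, yield {100 * product / limiting:.1f}%")
```

The recomputed equivalents (1.10, 0.05, 1.20) and yield (about 71.2%) agree with the values reported in Step 1.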
Step 2: Synthesis of compound 1-4
To a 500 mL round-bottom flask were added 5-hydroxypentanoic acid benzyl ester (37.00 g, 177.67 mmol, 1.0 eq), 2-butyloctanoic acid (35.59 g, 177.67 mmol, 1.0 eq), 250 mL of dichloromethane, and 4-dimethylaminopyridine (21.70 g, 177.67 mmol, 1.0 eq), followed by 1-(3-dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride (51.09 g, 266.50 mmol, 1.5 eq). After 4 hours at room temperature, the mixture was diluted with 500 mL of water and extracted twice with 500 mL of dichloromethane. The combined organic phases were washed with saturated brine, dried over anhydrous sodium sulfate, concentrated under reduced pressure, and purified by column chromatography to give 2-butyloctanoic acid-5-(benzyloxy)-5-oxopentyl ester (64.00 g, yield 92.2%).
Step 3: Synthesis of compound 1-5
To a 250 mL round-bottom flask were added 2-butyloctanoic acid-5-(benzyloxy)-5-oxopentyl ester (64.00 g, 163.87 mmol, 1.0 eq), methanol (75 mL), tetrahydrofuran (75 mL), and finally 10% Pd/C (3.49 g, 32.78 mmol, 0.2 eq). The reaction was stirred at room temperature under 1 atm of hydrogen for 16 hours. Filtration and concentration gave 5-[(2-butyl-1-oxooctyl)oxy]pentanoic acid (45.00 g, yield 91.4%).
Step 4: Synthesis of compound 1-7
At room temperature, 5-[(2-butyl-1-oxooctyl)oxy]pentanoic acid (10.00 g, 33.29 mmol, 1.0 eq), 2-hydroxymethylpropane-1,3-diol (3.53 g, 33.29 mmol, 1.0 eq), 4-dimethylaminopyridine (0.81 g, 6.66 mmol, 0.2 eq), N-(3-dimethylaminopropyl)-N'-ethylcarbodiimide hydrochloride (9.57 g, 49.94 mmol, 1.5 eq), and N,N-diisopropylethylamine (8.60 g, 66.58 mmol, 2.0 eq) were added to a round-bottom flask containing 100 mL of dichloromethane and stirred at room temperature for 4 hours. The reaction was quenched with 200 mL of water and extracted twice with 200 mL of dichloromethane. The combined organic phases were washed with brine, dried over anhydrous sodium sulfate, filtered, concentrated, and purified by column chromatography to give 2-butyloctanoic acid-18-butyl-8-(hydroxymethyl)-5,11,17-trioxo-6,10,16-trioxatetracosane-1-yl ester (7.80 g, yield 69.9%).
Step 5: Synthesis of compound 1-8
At room temperature, 2-butyloctanoic acid-18-butyl-8-(hydroxymethyl)-5,11,17-trioxo-6,10,16-trioxatetracosane-1-yl ester (3.90 g, 5.81 mmol, 1.0 eq) and triethylamine (1.76 g, 17.43 mmol, 3.0 eq) were added to 30 mL of dichloromethane. Methanesulfonic anhydride (2.02 g, 11.62 mmol, 2.0 eq) was added slowly at 0℃, and the reaction was allowed to warm to room temperature and react for 4 hours. The reaction was quenched with 30 mL of water and extracted twice with 50 mL of dichloromethane; the combined organic phases were washed with brine, dried over anhydrous sodium sulfate, filtered, concentrated, and purified by column chromatography to obtain methanesulfonic acid-12-butyl-2-(10-butyl-3,9-dioxo-2,8-dioxahexadecane-1-yl)-5,11-dioxo-4,10-dioxaoctadecane-1-yl ester (3.85 g, yield 88.4%).
Step 6: Synthesis of compound 1-10
At room temperature, compound 1-8 (600.0 mg, 0.80 mmol, 1.0 eq), 3-amino-1-propanol (300.0 mg, 3.99 mmol, 5.0 eq), potassium carbonate (280.0 mg, 2.00 mmol, 2.5 eq), and potassium iodide (130.0 mg, 0.80 mmol, 1.0 eq) were added to 10 mL of acetonitrile. The mixture was placed under nitrogen, heated to 90℃, and reacted for 16 hours. The reaction mixture was concentrated, diluted with water, and extracted three times with ethyl acetate; the combined organic phases were washed with saturated brine, dried over anhydrous sodium sulfate, concentrated, and purified by column chromatography to obtain 5-[(2-butyl-1-oxooctyl)oxy]pentanoic acid-12-butyl-2-{[(3-hydroxypropyl)amino]methyl}-5,11-dioxo-4,10-dioxaoctadecane-1-yl ester (210.0 mg, yield 36.11%). MS: m/z [M+H]+ = 728.6.
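The MS value reported for compound 1-10 can be checked against its molecular formula. The Python sketch below assumes the formula C41H77NO9, inferred from the compound name, and computes the monoisotopic m/z of the singly protonated ion from the masses of the most abundant isotopes.

```python
import re

# Monoisotopic masses of the most abundant isotopes (u) and of a proton
MONOISOTOPIC = {"C": 12.000000, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass for a simple formula like 'C41H77NO9'."""
    return sum(
        MONOISOTOPIC[element] * (int(count) if count else 1)
        for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    )


def protonated_mz(formula: str) -> float:
    """m/z of the singly charged [M+H]+ ion."""
    return monoisotopic_mass(formula) + PROTON


formula_1_10 = "C41H77NO9"  # inferred from the name of compound 1-10 (assumption)
print(f"[M+H]+ calc. {protonated_mz(formula_1_10):.1f}, reported 728.6")
```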
Step 7: Synthesis of compound 10
At room temperature, 2-butyloctanoic acid-8-(10-butyl-3,9-dioxo-2,8-dioxahexadecane-1-yl)-14-ethyl-5-oxo-10,14-diaza-6-oxahexadecane-1-yl ester (500.0 mg, 0.64 mmol, 1.0 eq), 2-butyloctanoic acid-9-bromononyl ester (390.0 mg, 0.96 mmol, 1.5 eq), potassium carbonate (270.0 mg, 1.92 mmol, 3.0 eq), and potassium iodide (110.0 mg, 0.64 mmol, 1.0 eq) were added to 20 mL of acetonitrile. The mixture was placed under nitrogen, heated to 90℃, and reacted overnight. The reaction mixture was concentrated, diluted with water, and extracted three times with dichloromethane; the combined organic phases were washed with saturated brine, dried over anhydrous sodium sulfate, concentrated, and purified by column chromatography to obtain 5-[(2-butyl-1-oxooctyl)oxy]pentanoic acid-7-butyl-21-(10-butyl-3,9-dioxo-2,8-dioxahexadecane-1-yl)-19-[3-(diethylamino)propyl]-8-oxo-19-aza-9-oxadocosane-22-yl ester (132.8 mg, yield 18.8%). MS: m/z [M+H]+ = 1107.9. 1H NMR (300 MHz, CDCl3) δ 4.15-4.01 (m, 10H), 3.44-3.20 (m, 4H), 2.71-2.50 (m, 6H), 2.39-2.23 (m, 12H), 2.02-1.40 (m, 28H), 1.38-1.22 (m, 48H), 0.92-0.75 (m, 18H).
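Two consistency checks can be run on the characterization data for compound 10: the 1H NMR integrals should sum to the number of hydrogens in the formula, and the calculated [M+H]+ should match the MS value. The Python sketch below assumes the formula C66H126N2O10, inferred from the product name. The multiplet list is transcribed from the reported spectrum.

```python
import re

# Monoisotopic masses of the most abundant isotopes (u) and of a proton
MONOISOTOPIC = {"C": 12.000000, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646

# Reported 1H NMR multiplets for compound 10: (delta high, delta low, integrated H)
NMR_MULTIPLETS = [
    (4.15, 4.01, 10), (3.44, 3.20, 4), (2.71, 2.50, 6), (2.39, 2.23, 12),
    (2.02, 1.40, 28), (1.38, 1.22, 48), (0.92, 0.75, 18),
]


def element_counts(formula: str) -> dict:
    """Count atoms per element in a simple formula such as 'C66H126N2O10'."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def protonated_mz(formula: str) -> float:
    """Monoisotopic m/z of the singly charged [M+H]+ ion."""
    counts = element_counts(formula)
    return sum(MONOISOTOPIC[e] * n for e, n in counts.items()) + PROTON


formula_10 = "C66H126N2O10"  # inferred from the product name (assumption)
integrated_h = sum(h for _, _, h in NMR_MULTIPLETS)
expected_h = element_counts(formula_10)["H"]

print(f"integrated H = {integrated_h}, formula H = {expected_h}")
print(f"[M+H]+ calc. {protonated_mz(formula_10):.1f}, reported 1107.9")
```

Both checks agree with the reported data: 126 integrated protons and a calculated [M+H]+ of 1107.9.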
The examples demonstrate that, when mediated by a DR-crRNA carrying an sl sequence, various Type V Cas protein systems achieve higher editing activity in mammalian cells, indicating that the effect of the sl sequence is broadly applicable. Further experiments showed that the following modifications to the sl sequence enhance its effect: replacing the loop sequence with 5'-GAAA-3' and limiting the stem length to about 7 nt. Methylation at the 5' end of the sl sequence either enhances its positive effect (e.g., when the methylated region includes the sl sequence) or does not affect it, whereas methylation at the 3' end of the sl sequence impairs its function (e.g., FIG. 16F shows the impact of methylation at different positions).
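The preferred sl design described here, a stem of about 7 nt closed by a 5'-GAAA-3' loop, can be screened with a simple sequence check. The Python sketch below is hypothetical. The functions `parse_hairpin` and `matches_preferred_design` are illustrative names and are not from the specification. The check assumes a perfect, contiguous hairpin, allows G-U wobble pairs, and enforces the 3-nt minimum loop recited in the claims. It does not model folding thermodynamics; a folding tool such as ViennaRNA RNAfold would be the more rigorous choice. The example sequence is invented and does not correspond to any SEQ ID NO.

```python
# Watson-Crick pairs plus G-U wobble (wobble allowance is an assumption)
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def parse_hairpin(seq: str, stem_len: int, min_loop: int = 3):
    """Split seq into (5' arm, loop, 3' arm) if its ends pair over stem_len bp, else None."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    if len(seq) < 2 * stem_len + min_loop:
        return None
    left, loop, right = seq[:stem_len], seq[stem_len:-stem_len], seq[-stem_len:]
    # Position i of the 5' arm pairs with position -(i + 1) of the 3' arm
    if all((left[i], right[-(i + 1)]) in PAIRS for i in range(stem_len)):
        return left, loop, right
    return None


def matches_preferred_design(seq: str) -> bool:
    """Hypothetical screen: a 7-bp stem closed by a 5'-GAAA-3' loop."""
    parts = parse_hairpin(seq, 7)
    return parts is not None and parts[1] == "GAAA"


candidate = "GGCAUCCGAAAGGAUGCC"  # invented example, not a sequence from the specification
print(parse_hairpin(candidate, 7))
print(matches_preferred_design(candidate))
```

For the invented candidate, the function returns the arms `GGCAUCC`/`GGAUGCC` around the `GAAA` loop. A single mismatch at the stem's closing pair makes the screen fail.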
In conclusion, the Cas12 systems containing the sl sequences disclosed in this invention show higher editing activity than their unmodified counterparts and have potential development value.
All patents and publications mentioned in this specification are incorporated herein by reference in their entireties. From the foregoing description, it will be apparent that variations and modifications can be made to the invention described herein to adapt it to various uses and conditions. Such embodiments are also within the scope of the following claims.
Claims (95)
- A CRISPR RNA (crRNA) comprising, in the 5’-to-3’ direction, a first stem-loop sequence, a connector region, and a second stem-loop sequence, wherein the first stem-loop sequence is capable of forming a first stem-loop structure having a first stem of about 3-12 base pairs and a first loop of about 3-10 nucleotides; wherein the connector comprises about 3-10 nucleotides; and wherein the second stem-loop sequence is capable of forming a second stem-loop structure having a second stem of about 3-12 base pairs and a second loop of about 3-15 nucleotides.
- The crRNA of claim 1, further comprising a spacer region 3’ to the second stem-loop sequence, wherein the spacer region is at least about 15 nucleotides in length, optionally wherein the spacer region is about 15-50 nucleotides.
- The crRNA of claim 1 or 2, further comprising a floater region 5’ to the first stem-loop sequence, wherein the floater region is at least about 1 nucleotide, 2 nucleotides, or 3 nucleotides.
- The crRNA of any one of claims 1 to 3, wherein the second stem is about 5 base pairs, and the second loop is about 5, 6, or 7 nucleotides.
- The crRNA of any one of claims 1 to 4, wherein the connector region is about 4, 5, or 6 nucleotides.
- The crRNA of any one of claims 1 to 5, wherein the first stem is about 7 or 8 base pairs, and the first loop is about 4 nucleotides.
- The crRNA of claim 6, wherein the first stem is 7 base pairs, and the first loop is about 4 nucleotides.
- The crRNA of claim 7, wherein the first loop comprises the sequence of 5’-GAAA-3’.
- The crRNA of any one of claims 1 to 8, wherein the spacer region is about 20 to 40 nucleotides.
- The crRNA of any one of claims 1 to 9, wherein the polynucleotide is an RNA molecule.
- The crRNA of any one of claims 1 to 10, wherein the first stem-loop sequence has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95% or about 97% to a sequence of any of SEQ ID NOs: 2 and 478-494; optionally wherein the first stem-loop sequence consists of a sequence of any of SEQ ID NOs: 2 and 478-494.
- The crRNA of any one of claims 1 to 11, wherein the crRNA is a single-stranded polynucleotide, wherein the single-stranded polynucleotide comprises a sequence of any of SEQ ID NOs: 73-120, 456-476, and 547-631.
- The crRNA of any one of claims 1 to 12, wherein the crRNA is methylated at one or more nucleotides.
- The crRNA of claim 13, wherein one or more nucleotides from the 5' end of the crRNA to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated.
- The crRNA of claim 14, wherein all nucleotides from the 5' end of the crRNA to the last nucleotide at 3’ end of the loop structure of the first stem-loop are methylated.
- A modified Type V CRISPR RNA (crRNA) comprising at least one stem-loop sequence connected to the 5’ end of a naturally-existing Type V crRNA or a functional derivative thereof, wherein the stem-loop sequence is capable of forming a stem-loop structure having a stem of about 3-12 base pairs and a loop of about 3-10 nucleotides.
- The modified Type V crRNA of claim 16, wherein the stem-loop sequence has sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95% or about 97% to a sequence of any one of SEQ ID NOs: 1-17 and 441; optionally wherein the stem-loop sequence comprises a sequence of any of SEQ ID NOs: 1-17 and 441; optionally wherein the stem-loop sequence consists of a sequence of any of SEQ ID NOs: 1-17 and 441.
- The modified Type V crRNA of claim 16 or 17, wherein the stem-loop sequence is connected to the 5’ end of the naturally-existing CRISPR-Type V crRNA or the functional derivative thereof via a connector sequence, and wherein the connector sequence comprises about 3-10 nucleotides.
- The modified Type V crRNA of any one of claims 16 to 18, wherein the naturally existing Type V crRNA is processed from a CRISPR array located 3’ to a Type V CRISPR-Cas locus.
- The modified Type V crRNA of any one of claims 16 to 19, wherein the naturally existing Type V crRNA comprises a sequence of any one of SEQ ID NOs: 18-70; optionally wherein the naturally existing Type V crRNA or functional derivative thereof consists of a sequence of any one of SEQ ID NOs: 18-70.
- The modified Type V crRNA of any one of claims 16 to 20, wherein the modified Type V crRNA comprises a sequence having at least about 75%, about 80%, about 85%, about 90%, about 95% or about 97% sequence identity to a sequence of any one of SEQ ID NOs: 73-120, 456-476, and 547-631; optionally wherein the modified Type V crRNA comprises a sequence of any one of SEQ ID NOs: 73-120, 456-476, and 547-631; optionally wherein the modified Type V crRNA consists of a sequence of any one of SEQ ID NOs: 73-120, 456-476, and 547-631.
- The modified Type V crRNA of any one of claims 16 to 21, wherein the modified Type V crRNA is methylated at one or more nucleotides.
- The modified Type V crRNA of claim 22, wherein one or more nucleotides from the 5' end of the modified Type V crRNA to the last nucleotide at 3’ end of the loop structure of the stem-loop are methylated.
- The modified Type V crRNA of claim 23, wherein all nucleotides from the 5' end of the crRNA to the last nucleotide at 3’ end of the loop structure of the stem-loop are methylated.
- A non-naturally occurring Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas system comprising: (a) a CRISPR effector protein or a polynucleotide encoding the CRISPR effector protein; and (b) a guide molecule or a polynucleotide encoding the guide molecule, wherein the guide molecule comprises the crRNA of any one of claims 1 to 15 or the modified Type V crRNA of any one of claims 16 to 24.
- The CRISPR-Cas system of claim 25, wherein the CRISPR effector protein comprises a RuvC-like endonuclease domain.
- The CRISPR-Cas system of claim 25 or 26, wherein the RuvC-like endonuclease domain comprises one or more RuvC motifs selected from a RuvC I motif, a RuvC II motif, and a RuvC III motif, wherein the RuvC I motif comprises the amino acid sequence X1X2X3DX4X5X6X7, wherein X1 is L, I, V, or M; X2 is G, S, or A; X3 is I, V, or L; X4 is L or R; X5 is G or N; X6 is E, Q, I, or L; X7 is R, T, K, or N, wherein the RuvC II motif comprises the amino acid sequence X1X2X3EX4X5, wherein X1 is I, V, or L; X2 is V or A; X3 is L, I, M, F, or V; X4 is D, N, K, or S; X5 is L, A, or D, wherein the RuvC III motif comprises the amino acid sequence X1X2DXX3X4X5XX6X7X8, wherein X1 is D, N, or H; X2 is A, S, R, or G; X is any amino acid; X3 is N, V, I, or E; X4 is A, G, S, or K; X5 is A or S; X6 is N, H, V, or G; X7 is I, L, or V; X8 is A, G, or L.
- The CRISPR-Cas system of any one of claims 25 to 27, wherein the CRISPR effector protein does not contain an HNH-like domain.
- The CRISPR-Cas system of any one of claims 25 to 28, wherein the CRISPR effector protein comprises a zinc-finger protein domain, optionally the Zinc finger domain is inserted in the RuvC-like endonuclease domain.
- The CRISPR-Cas system of any one of claims 25 to 29, wherein the CRISPR effector protein further comprises a wedge (WED) domain.
- The CRISPR-Cas system of any one of claims 25 to 30, wherein the CRISPR effector protein further comprises a REC domain.
- The CRISPR-Cas system of any one of claims 25 to 31, wherein the CRISPR effector protein is less than about 1100 amino acids in length.
- The CRISPR-Cas system of any one of claims 25 to 32, wherein the CRISPR effector protein is capable of recognizing a T-rich protospacer adjacent motif (PAM); optionally wherein the T-rich PAM comprises the nucleic acid sequence 5’-TTN-3’ or 5’-NTN-3’, wherein N is selected from A, T, C, G, and U; and optionally wherein the PAM is selected from 5’-TTA-3’, 5’-TTT-3’, 5’-TTG-3’, 5’-TTC-3’, 5’-ATA-3’, and 5’-ATG-3’.
- The CRISPR-Cas system of claim 25, wherein the CRISPR effector protein is a Type V CRISPR effector protein.
- The CRISPR-Cas system of claim 34, wherein the Type V CRISPR effector protein is selected from Cas12a (Cpf1), Cas12b1 (C2c1), Cas12b2, Cas12c (C2c3), Cas12d (CasY), Cas12e (CasX), Cas12f1 (Cas14a), Cas12f2 (Cas14b), Cas12f3 (Cas14c), Cas12g, Cas12h, Cas12i, Cas12j (CasΦ-2), Cas12k (C2c5), Cas12l, C2c4, C2c8, C2c9, and C2c10, or a functional derivative thereof.
- The CRISPR-Cas system of claim 34, wherein the Type V CRISPR effector protein is selected from Cas12a (Cpf1), Cas12d (CasY), Cas12f2 (Cas14b), Cas12f3 (Cas14c), Cas12h, Cas12i, Cas12j (CasΦ-2), Cas12k (C2c5), Cas12l, C2c4, C2c8, C2c9, and C2c10, or a functional derivative thereof.
- The CRISPR-Cas system of claim 34, wherein the Type V CRISPR effector protein comprises the amino acid sequence selected from any one of SEQ ID NOs: 168-381 and 436, or a functional derivative having sequence identity of at least about 70%, about 75%, about 80%, about 85%, about 90%, about 95% or about 97% thereto.
- The CRISPR-Cas system of any one of claims 25 to 37, wherein the CRISPR effector protein is fused to a signal peptide, a nuclear localization signal (NLS), or a nuclear export signal (NES).
- The CRISPR-Cas system of any one of claims 25 to 37, wherein the CRISPR effector protein is fused to a deaminase catalytic domain, a DNA methylation catalytic domain, a DNA demethylation catalytic domain, a histone residue modification domain, a nuclease catalytic domain, a fluorescent protein, or a transcription modification factor; optionally wherein the deaminase catalytic domain is selected from the group consisting of an adenosine deaminase catalytic domain and a cytidine deaminase catalytic domain.
- The CRISPR-Cas system of any one of claims 26 to 37, wherein the CRISPR effector protein is fused to a reverse transcriptase, and wherein the CRISPR-Cas system further comprises a donor template nucleic acid, wherein optionally the donor template nucleic acid is a DNA or RNA.
- The CRISPR-Cas system of any one of claims 25 to 37, wherein the CRISPR effector protein is fused to a DNA methylation domain, a histone residue modification domain, a localization factor, a transcription modification factor, a light-gated control factor, a chemically inducible factor, or a chromatin visualization factor.
- The CRISPR-Cas system of any one of claims 26 to 41, wherein the polynucleotide encoding the CRISPR effector protein and/or the polynucleotide encoding the guide molecule are mRNA molecules.
- The CRISPR-Cas system of claim 42, wherein the mRNA encoding the CRISPR effector protein and the mRNA encoding the guide molecule are present in a delivery system selected from the group consisting of a lipid nanoparticle, a liposome, an exosome, a microvesicle, and a gene gun.
- The CRISPR-Cas system of any one of claims 25 to 41, wherein the polynucleotide encoding the CRISPR effector protein and/or the polynucleotide encoding the guide molecule are operably linked to a promoter.
- The CRISPR-Cas system of claim 44, wherein the polynucleotide encoding the CRISPR effector protein and/or the polynucleotide encoding the guide molecule are in a vector selected from a retroviral vector, a lentiviral vector, a phage vector, an adenoviral vector, an adeno-associated vector, and a herpes simplex vector.
- The CRISPR-Cas system of any one of claims 25 to 45, wherein the system lacks a tracrRNA.
- The CRISPR-Cas system of any one of claims 25 to 45, wherein the system further comprises a target DNA or a nucleic acid encoding the target DNA, wherein the target DNA comprises a sequence that is capable of hybridizing to the spacer region of the guide molecule.
- The CRISPR-Cas system of any one of claims 25 to 45, wherein the CRISPR effector protein and the guide molecule form a complex that associates with the target nucleic acid, thereby modifying the target nucleic acid.
- The CRISPR-Cas system of any one of claims 25 to 45, wherein the spacer region is between about 15 and about 50 nucleotides in length.
- A cell comprising the system of any one of claims 25 to 49.
- The cell of claim 50, wherein the cell is a eukaryotic cell.
- The cell of claim 51, wherein the cell is a prokaryotic cell.
- A method of targeting and nicking a non-spacer complementary strand of a double-stranded target nucleic acid upon recognition of a spacer complementary strand of the double-stranded target nucleic acid, the method comprising contacting the double-stranded target DNA with a system of any one of claims 26 to 49.
- A method of targeting and cleaving a double-stranded target nucleic acid, comprising contacting the double-stranded target DNA with a system of any one of claims 26 to 49.
- The method of claim 54, wherein a non-spacer complementary strand of the double-stranded target nucleic acid is nicked before the spacer complementary strand of the double-stranded target nucleic acid is nicked.
- The method of claim 54, wherein both strands of target DNA are cleaved at different sites, resulting in a staggered cut.
- The method of claim 54, wherein both strands of target DNA are cleaved at the same site, resulting in a blunt double-strand break.
- A method of targeting and cleaving a single-stranded target DNA, the method comprising contacting the target nucleic acid with a system of any one of claims 26 to 49.
- A method of detecting a target nucleic acid in a sample, the method comprising: (a) contacting the sample with a system of any one of claims 26 to 49 under a suitable condition to form a ternary complex comprising the CRISPR effector protein, the guide molecule, and the target nucleic acid; (b) contacting a labeled detector nucleic acid that is single-stranded and does not hybridize with the guide molecule; and (c) measuring a detectable signal produced by cleavage of the labeled detector by the CRISPR effector protein, thereby detecting the target DNA.
- The method of claim 59, further comprising comparing a level of the detectable signal with a reference signal level, and determining an amount of target nucleic acid in the sample based on the level of the detectable signal.
- The method of claim 60, wherein the measuring is performed using gold nanoparticle detection, fluorescence polarization, colloid phase transition/dispersion, electrochemical detection, or semiconductor-based sensing.
- The method of claim 61, wherein the labeled reporter nucleic acid comprises a fluorescence-emitting dye pair, a fluorescence resonance energy transfer (FRET) pair, or a quencher/fluorophore pair, wherein cleavage of the labeled reporter nucleic acid by the effector protein results in an increase or a decrease of the amount of signal produced by the labeled reporter nucleic acid.
- A method of specifically editing a double-stranded nucleic acid, the method comprising contacting, under sufficient conditions and for a sufficient amount of time, (a) a CRISPR effector protein and one other enzyme with sequence-specific nicking activity, and a guide molecule that guides the CRISPR effector protein to nick the opposing strand relative to the activity of the other sequence-specific nickase; and (b) the double-stranded nucleic acid; wherein the method results in the formation of a double-stranded break.
- A method of editing a double-stranded nucleic acid, the method comprising contacting, under sufficient conditions and for a sufficient amount of time, (a) a fusion protein comprising a CRISPR effector protein and a protein domain with DNA modifying activity and a guide molecule targeting the double-stranded nucleic acid; and (b) the double-stranded nucleic acid; wherein the CRISPR effector protein of the fusion protein is modified to nick a non-target strand of the double-stranded nucleic acid.
- A method of inducing genotype-specific or transcriptional-state-specific cell death or dormancy in a cell, the method comprising contacting a cell with a system of any one of claims 26 to 49, wherein the guide molecule hybridizing to the target DNA causes a collateral DNase activity-mediated cell death or dormancy.
- The method of claim 65, wherein the cell is a prokaryotic cell.
- The method of claim 65, wherein the cell is a eukaryotic cell.
- The method of claim 67, wherein the cell is a mammalian cell.
- The method of claim 68, wherein the cell is a cancer cell.
- The method of claim 65, wherein the cell is an infectious cell or a cell infected with an infectious agent.
- The method of claim 70, wherein the cell is a cell infected with a virus, a cell infected with a prion, a fungal cell, a protozoan, or a parasite cell.
- A method of treating a condition or disease in a subject in need thereof, the method comprising administering to the subject a system of any one of claims 26 to 49, wherein the spacer sequence is complementary to at least 15 nucleotides of a target nucleic acid associated with the condition or disease; wherein the CRISPR effector protein associates with the guide molecule to form a complex; wherein the complex binds to a target nucleic acid sequence that is complementary to the at least 15 nucleotides of the spacer sequence; and wherein upon binding of the complex to the target nucleic acid sequence the CRISPR effector protein cleaves the target nucleic acid, thereby treating the condition or disease in the subject.
- The method of claim 72, wherein the condition or disease is a cancer or an infectious disease.
- The method of claim 73, wherein the condition or disease is selected from the group consisting of Cystic Fibrosis, Duchenne Muscular Dystrophy, Becker Muscular Dystrophy, Alpha-1-antitrypsin Deficiency, Pompe Disease, Myotonic Dystrophy, Huntington Disease, Fragile X Syndrome, Friedreich's ataxia, Amyotrophic Lateral Sclerosis, Frontotemporal Dementia, Hereditary Chronic Kidney Disease, Hyperlipidemia, Hypercholesterolemia, Leber Congenital Amaurosis, Sickle Cell Disease, Beta Thalassemia, Familial Hypercholesterolemia (FH), Transthyretin Amyloidosis (ATTR), Primary Hyperoxaluria (PH1), Hereditary Angioedema (HAE), and Atherosclerotic Cardiovascular Disease (ASCVD).
- The method of claim 73, wherein the condition or disease is cancer, and wherein the cancer is selected from the group consisting of Wilms' tumor, Ewing sarcoma, a neuroendocrine tumor, a glioblastoma, a neuroblastoma, a melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, renal cancer, pancreatic cancer, lung cancer, biliary cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid carcinoma, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphoblastic leukemia, acute myelogenous leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, and urinary bladder cancer.
- The method of claim 73, wherein the condition or disease is infectious, and wherein the infectious agent is selected from the group consisting of human immunodeficiency virus (HIV), herpes simplex virus-1 (HSV1), herpes simplex virus-2 (HSV2), and Hepatitis B.
- The system of any one of claims 25 to 49 or the cell of any one of claims 50 to 52 for use as a medicament.
- The system of any one of claims 25 to 49 or the cell of any one of claims 50 to 52 for use in the treatment or prevention of a cancer or an infectious disease.
- The system or the cell for use in accordance with claim 78, wherein the cancer is selected from the group consisting of Wilms' tumor, Ewing sarcoma, a neuroendocrine tumor, a glioblastoma, a neuroblastoma, a melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, renal cancer, pancreatic cancer, lung cancer, biliary cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid carcinoma, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphoblastic leukemia, acute myelogenous leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, and urinary bladder cancer.
- Use of the system in accordance with any one of claims 25 to 49 or the cell of any one of claims 50 to 52 for an in vitro or ex vivo method of: a) targeting and editing a target nucleic acid; b) non-specifically degrading single-stranded DNA upon recognition of a DNA target nucleic acid; c) targeting and nicking a non-spacer complementary strand of a double-stranded target DNA upon recognition of a spacer complementary strand of the double-stranded target DNA; d) targeting and cleaving a double-stranded target DNA; e) detecting a target nucleic acid in a sample; f) specifically editing a double-stranded nucleic acid; g) base editing a double-stranded nucleic acid; h) inducing genotype-specific or transcriptional-state-specific cell death or dormancy in a cell; i) creating an indel in a double-stranded target DNA; j) inserting a sequence into a double-stranded target DNA; or k) deleting or inverting a sequence in a double-stranded target DNA.
- Use of the system in accordance with any one of claims 25 to 49 or the cell of any one of claims 50 to 52 in a method of: a) targeting and editing a target nucleic acid; b) non-specifically degrading single-stranded DNA upon recognition of a DNA target nucleic acid; c) targeting and nicking a non-spacer complementary strand of a double-stranded target DNA upon recognition of a spacer complementary strand of the double-stranded target DNA; d) targeting and cleaving a double-stranded target DNA; e) detecting a target nucleic acid in a sample; f) specifically editing a double-stranded nucleic acid; g) base editing a double-stranded nucleic acid; h) inducing genotype-specific or transcriptional-state-specific cell death or dormancy in a cell; i) creating an indel in a double-stranded target DNA; j) inserting a sequence into a double-stranded target DNA; or k) deleting or inverting a sequence in a double-stranded target DNA, wherein the method does not comprise a process for modifying the germ line genetic identity of a human being and does not comprise a method of treatment of the human or animal body.
- The method of claim 54 or 72, wherein cleaving the target DNA or target nucleic acid results in the formation of an indel.
- The method of claim 54 or 72, wherein cleaving the target DNA or target nucleic acid results in the insertion of a nucleic acid sequence.
- The method of claim 54 or 72, wherein cleaving the target DNA or target nucleic acid comprises cleaving the target DNA or target nucleic acid in two sites, and results in the deletion or inversion of a sequence between the two sites.
- A eukaryotic cell comprising a modified target locus of interest, wherein the target locus of interest has been modified according to a method or via use of a composition of any one of the preceding claims.
- The eukaryotic cell of claim 85, wherein the modification of the target locus of interest results in: (i) the eukaryotic cell comprising altered expression of at least one gene product; (ii) the eukaryotic cell comprising altered expression of at least one gene product, wherein the expression of the at least one gene product is increased; (iii) the eukaryotic cell comprising altered expression of at least one gene product, wherein the expression of the at least one gene product is decreased; or (iv) the eukaryotic cell comprising an edited genome.
- The eukaryotic cell of claim 85 or 86, wherein the eukaryotic cell comprises a mammalian cell.
- The eukaryotic cell of claim 87, wherein the mammalian cell comprises a human cell.
- A eukaryotic cell line of or comprising the eukaryotic cell of any one of claims 85 to 87, or progeny thereof.
- A multicellular organism comprising one or more cells according to any one of claims 85 to 87.
- A plant or animal model comprising one or more cells according to any one of claims 85 to 87.
- A method of producing a plant having a modified trait of interest encoded by a gene of interest, the method comprising contacting a plant cell with a system according to any one of claims 25 to 49, thereby either modifying or introducing said gene of interest, and regenerating a plant from the plant cell.
- A method of identifying a trait of interest in a plant, wherein the trait of interest is encoded by a gene of interest, the method comprising contacting a plant cell with a system according to any one of claims 25 to 49, thereby identifying the gene of interest.
- The method of claim 93, further comprising introducing the identified gene of interest into a plant cell or plant cell line or plant germ plasm and generating a plant therefrom, whereby the plant contains the gene of interest.
- The method of claim 94, wherein the plant exhibits the trait of interest.
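The RuvC motif definitions and the T-rich PAM patterns recited in the claims can be translated directly into regular expressions. This is handy for screening candidate effector sequences or target DNA. The Python sketch below is hypothetical. The function names and the demo peptide are invented for illustration and are not sequences from the specification. In the RuvC III pattern, `.` stands for the claimed "X is any amino acid". The PAM search scans only the strand it is given. A real protospacer search for a Type V effector would also scan the reverse complement and confirm that a full-length spacer target lies 3' of each PAM.

```python
import re

# Claimed RuvC motifs as regular expressions ('.' = any amino acid X)
RUVC_MOTIFS = {
    "RuvC I": re.compile(r"[LIVM][GSA][IVL]D[LR][GN][EQIL][RTKN]"),
    "RuvC II": re.compile(r"[IVL][VA][LIMFV]E[DNKS][LAD]"),
    "RuvC III": re.compile(r"[DNH][ASRG]D.[NVIE][AGSK][AS].[NHVG][ILV][AGL]"),
}


def find_ruvc_motifs(protein: str) -> dict:
    """Map each motif name to its (0-based start, matched peptide) hits."""
    protein = protein.upper()
    return {
        name: [(m.start(), m.group()) for m in pattern.finditer(protein)]
        for name, pattern in RUVC_MOTIFS.items()
    }


def find_pam_sites(dna: str, pam: str = "TTN") -> list:
    """0-based starts of PAM matches on the given strand only; N means any base."""
    pattern = pam.upper().replace("N", "[ACGT]")
    # A zero-width lookahead reports overlapping sites (e.g. both sites in 'TTTA')
    return [m.start() for m in re.finditer(f"(?={pattern})", dna.upper())]


demo_protein = "MSKLGIDLGERGGSIVLEDLPQDADKNAAKNIAGS"  # invented test peptide
for name, hits in find_ruvc_motifs(demo_protein).items():
    print(name, hits)
print(find_pam_sites("ATTTAG"))
print(find_pam_sites("ATTTAG", "NTN"))
```

For the invented peptide, each motif is found exactly once. The two PAM calls report `[1, 2]` for 5'-TTN-3' and `[0, 1, 2]` for the looser 5'-NTN-3'.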
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2024084836 | 2024-03-29 | | |
| CNPCT/CN2024/084836 | 2024-03-29 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025201481A1 (en) | 2025-10-02 |
Family ID: 97218500
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2025/085450 (Pending, published as WO2025201481A1) | Crispr-cas systems | 2024-03-29 | 2025-03-27 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025201481A1 (en) |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160340660A1 (en) * | 2013-12-12 | 2016-11-24 | The Broad Institute Inc. | Crispr-cas systems, crystal structure and uses thereof |
| WO2017155714A1 (en) * | 2016-03-11 | 2017-09-14 | Pioneer Hi-Bred International, Inc. | Novel cas9 systems and methods of use |
| CN112041444A (en) * | 2018-03-14 | 2020-12-04 | 阿伯生物技术公司 | Novel CRISPR DNA targeting enzymes and systems |
| CN114269912A (en) * | 2019-06-14 | 2022-04-01 | 阿伯生物技术公司 | Novel CRISPR DNA targeting enzymes and systems |
| CN114729011A (en) * | 2019-08-27 | 2022-07-08 | 阿伯生物技术公司 | Novel CRISPR DNA targeting enzyme and system |
| CN116590257A (en) * | 2020-02-28 | 2023-08-15 | 辉大(上海)生物科技有限公司 | VI-E type and VI-F type CRISPR-Cas system and application thereof |
| WO2023241669A1 (en) * | 2022-06-16 | 2023-12-21 | 尧唐(上海)生物科技有限公司 | Crispr-cas effector protein, gene editing system therefor, and application |
- 2025-03-27: PCT/CN2025/085450 filed (WO2025201481A1), active, Pending
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 25775392; Country of ref document: EP; Kind code of ref document: A1 |