WO2024240053A1 - Gene editing protein, corresponding gene editing system thereof, and use thereof - Google Patents
- Publication number
- WO2024240053A1 (PCT/CN2024/093739)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- protein
- nucleic acid
- gene editing
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K38/00—Medicinal preparations containing peptides
- A61K38/16—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
- A61K38/43—Enzymes; Proenzymes; Derivatives thereof
- A61K38/46—Hydrolases (3)
- A61K38/465—Hydrolases (3) acting on ester bonds (3.1), e.g. lipases, ribonucleases
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K48/00—Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
- A61K48/005—Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy characterised by an aspect of the 'active' part of the composition delivered, i.e. the nucleic acid delivered
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61P—SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
- A61P25/00—Drugs for disorders of the nervous system
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61P—SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
- A61P27/00—Drugs for disorders of the senses
- A61P27/02—Ophthalmic agents
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61P—SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
- A61P27/00—Drugs for disorders of the senses
- A61P27/16—Otologicals
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61P—SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
- A61P31/00—Antiinfectives, i.e. antibiotics, antiseptics, chemotherapeutics
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61P—SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
- A61P35/00—Antineoplastic agents
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61P—SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
- A61P35/00—Antineoplastic agents
- A61P35/02—Antineoplastic agents specific for leukemia
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6816—Hybridisation assays characterised by the detection means
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
Definitions
- the present invention relates to the field of gene editing, and in particular, to a gene editing protein, a corresponding gene editing system and applications.
- CRISPR: clustered regularly interspaced short palindromic repeats
- the Cas9 protein, with the assistance of a trans-encoded small RNA (tracrRNA), can process pre-crRNA into mature crRNA that binds to the tracrRNA. It was later found that, with an artificially constructed single-stranded chimeric guide RNA (gRNA) that mimics the crRNA-tracrRNA complex, the Cas9 protein can effectively mediate recognition and cutting of the target.
- the three bases adjacent to the 3′ end of the target must be in the form of 5′-NGG-3′, thus forming the PAM (protospacer adjacent motif) structure required for the Cas/crRNA complex to recognize the target.
- Cas9 requires two RNAs as guide RNAs.
- the main purpose of the present invention is to provide a new CRISPR/Cas system with diverse characteristics.
- Another object of the present invention is to discover new CRISPR-Cas systems, provide alternative and robust systems and techniques for targeting nucleic acids or polynucleotides, and address the shortcomings of currently known CRISPR-Cas systems.
- the first aspect of the present invention provides a gene editing protein, wherein the protein is selected from the following group:
- a polypeptide having ≥80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 99.5% homology (or identity) with the amino acid sequence of SEQ ID NO:1, and the polypeptide has the biological function of SEQ ID NO:1;
- the gene editing protein is an effector protein in the CRISPR/Cas system.
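The identity thresholds above can be illustrated with a minimal sketch. It assumes the two sequences have already been globally aligned by some external tool (gaps written as `-`), and it uses one common definition of percent identity (identical positions divided by aligned columns). The source does not state which alignment method or identity definition applies, so the function names and the definition here are illustrative assumptions, and the toy sequences are not SEQ ID NO:1.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: identity = identical residues / aligned columns, where columns
    that are gaps in both sequences are ignored. Other definitions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column with gaps on both sides carries no information
        columns += 1
        if a == b:
            matches += 1  # neither side is a gap here, so this is a true match
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 80.0) -> bool:
    """True if the alignment reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, a toy alignment of `MKTA-` against `MKTAY` has four identical columns out of five, which sits exactly at the 80% boundary under this definition.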
- the second aspect of the present invention provides a fusion protein comprising the gene editing protein described in the first aspect of the present invention; and one or more functional domains.
- the functional domain is selected from a localization signal, a reporter protein, a Cas protein targeting portion, a DNA binding domain, an epitope tag, a transcription activation domain, a transcription repression domain, a nuclease, a deamination domain, a methylase, a demethylase, a transcription release factor, an HDAC, a cleavage active polypeptide, a ligase, an integrase, a transposase, a recombinase, a polymerase, and a base excision repair inhibitor (such as a uracil-DNA glycosylase inhibitor (UGI)).
- the functional domain includes one or more of the following enzymatic activities on the target sequence: methylase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylation activity, deSUMOylation activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase) and deglycosylation activity.
- the functional domain is selected from adenosine deaminase catalytic domain or cytidine deaminase catalytic domain.
- the adenosine deaminase catalytic domain or the cytidine deaminase catalytic domain includes one or more of ADAR1, ADAR2, APOBEC, AID or TAD.
- the functional domain is the full length or functional fragment of TadA8e.
- the localization signal includes a nuclear localization signal (NLS) and/or a nuclear export signal (NES).
- the sequence of the nuclear localization signal is located at, near or close to the end (e.g., N-terminus or C-terminus) of the protein according to claim 1.
- the nuclear export signal includes protein tyrosine kinase 2 (such as human protein tyrosine kinase 2).
- the reporter protein includes glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), β-galactosidase, β-glucuronidase, and autofluorescent protein.
- the autofluorescent protein includes green fluorescent protein (e.g., GFP, GFP-2, tagGFP, turboGFP, eGFP, CopGFP, AceGFP, etc.), HcRed, DsRed, cyan fluorescent protein (e.g., eCFP, Cerulean, CyPet, AmCyan1, etc.), yellow fluorescent protein (e.g., YFP, eYFP, Citrine, Venus, YPet, PhiYFP, etc.), and blue fluorescent protein (e.g., eBFP, eBFP2, Azurite, mKalama1, GFPuv, Sapphire, T-sapphire).
- the DNA binding domain includes methylation binding protein, LexADBD, and Gal4DBD.
- the epitope tag includes a histidine tag, a V5 tag, a FLAG tag, an influenza virus hemagglutinin tag, a Myc tag, a VSV-G tag, a thioredoxin tag, or a streptavidin tag.
- the transcriptional activation domain includes VP64 and/or VPR.
- the transcriptional repression domain includes KRAB and/or SID.
- the nuclease includes FokI.
- the cleavage active polypeptide includes a polypeptide having single-stranded RNA cleavage activity, a polypeptide having double-stranded RNA cleavage activity, a polypeptide having single-stranded DNA cleavage activity or a polypeptide having double-stranded DNA cleavage activity.
- the ligase includes DNA ligase and/or RNA ligase.
- the functional domain is connected to the N-terminus and/or C-terminus of the gene editing protein.
- the functional domain is inserted between the N-terminus and the C-terminus of the gene editing protein.
- the one or more functional domains are optionally connected to the N-terminus and/or C-terminus of the gene editing protein via a linker.
- the functional domain is inserted between the N-terminus and the C-terminus of the gene editing protein through a linker.
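The terminal attachment options described above (functional domains at the N-terminus and/or C-terminus, optionally joined through a linker) can be sketched as simple string assembly. This is only an illustration of the ordering: the function name, the parameter names, and the toy tokens in the usage note are hypothetical placeholders, not sequences from the source, and the sketch does not cover insertion inside the protein.

```python
from typing import Sequence


def assemble_fusion(editor: str,
                    n_term_domains: Sequence[str] = (),
                    c_term_domains: Sequence[str] = (),
                    linker: str = "") -> str:
    """Concatenate domains around an editing protein, N-terminus to C-terminus.

    Assumption: the same linker (possibly empty, i.e. a direct bond) is placed
    between every pair of adjacent parts.
    """
    if not editor:
        raise ValueError("the editing protein sequence must not be empty")
    # Order is N-terminal domains, then the editor, then C-terminal domains.
    parts = [*n_term_domains, editor, *c_term_domains]
    return linker.join(parts)
```

Calling `assemble_fusion("EDITOR", n_term_domains=["NLS"], c_term_domains=["TAG"], linker="GGS")` joins the placeholder tokens in N-to-C order with the linker between each pair.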
- the fusion protein has the following structure from N-terminus to C-terminus:
- Z1 is cytosine deaminase or adenosine deaminase
- Z2 is the gene editing protein described in the first aspect of the present invention.
- Z3 is the N-terminal fragment of the gene editing protein described in the first aspect of the present invention.
- Z4 is the C-terminal fragment of the gene editing protein described in the first aspect of the present invention.
- each "-" is independently a bond or a linker.
- the third aspect of the present invention provides an isolated polynucleotide, which encodes the gene editing protein described in the first aspect of the present invention or the fusion protein described in the second aspect of the present invention.
- polynucleotide is selected from the following group:
- the polynucleotide further contains auxiliary elements selected from the following groups on the flank of the ORF of the variant: a signal peptide, a secretory peptide, a tag sequence (such as 6His), or a combination thereof.
- the polynucleotide is selected from the following group: genomic sequence, cDNA sequence, RNA sequence, or a combination thereof.
- the polynucleotide further comprises a promoter operably linked to the ORF sequence of the variant.
- the promoter is selected from the following group: a constitutive promoter, a tissue-specific promoter, an inducible promoter, or a strong promoter.
- the host cell includes a prokaryotic cell or a eukaryotic cell.
- the host cell is a eukaryotic cell, such as a yeast cell, a plant cell or a mammalian cell (including human and non-human mammals).
- the host cell is a prokaryotic cell, such as Escherichia coli.
- the yeast cell is selected from yeast of one or more sources of the following group: Pichia pastoris, Kluyveromyces, or a combination thereof; preferably, the yeast cell includes: Kluyveromyces, more preferably Kluyveromyces marxianus, and/or Kluyveromyces lactis.
- the host cell is selected from the following group: Escherichia coli, wheat germ cells, insect cells, SF9, Hela, HEK293, CHO, yeast cells, or a combination thereof.
- the fourth aspect of the present invention provides an isolated nucleic acid molecule comprising or consisting of a sequence selected from the following:
- sequence described in any one of (ii) to (v) substantially retains the biological function of the sequence from which it is derived;
- the isolated nucleic acid molecule is RNA
- the isolated nucleic acid molecule comprises a direct repeat sequence in a CRISPR/Cas system.
- the nucleic acid molecule comprises one or more stem loops or optimized secondary structures
- sequence of any of (ii)-(v) retains the secondary structure of the sequence from which it is derived.
- the nucleic acid molecule comprises a sequence selected from the following, or consists of a sequence selected from the following:
- the fifth aspect of the present invention provides a guide RNA (gRNA), which includes a direct repeat (DR) sequence capable of binding to the gene editing protein described in the first aspect of the present invention and a spacer sequence capable of targeting the target sequence.
- the sixth aspect of the present invention provides a complex comprising:
- nucleic acid component selected from the group consisting of the guide RNA described in the fifth aspect of the present invention, a nucleic acid encoding the guide RNA described in the fifth aspect of the present invention, a precursor RNA of the guide RNA described in the fifth aspect of the present invention, a precursor RNA nucleic acid encoding the guide RNA described in the fifth aspect of the present invention, or a combination thereof;
- the protein component and the nucleic acid component are combined with each other to form a complex.
- the direct repeat (DR) sequence in the guide RNA (gRNA) is connected to the 3’ end or 5’ end of the nucleic acid molecule.
- the spacer sequence in the guide RNA comprises a complementary sequence to the target sequence.
- the seventh aspect of the present invention provides a vector comprising the polynucleotide described in the third aspect of the present invention or the nucleic acid molecule described in the fourth aspect of the present invention.
- the vector comprises:
- a first regulatory element which is operably linked to a nucleotide sequence encoding the gene editing protein described in the first aspect of the present invention or a nucleotide sequence encoding the fusion protein described in the second aspect of the present invention;
- the guide RNA comprises:
- a direct repeat (DR) sequence connected to the spacer sequence, capable of guiding the gene editing protein described in the first aspect of the present invention to bind to the guide RNA to form the complex described in the sixth aspect of the present invention that targets the target sequence.
- the first regulatory element and the second regulatory element are located on the same or different vectors.
- the first regulatory element and/or the second regulatory element is a promoter, such as an inducible promoter.
- the vector comprises one or more promoters, which are operably connected to the nucleic acid sequence, enhancer, transcription termination signal, polyadenylation sequence, replication origin, selective marker, nucleic acid restriction site, and/or homologous recombination site.
- the vector includes a plasmid or a viral vector.
- the viral vector is selected from the following group: adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, herpes virus, SV40, poxvirus, or a combination thereof.
- the vector includes a cloning vector, a transformation vector, an expression vector, a shuttle vector, an integration vector, and a multifunctional vector.
- the eighth aspect of the present invention provides a CRISPR-Cas composition, comprising:
- a first component selected from the group consisting of the gene editing protein described in the first aspect of the present invention, the fusion protein described in the second aspect of the present invention, a nucleotide sequence encoding the gene editing protein described in the first aspect of the present invention or the fusion protein described in the second aspect of the present invention, and any combination thereof;
- a second component wherein the second component is a nucleotide sequence comprising one or more guide RNAs according to the fifth aspect of the present invention, or encoding the nucleotide sequence comprising one or more guide RNAs according to the fifth aspect of the present invention;
- the guide RNA is capable of forming a complex with the protein, protein variant or fusion protein described in (i).
- the guide RNA comprises a direct repeat sequence and a spacer sequence from the 5' to the 3' direction, and the spacer sequence is capable of hybridizing with the target sequence.
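The 5' to 3' layout described above (direct repeat first, spacer second, with the spacer able to hybridize to the target) can be sketched as follows. The sketch assumes the caller supplies the DNA strand that the spacer should base-pair with, written 5' to 3', and derives the spacer as its RNA reverse complement. The function names are hypothetical, and any direct repeat passed in is a placeholder chosen by the caller, not the direct repeat defined in the source.

```python
# DNA base -> pairing RNA base (A:U, C:G, G:C, T:A)
_DNA_TO_PAIRING_RNA = str.maketrans("ACGT", "UGCA")


def spacer_for_target(target_strand_dna: str) -> str:
    """Return the RNA spacer (5'->3') complementary to a DNA strand (5'->3')."""
    dna = target_strand_dna.upper()
    if not dna or set(dna) - set("ACGT"):
        raise ValueError("target must be a non-empty string of A, C, G, T")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return dna.translate(_DNA_TO_PAIRING_RNA)[::-1]


def build_guide_rna(direct_repeat_rna: str, spacer_rna: str) -> str:
    """Join direct repeat and spacer in 5'->3' order (spacer on the 3' side)."""
    dr = direct_repeat_rna.upper()
    spacer = spacer_rna.upper()
    for name, seq in (("direct repeat", dr), ("spacer", spacer)):
        if not seq or set(seq) - set("ACGU"):
            raise ValueError(f"{name} must be a non-empty string of A, C, G, U")
    return dr + spacer
```

With a toy target strand `ATGC`, `spacer_for_target` yields `GCAU`, and `build_guide_rna("AAUU", "GCAU")` yields `AAUUGCAU`.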
- the direct repeat sequence is the nucleic acid molecule defined in the fourth aspect of the present invention.
- composition further comprises a pharmaceutically acceptable carrier.
- the composition comprises a pharmaceutical composition.
- the dosage form of the composition is selected from the following group: a lyophilized preparation, a liquid preparation, or a combination thereof.
- the composition is in the form of a liquid preparation.
- the composition is in the form of an injection.
- the composition is a cell preparation.
- a ninth aspect of the present invention provides a CRISPR-Cas system, comprising one or more vectors, wherein the one or more vectors comprise:
- a first nucleic acid which is a nucleotide sequence encoding the gene editing protein described in the first aspect of the present invention or the fusion protein described in the second aspect of the present invention; optionally, the first nucleic acid is operably linked to a first regulatory element;
- a second nucleic acid encoding a nucleotide sequence comprising the guide RNA according to the fifth aspect of the present invention.
- the second nucleic acid is operably linked to a second regulatory element;
- the first nucleic acid and the second nucleic acid are present on the same or different vectors
- the guide RNA is capable of forming a complex with the protein or fusion protein described in (i).
- the vector includes a plasmid or a viral vector.
- the guide RNA includes a spacer sequence capable of hybridizing with the target sequence; and a direct repeat (DR) sequence connected to the spacer sequence and capable of guiding the protein to bind to the guide RNA, thereby forming a CRISPR-Cas composition or complex targeting the target sequence.
- the guide RNA includes unmodified and modified guide RNA.
- the modified guide RNA includes chemical modification of the bases.
- the chemical modification includes methylation modification, methoxy modification, fluorination modification or thio modification.
- the direct repeat sequence is the nucleic acid molecule defined in claim 4.
- the first regulatory element and/or the second regulatory element is a promoter, such as an inducible promoter.
- At least one component in the composition is non-naturally occurring or modified.
- the spacer sequence is connected to the 3’ end of the direct repeat (DR) sequence.
- the spacer sequence comprises a complementary sequence to the target sequence.
- when the target sequence is DNA, the target sequence is located at the 3' end of the protospacer adjacent motif (PAM), and the PAM has the sequence 5'-TTTN, where N is A, T, C or G.
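The PAM rule above (a 5'-TTTN motif immediately upstream of the target) can be illustrated with a small scanner. This is a sketch under stated assumptions: it scans only the strand it is given, the target length is a caller-supplied parameter because the source does not fix one here, and the function name and default value are hypothetical.

```python
from typing import List, Tuple


def find_tttn_targets(dna: str, target_len: int = 20) -> List[Tuple[int, str, str]]:
    """List candidate sites as (pam_start, pam, target) on the given strand.

    A site is reported when the 4-mer at pam_start is TTT followed by A, T, C
    or G, and a full-length target follows directly on its 3' side.
    """
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    seq = dna.upper()
    hits = []
    # The last usable PAM start still leaves room for 4 PAM bases plus a target.
    for i in range(len(seq) - 4 - target_len + 1):
        pam = seq[i:i + 4]
        if pam[:3] == "TTT" and pam[3] in "ACGT":
            hits.append((i, pam, seq[i + 4:i + 4 + target_len]))
    return hits
```

For the toy input `CCTTTAGATCGG` with `target_len=5`, the only hit is the PAM `TTTA` at index 2 followed by the target `GATCG`. A fuller search would typically also scan the reverse complement strand, which this sketch leaves out.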
- the target sequence is a DNA from a prokaryotic cell or a eukaryotic cell, or a DNA sequence formed based on RNA reverse transcription; or, the target sequence is a non-naturally occurring DNA, or a DNA sequence formed based on RNA reverse transcription.
- the target sequence includes a cDNA sequence.
- the target sequence includes single-stranded DNA and double-stranded DNA sequences.
- the target sequence exists in cells.
- the target sequence is present in the cell nucleus or cytoplasm (e.g., in an organelle).
- the cell is a eukaryotic cell.
- the cell is a prokaryotic cell.
- the target sequence exists outside the cell.
- the gene editing protein in the first aspect of the present invention is connected to one or more NLS sequences, or the fusion protein contains one or more NLS sequences.
- the NLS sequence is connected to the N-terminus or C-terminus of the gene editing protein described in the first aspect of the present invention.
- the NLS sequence is fused to the N-terminus or C-terminus of the gene editing protein described in the first aspect of the present invention.
- the tenth aspect of the present invention provides a kit comprising one or more components selected from the following: the gene editing protein described in the first aspect of the invention, the fusion protein described in the second aspect of the invention, the polynucleotide described in the third aspect of the invention, the complex described in the sixth aspect of the invention, the vector described in the seventh aspect of the invention, the CRISPR-Cas composition described in the eighth aspect of the invention, or the system described in the ninth aspect of the invention.
- the kit further comprises a label or instructions.
- the kit is used for one or more of gene or genome editing, disease treatment, target gene targeting, and cutting of target genes or non-target genes.
- the eleventh aspect of the present invention provides a delivery composition comprising a delivery vector and one or more selected from the following: the gene editing protein described in the first aspect of the present invention, the fusion protein described in the second aspect of the present invention, the polynucleotide described in the third aspect of the present invention, the complex described in the sixth aspect of the present invention, the vector described in the seventh aspect of the present invention, the CRISPR-Cas composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention.
- the delivery vehicle is a particle.
- the delivery vector is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes, microvesicles, gene guns or viral vectors (e.g., replication-defective retroviruses, lentiviruses, adenoviruses or adeno-associated viruses).
- the twelfth aspect of the present invention provides a host cell, comprising the gene editing protein described in the first aspect of the present invention, the fusion protein described in the second aspect of the present invention, the polynucleotide described in the third aspect of the present invention, the nucleic acid molecule described in the fourth aspect of the present invention, the complex described in the sixth aspect of the present invention, the vector described in the seventh aspect of the present invention, the CRISPR-Cas composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention.
- the host cell is a eukaryotic cell, such as a yeast cell, a plant cell or a mammalian cell (including human and non-human mammals).
- the host cell is a prokaryotic cell, such as Escherichia coli.
- the yeast cell is selected from yeast of one or more sources of the following group: Pichia pastoris, Kluyveromyces, or a combination thereof; preferably, the yeast cell includes: Kluyveromyces, more preferably Kluyveromyces marxianus, and/or Kluyveromyces lactis.
- the host cell is selected from the following group: Escherichia coli, wheat germ cells, insect cells, SF9, Hela, HEK293, CHO, yeast cells, or a combination thereof.
- the thirteenth aspect of the present invention provides an enzyme preparation, which includes the gene editing protein described in the first aspect of the present invention, the fusion protein described in the second aspect of the present invention, the complex described in the sixth aspect of the present invention, the CRISPR-Cas composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention or the delivery composition described in the eleventh aspect of the present invention.
- the enzyme preparation includes an injection and/or a lyophilized preparation.
- a fourteenth aspect of the present invention provides a medicine kit, comprising:
- the drug in the first container is a single preparation containing the complex described in the sixth aspect of the present invention, the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention.
- the dosage form of the drug is selected from the following group: a lyophilized preparation, a liquid preparation, or a combination thereof.
- the dosage form of the drug is an oral dosage form or an injection dosage form.
- the medicine kit further contains instructions.
- a fifteenth aspect of the present invention provides a medicine kit, comprising:
- first container and the second container are different containers.
- the drug in the first container is a single preparation containing the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or its encoding gene or its expression vector.
- the drug in the second container is a single preparation containing the guide RNA or its expression vector described in the fifth aspect of the present invention.
- the dosage form of the drug is selected from the following group: a lyophilized preparation, a liquid preparation, or a combination thereof.
- the dosage form of the drug is an oral dosage form or an injection dosage form.
- the medicine kit further contains instructions.
- the sixteenth aspect of the present invention provides a method for targeting and editing a target gene or cutting a target gene, comprising: the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or the complex described in the sixth aspect of the present invention, or the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention, or the enzyme preparation described in the thirteenth aspect of the present invention, or the drug kit described in the fourteenth aspect of the present invention or the fifteenth aspect of the present invention is contacted with the target gene, or delivered to a cell containing the target gene, and the target sequence is present in the target gene.
- the target gene exists in cells.
- the cell is a prokaryotic cell.
- the cell is a eukaryotic cell, such as a mammalian cell (e.g., a human cell) or a plant cell.
- the target gene exists in a nucleic acid molecule (e.g., a plasmid) in vitro.
- the editing of the target gene or the cutting of the target gene includes the break of the target sequence, such as a double-strand break of DNA or a single-strand break of RNA, or the insertion of an exogenous nucleic acid into the break.
- the target gene includes DNA.
- the DNA includes single-stranded DNA and double-stranded DNA.
- the seventeenth aspect of the present invention provides a method for inducing a change in a cell state, the method comprising contacting the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or the complex described in the sixth aspect of the present invention, or the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention, or the enzyme preparation described in the thirteenth aspect of the present invention, or the drug kit described in the fourteenth aspect of the present invention or the fifteenth aspect of the present invention with a target gene in a cell.
- the eighteenth aspect of the present invention provides a method for changing the expression of a gene product, comprising: modifying the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or the The complex described in the sixth aspect, the composition described in the eighth aspect of the present invention, the system described in the ninth aspect of the present invention, the delivery composition described in the eleventh aspect of the present invention, the enzyme preparation described in the thirteenth aspect of the present invention, or the drug kit described in the fourteenth aspect of the present invention or the fifteenth aspect of the present invention is contacted with a nucleic acid molecule encoding the gene product, or delivered to a cell containing the nucleic acid molecule, and the target sequence is present in the nucleic acid molecule.
- the nucleic acid molecule is present in an in vitro nucleic acid molecule (e.g., a plasmid).
- the expression of the gene product is altered (e.g., enhanced or decreased).
- the gene product is a protein.
- the protein, fusion protein, polynucleotide, isolated nucleic acid molecule, complex, vector or composition is contained in a delivery vehicle.
- the delivery vehicle is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes, and viral vectors (such as replication-defective retroviruses, lentiviruses, adenoviruses or adeno-associated viruses).
- the method is used to modify cells, cell lines or organisms by changing one or more target sequences in a target gene or a nucleic acid molecule encoding a target gene product.
- the nineteenth aspect of the present invention provides a cell or its progeny obtained by the method described in any one of the sixteenth to eighteenth aspects of the present invention, wherein the cell comprises a modification that is not present in its wild type.
- the twentieth aspect of the present invention provides a cell product of the cell or its progeny described in the nineteenth aspect of the present invention.
- the twenty-first aspect of the present invention provides an in vitro, ex vivo or in vivo cell or cell line or their progeny, wherein the cell or cell line or their progeny comprises: the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or the polynucleotide described in the third aspect of the present invention, or the complex described in the sixth aspect of the present invention, or the vector described in the seventh aspect of the present invention, or the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention.
- the cell is a prokaryotic cell.
- the cell is a eukaryotic cell, such as a mammalian cell (e.g., a human cell) or a plant cell.
- the cells are stem cells or stem cell lines.
- the twenty-second aspect of the present invention provides the use of the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or the polynucleotide described in the third aspect of the present invention, or the complex described in the sixth aspect of the present invention, or the vector described in the seventh aspect of the present invention, or the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention, or the kit described in the tenth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention, or the enzyme preparation described in the thirteenth aspect of the present invention, or the medicine kit described in the fourteenth aspect of the present invention or the fifteenth aspect of the present invention, for preparing a drug or preparation, wherein the drug or preparation is used for nucleic acid editing (e.g., gene or genome editing).
- the gene or genome editing includes modifying genes, knocking out genes, changing the expression of gene products, repairing mutations, and/or inserting polynucleotides.
- the twenty-third aspect of the present invention provides the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or the polynucleotide described in the third aspect of the present invention, or the complex described in the sixth aspect of the present invention, or the vector described in the seventh aspect of the present invention, or the composition described in the eighth aspect of the present invention.
- the drug or preparation is used for one or more selected from the group consisting of:
- the disease or condition includes cancer, infectious disease, neurological disease, ophthalmic disease, and hearing disease.
- the disease or condition includes cystic fibrosis, atherosclerotic cardiovascular disease (ASCVD), progressive pseudohypertrophic muscular dystrophy (Duchenne muscular dystrophy, DMD), Becker muscular dystrophy, alpha-1-antitrypsin deficiency, Pompe disease (glycogen storage disease type II), myotonic dystrophy, Huntington's disease, fragile X syndrome, Friedreich's ataxia, amyotrophic lateral sclerosis, hereditary chronic kidney disease, sickle cell disease, primary hyperoxaluria (PH1), beta thalassemia, frontotemporal dementia, Leber's congenital amaurosis, hyperlipidemia, hypercholesterolemia (FH), hereditary angioedema (HAE), transthyretinopathy (ATTR), hepatitis B, retinal diseases, macular degeneration, Wilms' tumor, Ewing's sarcoma, neuroendocrine tumors, glioblastoma, neuroblastoma, melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, kidney cancer, pancreatic cancer, lung cancer, biliary tract cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid cancer, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphocytic leukemia, acute myeloid leukemia, chronic lymphocytic leukemia, chronic myeloid leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma and urinary bladder cancer.
- the disorder or disease is caused by a pathogenic point mutation.
- the twenty-fourth aspect of the present invention provides a method for detecting whether a target nucleic acid molecule is present in a sample, the method comprising contacting the sample with the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or the complex described in the sixth aspect of the present invention, or the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention, or the kit described in the tenth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention, or the enzyme preparation described in the thirteenth aspect of the present invention and a non-target sequence, detecting a detectable signal generated by the cleavage of the non-target sequence, thereby detecting the target nucleic acid molecule, wherein the non-target sequence does not hybridize with the guide RNA.
- if the non-target sequence is cleaved by a protein in the complex or CRISPR-Cas composition or system or delivery composition, this indicates that the target nucleic acid molecule is present in the sample; and if the non-target sequence is not cleaved by a protein in the complex or CRISPR-Cas composition or system or delivery composition, this indicates that the target nucleic acid molecule is not present in the sample.
- the target nucleic acid molecule is a target DNA.
- the target DNA includes DNA formed based on RNA reverse transcription.
- the target DNA includes cDNA.
- the target DNA is selected from the following group: single-stranded DNA, double-stranded DNA, or a combination thereof.
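The readout of the detection method in the twenty-fourth aspect can be illustrated with a minimal sketch. It assumes that cleavage of the non-target (reporter) sequence is measured as a numeric signal and compared with a no-target control; the function name `target_detected` and the default two-fold threshold are hypothetical choices for illustration and are not taken from the disclosure.

```python
def target_detected(sample_signal: float,
                    no_target_signal: float,
                    fold_threshold: float = 2.0) -> bool:
    """Call the target present when the reporter signal rises clearly above background.

    sample_signal    -- signal from the reaction containing the sample
    no_target_signal -- signal from a control reaction without target (background)
    fold_threshold   -- hypothetical cut-off; a real assay would calibrate this
    """
    if no_target_signal <= 0:
        raise ValueError("no-target control signal must be positive")
    # Cleavage of the non-target sequence produces signal; no cleavage leaves
    # the sample close to the background level.
    return sample_signal / no_target_signal >= fold_threshold


if __name__ == "__main__":
    print(target_detected(900.0, 100.0))  # strong signal over background
    print(target_detected(120.0, 100.0))  # near background
```

In practice the threshold would be set from replicate controls rather than fixed in advance.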
- FIG. 1 shows an agarose gel electrophoresis diagram of the CasW1 insert fragment and the pET-28a(+) vector double-digested with EcoRI and NotI.
- FIG. 2 shows the SDS-PAGE Coomassie Brilliant Blue staining of the newly purified CasW1 protein.
- FIG. 3 shows an agarose gel electrophoresis diagram of dsDNA template prepared in vitro.
- FIG. 4 shows a capillary electrophoresis diagram for identifying the in vitro cleavage effect of CasW1. Taking the in vitro cleavage effect of Cpf1 as a control, it can be seen that CasW1 can cleave a 450 bp dsDNA template into two dsDNA fragments, and the cleavage activity is close to 100%.
- FIG. 5 shows a capillary electrophoresis diagram for identifying the cleavage effect of CasW1 in vitro, with Cas12i.16 and S7R-Cas12i3 (M2869) in the prior art as controls.
- FIG. 6 shows a plasmid map of pET-28a(+)-CasW1.
- FIG. 7 shows a plasmid map of pET-28a(+)-LbCpf1.
- FIG. 8 shows a plasmid map of pET-28a(+)-HED Cas12i.16.
- FIG. 9 shows a plasmid map of pET-28a(+)-S7R Cas12i.3.
- FIG. 10 shows a schematic diagram of the secondary structure of the DR sequence of CasW1.
- the gene editing protein of the present invention has very good gene editing activity, can effectively edit or cut the target gene, and can effectively treat the symptoms or diseases of subjects in need. On this basis, the inventor completed the present invention.
- the term “about” can refer to a value or composition that is within an acceptable error range for a particular value or composition as determined by one of ordinary skill in the art, which will depend in part on how the value or composition is measured or determined.
- the expression “about 100” includes all values between 99 and 101 (e.g., 99.1, 99.2, 99.3, 99.4, etc.).
- the term “comprising” or “including (comprising)” may be open, semi-closed and closed. In other words, the term also includes “consisting essentially of” or “consisting of”.
- Sequence identity is determined by comparing two aligned sequences along a predetermined comparison window (which can be 50%, 60%, 70%, 80%, 90%, 95% or 100% of the length of the reference nucleotide sequence or protein) and determining the number of positions at which identical residues occur. Typically, this is expressed as a percentage.
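As an illustration of the identity calculation described above, the following is a minimal sketch. It assumes the two sequences have already been aligned to equal length (with `-` for gaps) and, as a simplification, takes the comparison window as a leading fraction of the alignment rather than of the reference sequence; the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, window: float = 1.0) -> float:
    """Percentage of identical positions within a comparison window.

    aligned_a, aligned_b -- pre-aligned sequences of equal length ('-' marks a gap)
    window               -- fraction of the alignment to compare (e.g. 0.5, 0.9, 1.0)
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not 0.0 < window <= 1.0:
        raise ValueError("window must be in the interval (0, 1]")
    n = max(1, int(len(aligned_a) * window))
    # Count positions where both sequences carry the same residue (gaps never match).
    matches = sum(
        1 for x, y in zip(aligned_a[:n], aligned_b[:n]) if x == y and x != "-"
    )
    return 100.0 * matches / n


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))         # 3 of 4 positions identical
    print(percent_identity("ACGTAC", "ACGTTT", 0.5))  # first half only
```

Producing the alignment itself (for example with a dedicated alignment tool) is outside the scope of this sketch.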
- the gene editing protein is the effector protein in the CRISPR/Cas system.
- In the present invention, Cas protein, Cas enzyme, and Cas effector protein can be used interchangeably.
- Cas protein is taken in the broadest sense, including wild-type Cas protein, its derivatives or variants, analogs, and functional fragments thereof such as oligonucleotide binding fragments.
- wild type has the meaning generally understood by those skilled in the art, which refers to the typical form of an organism, strain, gene, protein, or the characteristics that distinguish it from mutant or variant forms when it exists in nature, which can be isolated from a source in nature and has not been intentionally modified by man.
- variant refers to polypeptides that substantially retain the function or activity of the Cas protein of the present invention.
- the derivatization of a protein does not adversely affect the desired activity of the protein (e.g., activity binding to a guide RNA, endonuclease activity, activity binding to and cutting a specific site of a target sequence under the guidance of a guide RNA), that is, the derivative of the protein has the same activity as the protein.
- a “derivative” includes modified forms of the protein in which one or more amino acids may be deleted, inserted, modified and/or substituted.
- non-naturally occurring or “engineered” are used interchangeably and indicate artificial involvement.
- the present invention provides a gene editing protein (Cas protein) comprising an amino acid sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% identity with the amino acid sequence of SEQ ID NO.1, and substantially retains the biological function of the sequence from which it is derived;
- the amino acid sequence of the Cas protein has one or more amino acid substitutions, deletions or additions compared to the amino acid sequence of SEQ ID NO.1, and substantially retains the biological function of the sequence from which it is derived;
- the Cas protein comprises the amino acid sequence shown in SEQ ID NO.1;
- the one or more amino acid substitutions, deletions or additions are, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions.
- the Cas protein has an amino acid sequence shown in SEQ ID NO.1.
- an amino acid residue may be substituted with another amino acid residue belonging to the same group as the site to be substituted, i.e., a non-polar amino acid residue replaces another non-polar amino acid residue, a polar uncharged amino acid residue replaces another polar uncharged amino acid residue, a basic amino acid residue replaces another basic amino acid residue, and an acidic amino acid residue replaces another acidic amino acid residue.
- Such substituted amino acid residues may or may not be encoded by the genetic code.
- the protein of the present invention may contain one or more conservative substitutions in the amino acid sequence, and these conservative substitutions are preferably produced by substitution according to Table A.
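The same-group substitution rule can be sketched as a simple lookup. Table A is not reproduced in this section, so the grouping below is a commonly used textbook classification and is an assumption, not the content of Table A; the names `GROUPS`, `group_of` and `is_conservative` are hypothetical.

```python
# Assumed textbook grouping of the 20 standard amino acids (one-letter codes).
# This is NOT Table A of the disclosure; it only illustrates the same-group rule.
GROUPS = {
    "non-polar": set("AVLIPFWM"),
    "polar uncharged": set("GSTCYNQ"),
    "basic": set("KRH"),
    "acidic": set("DE"),
}


def group_of(residue: str) -> str:
    """Return the name of the group that a one-letter residue belongs to."""
    for name, members in GROUPS.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown amino acid: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """A substitution counts as conservative when both residues share a group."""
    return group_of(original) == group_of(replacement)


if __name__ == "__main__":
    print(is_conservative("L", "I"))  # both non-polar
    print(is_conservative("K", "R"))  # both basic
    print(is_conservative("D", "K"))  # acidic vs basic
```

Other classifications place some residues (for example glycine or histidine) differently, so the result depends on the table actually used.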
- the present invention also encompasses proteins that also contain one or more other non-conservative substitutions, as long as the non-conservative substitutions do not significantly affect the desired functions and biological activities of the protein of the present invention.
- Non-essential amino acid residues are amino acid residues that can be changed (deleted, substituted or replaced) without changing biological activity, while “essential” amino acid residues are required for biological activity.
- Conservative amino acid substitutions are substitutions in which amino acid residues are replaced by amino acid residues with similar side chains. Amino acid substitutions can be performed in non-conserved regions of Cas enzymes. In general, such substitutions are not performed on conserved amino acid residues, or on amino acid residues located within conserved motifs, where such residues are required for protein activity. However, it will be appreciated by those skilled in the art that functional variants may have fewer conservative or non-conservative changes in conserved regions.
- one or more amino acid residues can be changed (replaced, deleted, truncated or inserted) from the N and/or C terminus of a protein while still retaining its functional activity. Therefore, proteins that have changed one or more amino acid residues from the N and/or C terminus of the Cas protein of the present invention while retaining its desired functional activity are also within the scope of the present invention.
- These changes may include changes introduced by modern molecular methods such as PCR, including PCR amplification that alters or extends the protein coding sequence by means of amino acid coding sequences included in the oligonucleotides used in the PCR amplification.
- proteins can be altered in a variety of ways, including amino acid substitutions, deletions, truncations, and insertions, and methods for such manipulations are generally known to those skilled in the art.
- amino acid sequence variants of Cas proteins can be prepared by mutation of DNA. This can also be accomplished by other forms of mutagenesis and/or by directed evolution, for example, using known mutagenesis, recombination and/or shuffling methods, combined with relevant screening methods, to perform one or more amino acid substitutions, one or more amino acid deletions and/or one or more amino acid insertions.
- these minor amino acid changes in the Cas proteins of the present invention can occur (e.g., naturally occurring mutations) or be produced (e.g., using r-DNA technology) without loss of protein function or activity. If these mutations occur in the catalytic domain, active site, or other functional domain of the protein, the properties of the polypeptide may be changed, but the polypeptide may retain its activity. If the mutations present are not close to the catalytic domain, active site, or other functional domain, it can be expected that the impact will be small.
- the catalytic domain, active site or other functional domain of the protein can also be determined by physical analysis of the structure, such as by the following techniques: such as nuclear magnetic resonance, crystallography, electron diffraction or photoaffinity labeling, combined with mutations of amino acids at putative key sites.
- an “orthologue” of a protein as described herein refers to a protein belonging to a different species that performs the same or similar function as the protein of which it is an orthologue.
- the nucleic acid cleavage disclosed herein includes: DNA or RNA breakage in the target nucleic acid produced by the Cas protein (cis cleavage), and DNA or RNA breakage in a side-branch nucleic acid substrate (single-stranded nucleic acid substrate) caused by the side cutting activity of the Cas protein (i.e., non-specific or non-targeted trans cleavage).
- the cleavage is a double-stranded DNA break.
- the cleavage is a single-stranded DNA break or a single-stranded RNA break.
- Trans cleavage means that, in certain environments, the activated Cas12 family protein remains active after binding to the target sequence and continues to non-specifically cut non-target oligonucleotides.
- the side cutting activity can be used to detect the presence of specific target oligonucleotides using the Cas system.
- the Cas12i system is engineered to non-specifically cut ssDNA or transcripts.
- Side cutting activity is used for a highly sensitive and specific nucleic acid detection platform called SHERLOCK, which can be used for many clinical diagnoses (Gootenberg, JS et al., Nucleic acid detection with CRISPR-Cas13a/C2c2. Science 356, 438-442 (2017)).
- the present invention provides a fusion protein, comprising the Cas protein described in any one of the foregoing and one or more functional domains.
- the functional domain includes one or more of a localization signal, a reporter protein, a Cas protein targeting portion, a DNA binding domain, an epitope tag, a transcription activation domain, a transcription repression domain, a nuclease, a deamination domain, a methylase, a demethylase, a transcription release factor, an HDAC, a cleavage active polypeptide, and a ligase;
- methylase is exemplified by HhaI DNA m5c-methyltransferase (M.HhaI), DNA methyltransferase 1 (DNMT1), DNA methyltransferase 3a (DNMT3a), DNA methyltransferase 3b (DNMT3b), MET1, DRM3, ZMET2, CMT1, CMT2, etc.
- Demethylase refers to an enzyme that removes methyl (CH3-) groups from nucleic acids, proteins (e.g., histones), and other molecules. Demethylases are important in epigenetic modification mechanisms. Demethylase proteins change the transcriptional regulation of the genome by controlling the methylation levels that occur on DNA and histones, and in turn regulate the chromatin state at specific loci in the organism. Examples include TET1 (ten-eleven translocation 1), ten-eleven translocation (TET) dioxygenase 1 (TET1CD), DME, DML1, DML2, ROS1, etc.
- the transcriptional release factor is, for example, eukaryotic release factor 1 (ERF1) or eukaryotic release factor 3 (ERF3).
- the functional domain is selected from an adenosine deaminase catalytic domain or a cytidine deaminase catalytic domain.
- the localization signal comprises a nuclear localization signal and/or a nuclear export signal
- the nuclear export signal comprises human protein tyrosine kinase 2;
- the reporter protein comprises one or more of glutathione-S-transferase, horseradish peroxidase, chloramphenicol acetyltransferase, β-galactosidase, β-glucuronidase or autofluorescent protein;
- the autofluorescent protein includes one or more of green fluorescent protein, HcRed, DsRed, cyan fluorescent protein, yellow fluorescent protein or blue fluorescent protein;
- the DNA binding domain comprises one or more of methylation binding protein, LexA DBD or Gal4 DBD;
- the epitope tag comprises one or more of a histidine tag, a V5 tag, a FLAG tag, an influenza virus hemagglutinin tag, a Myc tag, a VSV-G tag or a thioredoxin tag;
- the transcriptional activation domain comprises VP64 and/or VPR;
- the transcriptional repression domain comprises KRAB and/or SID;
- the nuclease comprises FokI;
- the deamination domain comprises one or more of ADAR1, ADAR2, APOBEC, AID or TAD;
- the cleavage active polypeptide includes a polypeptide having single-stranded RNA cleavage activity, a polypeptide having double-stranded RNA cleavage activity, a polypeptide having single-stranded DNA cleavage activity or a polypeptide having double-stranded DNA cleavage activity;
- the ligase comprises DNA ligase and/or RNA ligase.
- the functional domain is the full length or a functional fragment of TadA8e.
- the present invention provides a polynucleotide, which is a polynucleotide sequence encoding the gene editing protein (Cas protein), or a polynucleotide sequence encoding the aforementioned fusion protein.
- the polynucleotide (DNA molecule) includes nucleotides that have more than 70%, preferably more than 90%, more preferably more than 95%, further preferably 99%, and further preferably 100% identity with the nucleotide sequence described in SEQ ID NO.2.
- the polynucleotide is a DNA molecule that is codon-optimized according to the codon preference of the host cell.
- the optimization described in the present disclosure may require mutating the nucleotide sequence encoding the protein (e.g., the Cas protein of the present disclosure) to mimic the codon preference of the intended host organism or cell while still encoding the same protein. Therefore, the codons can be changed, but the encoded protein remains unchanged.
- if the intended target cell is a human cell, a human codon-optimized nucleotide sequence encoding the protein can be used.
- if the intended host cell is an animal cell (e.g., a mouse cell, an insect cell), an animal codon-optimized nucleotide sequence encoding the protein can be generated.
- if the intended host cell is a plant cell, a plant codon-optimized nucleotide sequence encoding the protein can be generated.
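The principle that codons change while the encoded protein stays the same can be shown with a minimal sketch. The translation table is the standard genetic code; the `PREFERRED` table is a small hypothetical example of host-preferred codons chosen only for illustration, not real usage data for any organism, and the function names are likewise assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical preferred codons for an imagined host (illustrative only).
PREFERRED = {"L": "CTG", "K": "AAG", "A": "GCC"}


def translate(dna: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna: str, preferred: dict) -> str:
    """Swap each codon for the host-preferred synonymous codon, if one is listed."""
    out = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        out.append(preferred.get(amino_acid, codon))
    return "".join(out)


if __name__ == "__main__":
    original = "ATGTTAAAAGCT"
    optimized = recode(original, PREFERRED)
    print(optimized)
    # The nucleotide sequence differs, but the encoded protein is identical.
    print(translate(original) == translate(optimized))
```

Real codon optimization usually also weighs factors such as GC content and mRNA structure, which this sketch ignores.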
- the nucleic acid of the present disclosure comprises a nucleotide sequence encoding CasY7 or a variant thereof or a fusion protein thereof, the nucleotide sequence being codon-optimized for expression in eukaryotic cells.
- the nucleic acid of the present disclosure comprises a nucleotide sequence encoding CasW1 or a variant thereof or a fusion protein thereof, the nucleotide sequence being codon-optimized for expression in animal cells.
- the nucleic acid of the present disclosure comprises a nucleotide sequence encoding CasW1 or a variant thereof or a fusion protein thereof, the nucleotide sequence being codon-optimized for expression in fungal cells. In some cases, the nucleic acid of the present disclosure comprises a nucleotide sequence encoding CasW1 or a variant thereof or a fusion protein thereof, the nucleotide sequence being codon-optimized for expression in plant cells.
- the host cell comprises a prokaryotic cell or a eukaryotic cell.
- the present invention also provides a CRISPR-Cas composition, comprising:
- a protein component: the aforementioned gene editing protein (Cas protein), or the aforementioned fusion protein; or a nucleic acid molecule encoding the gene editing protein (Cas protein) or the aforementioned fusion protein;
- an RNA component: a guide RNA, or one or more nucleic acids encoding the guide RNA, or a precursor RNA of the guide RNA, or a nucleic acid encoding a precursor RNA of the guide RNA;
- the protein component and the nucleic acid component are combined with each other to form a complex.
- the composition is an activated CRISPR complex
- the activated CRISPR complex further comprises: a target sequence of a target nucleic acid bound to the guide RNA.
- the CRISPR-Cas composition comprises one or more vectors, wherein the one or more vectors comprise:
- a first regulatory element which is operably linked to a nucleotide sequence encoding the gene editing protein (Cas protein) or a nucleotide sequence encoding the fusion protein;
- the second regulatory element is operably linked to a nucleotide sequence encoding the guide RNA, wherein the guide RNA comprises:
- a direct repeat (DR) sequence connected to the spacer sequence and capable of guiding the gene editing protein (Cas protein) to bind to the guide RNA to form a CRISPR-Cas complex targeting the target sequence;
- the first regulatory element and the second regulatory element are located on the same or different vectors of the CRISPR-Cas vector system.
- the first regulatory element or the second regulatory element comprises a promoter
- the promoter comprises one or more of an inducible promoter, a constitutive promoter, or a tissue-specific promoter
- the promoter comprises one or more of T7, SP6, T3, CMV, EF1a, SV40, PGK1, human β-actin, CAG, U6, H1, T7lac, araBAD, trp, lac or Ptac;
- the first regulatory element and the second regulatory element are located on the same or different vectors.
- the vector comprises a retroviral vector, a lentiviral vector, an adenoviral vector, an adeno-associated viral vector, a herpes simplex vector, or a phagemid vector;
- the vector comprises a plasmid vector.
- the target nucleic acid comprises DNA derived from a eukaryotic organism or DNA derived from a prokaryotic organism;
- the eukaryotic organism includes an animal or a plant
- the target nucleic acid comprises non-human mammal DNA, human DNA, insect DNA, bird DNA, reptile DNA, amphibian DNA, rodent DNA, fish DNA, worm DNA, nematode DNA, or yeast DNA;
- the non-human mammalian DNA comprises non-human primate DNA.
- CRISPR/Cas complex refers to a complex formed by the binding of a guide RNA (gRNA) or mature crRNA and the gene editing protein (Cas protein); the guide RNA or mature crRNA contains a guide sequence that hybridizes to the target sequence and a direct repeat sequence that binds to the gene editing protein (Cas protein), and the complex can recognize and cut the target nucleotide that can hybridize with the guide RNA or mature crRNA.
- a guide RNA may include a direct repeat (DR) sequence and a spacer.
- the guide RNA may consist essentially of or consist of a direct repeat (DR) sequence and a spacer sequence.
- the spacer sequence is any polynucleotide sequence that has sufficient complementarity with the target sequence to hybridize with the target sequence and guide the CRISPR-Cas complex to specifically bind to the target sequence.
- the degree of complementarity between the spacer sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%.
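The degree of complementarity between a spacer and its target can be computed as in the following minimal sketch. It assumes an RNA spacer and a DNA target strand of equal length, both written 5' to 3', paired antiparallel with Watson-Crick pairs only; the function name `percent_complementarity` is hypothetical.

```python
# Watson-Crick pairs between an RNA spacer base and a DNA target base.
RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(spacer_rna: str, target_dna: str) -> float:
    """Percentage of spacer positions that base-pair with the target strand.

    spacer_rna -- spacer sequence, 5'->3', RNA alphabet (A, C, G, U)
    target_dna -- target strand that hybridizes with the spacer, 5'->3', DNA alphabet
    """
    if not spacer_rna or len(spacer_rna) != len(target_dna):
        raise ValueError("spacer and target must be non-empty and of equal length")
    # Hybridization is antiparallel: spacer position i pairs with the target
    # position counted from the 3' end.
    paired = sum(
        1
        for s, t in zip(spacer_rna.upper(), reversed(target_dna.upper()))
        if (s, t) in RNA_DNA_PAIRS
    )
    return 100.0 * paired / len(spacer_rna)


if __name__ == "__main__":
    print(percent_complementarity("AUGC", "GCAT"))  # fully complementary
    print(percent_complementarity("AUGC", "GCAA"))  # one mismatch in four
```

The thresholds listed above (at least 50% up to at least 99%) can then be checked against the returned value.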
- the guide RNA comprises a sequence (e.g., a spacer sequence) that has sufficient complementarity with the target nucleic acid sequence to hybridize with the target nucleic acid sequence and guide the sequence-specific binding of the complex to the target nucleic acid sequence.
- mismatches, e.g., one or more mismatches between the spacer sequence and the target nucleic acid, such as 1 or 2 nucleotide mismatches (including the position of the mismatch along the spacer sequence/target sequence).
- where a cleavage rate of less than 100% of the target is desired (e.g., in a cell population), 1 or 2 mismatches between the spacer sequence and the target sequence can be introduced into the spacer sequence.
- the present invention provides a guide RNA, which includes a direct repeat (DR) sequence capable of binding to the Cas protein and a spacer sequence capable of targeting a target sequence.
- the direct repeat sequence comprises the sequence shown in SEQ ID NO.3.
- the 3' end of the direct repeat sequence comprises a stem-loop structure, in which a first stem nucleotide chain and a second stem nucleotide chain hybridize to form the stem of the stem-loop structure, and a loop nucleotide chain forms the loop of the stem-loop structure;
- the direct repeat sequence comprises a nucleotide sequence having at least 80% identity with the nucleotide sequence described in SEQ ID NO.3;
- the direct repeat sequence comprises a nucleotide sequence having at least 85%, more preferably more than 90%, and further preferably more than 95% identity with the nucleotide sequence described in SEQ ID NO.3;
- the direct repeat sequence includes the nucleotide sequence described in SEQ ID NO.3.
- more than 80% of the spacer sequence is complementary to the target nucleic acid
- more than 90%, more preferably more than 95%, further preferably more than 99%, and further preferably 100% of the spacer sequence is complementary to the target nucleic acid
- the length of the spacer sequence is 18-41 nt, for example 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41 nt, more preferably 18 to 27 nucleotides, more preferably 18 to 24 nucleotides, and most preferably 18 to 22 nucleotides.
- the spacer sequence is 20 nt in length.
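The spacer length ranges stated above can be checked with a minimal sketch. The function name `spacer_length_tier` and the wording of the returned labels are hypothetical; only the numeric ranges come from the text.

```python
def spacer_length_tier(spacer: str) -> str:
    """Classify a spacer by the narrowest stated length range it falls into."""
    n = len(spacer)
    if not 18 <= n <= 41:
        raise ValueError(f"spacer length {n} nt is outside the 18-41 nt range")
    if n <= 22:
        return "most preferred (18-22 nt)"
    if n <= 24:
        return "more preferred (18-24 nt)"
    if n <= 27:
        return "preferred (18-27 nt)"
    return "permitted (18-41 nt)"


if __name__ == "__main__":
    print(spacer_length_tier("A" * 20))  # the 20 nt case mentioned in the text
    print(spacer_length_tier("A" * 30))
```

A check of this kind only covers length; complementarity to the target is a separate requirement.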
- target nucleic acid is used interchangeably with target sequence or target nucleic acid sequence or target nucleic acid molecule, and refers to a specific nucleic acid that comprises a nucleic acid sequence that is fully or partially complementary to the spacer sequence in the guide RNA.
- “Target sequence” refers to a polynucleotide targeted by a spacer sequence in a guide RNA, such as a sequence having complementarity with the spacer sequence, wherein hybridization between the target sequence and the spacer sequence will promote the formation of a CRISPR-Cas complex (including Cas protein and guide RNA). Complete complementarity is not required, as long as there is enough complementarity to cause hybridization and promote the formation of a CRISPR-Cas complex.
- the target nucleic acid comprises a non-coding region (e.g., a promoter or terminator).
- the target nucleic acid is single-stranded, or double-stranded.
- the target sequence can comprise any polynucleotide, such as DNA.
- the target sequence is located in a cell or outside the cell.
- the target sequence is located in the nucleus, cytoplasm, or organelle (e.g., mitochondria or chloroplast) of the cell.
- the target nucleic acid can be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA).
- the target sequence should be associated with a protospacer adjacent motif (PAM).
- donor template nucleic acid or donor template are used interchangeably and refer to a nucleic acid molecule that can be used by one or more cellular proteins to change the structure of the target nucleic acid after the gene editing protein (Cas protein) described in this article changes the target nucleic acid.
- the donor template nucleic acid is a double-stranded nucleic acid or a single-stranded nucleic acid.
- the donor template nucleic acid is linear or circular (e.g., plasmid).
- the donor template nucleic acid is an exogenous nucleic acid molecule.
- the donor template nucleic acid is an endogenous nucleic acid molecule (e.g., chromosome).
- the donor template can be used to realize genetic recombination, and the recombination is homologous recombination.
- Cutting refers to DNA breaks in target nucleic acids produced by gene editing proteins (Cas proteins) described herein.
- cutting is double-stranded DNA breaks.
- cutting is single-stranded DNA breaks.
- cleaving a target nucleic acid or modifying a target nucleic acid may overlap.
- Modifying a target nucleic acid includes not only modification of a single nucleotide, but also insertion or deletion of a nucleic acid fragment.
- Reporter nucleic acid refers to a molecule that can be cut or otherwise deactivated by an activated CRISPR system protein as described herein.
- Reporter nucleic acid comprises a nucleic acid element that can be cut by a CRISPR protein (e.g., using a single-stranded non-targeted nucleic acid molecule, with different reporter groups or labeling molecules at both ends). The cutting of the nucleic acid element produces a detectable signal. Before cutting, or when the reporter nucleic acid is in an "active" state, the reporter nucleic acid prevents the generation or detection of a positive detectable signal. It will be understood that in certain example embodiments, a minimum background signal can be generated in the presence of an active reporter nucleic acid.
- a positive detectable signal can be any signal that can be detected using optics, fluorescence, chemiluminescence, electrochemistry, or other detection methods known in the art.
- a first signal i.e., a negative detectable signal
- a second signal e.g., a positive detectable signal
- the reporter nucleic acid can be a single-stranded DNA molecule, a single-stranded RNA molecule, or a single-stranded DNA-RNA hybrid.
- the detection method of the present invention can be used for quantitative detection of target nucleic acid to be detected.
- the measurement index can be quantified according to the signal strength of the reporter group, such as the luminescence intensity of the fluorescent group, or the width of the color band.
- the functional domain herein takes its broadest meaning, including proteins such as enzymes or factors themselves or having specific functional fragments/domains.
- a gene editing protein e.g., a dCas protein
- the functional domains are selected from one or more of a localization signal, a reporter protein, a Cas protein targeting portion, a DNA binding domain, an epitope tag, a transcription activation domain, a transcription inhibition domain, a nuclease, a deamination domain, a methylase, a demethylase, a transcription release factor, an HDAC, a cleavage active polypeptide, and a ligase.
- the functional domains may be the same or different.
- the deamination domain includes a deaminase (e.g., an adenosine deaminase or a cytidine deaminase) catalytic domain.
- adenosine deaminase or “adenosine deaminase protein” refers to a protein, a polypeptide, or one or more functional domains of a protein or polypeptide that is capable of catalyzing a hydrolytic deamination reaction that converts adenine (or the adenine portion of a molecule) into hypoxanthine (or the hypoxanthine portion of a molecule).
- the adenine-containing molecule is adenosine (A), and the hypoxanthine-containing molecule is inosine (I).
- the adenine-containing molecule can be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
- Adenosine deaminases include, but are not limited to, members of the enzyme family known as adenosine deaminases acting on RNA (ADAR), members of the enzyme family known as adenosine deaminases acting on tRNA (ADAT), and other family members containing adenosine deaminase domains (ADAD).
- adenosine deaminases are capable of targeting adenine in RNA/DNA and RNA duplexes.
- adenosine deaminases have been modified to increase their ability to edit DNA in RNA/DNA heteroduplexes of RNA duplexes.
- the deaminase is a cytidine deaminase.
- cytidine deaminase or “cytidine deaminase protein” refers to a protein, a polypeptide, or one or more functional domains of a protein or polypeptide that is capable of catalyzing a hydrolytic deamination reaction that converts cytosine (or the cytosine portion of a molecule) into uracil (or the uracil portion of a molecule).
- the cytosine-containing molecule is cytidine (C)
- the uracil-containing molecule is uridine (U).
- the cytosine-containing molecule can be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
- Cytidine deaminases include, but are not limited to, members of the enzyme family known as apolipoprotein B mRNA editing complex (APOBEC) family deaminases, activation-induced deaminases (AID), or cytidine deaminase 1 (CDA1). In a specific embodiment, an APOBEC family deaminase is included.
- the cytidine deaminase comprises a wild-type amino acid sequence of a cytidine deaminase. In some embodiments, the cytidine deaminase comprises one or more mutations in the cytidine deaminase sequence such that the editing efficiency and/or substrate editing preference of the cytidine deaminase is changed according to specific needs.
- Identity is used to refer to the matching of sequences between two polypeptides or between two nucleic acids. “Identity” means the number of identical residues between the polypeptide or nucleic acid sequences as a percentage of the total number of residues, where the way the total number of residues is calculated depends on the type of mutation. Mutation types include insertions (extensions) at either or both ends of the sequence, deletions (truncations) at either or both ends of the sequence, substitutions/replacements of one or more amino acids/nucleotides, insertions within the sequence, and deletions within the sequence.
- the mutation type is one or more of the following: substitution/replacement of one or more amino acids/nucleotides, insertion within the sequence, and deletion within the sequence
- the total number of residues is calculated as the larger of the molecules being compared.
- when the mutation type also includes an insertion (extension) at either or both ends of the sequence or a deletion (truncation) at either or both ends of the sequence, the number of amino acids inserted or deleted at either or both ends (for example, where the number of insertions or deletions at both ends is less than 20) is not included in the total number of residues.
- the sequences being compared are aligned in a manner that produces the maximum match between the sequences, and gaps in the alignment (if any) are resolved by a specific algorithm. The same applies to the calculation of nucleotide identity.
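The identity calculation described above can be illustrated with a short sketch. It assumes the two sequences have already been aligned (the text does not name an alignment algorithm), uses `-` as a hypothetical gap character, and takes the total number of residues as the length of the larger molecule. The exclusion of terminal extensions or truncations is not implemented here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Identical residues are counted column by column; the denominator is the
    ungapped length of the longer sequence, per the definition above.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both sequences carry the same residue (gaps never match).
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    len_a = len(aligned_a.replace(gap, ""))
    len_b = len(aligned_b.replace(gap, ""))
    total = max(len_a, len_b)
    if total == 0:
        raise ValueError("sequences are empty")
    return 100.0 * matches / total


if __name__ == "__main__":
    # One internal deletion and one substitution relative to an 8-residue sequence.
    print(percent_identity("MKTAYIAK", "MKT-YIAR"))
```

The same function applies to nucleotide sequences, since it only compares characters.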
- a vector is a nucleic acid molecule that is capable of transporting another nucleic acid molecule to which it has been linked.
- Vectors include, but are not limited to, single-stranded, double-stranded, or partially double-stranded nucleic acid molecules; nucleic acid molecules including one or more free ends, or no free ends (e.g., circular); nucleic acid molecules including DNA, RNA, or both; and other various polynucleotides known in the art.
- Vectors can be introduced into host cells by transformation, transduction, or transfection so that the genetic material elements they carry are expressed in the host cells.
- a vector can be introduced into a host cell to produce transcripts, proteins, or peptides, including proteins, fusion proteins, isolated nucleic acid molecules, etc. as described herein (e.g., CRISPR transcripts, such as nucleic acid transcripts, proteins, or enzymes).
- a vector can contain a variety of elements that control expression, including, but not limited to, promoter sequences, transcription start sequences, enhancer sequences, selection elements, and reporter genes.
- the vector may also contain a replication initiation site.
- Vectors include plasmids and viral vectors, wherein the plasmid refers to a circular double-stranded DNA loop into which other DNA fragments can be inserted, for example, by standard molecular cloning techniques.
- Viral vectors wherein virally derived DNA or RNA sequences are present in vectors for packaging viruses, and viruses include, for example, retroviruses, replication-defective retroviruses, adenoviruses, replication-defective adenoviruses, and adeno-associated viruses.
- Viral vectors also include polynucleotides carried by viruses for transfection into a host cell. Some vectors (for example, bacterial vectors and episomal mammalian vectors with bacterial replication origins) can replicate autonomously in the host cells into which they are introduced.
- vectors e.g., non-episomal mammalian vectors
- vectors capable of directing the expression of genes to which they are operably linked are referred to as “expression vectors.”
- the vector e.g., a viral vector or a non-viral vector, such as a lentiviral vector or a plasmid
- the vector can be delivered to the target tissue by, for example, intramuscular injection, intravenous administration, transdermal administration, intranasal administration, oral administration, or mucosal administration.
- the above delivery can be carried out via a single dose or multiple doses.
- the actual dose to be delivered herein can vary to a large extent according to a variety of factors, including but not limited to vector selection, target cells, organisms, tissues, the general condition of the subject to be treated, the degree of transformation/modification sought, the route of administration, the mode of administration, and the type of transformation/modification sought.
- regulatory elements include promoters, enhancers, internal ribosome entry sites (IRES) and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals, poly-U sequences), and their detailed descriptions can be found in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif (1990).
- regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cells and those that direct the nucleotide sequence to be expressed only in certain host cells (e.g., tissue-specific regulatory sequences).
- Tissue-specific promoters may primarily direct expression in desired tissues of interest, such as muscle, neurons, bone, skin, blood, specific organs (e.g., liver, pancreas), or special cell types (e.g., lymphocytes).
- regulatory elements may also direct expression in a timing-dependent manner (e.g., in a cell cycle-dependent or developmental stage-dependent manner), which may or may not be tissue- or cell-type-specific.
- promoter refers to a non-coding nucleotide sequence that is located upstream of a gene and can initiate downstream gene expression.
- a constitutive promoter is a nucleotide sequence that, when operably linked to a polynucleotide that encodes or defines a gene product, will result in the production of the gene product in the cell under most or all physiological conditions of the cell.
- An inducible promoter refers to a promoter that selectively expresses a coding sequence or functional RNA in response to the presence of endogenous or exogenous stimuli, such as by responding to chemical compounds (chemical inducers), or to environmental, hormone, chemical, and/or developmental signals.
- Inducible or regulated promoters include promoters that are induced or regulated, for example, by light, heat, stress, flooding or drought, salt stress, osmotic stress, plant hormones, wounds, or chemicals (such as ethanol, abscisic acid (ABA), jasmonates, salicylic acid, or safeners).
- host cell refers to a eukaryotic cell (e.g., an animal cell, a plant cell, a fungal cell, etc.), a prokaryotic cell (e.g., some microbial cells, Escherichia coli, Bacillus subtilis, etc.), or a cell from a multicellular organism cultured as a unicellular entity (e.g., a cell line), which serves as a recipient of nucleic acid (e.g., an expression vector) and includes the descendants of the original cell that has been genetically modified by the nucleic acid.
- a “recombinant host cell” (also called a “genetically modified host cell”) is a host cell into which a heterologous nucleic acid, such as an expression vector, has been introduced.
- the design of the expression vector may depend on factors such as the choice of the host cell to be transformed, the level of expression desired, and the like.
- the present invention also provides a host cell or its progeny, wherein the host cell comprises the aforementioned gene editing protein (Cas protein), or the aforementioned fusion protein, or the aforementioned polynucleotide, or the aforementioned vector system, or the aforementioned CRISPR-Cas system, or the aforementioned composition.
- the host cell comprises a non-human mammal, human, insect, bird, reptile, amphibian, rodent, fish, worm, nematode, or yeast cell.
- the present invention also provides a multicellular organism, comprising the aforementioned cell or its progeny.
- the multicellular organism is an animal model or a plant model for a relevant disease.
- NLS refers to "nuclear localization sequence” or “nuclear localization signal”, which refers to an amino acid sequence that causes a protein to enter the cell nucleus.
- Nuclear localization sequences are known in the art (e.g., International PCT Application No. PCT/EP2000/011690 filed by Plank et al. on November 23, 2000 and published as WO/2001/038547), which is incorporated herein by reference for its disclosure of exemplary nuclear localization sequences.
- the NLS is an optimized NLS, e.g., as described in Koblan et al., Nature Biotech. 2018 doi: 10.1038/nbt.4172.
- “Operably linked” means that the target nucleotide sequence is linked to the regulatory elements in a manner that allows the nucleotide sequence to be expressed (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
- Advantageous vectors include lentiviruses and adeno-associated viruses, and the type of these vectors can also be selected to target specific types of cells.
- “Complementarity” refers to the ability of one nucleic acid sequence to form one or more hydrogen bonds with another nucleic acid sequence by means of traditional Watson-Crick or other non-traditional types.
- the percentage of complementarity represents the percentage of residues in one nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with another nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 are complementary, then the percentage of complementarity is 50%, 60%, 70%, 80%, 90% and 100%).
- “Complete complementarity” means that all consecutive residues of one nucleic acid sequence form hydrogen bonds with the same number of consecutive residues in another nucleic acid sequence.
- Substantially complementary refers to a degree of complementarity of at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides, or to two nucleic acids that hybridize under stringent conditions.
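The percentage of complementarity defined above can be computed as in the following sketch. It counts only canonical Watson-Crick pairs (A-T, A-U, G-C), which is a simplifying assumption, since the definition also allows non-traditional pairing. Both strands are assumed to be written 5' to 3' and paired antiparallel; the function name is hypothetical.

```python
# Canonical Watson-Crick pairs for DNA and RNA; non-canonical pairs such as G-U
# are deliberately left out of this simplified sketch.
WATSON_CRICK = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def percent_complementarity(strand: str, other: str) -> float:
    """Percentage of residues in `strand` that pair with `other`.

    Both sequences are given 5'->3'; `other` is reversed so that the two
    strands are compared in antiparallel orientation.
    """
    if len(strand) != len(other):
        raise ValueError("sequences must have equal length")
    if not strand:
        raise ValueError("sequences are empty")
    pairs = zip(strand.upper(), reversed(other.upper()))
    paired = sum(1 for pair in pairs if pair in WATSON_CRICK)
    return 100.0 * paired / len(strand)


if __name__ == "__main__":
    # 9 of 10 positions pair, matching the "9 out of 10" case in the text.
    print(percent_complementarity("ATGGCATCGA", "TCGATGCCAA"))
```

A result at or above one of the listed thresholds (for example 60%) over a region of the listed length would correspond to "substantially complementary" in the sense used above.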
- stringent conditions in relation to hybridization refers to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes to the target sequence and substantially does not hybridize to non-target sequences. Stringent conditions are generally sequence-dependent and depend on many factors. In general, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence.
- Hybridization refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding of the bases between the nucleotide residues.
- the complex may comprise two strands forming a duplex, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these.
- a hybridization reaction may constitute a step in a broader process such as the initiation of PCR, or cleavage of a polynucleotide by an enzyme. Sequences that are capable of hybridizing to a given sequence are referred to as the "complement" of the given sequence.
- Hybridization of the target sequence with the gRNA means that at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% of the nucleic acid sequences of the target sequence and the gRNA can hybridize to form a complex; or represents that at least 12, 15, 16, 17, 18, 19, 20 or more bases of the nucleic acid sequences of the target sequence and the gRNA can complementarily pair and hybridize to form a complex.
- Nucleic acid expression includes one or more of generation of an RNA template from a DNA sequence (e.g., transcription), processing of the RNA transcript (e.g., by splicing, editing, 5' cap formation, and/or 3' end processing), translation of the RNA into a polypeptide or protein, or post-translational modification of the polypeptide or protein.
- Delivery refers to providing an entity (such as a drug) to a destination, for example, the components of the CRISPR-Cas system/composition of the present invention can be delivered in various forms, such as a combination of DNA/RNA or RNA/RNA or protein RNA.
- a gene editing protein can be delivered as a polynucleotide encoding DNA or a polynucleotide encoding RNA or as a protein.
- the present invention also provides a delivery system, which comprises the gene editing protein (Cas protein) or the fusion protein, or the polynucleotide, or the CRISPR-Cas composition.
- the delivery system further comprises a delivery vehicle, and the delivery vehicle comprises nanoparticles, liposomes, exosomes, microbubbles, a gene gun or an electroporation device.
- a method such as cell penetrating peptide (CPP) delivery is also adopted.
- a gene editing protein (Cas protein) and/or at least one guide RNA is coupled to one or more CPPs, so as to effectively transport the CPP coupled with a gene editing protein (Cas protein) and/or a guide RNA into a plant cell (e.g., in a protoplast).
- a CPP is a short peptide of less than 35 amino acids, which is derived from a protein or from a chimeric sequence, and can transport biomolecules across the cell membrane in a non-receptor-dependent manner.
- a CPP can be a cationic peptide, a peptide with a hydrophobic sequence, an amphipathic peptide, a peptide with proline-rich and antimicrobial sequences, or a chimeric or bipartite peptide.
- CPPs can penetrate biological membranes and thereby cause different biomolecules to move across the cell membrane into the cytoplasm, improve their intracellular routing, and thus promote the interaction between the biomolecules and their targets.
- CPPs include Tat (a nuclear transcription activating protein required for viral replication by HIV type 1), penetratin, Kaposi fibroblast growth factor (FGF) signal peptide sequence, integrin β3 signal peptide sequence, poly-arginine peptide Arg sequence, guanine-rich molecular transporter, sweet arrow peptide, etc.
- linker refers to a chemical group or molecule that connects two molecules or parts, such as two domains of a fusion protein, such as a gene editing protein (Cas protein) and a deaminase. In some connection modes, the linker is located between or flanking two groups, molecules or other parts, and connects the two by covalent bonds.
- the linker is a linear polypeptide formed by amino acids or multiple amino acid residues connected by peptide bonds.
- the linker is an organic molecule, a group, a polymer or a chemical moiety. The length and type of the linker can be designed as needed.
- the linker can be an artificially synthesized amino acid sequence or a naturally occurring peptide sequence.
- the present invention also provides a method for targeting and editing a target nucleic acid, the method comprising contacting the target nucleic acid with any one of the aforementioned CRISPR-Cas systems or compositions.
- the present invention also provides a method for non-specifically degrading single-stranded DNA after recognizing a target nucleic acid, the method comprising contacting the target nucleic acid with the aforementioned CRISPR-Cas composition.
- the present invention also provides a method for targeting a non-spacer complementary strand of a double-stranded target nucleic acid and causing a nick therein after recognizing a spacer complementary strand of the double-stranded target nucleic acid, the method comprising contacting the double-stranded target nucleic acid with the aforementioned CRISPR-Cas system or composition.
- the present invention also provides a method for targeting and cleaving a double-stranded target nucleic acid, the method comprising contacting the double-stranded target nucleic acid with the aforementioned CRISPR-Cas system or composition.
- the non-spacer sequence complementary strand of the double-stranded target nucleic acid is nicked before the spacer complementary strand of the double-stranded DNA is nicked.
- the present invention also provides a method for specifically editing a double-stranded nucleic acid, the method comprising contacting the following under sufficient conditions and for a sufficient amount of time,
- the method results in the formation of a double-strand break.
- the invention also provides a method of editing a double-stranded nucleic acid, the method comprising contacting the following under sufficient conditions and for a sufficient amount of time:
- the gene editing protein (Cas protein) of the fusion protein is modified to produce a nick in the non-target strand of the double-stranded nucleic acid.
- the two strands of the double-stranded nucleic acid are cleaved at different sites, resulting in staggered cleavage.
- both strands of the double-stranded nucleic acid are cleaved at the same site, resulting in a blunt double-strand break.
- the present invention also provides a method for targeting and cleaving a single-stranded target nucleic acid, the method comprising contacting the target nucleic acid with any of the aforementioned CRISPR-Cas compositions.
- the present invention also provides a method for inducing a change in a cell state, the method comprising contacting the aforementioned CRISPR-Cas composition with the target nucleic acid in a cell.
- the cell state comprises apoptosis or dormancy
- the cell comprises a eukaryotic cell or a prokaryotic cell
- the cell comprises a mammalian cell or a plant pathogenic cell
- the cell comprises a cancer cell
- the cell comprises an infectious cell or a cell infected by an infectious agent
- the cells include virus-infected cells, prion-infected cells;
- the cell comprises a fungal cell, a protozoan, or a parasite cell.
- the present invention also provides a method for detecting a target nucleic acid in a sample, the method comprising contacting the sample with the aforementioned gene editing protein (Cas protein), guide RNA and non-target sequence; detecting a detectable signal generated by the gene editing protein (Cas protein) cutting the non-target sequence, thereby detecting the target nucleic acid; the non-target sequence does not hybridize with the guide RNA.
- the present invention provides a kit, comprising the aforementioned gene editing protein (Cas protein), the aforementioned fusion protein, the aforementioned polynucleotide, the aforementioned CRISPR-Cas composition, and the use of the aforementioned host cell in preparing a kit, wherein the components of the kit are in the same or different containers.
- the present invention also provides a container, comprising the aforementioned kit.
- the container comprises a sterile container
- the container comprises a syringe.
- the kit also includes instructions for using the kit, such as instructions in more than one language.
- the kit may also include one or more reagents for use in the process of utilizing one or more of the above components.
- the reagents may be provided in any suitable container.
- the kit may provide one or more reaction or storage buffers.
- the above reagents may be provided in a form (e.g., in a concentrated or lyophilized form) requiring the addition of one or more other components before use;
- the buffer may be any buffer, including but not limited to sodium carbonate buffer, sodium bicarbonate buffer, borate buffer, Tris buffer, MOPS buffer, HEPES buffer, and combinations thereof.
- the buffer may have a suitable pH value, for example, may be alkaline. In some embodiments, the pH of the buffer is between about 7-10.
- “Treatment” refers to treating or curing a subject's condition, delaying the onset of symptoms of a condition, and/or delaying the severity of a condition.
- the term “subject” includes, but is not limited to, various animals, plants, and microorganisms. Animals include mammals, such as bovines, equines, ovines, swine, canines, felines, lagomorphs, rodents (e.g., mice or rats), non-human primates (e.g., macaques or cynomolgus monkeys), or humans. In certain embodiments, the subject (e.g., a human) suffers from a condition (e.g., a condition caused by a disease-related genetic defect).
- Plant is any differentiated multicellular organism capable of photosynthesis, including crop plants at any maturity or developmental stage.
- the present invention also provides the use of the aforementioned gene editing protein (Cas protein), the aforementioned fusion protein, the aforementioned polynucleotide, the aforementioned CRISPR-Cas composition, and the aforementioned host cell in the preparation of a drug for treating a condition or disease in a subject in need thereof.
- the use comprises administering the CRISPR-Cas composition to the subject or to an ex vivo cell of the subject;
- the spacer sequence is complementary to at least 15 nucleotides of the target nucleic acid associated with the condition or disease, and the Cas protein or the fusion protein cleaves the target nucleic acid;
- condition or disease comprises cancer or an infectious disease
- the cancer comprises one or more of Wilms tumor, Ewing sarcoma, neuroendocrine tumor, glioblastoma, neuroblastoma, melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, kidney cancer, pancreatic cancer, lung cancer, biliary tract cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid cancer, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphocytic leukemia, acute myeloid leukemia, chronic lymphocytic leukemia, chronic myeloid leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, or urinary bladder cancer;
- the condition or disease comprises one or more of cystic fibrosis, atherosclerotic cardiovascular disease (ASCVD), pseudohypertrophic muscular dystrophy, Becker muscular dystrophy, alpha-1-antitrypsin deficiency, Pompe disease, myotonic dystrophy, Huntington disease, fragile X syndrome, Friedreich's ataxia, amyotrophic lateral sclerosis, frontotemporal dementia, hereditary chronic kidney disease, hyperlipidemia, Leber's congenital amaurosis, sickle cell disease, primary hyperoxaluria (PH1), hypercholesterolemia (FH), hereditary angioedema (HAE), retinal disease, macular degeneration, transthyretin amyloidosis, or beta thalassemia;
- the infectious agent of the infectious disease includes one or more of human immunodeficiency virus, herpes simplex virus-1, hepatitis B or herpes simplex virus-2.
- the present invention discovers a new gene editing protein (Cas protein) for the first time.
- the gene editing protein (Cas protein) of the present invention has very good gene editing activity, can effectively edit or cut the target gene, and can effectively treat the symptoms or diseases of subjects in need (for example, one or more of cystic fibrosis, pseudohypertrophic muscular dystrophy, Becker muscular dystrophy, alpha-1-antitrypsin deficiency, Pompe disease, myotonic dystrophy, Huntington's disease, fragile X syndrome, Friedreich's ataxia, amyotrophic lateral sclerosis, frontotemporal dementia, hereditary chronic kidney disease, hyperlipidemia, Leber's congenital amaurosis, sickle cell disease, hypercholesterolemia, transthyretin amyloidosis or beta-thalassemia).
- the present invention has discovered a new Cas protein, which has a lower homology with the reported Cas enzymes and exhibits excellent DNA nuclease activity compared to the Cas enzymes in the prior art, and has broad application prospects.
- CasW1 was identified as a member of the Cas12 family.
- the CRISPR-Cas effector protein of the present invention was compared with existing effector proteins, and it was found that it had a low similarity with known Cas proteins.
- the PAM corresponding to the Cas protein obtained in the present invention is 5’-TTTN, where N represents A/T/G/C.
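As an illustration of how a 5'-TTTN PAM constrains target selection, the sketch below scans one DNA strand for TTTN motifs and reports the adjacent candidate protospacer. It assumes, as is typical for Cas12-family effectors but not stated in the text for CasW1, that the protospacer lies immediately 3' of the PAM on the scanned strand. The function name and the default 20-nt length (taken from the spacer embodiment) are illustrative choices, and the opposite strand is not scanned.

```python
import re


def find_candidate_sites(dna: str, spacer_len: int = 20):
    """Return (position, PAM, protospacer) for each 5'-TTTN PAM on the given strand.

    Assumption for this sketch: the protospacer is the `spacer_len` nucleotides
    immediately 3' of the PAM. A lookahead is used so overlapping sites are found.
    """
    dna = dna.upper()
    pattern = re.compile(rf"(?=(TTT[ACGT])([ACGT]{{{spacer_len}}}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


if __name__ == "__main__":
    example = "GG" + "TTTA" + "ACGTACGTACGTACGTACGT" + "CC"
    for position, pam, protospacer in find_candidate_sites(example):
        print(position, pam, protospacer)
```

To cover both strands, the same function could be run on the reverse complement of the input sequence.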
- the CRISPR locus of the sample containing CasW1 was annotated by PILER-CR, and the corresponding DNA encoding the direct repeat (DR) sequence was obtained as shown in SEQ ID No. 3, and its secondary structure schematic diagram is shown in Figure 10.
- Example 2 In vitro enzyme cleavage experiments verified the cleavage activity of CasW1.
- nucleotide sequence fragment encoding CasW1 was synthesized and cloned into the prokaryotic protein expression vector pET-28a(+) (Biolabio, #QN1060) by restriction endonuclease digestion and T4 DNA ligase ligation to obtain a ligation product, as shown in Figure 6.
- the inventors amplified the obtained CasW1 fragment by PCR and digested it with EcoRI/NotI; the prokaryotic protein expression vector pET-28a(+) was double-digested with EcoRI/NotI in parallel. Both double-digestion products were then run on an agarose gel. The results, shown in Figure 1, indicate that a CasW1 nucleotide fragment of the correct size and the pET-28a(+) vector fragment (without the 11 bp lying between the EcoRI and NotI sites) were obtained.
- the inventors then transformed the ligation product into competent E. coli DH5α cells and plated them on a solid LB plate containing kanamycin. After overnight inverted culture in a 37°C incubator, single colonies were picked for Sanger sequencing. Plasmid was extracted from a clone with the correct sequence, yielding the pET-28a(+)-CasW1 expression vector.
- Transformation: Take a tube of competent Escherichia coli BL21 (DE3) (Shanghai Weidi Biotechnology Co., Ltd., EC1002) from the -80°C freezer and thaw it on ice for 5 minutes, then pipette 1 ng of pET-28a(+)-CasW1 expression vector plasmid into the competent cells, flick the bottom of the tube to mix, and let it stand on ice for 25 minutes. Heat shock in a 45°C water bath for 45 seconds, quickly return the tube to ice for 2 minutes, add 700 µl of antibiotic-free LB medium, and recover at 37°C, 220 rpm for 60 minutes. After recovery, centrifuge at 5000 rpm for 1 minute to collect the bacteria, discard 600 µl of the supernatant, then gently resuspend the cells in the remaining liquid and spread them on an LB plate using glass beads.
- Sonication: Fix a 50 ml centrifuge tube vertically in a beaker filled with ice water, and adjust its position so that the ultrasonic probe is below the surface of the bacterial suspension.
- Sonication program: 3 seconds on, 12 seconds off, 200 W, 60 cycles. After sonication, add 150 µl of PMSF protease inhibitor, and then centrifuge at 11000 rpm and 4°C for 25 minutes.
- Bead pretreatment: Pipette 300 µl of Ni-NTA Agarose beads (QIAGEN, #30230) into a 15 ml centrifuge tube, add 10 ml PBS, rotate at room temperature for 5 min, centrifuge at 1000 × g for 2 min at 4 °C, remove the supernatant with a rubber pipette, and repeat the PBS wash. Then add 10 ml of 10 mM imidazole, rotate at room temperature for 5 min, centrifuge at 1000 × g for 2 min at 4 °C, carefully remove the supernatant with a rubber pipette, and keep the 15 ml centrifuge tube containing the washed beads on ice for later use.
- Ni-NTA Agarose beads were eluted twice with 10 ml of 40 mM imidazole. After each addition of imidazole, rotate at 4°C for 5 min, and then centrifuge at 1000 × g for 2 min at 4°C. Resuspend the Ni-NTA Agarose beads in 500 µl of 250 mM imidazole, transfer to a pre-cooled affinity chromatography column (MedChemExpress, #HY-K0221), equilibrate for 5 min, and collect the protein fraction eluted with 250 mM imidazole in a 1.5 ml centrifuge tube. Repeat the above elution steps three times.
- hHPRT1-dsDNA-R gtcaagggcatatcctacaa (SEQ ID NO. 6).
- Taq enzyme was used for PCR amplification, and the reaction system was as follows:
- genomic DNA as template (total amount 100 ng), 10 µl of 2× Taq PCR mix, 0.5 µl each of the upstream and downstream primers, and ddH2O added to a total volume of 20 µl.
- the PCR reaction program was as follows: 95°C for 5 min; 35 cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 20 s; 72°C for 10 min; hold at 12°C.
- PCR reaction solution was then subjected to agarose gel electrophoresis, and the electrophoresis results are shown in Figure 3.
- an agarose gel DNA recovery kit (TIANGEN, DP219-02) was used for gel recovery, and finally, enzyme-free water was used for elution to obtain an in vitro cleaved dsDNA template.
- the inventors designed two groups of comparative experiments to compare their cutting activities.
- the first group was a comparison of the cutting activities of CasW1 and LbCpf1
- the second group was a comparison of the cutting activities of CasW1 with HED Cas12i.16 and S7R-Cas12i.3.
- amino acid sequence of LbCpf1 protein is shown in SEQ ID NO.12, and the nucleotide coding sequence is shown in SEQ ID NO.11; the amino acid sequence of HED Cas12i.16 protein is shown in SEQ ID NO.10, and the nucleotide sequence is shown in SEQ ID NO.9; the amino acid sequence of S7R-Cas12i.3 is shown in SEQ ID NO.8, and the nucleotide sequence is shown in SEQ ID NO.7.
- step 1 in Example 2 was used to construct LbCpf1, HED Cas12i.16, and S7R-Cas12i.3 expression vectors, respectively.
- the maps of the recombinant expression vectors are shown in Figures 7, 8, and 9.
- a targeting sequence was designed based on the hHPRT1 gene and named hHPRT1-spacer: GGTTAAAGATGGTTAAATGAT (SEQ ID NO.4).
- the crRNA sequences of the above Cas proteins were designed and named CasW1-hHPRT1-crRNA and LbCpf1-hHPRT1-crRNA, respectively, as follows:
- GTCTAAATGACCTATAAATTTCTACTATGTGTAGAT GGTTAAAGATGGTTAAATGAT (SEQ ID NO. 13), wherein the underlined sequence is the DR sequence of CasW1;
- TAATTTCTACTAAGTGTAGAT GGTTAAAGATGGTTAAATGAT (SEQ ID NO. 14), wherein the underlined sequence is the DR sequence of LbCpf1;
- CasW1-hHPRT1-crRNA and LbCpf1-hHPRT1-crRNA sequence fragments were chemically synthesized (by Nanjing GenScript Biotechnology Co., Ltd.) respectively. Then, a mixed solution of CasW1 and CasW1-hHPRT1-crRNA was prepared, and the components of the mixed solution were as follows:
- the mixed solution of LbCpf1 and LbCpf1-hHPRT1-crRNA was prepared in the same manner.
- the dsDNA cleavage products were then analyzed using a portable bioanalyzer (Houze Biotechnology, Qsep1), and the enzyme cleavage effect was detected by referring to the S1 high-resolution cartridge (Houze Biotechnology, C105102) detection scheme.
- the analysis results are shown in Figure 4.
- the Smear analysis option provided by Qsep1 showed that the cleavage activity of CasW1 was 99.5%, and the cleavage activity of LbCpf1 was 95%.
- the cleavage activity of CasW1 was higher than that of LbCpf1.
- the hHPRT1-spacer sequence fragment was synthesized, and the crRNA sequences of the above Cas proteins were designed according to the DR sequences of CasW1, HED Cas12i.16 and S7R-Cas12i.3, and named CasW1-hHPRT1-crRNA, HED Cas12i.16-hHPRT1-crRNA and S7R-Cas12i.3-hHPRT1-crRNA, respectively.
- the specific sequences are as follows:
- GTCTAAATGACCTATAAATTTCTACTATGTGTAGAT GGTTAAAGATGGTTAAATGAT (SEQ ID NO. 13), wherein the underlined sequence is the DR sequence of CasW1;
- CasW1-hHPRT1-crRNA, HED Cas12i.16-hHPRT1-crRNA and S7R-Cas12i.3-hHPRT1-crRNA sequence fragments were chemically synthesized (by Nanjing GenScript Biotech Co., Ltd.), respectively.
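The crRNAs listed above are each a direct repeat (DR) followed by the hHPRT1 spacer. The following is a minimal sketch of that assembly, assuming the part printed before the space in SEQ ID NO. 13 and SEQ ID NO. 14 is the DR (the underlining is lost in this text, but the trailing part matches the spacer of SEQ ID NO. 4). The function name `build_crrna` and its `as_rna` option are illustrative and not taken from the patent.

```python
# Sequences copied from the text, written 5'->3' in the DNA alphabet.
CASW1_DR = "GTCTAAATGACCTATAAATTTCTACTATGTGTAGAT"
LBCPF1_DR = "TAATTTCTACTAAGTGTAGAT"
HPRT1_SPACER = "GGTTAAAGATGGTTAAATGAT"  # hHPRT1-spacer, SEQ ID NO. 4

# The two crRNAs exactly as printed (DR, a space, then the spacer).
SEQ13_AS_PRINTED = "GTCTAAATGACCTATAAATTTCTACTATGTGTAGAT GGTTAAAGATGGTTAAATGAT"
SEQ14_AS_PRINTED = "TAATTTCTACTAAGTGTAGAT GGTTAAAGATGGTTAAATGAT"


def build_crrna(direct_repeat: str, spacer: str, as_rna: bool = False) -> str:
    """Join DR and spacer 5'->3'; optionally rewrite T as U for an RNA string."""
    seq = (direct_repeat + spacer).upper()
    return seq.replace("T", "U") if as_rna else seq


casw1_crrna = build_crrna(CASW1_DR, HPRT1_SPACER)
lbcpf1_crrna = build_crrna(LBCPF1_DR, HPRT1_SPACER)

# Consistency check against the sequences as printed in the text.
casw1_matches = casw1_crrna == SEQ13_AS_PRINTED.replace(" ", "")
lbcpf1_matches = lbcpf1_crrna == SEQ14_AS_PRINTED.replace(" ", "")
```

Keeping the DR and spacer as separate constants makes it straightforward to swap in another spacer while reusing the same DR, which is how the two comparison groups in the text share one targeting sequence.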
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Engineering & Computer Science (AREA)
- Organic Chemistry (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medicinal Chemistry (AREA)
- Pharmacology & Pharmacy (AREA)
- Animal Behavior & Ethology (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Public Health (AREA)
- Veterinary Medicine (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Biomedical Technology (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- General Engineering & Computer Science (AREA)
- Chemical Kinetics & Catalysis (AREA)
- General Chemical & Material Sciences (AREA)
- Microbiology (AREA)
- Biochemistry (AREA)
- Physics & Mathematics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Biophysics (AREA)
- Immunology (AREA)
- Plant Pathology (AREA)
- Epidemiology (AREA)
- Oncology (AREA)
- Neurosurgery (AREA)
- Hematology (AREA)
- Neurology (AREA)
- Communicable Diseases (AREA)
- Gastroenterology & Hepatology (AREA)
- Analytical Chemistry (AREA)
- Crystallography & Structural Chemistry (AREA)
- Ophthalmology & Optometry (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
Abstract
Description
The present invention relates to the field of gene editing, and in particular to a gene editing protein, its corresponding gene editing system, and uses thereof.
Clustered regularly interspaced short palindromic repeats (CRISPR) systems evolved in bacteria and archaea as a defense against the DNA of invading phages. CRISPR systems fall into two classes: Class 1 systems are further divided into types I, III, and IV, and Class 2 systems into types II, V, and VI. These six types are further subdivided into 19 subtypes. Many prokaryotes contain multiple CRISPR-Cas systems, suggesting that the systems are compatible and may share components.
The most common type II system is CRISPR/Cas9. With the assistance of a trans-encoded small RNA (tracrRNA), the Cas9 protein processes pre-crRNA into mature crRNA bound to tracrRNA. It was later found that an artificially constructed single-stranded chimeric guide RNA (gRNA) that mimics the crRNA-tracrRNA complex can effectively direct the Cas9 protein to recognize and cleave its target. The three bases immediately adjacent to the 3′ end of the target must have the form 5′-NGG-3′, constituting the protospacer adjacent motif (PAM) that the Cas/crRNA complex requires to recognize the target.
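The PAM rule described above can be sketched as a simple scan. This is an illustrative example only: it assumes a 20-nt protospacer (a length not stated in this passage), searches only the strand it is given, and the function name `find_ngg_targets` is hypothetical.

```python
import re


def find_ngg_targets(dna: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for every site on the given strand
    where a 5'-NGG-3' PAM immediately follows the 3' end of the protospacer."""
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Toy input (not a sequence from the patent): 20 nt followed by a TGG PAM.
toy_hits = find_ngg_targets("A" * 20 + "TGG")
```

A complete search would also scan the reverse-complement strand, and a nuclease with a 5′ PAM would need the motif placed before the protospacer rather than after it.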
Currently known CRISPR/Cas systems each have their own advantages and disadvantages. For example, Cas9 requires two RNAs as guide RNAs.
There is a pressing need for alternative, robust editing systems and techniques that target nucleic acids or polynucleotides and have broad application value.
Therefore, the development of biotechnology still calls for new CRISPR/Cas systems with diverse characteristics.
Summary of the invention
The main purpose of the present invention is to provide a new CRISPR/Cas system with diverse characteristics.
Another object of the present invention is to discover new CRISPR-Cas systems and to provide alternative, robust systems and techniques for targeting nucleic acids or polynucleotides, so as to address the shortcomings of currently known CRISPR-Cas systems.
The first aspect of the present invention provides a gene editing protein, wherein the protein is selected from the following group:
(a) a polypeptide having the amino acid sequence shown in SEQ ID NO: 1;
(b) a polypeptide having ≥80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 99.5% homology (or identity) with the amino acid sequence shown in SEQ ID NO: 1, wherein the polypeptide has the biological function of SEQ ID NO: 1;
(c) a derivative polypeptide formed by substitution, deletion or addition of one or more (preferably 1-20, more preferably 1-10, still more preferably 1-5) amino acid residues in the amino acid sequence shown in SEQ ID NO: 1, and retaining the biological function of SEQ ID NO: 1.
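The identity thresholds in item (b) presuppose some way of scoring identity between two sequences. The following is a minimal sketch, assuming the two sequences have already been aligned (gaps written as `-`) and that identity is counted as matching columns divided by alignment columns; the patent does not state which definition or alignment tool is intended, and the names `percent_identity` and `meets_threshold` are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Columns where both sequences carry a gap are ignored; a column counts
    as a match only when both residues are identical and not a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no usable columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True when the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy peptides (not SEQ ID NO: 1): one mismatch in ten aligned positions.
toy_identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQA")
```

Other conventions divide by the length of the shorter sequence or exclude gap columns entirely, so the same pair can score differently depending on the definition chosen.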
In another preferred embodiment, the gene editing protein is an effector protein in the CRISPR/Cas system.
The second aspect of the present invention provides a fusion protein comprising the gene editing protein described in the first aspect of the present invention, and one or more functional domains.
In another preferred embodiment, the functional domain is selected from a localization signal, a reporter protein, a Cas protein targeting portion, a DNA binding domain, an epitope tag, a transcription activation domain, a transcription repression domain, a nuclease, a deamination domain, a methylase, a demethylase, a transcription release factor, an HDAC, a cleavage active polypeptide, a ligase, an integrase, a transposase, a recombinase, a polymerase, and a base excision repair inhibitor (such as a uracil-DNA glycosylase inhibitor (UGI)).
In another preferred embodiment, the functional domain has one or more of the following enzymatic activities on the target sequence: methylase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylation activity, deSUMOylation activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase) and deglycosylation activity.
In another preferred embodiment, the functional domain is selected from an adenosine deaminase catalytic domain or a cytidine deaminase catalytic domain.
In another preferred embodiment, the adenosine deaminase catalytic domain or the cytidine deaminase catalytic domain includes one or more of ADAR1, ADAR2, APOBEC, AID or TAD.
In another preferred embodiment, the functional domain is full-length TadA8e or a functional fragment thereof.
In another preferred embodiment, the localization signal includes a nuclear localization signal (NLS) and/or a nuclear export signal (NES).
In another preferred embodiment, the sequence of the nuclear localization signal is located at or near a terminus (e.g., the N-terminus or C-terminus) of the protein according to claim 1.
In another preferred embodiment, the nuclear export signal includes protein tyrosine kinase 2 (such as human protein tyrosine kinase 2).
In another preferred embodiment, the reporter protein includes glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), β-galactosidase, β-glucuronidase, and autofluorescent protein.
In another preferred embodiment, the autofluorescent protein includes green fluorescent protein (e.g., GFP, GFP-2, tagGFP, turboGFP, eGFP, CopGFP, AceGFP, etc.), HcRed, DsRed, cyan fluorescent protein (e.g., eCFP, Cerulean, CyPet, AmCyan1, etc.), yellow fluorescent protein (e.g., YFP, eYFP, Citrine, Venus, YPet, PhiYFP, etc.), and blue fluorescent protein (e.g., eBFP, eBFP2, Azurite, mKalama1, GFPuv, Sapphire, T-sapphire).
In another preferred embodiment, the DNA binding domain includes a methylation binding protein, LexA DBD, and Gal4 DBD.
In another preferred embodiment, the epitope tag includes a histidine tag, a V5 tag, a FLAG tag, an influenza virus hemagglutinin tag, a Myc tag, a VSV-G tag, a thioredoxin tag, and a streptavidin tag.
In another preferred embodiment, the transcriptional activation domain includes VP64 and/or VPR.
In another preferred embodiment, the transcriptional repression domain includes KRAB and/or SID.
In another preferred embodiment, the nuclease includes FokI.
In another preferred embodiment, the cleavage active polypeptide includes a polypeptide having single-stranded RNA cleavage activity, a polypeptide having double-stranded RNA cleavage activity, a polypeptide having single-stranded DNA cleavage activity or a polypeptide having double-stranded DNA cleavage activity.
In another preferred embodiment, the ligase includes DNA ligase and/or RNA ligase.
In another preferred embodiment, the functional domain is connected to the N-terminus and/or C-terminus of the gene editing protein.
In another preferred embodiment, the functional domain is inserted between the N-terminus and the C-terminus of the gene editing protein.
In another preferred embodiment, the one or more functional domains are optionally connected to the N-terminus and/or C-terminus of the gene editing protein via a linker.
In another preferred embodiment, the functional domain is inserted between the N-terminus and the C-terminus of the gene editing protein via a linker.
In another preferred embodiment, the fusion protein has the following structure from N-terminus to C-terminus:
Z1-Z2 (I); or
Z2-Z1 (II); or
Z3-Z1-Z4 (III);
wherein Z1 is cytosine deaminase or adenosine deaminase;
Z2 is the gene editing protein described in the first aspect of the present invention;
Z3 is the N-terminal fragment of the gene editing protein described in the first aspect of the present invention;
Z4 is the C-terminal fragment of the gene editing protein described in the first aspect of the present invention;
and each "-" is independently a bond or a linker.
The third aspect of the present invention provides an isolated polynucleotide, which encodes the gene editing protein described in the first aspect of the present invention or the fusion protein described in the second aspect of the present invention.
In another preferred embodiment, the polynucleotide is selected from the following group:
(a) a polynucleotide whose sequence is shown in SEQ ID NO. 2;
(b) a polynucleotide whose nucleotide sequence has ≥70% homology (preferably ≥80%, more preferably ≥90%, still more preferably ≥95%, and most preferably ≥99%) to the sequence shown in SEQ ID NO. 2 and which encodes the polypeptide shown in SEQ ID NO. 1;
(c) a polynucleotide complementary to the polynucleotide described in any one of (a) to (b).
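Item (c) refers to the complementary polynucleotide. As a minimal sketch, the complementary strand written 5'->3' is the reverse complement of the input; the example below assumes plain A/C/G/T (with U treated like T on input) and no ambiguity codes, and it uses the hHPRT1-spacer from the examples only as a convenient short input. The function name `reverse_complement` is illustrative.

```python
# Complement table for DNA; U is accepted on input and paired with A.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, written 5'->3' as DNA."""
    return seq.translate(_COMPLEMENT)[::-1]


HPRT1_SPACER = "GGTTAAAGATGGTTAAATGAT"  # hHPRT1-spacer, SEQ ID NO. 4
spacer_rc = reverse_complement(HPRT1_SPACER)
```

Applying the function twice to a DNA string returns the original sequence, which is a quick sanity check on any complement table.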
In another preferred embodiment, the polynucleotide additionally contains, flanking the ORF of the variant, an auxiliary element selected from the following group: a signal peptide, a secretory peptide, a tag sequence (such as 6His), or a combination thereof.
In another preferred embodiment, the polynucleotide is selected from the following group: a genomic sequence, a cDNA sequence, an RNA sequence, or a combination thereof.
In another preferred embodiment, the polynucleotide further comprises a promoter operably linked to the ORF sequence of the variant.
In another preferred embodiment, the promoter is selected from the following group: a constitutive promoter, a tissue-specific promoter, an inducible promoter, or a strong promoter.
In another preferred embodiment, the host cell includes a prokaryotic cell or a eukaryotic cell.
In another preferred embodiment, the host cell is a eukaryotic cell, such as a yeast cell, a plant cell or a mammalian cell (including human and non-human mammals).
In another preferred embodiment, the host cell is a prokaryotic cell, such as Escherichia coli.
In another preferred embodiment, the yeast cell is selected from yeast of one or more of the following sources: Pichia pastoris, Kluyveromyces, or a combination thereof; preferably, the yeast cell includes Kluyveromyces, more preferably Kluyveromyces marxianus and/or Kluyveromyces lactis.
In another preferred embodiment, the host cell is selected from the following group: Escherichia coli, wheat germ cells, insect cells, SF9, HeLa, HEK293, CHO, yeast cells, or a combination thereof.
The fourth aspect of the present invention provides an isolated nucleic acid molecule comprising or consisting of a sequence selected from the following:
(i) the sequence shown in SEQ ID NO: 3;
(ii) a sequence having one or more base substitutions, deletions or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions or additions) compared to the sequence shown in SEQ ID NO: 3;
(iii) a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in SEQ ID NO: 3;
(iv) a sequence that hybridizes under stringent conditions to the sequence described in any one of (i) to (iii); or
(v) a complementary sequence of the sequence described in any one of (i) to (iii);
and the sequence described in any one of (ii) to (v) substantially retains the biological function of the sequence from which it is derived;
for example, the isolated nucleic acid molecule is RNA;
for example, the isolated nucleic acid molecule comprises a direct repeat sequence in a CRISPR/Cas system.
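Item (ii) above counts base substitutions, deletions or additions relative to a reference sequence. One way to make such a count concrete is the Levenshtein edit distance, which gives the minimum number of single-base changes separating two sequences. This is an assumption made for illustration: the patent does not say how the changes are to be counted, and the names `edit_distance` and `within_variant_range` and the toy sequences are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-base substitutions, deletions or additions
    needed to turn `a` into `b` (Levenshtein distance, row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        curr = [i]
        for j, base_b in enumerate(b, start=1):
            cost = 0 if base_a == base_b else 1
            curr.append(min(prev[j] + 1,          # delete base_a
                            curr[j - 1] + 1,      # add base_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def within_variant_range(reference: str, candidate: str, max_changes: int = 10) -> bool:
    """True when the candidate differs from the reference by 1..max_changes edits."""
    return 1 <= edit_distance(reference, candidate) <= max_changes


# Toy sequences (not SEQ ID NO: 3): a single substitution.
toy_distance = edit_distance("ACGT", "AGGT")
```

Because the distance is a minimum, a sequence produced by many compensating edits can still score low; a threshold on this number is therefore only a coarse filter, and function would still have to be confirmed experimentally.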
In another preferred embodiment, the nucleic acid molecule comprises one or more stem loops or optimized secondary structures;
for example, the sequence described in any one of (ii) to (v) retains the secondary structure of the sequence from which it is derived.
In another preferred embodiment, the nucleic acid molecule comprises or consists of a sequence selected from the following:
(a) the nucleotide sequence shown in SEQ ID NO: 3;
(b) a sequence that hybridizes under stringent conditions to the sequence described in (a); or
(c) the complementary sequence of the nucleotide sequence shown in SEQ ID NO: 3.
The fifth aspect of the present invention provides a guide RNA (gRNA), which includes a direct repeat (DR) sequence capable of binding to the gene editing protein described in the first aspect of the present invention and a spacer sequence capable of targeting a target sequence.
The sixth aspect of the present invention provides a complex comprising:
(i) a protein component selected from the group consisting of the gene editing protein described in the first aspect of the present invention, the fusion protein described in the second aspect of the present invention, or a combination thereof; and
(ii) a nucleic acid component selected from the group consisting of the guide RNA described in the fifth aspect of the present invention, a nucleic acid encoding that guide RNA, a precursor RNA of that guide RNA, a nucleic acid encoding that precursor RNA, or a combination thereof;
wherein the protein component and the nucleic acid component bind to each other to form the complex.
In another preferred embodiment, the direct repeat (DR) sequence in the guide RNA (gRNA) is connected to the 3’ end or 5’ end of the nucleic acid molecule.
In another preferred embodiment, the spacer sequence in the guide RNA (gRNA) comprises a sequence complementary to the target sequence.
The seventh aspect of the present invention provides a vector comprising the polynucleotide described in the third aspect of the present invention or the nucleic acid molecule described in the fourth aspect of the present invention.
In another preferred embodiment, the vector comprises:
(1) a first regulatory element, which is operably linked to a nucleotide sequence encoding the gene editing protein described in the first aspect of the present invention or a nucleotide sequence encoding the fusion protein described in the second aspect of the present invention; and
(2) a second regulatory element, which is operably linked to a nucleotide sequence encoding a guide RNA, the guide RNA comprising:
(a) a spacer sequence capable of hybridizing to the target sequence, and
(b) a direct repeat (DR) sequence connected to the spacer sequence and capable of guiding the gene editing protein described in the first aspect of the present invention to bind to the guide RNA, so as to form the complex described in the sixth aspect of the present invention that targets the target sequence.
In another preferred embodiment, the first regulatory element and the second regulatory element are located on the same vector or on different vectors.
In another preferred embodiment, the first regulatory element and/or the second regulatory element is a promoter, such as an inducible promoter.
In another preferred embodiment, the vector comprises one or more promoters, which are operably linked to the nucleic acid sequence, an enhancer, a transcription termination signal, a polyadenylation sequence, a replication origin, a selectable marker, a nucleic acid restriction site, and/or a homologous recombination site.
In another preferred embodiment, the vector includes a plasmid or a viral vector.
In another preferred embodiment, the viral vector is selected from the following group: adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, herpes virus, SV40, poxvirus, or a combination thereof.
In another preferred embodiment, the vector includes a cloning vector, a transformation vector, an expression vector, a shuttle vector, an integration vector, and a multifunctional vector.
The eighth aspect of the present invention provides a CRISPR-Cas composition, comprising:
(i) a first component selected from the group consisting of the gene editing protein described in the first aspect of the present invention, the fusion protein described in the second aspect of the present invention, a nucleotide sequence encoding the gene editing protein described in the first aspect of the present invention or the fusion protein described in the second aspect of the present invention, and any combination thereof; and
(ii) a second component, which comprises one or more guide RNAs according to the fifth aspect of the present invention, or a nucleotide sequence encoding the one or more guide RNAs according to the fifth aspect of the present invention;
wherein the guide RNA is capable of forming a complex with the protein, protein variant or fusion protein described in (i).
In another preferred embodiment, the guide RNA comprises, in the 5’ to 3’ direction, a direct repeat sequence and a spacer sequence, and the spacer sequence is capable of hybridizing with the target sequence.
In another preferred embodiment, the direct repeat sequence is the nucleic acid molecule defined in the fourth aspect of the present invention.
In another preferred embodiment, the composition further comprises a pharmaceutically acceptable carrier.
In another preferred embodiment, the composition comprises a pharmaceutical composition.
In another preferred embodiment, the dosage form of the composition is selected from the following group: a lyophilized preparation, a liquid preparation, or a combination thereof.
In another preferred embodiment, the composition is in the form of a liquid preparation.
In another preferred embodiment, the composition is in the form of an injection.
In another preferred embodiment, the composition is a cell preparation.
The ninth aspect of the present invention provides a CRISPR-Cas system comprising one or more vectors, wherein the one or more vectors comprise:
(i) a first nucleic acid, which is a nucleotide sequence encoding the gene editing protein described in the first aspect of the present invention or the fusion protein described in the second aspect of the present invention; optionally, the first nucleic acid is operably linked to a first regulatory element; and
(ii) a second nucleic acid, which is a nucleotide sequence encoding the guide RNA described in the fifth aspect of the present invention; optionally, the second nucleic acid is operably linked to a second regulatory element;
wherein:
the first nucleic acid and the second nucleic acid are present on the same vector or on different vectors;
the guide RNA is capable of forming a complex with the protein or fusion protein described in (i).
In another preferred embodiment, the vector includes a plasmid or a viral vector.
In another preferred embodiment, the guide RNA comprises a spacer sequence capable of hybridizing with the target sequence, and a direct repeat (DR) sequence that is linked to the spacer sequence and is capable of directing the protein to bind to the guide RNA, thereby forming a CRISPR-Cas composition or complex that targets the target sequence.
In another preferred embodiment, the guide RNA includes unmodified and modified guide RNAs.
In another preferred embodiment, the modified guide RNA includes chemical modification of bases.
In another preferred embodiment, the chemical modification includes methylation, methoxy modification, fluorination or thio modification.
In another preferred embodiment, the direct repeat sequence is the nucleic acid molecule defined in claim 4.
In another preferred embodiment, the first regulatory element and/or the second regulatory element is a promoter, such as an inducible promoter.
In another preferred embodiment, at least one component of the composition is non-naturally occurring or modified.
In another preferred embodiment, the spacer sequence is linked to the 3' end of the direct repeat (DR) sequence.
In another preferred embodiment, the spacer sequence comprises a sequence complementary to the target sequence.
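The 5'-DR-spacer-3' arrangement described above can be illustrated with a minimal Python sketch. It assumes the target sequence is supplied as the DNA strand that the spacer hybridizes with, written 5' to 3', so the spacer is its reverse complement in RNA. The function names and the direct repeat used in the usage note are hypothetical placeholders; the actual direct repeat is the nucleic acid molecule defined in the fourth aspect and is not reproduced here.

```python
# Illustrative sketch only: names and sequences are placeholders, not the
# sequences disclosed in the application.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def spacer_from_target(target_dna: str) -> str:
    """Return an RNA spacer (5'->3') complementary to target_dna (5'->3')."""
    target = target_dna.upper()
    if set(target) - set("ACGT"):
        raise ValueError("target must contain only A, C, G, T")
    # Reverse complement of the DNA strand, then write it as RNA (T -> U).
    reverse_complement = target.translate(_DNA_COMPLEMENT)[::-1]
    return reverse_complement.replace("T", "U")


def assemble_guide(direct_repeat_rna: str, spacer_rna: str) -> str:
    """Join direct repeat and spacer so the spacer sits at the DR's 3' end."""
    return direct_repeat_rna.upper() + spacer_rna.upper()
```

For example, `assemble_guide("ACGUACGUACGU", spacer_from_target("ATGCATGC"))` places the spacer directly after the made-up 12-nt repeat `ACGUACGUACGU`.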
In another preferred embodiment, when the target sequence is DNA, the target sequence is located at the 3' end of a protospacer adjacent motif (PAM), and the PAM has the sequence 5'-TTTN, where N is A, T, C or G.
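As a rough illustration of the PAM requirement, the following Python sketch scans one DNA strand for 5'-TTTN-3' motifs and returns the sequence immediately 3' of each motif as a candidate target. The function name, the single-strand scan, and the default candidate length of 20 nt are assumptions made for illustration; the text above does not specify a target length.

```python
import re


def find_pam_sites(strand: str, target_len: int = 20):
    """List (pam_start, pam, candidate) for each 5'-TTTN-3' PAM on `strand`.

    `candidate` is the `target_len` bases immediately 3' of the PAM.
    PAMs too close to the 3' end to leave a full-length candidate are skipped.
    """
    seq = strand.upper()
    sites = []
    # A lookahead is used so that overlapping PAMs (e.g. within TTTTA) are all found.
    for match in re.finditer(r"(?=(TTT[ACGT]))", seq):
        pam_start = match.start()
        target_start = pam_start + 4  # the PAM is four bases long
        candidate = seq[target_start:target_start + target_len]
        if len(candidate) == target_len:
            sites.append((pam_start, match.group(1), candidate))
    return sites
```

Scanning the reverse-complement strand as well would be needed to cover both orientations of a double-stranded target; that step is omitted here for brevity.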
In another preferred embodiment, the target sequence is DNA from a prokaryotic cell or a eukaryotic cell, or a DNA sequence formed by reverse transcription of RNA; alternatively, the target sequence is non-naturally occurring DNA, or a DNA sequence formed by reverse transcription of RNA.
In another preferred embodiment, the target sequence includes a cDNA sequence.
In another preferred embodiment, the target sequence includes single-stranded DNA and double-stranded DNA sequences.
In another preferred embodiment, the target sequence is present inside a cell.
In another preferred embodiment, the target sequence is present in the cell nucleus or in the cytoplasm (e.g., in an organelle).
In another preferred embodiment, the cell is a eukaryotic cell.
In another preferred embodiment, the cell is a prokaryotic cell.
In another preferred embodiment, the target sequence is present outside a cell.
In another preferred embodiment, the gene editing protein described in the first aspect of the present invention is linked to one or more NLS sequences, or the fusion protein comprises one or more NLS sequences.
In another preferred embodiment, the NLS sequence is linked to the N-terminus or C-terminus of the gene editing protein described in the first aspect of the present invention.
In another preferred embodiment, the NLS sequence is fused to the N-terminus or C-terminus of the gene editing protein described in the first aspect of the present invention.
The tenth aspect of the present invention provides a kit comprising one or more components selected from the following: the gene editing protein described in the first aspect of the present invention, the fusion protein described in the second aspect of the present invention, the polynucleotide described in the third aspect of the present invention, the complex described in the sixth aspect of the present invention, the vector described in the seventh aspect of the present invention, the CRISPR-Cas composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention.
In another preferred embodiment, the kit further comprises a label or instructions.
In another preferred embodiment, the kit is used for one or more of the following: gene or genome editing, disease treatment, targeting a target gene, and cutting a target gene or a non-target gene.
The eleventh aspect of the present invention provides a delivery composition comprising a delivery vehicle and one or more selected from the following: the gene editing protein described in the first aspect of the present invention, the fusion protein described in the second aspect of the present invention, the polynucleotide described in the third aspect of the present invention, the complex described in the sixth aspect of the present invention, the vector described in the seventh aspect of the present invention, the CRISPR-Cas composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention.
In another preferred embodiment, the delivery vehicle is a particle.
In another preferred embodiment, the delivery vehicle is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes, microvesicles, gene guns or viral vectors (e.g., replication-defective retroviruses, lentiviruses, adenoviruses or adeno-associated viruses).
The twelfth aspect of the present invention provides a host cell comprising the gene editing protein described in the first aspect of the present invention, the fusion protein described in the second aspect of the present invention, the polynucleotide described in the third aspect of the present invention, the nucleic acid molecule described in the fourth aspect of the present invention, the complex described in the sixth aspect of the present invention, the vector described in the seventh aspect of the present invention, the CRISPR-Cas composition described in the eighth aspect of the present invention, the system described in the ninth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention.
In another preferred embodiment, the host cell is a eukaryotic cell, such as a yeast cell, a plant cell or a mammalian cell (including human and non-human mammalian cells).
In another preferred embodiment, the host cell is a prokaryotic cell, such as Escherichia coli.
In another preferred embodiment, the yeast cell is derived from one or more yeasts selected from the following group: Pichia pastoris, Kluyveromyces, or a combination thereof; preferably, the yeast cell is Kluyveromyces, more preferably Kluyveromyces marxianus and/or Kluyveromyces lactis.
In another preferred embodiment, the host cell is selected from the following group: Escherichia coli, wheat germ cells, insect cells, SF9, HeLa, HEK293, CHO, yeast cells, or a combination thereof.
The thirteenth aspect of the present invention provides an enzyme preparation comprising the gene editing protein described in the first aspect of the present invention, the fusion protein described in the second aspect of the present invention, the complex described in the sixth aspect of the present invention, the CRISPR-Cas composition described in the eighth aspect of the present invention, the system described in the ninth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention.
In another preferred embodiment, the enzyme preparation includes an injection and/or a lyophilized preparation.
The fourteenth aspect of the present invention provides a pharmaceutical kit, comprising:
a first container, and, located in the first container, the complex described in the sixth aspect of the present invention, the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention, or a drug containing the complex described in the sixth aspect of the present invention, the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention.
In another preferred embodiment, the drug in the first container is a single-component preparation containing the complex described in the sixth aspect of the present invention, the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention.
In another preferred embodiment, the dosage form of the drug is selected from the following group: a lyophilized preparation, a liquid preparation, or a combination thereof.
In another preferred embodiment, the dosage form of the drug is an oral dosage form or an injection dosage form.
In another preferred embodiment, the pharmaceutical kit further contains instructions.
The fifteenth aspect of the present invention provides a pharmaceutical kit, comprising:
(a1) a first container, and, located in the first container, the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or a gene encoding the same or an expression vector thereof, or a drug containing the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or a gene encoding the same or an expression vector thereof;
(b1) an optional second container, and, located in the second container, the guide RNA described in the fifth aspect of the present invention or an expression vector thereof, or a drug containing the guide RNA described in the fifth aspect of the present invention or an expression vector thereof.
In another preferred embodiment, the first container and the second container are different containers.
In another preferred embodiment, the drug in the first container is a single-component preparation containing the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or a gene encoding the same or an expression vector thereof.
In another preferred embodiment, the drug in the second container is a single-component preparation containing the guide RNA described in the fifth aspect of the present invention or an expression vector thereof.
In another preferred embodiment, the dosage form of the drug is selected from the following group: a lyophilized preparation, a liquid preparation, or a combination thereof.
In another preferred embodiment, the dosage form of the drug is an oral dosage form or an injection dosage form.
In another preferred embodiment, the pharmaceutical kit further contains instructions.
The sixteenth aspect of the present invention provides a method for targeting and editing a target gene or cutting a target gene, comprising: contacting the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or the complex described in the sixth aspect of the present invention, or the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention, or the enzyme preparation described in the thirteenth aspect of the present invention, or the pharmaceutical kit described in the fourteenth or fifteenth aspect of the present invention with the target gene, or delivering it into a cell containing the target gene, wherein the target sequence is present in the target gene.
In another preferred embodiment, the target gene is present inside a cell.
In another preferred embodiment, the cell is a prokaryotic cell.
In another preferred embodiment, the cell is a eukaryotic cell, such as a mammalian cell (e.g., a human cell) or a plant cell.
In another preferred embodiment, the target gene is present in a nucleic acid molecule in vitro (e.g., a plasmid).
In another preferred embodiment, the editing or cutting of the target gene includes a break in the target sequence, such as a double-strand break in DNA or a single-strand break in RNA, or the insertion of an exogenous nucleic acid into the break.
In another preferred embodiment, the target gene includes DNA.
In another preferred embodiment, the DNA includes single-stranded DNA and double-stranded DNA.
The seventeenth aspect of the present invention provides a method for inducing a change in the state of a cell, the method comprising contacting the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or the complex described in the sixth aspect of the present invention, or the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention, or the enzyme preparation described in the thirteenth aspect of the present invention, or the pharmaceutical kit described in the fourteenth or fifteenth aspect of the present invention with a target gene in the cell.
The eighteenth aspect of the present invention provides a method for altering the expression of a gene product, comprising: contacting the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or the complex described in the sixth aspect of the present invention, or the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention, or the enzyme preparation described in the thirteenth aspect of the present invention, or the pharmaceutical kit described in the fourteenth or fifteenth aspect of the present invention with a nucleic acid molecule encoding the gene product, or delivering it into a cell containing the nucleic acid molecule, wherein the target sequence is present in the nucleic acid molecule.
In another preferred embodiment, the nucleic acid molecule is present in a nucleic acid molecule in vitro (e.g., a plasmid).
In another preferred embodiment, the expression of the gene product is altered (e.g., enhanced or decreased).
In another preferred embodiment, the gene product is a protein.
In another preferred embodiment, the protein, fusion protein, polynucleotide, isolated nucleic acid molecule, complex, vector or composition is contained in a delivery vehicle.
In another preferred embodiment, the delivery vehicle is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes and viral vectors (such as replication-defective retroviruses, lentiviruses, adenoviruses or adeno-associated viruses).
In another preferred embodiment, the method is used to modify a cell, cell line or organism by altering one or more target sequences in a target gene or in a nucleic acid molecule encoding a target gene product.
The nineteenth aspect of the present invention provides a cell, or progeny thereof, obtained by the method described in any one of the sixteenth to eighteenth aspects of the present invention, wherein the cell comprises a modification that is not present in its wild type.
The twentieth aspect of the present invention provides a cell product of the cell or progeny thereof described in the nineteenth aspect of the present invention.
The twenty-first aspect of the present invention provides an in vitro, ex vivo or in vivo cell or cell line, or progeny thereof, wherein the cell or cell line or progeny thereof comprises: the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or the polynucleotide described in the third aspect of the present invention, or the complex described in the sixth aspect of the present invention, or the vector described in the seventh aspect of the present invention, or the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention.
In another preferred embodiment, the cell is a prokaryotic cell.
In another preferred embodiment, the cell is a eukaryotic cell, such as a mammalian cell (e.g., a human cell) or a plant cell.
In another preferred embodiment, the cell is a stem cell or a stem cell line.
The twenty-second aspect of the present invention provides use of the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or the polynucleotide described in the third aspect of the present invention, or the complex described in the sixth aspect of the present invention, or the vector described in the seventh aspect of the present invention, or the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention, or the kit described in the tenth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention, or the enzyme preparation described in the thirteenth aspect of the present invention, or the pharmaceutical kit described in the fourteenth or fifteenth aspect of the present invention, for preparing a drug or preparation, wherein the drug or preparation is used for nucleic acid editing (e.g., gene or genome editing).
In another preferred embodiment, the gene or genome editing includes modifying a gene, knocking out a gene, altering the expression of a gene product, repairing a mutation, and/or inserting a polynucleotide.
The twenty-third aspect of the present invention provides use of the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or the polynucleotide described in the third aspect of the present invention, or the complex described in the sixth aspect of the present invention, or the vector described in the seventh aspect of the present invention, or the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention, or the kit described in the tenth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention, or the enzyme preparation described in the thirteenth aspect of the present invention, or the pharmaceutical kit described in the fourteenth or fifteenth aspect of the present invention, for preparing a drug or preparation, wherein the drug or preparation is used for one or more selected from the following group:
(i) ex vivo gene or genome editing;
(ii) ex vivo detection of single-stranded DNA;
(iii) editing a target sequence in a target locus to modify an organism or a non-human organism;
(iv) treating a condition caused by a defect in a target sequence in a target locus;
(v) treating a condition or disease in a subject in need thereof.
In another preferred embodiment, the condition or disease includes cancer, infectious disease, neurological disease, ophthalmic disease and hearing disease.
In another preferred embodiment, the disease or condition includes cystic fibrosis, atherosclerotic cardiovascular disease (ASCVD), progressive pseudohypertrophic muscular dystrophy (Duchenne muscular dystrophy, DMD), Becker muscular dystrophy, alpha-1-antitrypsin deficiency, Pompe disease (glycogen storage disease type II), myotonic dystrophy, Huntington's disease, fragile X syndrome, Friedreich's ataxia, amyotrophic lateral sclerosis, hereditary chronic kidney disease, sickle cell disease, primary hyperoxaluria (PH1), beta thalassemia, frontotemporal dementia, Leber's congenital amaurosis, hyperlipidemia, hypercholesterolemia (FH), hereditary angioedema (HAE), transthyretin amyloidosis (ATTR), hepatitis B, retinal diseases, macular degeneration, Wilms' tumor, Ewing's sarcoma, neuroendocrine tumors, glioblastoma, neuroblastoma, melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, kidney cancer, pancreatic cancer, lung cancer, biliary tract cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid cancer, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphocytic leukemia, acute myeloid leukemia, chronic lymphocytic leukemia, chronic myeloid leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma and urinary bladder cancer.
In another preferred embodiment, the condition or disease is caused by a pathogenic point mutation.
The twenty-fourth aspect of the present invention provides a method for detecting whether a target nucleic acid molecule is present in a sample, the method comprising contacting the sample with a non-target sequence and with the gene editing protein described in the first aspect of the present invention, or the fusion protein described in the second aspect of the present invention, or the complex described in the sixth aspect of the present invention, or the composition described in the eighth aspect of the present invention, or the system described in the ninth aspect of the present invention, or the kit described in the tenth aspect of the present invention, or the delivery composition described in the eleventh aspect of the present invention, or the enzyme preparation described in the thirteenth aspect of the present invention, and detecting a detectable signal generated by cleavage of the non-target sequence, thereby detecting the target nucleic acid molecule, wherein the non-target sequence does not hybridize with the guide RNA.
In another preferred embodiment, if the non-target sequence is cleaved by the protein in the complex, CRISPR-Cas composition, system or delivery composition, the target nucleic acid molecule is present in the sample; if the non-target sequence is not cleaved by the protein in the complex, CRISPR-Cas composition, system or delivery composition, the target nucleic acid molecule is not present in the sample.
In another preferred embodiment, the target nucleic acid molecule is a target DNA.
In another preferred embodiment, the target DNA includes DNA formed by reverse transcription of RNA.
In another preferred embodiment, the target DNA includes cDNA.
In another preferred embodiment, the target DNA is selected from the following group: single-stranded DNA, double-stranded DNA, or a combination thereof.
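The decision rule of the twenty-fourth aspect (cleavage of the non-target sequence indicates that the target is present) can be sketched as a simple readout comparison. The text does not specify how the detectable signal is measured or thresholded, so the function name, the no-target control and the two-fold cutoff below are all assumptions made for illustration.

```python
def target_detected(sample_signal: float,
                    no_target_control_signal: float,
                    min_fold_change: float = 2.0) -> bool:
    """Call the target present when the reporter signal rises above background.

    sample_signal: signal from the non-target (reporter) sequence with the sample.
    no_target_control_signal: signal from the same reaction without any target.
    min_fold_change: hypothetical cutoff; choose it from assay validation data.
    """
    if no_target_control_signal <= 0:
        raise ValueError("control signal must be positive")
    fold_change = sample_signal / no_target_control_signal
    # Cleavage of the reporter raises the signal, so a high ratio means "present".
    return fold_change >= min_fold_change
```

A ratio against a matched no-target control is used here rather than an absolute cutoff so that the rule does not depend on instrument units.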
It should be understood that, within the scope of the present invention, the technical features of the present invention described above and the technical features specifically described below (such as in the examples) can be combined with each other to form new or preferred technical solutions. Due to space limitations, they are not described one by one here.
FIG. 1 shows an agarose gel electrophoresis image of the CasW1 insert fragment and the pET-28a(+) vector after double digestion with EcoRI and NotI.
FIG. 2 shows a Coomassie Brilliant Blue-stained SDS-PAGE image of the freshly purified CasW1 protein.
FIG. 3 shows an agarose gel electrophoresis image of the dsDNA template prepared in vitro.
FIG. 4 shows a capillary electrophoresis image used to assess the in vitro cleavage activity of CasW1, with the in vitro cleavage activity of Cpf1 as a control. CasW1 cleaves the 450 bp dsDNA template into two dsDNA fragments, with a cleavage activity close to 100%.
FIG. 5 shows a capillary electrophoresis image used to assess the in vitro cleavage activity of CasW1, with the prior-art Cas12i.16 and S7R-Cas12i3 (M2869) as controls.
FIG. 6 shows a plasmid map of pET-28a(+)-CasW1.
FIG. 7 shows a plasmid map of pET-28a(+)-LbCpf1.
FIG. 8 shows a plasmid map of pET-28a(+)-HED Cas12i.16.
FIG. 9 shows a plasmid map of pET-28a(+)-S7R Cas12i.3.
FIG. 10 shows a schematic diagram of the secondary structure of the DR sequence of CasW1.
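FIG. 4 reports a cleavage activity close to 100% for the 450 bp template, but the text does not state how that percentage was calculated. One common convention, shown here purely as an assumption, divides the summed signal of the two product peaks by the total signal of product and uncleaved template. The function name and the example peak areas are hypothetical.

```python
def cleavage_percent(uncleaved_area: float, product_areas: list[float]) -> float:
    """Estimate cleavage activity (%) from electropherogram peak areas.

    uncleaved_area: peak area of the intact template (450 bp in FIG. 4).
    product_areas: peak areas of the cleavage fragments.
    Assumes peak area is proportional to the amount of template represented.
    """
    if uncleaved_area < 0 or any(area < 0 for area in product_areas):
        raise ValueError("peak areas must be non-negative")
    cleaved = sum(product_areas)
    total = cleaved + uncleaved_area
    if total == 0:
        raise ValueError("no signal to quantify")
    return 100.0 * cleaved / total
```

With made-up areas, `cleavage_percent(50.0, [30.0, 20.0])` gives 50.0, and a lane with no remaining template peak gives 100.0.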
After extensive and in-depth research, the inventors unexpectedly discovered, for the first time, a new gene editing protein. The gene editing protein of the present invention has very good gene editing activity, can effectively edit or cut a target gene, and can effectively treat a condition or disease in a subject in need thereof. The inventors completed the present invention on this basis.
Terms
The following examples are intended only to describe the present invention, not to limit it. Unless otherwise specified, the experiments and methods described in the examples are carried out essentially according to conventional methods that are well known in the art and described in various references.
In addition, where specific conditions are not indicated in the examples, conventional conditions or the conditions recommended by the manufacturer are used. Reagents and instruments for which no manufacturer is indicated are conventional, commercially available products. Those skilled in the art will appreciate that the examples describe the present invention by way of illustration and are not intended to limit the scope of protection claimed. All publications and other references mentioned herein are incorporated herein by reference in their entirety.
So that the present disclosure may be more readily understood, certain terms are first defined. As used in this application, unless otherwise expressly provided herein, each of the following terms shall have the meaning given below. Other definitions are set forth throughout the application.
The term "about" may refer to a value or composition that is within an acceptable error range for the particular value or composition as determined by one of ordinary skill in the art, which will depend in part on how the value or composition is measured or determined. For example, as used herein, the expression "about 100" includes 99, 101, and all values in between (e.g., 99.1, 99.2, 99.3, 99.4, etc.).
As used herein, the terms "containing" and "including (comprising)" may be open-ended, semi-closed, or closed. In other words, these terms also encompass "consisting essentially of" and "consisting of".
Sequence identity (or homology) is determined by comparing two aligned sequences along a predetermined comparison window (which may be 50%, 60%, 70%, 80%, 90%, 95% or 100% of the length of the reference nucleotide sequence or protein) and counting the positions at which identical residues occur. This is typically expressed as a percentage. Methods for measuring the sequence identity of nucleotide sequences are well known to those skilled in the art.
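The identity calculation described above can be sketched as follows. This is a minimal illustration, assuming the two sequences have already been aligned to equal length (gaps written as "-") and that the comparison window is taken from the start of the alignment; the function name and the example peptides are hypothetical and do not come from the application.

```python
def percent_identity(aligned_a: str, aligned_b: str, window: float = 1.0) -> float:
    """Percent identity over the first `window` fraction of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    if not 0 < window <= 1:
        raise ValueError("window must be a fraction in (0, 1]")
    # Number of aligned positions inside the comparison window.
    n = max(1, round(len(aligned_a) * window))
    # A position counts as identical only if both residues match and it is not a gap.
    matches = sum(
        1
        for x, y in zip(aligned_a[:n], aligned_b[:n])
        if x == y and x != "-"
    )
    return 100.0 * matches / n


# Hypothetical 10-residue peptides differing at one position.
reference = "MKTAYIAKQR"
candidate = "MKTAYLAKQR"
print(percent_identity(reference, candidate))        # whole alignment
print(percent_identity(reference, candidate, 0.5))   # first 50% only
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch only covers the counting step.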
Gene editing protein
In the present invention, the gene editing protein is an effector protein of a CRISPR/Cas system.
In the present invention, the terms Cas protein, Cas enzyme, and Cas effector protein are used interchangeably. "Cas protein" is used in its broadest sense and includes wild-type Cas proteins, their derivatives, variants, and analogs, and functional fragments thereof, such as oligonucleotide-binding fragments.
The term "wild type" has the meaning commonly understood by those skilled in the art. It denotes the typical form of an organism, strain, gene, or protein as it occurs in nature, as distinguished from mutant or variant forms; a wild-type form can be isolated from a natural source and has not been intentionally modified by man.
The terms "variant", "derivative", and "analog" refer to polypeptides that substantially retain the function or activity of the Cas protein of the present invention.
In general, derivatization of a protein does not adversely affect the desired activity of the protein (e.g., guide RNA binding activity, endonuclease activity, or the activity of binding to and cleaving a specific site of a target sequence under the guidance of a guide RNA); that is, the derivative has the same activity as the protein. Modified forms of a "derivative" include forms in which one or more amino acids of the protein are deleted, inserted, modified, and/or substituted. The terms "non-naturally occurring" and "engineered" are used interchangeably and indicate human intervention.
In one aspect, the present invention provides a gene editing protein (Cas protein) comprising an amino acid sequence that has at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% identity with the amino acid sequence of SEQ ID NO.1 and that substantially retains the biological function of the sequence from which it is derived;
In one embodiment, the amino acid sequence of the Cas protein has one or more amino acid substitutions, deletions, or additions compared with the amino acid sequence of SEQ ID NO.1 and substantially retains the biological function of the sequence from which it is derived;
In one embodiment, the Cas protein comprises the amino acid sequence shown in SEQ ID NO.1;
or a sequence having one or more amino acid substitutions, deletions, or additions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions, or additions) compared with the sequence shown in SEQ ID NO.1; or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity with the amino acid sequence shown in SEQ ID NO.1;
In one embodiment, the Cas protein has the amino acid sequence shown in SEQ ID NO.1.
It is clear to those skilled in the art that the structure of a protein can be changed without adversely affecting its activity and functionality, for example by introducing one or more conservative amino acid substitutions into the amino acid sequence of the protein without adversely affecting the activity and/or three-dimensional structure of the protein molecule.
Those skilled in the art are familiar with examples and embodiments of conservative amino acid substitutions. Specifically, an amino acid residue may be replaced with another amino acid residue belonging to the same group as the residue at the site to be substituted: a non-polar amino acid residue replaces another non-polar amino acid residue, a polar uncharged amino acid residue replaces another polar uncharged amino acid residue, a basic amino acid residue replaces another basic amino acid residue, and an acidic amino acid residue replaces another acidic amino acid residue. The substituting amino acid residue may or may not be one encoded by the genetic code. Provided that the substitution does not inactivate the biological activity of the protein, a conservative substitution in which one amino acid is replaced by another amino acid of the same group falls within the scope of the present invention. Accordingly, the protein of the present invention may contain one or more conservative substitutions in its amino acid sequence, and these conservative substitutions are preferably made according to Table A. In addition, the present invention also encompasses proteins that further contain one or more non-conservative substitutions, provided that the non-conservative substitutions do not significantly affect the desired function and biological activity of the protein of the present invention.
Conservative amino acid substitutions can be made at one or more predicted non-essential amino acid residues. A "non-essential" amino acid residue is one that can be altered (deleted, substituted, or replaced) without changing biological activity, whereas an "essential" amino acid residue is required for biological activity. A "conservative amino acid substitution" is a substitution in which an amino acid residue is replaced by an amino acid residue with a similar side chain. Amino acid substitutions can be made in non-conserved regions of the Cas enzyme. In general, such substitutions are not made at conserved amino acid residues, or at amino acid residues located within conserved motifs, where such residues are required for protein activity. However, those skilled in the art will understand that functional variants may have a small number of conservative or non-conservative changes in conserved regions.
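The same-group rule described above can be illustrated with a short sketch. The grouping below is one common textbook classification by side-chain character and is an assumption made for illustration; it is not a reproduction of Table A, and the function names are hypothetical.

```python
# Illustrative side-chain groups (one-letter codes); NOT a reproduction of Table A.
GROUPS = {
    "nonpolar": set("GAVLIMFWP"),
    "polar_uncharged": set("STCYNQ"),
    "basic": set("KRH"),
    "acidic": set("DE"),
}


def group_of(residue: str) -> str:
    """Return the name of the group that a one-letter amino acid code belongs to."""
    for name, members in GROUPS.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True if the replacement is a different residue from the same group."""
    return (
        original.upper() != replacement.upper()
        and group_of(original) == group_of(replacement)
    )


print(is_conservative("L", "I"))  # leucine -> isoleucine, both non-polar
print(is_conservative("K", "D"))  # lysine -> aspartate, basic vs. acidic
```

Whether a given substitution actually preserves activity still has to be confirmed experimentally; group membership is only a first filter.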
Table A
Those skilled in the art know that one or more amino acid residues can be altered (substituted, deleted, truncated, or inserted) at the N- and/or C-terminus of a protein while its functional activity is retained. Therefore, proteins in which one or more amino acid residues at the N- and/or C-terminus of the Cas protein of the present invention have been altered, while the desired functional activity is retained, are also within the scope of the present invention. Such alterations may include alterations introduced by modern molecular methods such as PCR, including PCR amplification that alters or extends the protein coding sequence by including amino acid coding sequences in the oligonucleotides used in the amplification.
It should be recognized that proteins can be altered in a variety of ways, including amino acid substitutions, deletions, truncations, and insertions, and that methods for such manipulations are generally known to those skilled in the art.
For example, amino acid sequence variants of the Cas protein can be prepared by mutating the DNA. Variants can also be obtained by other forms of mutagenesis and/or by directed evolution, for example by using known mutagenesis, recombination, and/or shuffling methods, combined with suitable screening methods, to introduce one or more amino acid substitutions, deletions of one or more amino acids, and/or insertions of one or more amino acids.
Those skilled in the art will understand that such minor amino acid changes in the Cas protein of the present invention can occur (e.g., as naturally occurring mutations) or be produced (e.g., using r-DNA technology) without loss of protein function or activity. If such mutations occur in the catalytic domain, active site, or another functional domain of the protein, the properties of the polypeptide may change, but the polypeptide may still retain its activity. If the mutations are not located close to the catalytic domain, active site, or another functional domain, a smaller effect can be expected.
Those skilled in the art can identify the essential amino acids of the Cas protein by methods known in the art, such as site-directed mutagenesis, protein evolution, or bioinformatic analysis. The catalytic domain, active site, or other functional domains of the protein can also be determined by physical analysis of the structure, using techniques such as nuclear magnetic resonance, crystallography, electron diffraction, or photoaffinity labeling, combined with mutation of amino acids at putative key sites.
Orthologue (ortholog)
As used herein, the term "orthologue" (or "ortholog") has the meaning commonly understood by those skilled in the art. As further guidance, an "orthologue" of a protein described herein is a protein from a different species that performs the same or a similar function as the protein of which it is an orthologue.
Nucleic acid cleavage in the present disclosure includes DNA or RNA breaks in the target nucleic acid produced by the Cas protein (cis cleavage) and DNA or RNA breaks in a collateral nucleic acid substrate (a single-stranded nucleic acid substrate) caused by the collateral cleavage activity of the Cas protein (i.e., non-specific or non-targeted, trans cleavage). In some embodiments, the cleavage is a double-stranded DNA break. In some embodiments, the cleavage is a single-stranded DNA break or a single-stranded RNA break.
Trans cleavage means that, under certain conditions, an activated Cas12 family protein remains active after binding the target sequence and goes on to cleave non-target oligonucleotides non-specifically. This collateral cleavage activity makes it possible to use a Cas system to detect the presence of a specific target oligonucleotide. For example, a Cas12i system can be engineered to cleave ssDNA or transcripts non-specifically. Collateral cleavage activity is used in the highly sensitive and specific nucleic acid detection platform known as SHERLOCK, which can be used in many clinical diagnostic applications (Gootenberg, J.S. et al., Nucleic acid detection with CRISPR-Cas13a/C2c2. Science 356, 438-442 (2017)).
Fusion protein
In one aspect, the present invention provides a fusion protein comprising the Cas protein according to any of the foregoing and one or more functional domains.
In one embodiment, the functional domain includes one or more of a localization signal, a reporter protein, a Cas protein targeting moiety, a DNA binding domain, an epitope tag, a transcription activation domain, a transcription repression domain, a nuclease, a deamination domain, a methylase, a demethylase, a transcription release factor, an HDAC, a polypeptide with cleavage activity, and a ligase;
In one embodiment, examples of the "methylase" include HhaI DNA m5C-methyltransferase (M.HhaI), DNA methyltransferase 1 (DNMT1), DNA methyltransferase 3a (DNMT3a), DNA methyltransferase 3b (DNMT3b), METI, DRM3, ZMET2, CMT1, CMT2, and the like.
"Demethylase" refers to an enzyme that removes methyl (CH3-) groups from nucleic acids, proteins (e.g., histones), and other molecules. Demethylases are important in epigenetic modification mechanisms. Demethylase proteins alter the transcriptional regulation of the genome by controlling the level of methylation on DNA and histones, and thereby regulate the chromatin state at specific loci in an organism. Examples include TET1 (ten-eleven translocation 1), ten-eleven translocation (TET) dioxygenase 1 (TET1CD), DME, DML1, DML2, ROS1, and the like.
In another preferred embodiment, examples of the transcription release factor include eukaryotic release factor 1 (ERF1) and eukaryotic release factor 3 (ERF3).
In one embodiment, the functional domain is selected from an adenosine deaminase catalytic domain and a cytidine deaminase catalytic domain.
In one embodiment, the localization signal comprises a nuclear localization signal and/or a nuclear export signal;
Preferably, the nuclear export signal comprises human protein tyrosine kinase 2;
Preferably, the reporter protein comprises one or more of glutathione-S-transferase, horseradish peroxidase, chloramphenicol acetyltransferase, β-galactosidase, β-glucuronidase, or an autofluorescent protein;
Preferably, the autofluorescent protein comprises one or more of green fluorescent protein, HcRed, DsRed, cyan fluorescent protein, yellow fluorescent protein, or blue fluorescent protein;
Preferably, the DNA binding domain comprises one or more of a methylation binding protein, LexA DBD, or Gal4 DBD;
Preferably, the epitope tag comprises one or more of a histidine tag, a V5 tag, a FLAG tag, an influenza virus hemagglutinin tag, a Myc tag, a VSV-G tag, or a thioredoxin tag;
Preferably, the transcription activation domain comprises VP64 and/or VPR;
Preferably, the transcription repression domain comprises KRAB and/or SID;
Preferably, the nuclease comprises FokI;
Preferably, the deamination domain comprises one or more of ADAR1, ADAR2, APOBEC, AID, or TAD;
Preferably, the polypeptide with cleavage activity comprises a polypeptide having single-stranded RNA cleavage activity, a polypeptide having double-stranded RNA cleavage activity, a polypeptide having single-stranded DNA cleavage activity, or a polypeptide having double-stranded DNA cleavage activity;
Preferably, the ligase comprises a DNA ligase and/or an RNA ligase.
In one embodiment, the functional domain is full-length TadA8e or a functional fragment thereof.
Polynucleotide
In one aspect, the present invention provides a polynucleotide, which is a polynucleotide sequence encoding the gene editing protein (Cas protein) or a polynucleotide sequence encoding the aforementioned fusion protein.
In one embodiment, the polynucleotide (DNA molecule) comprises a nucleotide sequence having 70% or more, preferably 90% or more, more preferably 95% or more, further preferably 99%, and still further preferably 100% identity with the nucleotide sequence of SEQ ID NO.2.
In one embodiment, the polynucleotide is a DNA molecule that has been codon-optimized according to the codon preference of the host cell.
The optimization described in the present disclosure may require mutating the nucleotide sequence that encodes a protein (e.g., the Cas protein of the present disclosure) so that it mimics the codon preference of the intended host organism or cell while still encoding the same protein. The codons may therefore change, but the encoded protein remains unchanged. For example, if the intended target cell is a human cell, a human codon-optimized nucleotide sequence encoding the protein can be used. As another non-limiting example, if the intended host cell is an animal cell (e.g., a mouse cell or an insect cell), a nucleotide sequence encoding the protein that is codon-optimized for that animal can be generated. As another non-limiting example, if the intended host cell is a plant cell, a plant codon-optimized nucleotide sequence encoding the protein can be generated.
Codon usage tables are readily available, for example from the "Codon Usage Database" at www.kazusa.or.jp/codon. In some cases, the nucleic acid of the present disclosure comprises a nucleotide sequence encoding CasY7 or a variant or fusion protein thereof, the nucleotide sequence being codon-optimized for expression in eukaryotic cells. In some cases, the nucleic acid of the present disclosure comprises a nucleotide sequence encoding CasW1 or a variant or fusion protein thereof, the nucleotide sequence being codon-optimized for expression in animal cells. In some cases, the nucleic acid of the present disclosure comprises a nucleotide sequence encoding CasW1 or a variant or fusion protein thereof, the nucleotide sequence being codon-optimized for expression in fungal cells. In some cases, the nucleic acid of the present disclosure comprises a nucleotide sequence encoding CasW1 or a variant or fusion protein thereof, the nucleotide sequence being codon-optimized for expression in plant cells.
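The principle that codon optimization changes codons but not the encoded protein can be sketched as follows. The sketch uses the standard genetic code; the `PREFERRED` map is a tiny hypothetical stand-in for a real host codon usage table, and the input sequence is invented for illustration. A practical optimizer would draw on full usage frequencies and further constraints not shown here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical host preference: one preferred codon for a few amino acids only.
PREFERRED = {"L": "CTG", "K": "AAG"}


def codons(dna: str) -> list[str]:
    """Split a coding sequence into codons; its length must be a multiple of 3."""
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna: str) -> str:
    """Translate a coding sequence with the standard genetic code."""
    return "".join(CODON_TABLE[c] for c in codons(dna.upper()))


def recode(dna: str) -> str:
    """Swap each codon for the host-preferred synonymous codon, where one is listed."""
    return "".join(PREFERRED.get(CODON_TABLE[c], c) for c in codons(dna.upper()))


original = "ATGTTAAAATAA"          # invented example: Met-Leu-Lys-stop
optimized = recode(original)
assert translate(optimized) == translate(original)  # protein is unchanged
print(original, "->", optimized)
```

The assertion at the end checks the property stated in the text: the nucleotide sequence differs, while the translated protein is identical.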
In one embodiment, the host cell comprises a prokaryotic cell or a eukaryotic cell.
CRISPR system
The terms "clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) (CRISPR-Cas) system" and "CRISPR system" are used interchangeably and have the meaning commonly understood by those skilled in the art. Such a system generally comprises transcripts or other elements involved in the expression of CRISPR-associated ("Cas") genes, or transcripts or other elements capable of directing the activity of the Cas genes.
CRISPR-Cas composition
In one aspect, the present invention also provides a CRISPR-Cas composition, comprising:
(1) a protein component: the aforementioned gene editing protein (Cas protein) or the aforementioned fusion protein; or a nucleic acid molecule encoding the gene editing protein (Cas protein) or the fusion protein;
(2) an RNA component: a guide RNA, or one or more nucleic acids encoding the guide RNA, or a precursor RNA of the guide RNA, or a nucleic acid encoding the precursor RNA of the guide RNA;
wherein the protein component and the nucleic acid component bind to each other to form a complex.
In one embodiment, the composition is an activated CRISPR complex, and the activated CRISPR complex further comprises a target sequence of a target nucleic acid bound to the guide RNA.
In one embodiment, the CRISPR-Cas composition comprises one or more vectors, the one or more vectors comprising:
(1) a first regulatory element operably linked to a nucleotide sequence encoding the gene editing protein (Cas protein) or a nucleotide sequence encoding the fusion protein; and
(2) a second regulatory element operably linked to a nucleotide sequence encoding the guide RNA, the guide RNA comprising:
(a) a spacer sequence capable of hybridizing with a target sequence of a target nucleic acid, and
(b) a direct repeat (DR) sequence linked to the spacer sequence and capable of guiding the gene editing protein (Cas protein) to bind to the guide RNA to form a CRISPR-Cas complex that targets the target sequence;
wherein the first regulatory element and the second regulatory element are located on the same vector or on different vectors of the CRISPR-Cas vector system.
In one embodiment, the first regulatory element or the second regulatory element comprises a promoter, and the promoter comprises one or more of an inducible promoter, a constitutive promoter, or a tissue-specific promoter;
In one embodiment, the promoter comprises one or more of T7, SP6, T3, CMV, EF1a, SV40, PGK1, human β-actin, CAG, U6, H1, T7lac, araBAD, trp, lac, or Ptac;
In one embodiment, the first regulatory element and the second regulatory element are located on the same vector or on different vectors.
In one embodiment, the vector comprises a retroviral vector, a lentiviral vector, an adenoviral vector, an adeno-associated viral vector, a herpes simplex vector, or a phagemid vector;
In one embodiment, the vector comprises a plasmid vector.
In one embodiment, the target nucleic acid comprises DNA derived from a eukaryote or DNA derived from a prokaryote;
In one embodiment, the eukaryote comprises an animal or a plant;
In one embodiment, the target nucleic acid comprises non-human mammalian DNA, human DNA, insect DNA, bird DNA, reptile DNA, amphibian DNA, rodent DNA, fish DNA, worm DNA, nematode DNA, or yeast DNA;
In one embodiment, the non-human mammalian DNA comprises non-human primate DNA.
CRISPR/Cas complex
The term "CRISPR/Cas complex" refers to a complex formed by the binding of a guide RNA (gRNA) or mature crRNA to the gene editing protein (Cas protein). The complex comprises a guide sequence that hybridizes to the target sequence and a direct repeat sequence that binds the gene editing protein (Cas protein), and it can recognize and cleave a target nucleic acid that hybridizes with the guide RNA or mature crRNA.
Guide RNA (gRNA)
The terms "guide RNA (gRNA)", "mature crRNA", "crRNA", and "guide sequence" are used interchangeably and have the meaning commonly understood by those skilled in the art. In general, a guide RNA may comprise, consist essentially of, or consist of a direct repeat (DR) sequence and a spacer sequence.
In some cases, the spacer sequence is any polynucleotide sequence that has sufficient complementarity with the target sequence to hybridize with the target sequence and direct specific binding of the CRISPR-Cas complex to the target sequence. In one embodiment, when optimally aligned, the degree of complementarity between the spacer sequence and its corresponding target sequence is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%. The guide sequence comprises a sequence (e.g., a spacer sequence) that has sufficient complementarity with the target nucleic acid sequence to hybridize with it and direct sequence-specific binding of the complex to the target nucleic acid sequence.
It is known in the art that complete complementarity is not required, provided that there is sufficient complementarity for function. Therefore, where desired, cleavage efficiency can be modulated by introducing mismatches, for example one or more mismatches between the spacer sequence and the target nucleic acid, such as 1 or 2 nucleotide mismatches, with the position of the mismatch along the spacer/target sequence also taken into account. For example, if less than 100% cleavage of the target is desired (e.g., in a cell population), 1 or 2 mismatches between the spacer sequence and the target sequence can be introduced into the spacer sequence.
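The degree of complementarity and the effect of introducing one or two mismatches can be illustrated with a short sketch. It assumes an ungapped, equal-length comparison between an RNA spacer and the DNA target strand it hybridizes to, counts only Watson-Crick pairs, and uses invented sequences; the function names are hypothetical.

```python
# RNA base -> DNA base it pairs with (Watson-Crick pairing only).
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def perfect_target(spacer_rna: str) -> str:
    """DNA strand (written 5'->3') that is fully complementary to the spacer."""
    return "".join(RNA_TO_DNA_PAIR[b] for b in reversed(spacer_rna.upper()))


def percent_complementarity(spacer_rna: str, target_dna_5to3: str) -> float:
    """Percentage of spacer positions that pair with the antiparallel target strand."""
    if len(spacer_rna) != len(target_dna_5to3):
        raise ValueError("this sketch compares equal-length, ungapped sequences")
    # Hybridization is antiparallel, so read the target strand 3'->5'.
    paired = sum(
        1
        for r, d in zip(spacer_rna.upper(), reversed(target_dna_5to3.upper()))
        if RNA_TO_DNA_PAIR.get(r) == d
    )
    return 100.0 * paired / len(spacer_rna)


spacer = "GACUUCGAAGCUAGCUAGGA"           # invented 20-nt spacer
target = perfect_target(spacer)
print(percent_complementarity(spacer, target))   # fully matched target
```

Replacing one or two bases of `target` lowers the returned percentage accordingly; the sketch says nothing about how a mismatch at a given position affects cleavage, which must be determined experimentally.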
在一个方面,本发明提供了一种向导RNA,所述向导RNA包括能够结合所述的Cas蛋白的同向重复(Direct Repeat,DR)序列和能够靶向靶序列的间隔(spacer)序列。In one aspect, the present invention provides a guide RNA, which includes a direct repeat (DR) sequence capable of binding to the Cas protein and a spacer sequence capable of targeting a target sequence.
In one embodiment, the direct repeat (DR) sequence comprises the sequence shown in SEQ ID NO.3.
In one embodiment, the 3' end of the direct repeat sequence comprises a stem-loop structure, in which a first stem nucleotide strand and a second stem nucleotide strand hybridize with each other to form the stem of the stem-loop structure, and a loop nucleotide strand forms the loop of the stem-loop structure;
In one embodiment, the direct repeat sequence comprises a nucleotide sequence having at least 80% identity with the nucleotide sequence set forth in SEQ ID NO.3;
In one embodiment, the direct repeat sequence comprises a nucleotide sequence having at least 85%, more preferably at least 90%, and further preferably at least 95% identity with the nucleotide sequence set forth in SEQ ID NO.3;
In one embodiment, the direct repeat sequence comprises the nucleotide sequence set forth in SEQ ID NO.3.
In one embodiment, 80% or more of the spacer sequence is complementary to the target nucleic acid;
In one embodiment, 90% or more, more preferably 95% or more, further preferably 99% or more, and still further preferably 100% of the spacer sequence is complementary to the target nucleic acid;
In one embodiment, the length of the spacer sequence is 18-41 nt, for example 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, or 41 nt, more preferably 18 to 27 nucleotides, more preferably 18 to 24 nucleotides, and most preferably 18 to 22 nucleotides.
In one embodiment, the spacer sequence is 20 nt in length.
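The nested spacer length ranges stated above can be illustrated with a minimal Python sketch. The function name `spacer_length_tier` and the tier labels it returns are hypothetical and are introduced only for illustration; they do not appear in the specification.

```python
def spacer_length_tier(spacer: str) -> str:
    """Classify a spacer by the nested length ranges given in the text.

    The returned labels are illustrative assumptions, not terms of the
    specification.
    """
    n = len(spacer)
    if not 18 <= n <= 41:
        return "outside 18-41 nt"
    if n <= 22:
        return "most preferred (18-22 nt)"
    if n <= 24:
        return "more preferred (18-24 nt)"
    if n <= 27:
        return "preferred (18-27 nt)"
    return "permitted (18-41 nt)"


if __name__ == "__main__":
    # A 20 nt spacer, the length used in one embodiment.
    print(spacer_length_tier("ACGUACGUACGUACGUACGU"))
```

Because the ranges are nested, the sketch reports only the narrowest range that a given length satisfies.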
Target nucleic acid
In the present invention, "target nucleic acid" is used interchangeably with "target sequence", "target nucleic acid sequence", and "target nucleic acid molecule", and refers to a specific nucleic acid comprising a nucleic acid sequence that is fully or partially complementary to the spacer sequence in the guide RNA. "Target sequence" refers to a polynucleotide targeted by the spacer sequence in a guide RNA, for example a sequence having complementarity with the spacer sequence, wherein hybridization between the target sequence and the spacer sequence promotes the formation of a CRISPR-Cas complex (comprising a Cas protein and a guide RNA). Complete complementarity is not required, as long as there is sufficient complementarity to cause hybridization and promote the formation of a CRISPR-Cas complex. In some embodiments, the target nucleic acid comprises a non-coding region (e.g., a promoter or terminator). In some embodiments, the target nucleic acid is single-stranded or double-stranded.
The target sequence can comprise any polynucleotide, such as DNA. In some cases, the target sequence is located inside or outside a cell. In some cases, the target sequence is located in the nucleus, the cytoplasm, or an organelle (e.g., a mitochondrion or chloroplast) of a cell.
The target nucleic acid can be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). In some cases, the target sequence should be associated with a protospacer adjacent motif (PAM).
Donor template
In the present invention, "donor template nucleic acid" and "donor template" are used interchangeably and refer to a nucleic acid molecule that one or more cellular proteins can use to alter the structure of the target nucleic acid after the gene editing protein (Cas protein) described herein has altered the target nucleic acid.
In some embodiments, the donor template nucleic acid is a double-stranded nucleic acid or a single-stranded nucleic acid. In some embodiments, the donor template nucleic acid is linear or circular (e.g., a plasmid). In some instances, the donor template nucleic acid is an exogenous nucleic acid molecule. In some instances, the donor template nucleic acid is an endogenous nucleic acid molecule (e.g., a chromosome). In some embodiments, the donor template can be used to achieve genetic recombination, and the recombination is homologous recombination.
Cleavage
Cleavage refers to a DNA break in a target nucleic acid produced by the gene editing protein (Cas protein) described herein. In some embodiments, the cleavage is a double-stranded DNA break. In some embodiments, the cleavage is a single-stranded DNA break.
In the present invention, the meanings of cleaving a target nucleic acid and modifying a target nucleic acid may overlap. Modifying a target nucleic acid includes not only modification of a single nucleotide, but also insertion or deletion of a nucleic acid fragment.
Reporter nucleic acid
A reporter nucleic acid is a molecule that can be cleaved or otherwise deactivated by an activated CRISPR system protein as described herein. The reporter nucleic acid comprises a nucleic acid element that can be cleaved by the CRISPR protein (e.g., a single-stranded non-target nucleic acid molecule carrying different reporter groups or labeling molecules at its two ends). Cleavage of the nucleic acid element produces a detectable signal. Before cleavage, or when the reporter nucleic acid is in an "active" state, the reporter nucleic acid prevents the generation or detection of a positive detectable signal. It will be understood that, in certain example embodiments, a minimal background signal may be generated in the presence of active reporter nucleic acid. The positive detectable signal can be any signal that can be detected using optical, fluorescent, chemiluminescent, electrochemical, or other detection methods known in the art. For example, in certain embodiments, a first signal (i.e., a negative detectable signal) can be detected when the reporter nucleic acid is present, and this is then converted to a second signal (e.g., a positive detectable signal) after the target molecule is detected and the reporter nucleic acid is cleaved or deactivated by the activated CRISPR protein. The reporter nucleic acid can be a single-stranded DNA molecule, a single-stranded RNA molecule, or a single-stranded DNA-RNA hybrid.
The detection method of the present invention can be used for quantitative detection of the target nucleic acid to be detected. The quantitative detection index can be quantified according to the signal strength of the reporter group, for example according to the luminescence intensity of a fluorescent group or the width of a colored band.
Functional domain
The term "functional domain" is used herein in its broadest sense and includes a protein, such as an enzyme or factor, itself, or a fragment/domain thereof having a specific function. A gene editing protein (e.g., a dCas protein) is linked to or associated with one or more functional domains, the functional domains being selected from one or more of a localization signal, a reporter protein, a Cas protein targeting moiety, a DNA binding domain, an epitope tag, a transcription activation domain, a transcription repression domain, a nuclease, a deamination domain, a methylase, a demethylase, a transcription release factor, an HDAC, a polypeptide having cleavage activity, and a ligase. When more than one functional domain is included, the functional domains may be the same or different.
Deamination domain
In the present invention, the deamination domain includes the catalytic domain of a deaminase (e.g., an adenosine deaminase or a cytidine deaminase). As used herein, "adenosine deaminase" or "adenosine deaminase protein" refers to a protein, a polypeptide, or one or more functional domains of a protein or polypeptide, that is capable of catalyzing a hydrolytic deamination reaction that converts adenine (or the adenine moiety of a molecule) into hypoxanthine (or the hypoxanthine moiety of a molecule).
In some embodiments, the adenine-containing molecule is adenosine (A), and the hypoxanthine-containing molecule is inosine (I). The adenine-containing molecule can be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
Adenosine deaminases include, but are not limited to, members of the enzyme family known as adenosine deaminases acting on RNA (ADAR), members of the enzyme family known as adenosine deaminases acting on tRNA (ADAT), and other family members containing an adenosine deaminase domain (ADAD). According to the present disclosure, the adenosine deaminase is capable of targeting adenine in RNA/DNA duplexes and RNA duplexes. In a specific embodiment, the adenosine deaminase has been modified to increase its ability to edit DNA in an RNA/DNA heteroduplex.
In some embodiments, the deaminase is a cytidine deaminase. The term "cytidine deaminase" or "cytidine deaminase protein" refers to a protein, a polypeptide, or one or more functional domains of a protein or polypeptide, that is capable of catalyzing a hydrolytic deamination reaction that converts cytosine (or the cytosine moiety of a molecule) into uracil (or the uracil moiety of a molecule). In some embodiments, the cytosine-containing molecule is cytidine (C), and the uracil-containing molecule is uridine (U). The cytosine-containing molecule can be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
Cytidine deaminases include, but are not limited to, members of the enzyme family known as apolipoprotein B mRNA editing complex (APOBEC) family deaminases, activation-induced deaminase (AID), or cytidine deaminase 1 (CDA1). Specific embodiments include an APOBEC family deaminase.
In some embodiments, the cytidine deaminase comprises the wild-type amino acid sequence of a cytosine deaminase. In some embodiments, the cytidine deaminase comprises one or more mutations in the cytosine deaminase sequence, such that the editing efficiency and/or substrate editing preference of the cytosine deaminase is changed according to specific needs.
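The two deamination reactions defined above (adenosine to inosine and cytidine to uridine) can be modeled on a sequence string with the following Python sketch. It is only a symbolic illustration under stated assumptions: the name `deaminate`, the single-letter codes "I" and "U" for the products, and the zero-based `position` argument are conventions chosen here, and the sketch says nothing about enzyme specificity, editing windows, or efficiency.

```python
# Products of hydrolytic deamination as defined in the text:
# adenosine (A) -> inosine (I), cytidine (C) -> uridine (U).
DEAMINATION_PRODUCT = {"A": "I", "C": "U"}


def deaminate(seq: str, position: int) -> str:
    """Return seq with the base at zero-based `position` deaminated.

    Raises ValueError if the position is out of range or if the base
    there is neither A nor C.
    """
    if not 0 <= position < len(seq):
        raise ValueError("position is outside the sequence")
    base = seq[position].upper()
    if base not in DEAMINATION_PRODUCT:
        raise ValueError(f"base {base!r} is not a deamination substrate here")
    return seq[:position] + DEAMINATION_PRODUCT[base] + seq[position + 1:]


if __name__ == "__main__":
    print(deaminate("GACT", 1))  # adenosine at index 1 becomes inosine
    print(deaminate("GACT", 2))  # cytidine at index 2 becomes uridine
```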
Identity
"Identity" refers to the degree of sequence matching between two polypeptides or between two nucleic acids. "Identity" is expressed as the number of identical residues between the polypeptide or nucleic acid sequences as a percentage of the total number of residues, and the way the total number of residues is calculated depends on the type of mutation. Mutation types include insertions at either or both ends of the sequence (extensions), deletions at either or both ends of the sequence (truncations), substitutions/replacements of one or more amino acids/nucleotides, insertions within the sequence, and deletions within the sequence.
Taking a polypeptide sequence as an example, if the mutation type is one or more of the following: substitution/replacement of one or more amino acids/nucleotides, insertion within the sequence, and deletion within the sequence, the total number of residues is taken as that of the larger of the molecules being compared. If the mutation type also includes an insertion at either or both ends of the sequence (extension) or a deletion at either or both ends of the sequence (truncation), the number of amino acids inserted or deleted at either or both ends (for example, fewer than 20 inserted or deleted at the two ends) is not counted in the total number of residues. When calculating percent identity, the sequences being compared are aligned in a manner that produces the maximum match between them, and gaps in the alignment (if any) are resolved by a specific algorithm. Nucleotide identity is calculated in the same way.
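One possible reading of this identity calculation is sketched below in Python. The sketch assumes the two sequences have already been aligned by some algorithm and are supplied as equal-length strings with "-" marking gaps; the function name `percent_identity` and the gap symbol are assumptions made for illustration. Terminal columns containing a gap (extensions or truncations) are left out of the residue count, and the denominator is the length of the longer sequence within the remaining region.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Terminal gap columns are excluded from the total; internal gaps
    count against identity through the longer-sequence denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a, aligned_b))

    # Trim leading and trailing columns in which either sequence has a gap.
    start, end = 0, len(columns)
    while start < end and gap in columns[start]:
        start += 1
    while end > start and gap in columns[end - 1]:
        end -= 1
    core = columns[start:end]
    if not core:
        raise ValueError("no overlapping residues to compare")

    matches = sum(1 for x, y in core if x == y and x != gap)
    len_a = sum(1 for x, _ in core if x != gap)
    len_b = sum(1 for _, y in core if y != gap)
    return 100.0 * matches / max(len_a, len_b)


if __name__ == "__main__":
    # Two-residue terminal extension in the second sequence, one substitution.
    print(percent_identity("--ACGTACGT", "TTACGTACGA"))
    # One internal insertion in the second sequence.
    print(round(percent_identity("ACGT-ACGT", "ACGTTACGT"), 2))
```

In the first call the two-residue terminal extension is excluded, so 7 of 8 residues match; in the second, the internal insertion makes the longer sequence (9 residues) the denominator.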
Vector
A vector is a nucleic acid molecule capable of transporting another nucleic acid molecule to which it has been linked.
Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends or no free ends (e.g., circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art. A vector can be introduced into a host cell by transformation, transduction, or transfection so that the genetic elements it carries are expressed in the host cell. A vector can be introduced into a host cell to thereby produce transcripts, proteins, or peptides, including the proteins, fusion proteins, isolated nucleic acid molecules, and the like described herein (e.g., CRISPR transcripts, such as nucleic acid transcripts, proteins, or enzymes). A vector can contain a variety of elements that control expression, including, but not limited to, promoter sequences, transcription initiation sequences, enhancer sequences, selection elements, and reporter genes. The vector may also contain an origin of replication.
Vectors include plasmids and viral vectors. A plasmid is a circular double-stranded DNA loop into which additional DNA fragments can be inserted, for example by standard molecular cloning techniques. In a viral vector, virally derived DNA or RNA sequences are present in the vector for packaging into a virus; such viruses include, for example, retroviruses, replication-defective retroviruses, adenoviruses, replication-defective adenoviruses, and adeno-associated viruses. Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Some vectors (for example, bacterial vectors having a bacterial origin of replication and episomal mammalian vectors) are capable of autonomous replication in the host cell into which they are introduced.
Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and are thereby replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operably linked. Such vectors are referred to as "expression vectors."
In some embodiments, the vector (e.g., a viral vector or a non-viral vector, such as a lentiviral vector or a plasmid) can be delivered to the tissue of interest by, for example, intramuscular injection, intravenous administration, transdermal administration, intranasal administration, oral administration, or mucosal administration. Such delivery can be via a single dose or multiple doses. Those skilled in the art will appreciate that the actual dose to be delivered can vary greatly depending on a variety of factors, including but not limited to the choice of vector, the target cell, organism, or tissue, the general condition of the subject to be treated, the degree of transformation/modification sought, the route of administration, the mode of administration, and the type of transformation/modification sought.
Regulatory elements
Herein, "regulatory elements" include promoters, enhancers, internal ribosome entry sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences); for a detailed description, see Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990). In some cases, regulatory elements include those sequences that direct constitutive expression of a nucleotide sequence in many types of host cells and those sequences that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). A tissue-specific promoter may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, a specific organ (e.g., liver, pancreas), or a particular cell type (e.g., lymphocytes). In other cases, regulatory elements may also direct expression in a temporally dependent manner (e.g., in a cell cycle-dependent or developmental stage-dependent manner), which may or may not also be tissue- or cell-type-specific.
The term "promoter" refers to a non-coding nucleotide sequence located upstream of a gene that can initiate expression of the downstream gene. A constitutive promoter is a nucleotide sequence that, when operably linked to a polynucleotide that encodes or specifies a gene product, causes the gene product to be produced in a cell under most or all physiological conditions of the cell. An inducible promoter is a promoter that selectively expresses a coding sequence or functional RNA in response to the presence of an endogenous or exogenous stimulus, for example in response to a chemical compound (chemical inducer) or to environmental, hormonal, chemical, and/or developmental signals. Inducible or regulated promoters include, for example, promoters induced or regulated by light, heat, stress, flooding or drought, salt stress, osmotic stress, plant hormones, wounding, or chemicals (such as ethanol, abscisic acid (ABA), jasmonates, salicylic acid, or safeners).
Host cells
As used herein, "host cell" refers to a eukaryotic cell (e.g., an animal cell, a plant cell, a fungal cell, etc.), a prokaryotic cell (e.g., certain microbial cells, Escherichia coli, Bacillus subtilis, etc.), or a cell from a multicellular organism cultured as a unicellular entity (e.g., a cell line), which serves as a recipient of a nucleic acid (e.g., an expression vector), and includes the progeny of the original cell that has been genetically modified by the nucleic acid.
It should be understood that, owing to natural, accidental, or deliberate mutation, the progeny of a single cell may not necessarily be completely identical in morphology, genome, and so on to the original parent cell. A "recombinant host cell" (also called a "genetically modified host cell") is a host cell into which a heterologous nucleic acid, such as an expression vector, has been introduced.
Those skilled in the art will appreciate that the design of the expression vector may depend on factors such as the choice of the host cell to be transformed and the desired level of expression.
In another aspect, the present invention also provides a host cell or progeny thereof, wherein the host cell comprises the aforementioned gene editing protein (Cas protein), or the aforementioned fusion protein, or the aforementioned polynucleotide, or the aforementioned vector system, or the aforementioned CRISPR-Cas system, or the aforementioned composition.
In one embodiment, the host cell includes a non-human mammalian, human, insect, avian, reptilian, amphibian, rodent, fish, worm, nematode, or yeast cell.
In one aspect, the present invention also provides a multicellular organism comprising the aforementioned cell or progeny thereof.
In one embodiment, the multicellular organism is an animal model or a plant model for a relevant disease.
NLS
NLS stands for "nuclear localization sequence" or "nuclear localization signal" and refers to an amino acid sequence that promotes the entry of a protein into the cell nucleus. Nuclear localization sequences are known in the art (described, for example, in International PCT Application PCT/EP2000/011690 by Plank et al., filed November 23, 2000 and published May 31, 2001 as WO/2001/038547, which is incorporated herein by reference for its disclosure of exemplary nuclear localization sequences). In other embodiments, the NLS is an optimized NLS, for example as described in Koblan et al., Nature Biotech. 2018, doi: 10.1038/nbt.4172.
Operably linked
"Operably linked" means that the nucleotide sequence of interest is linked to the regulatory element in a manner that allows expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system, or in a host cell when the vector is introduced into the host cell). Advantageous vectors include lentiviruses and adeno-associated viruses, and the type of such vectors can also be selected to target particular types of cells.
Complementarity
"Complementarity" refers to the ability of one nucleic acid sequence to form one or more hydrogen bonds with another nucleic acid sequence by traditional Watson-Crick base pairing or other non-traditional types of pairing. Percent complementarity indicates the percentage of residues in one nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairs) with another nucleic acid sequence (e.g., 5, 6, 7, 8, 9, or 10 out of 10 being complementary corresponds to 50%, 60%, 70%, 80%, 90%, and 100% complementarity, respectively). "Fully complementary" means that all contiguous residues of one nucleic acid sequence form hydrogen bonds with the same number of contiguous residues in another nucleic acid sequence. "Substantially complementary" refers to a degree of complementarity of at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
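The percent complementarity defined above can be computed with a short Python sketch. It rests on simplifying assumptions that are not part of the specification: both sequences are written 5' to 3', have equal length, and are paired antiparallel without gaps, and only Watson-Crick pairs (A-T, A-U, G-C) are counted, whereas the definition also allows non-traditional pairing. The name `percent_complementarity` is hypothetical.

```python
# Watson-Crick pairs only; non-traditional pairing allowed by the
# definition (for example G-U wobble) is deliberately left out.
WATSON_CRICK = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def percent_complementarity(seq_a: str, seq_b: str) -> float:
    """Percentage of residues in seq_a that pair with seq_b.

    Both sequences are read 5'->3' and paired antiparallel, so residue i
    of seq_a is compared with residue (n - 1 - i) of seq_b.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = zip(seq_a.upper(), reversed(seq_b.upper()))
    paired = sum(1 for pair in pairs if pair in WATSON_CRICK)
    return 100.0 * paired / len(seq_a)


if __name__ == "__main__":
    strand = "ACGTACGTAC"
    print(percent_complementarity(strand, "GTACGTACGT"))  # all 10 residues pair
    print(percent_complementarity(strand, "CAACGTACGT"))  # 8 of 10 residues pair
```

The second call mirrors the worked example in the definition: 8 paired residues out of 10 gives 80% complementarity.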
The term "stringent conditions", in relation to hybridization, refers to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes to that target sequence and substantially does not hybridize to non-target sequences. Stringent conditions are generally sequence-dependent and vary with a number of factors. In general, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence.
"Hybridization" refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized by hydrogen bonding between the bases of the nucleotide residues. The complex may comprise two strands forming a duplex, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these. A hybridization reaction may constitute one step in a broader process (such as the initiation of PCR, or the cleavage of a polynucleotide by an enzyme). A sequence capable of hybridizing to a given sequence is referred to as the "complement" of the given sequence.
Hybridization of the target sequence with the gRNA means that at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% of the nucleic acid sequences of the target sequence and the gRNA can hybridize to form a complex; or that at least 12, 15, 16, 17, 18, 19, 20, or more bases of the nucleic acid sequences of the target sequence and the gRNA can pair complementarily and hybridize to form a complex.
Expression
Nucleic acid expression includes one or more of: production of an RNA template from a DNA sequence (e.g., by transcription); processing of the RNA transcript (e.g., by splicing, editing, 5' cap formation, and/or 3' end processing); translation of the RNA into a polypeptide or protein; and post-translational modification of the polypeptide or protein.
Delivery
"Delivery" refers to providing an entity (such as a drug) to a destination. For example, the components of the CRISPR-Cas system/composition of the present invention can be delivered in various forms, such as combinations of DNA/RNA, RNA/RNA, or protein/RNA. For example, the gene editing protein (Cas protein) can be delivered as an encoding DNA polynucleotide, as an encoding RNA polynucleotide, or as a protein.
In one aspect, the present invention also provides a delivery system, which comprises the gene editing protein (Cas protein), or the fusion protein, or the polynucleotide, or the CRISPR-Cas composition.
In one embodiment, the delivery system further comprises a delivery vehicle, and the delivery vehicle comprises nanoparticles, liposomes, exosomes, microvesicles, a gene gun, or an electroporation device.
In addition, when the delivery target is a plant cell, delivery by means such as cell-penetrating peptides (CPPs) may also be used. For example, in a specific embodiment, the gene editing protein (Cas protein) and/or at least one guide RNA is coupled to one or more CPPs, so that the CPP coupled to the gene editing protein (Cas protein) and/or the guide RNA is efficiently transported into the plant cell (e.g., into a protoplast). CPPs are short peptides of fewer than 35 amino acids, derived from proteins or from chimeric sequences, that can transport biomolecules across the cell membrane in a receptor-independent manner. A CPP can be a cationic peptide, a peptide having a hydrophobic sequence, an amphipathic peptide, a peptide having proline-rich and antimicrobial sequences, or a chimeric or bipartite peptide. CPPs can penetrate biological membranes and thereby trigger the movement of various biomolecules across the cell membrane into the cytoplasm, improve their intracellular routing, and thus promote the interaction of the biomolecules with their targets.
By way of example, CPPs include Tat (a nuclear transcriptional activator protein required for viral replication by HIV type 1), penetratin, the Kaposi fibroblast growth factor (FGF) signal peptide sequence, the integrin β3 signal peptide sequence, polyarginine peptide (Arg) sequences, guanine-rich molecular transporters, sweet arrow peptide, and the like.
Linker
As used herein, "linker" refers to a chemical group or molecule that connects two molecules or moieties, for example two domains of a fusion protein, such as the gene editing protein (Cas protein) and a deaminase. In some arrangements, the linker is located between, or flanks, the two groups, molecules, or other moieties, and connects the two by covalent bonds.
In some embodiments, the linker is an amino acid or a linear polypeptide formed by multiple amino acid residues joined by peptide bonds. In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. The length and type of the linker can be designed as needed. In some embodiments, the linker can be an artificially synthesized amino acid sequence or a naturally occurring polypeptide sequence.
检测Detection
在一个方面,本发明还提供了一种靶向和编辑靶核酸的方法,所述方法包括使所述靶核酸与前述任一项CRISPR-Cas系统或组合物接触。In one aspect, the present invention also provides a method for targeting and editing a target nucleic acid, the method comprising contacting the target nucleic acid with any one of the aforementioned CRISPR-Cas systems or compositions.
在一个方面,本发明还提供了一种在识别靶核酸后非特异性降解单链DNA的方法,所述方法包括使所述靶核酸与前述CRISPR-Cas组合物接触。In one aspect, the present invention also provides a method for non-specifically degrading single-stranded DNA after recognizing a target nucleic acid, the method comprising contacting the target nucleic acid with the aforementioned CRISPR-Cas composition.
在一个方面,本发明还提供了一种在识别双链靶核酸的间隔(Spacer)互补链后靶向所述双链靶核酸的非间隔(Spacer)互补链并使其产生切口的方法,所述方法包括使所述双链靶核酸与前述CRISPR-Cas系统或组合物接触。In one aspect, the present invention also provides a method for targeting a non-spacer complementary strand of a double-stranded target nucleic acid and causing a nick therein after recognizing a spacer complementary strand of the double-stranded target nucleic acid, the method comprising contacting the double-stranded target nucleic acid with the aforementioned CRISPR-Cas system or composition.
在一个方面,本发明还提供了一种靶向和切割双链靶核酸的方法,所述方法包括使所述双链靶核酸与前述CRISPR-Cas系统或组合物接触。 In one aspect, the present invention also provides a method for targeting and cleaving a double-stranded target nucleic acid, the method comprising contacting the double-stranded target nucleic acid with the aforementioned CRISPR-Cas system or composition.
在一个实施方式中,在使所述双链DNA的间隔(Spacer)互补链产生切口之前,使所述双链靶核酸的非间隔(Spacer)序列互补链产生切口。In one embodiment, the non-spacer sequence complementary strand of the double-stranded target nucleic acid is nicked before the spacer complementary strand of the double-stranded DNA is nicked.
In one aspect, the present invention also provides a method for specifically editing a double-stranded nucleic acid, the method comprising contacting the following under sufficient conditions and for a sufficient amount of time:
(1) the aforementioned gene editing protein (Cas protein) or fusion protein, another enzyme having sequence-specific nicking activity, and the guide RNA, wherein the guide RNA directs the gene editing protein (Cas protein) or the fusion protein to nick the strand opposite to the one nicked by the other sequence-specific nicking enzyme; and (2) the double-stranded nucleic acid; the method results in the formation of a double-strand break.
In one aspect, the present invention also provides a method for editing a double-stranded nucleic acid, the method comprising contacting the following under sufficient conditions and for a sufficient amount of time:
(1) a fusion protein of the aforementioned gene editing protein (Cas protein) or fusion protein with a protein domain having DNA-modifying activity, and the RNA guide targeting the double-stranded nucleic acid; and (2) the double-stranded nucleic acid;
wherein the gene editing protein (Cas protein) of the fusion protein is modified so that it nicks the non-target strand of the double-stranded nucleic acid.
In one embodiment, the two strands of the double-stranded nucleic acid are cleaved at different sites, resulting in a staggered cut.
In one embodiment, the two strands of the double-stranded nucleic acid are cleaved at the same site, resulting in a blunt double-strand break.
In one aspect, the present invention also provides a method for targeting and cleaving a single-stranded target nucleic acid, the method comprising contacting the target nucleic acid with any one of the aforementioned CRISPR-Cas compositions.
In one aspect, the present invention also provides a method for inducing a change in cell state, the method comprising contacting the aforementioned CRISPR-Cas composition with the target nucleic acid in a cell.
In one embodiment, the cell state comprises apoptosis or dormancy;
In one embodiment, the cell comprises a eukaryotic cell or a prokaryotic cell;
In one embodiment, the cell comprises a mammalian cell or a diseased plant cell;
In one embodiment, the cell comprises a cancer cell;
In one embodiment, the cell comprises an infectious cell or a cell infected by an infectious agent;
In one embodiment, the cell comprises a virus-infected cell or a prion-infected cell;
In one embodiment, the cell comprises a fungal cell, a protozoan, or a parasite cell.
In one aspect, the present invention also provides a method for detecting a target nucleic acid in a sample, the method comprising contacting the sample with the aforementioned gene editing protein (Cas protein), a guide RNA, and a non-target sequence, and detecting the detectable signal generated when the gene editing protein (Cas protein) cleaves the non-target sequence, thereby detecting the target nucleic acid; the non-target sequence does not hybridize with the guide RNA.
Kit
In one aspect, the present invention provides a kit comprising the aforementioned gene editing protein (Cas protein), the aforementioned fusion protein, the aforementioned polynucleotide, the aforementioned CRISPR-Cas composition, or the aforementioned host cell, as well as the use of any of these in preparing a kit; the components of the kit are in the same or different containers.
In one aspect, the present invention also provides a container comprising the aforementioned kit.
In one embodiment, the container comprises a sterile container;
In one embodiment, the container comprises a syringe.
In some embodiments, the kit also includes instructions for using the kit, for example instructions in more than one language. The kit may also contain one or more reagents for use in a process that utilizes one or more of the above components. The reagents may be provided in any suitable container. For example, the kit may provide one or more reaction or storage buffers. The reagents may be provided in a form that requires the addition of one or more other components before use (for example, in concentrated or lyophilized form). The buffer may be any buffer, including but not limited to sodium carbonate buffer, sodium bicarbonate buffer, borate buffer, Tris buffer, MOPS buffer, HEPES buffer, and combinations thereof. The buffer may have a suitable pH; for example, it may be alkaline. In some embodiments, the pH of the buffer is between about 7 and 10.
Treatment
"Treatment" refers to treating or curing a condition in a subject, delaying the onset of symptoms of the condition, and/or delaying progression in the severity of the condition. The term "subject" includes, but is not limited to, various animals, plants, and microorganisms. Animals include mammals, such as bovines, equines, ovines, swine, canines, felines, lagomorphs, rodents (e.g., mice or rats), non-human primates (e.g., rhesus macaques or cynomolgus monkeys), or humans. In certain embodiments, the subject (e.g., a human) suffers from a condition (e.g., a condition caused by a disease-related genetic defect). A "plant" is any differentiated multicellular organism capable of photosynthesis, including crop plants at any stage of maturity or development.
In one aspect, the present invention also provides the use of the aforementioned gene editing protein (Cas protein), the aforementioned fusion protein, the aforementioned polynucleotide, the aforementioned CRISPR-Cas composition, or the aforementioned host cell in the preparation of a medicament for treating a condition or disease in a subject in need thereof.
In one embodiment, the use comprises administering the CRISPR-Cas composition to the subject or to an ex vivo cell of the subject;
In one embodiment, the spacer sequence is complementary to at least 15 nucleotides of the target nucleic acid associated with the condition or disease, and the Cas protein or the fusion protein cleaves the target nucleic acid;
In one embodiment, the condition or disease comprises cancer or an infectious disease;
In one embodiment, the cancer comprises one or more of Wilms tumor, Ewing sarcoma, neuroendocrine tumor, glioblastoma, neuroblastoma, melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, kidney cancer, pancreatic cancer, lung cancer, biliary tract cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid cancer, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphocytic leukemia, acute myeloid leukemia, chronic lymphocytic leukemia, chronic myeloid leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, or urinary bladder cancer;
In one embodiment, the condition or disease comprises one or more of cystic fibrosis, atherosclerotic cardiovascular disease (ASCVD), progressive pseudohypertrophic muscular dystrophy, Becker muscular dystrophy, alpha-1-antitrypsin deficiency, Pompe disease, myotonic dystrophy, Huntington's disease, fragile X syndrome, Friedreich's ataxia, amyotrophic lateral sclerosis, frontotemporal dementia, hereditary chronic kidney disease, hyperlipidemia, Leber's congenital amaurosis, sickle cell disease, primary hyperoxaluria (PH1), hypercholesterolemia (FH), hereditary angioedema (HAE), retinal disease, macular degeneration, transthyretin amyloidosis, or beta-thalassemia;
In one embodiment, the infectious agent of the infectious disease comprises one or more of human immunodeficiency virus, herpes simplex virus-1, hepatitis B, or herpes simplex virus-2.
The main advantages of the present invention include:
(a) The present invention identifies, for the first time, a new gene editing protein (Cas protein). This gene editing protein (Cas protein) has very good gene editing activity, can effectively edit or cleave a target gene, and can effectively treat a condition or disease in a subject in need thereof (for example, one or more of cystic fibrosis, progressive pseudohypertrophic muscular dystrophy, Becker muscular dystrophy, alpha-1-antitrypsin deficiency, Pompe disease, myotonic dystrophy, Huntington's disease, fragile X syndrome, Friedreich's ataxia, amyotrophic lateral sclerosis, frontotemporal dementia, hereditary chronic kidney disease, hyperlipidemia, Leber's congenital amaurosis, sickle cell disease, hypercholesterolemia, transthyretin amyloidosis, or beta-thalassemia).
(b) The new Cas protein of the present invention has low homology with previously reported Cas enzymes, exhibits excellent DNA nuclease activity compared with Cas enzymes of the prior art, and has broad application prospects.
The present invention is further described below with reference to specific examples. It should be understood that these examples serve only to illustrate the present invention and do not limit its scope. Experimental methods in the following examples for which specific conditions are not stated generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer. Unless otherwise stated, percentages and parts are by weight.
Unless otherwise specified, the reagents and materials used in the examples of the present invention are commercially available products.
Example 1 Identification of a new Cas protein
First, the inventors mined metagenomic data with computational programs and analyzed metagenomes of uncultured organisms. Through redundancy removal and protein clustering analysis, a new Cas enzyme was identified and named CasW1. Its amino acid sequence is shown in SEQ ID No.1, and its nucleotide coding sequence is shown in SEQ ID No.2. Sequence alignment identified CasW1 as a member of the Cas12 family.
Proteins and related elements of a novel CRISPR-Cas system were obtained by analysis, prediction, and screening of the metagenome. Comparison of the CRISPR-Cas effector protein of the present invention with existing effector proteins showed that it has low similarity to known Cas proteins.
Analysis showed that the PAM corresponding to the Cas protein obtained in the present invention is 5'-TTTN, where N represents A/T/G/C. The CRISPR locus of the sample containing CasW1 was annotated with PILER-CR, and the DNA encoding the corresponding direct repeat (DR) sequence is shown in SEQ ID No.3; a schematic diagram of its secondary structure is shown in Figure 10.
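To make the 5'-TTTN PAM requirement concrete, the following is a minimal sketch that lists candidate PAM sites on one strand of a DNA sequence. It assumes, as is typical for Cas12-family enzymes but not stated in this passage, that the protospacer lies immediately 3' of the PAM on the same strand. The function name `find_tttn_sites`, the default protospacer length of 21 nt (the length of the hHPRT1 spacer used later in the examples), and the `demo` sequence are illustrative and do not come from the source.

```python
import re


def find_tttn_sites(seq: str, spacer_len: int = 21):
    """Yield (pam_start, pam, protospacer) for every 5'-TTTN-3' on the given strand.

    Assumption: the protospacer is the `spacer_len` nucleotides directly 3' of the PAM.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping PAMs (e.g. inside "TTTTT") are all found.
    pattern = re.compile(r"(?=(TTT[ACGT])([ACGT]{%d}))" % spacer_len)
    for match in pattern.finditer(seq):
        yield match.start(), match.group(1), match.group(2)


# Hypothetical toy sequence: 3 nt of flank, a TTTA PAM, a 21-nt protospacer, 2 nt of flank.
demo = "ACGTTTAGGTTAAAGATGGTTAAATGATCC"
sites = list(find_tttn_sites(demo))
for start, pam, protospacer in sites:
    print(start, pam, protospacer)
```

Only the strand passed in is scanned; to cover both strands, the same function could be called on the reverse complement as well.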
Example 2 Verification of the cleavage activity of CasW1 by in vitro cleavage experiments
To verify whether CasW1 has double-stranded DNA nuclease activity, the inventors tested its cleavage activity.
1. Construction of the CasW1 expression vector.
The nucleotide sequence fragment encoding CasW1 described above was synthesized and cloned into the prokaryotic protein expression vector pET-28a(+) (百奥莱博 (Biolabio), #QN1060) by restriction endonuclease digestion followed by ligation with T4 DNA ligase, giving a ligation product; the vector map is shown in Figure 6.
The inventors amplified the CasW1 fragment by PCR and digested it with EcoRI and NotI; the prokaryotic protein expression vector pET-28a(+) was digested with EcoRI and NotI in parallel. The double-digested CasW1 fragment and the double-digested pET-28a(+) vector were then analyzed by agarose gel electrophoresis. As shown in Figure 1, a CasW1 nucleotide fragment of the correct size was obtained, together with the pET-28a(+) vector fragment from which the 11 bp between the EcoRI and NotI sites had been excised.
The inventors then transformed the ligation product into competent Escherichia coli DH5α cells and plated the cells on LB agar plates containing kanamycin. After overnight culture, inverted, in a 37°C incubator, single colonies were picked for Sanger sequencing. Plasmid was extracted from clones whose sequence was confirmed to be correct, giving the pET-28a(+)-CasW1 expression vector.
2. In vitro purification of the CasW1 protein.
The experimental procedure was as follows:
(1) Transformation: A tube of competent Escherichia coli BL21(DE3) cells (Shanghai Weidi Biotechnology Co., Ltd., EC1002) was taken from the -80°C freezer and thawed on ice for 5 min. Then 1 ng of pET-28a(+)-CasW1 expression vector plasmid was added to the competent cells, mixed by gently flicking the bottom of the tube, and left on ice for 25 min. The cells were heat-shocked in a 45°C water bath for 45 s, immediately returned to ice for 2 min, and 700 μl of antibiotic-free LB medium was added to the tube, followed by recovery at 37°C and 220 rpm for 60 min. After recovery, the cells were collected by centrifugation at 5000 rpm for 1 min, 600 μl of supernatant was discarded, and the remaining liquid was gently mixed with the cells and spread on an LB plate using glass beads.
(2) Induction: At 9 a.m. the next day, a single colony was picked from the transformation plate and inoculated into 3 ml of LB liquid medium containing kanamycin (10 mg/ml), then cultured on a shaker at 37°C and 220 rpm for 8 h. At 5 p.m., the 3 ml LB culture was transferred to 300 ml of 2× YT medium containing kanamycin, 150 μl of IPTG was added to the culture (final concentration 0.5 mM), and expression was induced overnight at 16°C for 13-14 h.
(3) Cell harvest: The induced culture was centrifuged at 4500 rpm and 4°C for 10 min, and the supernatant was discarded. The cell pellet was resuspended in 15 ml of 10 mM imidazole in a 50 ml centrifuge tube (imidazole is added for competitive elution of the CasW1 protein), and 150 μl of the protease inhibitor PMSF was added (to inhibit protein degradation and improve yield).
(4) Sonication: The 50 ml centrifuge tube was fixed upright in a beaker filled with ice water and positioned so that the sonicator probe was below the surface of the cell suspension. Sonication settings: 3 s on, 12 s off, 200 W, 60 cycles. After sonication, a further 150 μl of the protease inhibitor PMSF was added, and the lysate was centrifuged at 11000 rpm and 4°C for 25 min.
(5) Bead pretreatment: 300 μl of Ni-NTA Agarose beads (QIAGEN, #30230) was transferred to a 15 ml centrifuge tube, 10 ml of PBS was added, and the tube was rotated at room temperature for 5 min and centrifuged at 1000 × g for 2 min at 4°C. The supernatant was removed with a dropper, and the PBS wash was repeated once. Then 10 ml of 10 mM imidazole was added, the tube was rotated at room temperature for 5 min and centrifuged at 1000 × g for 2 min at 4°C, and the supernatant was carefully removed with a dropper. The 15 ml tube containing the washed beads was kept on ice until use.
(6) All of the supernatant obtained after centrifugation in step (4) was transferred to the tube containing the Ni-NTA Agarose beads from step (5) and incubated with rotation at 4°C for 1 h.
(7) Wash: The Ni-NTA Agarose beads were washed twice with 10 ml of 40 mM imidazole. After each addition of imidazole, the tube was rotated at 4°C for 5 min and then centrifuged at 1000 × g for 2 min at 4°C. The beads were then resuspended in 500 μl of 250 mM imidazole and transferred to a pre-cooled affinity chromatography column (MedChemExpress, #HY-K0221). After equilibration for 5 min, the protein fraction eluted with 250 mM imidazole was collected in a 1.5 ml centrifuge tube. This elution step was repeated three times.
(8) After the protein concentration of each fraction was measured at OD280 on a NanoDrop, the fractions were pooled and the protein was exchanged into PBS buffer using a 30 kDa ultrafiltration tube. Glycerol was added to a final concentration of 10%, and the protein was aliquoted, snap-frozen in liquid nitrogen, and stored in a -80°C freezer. Protein size and purity were assessed by SDS-PAGE. As shown in Figure 2, the CasW1 protein is approximately 130 kDa, indicating that well-purified CasW1 protein had been obtained.
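Step (2) above gives the IPTG volume (150 μl), the culture volume (300 ml), and the final concentration (0.5 mM), but not the stock concentration. A minimal sketch of the back-calculation with C1·V1 = C2·V2 follows. The function name is illustrative, and the result is an inference from the stated numbers (it suggests a stock of roughly 1 M), not a value given in the source.

```python
def implied_stock_mM(final_mM: float, culture_ml: float, added_ul: float) -> float:
    """Stock concentration implied by C1*V1 = C2*V2, with V2 = culture + added volume."""
    added_ml = added_ul / 1000.0
    total_ml = culture_ml + added_ml
    return final_mM * total_ml / added_ml


# Values from step (2): 150 ul of IPTG into 300 ml of culture, 0.5 mM final.
stock_mM = implied_stock_mM(final_mM=0.5, culture_ml=300.0, added_ul=150.0)
print(f"implied IPTG stock: {stock_mM:.1f} mM")
```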
3. In vitro cleavage verification
3.1 Preparation of the dsDNA template for in vitro cleavage.
Using genomic DNA from HepG2 cells (ATCC, catalog no. HB-8065) as the template, forward and reverse primers were designed against the hHPRT1 gene (GenBank, NG_012329.2) to amplify the dsDNA template. The primers are as follows:
hHPRT1-dsDNA-F: gtagtgtcaactcattgctg (SEQ ID NO.5);
hHPRT1-dsDNA-R: gtcaagggcatatcctacaa (SEQ ID NO.6).
PCR amplification was performed with Taq polymerase, using the following reaction mixture:
genomic DNA (template), 1 μl (100 ng in total); 2× Taq PCR mix, 10 μl; forward and reverse primers, 0.5 μl each; ddH2O to a total volume of 20 μl.
The PCR program was as follows: 95°C for 5 min; 35 cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 20 s; 72°C for 10 min; hold at 12°C.
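The reaction mixture and cycling program above can be checked with a few lines of arithmetic. The sketch below computes the volume of ddH2O needed to reach 20 μl and the total programmed hold time. The variable and function names are illustrative, and the time excludes instrument ramping and the final 12°C hold.

```python
# Reaction mix, volumes in microlitres (values from the text).
components_ul = {
    "genomic DNA template (100 ng)": 1.0,
    "2x Taq PCR mix": 10.0,
    "forward primer": 0.5,
    "reverse primer": 0.5,
}
TOTAL_UL = 20.0
water_ul = TOTAL_UL - sum(components_ul.values())

# Each step is (name, temperature in deg C, seconds).
initial = [("initial denaturation", 95, 300)]
cycle = [("denaturation", 94, 30), ("annealing", 55, 30), ("extension", 72, 20)]
final = [("final extension", 72, 600)]
N_CYCLES = 35


def programmed_seconds(initial, cycle, final, n_cycles):
    """Sum of programmed hold times; ramping and the 12 deg C hold are excluded."""
    per_cycle = sum(seconds for _, _, seconds in cycle)
    fixed = sum(seconds for _, _, seconds in initial + final)
    return fixed + n_cycles * per_cycle


total_s = programmed_seconds(initial, cycle, final, N_CYCLES)
print(f"ddH2O to add: {water_ul} ul")
print(f"programmed time: {total_s} s ({total_s / 60:.1f} min)")
```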
The PCR product was then analyzed by agarose gel electrophoresis; the result is shown in Figure 3. The band was recovered with an agarose gel DNA recovery kit (TIANGEN, DP219-02) and eluted with nuclease-free water to give the dsDNA template for in vitro cleavage.
3.2 In vitro cleavage reactions.
To assess the cleavage activity of CasW1, the inventors designed two sets of comparative experiments: the first compared the cleavage activity of CasW1 with that of LbCpf1, and the second compared CasW1 with HED Cas12i.16 and S7R-Cas12i.3.
The amino acid sequence of the LbCpf1 protein is shown in SEQ ID NO.12, and its nucleotide coding sequence is shown in SEQ ID NO.11; the amino acid sequence of the HED Cas12i.16 protein is shown in SEQ ID NO.10, and its nucleotide sequence is shown in SEQ ID NO.9; the amino acid sequence of S7R-Cas12i.3 is shown in SEQ ID NO.8, and its nucleotide sequence is shown in SEQ ID NO.7.
Expression vectors for LbCpf1, HED Cas12i.16, and S7R-Cas12i.3 were constructed by the method of step 1 of Example 2; the maps of the recombinant expression vectors are shown in Figures 7, 8, and 9.
3.2.1 Comparison of the cleavage activity of CasW1 and LbCpf1
The specific steps were as follows:
(1) Preparation of CasW1-crRNA and LbCpf1-crRNA.
A targeting sequence (spacer) was designed against the hHPRT1 gene and named hHPRT1-spacer: GGTTAAAGATGGTTAAATGAT (SEQ ID NO.4).
Based on the DR sequences of CasW1 and LbCpf1, a crRNA sequence was designed for each Cas protein; these were named CasW1-hHPRT1-crRNA and LbCpf1-hHPRT1-crRNA, as follows:
CasW1-hHPRT1-crRNA:
GTCTAAATGACCTATAAATTTCTACTATGTGTAGATGGTTAAAGATGGTTAAATGAT (SEQ ID NO.13), in which the leading segment GTCTAAATGACCTATAAATTTCTACTATGTGTAGAT (underlined in the original) is the DR sequence of CasW1;
LbCpf1-hHPRT1-crRNA:
TAATTTCTACTAAGTGTAGATGGTTAAAGATGGTTAAATGAT (SEQ ID NO.14), in which the leading segment TAATTTCTACTAAGTGTAGAT (underlined in the original) is the DR sequence of LbCpf1;
The CasW1-hHPRT1-crRNA and LbCpf1-hHPRT1-crRNA sequence fragments were each chemically synthesized (by Nanjing GenScript Biotech Co., Ltd.). A mixed solution of CasW1 and CasW1-hHPRT1-crRNA was then prepared with the following components:
nuclease-free H2O, 20 μl; NEBuffer r2.1 (10×, NEB, #B6002S), 3 μl; crRNA (CasW1-hHPRT1-crRNA or LbCpf1-hHPRT1-crRNA), 3 μl (concentration 30 nM); Cas protein (CasW1 or LbCpf1), 1 μl (concentration 30 nM); total volume of the reaction system, 27 μl.
The mixed solution of LbCpf1 and LbCpf1-hHPRT1-crRNA was prepared in the same way.
The CasW1/CasW1-hHPRT1-crRNA mixture and the LbCpf1/LbCpf1-hHPRT1-crRNA mixture were then each placed in a PCR instrument and incubated at 25°C for 10 min.
(2) To each of the pre-incubated CasW1/CasW1-hHPRT1-crRNA and LbCpf1/LbCpf1-hHPRT1-crRNA mixtures, 3 μl of 60 nM dsDNA solution was added (final concentration 6 nM), giving a total reaction volume of 30 μl. The reactions were mixed thoroughly, briefly centrifuged, and incubated in a PCR instrument at 37°C for 10 min.
(3) 1 μl of Proteinase K was added to the reaction from the previous step, mixed thoroughly, briefly centrifuged, and incubated at room temperature for 10 min to digest the protein components of the reaction;
(4) The DNA fragments from the cleavage reaction were purified with magnetic beads.
The DNA size-selection magnetic bead suspension (Vazyme, catalog no. N411-02) was taken out of the 4°C refrigerator 20 min in advance and equilibrated to room temperature. The beads were mixed by inversion, and 3 volumes (about 150 μl) of bead suspension was added to the dsDNA cleavage product obtained in step (3), mixed by gently pipetting up and down 10 times, and incubated at room temperature for 10 min to bind the dsDNA cleavage product to the beads. The PCR tube containing the sample was placed on a magnetic rack and, once the solution had cleared, the supernatant was carefully removed. With the tube kept on the magnetic rack throughout, 200 μl of freshly prepared 80% ethanol was added to rinse the beads, the tube was incubated at room temperature for 30 s, and the supernatant was carefully removed. The rinse was repeated once. The beads were air-dried with the lid open at room temperature for 5 min, the tube was removed from the magnetic rack, 15 μl of nuclease-free water was added, and the beads were resuspended thoroughly by vortexing or pipetting and left at room temperature for 2 min. The tube was then returned to the magnetic rack for 5 min and, once the solution had cleared, the supernatant was carefully transferred to a new nuclease-free PCR tube.
(5) The dsDNA cleavage products were then analyzed on a portable bioanalyzer (Houze Biotechnology, Qsep1), following the detection protocol for the S1 high-resolution cartridge (Houze Biotechnology, C105102); the results are shown in Figure 4. Using the Smear analysis option built into the Qsep1, the cleavage activity of CasW1 was determined to be 99.5% and that of LbCpf1 to be 95%; the cleavage activity of CasW1 is thus higher than that of LbCpf1.
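Steps (1) and (2) above amount to two small calculations: each crRNA is the direct repeat followed by the spacer, and the reaction volumes determine the final substrate and buffer concentrations. The sketch below reproduces both from the values given in the text. The function names are illustrative, and the optional T-to-U conversion is an added convenience, since the source writes the crRNA sequences with DNA letters. The text does not say whether the "30 nM" given for crRNA and Cas protein is a stock or a final concentration, so those values are not computed here.

```python
# Sequences copied from the text (written with DNA letters, as in the source).
DIRECT_REPEATS = {
    "CasW1": "GTCTAAATGACCTATAAATTTCTACTATGTGTAGAT",
    "LbCpf1": "TAATTTCTACTAAGTGTAGAT",
}
HPRT1_SPACER = "GGTTAAAGATGGTTAAATGAT"  # SEQ ID NO.4


def build_crrna(direct_repeat: str, spacer: str, as_rna: bool = False) -> str:
    """Concatenate direct repeat (5') and spacer (3'); optionally write T as U."""
    seq = (direct_repeat + spacer).upper()
    return seq.replace("T", "U") if as_rna else seq


def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """C1*V1 = C2*V2, solved for C2 (same unit as `stock`)."""
    return stock * added_ul / total_ul


crrnas = {name: build_crrna(dr, HPRT1_SPACER) for name, dr in DIRECT_REPEATS.items()}

# Pre-incubation mix from step (1), volumes in microlitres.
pre_mix_ul = {"water": 20.0, "10x NEBuffer r2.1": 3.0, "crRNA": 3.0, "Cas protein": 1.0}
pre_mix_total_ul = sum(pre_mix_ul.values())

# Step (2): 3 ul of 60 nM dsDNA substrate is added to the pre-incubated mix.
substrate_ul = 3.0
reaction_total_ul = pre_mix_total_ul + substrate_ul
substrate_final_nM = final_concentration(60.0, substrate_ul, reaction_total_ul)
buffer_final_x = final_concentration(10.0, pre_mix_ul["10x NEBuffer r2.1"], reaction_total_ul)

for name, seq in crrnas.items():
    print(name, len(seq), seq)
print(pre_mix_total_ul, reaction_total_ul, substrate_final_nM, buffer_final_x)
```

The computed totals (27 μl before and 30 μl after adding substrate) and the 6 nM final substrate concentration agree with the figures stated in the text, and the buffer ends up at 1×.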
3.2.1CasW1、HED Cas12i.16、S7R-Cas12i3的切割活性比较3.2.1 Comparison of cleavage activity of CasW1, HED Cas12i.16, and S7R-Cas12i3
具体操作步骤如下:The specific steps are as follows:
(1)分别制备CasW1、HED Cas12i.16以及S7R-Cas12i3的crRNA。(1) Prepare crRNAs of CasW1, HED Cas12i.16 and S7R-Cas12i3 respectively.
合成hHPRT1-spacer序列片段,根据CasW1、HED Cas12i.16以及S7R-Cas12i3的DR序列,分别设计上述Cas蛋白的crRNA序列,并分别命名为CasW1-hHPRT1-crRNA、HED Cas12i.16-hHPRT1-crRNA和S7R-Cas12i3-hHPRT1-crRNA,具体序列如下:The hHPRT1-spacer sequence fragment was synthesized, and the crRNA sequences of the above Cas proteins were designed according to the DR sequences of CasW1, HED Cas12i.16 and S7R-Cas12i3, and named as CasW1-hHPRT1-crRNA, HED Cas12i.16-hHPRT1-crRNA and S7R-Cas12i3-hHPRT1-crRNA, respectively. The specific sequences are as follows:
CasW1-hHPRT1-crRNA:CasW1-hHPRT1-crRNA:
GTCTAAATGACCTATAAATTTCTACTATGTGTAGATGGTTAAAGATGGTTAAATGAT(SEQ ID NO.13),其中,划线部分序列为CasW1的DR序列; GTCTAAATGACCTATAAATTTCTACTATGTGTAGAT GGTTAAAGATGGTTAAATGAT (SEQ ID NO. 13), wherein the underlined sequence is the DR sequence of CasW1;
HED Cas12i.16-hHPRT1-crRNA:HED Cas12i.16-hHPRT1-crRNA:
CTAGCAATGACTCAGAAATGTGTCCCCAGTTGACACGGTTAAAGATGGTTAAATGAT(SEQ ID NO.15),其中,划线部分序列为HED Cas12i.16的DR序列; CTAGCAATGACTCAGAAATGTGTCCCCAGTTGACAC GGTTAAAGATGGTTAAATGAT (SEQ ID NO. 15), wherein the underlined sequence is the DR sequence of HED Cas12i.16;
S7R-Cas12i3-hHPRT1-crRNA:S7R-Cas12i3-hHPRT1-crRNA:
AGAGAATGTGTGCATAGTCACACGGTTAAAGATGGTTAAATGAT(SEQ ID NO.16),其中,划线部分序列为S7R-Cas12i3的DR序列; AGAGAATGTGTGCATAGTCACAC GGTTAAAGATGGTTAAATGAT (SEQ ID NO. 16), wherein the underlined sequence is the DR sequence of S7R-Cas12i3;
分别化学合成(由南京金斯瑞生物科技有限公司合成)CasW1-hHPRT1-crRNA、HED Cas12i.16-hHPRT1-crRNA和S7R-Cas12i3-hHPRT1-crRNA序列片段。CasW1-hHPRT1-crRNA, HED Cas12i.16-hHPRT1-crRNA and S7R-Cas12i3-hHPRT1-crRNA sequence fragments were chemically synthesized (by Nanjing GenScript Biotech Co., Ltd.), respectively.
(2) Following the cleavage activity comparison experiment in step 3.2.1, in vitro cleavage assays were performed separately with CasW1, HED Cas12i.16, and S7R-Cas12i3, and the DNA fragments from each cleavage reaction were then purified with magnetic beads.
(3) The dsDNA cleavage products were then analyzed on a portable bioanalyzer (Houze Biotechnology, Qsep1), following the detection protocol for the S1 high-resolution cartridge (Houze Biotechnology, C105102) to assess cleavage. The results are shown in Figure 5. Using the Smear analysis option built into the Qsep1, the cleavage activity was determined to be 100% for CasW1, 14% for HED Cas12i.16, and 58% for S7R-Cas12i3; the cleavage activity of CasW1 was far higher than that of HED Cas12i.16 and S7R-Cas12i3.
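To put the reported percentages side by side, the short sketch below computes the gap in percentage points and the fold ratio of CasW1 over each comparator. It uses only the three values stated above. The fold ratio is an illustrative summary, not a metric reported in the application, and the names `REPORTED_ACTIVITY_PERCENT` and `compare_to_reference` are hypothetical.

```python
# Cleavage activities (%) as reported from the Qsep1 Smear analysis in step (3).
REPORTED_ACTIVITY_PERCENT = {
    "CasW1": 100.0,
    "HED Cas12i.16": 14.0,
    "S7R-Cas12i3": 58.0,
}


def compare_to_reference(
    activities: dict[str, float], reference: str
) -> dict[str, tuple[float, float]]:
    """Return (percentage-point gap, fold ratio) of the reference over each other entry."""
    ref_value = activities[reference]
    return {
        name: (ref_value - value, ref_value / value)
        for name, value in activities.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, (gap, fold) in compare_to_reference(
        REPORTED_ACTIVITY_PERCENT, "CasW1"
    ).items():
        print(f"CasW1 vs {name}: +{gap:.0f} percentage points, {fold:.1f}-fold")
```

On the reported values this gives a gap of 86 percentage points (about 7.1-fold) over HED Cas12i.16 and 42 percentage points (about 1.7-fold) over S7R-Cas12i3. Because these are single reported values for one target site, the ratios are descriptive only.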
Sequence information:
All documents mentioned in the present invention are incorporated by reference in this application, as if each document were individually incorporated by reference. It should further be understood that, after reading the above teachings of the present invention, those skilled in the art may make various changes or modifications to the invention, and such equivalents likewise fall within the scope defined by the claims appended to this application.
Claims (22)
Applications Claiming Priority (2)
| Application Number | Publication Number | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN202310593098.5 | | 2023-05-24 | | |
| CN202310593098.5A | CN116622678A (en) | 2023-05-24 | 2023-05-24 | Gene editing protein, corresponding gene editing system and application |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024240053A1 (en) | 2024-11-28 |
Family
ID=87602058
Family Applications (1)
| Application Number | Status | Publication Number | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| PCT/CN2024/093739 | Pending | WO2024240053A1 (en) | 2023-05-24 | 2024-05-16 | Gene editing protein, corresponding gene editing system thereof, and use thereof |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN116622678A (en) |
| WO (1) | WO2024240053A1 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116622678A (en) * | 2023-05-24 | 2023-08-22 | 尧唐(上海)生物科技有限公司 | Gene editing protein, corresponding gene editing system and application |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3388517A1 (en) * | 2017-04-10 | 2018-10-17 | Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. | Compounds for increasing genome editing efficiency |
| CN113373130A (en) * | 2021-05-31 | 2021-09-10 | 复旦大学 | Cas12 protein, gene editing system containing Cas12 protein and application |
| WO2022100527A1 (en) * | 2020-11-11 | 2022-05-19 | 山东舜丰生物科技有限公司 | Novel cas enzyme and system and use thereof |
| WO2022256440A2 (en) * | 2021-06-01 | 2022-12-08 | Arbor Biotechnologies, Inc. | Gene editing systems comprising a crispr nuclease and uses thereof |
| US20220398426A1 (en) * | 2019-09-10 | 2022-12-15 | IonQ, Inc. | Novel Class 2 Type II and Type V CRISPR-Cas RNA-Guided Endonucleases |
| CN116622678A (en) * | 2023-05-24 | 2023-08-22 | 尧唐(上海)生物科技有限公司 | Gene editing protein, corresponding gene editing system and application |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113015798B (en) * | 2018-11-15 | 2023-01-10 | 中国农业大学 | CRISPR-Cas12a enzymes and systems |
| CN114190093B (en) * | 2019-02-13 | 2025-01-17 | 比姆医疗股份有限公司 | Disruption of splice acceptor sites of disease-associated genes using an adenylate deaminase base editor, including for the treatment of genetic diseases |
| AU2022382751A1 (en) * | 2021-11-02 | 2024-05-23 | Huidagene Therapeutics (Singapore) Pte. Ltd. | Novel crispr-cas12i systems and uses thereof |
| CN116751763B (en) * | 2023-05-08 | 2024-02-13 | 珠海舒桐医疗科技有限公司 | A Cpf1 protein, V-type gene editing system and its application |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116622678A (en) | 2023-08-22 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24810299; Country of ref document: EP; Kind code of ref document: A1 |