
WO2025119363A1 - Cas protein, crispr-cas system containing cas protein, and use of cas protein - Google Patents


Info

Publication number
WO2025119363A1
Authority
WO
WIPO (PCT)
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/137580
Other languages
French (fr)
Chinese (zh)
Inventor
张红玲
冯昶瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yoltech Therapeutics Co Ltd
Original Assignee
Yoltech Therapeutics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yoltech Therapeutics Co Ltd
Publication of WO2025119363A1
Current legal status: Pending


Classifications

    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • A61K38/465 Hydrolases (3) acting on ester bonds (3.1), e.g. lipases, ribonucleases
    • A61K38/50 Hydrolases (3) acting on carbon-nitrogen bonds, other than peptide bonds (3.5), e.g. asparaginase
    • A61K48/00 Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
    • A61K48/005 Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy characterised by an aspect of the 'active' part of the composition delivered, i.e. the nucleic acid delivered
    • A61P13/12 Drugs for disorders of the urinary system of the kidneys
    • A61P25/00 Drugs for disorders of the nervous system
    • A61P25/14 Drugs for disorders of the nervous system for treating abnormal movements, e.g. chorea, dyskinesia
    • A61P27/00 Drugs for disorders of the senses
    • A61P27/02 Ophthalmic agents
    • A61P27/16 Otologicals
    • A61P3/06 Antihyperlipidemics
    • A61P31/00 Antiinfectives, i.e. antibiotics, antiseptics, chemotherapeutics
    • A61P35/00 Antineoplastic agents
    • A61P35/02 Antineoplastic agents specific for leukemia
    • A61P43/00 Drugs for specific purposes, not provided for in groups A61P1/00-A61P41/00
    • C07K19/00 Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N9/14 Hydrolases (3)
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C12Q1/6816 Hybridisation assays characterised by the detection means
    • C12Y305/04004 Adenosine deaminase (3.5.4.4)
    • C07K2319/00 Fusion polypeptide
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]

Definitions

  • The entire contents of Chinese patent application No. CN202311664660.5, filed on December 6, 2023, and entitled "A Cas protein, a CRISPR-Cas system comprising it and its application", including any sequence listings and drawings, are incorporated herein by reference.
  • The present disclosure relates to the field of gene editing and, in particular, to a Cas protein, a CRISPR-Cas system comprising the same, and applications thereof.
  • CRISPR: clustered regularly interspaced short palindromic repeats.
  • Cas: CRISPR-associated genes.
  • The CRISPR-Cas systems of prokaryotic adaptive immunity comprise an extremely diverse set of protein effectors, non-coding elements, and loci that can be engineered for applications such as gene editing, target detection, and disease treatment.
  • The main purpose of the present disclosure is to provide new Cas proteins and CRISPR-Cas systems that meet diverse application needs.
  • The present disclosure provides a Cas protein comprising an OBD domain, a REC domain, a RuvC domain, a Helical domain, and a Nuc domain.
  • The RuvC domain comprises RuvC-I, RuvC-II, and RuvC-III domains.
  • The Cas protein comprises neither an HNH domain nor a PI domain.
  • The RuvC-III domain is located between the Nuc-I domain and the Nuc-II domain.
  • The OBD domain is split into two parts, the OBD-I and OBD-II domains.
  • The OBD-I domain is located at the N-terminus and the Nuc-II domain at the C-terminus.
  • The Cas protein cleaves nucleic acids without requiring a tracrRNA.
  • The present disclosure provides a fusion protein comprising the Cas protein of the present disclosure and one or more functional domains.
  • The functional domain is selected from a localization signal, a reporter protein, a Cas protein targeting portion, a DNA-binding domain, an epitope tag, a transcription activation domain, a transcription repression domain, a nuclease, a deamination domain, a methylase, a demethylase, a transcription release factor, an HDAC, a cleavage-active polypeptide, a ligase, an integrase, a transposase, a recombinase, a polymerase, and a base excision repair inhibitor (such as a uracil-DNA glycosylase inhibitor (UGI)).
  • UGI: uracil-DNA glycosylase inhibitor.
  • The functional domain has one or more of the following enzymatic activities on the target sequence: methylase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylation activity, deSUMOylation activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase), and deglycosylation activity.
  • The functional domain is an adenosine deaminase catalytic domain or a cytidine deaminase catalytic domain.
  • The present disclosure provides an isolated polynucleotide encoding the Cas protein or the fusion protein of the present disclosure.
  • The present disclosure provides an isolated nucleic acid molecule comprising a structure of Formula IV, wherein:
  • segments R1a and R1b are reverse complementary sequences that form a first stem (R1) of multiple (2, 3, 4, 5, 6, 7, 8, 9, or 10) nucleotide pairs in the Cas protein;
  • segments Ba and Bb do not base pair with each other and form a bulge (B); and
  • segments R2a and R2b are reverse complementary sequences that form a second stem (R2) of multiple (2, 3, 4, 5, 6, 7, 8, 9, or 10) base pairs, and L is a loop that closes the second stem and consists of multiple (3, 4, 5, 6, 7, 8, 9, or 10) nucleotides.
  • The nucleic acid molecule comprises the structure of Formula IV, wherein:
  • segments R1a and R1b are reverse complementary sequences that form a first stem (R1) of 3 or 5 nucleotide pairs in Cas12o;
  • only one of segments Ba and Bb is present, and the bulge (B) formed by that segment consists of 2 or 3 nucleotides; and
  • segments R2a and R2b are reverse complementary sequences that form a second stem (R2) of 6 or 7 base pairs, and L is a loop that closes the second stem and consists of 5 or 7 nucleotides.
  • The nucleic acid molecule comprises or consists of a sequence selected from the following:
  • The sequence described in any one of (ii) to (v) substantially retains the biological function of the sequence from which it is derived.
  • The isolated nucleic acid molecule is RNA.
  • The isolated nucleic acid molecule contains a direct repeat (DR) sequence of the CRISPR/Cas system.
  • DR: direct repeat.
  • The nucleic acid molecule comprises one or more stem-loops or optimized secondary structures.
  • The sequence of any one of (ii) to (v) retains the secondary structure of the sequence from which it is derived.
  • The present disclosure provides a guide RNA (gRNA) comprising a direct repeat (DR) sequence capable of binding to the Cas protein of the present disclosure and a spacer sequence capable of targeting a target sequence.
  • gRNA: guide RNA.
  • The present disclosure provides a vector comprising the polynucleotide and/or the nucleic acid molecule of the present disclosure.
  • The present disclosure provides a complex comprising:
  • a protein component selected from the group consisting of the Cas protein of the present disclosure, the fusion protein of the present disclosure, and a combination thereof; and
  • a nucleic acid component selected from the group consisting of the guide RNA of the present disclosure, a nucleic acid encoding the guide RNA, a precursor RNA of the guide RNA, a nucleic acid encoding the precursor RNA, and any combination thereof.
  • The present disclosure provides a CRISPR-Cas composition comprising:
  • a first component selected from the group consisting of the Cas protein of the present disclosure, the fusion protein of the present disclosure, a nucleotide sequence encoding the Cas protein or the fusion protein, and any combination thereof; and
  • a second component that is a nucleotide sequence comprising one or more guide RNAs disclosed herein, or a nucleotide sequence encoding it;
  • wherein the guide RNA comprises:
  • The present disclosure provides a CRISPR-Cas system comprising one or more vectors, wherein the one or more vectors comprise:
  • a first nucleic acid that is a nucleotide sequence encoding the Cas protein or the fusion protein of the present disclosure, the first nucleic acid optionally being operably linked to a first regulatory element;
  • RNA comprises:
  • The first nucleic acid and the second nucleic acid are present on the same vector or on different vectors.
  • The guide RNA is capable of forming a complex with the Cas protein or fusion protein described in (i).
  • The vector comprises a plasmid or a viral vector.
  • The guide RNA comprises a spacer sequence capable of hybridizing with a target sequence, and a direct repeat (DR) sequence that is connected to the spacer sequence and capable of directing the protein to bind to the guide RNA, thereby forming a CRISPR-Cas composition or complex that targets the target sequence.
  • The guide RNA encompasses both unmodified and modified guide RNAs.
  • The modified guide RNA comprises chemically modified bases.
  • The chemical modification comprises methylation, methoxy modification, fluorination, or thio modification.
  • The first regulatory element and/or the second regulatory element is a promoter, such as an inducible promoter.
  • At least one component of the composition is non-naturally occurring or modified.
  • The spacer sequence is connected to the 3' end of the direct repeat (DR) sequence.
  • The spacer sequence comprises a sequence complementary to the target sequence.
  • When the target sequence is DNA, it is located 3' of a protospacer adjacent motif (PAM), and the PAM is 5'-TN, wherein N is A, T, G, or C.
  • PAM: protospacer adjacent motif.
  • The target sequence is DNA from a prokaryotic or eukaryotic cell, or DNA produced by reverse transcription of RNA; alternatively, the target sequence is non-naturally occurring DNA, or DNA produced by reverse transcription of RNA.
  • The target sequence comprises a cDNA sequence.
  • The target sequence comprises a single-stranded or double-stranded DNA sequence.
  • The target sequence is present within a cell.
  • The target sequence is present in the nucleus or in the cytoplasm (e.g., in an organelle).
  • The cell is a eukaryotic cell.
  • The cell is a prokaryotic cell.
  • The target sequence is present outside the cell.
  • The Cas protein of the present disclosure is linked to one or more nuclear localization signal (NLS) sequences, or the fusion protein comprises one or more NLS sequences.
  • The NLS sequence is linked to the N-terminus or C-terminus of the Cas protein of the present disclosure.
  • The NLS sequence is fused to the N-terminus or C-terminus of the Cas protein of the present disclosure.
  • The present disclosure provides a kit comprising one or more components selected from the following: the Cas protein, the fusion protein, the polynucleotide, the vector, the complex, the CRISPR-Cas composition, or the CRISPR-Cas system of the present disclosure.
  • The kit further comprises a label or instructions.
  • The kit is used for one or more of: gene or genome editing, disease treatment, targeting a target gene, and cleaving a target gene or a non-target gene.
  • The present disclosure provides a delivery composition comprising a delivery vector or a delivery medium and one or more of the following: the Cas protein, the fusion protein, the polynucleotide, the vector, the complex, the CRISPR-Cas composition, or the CRISPR-Cas system of the present disclosure.
  • The present disclosure provides a host cell comprising the Cas protein, the fusion protein, the polynucleotide, the vector, the complex, the CRISPR-Cas composition, the CRISPR-Cas system, or the delivery composition of the present disclosure.
  • The present disclosure provides an enzyme preparation comprising the Cas protein, the fusion protein, the complex, the CRISPR-Cas composition, the CRISPR-Cas system, or the delivery composition of the present disclosure.
  • The present disclosure provides a kit comprising:
  • (a1) a first container and, located in the first container, the Cas protein of the present disclosure, the fusion protein of the present disclosure, a gene encoding the Cas protein, or an expression vector thereof, or a drug containing any of these; and
  • (b1) optionally, a second container and, located in the second container, the guide RNA of the present disclosure or an expression vector thereof, or a drug containing the guide RNA or its expression vector.
  • The present disclosure provides a method for targeting and editing a target gene or cleaving a target gene, comprising contacting the Cas protein, fusion protein, complex, CRISPR-Cas composition, CRISPR-Cas system, delivery composition, enzyme preparation, or drug kit of the present disclosure with the target gene, or delivering it to a cell containing the target gene, wherein the target sequence is present in the target gene.
  • The present disclosure provides a method of inducing a change in a cell state, comprising contacting the Cas protein, fusion protein, complex, CRISPR-Cas composition, CRISPR-Cas system, delivery composition, enzyme preparation, or drug kit of the present disclosure with a target gene in a cell.
  • The present disclosure provides a method for altering the expression of a gene product, comprising contacting the Cas protein, fusion protein, complex, CRISPR-Cas composition, CRISPR-Cas system, delivery composition, enzyme preparation, or drug kit of the present disclosure with a nucleic acid molecule encoding the gene product, or delivering it to a cell comprising the nucleic acid molecule, wherein the target sequence is present in the nucleic acid molecule.
  • The present disclosure provides a cell or progeny thereof obtained by any of the methods described herein, wherein the cell comprises a modification that is not present in the wild-type cell.
  • The present disclosure provides a product of a cell of the present disclosure or of its progeny.
  • The present disclosure provides an in vitro, ex vivo, or in vivo cell or cell line, or progeny thereof, comprising the Cas protein, the fusion protein, the polynucleotide, the vector, the complex, the CRISPR-Cas composition, the CRISPR-Cas system, or the delivery composition of the present disclosure.
  • The present disclosure provides a cell preparation comprising the host cell, the cell or its progeny, a cell product of the cell or its progeny, or the cell or cell line or its progeny of the present disclosure.
  • The present disclosure also provides uses of the Cas protein, fusion protein, polynucleotide, vector, complex, CRISPR-Cas composition, CRISPR-Cas system, kit, delivery composition, enzyme preparation, or drug kit of the present disclosure in the preparation of a drug or preparation for nucleic acid editing (e.g., gene or genome editing).
  • The present disclosure provides uses of the Cas protein, fusion protein, polynucleotide, vector, complex, CRISPR-Cas composition, CRISPR-Cas system, kit, delivery composition, enzyme preparation, or drug kit of the present disclosure in the preparation of a medicament or preparation for one or more uses selected from the following group:
  • The present disclosure provides a method for detecting the presence of a target nucleic acid molecule in a sample, comprising contacting the sample with (a) the Cas protein, fusion protein, complex, CRISPR-Cas composition, CRISPR-Cas system, kit, delivery composition, or enzyme preparation of the present disclosure and (b) a non-target sequence, and detecting a signal generated by cleavage of the non-target sequence, thereby detecting the target nucleic acid molecule, wherein the non-target sequence does not hybridize with the guide RNA.
  • The present disclosure provides a method of treating a condition or disease in a subject in need thereof, comprising administering to the subject the complex, CRISPR-Cas composition, CRISPR-Cas system, kit, delivery composition, enzyme preparation, or pharmaceutical kit of the present disclosure.
  • The present disclosure provides a sterile container comprising the Cas protein, fusion protein, polynucleotide, vector, complex, CRISPR-Cas composition, CRISPR-Cas system, delivery composition, or enzyme preparation of the present disclosure.
  • The present disclosure provides an implantable device comprising the Cas protein, fusion protein, polynucleotide, vector, complex, CRISPR-Cas composition, CRISPR-Cas system, delivery composition, or enzyme preparation of the present disclosure.
  • Figure 1 depicts a phylogenetic tree of Cas12o homologs.
  • Figure 2 shows the schematic domain structure (Figures 2A and 2C) and the predicted three-dimensional structure (Figure 2B) of Cas12o.
  • Figure 3 depicts the Cas12o1 expression vector (Figure 3A) and the LbCpf1 expression vector (Figure 3B).
  • Figure 4 depicts the target plasmid carrying the target sequence hTTR1.
  • Figure 5 compares the cleavage activities of Cas12o and LbCpf1 against the hTTR1 target sequence in competent cells.
  • Figure 6 shows the predicted secondary structures of the DR sequences of Cas12o1, Cas12o2, and Cas12o3.
  • Figure 7 depicts the experimentally determined PAM preference of Cas12o1.
  • The Cas protein of the present invention comprises an OBD domain, a RuvC domain, a Helical domain, and a Nuc domain, and has a structure of Formula I, Formula II, or Formula III.
  • The Cas protein of the present invention has high gene-editing activity and specificity, can effectively edit or cleave a target gene, and can be used to treat conditions or diseases in subjects in need thereof.
  • Sequence identity is determined by comparing two aligned sequences along a predetermined comparison window (which can be 50%, 60%, 70%, 80%, 90%, 95% or 100% of the length of the reference nucleotide sequence or protein) and determining the number of positions at which identical residues occur. Typically, this is expressed as a percentage.
  • a predetermined comparison window which can be 50%, 60%, 70%, 80%, 90%, 95% or 100% of the length of the reference nucleotide sequence or protein
  • The articles "a" and "an" are used herein to refer to one or more than one (i.e., at least one) of the grammatical object of the article.
  • For example, "a polypeptide" means one or more polypeptides.
  • The term "about" or "approximately" refers to a quantity, level, value, frequency, percentage, dimension, size, amount, weight, or length that varies by up to 15%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% relative to a reference quantity, level, value, frequency, percentage, dimension, size, amount, weight, or length.
  • The term "about" or "approximately" also refers to a range of ±15%, ±10%, ±9%, ±8%, ±7%, ±6%, ±5%, ±4%, ±3%, ±2%, or ±1% around a reference quantity, level, value, frequency, percentage, dimension, size, amount, weight, or length.
  • "Heterologous" refers to a nucleotide or polypeptide sequence that is not found in the native nucleic acid or protein, respectively.
  • a heterologous polypeptide comprises an amino acid sequence from a protein other than the Cas12o protein.
  • a portion of a Cas12o protein from one species is fused with a portion of a Cas12o protein from a different species. Therefore, it can be considered that the Cas12o sequences from each species are heterologous to each other.
  • For example, when a Cas12o protein (e.g., a dCas12o protein) is fused to an active domain of a non-Cas12o protein (e.g., a deaminase or a histone deacetylase), the sequence of the active domain can be considered a heterologous polypeptide (it is heterologous to the Cas12o protein).
  • A "homolog" of a protein, as used herein, is a protein of the same species that performs the same or a similar function as the protein of which it is a homolog.
  • Homologous proteins may be, but need not be, structurally related, or may be only partially structurally related.
  • An "ortholog" of a protein, as used herein, is a protein of a different species that performs the same or a similar function as the protein of which it is an ortholog.
  • Orthologous proteins may be, but need not be, structurally related, or may be only partially structurally related.
  • a homolog or ortholog of a nucleic acid-guided nuclease such as that mentioned herein has at least 80%, at least 85%, at least 90%, at least 95% sequence homology or identity with the nucleic acid-guided nuclease.
  • a homolog or ortholog of a nucleic acid-guided nuclease has at least 80%, at least 85%, at least 90%, or at least 95% sequence identity with a wild-type nucleic acid-guided nuclease.
  • orthologs of known nucleic acid-guided nucleases can be identified.
  • Some methods of identifying orthologs of nucleic acid-guided nucleases may involve identifying a tracr sequence in a target genome. Identifying tracr sequences may involve the following steps: (1) searching a database for direct repeat sequences or tracr mate sequences to identify regions containing nucleic acid-guided nucleases; (2) searching for homologous sequences in the regions flanking the nucleic acid-guided nucleases, in both the sense and antisense directions; and (3) looking for transcription terminators and secondary structures.
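The first step, searching for direct repeats, can be illustrated with a naive scan. The sketch below is a hypothetical helper: the name `find_direct_repeats` and its default spacer-length window of 20-50 nt are assumptions, not taken from this disclosure. It reports exact k-mers that recur at array-like spacing and ignores imperfect repeats, database search, and the flanking-homology and terminator steps.

```python
from collections import defaultdict


def find_direct_repeats(seq, repeat_len, min_copies=3, min_spacer=20, max_spacer=50):
    """Return {repeat: [start, ...]} for exact k-mers that recur like a CRISPR array.

    Hypothetical helper: a k-mer qualifies when at least `min_copies` consecutive
    occurrences are separated by spacers whose lengths fall in [min_spacer, max_spacer].
    Start positions are 0-based.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        positions[seq[i:i + repeat_len]].append(i)

    hits = {}
    for kmer, starts in positions.items():
        run = [starts[0]]
        best = run[:]
        for prev, cur in zip(starts, starts[1:]):
            spacer_len = cur - prev - repeat_len
            # Extend the current run only if the spacer length looks array-like.
            run = run + [cur] if min_spacer <= spacer_len <= max_spacer else [cur]
            if len(run) > len(best):
                best = run[:]
        if len(best) >= min_copies:
            hits[kmer] = best
    return hits


repeat = "GTCTCAAGTACGATCA"
array = ("AAAA" + repeat + "ACGGTTACCATGGCTAGTCA" + repeat
         + "TTGACCGATAGGCATCAAGT" + repeat + "CCCC")
print(find_direct_repeats(array, len(repeat)))
```

In this synthetic array of three identical 16-nt repeats separated by two distinct 20-nt spacers, only the repeat itself recurs three times at array-like spacing.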
  • the chimeric enzyme can comprise a first fragment and a second fragment, and the fragments can be fragments of nucleic acid-guided nuclease orthologs of an organism of a certain genus or species, for example, the fragments are from nucleic acid-guided nuclease orthologs of different species.
  • polynucleotide and nucleic acid are used interchangeably herein and refer to a polymeric form of nucleotides (ribonucleotides or deoxynucleotides) of any length.
  • the term includes, but is not limited to, single-stranded, double-stranded or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or polymers containing purine bases and pyrimidine bases or other natural, chemically or biochemically modified, non-natural or derived nucleotide bases.
  • "Polynucleotide" and "nucleic acid" should be understood to include single-stranded (such as sense or antisense) and double-stranded polynucleotides, as applicable to the described embodiments.
  • polypeptide refers to a polymeric form of amino acids of any length, which may include genetically encoded and non-genetically encoded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides with modified peptide backbones.
  • the terms include: fusion proteins, including but not limited to fusion proteins with heterologous amino acid sequences, fusions with heterologous and homologous leader sequences, with or without N-terminal methionine residues; immunolabeled proteins, etc.
  • The term "isolated" describes a polynucleotide, polypeptide, or cell that is in an environment different from the one in which the polynucleotide, polypeptide, or cell naturally occurs.
  • An isolated genetically modified host cell may be present in a mixed population of genetically modified host cells.
  • exogenous nucleic acid refers to nucleic acids that are not normally or naturally occurring in nature and/or are not produced by a given bacterium, organism, or cell.
  • endogenous nucleic acid refers to nucleic acids that are normally occurring in nature and/or are produced by a given bacterium, organism, or cell.
  • Endogenous nucleic acid is also referred to as “native nucleic acid” or nucleic acids that are “native” to a given bacterium, organism, or cell.
  • a recombinant nucleic acid, specifically a recombinant DNA or RNA, is the product of various combinations of cloning, restriction and/or ligation steps that produce a construct having a structural coding or non-coding sequence distinguishable from the endogenous nucleic acids present in natural systems.
  • the DNA sequence encoding the structural coding sequence can be assembled by cDNA fragments and short oligonucleotide linkers or by a series of synthetic oligonucleotides to provide a synthetic nucleic acid that can be expressed by a recombinant transcription unit contained in a cell or in a cell-free transcription and translation system.
  • sequences can be provided in the form of an open reading frame that is not interrupted by internal non-translated sequences or introns, which are typically present in eukaryotic genes. Genomic DNA containing related sequences can also be used in the formation of recombinant genes or transcription units. The sequence of non-translated DNA can be present at the 5' end or 3' end of the open reading frame, wherein such sequences do not interfere with the operation or expression of the coding region, and can actually play a role in regulating the production of the desired product by various mechanisms.
  • The term "recombinant" polynucleotide or "recombinant" nucleic acid refers to a non-naturally occurring polynucleotide or nucleic acid, for example one made by the artificial combination, through human intervention, of two otherwise separated sequence segments.
  • This artificial combination is often accomplished by chemical synthesis or by artificial manipulation of isolated nucleic acid segments (e.g., by genetic engineering techniques). This is usually done to replace a codon with a redundant codon encoding the same or a conservative amino acid, typically while introducing or removing a sequence recognition site.
  • nucleic acid segments with desired functions are linked together to produce desired functional combinations.
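Replacing codons with redundant codons to remove a recognition site can be sketched as follows. This assumes the standard genetic code (NCBI translation table 1) and a simple greedy strategy; `remove_site` and `translate` are hypothetical helpers, and the EcoRI site GAATTC is only an illustrative recognition sequence. Each accepted swap keeps the encoded amino acid unchanged, so the protein sequence is preserved.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SYNONYMOUS = defaultdict(list)  # amino acid -> all codons encoding it
for codon, aa in CODON_TO_AA.items():
    SYNONYMOUS[aa].append(codon)


def translate(cds):
    """Translate an in-frame DNA coding sequence whose length is a multiple of 3."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def remove_site(cds, site):
    """Greedily swap codons for synonymous ones until `site` disappears.

    Returns the edited sequence; if no synonymous swap helps, the site may remain.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for i in range(len(codons)):
        current = "".join(codons)
        if site not in current:
            break
        for alt in SYNONYMOUS[CODON_TO_AA[codons[i]]]:
            trial = codons[:i] + [alt] + codons[i + 1:]
            if "".join(trial).count(site) < current.count(site):
                codons = trial
                break
    edited = "".join(codons)
    if translate(edited) != translate(cds):
        raise ValueError("protein sequence changed")  # guards the synonymous-only rule
    return edited


print(remove_site("ATGGAATTCAAA", "GAATTC"))
```

For instance, in ATG GAA TTC AAA (Met-Glu-Phe-Lys), swapping GAA for the synonymous GAG removes the GAATTC site while the encoded peptide stays MEFK.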
  • recombinant polypeptide refers to a non-naturally occurring polypeptide, such as a polypeptide made by the artificial combination of two otherwise separate segments of amino acid sequence through human intervention.
  • a polypeptide comprising a heterologous amino acid sequence is recombinant.
  • operably linked refers to a juxtaposition in which the components are in a relationship that allows them to function in their intended manner.
  • a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence.
  • "Heterologous promoter" and "heterologous control regions" refer to promoters and other control regions that are not normally associated with a particular nucleic acid in nature.
  • a "transcriptional control region heterologous to a coding region” is a transcriptional control region that is not normally associated with a coding region in nature.
  • vector refers to a nucleic acid molecule capable of transporting another nucleic acid to which it is linked. It is a replicon, such as a plasmid, phage or cosmid, into which another DNA segment can be inserted to achieve replication of the inserted segment. Generally, when combined with appropriate control elements, the vector is capable of replication.
  • the vector system comprises a single vector. Alternatively, the vector system comprises a plurality of vectors.
  • the vector can be a viral vector.
  • Vectors include, but are not limited to, single-stranded, double-stranded or partially double-stranded nucleic acid molecules; nucleic acid molecules comprising one or more free ends, no free ends (e.g., circular); nucleic acid molecules comprising DNA, RNA or both; and other polynucleotide variants known in the art.
  • plasmid refers to a circular double-stranded DNA loop, in which other DNA segments can be inserted, for example, by standard molecular cloning techniques.
  • Another type of vector is a viral vector, in which virus-derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g., a retrovirus, a replication-defective retrovirus, an adenovirus, a replication-defective adenovirus, or an adeno-associated virus).
  • viral vectors also include polynucleotides carried by viruses for transfection into host cells.
  • Certain vectors are capable of autonomous replication in the host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication, and episomal mammalian vectors).
  • Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of the host cell upon introduction into the host cell, and are thereby replicated along with the host genome.
  • Certain vectors are capable of directing the expression of genes to which they are operably linked.
  • Such vectors are referred to herein as "expression vectors”.
  • Vectors expressed in eukaryotic cells and vectors that cause expression in eukaryotic cells may be referred to herein as "eukaryotic expression vectors.”
  • Common expression vectors useful in recombinant DNA techniques are often in the form of plasmids.
  • the recombinant expression vector can comprise a nucleic acid of the present disclosure in a form suitable for expression of the nucleic acid in a host cell. This means that the recombinant expression vector includes one or more regulatory elements, selected on the basis of the host cell to be used for expression, that are operably linked to the nucleic acid sequence to be expressed.
  • "Operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system, or in a host cell when the vector is introduced into the host cell).
  • Advantageous vectors include lentiviruses and adeno-associated viruses, and the type of vector can also be selected to target particular cell types.
  • A host cell is an in vivo or in vitro eukaryotic cell, prokaryotic cell, or cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, which can be, or has been, used as a recipient for a nucleic acid (e.g., an expression vector), and includes the progeny of the original cell that has been genetically modified by the nucleic acid. It should be understood that, owing to natural, accidental, or deliberate mutation, the progeny of a single cell may not be identical to the original parent in morphology or in genomic or total DNA complement.
  • a “recombinant host cell” (also referred to as a “genetically modified host cell”) is a host cell into which a heterologous nucleic acid (e.g., expression vector) has been introduced.
  • a subject prokaryotic host cell is a prokaryotic host cell (e.g., bacteria) that has been genetically modified by introducing a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to the prokaryotic host cell (not normally found in nature) or a recombinant nucleic acid that is not normally found in a prokaryotic host cell, into a suitable prokaryotic host cell;
  • a subject eukaryotic host cell is a eukaryotic host cell that has been genetically modified by introducing a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to the eukaryotic host cell or a recombinant nucleic acid that is not normally found in a eukaryotic host cell, into a suitable eukaryotic host cell.
  • amino acid refers to the twenty common naturally occurring amino acids.
  • Naturally occurring amino acids include alanine (Ala, A), arginine (Arg, R), asparagine (Asn, N), aspartic acid (Asp, D), cysteine (Cys, C), glutamic acid (Glu, E), glutamine (Gln, Q), glycine (Gly, G), histidine (His, H), isoleucine (Ile, I), leucine (Leu, L), lysine (Lys, K), methionine (Met, M), phenylalanine (Phe, F), proline (Pro, P), serine (Ser, S), threonine (Thr, T), tryptophan (Trp, W), tyrosine (Tyr, Y) and valine (Val, V).
  • Sequence identity between two polypeptide or nucleic acid sequences is the number of identical residues between the sequences expressed as a percentage of the total number of residues, where the total number of residues is determined according to the mutation type. Mutation types include insertions (extensions) at either or both ends of the sequence, deletions (truncations) at either or both ends of the sequence, substitutions/alterations of one or more amino acids/nucleotides, insertions within the sequence, and deletions within the sequence.
  • If the mutation type is one or more of substitutions/alterations of one or more amino acids/nucleotides, insertions within the sequence, and deletions within the sequence, the total number of residues is the length of the larger of the molecules being compared. If the mutation types also include insertions (extensions) or deletions (truncations) at either or both ends of the sequence, the residues inserted or deleted at the ends (e.g., where fewer than 20 residues are inserted or deleted at the two ends) are not counted in the total number of residues.
  • the sequences being compared are aligned in a manner that produces the maximum match between the sequences, and the gaps (if any) in the alignment are resolved by a specific algorithm.
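The counting rule for the denominator can be expressed as a short function. This is a sketch that assumes the two sequences have already been aligned for maximum match (gaps written as `-`); `percent_identity` is a hypothetical name, and the optional limit on how many terminal residues may be excluded (e.g., fewer than 20) is not enforced here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences under the counting rule above.

    - Terminal overhangs (columns at either end where one sequence has a gap,
      i.e. terminal extensions or truncations) are trimmed and not counted.
    - Within the remaining core, the denominator is the residue count of the
      longer sequence; identical, gap-free columns form the numerator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    cols = list(zip(aligned_a.upper(), aligned_b.upper()))
    start, end = 0, len(cols)
    while start < end and gap in cols[start]:
        start += 1
    while end > start and gap in cols[end - 1]:
        end -= 1
    core = cols[start:end]
    len_a = sum(a != gap for a, _ in core)
    len_b = sum(b != gap for _, b in core)
    total = max(len_a, len_b)
    if total == 0:
        raise ValueError("no overlapping residues")
    identical = sum(a == b and a != gap for a, b in core)
    return 100.0 * identical / total


print(percent_identity("MKTAYIAKQR", "MKTAYLAKQR"))  # one substitution
print(percent_identity("MKTAYIAKQR", "--TAYIAKQR"))  # N-terminal truncation
print(percent_identity("MKTAYIAKQR", "MKTA-IAKQR"))  # internal deletion
```

With one substitution the result is 90.0; a two-residue N-terminal truncation gives 100.0 because the truncated residues are excluded from the total; an internal one-residue deletion gives 90.0 because the longer sequence (10 residues) sets the denominator.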
  • conservative amino acid substitution refers to the replacement of an amino acid with a chemically or functionally similar amino acid.
  • Conservative substitution tables providing similar amino acids are well known in the art.
  • the amino acid groups provided below are considered to be conservative substitutions of each other.
  • the selected groups of amino acids considered to be conservative substitutions of each other are:
  • the selected group of amino acids that are considered to be conservative substitutions for each other are:
  • treatment refers to obtaining a desired pharmacological and/or physiological effect.
  • the effect may be preventive, in terms of completely or partially preventing a disease or its symptoms, and/or therapeutic, in terms of partially or completely curing a disease and/or side effects attributable to the disease.
  • treatment covers any treatment of a disease in a mammal (e.g., a human), and includes: (1) preventing the occurrence of a disease in a subject who may be susceptible to the disease but has not yet been diagnosed with the disease; (2) inhibiting the disease, i.e., arresting its development; and (3) relieving the disease, i.e., causing regression of the disease.
  • the terms “individual,” “subject,” “host,” and “patient” are used interchangeably herein to refer to an individual organism, such as a mammal, including but not limited to rodents, apes, humans, mammalian farm animals, mammalian sports animals, and mammalian pets.
  • The present disclosure provides Cas12o RNA-guided endonuclease polypeptides (also referred to herein as "Cas12o proteins"), nucleic acids encoding Cas12o proteins, and modified host cells comprising Cas12o proteins and/or nucleic acids encoding Cas12o proteins.
  • Cas12o proteins can be used in the various applications provided herein. They are smaller than other Cas proteins (e.g., Cas9 or Cas12) and are therefore easier to deliver, including by AAV or LNP.
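The size argument can be made concrete with simple arithmetic. The sketch assumes an open reading frame of three nucleotides per residue plus one stop codon and an approximate AAV packaging limit of about 4.7 kb (a commonly cited figure, not taken from this disclosure). It uses 984 residues for SEQ ID NO: 1, on the assumption that the protein ends where its Nuc-II domain is annotated to end, and 1,368 residues for SpCas9 as a comparison. Function names are hypothetical.

```python
# Approximate AAV packaging limit in bp (assumption: commonly cited ~4.7 kb figure).
AAV_CAPACITY_BP = 4700


def cds_length_bp(n_residues):
    """Length of an ORF encoding n_residues amino acids, including one stop codon."""
    return 3 * (n_residues + 1)


def aav_headroom_bp(n_residues, capacity_bp=AAV_CAPACITY_BP):
    """Space left for promoter, polyA signal, guide RNA cassette, etc. (negative = too big)."""
    return capacity_bp - cds_length_bp(n_residues)


for label, n in [("Cas12o, SEQ ID NO: 1 (assumed 984 aa)", 984), ("SpCas9 (1368 aa)", 1368)]:
    print(f"{label}: ORF {cds_length_bp(n)} bp, headroom {aav_headroom_bp(n)} bp")
```

Under these assumptions a 984-residue Cas12o ORF (2,955 bp) leaves roughly 1.7 kb for regulatory elements and a guide RNA cassette, versus roughly 0.6 kb for SpCas9 (4,107 bp).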
  • the present disclosure provides a guide RNA (referred to herein as “Cas12o guide RNA”, “guide RNA”, “crRNA” or “guide RNA (gRNA)”) that binds to a Cas12o protein and provides sequence specificity for the Cas12o protein; a nucleic acid encoding the Cas12o guide RNA; and a modified host cell comprising the Cas12o guide RNA and/or a nucleic acid encoding the Cas12o guide RNA.
  • the Cas12o guide RNA can be used in various applications provided.
  • Cas12o protein includes wild-type Cas12o protein, its derivatives or variants, and functional fragments thereof such as oligonucleotide binding fragments.
  • the Cas12o protein comprises an OBD domain, a REC domain, a RuvC domain, a Helical domain, and a Nuc domain.
  • the RuvC domain includes a RuvC-I domain, a RuvC-II domain, and a RuvC-III domain.
  • the Cas12o protein does not include an HNH domain or a PI domain.
  • the Cas protein is a class 2 type V Cas endonuclease.
  • the OBD domain is a bi-split domain, including discontinuous OBD-I domain and OBD-II domain.
  • the Cas12o protein is between 500 and 1200 amino acids, between 500 and 1100 amino acids, between 700 and 1100 amino acids, or between 900 and 1000 amino acids in size; the size variation may depend in part on the specific domain architecture of Cas12o or its homologs.
  • the Cas12o protein may be derived from a naturally occurring protein, a modified naturally occurring protein, a functional fragment or truncated version thereof, or a non-naturally occurring protein.
  • the Cas12o protein may comprise one or more domains derived from other Cas12o protein nucleases, more particularly from different organisms.
  • the Cas12o protein nuclease may be designed computationally (in silico). Examples of computational protein design have been described in the art and are therefore known to the skilled person.
  • the Cas12o protein locus is not associated with a CRISPR array.
  • Cas12o protein may also encompass homologs or orthologs of the Cas12o protein whose sequence is specifically described herein. Orthologous proteins may, but need not be structurally related or only partially related in structure.
  • a homolog or ortholog of the Cas12o protein as mentioned herein has at least 80%, at least 85%, at least 90%, at least 95%, at least 99% sequence homology or identity with the Cas12o protein nuclease.
  • a homolog or ortholog of the Cas12o protein nuclease has at least 80%, at least 85%, at least 90% or at least 95% sequence identity with the wild-type Cas12o protein nuclease.
  • the Cas protein comprises a sequence having at least 70%, at least 75%, at least 80%, or at least 90% sequence identity to any one of SEQ ID NO: 1, 3, 5, 7-9.
  • the Cas12o protein comprises an amino acid sequence as shown in any one of SEQ ID NO: 1, 3, 5, 7-9, wherein SEQ ID NO: 7-9 are functional fragments of Cas12o.
  • the Cas12o protein may comprise one or more modifications.
  • modified generally refers to a variant (Cas12o variant) nuclease of a Cas12o protein having one or more modifications or mutations (including point mutations, truncations, insertions, deletions, chimeras, fusion proteins, etc.) compared to the source wild-type counterpart.
  • derivative means that the derivative enzyme is mainly based on the wild-type enzyme in the sense of having a high degree of sequence homology with the wild-type enzyme, but the derivative enzyme has been mutated (modified) in a manner known in the art or as described herein.
  • the derivative enzymes of the Cas12o protein include a Cas12o protein substantially lacking catalytic activity (dead Cas12o, dCas12o) and a Cas12o nickase (nCas12o) capable of single-strand cleavage.
  • dCas12o may have reduced or no nuclease activity, e.g., retaining less than 50% (such as less than about 40%, 35%, 30%, 27.5%, 25%, 22.5%, 20%, 17.5%, 15%, 12.5%, 10%, 7.5%, 5%, 4%, 3%, 2.5%, 2%, 1% or less) of the nuclease activity of the corresponding original Cas12o protein (e.g., a novel Cas12o protein comprising any of the amino acid sequences of SEQ ID NOs: 1, 3, 5, 7-9) or a variant thereof.
  • the Cas12o protein can be identified with reference to the general class of enzymes having homology to the largest nuclease having multiple nuclease domains from a type I, type II, type III, type IV, type V or type VI CRISPR system.
  • a catalytically inactive or dead nuclease may have nickase (nCas) activity. In some cases, a catalytically inactive or dead nuclease may not have nickase activity. Such a catalytically inactive or dead nuclease may not cause double-stranded or single-stranded breaks on the target polynucleotide, but may still bind to the target polynucleotide or otherwise form a complex.
  • the Cas12o nickase (nCas12o) is obtained by modifying the Cas12o protein: one or more amino acid mutations are introduced so that the protein acquires nickase activity, cleaving only one strand of a double-stranded DNA.
  • the modification of Cas12o protein may or may not result in functional changes.
  • modifications that do not result in functional changes include, for example, codon optimization for expression in a specific host, or providing specific markers to the nuclease (e.g., for visualization).
  • Modifications that may result in functional changes may also include mutations, including point mutations, insertions, deletions, truncations (including split nucleases), etc., and chimeric nucleases (e.g., comprising domains from different orthologs or homologs) or fusion proteins.
  • the chimeric enzyme may include a first fragment and a second fragment, and the fragment may be a fragment of a Cas12o protein nuclease ortholog of a genus or a species of an organism, for example, the fragment is from a Cas12o protein nuclease ortholog of different species.
  • a nuclease domain of the Cas12o protein is catalytically inactive, is modified to be catalytically inactive, or is modified to act as a nickase. In one embodiment, both nuclease domains are catalytically inactive.
  • the Cas12o protein nuclease may include one or more modifications that enhance activity and/or specificity, such as a mutated residue that stabilizes the target or non-target strand.
  • the altered or modified activity of the engineered Cas12o protein includes increased targeting efficiency or reduced off-target binding.
  • the altered activity of the engineered Cas12o protein nuclease includes a modified cleavage activity.
  • the altered activity includes an increased cleavage activity to the target polynucleotide locus.
  • the altered activity includes a reduced cleavage activity to the target polynucleotide locus.
  • the altered activity includes a reduced cleavage activity to the off-target polynucleotide locus.
  • the altered or modified activity of the modified nuclease includes altered helicase kinetics.
  • the modified nuclease includes a modification that alters its association with a nucleic acid molecule comprising RNA, with a strand of a target polynucleotide locus, or with a strand of an off-target polynucleotide.
  • the engineered Cas12o protein nuclease includes a modification that alters formation of the complex comprising the Cas12o protein nuclease.
  • the altered activity includes increased cleavage activity at an off-target polynucleotide locus.
  • the specificity of the target polynucleotide locus is increased.
  • the specificity of the target polynucleotide locus is reduced.
  • a mutation reduces off-target effects (e.g., cleavage or binding properties, activity, or kinetics), for example by reducing tolerance of mismatches between the target and the crRNA. Other mutations may increase off-target effects (e.g., cleavage or binding properties, activity, or kinetics). Still other mutations may increase or decrease on-target effects (e.g., cleavage or binding properties, activity, or kinetics).
  • a mutation changes (e.g., increases or decreases) helicase activity, or the association or formation of the functional nuclease complex.
  • the Cas12o protein can be guided to cleave one or both DNA strands at or near the position of the target sequence, e.g., within the target sequence and/or its complementary sequence, or at a sequence associated with the target sequence.
  • the Cas12o protein can direct cleavage of one or both DNA strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 300, 400, 500 or more base pairs or nucleotides of the first or last nucleotide of the target sequence.
  • Cas12o cleaves both DNA strands about 12-19 nucleotides from the first nucleotide of the target sequence.
  • the cleavage can be staggered, i.e., it produces sticky ends.
  • the cleavage is a staggered cut with a 5' overhang.
  • the cleavage is a staggered cut with a 5' overhang of 1 to 15 nucleotides, preferably 4 or 9 nucleotides.
  • the cleavage site is distal to the target adjacent motif (TAM, used interchangeably herein with the term "PAM"); e.g., cleavage occurs after the nth nucleotide on the non-target strand and after a further specified nucleotide on the target strand.
  • the cleavage site lies after a specified nucleotide on the non-target strand (counted from the PAM) and after a further specified nucleotide on the target strand (counted from the PAM).
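The relationship between the two strand-specific cut positions and the resulting DNA ends can be sketched as below. This assumes, as for other type V effectors, that the PAM (TAM) lies 5' of the protospacer on the non-target strand and that both cut positions are counted from the PAM in the same direction; under that convention, a target-strand cut that is more PAM-distal than the non-target-strand cut leaves a 5' overhang. The function `staggered_cut`, the protospacer, and the positions 14 and 18 are hypothetical, chosen only to give a 4-nt overhang.

```python
def staggered_cut(protospacer, nontarget_cut, target_cut):
    """Classify the ends left by cutting after `nontarget_cut` (non-target strand)
    and after `target_cut` (target strand).

    Assumes the PAM lies 5' of the protospacer on the non-target strand and that
    both positions are counted from the PAM-proximal end of `protospacer`, which is
    written 5' to 3' on the non-target strand.
    Returns (end_type, single-stranded bases written as non-target-strand sequence).
    """
    n = len(protospacer)
    if not (0 <= nontarget_cut <= n and 0 <= target_cut <= n):
        raise ValueError("cut positions must lie within the protospacer")
    if target_cut == nontarget_cut:
        return ("blunt", "")
    lo, hi = sorted((nontarget_cut, target_cut))
    # A more PAM-distal target-strand cut leaves the 5' end of each strand protruding.
    end_type = "5' overhang" if target_cut > nontarget_cut else "3' overhang"
    return (end_type, protospacer[lo:hi])


protospacer = "GATCCTAGGCTTACGAAGTCCATG"  # hypothetical 24-nt protospacer
print(staggered_cut(protospacer, 14, 18))
```

Cutting after position 14 on the non-target strand and after position 18 on the target strand yields a 4-nt 5' overhang (GAAG as written on the non-target strand); equal positions give a blunt end.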
  • the vector encodes a nucleic acid-targeting effector protein, which can be mutated relative to the corresponding wild-type enzyme, such that the mutated nucleic acid-targeting effector protein lacks the ability to cleave one or both DNA and RNA strands of a target polynucleotide containing a target sequence.
  • the reference Cas12o protein disclosed herein comprises a REC-I domain, a REC-II domain, an OBD domain, a RuvC-I domain, a Helical domain, a RuvC-II domain, a Nuc-I domain, a RuvC-III domain, and a Nuc-II domain in the order of N-terminus-C-terminus (FIG. 2A).
  • the oligonucleotide binding domain is a bi-split domain, including discontinuous OBD-I domains and OBD-II domains, in which case OBD-I is located at the N-terminus of the Cas12o protein, and Cas12o comprises an OBD-I domain, a REC-I domain, a REC-II domain, an OBD-II domain, a RuvC-I domain, a Helical domain, a RuvC-II domain, a Nuc-I domain, a RuvC-III domain, and a Nuc-II domain in the order of N-terminus-C-terminus (FIG. 2C).
  • the reference Cas12o protein of the present disclosure comprises an oligonucleotide binding domain (OBD).
  • Certain Cas proteins other than Cas12o have domains that can be named in a similar manner.
  • the OBD comprises one or more unique functional features, or comprises a sequence unique to the Cas12o protein, or a combination thereof.
  • the OBD domain comprises an OBD-I domain and an OBD-II domain, as shown in the OBD domain distribution in Figure 2C.
  • the OBD-I domain comprises the amino acid sequence at positions 1-15 in SEQ ID NO:5.
  • the OBD-II domain comprises the amino acid sequence at positions 333-480 in SEQ ID NO:5.
  • the OBD domain comprises only a single domain, for example as in the OBD domain distribution shown in Figure 2A.
  • the exemplary OBD domain is shown as amino acids 359-474 (OBD-II) in SEQ ID NO: 1 or amino acids 337-452 (OBD-II) in SEQ ID NO: 3.
  • the reference Cas12o protein disclosed herein comprises a RuvC domain split into three discontinuous segments (RuvC-I, RuvC-II and RuvC-III domains).
  • the RuvC domain is the ancestral domain of all Cas12 (type V) CRISPR proteins.
  • the RuvC domain is derived from a TnpB (transposase B)-like transposase. Similar to other RuvC domains, the Cas12o RuvC domain has a DED catalytic triad responsible for coordinating magnesium (Mg) ions and cleaving DNA.
  • the RuvC-I domain comprises the amino acid sequence of positions 475-563 in SEQ ID NO:1, the amino acid sequence of positions 453-536 in SEQ ID NO:3, and the amino acid sequence of positions 481-560 in SEQ ID NO:5, the RuvC-II domain comprises the amino acid sequence of positions 766-818 in SEQ ID NO:1, the amino acid sequence of positions 732-784 in SEQ ID NO:3, and the amino acid sequence of positions 758-812 in SEQ ID NO:5, and the RuvC-III domain comprises the amino acid sequence of positions 854-869 in SEQ ID NO:1, the amino acid sequence of positions 817-832 in SEQ ID NO:3, and the amino acid sequence of positions 845-860 in SEQ ID NO:5.
  • the REC (recognition) domain comprises at least one REC domain (e.g., a REC-I domain and optionally a REC-II domain), which is believed to interact with the repeat:anti-repeat duplex of the crRNA and to mediate formation of the Cas protein/crRNA complex.
  • the reference Cas12o protein disclosed herein comprises a REC domain, which comprises a first REC domain (REC-I) and a second REC domain (REC-II) from the N-terminus to the C-terminus.
  • the REC-I domain comprises the amino acid sequence of positions 1-165 in SEQ ID NO: 1, the amino acid sequence of positions 1-183 in SEQ ID NO: 3, and the amino acid sequence of positions 16-196 in SEQ ID NO: 5.
  • the REC-II domain comprises the amino acid sequence of positions 166-358 in SEQ ID NO: 1, the amino acid sequence of positions 184-336 in SEQ ID NO: 3, and the amino acid sequence of positions 197-332 in SEQ ID NO: 5.
  • the reference Cas12o disclosed herein comprises a Helical domain.
  • the Helical domain comprises the amino acid sequence at positions 564-765 in SEQ ID NO:1, the amino acid sequence at positions 537-731 in SEQ ID NO:3, and the amino acid sequence at positions 561-757 in SEQ ID NO:5.
  • the Nuc domain is thought to be involved in target strand cleavage (Yamano et al., Cell 2016, 165: 949-962). Other mutational studies in other Cas12 proteins have shown that the Nuc domain contributes to guide and target binding (Swarts et al., Mol Cell 2017, 66: 221-233).
  • the reference Cas12o disclosed herein comprises a Nuc domain (including a Nuc-I domain and a Nuc-II domain). For example, the Nuc-I domain comprises the amino acid sequence at positions 819-853 in SEQ ID NO:1, the amino acid sequence at positions 785-816 in SEQ ID NO:3, and the amino acid sequence at positions 813-844 in SEQ ID NO:5, and the Nuc-II domain comprises the amino acid sequence at positions 870-984 in SEQ ID NO:1, the amino acid sequence at positions 833-954 in SEQ ID NO:3, and the amino acid sequence at positions 861-966 in SEQ ID NO:5.
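The domain coordinates given for SEQ ID NO: 1, 3 and 5 can be collected into a small lookup table. The sketch transcribes those 1-based, inclusive boundaries (the single OBD of SEQ ID NO: 1 and 3 is labelled simply `OBD`) and checks that each set tiles its protein from residue 1 without gaps or overlaps; `DOMAINS`, `is_contiguous`, and `domain_at` are hypothetical names.

```python
# 1-based, inclusive residue ranges transcribed from the description above.
DOMAINS = {
    "SEQ ID NO:1": [
        ("REC-I", 1, 165), ("REC-II", 166, 358), ("OBD", 359, 474),
        ("RuvC-I", 475, 563), ("Helical", 564, 765), ("RuvC-II", 766, 818),
        ("Nuc-I", 819, 853), ("RuvC-III", 854, 869), ("Nuc-II", 870, 984),
    ],
    "SEQ ID NO:3": [
        ("REC-I", 1, 183), ("REC-II", 184, 336), ("OBD", 337, 452),
        ("RuvC-I", 453, 536), ("Helical", 537, 731), ("RuvC-II", 732, 784),
        ("Nuc-I", 785, 816), ("RuvC-III", 817, 832), ("Nuc-II", 833, 954),
    ],
    "SEQ ID NO:5": [
        ("OBD-I", 1, 15), ("REC-I", 16, 196), ("REC-II", 197, 332),
        ("OBD-II", 333, 480), ("RuvC-I", 481, 560), ("Helical", 561, 757),
        ("RuvC-II", 758, 812), ("Nuc-I", 813, 844), ("RuvC-III", 845, 860),
        ("Nuc-II", 861, 966),
    ],
}


def is_contiguous(domains):
    """True if the ranges start at residue 1 and follow each other with no gap or overlap."""
    expected_start = 1
    for _name, start, end in domains:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return True


def domain_at(seq_id, position):
    """Name of the domain containing a 1-based residue position, or None."""
    for name, start, end in DOMAINS[seq_id]:
        if start <= position <= end:
            return name
    return None
```

For the transcribed coordinates, all three annotations are contiguous and end at residues 984, 954, and 966 respectively; for example, residue 500 of SEQ ID NO: 1 falls in RuvC-I.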
  • guide RNA is used interchangeably with guide molecules, guide RNA, gRNA or crRNA, etc., and refers to nucleic acid-based molecules, including but not limited to RNA-based molecules (e.g., direct repeat (DR) sequences) that are capable of forming a complex with a CRISPR-Cas protein and contain a targeting sequence (e.g., a spacer sequence) that is sufficiently complementary to a target nucleic acid sequence to hybridize with the target nucleic acid sequence and guide sequence-specific binding of the complex to the target nucleic acid sequence.
  • the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%).
  • the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 100%.
  • the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 100% over the seven consecutive nucleotides most 3' to the target site of the target nucleic acid.
  • the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 17 or more, 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) consecutive nucleotides.
  • the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 17 or more, 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) consecutive nucleotides.
  • the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 17 or more, 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) consecutive nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 17 or more (e.g., 17 or more, 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) consecutive nucleotides.
  • the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19-25 consecutive nucleotides. In some cases, the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19-25 consecutive nucleotides.
  • the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19-25 consecutive nucleotides. In some cases, the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 100% over 19-25 consecutive nucleotides.
  • the targeting sequence has a length in the range of 19-30 nucleotides (nt) (e.g., 19-25, 19-22, 19-20, 20-30, 20-25, or 20-22 nt). In some cases, the targeting sequence has a length in the range of 19-25 nucleotides (nt) (e.g., 19-22, 19-20, 20-25, or 20-22 nt). In some cases, the targeting sequence has a length of 19 or more nt (e.g., 20 or more, 21 or more, or 22 or more nt; 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, etc.).
  • the targeting sequence has a length of 19 nt. In some cases, the targeting sequence has a length of 20 nt. In some cases, the guide sequence has a length of 21 nt. In some cases, the guide sequence has a length of 22 nt. In some cases, the guide sequence has a length of 23 nt.
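  • The complementarity and length criteria above can be checked computationally. The following is a minimal Python sketch, not a method from this disclosure: it assumes an ungapped, equal-length alignment, counts only Watson-Crick pairs between the RNA spacer and the DNA target strand (both written 5'-to-3'), and uses hypothetical function names and an arbitrary example spacer.

```python
"""Percent complementarity between a spacer (targeting sequence) and a target site.

Illustrative sketch: ungapped alignment, Watson-Crick pairs only.
Function names and the example spacer are hypothetical.
"""

# (spacer RNA base, target-strand DNA base) pairs counted as complementary.
WATSON_CRICK = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}

# DNA base on the target strand that pairs with each RNA spacer base.
PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def perfect_target(spacer: str) -> str:
    """Return the fully complementary target strand (DNA, 5'->3')."""
    return "".join(PARTNER[base] for base in reversed(spacer.upper()))


def percent_complementarity(spacer: str, target_strand: str) -> float:
    """Percent of positions that pair when the strands are aligned antiparallel."""
    spacer = spacer.upper()
    target_strand = target_strand.upper()
    if len(spacer) != len(target_strand):
        raise ValueError("spacer and target site must have the same length")
    # Antiparallel alignment: spacer position i faces target position n-1-i.
    matches = sum(
        (s, t) in WATSON_CRICK for s, t in zip(spacer, reversed(target_strand))
    )
    return 100.0 * matches / len(spacer)


def meets_targeting_criteria(
    spacer: str,
    target_strand: str,
    min_percent: float = 80.0,
    min_len: int = 19,
    max_len: int = 30,
) -> bool:
    """True if the spacer length and complementarity fall in the chosen ranges."""
    if not min_len <= len(spacer) <= max_len:
        return False
    return percent_complementarity(spacer, target_strand) >= min_percent


if __name__ == "__main__":
    spacer = "GACUGCAUGCAUCGAUCGAU"  # arbitrary 20-nt example
    target = perfect_target(spacer)
    print(percent_complementarity(spacer, target))  # 100.0
    # Introduce one mismatch at the 5' end of the target strand.
    mismatched = ("G" if target[0] != "G" else "T") + target[1:]
    print(percent_complementarity(spacer, mismatched))  # 95.0
```

Here the single mismatch lowers complementarity to 95%, which still satisfies the 80% default; stricter embodiments correspond to a higher `min_percent`.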
  • the crRNA of Cas12o comprises, consists essentially of, or consists of a direct repeat (DR) sequence and a spacer sequence.
  • the crRNA comprises, consists essentially of, or consists of a direct repeat sequence connected to a spacer sequence.
  • the crRNA comprises a direct repeat sequence, a spacer sequence, and a direct repeat sequence (DR-Spacer-DR). This is a typical feature of the precursor crRNA (pre-crRNA) configuration.
  • the crRNA comprises a direct repeat sequence, a spacer sequence, a direct repeat sequence, and a spacer sequence (DR-Spacer-DR-Spacer).
  • the crRNA comprises two or more direct repeat sequences and two or more spacer sequences.
  • the crRNA comprises a truncated direct repeat sequence and a spacer sequence. This is a typical feature of a processed, or mature, crRNA.
  • the CRISPR-Cas12o effector protein forms a complex with the crRNA, and the spacer sequence directs sequence-specific binding of the complex to a target nucleic acid that is complementary to the spacer sequence.
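  • The crRNA layouts listed above (DR-Spacer, DR-Spacer-DR, DR-Spacer-DR-Spacer, and multi-spacer arrays) can be expressed as a simple 5'-to-3' concatenation. This Python sketch is illustrative only: the function name and the placeholder sequences are hypothetical, and it does not model processing of a pre-crRNA into a mature crRNA with a truncated DR.

```python
def build_crrna(dr: str, spacers: list[str], trailing_dr: bool = False) -> str:
    """Assemble a crRNA 5'->3' as DR-Spacer[-DR-Spacer ...][-DR].

    dr: direct repeat sequence; spacers: one or more spacer sequences;
    trailing_dr: append a final DR, as in the DR-Spacer-DR pre-crRNA layout.
    """
    if not spacers:
        raise ValueError("at least one spacer is required")
    parts: list[str] = []
    for spacer in spacers:
        parts.extend([dr, spacer])  # each spacer is preceded by a DR
    if trailing_dr:
        parts.append(dr)
    return "".join(parts)


if __name__ == "__main__":
    dr, s1, s2 = "GGGG", "AAAA", "UUUU"  # placeholder sequences
    print(build_crrna(dr, [s1]))                    # DR-Spacer
    print(build_crrna(dr, [s1], trailing_dr=True))  # DR-Spacer-DR
    print(build_crrna(dr, [s1, s2]))                # DR-Spacer-DR-Spacer
```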
  • Any DR sequence that can mediate the binding of the Cas12o protein described herein to the corresponding crRNA can be used in the present disclosure.
  • the general DR sequence of the Cas12o of the present disclosure comprises 5'-R1a-Ba-R2a-L-R2b-Bb-R1b-3', wherein segments R1a and R1b are reverse complementary sequences that form a first stem (R1) having a plurality of (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10) nucleotide pairs; segments Ba and Bb do not base pair with each other and form a bulge (B); segments R2a and R2b are reverse complementary sequences that form a second stem (R2) having a plurality of (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10) base pairs; and L is a loop that closes the second stem and consists of a plurality of (e.g., 3, 4, 5, 6, 7, 8, 9, or 10) nucleotides.
  • the DR sequence comprises 5'-R1a-Ba-R2a-L-R2b-Bb-R1b-3', wherein segments R1a and R1b are reverse complementary sequences that form a first stem (R1) having 3 or 5 nucleotide pairs; segments Ba and Bb are not both present, and whichever of Ba or Bb is present forms a bulge (B) of 2 or 3 nucleotides; segments R2a and R2b are reverse complementary sequences that form a second stem (R2) having 6 or 7 base pairs; and L is a loop of 5 or 7 nucleotides that closes the second stem.
  • the DR sequence is as shown in Figure 6A, which comprises 5'-R1a(ACA)-Ba(absent)-R2a(GGUAUCC)-L(UAAAC)-R2b(GGAUGCU)-Bb(GA)-R1b(UGU)-3'.
  • the DR sequence is as shown in Figure 6B, which comprises 5'-R1a(UUACA)-Ba(absent)-R2a(ACUAUUC)-L(UUGAAAC)-R2b(GAAUGGU)-Bb(GAU)-R1b(UGUAA)-3'.
  • the DR sequence is as shown in Figure 6C, which comprises 5'-R1a(UCAGU)-Ba(GUG)-R2a(GGUCUG)-L(AAACA)-R2b(CAGACC)-Bb(absent)-R1b(AUUGA)-3'.
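  • The segment compositions in Figures 6A-6C can be checked against the 5'-R1a-Ba-R2a-L-R2b-Bb-R1b-3' model with a short Python sketch. It assumes that stem pairs may be Watson-Crick or G·U wobble pairs (some stem positions in these examples are G·U), treats an absent bulge segment as an empty string, and uses hypothetical function names; it is not a secondary-structure predictor.

```python
"""Check the R1a-Ba-R2a-L-R2b-Bb-R1b composition of the Figure 6 DR examples.

Illustrative sketch: pairs are Watson-Crick or G-U wobble; absent segments are "".
"""

RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_pairs(left: str, right: str) -> int:
    """Count pairs formed when `left` and `right` (both 5'->3') pair antiparallel."""
    if len(left) != len(right):
        raise ValueError("stem arms must have equal length")
    for a, b in zip(left, reversed(right)):
        if (a, b) not in RNA_PAIRS:
            raise ValueError(f"{a}-{b} is not a permitted pair")
    return len(left)


def describe_dr(r1a, ba, r2a, loop, r2b, bb, r1b) -> dict:
    """Assemble the DR and report stem, bulge and loop sizes."""
    return {
        "sequence": r1a + ba + r2a + loop + r2b + bb + r1b,
        "R1_pairs": stem_pairs(r1a, r1b),
        "R2_pairs": stem_pairs(r2a, r2b),
        "bulge_nt": len(ba) + len(bb),
        "loop_nt": len(loop),
    }


# Segments transcribed from Figures 6A-6C (order: R1a, Ba, R2a, L, R2b, Bb, R1b).
FIGURE_6 = {
    "6A": ("ACA", "", "GGUAUCC", "UAAAC", "GGAUGCU", "GA", "UGU"),
    "6B": ("UUACA", "", "ACUAUUC", "UUGAAAC", "GAAUGGU", "GAU", "UGUAA"),
    "6C": ("UCAGU", "GUG", "GGUCUG", "AAACA", "CAGACC", "", "AUUGA"),
}

if __name__ == "__main__":
    for name, segments in FIGURE_6.items():
        info = describe_dr(*segments)
        print(name, info["R1_pairs"], info["R2_pairs"], info["bulge_nt"], info["loop_nt"])
```

All three examples fall within the ranges stated above: a first stem of 3 or 5 pairs, a second stem of 6 or 7 pairs, a 2- or 3-nt bulge, and a 5- or 7-nt loop.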
  • the DR sequences corresponding to the Cas12o proteins of the present disclosure are shown in SEQ ID NOs: 2, 4, and 6.
  • the direct repeat comprises a "functional variant" of the sequence shown in SEQ ID NO: 2, 4, or 6, such as a "functionally truncated version", a "functionally extended version", or a "functionally substituted version"; for example, a DR variant obtained by truncating or deleting part of the DR sequence that still has DR function.
  • a DR "functional variant" is a DR sequence that retains at least 20% (e.g., at least about 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or higher) of the activity of the reference DR (e.g., the parent DR) after its 5' and/or 3' end is extended (functionally extended version) or truncated (functionally truncated version), and/or after one or more nucleotides are inserted, deleted, and/or substituted (functionally substituted version) in the reference DR sequence.
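  • The 20% retention threshold amounts to a simple ratio test. In this hypothetical Python sketch, "activity" stands for whatever readout is used to compare a DR variant with its parent DR (for example, editing or cleavage efficiency measured under the same conditions); the disclosure does not prescribe this function or metric.

```python
def is_functional_dr_variant(
    variant_activity: float,
    reference_activity: float,
    min_fraction: float = 0.20,
) -> bool:
    """Return True if the variant retains at least `min_fraction` of the reference activity.

    Both activities must come from the same assay; the metric itself is an assumption.
    """
    if reference_activity <= 0:
        raise ValueError("reference activity must be positive")
    return variant_activity / reference_activity >= min_fraction


if __name__ == "__main__":
    # Hypothetical readouts: fraction of targets edited with each DR.
    print(is_functional_dr_variant(0.30, 0.60))  # retains 50% of the reference
    print(is_functional_dr_variant(0.05, 0.60))  # retains about 8%
```

The stricter embodiments (30%, 50%, 90%, and so on) correspond to passing a larger `min_fraction`.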
  • the DR functional variants generally retain a stem-loop-like secondary structure or part thereof that can be bound by Cas12o protein.
  • the stem-loop structure of the DR sequence of Cas12o1 can be as shown in Figure 6.
  • the stem of the direct repeat contained in the crRNA consists of 10-13 pairs of mutually hybridizing complementary bases, generally including one RNA bulge, and the loop is 5-7 nucleotides long.
  • the loop length is 5 nucleotides; in some embodiments, the loop length is 7 nucleotides.
  • the stem may include at least 10, at least 11, at least 12 or at least 13 base pairs.
  • the direct repeat includes two complementary nucleotide segments with a total length of about 10-15 nucleotides, and 5-7 nucleotides that constitute the loop.
  • the stem-loop structure comprises a first stem nucleotide strand of 10-15 nucleotides; a second stem nucleotide strand of 10-15 nucleotides, wherein the first and second stem strands can hybridize with each other; and a loop nucleotide strand arranged between the first and second stem strands, wherein the loop strand comprises 5, 6, or 7 nucleotides.
  • the loop nucleotide strand of the stem-loop structure comprises at least 3 adenine nucleotides.
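  • The stem-loop features described above (10-13 stem base pairs, a 5-7 nt loop, and a loop containing at least 3 adenines) come from separate embodiments. This hypothetical Python sketch simply evaluates each of them independently, using the total stem pairs (R1 + R2) and the loops of the Figure 6A-6C examples.

```python
def stem_loop_features(total_stem_pairs: int, loop: str) -> dict:
    """Evaluate each stated stem-loop feature separately (not a combined requirement)."""
    return {
        "stem_10_to_13_pairs": 10 <= total_stem_pairs <= 13,
        "loop_5_to_7_nt": 5 <= len(loop) <= 7,
        "loop_at_least_3_A": loop.upper().count("A") >= 3,
    }


# (total stem pairs = R1 + R2, loop sequence) for the Figure 6 examples.
EXAMPLES = {
    "6A": (3 + 7, "UAAAC"),
    "6B": (5 + 7, "UUGAAAC"),
    "6C": (5 + 6, "AAACA"),
}

if __name__ == "__main__":
    for name, (pairs, loop) in EXAMPLES.items():
        print(name, stem_loop_features(pairs, loop))
```

Each of the three figure examples satisfies all three features.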
  • the DR sequence that can guide Cas12o to the target site has, compared with the DR sequence shown in any one of SEQ ID NOs: 2, 4, and 6, one or more nucleotide changes selected from additions, insertions, deletions, and substitutions that do not substantially alter the secondary structure.
  • Exemplary DR sequences include nucleotide sequences that have 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% identity) to the sequence shown in any one of SEQ ID NOs: 2, 4, and 6.
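  • Percent identity, as used for the DR variants above, is normally computed from a sequence alignment. The hypothetical Python sketch below covers only the simplest case, an ungapped comparison of equal-length sequences (such as a variant carrying only substitutions); variants with insertions or deletions would need a proper alignment tool. The 27-nt reference is assembled from the Figure 6A segments purely for illustration and is not asserted to be SEQ ID NO: 2.

```python
def ungapped_identity(query: str, reference: str) -> float:
    """Percent identity of two equal-length sequences compared position by position."""
    if len(query) != len(reference):
        raise ValueError("ungapped identity requires equal-length sequences")
    same = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * same / len(reference)


# Figure 6A segments joined 5'->3': ACA + GGUAUCC + UAAAC + GGAUGCU + GA + UGU.
REFERENCE_DR = "ACA" + "GGUAUCC" + "UAAAC" + "GGAUGCU" + "GA" + "UGU"

if __name__ == "__main__":
    # One substitution in the loop (U -> C at position 11, 1-based).
    variant = REFERENCE_DR[:10] + "C" + REFERENCE_DR[11:]
    identity = ungapped_identity(variant, REFERENCE_DR)
    print(f"{identity:.1f}% identity; meets 80% threshold: {identity >= 80.0}")
```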
  • the length of the spacer sequence is greater than 17 nucleotides, preferably 17 to 100 nucleotides, more preferably 16 to 50 nucleotides (e.g., 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 nucleotides), more preferably 17 to 50 nucleotides, more preferably 17 to 40 nucleotides, more preferably 18 to 39 nucleotides, and most preferably 18 to 37 nucleotides.
  • the Cas12o protein can be associated (e.g., via a fusion protein or a suitable linker) with one or more functional domains, for example, one or more domains selected from the group comprising, consisting essentially of, or consisting of: a nuclear localization signal (NLS) domain, a nuclear export signal (NES) domain, a translation activation domain, a transcriptional activation domain (e.g., VP64, p65, MyoD1, HSF1, RTA, and SET7/9), a translation initiation domain, a transcriptional repression domain (e.g., a KRAB domain, a NuE domain, an NcoR domain, and a SID domain, such as a SID4X domain), a nuclease domain (e.g., FokI), a histone modification domain (e.g., a histone acetyltransferase
  • the functional domain includes a deaminase. In another embodiment, the functional domain is a transposase. In another embodiment, the functional domain is a reverse transcriptase.
  • the CRISPR-Cas12o complex as a whole may be associated with two or more functional domains. For example, there may be two or more functional domains associated with the Cas12o protein, or there may be two or more functional domains associated with a guide RNA or crRNA, or there may be one or more functional domains associated with an effector protein targeting RNA and one or more with a guide RNA or crRNA.
  • the Cas12o protein is associated with one or more functional domains, which can be achieved by direct connection of the effector protein to the functional domain, or by association with crRNA.
  • the crRNA comprises an added or inserted sequence that can be associated with the target functional domain, including, for example, an aptamer or nucleotide that binds to a nucleic acid binding adapter protein.
  • the functional domain can be a functional heterologous domain.
  • the functional domains may be functional heterologous domains. At least one or more heterologous functional domains may be at or near the amino terminus of the effector protein and/or at least one or more heterologous functional domains may be at or near the carboxyl terminus of the effector protein.
  • the one or more heterologous functional domains may be fused to the effector protein.
  • the one or more heterologous functional domains may be tethered to the effector protein.
  • the one or more heterologous functional domains may be connected to the effector protein via a linker portion.
  • the one or more functional domains are heterologous functional domains.
  • the heterologous functional domains have one or more of the following activities: nuclease activity, methylation activity, demethylation activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer formation activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, glycosylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, activity, adenylation activity, deadenylation activity, SUMOylation activity, deSUMOylation activity, ribosylation activity, deribosylation activity, myristoylation activity,
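  • At the sequence level, fusing heterologous domains at or near the N- and/or C-terminus through a linker amounts to ordered concatenation. In the Python sketch below, the SV40 large T antigen NLS (PKKKRKV) and a (GGGGS)3 linker are common choices used only as examples; the Cas12o and deaminase strings are placeholders, the function name is hypothetical, and practical details such as a start methionine are omitted. Tethering without a covalent fusion is not modeled.

```python
SV40_NLS = "PKKKRKV"     # SV40 large T antigen nuclear localization signal
GS_LINKER = "GGGGS" * 3  # (GGGGS)3 flexible linker, an example choice


def assemble_fusion(
    core: str,
    n_terminal: tuple[str, ...] = (),
    c_terminal: tuple[str, ...] = (),
    linker: str = GS_LINKER,
) -> str:
    """Join domains N->C as n_terminal..., core, c_terminal..., with a linker between each."""
    domains = [*n_terminal, core, *c_terminal]
    return linker.join(domains)


if __name__ == "__main__":
    cas12o = "<Cas12o>"        # placeholder for the effector protein sequence
    deaminase = "<deaminase>"  # placeholder for a heterologous functional domain
    fusion = assemble_fusion(
        cas12o, n_terminal=(SV40_NLS,), c_terminal=(deaminase, SV40_NLS)
    )
    print(fusion)
```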
  • the Cas12o protein or its ortholog or homolog can be used as a universal nucleic acid binding protein fused or operably connected to a functional domain.
  • exemplary functional domains may include, but are not limited to, nuclear localization signals (NLS), nuclear export signals (NES), deaminases (e.g., adenosine deaminase or cytidine deaminase) domains, transcriptional activation domains, DNA methylation catalytic domains, histone residue modification domains, nuclease catalytic domains, fluorescent proteins, transcriptional modifiers, light-gated factors, chemically inducible factors, chromatin visualization factors, targeting polypeptides, epigenetic modification domains, transposase domains, reverse transcriptase domains, topoisomerases, phosphatases, polymerases that provide binding to cell surface moieties on target cells or target cell types.
  • the functional domain is a transcriptional activation domain, such as, but not limited to, VP64, p65, MyoD1, HSF1, RTA, SET7/9, or a histone acetyltransferase.
  • the functional domain is a transcriptional repression domain, preferably KRAB.
  • the transcriptional repression domain is SID or a SID concatemer (such as SID4X).
  • the functional domain is an epigenetic modification domain, thereby providing an epigenetic modification enzyme.
  • when the functional domain is a transcriptional activation domain, it can be a p65 activation domain.
  • the nucleic acid-guided nuclease is associated with a ligase or a functional fragment thereof.
  • the ligase can seal single-strand breaks (nicks) produced by the nucleic acid-guided nuclease. In some cases, the ligase can join double-strand breaks produced by the nucleic acid-guided nuclease. In some examples, the nucleic acid-guided nuclease is associated with a reverse transcriptase or a functional fragment thereof.
  • a transposase domain, an HR (homologous recombination) machinery domain, a recombinase domain, and/or an integrase domain are used as functional domains of the present disclosure.
  • the DNA integration activity is provided by an HR machinery domain, an integrase domain, a recombinase domain, and/or a transposase domain.
  • the DNA cleavage activity is due to a nuclease.
  • the nuclease includes a Fok1 nuclease (see, e.g., "Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing", Shengdar Q. Tsai, Nicolas Wyvekens, Cyd Khayter, Jennifer A. Foden, Vishal Thapar, Deepak Reyon, Mathew J. Goodwin, Martin J. Aryee, J. Keith Joung, Nature Biotechnology 32(6):569-77 (2014)), which describes dimeric RNA-guided FokI nucleases that recognize extended sequences and can efficiently edit endogenous genes in human cells.
  • the Cas12o protein may include one or more heterologous functional domains.
  • a heterologous functional domain is a polypeptide that is not derived from the same species as the nucleic acid-guided nuclease.
  • the heterologous functional domain of the nucleic acid-guided nuclease derived from species A is a polypeptide derived from a species different from species A, or an artificial polypeptide.
  • One or more heterologous functional domains may include one or more nuclear localization signal (NLS) domains.
  • One or more heterologous functional domains may include at least two or more NLS.
  • One or more heterologous functional domains may include one or more transcriptional activation domains.
  • the transcriptional activation domain may include VP64.
  • One or more heterologous functional domains may include one or more transcriptional repression domains.
  • the transcriptional repression domain may include a KRAB domain or a SID domain.
  • One or more heterologous functional domains may include one or more nuclease domains.
  • One or more nuclease domains may include Fok1.
  • the one or more functional domains comprise an acetyltransferase, preferably a histone acetyltransferase.
  • a method for interrogating the epigenome can include, for example, targeting an epigenomic sequence.
  • targeting an epigenomic sequence can include directing the guide to an epigenomic target sequence.
  • in one embodiment, the epigenomic target sequence can include a promoter, silencer, or enhancer sequence.
  • examples of acetyltransferases include histone acetyltransferases.
  • the histone acetyltransferase may comprise the catalytic core of the human acetyltransferase p300 (Gersbach & Reddy, Nature Biotech April 6, 2015).
  • a Cas12o protein can be associated with (e.g., fused to) a deaminase (e.g., an adenosine deaminase or a cytidine deaminase) that can change the identity of a nucleotide, for example, from C•G to T•A or from A•T to G•C (Gaudelli et al., "Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage." Nature (2017); Nishida et al.
  • the base editing fusion protein may comprise, for example, an active (double-strand break generating), partially active (nicking enzyme), or inactive (catalytically inactive) Cas12o nuclease and deaminase.
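  • The nucleotide changes made by such base-editing fusions can be illustrated on a single strand. In this hypothetical Python sketch, an adenosine deaminase ("ABE") converts A to G within an editing window (inosine is read as G), and a cytidine deaminase ("CBE") converts C to T (uracil is read as T); after repair or replication these become A•T to G•C and C•G to T•A changes. The window positions are an assumption rather than values from the disclosure, and the sketch converts every matching base in the window, whereas real editors act with partial efficiency.

```python
# Single-strand view of deaminase outcomes (illustrative).
CONVERSIONS = {
    "ABE": ("A", "G"),  # adenosine -> inosine, read as guanosine
    "CBE": ("C", "T"),  # cytidine -> uridine, read as thymidine
}


def simulate_base_edit(strand: str, editor: str, window: tuple[int, int]) -> str:
    """Convert every editable base inside a 1-based, inclusive window.

    The window is an assumption chosen by the caller, not a Cas12o-specific value.
    """
    if editor not in CONVERSIONS:
        raise ValueError(f"unknown editor {editor!r}; expected one of {sorted(CONVERSIONS)}")
    start, end = window
    if not 1 <= start <= end <= len(strand):
        raise ValueError("window must lie within the strand")
    source, product = CONVERSIONS[editor]
    bases = list(strand.upper())
    for i in range(start - 1, end):
        if bases[i] == source:
            bases[i] = product
    return "".join(bases)


if __name__ == "__main__":
    strand = "TTAACGATCA"  # arbitrary example strand, 5'->3'
    print(simulate_base_edit(strand, "ABE", (3, 6)))
    print(simulate_base_edit(strand, "CBE", (3, 6)))
```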
  • the nucleotide deaminase is a mutant form of adenosine deaminase.
  • the mutant form of adenosine deaminase may have both adenosine deaminase and cytidine deaminase activity.
  • adenosine deaminase or “adenosine deaminase protein” refers to a protein, a polypeptide, or one or more functional domains of a protein or polypeptide that is capable of catalyzing a hydrolytic deamination reaction that converts adenine (or the adenine portion of a molecule) to hypoxanthine (or the hypoxanthine portion of a molecule).
  • the adenine-containing molecule is adenosine (A)
  • the hypoxanthine-containing molecule is inosine (I).
  • the adenine-containing molecule can be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
  • adenosine deaminases that can be used in conjunction with the present disclosure include, but are not limited to, members of the enzyme family known as adenosine deaminases acting on RNA (ADAR), members of the enzyme family known as adenosine deaminases acting on tRNA (ADAT), and other family members containing adenosine deaminase domains (ADAD).
  • adenosine deaminases are capable of targeting adenine in RNA/DNA and RNA duplexes. In fact, Zheng et al., (Nucleic Acids Res.
  • ADARs can perform editing reactions of adenosine to inosine on RNA/DNA and RNA/RNA duplexes.
  • adenosine deaminase has been modified to increase its ability to edit DNA in RNA/DNA heteroduplexes of RNA duplexes, as described in detail below.
  • the adenosine deaminase is derived from one or more metazoan species, including but not limited to mammals, birds, frogs, squid, fish, flies, and worms. In some embodiments, the adenosine deaminase is human, squid, or fruit fly adenosine deaminase.
  • the adenosine deaminase is a human ADAR, including hADAR1, hADAR2, hADAR3. In some embodiments, the adenosine deaminase is a Caenorhabditis elegans ADAR protein, including ADR-1 and ADR-2. In some embodiments, the adenosine deaminase is a Drosophila ADAR protein, including dAdar. In some embodiments, the adenosine deaminase is a squid (Loligo pealeii) ADAR protein, including sqADAR2a and sqADAR2b.
  • the adenosine deaminase is a human ADAT protein. In some embodiments, the adenosine deaminase is a Drosophila ADAT protein. In some embodiments, the adenosine deaminase is a human ADAD protein, including TENR (hADAD1) and TENRL (hADAD2).
  • the adenosine deaminase is a TadA protein, such as E. coli TadA. See Kim et al., Biochemistry 45:6407-6416 (2006); Wolf et al., EMBO J. 21:3841-3851 (2002).
  • the adenosine deaminase is mouse ADA. See Grunebaum et al., Curr. Opin. Allergy Clin. Immunol. 13:630-638 (2013).
  • the adenosine deaminase is human ADAT2. See Fukui et al., J. Nucleic Acids 2010:260512 (2010).
  • the deaminase is one or more of those described in Cox et al., Science. 2017 Nov 24;358(6366):1019-1027; Komor et al., Nature. 2016 May 19;533(7603):420-4; and Gaudelli et al., Nature. 2017 Nov 23;551(7681):464-471.
  • the adenosine deaminase protein recognizes one or more target adenosine residues in a double-stranded nucleic acid substrate and converts them into inosine residues.
  • the double-stranded nucleic acid substrate is an RNA-DNA hybrid duplex.
  • the adenosine deaminase protein recognizes a binding window on a double-stranded substrate.
  • the binding window comprises at least one target adenosine residue.
  • the binding window is in the range of about 3bp to about 100bp. In some embodiments, the binding window is in the range of about 5bp to about 50bp.
  • the binding window is in the range of about 10bp to about 30bp. In some embodiments, the binding window is about 1bp, 2bp, 3bp, 5bp, 7bp, 10bp, 15bp, 20bp, 25bp, 30bp, 40bp, 45bp, 50bp, 55bp, 60bp, 65bp, 70bp, 75bp, 80bp, 85bp, 90bp, 95bp or 100bp.
  • the adenosine deaminase protein comprises one or more deaminase domains. Without wishing to be bound by a particular theory, it is expected that the deaminase domain is used to identify one or more target adenosine (A) residues contained in a double-stranded nucleic acid substrate and convert them into inosine (I) residues.
  • the deaminase domain comprises an active center. In some embodiments, the active center comprises a zinc ion.
  • the base pairing at the target adenosine residue is destroyed, and the target adenosine residue is "flipped" out of the double helix to become accessible to adenosine deaminase.
  • the amino acid residues in or near the active center interact with one or more nucleotides 5' to the target adenosine residue.
  • the amino acid residues in or near the active center interact with one or more nucleotides 3' to the target adenosine residue.
  • the amino acid residues in or near the active center further interact with the nucleotide complementary to the target adenosine residue on the opposite strand.
  • the amino acid residues form hydrogen bonds with the 2' hydroxyl of the nucleotide.
  • the adenosine deaminase comprises a human ADAR2 full protein (hADAR2) or a deaminase domain thereof (hADAR2-D). In some embodiments, the adenosine deaminase is an ADAR family member homologous to hADAR2 or hADAR2-D.
  • the homologous ADAR protein is human ADAR1 (hADAR1) or its deaminase domain (hADAR1-D).
  • glycine 1007 of hADAR1-D corresponds to glycine 487 of hADAR2-D.
  • glutamate 1008 of hADAR1-D corresponds to glutamate 488 of hADAR2-D.
  • the adenosine deaminase comprises the wild-type amino acid sequence of hADAR2-D. In some embodiments, the adenosine deaminase comprises one or more mutations in the hADAR2-D sequence such that the editing efficiency and/or substrate editing preference of hADAR2-D is changed according to specific needs.
  • the adenosine deaminase catalytic domain comprises an amino acid sequence that is at least 80%, 82%, 85%, 87%, 90%, 92%, 95%, 96%, 97%, 98%, or 99% or 100% identical to the amino acid sequence shown in SEQ ID NO:30, and it retains the deamination activity of the amino acid sequence shown in SEQ ID NO:30.
  • the adenosine deaminase catalytic domain includes a mutant of the amino acid sequence shown in SEQ ID NO: 30: E18K+F19S+N20L, named adenosine deaminase 004V14 (see WO2023193536A1).
  • the adenosine deaminase catalytic domain comprises an amino acid sequence that is at least 80%, 82%, 85%, 87%, 90%, 92%, 95%, 96%, 97%, 98%, 99% or 100% identical to the amino acid sequence shown in SEQ ID NO:31 (selected from 005V1 deaminase in CN114634923A, in which the amino acid sequence is SEQ ID NO:2), and it retains the deamination activity of the amino acid sequence shown in SEQ ID NO:31.
  • the amino acid sequence of the adenosine deaminase catalytic domain has one or more amino acid additions, insertions, deletions, and/or substitutions relative to the amino acid sequence shown in SEQ ID NO:30 or 31.
  • the adenosine deaminase catalytic domain includes a mutant of the amino acid sequence shown in SEQ ID NO:31: Q148G+Q149M+P150R, named deaminase 005V1-10-3.
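  • Mutation designations such as E18K+F19S+N20L or Q148G+Q149M+P150R (wild-type residue, 1-based position, substituted residue, joined by "+") can be applied programmatically. The Python sketch below is hypothetical and runs on a toy sequence, since SEQ ID NO:30 and SEQ ID NO:31 are not reproduced in this section; it confirms each stated wild-type residue before substituting.

```python
import re

# One substitution: wild-type residue, 1-based position, new residue (e.g. "E18K").
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutations(spec: str) -> list[tuple[str, int, str]]:
    """Split 'E18K+F19S+N20L' into [('E', 18, 'K'), ('F', 19, 'S'), ('N', 20, 'L')]."""
    parsed = []
    for token in spec.split("+"):
        match = MUTATION.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"cannot parse mutation {token!r}")
        parsed.append((match.group(1), int(match.group(2)), match.group(3)))
    return parsed


def apply_mutations(sequence: str, spec: str) -> str:
    """Apply substitutions after confirming each wild-type residue."""
    residues = list(sequence)
    for wild_type, position, new in parse_mutations(spec):
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


if __name__ == "__main__":
    toy = "M" + "A" * 16 + "EFN" + "G" * 5  # toy sequence with E18, F19, N20
    print(apply_mutations(toy, "E18K+F19S+N20L"))
    print(parse_mutations("Q148G+Q149M+P150R"))
```

Checking the wild-type residue first guards against applying a designation to the wrong reference sequence or numbering.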
  • the functional domain is the full length or a functional fragment of TadA8e.
  • the adenosine deaminase is 004V1 (SEQ ID NO:30) or 005V1 (SEQ ID NO:31).
  • the deaminase is a cytidine deaminase.
  • "cytidine deaminase" or "cytidine deaminase protein" refers to a protein, a polypeptide, or one or more functional domains of a protein or polypeptide that can catalyze the hydrolytic deamination of cytosine (or the cytosine portion of a molecule) to uracil (or the uracil portion of a molecule).
  • the molecule containing cytosine is cytidine (C)
  • the molecule containing uracil is uridine (U).
  • the molecule containing cytosine can be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
  • cytidine deaminases that can be used in conjunction with the present disclosure include, but are not limited to, members of the enzyme family known as apolipoprotein B mRNA editing complex (APOBEC) family deaminases, activation-induced deaminases (AID), or cytidine deaminase 1 (CDA1).
  • a cytidine deaminase is capable of targeting cytosine in a single strand of DNA.
  • the cytidine deaminase can edit on a single strand that is present outside a binding component, such as in conjunction with Cas13.
  • the cytidine deaminase can edit at a localized bubble, such as a bubble formed at the target editing site by a mismatch between the guide sequence and the target.
  • the cytidine deaminase may include mutations that contribute to focused activity, such as those described in Kim et al., Nature Biotechnology (2017) 35(4):371-377 (doi:10.1038/nbt.3803).
  • the cytidine deaminase is derived from one or more metazoan species, including but not limited to mammals, birds, frogs, squids, fish, flies, and worms. In some embodiments, the cytidine deaminase is a human, primate, cow, dog, rat, or mouse cytidine deaminase.
  • the cytidine deaminase is human APOBEC, including hAPOBEC1 or hAPOBEC3. In some embodiments, the cytidine deaminase is human AID.
  • the cytidine deaminase protein recognizes one or more target cytosine residues in the single-stranded bubble of the RNA duplex and converts them into uracil residues. In some embodiments, the cytidine deaminase protein recognizes the binding window on the single-stranded bubble of the RNA duplex. In some embodiments, the binding window comprises at least one target cytosine residue. In some embodiments, the binding window is in the range of about 3bp to about 100bp. In some embodiments, the binding window is in the range of about 5bp to about 50bp. In some embodiments, the binding window is in the range of about 10bp to about 30bp.
  • the binding window is about 1bp, 2bp, 3bp, 5bp, 7bp, 10bp, 15bp, 20bp, 25bp, 30bp, 40bp, 45bp, 50bp, 55bp, 60bp, 65bp, 70bp, 75bp, 80bp, 85bp, 90bp, 95bp or 100bp.
  • the cytidine deaminase protein comprises one or more deaminase domains. Without wishing to be bound by theory, it is expected that the deaminase domain is used to recognize one or more target cytosine (C) residues contained in the single-stranded bubble of the RNA duplex and convert them into uracil (U) residues.
  • the deaminase domain comprises an active center. In some embodiments, the active center comprises a zinc ion.
  • the amino acid residues in or near the active center interact with one or more nucleotides 5' to the target cytosine residue. In some embodiments, the amino acid residues in or near the active center interact with one or more nucleotides 3' to the target cytosine residue.
  • the cytidine deaminase comprises a human APOBEC1 full protein (hAPOBEC1) or a deaminase domain thereof (hAPOBEC1-D) or a C-terminal truncated form thereof (hAPOBEC-T).
  • the cytidine deaminase is an APOBEC family member homologous to hAPOBEC1, hAPOBEC-D or hAPOBEC-T.
  • the cytidine deaminase comprises a human AID1 full protein (hAID) or a deaminase domain thereof (hAID-D) or a C-terminal truncated form thereof (hAID-T).
  • the cytidine deaminase is an AID family member homologous to hAID, hAID-D or hAID-T.
  • hAID-T is a hAID with a C-terminal truncation of about 20 amino acids.
  • the cytidine deaminase comprises the wild-type amino acid sequence of cytosine deaminase. In some embodiments, the cytidine deaminase comprises one or more mutations in the cytosine deaminase sequence, so that the editing efficiency and/or substrate editing preference of the cytosine deaminase are changed according to specific needs.
  • association is taken in its broadest meaning, covering the situation where two functional modules directly or indirectly (for example, through a linker) form a fusion protein, and also covering the situation where two functional modules are independent and bonded together by covalent bonds (such as disulfide bonds, etc.) or non-covalent bonds.
  • vector refers to a nucleic acid molecule capable of transporting another nucleic acid to which it is linked. It is a replicon, such as a plasmid, phage or cosmid, into which another DNA segment can be inserted to achieve replication of the inserted segment. Typically, a vector is capable of replication when combined with appropriate control elements.
  • the vector system comprises a single vector.
  • the vector system comprises multiple vectors.
  • the vector can be a viral vector.
  • vectors include but are not limited to single-stranded, double-stranded or partially double-stranded nucleic acid molecules; nucleic acid molecules comprising one or more free ends, no free ends (e.g., circular); nucleic acid molecules comprising DNA, RNA or both; and other polynucleotide variants known in the art.
  • plasmid refers to a circular double-stranded DNA loop, in which other DNA segments can be inserted, for example, by standard molecular cloning techniques.
  • viral vector refers to a vector in which a DNA or RNA sequence of viral origin is present for packaging into a virus (e.g., retrovirus, replication-defective retrovirus, adenovirus, replication-defective adenovirus, and adeno-associated virus).
  • Viral vectors also include polynucleotides carried by viruses for transfection into host cells. Some vectors are able to replicate autonomously in the host cells into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication, and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of the host cell after being introduced into it, and are thereby replicated along with the host genome.
  • certain vectors are able to direct the expression of genes to which they are operably connected. Such vectors are referred to herein as "expression vectors".
  • vectors expressed in eukaryotic cells and vectors that cause expression in eukaryotic cells may be referred to herein as "eukaryotic expression vectors".
  • Common expression vectors useful in recombinant DNA techniques are often in the form of plasmids.
  • the recombinant expression vector may comprise a nucleic acid of the present disclosure in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vector includes one or more regulatory elements, selected according to the host cell to be used for expression, that are operably connected to the nucleic acid sequence to be expressed.
  • "operably connected" is intended to refer to the target nucleotide sequence to be connected to the regulatory element in a manner that allows the nucleotide sequence to be expressed (for example, in an in vitro transcription/translation system or in a host cell when the vector is introduced into a host cell).
  • Advantageous vectors include lentiviruses and adeno-associated viruses, and the types of these vectors can also be selected to target specific types of cells.
  • regulatory element is intended to include promoters, enhancers, internal ribosome entry sites (IRES), and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences).
  • regulatory elements are described, for example, in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990).
  • Regulatory elements include those that direct constitutive expression of nucleotide sequences in many types of host cells and those that direct expression of nucleotide sequences only in certain host cells (e.g., tissue-specific regulatory sequences).
  • Tissue-specific promoters may direct expression primarily in target desired tissues such as muscle, neurons, bones, skin, blood, specific organs (e.g., liver, pancreas), or specific cell types (e.g., lymphocytes). Regulatory elements may also direct expression in a time-dependent manner, such as in a cell cycle-dependent or developmental stage-dependent manner, which may or may not be tissue or cell type specific.
  • the vector comprises one or more pol III promoters (e.g., 1, 2, 3, 4, 5 or more pol III promoters), one or more pol II promoters (e.g., 1, 2, 3, 4, 5 or more pol II promoters), one or more pol I promoters (e.g., 1, 2, 3, 4, 5 or more pol I promoters), or a combination thereof.
  • pol III promoters include, but are not limited to, U6 and H1 promoters.
  • pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer) [see, e.g., Boshart et al., Cell, 41:521-530 (1985)], the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerate kinase (PGK) promoter, and the EF1α promoter.
  • regulatory element also encompasses enhancer elements, such as WPRE; the CMV enhancer; the R-U5' segment in the LTR of HTLV-1 (Mol. Cell. Biol., Vol. 8 (1), pp. 466-472, 1988); the SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA., Vol. 78 (3), pp. 1527-31, 1981).
  • the design of the expression vector may depend on factors such as the choice of the host cell to
  • the vector may be introduced into a host cell to thereby produce a transcript, protein, or peptide encoded by a nucleic acid described herein, including a fusion protein or peptide (e.g., a clustered regularly interspaced short palindromic repeat (CRISPR) transcript, protein, enzyme, mutant form thereof, fusion protein thereof, etc.).
  • a bicistronic vector is used for the guide RNA and the (optionally modified or mutated) CRISPR enzyme (e.g., Cas12o).
  • Vectors can be designed to express CRISPR transcripts (e.g., nucleic acid transcripts, proteins, or enzymes) in prokaryotic or eukaryotic cells.
  • CRISPR transcripts can be expressed in bacterial cells such as Escherichia coli, insect cells (using baculovirus expression vectors), yeast cells, or mammalian cells. Suitable host cells are further discussed in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990).
  • the vector can be introduced and propagated in a prokaryotic organism or a prokaryotic cell, and in some embodiments, the prokaryotic organism is used to amplify a copy of the vector to be introduced into a eukaryotic cell, or as an intermediate vector in the production of the vector to be introduced into a eukaryotic cell (e.g., amplification of plasmid as part of a viral vector packaging system). In some embodiments, the prokaryotic organism is used to amplify a copy of the vector and express one or more nucleic acids, such as providing a source of one or more proteins to be delivered to a host cell or host organism.
  • a fusion vector adds a number of amino acids to a protein encoded therein, for example, to the amino terminus of the recombinant protein.
  • Such fusion vectors can serve one or more purposes, such as: (i) increasing the expression of the recombinant protein; (ii) increasing the solubility of the recombinant protein; and (iii) assisting the purification of the recombinant protein by acting as a ligand in affinity purification.
  • the vector is a yeast expression vector.
  • vectors for expression in the yeast Saccharomyces cerevisiae include pYepSec1 (Baldari et al., 1987. EMBO J. 6:229-234), pMFa (Kurjan and Herskowitz, 1982. Cell 30:933-943), pJRY88 (Schultz et al., 1987. Gene 54:113-123), pYES2 (Invitrogen Corporation, San Diego, Calif.), and picZ (Invitrogen Corp, San Diego, Calif.).
  • the vector uses a baculovirus expression vector to drive protein expression in insect cells.
  • Baculovirus vectors that can be used to express proteins in cultured insect cells include the pAc series (Smith et al., 1983. Mol. Cell. Biol. 3:2156-2165) and the pVL series (Luckow and Summers, 1989. Virology 170:31-39).
  • the vector is capable of driving the expression of one or more sequences in mammalian cells using a mammalian expression vector.
  • mammalian expression vectors include pCDM8 (Seed, 1987. Nature 329: 840) and pMT2PC (Kaufman et al., 1987. EMBO J. 6: 187-195).
  • the control function of the expression vector is generally provided by one or more regulatory elements.
  • commonly used promoters are derived from polyoma, adenovirus, cytomegalovirus, simian virus, and other promoters disclosed herein and known in the art.
  • the recombinant mammalian expression vector is capable of preferentially directing expression of the nucleic acid in a specific cell type (e.g., tissue-specific regulatory elements are used to express the nucleic acid).
  • tissue-specific regulatory elements are known in the art.
  • suitable tissue-specific promoters include the albumin promoter (liver-specific; Pinkert et al., 1987. Genes Dev. 1:268-277), lymphoid-specific promoters (Calame and Eaton, 1988. Adv. Immunol. 43:235-275), in particular promoters for the T cell receptor (Winoto and Baltimore, 1989. EMBO J.
  • one or more vectors driving the expression of one or more elements of the nucleic acid targeting system are introduced into the host cell so that the expression of the elements of the nucleic acid targeting system guides the formation of the nucleic acid targeting complex at one or more target sites.
  • a single promoter drives the expression of transcripts encoding Cas12o proteins and guide RNAs, and the transcripts are embedded in one or more intron sequences (e.g., each in a different intron, two or more in at least one intron, or all in a single intron).
  • Cas12o proteins and guide RNAs can be operably connected to the same promoter and expressed from the same promoter.
  • the vector comprises one or more insertion sites (e.g., about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more insertion sites), such as restriction endonuclease recognition sequences (also referred to as "cloning sites").
  • a single expression construct can be used to target a plurality of different corresponding target sequences in a cell with nucleic acid targeting activity.
  • a single vector may contain about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20 or more guide sequences.
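A single vector carrying several guide sequences, as described above, can be pictured as a series of independent pol III expression units, one per guide, each closed by a poly-U (poly-T in the DNA) terminator. The sketch below is a minimal illustration under stated assumptions: `<U6>` and `<DR>` are hypothetical placeholder tokens rather than real U6 promoter or Cas12o repeat sequences, the repeat-then-spacer crRNA layout is assumed, and `build_multiplex_cassette` is a name introduced here only for illustration.

```python
import re

# Hypothetical placeholder tokens, NOT real sequences:
U6_PROMOTER = "<U6>"          # stands in for a pol III (e.g., U6) promoter
DIRECT_REPEAT = "<DR>"        # stands in for the Cas12o crRNA repeat (layout assumed)
POLY_T_TERMINATOR = "TTTTTT"  # a run of T ends pol III transcription


def build_multiplex_cassette(spacers: list[str]) -> str:
    """Concatenate one promoter-repeat-spacer-terminator unit per guide."""
    units = []
    for spacer in spacers:
        spacer = spacer.upper()
        if not re.fullmatch(r"[ACGT]+", spacer):
            raise ValueError(f"not a DNA spacer: {spacer!r}")
        # Four or more consecutive T can terminate pol III transcription early.
        if "TTTT" in spacer:
            raise ValueError(f"spacer contains TTTT: {spacer!r}")
        units.append(U6_PROMOTER + DIRECT_REPEAT + spacer + POLY_T_TERMINATOR)
    return "".join(units)


cassette = build_multiplex_cassette([
    "ACGTACGTACGTACGTACGT",  # hypothetical spacer 1
    "GGGCCCAAATTTGGGCCCAA",  # hypothetical spacer 2
])
print(cassette.count(U6_PROMOTER))  # number of expression units
```

Screening spacers for internal TTTT runs is included because a premature pol III stop would truncate that guide; the actual promoter, repeat, and spacer lengths would have to come from the Cas12o system itself.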
  • the vector comprises a regulatory element operably connected to a coding sequence encoding a Cas12o protein.
  • Cas12o protein or one or more nucleic acid targeting guide RNAs may be delivered separately; and advantageously, at least one of these is delivered via a particle complex.
  • Cas12o protein mRNA may be delivered before the guide RNA to allow time for expression of the Cas12o protein.
  • Cas12o protein mRNA may be administered 1-12 hours (preferably about 2-6 hours) before administering the guide RNA.
  • Cas12o protein mRNA and guide RNA may be administered together.
  • a second booster dose of guide RNA may be administered 1-12 hours (preferably about 2-6 hours) after the initial administration of Cas12o protein mRNA + guide RNA. Additional administration of Cas12o protein mRNA and/or guide RNA may be useful for achieving the most effective genome modification level.
  • the vector encodes a Cas12o protein comprising one or more nuclear localization sequences (NLS), such as about or greater than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLS. More particularly, the vector comprises one or more NLSs that are not naturally present in the Cas12o protein. Most particularly, the NLS is present in the vector 5' and/or 3' of the Cas12o protein sequence.
  • the effector protein targeting RNA comprises about or greater than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the amino terminus, and comprises about or greater than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the carboxyl terminus, or a combination of these (e.g., 0 or at least one or more NLSs at the amino terminus and 0 or one or more NLSs at the carboxyl terminus).
  • when more than one NLS is present, each may be selected independently of the others, such that a single NLS may be present in more than one copy and/or in combination with one or more other NLSs present in one or more copies.
  • an NLS is considered to be near the N-terminus or C-terminus when the closest amino acid of the NLS is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50 or more amino acids along the polypeptide chain from the N-terminus or C-terminus.
  • Non-limiting examples of NLS include NLS sequences derived from: the NLS of the SV40 virus large T antigen, which has the amino acid sequence PKKKRKV (SEQ ID NO.28); the NLS from nucleoplasmin (for example, the nucleoplasmin bipartite NLS having the sequence KRPAATKKAGQAKKKK (SEQ ID NO.24)); and NLSs having the amino acid sequences KRTADGSEFESPKKKRKV (SEQ ID NO.22), AVKRPAATKKAGQAKKKKLD (SEQ ID NO.23), KKTELQTTNAENKTKKL (SEQ ID NO.25), KRGINDRNFWRGENGRKTR (SEQ ID NO.26), RKSGKIAAIVVKRPRK (SEQ ID NO.27), MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO.29).
  • the NLS sequence also includes a c-myc NLS having an amino acid sequence of PAAKRVKLD (SEQ ID NO.32) or RQRRNELKRSP (SEQ ID NO:33); an hRNPA1M9 NLS comprising an amino acid sequence of NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO.34); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO:35) from the IBB domain of importin-α; and the sequences VSRKRPRP (SEQ ID NO:36) and PPKKARED (SEQ ID NO:37) from the myoma T protein.
  • one or more NLSs are of sufficient strength to drive the accumulation of detectable amounts of DNA/RNA-targeting Cas12o proteins in the nucleus of eukaryotic cells.
  • the strength of nuclear localization activity can be derived from the number of NLSs in the nucleic acid targeting effector protein, the specific NLS used, or a combination of these factors.
  • the codon-optimized Cas12o effector protein comprises an NLS attached to the C-terminus of the protein.
  • other localization tags can be fused to the Cas12o protein, such as, but not limited to, localizing the Cas12o protein to a specific site in the cell, such as an organelle, such as mitochondria, plastids, chloroplasts, vesicles, Golgi bodies, (nuclear or cell) membranes, ribosomes, nucleoli, ER, cytoskeleton, vacuoles, centrosomes, nucleosomes, granules, centrioles, etc.
  • the Cas12o protein of the present disclosure comprises one or more NLSs at its N-terminus and/or C-terminus, preferably one NLS each at its N-terminus and C-terminus.
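The preferred arrangement above (one NLS at each terminus) and the "near a terminus" rule stated earlier can be sketched as simple string operations on a protein sequence. This is an illustrative sketch only: the two NLS sequences are the ones quoted in this section (SEQ ID NO.28 and SEQ ID NO.24), but the GGS spacer, the 50-residue core, and the helper names `add_terminal_nls` and `nls_near_termini` are hypothetical, and the core is a stand-in, not the Cas12o sequence.

```python
# NLS sequences quoted in the source (SEQ ID NO.28 and SEQ ID NO.24).
SV40_NLS = "PKKKRKV"
NUCLEOPLASMIN_NLS = "KRPAATKKAGQAKKKK"
LINKER = "GGS"  # hypothetical spacer between each NLS and the protein


def add_terminal_nls(core: str, n_nls: str = SV40_NLS,
                     c_nls: str = NUCLEOPLASMIN_NLS, linker: str = LINKER) -> str:
    """Return Met + N-terminal NLS + linker + core + linker + C-terminal NLS."""
    body = core[1:] if core.startswith("M") else core  # keep a single start Met
    return "M" + n_nls + linker + body + linker + c_nls


def nls_near_termini(protein: str, nls: str, max_distance: int) -> tuple[bool, bool]:
    """Apply the 'near a terminus' rule: the NLS residue closest to a terminus
    must lie within max_distance residues of that terminus."""
    first = protein.find(nls)   # leftmost copy, closest to the N-terminus
    last = protein.rfind(nls)   # rightmost copy, closest to the C-terminus
    near_n = first != -1 and first <= max_distance
    near_c = last != -1 and len(protein) - (last + len(nls)) <= max_distance
    return near_n, near_c


core = "M" + "A" * 50  # hypothetical stand-in, NOT the Cas12o sequence
fusion = add_terminal_nls(core)
print(len(fusion))
print(nls_near_termini(fusion, SV40_NLS, max_distance=10))
print(nls_near_termini(fusion, NUCLEOPLASMIN_NLS, max_distance=10))
```

Here the SV40 NLS is reported as near the N-terminus only and the nucleoplasmin NLS as near the C-terminus only, matching the "one NLS each at the N-terminus and C-terminus" layout.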
  • the present disclosure contemplates the use of codon-optimized Cas12o proteins, more particularly nucleic acid sequences (and optionally protein sequences) encoding Cas12o.
  • An example of a codon-optimized sequence is a sequence optimized for expression in eukaryotes such as humans (i.e., optimized for expression in humans), or a sequence optimized for another eukaryote, animal, or mammal as discussed herein. Although this is preferred, it should be understood that other examples are also possible, and codon optimization for host species other than humans or codon optimization for specific organs is known.
  • the enzyme coding sequence encoding the Cas protein targeting DNA/RNA is codon-optimized for expression in specific cells such as eukaryotic cells.
  • Eukaryotic cells can be eukaryotic cells of specific organisms such as plants or mammals, or eukaryotic cells derived from specific organisms such as plants or mammals, including but not limited to humans or non-human eukaryotes or animals or mammals as discussed herein, such as mice, rats, rabbits, dogs, livestock, or non-human mammals or primates.
  • methods for modifying human germline genetic characteristics and/or methods for modifying genetic characteristics of animals that may cause human suffering without any substantial medical benefit to humans or animals, as well as animals produced by such methods may be excluded.
  • codon optimization refers to a process of modifying a nucleic acid sequence for enhanced expression in a host cell of interest by replacing at least one codon of the native sequence (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with a codon that is more frequently or most frequently used in the genes of that host cell, while maintaining the native amino acid sequence.
  • Various species exhibit specific biases for certain codons of specific amino acids. Codon bias (differences in codon usage between organisms) is generally related to the translation efficiency of messenger RNA (mRNA), which is believed to be particularly dependent on the properties of the codons translated and the availability of specific transfer RNA (tRNA) molecules.
  • Codon usage tables are readily available, for example, in the "Codon Usage Database" at www.kazusa.or.jp/codon/, and these tables can be adapted in a number of ways. See Nakamura, Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000" Nucl. Acids Res. 28:292 (2000).
  • Computer algorithms for codon optimization of specific sequences for expression in specific host cells are also available, such as Gene Forge (Aptagen; Jacobus, PA).
  • one or more codons in the sequence encoding the DNA/RNA-targeting Cas protein correspond to the most commonly used codons for a specific amino acid.
  • For codon usage in yeast, see the online yeast genome database available at www.yeastgenome.org/community/codon_usage.shtml, or Codon selection in yeast, Bennetzen and Hall, J Biol Chem. 1982 Mar 25;257(6):3026-31.
  • For codon usage in plants, including algae, see Codon usage in higher plants, green algae, and cyanobacteria, Campbell and Gowri, Plant Physiol.
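The "most commonly used codon" strategy described above can be sketched as a lookup that encodes each residue with its highest-weighted synonymous codon, followed by a back-translation check that the native amino acid sequence is unchanged. The weights below are illustrative placeholders for a hypothetical host, not values from the Codon Usage Database or any real organism; the codon-to-amino-acid assignments themselves follow the standard genetic code. `codon_optimize` and `translate` are names introduced for this sketch.

```python
# Illustrative relative codon weights for a hypothetical host. These numbers are
# placeholders chosen for the example, NOT values from any codon usage table.
CODON_WEIGHTS = {
    "M": {"ATG": 1},
    "P": {"CCT": 2, "CCC": 3, "CCA": 2, "CCG": 1},
    "K": {"AAA": 2, "AAG": 3},
    "R": {"CGT": 1, "CGC": 2, "CGA": 1, "CGG": 2, "AGA": 3, "AGG": 2},
    "V": {"GTT": 1, "GTC": 2, "GTA": 1, "GTG": 3},
}
# Reverse lookup used to confirm the amino acid sequence is unchanged.
CODON_TO_AA = {codon: aa for aa, table in CODON_WEIGHTS.items() for codon in table}


def codon_optimize(protein: str) -> str:
    """Encode each residue with the highest-weighted synonymous codon."""
    return "".join(max(CODON_WEIGHTS[aa], key=CODON_WEIGHTS[aa].get) for aa in protein)


def translate(dna: str) -> str:
    """Translate codon by codon using the reverse lookup table."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


peptide = "MPKKKRKV"  # Met followed by the SV40 NLS quoted in the source
optimized = codon_optimize(peptide)
assert translate(optimized) == peptide  # native amino acid sequence is maintained
print(optimized)
```

Real optimization tools typically weigh additional factors (GC content, repeats, mRNA structure) beyond picking the single most frequent codon; this sketch shows only the frequency-substitution step the text describes.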
  • the polynucleotide encoding Cas12o has been codon-optimized for expression in a corresponding host cell (e.g., a mammalian cell, more specifically a human cell, such as an HSC or an iPSC).
  • linker refers to a molecule that connects proteins to form a fusion protein. Typically, such molecules have no specific biological activity except for connecting or maintaining a certain minimum distance or other spatial relationship between proteins. However, in embodiments, the linker can be selected to affect some properties of the linker and/or fusion protein, such as the folding, net charge or hydrophobicity of the linker.
  • Suitable linkers for the methods herein include straight or branched carbon linkers, heterocyclic carbon linkers or peptide linkers. However, as used herein, linkers may also be covalent bonds (carbon-carbon bonds or carbon-heteroatom bonds). In embodiments, linkers may be chemical moieties, which may be monomers, dimers, multimers or polymers. Preferably, linkers include amino acids. Typical amino acids in flexible linkers include Gly, Asn and Ser. Therefore, in specific embodiments, linkers include one or more combinations of Gly, Asn and Ser amino acids. Other near-neutral amino acids, such as Thr and Ala, may also be used in linker sequences. Exemplary GlySer linkers include GGS, GGGS and GSG.
  • GGS, GSG, GGGS or GGGGS linkers may be repeated multiple times (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or even more, e.g., (GGS)3 (SEQ ID NO: 10), (GGGGS)3 (SEQ ID NO: 15)) to provide a suitable length.
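Repeating a linker unit and joining domains with it reduce to string operations on one-letter protein sequences. The sketch below builds (GGGGS)3, checks that it uses only the flexible or near-neutral residues named in this section (Gly, Asn, Ser, Thr, Ala), and joins two domains with it, for instance appending a reverse transcriptase to the C-terminus of Cas12o. `<Cas12o>` and `<RT>` are hypothetical placeholder tokens, not real sequences, and the helper names are illustrative.

```python
FLEXIBLE_RESIDUES = set("GNS")     # Gly, Asn, Ser (named in the source)
NEAR_NEUTRAL_RESIDUES = set("TA")  # Thr, Ala (also named in the source)


def repeat_linker(unit: str, copies: int) -> str:
    """Build a (unit)n linker, e.g. (GGGGS)3."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return unit * copies


def fuse(*domains: str, linker: str) -> str:
    """Join domains N-to-C with the same linker between each adjacent pair."""
    return linker.join(domains)


def is_flexible(linker: str) -> bool:
    """True if every residue is a flexible or near-neutral linker residue."""
    allowed = FLEXIBLE_RESIDUES | NEAR_NEUTRAL_RESIDUES
    return all(residue in allowed for residue in linker)


linker = repeat_linker("GGGGS", 3)
# Hypothetical placeholder domains: "<Cas12o>" and "<RT>" are not real sequences.
fusion = fuse("<Cas12o>", "<RT>", linker=linker)
print(linker, len(linker), is_flexible(linker))
print(fusion)
```

Keeping the linker as a parameter makes it easy to compare lengths such as (GGS)3 and (GGGGS)3 without changing the fused domains.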
  • compositions and systems which may include Cas12o protein or its catalytically inactive form, one or more crRNA or guide molecules, and reverse transcriptase.
  • the system can be used to insert a donor polynucleotide into a target polynucleotide.
  • the composition or system includes a catalytically inactive Cas12o protein, a reverse transcriptase that associates with or can otherwise form a complex with the Cas12o protein, and a crRNA that can form a complex with the Cas12o protein and guide the site-specific binding of the complex to the target sequence of the target polynucleotide, and the crRNA also includes a donor sequence for inserting the target polynucleotide.
  • the catalytically inactive Cas12o protein can be a nickase, such as a DNA nickase.
  • the Cas12o protein has one or more mutations.
  • Cas12o protein can be associated with reverse transcriptase.
  • the reverse transcriptase domain can be a reverse transcriptase or a fragment thereof.
  • the reverse transcriptase is human immunodeficiency virus (HIV) RT, avian myeloblastosis virus (AMV) RT, Moloney murine leukemia virus (M-MLV) RT, group II intron RT, group II intron-like RT, or chimeric RT.
  • RT comprises a modified form of these RTs, such as an engineered variant of avian myeloblastosis virus (AMV) RT, Moloney murine leukemia virus (M-MLV) RT or human immunodeficiency virus (HIV) RT (see, e.g., Anzalone et al., Search-and-replace genome editing without double-strand breaks or donor DNA, Nature. 2019 December; 576(7785): 149-157).
  • the compositions and systems may include a Cas12o protein or a variant thereof disclosed herein; a reverse transcriptase (RT) polypeptide connected to or otherwise capable of forming a complex with the Cas12o protein or a variant thereof; and a crRNA capable of forming a complex with the Cas12o protein or a variant thereof and comprising: a crRNA or a guide sequence capable of guiding the site-specific binding of the Cas12o protein or its variant to a target sequence of a target polynucleotide; a 3' binding site region capable of binding to the upstream cleavage strand of the target polynucleotide; and an RT template sequence encoding an extension sequence, wherein the extension sequence comprises a variant region and a 3' homologous sequence capable of hybridizing with the downstream cleavage strand of the target polynucleotide.
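The crRNA just described carries a 3' binding site region, which pairs with the cut strand upstream of the nick and primes reverse transcription, and an RT template that encodes the variant region plus a downstream homologous sequence. The sketch below derives both parts from a strand sequence and a desired edit, following the prime-editing geometry of Anzalone et al. cited above. It assumes: the nick position is supplied as an input (the source does not state where Cas12o nicks relative to its PAM), the edit is a substitution, and `pbs_len` and `rtt_len` are free parameters with no values given in the source. `design_extension` is a hypothetical helper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_extension(nicked_strand: str, edited_strand: str, nick: int,
                     pbs_len: int, rtt_len: int) -> dict:
    """Design the 3' extension (RT template + binding site) of a crRNA.

    nicked_strand: 5'->3' sequence of the strand that is cut (hypothetical input).
    edited_strand: the same strand after the desired edit (substitution only).
    nick: the cut falls between positions nick-1 and nick of nicked_strand.
    """
    if len(edited_strand) != len(nicked_strand):
        raise ValueError("this sketch handles substitutions only")
    if nick < pbs_len or nick + rtt_len > len(edited_strand):
        raise ValueError("binding site or RT template runs past the sequence")
    if edited_strand[:nick] != nicked_strand[:nick]:
        raise ValueError("edit must lie 3' of the nick")
    if edited_strand[nick + rtt_len:] != nicked_strand[nick + rtt_len:]:
        raise ValueError("edit must fall inside the RT template window")
    # Binding site: pairs with the 3' end of the cut strand upstream of the nick.
    pbs = revcomp(nicked_strand[nick - pbs_len:nick])
    # RT template: copied by the RT, it yields the edited sequence after the nick.
    rtt = revcomp(edited_strand[nick:nick + rtt_len])
    extension = rtt + pbs  # 5'->3' order on the crRNA
    return {"pbs": pbs, "rtt": rtt, "extension_rna": extension.replace("T", "U")}


original = "ACGT" * 10                             # hypothetical 40-nt strand
edited = original[:25] + "A" + original[26:]       # hypothetical substitution at 25
parts = design_extension(original, edited, nick=20, pbs_len=8, rtt_len=12)
print(parts["extension_rna"])
```

Reading the extension's reverse complement gives the primer-binding stretch of the original strand followed by the newly synthesized, edited sequence, which is what the RT writes onto the 3' end of the cut strand.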
  • a wide variety of reverse transcriptases (RT) can be used in alternative embodiments of the present disclosure, including prokaryotic and eukaryotic RTs, provided that these RTs function in a host and produce a donor polynucleotide sequence from an RNA template.
  • the nucleotide sequence of a natural RT can be modified, for example, using known codon optimization techniques to optimize expression in a desired host.
  • Reverse transcriptase (RT) is an enzyme for producing complementary DNA (cDNA) from an RNA template, a process known as reverse transcription.
  • the RT domain of a reverse transcriptase is used in the present disclosure.
  • the domain may only include RNA-dependent DNA polymerase activity.
  • the RT domain is non-mutagenic, i.e., does not cause mutations in the donor polynucleotide (e.g., during reverse transcription).
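The base-pairing outcome of the RNA-dependent DNA polymerase activity described above can be sketched in a few lines: first-strand cDNA is the complement of the RNA template, written 5'->3' by reversing it. This models only the sequence of an error-free product (consistent with a non-mutagenic RT domain); priming, processivity, and error rates are ignored, and `reverse_transcribe` is an illustrative name.

```python
RNA_TO_CDNA = str.maketrans("AUGC", "TACG")  # A->T, U->A, G->C, C->G


def reverse_transcribe(rna: str) -> str:
    """Return the first-strand cDNA, written 5'->3', for a 5'->3' RNA template."""
    rna = rna.upper()
    if set(rna) - set("ACGU"):
        raise ValueError("template must contain only A, C, G, U")
    # Complement each base, then reverse so the cDNA reads 5'->3'.
    return rna.translate(RNA_TO_CDNA)[::-1]


print(reverse_transcribe("AUGC"))  # GCAT
```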
  • the RT domain may be a non-retron RT, such as a viral RT or a human endogenous RT.
  • the RT domain may be a retron RT or a DGR (diversity-generating retroelement) RT.
  • RT may be less mutagenic than the corresponding wild-type RT. In one embodiment, the RT herein is not mutagenic.
  • the reverse transcriptase may be fused to the C-terminus of Cas12o or a variant thereof. Alternatively or additionally, the reverse transcriptase may be fused to the N-terminus of Cas12o or a variant thereof. The fusion may be performed through a linker. In some examples, the reverse transcriptase may be an M-MLV reverse transcriptase or a variant thereof. The M-MLV reverse transcriptase variant may comprise one or more mutations.
  • One or more functional domains may be one or more reverse transcriptase domains.
  • the system comprises an engineered system for modifying a target polynucleotide, the system comprising: a Cas12o protein or a CRISPR-associated Cas12o protein or a variant thereof (e.g., dCas12o); a reverse transcriptase (RT) domain; an RNA template comprising or encoding a donor polynucleotide of a target sequence to be inserted into a target polynucleotide; and crRNA.
  • the donor template for homologous recombination is produced by reverse transcription of a self-priming RNA template.
  • the self-priming reverse transcription system is a retron system.
  • The term "retron" refers to a genetic element whose encoded components enable the synthesis of a reverse transcriptase and of msDNA, a single-stranded DNA linked to a branched RNA.
  • the reverse transcriptase domain is a retron RT domain.
  • the RNA template encodes a retron RNA template that is recognized and reverse-transcribed by the retron RT domain. Retrons are conserved across many bacterial species and constitute an efficient reverse transcription system whose function is still relatively poorly understood.
  • the retron system is composed of the retron RT protein and the msr and msd transcripts, which serve as the primer and the template sequence, respectively.
  • All components of the retron system are expressed as a single transcript that includes msr-msd and encodes the retron RT protein (Lampson et al., 2005, Retrons, msDNA, and the bacterial genome. Cytogenet Genome Res 110: 491-499).
  • the msr element of the retron provides the RNA portion of the msDNA molecule, while the msd element provides the DNA portion of the msDNA molecule.
  • the main transcript from the msr-msd region is considered to serve as a template and primer for producing msDNA.
  • the msDNA is synthesized from the internal rG residue of the RNA transcript using its 2'-OH group.
  • msd or msr can also be modified to allow the insertion of an RNA template encoding a donor polynucleotide within msd without changing the function or production of msDNA.
  • the RNA template encoding the donor polynucleotide sequence can be of any length, but is preferably less than about 5 kb, or less than about 2 kb, or less than about 500 bases, provided that an msDNA product is produced.
  • Topoisomerase is a class of enzymes that change the topological state of DNA by breaking and reconnecting nucleic acid chains.
  • the topoisomerase may be a DNA topoisomerase, which controls and changes the topological state of DNA during transcription by catalyzing the transient breakage and rejoining of DNA single strands, allowing the strands to pass through each other and thereby changing the topology of the DNA.
  • one or more functional domains may be one or more topoisomerase domains.
  • an engineered system for modifying a target polynucleotide comprises: Cas12o protein; a topoisomerase domain; and a nucleic acid template comprising or encoding a donor polynucleotide of a target sequence to be inserted into a target polynucleotide.
  • two or more of the following may form a complex: Cas12o protein; a topoisomerase domain; and a nucleic acid template.
  • two or more of the following may be included in a fusion protein: Cas12o protein; a topoisomerase domain.
  • the topoisomerase domain is capable of connecting the donor polynucleotide to the target polynucleotide.
  • the connection can be achieved by sticky end or blunt end connection.
  • the donor polynucleotide may include an overhang, which includes a sequence complementary to a region of the target polynucleotide.
  • Examples of connecting the donor polynucleotide to the target polynucleotide include examples of TOPO cloning, for example, those described in "The Technology Behind TOPO Cloning," at www.thermofisher.com/us/en/home/life-science/cloning/topo/topo-resources/the-technology-behind-topo-cloning.html.
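For the sticky-end case above, the donor's overhang must base-pair with the overhang left on the target. The sketch below checks that condition for two overhangs of the same type (both 5' or both 3'), each written 5'->3': they anneal when one is the reverse complement of the other. Blunt-end joining needs no such check. `overhangs_anneal` is an illustrative helper, and the overhang sequences in the example are arbitrary.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhangs_anneal(donor_overhang: str, target_overhang: str) -> bool:
    """Two single-stranded overhangs of the same type, each written 5'->3',
    can base-pair when one is the reverse complement of the other."""
    return donor_overhang.upper() == revcomp(target_overhang)


print(overhangs_anneal("ACGG", "CCGT"))  # True
print(overhangs_anneal("ACGG", "ACGG"))  # False
```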
  • the topoisomerase domain can be associated with the donor polynucleotide.
  • the topoisomerase domain is covalently linked to the donor polynucleotide.
  • Examples of topoisomerases include type I topoisomerases (including type IA and type IB), which cut a single strand of a double-stranded nucleic acid molecule; and type II topoisomerases (e.g., gyrase), which cut both strands of a double-stranded nucleic acid molecule.
  • the topoisomerase is a DNA topoisomerase I, such as vaccinia virus topoisomerase I.
  • the topoisomerase may be preloaded with a donor polynucleotide.
  • the vaccinia virus topoisomerase may require a target comprising a 5'-OH group.
  • the systems herein may also comprise a phosphatase domain.
  • a phosphatase is an enzyme that is capable of removing a phosphate group from a molecule, e.g., a nucleic acid such as DNA.
  • Examples of phosphatases include calf intestinal phosphatase, shrimp alkaline phosphatase, Antarctic phosphatase, and APEX alkaline phosphatase.
  • the 5'-OH group in the target polynucleotide can be produced by a phosphatase.
  • Topoisomerases compatible with 5' phosphate targets can be used to produce stably loaded intermediates.
  • Cas12o proteins or CRISPR-related Cas12o proteins that leave a 5'-OH after cutting the target polynucleotide can be used.
  • the phosphatase domain can be associated with (e.g., fused to) the Cas12o protein. The phosphatase domain may be able to produce an -OH group at the 5' end of the target polynucleotide.
  • the phosphatase can be separated from other components in the system, for example, as a separate protein, delivered on a carrier separated from other components.
  • the system herein may also include a polymerase domain.
  • a polymerase refers to an enzyme that synthesizes a nucleic acid chain.
  • the polymerase may be a DNA polymerase or an RNA polymerase.
  • the system includes an engineered system for modifying a target polynucleotide, the engineered system comprising: a Cas12o protein or a CRISPR-associated Cas12o protein; a DNA polymerase domain; and a DNA template comprising a donor polynucleotide of a target sequence to be inserted into the target polynucleotide.
  • two or more of the following may form a complex: a Cas12o protein; a DNA polymerase domain; and a DNA template.
  • two or more of the following are included in a fusion protein: a Cas12o protein; a DNA polymerase domain.
  • the system may comprise a Cas12o protein or a CRISPR-associated Cas12o protein (or a variant thereof, such as a dCas12o protein or a CRISPR-associated Cas12o protein or a CRISPR-associated Cas12o nickase) and a DNA polymerase (e.g., phi29, T4, T7 DNA polymerase).
  • the system may also comprise a single-stranded DNA or double-stranded DNA template.
  • the DNA template may comprise i) a first sequence homologous to the target site of the Cas12o protein on the target polynucleotide, and/or ii) a second sequence homologous to another region of the target polynucleotide.
  • the template may be a synthetic single-stranded or PCR-generated DNA molecule (optionally end-protected by modified nucleotides), or a viral genome (e.g., AAV).
  • the template is generated using a reverse transcriptase.
  • an endogenous DNA polymerase in the cell may be used.
  • an exogenous DNA polymerase may be expressed in the cell.
  • the DNA template may be end-protected by one or more modified nucleotides, or may comprise a portion of a viral genome.
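One common way to realize a template with a first sequence homologous to the Cas12o target site and a second sequence homologous to another region of the target is a donor whose insert is flanked by two homology arms taken from either side of the cut. The sketch below assembles such a donor as a plain sequence; the arm length, the cut position, and the helper name `build_donor` are assumptions for illustration, since the source specifies none of them, and end protection or viral packaging is not modeled.

```python
def build_donor(target: str, cut: int, insert: str, arm_len: int) -> str:
    """Flank `insert` with homology arms copied from either side of the cut site."""
    if not arm_len <= cut <= len(target) - arm_len:
        raise ValueError("homology arms would run past the target sequence")
    left_arm = target[cut - arm_len:cut]   # homologous to sequence 5' of the cut
    right_arm = target[cut:cut + arm_len]  # homologous to sequence 3' of the cut
    return left_arm + insert + right_arm


target = "AAAAAAAAAA" + "CCCCCCCCCC"  # hypothetical target, cut between the blocks
print(build_donor(target, cut=10, insert="GGG", arm_len=5))
```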
  • Examples of DNA polymerases include Taq, Tne(exo-), Tma(exo-), Pfu(exo-), Pwo(exo-), Thermoanaerobacter thermohydrosulfuricus DNA polymerase, Thermococcus litoralis DNA polymerase I, Escherichia coli DNA polymerase I, Taq DNA polymerase I, Tth DNA polymerase I, Bacillus stearothermophilus (Bst) DNA polymerase I, Escherichia coli DNA polymerase III, bacteriophage T5 DNA polymerase, bacteriophage M2 DNA polymerase, bacteriophage T4 DNA polymerase, bacteriophage T7 DNA polymerase, bacteriophage phi29 DNA polymerase, bacteriophage PRD1 DNA polymerase, bacteriophage phi15 DNA polymerase, bacteriophage phi21 DNA polymerase, bacteriophage PZE DNA polymerase, bacteriophage PZA DNA polymerase, bacteriophage Nf DNA polymerase, bacteriophage M2Y DNA polymerase, bacteriophage B103 DNA polymerase, bacteriophage SF5 DNA polymerase, bacteriophage GA-1 DNA polymerase, bacteriophage Cp-5 DNA polymerase, bacteriophage Cp-7 DNA polymerase, bacteriophage PR4 DNA polymerase, bacteriophage PR5 DNA polymerase, bacteriophage PR722 DNA polymerase and bacteriophage L17 DNA polymerase.
  • the present disclosure also provides a nucleic acid targeting system. Such systems can be used for targeting, modifying and otherwise manipulating nucleic acids.
  • the system comprises a Cas12o protein or a CRISPR-associated Cas12o protein and one or more crRNAs or guide RNAs. The Cas12o protein may have nuclease activity, for example, the ability to cut DNA or RNA. It may have nickase activity, for example, the ability to produce single-strand breaks in double-stranded nucleic acids such as dsDNA or dsRNA. It may also be in a dead form, for example, retaining only nickase activity, or lacking both nuclease and nickase activity.
  • the system also comprises one or more functional domains, for example, nucleotide deaminase, reverse transcriptase, non-LTR retro
  • the donor polynucleotide may be included in or encoded by the nucleic acid template.
  • two or more components in the system herein may form a complex.
  • the components may be independent molecules that interact directly or indirectly with each other.
  • two or more components of the system herein may be included in a fusion protein.
  • target sequence refers to a sequence to which crRNA is designed to have complementarity, wherein the hybridization between the target sequence and the spacer sequence promotes the formation of a complex targeting DNA or RNA. Complete complementarity is not necessarily required, as long as there is enough complementarity to cause hybridization and promote the formation of a complex targeting nucleic acid.
  • the target sequence may comprise an RNA polynucleotide.
  • the target sequence is located in the nucleus or cytoplasm of the cell.
  • the target sequence may be in an organelle of a eukaryotic cell, for example, a mitochondrion or a chloroplast.
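The point that complete complementarity is not required can be illustrated with a small calculation. The sketch below is an illustration only and is not part of the disclosure. It scores how many positions of a spacer (written 5'→3' as RNA) form Watson-Crick pairs with a target strand (written 5'→3' as DNA) when the two are aligned antiparallel. The function names and the 0.8 cutoff are hypothetical, because the text gives no numeric threshold. Real hybridization also depends on mismatch position and other factors that this sketch does not model.

```python
# Watson-Crick pairs between an RNA spacer base and a DNA target base.
RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def complementarity(spacer_rna: str, target_dna: str) -> float:
    """Fraction of spacer positions that base-pair with the target strand.

    Both sequences are written 5'->3'. The duplex is antiparallel, so the
    spacer is compared against the target strand read 3'->5' (reversed).
    """
    spacer = spacer_rna.upper()
    target = target_dna.upper()
    if len(spacer) != len(target):
        raise ValueError("spacer and target must have the same length")
    if not spacer:
        raise ValueError("sequences must be non-empty")
    paired = sum((s, t) in RNA_DNA_PAIRS for s, t in zip(spacer, reversed(target)))
    return paired / len(spacer)


def may_hybridize(spacer_rna: str, target_dna: str, min_fraction: float = 0.8) -> bool:
    """Hypothetical cutoff: 'enough' complementarity, not necessarily 100%."""
    return complementarity(spacer_rna, target_dna) >= min_fraction


if __name__ == "__main__":
    print(complementarity("GACUUA", "TAAGTC"))  # fully complementary target
    print(complementarity("GACUUA", "TAAGTG"))  # one mismatched position
    print(may_hybridize("GACUUA", "TAAGTG"))    # still above the assumed cutoff
```

In this sketch a single mismatch lowers the score but still passes the assumed cutoff. That mirrors the statement that partial complementarity can suffice.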
  • a sequence or template that can be used for recombination into the targeted locus comprising a target sequence is referred to as an "editing template" or "editing sequence".
  • an exogenous template may be referred to as an editing template.
  • recombination is homologous recombination.
  • a nucleic acid-targeting complex comprising a guide RNA hybridized to a target sequence and complexed with one or more nucleic acid-targeting effector proteins
  • the formation of a nucleic acid-targeting complex results in the cleavage of one or both nucleic acid strands in or near (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50 or more base pairs of) the target sequence.
  • one or more vectors driving the expression of one or more elements of a nucleic acid-targeting system are introduced into a host cell such that expression of these elements of the nucleic acid-targeting system directs the formation of a nucleic acid-targeting complex at one or more target sites.
  • a nucleic acid-targeting effector protein and a crRNA or guide RNA can each be operably linked to a separate regulatory element on a separate vector.
  • two or more of these elements expressed from the same or different regulatory elements can be combined in a single vector, wherein one or more additional vectors provide any components of the nucleic acid-targeting system not contained in the first vector.
  • the nucleic acid-targeting system elements combined in a single vector can be arranged in any suitable orientation, such as one element being located 5' ("upstream") relative to a second element or 3' ("downstream") relative to the second element.
  • the coding sequence of one element may be located on the same strand or the opposite strand of the coding sequence of the second element and oriented in the same or opposite direction.
  • a single promoter drives the expression of a transcript encoding a nucleic acid-targeting effector protein and a guide RNA embedded within one or more intron sequences (e.g., each in a different intron, two or more in at least one intron, or all in a single intron).
  • the nucleic acid-targeting effector protein and guide RNA are operably linked to and expressed from the same promoter.
  • compositions and systems herein may include one or more nucleic acid templates.
  • the nucleic acid template may include one or more polynucleotides.
  • the nucleic acid template may include the coding sequence of one or more polynucleotides.
  • the nucleic acid template may be an RNA template.
  • the nucleic acid template may be a DNA template.
  • Donor polynucleotides can be used to edit target polynucleotides.
  • donor polynucleotides include one or more mutations to be introduced into target polynucleotides. Examples of such mutations include substitutions, deletions, insertions, or combinations thereof. Mutations can cause open reading frame shifts on target polynucleotides.
  • donor polynucleotides may change stop codons in target polynucleotides. For example, donor polynucleotides can correct premature stop codons, either by deleting the stop codon or by introducing one or more mutations into it.
  • donor polynucleotides may address loss-of-function mutations, deletions, or translocations that occur, for example, in certain disease conditions, by inserting or restoring a functional copy of a gene or a functional fragment thereof, or a functional regulatory sequence or a functional fragment thereof.
  • a functional fragment is a less-than-complete copy of a gene that nevertheless provides enough nucleotide sequence to restore the function of the wild-type gene or non-coding regulatory sequence (e.g., a sequence encoding a long non-coding RNA).
  • the system disclosed herein can be used to replace a single allele of a defective gene or its defective fragment.
  • the system disclosed herein can be used to replace both alleles of a defective gene or a defective gene fragment.
  • a "defective gene" or "defective gene fragment" is a gene or gene portion that, when expressed, fails to produce the functional protein or non-coding RNA produced by the corresponding wild-type gene.
  • these defective genes may be associated with one or more disease phenotypes.
  • alternatively, the defective gene or gene fragment is not replaced. Instead, the system described herein is used to insert a donor polynucleotide encoding a gene or gene fragment that compensates for or overrides the expression of the defective gene. This eliminates the cell phenotype associated with the defective gene or changes it to a different or desired cell phenotype.
  • the donor polynucleotide may include, but is not limited to, a gene or gene fragment, a sequence encoding a protein or RNA transcript to be expressed, a regulatory element, a repair template, etc.
  • the donor polynucleotide may include left and right end sequence elements that work together with the transposition component that mediates insertion.
  • the donor polynucleotide manipulates the splice site on the target polynucleotide.
  • the donor polynucleotide may disrupt the splice site. Disruption can be achieved by inserting the polynucleotide into the splice site and/or introducing one or more mutations into the splice site.
  • the donor polynucleotide can restore the splice site.
  • the polynucleotide may include a splice site sequence.
  • the donor polynucleotide to be inserted can be from 10 base pairs or nucleotides to 50 kb in length, for example, 50 to 40k, 100 to 30k, 100 to 10000, 100 to 300, 200 to 400, 300 to 500, 400 to 600, 500 to 700, 600 to 800, 700 to 900, 800 to 1000, 900 to 1100, 1000 to 1200, 1100 to 1300, 1200 to 1400, 1300 to 1500, 1400 to 1600, 1500 to 1700, 1600 to 1800, 1700 to 1900, or 1800 to 2000 base pairs (bp) or nucleotides in length.
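Most of the size ranges above follow a regular pattern: overlapping 200-unit windows that step by 100, from 100-300 to 1800-2000. The following sketch is a hedged illustration only. It encodes the stated overall range (10 to 50,000) and those sliding windows, then reports which windows a given donor length falls into. The helper names are hypothetical, and treating the bounds as inclusive is an assumption. The broader ranges (50 to 40k, 100 to 30k, 100 to 10000) are not modeled.

```python
# Overall length range stated for the donor polynucleotide (bp or nt).
OVERALL_RANGE = (10, 50_000)

# Overlapping 200-unit windows listed in the text: 100-300, 200-400, ..., 1800-2000.
SLIDING_WINDOWS = [(start, start + 200) for start in range(100, 1801, 100)]


def within_overall_range(length: int) -> bool:
    """True if the length lies inside the stated 10 to 50,000 range (inclusive)."""
    lo, hi = OVERALL_RANGE
    return lo <= length <= hi


def matching_windows(length: int) -> list:
    """All listed sliding windows (inclusive bounds) that contain the length."""
    return [(lo, hi) for lo, hi in SLIDING_WINDOWS if lo <= length <= hi]


if __name__ == "__main__":
    for n in (250, 1700, 2500):
        print(n, within_overall_range(n), matching_windows(n))
```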
  • the present disclosure also provides a delivery system for introducing the components of the systems and compositions herein into cells, tissues, organs or organisms.
  • the delivery system may include one or more delivery vehicles and/or cargo.
  • the components of the CRISPR-Cas system can be delivered in various forms, such as combinations of DNA/RNA, RNA/RNA, or protein/RNA.
  • the Cas12o protein can be delivered as a DNA or RNA polynucleotide encoding it, or as a protein.
  • the guide can be delivered as a DNA polynucleotide encoding it or as RNA. All possible combinations are envisioned, including mixed delivery forms.
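The statement that all combinations of delivery forms are envisioned can be made concrete by enumerating them. This minimal sketch assumes only the forms named in the text: DNA, mRNA, or protein for the Cas12o effector, and DNA or RNA for the guide. It lists every pairing. The labels are illustrative strings. Mixed forms, such as several guides delivered in different forms, are not enumerated here.

```python
from itertools import product

# Forms named in the text for each component (labels are illustrative).
EFFECTOR_FORMS = ("DNA encoding Cas12o", "mRNA encoding Cas12o", "Cas12o protein")
GUIDE_FORMS = ("DNA encoding guide", "guide RNA")


def delivery_combinations() -> list:
    """Every pairing of an effector form with a guide form."""
    return list(product(EFFECTOR_FORMS, GUIDE_FORMS))


if __name__ == "__main__":
    for effector, guide in delivery_combinations():
        # Protein plus guide RNA can be pre-assembled as a ribonucleoprotein (RNP).
        note = " (can be delivered as RNP)" if (effector, guide) == ("Cas12o protein", "guide RNA") else ""
        print(f"{effector} + {guide}{note}")
```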
  • the disclosure provides methods comprising delivering one or more polynucleotides, such as one or more vectors as described herein, one or more transcripts thereof, and/or one or more proteins translated therefrom, to a host cell.
  • one or more vectors that drive the expression of one or more elements of the nucleic acid targeting system are introduced into a host cell so that the expression of the elements of the nucleic acid targeting system directs the formation of a nucleic acid targeting complex at one or more target sites.
  • a nucleic acid targeting effector enzyme and a nucleic acid targeting guide RNA can each be operably linked to separate regulatory elements on separate vectors.
  • RNA of the nucleic acid targeting system can be delivered to a transgenic animal or mammal expressing the nucleic acid targeting effector protein. Examples include an animal or mammal that constitutively, inducibly, or conditionally expresses the effector protein, or an animal or mammal that otherwise expresses the effector protein or has cells containing it, for example, as a result of prior administration of one or more vectors that encode and express the effector protein in vivo.
  • two or more elements expressed by the same or different regulatory elements can be combined in a single vector, and one or more additional vectors provide any components of the nucleic acid targeting system not contained in the first vector.
  • the nucleic acid targeting system elements combined in a single vector can be arranged in any suitable orientation, for example, one element is located 5' ("upstream") relative to the second element or 3' ("downstream") relative to the second element.
  • the coding sequence of one element may be located on the same or opposite strand of the coding sequence of the second element and oriented in the same or opposite direction.
  • a single promoter drives expression of transcripts encoding nucleic acid targeting effector proteins and nucleic acid targeting guide RNAs, which are embedded within one or more intron sequences (e.g., each in a different intron, two or more in at least one intron, or all in a single intron).
  • the nucleic acid targeting effector protein and the nucleic acid targeting guide RNA may be operably linked to and expressed from the same promoter.
  • Delivery vehicles, vectors, particles, nanoparticles, formulations, and components thereof for expressing one or more elements of a nucleic acid targeting system are as described in WO 2014/093622 (PCT/US2013/074667).
  • the vector comprises one or more insertion sites, such as restriction endonuclease recognition sequences (also referred to as "cloning sites").
  • one or more insertion sites are located upstream and/or downstream of one or more sequence elements of one or more vectors.
  • a single expression construct can be used to target nucleic acid targeting activity to multiple different corresponding target sequences within a cell.
  • a single vector may contain about or greater than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20 or more guide sequences.
  • about or greater than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more such guide sequence-containing vectors may be provided and optionally delivered to a cell.
  • the vector comprises a regulatory element operably linked to an enzyme coding sequence encoding a nucleic acid targeting effector protein.
  • the nucleic acid targeting effector protein and one or more nucleic acid targeting guide RNAs may be delivered separately. Advantageously, at least one of these is delivered via a particle complex.
  • the nucleic acid targeting effector protein mRNA may be delivered before the nucleic acid targeting guide RNA to allow time for the nucleic acid targeting effector protein to be expressed.
  • the nucleic acid-targeting effector protein mRNA can be administered 1-12 hours (preferably about 2-6 hours) before the administration of the nucleic acid-targeting guide RNA.
  • nucleic acid-targeting effector protein mRNA and the nucleic acid-targeting guide RNA can be administered together.
  • a second booster dose of the guide RNA can be administered 1-12 hours (preferably about 2-6 hours) after the initial administration of the nucleic acid-targeting effector protein mRNA + guide RNA.
  • Additional administrations of nucleic acid-targeting effector protein mRNA and/or guide RNA may be useful to achieve the most effective level of genome modification.
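The timing rules above define two schedules. In the first, the effector mRNA is given 1-12 h (preferably about 2-6 h) before the guide RNA. In the second, both are given together, optionally followed by a guide RNA booster 1-12 h later. The following sketch encodes those windows so a proposed schedule can be checked. It is a hypothetical helper for illustrating the stated windows, not a dosing protocol. The class and function names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Windows stated in the text, in hours.
ALLOWED_H: Tuple[float, float] = (1.0, 12.0)
PREFERRED_H: Tuple[float, float] = (2.0, 6.0)


@dataclass(frozen=True)
class Dose:
    time_h: float  # hours after the first administration
    component: str


def _in(window: Tuple[float, float], value: float) -> bool:
    return window[0] <= value <= window[1]


def staggered(lead_h: float) -> List[Dose]:
    """Effector mRNA first, guide RNA lead_h hours later."""
    if not _in(ALLOWED_H, lead_h):
        raise ValueError("lead time outside the 1-12 h window")
    return [Dose(0.0, "effector mRNA"), Dose(lead_h, "guide RNA")]


def coadministered(booster_h: Optional[float] = None) -> List[Dose]:
    """mRNA + guide RNA together, optionally followed by a guide RNA booster."""
    doses = [Dose(0.0, "effector mRNA + guide RNA")]
    if booster_h is not None:
        if not _in(ALLOWED_H, booster_h):
            raise ValueError("booster outside the 1-12 h window")
        doses.append(Dose(booster_h, "guide RNA booster"))
    return doses


def is_preferred(interval_h: float) -> bool:
    """True if an interval falls in the preferred ~2-6 h range."""
    return _in(PREFERRED_H, interval_h)


if __name__ == "__main__":
    print(staggered(4.0), is_preferred(4.0))
    print(coadministered(booster_h=3.0))
```

Keeping the allowed and preferred windows as separate constants reflects the text's distinction between a permissible range (1-12 h) and a preferred one (about 2-6 h).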
  • Non-viral vector delivery systems include DNA plasmids, RNA (transcripts of vectors such as those described herein), naked nucleic acids, and nucleic acids complexed with delivery vehicles such as liposomes.
  • Viral vector delivery systems include DNA and RNA viruses that have free or integrated genomes after delivery to cells.
  • Non-viral delivery methods for nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virosomes, and agent-enhanced DNA uptake.
  • Lipofection is described in, for example, U.S. Patent Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™).
  • Cationic and neutral lipids suitable for efficient receptor-recognition lipofection of polynucleotides include those described in Felgner, WO 91/17424 and WO 91/16024. Delivery can be to cells (e.g., in vitro or ex vivo administration) or to target tissues (e.g., in vivo administration).
  • Plasmid delivery involves cloning the guide RNA into a plasmid expressing the CRISPR-Cas protein and transfecting the DNA in cell culture.
  • Plasmid backbones are commercially available and do not require specific equipment. They have the advantage of modularity, being able to carry CRISPR-Cas coding sequences of different sizes (including sequences encoding larger-sized proteins) and selection markers.
  • the advantage of plasmids is that they provide transient but sustained expression.
  • the delivery of plasmids is not direct, making the in vivo efficiency generally low. Sustained expression may also be disadvantageous because it can increase off-target editing.
  • excessive accumulation of CRISPR-Cas proteins may be toxic to cells.
  • plasmids also carry a risk of random integration of dsDNA into the host genome, particularly given that double-strand breaks (on-target and off-target) are produced.
  • lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, can also be used (see, e.g., Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787). This will be discussed in more detail below.
  • RNA or DNA virus-based systems for delivering nucleic acids exploit highly evolved viral processes for targeting specific cells in the body and trafficking the viral payload to the nucleus.
  • Viral vectors can be administered directly to patients (in vivo), or they can be used to treat cells in vitro, with the modified cells optionally administered to patients (ex vivo).
  • Conventional viral-based systems can include retroviral, lentiviral, adenoviral, adeno-associated viral, and herpes simplex viral vectors for gene transfer. Retroviral, lentiviral, and adeno-associated viral gene transfer methods can integrate into the host genome, which usually results in long-term expression of the inserted transgene. In addition, high transduction efficiencies have been observed in many different cell types and target tissues.
  • lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. The choice of retroviral gene transfer system therefore depends on the target tissue. Retroviral vectors consist of cis-acting long terminal repeats (LTRs) with packaging capacity for up to 6-10 kb of foreign sequence. The minimal cis-acting LTRs are sufficient for replication and packaging of the vector, which is then used to integrate the therapeutic gene into the target cells to provide permanent transgene expression.
  • Widely used retroviral vectors include those based on murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224 (1991); PCT/US94/05700).
  • Adenovirus-based systems can be used.
  • Adenovirus-based vectors can achieve high transduction efficiencies in many cell types and do not require cell division. Using such vectors, high titers and expression levels have been obtained.
  • the vector can be produced in large quantities in a relatively simple system.
  • Adeno-associated virus (“AAV”) vectors can also be used to transduce cells with target nucleic acids, for example, in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures (see, for example, West et al., Virology 160:38-47 (1987); U.S. Patent No.
  • the present disclosure provides an AAV comprising or consisting essentially of an exogenous nucleic acid molecule encoding a CRISPR system, e.g., a plurality of cassettes comprising or consisting of a first cassette, the first cassette comprising or consisting essentially of a promoter, a nucleic acid molecule encoding a CRISPR-associated (Cas) protein (a putative nuclease or helicase protein), e.g., Cas12o, and a terminator, and one or more, advantageously up to the packaging size limit of the vector, e.g., a total of five cassettes (including the first cassette), the cassette comprising or consisting essentially of a promoter, a nucleic acid molecule encoding a guide RNA (gRNA), and a terminator (e.g., each cassette is schematically represented as promoter-gRNA1-terminator, promoter-gRNA2-terminator...promoter-gRNA(N)-terminator
  • the rAAV may contain a single cassette comprising or consisting essentially of a promoter, multiple crRNAs/gRNAs, and a terminator (e.g., schematically represented as promoter-gRNA1-gRNA2...gRNA(N)-terminator, where N is the number of gRNAs that can be inserted within the packaging size limit of the vector). See Zetsche et al., Nature Biotechnology 35, 31-34 (2017), which is incorporated herein by reference in its entirety.
  • the nucleic acid molecules discussed herein regarding AAV or rAAV are advantageously DNA.
  • the promoter is advantageously the human synapsin I promoter (hSyn).
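The two AAV layouts above are both limited by the vector's packaging size. The first uses separate promoter-gRNA-terminator cassettes after the Cas12o cassette, up to a total such as five. The second uses one promoter-gRNA1...gRNA(N)-terminator array. The sketch below computes how many guides each layout admits. Every length in it is a hypothetical placeholder: the text does not give the Cas12o cassette size or element sizes. The ~4.7 kb capacity is a commonly cited approximate figure for AAV, and whether ITRs count against it is left to the caller.

```python
from typing import Optional


def max_guide_cassettes(capacity_bp: int, first_cassette_bp: int,
                        guide_cassette_bp: int,
                        max_total_cassettes: Optional[int] = None) -> int:
    """Number of promoter-gRNA-terminator cassettes that fit after the Cas cassette."""
    free_bp = capacity_bp - first_cassette_bp
    if free_bp < 0:
        raise ValueError("the Cas12o cassette alone exceeds the capacity")
    n = free_bp // guide_cassette_bp
    if max_total_cassettes is not None:
        # The first (Cas12o) cassette counts toward the total, e.g. five cassettes.
        n = min(n, max_total_cassettes - 1)
    return n


def max_array_guides(capacity_bp: int, first_cassette_bp: int, promoter_bp: int,
                     terminator_bp: int, guide_unit_bp: int) -> int:
    """N for a single promoter-gRNA1...gRNA(N)-terminator cassette."""
    free_bp = capacity_bp - first_cassette_bp - promoter_bp - terminator_bp
    return max(free_bp, 0) // guide_unit_bp


if __name__ == "__main__":
    # Hypothetical sizes: 4,700 bp capacity, 3,900 bp Cas12o cassette.
    print(max_guide_cassettes(4700, 3900, guide_cassette_bp=350))
    print(max_array_guides(4700, 3900, promoter_bp=250, terminator_bp=20, guide_unit_bp=45))
```

With these placeholder numbers, a single array packs more guides than separate cassettes. This is because the promoter and terminator are paid for only once.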
  • Other methods for delivering nucleic acids to cells are known to those skilled in the art. See, for example, US20030087817, which is incorporated herein by reference.
  • Cocal vesiculovirus enveloped pseudotyped retroviral vector particles are contemplated (see, e.g., U.S. Patent Publication No. 20120164118 assigned to Fred Hutchinson Cancer Research Center).
  • Cocal virus belongs to the genus Vesiculovirus and is the etiological agent of vesicular stomatitis in mammals. Cocal virus was originally isolated from mites in Trinidad (Jonkers et al., Am. J. Vet. Res. 25:236-242 (1964)), and infections have been identified in Trinidad, Brazil, and Argentina from insects, cattle, and horses. Many vesiculoviruses that infect mammals have been isolated from naturally infected arthropods, suggesting that they are vector-borne.
  • Cocal vesiculovirus envelope pseudotyped retroviral vector particles may include, for example, lentiviral, alpha retroviral, beta retroviral, gamma retroviral, delta retroviral, and epsilon retroviral vector particles, which may contain retroviral Gag, Pol, and/or one or more accessory proteins and Cocal vesiculovirus envelope proteins.
  • Gag, Pol, and accessory proteins are lentiviral and/or gamma retroviral.
  • host cells are transiently or non-transiently transfected with one or more vectors described herein.
  • when the cell is naturally present in a subject, the cell is transfected and optionally reintroduced into the subject.
  • the transfected cell is taken from the subject.
  • the cell is derived from a cell taken from the subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art.
  • cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelium
  • the transient expression and/or presence of one or more components of the AD-functionalized CRISPR system can be of interest, for example, to reduce off-target effects.
  • cells transfected with one or more vectors as described herein are used to establish new cell lines comprising one or more vector-derived sequences.
  • cells that are transiently transfected with components of the AD-functionalized CRISPR system as described herein (e.g., by transient transfection of one or more vectors, or by transfection with RNA) and modified by the activity of the CRISPR complex are used to establish new cell lines. These cell lines comprise cells that contain the modification but lack any other exogenous sequence.
  • cells transiently or non-transiently transfected with one or more vectors as described herein, or cell lines derived from such cells are used to evaluate one or more test compounds.
  • RNA and/or protein may be introduced directly into a host cell.
  • a CRISPR-Cas protein may be delivered as an encoding mRNA together with an in vitro transcribed guide RNA. Such methods may reduce the time required to ensure that the CRISPR-Cas protein is active and further prevent long-term expression of CRISPR system components.
  • the RNA molecules of the present disclosure are delivered in the form of liposomes or lipofectin formulations, and can be prepared by methods well known to those skilled in the art. Such methods are described in, for example, U.S. Pat. Nos. 5,593,972, 5,589,466, and 5,580,859, which are incorporated herein by reference. Delivery systems specifically directed to enhancing and improving the delivery of siRNA into mammalian cells have been developed (see, e.g., Shen et al., FEBS Let. 2003, 539: 111-114; Xia et al., Nat. Biotech. 2002, 20: 1006-1010; Reich et al., Mol. Vision.
  • siRNA has recently been successfully used to inhibit gene expression in primates (see, e.g., Tolentino et al., Retina 24(4): 660), which may also be applied to the present disclosure.
  • RNA delivery is a useful method for in vivo delivery.
  • Cas12o, adenosine deaminase, and guide RNA can be delivered to cells using liposomes or particles. Therefore, the delivery of CRISPR-Cas proteins (such as Cas12o), the delivery of adenosine deaminase (which can be fused to CRISPR-Cas proteins or adapter proteins), and/or the delivery of RNA disclosed herein can be in the form of RNA and via microvesicles, liposomes or particles or nanoparticles.
  • Cas12o mRNA, adenosine deaminase mRNA, and guide RNA can be packaged into liposome particles for in vivo delivery.
  • Liposomal transfection reagents, such as Lipofectamine from Life Technologies and other commercially available reagents, can effectively deliver RNA molecules to the liver.
  • lipid nanoparticles (LNPs) may co-encapsulate Cas12o and its corresponding crRNA.
  • lipid nanoparticles encapsulating Cas12o and/or its corresponding crRNA are administered to subjects (such as humans) in need by intravenous injection.
  • RNA may also be delivered via particles (Cho, S., Goldberg, M., Son, S., Xu, Q., Yang, F., Mei, Y., Bogatyrev, S., Langer, R. and Anderson, D., Lipid-like nanoparticles for small interfering RNA delivery to endothelial cells, Advanced Functional Materials, 19: 3112-3118, 2010) or exosomes (Schroeder, A., Levins, C., Cortez, C., Langer, R. and Anderson, D., Lipid-based nanotherapeutics for siRNA delivery, Journal of Internal Medicine, 267: 9-21, 2010, PMID: 20059641).
  • exosomes have been shown to be particularly useful in delivering siRNA, which is a system somewhat similar to the CRISPR system.
  • El-Andaloussi S et al. (“Exosome-mediated delivery of siRNA in vitro and in vivo.” Nat Protoc. 2012 Dec; 7(12): 2112-26. doi: 10.1038/nprot.2012.131. Epub 2012 Nov 15) describe how exosomes are a promising tool for drug delivery across different biological barriers and can be used for delivery of siRNA in vitro and in vivo.
  • Their method is to generate targeted exosomes by transfecting an expression vector containing an exosomal protein fused to a peptide ligand.
  • the exosomes are then purified and characterized from the transfected cell supernatant, and RNA is then loaded into the exosomes. Delivery or administration according to the present disclosure can be performed with exosomes, particularly but not limited to the brain.
  • in the following example, Toc denotes vitamin E (α-tocopherol), HDL denotes high-density lipoprotein, and siRNA denotes short interfering RNA.
  • mice were infused via an osmotic minipump (Model 1007D; Alzet, Cupertino, CA) filled with phosphate-buffered saline (PBS), free Toc-siBACE, or Toc-siBACE/HDL and connected to the Brain Infusion Kit 3 (Alzet).
  • the brain infusion cannula was placed in the midline approximately 0.5 mm behind the anterior bregma for infusion into the dorsal third ventricle.
  • Uno et al. found that as little as 3 nmol of Toc-siRNA in HDL could induce a substantial reduction in the target using the same ICV infusion method.
  • by analogy, CRISPR Cas conjugated to α-tocopherol and co-administered with HDL may be considered for targeting the brain, for example, at about 3 nmol to about 3 µmol of CRISPR Cas.
  • Zou et al. (Human Gene Therapy 22:465-475 (April 2011)) described lentivirus-mediated delivery of short hairpin RNA targeting PKCγ for in vivo gene silencing in the spinal cord of rats.
  • Zou et al. administered about 10 µl of recombinant lentivirus at a titer of 1 × 10⁹ transduction units (TU)/ml via an intrathecal catheter.
  • for humans, similar doses of CRISPR Cas expressed in a lentiviral vector targeting the brain may be considered, for example, about 10-50 ml of lentivirus at a titer of 1 × 10⁹ TU/ml.
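The dose figures above are easier to compare as total transduction units, which is simply volume multiplied by titer. This short worked calculation uses only the numbers stated in the text. The function and variable names are illustrative.

```python
def total_tu(volume_ul: float, titer_tu_per_ml: float) -> float:
    """Total transduction units: volume (converted from µl to ml) times titer."""
    return volume_ul * titer_tu_per_ml / 1000.0


TITER = 1e9  # TU/ml, the titer stated for both the rat study and the human estimate

rat_dose = total_tu(10, TITER)            # about 10 µl in rats
human_low = total_tu(10_000, TITER)       # 10 ml
human_high = total_tu(50_000, TITER)      # 50 ml

if __name__ == "__main__":
    print(f"rat: {rat_dose:.0e} TU; human: {human_low:.0e} to {human_high:.0e} TU")
    print(f"fold increase: {human_low / rat_dose:.0f} to {human_high / rat_dose:.0f}")
```

At the stated titer, 10 µl corresponds to 1 × 10⁷ TU. The suggested 10-50 ml corresponds to 1 × 10¹⁰ to 5 × 10¹⁰ TU, which is 1,000- to 5,000-fold more.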
  • the delivery system can be used to introduce the components of the system and composition into plant cells.
  • the components can be delivered to plants using electroporation, microinjection, aerosol injection of plant cell protoplasts, biolistic methods, DNA particle bombardment, and/or Agrobacterium-mediated transformation.
  • methods and delivery systems for plants include those described in Fu et al., Transgenic Res. 2000 February; 9(1): 11-9; Klein RM et al., Biotechnology. 1992; 24: 384-6; Casas AM et al., Proc Natl Acad Sci U S A. 1993 December 1; 90(23): 11212-11216; and U.S. Pat. No. 5,563,055, Davey MR et al., Plant Mol Biol. 1989 September; 13(3): 273-85, which are incorporated herein by reference in their entirety.
  • the delivery methods described herein for the compositions, Cas12o proteins, or CRISPR-associated Cas12o proteins are also applicable to functional domains and other components (e.g., other proteins and polynucleotides associated with Cas12o proteins or CRISPR-associated Cas12o proteins, such as reverse transcriptases, nucleotide deaminases, retrotransposons, donor polynucleotides, etc.).
  • the delivery system may include one or more cargos.
  • the cargo may include one or more components of the systems and compositions herein.
  • the cargo may include one or more of the following: i) a plasmid encoding one or more protein components of the compositions and systems, such as a Cas12o protein or CRISPR-associated Cas12o protein and/or functional domains; ii) a plasmid encoding one or more crRNAs; iii) mRNA of one or more protein components, such as a Cas12o protein or CRISPR-associated Cas12o protein and/or functional domains; iv) one or more guide RNAs; v) one or more protein components, such as a Cas12o protein or CRISPR-associated Cas12o protein and/or functional domains; vi) any combination thereof.
  • the one or more protein components may include nucleic acid-guided nucleases (e.g., Cas), reverse transcriptases, nucleotide deaminases
  • the cargo may include one or more plasmids encoding one or more protein components of the compositions and systems (such as a Cas12o protein or CRISPR-associated Cas12o protein and/or functional domains) and one or more (e.g., multiple) guide RNAs.
  • the plasmids may also encode recombination templates (e.g., for HDR).
  • the cargo may include mRNA encoding one or more protein components and one or more guide RNAs.
  • the cargo may include one or more protein components and one or more crRNAs or guide RNAs, for example, in the form of ribonucleoprotein complexes (RNPs). RNPs can be delivered by the methods and systems herein.
  • ribonucleoproteins can be delivered by polypeptide-based shuttle agents.
  • ribonucleoproteins can be delivered using synthetic peptides comprising an endosome leakage domain (ELD) operably linked to a cell penetration domain (CPD), or an ELD operably linked to a histidine-rich domain and a CPD, for example, as described in WO2016161516.
  • RNPs can also be used to deliver compositions and systems to plant cells, for example, as described in Wu JW et al., Nat Biotechnol. 2015 Nov;33(11):1162-4.
  • the cargo can be introduced into the cell by a physical delivery method.
  • physical methods include microinjection, electroporation, and hydrodynamic delivery. Both nucleic acids and proteins can be delivered using such methods.
  • one or more protein components can be prepared in vitro, isolated (and, if necessary, refolded and purified), and introduced into the cell.
  • Microinjection of cargo directly into cells can achieve high efficiencies, e.g., greater than 90% or about 100%.
  • microinjection can be performed using a microscope and a needle (e.g., 0.5-5.0 μm in diameter) to pierce the cell membrane and deliver the cargo directly to a target site within the cell.
  • Microinjection can be used for in vitro and ex vivo delivery.
  • plasmids containing the coding sequences of one or more protein components and/or crRNA, as well as mRNA and/or guide RNA, can be microinjected.
  • microinjection can be used to i) deliver DNA directly to the nucleus, and/or ii) deliver mRNA (e.g., in vitro transcribed mRNA) to the nucleus or cytoplasm.
  • microinjection can be used to deliver crRNA directly to the nucleus and mRNA to the cytoplasm, which promotes translation of the one or more protein components and their shuttling into the nucleus.
  • Microinjection can be used to produce genetically modified animals. For example, gene editing cargo can be injected into fertilized eggs to allow efficient germline modification. This method can produce normal embryos and full-term mouse pups with the desired modifications. Microinjection can also be used to provide transient upregulation or downregulation of specific genes within the cell genome, for example using Cas12o proteins or CRISPR-associated Cas12o proteins.
  • cargo and/or delivery vehicle can be delivered by electroporation.
  • Electroporation can use pulsed high voltage current to instantaneously open nanometer-sized pores in the cell membrane of cells suspended in a buffer, thereby allowing components with a hydrodynamic diameter of tens of nanometers to flow into the cell.
  • electroporation can be used for various cell types and efficiently transfer cargo into cells. Electroporation can be used for in vitro and ex vivo delivery.
  • Electroporation can also be used to deliver cargo to the nucleus of mammalian cells by applying specific voltages and reagents, such as by nucleofection. Such methods include those described in Wu Y et al. (2015). Cell Res 25:67-79; Ye L et al. (2014). Proc Natl Acad Sci USA 111:9591-6; Choi PS, Meyerson M. (2014). Nat Commun 5:3728; Wang J, Quake SR. (2014). Proc Natl Acad Sci 111:13157-62. Electroporation can also be used to deliver cargo in vivo, for example, by using the method described in Zuckermann M et al. (2015). Nat Commun 6:7391.
  • Hydrodynamic delivery can also be used to deliver cargo, such as for in vivo delivery.
  • hydrodynamic delivery can be performed by rapidly pushing a large volume (8%-10% of body weight) of a solution containing gene editing cargo into the bloodstream of a subject (e.g., an animal or a human), for example, through the tail vein in mice. Because blood is incompressible, the large volume of liquid increases hydrodynamic pressure, which temporarily enhances the permeability of endothelial and parenchymal cells, thereby allowing cargo that normally cannot cross the cell membrane to enter the cells.
  • This method can be used to deliver naked DNA plasmids and proteins.
  • the delivered cargo can be enriched in the liver, kidneys, lungs, muscles, and/or heart.
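  • The volume arithmetic in the hydrodynamic delivery step can be illustrated with a short Python sketch. It assumes the injected solution has a density of about 1 g/mL, so that a volume equal to 8%-10% of body weight in grams is the same number of millilitres; the function name and the 20 g example mouse are illustrative assumptions, not values taken from this disclosure.

```python
def hydrodynamic_volume_ml(body_weight_g: float,
                           low_fraction: float = 0.08,
                           high_fraction: float = 0.10) -> tuple[float, float]:
    """Return the (low, high) injection volume in mL for hydrodynamic delivery.

    Assumption: solution density is ~1 g/mL, so 8%-10% of body weight (g)
    corresponds to the same number of millilitres.
    """
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return body_weight_g * low_fraction, body_weight_g * high_fraction


# Hypothetical example: a 20 g mouse.
low, high = hydrodynamic_volume_ml(20.0)
print(f"{low:.1f}-{high:.1f} mL")
```

For a 20 g mouse this gives roughly 1.6-2.0 mL, which in this method would be pushed rapidly through the tail vein.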
  • Cargo, such as nucleic acids, can be introduced into cells by transfection methods for introducing nucleic acids into cells.
  • transfection methods include calcium phosphate-mediated transfection, cationic transfection, liposome transfection, dendrimer transfection, heat shock transfection, magnetofection, lipofection, impalefection, optical transfection, and proprietary agent-enhanced nucleic acid uptake.
  • the delivery vehicle includes a cell penetrating peptide (CPP).
  • CPPs are short peptides that facilitate cellular uptake of a variety of molecular cargoes (e.g., from nano-sized particles to small chemical molecules and large DNA fragments).
  • CPPs can have different sizes, amino acid sequences and charges.
  • CPPs can translocate across the plasma membrane and facilitate the delivery of various molecular cargoes to the cytoplasm or organelles.
  • CPPs can enter cells by different mechanisms, such as direct penetration of the membrane, endocytosis-mediated entry, and translocation through the formation of transient structures.
  • the amino acid composition of a CPP may contain a high relative abundance of positively charged amino acids (such as lysine or arginine), or have a sequence containing an alternating pattern of polar/charged amino acids and nonpolar hydrophobic amino acids. These two types of structures are called polycationic or amphipathic structures, respectively.
  • the third type of CPP is a hydrophobic peptide that contains only nonpolar residues, has a low net charge, or has a hydrophobic amino acid group that is critical for cellular uptake.
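  • The composition-based distinction between polycationic, amphipathic, and hydrophobic CPPs can be made concrete by counting residues. The Python sketch below counts lysine and arginine residues and estimates net charge at neutral pH as (K + R) - (D + E), ignoring histidine and the termini; this simplification and the 0.5 cationic-fraction cutoff are assumptions for illustration, not criteria from this disclosure. The sequences are the commonly cited ones for Tat (48-60) and penetratin.

```python
CATIONIC = set("KR")  # lysine, arginine
ANIONIC = set("DE")   # aspartate, glutamate


def cationic_fraction(seq: str) -> float:
    """Fraction of residues that are lysine (K) or arginine (R)."""
    seq = seq.upper()
    return sum(aa in CATIONIC for aa in seq) / len(seq)


def approx_net_charge(seq: str) -> int:
    """Rough net charge at neutral pH: (K + R) - (D + E); His and termini ignored."""
    seq = seq.upper()
    return sum(aa in CATIONIC for aa in seq) - sum(aa in ANIONIC for aa in seq)


def looks_polycationic(seq: str, cutoff: float = 0.5) -> bool:
    """Hypothetical rule of thumb: polycationic if at least `cutoff` of residues are K/R."""
    return cationic_fraction(seq) >= cutoff


peptides = {
    "Tat (48-60)": "GRKKRRQRRRPPQ",
    "penetratin": "RQIKIWFQNRRMKWKK",
}
for name, seq in peptides.items():
    print(name, round(cationic_fraction(seq), 2),
          approx_net_charge(seq), looks_polycationic(seq))
```

Under this crude rule, Tat (48-60) (8 of 13 residues K/R) is classed as polycationic, while penetratin, which also carries hydrophobic tryptophan and isoleucine residues, falls below the cutoff.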
  • Another type of CPP is derived from the trans-activator of transcription (Tat) protein of human immunodeficiency virus 1 (HIV-1).
  • Examples of CPPs include penetratin, Tat (48-60), transportan, (R-AhX-R4) (where Ahx refers to aminohexanoyl), the Kaposi fibroblast growth factor (FGF) signal peptide sequence, the integrin β3 signal peptide sequence, polyarginine peptide Arg sequences, guanine-rich molecular transporters, and sweet arrow peptide.
  • further examples of CPPs include those described in US Pat. No. 8,372,951.
  • CPPs can readily be used in vitro and ex vivo, although extensive optimization is generally required for each cargo and cell type.
  • a CPP can be covalently attached directly to the Cas12o protein, which is then complexed with crRNA and delivered to cells.
  • CPP-Cas12o and CPP-crRNA can be delivered separately to multiple cell types.
  • CPP can also be used to deliver RNP.
  • CPPs can be used to deliver compositions and systems to plants.
  • CPPs can be used to deliver components to plant protoplasts, which are then regenerated into plant cells and further regenerated into plants.
  • the delivery vehicle includes gold nanoparticles (also known as AuNPs or colloidal gold).
  • the gold nanoparticles can form a complex with a cargo such as Cas12o protein:crRNA RNP.
  • the gold nanoparticles can be coated, for example, with silicate and the endosome-disrupting polymer PAsp(DET).
  • Examples of gold nanoparticles include Spherical Nucleic Acid (SNA™) constructs from AuraSense Therapeutics, and those described in Mout R et al. (2017). ACS Nano 11:2452-8; Lee K et al. (2017). Nat Biomed Eng 1:889-901.
  • the present disclosure also provides cells comprising one or more components of the compositions and systems herein, such as a Cas12o protein or CRISPR-associated Cas12o protein and/or crRNA. Also provided are cells modified by the systems and methods herein, and cell cultures, tissues, organs, and organisms comprising such cells or their progeny. In one embodiment, the present disclosure provides a method for modifying a cell or an organism.
  • the cell may be a prokaryotic cell or a eukaryotic cell.
  • the cell may be a mammalian cell.
  • the mammalian cell may be a non-human primate, bovine, porcine, rodent, or mouse cell.
  • the cell may be a non-mammalian eukaryotic cell, such as a poultry, fish, or shrimp cell.
  • the cell may be a therapeutic T cell or an antibody-producing B cell.
  • the cell may also be a plant cell.
  • the plant cell may be a cell of a crop such as cassava, corn, sorghum, wheat, or rice.
  • the plant cell may also be a cell of algae, a tree, or a vegetable.
  • the modification introduced into the cell by the present disclosure may alter the cell and its progeny so as to increase production of a bioproduct (such as an antibody, starch, alcohol, or other desired cell output).
  • the modification introduced into a cell by the present disclosure may cause the cell and its progeny to produce an altered biological product.
  • one or more polynucleotide molecules, vectors or vector systems that drive the expression of one or more elements of a composition, system or delivery system comprising one or more elements of a nucleic acid targeting system are introduced into a host cell such that expression of these elements of the nucleic acid targeting system directs the formation of a nucleic acid-targeting complex at one or more target sites.
  • the host cell may be a eukaryotic cell, a prokaryotic cell or a plant cell.
  • the host cell is a cell of a cell line.
  • the cell line can be obtained from a variety of sources known to those skilled in the art (see, for example, the American Type Culture Collection (ATCC), Manassas, Va.).
  • cells transfected with one or more vectors as described herein are used to establish a new cell line comprising sequences from one or more vector sources.
  • a cell line is established using a cell transiently transfected with components of a system as described herein (such as transiently transfected by one or more vectors, or transfected with RNA) and modified by the activity of the complex, the cell line comprising cells containing modifications but lacking any other exogenous sequences.
  • cells transiently or non-transiently transfected with one or more vectors as described herein, or cell lines derived from such cells are used to evaluate one or more test compounds.
  • also provided are human cells or tissues, plants, or non-human animals comprising one or more of the polynucleotide molecules, vectors, vector systems, or cells of any one of the embodiments herein.
  • host cells and cell lines modified by or comprising a composition, system or modified enzyme of the present disclosure are provided, including (isolated) stem cells and progeny thereof.
  • a plant or non-human animal comprises at least one of the system components, polynucleotide molecules, vectors, vector systems or cells described in any one of the embodiments herein in at least one tissue type of the plant or non-human animal.
  • a non-human animal comprises at least one of the system components, polynucleotide molecules, vectors, vector systems or cells described in any one of the embodiments herein in at least one tissue type.
  • the presence of system components is transient because they degrade over time.
  • the expression of the components of the system and composition described in any one of the embodiments contained in polynucleotide molecules, vectors, vector systems or cells is limited to certain tissue types or regions in plants or non-human animals.
  • the expression of the components of the system and composition described in any one of the embodiments contained in polynucleotide molecules, vectors, vector systems or cells depends on physiological cues. In one embodiment, the expression of the components of the system and composition described in any one of the embodiments contained in polynucleotide molecules, vectors, vector systems or cells can be triggered by exogenous molecules. In one embodiment, the expression of the components of the system and composition described in any one of the embodiments contained in polynucleotide molecules, vectors, vector systems or cells depends on the expression of non-Cas molecules in plants or non-human animals.
  • the systems, vector systems, vectors and compositions described herein can be used for various nucleic acid targeting applications, such as changing or modifying the synthesis of gene products (such as proteins), nucleic acid cleavage, nucleic acid editing, nucleic acid splicing, transport of target nucleic acids, tracking of target nucleic acids, separation of target nucleic acids, and visualization of target nucleic acids.
  • aspects of the present disclosure also encompass methods and uses of the compositions and systems described herein in genome engineering, such as for altering or manipulating the expression of one or more genes or one or more gene products in prokaryotic or eukaryotic cells in vitro, in vivo, or ex vivo.
  • the target polynucleotide is a target sequence within genomic DNA (including nuclear genomic DNA, mitochondrial DNA, or chloroplast DNA).
  • formation of a nucleic acid-targeting complex (comprising a crRNA or guide RNA hybridized to a target sequence and complexed with one or more nucleic acid-targeting effector proteins) can result in cleavage of one or both DNA or RNA strands in or near (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50 or more base pairs of) the target sequence.
  • one or more sequences associated with a target locus of interest refers to sequences that are near (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs of) a target sequence, wherein the target sequence is contained in the target locus of interest.
  • the present disclosure provides a method for targeting a polynucleotide, the method comprising contacting a sample (such as a cell, a cell group, a tissue, an organ or an organism) containing a target polynucleotide with a composition, a system, a polynucleotide or a vector.
  • the contact may result in modification of a gene product or modification of the amount or expression of a gene product.
  • the target sequence of the polynucleotide is a disease-associated target sequence.
  • the present disclosure provides a method for modifying a target polynucleotide, the method comprising delivering a composition, one or more polynucleotides, or one or more vectors to a cell or cell population comprising the target polynucleotide, wherein the complex guides a reverse transcriptase to the target sequence and the reverse transcriptase promotes insertion of a donor sequence from the crRNA into the target polynucleotide.
  • target polynucleotides include sequences associated with signal transduction biochemical pathways, such as signal transduction biochemical pathway-related genes or polynucleotides.
  • target polynucleotides include disease-related genes or polynucleotides.
  • Disease-related genes or polynucleotides refer to any genes or polynucleotides that produce transcription or translation products at abnormal levels or in abnormal forms in cells derived from disease-affected tissues, compared with tissues or cells of non-disease controls. Such a gene may become expressed at an abnormally high level or at an abnormally low level, where the altered expression is associated with the occurrence and/or progression of the disease.
  • Disease-related genes also refer to genes carrying mutations or genetic variations that are directly responsible for the disease, or that are in linkage disequilibrium with one or more genes responsible for the disease.
  • the products of transcription or translation may be known or unknown and may be at normal or abnormal levels.
  • the target polynucleotide of the complex can be any polynucleotide endogenous or exogenous to a eukaryotic cell.
  • the target polynucleotide can be a polynucleotide present in the nucleus of a eukaryotic cell.
  • the target polynucleotide can be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). Without wishing to be bound by theory, it is believed that the target sequence should be associated with a TAM (target adjacent motif), i.e., a short sequence recognized by the complex.
  • The precise sequence and length requirements of the TAM vary depending on the Cas12o protein or CRISPR-associated Cas12o protein used, but a TAM is typically a 2-5 base pair sequence adjacent to the protospacer (i.e., the target sequence), and those skilled in the art will be able to identify additional TAM sequences for use with a given Cas12o protein or CRISPR-associated Cas12o protein.
  • the engineering of the TAM interaction domain allows programming of TAM specificity, improves target site recognition fidelity, and increases the versatility of the Cas12o protein genome engineering platform.
  • Cas12o proteins or CRISPR-associated Cas12o proteins can be engineered to change their PAM specificity.
  • when the target sequence is DNA, the target sequence is located at the 3' end of the protospacer adjacent motif (PAM), and the PAM is 5'-TN, wherein N is A, T, G or C.
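  • The relationship between a 5'-TN PAM and a target sequence located 3' of it can be sketched in Python as a scan of one DNA strand. The 20-nt protospacer length, the function name, and the example sequences are hypothetical choices for illustration, not parameters specified here; a full search would also scan the reverse complement.

```python
import re


def find_tn_pam_targets(seq: str, protospacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (pam_start, pam, protospacer) for every 5'-TN PAM on the given strand.

    The protospacer is taken as the `protospacer_len` bases immediately 3' of
    the PAM. Only the supplied strand is scanned; positions are 0-based.
    """
    seq = seq.upper()
    hits = []
    # A zero-width lookahead lets overlapping PAMs (e.g. in "TTT") all be reported.
    for match in re.finditer(r"(?=(T[ACGT]))", seq):
        start = match.start()
        pam = match.group(1)
        proto_start = start + len(pam)
        protospacer = seq[proto_start:proto_start + protospacer_len]
        if len(protospacer) == protospacer_len:  # skip sites too close to the 3' end
            hits.append((start, pam, protospacer))
    return hits


print(find_tn_pam_targets("ATGCAAGTC", protospacer_len=4))  # [(1, 'TG', 'CAAG')]
```

In the usage example, the "TC" PAM at position 7 is not reported because fewer than four bases follow it.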
  • aspects of the present disclosure relate to: a method for targeting a polynucleotide, the method comprising contacting a sample containing the polynucleotide with a composition, system, or Cas12o protein or CRISPR-associated Cas12o protein as described in any embodiment herein; a delivery system comprising a composition, system, or Cas12o protein or CRISPR-associated Cas12o protein nuclease as described in any embodiment herein; a polynucleotide comprising a composition, system, or Cas12o protein or CRISPR-associated Cas12o protein as described in any embodiment herein; a vector comprising a composition, system, or Cas12o protein or CRISPR-associated Cas12o protein as described in any embodiment herein; or a vector system comprising a composition, system, or Cas12o protein or CRISPR-associated Cas12o protein as described in any embodiment herein.
  • the target polynucleotide is contacted with at least two different compositions, systems, or Cas12o proteins or CRISPR-associated Cas12o proteins.
  • the two different Cas12o proteins have different target polynucleotide specificities, or degrees of specificity.
  • the two different Cas12o proteins or CRISPR-associated Cas12o proteins have different TAM specificities.
  • Also contemplated are methods of targeting polynucleotides comprising contacting a sample comprising the polynucleotides with the compositions and systems, vectors, polynucleotides herein, wherein the contacting results in modification of the gene product or modification of the amount or expression of the gene product.
  • the expression of the targeted gene product is increased by the method.
  • the expression of the targeted gene product is increased by at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100%.
  • the expression of the targeted gene product is increased at least 1.5-fold, at least 2-fold, at least 2.5-fold, at least 3-fold, at least 3.5-fold, at least 4-fold, at least 4.5-fold, at least 5-fold, at least 10-fold, at least 15-fold, at least 20-fold, at least 25-fold, at least 50-fold, or at least 100-fold.
  • the expression of the targeted gene product is reduced by the method.
  • the expression of the targeted gene product is reduced by at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100%.
  • the expression of the targeted gene product is reduced at least 1.5-fold, at least 2-fold, at least 2.5-fold, at least 3-fold, at least 3.5-fold, at least 4-fold, at least 4.5-fold, at least 5-fold, at least 10-fold, at least 15-fold, at least 20-fold, at least 25-fold, at least 50-fold, or at least 100-fold (i.e., to 1/1.5, 1/2, 1/2.5, 1/3, 1/3.5, 1/4, 1/4.5, 1/5, 1/10, 1/15, 1/20, 1/25, 1/50, or 1/100 of its original level, or less).
  • the expression of the targeted gene can be completely eliminated, or can be considered eliminated if the residual expression level of the targeted gene is reduced to below the detection limit of the method known in the art for quantification, detection or monitoring expression levels.
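  • The percentage and fold-change thresholds above express the same quantity in two ways. As a sketch, assuming expression is measured relative to an untreated control level, the Python helpers below convert between them: an increase of p% corresponds to a (1 + p/100)-fold level, and a reduction to 1/n of the control level corresponds to a reduction of (1 - 1/n) × 100%. The function names, and the simple detection-limit comparison, are hypothetical illustrations.

```python
def percent_increase_to_fold(percent: float) -> float:
    """An increase of `percent` % over control equals this fold level (e.g. 100% -> 2.0)."""
    return 1.0 + percent / 100.0


def fold_reduction_to_percent(fold: float) -> float:
    """Reducing expression to 1/fold of control equals this % reduction (e.g. 2 -> 50.0)."""
    if fold < 1:
        raise ValueError("fold reduction must be >= 1")
    return (1.0 - 1.0 / fold) * 100.0


def is_below_detection(level: float, detection_limit: float) -> bool:
    """Treat expression as eliminated when it falls below the assay's detection limit."""
    return level < detection_limit


print(percent_increase_to_fold(50))   # 1.5
print(fold_reduction_to_percent(4))   # 75.0
```

For example, a 4-fold reduction (to 1/4 of control) is the same as a 75% reduction, and a 100% increase is a 2-fold increase.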
  • one or more polynucleotide molecules, vectors or vector systems that drive the expression of one or more elements of a nucleic acid targeting system or a delivery system comprising one or more elements of a nucleic acid targeting system are introduced into a host cell such that expression of these elements of the nucleic acid targeting system directs the formation of a nucleic acid-targeting complex at one or more target sites.
  • a plant or non-human animal comprises at least one of the compositions, polynucleotide molecules, vectors, vector systems or cells described in any one of the embodiments herein in at least one tissue type of the plant or non-human animal.
  • a non-human animal comprises at least one of the compositions, polynucleotide molecules, vectors, vector systems or cells described in any one of the embodiments herein in at least one tissue type.
  • the presence of the composition is transient because its components degrade over time.
  • the expression of the composition described in any one of the embodiments contained in a polynucleotide molecule, a vector, a vector system or a cell is limited to certain tissue types or regions in a plant or non-human animal. In one embodiment, the expression of the composition described in any one of the embodiments contained in a polynucleotide molecule, a vector, a vector system or a cell depends on physiological cues. In one embodiment, the expression of the composition described in any one of the embodiments contained in a polynucleotide molecule, a vector, a vector system or a cell may be triggered by an exogenous molecule. In one embodiment, the expression of the composition described in any one of the embodiments contained in a polynucleotide molecule, a vector, a vector system or a cell depends on the expression of non-Cas molecules in plants or non-human animals.
  • the present disclosure provides a method using one or more elements of a nucleic acid targeting system.
  • the nucleic acid-targeting complex disclosed herein provides an efficient means for modifying a target DNA or RNA (single-stranded or double-stranded, linear or supercoiled).
  • the nucleic acid-targeting complex disclosed herein has a wide variety of uses, including modification (e.g., deletion, insertion, translocation, inactivation, activation) of a target DNA or RNA in many cell types.
  • the nucleic acid-targeting complex disclosed herein has a wide spectrum of applications in, for example, gene therapy, drug screening, disease diagnosis and prognosis.
  • an exemplary nucleic acid-targeting complex comprises a DNA- or RNA-targeting effector protein complexed with a crRNA or guide RNA that hybridizes to a target sequence within a target locus.
  • the present disclosure provides a method for cleaving a target polynucleotide.
  • the method may include modifying the target polynucleotide using a nucleic acid-targeting complex that binds to the target polynucleotide and effects its cleavage.
  • the nucleic acid-targeting complex of the present disclosure, when introduced into a cell, may produce a break (e.g., a single-strand or double-strand break) in a polynucleotide sequence.
  • the method may include allowing a composition to bind to a target DNA or RNA to effect cleavage of the target DNA or RNA, thereby modifying it, wherein the nucleic acid-targeting complex comprises a nucleic acid-targeting effector protein complexed with a guide RNA that hybridizes to a target sequence within the target DNA or RNA.
  • the present disclosure provides a method for modifying the expression of DNA or RNA in a eukaryotic cell.
  • the method includes allowing a nucleic acid-targeting complex to bind to the DNA or RNA such that the binding increases or decreases expression of the DNA or RNA, wherein the nucleic acid-targeting complex comprises a nucleic acid-targeting effector protein complexed with a crRNA or a guide RNA.
  • Similar considerations and conditions apply to the above-mentioned method for modifying the target DNA or RNA. In fact, these sampling, culturing, and reintroduction options are applicable to various aspects of the present disclosure.
  • the present disclosure provides a method for modifying a target DNA or RNA in a eukaryotic cell, which method can be performed in vivo, ex vivo, or in vitro.
  • the method includes sampling a cell or cell population from a human or non-human animal, and modifying the one or more cells. Cultivation can be performed in vitro at any stage. One or more cells can even be reintroduced into a non-human animal or plant. For the reintroduced cells, it is particularly preferred that the cells are stem cells.
  • the composition as described in any embodiment herein can be used to detect a nucleic acid identifier.
  • the composition herein induces double-strand breaks in order to promote HDR-mediated correction.
  • two or more guide RNAs complexed with a Cas12o protein or its ortholog or homolog can be used to induce multiple breaks in order to promote HDR-mediated correction.
  • Recombinant template nucleic acid refers to a nucleic acid sequence that can be used in combination with the composition disclosed herein to change the structure of the target position.
  • the target nucleic acid is modified to have some or all of the sequences of the recombinant template nucleic acid, usually at or near the cleavage site.
  • the recombinant template nucleic acid is single-stranded. In an alternative embodiment, the recombinant template nucleic acid is double-stranded. In one embodiment, the recombinant template nucleic acid is DNA, such as double-stranded DNA. In an alternative embodiment, the recombinant template nucleic acid is single-stranded DNA. In one embodiment, a recombinant template is provided for use as a template in homologous recombination, for example, in or near a target sequence that is nicked or cleaved by a nucleic acid-targeting effector protein acting as part of a nucleic acid-targeting complex. In one embodiment, nuclease-induced non-homologous end joining (NHEJ) can be used for targeted gene-specific knockout.
  • the present disclosure provides a non-naturally occurring or engineered composition, one or more polynucleotides encoding the components of the composition, or a vector or delivery system comprising one or more polynucleotides encoding the components of the composition, for use in modifying target cells in vivo, ex vivo, or in vitro; the modification can be carried out such that, once modified, the progeny or cell line of the cell modified by the Cas12o protein or CRISPR-associated Cas12o protein retains the altered phenotype.
  • the modified cells and progeny can be part of a multicellular organism, such as a plant or animal in which the composition is applied in vitro or in vivo to a desired cell type.
  • the methods herein include therapeutic treatment methods.
  • the therapeutic treatment method may include gene or genome editing, or gene therapy.
  • one or more vectors described herein are used to produce non-human transgenic animals or transgenic plants.
  • the transgenic animal is a mammal, such as a mouse, a rat or a rabbit.
  • Methods for producing transgenic animals and plants are known in the art, and generally start from a cell transfection method, such as described herein.
  • the present disclosure provides an engineered, non-naturally occurring composition comprising a catalytically inactive Cas12o protein or CRISPR-associated Cas12o protein as described herein, for use in detection methods such as fluorescence in situ hybridization (FISH).
  • Nucleic acid targeting systems that target DNA can be used to screen patients or patient samples for the presence of such repeat sequences.
  • the repeat sequence can be a target for the RNA of the nucleic acid targeting system, and if the nucleic acid targeting system binds to it, the binding can be detected, indicating the presence of such a repeat sequence. Therefore, the nucleic acid targeting system can be used to screen patients or patient samples for the presence of the repeat sequence.
  • the patient can then be administered an appropriate compound to address the condition; alternatively, the nucleic acid targeting system can be administered to bind and cause insertions, deletions, or mutations and alleviate the condition.
  • the Cas12o proteins and systems described herein can be used to perform efficient and cost-effective functional genomic screening. Such screening can utilize whole genome libraries based on Cas12o protein nucleases. Such screening and libraries can be used to determine the function of genes, the cellular pathways involved in genes, and the way in which any changes in gene expression lead to specific biological processes.
  • An advantage of the present disclosure is that the composition avoids off-target binding and the side effects it produces. This is achieved using a system that is arranged to have a high degree of sequence specificity for target DNA.
  • the Cas12o protein or CRISPR-associated Cas12o protein complex is a Cas12o protein complex.
  • the present disclosure provides a method for evaluating and screening gene function.
  • The compositions can be used to precisely deliver functional domains, to activate or repress genes, or to alter epigenetic states by precisely altering methylation at specific target loci. They can be applied to single cells or cell populations together with one or more crRNAs or guide RNAs, or applied in vitro or in vivo to a library of cells or genomes together with a library, including by applying or expressing a library comprising multiple crRNAs (comprising guide molecules). The screening further includes using a Cas12o protein or CRISPR-associated Cas12o protein, wherein the complex comprising the Cas12o protein or CRISPR-associated Cas12o protein is modified to include a heterologous functional domain.
  • the present disclosure also provides cells comprising one or more components of the system herein, such as a Cas12o protein or CRISPR-associated Cas12o protein and/or crRNA. Also provided are cells modified by the systems and methods herein, and cell cultures, tissues, organs, and organisms comprising such cells or their progeny.
  • the present disclosure includes, in one embodiment, a method for modifying cells or organisms.
  • the cell may be a prokaryotic cell or a eukaryotic cell.
  • the cell may be a mammalian cell.
  • the mammalian cell may be a non-human primate, bovine, porcine, rodent, or mouse cell.
  • the cell may be a non-mammalian eukaryotic cell, such as a poultry, fish, or shrimp cell.
  • the cell may also be a plant cell.
  • the plant cell may be a cell of a crop such as cassava, corn, sorghum, wheat, or rice.
  • the plant cell may also be a cell of algae, a tree, or a vegetable.
  • the modification introduced into the cell by the present disclosure may alter the cell and its progeny so as to increase production of a biological product (such as an antibody, starch, alcohol, or other desired cellular output).
  • the modification introduced into the cell by the present disclosure may alter the cell and its progeny so that the biological product they produce is changed.
  • a method for diagnosing, predicting, treating and/or preventing a disease, condition, or disorder of a subject may include using a composition, system or its components as described herein to modify a polynucleotide in the subject or its cells, and/or using a composition, system or its components as described herein to detect a diseased or healthy polynucleotide in the subject or its cells.
  • a treatment or prevention method may include using a composition, system or its components to modify a polynucleotide of an infectious organism (e.g., a bacterium or virus) in a subject or its cells.
  • a treatment or prevention method may include using a composition, system or its components to modify a polynucleotide of an infectious organism or a symbiotic organism in a subject.
  • the composition, system and its components may be used to develop a model of a disease, condition, or disorder.
  • the composition, system and its components may be used to detect a disease state or its correction, such as by a treatment or prevention method as described herein.
  • the composition, system and its components may be used to screen and select cells that may be used, for example, in treatment or prevention as described herein.
  • the composition, system and its components may be used to develop a bioactive agent that may be used to modify one or more biological functions or activities in a subject or its cells.
  • the use comprises administering the aforementioned fusion protein, the aforementioned polynucleotide, the aforementioned CRISPR-Cas composition, complex, system, kit, delivery composition, enzyme preparation to the subject or the subject's ex vivo cells.
  • the condition or disease includes metabolic disease, cancer, neurological disease, ophthalmic disease, and infectious disease.
  • the condition or disease comprises a genetic disease.
  • the condition or disease is caused by a pathogenic point mutation.
  • the disease comprises cystic fibrosis, Duchenne muscular dystrophy (DMD), Becker muscular dystrophy, alpha-1-antitrypsin deficiency, Pompe disease (glycogen storage disease type II), myotonic dystrophy, Huntington disease, fragile X syndrome, Friedreich's ataxia, amyotrophic lateral sclerosis, hereditary chronic kidney disease, sickle cell disease, beta thalassemia, frontotemporal dementia, Leber's congenital amaurosis, hyperlipidemia, familial hypercholesterolemia (FH), atherosclerotic cardiovascular disease (ASCVD), transthyretin amyloidosis (ATTR), alpha-1 antitrypsin deficiency (AATD), retinal disease, macular degeneration, Wilms' tumor, Ewing's sarcoma, neuroendocrine tumors, glioblastoma, neuroblastoma, melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer,
  • the condition or disease includes familial hypercholesterolemia (FH), atherosclerotic cardiovascular disease (ASCVD), transthyretin amyloidosis (ATTR), alpha-1 antitrypsin deficiency (AATD), primary hyperoxaluria type 1 (PH1), hereditary angioedema (HAE), and hepatitis B.
  • the disorder or disease comprises a disease caused by a single base mutation.
  • the clinical variant database is obtained from the NCBI ClinVar database available on the NCBI ClinVar website.
  • pathogenic single nucleotide polymorphisms (SNPs)
  • CRISPR targets in the region overlapping and surrounding each SNP are identified.
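Identifying CRISPR targets around a SNP amounts to scanning both strands for a PAM and keeping the protospacers that overlap a window around the variant. The sketch below is a minimal illustration under stated assumptions: it assumes a 5' PAM orientation (as is typical of Cas12-family nucleases such as Cas12a), a placeholder PAM of `TTN`, a 20-nt spacer, and a ±10-nt window. The PAM, spacer length, and window actually used with Cas12o are not given in this section, so these defaults and the helper `find_target_sites` are hypothetical.

```python
import re
from dataclasses import dataclass

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class TargetSite:
    strand: str       # "+" or "-"
    start: int        # protospacer start on the forward strand (0-based)
    end: int          # protospacer end on the forward strand (exclusive)
    protospacer: str  # protospacer read 5'->3' on its own strand


def find_target_sites(seq, snp_index, pam="TTN", spacer_len=20, window=10):
    """Return protospacers (5' PAM assumed) overlapping snp_index +/- window."""
    seq = seq.upper()
    n = len(seq)
    # Zero-width lookahead so that overlapping PAM occurrences are all found.
    pam_re = re.compile("(?=" + "".join(IUPAC[b] for b in pam.upper()) + ")")
    lo, hi = snp_index - window, snp_index + window + 1  # half-open window
    sites = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pam_re.finditer(s):
            ps = m.start() + len(pam)  # protospacer begins right after the PAM
            pe = ps + spacer_len
            if pe > n:
                continue  # protospacer would run off the end of the sequence
            # Map minus-strand coordinates back onto the forward strand.
            fs, fe = (ps, pe) if strand == "+" else (n - pe, n - ps)
            if fs < hi and fe > lo:  # interval overlap test
                sites.append(TargetSite(strand, fs, fe, s[ps:pe]))
    return sites
```

For example, `find_target_sites("CCTTACGTAGCC", snp_index=7, pam="TTN", spacer_len=5, window=0)` finds a single plus-strand site whose protospacer `CGTAG` spans positions 5 to 10 and therefore covers the variant at position 7.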
  • a selection of SNPs that can be corrected using base editing in combination with Cas proteins or variants thereof to target causal mutations are listed in the table below, which lists only one alias for each disease.
  • RS# corresponds to the RS accession number in the SNP database on the NCBI website.
  • AlleleID corresponds to the causal allele accession number.
  • the “Name” column contains the locus identifier of the gene, the gene name, the mutation position in the gene, and the changes caused by the mutation.
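The “Name” convention described above can be split into its parts mechanically. For a simple single-nucleotide substitution, the class of base editor that could restore the reference base then follows from the alt-to-ref change: adenine base editors (ABEs) convert A•T to G•C and cytosine base editors (CBEs) convert C•G to T•A, acting on either strand, so transversions fall outside both. The sketch below handles only simple `c.<position><ref>><alt>` substitutions. The regex, the helper names, and the example name (modeled on ClinVar's format for the HBB sickle-cell variant) are illustrative assumptions, not the selection procedure used in the disclosure.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for simple ClinVar-style SNV names, e.g.
# "NM_000518.5(HBB):c.20A>T (p.Glu7Val)". Indels, duplications, etc. do not match.
NAME_RE = re.compile(
    r"^(?P<transcript>[A-Z]{2}_\d+(?:\.\d+)?)"   # RefSeq transcript accession
    r"\((?P<gene>[^)]+)\)"                        # gene symbol
    r":c\.(?P<position>[-*]?\d+(?:[+-]\d+)?)"     # coding-DNA position
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])"            # single-base substitution
    r"(?:\s+\((?P<protein>p\.[^)]*)\))?$"         # optional protein change
)

# (pathogenic base, reference base) on the coding strand -> editor class that
# could revert it. ABE: A->G (or T->C via the other strand); CBE: C->T (or G->A).
REVERTING_EDITOR = {
    ("A", "G"): "ABE", ("T", "C"): "ABE",
    ("C", "T"): "CBE", ("G", "A"): "CBE",
}


@dataclass(frozen=True)
class SNV:
    transcript: str
    gene: str
    position: str
    ref: str
    alt: str
    protein: Optional[str]


def parse_clinvar_name(name: str) -> Optional[SNV]:
    """Split a simple SNV name into locus, gene, position and change."""
    m = NAME_RE.match(name.strip())
    return SNV(**m.groupdict()) if m else None


def reverting_editor(variant: SNV) -> Optional[str]:
    """Return 'ABE' or 'CBE' if a transition can restore the reference, else None."""
    return REVERTING_EDITOR.get((variant.alt, variant.ref))


v = parse_clinvar_name("NM_000518.5(HBB):c.20A>T (p.Glu7Val)")
print(v.gene, v.position, v.ref, v.alt, v.protein)  # HBB 20 A T p.Glu7Val
print(reverting_editor(v))  # None: A>T is a transversion
```

Names that are not simple substitutions (for example deletions written as `c.<start>_<end>del`) return `None` and would need a fuller HGVS parser.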
  • the compositions, systems and/or components thereof described herein can be used to treat and/or prevent circulatory system diseases.
  • the compositions, systems described herein can be used to treat nervous system diseases.
  • the compositions and systems described herein can be used to treat hearing disorders, such as hearing loss in one or both ears. Deafness is usually caused by the loss of or damage to hair cells, which then cannot transmit signals to auditory neurons. In such cases, cochlear implants can be used to respond to sound and transmit electrical signals to the nerve cells. However, because damaged hair cells release fewer growth factors, these neurons often degenerate and retract from the cochlea.
  • the compositions, systems and/or components thereof described herein can be used to treat diseases in non-dividing cells; in one embodiment, the gene or transcript to be corrected is located in a non-dividing cell.
  • Exemplary non-dividing cells are muscle cells or neurons.
  • Non-dividing (especially non-dividing, fully differentiated) cell types pose challenges for gene targeting or genome engineering, for example because homologous recombination (HR) is generally suppressed in the G1 phase of the cell cycle.
  • the disease to be treated is a disease affecting the eye.
  • the compositions, systems, or components thereof described herein are delivered to one or both eyes.
  • the compositions, systems, and methods described herein can be used to correct eye defects caused by several genetic mutations, which are further described in Genetic Diseases of the Eye, 2nd edition, edited by Elias I. Traboulsi, Oxford University Press, 2012.
  • the condition to be treated or targeted is an ocular disorder.
  • the ocular disorder may include glaucoma.
  • the ocular disorder includes a retinal degenerative disease.
  • the retinal degenerative disease is selected from Stargardt disease, Bardet-Biedl syndrome, Best disease, blue cone achromatopsia, choroideremia, cone-rod dystrophy, congenital stationary night blindness, enhanced S-cone syndrome, juvenile X-linked retinoschisis, Leber congenital amaurosis, Malattia Leventinese, Norrie disease or X-linked familial exudative vitreoretinopathy, pattern dystrophy, Sorsby dystrophy, Usher syndrome, retinitis pigmentosa, color blindness, macular dystrophy or degeneration, and age-related macular degeneration.
  • the retinal degenerative disease is Leber Congenital Amaurosis (LCA) or retinitis pigmentosa.
  • a lentiviral vector is used for administration to the eye.
  • the lentiviral vector is an equine infectious anemia virus (EIAV) vector.
  • Other viral vectors may also be used for delivery to the eye, such as AAV vectors, e.g., those described in Campochiaro et al., Human Gene Therapy 17:167-176 (February 2006); Millington-Ward et al., Molecular Therapy, Vol. 19, No. 4, 642-649 (April 2011); and Dalkara et al., Sci Transl Med 5, 189ra76 (2013), which may be suitable for use with the compositions and systems described herein.
  • the dosage may be in the range of about 10^6 to 10^9.5 particle units.
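Because the upper bound is a fractional power of ten, it may help to restate the range in ordinary scientific notation; this is simple arithmetic on the stated bounds:

```latex
10^{9.5} = 10^{9}\times 10^{0.5} \approx 3.16\times 10^{9},
\qquad
\frac{10^{9.5}}{10^{6}} = 10^{3.5} \approx 3.16\times 10^{3}
```

The stated range therefore spans roughly three and a half orders of magnitude, from 1 × 10^6 to about 3.2 × 10^9 particle units.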
  • Cardiovascular disease generally includes hypertension, heart attack, heart failure, stroke, and transient ischemic attack (TIA). Any chromosomal sequence associated with cardiovascular disease, or any protein encoded by such a sequence, can be used in the methods disclosed herein.
  • Cardiovascular-related proteins are usually selected based on an experimentally established association with the development of cardiovascular disease. For example, the production rate or circulating concentration of a cardiovascular-related protein may be increased or decreased in a group with cardiovascular disease relative to a group without it.
  • Proteomic techniques can be used to assess the difference in protein levels, including but not limited to western blot, immunohistochemical staining, enzyme-linked immunosorbent assay (ELISA) and mass spectrometry.
  • cardiovascular-related proteins can be identified by using genomic techniques to obtain the expression profile of the genes encoding them, including but not limited to DNA microarray analysis, serial analysis of gene expression (SAGE), and quantitative real-time polymerase chain reaction (Q-PCR).
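One common way to turn Q-PCR measurements into the relative expression changes referred to above is the comparative Ct (2^-ΔΔCt, Livak) method. It normalizes the target gene to a reference gene and then compares the disease group with the control group. The sketch below assumes roughly 100% amplification efficiency for both genes; the function name and the example Ct values are hypothetical, and this is one standard analysis choice rather than a method specified in the disclosure.

```python
from statistics import fmean
from typing import Sequence


def fold_change_ddct(
    target_case: Sequence[float],
    ref_case: Sequence[float],
    target_control: Sequence[float],
    ref_control: Sequence[float],
) -> float:
    """Relative expression (case vs. control) by the 2^-ddCt method.

    Each argument holds replicate Ct values; replicates are averaged first.
    Assumes ~100% PCR efficiency (one doubling per cycle) for both genes.
    """
    d_ct_case = fmean(target_case) - fmean(ref_case)  # normalize to reference gene
    d_ct_control = fmean(target_control) - fmean(ref_control)
    dd_ct = d_ct_case - d_ct_control
    return 2.0 ** (-dd_ct)


# After normalization, the target crosses threshold 2 cycles earlier in cases.
print(fold_change_ddct([21.5, 22.5], [17.75, 18.25], [24.0], [18.0]))  # 4.0
```

A result above 1 indicates higher expression in the disease group, and a result below 1 indicates lower expression.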
  • compositions and systems herein can be used to treat diseases of the muscle system.
  • the present disclosure also contemplates delivering the compositions, systems, and effector protein systems described herein to muscles.
  • the muscle disease to be treated is muscular dystrophy, such as DMD.
  • the compositions and systems described herein (such as systems capable of RNA modification) can be used to induce exon skipping and thereby correct diseased genes.
  • the method includes treating sickle cell-related diseases, such as sickle cell trait, sickle cell disease (e.g., sickle cell anemia), and beta thalassemia.
  • the methods and systems can be used to modify the genome of sickle cells, for example, by correcting one or more mutations in the beta globin gene.
  • sickle cell anemia can be corrected by modifying HSCs with the system.
  • the system allows for specific editing of the genome of a cell by cutting the cell's DNA and then allowing it to repair itself.
  • the Cas12o protein is introduced into the cell and guided to the mutation site by the guide RNA, and then cuts the DNA at that site.
  • a healthy version of the sequence is also introduced, and the cell's own repair system uses this sequence to repair the induced cut. In this way, the Cas12o protein or CRISPR-associated Cas12o protein allows correction of mutations in previously obtained stem cells.
  • particles containing the Cas12o protein and an RNA or guide RNA targeting the mutation are contacted with HSCs carrying the mutation.
  • the particles may also contain a suitable HDR template to correct the mutation so that β-globin is correctly expressed; alternatively, the HSCs may be contacted with a second particle or vector containing or delivering an HDR template. The cells so contacted may be administered, optionally after processing/expansion (see Cartier).
  • the HDR template may enable the HSCs to express an engineered β-globin gene (e.g., βA-T87Q) or β-globin.
  • compositions, systems, or components thereof described herein can be used to treat diseases of the kidney or liver.
  • the compositions, systems, or components thereof described herein are delivered to the liver or kidney.
  • Delivery strategies for inducing cellular uptake of therapeutic nucleic acids include physical forces or carrier systems, such as delivery based on viruses, lipids, or complexes, or nanocarriers.
  • various viral and non-viral gene therapy vectors have been used to target post-transcriptional events in vivo in different animal models of kidney disease, including delivery of nucleic acids to kidney cells by systemic hydrodynamic high-pressure injection (Csaba Révész and Péter Hamar, 2011).
  • Methods of delivery to the kidney may include those described in Yuan et al. (Am J Physiol Renal Physiol 295:F605-F617, 2008).
  • The method of Yuan et al. can be applied to the compositions of the present disclosure; subcutaneous injection of 1-2 g of Cas12o protein conjugated to cholesterol into humans is contemplated for delivery to the kidneys.
  • the method of Molitoris et al. (J Am Soc Nephrol 20:1754-1764,2009) can be adapted to the compositions, and a cumulative dose of 12-20 mg/kg for humans can be used for delivery to the proximal tubule cells of the kidney.
  • The method of Thompson et al. (Nucleic Acid Therapeutics, Vol. 22, No. 4, 2012) can be adapted to the composition, and doses of up to 25 mg/kg can be delivered to humans by intravenous (i.v.) administration.
  • the method described in J Am Soc Nephrol 21:622-633, 2010 can be adapted to the composition, and a dose of about 10-20 µmol of the composition complexed with a nanocarrier in about 1-2 liters of saline can be used for intraperitoneal (i.p.) administration.
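The weight-based and volume-based doses cited above convert to absolute amounts and concentrations with simple arithmetic. The sketch below uses a hypothetical 70 kg body weight purely for illustration; the helper names are likewise hypothetical.

```python
def total_dose_mg(dose_mg_per_kg: float, body_weight_kg: float) -> float:
    """Absolute dose (mg) for a weight-normalized dose (mg/kg)."""
    return dose_mg_per_kg * body_weight_kg


def concentration_uM(amount_umol: float, volume_l: float) -> float:
    """Molar concentration in micromolar (umol per litre)."""
    return amount_umol / volume_l


WEIGHT_KG = 70.0  # hypothetical adult body weight, for illustration only

# Cumulative 12-20 mg/kg (Molitoris et al.) and up to 25 mg/kg i.v. (Thompson et al.).
print([total_dose_mg(d, WEIGHT_KG) for d in (12, 20, 25)])  # [840.0, 1400.0, 1750.0]

# 10-20 umol in 1-2 L saline: lowest with 10 umol in 2 L, highest with 20 umol in 1 L.
print(concentration_uM(10, 2), concentration_uM(20, 1))  # 5.0 20.0
```

On these assumptions the intraperitoneal formulation would range from about 5 µM to 20 µM.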
  • the disease treated or prevented by the compositions and systems described herein can be a pulmonary or epithelial disease.
  • the compositions and systems described herein can be used to treat epithelial and/or pulmonary diseases.
  • the present disclosure also contemplates delivering the compositions and systems described herein to one or both lungs.
  • a viral vector can be used to deliver the composition, system, or components thereof to the lung.
  • the AAV is AAV-1, AAV-2, AAV-5, AAV-6, and/or AAV-9 for delivery to the lung.
  • the MOI can vary from 1 × 10^3 to 4 × 10^5 vector genomes/cell.
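At a fixed MOI, the number of vector genomes to prepare scales linearly with the number of cells treated. The sketch below assumes a hypothetical population of 1 × 10^6 target cells; the helper name is illustrative.

```python
def vector_genomes_needed(n_cells: float, moi: float) -> float:
    """Total vector genomes (vg) for n_cells at a given MOI (vg per cell)."""
    if n_cells < 0 or moi < 0:
        raise ValueError("cell count and MOI must be non-negative")
    return n_cells * moi


N_CELLS = 1e6  # hypothetical number of target cells
for moi in (1e3, 4e5):  # bounds of the MOI range stated above
    print(f"MOI {moi:.0e}: {vector_genomes_needed(N_CELLS, moi):.1e} vg")
# MOI 1e+03: 1.0e+09 vg
# MOI 4e+05: 4.0e+11 vg
```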
  • the delivery vector can be an RSV vector as in Zamora et al. (Am J Respir Crit Care Med Vol. 183, pp. 531-538, 2011). The method of Zamora et al. can be applied to the nucleic acid targeting system of the present disclosure, and an aerosolized composition, e.g., at a dose of 0.6 mg/kg, can be considered for use in the present disclosure.
  • compositions and systems described herein can be used to treat skin disorders.
  • the present disclosure also contemplates delivering the compositions and systems described herein to the skin.
  • the composition, system, or components thereof may be delivered to the skin via one or more microneedles or a device containing microneedles (intradermal delivery).
  • the device and the method of Hickerson et al. may be used and/or adapted to deliver the compositions and systems described herein to the skin, for example, at a dose of up to 300 µl of a 0.1 mg/ml composition.
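The upper bound of that dose corresponds to the following amount of composition, obtained as volume times concentration:

```latex
300\,\mu\text{L}\times 0.1\,\frac{\text{mg}}{\text{mL}}
= 0.3\,\text{mL}\times 0.1\,\frac{\text{mg}}{\text{mL}}
= 0.03\,\text{mg} = 30\,\mu\text{g}
```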
  • the methods and techniques of Leachman et al. (Molecular Therapy, Vol. 18, No. 2, 442-446 February 2010) may be used and/or adapted to deliver the composition described herein to the skin.
  • the methods and techniques of Zheng et al. may be used and/or adapted to deliver nanoparticles of the composition described herein to the skin.
  • a dose of about 25 nM applied in a single application may achieve gene knockdown in the skin.
  • compositions and systems described herein can be used to treat cancer.
  • the present disclosure also contemplates delivering the compositions and systems described herein to cancer cells.
  • the compositions and systems can be used to modify immune cells, such as CAR or CART cells, which can then be used to treat and/or prevent cancer. This is also described in International Patent Publication No. WO 2015/161276, the disclosure of which is hereby incorporated by reference and described below.
  • compositions, systems and components thereof described herein can be used to modify cells for adoptive cell therapy.
  • methods and compositions for editing target nucleic acid sequences or regulating their expression, and their use in combination with cancer immunotherapy, can be implemented by adapting the compositions and systems disclosed herein.
  • the compositions, systems and methods can be used to modify stem cells (e.g., induced pluripotent cells) to derive modified natural killer cells, γδ T cells and αβ T cells, which can be used for adoptive cell therapy.
  • the compositions, systems and methods can be used to modify natural killer cells, γδ T cells and αβ T cells.
  • adoptive cell therapy may refer to the transfer of cells to a patient with the goal of transferring functionality and characteristics into the new host through engraftment of the cells (see, e.g., Mettananda et al., Editing an α-globin enhancer in primary human hematopoietic stem cells as a treatment for β-thalassemia, Nat Commun. 2017 Sep 4;8(1):424).
  • “engraft” or “engraftment” refers to the process of incorporating cells into a target tissue in vivo through contact with existing cells of the tissue.
  • Adoptive cell therapy may refer to the transfer of cells (most commonly immune-derived cells) back into the same patient or into a new recipient host with the goal of transferring immune functionality and characteristics into the new host. Where possible, using autologous cells benefits the recipient by minimizing graft-versus-host disease (GVHD).
  • TIL tumor infiltrating lymphocytes
  • allogeneic immune cells are transferred (see, e.g., Ren et al., (2017) Clin Cancer Res 23(9):2255-2266).
  • allogeneic cells can be edited to reduce alloreactivity and prevent graft-versus-host disease.
  • the use of allogeneic cells allows cells to be obtained from a healthy donor and prepared for use in a patient, rather than preparing autologous cells from a patient after diagnosis.
  • the antigen (such as a tumor antigen) targeted in adoptive cell therapy (in particular CAR or TCR T cell therapy) for a disease (in particular a tumor or cancer) can be selected from the group consisting of: MR1 (see, for example, Crowther et al., 2020, Genome-wide CRISPR-Cas9 screening reveals ubiquitous T cell cancer targeting via the monomorphic MHC class I-related protein MR1, Nature Immunology).
  • BCMA B cell maturation antigen
  • PSA prostate-specific antigen
  • PSMA prostate-specific membrane antigen
  • PSCA prostate stem cell antigen
  • ROR1 tyrosine-protein kinase transmembrane receptor ROR1
  • FAP fibroblast activation protein
  • TAG72 tumor-associated glycoprotein 72
  • CEA carcinoembryonic antigen
  • EPCAM epithelial cell adhesion molecule
  • mesothelin
  • ERBB2 (Her2/neu) human epidermal growth factor receptor 2
  • prostatic acid phosphatase (PAP); elongation factor 2 mutant (ELF2M); insulin-like growth factor 1 receptor (IGF-1R); gp100; BCR-ABL (breakpoint cluster region-Abelson); tyrosinase; New York es
  • the compositions, systems or components of the present disclosure can be used to treat and/or prevent genetic diseases or diseases with genetic and/or epigenetic aspects.
  • the genes and diseases exemplified herein are not exhaustive.
  • the method for treating and/or preventing genetic diseases may include administering a composition, system and/or one or more components thereof to a subject, wherein the composition, system and/or one or more components thereof are capable of modifying one or more copies of one or more genes associated with a genetic disease or a disease with genetic and/or epigenetic aspects in one or more cells of the subject.
  • modifying one or more copies of one or more genes associated with a genetic disease or a disease with genetic and/or epigenetic aspects in a subject can eliminate the subject's genetic disease or its symptoms. In one embodiment, such modification can reduce the severity of the subject's genetic disease or its symptoms. In one embodiment, a composition, system or its components can modify one or more genes or polynucleotides associated with one or more diseases, and the one or more diseases include genetic diseases and/or diseases with genetic and/or epigenetic aspects.
  • the composition, system or its components can be used to diagnose, predict, treat and/or prevent infectious diseases caused by microorganisms such as bacteria, viruses, fungi, parasites or combinations thereof.
  • the system or its components are capable of targeting specific microorganisms in a mixed population. Exemplary methods are described, for example, in Gomaa AA, Klumpe HE, Luo ML, Selle K, Barrangou R, Beisel CL. 2014. Programmable removal of bacterial strains by use of genome-targeting CRISPR-Cas systems. mBio 5:e00928-13; and Citorik RJ, Mimee M, Lu TK. 2014. Sequence-specific antimicrobials using efficiently delivered RNA-guided nucleases.
  • compositions, systems and their components described herein can target pathogenic and/or drug-resistant microorganisms, such as bacteria, viruses, parasites and fungi.
  • the composition, system and/or its components can target and modify one or more polynucleotides in pathogenic microorganisms, so that the microorganisms are reduced in toxicity, killed, inhibited or otherwise unable to cause disease and/or infection and/or replication in host cells.
  • the compositions, systems described herein can be used to modify mtDNA mutations.
  • the mitochondrial disease that can be diagnosed, predicted, treated and/or prevented can be MELAS (mitochondrial myopathy, encephalopathy, lactic acidosis, and stroke-like episodes), CPEO/PEO (chronic progressive external ophthalmoplegia syndrome/progressive external ophthalmoplegia), KSS (Kearns-Sayre syndrome), MIDD (maternally inherited diabetes and deafness), MERRF (myoclonic epilepsy with ragged red fibers), NIDDM (non-insulin-dependent diabetes mellitus), LHON (Leber hereditary optic neuropathy), LS (Leigh syndrome), aminoglycoside-induced hearing impairment, NARP (neuropathy, ataxia and pigmentary
  • the subject's mtDNA can be modified in vivo or ex vivo.
  • cells containing the modified mitochondria can be administered back to the subject.
  • the composition, system, or component thereof is capable of correcting mtDNA mutations or a combination thereof.
  • the compositions, systems, or components thereof disclosed herein can be used for microbiome modification.
  • The microbiome plays an important role in health and disease.
  • the intestinal microbiome can play a role in health by controlling digestion and preventing the growth of pathogenic microorganisms, and is believed to affect mood and emotions.
  • An unbalanced microbiome can promote disease and is believed to cause weight gain, uncontrolled blood sugar, high cholesterol, cancer, and other conditions.
  • a healthy microbiome has a set of shared features that distinguish it from the microbiomes of unhealthy individuals, so detecting and identifying a disease-related microbiome can be used to diagnose and detect disease in an individual.
  • the compositions, systems, and components thereof can be used to screen microbiome cell populations and to identify disease-related microbiome. Cell screening methods using compositions, systems, and components thereof are described elsewhere herein and can be applied to screen a subject's microbiome, such as intestinal, skin, vaginal, and/or oral microbiome.
  • the composition, system and/or its components described herein can be used to modify the microbial population of the subject's microbiome.
  • the composition, system and/or its components can be used to identify and select one or more cell types in the microbiome and remove them from the microbiome population. Exemplary methods for selecting cells using the composition, system and/or its components are described elsewhere herein.
  • the modification changes the composition of a diseased microbiome to a healthy microbiome composition.
  • the ratio of one microbial type or species to another can be modified, such as changing the ratio from a diseased ratio to a healthy ratio.
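Changing a ratio from a diseased value to a healthy one presupposes a way to measure it. A minimal sketch, assuming per-taxon read counts (for example from 16S rRNA amplicon sequencing) as input; the taxon names, counts, and helper functions are hypothetical, and no specific healthy ratio is implied.

```python
from typing import Dict, Mapping


def relative_abundance(counts: Mapping[str, int]) -> Dict[str, float]:
    """Fraction of total reads assigned to each taxon."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts are empty")
    return {taxon: n / total for taxon, n in counts.items()}


def taxon_ratio(counts: Mapping[str, int], numerator: str, denominator: str) -> float:
    """Ratio of reads for one taxon to another (e.g., taxon_A : taxon_B)."""
    denom = counts.get(denominator, 0)
    if denom == 0:
        raise ValueError(f"no reads for {denominator!r}")
    return counts.get(numerator, 0) / denom


# Hypothetical read counts before and after an intervention.
before = {"taxon_A": 600, "taxon_B": 200, "other": 200}
after = {"taxon_A": 300, "taxon_B": 500, "other": 200}
print(taxon_ratio(before, "taxon_A", "taxon_B"))  # 3.0
print(taxon_ratio(after, "taxon_A", "taxon_B"))   # 0.6
```

Ratios of raw counts and of relative abundances are identical within one sample, so either can be compared across samples once sequencing depth has been accounted for.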
  • the selected cell is a pathogenic microorganism.
  • compositions and systems described herein can be used to modify polynucleotides in a microorganism of a subject's microbiome.
  • the microorganism is a pathogenic microorganism.
  • the microorganism is a symbiotic and non-pathogenic microorganism. Methods for modifying polynucleotides in a subject's cell are described elsewhere herein and can be applied to these embodiments.
  • the present disclosure provides a method of modeling a disease associated with a genomic locus in a eukaryotic or non-human organism, the method comprising manipulating a target sequence within a coding, non-coding or regulatory element of the genomic locus, comprising delivering a non-naturally occurring or engineered composition comprising a viral vector system, the viral vector system comprising one or more viral vectors operably encoding a composition for expression thereof, wherein the composition comprises a particle delivery system or a delivery system or a viral particle as described in any of the above embodiments or a cell as described in any of the above embodiments.
  • RNA-guided DNA nucleases are suitable for use in providing modified tissues for transplantation.
  • RNA-guided DNA nucleases can be used to knock out, knock down, or disrupt selected genes in animals such as transgenic pigs (such as human heme oxygenase 1 transgenic pig lines), for example by disrupting the expression of genes encoding epitopes recognized by the human immune system, i.e., the expression of xenoantigen genes.
  • Candidate pig genes for disruption may, for example, include α(1,3)-galactosyltransferase and cytidine monophosphate-N-acetylneuraminic acid hydroxylase genes (see PCT Patent Publication WO 2014/066505). Additionally, genes encoding endogenous retroviruses, such as those encoding all porcine endogenous retroviruses, may be disrupted (see Yang et al., 2015, Genome-wide inactivation of porcine endogenous retroviruses (PERVs), Science 2015 Nov 27;350(6264):1101-1104). Additionally, RNA-guided DNA nucleases may be used to target the integration sites of other genes in xenotransplant donor animals, such as the human CD55 gene, to improve protection against hyperacute rejection.
  • compositions, systems and methods described herein can be used for gene or genome interrogation or editing or manipulation in plants and fungi.
  • applications include investigation and/or selection and/or interrogation and/or comparison and/or manipulation and/or transformation of plant genes or genomes, for example, to create, identify, develop, optimize, or confer plant traits or characteristics, or to transform plant or fungal genomes. In this way, plant yield can be increased, and new plants with new combinations of traits or characteristics, or with enhanced traits, can be obtained.
  • the compositions, systems and methods can be used for plants in site-directed integration (SDI) or gene editing (GE) or any near reverse breeding (NRB) or reverse breeding (RB) technology.
  • compositions, systems and methods herein can be used to confer desired traits (e.g., enhanced nutritional quality, enhanced disease resistance and resistance to biotic and abiotic stresses, and increased production of commercially valuable plant products or heterologous compounds) to essentially any plant and fungus, and their cells and tissues.
  • compositions, systems and methods herein can be used to confer desired traits to essentially any plant.
  • a variety of plants and plant cell systems can be engineered to obtain desired physiological and agronomic characteristics.
  • the term “plant” refers to any of the various photosynthetic, eukaryotic, unicellular or multicellular organisms in the plant kingdom, characterized by growth by cell division, containing chloroplasts, and having a cell wall composed of cellulose.
  • the term plant encompasses monocots and dicots.
  • target plants and plant cells for engineering include monocots and dicots, such as crops including: cereal crops (e.g., wheat, corn, rice, millet, barley), fruit crops (e.g., tomato, apple, pear, strawberry, orange), forage crops (e.g., alfalfa), root vegetable crops (e.g., carrot, potato, beet, yam), leafy vegetable crops (e.g., lettuce, spinach); flowering plants (e.g., petunia, rose, chrysanthemum), conifers and pines (e.g., pine, fir, spruce); plants used for phytoremediation (e.g., heavy metal accumulating plants); oil crops (e.g., sunflower, rapeseed) and plants for experimental purposes (e.g., Arabidopsis thaliana).
  • plants are intended to include, but are not limited to, angiosperms and gymnosperms such as acacia, alfalfa, amaranth, apple, apricot, artichoke, ash, asparagus, avocado, banana, barley, beans, beets, birch, beech, blackberries, blueberries, broccoli, Brussels sprouts, cabbage, rapeseed, cantaloupe, carrots, cassava, cauliflower, cedar, cereals, celery, chestnuts, cherries, Chinese cabbage, citrus, clementines, clover, coffee, corn, cotton, cowpea, cucumber, cypress, eggplant, elm, chicory, eucalyptus, fennel, fig, fir, geranium, grape, grapefruit, peanut, ground cherry, gum hemlock, hickory, kale, kiwi, kohlrabi, larch, lettuce, leek, lemon, lime, acacia, pine, maidenhair
  • the term plant also encompasses algae, which are mainly photoautotrophs distinguished primarily by their lack of roots, leaves, and other organs characteristic of higher plants.
  • the compositions, systems, and methods can be used for a wide range of “algae” or “algae cells.”
  • algae include eukaryotic phyla, including Rhodophyta (red algae), Chlorophyta (green algae), Phaeophyta (brown algae), Bacillariophyta (diatoms), Eustigmatophyta, and dinoflagellates, as well as prokaryotic Cyanobacteria (blue-green algae).
  • algae species include those of the genera Amphora, Anabaena, Ankistrodesmus, Botryococcus, Chaetoceros, Chlamydomonas, Chlorella, Chlorococcum, Cyclotella, Cylindrotheca, Dunaliella, Emiliania, Euglena, Haematococcus, Isochrysis, Monochrysis, Monoraphidium, Nannochloris, Nannochloropsis, Navicula, Nephrochloris, Nephroselmis, Nitzschia, Nodularia, Nostoc, Oochromonas, Oocystis, Oscillatoria, Pavlova, Phaeodactylum, Platymonas, Pleurochrysis, Porphyra, Pseudanabaena, Pyramimonas, Stichococcus, Synechococcus, Synechocystis, Tetraselmis, Thalassio
  • polynucleotides encoding components of the composition and system may be introduced to stably integrate into the genome of a plant cell.
  • a vector or expression system may be used for such integration.
  • the design of the vector or expression system may be adjusted according to the time, place and conditions of expression of the guide RNA and/or Cas12o protein or CRISPR-related Cas12o protein gene.
  • the polynucleotides may be integrated into the organelles of the plant, such as plastids, mitochondria or chloroplasts.
  • the elements of the expression system may be located on one or more expression constructs, which are circular, such as plasmids or transformation vectors, or non-circular, such as linear double-stranded DNA.
  • the integration method generally comprises the following steps: selecting a suitable host cell or host tissue, introducing the construct into the host cell or host tissue, and regenerating a plant cell or plant therefrom.
  • the expression system for stable integration into the plant cell genome may contain one or more of the following elements: a promoter element, which can be used to express RNA and/or Cas12o protein in plant cells; a 5' untranslated region for enhancing expression; an intron element for further enhancing expression in certain cells (such as monocotyledonous cells); a multiple cloning site for providing convenient restriction sites for inserting guide RNA and/or Cas12o protein gene sequences and other required elements; and a 3' untranslated region for providing efficient termination of expressed transcripts.
  • the components of the composition and system can be transiently expressed in plant cells.
  • the composition and system can modify the target nucleic acid only when the guide RNA and the Cas12o protein or the CRISPR-related Cas12o protein are present in the cell, so that the genome modification can be further controlled. Since the expression of the Cas12o protein or the CRISPR-related Cas12o protein is transient, the plants regenerated from such plant cells are generally free of foreign DNA.
  • the Cas12o protein or the CRISPR-related Cas12o protein is stably expressed and the guide sequence is transiently expressed.
  • DNA and/or RNA can be introduced into plant cells for transient expression.
  • sufficient amounts of the introduced nucleic acid can be provided to modify the cell, but the introduced nucleic acid will not persist after a desired period of time or after one or more cell divisions.
  • Transient expression can be achieved using a suitable vector.
  • exemplary vectors that can be used for transient expression include pEAQ vectors (customizable for Agrobacterium-mediated transient expression) and Cabbage Leaf Curl Virus (CaLCuV), and vectors described in Sainsbury F. et al., Plant Biotechnol J. 2009 Sep; 7(7): 682-93; and Yin K et al., Scientific Reports Vol. 5, Article No.: 14926 (2015).
  • the composition, system and method can be used to produce genetic variation in target plants (e.g., crops).
  • One or more guide RNAs (gRNAs) targeting one or more positions in the genome, such as a library of gRNAs, can be provided and introduced into plant cells together with the Cas12o nuclease.
  • a set of genome-scale point mutations and gene knockouts can be produced.
  • the composition, system and method can be used to produce plant parts or plants from the cells so obtained, and screen cells for target traits.
  • the target gene can include coding regions and non-coding regions simultaneously.
  • the trait is stress tolerance
  • the method is a method for producing stress-tolerant crop varieties.
  • compositions, systems and methods are used to modify endogenous genes or modify their expression.
  • targeted modification of the genome can be induced by the direct activity of the Cas12o protein or the CRISPR-associated Cas12o protein, optionally together with introduced recombination template DNA, or by modifying the targeted gene.
  • the different strategies described above allow targeted genome editing mediated by the Cas12o protein or the CRISPR-associated Cas12o protein without requiring the introduction of the components into the plant genome.
  • the modification can be performed without permanently introducing any foreign genes (including those encoding the components of the compositions herein) into the plant genome, which avoids the presence of foreign DNA in the plant genome. This may be of interest because regulatory requirements for non-transgenic plants are less stringent. Components that are transiently introduced into plant cells are typically eliminated through crossing.
  • modification can be performed by transient expression of the components of the compositions and systems.
  • Transient expression can be achieved by delivering the components of the compositions and systems using viral vectors, or by delivering them into protoplasts using particles such as nanoparticles or cell-penetrating peptides (CPPs).
  • the present disclosure provides a kit containing any one or more elements disclosed in the above-mentioned methods and compositions.
  • the present disclosure provides a kit comprising one or more components described herein.
  • the kit includes instructions for use of the composition herein and the kit.
  • the kit includes instructions for use of a carrier system and the kit.
  • the kit includes instructions for use of a delivery system and the kit.
  • Each element may be provided individually or in combination, and each element may be provided in any suitable container such as a vial, a bottle or a tube.
  • the kit may include a crRNA as described herein and an optional unbound protective strand.
  • the kit may include a crRNA in which the protective strand is at least partially bound to the reprogrammable spacer portion (i.e., the spacer sequence) of the crRNA sequence.
  • the kit includes instructions in one or more languages, such as instructions in more than one language. These instructions may be specific to the applications and methods described herein.
  • the kit includes one or more reagents for use in the process of utilizing one or more elements described herein.
  • Reagents can be provided in any suitable container.
  • the kit can provide one or more reaction or storage buffers.
  • Reagents can be provided in a form useful in a particular assay, or provided in a form that requires one or more other components to be added before use (for example, provided in a concentrate or lyophilized form).
  • the buffer can be any buffer, including but not limited to sodium carbonate buffer, sodium bicarbonate buffer, borate buffer, Tris buffer, MOPS buffer, HEPES buffer and combinations thereof.
  • the buffer is alkaline.
  • the buffer has a pH of about 7 to about 10.
  • the kit includes homologous recombination template polynucleotides. In one embodiment, the kit includes one or more vectors and/or one or more polynucleotides described herein. The kit can advantageously allow all elements of the disclosed system to be provided.
  • the present disclosure has discovered a new Cas protein for the first time, and the Cas protein of the present disclosure includes an OBD domain, a REC domain, a RuvC domain, a Helical domain, and a Nuc domain, and has a structure shown in Formula I or Formula II.
  • the Cas protein of the present disclosure has very good gene editing activity, can effectively edit or cut the target gene, and can be used to treat the symptoms or diseases of subjects in need.
  • the reagents and materials in the embodiments of the present disclosure are all commercially available products.
  • Cas12o proteins or fragments associated with the discovered CRISPR system were found in the samples and named Cas12o1-Cas12o6, respectively, with amino acid sequences as shown in SEQ ID NO.1, 3, 5, 7-9, and nucleotide coding sequences as shown in SEQ ID NO.16, 40, and 46, respectively.
  • the CRISPR loci of samples containing Cas12o1, Cas12o2, and Cas12o3 were annotated by PILERCR, and the corresponding direct repeat (DR) sequences were obtained, as shown in SEQ ID NO.2, 4, and 6, respectively, as shown in the following table:
  • the RNA secondary structures of the above three DR sequences were further analyzed with RNAfold. The results, shown in Figure 6, indicate that all three DR sequences have highly conserved secondary structures:
  • the above three DR sequences contain a 5'-R1a-Ba-R2a-L-R2b-Bb-R1b-3' structure, wherein segments R1a and R1b are reverse complementary sequences and form a first stem (R1), which has 3 or 5 nucleotide pairs in Cas12o; segments Ba and Bb are not both present, and whichever segment is present (Ba or Bb) forms a bulge (B) of 2 or 3 nucleotides; segments R2a and R2b are reverse complementary sequences and form a second stem (R2), which has 6 or 7 base pairs; and L is a loop of 5 or 7 nucleotides formed at the second stem.
  • the DR sequence of Cas12o1, shown in Figure 6A, contains 5'-R1a(ACA)-Ba(not present)-R2a(GGUAUCC)-L(UAAAC)-R2b(GGAUGCU)-Bb(GA)-R1b(UGU)-3'.
  • Cas12o is a new Cas subtype belonging to class 2, type V, which is close to Cas12h, Cas12i, and Cas12b subtypes on the RuvC domain phylogenetic tree ( Figure 1).
  • the protein folding software AlphaFold was used to predict the three-dimensional conformation of Cas12o ( Figure 2B), and it was found that it has conserved domains REC-I, REC-II, OBD, RuvC-I, Helical, RuvC-II, Nuc-I, RuvC-III, and Nuc-II. Its protein conformation is different from other nucleases in the Cas12 family.
  • the OBD domain is a bi-split domain, divided into discontinuous OBD-I and OBD-II domains.
  • vectors expressing Cas12o1 and crRNA were constructed as follows:
  • the human TTR gene was selected as the cutting target gene, and Cas12o1-hTTR1-crRNA (SEQ ID NO.14) with a PAM sequence of TN corresponding to the target sequence was designed based on the hTTR1 target sequence (SEQ ID NO.12).
  • LbCpf1-hTTR1-crRNA (SEQ ID NO.21) with PAM as TTN was designed.
  • T7 promoter was added to the 5' end of Cas12o1-hTTR1-crRNA sequence and LbCpf1-hTTR1-crRNA sequence, and rrnB T2 terminator was added to the 3' end of both, respectively, to obtain Cas12o1-hTTR1-crRNA expression sequence and LbCpf1-hTTR1-crRNA expression sequence, respectively, wherein the single underline sequence portion is the Cas12o1/LbCpf1 DR sequence, the double underline sequence portion is the spacer sequence, the italic sequence portion is the T7 promoter, the wavy underline sequence portion is the rrnB T2 terminator sequence, the linker sequence is between the spacer sequence and the rrnB T2 terminator sequence, the dotted sequence portion is the MfeI restriction site, the bold sequence portion is the MluI restriction site, CACCG is the linker, and to protect the integrity of the sequence fragment, the protective base AGC was introduced at the
  • the nucleotide coding sequences of Cas12o1 and LbCpf1 were synthesized (by Suzhou Hongxun Biotechnology Co., Ltd. and Beijing Qingke Biotechnology Co., Ltd.) and cloned into positions 466-5160 of the ABE8e plasmid (Addgene, Plasmid #138489) to construct the Cas12o1 expression vector ( Figure 3A) and the LbCpf1 expression vector ( Figure 3B), respectively.
  • the Cas12o1-hTTR1-crRNA expression fragment and the LbCpf1-hTTR1-crRNA expression fragment were double-digested with MfeI and MluI and then inserted into the correspondingly MfeI/MluI double-digested Cas12o1 and LbCpf1 expression vector backbones, respectively, to obtain the Cas12o1-hTTR1-crRNA expression vector and the LbCpf1-hTTR1-crRNA expression vector.
  • an araC-pBAD-ccdB fragment (SEQ ID NO.17) carrying the hTTR1 target sequence (SEQ ID NO.12) was designed and synthesized (Suzhou Hongxun Biotechnology Co., Ltd.) and inserted at positions 1284-1300 of pKESK2 plasmid 2 (Addgene, Plasmid #64857) to obtain the Target plasmid (SEQ ID NO.11; see Figure 4 for the map).
  • the Target plasmid carries the ccdB gene, which expresses the CcdB toxic protein (CcdB acts as a DNA gyrase inhibitor: it traps the complex formed between gyrase and cleaved double-stranded DNA, so that DNA gyrase can no longer function, eventually leading to cell death).
  • the L-arabinose-inducible pBAD promoter regulates the expression of the ccdB gene.
  • if the hTTR1 target sequence on the Target plasmid is cut, the regulatory link between the pBAD promoter and the ccdB gene is interrupted, so the host cell does not produce the CcdB toxic protein and survives; conversely, if the hTTR1 target sequence on the Target plasmid is not cut, the pBAD promoter drives expression of the CcdB toxic protein, leading to the death of the host cell. Therefore, the bacterial survival ratio indicates the cutting activity.
  • the Target plasmid was transformed into DH5α competent cells, and the Cas12o1-hTTR1-crRNA expression vector and the LbCpf1-hTTR1-crRNA expression vector were then separately transformed into DH5α competent cells carrying the Target plasmid.
  • nucleotide coding sequence of Cas12o1 SEQ ID NO.16
  • nucleotide coding sequence of Cas12o2 SEQ ID NO.40
  • nucleotide coding sequence of Cas12o3 SEQ ID NO.46
  • hTTR was selected as the target, and the corresponding crRNA containing the hTTR spacer sequence was constructed into the pGL3-U6-sgRNA-EGFP (Plasmid #107721) plasmid to obtain the crRNA expression vector.
  • the target sequence and crRNA are shown in the following table:
  • the Cas protein expression vector, the crRNA expression vector and the Target plasmid were transformed into DH5α competent cells for large-scale plasmid preparation; the plasmid concentrations were measured and the preparations were stored for later use.
  • HEK293T cells were plated in 24-well plates at 2 × 10⁵ cells per well (500 µL).
  • the Cas protein expression vector, the crRNA expression vector and the EGFP-C1 plasmid were mixed and diluted in 25 µL of reduced-serum transfection medium (Source Bio, L530KJ); 2 µL of Lipofectamine 3000 (Invitrogen, L3000015) reagent was added and mixed well by pipetting to give reagent A, which was left to stand for 5 minutes.
  • Lipofectamine 3000 transfection reagent (Invitrogen, L3000015) was diluted in 25 µL of reduced-serum transfection medium (Source Bio, L530KJ) and mixed to give reagent B, which was left to stand for 5 minutes.
  • reagents A and B were combined, mixed evenly by pipetting and left to stand for 20 minutes; the mixture was then added dropwise to the cells to be transfected in the 24-well plate.
  • the plasmid amounts used to transfect each well of the 24-well plate were 0.3 µg of Cas protein expression vector, 0.3 µg of crRNA expression vector, and 0.3 µg of EGFP-C1 plasmid.
  • the culture medium was replaced with DMEM culture medium containing 10% FBS.
  • EGFP fluorescent protein expression indicated that the cells were successfully transfected, and cells with positive EGFP expression were sorted for editing efficiency detection.
  • the cells were subjected to genomic extraction (using a genomic DNA extraction kit, TIANGEN, DP304-03), and the PCR products after PCR amplification were used for high-throughput deep sequencing (Qingke Biotechnology Co., Ltd.) or Sanger sequencing (Boshang Biotechnology (Shanghai) Co., Ltd.) for identification of editing efficiency.
  • Example 2 demonstrates that the CRISPR-Cas12o system disclosed herein can achieve versatile and efficient genome editing in mammalian cells. Moreover, owing to the small size, simple structure, short crRNA and self-processing properties of the Cas12o series (Cas12o1, Cas12o2, Cas12o3), the system is suitable for delivery methods including AAV or LNP and may be used in the future for multiple gene editing applications in vivo or in vitro.
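
The MfeI/MluI double digestion used above to move each crRNA expression fragment into its vector backbone only gives a single clean fragment if each enzyme cuts the insert exactly once. For example, a spacer that happened to contain an internal MfeI or MluI site would be cut. The following sketch is not part of the original protocol. It scans a candidate fragment for the MfeI (CAATTG) and MluI (ACGCGT) recognition sites. Both sites are palindromic, so scanning one strand is enough. The function names and the example fragment are hypothetical placeholders, not the patent's actual cassette sequence.

```python
# Recognition sequences (5'->3'); both are palindromic, so a one-strand scan
# also covers the reverse strand.
SITES = {"MfeI": "CAATTG", "MluI": "ACGCGT"}


def site_positions(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    seq = seq.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def check_single_cut_sites(fragment: str) -> list[str]:
    """List problems; an empty list means each enzyme cuts exactly once."""
    problems = []
    for enzyme, site in SITES.items():
        n = len(site_positions(fragment, site))
        if n != 1:
            problems.append(f"{enzyme} sites: {n}")
    return problems


# Hypothetical fragment: protective bases, MfeI site, filler, MluI site, protective bases.
example = "AGC" + "CAATTG" + "GATTACA" + "ACGCGT" + "AGC"
print(check_single_cut_sites(example))             # []
print(check_single_cut_sites(example + "CAATTG"))  # ['MfeI sites: 2']
```

A real check would run over the full synthesized expression fragment, including the spacer, before ordering it.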

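The per-well amounts in the HEK293T transfection above can be scaled to a master mix for several wells. The text gives 0.3 µg each of the Cas expression vector, crRNA expression vector and EGFP-C1 plasmid, 25 µL of medium for each of reagents A and B, and 2 µL of reagent in A. The 10% pipetting overage is an assumption. The Lipofectamine 3000 volume for reagent B is left out because the text does not state it. `PerWell` and `master_mix` are hypothetical helper names.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PerWell:
    """Per-well amounts for one 24-well-plate well, as listed in the text."""
    cas_vector_ug: float = 0.3     # Cas protein expression vector
    crrna_vector_ug: float = 0.3   # crRNA expression vector
    egfp_c1_ug: float = 0.3        # EGFP-C1 transfection marker plasmid
    medium_a_ul: float = 25.0      # reduced-serum medium, reagent A
    reagent_a_ul: float = 2.0      # reagent added to A, as stated in the text
    medium_b_ul: float = 25.0      # reduced-serum medium, reagent B


def master_mix(n_wells: int, overage: float = 0.10,
               per_well: PerWell = PerWell()) -> dict[str, float]:
    """Scale per-well amounts to n_wells plus a fractional pipetting overage."""
    if n_wells < 1:
        raise ValueError("n_wells must be at least 1")
    if overage < 0:
        raise ValueError("overage must be non-negative")
    factor = n_wells * (1 + overage)
    return {name: round(amount * factor, 3)
            for name, amount in asdict(per_well).items()}


# 4 wells, no overage: 1.2 µg of each plasmid, 100 µL of each medium, 8 µL reagent.
print(master_mix(4, overage=0.0))
```
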
Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Medicinal Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Biotechnology (AREA)
  • General Chemical & Material Sciences (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • Neurology (AREA)
  • Plant Pathology (AREA)
  • Neurosurgery (AREA)
  • Immunology (AREA)
  • Oncology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Hematology (AREA)
  • Communicable Diseases (AREA)
  • Ophthalmology & Optometry (AREA)
  • Diabetes (AREA)
  • Obesity (AREA)
  • Urology & Nephrology (AREA)

Abstract

The present disclosure provides a Cas protein, a CRISPR-Cas system containing the Cas protein, and a use of the Cas protein. Specifically, the Cas protein of the present disclosure comprises an OBD domain, a REC domain, a RuvC domain, a helical domain, and a Nuc domain, and has a structure as shown in formula I or formula II. The Cas protein of the present disclosure has very good gene editing activity, can effectively edit or cleave a target gene, and can effectively treat disorders or diseases of a subject in need.

Description

A Cas protein, a CRISPR-Cas system containing the same and applications thereof

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of and priority to patent application No. CN202311664660.5, filed on December 6, 2023, entitled "A Cas protein, a CRISPR-Cas system comprising it and its application", the entire contents of which, including any sequence listings and drawings, are incorporated herein by reference in their entirety.

REFERENCE TO ELECTRONIC SEQUENCE LISTING

This disclosure contains an electronic sequence listing ("P2024-3068xlb.xml", created with the "WIPOSequence" software in accordance with WIPO Standard ST.26), which is incorporated herein by reference in its entirety. Under WIPO Standard ST.26, the symbol "t" represents T in DNA and U in RNA. Therefore, in a sequence listing prepared in accordance with ST.26, any T in a sequence should be read as U whenever that sequence is RNA.

Technical Field

The present disclosure relates to the field of gene editing, and in particular to a Cas protein, a CRISPR-Cas system comprising the same, and applications thereof.

Background Art

Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) genes, collectively referred to as CRISPR-Cas or CRISPR/Cas systems, are currently understood to function as an immune system that protects bacteria and archaea against phage infection. The CRISPR-Cas systems of prokaryotic adaptive immunity comprise an extremely diverse set of protein effectors, non-coding elements, and locus architectures that can be engineered for applications such as gene editing, target detection, and disease treatment.

A variety of Cas proteins and corresponding editing technologies are currently available; nevertheless, the field still needs new Cas proteins and CRISPR-Cas systems to meet diverse application needs.

Summary of the Invention

The main purpose of the present disclosure is to provide new Cas proteins and CRISPR-Cas systems to meet diverse application needs.

In one aspect, the present disclosure provides a Cas protein comprising an OBD domain, a REC domain, a RuvC domain, a Helical domain, and a Nuc domain.

In some embodiments, the RuvC domain comprises RuvC-I, RuvC-II, and RuvC-III domains.

In some embodiments, the Cas protein comprises neither an HNH domain nor a PI domain.

In some embodiments, the RuvC-III domain is located between the Nuc-I domain and the Nuc-II domain.

In some embodiments, the OBD domain is a bi-split domain comprising OBD-I and OBD-II domains.

In some embodiments, the OBD-I domain is located at the N-terminus and the Nuc-II domain is located at the C-terminus.

In some embodiments, the Cas protein performs its nucleic acid cleavage function without the aid of a tracrRNA.

In another aspect, the present disclosure provides a fusion protein comprising the Cas protein of the present disclosure and one or more functional domains.

In some embodiments, the functional domain is selected from a localization signal, a reporter protein, a Cas protein targeting portion, a DNA binding domain, an epitope tag, a transcription activation domain, a transcription repression domain, a nuclease, a deamination domain, a methylase, a demethylase, a transcription release factor, an HDAC, a cleavage-active polypeptide, a ligase, an integrase, a transposase, a recombinase, a polymerase, and a base excision repair inhibitor (such as a uracil-DNA glycosylase inhibitor (UGI)).

In some embodiments, the functional domain has one or more of the following enzymatic activities on the target sequence: methylase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylation activity, deSUMOylation activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase), and deglycosylation activity.

In some embodiments, the functional domain is selected from an adenosine deaminase catalytic domain and a cytidine deaminase catalytic domain.

In one aspect, the present disclosure provides an isolated polynucleotide encoding the Cas protein or the fusion protein of the present disclosure.

In one aspect, the present disclosure provides an isolated nucleic acid molecule comprising the structure shown in Formula IV below:

5'-R1a-Ba-R2a-L-R2b-Bb-R1b-3' (IV),

wherein segments R1a and R1b are reverse complementary sequences and form a first stem (R1) having a plurality of (2, 3, 4, 5, 6, 7, 8, 9, or 10) nucleotide pairs in the Cas protein;

segments Ba and Bb do not base pair with each other and form a bulge (B);

segments R2a and R2b are reverse complementary sequences and form a second stem (R2) having a plurality of (2, 3, 4, 5, 6, 7, 8, 9, or 10) base pairs; and L is a loop formed at the second stem and consisting of a plurality of (3, 4, 5, 6, 7, 8, 9, or 10) nucleotides.

In some embodiments, the nucleic acid molecule comprises the structure shown in Formula IV below:

5'-R1a-Ba-R2a-L-R2b-Bb-R1b-3' (IV),

wherein segments R1a and R1b are reverse complementary sequences and form a first stem (R1) having 3 or 5 nucleotide pairs in Cas12o;

segments Ba and Bb are not both present, and whichever segment is present (Ba or Bb) forms a bulge (B) of 2 or 3 nucleotides;

segments R2a and R2b are reverse complementary sequences and form a second stem (R2) having 6 or 7 base pairs; and L is a loop formed at the second stem and consisting of 5 or 7 nucleotides.
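
The Cas12o-specific constraints of Formula IV can be checked mechanically against a candidate DR. Below is a minimal sketch that tests the Cas12o1 segments listed for Figure 6A and assembles them into a 27-nt repeat. The text calls the stems "reverse complementary", but Cas12o1's second stem (GGUAUCC/GGAUGCU) contains two G-U wobble pairs, so a strict Watson-Crick check would reject it. The sketch therefore also accepts G-U pairs. That is an assumption, in line with how RNAfold scores pairs. The helper names are hypothetical.

```python
# Pairs accepted in a stem: Watson-Crick plus G-U wobble (assumption; RNA
# folding tools such as RNAfold also score G-U pairs).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_pairs(arm_5p: str, arm_3p: str) -> bool:
    """True if the 5' arm pairs base-by-base with the 3' arm read backwards."""
    if len(arm_5p) != len(arm_3p):
        return False
    return all((a, b) in PAIRS for a, b in zip(arm_5p, reversed(arm_3p)))


def check_formula_iv(r1a: str, ba: str, r2a: str, loop: str,
                     r2b: str, bb: str, r1b: str) -> list[str]:
    """Return the Cas12o-specific Formula IV constraints that are violated."""
    problems = []
    if len(r1a) not in (3, 5) or not stem_pairs(r1a, r1b):
        problems.append("first stem R1")
    # Exactly one of Ba/Bb is present, and it is 2 or 3 nt long.
    if bool(ba) == bool(bb) or len(ba or bb) not in (2, 3):
        problems.append("bulge B")
    if len(r2a) not in (6, 7) or not stem_pairs(r2a, r2b):
        problems.append("second stem R2")
    if len(loop) not in (5, 7):
        problems.append("loop L")
    return problems


# Cas12o1 DR segments as given for Figure 6A (Ba absent).
cas12o1 = dict(r1a="ACA", ba="", r2a="GGUAUCC", loop="UAAAC",
               r2b="GGAUGCU", bb="GA", r1b="UGU")
print(check_formula_iv(**cas12o1))  # []
order = ("r1a", "ba", "r2a", "loop", "r2b", "bb", "r1b")
dr = "".join(cas12o1[k] for k in order)
print(dr, len(dr))  # ACAGGUAUCCUAAACGGAUGCUGAUGU 27
```

Whether this assembled string matches SEQ ID NO:2 exactly is not stated in this section. It is shown only as the concatenation of the Figure 6A segments.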

In some embodiments, the nucleic acid molecule comprises or consists of a sequence selected from the following:

(i) the sequence shown in any one of SEQ ID NOs: 2, 4, and 6;

(ii) a sequence having one or more base substitutions, deletions or additions (e.g., substitutions, deletions or additions of 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 bases) compared with the sequence shown in any one of SEQ ID NOs: 2, 4, and 6;

(iii) a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% sequence identity to the sequence shown in any one of SEQ ID NOs: 2, 4, and 6;

(iv) a sequence that hybridizes under stringent conditions to the sequence described in any one of (i) to (iii); or

(v) the complementary sequence of the sequence described in any one of (i) to (iii);

and the sequence described in any one of (ii) to (v) substantially retains the biological function of the sequence from which it is derived;

for example, the isolated nucleic acid molecule is RNA;

for example, the isolated nucleic acid molecule comprises a direct repeat (DR) sequence of a CRISPR/Cas system.
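
Categories (ii) and (iii) above are defined by counts of base substitutions, deletions or additions and by percent identity. The text does not say how either quantity is computed. Below is a minimal sketch using two common stand-ins. Levenshtein edit distance counts the fewest substitutions, deletions and additions. Ungapped identity is a simplification that applies only to equal-length sequences, whereas alignment-based tools such as BLAST are typical in practice. The reference string is the 27-nt concatenation of the Cas12o1 Figure 6A segments, not necessarily SEQ ID NO:2. All function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest substitutions, deletions and additions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # add cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences compared position by position."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def is_variant_ii(reference: str, candidate: str, max_changes: int = 10) -> bool:
    """Category (ii): 1 to max_changes substitutions, deletions or additions."""
    return 1 <= edit_distance(reference, candidate) <= max_changes


ref = "ACAGGUAUCCUAAACGGAUGCUGAUGU"  # Cas12o1 Figure 6A segments, concatenated
cand = ref[:-1] + "C"                # one substitution at the 3' end
print(edit_distance(ref, cand), is_variant_ii(ref, cand))  # 1 True
print(round(ungapped_identity(ref, cand), 1))               # 96.3
```
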

In some embodiments, the nucleic acid molecule comprises one or more stem-loops or optimized secondary structures;

for example, the sequence described in any one of (ii) to (v) retains the secondary structure of the sequence from which it is derived.

In some embodiments, the nucleic acid molecule comprises or consists of a sequence selected from the following:

(a) the nucleotide sequence shown in any one of SEQ ID NOs: 2, 4, and 6;

(b) a sequence that hybridizes under stringent conditions to the sequence described in (a); or

(c) the complementary sequence of the nucleotide sequence shown in any one of SEQ ID NOs: 2, 4, and 6.

In one aspect, the present disclosure provides a guide RNA (gRNA) comprising a direct repeat (DR) sequence capable of binding to the Cas protein of the present disclosure and a spacer sequence capable of targeting a target sequence.

In one aspect, the present disclosure provides a vector comprising the polynucleotide and/or the nucleic acid molecule of the present disclosure.

In one aspect, the present disclosure provides a complex comprising:

(i) a protein component selected from the group consisting of the Cas protein of the present disclosure, the fusion protein of the present disclosure, and combinations thereof; and

(ii) a nucleic acid component selected from the group consisting of the guide RNA of the present disclosure, a nucleic acid encoding the guide RNA of the present disclosure, a precursor RNA of the guide RNA of the present disclosure, a nucleic acid encoding the precursor RNA of the guide RNA of the present disclosure, and combinations thereof.

In one aspect, the present disclosure provides a CRISPR-Cas composition comprising:

(i) a first component selected from the group consisting of the Cas protein of the present disclosure, the fusion protein of the present disclosure, a nucleotide sequence encoding the Cas protein or the fusion protein of the present disclosure, and any combination thereof; and

(ii) a second component comprising one or more guide RNAs of the present disclosure, or a nucleotide sequence encoding the one or more guide RNAs, wherein the guide RNA comprises:

(iii) a direct repeat (DR) sequence capable of binding to the Cas protein of the present disclosure; and

(iv) a spacer sequence capable of targeting a target sequence of a target DNA, wherein the guide RNA is configured to form a complex with the Cas protein.

In one aspect, the present disclosure provides a CRISPR-Cas system comprising one or more vectors, wherein the one or more vectors comprise:

(i) a first nucleic acid comprising a nucleotide sequence encoding the Cas protein or the fusion protein of the present disclosure, the first nucleic acid optionally being operably linked to a first regulatory element; and

(ii) a second nucleic acid comprising a nucleotide sequence encoding the guide RNA of the present disclosure, the second nucleic acid optionally being operably linked to a second regulatory element, wherein the guide RNA comprises:

(iii) a direct repeat (DR) sequence capable of binding to the Cas protein of the present disclosure; and

(iv) a spacer sequence capable of targeting a target sequence of a target DNA, wherein the guide RNA is configured to form a complex with the Cas protein;

wherein:

the first nucleic acid and the second nucleic acid are present on the same vector or on different vectors, and the guide RNA is capable of forming a complex with the Cas protein or fusion protein described in (i).

In some embodiments, the vector comprises a plasmid or a viral vector.

在一些实施方案中,所述向导RNA包括能够与靶序列杂交的间隔(spacer)序列;和与间隔(spacer)序列连接,并能够引导所述蛋白结合至所述向导RNA,从而形成靶向所述靶序列的CRISPR-Cas组合物或复合物的同向重复(Direct Repeat,DR)序列。In some embodiments, the guide RNA includes a spacer sequence capable of hybridizing with a target sequence; and a direct repeat (DR) sequence connected to the spacer sequence and capable of guiding the protein to bind to the guide RNA, thereby forming a CRISPR-Cas composition or complex targeting the target sequence.

在一些实施方案中,所述向导RNA包括未修饰和经修饰的向导RNA。In some embodiments, the guide RNA includes unmodified and modified guide RNA.

在一些实施方案中,所述经修饰的向导RNA包括碱基的化学修饰。In some embodiments, the modified guide RNA includes chemical modifications of bases.

在一些实施方案中,所述化学修饰包括甲基化修饰、甲氧基修饰、氟化修饰、或硫代修饰。In some embodiments, the chemical modification comprises methylation modification, methoxy modification, fluorination modification, or thio modification.

在一些实施方案中,所述第一调节元件和/或第二调节元件是启动子,例如诱导型启动子。In some embodiments, the first regulatory element and/or the second regulatory element is a promoter, such as an inducible promoter.

在一些实施方案中,所述组合物中的至少一个组分是非天然存在的或经修饰的。In some embodiments, at least one component in the composition is non-naturally occurring or modified.

在一些实施方案中,所述间隔(spacer)序列连接至所述同向重复(Direct Repeat,DR)序列的3'端。In some embodiments, the spacer sequence is connected to the 3' end of the direct repeat (DR) sequence.

在一些实施方案中,所述间隔(spacer)序列包含所述靶序列的互补序列。In some embodiments, the spacer sequence comprises a complementary sequence to the target sequence.
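The guide RNA layout described above, with the spacer attached to the 3' end of the DR and complementary to the target sequence, can be sketched in Python as follows. This is an illustrative sketch only. The function names are hypothetical, and the DR string in the example is an arbitrary placeholder rather than a Cas12o DR from this disclosure. "Complementary to the target sequence" is read here as the reverse complement of the target strand written 5' to 3'.

```python
# Watson-Crick complements for unambiguous DNA bases.
_DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement_dna(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return "".join(_DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def build_guide_rna(direct_repeat: str, target_strand: str) -> str:
    """Assemble a 5'-[DR]-[spacer]-3' guide RNA.

    direct_repeat: DR sequence (DNA or RNA letters accepted).
    target_strand: strand the spacer should hybridize to, written 5'->3'.
    The spacer is the reverse complement of target_strand written as RNA
    (T->U), and it is appended to the 3' end of the DR.
    """
    spacer_rna = reverse_complement_dna(target_strand).replace("T", "U")
    dr_rna = direct_repeat.upper().replace("T", "U")
    return dr_rna + spacer_rna


if __name__ == "__main__":
    placeholder_dr = "GUUUCAAAGA"  # hypothetical placeholder, NOT a Cas12o DR
    print(build_guide_rna(placeholder_dr, "ATGCATGC"))
```

Keeping the DR and the target strand as separate inputs mirrors the modular design in the text. The same DR can be reused while only the spacer changes between targets.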

In some embodiments, when the target sequence is DNA, the target sequence is located 3' of a protospacer adjacent motif (PAM), and the PAM is 5'-TN, wherein N is A, T, G or C.
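The PAM is 5'-TN and the target sequence lies on its 3' side, so candidate target sites on a given DNA strand can be enumerated with a simple scan. The sketch below is hypothetical in three respects. The 20-nt protospacer length is an assumed default, because this section does not state a length. Only the strand passed in is scanned; for double-stranded DNA one would also scan its reverse complement. No PAM preference is modeled beyond the stated 5'-TN rule.

```python
from typing import List, Tuple


def find_tn_pam_targets(seq: str, protospacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Enumerate (pam_index, pam, protospacer) hits for a 5'-TN PAM on one strand.

    seq is read 5'->3'. The protospacer is the protospacer_len bases that lie
    immediately 3' of the 2-nt PAM. Sites too close to the 3' end to fit a
    full protospacer are skipped.
    """
    seq = seq.upper()
    pam_len = 2
    hits = []
    for i in range(len(seq) - pam_len - protospacer_len + 1):
        pam = seq[i:i + pam_len]
        # 5'-TN: a T followed by any of A, C, G or T.
        if pam[0] == "T" and pam[1] in "ACGT":
            start = i + pam_len
            hits.append((i, pam, seq[start:start + protospacer_len]))
    return hits


if __name__ == "__main__":
    for hit in find_tn_pam_targets("GGTACCAGTTCAGGATCC", protospacer_len=5):
        print(hit)
```

Almost every T followed by a base qualifies, so a 5'-TN PAM is very permissive. The number of candidate sites on a strand scales roughly with its T content.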

In some embodiments, the target sequence is DNA from a prokaryotic or eukaryotic cell, or a DNA sequence generated by reverse transcription of RNA; alternatively, the target sequence is non-naturally occurring DNA, or a DNA sequence generated by reverse transcription of RNA.

In some embodiments, the target sequence comprises a cDNA sequence.

In some embodiments, the target sequence comprises a single-stranded DNA or double-stranded DNA sequence.

In some embodiments, the target sequence is present within a cell.

In some embodiments, the target sequence is present in the nucleus or in the cytoplasm (e.g., in an organelle).

In some embodiments, the cell is a eukaryotic cell.

In some embodiments, the cell is a prokaryotic cell.

In some embodiments, the target sequence is present outside a cell.

In some embodiments, the Cas protein of the present disclosure is linked to one or more nuclear localization signal (NLS) sequences, or the fusion protein comprises one or more NLS sequences.

In some embodiments, the NLS sequence is linked to the N-terminus or C-terminus of the Cas protein of the present disclosure.

In some embodiments, the NLS sequence is fused to the N-terminus or C-terminus of the Cas protein of the present disclosure.
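The N-terminal and C-terminal NLS arrangements described above can be illustrated at the amino-acid sequence level. In this hypothetical sketch, the SV40 large T antigen NLS (PKKKRKV) serves only as a familiar example. The GGGGS linker is an assumed choice, and the protein sequence is a placeholder. This section does not specify which NLS or linker is used.

```python
SV40_NLS = "PKKKRKV"      # SV40 large T antigen NLS, used here only as an example
DEFAULT_LINKER = "GGGGS"  # assumed flexible linker; not specified by the source


def add_nls(protein: str, terminus: str = "C", nls: str = SV40_NLS,
            linker: str = DEFAULT_LINKER, copies: int = 1) -> str:
    """Return a fusion sequence with `copies` tandem NLS repeats at one terminus.

    For an N-terminal fusion, the initiator methionine is kept at position 1
    (and added if the input lacks it) so that the fusion still starts with M.
    """
    if copies < 1:
        raise ValueError("copies must be >= 1")
    tag = nls * copies
    if terminus == "N":
        body = protein[1:] if protein.startswith("M") else protein
        return "M" + tag + linker + body
    if terminus == "C":
        return protein + linker + tag
    raise ValueError("terminus must be 'N' or 'C'")


if __name__ == "__main__":
    print(add_nls("MKLV", terminus="N"))
    print(add_nls("MKLV", terminus="C", copies=2))
```

The `copies` parameter reflects the "one or more NLS sequences" wording. Tandem repeats are written back to back here for simplicity.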

In one aspect, the present disclosure provides a kit comprising one or more components selected from: a Cas protein of the present disclosure, a fusion protein of the present disclosure, a polynucleotide of the present disclosure, a vector of the present disclosure, a complex of the present disclosure, a CRISPR-Cas composition of the present disclosure, or a CRISPR-Cas system of the present disclosure.

In some embodiments, the kit further comprises a label or instructions.

In some embodiments, the kit is used for one or more of: gene or genome editing, disease treatment, targeting a target gene, and cleaving a target gene or a non-target gene.

In one aspect, the present disclosure provides a delivery composition comprising a delivery vector or a delivery medium, and one or more selected from: a Cas protein of the present disclosure, a fusion protein of the present disclosure, a polynucleotide of the present disclosure, a vector of the present disclosure, a complex of the present disclosure, a CRISPR-Cas composition of the present disclosure, or a CRISPR-Cas system of the present disclosure.

In one aspect, the present disclosure provides a host cell comprising a Cas protein of the present disclosure, a fusion protein of the present disclosure, a polynucleotide of the present disclosure, a vector of the present disclosure, a complex of the present disclosure, a CRISPR-Cas composition of the present disclosure, a CRISPR-Cas system of the present disclosure, or a delivery composition of the present disclosure.

In one aspect, the present disclosure provides an enzyme preparation comprising the Cas protein of the present disclosure, the fusion protein of the present disclosure, the complex of the present disclosure, the CRISPR-Cas composition of the present disclosure, the CRISPR-Cas system of the present disclosure, or the delivery composition of the present disclosure.

In another aspect, the present disclosure provides a pharmaceutical kit comprising:

a first container, and, located in the first container, a complex of the present disclosure, a CRISPR-Cas composition of the present disclosure, a CRISPR-Cas system of the present disclosure, or a drug containing the complex, the CRISPR-Cas composition or the CRISPR-Cas system of the present disclosure.

In another aspect, the present disclosure provides a pharmaceutical kit comprising:

(a1) a first container, and, located in the first container, a Cas protein of the present disclosure or a fusion protein of the present disclosure, a gene encoding it or an expression vector thereof, or a drug containing the Cas protein, the fusion protein, the encoding gene or the expression vector; and

(b1) optionally, a second container, and, located in the second container, a guide RNA of the present disclosure or an expression vector thereof, or a drug containing the guide RNA or its expression vector.

In another aspect, the present disclosure provides a method for targeting and editing a target gene, or for cleaving a target gene, comprising: contacting the Cas protein of the present disclosure, the fusion protein of the present disclosure, the complex of the present disclosure, the CRISPR-Cas composition of the present disclosure, the CRISPR-Cas system of the present disclosure, the delivery composition of the present disclosure, the enzyme preparation of the present disclosure, or the pharmaceutical kit of the present disclosure with the target gene, or delivering it into a cell containing the target gene, wherein the target sequence is present in the target gene.

In another aspect, the present disclosure provides a method of inducing a change in a cell state, the method comprising contacting the Cas protein of the present disclosure, the fusion protein of the present disclosure, the complex of the present disclosure, the CRISPR-Cas composition of the present disclosure, the CRISPR-Cas system of the present disclosure, the delivery composition of the present disclosure, the enzyme preparation of the present disclosure, or the pharmaceutical kit of the present disclosure with a target gene in a cell.

In another aspect, the present disclosure provides a method for altering the expression of a gene product, comprising: contacting the Cas protein of the present disclosure, the fusion protein of the present disclosure, the complex of the present disclosure, the CRISPR-Cas composition of the present disclosure, the CRISPR-Cas system of the present disclosure, the delivery composition of the present disclosure, the enzyme preparation of the present disclosure, or the pharmaceutical kit of the present disclosure with a nucleic acid molecule encoding the gene product, or delivering it into a cell comprising the nucleic acid molecule, wherein the target sequence is present in the nucleic acid molecule.

In another aspect, the present disclosure provides a cell or progeny thereof obtained by any of the methods of the present disclosure, wherein the cell comprises a modification that is not present in its wild type.

In another aspect, the present disclosure provides a cell product of a cell of the present disclosure or of its progeny.

In another aspect, the present disclosure provides an in vitro, ex vivo or in vivo cell or cell line, or progeny thereof, comprising: a Cas protein of the present disclosure, a fusion protein of the present disclosure, a polynucleotide of the present disclosure, a vector of the present disclosure, a complex of the present disclosure, a CRISPR-Cas composition of the present disclosure, a CRISPR-Cas system of the present disclosure, or a delivery composition of the present disclosure.

In another aspect, the present disclosure provides a cell preparation comprising the host cell of the present disclosure, the cell of the present disclosure or its progeny, a cell product of the cell of the present disclosure or its progeny, or the cell or cell line of the present disclosure or its progeny.

In another aspect, the present disclosure further provides use of the Cas protein of the present disclosure, the fusion protein of the present disclosure, the polynucleotide of the present disclosure, the vector of the present disclosure, the complex of the present disclosure, the CRISPR-Cas composition of the present disclosure, the CRISPR-Cas system of the present disclosure, the kit of the present disclosure, the delivery composition of the present disclosure, the enzyme preparation of the present disclosure, or the pharmaceutical kit of the present disclosure in the preparation of a medicament or preparation for nucleic acid editing (e.g., gene or genome editing).

In another aspect, the present disclosure provides use of the Cas protein of the present disclosure, the fusion protein of the present disclosure, the polynucleotide of the present disclosure, the vector of the present disclosure, the complex of the present disclosure, the CRISPR-Cas composition of the present disclosure, the CRISPR-Cas system of the present disclosure, the kit of the present disclosure, the delivery composition of the present disclosure, the enzyme preparation of the present disclosure, or the pharmaceutical kit of the present disclosure in the preparation of a medicament or preparation for one or more uses selected from the group consisting of:

(i) ex vivo gene or genome editing;

(ii) detection of single-stranded DNA in vitro;

(iii) editing a target sequence in a target locus to modify an organism or a non-human organism;

(iv) treating a disorder caused by a defect in the target sequence in the target locus;

(v) treating a condition or disease in a subject in need thereof.

In another aspect, the present disclosure provides a method for detecting whether a target nucleic acid molecule is present in a sample, the method comprising contacting the sample with the Cas protein of the present disclosure, the fusion protein of the present disclosure, the complex of the present disclosure, the CRISPR-Cas composition of the present disclosure, the CRISPR-Cas system of the present disclosure, the kit of the present disclosure, the delivery composition of the present disclosure, or the enzyme preparation of the present disclosure, and with a non-target sequence, and detecting a detectable signal generated by cleavage of the non-target sequence, thereby detecting the target nucleic acid molecule, wherein the non-target sequence does not hybridize with the guide RNA.
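In this detection method, target presence is read out indirectly, through the signal produced when the non-target sequence (for example, a labeled reporter) is cleaved. This section does not say how that signal becomes a yes/no call. Purely for illustration, the sketch below assumes a common rule: a sample is called positive when its signal exceeds the mean of negative controls by more than k standard deviations, with k = 3 by default. The function name and the threshold rule are assumptions, not part of the disclosed method.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def call_target_present(sample_signal: float,
                        negative_controls: Sequence[float],
                        k: float = 3.0) -> Tuple[bool, float]:
    """Return (is_positive, threshold) using a mean + k*SD cutoff.

    negative_controls: signals from reactions known to lack the target
    nucleic acid. At least two values are needed to estimate the sample SD.
    """
    if len(negative_controls) < 2:
        raise ValueError("need at least two negative-control readings")
    threshold = mean(negative_controls) + k * stdev(negative_controls)
    return sample_signal > threshold, threshold


if __name__ == "__main__":
    negatives = [100.0, 102.0, 98.0]  # illustrative numbers, not experimental data
    print(call_target_present(150.0, negatives))
```

The cutoff is derived from controls run alongside the sample, not fixed in advance. This is one way to absorb run-to-run variation in background reporter signal.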

In another aspect, the present disclosure provides a method of treating a condition or disease in a subject in need thereof, comprising administering to the subject a complex of the present disclosure, a CRISPR-Cas composition of the present disclosure, a CRISPR-Cas system of the present disclosure, a kit of the present disclosure, a delivery composition of the present disclosure, an enzyme preparation of the present disclosure, or a pharmaceutical kit of the present disclosure.

In another aspect, the present disclosure provides a sterile container comprising a Cas protein of the present disclosure, a fusion protein of the present disclosure, a polynucleotide of the present disclosure, a vector of the present disclosure, a complex of the present disclosure, a CRISPR-Cas composition of the present disclosure, a CRISPR-Cas system of the present disclosure, a delivery composition of the present disclosure, or an enzyme preparation of the present disclosure.

In another aspect, the present disclosure provides an implantable device comprising a Cas protein of the present disclosure, a fusion protein of the present disclosure, a polynucleotide of the present disclosure, a vector of the present disclosure, a complex of the present disclosure, a CRISPR-Cas composition of the present disclosure, a CRISPR-Cas system of the present disclosure, a delivery composition of the present disclosure, or an enzyme preparation of the present disclosure.

It should be understood that, within the scope of the present disclosure, the technical features described above and those described specifically below (e.g., in the Examples) can be combined with one another to form new or preferred technical solutions. For brevity, these combinations are not enumerated individually here.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 depicts a phylogenetic tree of Cas12o homologs.

Figure 2 depicts the schematic domain organization (Figures 2A and 2C) and the predicted three-dimensional structure (Figure 2B) of Cas12o.

Figure 3 depicts the Cas12o1 expression vector (Figure 3A) and the LbCpf1 expression vector (Figure 3B).

Figure 4 depicts the Target plasmid carrying the hTTR1 target sequence.

Figure 5 compares the cleavage activity of Cas12o and LbCpf1 against the hTTR1 target sequence in competent cells.

Figure 6 depicts the predicted secondary structures of the DR sequences of Cas12o1, Cas12o2 and Cas12o3.

Figure 7 depicts the experimentally determined PAM preference of Cas12o1.

DETAILED DESCRIPTION

After extensive and in-depth research, the present inventors have discovered, for the first time, a new Cas protein. The Cas protein of the present disclosure comprises an OBD domain, a RuvC domain, a Helical domain and a Nuc domain, and has a structure shown in Formula I, Formula II or Formula III. The Cas protein of the present disclosure has high gene editing activity and specificity, can effectively edit or cleave a target gene, and can be used to treat a condition or disease in a subject in need thereof.

With the benefit of the teachings presented in the foregoing description, one of ordinary skill in the art to which the present disclosure pertains will recognize many modifications and other embodiments of the present disclosure set forth herein. Therefore, it should be understood that the present disclosure is not limited to the specific embodiments disclosed, and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used only in a generic and descriptive sense and not for purposes of limitation.

Terms

The following examples are provided only to illustrate the present disclosure, not to limit it. Unless otherwise specified, the experiments and methods described in the examples are carried out essentially according to conventional methods that are well known in the art and described in various references.

In addition, where specific conditions are not indicated in the examples, the experiments are carried out under conventional conditions or under the conditions recommended by the manufacturer. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products. Those skilled in the art will appreciate that the examples describe the present disclosure by way of illustration and are not intended to limit the claimed scope of the present disclosure. All publications and other references mentioned herein are incorporated herein by reference in their entirety.

To make the present disclosure easier to understand, certain terms are first defined. As used in this application, unless expressly provided otherwise herein, each of the following terms shall have the meaning given below. Additional definitions are set forth throughout the application.

Sequence identity (or homology) is determined by comparing two aligned sequences over a predetermined comparison window (which may be 50%, 60%, 70%, 80%, 90%, 95% or 100% of the length of the reference nucleotide sequence or protein) and counting the number of positions at which identical residues occur. It is typically expressed as a percentage. Methods for measuring the sequence identity of nucleotide sequences are well known to those skilled in the art.
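The identity calculation described above can be sketched for two sequences that have already been aligned, with gaps written as "-". This sketch makes several assumptions. The comparison window is a number of alignment columns starting at the first column. Identity is the count of identical, non-gap positions divided by the window length. Alignment programs differ in how they choose the window and the denominator, so this function will not necessarily reproduce the figure reported by a particular tool.

```python
from typing import Optional


def percent_identity(aligned_ref: str, aligned_query: str,
                     window: Optional[int] = None) -> float:
    """Percent of identical, non-gap positions over the first `window` columns.

    Both inputs must be rows of the same alignment (equal length, '-' for gaps).
    If window is None, the full alignment length is used.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    if window is None:
        window = len(aligned_ref)
    if not 0 < window <= len(aligned_ref):
        raise ValueError("window must be between 1 and the alignment length")
    matches = sum(
        1
        for r, q in zip(aligned_ref[:window].upper(), aligned_query[:window].upper())
        if r == q and r != "-"
    )
    return 100.0 * matches / window


if __name__ == "__main__":
    print(percent_identity("ACGT-A", "ACGTTA"))
```

For example, "ACGA" aligned with "ACGT" shares three of four positions, giving 75% identity over the full window.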

General Definitions

The articles "a" and "an" are used herein to refer to one or more than one (i.e., at least one) of the grammatical object of the article. By way of example, "a polypeptide" means one or more polypeptides.

The term "about" or "approximately" refers to a quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length that varies by as much as 15%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2% or 1% relative to a reference quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length. In one embodiment, the term "about" or "approximately" refers to a range of ±15%, ±10%, ±9%, ±8%, ±7%, ±6%, ±5%, ±4%, ±3%, ±2% or ±1% around the reference quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length.

The term "optional" or "optionally" means that the subsequently described event, circumstance or alternative may or may not occur, and that the description includes instances where the event or circumstance occurs and instances where it does not.

Throughout this specification, unless the context requires otherwise, the terms "comprises", "includes", "contains" and "has" should be understood to imply the inclusion of a stated step or element, or group of steps or elements, but not the exclusion of any other step or element, or group of steps or elements. In particular embodiments, the terms "comprises", "includes", "contains" and "has" are used synonymously.

The term "heterologous" refers to a nucleotide or polypeptide sequence that is not found in the native nucleic acid or protein, respectively. For example, relative to Cas12o, a heterologous polypeptide comprises an amino acid sequence from a protein other than the Cas12o protein. In some cases, a portion of a Cas12o protein from one species is fused with a portion of a Cas12o protein from a different species; the Cas12o sequences from each species can then be considered heterologous to one another. As another example, a Cas12o protein (e.g., a dCas12o protein) can be fused with an active domain from a non-Cas12o protein (e.g., a deaminase or a histone deacetylase), and the sequence of that active domain can be considered a heterologous polypeptide (it is heterologous to the Cas12o protein).

The terms "ortholog" and "homolog" are well known in the art. As further guidance, a "homolog" of a protein, as used herein, is a protein of the same species that performs the same or a similar function as the protein of which it is a homolog. Homologous proteins may be, but need not be, structurally related, or may be only partially structurally related. An "ortholog" of a protein, as used herein, is a protein of a different species that performs the same or a similar function as the protein of which it is an ortholog. Orthologous proteins may be, but need not be, structurally related, or may be only partially structurally related. In one embodiment, a homolog or ortholog of a nucleic acid-guided nuclease as referred to herein has at least 80%, at least 85%, at least 90% or at least 95% sequence homology or identity with the nucleic acid-guided nuclease. In a further embodiment, a homolog or ortholog of a nucleic acid-guided nuclease has at least 80%, at least 85%, at least 90% or at least 95% sequence identity with a wild-type nucleic acid-guided nuclease.

Other orthologs of known nucleic acid-guided nucleases can be identified. Some methods of identifying such orthologs may involve identifying a tracr sequence in a genome of interest. Identifying a tracr sequence may involve the following steps: searching a database for direct repeat or tracr mate sequences to identify regions containing a nucleic acid-guided nuclease; searching the regions flanking the nuclease, in both the sense and antisense directions, for homologous sequences; looking for transcription terminators and secondary structures; identifying as a potential tracr sequence any sequence that is not a direct repeat or tracr mate sequence but has greater than 50% identity with a direct repeat or tracr mate sequence; and taking the potential tracr sequence and analyzing the transcription terminator sequences associated with it.
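The filtering step above flags sequences that are not the direct repeat itself but share more than 50% identity with it. It can be sketched as an ungapped sliding-window scan over a flanking region. This is a simplification for illustration only: practical pipelines typically use gapped local alignment and also scan the antisense strand, and the function name is hypothetical.

```python
from typing import List, Tuple


def find_tracr_candidates(region: str, direct_repeat: str,
                          min_identity: float = 0.5) -> List[Tuple[int, str, float]]:
    """Return (start, window, identity) for windows with identity > min_identity.

    Windows identical to the direct repeat are excluded, because a potential
    tracr sequence must not itself be a direct repeat. Identity is the
    fraction of matching positions in an ungapped, same-length comparison.
    """
    if not direct_repeat:
        raise ValueError("direct_repeat must be non-empty")
    region, dr = region.upper(), direct_repeat.upper()
    size = len(dr)
    hits = []
    for start in range(len(region) - size + 1):
        window = region[start:start + size]
        if window == dr:
            continue  # skip exact direct-repeat copies
        identity = sum(a == b for a, b in zip(window, dr)) / size
        if identity > min_identity:
            hits.append((start, window, identity))
    return hits


if __name__ == "__main__":
    # Toy region: one DR copy, a spacer-like stretch, then a partial DR match.
    print(find_tracr_candidates("ACGTAC" + "GGGGGG" + "ACGTGG", "ACGTAC"))
```

The comparison uses a strict "greater than" against the threshold, matching the "greater than 50% identity" wording. A window that matches exactly half the DR positions is therefore not reported.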

A chimeric enzyme may comprise a first fragment and a second fragment, and the fragments may be fragments of nucleic acid-guided nuclease orthologs from organisms of a given genus or species; for example, the fragments may come from nucleic acid-guided nuclease orthologs of different species.

The terms "polynucleotide" and "nucleic acid" are used interchangeably herein and refer to a polymeric form of nucleotides (ribonucleotides or deoxyribonucleotides) of any length. Thus, the terms include, but are not limited to, single-stranded, double-stranded or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, and polymers comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural or derivatized nucleotide bases. The terms "polynucleotide" and "nucleic acid" should be understood to include single-stranded (such as sense or antisense) and double-stranded polynucleotides, as applicable to the embodiment being described.

The terms "polypeptide", "peptide" and "protein" are used interchangeably herein and refer to a polymeric form of amino acids of any length, which may include genetically encoded and non-genetically encoded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones. The terms include fusion proteins, including but not limited to fusion proteins with heterologous amino acid sequences and fusions with heterologous and homologous leader sequences, with or without N-terminal methionine residues; immunologically tagged proteins; and the like.

The term "isolated" describes a polynucleotide, polypeptide or cell that is in an environment different from the one in which the polynucleotide, polypeptide or cell naturally occurs. An isolated genetically modified host cell may be present in a mixed population of genetically modified host cells.

The term "exogenous nucleic acid" refers to a nucleic acid that is not normally or naturally present in nature and/or is not produced by a given bacterium, organism or cell. As used herein, the term "endogenous nucleic acid" refers to a nucleic acid that is normally present in nature and/or is produced by a given bacterium, organism or cell. An "endogenous nucleic acid" is also referred to as a "native nucleic acid", or a nucleic acid that is "native" to a given bacterium, organism or cell.

The term "recombinant", as applied to a particular nucleic acid (DNA or RNA), means that the nucleic acid is the product of various combinations of cloning, restriction and/or ligation steps that produce a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems. In general, a DNA sequence encoding a structural coding sequence can be assembled from cDNA fragments and short oligonucleotide linkers, or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid that can be expressed from a recombinant transcription unit contained in a cell or in a cell-free transcription and translation system. Such sequences can be provided in the form of an open reading frame uninterrupted by the internal non-translated sequences, or introns, that are typically present in eukaryotic genes. Genomic DNA comprising the relevant sequences can also be used to form a recombinant gene or transcription unit. Non-translated DNA sequences may be present 5' or 3' of the open reading frame, where such sequences do not interfere with manipulation or expression of the coding region, and may in fact act to modulate production of the desired product by various mechanisms.

Thus, for example, the term "recombinant" polynucleotide or "recombinant" nucleic acid refers to a non-naturally occurring polynucleotide or nucleic acid, e.g., one made through human intervention by the artificial combination of two otherwise separated segments of sequence. This artificial combination is often accomplished by chemical synthesis or by artificial manipulation of isolated segments of nucleic acids (e.g., by genetic engineering techniques). Such manipulation is usually done to replace a codon with a redundant codon encoding the same or a conservative amino acid, typically while introducing or removing a sequence recognition site. Alternatively, it is performed to join together nucleic acid segments with desired functions to produce a desired combination of functions.
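The codon-replacement manipulation mentioned above can be sketched as a greedy search: a codon is swapped for a synonymous one to remove a sequence recognition site while leaving the encoded protein unchanged. This sketch rests on several assumptions. The codon table is deliberately partial, covering only the codons used in the example. The search accepts the first single-codon change that eliminates the site. EcoRI (GAATTC) appears only as a familiar example recognition site.

```python
from typing import List, Optional

# Deliberately partial standard codon table (only codons used in the example).
CODON_TO_AA = {
    "ATG": "M",
    "GAA": "E", "GAG": "E",
    "TTC": "F", "TTT": "F",
    "AAA": "K", "AAG": "K",
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence using the partial table above."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def synonymous_codons(codon: str) -> List[str]:
    """Other codons in the table that encode the same amino acid."""
    aa = CODON_TO_AA.get(codon)
    if aa is None:
        return []
    return [c for c, a in CODON_TO_AA.items() if a == aa and c != codon]


def remove_site_synonymously(cds: str, site: str) -> Optional[str]:
    """Return a synonymous variant of cds lacking `site`.

    Returns cds unchanged if the site is already absent, or None if no
    single-codon swap drawn from the table removes the site.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    cds = cds.upper()
    if site not in cds:
        return cds
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for idx, codon in enumerate(codons):
        for alt in synonymous_codons(codon):
            trial = "".join(codons[:idx] + [alt] + codons[idx + 1:])
            if site not in trial:
                return trial
    return None


if __name__ == "__main__":
    original = "ATGGAATTCAAA"  # Met-Glu-Phe-Lys, contains GAATTC
    variant = remove_site_synonymously(original, "GAATTC")
    print(variant, translate(original), translate(variant))
```

In the example, changing the Glu codon GAA to GAG breaks the GAATTC site, and the encoded peptide stays MEFK. A full implementation would use the complete codon table. It might also prefer codons suited to the expression host.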

Similarly, the term "recombinant" polypeptide refers to a polypeptide that does not occur in nature, for example one made through human intervention by artificially combining two otherwise separated segments of amino acid sequence. Thus, for example, a polypeptide comprising a heterologous amino acid sequence is recombinant.

The term "operably linked" refers to a juxtaposition in which the components are in a relationship that allows them to function in their intended manner. For example, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of that coding sequence. As used herein, the terms "heterologous promoter" and "heterologous control region" refer to promoters and other control regions that are not normally associated with a particular nucleic acid in nature. For example, a "transcriptional control region heterologous to a coding region" is a transcriptional control region that is not normally associated with that coding region in nature.

The term "vector" refers to a nucleic acid molecule capable of transporting another nucleic acid to which it is linked. It is a replicon, such as a plasmid, phage, or cosmid, into which another DNA segment can be inserted so as to bring about replication of the inserted segment. Generally, a vector is capable of replication when associated with the proper control elements. In some cases, a vector system comprises a single vector; alternatively, it comprises a plurality of vectors. The vector can be a viral vector.

Vectors include, but are not limited to, single-stranded, double-stranded, or partially double-stranded nucleic acid molecules; nucleic acid molecules having one or more free ends, or no free ends (e.g., circular); nucleic acid molecules comprising DNA, RNA, or both; and other polynucleotide variants known in the art. One type of vector is a "plasmid", a circular double-stranded DNA loop into which additional DNA segments can be inserted, e.g., by standard molecular cloning techniques. Another type of vector is a viral vector, in which virally derived DNA or RNA sequences are present for packaging into a virus (e.g., a retrovirus, a replication-defective retrovirus, an adenovirus, a replication-defective adenovirus, or an adeno-associated virus). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Certain vectors are capable of autonomous replication in the host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication, and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of the host cell upon introduction and are thereby replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operably linked; such vectors are referred to herein as "expression vectors". Vectors that are expressed in eukaryotic cells, or that result in expression in eukaryotic cells, may be referred to herein as "eukaryotic expression vectors". Common expression vectors of utility in recombinant DNA techniques are often in the form of plasmids.

A recombinant expression vector can comprise a nucleic acid of the present disclosure in a form suitable for expression of that nucleic acid in a host cell. This means that the recombinant expression vector includes one or more regulatory elements, selected on the basis of the host cell to be used for expression, that are operably linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, "operably linked" means that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system, or in a host cell when the vector is introduced into the host cell). Advantageous vectors include lentiviruses and adeno-associated viruses, and the type of vector can also be selected to target particular types of cells.

The term "host cell" refers to an in vivo or in vitro eukaryotic cell, prokaryotic cell, or cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, which can be, or has been, used as a recipient for a nucleic acid (e.g., an expression vector), and includes the progeny of the original cell that has been genetically modified by the nucleic acid. It is understood that, owing to natural, accidental, or deliberate mutation, the progeny of a single cell may not be completely identical to the original parent in morphology or in genomic or total DNA complement. A "recombinant host cell" (also referred to as a "genetically modified host cell") is a host cell into which a heterologous nucleic acid (e.g., an expression vector) has been introduced. For example, a subject prokaryotic host cell is a prokaryotic host cell (e.g., a bacterium) that has been genetically modified by introducing into a suitable prokaryotic host cell a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to the prokaryotic host cell (not normally found in nature) or a recombinant nucleic acid not normally found in a prokaryotic host cell; and a subject eukaryotic host cell is a eukaryotic host cell that has been genetically modified by introducing into a suitable eukaryotic host cell a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to the eukaryotic host cell or a recombinant nucleic acid not normally found in a eukaryotic host cell.

The term "amino acid" refers to the twenty common naturally occurring amino acids: alanine (Ala, A), arginine (Arg, R), asparagine (Asn, N), aspartic acid (Asp, D), cysteine (Cys, C), glutamic acid (Glu, E), glutamine (Gln, Q), glycine (Gly, G), histidine (His, H), isoleucine (Ile, I), leucine (Leu, L), lysine (Lys, K), methionine (Met, M), phenylalanine (Phe, F), proline (Pro, P), serine (Ser, S), threonine (Thr, T), tryptophan (Trp, W), tyrosine (Tyr, Y), and valine (Val, V).
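As an illustration only, the twenty amino acids listed above can be captured in a small lookup table, for example to convert three-letter codes to one-letter codes or to check that a protein sequence uses only these standard residues. The names `AMINO_ACIDS`, `three_to_one`, and `is_standard_protein` are hypothetical helpers introduced for this sketch, not part of the disclosure.

```python
# The twenty common naturally occurring amino acids listed above:
# full name -> (three-letter code, one-letter code).
AMINO_ACIDS = {
    "alanine": ("Ala", "A"),
    "arginine": ("Arg", "R"),
    "asparagine": ("Asn", "N"),
    "aspartic acid": ("Asp", "D"),
    "cysteine": ("Cys", "C"),
    "glutamic acid": ("Glu", "E"),
    "glutamine": ("Gln", "Q"),
    "glycine": ("Gly", "G"),
    "histidine": ("His", "H"),
    "isoleucine": ("Ile", "I"),
    "leucine": ("Leu", "L"),
    "lysine": ("Lys", "K"),
    "methionine": ("Met", "M"),
    "phenylalanine": ("Phe", "F"),
    "proline": ("Pro", "P"),
    "serine": ("Ser", "S"),
    "threonine": ("Thr", "T"),
    "tryptophan": ("Trp", "W"),
    "tyrosine": ("Tyr", "Y"),
    "valine": ("Val", "V"),
}

# Derived lookup: three-letter code -> one-letter code.
THREE_TO_ONE = {three: one for three, one in AMINO_ACIDS.values()}
STANDARD_RESIDUES = set(THREE_TO_ONE.values())


def three_to_one(codes: list[str]) -> str:
    """Convert three-letter codes (case-insensitive, e.g. ["MET", "Lys"]) to a one-letter string."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in codes)


def is_standard_protein(seq: str) -> bool:
    """True if seq is non-empty and every character is one of the twenty standard one-letter codes."""
    return len(seq) > 0 and all(residue in STANDARD_RESIDUES for residue in seq.upper())


if __name__ == "__main__":
    print(three_to_one(["Met", "Lys", "Thr"]))  # MKT
    print(is_standard_protein("MKTAYIAKQR"))     # True
    print(is_standard_protein("MKXB"))           # False (X and B are not among the twenty)
```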

"Sequence identity" between two polypeptide or nucleic acid sequences is the number of identical residues between the sequences expressed as a percentage of the total number of residues, where the way the total number of residues is calculated depends on the mutation type. Mutation types include insertions (extensions) at either or both ends of a sequence, deletions (truncations) at either or both ends of a sequence, substitutions of one or more amino acids/nucleotides, internal insertions, and internal deletions. Taking polypeptides as an example (nucleotide sequences are treated in the same way): if the mutations are limited to one or more of substitutions, internal insertions, and internal deletions, the total number of residues is the length of the larger of the molecules being compared. If the mutations also include insertions (extensions) or deletions (truncations) at either or both ends, the residues inserted or deleted at the ends (e.g., where fewer than 20 residues in total are inserted or deleted at the two ends) are not counted in the total number of residues. When calculating percent identity, the sequences being compared are aligned so as to give the maximum match between them, and any gaps in the alignment are resolved by a particular algorithm.
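A minimal sketch of the counting rule above, assuming the two sequences have already been aligned by some separate algorithm (gaps written as `-`). Columns before the first, or after the last, column in which both sequences carry a residue are treated as terminal extensions/truncations and excluded from the denominator; inside that core the denominator is the residue count of the longer molecule. The function name `percent_identity` is hypothetical, and the sketch does not enforce the "fewer than 20" example limit on terminal overhangs.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences, ignoring terminal overhangs."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    # Alignment columns in which both sequences carry a residue.
    paired = [i for i, (x, y) in enumerate(zip(aligned_a, aligned_b))
              if x != gap and y != gap]
    if not paired:
        return 0.0

    # Trim terminal extensions/truncations: keep only the span between the
    # first and last paired columns (internal gaps inside it are kept).
    start, end = paired[0], paired[-1] + 1
    core_a, core_b = aligned_a[start:end], aligned_b[start:end]

    # Identical residues: same character in both rows, not a gap.
    identical = sum(1 for x, y in zip(core_a, core_b) if x == y and x != gap)

    # Denominator: residue count of the larger molecule within the core.
    total = max(len(core_a.replace(gap, "")), len(core_b.replace(gap, "")))
    return 100.0 * identical / total


if __name__ == "__main__":
    print(percent_identity("MKTAYIAKQR", "MKTAYLAKQR"))      # one substitution: 90.0
    print(percent_identity("GGMKTAYIAKQR", "--MKTAYLAKQR"))  # plus N-terminal extension: 90.0
    print(percent_identity("MKTAYIAKQR", "MKT-YIAKQR"))      # internal deletion: 90.0
```

The terminal extension in the second example leaves the result unchanged, whereas the internal deletion in the third is still charged against the longer molecule (9 identical residues out of 10).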

The term "conservative amino acid substitution" refers to replacing an amino acid with a chemically or functionally similar amino acid. Conservative substitution tables listing similar amino acids are well known in the art. For example, in some embodiments, the groups of amino acids provided below are considered conservative substitutions for one another. In some embodiments, the selected groups of amino acids considered conservative substitutions for one another are:

Table A

In some embodiments, the selected groups of amino acids considered conservative substitutions for one another are:

Table B

In one embodiment, further selected groups of amino acids considered conservative substitutions for one another are as follows (see, e.g., Creighton, Proteins (1984)):

Table C

In some embodiments, further selected groups of amino acids considered conservative substitutions for one another are:

Table D

The terms "treatment" and "treating" refer to obtaining a desired pharmacological and/or physiological effect. The effect may be prophylactic, in terms of completely or partially preventing a disease or its symptoms, and/or therapeutic, in terms of partially or completely curing a disease and/or adverse effects attributable to the disease. As used herein, "treatment" covers any treatment of a disease in a mammal (e.g., a human) and includes: (1) preventing the disease from occurring in a subject who may be predisposed to the disease but has not yet been diagnosed with it; (2) inhibiting the disease, i.e., arresting its development; and (3) relieving the disease, i.e., causing regression of the disease.

The terms "individual", "subject", "host", and "patient" are used interchangeably herein to refer to an individual organism, such as a mammal, including but not limited to murines, apes, humans, mammalian farm animals, mammalian sport animals, and mammalian pets.

Overview

The present disclosure provides RNA-guided endonuclease polypeptides, referred to herein as "Cas12o" polypeptides (also referred to as "Cas12o proteins"); nucleic acids encoding Cas12o proteins; and modified host cells comprising Cas12o proteins and/or nucleic acids encoding them. Cas12o proteins can be used in the various applications provided herein; compared with other Cas proteins (e.g., Cas9 or Cas12), they are smaller and easier to deliver (e.g., by means including AAV or LNP).

The present disclosure provides guide RNAs that bind to a Cas12o protein and confer sequence specificity on it (referred to herein as "Cas12o guide RNAs", "guide RNAs", "crRNAs", or "gRNAs"); nucleic acids encoding Cas12o guide RNAs; and modified host cells comprising Cas12o guide RNAs and/or nucleic acids encoding them. Cas12o guide RNAs can be used in the various applications provided herein.

Cas12o protein

The term "Cas12o protein" encompasses wild-type Cas12o proteins, derivatives and variants thereof, and functional fragments thereof, such as oligonucleotide-binding fragments.

In some embodiments, the Cas12o protein comprises an OBD domain, a REC domain, a RuvC domain, a Helical domain, and a Nuc domain.

In some embodiments, the RuvC domain comprises a RuvC-I domain, a RuvC-II domain, and a RuvC-III domain.

In some embodiments, the Cas12o protein does not comprise an HNH domain or a PI domain.

In some embodiments, the Cas protein is a class 2, type V Cas endonuclease.

In some embodiments, the OBD domain is a bi-split domain comprising discontinuous OBD-I and OBD-II domains.

In some exemplary embodiments, the Cas12o protein is between 500 and 1200 amino acids in size, for example between 500 and 1100, between 700 and 1100, or between 900 and 1000 amino acids; the variation in size may depend in part on the specific domain architecture of the Cas12o protein or its homolog.

The Cas12o protein may be derived from a naturally occurring protein, a modified naturally occurring protein, a functional fragment or truncated version thereof, or a non-naturally occurring protein. In one embodiment, the Cas12o protein may comprise one or more domains derived from other Cas12o nucleases, more particularly from Cas12o nucleases of different organisms. In one embodiment, the Cas12o nuclease may be designed in silico. Computational protein design has been described in the art and is therefore known to the skilled person. In a particular embodiment, the Cas12o protein locus is not associated with a CRISPR array.

The Cas12o protein may also encompass homologs or orthologs of the Cas12o proteins whose sequences are specifically described herein. Orthologous proteins may, but need not, be structurally related, or may be only partially structurally related. In one embodiment, a homolog or ortholog of a Cas12o protein as referred to herein has at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence homology or identity to the Cas12o nuclease. In a further embodiment, a homolog or ortholog of the Cas12o nuclease has at least 80%, at least 85%, at least 90%, or at least 95% sequence identity to the wild-type Cas12o nuclease.

In some embodiments, the Cas protein comprises a sequence having at least 70%, at least 75%, at least 80%, or at least 90% sequence identity to any one of SEQ ID NOs: 1, 3, 5, and 7-9. Exemplarily, the Cas12o protein comprises the amino acid sequence shown in any one of SEQ ID NOs: 1, 3, 5, and 7-9, where SEQ ID NOs: 7-9 are functional fragments of Cas12o.

Cas12o variants

The Cas12o protein may comprise one or more modifications. The term "modified" generally refers to a variant Cas12o nuclease (Cas12o variant) having one or more modifications or mutations (including point mutations, truncations, insertions, deletions, chimeras, fusion proteins, etc.) relative to the wild-type counterpart from which it is derived. "Derived" means that the derivative enzyme is largely based on the wild-type enzyme, in the sense of having a high degree of sequence homology to it, but has been mutated (modified) in some manner known in the art or described herein.

In some embodiments, derivatives of the Cas12o protein include a Cas12o protein substantially lacking catalytic activity (dead Cas12o, dCas12o) and a Cas12o nickase capable of single-strand cleavage (nickase Cas12o, nCas12o).

Compared with its wild-type counterpart nuclease, dCas12o may have reduced nuclease activity or no nuclease activity, retaining less than 50% (e.g., less than about 40%, 35%, 30%, 27.5%, 25%, 22.5%, 20%, 17.5%, 15%, 12.5%, 10%, 7.5%, 5%, 4%, 3%, 2.5%, 2%, 1%, or less) of the activity of the corresponding parent Cas12o protein (e.g., a novel Cas12o protein comprising any one of the amino acid sequences of SEQ ID NOs: 1, 3, 5, and 7-9) or a variant thereof. One example is a mutant form whose nucleic acid cleavage activity is zero or negligible compared with the non-mutant form. Cas12o proteins can be identified by reference to the general class of enzymes sharing homology with the largest nuclease having multiple nuclease domains from a type I, type II, type III, type IV, type V, or type VI CRISPR system.
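The "less than 50%" criterion above is a simple ratio of variant activity to parent activity. The sketch below illustrates that arithmetic under stated assumptions: activity is supplied as any non-negative measurement on the same scale for both proteins (for example, a cleavage rate from one assay), and the helper names are hypothetical rather than taken from the disclosure.

```python
def residual_activity_percent(variant_rate: float, parent_rate: float) -> float:
    """Variant nuclease activity as a percentage of the parent Cas12o activity."""
    if parent_rate <= 0:
        raise ValueError("parent activity must be positive")
    return 100.0 * variant_rate / parent_rate


def meets_reduced_activity_threshold(variant_rate: float, parent_rate: float,
                                     threshold_percent: float = 50.0) -> bool:
    """True if the variant retains less than threshold_percent of the parent's activity."""
    return residual_activity_percent(variant_rate, parent_rate) < threshold_percent


if __name__ == "__main__":
    # Hypothetical assay values on an arbitrary, shared scale.
    print(residual_activity_percent(4.0, 80.0))          # 5.0
    print(meets_reduced_activity_threshold(4.0, 80.0))   # True  (5% < 50%)
    print(meets_reduced_activity_threshold(48.0, 80.0))  # False (60% >= 50%)
```

Passing a stricter `threshold_percent` (e.g., 1.0) corresponds to the narrower ranges listed in the same paragraph.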

In some cases, a catalytically inactive or dead nuclease may have nickase (nCas) activity. In other cases, a catalytically inactive or dead nuclease may lack nickase activity. Such a catalytically inactive or dead nuclease may not create double- or single-strand breaks in the target polynucleotide, but may still bind to, or otherwise form a complex with, the target polynucleotide.

As described herein, a Cas12o nickase capable of single-strand cleavage (nickase Cas12o, nCas12o) is obtained by modifying the Cas12o protein: one or more amino acid mutations are introduced into the Cas12o protein so that it acquires nickase activity, cleaving only one strand of double-stranded DNA.

In one embodiment, a modification of the Cas12o protein may or may not result in a functional change. Modifications that do not result in a functional change include, for example, codon optimization for expression in a particular host and the addition of a specific marker to the nuclease (e.g., for visualization). Modifications that may result in a functional change include mutations (point mutations, insertions, deletions, truncations (including split nucleases), etc.), chimeric nucleases (e.g., comprising domains from different orthologs or homologs), and fusion proteins. A chimeric enzyme may comprise a first fragment and a second fragment, each of which may be a fragment of a Cas12o nuclease ortholog from an organism of a given genus or species; for example, the fragments may come from Cas12o nuclease orthologs of different species.

In one embodiment, a nuclease domain of the Cas12o protein is catalytically inactive, is modified to be catalytically inactive, or is modified to act as a nickase. In one embodiment, both nuclease domains are catalytically inactive.

In one embodiment, the Cas12o nuclease may comprise one or more modifications that result in enhanced activity and/or specificity, such as mutated residues that stabilize the targeted or non-targeted strand. In one embodiment, the altered or modified activity of the engineered Cas12o protein comprises increased targeting efficiency or decreased off-target binding. In one embodiment, the altered activity of the engineered Cas12o nuclease comprises modified cleavage activity. In one embodiment, the altered activity comprises increased cleavage activity at the target polynucleotide locus. In one embodiment, the altered activity comprises decreased cleavage activity at the target polynucleotide locus. In one embodiment, the altered activity comprises decreased cleavage activity at an off-target polynucleotide locus. In one embodiment, the altered or modified activity of the modified nuclease comprises altered helicase kinetics. In one embodiment, the modified nuclease comprises a modification that alters the association of the protein with a nucleic acid molecule comprising RNA, with a strand of the target polynucleotide locus, or with a strand of an off-target polynucleotide. In one aspect of the present disclosure, the engineered Cas12o nuclease comprises a modification that alters formation of the Cas12o nuclease and its associated complex. In one embodiment, the altered activity comprises increased cleavage activity at an off-target polynucleotide locus. Thus, in one embodiment, specificity for the target polynucleotide locus relative to off-target polynucleotide loci is increased; in other embodiments, it is decreased. In one embodiment, a mutation reduces off-target effects (e.g., cleavage or binding properties, activity, or kinetics), for example by reducing tolerance for mismatches between the target and the crRNA. Other mutations may increase off-target effects (e.g., cleavage or binding properties, activity, or kinetics), and still others may increase or decrease on-target effects (e.g., cleavage or binding properties, activity, or kinetics). In one embodiment, a mutation alters (e.g., increases or decreases) the helicase activity, association, or formation of the functional nuclease complex. In one embodiment, a mutation alters PAM recognition, i.e., compared with the unmodified Cas12o nuclease, the mutant may (additionally or alternatively) recognize a different PAM.

In one embodiment, the Cas12o protein can direct cleavage of one or both DNA strands at or near the position of the target sequence, such as within the target sequence and/or within its complement, or at a sequence associated with the target sequence. In one embodiment, the Cas12o protein can direct cleavage of one or both DNA strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 300, 400, 500, or more base pairs or nucleotides of the first or last nucleotide of the target sequence. In one embodiment, Cas12o cleaves both DNA strands at about 12-19 nucleotides from the first nucleotide of the target sequence. In one embodiment, the cleavage can be staggered, i.e., can generate sticky ends. In one embodiment, the cleavage is a staggered cut with a 5' overhang. In one embodiment, the cleavage is a staggered cut with a 5' overhang of 1 to 15 nucleotides, preferably 4 or 9 nucleotides.
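Whether a pair of single-strand cuts leaves a 5' overhang, a 3' overhang, or a blunt end follows from where each strand is cut. The sketch below assumes a hypothetical coordinate convention: both cut positions are given in top-strand coordinates (top strand written 5' to 3', left to right), and a cut "after p" breaks the backbone between positions p and p + 1. The function name and the example positions are illustrative only and are not taken from the disclosure.

```python
def cut_end_type(top_cut_after: int, bottom_cut_after: int) -> tuple[str, int]:
    """Classify the ends produced by one cut on each strand of a DNA duplex.

    Returns (end type, overhang length in nucleotides).
    """
    offset = bottom_cut_after - top_cut_after
    if offset == 0:
        return ("blunt", 0)
    if offset > 0:
        # The bottom strand (3'->5' left to right) is cut further to the right,
        # so each fragment ends in a single-stranded 5' tail of `offset` nt.
        return ("5' overhang", offset)
    # The top strand is cut further to the right: single-stranded 3' tails.
    return ("3' overhang", -offset)


if __name__ == "__main__":
    print(cut_end_type(14, 18))  # ("5' overhang", 4)
    print(cut_end_type(10, 19))  # ("5' overhang", 9)
    print(cut_end_type(20, 20))  # ("blunt", 0)
    print(cut_end_type(18, 14))  # ("3' overhang", 4)
```

Under this convention, the 4- or 9-nucleotide 5' overhangs described above correspond to the bottom-strand cut lying 4 or 9 positions to the right of the top-strand cut.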

In one embodiment, the cleavage site is distal to the target-adjacent motif (TAM), a term used interchangeably herein with "PAM"; e.g., cleavage occurs after the nth nucleotide on the non-target strand and after a nucleotide on the targeted strand. In one embodiment, the cleavage site lies after an identified nucleotide on the non-target strand (counted from the PAM) and after a further identified nucleotide on the targeted strand (counted from the PAM). In one embodiment, a vector encodes a nucleic acid-targeting effector protein that may be mutated relative to the corresponding wild-type enzyme such that the mutated effector protein lacks the ability to cleave one or both DNA and RNA strands of a target polynucleotide containing the target sequence.

The reference Cas12o protein of the present disclosure comprises, in N-terminal to C-terminal order, a REC-I domain, a REC-II domain, an OBD domain, a RuvC-I domain, a Helical domain, a RuvC-II domain, a Nuc-I domain, a RuvC-III domain, and a Nuc-II domain (FIG. 2A). In some cases, the oligonucleotide-binding domain (OBD) is a bi-split domain comprising discontinuous OBD-I and OBD-II domains. In that case, OBD-I is located at the N-terminus of the Cas12o protein, and Cas12o comprises, in N-terminal to C-terminal order, an OBD-I domain, a REC-I domain, a REC-II domain, an OBD-II domain, a RuvC-I domain, a Helical domain, a RuvC-II domain, a Nuc-I domain, a RuvC-III domain, and a Nuc-II domain (FIG. 2C).

OBD domain (oligonucleotide-binding domain)

The reference Cas12o protein of the present disclosure comprises an oligonucleotide-binding domain (OBD). Certain Cas proteins other than Cas12o have domains that may be named similarly. However, in some embodiments, the OBD comprises one or more unique functional features, a sequence unique to the Cas12o protein, or a combination thereof.

In one embodiment, the OBD domain comprises an OBD-I domain and an OBD-II domain, arranged as shown in FIG. 2C. Exemplarily, the OBD-I domain comprises amino acids 1-15 of SEQ ID NO: 5, and the OBD-II domain comprises amino acids 333-480 of SEQ ID NO: 5.

In one embodiment, the OBD domain consists of a single domain, arranged, for example, as shown in FIG. 2A. In this case, exemplary OBD domains are amino acids 359-474 of SEQ ID NO: 1 (OBD-II) or amino acids 337-452 of SEQ ID NO: 3 (OBD-II).

RuvC domain

The reference Cas12o protein of the present disclosure comprises a RuvC domain, which is a tri-split RuvC domain made up of three discontinuous RuvC domains (RuvC-I, RuvC-II, and RuvC-III). The RuvC domain is the ancestral domain of all Cas12 CRISPR proteins and is derived from TnpB (transposase B)-like transposases. Like other RuvC domains, the Cas12o RuvC domain has a DED catalytic triad responsible for coordinating magnesium (Mg) ions and cleaving DNA.

Exemplarily, the RuvC-I domain comprises amino acids 475-563 of SEQ ID NO: 1, amino acids 453-536 of SEQ ID NO: 3, or amino acids 481-560 of SEQ ID NO: 5; the RuvC-II domain comprises amino acids 766-818 of SEQ ID NO: 1, amino acids 732-784 of SEQ ID NO: 3, or amino acids 758-812 of SEQ ID NO: 5; and the RuvC-III domain comprises amino acids 854-869 of SEQ ID NO: 1, amino acids 817-832 of SEQ ID NO: 3, or amino acids 845-860 of SEQ ID NO: 5.

REC domain

The REC (recognition) region comprises at least one REC domain (e.g., a REC-I domain and, optionally, a REC-II domain). The REC domains are thought to interact with the repeat:anti-repeat duplex of the crRNA and to mediate formation of the Cas protein/crRNA complex.

The reference Cas12o proteins disclosed herein comprise a REC domain comprising, from N-terminus to C-terminus, a first REC domain (REC-I) and a second REC domain (REC-II). Exemplarily, the REC-I domain comprises amino acids 1-165 of SEQ ID NO: 1, 1-183 of SEQ ID NO: 3, or 16-196 of SEQ ID NO: 5, and the REC-II domain comprises amino acids 166-358 of SEQ ID NO: 1, 184-336 of SEQ ID NO: 3, or 197-332 of SEQ ID NO: 5.

Helical domain

The reference Cas12o proteins disclosed herein comprise a Helical domain. Exemplarily, the Helical domain comprises amino acids 564-765 of SEQ ID NO: 1, 537-731 of SEQ ID NO: 3, or 561-757 of SEQ ID NO: 5.

Nuc domain

The Nuc domain is thought to be involved in cleavage of the target strand (Yamano et al., Cell 2016, 165: 949-962). Mutational studies in other Cas12 proteins have shown that the Nuc domain also contributes to guide and target binding (Swarts et al., Mol Cell 2017, 66: 221-233). The reference Cas12o proteins disclosed herein comprise a Nuc domain made up of a Nuc-I domain and a Nuc-II domain. Exemplarily, the Nuc-I domain comprises amino acids 819-853 of SEQ ID NO: 1, 785-816 of SEQ ID NO: 3, or 813-844 of SEQ ID NO: 5, and the Nuc-II domain comprises amino acids 870-984 of SEQ ID NO: 1, 833-954 of SEQ ID NO: 3, or 861-966 of SEQ ID NO: 5.

Exemplary domains of the Cas12o proteins disclosed herein and their amino acid positions are shown in Table 1:


Table 1
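Table 1 itself is not reproduced in this text. As an illustrative sketch (not the applicant's data format), the Python snippet below collects the domain boundaries stated in the preceding paragraphs into one map and checks how they tile each sequence. The dictionary layout and the helper names `gaps` and `split_ruvc` are assumptions of this sketch; the OBD boundaries of SEQ ID NO: 5 are not given in this section and are therefore left out rather than guessed.

```python
# Domain boundaries (1-based, inclusive) collected from the text above.
# The OBD boundaries of SEQ ID NO:5 are not stated in this section, so none are listed.
DOMAINS = {
    "SEQ ID NO:1": [("REC-I", 1, 165), ("REC-II", 166, 358), ("OBD-II", 359, 474),
                    ("RuvC-I", 475, 563), ("Helical", 564, 765), ("RuvC-II", 766, 818),
                    ("Nuc-I", 819, 853), ("RuvC-III", 854, 869), ("Nuc-II", 870, 984)],
    "SEQ ID NO:3": [("REC-I", 1, 183), ("REC-II", 184, 336), ("OBD-II", 337, 452),
                    ("RuvC-I", 453, 536), ("Helical", 537, 731), ("RuvC-II", 732, 784),
                    ("Nuc-I", 785, 816), ("RuvC-III", 817, 832), ("Nuc-II", 833, 954)],
    "SEQ ID NO:5": [("REC-I", 16, 196), ("REC-II", 197, 332),
                    ("RuvC-I", 481, 560), ("Helical", 561, 757), ("RuvC-II", 758, 812),
                    ("Nuc-I", 813, 844), ("RuvC-III", 845, 860), ("Nuc-II", 861, 966)],
}


def domain_length(start: int, end: int) -> int:
    """Number of residues in an inclusive 1-based range."""
    return end - start + 1


def gaps(domains):
    """Unannotated ranges lying between consecutive annotated domains."""
    found = []
    for (_, _, prev_end), (_, start, _) in zip(domains, domains[1:]):
        if start > prev_end + 1:
            found.append((prev_end + 1, start - 1))
    return found


def split_ruvc(domains):
    """The three RuvC segments in N-to-C order (the tripartite RuvC domain)."""
    return [(name, s, e) for name, s, e in domains if name.startswith("RuvC")]


for seq_id, doms in DOMAINS.items():
    print(seq_id, "gaps:", gaps(doms), "RuvC segments:", split_ruvc(doms))
```

For SEQ ID NO: 1 and SEQ ID NO: 3 the listed domains abut with no gaps; for SEQ ID NO: 5 the only internal gap is residues 333-480, the region between REC-II and RuvC-I for which no boundaries are stated here. In all three sequences the RuvC-I and RuvC-II segments are separated by the Helical domain, and RuvC-II and RuvC-III by the Nuc-I domain.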

Guide RNA (crRNA, sgRNA)

As used herein, the term "guide RNA" is used interchangeably with guide molecule, gRNA, crRNA and the like. It refers to a nucleic acid-based molecule, including but not limited to an RNA-based molecule, that comprises a portion capable of forming a complex with a CRISPR-Cas protein (e.g., a direct repeat (DR) sequence) and a targeting sequence (e.g., a spacer sequence) that is sufficiently complementary to a target nucleic acid sequence to hybridize with it and direct sequence-specific binding of the complex to the target nucleic acid sequence.

In some embodiments, the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 60% or higher (e.g., at least 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99%, or 100%). In some cases, this percent complementarity is 80% or higher (e.g., at least 85%, 90%, 95%, 97%, 98% or 99%, or 100%). In some cases, it is 90% or higher (e.g., at least 95%, 97%, 98% or 99%, or 100%). In some cases, it is 100%.

In some embodiments, the complementarity between the targeting sequence and the target site of the target nucleic acid is 100% over the seven contiguous nucleotides at the 3'-most end of the target site.
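The percent complementarity and the 3'-terminal seven-nucleotide condition can be made concrete with a short Python sketch. It assumes that both the spacer (RNA) and the target site (the DNA strand the spacer hybridizes to) are written 5'->3', that the two have equal length, and that only Watson-Crick pairs count; the helper names and the 20-nt target site are hypothetical.

```python
# Watson-Crick pairs between an RNA spacer base (first) and a DNA target-strand base (second).
# G-U/G-T wobble pairs are deliberately not counted (an assumption of this sketch).
WC_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def pairing(spacer: str, target_strand: str) -> list[bool]:
    """Per-position pairing in spacer 5'->3' order.

    Both sequences are written 5'->3'. The duplex is antiparallel, so spacer
    position i faces target-strand position n-1-i.
    """
    if len(spacer) != len(target_strand):
        raise ValueError("spacer and target site must have the same length")
    return [(s, t) in WC_PAIRS
            for s, t in zip(spacer.upper(), reversed(target_strand.upper()))]


def percent_complementarity(spacer: str, target_strand: str) -> float:
    p = pairing(spacer, target_strand)
    return 100.0 * sum(p) / len(p)


def three_prime_end_fully_paired(spacer: str, target_strand: str, k: int = 7) -> bool:
    """True if the k 3'-most nucleotides of the target site are all paired.

    Those nucleotides face the k 5'-most spacer nucleotides, i.e. the first
    k entries returned by pairing().
    """
    return all(pairing(spacer, target_strand)[:k])


def fully_complementary_spacer(target_strand: str) -> str:
    """RNA spacer that pairs with every position of the target strand."""
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(target_strand.upper()))


target = "ACGTACGTACGTACGTACGT"           # hypothetical 20-nt target site
spacer = fully_complementary_spacer(target)
mm_5p = "G" + spacer[1:]                  # mismatch facing the 3'-most target nucleotide
mm_3p = spacer[:-1] + "C"                 # mismatch facing the 5'-most target nucleotide

for name, s in [("perfect", spacer), ("5' mismatch", mm_5p), ("3' mismatch", mm_3p)]:
    print(name, percent_complementarity(s, target), three_prime_end_fully_paired(s, target))
```

Both single mismatches give 95% overall complementarity, but only the one at the spacer's 5' end breaks the 100% requirement over the seven 3'-most target-site nucleotides.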

In some embodiments, the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 60% or higher (e.g., at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99%, or 100%) over 17 or more contiguous nucleotides (e.g., 18, 19, 20, 21, 22 or more). In some cases, this percent complementarity is 80% or higher (e.g., at least 85%, 90%, 95%, 97%, 98% or 99%, or 100%) over 17 or more contiguous nucleotides (e.g., 18, 19, 20, 21, 22 or more). In some cases, it is 90% or higher (e.g., at least 95%, 97%, 98% or 99%, or 100%) over 17 or more contiguous nucleotides (e.g., 18, 19, 20, 21, 22 or more). In some cases, it is 100% over 17 or more contiguous nucleotides (e.g., 18, 19, 20, 21, 22 or more).

In some cases, the percent complementarity between the targeting sequence and the target site of the target nucleic acid is 60% or higher (e.g., at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99%, or 100%) over 19-25 contiguous nucleotides. In some cases, it is 80% or higher (e.g., at least 85%, 90%, 95%, 97%, 98% or 99%, or 100%) over 19-25 contiguous nucleotides. In some cases, it is 90% or higher (e.g., at least 95%, 97%, 98% or 99%, or 100%) over 19-25 contiguous nucleotides. In some cases, it is 100% over 19-25 contiguous nucleotides.
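Complementarity "over N contiguous nucleotides" can be read as the best score over any window of N adjacent positions. The Python sketch below implements that reading, which is an interpretation rather than a definition taken from the text. It assumes both sequences are written 5'->3', pair antiparallel, and that only Watson-Crick pairs count; the 23-nt target site and its single mismatch are hypothetical.

```python
WC_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def best_window_complementarity(spacer: str, target_strand: str, window: int) -> float:
    """Highest percent complementarity over any `window` contiguous positions.

    Both sequences are 5'->3' and antiparallel, so spacer[i] faces target_strand[-1 - i].
    """
    n = len(spacer)
    if n != len(target_strand):
        raise ValueError("spacer and target site must have the same length")
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and the sequence length")
    paired = [(s, t) in WC_PAIRS
              for s, t in zip(spacer.upper(), reversed(target_strand.upper()))]
    best = max(sum(paired[i:i + window]) for i in range(n - window + 1))
    return 100.0 * best / window


target = "ACGTACGTACGTACGTACGTACG"  # hypothetical 23-nt target site
spacer = "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(target))
# Introduce one mismatch at spacer position 20 by swapping in a non-pairing base.
swap = {"A": "C", "C": "A", "G": "U", "U": "G"}
mismatched = spacer[:20] + swap[spacer[20]] + spacer[21:]

print(best_window_complementarity(mismatched, target, 19))  # a 19-nt window can avoid the mismatch
print(best_window_complementarity(mismatched, target, 23))  # whole site: 22 of 23 positions paired
```

Under this reading a single terminal mismatch still allows 100% complementarity over 19 contiguous nucleotides while the full-length value drops to 22/23.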

In some cases, the targeting sequence has a length in the range of 19-30 nucleotides (nt) (e.g., 19-25, 19-22, 19-20, 20-30, 20-25 or 20-22 nt). In some cases, the targeting sequence has a length in the range of 19-25 nt (e.g., 19-22, 19-20, 20-25 or 20-22 nt). In some cases, the targeting sequence is 19 nt or longer (e.g., 20 or more, 21 or more, or 22 or more nt; 19, 20, 21, 22, 23, 24 or 25 nt, etc.). In some cases, the targeting sequence has a length of 19 nt, 20 nt, 21 nt, 22 nt or 23 nt.

In some embodiments, the Cas12o crRNA comprises, consists essentially of, or consists of a direct repeat (DR) sequence and a spacer sequence. In some embodiments, the crRNA comprises, consists essentially of, or consists of a direct repeat sequence linked to a spacer sequence. In some embodiments, the crRNA comprises a direct repeat, a spacer and a direct repeat (DR-Spacer-DR), which is typical of the precursor crRNA (pre-crRNA) configuration. In some embodiments, the crRNA comprises a direct repeat, a spacer, a direct repeat and a spacer (DR-Spacer-DR-Spacer). In some embodiments, the crRNA comprises two or more direct repeat sequences and two or more spacer sequences. In some embodiments, the crRNA comprises a truncated direct repeat sequence and a spacer sequence, which is typical of a processed or mature crRNA. In some embodiments, the CRISPR-Cas12o effector protein forms a complex with the crRNA, and the spacer sequence directs the complex to bind, in a sequence-specific manner, to a target nucleic acid that is complementary to the spacer sequence.
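The crRNA layouts described above (DR-Spacer-DR, DR-Spacer-DR-Spacer, and a truncated DR plus spacer) can be sketched as simple string assembly in Python. This is illustrative only: the function names are hypothetical, and because the text does not say which part of the DR a processed Cas12o crRNA retains, `mature_crrna` simply keeps a caller-chosen number of 3'-terminal DR nucleotides.

```python
def pre_crrna(dr: str, spacers: list[str], trailing_dr: bool = True) -> str:
    """Concatenate DR-Spacer units into a pre-crRNA-like array.

    One spacer with trailing_dr=True gives DR-Spacer-DR; two spacers with
    trailing_dr=False give DR-Spacer-DR-Spacer.
    """
    parts = []
    for s in spacers:
        parts += [dr, s]  # every spacer is preceded by a DR
    if trailing_dr:
        parts.append(dr)
    return "".join(parts)


def mature_crrna(dr: str, spacer: str, keep: int) -> str:
    """Truncated DR followed by the spacer.

    `keep` (hypothetical) is the number of 3'-terminal DR nucleotides retained;
    the retained portion for Cas12o is not specified in the text.
    """
    if not 0 < keep <= len(dr):
        raise ValueError("keep must be between 1 and len(dr)")
    return dr[-keep:] + spacer


print(pre_crrna("GG", ["AU", "CA"]))
print(mature_crrna("GGUAC", "AUAU", keep=3))
```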

Any DR sequence that can mediate binding of the Cas12o protein described herein to the corresponding crRNA can be used in the present disclosure.

In one embodiment, the general DR sequence of the Cas12o of the present disclosure comprises 5'-R1a-Ba-R2a-L-R2b-Bb-R1b-3', wherein segments R1a and R1b are reverse complementary to each other and form a first stem (R1) of multiple (2, 3, 4, 5, 6, 7, 8, 9 or 10) base pairs; segments Ba and Bb do not base pair with each other and form a bulge (B); segments R2a and R2b are reverse complementary to each other and form a second stem (R2) of multiple (2, 3, 4, 5, 6, 7, 8, 9 or 10) base pairs; and L is a loop of multiple (3, 4, 5, 6, 7, 8, 9 or 10) nucleotides that closes the second stem.

In one embodiment, the DR sequence comprises 5'-R1a-Ba-R2a-L-R2b-Bb-R1b-3', wherein segments R1a and R1b are reverse complementary to each other and form a first stem (R1) of 3 or 5 base pairs; only one of segments Ba and Bb is present, and the segment that is present forms a bulge (B) of 2 or 3 nucleotides; segments R2a and R2b are reverse complementary to each other and form a second stem (R2) of 6 or 7 base pairs; and L is a loop of 5 or 7 nucleotides that closes the second stem.

In one embodiment, the DR sequence is as shown in Figure 6A and comprises 5'-R1a(ACA)-Ba(absent)-R2a(GGUAUCC)-L(UAAAC)-R2b(GGAUGCU)-Bb(GA)-R1b(UGU)-3'.

In one embodiment, the DR sequence is as shown in Figure 6B and comprises 5'-R1a(UUACA)-Ba(absent)-R2a(ACUAUUC)-L(UUGAAAC)-R2b(GAAUGGU)-Bb(GAU)-R1b(UGUAA)-3'.

In one embodiment, the DR sequence is as shown in Figure 6C and comprises 5'-R1a(UCAGU)-Ba(GUG)-R2a(GGUCUG)-L(AAACA)-R2b(CAGACC)-Bb(absent)-R1b(AUUGA)-3'.
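The general DR architecture and the Figure 6A-6C examples can be checked programmatically. The Python sketch below assembles each DR from its listed segments and tests the constraints of the narrower embodiment (3- or 5-bp first stem, one-sided 2- or 3-nt bulge, 6- or 7-bp second stem, 5- or 7-nt loop); it also tallies total stem base pairs and loop adenines, which relate to the stem-loop features described for functional DR variants. One assumption matters: the R2 arms listed for Figures 6A and 6B contain G·U pairs (for example, 5'-GGUAUCC-3' against 5'-GGAUGCU-3'), so a strict Watson-Crick reverse-complement test would reject them; the sketch therefore treats G·U wobble pairs as paired. Function names are hypothetical.

```python
# Unordered base pairs allowed in a stem: Watson-Crick plus G·U wobble (see lead-in).
STEM_PAIRS = {frozenset("AU"), frozenset("GC"), frozenset("GU")}
ORDER = ("R1a", "Ba", "R2a", "L", "R2b", "Bb", "R1b")

# Segments exactly as listed for Figures 6A-6C ("" = segment absent).
FIG6 = {
    "6A": dict(R1a="ACA", Ba="", R2a="GGUAUCC", L="UAAAC", R2b="GGAUGCU", Bb="GA", R1b="UGU"),
    "6B": dict(R1a="UUACA", Ba="", R2a="ACUAUUC", L="UUGAAAC", R2b="GAAUGGU", Bb="GAU", R1b="UGUAA"),
    "6C": dict(R1a="UCAGU", Ba="GUG", R2a="GGUCUG", L="AAACA", R2b="CAGACC", Bb="", R1b="AUUGA"),
}


def forms_stem(five_arm: str, three_arm: str) -> bool:
    """True if the two arms pair antiparallel along their whole length."""
    return len(five_arm) == len(three_arm) and all(
        frozenset((a, b)) in STEM_PAIRS for a, b in zip(five_arm, reversed(three_arm)))


def assemble(seg: dict) -> str:
    """5'-R1a-Ba-R2a-L-R2b-Bb-R1b-3' as one string."""
    return "".join(seg[k] for k in ORDER)


def summarize(seg: dict) -> dict:
    bulge = seg["Ba"] or seg["Bb"]  # the segment that is present, if any
    return {
        "stems_pair": forms_stem(seg["R1a"], seg["R1b"]) and forms_stem(seg["R2a"], seg["R2b"]),
        "one_sided_bulge": not (seg["Ba"] and seg["Bb"]) and len(bulge) in (2, 3),
        "R1_bp": len(seg["R1a"]),
        "R2_bp": len(seg["R2a"]),
        "loop_nt": len(seg["L"]),
        "total_stem_bp": len(seg["R1a"]) + len(seg["R2a"]),
        "loop_A": seg["L"].count("A"),
        "length": len(assemble(seg)),
    }


for fig, seg in FIG6.items():
    print(fig, assemble(seg), summarize(seg))
```

With G·U wobble allowed, all three DRs satisfy these constraints; their total stem lengths are 10, 12 and 11 bp, and their loops contain 3, 3 and 4 adenines, respectively.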

In some embodiments, the DR sequences corresponding to the Cas12o proteins of the present disclosure are shown in SEQ ID NO: 2, 4 and 6. In some embodiments, the direct repeat comprises a "functional variant" of a sequence shown in SEQ ID NO: 2, 4 or 6, such as a "functionally truncated version", a "functionally extended version" or a "functionally substituted version"; for example, a DR variant obtained by truncating or deleting part of the DR sequence may still have DR function. A DR "functional variant" is a DR sequence that, after extension (functionally extended version) or truncation (functionally truncated version) at the 5' and/or 3' end of a reference DR (such as the parent DR), and/or insertion, deletion and/or substitution of one or more nucleotides in the reference DR sequence (functionally substituted version), still retains at least 20% (e.g., at least about 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or more) of the function of the reference DR, i.e., the function of mediating binding of the Cas12o protein to the corresponding crRNA. DR functional variants generally retain a stem-loop-like secondary structure, or part of it, that can be bound by the Cas12o protein.

In one embodiment, the stem-loop structure of the DR sequence of Cas12o1 can be as shown in Figure 6. In some embodiments, the stem of the direct repeat contained in the crRNA consists of 10-13 pairs of mutually hybridizing complementary bases and generally contains one RNA bulge, and the loop is 5-7 nucleotides long. In some embodiments, the loop is 5 nucleotides long; in some embodiments, the loop is 7 nucleotides long. In different embodiments, the stem may comprise at least 10, at least 11, at least 12 or at least 13 base pairs. In some embodiments, the direct repeat comprises two complementary nucleotide segments, each about 10-15 nucleotides long in total, and 5-7 nucleotides forming the loop. In some embodiments, the stem-loop structure comprises a first stem nucleotide strand 10-15 nucleotides long; a second stem nucleotide strand 10-15 nucleotides long, wherein the first and second stem strands can hybridize to each other; and a loop nucleotide strand located between the first and second stem strands, wherein the loop strand comprises 5, 6 or 7 nucleotides. In some embodiments, the loop nucleotide strand of the stem-loop structure comprises at least 3 adenine nucleotides.

In one embodiment, a DR sequence capable of guiding Cas12o to the target site has, compared with the DR sequence shown in any one of SEQ ID NO: 2, 4 and 6, one or more nucleotide changes selected from additions, insertions, deletions and substitutions that do not cause a substantial difference in secondary structure. Exemplary DR sequences include nucleotide sequences having 80% or higher identity (e.g., at least 85%, 90%, 93%, 95%, 97%, 98% or 99%, or 100% identity) to the sequence shown in any one of SEQ ID NO: 2, 4 and 6.
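For substitution-only DR variants, percent identity reduces to a position-by-position count, as in the Python sketch below. SEQ ID NO: 2, 4 and 6 are not reproduced in this section, so the Figure 6A DR (assembled from its listed segments) serves as a stand-in reference, and the single loop substitution is hypothetical. Variants with additions, insertions or deletions would first need a pairwise alignment, which this sketch does not perform.

```python
def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences, compared position by position.

    Valid only for substitution-only variants; additions, insertions or deletions
    require a pairwise alignment before identity can be computed.
    """
    if len(a) != len(b):
        raise ValueError("sequences differ in length; align them first")
    same = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * same / len(a)


# Figure 6A DR assembled from its listed segments (R1a, R2a, L, R2b, Bb, R1b).
reference = "ACAGGUAUCCUAAACGGAUGCUGAUGU"
# Hypothetical variant: U->C at the first loop position (index 10); both stems are unchanged.
variant = reference[:10] + "C" + reference[11:]

identity = ungapped_identity(reference, variant)
print(round(identity, 1), identity >= 80.0)
```

One substitution in a 27-nt DR gives 26/27 identity, comfortably above the 80% threshold; placing it in the loop leaves both stems, and so the secondary structure, intact.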

In some embodiments, the spacer sequence is more than 17 nucleotides long, preferably 17 to 100 nucleotides, more preferably 16 to 50 nucleotides (e.g., any whole number of nucleotides from 17 to 50), more preferably 17 to 50 nucleotides, more preferably 17 to 40 nucleotides, more preferably 18 to 39 nucleotides, and most preferably 18 to 37 nucleotides.

Fusion protein

The Cas12o protein (e.g., dCas12o) can be associated (e.g., as a fusion protein or via a suitable linker) with one or more functional domains selected from the group comprising, consisting essentially of, or consisting of: a nuclear localization signal (NLS) domain, a nuclear export signal (NES) domain, a translation activation domain, a transcription activation domain (e.g., VP64, p65, MyoD1, HSF1, RTA and SET7/9), a translation initiation domain, a transcription repression domain (e.g., a KRAB domain, a NuE domain, an NcoR domain and a SID domain, such as a SID4X domain), a nuclease domain (e.g., FokI), a histone modification domain (e.g., a histone acetyltransferase), a light-inducible/controllable domain, a chemically inducible/controllable domain, a transposase domain, a homologous recombination machinery domain, a recombinase domain, an integrase domain, a topoisomerase, and combinations thereof.

In one embodiment, the functional domain comprises a deaminase. In another embodiment, the functional domain is a transposase. In another embodiment, the functional domain is a reverse transcriptase. In some cases, the CRISPR-Cas12o complex as a whole may be associated with two or more functional domains. For example, two or more functional domains may be associated with the Cas12o protein, two or more functional domains may be associated with the guide RNA or crRNA, or one or more functional domains may be associated with the Cas12o effector protein and one or more with the guide RNA or crRNA.

In one embodiment, the Cas12o protein is associated with one or more functional domains; the association can be achieved by direct linkage of the functional domain to the effector protein or through association with the crRNA. In a non-limiting example, the crRNA comprises an added or inserted sequence that can associate with the functional domain of interest, including, for example, an aptamer or nucleotides that bind to a nucleic acid-binding adaptor protein. The functional domain can be a functional heterologous domain.


In some embodiments, the functional domain may be a functional heterologous domain. One or more heterologous functional domains may be located at or near the amino terminus of the effector protein, and/or one or more heterologous functional domains may be located at or near its carboxyl terminus. The one or more heterologous functional domains may be fused to the effector protein, tethered to it, or connected to it via a linker moiety.
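The terminal placement options described above can be pictured as an ordered N-to-C layout. The Python sketch below is purely illustrative: the class name, the "linker" placeholder and the example NLS-dCas12o-VP64-NLS arrangement are assumptions, not constructs described in the text, and no particular linker sequence is implied.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FusionLayout:
    """Hypothetical N-to-C arrangement of an effector and heterologous domains."""
    effector: str = "dCas12o"
    n_terminal: list[str] = field(default_factory=list)  # domains at/near the N terminus
    c_terminal: list[str] = field(default_factory=list)  # domains at/near the C terminus
    linker: Optional[str] = "linker"  # placeholder name; None means direct fusion

    def order(self) -> list[str]:
        parts = [*self.n_terminal, self.effector, *self.c_terminal]
        if self.linker is None:
            return parts
        out = [parts[0]]
        for part in parts[1:]:
            out += [self.linker, part]  # a linker between each pair of neighbours
        return out


construct = FusionLayout(n_terminal=["NLS"], c_terminal=["VP64", "NLS"])
print("-".join(construct.order()))
```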

In one embodiment, the one or more functional domains are heterologous functional domains. In some embodiments, the heterologous functional domain has one or more of the following activities: nuclease activity, methylation activity, demethylation activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer formation activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, glycosylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylation activity, deSUMOylation activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase), deglycosylation activity, transcriptional repression activity, transcriptional activation activity, translational activation activity, translational repression activity, histone modification activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity, nucleic acid binding activity, chromatin modification or remodeling activity, and detectable activity.

In one embodiment, the Cas12o protein, or an ortholog or homolog thereof, can be used as a general-purpose nucleic acid-binding protein fused or operably linked to a functional domain. Exemplary functional domains may include, but are not limited to, nuclear localization signals (NLS), nuclear export signals (NES), deaminase domains (e.g., adenosine deaminase or cytidine deaminase domains), transcriptional activation domains, DNA methylation catalytic domains, histone residue modification domains, nuclease catalytic domains, fluorescent proteins, transcriptional modifiers, light-gated factors, chemically inducible factors, chromatin visualization factors, targeting polypeptides that provide binding to cell-surface moieties on target cells or target cell types, epigenetic modification domains, transposase domains, reverse transcriptase domains, topoisomerases, phosphatases and polymerases.

In some preferred embodiments, the functional domain is a transcriptional activation domain, such as but not limited to VP64, p65, MyoD1, HSF1, RTA, SET7/9 or a histone acetyltransferase. In one embodiment, the functional domain is a transcriptional repression domain, preferably KRAB. In one embodiment, the transcriptional repression domain is SID or a concatemer of SID (e.g., SID4X). In one embodiment, the functional domain is an epigenetic modification domain, thereby providing an epigenetic modifying enzyme. In one embodiment, the functional domain is a transcriptional activation domain, which can be the p65 activation domain.

In some embodiments, the nucleic acid-guided nuclease is associated with a ligase or a functional fragment thereof. The ligase can seal single-strand breaks (nicks) produced by the nucleic acid-guided nuclease. In some cases, the ligase can join double-strand breaks produced by the nucleic acid-guided nuclease. In some examples, the nucleic acid-guided nuclease is associated with a reverse transcriptase or a functional fragment thereof.

Preferably, a transposase domain, a homologous recombination (HR) machinery domain, a recombinase domain and/or an integrase domain is used as a functional domain of the present disclosure. In one embodiment, DNA integration activity is provided by an HR machinery domain, an integrase domain, a recombinase domain and/or a transposase domain.

In one embodiment, the DNA cleavage activity is provided by a nuclease. In one embodiment, the nuclease comprises a FokI nuclease (see, e.g., Shengdar Q. Tsai, Nicolas Wyvekens, Cyd Khayter, Jennifer A. Foden, Vishal Thapar, Deepak Reyon, Mathew J. Goodwin, Martin J. Aryee, J. Keith Joung, "Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing", Nature Biotechnology 32(6): 569-77 (2014)), which describes dimeric RNA-guided FokI nucleases that recognize extended sequences and can efficiently edit endogenous genes in human cells.

In one embodiment, the Cas12o protein may comprise one or more heterologous functional domains. As used herein, a heterologous functional domain is a polypeptide that is not derived from the same species as the nucleic acid-guided nuclease. For example, a heterologous functional domain of a nucleic acid-guided nuclease derived from species A is a polypeptide derived from a species other than species A, or an artificial polypeptide. The one or more heterologous functional domains may comprise one or more nuclear localization signal (NLS) domains, for example two or more NLSs. They may comprise one or more transcriptional activation domains, such as VP64; one or more transcriptional repression domains, such as a KRAB domain or a SID domain; and/or one or more nuclease domains, such as FokI.

In one embodiment, the one or more functional domains comprise an acetyltransferase, preferably a histone acetyltransferase. These can be used in epigenomics, for example in methods of interrogating the epigenome. A method of interrogating the epigenome may include, for example, targeting an epigenomic sequence, which may involve directing the guide to an epigenomic target sequence. In one embodiment, the epigenomic target sequence comprises a promoter, silencer or enhancer sequence.

Examples of acetyltransferases are known; in one embodiment they may include a histone acetyltransferase. In one embodiment, the histone acetyltransferase may comprise the catalytic core of the human acetyltransferase p300 (Gersbach & Reddy, Nature Biotech, April 6, 2015).

Base editing

In some embodiments, a Cas12o protein (e.g., dCas12o) can be associated with (e.g., fused to) a deaminase (e.g., an adenosine deaminase or a cytidine deaminase) that changes the identity of a nucleotide, for example from C·G to T·A or from A·T to G·C (Gaudelli et al., "Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage," Nature (2017); Nishida et al., "Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems," Science 353(6305) (2016); Komor et al., "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage," Nature 533(7603) (2016): 420-4). The base editing fusion protein may comprise, for example, an active (double-strand break generating), partially active (nickase) or inactive (catalytically dead) Cas12o nuclease together with a deaminase. Base editing repair inhibitors and glycosylase inhibitors (e.g., in some embodiments, a uracil glycosylase inhibitor, which prevents uracil removal) are contemplated as additional components of the base editing system. In some embodiments, the nucleotide deaminase is a mutant form of adenosine deaminase, which may have both adenosine deaminase and cytidine deaminase activity.
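The net pair-level outcomes named above (C·G to T·A for a cytidine deaminase, A·T to G·C for an adenosine deaminase) can be illustrated with a small sketch. It assumes, for simplicity, that every target base inside a hypothetical editing window is converted and that repair or replication fixes the change on both strands; real editing is partial and position-dependent, and no editing window for Cas12o is specified in this text, so `start` and `end` are illustrative inputs only.

```python
"""Hypothetical model of the net base-pair change left by a base editor."""

# Watson-Crick complement for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Net change on the edited strand after deamination and repair/replication:
#   cytidine deaminase:  C -> T   (C·G becomes T·A)
#   adenosine deaminase: A -> G   (via inosine; A·T becomes G·C)
CONVERSIONS = {"cytidine": ("C", "T"), "adenosine": ("A", "G")}


def base_edit(strand: str, start: int, end: int, deaminase: str) -> tuple:
    """Convert every target base in strand[start:end] (0-based, end exclusive).

    Returns (edited_strand, paired_strand); paired_strand is the base-by-base
    complement written position-for-position beneath the edited strand.
    """
    source, product = CONVERSIONS[deaminase]
    window = strand[start:end].replace(source, product)
    edited = strand[:start] + window + strand[end:]
    return edited, edited.translate(COMPLEMENT)


print(base_edit("GACCTA", 1, 4, "cytidine"))   # ('GATTTA', 'CTAAAT')
print(base_edit("TACGA", 0, 5, "adenosine"))   # ('TGCGG', 'ACGCC')
```

Writing the paired strand as a plain (non-reversed) complement keeps each position aligned with its partner, which makes the pair-level change easy to read off.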

Adenosine deaminase

The term "adenosine deaminase" or "adenosine deaminase protein" refers to a protein, a polypeptide, or one or more functional domains of a protein or polypeptide that is capable of catalyzing a hydrolytic deamination reaction that converts adenine (or the adenine moiety of a molecule) to hypoxanthine (or the hypoxanthine moiety of a molecule). In some embodiments, the adenine-containing molecule is adenosine (A), and the hypoxanthine-containing molecule is inosine (I). The adenine-containing molecule can be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).

According to the present disclosure, adenosine deaminases that can be used in conjunction with the present disclosure include, but are not limited to, members of the enzyme family known as adenosine deaminases acting on RNA (ADAR), members of the enzyme family known as adenosine deaminases acting on tRNA (ADAT), and other adenosine deaminase domain-containing (ADAD) family members. According to the present disclosure, adenosine deaminases are capable of targeting adenine in RNA/DNA heteroduplexes and RNA/RNA duplexes. Indeed, Zheng et al. (Nucleic Acids Res. 2017, 45(6):3369-3377) demonstrated that ADARs can carry out adenosine-to-inosine editing on RNA/DNA and RNA/RNA duplexes. In particular embodiments, the adenosine deaminase has been modified to increase its ability to edit DNA in an RNA/DNA heteroduplex, as described in detail below.

In some embodiments, the adenosine deaminase is derived from one or more metazoan species, including but not limited to mammals, birds, frogs, squid, fish, flies and worms. In some embodiments, the adenosine deaminase is a human, squid or fruit fly adenosine deaminase.

In some embodiments, the adenosine deaminase is a human ADAR, including hADAR1, hADAR2 or hADAR3. In some embodiments, the adenosine deaminase is a Caenorhabditis elegans ADAR protein, including ADR-1 and ADR-2. In some embodiments, the adenosine deaminase is a Drosophila ADAR protein, including dAdar. In some embodiments, the adenosine deaminase is a squid (Loligo pealeii) ADAR protein, including sqADAR2a and sqADAR2b. In some embodiments, the adenosine deaminase is a human ADAT protein. In some embodiments, the adenosine deaminase is a Drosophila ADAT protein. In some embodiments, the adenosine deaminase is a human ADAD protein, including TENR (hADAD1) and TENRL (hADAD2).

In some embodiments, the adenosine deaminase is a TadA protein, such as E. coli TadA. See Kim et al., Biochemistry 45:6407-6416 (2006); Wolf et al., EMBO J. 21:3841-3851 (2002). In some embodiments, the adenosine deaminase is mouse ADA. See Grunebaum et al., Curr. Opin. Allergy Clin. Immunol. 13:630-638 (2013). In some embodiments, the adenosine deaminase is human ADAT2. See Fukui et al., J. Nucleic Acids 2010:260512 (2010). In some embodiments, the deaminase (e.g., an adenosine or cytidine deaminase) is one or more of those described in Cox et al., Science. 2017 Nov 24;358(6366):1019-1027; Komor et al., Nature. 2016 May 19;533(7603):420-4; and Gaudelli et al., Nature. 2017 Nov 23;551(7681):464-471.

In some embodiments, the adenosine deaminase protein recognizes one or more target adenosine residues in a double-stranded nucleic acid substrate and converts them into inosine residues. In some embodiments, the double-stranded nucleic acid substrate is an RNA-DNA hybrid duplex. In some embodiments, the adenosine deaminase protein recognizes a binding window on the double-stranded substrate. In some embodiments, the binding window comprises at least one target adenosine residue. In some embodiments, the binding window is in the range of about 3 bp to about 100 bp. In some embodiments, the binding window is in the range of about 5 bp to about 50 bp. In some embodiments, the binding window is in the range of about 10 bp to about 30 bp. In some embodiments, the binding window is about 1 bp, 2 bp, 3 bp, 5 bp, 7 bp, 10 bp, 15 bp, 20 bp, 25 bp, 30 bp, 40 bp, 45 bp, 50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 85 bp, 90 bp, 95 bp or 100 bp.
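The requirement that a binding window contain at least one target residue can be checked with a simple scan. In this sketch the window's start position and length are hypothetical inputs (the text gives only the range of window sizes), positions are 0-based, and the sequence is a toy example rather than any sequence from this disclosure.

```python
def targets_in_window(seq: str, window_start: int, window_len: int, base: str = "A") -> list:
    """Return 0-based positions of `base` inside seq[window_start : window_start + window_len].

    The text lists windows of about 1 bp up to 100 bp, so lengths outside
    that span are rejected.
    """
    if not 1 <= window_len <= 100:
        raise ValueError(f"window length {window_len} bp is outside the ~1-100 bp range")
    end = min(window_start + window_len, len(seq))
    return [i for i in range(window_start, end) if seq[i] == base]


hits = targets_in_window("GGATACAAGT", window_start=2, window_len=5)
print(hits)        # [2, 4, 6]
print(bool(hits))  # True: the window contains at least one target adenosine
```

Passing `base="C"` applies the same check to the cytidine deaminase binding windows described for single-stranded bubbles.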

In some embodiments, the adenosine deaminase protein comprises one or more deaminase domains. Without wishing to be bound by any particular theory, the deaminase domain is expected to recognize one or more target adenosine (A) residues contained in a double-stranded nucleic acid substrate and convert them into inosine (I) residues. In some embodiments, the deaminase domain comprises an active center. In some embodiments, the active center comprises a zinc ion. In some embodiments, during the A-to-I editing process, base pairing at the target adenosine residue is disrupted, and the target adenosine residue is "flipped" out of the double helix so that it becomes accessible to the adenosine deaminase. In some embodiments, amino acid residues in or near the active center interact with one or more nucleotides 5' of the target adenosine residue. In some embodiments, amino acid residues in or near the active center interact with one or more nucleotides 3' of the target adenosine residue. In some embodiments, amino acid residues in or near the active center further interact with the nucleotide complementary to the target adenosine residue on the opposite strand. In some embodiments, the amino acid residues form hydrogen bonds with the 2'-hydroxyl group of the nucleotide.

In some embodiments, the adenosine deaminase comprises the full-length human ADAR2 protein (hADAR2) or its deaminase domain (hADAR2-D). In some embodiments, the adenosine deaminase is an ADAR family member homologous to hADAR2 or hADAR2-D.

In particular, in some embodiments, the homologous ADAR protein is human ADAR1 (hADAR1) or its deaminase domain (hADAR1-D). In some embodiments, glycine 1007 of hADAR1-D corresponds to glycine 487 of hADAR2-D, and glutamate 1008 of hADAR1-D corresponds to glutamate 488 of hADAR2-D.
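Residue correspondences such as hADAR1-D G1007/E1008 to hADAR2-D G487/E488 are normally read off a pairwise alignment of the two homologs. The sketch below shows that bookkeeping, assuming a precomputed alignment in which `-` marks a gap. The short fragments and their starting numbers are hypothetical toys chosen only to mirror the numbering above; they are not the real ADAR sequences, and the inserted gap shows why a single constant offset cannot be assumed across a whole domain.

```python
def residue_map(aln_a: str, aln_b: str, start_a: int = 1, start_b: int = 1) -> dict:
    """Map residue numbers of sequence A to the aligned residue numbers of sequence B.

    aln_a and aln_b are the two rows of a pairwise alignment ('-' = gap);
    start_a and start_b are the residue numbers of the first non-gap residue in each row.
    Positions where either row has a gap have no counterpart and are omitted.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    pos_a, pos_b = start_a, start_b
    for a, b in zip(aln_a, aln_b):
        if a != "-" and b != "-":
            mapping[pos_a] = pos_b
        if a != "-":
            pos_a += 1
        if b != "-":
            pos_b += 1
    return mapping


# Toy fragments (hypothetical), numbered so that G1007 aligns with G487.
m = residue_map("KGE-LR", "KGEALR", start_a=1006, start_b=486)
print(m[1007], m[1008])  # 487 488
print(m[1009])           # 490: the gap in row A shifts the offset downstream
```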

In some embodiments, the adenosine deaminase comprises the wild-type amino acid sequence of hADAR2-D. In some embodiments, the adenosine deaminase comprises one or more mutations in the hADAR2-D sequence such that the editing efficiency and/or substrate editing preference of hADAR2-D is altered according to specific needs.

In some embodiments, the adenosine deaminase catalytic domain comprises an amino acid sequence that is at least 80%, 82%, 85%, 87%, 90%, 92%, 95%, 96%, 97%, 98%, 99% or 100% identical to the amino acid sequence shown in SEQ ID NO: 30 and that retains the deamination activity of the amino acid sequence shown in SEQ ID NO: 30.
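The identity thresholds above are simple arithmetic once two sequences are aligned. This section does not say which alignment program or denominator is used, so the sketch assumes a precomputed alignment and divides identical aligned positions by the ungapped length of the reference sequence (one common convention among several). The toy sequences are hypothetical and are not SEQ ID NO: 30.

```python
def percent_identity(aln_query: str, aln_ref: str) -> float:
    """Identical aligned positions as a percentage of the ungapped reference length.

    Both arguments are rows of the same pairwise alignment ('-' = gap).
    """
    if len(aln_query) != len(aln_ref):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for q, r in zip(aln_query, aln_ref) if q == r and r != "-")
    ref_len = sum(1 for r in aln_ref if r != "-")
    return 100.0 * matches / ref_len


THRESHOLDS = [80, 82, 85, 87, 90, 92, 95, 96, 97, 98, 99, 100]

pid = percent_identity("MKTAYIAKQR", "MKTAHIAKQR")  # toy sequences
print(pid)                                  # 90.0
print([t for t in THRESHOLDS if pid >= t])  # [80, 82, 85, 87, 90]
```

Identity alone does not establish that a variant "retains the deamination activity"; that condition would still have to be tested experimentally.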

In some embodiments, the adenosine deaminase catalytic domain comprises a mutant of the amino acid sequence shown in SEQ ID NO: 30 carrying the substitutions E18K+F19S+N20L, designated adenosine deaminase 004V14 (see WO2023193536A1).
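Substitution strings such as E18K+F19S+N20L use the common notation "wild-type residue, 1-based position, replacement residue", joined by "+". The sketch below parses that notation and applies it while checking the expected wild-type residue at each position. SEQ ID NO: 30 is not reproduced in this section, so the example runs on a hypothetical toy sequence that merely carries E, F and N at positions 18 to 20.

```python
import re

# One substitution: wild-type residue, 1-based position, replacement residue.
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(seq: str, spec: str) -> str:
    """Apply '+'-joined substitutions such as 'E18K+F19S+N20L' to a protein sequence."""
    residues = list(seq)
    for token in spec.split("+"):
        match = _SUBSTITUTION.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"cannot parse substitution {token!r}")
        wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = mutant
    return "".join(residues)


toy = "A" * 17 + "EFN" + "A" * 5  # hypothetical stand-in, not SEQ ID NO: 30
print(apply_substitutions(toy, "E18K+F19S+N20L")[15:22])  # AAKSLAA
```

The wild-type check guards against applying a mutation list to the wrong parent sequence, for example applying the 004V14 substitutions to SEQ ID NO: 31 instead of SEQ ID NO: 30.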

In some embodiments, the adenosine deaminase catalytic domain comprises an amino acid sequence that is at least 80%, 82%, 85%, 87%, 90%, 92%, 95%, 96%, 97%, 98%, 99% or 100% identical to the amino acid sequence shown in SEQ ID NO: 31 (the 005V1 deaminase of CN114634923A, where its amino acid sequence is SEQ ID NO: 2) and that retains the deamination activity of the amino acid sequence shown in SEQ ID NO: 31.

In some embodiments, the amino acid sequence of the adenosine deaminase catalytic domain contains amino acid additions, insertions, deletions and/or substitutions relative to the amino acid sequence shown in SEQ ID NO: 30 or 31.

In some embodiments, the adenosine deaminase catalytic domain comprises a mutant of the amino acid sequence shown in SEQ ID NO: 31 carrying the substitutions Q148G+Q149M+P150R, designated deaminase 005V1-10-3.

In some embodiments, the functional domain is full-length TadA8e or a functional fragment thereof.

In some embodiments, the adenosine deaminase is 004V1 (SEQ ID NO: 30) or 005V1 (SEQ ID NO: 31).

Cytidine deaminase

In some embodiments, the deaminase is a cytidine deaminase. As used herein, the term "cytidine deaminase" or "cytidine deaminase protein" refers to a protein, a polypeptide, or one or more functional domains of a protein or polypeptide that can catalyze a hydrolytic deamination reaction converting cytosine (or the cytosine moiety of a molecule) into uracil (or the uracil moiety of a molecule), as shown below. In some embodiments, the cytosine-containing molecule is cytidine (C), and the uracil-containing molecule is uridine (U). The cytosine-containing molecule can be deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).

According to the present disclosure, cytidine deaminases that can be used in conjunction with the present disclosure include, but are not limited to, members of the enzyme family known as apolipoprotein B mRNA editing complex (APOBEC) family deaminases, activation-induced deaminase (AID), and cytidine deaminase 1 (CDA1). In specific embodiments, the cytidine deaminase is an APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3E, APOBEC3F, APOBEC3G, APOBEC3H or APOBEC4 deaminase.

In the methods and systems of the present disclosure, the cytidine deaminase is capable of targeting cytosine in single-stranded DNA. In certain example embodiments, the cytidine deaminase can edit a single strand lying outside the region occupied by a binding component, e.g., bound Cas13. In other example embodiments, the cytidine deaminase can edit at a localized bubble, such as one formed by a mismatch between the guide sequence and the target at the editing site. In certain example embodiments, the cytidine deaminase may comprise mutations that help focus its activity, such as those described in Kim et al., Nature Biotechnology (2017) 35(4):371-377 (doi:10.1038/nbt.3803).
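A localized bubble formed by a guide/target mismatch can be located by comparing the guide with the sequence it is meant to match. This sketch assumes both sequences are given 5' to 3' and that the comparison is made against the protospacer (the strand whose sequence a perfectly matched guide reproduces, with U in place of T). Each differing position marks a site where the guide fails to base-pair with the target strand. The sequences are hypothetical toys.

```python
def mismatch_positions(guide_rna: str, protospacer_dna: str) -> list:
    """0-based positions where the guide fails to match the protospacer.

    The guide (RNA, 5'->3') is compared with the protospacer (DNA, 5'->3'),
    which a perfectly matched guide reproduces with U in place of T.
    """
    if len(guide_rna) != len(protospacer_dna):
        raise ValueError("guide and protospacer must have equal length")
    guide_as_dna = guide_rna.upper().replace("U", "T")
    return [
        i
        for i, (g, p) in enumerate(zip(guide_as_dna, protospacer_dna.upper()))
        if g != p
    ]


# Toy example: a single deliberate G/C mismatch at position 4.
print(mismatch_positions("GACUGAAUCG", "GACTCAATCG"))  # [4]
```

A real design would also need to confirm that the mismatch position lies within whatever region the deaminase can reach, which this sketch does not model.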

In some embodiments, the cytidine deaminase is derived from one or more metazoan species, including but not limited to mammals, birds, frogs, squid, fish, flies and worms. In some embodiments, the cytidine deaminase is a human, primate, cow, dog, rat or mouse cytidine deaminase.

In some embodiments, the cytidine deaminase is a human APOBEC, including hAPOBEC1 or hAPOBEC3. In some embodiments, the cytidine deaminase is human AID.

In some embodiments, the cytidine deaminase protein recognizes one or more target cytosine residues in a single-stranded bubble of an RNA duplex and converts them into uracil residues. In some embodiments, the cytidine deaminase protein recognizes a binding window on the single-stranded bubble of the RNA duplex. In some embodiments, the binding window comprises at least one target cytosine residue. In some embodiments, the binding window is in the range of about 3 bp to about 100 bp. In some embodiments, the binding window is in the range of about 5 bp to about 50 bp. In some embodiments, the binding window is in the range of about 10 bp to about 30 bp. In some embodiments, the binding window is about 1 bp, 2 bp, 3 bp, 5 bp, 7 bp, 10 bp, 15 bp, 20 bp, 25 bp, 30 bp, 40 bp, 45 bp, 50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 85 bp, 90 bp, 95 bp or 100 bp.

In some embodiments, the cytidine deaminase protein comprises one or more deaminase domains. Without wishing to be bound by theory, the deaminase domain is expected to recognize one or more target cytosine (C) residues contained in the single-stranded bubble of an RNA duplex and convert them into uracil (U) residues. In some embodiments, the deaminase domain comprises an active center. In some embodiments, the active center comprises a zinc ion. In some embodiments, amino acid residues in or near the active center interact with one or more nucleotides 5' of the target cytosine residue. In some embodiments, amino acid residues in or near the active center interact with one or more nucleotides 3' of the target cytosine residue.

In some embodiments, the cytidine deaminase comprises the full-length human APOBEC1 protein (hAPOBEC1), its deaminase domain (hAPOBEC1-D) or a C-terminally truncated form thereof (hAPOBEC-T). In some embodiments, the cytidine deaminase is an APOBEC family member homologous to hAPOBEC1, hAPOBEC-D or hAPOBEC-T. In some embodiments, the cytidine deaminase comprises the full-length human AID1 protein (hAID), its deaminase domain (hAID-D) or a C-terminally truncated form thereof (hAID-T). In some embodiments, the cytidine deaminase is an AID family member homologous to hAID, hAID-D or hAID-T. In some embodiments, hAID-T is hAID with a C-terminal truncation of about 20 amino acids.

In some embodiments, the cytidine deaminase comprises the wild-type amino acid sequence of a cytosine deaminase. In some embodiments, the cytidine deaminase comprises one or more mutations in the cytosine deaminase sequence such that the editing efficiency and/or substrate editing preference of the cytosine deaminase is altered according to specific needs.

As used herein, "association" is used in its broadest sense. It covers cases in which two functional modules form a fusion protein, either directly or indirectly (e.g., through a linker), as well as cases in which two separate functional modules are joined by covalent bonds (e.g., disulfide bonds) or by non-covalent bonds.

As used herein, the term "vector" refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. It is a replicon, such as a plasmid, phage or cosmid, into which another DNA segment may be inserted so as to bring about replication of the inserted segment. Generally, a vector is capable of replication when associated with appropriate control elements.

In some cases, the vector system comprises a single vector. Alternatively, the vector system comprises multiple vectors. The vector can be a viral vector.

In some cases, vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded or partially double-stranded; nucleic acid molecules that comprise one or more free ends or no free ends (e.g., circular); nucleic acid molecules that comprise DNA, RNA or both; and other polynucleotide variants known in the art. One type of vector is a "plasmid", which refers to a circular double-stranded DNA loop into which additional DNA segments can be inserted, for example by standard molecular cloning techniques. Another type of vector is a viral vector, in which virally derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g., retroviruses, replication-defective retroviruses, adenoviruses, replication-defective adenoviruses and adeno-associated viruses). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication, and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell and are thereby replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operably linked. Such vectors are referred to herein as "expression vectors". Vectors that are expressed in, or that drive expression in, eukaryotic cells may be referred to herein as "eukaryotic expression vectors". Common expression vectors of utility in recombinant DNA techniques are often in the form of plasmids.

A recombinant expression vector can comprise a nucleic acid of the present disclosure in a form suitable for expression of the nucleic acid in a host cell. This means that the recombinant expression vector includes one or more regulatory elements, which may be selected on the basis of the host cell to be used for expression, operably linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system, or in a host cell when the vector is introduced into the host cell). Advantageous vectors include lentiviruses and adeno-associated viruses, and the type of vector can also be selected to target particular types of cells.

The term "regulatory element" is intended to include promoters, enhancers, internal ribosome entry sites (IRES) and other expression control elements (e.g., transcription termination signals, such as polyadenylation signals and poly-U sequences). Such regulatory elements are described, for example, in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990). Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). A tissue-specific promoter may direct expression primarily in a desired tissue of interest, such as muscle, neurons, bone, skin, blood, specific organs (e.g., liver, pancreas) or particular cell types (e.g., lymphocytes). Regulatory elements may also direct expression in a temporal-dependent manner, such as in a cell-cycle-dependent or developmental-stage-dependent manner, which may or may not also be tissue- or cell-type-specific. In some embodiments, the vector comprises one or more pol III promoters (e.g., 1, 2, 3, 4, 5 or more pol III promoters), one or more pol II promoters (e.g., 1, 2, 3, 4, 5 or more pol II promoters), one or more pol I promoters (e.g., 1, 2, 3, 4, 5 or more pol I promoters), or combinations thereof. Examples of pol III promoters include, but are not limited to, the U6 and H1 promoters. Examples of pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer) [see, e.g., Boshart et al., Cell, 41:521-530 (1985)], the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerate kinase (PGK) promoter and the EF1α promoter. The term "regulatory element" also encompasses enhancer elements, such as WPRE; CMV enhancers; the R-U5' segment in the LTR of HTLV-1 (Mol. Cell. Biol., Vol. 8(1), pp. 466-472, 1988); the SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA, Vol. 78(3), pp. 1527-31, 1981). Those skilled in the art will appreciate that the design of the expression vector can depend on such factors as the choice of host cell to be transformed, the desired level of expression, and the like. A vector can be introduced into host cells to thereby produce transcripts, proteins or peptides, including fusion proteins or peptides, encoded by the nucleic acids described herein (e.g., clustered regularly interspaced short palindromic repeat (CRISPR) transcripts, proteins, enzymes, mutant forms thereof, fusion proteins thereof, etc.). Advantageous vectors include lentiviruses and adeno-associated viruses, and the type of vector can also be selected to target particular types of cells. In particular embodiments, a bicistronic vector is used for the guide RNA and the (optionally modified or mutated) CRISPR enzyme (e.g., Cas12o).
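The pairing of promoters with termination signals can be written down as a small design check. This sketch uses the pol III and pol II promoter examples named above and applies one simplifying assumption that is not stated in the text: a pol III cassette (e.g., U6 driving a guide RNA) should end in a poly-U (poly-T in DNA) terminator, and a pol II cassette (e.g., EF1α driving Cas12o) should end in a polyadenylation signal. The `Cassette` type and the check are hypothetical illustrations, not a vector-design tool.

```python
from dataclasses import dataclass

# Promoter examples taken from the text; the groupings drive the check below.
POL_III_PROMOTERS = {"U6", "H1"}
POL_II_PROMOTERS = {"RSV LTR", "CMV", "SV40", "DHFR", "beta-actin", "PGK", "EF1a"}


@dataclass
class Cassette:
    promoter: str
    payload: str     # what the cassette expresses, e.g. "guide RNA" or "Cas12o"
    terminator: str  # "poly-U" or "polyA"


def check_cassette(cassette: Cassette) -> list:
    """Return a list of problems; an empty list means no rule was violated.

    Assumed rule: pol III cassettes end in poly-U, pol II cassettes end in polyA.
    """
    if cassette.promoter in POL_III_PROMOTERS:
        expected = "poly-U"
    elif cassette.promoter in POL_II_PROMOTERS:
        expected = "polyA"
    else:
        return [f"unrecognized promoter {cassette.promoter!r}"]
    if cassette.terminator != expected:
        return [f"{cassette.promoter} cassette for {cassette.payload} should end in {expected}"]
    return []


vector = [Cassette("U6", "guide RNA", "poly-U"), Cassette("EF1a", "Cas12o", "polyA")]
print([check_cassette(c) for c in vector])  # [[], []]
```

Keeping the promoter classes as plain sets makes it easy to extend the check with tissue-specific promoters once their transcriptional class is known.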

Vectors can be designed to express CRISPR transcripts (e.g., nucleic acid transcripts, proteins or enzymes) in prokaryotic or eukaryotic cells. For example, CRISPR transcripts can be expressed in bacterial cells such as Escherichia coli, insect cells (using baculovirus expression vectors), yeast cells or mammalian cells. Suitable host cells are discussed further in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990).

The vector can be introduced into and propagated in a prokaryote or prokaryotic cell. In some embodiments, a prokaryote is used to amplify copies of a vector to be introduced into a eukaryotic cell, or as an intermediate vector in the production of a vector to be introduced into a eukaryotic cell (e.g., amplifying a plasmid as part of a viral vector packaging system). In some embodiments, a prokaryote is used to amplify copies of a vector and to express one or more nucleic acids, for example to provide a source of one or more proteins for delivery to a host cell or host organism. Expression of proteins in prokaryotes is most often carried out in Escherichia coli with vectors containing constitutive or inducible promoters that direct the expression of either fusion or non-fusion proteins. Fusion vectors add a number of amino acids to a protein encoded therein, for example to the amino terminus of the recombinant protein. Such fusion vectors may serve one or more purposes, such as: (i) increasing expression of the recombinant protein; (ii) increasing the solubility of the recombinant protein; and (iii) aiding purification of the recombinant protein by acting as a ligand in affinity purification. In some embodiments, the vector is a yeast expression vector. Examples of vectors for expression in the yeast Saccharomyces cerevisiae include pYepSec1 (Baldari et al., 1987. EMBO J. 6:229-234), pMFa (Kurjan and Herskowitz, 1982. Cell 30:933-943), pJRY88 (Schultz et al., 1987. Gene 54:113-123), pYES2 (Invitrogen Corporation, San Diego, Calif.) and picZ (Invitrogen Corp, San Diego, Calif.). In some embodiments, the vector drives protein expression in insect cells using a baculovirus expression vector. Baculovirus vectors available for expression of proteins in cultured insect cells (e.g., Sf9 cells) include the pAc series (Smith et al., 1983. Mol. Cell. Biol. 3:2156-2165) and the pVL series (Luckow and Summers, 1989. Virology 170:31-39).

In some embodiments, a mammalian expression vector is used to drive expression of one or more sequences in mammalian cells. Examples of mammalian expression vectors include pCDM8 (Seed, 1987. Nature 329:840) and pMT2PC (Kaufman et al., 1987. EMBO J. 6:187-195). When used in mammalian cells, the expression vector's control functions are typically provided by one or more regulatory elements. For example, commonly used promoters are derived from polyoma, adenovirus, cytomegalovirus, simian virus and other promoters disclosed herein and known in the art. For other suitable expression systems for both prokaryotic and eukaryotic cells, see, e.g., Chapters 16 and 17 of Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989.

In some embodiments, the recombinant mammalian expression vector is capable of directing expression of the nucleic acid preferentially in a particular cell type (e.g., tissue-specific regulatory elements are used to express the nucleic acid). Tissue-specific regulatory elements are known in the art. Non-limiting examples of suitable tissue-specific promoters include the albumin promoter (liver-specific; Pinkert et al., 1987. Genes Dev. 1:268-277), lymphoid-specific promoters (Calame and Eaton, 1988. Adv. Immunol. 43:235-275), in particular promoters of T cell receptors (Winoto and Baltimore, 1989. EMBO J. 8:729-733) and immunoglobulins (Banerji et al., 1983. Cell 33:729-740; Queen and Baltimore, 1983. Cell 33:741-748), neuron-specific promoters (e.g., the neurofilament promoter; Byrne and Ruddle, 1989. Proc. Natl. Acad. Sci. USA 86:5473-5477), pancreas-specific promoters (Edlund et al., 1985. Science 230:912-916) and mammary gland-specific promoters (e.g., the milk whey promoter; U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166).

In some embodiments, one or more vectors driving the expression of one or more elements of the nucleic acid-targeting system are introduced into a host cell. Expression of those elements then directs formation of a nucleic acid-targeting complex at one or more target sites. In some embodiments, a single promoter drives expression of a transcript encoding the Cas12o protein and the guide RNA(s), embedded within one or more intron sequences (e.g., each in a different intron, two or more in at least one intron, or all in a single intron). In some embodiments, the Cas12o protein and the guide RNA may be operably linked to, and expressed from, the same promoter. In some embodiments, the vector comprises one or more insertion sites, such as a restriction endonuclease recognition sequence (also referred to as a "cloning site"). In some embodiments, one or more insertion sites (e.g., about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more insertion sites) are located upstream and/or downstream of one or more sequence elements of one or more vectors.
When multiple different guide sequences are used, a single expression construct may be used to direct nucleic acid-targeting activity to multiple different corresponding target sequences within a cell. For example, a single vector may comprise about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or more guide sequences. In some embodiments, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more such guide-sequence-containing vectors may be provided and optionally delivered to a cell. In some embodiments, the vector comprises a regulatory element operably linked to a coding sequence encoding the Cas12o protein. The Cas12o protein and the one or more nucleic acid-targeting guide RNAs may be delivered separately; advantageously, at least one of them is delivered via a particle complex. Cas12o mRNA may be delivered before the guide RNA to allow time for the Cas12o protein to be expressed. For example, the Cas12o mRNA may be administered 1-12 hours (preferably about 2-6 hours) before the guide RNA. Alternatively, the Cas12o mRNA and the guide RNA may be administered together. Advantageously, a second booster dose of guide RNA may be administered 1-12 hours (preferably about 2-6 hours) after the initial administration of Cas12o mRNA plus guide RNA. Further administrations of Cas12o mRNA and/or guide RNA may help achieve the most efficient levels of genome modification.
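Several guide sequences carried on one vector can be pictured as a CRISPR array. The minimal Python sketch below rests on a hypothetical assumption: that Cas12o guides are expressed as a single array of repeat-spacer units and then processed into individual crRNAs, as is known for some other Cas12 family members. The source does not say this for Cas12o. The function name build_crrna_array and the repeat used in the example are illustrative placeholders, not the actual Cas12o direct repeat.

```python
from typing import Sequence

VALID_DNA = set("ACGT")


def build_crrna_array(spacers: Sequence[str], direct_repeat: str) -> str:
    """Concatenate repeat-spacer units into one array (5'->3', DNA letters).

    Layout assumed here: DR-spacer1-DR-spacer2-...-DR-spacerN-DR.
    Both the layout and any repeat passed in are illustrative assumptions;
    the actual Cas12o repeat is not specified in this text.
    """
    if not spacers:
        raise ValueError("at least one guide (spacer) sequence is required")
    repeat = direct_repeat.upper()
    if not repeat or set(repeat) - VALID_DNA:
        raise ValueError("direct repeat must be a non-empty A/C/G/T string")
    units = []
    for spacer in spacers:
        s = spacer.upper()
        if not s or set(s) - VALID_DNA:
            raise ValueError(f"invalid spacer: {spacer!r}")
        units.append(repeat + s)
    # A terminal repeat closes the last spacer.
    return "".join(units) + repeat


if __name__ == "__main__":
    placeholder_dr = "GGATCC"  # hypothetical placeholder, not a real Cas12o repeat
    print(build_crrna_array(["ACGTACGTAC", "TTGGCCAATT"], placeholder_dr))
```

Under this assumed layout, adding a guide only appends one more repeat-spacer unit. That is consistent with a single vector carrying up to 20 or more guide sequences.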

In some embodiments, the vector encodes a Cas12o protein comprising one or more nuclear localization sequences (NLSs), such as about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. More particularly, the vector encodes one or more NLSs that are not naturally present in the Cas12o protein. Most particularly, the NLS-encoding sequence is located 5' and/or 3' of the Cas12o coding sequence in the vector. In some embodiments, the RNA-targeting effector protein comprises about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino terminus. It may instead comprise about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxyl terminus. It may also comprise a combination of these (e.g., zero, or one or more, NLSs at the amino terminus and zero, or one or more, NLSs at the carboxyl terminus). When more than one NLS is present, each may be selected independently of the others. A single NLS may therefore be present in more than one copy and/or in combination with one or more other NLSs present in one or more copies. In some embodiments, an NLS is considered to be near the N- or C-terminus when its nearest amino acid is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids of that terminus along the polypeptide chain.
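The proximity rule above can be expressed as a small check. In this hedged Python sketch, positions are 1-based and inclusive. The "distance" from a terminus is taken to be the number of residues separating the NLS from that terminus (0 when the NLS begins at residue 1). That reading of "within about N amino acids" is an assumption, as are the function name and the default window of 50 residues.

```python
from typing import Tuple


def nls_near_termini(protein_length: int, nls_start: int, nls_end: int,
                     window: int = 50) -> Tuple[bool, bool]:
    """Return (near_N_terminus, near_C_terminus) for one NLS.

    nls_start and nls_end are 1-based inclusive positions of the NLS within
    the full-length fusion protein. The NLS residue closest to the N-terminus
    is nls_start; the one closest to the C-terminus is nls_end.
    """
    if not 1 <= nls_start <= nls_end <= protein_length:
        raise ValueError("NLS coordinates must lie within the protein")
    residues_before = nls_start - 1             # residues between N-terminus and NLS
    residues_after = protein_length - nls_end   # residues between NLS and C-terminus
    return residues_before <= window, residues_after <= window


if __name__ == "__main__":
    # A hypothetical 1,300-residue fusion with a 7-residue NLS at residues 2-8.
    print(nls_near_termini(1300, 2, 8))  # (True, False)
```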

Non-limiting examples of NLSs include NLS sequences derived from the following sources. These include the SV40 virus large T antigen NLS, with the amino acid sequence PKKKRKV (SEQ ID NO: 28), and NLSs from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 24)). They also include NLSs having the amino acid sequences KRTADGSEFESPKKKRKV (SEQ ID NO: 22), AVKRPAATKKAGQAKKKKLD (SEQ ID NO: 23), KKTELQTTNAENKTKKL (SEQ ID NO: 25), KRGINDRNFWRGENGRKTR (SEQ ID NO: 26), RKSGKIAAIVVKRPRK (SEQ ID NO: 27), and MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 29).
In some embodiments, the NLS sequence further includes:
- the c-myc NLS with the amino acid sequence PAAKRVKLD (SEQ ID NO: 32) or RQRRNELKRSP (SEQ ID NO: 33);
- the hRNPA1 M9 NLS comprising the amino acid sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 34);
- the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 35) from the IBB domain of importin-α;
- the sequences VSRKRPRP (SEQ ID NO: 36) and PPKKARED (SEQ ID NO: 37) of the myoma T protein;
- the sequence PQPKKKPL (SEQ ID NO: 38) of human p53;
- the sequence SALIKKKKKMAP (SEQ ID NO: 39) of mouse c-abl IV;
- the sequences DRLRR and PKQKKRK (SEQ ID NO: 41) of influenza virus NS1;
- the sequence RKLKKKIKKL (SEQ ID NO: 42) of the hepatitis virus delta antigen;
- the sequence REKKKFLKRR (SEQ ID NO: 43) of the mouse Mx1 protein;
- the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 44) of human poly(ADP-ribose) polymerase;
- the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 45) of the human glucocorticoid (steroid hormone) receptor.

Typically, the one or more NLSs are strong enough to drive detectable accumulation of the DNA/RNA-targeting Cas12o protein in the nucleus of a eukaryotic cell. In general, the strength of nuclear localization activity may derive from the number of NLSs in the nucleic acid-targeting effector protein, the particular NLS(s) used, or a combination of these factors. In preferred embodiments of the Cas12o protein complexes and systems described herein, the codon-optimized Cas12o effector protein comprises an NLS attached to its C-terminus. In certain embodiments, other localization tags may be fused to the Cas12o protein. For example, but not limited to, such tags may localize the Cas12o protein to a specific site in the cell. That site may be an organelle such as mitochondria, plastids, chloroplasts, vesicles, the Golgi apparatus, (nuclear or cellular) membranes, ribosomes, the nucleolus, the ER, the cytoskeleton, vacuoles, centrosomes, nucleosomes, granules, centrioles, etc.

In one embodiment, the Cas12o protein of the present disclosure comprises one or more NLSs at its N-terminus and/or C-terminus, preferably one NLS at each of its N-terminus and C-terminus.
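The preferred layout, with one NLS at each terminus, can be illustrated with the Python sketch below. It assembles a fusion protein sequence using two NLSs listed above: the SV40 large T antigen NLS (PKKKRKV, SEQ ID NO: 28) at the N-terminus and the nucleoplasmin bipartite NLS (KRPAATKKAGQAKKKK, SEQ ID NO: 24) at the C-terminus. Several choices are design assumptions for illustration: pairing these two NLSs, keeping a single initiator methionine, and the optional linker argument. The Cas12o sequence is a hypothetical placeholder.

```python
SV40_NLS = "PKKKRKV"                      # SEQ ID NO: 28
NUCLEOPLASMIN_NLS = "KRPAATKKAGQAKKKK"    # SEQ ID NO: 24


def add_terminal_nls(core: str, n_nls: str = SV40_NLS,
                     c_nls: str = NUCLEOPLASMIN_NLS, linker: str = "") -> str:
    """Return M + N-NLS + linker + core + linker + C-NLS (one-letter code).

    If the core protein starts with Met, that Met is moved to the very
    N-terminus so the fusion keeps a single initiator methionine.
    """
    core = core.upper()
    body = core[1:] if core.startswith("M") else core
    return "M" + n_nls + linker + body + linker + c_nls


if __name__ == "__main__":
    placeholder_cas12o = "MSEQWENCE"  # hypothetical stand-in, not Cas12o
    print(add_terminal_nls(placeholder_cas12o, linker="GS"))
```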

Codon optimization

Where the effector protein is to be administered as a nucleic acid, the present disclosure contemplates the use of codon-optimized Cas12 proteins, more particularly codon-optimized nucleic acid sequences (and optionally protein sequences) encoding Cas12o. One example of a codon-optimized sequence is a sequence optimized for expression in a eukaryote such as a human (i.e., optimized for expression in humans). Another is a sequence optimized for another eukaryote, animal, or mammal as discussed herein. Although this is preferred, other examples are possible, and codon optimization for host species other than humans, or for specific organs, is known. In some embodiments, the enzyme coding sequence encoding the DNA/RNA-targeting Cas protein is codon-optimized for expression in particular cells, such as eukaryotic cells. The eukaryotic cells may be those of, or derived from, a particular organism such as a plant or a mammal. These include but are not limited to a human or a non-human eukaryote, animal, or mammal as discussed herein, such as a mouse, rat, rabbit, dog, livestock, or non-human mammal or primate. In some embodiments, certain processes and products may be excluded: processes for modifying the germline genetic identity of human beings, processes for modifying the genetic identity of animals that are likely to cause them suffering without any substantial medical benefit to humans or animals, and animals resulting from such processes.
In general, codon optimization refers to modifying a nucleic acid sequence for enhanced expression in a host cell of interest while maintaining the native amino acid sequence. This is done by replacing at least one codon of the native sequence (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) with a codon used more frequently or most frequently in the genes of that host cell. Various species exhibit particular biases for certain codons of a given amino acid.
Codon bias (differences in codon usage between organisms) often correlates with the translation efficiency of messenger RNA (mRNA). Translation efficiency is in turn believed to depend on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell generally reflects the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal expression in a given organism by codon optimization. Codon usage tables are readily available, for example, in the "Codon Usage Database" at www.kazusa.or.jp/codon/, and these tables can be adapted in a number of ways. See Nakamura, Y., et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000," Nucl. Acids Res. 28: 292 (2000). Computer algorithms for codon-optimizing a particular sequence for expression in a particular host cell are also available, such as Gene Forge (Aptagen; Jacobus, PA). In some embodiments, one or more codons (e.g., 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons) in the sequence encoding the DNA/RNA-targeting Cas protein correspond to the most frequently used codon for a particular amino acid. For codon usage in yeast, see the online yeast genome database at www.yeastgenome.org/community/codon_usage.shtml, or Codon selection in yeast, Bennetzen and Hall, J Biol Chem. 1982 Mar 25; 257(6): 3026-31. For codon usage in plants, including algae, see:
- Codon usage in higher plants, green algae, and cyanobacteria, Campbell and Gowri, Plant Physiol. 1990 Jan; 92(1): 1-11;
- Codon usage in plant genes, Murray et al., Nucleic Acids Res. 1989 Jan 25; 17(2): 477-98; or
- Selection on the codon bias of chloroplast and cyanelle genes in different plant and algal lineages, Morton BR, J Mol Evol. 1998 Apr; 46(4): 449-59.

In some embodiments, the polynucleotide encoding Cas12o has been codon-optimized for expression in the corresponding host cell (e.g., a mammalian cell, more specifically a human cell, such as an HSC or an iPSC).
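The "most frequently used codon" strategy described above can be sketched in a few lines of Python. The sketch assumes the standard genetic code. It takes any codon usage table as input, mapping each codon to a frequency (e.g., per-thousand values transcribed from a codon usage database). The small table in the example is invented for illustration and is not real host data. Replacing every codon with its top-ranked synonym is only the simplest form of codon optimization; practical optimization may weigh other factors as well.

```python
from itertools import product
from typing import Dict, List

# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
AA_TO_CODONS: Dict[str, List[str]] = {}
for _codon, _aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(_aa, []).append(_codon)


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence ('*' marks stop codons)."""
    cds = cds.upper()
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds: str, usage: Dict[str, float]) -> str:
    """Replace each codon with the synonymous codon most used in the host.

    Codons missing from `usage` count as frequency 0; ties keep the original
    codon. Only synonymous codons are considered, so the protein is unchanged.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        synonyms = AA_TO_CODONS[CODON_TO_AA[codon]]
        best = max(synonyms, key=lambda c: (usage.get(c, 0.0), c == codon))
        optimized.append(best)
    result = "".join(optimized)
    if translate(result) != translate(cds):  # sanity check: protein preserved
        raise RuntimeError("codon substitution changed the protein sequence")
    return result


if __name__ == "__main__":
    toy_usage = {"CTG": 40.0, "CTA": 7.0, "AAG": 32.0, "AAA": 24.0,
                 "GGC": 22.0, "GGT": 11.0}  # invented numbers, illustration only
    print(codon_optimize("CTAAAAGGT", toy_usage))  # CTGAAGGGC
```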

Linkers

The term "linker" refers to a molecule that joins proteins to form a fusion protein. Typically, such molecules have no specific biological activity other than joining the proteins or maintaining some minimum distance or other spatial relationship between them. However, in some embodiments, the linker may be selected to influence some property of the linker and/or the fusion protein, such as the folding, net charge, or hydrophobicity of the linker.

Suitable linkers for the methods herein include straight or branched carbon linkers, heterocyclic carbon linkers, or peptide linkers. However, as used herein, a linker may also be a covalent bond (a carbon-carbon bond or a carbon-heteroatom bond). In some embodiments, the linker may be a chemical moiety, which may be a monomer, dimer, multimer, or polymer. Preferably, the linker comprises amino acids. Typical amino acids in flexible linkers include Gly, Asn, and Ser. Accordingly, in particular embodiments, the linker comprises one or more of Gly, Asn, and Ser, in any combination. Other near-neutral amino acids, such as Thr and Ala, may also be used in linker sequences. For example, the Gly-Ser linkers GGS, GGGS, or GSG may be used. GGS, GSG, GGGS, or GGGGS linkers may be repeated to provide a suitable length (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or more times, such as (GGS)3 (SEQ ID NO: 10) or (GGGGS)3 (SEQ ID NO: 15)).
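A minimal Python sketch of the repeated Gly-Ser linkers described above, such as (GGGGS)3, and of joining protein domains with such a linker. Restricting linker residues to Gly, Asn, Ser, Thr, and Ala mirrors the residues named in this section but is a simplifying assumption. The function names and domain sequences are hypothetical.

```python
from typing import List

FLEXIBLE_RESIDUES = set("GNSTA")  # Gly, Asn, Ser, Thr, Ala


def repeat_linker(unit: str, copies: int) -> str:
    """Return `unit` repeated `copies` times, e.g. ("GGGGS", 3) -> (GGGGS)3."""
    unit = unit.upper()
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if not unit or set(unit) - FLEXIBLE_RESIDUES:
        raise ValueError(f"unexpected linker residues in {unit!r}")
    return unit * copies


def fuse(domains: List[str], linker: str) -> str:
    """Join protein domains (one-letter code) N- to C-terminal with a linker."""
    return linker.join(d.upper() for d in domains)


if __name__ == "__main__":
    linker = repeat_linker("GGGGS", 3)
    print(linker, len(linker))  # GGGGSGGGGSGGGGS 15
    print(fuse(["MDOMAINA", "DOMAINB"], repeat_linker("GGS", 2)))
```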

Prime editing

In one embodiment, the present disclosure provides compositions and systems that may comprise a Cas12o protein or a catalytically inactive form thereof, one or more crRNAs or guide molecules, and a reverse transcriptase. The systems may be used to insert a donor polynucleotide into a target polynucleotide. In some embodiments, the composition or system comprises:
- a catalytically inactive Cas12o protein;
- a reverse transcriptase associated with, or otherwise capable of forming a complex with, the Cas12o protein; and
- a crRNA capable of forming a complex with the Cas12o protein and directing site-specific binding of the complex to a target sequence of the target polynucleotide, wherein the crRNA further comprises a donor sequence for insertion into the target polynucleotide.

In some cases, the catalytically inactive Cas12o protein may be a nickase, such as a DNA nickase. In some cases, the Cas12o protein has one or more mutations.

The Cas12o protein may be associated with a reverse transcriptase. The reverse transcriptase domain may be a reverse transcriptase or a fragment thereof. In some cases, the reverse transcriptase is human immunodeficiency virus (HIV) RT, avian myeloblastosis virus (AMV) RT, Moloney murine leukemia virus (M-MLV) RT, a group II intron RT, a group II intron-like RT, or a chimeric RT. In one embodiment, the RT comprises a modified form of these RTs, such as an engineered variant of AMV RT, M-MLV RT, or HIV RT (see, e.g., Anzalone et al., Search-and-replace genome editing without double-strand breaks or donor DNA, Nature. 2019 Dec; 576(7785): 149-157).

In some embodiments, the compositions and systems may comprise the following:
- a Cas12o protein disclosed herein or a variant thereof;
- a reverse transcriptase (RT) polypeptide linked to, or otherwise capable of forming a complex with, the Cas12o protein or variant; and
- a crRNA capable of forming a complex with the Cas12o protein or variant. The crRNA comprises (i) a crRNA or guide sequence capable of directing site-specific binding of the Cas12o protein or variant to a target sequence of a target polynucleotide; (ii) a 3' binding site region capable of binding to the upstream cleaved strand of the target polynucleotide; and (iii) an RT template sequence encoding an extension sequence. The extension sequence comprises a variant region and a 3' homology sequence capable of hybridizing to the downstream cleaved strand of the target polynucleotide.
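The Python sketch below makes the three crRNA elements concrete by computing a 3' binding site (primer binding) region and an RT template from a target strand and a desired edit. It borrows the convention used for Cas9 prime editors (Anzalone et al., cited above):
- the RT template is the reverse complement of the new DNA flap (variant region plus 3' homology);
- the binding site is the reverse complement of the bases immediately 5' of the nick on the cleaved strand;
- the extension is written 5'->3' as RT template followed by binding site.

Whether Cas12o guides use the same orientation is an assumption, as are the nick position and the default lengths. Sequences are given as DNA letters for simplicity; the actual crRNA would carry U in place of T.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_extension(cleaved_strand: str, nick: int, edited_strand: str,
                     pbs_len: int = 13, rtt_len: int = 15) -> Tuple[str, str, str]:
    """Return (rt_template, binding_site, extension), all written 5'->3'.

    cleaved_strand: 5'->3' sequence of the strand that is nicked.
    nick: the nick lies between cleaved_strand[nick - 1] and cleaved_strand[nick].
    edited_strand: the same strand after the desired edit; it must match
        cleaved_strand upstream (5') of the nick.
    """
    cleaved, edited = cleaved_strand.upper(), edited_strand.upper()
    if nick < pbs_len:
        raise ValueError("not enough sequence 5' of the nick for the binding site")
    if edited[:nick] != cleaved[:nick]:
        raise ValueError("the edit must lie 3' of the nick")
    if nick + rtt_len > len(edited):
        raise ValueError("edited strand too short for the requested RT template")
    # 3' binding site: pairs with the 3' end of the upstream cleaved fragment.
    binding_site = revcomp(cleaved[nick - pbs_len:nick])
    # New flap = variant region + 3' homology, copied by the RT from the template.
    flap = edited[nick:nick + rtt_len]
    rt_template = revcomp(flap)
    return rt_template, binding_site, rt_template + binding_site


if __name__ == "__main__":
    rtt, pbs, ext = design_extension("AAAACCCCGGGGTTTT", 8,
                                     "AAAACCCCGAGGTTTT", pbs_len=4, rtt_len=6)
    print(rtt, pbs, ext)  # AACCTC GGGG AACCTCGGGG
```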

The reverse transcriptase domain may be a reverse transcriptase or a fragment thereof. A wide variety of reverse transcriptases (RTs), including prokaryotic and eukaryotic RTs, can be used in alternative embodiments of the present disclosure. The requirement is that they function in the host to produce the donor polynucleotide sequence from an RNA template. If desired, the nucleotide sequence of a native RT can be modified to optimize expression in the desired host, for example using known codon optimization techniques. Reverse transcriptase (RT) is an enzyme that produces complementary DNA (cDNA) from an RNA template, a process known as reverse transcription. In one embodiment, the RT domain of a reverse transcriptase is used in the present disclosure. This domain may include only RNA-dependent DNA polymerase activity. In some instances, the RT domain is non-mutagenic, i.e., it does not introduce mutations into the donor polynucleotide (e.g., during reverse transcription). In some instances, the RT domain may be a non-retron RT, such as a viral RT or a human endogenous RT. In some instances, the RT domain may be a retron RT or a diversity-generating retroelement (DGR) RT. In some instances, the RT may be less mutagenic than the corresponding wild-type RT. In one embodiment, the RT herein is not mutagenic. The reverse transcriptase may be fused to the C-terminus of Cas12o or a variant thereof.
Alternatively or additionally, the reverse transcriptase may be fused to the N-terminus of Cas12o or a variant thereof. The fusion may be made via a linker. In some examples, the reverse transcriptase may be an M-MLV reverse transcriptase or a variant thereof. The M-MLV reverse transcriptase variant may comprise one or more mutations.

Reverse transcriptase domain

The one or more functional domains may be one or more reverse transcriptase domains. In one embodiment, the system comprises an engineered system for modifying a target polynucleotide, the system comprising:
- a Cas12o protein or a CRISPR-associated Cas12o protein, or a variant thereof (e.g., dCas12o);
- a reverse transcriptase (RT) domain;
- an RNA template comprising or encoding a donor polynucleotide to be inserted into the target sequence of the target polynucleotide; and
- a crRNA.

Retrons

In one embodiment, the donor template for homologous recombination is produced using a self-priming RNA template for reverse transcription. A non-limiting example of a self-priming reverse transcription system is the retron system. The term "retron" refers to a genetic element encoding components that enable the synthesis of a branched, RNA-linked single-stranded DNA (msDNA), together with a reverse transcriptase. In one embodiment, the reverse transcriptase domain is a retron RT domain. In one embodiment, the RNA template encodes a retron RNA template that is recognized and reverse transcribed by the retron reverse transcriptase domain. Retrons are conserved across many bacterial species and are highly efficient reverse transcription systems whose function remains relatively poorly understood. A retron system consists of the retron RT protein and the msr and msd transcripts, which serve as the primer and template sequences, respectively. All components of the retron system are expressed as a single transcript from a single open reading frame; this transcript includes msr-msd and encodes the retron RT protein (Lampson et al., 2005, Retrons, msDNA, and the bacterial genome. Cytogenet Genome Res 110: 491-499).
The msr element ORF of the retron provides the RNA portion of the msDNA molecule, while the msd element ORF provides its DNA portion. The primary transcript from the msr-msd region is thought to serve as both the template and the primer for msDNA production. msDNA synthesis is primed from the 2'-OH group of an internal rG residue of the RNA transcript. The msd or msr can also be modified so that an RNA template encoding the donor polynucleotide can be inserted within msd without altering the function or production of msDNA. The RNA template encoding the donor polynucleotide sequence may be of any length, provided that an msDNA product is produced. It is preferably less than about 5 kb, or less than about 2 kb, or less than about 500 bases.

Topoisomerase

Topoisomerases are a class of enzymes that alter the topological state of DNA by breaking and rejoining nucleic acid strands. In some cases, the topoisomerase may be a DNA topoisomerase, which controls and alters the topological state of DNA during transcription. It does so by catalyzing the transient breaking and rejoining of single DNA strands, thereby allowing strands to pass through one another and altering DNA topology. In some embodiments, the one or more functional domains may be one or more topoisomerase domains. In one embodiment, an engineered system for modifying a target polynucleotide comprises:
- a Cas12o protein;
- a topoisomerase domain; and
- a nucleic acid template comprising or encoding a donor polynucleotide to be inserted into the target sequence of the target polynucleotide.

In some examples, two or more of the Cas12o protein, the topoisomerase domain, and the nucleic acid template may form a complex. In some examples, the Cas12o protein and the topoisomerase domain may be included in a fusion protein.

In one embodiment, the topoisomerase domain is capable of ligating the donor polynucleotide to the target polynucleotide. Ligation may be achieved through sticky-end or blunt-end joining. In one example, the donor polynucleotide may comprise an overhang containing a sequence complementary to a region of the target polynucleotide. Examples of ligating a donor polynucleotide to a target polynucleotide include TOPO cloning, e.g., as described in "The Technology Behind TOPO Cloning" at www.thermofisher.com/us/en/home/life-science/cloning/topo/topo-resources/the-technology-behind-topo-cloning.html. In one embodiment, the topoisomerase domain may be associated with the donor polynucleotide; for example, the topoisomerase domain may be covalently linked to the donor polynucleotide.
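The requirement that a donor overhang be complementary to the target can be checked mechanically. This Python sketch is a generic sticky-end check under two assumptions. First, two 5' overhangs, each written 5'->3', are treated as compatible when one is the reverse complement of the other. Second, two blunt ends (empty overhangs) are treated as compatible. It does not model the specific overhangs of any commercial TOPO vector.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def ends_compatible(donor_overhang: str, target_overhang: str) -> bool:
    """True if two 5' overhangs (each 5'->3') can anneal, or both ends are blunt."""
    return revcomp(donor_overhang) == target_overhang.upper()


if __name__ == "__main__":
    print(ends_compatible("AATT", "AATT"))  # True (palindromic, EcoRI-type end)
    print(ends_compatible("ACGG", "CCGT"))  # True
    print(ends_compatible("ACGG", "ACGG"))  # False
    print(ends_compatible("", ""))          # True (blunt to blunt)
```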

Examples of topoisomerases include type I topoisomerases (including types IA and IB), which cleave a single strand of a double-stranded nucleic acid molecule. They also include type II topoisomerases (e.g., gyrase), which cleave both strands of a double-stranded nucleic acid molecule. In some instances, the topoisomerase is a DNA topoisomerase I, such as vaccinia virus topoisomerase I. The topoisomerase may be preloaded with the donor polynucleotide. Vaccinia virus topoisomerase may require a target comprising a 5'-OH group.

Phosphatase

The systems herein may also comprise a phosphatase domain. A phosphatase is an enzyme capable of removing a phosphate group from a molecule, e.g., a nucleic acid such as DNA. Examples of phosphatases include calf intestinal phosphatase, shrimp alkaline phosphatase, Antarctic phosphatase, and APEX alkaline phosphatase.

In some embodiments, the 5'-OH group in the target polynucleotide may be generated by a phosphatase. Topoisomerases compatible with 5'-phosphate targets may be used to produce stably loaded intermediates. In some cases, a Cas12o protein or CRISPR-associated Cas12o protein that leaves a 5'-OH after cleaving the target polynucleotide may be used. In some cases, the phosphatase domain may be associated with (e.g., fused to) the Cas12o protein. The phosphatase domain may be capable of generating an -OH group at the 5' end of the target polynucleotide. The phosphatase may also be kept separate from the other components of the system. For example, it may be provided as a separate protein delivered on a vector separate from the other components.

Polymerase

The systems herein may also comprise a polymerase domain. A polymerase is an enzyme that synthesizes nucleic acid strands. The polymerase may be a DNA polymerase or an RNA polymerase.

In one embodiment, the system comprises an engineered system for modifying a target polynucleotide, the engineered system comprising:
- a Cas12o protein or a CRISPR-associated Cas12o protein;
- a DNA polymerase domain; and
- a DNA template comprising a donor polynucleotide to be inserted into the target sequence of the target polynucleotide.

In some examples, two or more of the Cas12o protein, the DNA polymerase domain, and the DNA template may form a complex. In some examples, the Cas12o protein (or CRISPR-associated Cas12o protein) and the DNA polymerase domain are included in a fusion protein.

In one embodiment, the system may comprise a Cas12o protein or CRISPR-associated Cas12o protein (or a variant thereof, such as a dCas12o protein or a CRISPR-associated Cas12o nickase) and a DNA polymerase (e.g., phi29, T4, or T7 DNA polymerase). The system may further comprise a single-stranded or double-stranded DNA template. The DNA template may comprise i) a first sequence homologous to the Cas12o target site on the target polynucleotide, and/or ii) a second sequence homologous to another region of the target polynucleotide. In one embodiment, the template may be a synthetic single-stranded or PCR-generated DNA molecule (optionally end-protected with modified nucleotides), or a viral genome (e.g., AAV). In another embodiment, the template is generated using a reverse transcriptase. When the system is delivered into a cell, endogenous DNA polymerases of the cell may be used. Alternatively or additionally, an exogenous DNA polymerase may be expressed in the cell. The DNA template may be end-protected with one or more modified nucleotides, or may comprise a portion of a viral genome.
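A DNA template carrying sequences homologous to the target site and to an adjacent region can be assembled as in this Python sketch. It flanks an insert with left and right homology arms taken from either side of a chosen insertion point. The equal arm lengths, the function name, and the example sequences are illustrative assumptions, not parameters from the source.

```python
def build_donor(target: str, insert_at: int, payload: str, arm_len: int) -> str:
    """Return left_arm + payload + right_arm (DNA, 5'->3').

    The left arm is the arm_len bases of `target` just 5' of insert_at, and
    the right arm is the arm_len bases starting at insert_at.
    """
    target = target.upper()
    if arm_len < 1:
        raise ValueError("arm_len must be positive")
    if insert_at - arm_len < 0 or insert_at + arm_len > len(target):
        raise ValueError("homology arms extend past the ends of the target")
    left_arm = target[insert_at - arm_len:insert_at]
    right_arm = target[insert_at:insert_at + arm_len]
    return left_arm + payload.upper() + right_arm


if __name__ == "__main__":
    print(build_donor("AAAACCCCGGGGTTTT", 8, "gat", 4))  # CCCCGATGGGG
```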

Examples of DNA polymerases include Taq, Tne (exo-), Tma (exo-), Pfu (exo-), and Pwo (exo-) DNA polymerases; Thermoanaerobacter thermohydrosulfuricus DNA polymerase; Thermococcus litoralis DNA polymerase I; Escherichia coli DNA polymerase I; Taq DNA polymerase I; Tth DNA polymerase I; Bacillus stearothermophilus (Bst) DNA polymerase I; Escherichia coli DNA polymerase III; and the DNA polymerases of bacteriophages T5, M2, T4, T7, phi29, PRD1, phi15, phi21, PZE, PZA, Nf, M2Y, B103, SF5, GA-1, Cp-5, Cp-7, PR4, PR5, PR722, and L17.

Systems and complexes

The present disclosure also provides nucleic acid-targeting systems. Such systems can be used to target, modify, and otherwise manipulate nucleic acids. In one embodiment, the system comprises a Cas12o protein or a CRISPR-associated Cas12o protein and one or more crRNAs or guide RNAs. The Cas12o protein or CRISPR-associated Cas12o protein may have nuclease activity, for example, the ability to cleave DNA or RNA. It may have nickase activity, for example, the ability to produce single-strand breaks in double-stranded nucleic acids such as dsDNA or dsRNA. It may also be in a dead (catalytically impaired) form, for example, retaining only nickase activity, or having neither nuclease nor nickase activity. In one embodiment, the system further comprises one or more functional domains, for example, a nucleotide deaminase, a reverse transcriptase, a non-LTR retrotransposon (and its encoded protein), a polymerase, or a diversity-generating element (and its encoded protein). In some examples, the system further comprises one or more donor polynucleotides, which the system can insert into a target polynucleotide. A donor polynucleotide may be included in, or encoded by, a nucleic acid template. In some embodiments, two or more components of the system herein may form a complex; for example, the components are separate molecules that interact with each other directly or indirectly. Two or more components of the system herein may also be included in a fusion protein.

The term "target sequence" refers to a sequence to which a crRNA is designed to be complementary, where hybridization between the target sequence and the spacer sequence promotes the formation of a DNA- or RNA-targeting complex. Complete complementarity is not necessarily required, provided there is sufficient complementarity to cause hybridization and promote formation of the nucleic acid-targeting complex. The target sequence may comprise an RNA polynucleotide. In one embodiment, the target sequence is located in the nucleus or cytoplasm of a cell. In one embodiment, the target sequence may be within an organelle of a eukaryotic cell, for example, a mitochondrion or a chloroplast. A sequence or template that can be used for recombination into the targeted locus comprising the target sequence is referred to as an "editing template" or "editing sequence". In various aspects of the present disclosure, an exogenous template may be referred to as an editing template. In one aspect, the recombination is homologous recombination.
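
Because complete complementarity is not required, it can help to express "sufficient complementarity" as a number. The minimal sketch below scores the fraction of Watson-Crick-paired positions between a spacer and an equal-length target strand, assuming ungapped, antiparallel pairing; G-U wobble pairs are counted as mismatches. The function name is hypothetical, the disclosure does not define any threshold, and this score ignores where along the spacer mismatches fall.

```python
# Watson-Crick partners in RNA notation; DNA bases are converted before lookup.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def fraction_complementary(spacer: str, target: str) -> float:
    """Fraction of positions at which spacer and target form Watson-Crick pairs.

    spacer: guide spacer written 5'->3' (RNA; T is treated as U)
    target: target strand written 5'->3', same length as the spacer (DNA or RNA)
    Strands pair antiparallel, so spacer[i] is compared with target[-1 - i].
    """
    if len(spacer) != len(target) or not spacer:
        raise ValueError("spacer and target must be non-empty and of equal length")
    spacer = spacer.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    matches = sum(
        1 for i, base in enumerate(spacer) if RNA_COMPLEMENT[base] == target[-1 - i]
    )
    return matches / len(spacer)


print(fraction_complementary("ACGU", "ACGT"))  # 1.0 (fully complementary)
print(fraction_complementary("ACGU", "ACGA"))  # 0.75 (one mismatch)
```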

In one embodiment, formation of a nucleic acid-targeting complex (comprising a guide RNA hybridized to a target sequence and complexed with one or more nucleic acid-targeting effector proteins) results in cleavage of one or both nucleic acid strands in or near the target sequence (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs of it). In one embodiment, one or more vectors driving expression of one or more elements of a nucleic acid-targeting system are introduced into a host cell such that expression of these elements directs formation of a nucleic acid-targeting complex at one or more target sites. For example, a nucleic acid-targeting effector protein and a crRNA or guide RNA can each be operably linked to separate regulatory elements on separate vectors. Alternatively, two or more of these elements, expressed from the same or different regulatory elements, can be combined in a single vector, with one or more additional vectors providing any components of the nucleic acid-targeting system not contained in the first vector. Elements combined in a single vector can be arranged in any suitable orientation, such as one element located 5' ("upstream") or 3' ("downstream") of a second element. The coding sequence of one element may be located on the same strand as, or the opposite strand from, the coding sequence of the second element, and oriented in the same or the opposite direction. In one embodiment, a single promoter drives expression of a transcript encoding the nucleic acid-targeting effector protein and a guide RNA embedded within one or more intron sequences (e.g., each in a different intron, two or more in at least one intron, or all in a single intron). In one embodiment, the nucleic acid-targeting effector protein and the guide RNA are operably linked to, and expressed from, the same promoter.
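
The phrase "in or near the target sequence (e.g., within N base pairs)" amounts to a distance check between a cleavage position and the target interval. The sketch below assumes 0-based coordinates, a half-open target interval, and that a cut at position k lies between bases k-1 and k; these conventions and the function names are assumptions for illustration.

```python
def distance_to_target(cut_pos: int, target_start: int, target_end: int) -> int:
    """Distance in bp from a cut to the target interval [target_start, target_end).

    A cut at position k is assumed to lie between bases k-1 and k, so any cut with
    target_start <= k <= target_end is inside or on the edge of the target.
    """
    if target_start > target_end:
        raise ValueError("target_start must not exceed target_end")
    if cut_pos < target_start:
        return target_start - cut_pos
    if cut_pos > target_end:
        return cut_pos - target_end
    return 0


def cut_is_near_target(cut_pos: int, target_start: int, target_end: int, window: int) -> bool:
    """True if the cut is within `window` bp of (or inside) the target sequence."""
    return distance_to_target(cut_pos, target_start, target_end) <= window


# Hypothetical coordinates: a target at 80-100 and a cut at 105, tested against
# several of the window sizes listed in the text.
for window in (1, 5, 10, 20, 50):
    print(window, cut_is_near_target(105, 80, 100, window))
```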

Donor polynucleotide

In one embodiment, the compositions and systems herein may include one or more nucleic acid templates. In some cases, a nucleic acid template may include one or more polynucleotides. In some cases, the nucleic acid template may include the coding sequence of one or more polynucleotides. The nucleic acid template may be an RNA template or a DNA template.

Donor polynucleotides can be used to edit target polynucleotides. In some cases, a donor polynucleotide includes one or more mutations to be introduced into the target polynucleotide. Examples of such mutations include substitutions, deletions, insertions, and combinations thereof. A mutation can cause a frameshift in an open reading frame of the target polynucleotide. In some cases, the donor polynucleotide alters a stop codon in the target polynucleotide; for example, it can correct a premature stop codon. Correction can be achieved by deleting the stop codon or by introducing one or more mutations into it. In other exemplary embodiments, the donor polynucleotide addresses loss-of-function mutations, deletions, or translocations, such as those that may occur in certain diseases, by inserting or restoring a functional copy of a gene, a functional fragment thereof, a functional regulatory sequence, or a functional fragment of a regulatory sequence. A functional fragment is less than a complete copy of a gene but provides enough nucleotide sequence to restore the function of the wild-type gene or non-coding regulatory sequence (e.g., a sequence encoding a long non-coding RNA). In certain exemplary embodiments, the systems disclosed herein can be used to replace a single allele of a defective gene or defective gene fragment. In another exemplary embodiment, the systems disclosed herein can be used to replace both alleles of a defective gene or defective gene fragment. A "defective gene" or "defective gene fragment" is a gene or portion of a gene that, when expressed, fails to produce the functional protein or non-coding RNA produced by the corresponding wild-type gene. In certain exemplary embodiments, such defective genes may be associated with one or more disease phenotypes. In certain exemplary embodiments, the defective gene or gene fragment is not replaced; instead, the systems described herein are used to insert a donor polynucleotide encoding a gene or gene fragment that compensates for or overrides expression of the defective gene, thereby eliminating the cellular phenotype associated with expression of the defective gene or changing it to a different or desired cellular phenotype.
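
Before designing a donor that corrects a premature stop codon, the codon has to be located. The sketch below scans a coding sequence in frame for the standard-code stop codons (TAA, TAG, TGA) and reports any that occur before the final codon. It assumes the input starts at the start codon and that the last complete codon is the intended terminator; the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def premature_stop_positions(cds: str) -> list:
    """Return 0-based offsets of in-frame stop codons that precede the final codon.

    cds: coding sequence (DNA, 5'->3') beginning at the start codon.
    Only complete codons are examined; the last one is treated as the normal stop.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    positions = []
    for k in range(n_codons - 1):  # skip the final codon (expected terminator)
        codon = cds[3 * k: 3 * k + 3]
        if codon in STOP_CODONS:
            positions.append(3 * k)
    return positions


# Toy example: ATG AAA TAG CCC TAA has a premature TAG at offset 6.
print(premature_stop_positions("ATGAAATAGCCCTAA"))  # [6]
```

A donor designed for correction would then delete or mutate the codon at each reported offset, as the text describes.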

In one embodiment, the donor polynucleotide may include, but is not limited to, a gene or gene fragment, a sequence encoding a protein or RNA transcript to be expressed, a regulatory element, a repair template, and the like. According to the present disclosure, the donor polynucleotide may include left-end and right-end sequence elements that function together with the transposition components that mediate insertion. In some cases, the donor polynucleotide manipulates a splice site on the target polynucleotide. In some instances, the donor polynucleotide disrupts the splice site; disruption can be achieved by inserting the polynucleotide into the splice site and/or introducing one or more mutations into it. In some instances, the donor polynucleotide can restore a splice site; for example, the polynucleotide may include a splice site sequence.

The donor polynucleotide to be inserted can range in length from 10 base pairs (bp) or nucleotides to 50 kb, for example, 50 bp to 40 kb, 100 bp to 30 kb, or 100 to 10,000, 100 to 300, 200 to 400, 300 to 500, 400 to 600, 500 to 700, 600 to 800, 700 to 900, 800 to 1,000, 900 to 1,100, 1,000 to 1,200, 1,100 to 1,300, 1,200 to 1,400, 1,300 to 1,500, 1,400 to 1,600, 1,500 to 1,700, 1,600 to 1,800, 1,700 to 1,900, or 1,800 to 2,000 bp or nucleotides.

Delivery

The present disclosure also provides delivery systems for introducing components of the systems and compositions herein into cells, tissues, organs, or organisms. A delivery system may include one or more delivery vehicles and/or cargos. In some embodiments, the components of the CRISPR-Cas system can be delivered in various forms, such as combinations of DNA/RNA, RNA/RNA, or protein/RNA. For example, the Cas12o protein can be delivered as a DNA or RNA polynucleotide encoding it, or as a protein. The guide can be delivered as a DNA polynucleotide encoding it or as RNA. All possible combinations are envisioned, including mixed delivery forms.
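
The delivery forms named above combine multiplicatively: three forms for Cas12o (encoding DNA, encoding RNA, or protein) and two for the guide (encoding DNA or RNA). The short sketch below simply enumerates those pairings for one Cas12o component and one guide; it illustrates the combinatorics only and is not a list of tested formulations.

```python
from itertools import product

# Forms named in the text for each component.
CAS12O_FORMS = ("DNA", "RNA", "protein")  # encoding DNA, encoding RNA, or the protein itself
GUIDE_FORMS = ("DNA", "RNA")              # encoding DNA, or the guide RNA itself


def delivery_combinations() -> list:
    """Return every (Cas12o form, guide form) pairing as a list of tuples."""
    return list(product(CAS12O_FORMS, GUIDE_FORMS))


if __name__ == "__main__":
    for cas_form, guide_form in delivery_combinations():
        print(f"Cas12o as {cas_form} + guide as {guide_form}")
```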

In some aspects, the disclosure provides methods comprising delivering to a host cell one or more polynucleotides, such as one or more vectors as described herein, one or more transcripts thereof, and/or one or more proteins expressed therefrom.

In some embodiments, one or more vectors that drive expression of one or more elements of the nucleic acid-targeting system are introduced into a host cell so that expression of those elements directs formation of a nucleic acid-targeting complex at one or more target sites. For example, a nucleic acid-targeting effector enzyme and a nucleic acid-targeting guide RNA can each be operably linked to separate regulatory elements on separate vectors. The RNA(s) of the nucleic acid-targeting system can be delivered to a transgenic animal or mammal that expresses the nucleic acid-targeting effector protein, for example, one that expresses it constitutively, inducibly, or conditionally; or to an animal or mammal that otherwise expresses the effector protein or has cells containing it, for example, because one or more vectors encoding and expressing the effector protein in vivo were previously administered to it. Alternatively, two or more elements expressed from the same or different regulatory elements can be combined in a single vector, with one or more additional vectors providing any components of the nucleic acid-targeting system not contained in the first vector. Elements combined in a single vector can be arranged in any suitable orientation, for example, with one element located 5' ("upstream") or 3' ("downstream") of a second element. The coding sequence of one element may be located on the same strand as, or the opposite strand from, the coding sequence of the second element, and oriented in the same or the opposite direction. In some embodiments, a single promoter drives expression of a transcript encoding the nucleic acid-targeting effector protein and the nucleic acid-targeting guide RNA, embedded within one or more intron sequences (e.g., each in a different intron, two or more in at least one intron, or all in a single intron). In some embodiments, the nucleic acid-targeting effector protein and the nucleic acid-targeting guide RNA may be operably linked to, and expressed from, the same promoter. Delivery vehicles, vectors, particles, nanoparticles, formulations, and components thereof for expressing one or more elements of a nucleic acid-targeting system are as used in WO 2014/093622 (PCT/US2013/074667).

In some embodiments, the vector comprises one or more insertion sites, such as restriction endonuclease recognition sequences (also referred to as "cloning sites"). In some embodiments, one or more insertion sites (e.g., about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more insertion sites) are located upstream and/or downstream of one or more sequence elements of one or more vectors. When multiple different guide sequences are used, a single expression construct can direct nucleic acid-targeting activity to multiple different corresponding target sequences within a cell. For example, a single vector may contain about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or more guide sequences. In some embodiments, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more such guide sequence-containing vectors may be provided and optionally delivered to a cell. In some embodiments, the vector comprises a regulatory element operably linked to an enzyme coding sequence encoding a nucleic acid-targeting effector protein.

The nucleic acid-targeting effector protein and the one or more nucleic acid-targeting guide RNAs may be delivered separately, and advantageously at least one of them is delivered via a particle complex. The effector protein mRNA may be delivered before the guide RNA to allow time for the effector protein to be expressed; for example, the mRNA can be administered 1-12 hours (preferably about 2-6 hours) before the guide RNA. Alternatively, the effector protein mRNA and the guide RNA can be administered together. Advantageously, a second, booster dose of the guide RNA can be administered 1-12 hours (preferably about 2-6 hours) after the initial administration of effector protein mRNA plus guide RNA. Additional administrations of the effector protein mRNA and/or guide RNA may be useful to achieve the most efficient level of genome modification.
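
The timing described above (mRNA 1-12 h before the guide, or co-administration; an optional guide RNA booster 1-12 h later; 2-6 h preferred for both) can be captured as a small timeline builder. This is a hypothetical sketch: the names are illustrative, and measuring the booster from the initial guide RNA dose in the staggered regime is an assumption, since the text describes the booster following co-administration.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_RANGE_H = (1.0, 12.0)   # interval range stated in the text
PREFERRED_RANGE_H = (2.0, 6.0)  # preferred interval stated in the text


@dataclass(frozen=True)
class Dose:
    hour: float     # time relative to the first administration, in hours
    component: str  # what is administered at that time


def _check_interval(hours: float, label: str) -> None:
    low, high = ALLOWED_RANGE_H
    if not low <= hours <= high:
        raise ValueError(f"{label} of {hours} h is outside {low}-{high} h")


def is_preferred(hours: float) -> bool:
    """True if an interval falls within the preferred 2-6 h window."""
    low, high = PREFERRED_RANGE_H
    return low <= hours <= high


def build_schedule(mrna_lead_h: Optional[float] = None,
                   booster_delay_h: Optional[float] = None) -> List[Dose]:
    """Return a hypothetical dosing timeline.

    mrna_lead_h: hours between effector mRNA and guide RNA; None means the two
        are co-administered at hour 0.
    booster_delay_h: hours between the initial guide RNA dose and a booster
        guide RNA dose; None means no booster.
    """
    doses = [Dose(0.0, "effector mRNA")]
    guide_hour = 0.0
    if mrna_lead_h is not None:
        _check_interval(mrna_lead_h, "mRNA lead time")
        guide_hour = float(mrna_lead_h)
    doses.append(Dose(guide_hour, "guide RNA"))
    if booster_delay_h is not None:
        _check_interval(booster_delay_h, "booster delay")
        doses.append(Dose(guide_hour + booster_delay_h, "guide RNA booster"))
    return doses


# Co-administration followed by a booster 4 h later (both within the preferred window).
for dose in build_schedule(booster_delay_h=4):
    print(dose)
```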

Conventional viral and non-viral gene transfer methods can be used to introduce nucleic acids into mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding components of the nucleic acid-targeting system to cells in culture or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g., transcripts of the vectors described herein), naked nucleic acids, and nucleic acids complexed with delivery vehicles such as liposomes. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. For reviews of gene therapy procedures, see Anderson, Science 256:808-813 (1992); Nabel and Felgner, TIBTECH 11:211-217 (1993); Mitani and Caskey, TIBTECH 11:162-166 (1993); Dillon, TIBTECH 11:167-175 (1993); Miller, Nature 357:455-460 (1992); Van Brunt, Biotechnology 6(10):1149-1154 (1988); Vigne, Restorative Neurology and Neuroscience 8:35-36 (1995); Kremer and Perricaudet, British Medical Bulletin 51(1):31-44 (1995); Haddada et al., in Current Topics in Microbiology and Immunology, Doerfler and Bohm (eds.) (1995); and Yu et al., Gene Therapy 1:13-26 (1994).

Non-viral methods of nucleic acid delivery include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced DNA uptake. Lipofection is described, for example, in U.S. Patent Nos. 5,049,386, 4,946,787, and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424 and WO 91/16024. Delivery can be to cells (e.g., in vitro or ex vivo administration) or to target tissues (e.g., in vivo administration).

Plasmid delivery involves cloning the guide RNA into a plasmid that expresses the CRISPR-Cas protein and transfecting the DNA into cells in culture. Plasmid backbones are commercially available, and no specific equipment is required. Plasmids have the advantage of modularity: they can carry CRISPR-Cas coding sequences of different sizes (including sequences encoding larger proteins) as well as selection markers. Plasmids also provide transient but sustained expression. However, plasmid delivery is not straightforward, so in vivo efficiency is generally low. Sustained expression can also be a drawback because it may increase off-target editing, and excessive accumulation of CRISPR-Cas protein may be toxic to cells. Finally, plasmids always carry a risk of random integration of dsDNA into the host genome, particularly given the risk of double-strand breaks (both on-target and off-target).

The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to those skilled in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Patent Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787). This is discussed in more detail below.

RNA- or DNA-virus-based systems for nucleic acid delivery take advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus. Viral vectors can be administered directly to patients (in vivo), or they can be used to treat cells in vitro, after which the modified cells may optionally be administered to patients (ex vivo). Conventional virus-based systems for gene transfer can include retroviral, lentiviral, adenoviral, adeno-associated viral, and herpes simplex viral vectors. Retroviral, lentiviral, and adeno-associated viral gene transfer methods allow integration into the host genome, often resulting in long-term expression of the inserted transgene. In addition, high transduction efficiencies have been observed in many different cell types and target tissues.

The tropism of a retrovirus can be altered by incorporating foreign envelope proteins, expanding the potential target population of cells. Lentiviral vectors are retroviral vectors that can transduce or infect non-dividing cells and typically produce high viral titers. The choice of retroviral gene transfer system will therefore depend on the target tissue. Retroviral vectors comprise cis-acting long terminal repeats (LTRs) and have a packaging capacity of up to 6-10 kb of foreign sequence. The minimal cis-acting LTRs are sufficient for replication and packaging of the vector, which is then used to integrate the therapeutic gene into target cells to provide permanent transgene expression. Widely used retroviral vectors include those based on murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224 (1991); PCT/US94/05700).

In applications where transient expression is preferred, adenovirus-based systems can be used. Adenovirus-based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. High titers and levels of expression have been obtained with such vectors, which can be produced in large quantities in a relatively simple system. Adeno-associated virus ("AAV") vectors can also be used to transduce cells with target nucleic acids, for example, in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology 160:38-47 (1987); U.S. Patent No. 4,797,368; WO 93/24641; Kotin, Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994)). The construction of recombinant AAV vectors is described in a number of publications, including U.S. Patent No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985); Tratschin et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat and Muzyczka, PNAS 81:6466-6470 (1984); and Samulski et al., J. Virol. 63:3822-3828 (1989).

The present disclosure provides AAV comprising or consisting essentially of an exogenous nucleic acid molecule encoding a CRISPR system. For example, the AAV may comprise a plurality of cassettes, including a first cassette comprising or consisting essentially of a promoter, a nucleic acid molecule encoding a CRISPR-associated (Cas) protein (a putative nuclease or helicase protein), e.g., Cas12o, and a terminator; and one or more further cassettes, advantageously up to the packaging size limit of the vector (e.g., five cassettes in total, including the first), each comprising or consisting essentially of a promoter, a nucleic acid molecule encoding a guide RNA (gRNA), and a terminator (represented schematically as promoter-gRNA1-terminator, promoter-gRNA2-terminator, ..., promoter-gRNA(N)-terminator, where N is the maximum number of cassettes that can be inserted within the vector's packaging size limit). Alternatively, the system may be carried by two or more separate rAAVs, each containing one or more cassettes of the CRISPR system: for example, a first rAAV containing the first cassette, comprising or consisting essentially of a promoter, a nucleic acid molecule encoding a Cas protein (e.g., Cas12o), and a terminator; and a second rAAV containing one or more cassettes, each comprising or consisting essentially of a promoter, a nucleic acid molecule encoding a guide RNA (gRNA), and a terminator (schematically, promoter-gRNA1-terminator, promoter-gRNA2-terminator, ..., promoter-gRNA(N)-terminator, with N defined as above). Alternatively, because Cas12o can process its own crRNA/gRNA, a single crRNA/gRNA array can be used for multiplex gene editing. Thus, instead of multiple cassettes to deliver the gRNAs, the rAAV may contain a single cassette comprising or consisting essentially of a promoter, multiple crRNAs/gRNAs, and a terminator (represented schematically as promoter-gRNA1-gRNA2-...-gRNA(N)-terminator, where N is the maximum number that can be inserted within the vector's packaging size limit). See Zetsche et al., Nature Biotechnology 35:31-34 (2017), which is incorporated herein by reference in its entirety. Because rAAV is a DNA virus, the nucleic acid molecules in this discussion of AAV or rAAV are advantageously DNA. In some embodiments, the promoter is advantageously the human synapsin I (hSyn) promoter. Other methods for delivering nucleic acids to cells are known to those skilled in the art; see, for example, US20030087817, which is incorporated herein by reference.
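
"N" in the schematics above is set by the vector's packaging size limit. The sketch below compares the two layouts described: separate promoter-gRNA-terminator cassettes versus a single promoter-gRNA1-...-gRNA(N)-terminator array. All sizes are hypothetical placeholders (no Cas12o cassette length is given here), ITRs and other vector elements are ignored, and the example capacity of 4,700 bp merely approximates the roughly 4.7 kb often cited for AAV.

```python
def max_guide_cassettes(capacity_bp: int, cas_cassette_bp: int, guide_cassette_bp: int) -> int:
    """Separate promoter-gRNA-terminator cassettes that fit beside the Cas cassette."""
    free_bp = capacity_bp - cas_cassette_bp
    if free_bp < 0:
        raise ValueError("the Cas cassette alone exceeds the packaging capacity")
    return free_bp // guide_cassette_bp


def max_array_guides(capacity_bp: int, cas_cassette_bp: int, promoter_bp: int,
                     terminator_bp: int, guide_unit_bp: int) -> int:
    """Guides that fit in one promoter-gRNA1-...-gRNA(N)-terminator array."""
    free_bp = capacity_bp - cas_cassette_bp - promoter_bp - terminator_bp
    if free_bp < 0:
        raise ValueError("Cas cassette plus array promoter and terminator exceed capacity")
    return free_bp // guide_unit_bp


def array_layout(n_guides: int) -> str:
    """Schematic string in the notation used in the text."""
    parts = ["promoter"] + [f"gRNA{i}" for i in range(1, n_guides + 1)] + ["terminator"]
    return "-".join(parts)


# Hypothetical sizes in bp: 4,000 for the Cas cassette, 350 per separate guide cassette,
# and 250 + 100 for the array's promoter and terminator with 60 bp per guide unit.
print(max_guide_cassettes(4700, 4000, 350))       # 2
n = max_array_guides(4700, 4000, 250, 100, 60)
print(n, array_layout(n))                         # 5 promoter-gRNA1-...-gRNA5-terminator
```

Sharing one promoter and terminator is what lets the array layout fit more guides in the same space, which is the practical benefit of Cas12o processing its own crRNA array.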

In another embodiment, Cocal vesiculovirus envelope-pseudotyped retroviral vector particles are contemplated (see, e.g., U.S. Patent Publication No. 20120164118, assigned to the Fred Hutchinson Cancer Research Center). Cocal virus belongs to the genus Vesiculovirus and is an etiological agent of vesicular stomatitis in mammals. Cocal virus was originally isolated from mites in Trinidad (Jonkers et al., Am. J. Vet. Res. 25:236-242 (1964)), and infections have been identified in insects, cattle, and horses in Trinidad, Brazil, and Argentina. Many vesiculoviruses that infect mammals have been isolated from naturally infected arthropods, suggesting that they are vector-borne. Antibodies to vesiculoviruses are common among people living in rural areas where the viruses are endemic, and laboratory-acquired infections also occur; human infection typically results in flu-like symptoms. The Cocal virus envelope glycoprotein shares 71.5% amino acid identity with VSV-G Indiana, and phylogenetic comparison of vesiculovirus envelope genes shows that Cocal virus is serologically distinct from, but most closely related to, the VSV-G Indiana strain among the vesiculoviruses (Jonkers et al., Am. J. Vet. Res. 25:236-242 (1964); Travassos da Rosa et al., Am. J. Tropical Med. & Hygiene 33:999-1006 (1984)). Cocal vesiculovirus envelope-pseudotyped retroviral vector particles may include, for example, lentiviral, alpharetroviral, betaretroviral, gammaretroviral, deltaretroviral, and epsilonretroviral vector particles, which may comprise retroviral Gag, Pol, and/or one or more accessory proteins together with a Cocal vesiculovirus envelope protein. In certain aspects of these embodiments, the Gag, Pol, and accessory proteins are lentiviral and/or gammaretroviral.
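
To make the "71.5% identity at the amino acid level" figure concrete, the sketch below computes percent identity for a pair of already-aligned sequences. It is illustrative only: the toy sequences are invented, the alignment step itself is not shown, and the convention used here (identical positions divided by alignment columns, excluding gap-gap columns) is one of several in use, so results may differ from those reported by a particular alignment tool.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Convention assumed here: identical non-gap positions divided by the number
    of alignment columns, ignoring columns where both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy, invented fragments; not real Cocal or VSV-G sequences.
    print(round(percent_identity("MKCLLYLAF", "MKCLLYLGF"), 1))  # 88.9
```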

In some embodiments, a host cell is transiently or non-transiently transfected with one or more vectors described herein. In some embodiments, a cell is transfected as it naturally occurs in a subject and is optionally reintroduced therein. In some embodiments, the transfected cell is taken from a subject. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial cells, BALB/3T3 mouse embryonic fibroblasts, 3T3 Swiss, 3T3-L1, 132-d5 human fetal fibroblasts, 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A172, A20, A253, A431, A-549, ALC, B16, B35, BCP-1 cells, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C3H-10T1/2, C6/36, Cal-27, CHO, CHO-7, CHO-IR, CHO-K1, CHO-K2, CHO-T, CHO Dhfr-/-, COR-L23, COR-L23/CPR, COR-L23/5010, COR-L23/R23, COS-7, COV-434, CML T1, CMT, CT26, D17, DH82, DU145, DuCaP, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, HEK-293, HeLa, Hepa1c1c7, HL-60, HMEC, HT-29, Jurkat, JY cells, K562 cells, Ku812, KCL22, KG1, KYO1, LNCaP, Ma-Mel 1-48, MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468, MDA-MB-435, MDCKII, MDCK II, MOR/0.2R, MONO-MAC 6, MTD-1A, MyEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NALM-1, NW-145, OPCN/OPCT cell lines, Peer, PNT-1A/PNT 2, RenCa, RIN-5F, RMA/RMAS, Saos-2 cells, Sf-9, SkBr3, T2, T-47D, T84, THP1 cell line, U373, U87, U937, VCaP, Vero cells, WM39, WT-49, X63, YAC-1, YAR, and transgenic varieties thereof. Cell lines are available from a variety of sources known to those of skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)).

In particular embodiments, transient expression and/or presence of one or more components of the AD-functionalized CRISPR system may be of interest, for example, to reduce off-target effects. In some embodiments, cells transfected with one or more vectors described herein are used to establish a new cell line comprising one or more vector-derived sequences. In some embodiments, cells that are transiently transfected with components of an AD-functionalized CRISPR system as described herein (e.g., by transient transfection of one or more vectors, or by transfection with RNA) and modified through the activity of a CRISPR complex are used to establish a new cell line comprising cells that contain the modification but lack any other exogenous sequence. In some embodiments, cells transiently or non-transiently transfected with one or more vectors described herein, or cell lines derived from such cells, are used to evaluate one or more test compounds.

In some embodiments, RNA and/or protein is introduced directly into a host cell. For example, a CRISPR-Cas protein may be delivered as an encoding mRNA together with an in vitro transcribed guide RNA. Such methods can shorten the time needed for the CRISPR-Cas protein to take effect and further prevent long-term expression of the CRISPR system components.

In some embodiments, the RNA molecules of the present disclosure are delivered in liposome or Lipofectin formulations and the like, which can be prepared by methods well known to those skilled in the art. Such methods are described, for example, in U.S. Pat. Nos. 5,593,972, 5,589,466, and 5,580,859, which are incorporated herein by reference. Delivery systems aimed specifically at enhancing and improving the delivery of siRNA into mammalian cells have been developed (see, e.g., Shen et al., FEBS Lett. 2003, 539:111-114; Xia et al., Nat. Biotech. 2002, 20:1006-1010; Reich et al., Mol. Vision 2003, 9:210-216; Sorensen et al., J. Mol. Biol. 2003, 327:761-766; Lewis et al., Nat. Gen. 2002, 32:107-108; and Simeoni et al., NAR 2003, 31(11):2717-2724) and may be applied to the present disclosure. siRNA has recently been used successfully to inhibit gene expression in primates (see, e.g., Tolentino et al., Retina 24(4):660), and this approach may also be applied to the present disclosure.

Indeed, RNA delivery is a useful method for in vivo delivery. Cas12o, adenosine deaminase, and guide RNA can be delivered to cells using liposomes or particles. Thus, the CRISPR-Cas protein (e.g., Cas12o), the adenosine deaminase (which may be fused to the CRISPR-Cas protein or to an adaptor protein), and/or the RNA of the present disclosure may be delivered in RNA form via microvesicles, liposomes, particles, or nanoparticles. For example, Cas12o mRNA, adenosine deaminase mRNA, and guide RNA can be packaged into liposomal particles for in vivo delivery. Liposomal transfection reagents, such as Lipofectamine from Life Technologies and other commercially available reagents, can effectively deliver RNA molecules to the liver. In some embodiments, lipid nanoparticles (LNPs) co-encapsulate Cas12o and its corresponding crRNA. In some embodiments, lipid nanoparticles encapsulating Cas12o and/or its corresponding crRNA are administered intravenously to a subject in need thereof (e.g., a human).

Means of RNA delivery also preferably include delivery via particles (Cho, S., Goldberg, M., Son, S., Xu, Q., Yang, F., Mei, Y., Bogatyrev, S., Langer, R., and Anderson, D., Lipid-like nanoparticles for small interfering RNA delivery to endothelial cells, Advanced Functional Materials, 19:3112-3118, 2010) or exosomes (Schroeder, A., Levins, C., Cortez, C., Langer, R., and Anderson, D., Lipid-based nanotherapeutics for siRNA delivery, Journal of Internal Medicine, 267:9-21, 2010, PMID: 20059641). Indeed, exosomes have been shown to be particularly useful for delivering siRNA, a system with some similarities to the CRISPR system. For example, El-Andaloussi S et al. ("Exosome-mediated delivery of siRNA in vitro and in vivo." Nat Protoc. 2012 Dec;7(12):2112-26. doi:10.1038/nprot.2012.131. Epub 2012 Nov 15) describe how exosomes are promising tools for drug delivery across different biological barriers and can be used to deliver siRNA in vitro and in vivo. Their approach generates targeted exosomes by transfecting an expression vector comprising an exosomal protein fused to a peptide ligand. The exosomes are then purified from the transfected cell supernatant and characterized, after which RNA is loaded into them. Delivery or administration according to the present disclosure can be performed with exosomes, in particular, but not limited to, delivery to the brain. Vitamin E (α-tocopherol) can be conjugated to CRISPR Cas and delivered to the brain together with high-density lipoprotein (HDL), for example, in a manner similar to that used by Uno et al. (Human Gene Therapy 22:711-719 (June 2011)) to deliver short interfering RNA (siRNA) to the brain. In that study, mice were infused via osmotic minipumps (Model 1007D; Alzet, Cupertino, CA) filled with phosphate-buffered saline (PBS), free Toc-siBACE, or Toc-siBACE/HDL and connected to Brain Infusion Kit 3 (Alzet). The brain infusion cannula was placed at the midline approximately 0.5 mm posterior to bregma for infusion into the dorsal third ventricle. Uno et al. found that as little as 3 nmol of Toc-siRNA with HDL could induce a comparable degree of target reduction using the same ICV infusion method. In the present disclosure, similar doses of CRISPR Cas conjugated to α-tocopherol and co-administered with brain-targeting HDL may be contemplated for humans; for example, about 3 nmol to about 3 μmol of brain-targeting CRISPR Cas may be contemplated. Zou et al. (Human Gene Therapy 22:465-475 (April 2011)) described lentivirus-mediated delivery of short hairpin RNA targeting PKCγ for in vivo gene silencing in the rat spinal cord. Zou et al. administered about 10 μl of recombinant lentivirus at a titer of 1×10⁹ transducing units (TU)/ml via an intrathecal catheter. In the present disclosure, similar doses of CRISPR Cas expressed in a brain-targeting lentiviral vector may be contemplated for humans; for example, about 10-50 ml of brain-targeting CRISPR Cas in a lentivirus at a titer of 1×10⁹ TU/ml may be contemplated.
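
The dose extrapolations above reduce to simple arithmetic: total transducing units equal volume times titer, and the α-tocopherol-conjugate range spans three orders of magnitude. The sketch below only restates the figures given in the text; it is not a dosing recommendation, and the helper and variable names are hypothetical.

```python
def total_tu(volume_ml: float, titer_tu_per_ml: float) -> float:
    """Total transducing units (TU) delivered = volume (ml) x titer (TU/ml)."""
    return volume_ml * titer_tu_per_ml


TITER = 1e9  # TU/ml, the titer cited for Zou et al. and for the human example

rat_dose = total_tu(10e-3, TITER)    # about 10 ul in rats -> ~1e7 TU
human_low = total_tu(10.0, TITER)    # 10 ml               -> 1e10 TU
human_high = total_tu(50.0, TITER)   # 50 ml               -> 5e10 TU

# Fold scale-up from the rat dose to the contemplated human range.
fold_low = human_low / rat_dose      # ~1,000-fold
fold_high = human_high / rat_dose    # ~5,000-fold

# Toc-conjugate range: about 3 nmol to about 3 umol (3,000 nmol) -> 1,000-fold span.
toc_span = 3000.0 / 3.0

if __name__ == "__main__":
    print(f"rat: {rat_dose:.1e} TU; human: {human_low:.1e}-{human_high:.1e} TU")
    print(f"scale-up: {fold_low:.0f}x to {fold_high:.0f}x; Toc span: {toc_span:.0f}x")
```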

In one embodiment, a delivery system can be used to introduce the components of the systems and compositions into plant cells. For example, the components can be delivered to plants using electroporation, microinjection, aerosol beam injection of plant cell protoplasts, biolistic methods, DNA particle bombardment, and/or Agrobacterium-mediated transformation. Examples of methods and delivery systems for plants include those described in Fu et al., Transgenic Res. 2000 Feb;9(1):11-9; Klein RM et al., Biotechnology. 1992;24:384-6; Casas AM et al., Proc Natl Acad Sci U S A. 1993 Dec 1;90(23):11212-11216; U.S. Pat. No. 5,563,055; and Davey MR et al., Plant Mol Biol. 1989 Sep;13(3):273-85, which are incorporated herein by reference in their entirety.

The exemplary delivery compositions, systems, and methods described herein in relation to the compositions, Cas12o proteins, or CRISPR-associated Cas12o proteins are also applicable to functional domains and other components (e.g., other proteins and polynucleotides associated with Cas12o proteins or CRISPR-associated Cas12o proteins, such as reverse transcriptases, nucleotide deaminases, retrotransposons, and donor polynucleotides).

Cargo

The delivery system may comprise one or more cargoes. A cargo may comprise one or more components of the systems and compositions herein, for example one or more of: i) a plasmid encoding one or more protein components of the compositions and systems, such as a Cas12o protein or CRISPR-associated Cas12o protein and/or a functional domain; ii) a plasmid encoding one or more crRNAs; iii) mRNA encoding one or more protein components of the compositions and systems, such as a Cas12o protein or CRISPR-associated Cas12o protein and/or a functional domain; iv) one or more guide RNAs; v) one or more protein components of the compositions and systems, such as a Cas12o protein or CRISPR-associated Cas12o protein and/or a functional domain; and vi) any combination thereof. The one or more protein components may include a nucleic acid-guided nuclease (e.g., Cas), a reverse transcriptase, a nucleotide deaminase, a retrotransposon protein, another functional domain, or any combination thereof.
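
The cargo options listed above differ mainly in how much work the target cell must do before an active Cas12o:crRNA complex can form. The following sketch models that idea as a small data structure. The class and function names are hypothetical, the step lists reflect textbook reasoning (DNA must be transcribed, protein-coding sequences must also be translated, an RNP is delivered already assembled), and details such as nuclear entry or processing of crRNA arrays are deliberately not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Form(Enum):
    PLASMID = "plasmid DNA"
    MRNA = "mRNA"
    RNA = "guide RNA / crRNA"
    PROTEIN = "protein"
    RNP = "ribonucleoprotein complex"


class Component(Enum):
    CAS12O = "Cas12o protein"
    CRRNA = "crRNA / guide RNA"


@dataclass(frozen=True)
class CargoItem:
    form: Form
    provides: frozenset  # Components this item ultimately supplies


def cellular_steps(item: CargoItem) -> list:
    """Steps the host cell must perform before the item's components exist.

    Simplified: nuclear entry, RNA processing and complex assembly are not modeled.
    """
    if item.form is Form.PLASMID:
        steps = ["transcription"]
        if Component.CAS12O in item.provides:
            steps.append("translation")  # only protein-coding sequences need it
        return steps
    if item.form is Form.MRNA:
        return ["translation"]
    return []  # RNA, protein and RNP are delivered in their final form


def supplies_full_system(items: list) -> bool:
    """True if the cargo items together supply both Cas12o and a crRNA."""
    supplied = set()
    for item in items:
        supplied |= item.provides
    return {Component.CAS12O, Component.CRRNA} <= supplied


if __name__ == "__main__":
    rnp = CargoItem(Form.RNP, frozenset({Component.CAS12O, Component.CRRNA}))
    mrna = CargoItem(Form.MRNA, frozenset({Component.CAS12O}))
    guide = CargoItem(Form.RNA, frozenset({Component.CRRNA}))
    print(cellular_steps(rnp), cellular_steps(mrna))  # [] ['translation']
    print(supplies_full_system([mrna, guide]))        # True
    print(supplies_full_system([mrna]))               # False
```

In this simplified view, an RNP needs no expression steps in the cell, which is consistent with the shorter, transient activity attributed elsewhere in this section to direct RNA or protein delivery.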

In some examples, the cargo may comprise a plasmid encoding one or more protein components of the compositions and systems, such as a Cas12o protein or CRISPR-associated Cas12o protein and/or a functional domain, and one or more (e.g., multiple) guide RNAs. In some cases, the plasmid may also encode a recombination template (e.g., for HDR). In one embodiment, the cargo may comprise mRNA encoding one or more protein components and one or more guide RNAs. In some examples, the cargo may comprise one or more protein components and one or more crRNAs or guide RNAs, for example in the form of a ribonucleoprotein complex (RNP). Ribonucleoprotein complexes can be delivered by the methods and systems herein. In some cases, ribonucleoproteins can be delivered by polypeptide-based shuttle agents. In one example, ribonucleoproteins can be delivered using a synthetic peptide comprising an endosome leakage domain (ELD) operably linked to a cell-penetrating domain (CPD), or an ELD operably linked to a histidine-rich domain and a CPD, for example, as described in WO2016161516. RNPs can also be used to deliver the compositions and systems to plant cells, for example, as described in Woo JW et al., Nat Biotechnol. 2015 Nov;33(11):1162-4.

Physical delivery

In one embodiment, the cargo can be introduced into cells by physical delivery methods. Examples of physical methods include microinjection, electroporation, and hydrodynamic delivery. Both nucleic acids and proteins can be delivered using such methods. For example, one or more protein components can be prepared in vitro, isolated (and, if necessary, refolded and purified), and introduced into cells.

Microinjection

Direct microinjection of cargo into cells can achieve high efficiency, e.g., greater than 90% or approximately 100%. In one embodiment, microinjection can be performed using a microscope and a needle (e.g., 0.5-5.0 μm in diameter) to pierce the cell membrane and deliver the cargo directly to a target site within the cell. Microinjection can be used for in vitro and ex vivo delivery.

Plasmids comprising coding sequences for one or more protein components and/or crRNAs, mRNAs, and/or guide RNAs can be microinjected. In some cases, microinjection can be used i) to deliver DNA directly to the nucleus and/or ii) to deliver mRNA (e.g., in vitro transcribed mRNA) to the nucleus or cytoplasm. In certain examples, microinjection can be used to deliver crRNA directly to the nucleus and mRNA to the cytoplasm, for example to facilitate translation of the one or more protein components and their shuttling into the nucleus.

Microinjection can be used to generate genetically modified animals. For example, gene-editing cargo can be injected into fertilized eggs (zygotes) to allow efficient germline modification. Such methods can produce normal embryos and full-term mouse pups carrying the desired modification. Microinjection can also be used to transiently upregulate or downregulate specific genes within the genome of a cell, for example using a Cas12o protein or CRISPR-associated Cas12o protein.

Electroporation

In one embodiment, the cargo and/or delivery vehicle can be delivered by electroporation. Electroporation uses pulsed high-voltage current to transiently open nanometer-sized pores in the membranes of cells suspended in a buffer, allowing components with hydrodynamic diameters of tens of nanometers to enter the cells. In some cases, electroporation can be used with a wide range of cell types and transfers cargo into cells efficiently. Electroporation can be used for in vitro and ex vivo delivery.
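
Two quantities commonly used to describe an electroporation pulse are the field strength across the sample (applied voltage divided by electrode gap) and, for capacitor-discharge instruments, the exponential-decay time constant τ = R × C. The sketch below computes both. The example voltage, gap, resistance, and capacitance are illustrative assumptions, not conditions from this disclosure or the cited studies, and suitable settings are cell-type specific; square-wave instruments set pulse length directly, so the time-constant helper does not apply to them.

```python
def field_strength_v_per_cm(voltage_v: float, gap_cm: float) -> float:
    """Electric field strength (V/cm) across an electrode gap (e.g. a cuvette)."""
    if gap_cm <= 0:
        raise ValueError("electrode gap must be positive")
    return voltage_v / gap_cm


def decay_time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    """Time constant tau = R * C for an exponential-decay pulse.

    ohm * microfarad = microseconds, so divide by 1000 to get milliseconds.
    """
    return resistance_ohm * capacitance_uf / 1000.0


if __name__ == "__main__":
    # Illustrative settings only: 250 V across a 0.4 cm cuvette, 200 ohm, 25 uF.
    print(field_strength_v_per_cm(250, 0.4))  # ~625 V/cm
    print(decay_time_constant_ms(200, 25))    # 5.0 ms
```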

Electroporation can also be used to deliver cargo into the nuclei of mammalian cells by applying specific voltages and reagents, for example by nucleofection. Such methods include those described in Wu Y et al. (2015). Cell Res 25:67-79; Ye L et al. (2014). Proc Natl Acad Sci USA 111:9591-6; Choi PS, Meyerson M. (2014). Nat Commun 5:3728; and Wang J, Quake SR. (2014). Proc Natl Acad Sci 111:13157-62. Electroporation can also be used to deliver cargo in vivo, for example using the methods described in Zuckermann M et al. (2015). Nat Commun 6:7391.

Hydrodynamic delivery

Hydrodynamic delivery can also be used to deliver cargo, for example for in vivo delivery. In some examples, hydrodynamic delivery is performed by rapidly injecting a large volume (8%-10% of body weight) of a solution containing gene-editing cargo into the bloodstream of a subject (e.g., an animal or a human); in mice, for example, injection is via the tail vein. Because blood is incompressible, the large bolus of liquid can increase hydrodynamic pressure, temporarily enhancing the permeability of endothelial and parenchymal cells and allowing cargo that normally cannot cross the cell membrane to enter the cells. This method can be used to deliver naked DNA plasmids and proteins. The delivered cargo can be enriched in the liver, kidneys, lungs, muscle, and/or heart.
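
The "8%-10% of body weight" guideline translates directly into an injection volume. The sketch below assumes the injected solution has a density of about 1 g/ml, so grams of body weight map onto milliliters of solution; the 20 g mouse is an illustrative example, and the function name is hypothetical.

```python
def hydrodynamic_volume_ml(body_weight_g: float,
                           low_fraction: float = 0.08,
                           high_fraction: float = 0.10) -> tuple:
    """Injection volume range (ml) for hydrodynamic delivery.

    Assumes a solution density of ~1 g/ml, so volume (ml) ~= fraction * weight (g).
    """
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    if not 0 < low_fraction <= high_fraction:
        raise ValueError("fractions must satisfy 0 < low <= high")
    return body_weight_g * low_fraction, body_weight_g * high_fraction


if __name__ == "__main__":
    low, high = hydrodynamic_volume_ml(20.0)  # a 20 g mouse (illustrative)
    print(f"{low:.1f}-{high:.1f} ml")         # 1.6-2.0 ml
```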

Transfection

Cargo such as nucleic acids can be introduced into cells by transfection methods used to introduce nucleic acids into cells. Examples of transfection methods include calcium phosphate-mediated transfection, cationic transfection, liposome transfection, dendrimer transfection, heat shock transfection, magnetofection, lipofection, impalefection, optical transfection, and proprietary agent-enhanced nucleic acid uptake.

Cell Penetrating Peptides

In one embodiment, the delivery vehicle comprises a cell-penetrating peptide (CPP). CPPs are short peptides that facilitate cellular uptake of a variety of molecular cargoes (e.g., ranging from nanosized particles to small chemical molecules and large DNA fragments).

CPPs can differ in size, amino acid sequence, and charge. In some examples, CPPs can translocate across the plasma membrane and facilitate delivery of various molecular cargoes to the cytoplasm or to organelles. CPPs can enter cells by different mechanisms, such as direct membrane penetration, endocytosis-mediated entry, and translocation through the formation of transient structures.

The amino acid composition of a CPP may contain a high relative abundance of positively charged amino acids (such as lysine or arginine), or the CPP may have a sequence with an alternating pattern of polar/charged amino acids and nonpolar, hydrophobic amino acids. These two types of structures are referred to as polycationic and amphipathic, respectively. A third class of CPPs are hydrophobic peptides, which contain only nonpolar residues, have a low net charge, or have hydrophobic amino acid groups that are critical for cellular uptake. Another type of CPP is the trans-activating transcriptional activator (Tat) from human immunodeficiency virus 1 (HIV-1). Examples of CPPs include penetratin, Tat (48-60), transportan, (R-AhX-R4) (where Ahx denotes aminohexanoyl), the Kaposi fibroblast growth factor (FGF) signal peptide sequence, the integrin β3 signal peptide sequence, polyarginine peptide Arg sequences, guanine-rich molecular transporters, and sweet arrow peptide. Examples of CPPs and related applications also include those described in U.S. Pat. No. 8,372,951.
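
The composition-based classes described above can be explored with a simple residue count. The sketch below computes the cationic (Lys/Arg) fraction, a crude hydrophobic fraction, and an approximate net charge for the commonly reported sequences of Tat(48-60) (GRKKRRQRRRPPQ) and penetratin (RQIKIWFQNRRMKWKK). The hydrophobic residue set, the 0.5 cutoff for calling a peptide polycationic, and the net-charge approximation (Lys + Arg - Asp - Glu, ignoring His and the termini) are assumptions made for illustration. Because it only counts residues, this sketch cannot detect the alternating polar/nonpolar pattern that defines amphipathic CPPs.

```python
CATIONIC = set("KR")
ANIONIC = set("DE")
HYDROPHOBIC = set("AVILMFWY")  # assumed set; definitions vary between scales


def composition(seq: str) -> dict:
    """Residue-count summary of a peptide given in one-letter code."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    cationic = sum(aa in CATIONIC for aa in seq)
    anionic = sum(aa in ANIONIC for aa in seq)
    hydrophobic = sum(aa in HYDROPHOBIC for aa in seq)
    return {
        "length": n,
        "cationic_fraction": cationic / n,
        "hydrophobic_fraction": hydrophobic / n,
        "approx_net_charge": cationic - anionic,  # ignores His and termini
    }


def looks_polycationic(seq: str, cutoff: float = 0.5) -> bool:
    """Heuristic: at least `cutoff` of residues are Lys or Arg (assumed threshold)."""
    return composition(seq)["cationic_fraction"] >= cutoff


if __name__ == "__main__":
    for name, seq in [("Tat(48-60)", "GRKKRRQRRRPPQ"),
                      ("penetratin", "RQIKIWFQNRRMKWKK")]:
        c = composition(seq)
        print(name, c["approx_net_charge"], round(c["cationic_fraction"], 2),
              looks_polycationic(seq))
```

With these assumptions, Tat(48-60) (8 of 13 residues Lys/Arg) passes the polycationic cutoff while penetratin (7 of 16) does not, even though both carry a substantial positive charge; the cutoff is a tunable illustration, not a published classification rule.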

CPPs can readily be used in vitro and ex vivo, and generally require extensive optimization for each cargo and cell type. In some examples, a CPP can be covalently attached directly to a Cas12o protein, which is then complexed with crRNA and delivered to cells. In some examples, CPP-Cas12o and CPP-crRNA can be delivered to cells separately. CPPs can also be used to deliver RNPs.

CPPs can be used to deliver the compositions and systems to plants. In some examples, CPPs can be used to deliver components to plant protoplasts, which are then regenerated into plant cells and further into plants.

Gold Nanoparticles

In one embodiment, the delivery vehicle comprises gold nanoparticles (also known as AuNPs or colloidal gold). Gold nanoparticles can form complexes with cargo, such as a Cas12o protein:crRNA RNP. Gold nanoparticles can be coated, for example, with silicate and the endosome-disrupting polymer PAsp(DET). Examples of gold nanoparticles include the Spherical Nucleic Acid (SNA™) constructs of AuraSense Therapeutics and those described in Mout R et al. (2017). ACS Nano 11:2452-8; and Lee K et al. (2017). Nat Biomed Eng 1:889-901.

Genetically modified cells and organisms

The present disclosure also provides cells comprising one or more components of the compositions and systems herein, such as a Cas12o protein or CRISPR-associated Cas12o protein and/or crRNA. Also provided are cells modified by the systems and methods herein, as well as cell cultures, tissues, organs, and organisms comprising such cells or their progeny. In one embodiment, the present disclosure provides a method of modifying a cell or an organism. The cell may be a prokaryotic cell or a eukaryotic cell. The cell may be a mammalian cell. The mammalian cell may be a non-human primate, bovine, porcine, rodent, or mouse cell. The cell may be a non-mammalian eukaryotic cell, such as a poultry, fish, or shrimp cell. The cell may be a therapeutic T cell or an antibody-producing B cell. The cell may also be a plant cell. The plant cell may be a cell of a crop plant such as cassava, corn, sorghum, wheat, or rice. The plant cell may also be a cell of an alga, a tree, or a vegetable. A modification introduced into a cell by the present disclosure may alter the cell and its progeny so as to increase the production of a biological product (such as an antibody, starch, alcohol, or other desired cellular output). A modification introduced into a cell by the present disclosure may also cause the cell and its progeny to include an alteration that changes the biological product produced.

In one embodiment, one or more polynucleotide molecules, vectors, or vector systems that drive expression of one or more elements of a composition, system, or delivery system comprising one or more elements of a nucleic acid-targeting system are introduced into a host cell, such that expression of those elements of the nucleic acid-targeting system directs formation of a nucleic acid-targeting complex at one or more target sites. In one embodiment of the present disclosure, the host cell may be a eukaryotic cell, a prokaryotic cell, or a plant cell.

In one embodiment, the host cell is a cell of a cell line. Cell lines are available from a variety of sources known to those skilled in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)). In one embodiment, cells transfected with one or more vectors described herein are used to establish a new cell line comprising one or more vector-derived sequences. In one embodiment, cells that are transiently transfected with components of a system described herein (such as by transient transfection of one or more vectors, or by transfection with RNA) and modified through the activity of the complex are used to establish a cell line comprising cells that contain the modification but lack any other exogenous sequence. In one embodiment, cells transiently or non-transiently transfected with one or more vectors described herein, or cell lines derived from such cells, are used to evaluate one or more test compounds.

Also contemplated are human cells or tissues, plants, and non-human animals comprising one or more of the polynucleotide molecules, vectors, vector systems, or cells of any one of the embodiments herein. In one aspect, host cells and cell lines modified by, or comprising, a composition, system, or modified enzyme of the present disclosure are provided, including (isolated) stem cells and their progeny.

In one embodiment, a plant or non-human animal comprises, in at least one of its tissue types, at least one of the system components, polynucleotide molecules, vectors, vector systems, or cells described in any one of the embodiments herein. In one embodiment, a non-human animal comprises at least one of the system components, polynucleotide molecules, vectors, vector systems, or cells described in any one of the embodiments herein in at least one tissue type. In one embodiment, the system components are present only transiently because they degrade over time. In one embodiment, expression of the components of the systems and compositions of any one of the embodiments, when carried in a polynucleotide molecule, vector, vector system, or cell, is restricted to certain tissue types or regions of the plant or non-human animal. In one embodiment, such expression depends on physiological cues. In one embodiment, such expression can be triggered by an exogenous molecule. In one embodiment, such expression depends on the expression of a non-Cas molecule in the plant or non-human animal.

General Applications and Uses

The systems, vector systems, vectors, and compositions described herein can be used in a variety of nucleic acid-targeting applications, including altering or modifying the synthesis of gene products (such as proteins), nucleic acid cleavage, nucleic acid editing, and nucleic acid splicing, as well as trafficking, tracking, isolation, and visualization of target nucleic acids.

Accordingly, aspects of the present disclosure also encompass methods and uses of the compositions and systems described herein in genome engineering, for example for altering or manipulating the expression of one or more genes or gene products in prokaryotic or eukaryotic cells in vitro, in vivo, or ex vivo. In some examples, the target polynucleotide is a target sequence within genomic DNA (including nuclear genomic DNA, mitochondrial DNA, or chloroplast DNA).

Typically, in the context of a nucleic acid-targeting system, formation of a nucleic acid-targeting complex (comprising a crRNA or guide RNA hybridized to a target sequence and complexed with one or more nucleic acid-targeting effector proteins) results in cleavage of one or both DNA or RNA strands in or near the target sequence (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs of it). As used herein, the term "one or more sequences associated with a target locus of interest" refers to sequences near the target sequence (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs of it), where the target sequence is contained in the target locus of interest.
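The "within N base pairs" language above reduces to a distance test between a position and the target interval. The Python sketch below is illustrative only: the 0-based, half-open coordinate convention and the helper names `distance_to_target` and `is_associated_with_target` are assumptions rather than anything defined in the disclosure, and the default 50 bp window simply mirrors one of the listed values.

```python
def distance_to_target(pos: int, start: int, end: int) -> int:
    """Distance in bp from `pos` to the target interval [start, end) (0-based, half-open).

    Returns 0 when `pos` lies inside the target sequence.
    """
    if end <= start:
        raise ValueError("target interval must be non-empty")
    if pos < start:
        return start - pos
    if pos >= end:
        return pos - (end - 1)  # distance to the last base of the target
    return 0


def is_associated_with_target(pos: int, start: int, end: int, window: int = 50) -> bool:
    """True if `pos` is in, or within `window` bp of, the target sequence (hypothetical helper)."""
    return distance_to_target(pos, start, end) <= window


# Example: a 20-nt target occupying positions 100..119
print(is_associated_with_target(95, 100, 120, window=10))   # True (5 bp upstream)
print(is_associated_with_target(140, 100, 120, window=10))  # False (21 bp downstream)
```

Using a half-open interval keeps the target length equal to `end - start`, which avoids off-by-one errors when windows are stacked on either side.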

In one embodiment, the present disclosure provides a method for targeting a polynucleotide, comprising contacting a sample containing the target polynucleotide (such as a cell, cell population, tissue, organ, or organism) with a composition, system, polynucleotide, or vector. The contacting may result in modification of a gene product or in a change in the amount or expression of a gene product. In some instances, the target sequence of the polynucleotide is a disease-associated target sequence.

In one embodiment, the present disclosure provides a method for modifying a target polynucleotide, comprising delivering one or more polynucleotides of the composition, or one or more vectors, to a cell or cell population comprising the target polynucleotide, wherein the complex guides a reverse transcriptase to the target sequence and the reverse transcriptase promotes insertion of a donor sequence from the crRNA into the target polynucleotide.

Examples of target polynucleotides include sequences associated with a signaling biochemical pathway, such as genes or polynucleotides related to a signaling biochemical pathway. Examples of target polynucleotides also include disease-associated genes or polynucleotides. A "disease-associated" gene or polynucleotide refers to any gene or polynucleotide that yields transcription or translation products at an abnormal level or in an abnormal form in cells derived from disease-affected tissues, compared with tissues or cells of a non-disease control. It may be a gene that becomes expressed at an abnormally high level, or one that becomes expressed at an abnormally low level, where the altered expression correlates with the occurrence and/or progression of the disease. A disease-associated gene also refers to a gene carrying one or more mutations or genetic variations that is either directly responsible for the etiology of a disease or in linkage disequilibrium with one or more genes responsible for it. The transcribed or translated products may be known or unknown, and may be present at normal or abnormal levels.

The target polynucleotide of the complex can be any polynucleotide endogenous or exogenous to a eukaryotic cell. For example, the target polynucleotide can be a polynucleotide residing in the nucleus of the eukaryotic cell. The target polynucleotide can be a sequence coding for a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). Without wishing to be bound by theory, it is believed that the target sequence should be associated with a TAM (target-adjacent motif), that is, a short sequence recognized by the complex. The precise sequence and length requirements for the TAM vary depending on the Cas12o protein or CRISPR-associated Cas12o protein used, but a TAM is typically a 2-5 base pair sequence adjacent to the protospacer (that is, the target sequence), and the skilled person will be able to identify further TAM sequences for use with a given Cas12o protein or CRISPR-associated Cas12o protein. In addition, engineering the TAM-interacting domain may allow TAM specificity to be programmed, improve the fidelity of target site recognition, and increase the versatility of the Cas12o protein genome engineering platform.

In some cases, a Cas12o protein or CRISPR-associated Cas12o protein can be engineered to alter its PAM specificity. In some embodiments, when the target sequence is DNA, the target sequence is located 3' of the protospacer adjacent motif (PAM), and the PAM is 5'-TN, where N is A, T, G, or C.
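As a rough illustration of what a 5'-TN PAM implies for site selection, the Python sketch below lists candidate protospacers on both strands, assuming the PAM lies on the same strand immediately 5' of the protospacer. The 20-nt protospacer length and the function names are hypothetical choices for illustration; the disclosure does not specify them, and practical guide design would also weigh specificity and off-target risk.

```python
from typing import List, Tuple

# Complement table for A/C/G/T (N kept as N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_tn_sites(seq: str, spacer_len: int = 20) -> List[Tuple[str, int, str]]:
    """Return (strand, pam_start, protospacer) for each 5'-TN PAM with a full-length spacer 3' of it.

    `pam_start` is a 0-based index on the strand being scanned (for "-", on the
    reverse-complement sequence). Assumes PAM and protospacer share a strand.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # Need 2 nt of PAM plus spacer_len nt of protospacer.
        for i in range(len(s) - 2 - spacer_len + 1):
            if s[i] == "T" and s[i + 1] in "ACGT":
                hits.append((strand, i, s[i + 2 : i + 2 + spacer_len]))
    return hits


print(find_tn_sites("TAGGCC", spacer_len=3))  # [('+', 0, 'GGC')]
```

Because a TN PAM constrains only a single thymidine, almost every T on either strand yields a candidate, so a PAM this permissive would leave few genomic positions out of reach.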

Aspects of the present disclosure relate to: a method for targeting a polynucleotide, comprising contacting a sample containing the polynucleotide with a composition, system, or Cas12o protein or CRISPR-associated Cas12o protein as described in any embodiment herein; a delivery system comprising a composition, system, or Cas12o protein or CRISPR-associated Cas12o protein nuclease as described in any embodiment herein; a polynucleotide comprising a composition, system, or Cas12o protein or CRISPR-associated Cas12o protein as described in any embodiment herein; a vector comprising a composition, system, or Cas12o protein or CRISPR-associated Cas12o protein as described in any embodiment herein; or a vector system comprising a composition, system, or Cas12o protein or CRISPR-associated Cas12o protein as described in any embodiment herein. In one embodiment, the target polynucleotide is contacted with at least two different compositions, systems, or Cas12o proteins or CRISPR-associated Cas12o proteins. In a further embodiment, the two different Cas12o proteins have different target polynucleotide specificities, or different degrees of specificity. In one embodiment, the two different Cas12o proteins or CRISPR-associated Cas12o proteins have different TAM specificities.

Also contemplated are methods of targeting a polynucleotide, comprising contacting a sample comprising the polynucleotide with the compositions, systems, vectors, or polynucleotides herein, wherein the contacting results in modification of a gene product or in a change in the amount or expression of a gene product. In one embodiment, expression of the targeted gene product is increased by the method. In one embodiment, expression of the targeted gene product is increased by at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100%. In one embodiment, expression of the targeted gene product is increased at least 1.5-fold, at least 2-fold, at least 2.5-fold, at least 3-fold, at least 3.5-fold, at least 4-fold, at least 4.5-fold, at least 5-fold, at least 10-fold, at least 15-fold, at least 20-fold, at least 25-fold, at least 50-fold, or at least 100-fold.

In an alternative embodiment, expression of the targeted gene product is reduced by the method. In one embodiment, expression of the targeted gene product is reduced by at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100%. In one embodiment, expression of the targeted gene product is reduced at least 1.5-fold (that is, to at most 1/1.5 of its original level), at least 2-fold, at least 2.5-fold, at least 3-fold, at least 3.5-fold, at least 4-fold, at least 4.5-fold, at least 5-fold, at least 10-fold, at least 15-fold, at least 20-fold, at least 25-fold, at least 50-fold, or at least 100-fold. In further embodiments, expression of the targeted gene can be completely eliminated, or can be considered eliminated if the residual expression level of the targeted gene falls below the detection limit of methods known in the art for quantifying, detecting, or monitoring expression levels.
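The paragraphs above state the same kind of effect both as percentages and as fold changes. A small helper makes the correspondence explicit: an increase of p% is a (1 + p/100)-fold change, and a reduction of r% leaves a fraction 1 - r/100 of the original level, i.e. a 1/(1 - r/100)-fold reduction, so a 100% reduction corresponds to complete elimination. The Python function names are illustrative only.

```python
import math


def fold_from_percent_increase(pct: float) -> float:
    """An increase of `pct` percent expressed as a fold change (100% -> 2-fold)."""
    return 1.0 + pct / 100.0


def remaining_fraction(pct_reduction: float) -> float:
    """Fraction of the original level left after a `pct_reduction` percent decrease."""
    if not 0.0 <= pct_reduction <= 100.0:
        raise ValueError("reduction must be between 0 and 100 percent")
    return 1.0 - pct_reduction / 100.0


def fold_reduction(pct_reduction: float) -> float:
    """A percent decrease expressed as an n-fold reduction (level falls to 1/n).

    A 100% reduction (no residual expression) maps to infinity.
    """
    frac = remaining_fraction(pct_reduction)
    return math.inf if frac == 0.0 else 1.0 / frac


print(fold_from_percent_increase(50))  # 1.5
print(fold_reduction(50))              # 2.0
```

Floating-point rounding means values such as `fold_reduction(90)` come out as approximately, not exactly, 10, so comparisons should use a tolerance.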

In one embodiment, one or more polynucleotide molecules, vectors, or vector systems that drive expression of one or more elements of a nucleic acid-targeting system, or of a delivery system comprising one or more elements of a nucleic acid-targeting system, are introduced into a host cell, such that expression of those elements directs formation of a nucleic acid-targeting complex at one or more target sites. In one embodiment of the present disclosure, the host cell may be a eukaryotic cell, a prokaryotic cell, or a plant cell.

In one embodiment, a plant or non-human animal comprises, in at least one of its tissue types, at least one of the compositions, polynucleotide molecules, vectors, vector systems, or cells described in any one of the embodiments herein. In certain embodiments, a non-human animal comprises at least one of the compositions, polynucleotide molecules, vectors, vector systems, or cells described in any one of the embodiments herein in at least one tissue type. In one embodiment, the composition is present only transiently because its components degrade over time. In one embodiment, expression of the composition of any one of the embodiments, when carried in a polynucleotide molecule, vector, vector system, or cell, is restricted to certain tissue types or regions of the plant or non-human animal. In one embodiment, such expression depends on physiological cues. In one embodiment, such expression may be triggered by an exogenous molecule. In one embodiment, such expression depends on the expression of a non-Cas molecule in the plant or non-human animal.

In one aspect, the present disclosure provides methods of using one or more elements of a nucleic acid-targeting system. The nucleic acid-targeting complexes of the present disclosure provide an efficient means of modifying a target DNA or RNA (single-stranded or double-stranded, linear or supercoiled). They have a wide variety of uses, including modifying (e.g., deleting, inserting, translocating, inactivating, or activating) a target DNA or RNA in many cell types. As such, the nucleic acid-targeting complexes of the present disclosure have a broad spectrum of applications in, for example, gene therapy, drug screening, and disease diagnosis and prognosis. An exemplary nucleic acid-targeting complex comprises a DNA- or RNA-targeting effector protein complexed with a crRNA or guide RNA that hybridizes to a target sequence within the target locus of interest.

In some embodiments, the present disclosure provides a method of cleaving a target polynucleotide. The method may comprise modifying the target polynucleotide using a nucleic acid-targeting complex that binds to the target polynucleotide and effects its cleavage. In one embodiment, the nucleic acid-targeting complex of the present disclosure, when introduced into a cell, can create a break (e.g., a single-strand or double-strand break) in a polynucleotide sequence. In one embodiment, the method may comprise allowing a composition to bind to a target DNA or RNA to effect cleavage of, and thereby modify, the target DNA or RNA, wherein the nucleic acid-targeting complex comprises a nucleic acid-targeting effector protein complexed with a guide RNA that hybridizes to a target sequence within the target DNA or RNA. In one aspect, the present disclosure provides a method of modifying expression of a DNA or RNA in a eukaryotic cell. In one embodiment, the method comprises allowing a nucleic acid-targeting complex to bind to the DNA or RNA such that the binding increases or decreases expression of the DNA or RNA, wherein the nucleic acid-targeting complex comprises a nucleic acid-targeting effector protein complexed with a crRNA or guide RNA.

Similar considerations and conditions apply as for the methods of modifying a target DNA or RNA described above; indeed, the sampling, culturing, and reintroduction options apply across the various aspects of the present disclosure. In one aspect, the present disclosure provides a method of modifying a target DNA or RNA in a eukaryotic cell, which may be performed in vivo, ex vivo, or in vitro. In one embodiment, the method comprises sampling a cell or cell population from a human or non-human animal and modifying the one or more cells. Culturing may take place ex vivo at any stage. The one or more cells may even be reintroduced into the non-human animal or plant. For reintroduced cells, it is particularly preferred that the cells are stem cells. A composition as described in any embodiment herein may be used to detect a nucleic acid identifier.

In one embodiment, a composition herein induces a double-strand break for the purpose of inducing HDR-mediated correction. In another embodiment, two or more guide RNAs complexed with a Cas12o protein, or an ortholog or homolog thereof, may be used to induce multiple breaks for the purpose of inducing HDR-mediated correction. A recombination template nucleic acid, as the term is used herein, refers to a nucleic acid sequence that can be used in combination with a composition disclosed herein to alter the structure of a target position. In one embodiment, the target nucleic acid is modified to carry some or all of the sequence of the recombination template nucleic acid, typically at or near the cleavage site. In one embodiment, the recombination template nucleic acid is single-stranded. In an alternative embodiment, the recombination template nucleic acid is double-stranded. In one embodiment, the recombination template nucleic acid is DNA, such as double-stranded DNA. In an alternative embodiment, the recombination template nucleic acid is single-stranded DNA. In one embodiment, a recombination template is provided to serve as the template in homologous recombination, for example within or near a target sequence that is nicked or cleaved by a nucleic acid-targeting effector protein forming part of a nucleic acid-targeting complex. In one embodiment, nuclease-induced non-homologous end joining (NHEJ) may be used for targeted, gene-specific knockout.
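To make the role of the recombination template more concrete, the Python sketch below assembles a simple donor sequence for inserting a short sequence at a cut site, flanked by homology arms copied from the reference on either side of the break. The 40-nt default arm length, the insertion-only edit, and the function name are assumptions for illustration; practical donor design (arm length, strand choice, preventing re-cleavage of the repaired site) depends on the system and is not specified in the disclosure.

```python
def design_hdr_template(ref: str, cut: int, insert: str, arm_len: int = 40) -> str:
    """Return left_arm + insert + right_arm for an insertion at a cut site (hypothetical helper).

    `cut` is a 0-based index: the break lies between ref[cut - 1] and ref[cut].
    """
    if arm_len <= 0:
        raise ValueError("arm_len must be positive")
    if cut - arm_len < 0 or cut + arm_len > len(ref):
        raise ValueError("cut site too close to the end of `ref` for the requested arms")
    left_arm = ref[cut - arm_len:cut]    # homology upstream of the break
    right_arm = ref[cut:cut + arm_len]   # homology downstream of the break
    return left_arm + insert + right_arm


print(design_hdr_template("AAAACCCCGGGGTTTT", cut=8, insert="GAATTC", arm_len=4))
# CCCCGAATTCGGGG
```

The arms are taken directly from the reference so that, after repair, the only difference from the original locus is the inserted sequence.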

Exemplary Applications

The present disclosure provides a non-naturally occurring or engineered composition, one or more polynucleotides encoding components of the composition, or a vector or delivery system comprising one or more polynucleotides encoding components of the composition, for use in modifying target cells in vivo, ex vivo, or in vitro. The modification may be performed such that, once modified, the progeny or cell line of a cell modified by the Cas12o protein or CRISPR-associated Cas12o protein retains the altered phenotype. The modified cells and their progeny may be part of a multicellular organism, such as a plant or animal, where the composition is applied ex vivo or in vivo to the desired cell type. The methods herein include methods of therapeutic treatment, which may comprise gene or genome editing, or gene therapy.

In one embodiment, one or more vectors described herein are used to produce a non-human transgenic animal or a transgenic plant. In one embodiment, the transgenic animal is a mammal, such as a mouse, rat, or rabbit. Methods for producing transgenic animals and plants are known in the art and generally begin with a cell transfection method such as those described herein. In one embodiment, the present disclosure provides an engineered, non-naturally occurring composition comprising a catalytically inactive Cas12o protein or CRISPR-associated Cas12o protein as described herein, and this system is used in detection methods such as fluorescence in situ hybridization (FISH).

Patient-Specific Screening Methods

A nucleic acid-targeting system that targets DNA (e.g., trinucleotide repeats) can be used to screen a patient or patient sample for the presence of such repeats. The repeat can serve as the target of the RNA of the nucleic acid-targeting system; if the system binds to it, the binding can be detected, indicating that the repeat is present. The nucleic acid-targeting system can therefore be used to screen a patient or patient sample for the repeat. A suitable compound can then be administered to the patient to address the condition; alternatively, the nucleic acid-targeting system can be administered to bind the repeat and cause an insertion, deletion, or mutation that alleviates the condition.
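The disclosure detects repeats through binding of the targeting complex. Purely as a computational analogue, the Python sketch below measures the longest uninterrupted run of a trinucleotide unit in a sequence read, which is the quantity such a screen ultimately probes. The function name and the default CAG unit are hypothetical, and any threshold used to call a run "expanded" would have to come from the relevant clinical literature rather than from this sketch.

```python
def longest_repeat_run(seq: str, unit: str = "CAG") -> int:
    """Length (in repeat units) of the longest tandem run of `unit` in `seq`."""
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    if k == 0:
        raise ValueError("unit must be non-empty")
    best = 0
    # A tandem run can start in any of the k reading frames, so scan each one.
    for frame in range(k):
        run = 0
        for i in range(frame, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0  # an interruption (e.g. CAA) ends the current run
    return best


read = "GC" + "CAG" * 12 + "CAA" + "CAG" * 3 + "TT"
print(longest_repeat_run(read))  # 12
```

Counting only uninterrupted runs reflects the common practice of treating interrupting codons separately from the pure repeat tract.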

Genome-Wide Knockout Screening

The Cas12o proteins and systems described herein can be used to perform efficient and cost-effective functional genomic screens. Such screens may use genome-wide libraries based on Cas12o protein nucleases. These screens and libraries can be used to determine the function of genes, the cellular pathways in which genes are involved, and how any alteration in gene expression can lead to a particular biological process. An advantage of the present disclosure is that the composition avoids off-target binding and the side effects that result from it. This is achieved by using a system arranged to have a high degree of sequence specificity for the target DNA. In a preferred embodiment of the present disclosure, the Cas12o protein or CRISPR-associated Cas12o protein complex is a Cas12o protein complex.
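A pooled knockout screen of the kind described above is commonly read out by sequencing guide abundance before and after selection. The Python sketch below computes a normalized log2 fold change per guide, one conventional way to flag depleted or enriched guides. The pseudocount, the dictionary input format, and the function name are assumptions; this is a generic analysis step, not a method stated in the disclosure.

```python
import math
from typing import Dict


def guide_log2_fold_changes(
    before: Dict[str, int], after: Dict[str, int], pseudocount: float = 1.0
) -> Dict[str, float]:
    """Per-guide log2(after/before) after normalizing each sample to its total reads.

    Guides missing from `after` are treated as having zero reads; the pseudocount
    avoids taking the log of zero.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    if total_before == 0 or total_after == 0:
        raise ValueError("each sample needs at least one read")
    lfc = {}
    for guide, n_before in before.items():
        frac_before = (n_before + pseudocount) / total_before
        frac_after = (after.get(guide, 0) + pseudocount) / total_after
        lfc[guide] = math.log2(frac_after / frac_before)
    return lfc


counts_t0 = {"geneA_g1": 10, "geneB_g1": 10}
counts_t1 = {"geneA_g1": 30, "geneB_g1": 10}
print(guide_log2_fold_changes(counts_t0, counts_t1, pseudocount=0.0))
# geneA_g1 is about +0.585 (enriched); geneB_g1 is exactly -1.0 (depleted)
```

Normalizing to total reads matters because the two samples are usually sequenced to different depths; without it, geneB_g1 would look unchanged even though its share of the library halved.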

Functional Alteration and Screening

In one embodiment, the present disclosure provides a method of gene function evaluation and screening. The composition can be used to precisely deliver functional domains, to activate or repress genes, or to alter the epigenetic state by precisely altering methylation sites at a specific target locus. It can be applied together with one or more crRNAs or guide RNAs to a single cell or a cell population, or applied together with a library, ex vivo or in vivo, to the genomes in a pool of cells. This includes administering or expressing a library comprising a plurality of crRNAs (comprising guide molecules), and the screening further comprises the use of a Cas12o protein or CRISPR-associated Cas12o protein, wherein the complex comprising the Cas12o protein or CRISPR-associated Cas12o protein is modified to comprise a heterologous functional domain.

Modification of Cells or Organisms

The present disclosure also provides cells comprising one or more components of the systems herein, such as a Cas12o protein or CRISPR-associated Cas12o protein and/or a crRNA. Also provided are cells modified by the systems and methods herein, as well as cell cultures, tissues, organs, and organisms comprising such cells or their progeny. In one embodiment, the present disclosure includes a method for modifying a cell or an organism. The cell may be a prokaryotic cell or a eukaryotic cell. The cell may be a mammalian cell, such as a non-human primate, bovine, porcine, rodent, or mouse cell. The cell may be a non-mammalian eukaryotic cell, such as a poultry, fish, or shrimp cell. The cell may also be a plant cell, such as a cell of a crop (for example cassava, corn, sorghum, wheat, or rice) or a cell of an alga, a tree, or a vegetable. A modification introduced into a cell by the present disclosure may alter the cell and its progeny so as to increase production of a bioproduct (such as an antibody, starch, alcohol, or other desired cellular output). A modification introduced into a cell by the present disclosure may also cause the cell and its progeny to produce an altered bioproduct.

Therapeutic Uses and Methods of Treatment

Also provided herein are methods of diagnosing, prognosing, treating, and/or preventing a disease, condition, or disorder in a subject. In general, such methods may comprise using a composition, system, or component thereof described herein to modify a polynucleotide in the subject or the subject's cells, and/or using a composition, system, or component thereof described herein to detect a diseased or healthy polynucleotide in the subject or the subject's cells. In one embodiment, a method of treatment or prevention may comprise using the composition, system, or component thereof to modify a polynucleotide of an infectious organism (e.g., a bacterium or virus) in the subject or the subject's cells. In one embodiment, a method of treatment or prevention may comprise using the composition, system, or component thereof to modify a polynucleotide of an infectious or commensal organism in the subject. The compositions, systems, and components thereof may be used to develop models of a disease, condition, or disorder. They may be used to detect a disease state or its correction, for example by the methods of treatment or prevention described herein. They may be used to screen for and select cells usable in, for example, the treatment or prevention methods described herein. They may also be used to develop bioactive agents capable of modifying one or more biological functions or activities in the subject or the subject's cells.

In one embodiment, the use comprises administering the aforementioned fusion protein, polynucleotide, CRISPR-Cas composition, complex, system, kit, delivery composition or enzyme preparation to the subject or to ex vivo cells of the subject.

In some embodiments, the condition or disease includes metabolic diseases, cancers, neurological diseases, ophthalmic diseases and infectious diseases.

In some embodiments, the condition or disease comprises a genetic disease.

In some embodiments, the condition or disease is caused by a pathogenic point mutation.

In one embodiment, the disease comprises cystic fibrosis, Duchenne muscular dystrophy (DMD), Becker muscular dystrophy, alpha-1 antitrypsin deficiency, Pompe disease (glycogen storage disease type II), myotonic dystrophy, Huntington's disease, fragile X syndrome, Friedreich's ataxia, amyotrophic lateral sclerosis, hereditary chronic kidney disease, sickle cell disease, β-thalassemia, frontotemporal dementia, Leber congenital amaurosis, hyperlipidemia, hypercholesterolemia (FH), atherosclerosis (ASCVD), transthyretin amyloidosis (ATTR), alpha-1 antitrypsin deficiency (AATD), retinal disease, macular degeneration, Wilms' tumor, Ewing's sarcoma, neuroendocrine tumors, glioblastoma, neuroblastoma, melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, kidney cancer, pancreatic cancer, lung cancer, biliary tract cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid cancer, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphocytic leukemia, acute myeloid leukemia, chronic lymphocytic leukemia, chronic myeloid leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, urinary bladder cancer, primary hyperoxaluria (PH1), hereditary angioedema (HAE) and hepatitis B.

In some embodiments, the condition or disease includes hypercholesterolemia (FH), atherosclerosis (ASCVD), transthyretin amyloidosis (ATTR), alpha-1 antitrypsin deficiency (AATD), primary hyperoxaluria (PH1), hereditary angioedema (HAE) and hepatitis B.

In one embodiment, the condition or disease comprises a disease caused by a single-base mutation.

In one embodiment, the clinical variant database is the NCBI ClinVar database, available on the NCBI ClinVar website. Pathogenic single nucleotide polymorphisms (SNPs) are identified from this list. Using the genomic locus information, CRISPR targets in the region overlapping and surrounding each SNP are identified. A selection of SNPs whose causal mutations can be targeted and corrected by base editing in combination with a Cas protein or a variant thereof is listed in the table below; only one alias is listed for each disease. "RS#" corresponds to the RS accession number in the SNP database on the NCBI website. "AlleleID" corresponds to the accession number of the causal allele. The "Name" column contains the locus identifier of the gene, the gene name, the position of the mutation in the gene, and the change caused by the mutation.
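The target-identification step described above (finding CRISPR targets that overlap each pathogenic SNP) can be sketched as follows. This is a minimal, hypothetical illustration rather than the procedure actually used: the PAM pattern, its 5' position relative to the protospacer, and the 20-nt protospacer length are all assumptions, because this section does not specify them for the Cas protein, and the helper names are invented for the example.

```python
import re
from dataclasses import dataclass

# ASSUMED parameters (hypothetical): the PAM and spacer length of the
# Cas protein are not given in this section, so these are placeholders.
PAM_REGEX = "TT[ACG]"  # assumed 5' PAM, IUPAC "TTV" written as a regex
PAM_LEN = 3
SPACER_LEN = 20        # assumed protospacer length

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class Target:
    strand: str       # "+" or "-"
    start: int        # 0-based start of the protospacer in + strand coordinates
    protospacer: str  # protospacer written 5'->3' on its own strand


def targets_overlapping(seq: str, snp_pos: int) -> list:
    """Return protospacers (5' PAM assumed) whose span covers snp_pos (0-based)."""
    hits = []
    n = len(seq)
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # Zero-width lookahead so overlapping PAM occurrences are all found.
        for m in re.finditer(f"(?=({PAM_REGEX}))", s):
            p_start = m.start() + PAM_LEN  # protospacer begins right after the PAM
            p_end = p_start + SPACER_LEN
            if p_end > n:
                continue
            # Map the protospacer interval back to + strand coordinates.
            lo, hi = (p_start, p_end) if strand == "+" else (n - p_end, n - p_start)
            if lo <= snp_pos < hi:
                hits.append(Target(strand, lo, s[p_start:p_end]))
    return hits
```

In practice, one would extract a window of reference sequence around each ClinVar variant and pass the variant's offset within that window; whether a hit is usable for base editing would further depend on where the SNP falls inside the editor's activity window, which this sketch does not model.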

In one embodiment, the compositions, systems and/or components thereof described herein can be used to treat and/or prevent diseases of the circulatory system. In one embodiment, the compositions and systems described herein can be used to treat diseases of the nervous system. In one embodiment, the compositions and systems described herein can be used to treat hearing disorders, such as hearing disorders or hearing loss in one or both ears. Deafness is usually caused by the loss of or damage to hair cells, which prevents signals from being transmitted to auditory neurons. In such cases, cochlear implants can be used to respond to sound and transmit electrical signals to the nerve cells. However, because damaged hair cells release fewer growth factors, these neurons often degenerate and retract from the cochlea. In one embodiment, the compositions, systems and/or components thereof described herein can be used to treat diseases in non-dividing cells; in one embodiment, the gene or transcript to be corrected is located in a non-dividing cell. Exemplary non-dividing cells are muscle cells and neurons. Non-dividing (especially non-dividing, fully differentiated) cell types present challenges for gene targeting or genome engineering, for example because homologous recombination (HR) is generally suppressed in the G1 phase of the cell cycle. In one embodiment, the disease to be treated is a disease affecting the eye.
Thus, in one embodiment, the compositions, systems, or components thereof described herein are delivered to one or both eyes. The compositions, systems, and methods described herein can be used to correct eye defects caused by several genetic mutations, which are further described in Genetic Diseases of the Eye, 2nd edition, edited by Elias I. Traboulsi, Oxford University Press, 2012.

In one embodiment, the condition to be treated or targeted is an ocular disorder. In one embodiment, the ocular disorder may include glaucoma. In one embodiment, the ocular disorder includes a retinal degenerative disease. In one embodiment, the retinal degenerative disease is selected from Stargardt disease, Bardet-Biedl syndrome, Best disease, blue cone achromatopsia, choroideremia, cone-rod dystrophy, congenital stationary night blindness, enhanced S-cone syndrome, juvenile X-linked retinoschisis, Leber congenital amaurosis, Malattia Leventinese, Norrie disease or X-linked familial exudative vitreoretinopathy, pattern dystrophy, Sorsby dystrophy, Usher syndrome, retinitis pigmentosa, color blindness, macular dystrophy or degeneration, and age-related macular degeneration. In one embodiment, the retinal degenerative disease is Leber congenital amaurosis (LCA) or retinitis pigmentosa. In one embodiment, a lentiviral vector is used for administration to the eye. In one embodiment, the lentiviral vector is an equine infectious anemia virus (EIAV) vector.
Other viral vectors, such as AAV vectors, may also be used for delivery to the eye, for example those described in Campochiaro et al., Human Gene Therapy 17:167-176 (February 2006); Millington-Ward et al., Molecular Therapy, Vol. 19, No. 4, 642-649 (April 2011); and Dalkara et al., Sci Transl Med 5, 189ra76 (2013), which may be adapted for use with the compositions and systems described herein. In one embodiment, the dose may be in the range of about 10^6 to 10^9.5 particle units.
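For orientation, the upper bound of the dose range above is a fractional power of ten. The arithmetic below simply restates it in scientific notation and gives the width of the range in orders of magnitude; it adds no new dosing information.

```latex
10^{9.5} = 10^{9}\cdot 10^{0.5} = \sqrt{10}\times 10^{9} \approx 3.16\times 10^{9},
\qquad
\log_{10}\!\left(\frac{10^{9.5}}{10^{6}}\right) = 3.5
```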

In one embodiment, the compositions and systems can be used to treat and/or prevent muscle diseases and associated circulatory or cardiovascular diseases or disorders. The present disclosure also contemplates delivering the compositions and systems described herein, such as a Cas12o protein or a CRISPR-associated Cas12o protein system, to the heart. For the heart, a myocardium-tropic adeno-associated virus (AAVM) is preferred, in particular AAVM41, which showed preferential gene transfer in the heart (see, e.g., Lin-Yanga et al., PNAS, March 10, 2009, Vol. 106, No. 10). Administration can be systemic or local. For systemic administration, a dose of about 1-10 × 10^14 vector genomes is contemplated. See also, e.g., Eulalio et al. (2012) Nature 492:376 and Somasuntharam et al. (2013) Biomaterials 34:7790, the teachings of which may be adapted and/or applied to the compositions and systems described herein.

For example, U.S. Patent Publication No. 20110023139, the teachings of which may be adapted and/or applied to the compositions and systems described herein, describes the use of zinc finger nucleases to genetically modify cells, animals and proteins associated with cardiovascular disease. Cardiovascular diseases generally include hypertension, heart attack, heart failure, stroke and transient ischemic attack (TIA). Any chromosomal sequence involved in cardiovascular disease, or any protein encoded by such a sequence, can be used in the methods of the present disclosure. Cardiovascular-related proteins are typically selected based on an experimentally established association with the development of cardiovascular disease. For example, the production rate or circulating concentration of a cardiovascular-related protein may be elevated or reduced in a population with a cardiovascular disorder relative to a population without one. Differences in protein levels can be assessed using proteomic techniques including, but not limited to, western blotting, immunohistochemical staining, enzyme-linked immunosorbent assay (ELISA) and mass spectrometry. Alternatively, cardiovascular-related proteins can be identified by obtaining expression profiles of the genes encoding them using genomic techniques including, but not limited to, DNA microarray analysis, serial analysis of gene expression (SAGE) and quantitative real-time polymerase chain reaction (Q-PCR).
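In its simplest form, comparing protein or transcript levels between a population with a cardiovascular disorder and one without it reduces to a fold-change calculation. The sketch below is a hypothetical illustration only: the helper names, the two-fold (|log2| ≥ 1) threshold and the example values are assumptions rather than data from this disclosure, and a real analysis would also require replicates and a statistical test. The last function shows the widely used 2^-ΔΔCt relative-quantification method for Q-PCR, which this disclosure does not itself specify.

```python
import math
from statistics import mean


def log2_fold_change(disease: list, control: list) -> float:
    """log2(mean level in the disease group / mean level in the control group)."""
    return math.log2(mean(disease) / mean(control))


def classify(disease: list, control: list, threshold: float = 1.0) -> str:
    """Label a candidate protein as 'elevated', 'reduced' or 'unchanged'.

    threshold is an ASSUMED cut-off on |log2 fold change| (1.0 == two-fold).
    """
    lfc = log2_fold_change(disease, control)
    if lfc >= threshold:
        return "elevated"
    if lfc <= -threshold:
        return "reduced"
    return "unchanged"


def delta_delta_ct_fold(ct_target_case: float, ct_ref_case: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Relative expression by the 2^-ddCt method (reference gene normalizes input)."""
    ddct = (ct_target_case - ct_ref_case) - (ct_target_ctrl - ct_ref_ctrl)
    return 2.0 ** (-ddct)
```

For example, a target gene with Ct 20 in cases and 22 in controls, against a reference gene at Ct 15 in both, gives ΔΔCt = -2 and therefore a four-fold higher expression in the cases.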

The compositions and systems herein can be used to treat diseases of the muscular system. The present disclosure also contemplates delivering the compositions, systems and effector protein systems described herein to muscle. In one embodiment, the muscle disease to be treated is a muscular dystrophy, such as DMD. In one embodiment, the compositions and systems described herein (such as systems capable of RNA modification) can be used to induce exon skipping and thereby correct a diseased gene. In one embodiment, the method includes treating a sickle cell-related disease, such as sickle cell trait, sickle cell disease (e.g., sickle cell anemia) or β-thalassemia. For example, the methods and systems can be used to modify the genome of sickle cells, e.g., by correcting one or more mutations in the β-globin gene. In the case of β-thalassemia or sickle cell anemia, the disease can be corrected by modifying hematopoietic stem cells (HSCs) with the system. The system allows specific editing of a cell's genome by cutting the cell's DNA and then allowing the cell to repair it. The Cas12o protein is introduced and guided by the guide RNA to the site of the mutation, where it cuts the DNA. At the same time, a healthy version of the sequence is introduced.
This sequence is used by the cell's own repair system to repair the induced break. In this way, the Cas12o protein or CRISPR-associated Cas12o protein allows correction of mutations in previously obtained stem cells. These methods and systems can be used to correct HSCs associated with sickle cell anemia using a system that targets and corrects the mutation (e.g., with a suitable HDR template that delivers a coding sequence for β-globin, advantageously non-sickling β-globin); specifically, the guide RNA can target the mutation that causes sickle cell anemia, and the HDR template can provide the coding sequence for correct expression of β-globin. Particles containing the Cas12o protein and an ωRNA or guide RNA targeting the mutation are contacted with HSCs carrying the mutation. The particles may also contain a suitable HDR template to correct the mutation so that β-globin is correctly expressed; alternatively, the HSCs may be contacted with a second particle or vector that contains or delivers the HDR template. The cells so contacted may be administered, and optionally processed/expanded; see Cartier. The HDR template may enable the HSCs to express an engineered β-globin gene (e.g., βA-T87Q) or β-globin.
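The HDR-template concept above can be illustrated with a toy donor-design function in which the corrected base is flanked by left and right homology arms copied from the reference sequence. This is a simplified sketch under stated assumptions: the function name and arm length are hypothetical, the test sequence is an invented toy (not β-globin), and practical donor design (arm length, PAM-blocking silent mutations, single- versus double-stranded donors) involves considerations not modeled here.

```python
def build_hdr_donor(reference: str, mut_index: int, corrected_base: str,
                    arm_length: int) -> str:
    """Return left homology arm + corrected base + right homology arm.

    reference:      + strand sequence, 5'->3' (toy input, hypothetical)
    mut_index:      0-based position of the pathogenic base in `reference`
    corrected_base: the healthy base to write into the donor
    arm_length:     ASSUMED homology-arm length on each side
    """
    if corrected_base not in {"A", "C", "G", "T"}:
        raise ValueError("corrected_base must be a single DNA base")
    if mut_index - arm_length < 0 or mut_index + 1 + arm_length > len(reference):
        raise ValueError("reference too short for the requested arm length")
    left_arm = reference[mut_index - arm_length:mut_index]
    right_arm = reference[mut_index + 1:mut_index + 1 + arm_length]
    return left_arm + corrected_base + right_arm
```

For instance, `build_hdr_donor("GGGGGTCCCCC", 5, "A", 5)` replaces the central T with A and returns `"GGGGGACCCCC"`; the arms are what allow the cell's repair machinery to use the donor as a template.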

In one embodiment, the compositions, systems or components thereof described herein can be used to treat diseases of the kidney or liver. Thus, in one embodiment, the compositions, systems or components thereof described herein are delivered to the liver or kidney. Delivery strategies for inducing cellular uptake of therapeutic nucleic acids include physical force or carrier systems, such as viral, lipid-based or complex-based delivery, or nanocarriers. Starting from initial applications of limited clinical relevance, in which nucleic acids were delivered to kidney cells by systemic hydrodynamic high-pressure injection, a wide range of gene-therapeutic viral and non-viral carriers have been applied in vivo to target post-transcriptional events in different animal models of kidney disease (Csaba Révész and Péter Hamar (2011). Delivery Methods to Target RNAs in the Kidney, Gene Therapy Applications, Prof. Chunsheng Kang (ed.), ISBN: 978-953-307-541-9, InTech, available from: www.intechopen.com/books/gene-therapy-applications/delivery-methods-to-target-rnas-inthe-kidney).
Methods of delivery to the kidney may include those described in Yuan et al. (Am J Physiol Renal Physiol 295:F605-F617, 2008). The method of Yuan et al. can be applied to the compositions of the present disclosure, contemplating subcutaneous injection of 1-2 g of cholesterol-conjugated Cas12o protein into a human for delivery to the kidney. In one embodiment, the method of Molitoris et al. (J Am Soc Nephrol 20:1754-1764, 2009) can be adapted to the compositions, and a cumulative dose of 12-20 mg/kg in humans can be used for delivery to the proximal tubule cells of the kidney. In one embodiment, the method of Thompson et al. (Nucleic Acid Therapeutics, Vol. 22, No. 4, 2012) can be adapted to the compositions, and doses of up to 25 mg/kg can be delivered by intravenous (i.v.) administration. In one embodiment, the method of Shimizu et al. (J Am Soc Nephrol 21:622-633, 2010) can be adapted to the compositions, using a dose of about 10-20 μmol of the composition complexed with a nanocarrier in about 1-2 liters of physiological saline for intraperitoneal (i.p.) administration.
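To make the per-kilogram and volume-based doses above concrete, the sketch below converts them into totals and concentrations. The 70 kg body weight is an assumption chosen only for illustration, and the helper functions are hypothetical; they are not part of any of the cited methods.

```python
def total_dose_mg(dose_mg_per_kg: float, body_weight_kg: float) -> float:
    """Total dose (mg) = per-kg dose (mg/kg) x body weight (kg)."""
    return dose_mg_per_kg * body_weight_kg


def concentration_uM(amount_umol: float, volume_l: float) -> float:
    """Concentration in umol/L (i.e., uM) of a given amount in a given volume."""
    return amount_umol / volume_l


BODY_WEIGHT_KG = 70.0  # ASSUMED adult body weight, for illustration only

# Molitoris-type cumulative dose: 12-20 mg/kg
molitoris_range = (total_dose_mg(12, BODY_WEIGHT_KG), total_dose_mg(20, BODY_WEIGHT_KG))
# Thompson-type i.v. ceiling: up to 25 mg/kg
thompson_max = total_dose_mg(25, BODY_WEIGHT_KG)
# Shimizu-type i.p. dose: 10-20 umol in 1-2 L saline -> lowest and highest concentration
shimizu_range = (concentration_uM(10, 2), concentration_uM(20, 1))

print(molitoris_range, thompson_max, shimizu_range)
# (840.0, 1400.0) 1750.0 (5.0, 20.0)
```

Under these assumptions the cumulative renal dose corresponds to roughly 0.84-1.4 g per patient, and the intraperitoneal formulation spans about 5-20 μM depending on which ends of the amount and volume ranges are combined.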

In one embodiment, the disease treated or prevented by the compositions and systems described herein can be a pulmonary or epithelial disease. The compositions and systems described herein can be used to treat epithelial and/or pulmonary diseases. The present disclosure also contemplates delivering the compositions and systems described herein to one or both lungs.

In one embodiment, a viral vector can be used to deliver the composition, system or components thereof to the lung. In one embodiment, the AAV used for delivery to the lung is AAV-1, AAV-2, AAV-5, AAV-6 and/or AAV-9 (see, e.g., Li et al., Molecular Therapy, Vol. 17, No. 12, 2067-2077, December 2009). In one embodiment, the MOI can range from 1×10^3 to 4×10^5 vector genomes/cell. In one embodiment, the delivery vector can be an RSV vector as in Zamora et al. (Am J Respir Crit Care Med, Vol. 183, pp. 531-538, 2011). The method of Zamora et al. can be applied to the nucleic acid-targeting system of the present disclosure, and an aerosolized composition, e.g., at a dose of 0.6 mg/kg, is contemplated for use in the present disclosure.
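Because the MOI is expressed per cell, the total amount of vector scales linearly with the number of cells to be transduced, and the aerosol dose scales with body weight. The sketch below assumes 1×10^6 target cells and a 70 kg subject purely for illustration; neither value comes from this disclosure, and the helper name is hypothetical.

```python
def vector_genomes_needed(moi_vg_per_cell: float, n_cells: int) -> float:
    """Total vector genomes = MOI (vector genomes per cell) x number of cells."""
    return moi_vg_per_cell * n_cells


N_CELLS = 1_000_000       # ASSUMED number of target cells
BODY_WEIGHT_KG = 70.0     # ASSUMED body weight

low = vector_genomes_needed(1e3, N_CELLS)    # lower MOI bound from the text
high = vector_genomes_needed(4e5, N_CELLS)   # upper MOI bound from the text
aerosol_total_mg = 0.6 * BODY_WEIGHT_KG      # 0.6 mg/kg aerosolized dose

print(f"{low:.1e} to {high:.1e} vector genomes; {aerosol_total_mg:.1f} mg aerosol")
```

Under these assumptions, the stated MOI range corresponds to about 10^9 to 4×10^11 vector genomes per million cells, a 400-fold span.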

The compositions and systems described herein can be used to treat skin disorders. The present disclosure also contemplates delivering the compositions and systems described herein to the skin.

In one embodiment, the composition, system or components thereof may be delivered to the skin via one or more microneedles or a microneedle-containing device (intradermal delivery). For example, in one embodiment, the device and methods of Hickerson et al. (Molecular Therapy - Nucleic Acids (2013) 2, e129) may be used and/or adapted to deliver the compositions and systems described herein to the skin, e.g., at a dose of up to 300 μl of a 0.1 mg/ml composition. In one embodiment, the methods and techniques of Leachman et al. (Molecular Therapy, Vol. 18, No. 2, 442-446, February 2010) may be used and/or adapted to deliver the compositions described herein to the skin. In one embodiment, the methods and techniques of Zheng et al. (PNAS, July 24, 2012, Vol. 109, No. 30, 11975-11980) may be used and/or adapted to deliver nanoparticles of the compositions described herein to the skin. In one embodiment, a dose of about 25 nM applied in a single application may achieve gene knockdown in the skin.
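The microneedle dose above (up to 300 μl of a 0.1 mg/ml composition) corresponds to a total delivered mass of volume × concentration. The minimal sketch below makes the unit conversions explicit; the helper name is hypothetical and is not taken from Hickerson et al.

```python
def delivered_mass_ug(volume_ul: float, conc_mg_per_ml: float) -> float:
    """Mass delivered, in micrograms.

    1 uL = 1e-3 mL and 1 mg = 1e3 ug, so the two factors cancel and
    mass_ug is numerically equal to volume_uL * conc_mg_per_mL.
    """
    volume_ml = volume_ul / 1000.0
    mass_mg = volume_ml * conc_mg_per_ml
    return mass_mg * 1000.0


# Upper bound of the stated dose: about 30 ug (subject to float rounding)
print(delivered_mass_ug(300, 0.1))
```

The stated upper bound therefore amounts to roughly 30 μg of composition per application.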

The compositions and systems described herein can be used to treat cancer. The present disclosure also contemplates delivering the compositions and systems described herein to cancer cells. In addition, as described elsewhere herein, the compositions and systems can be used to modify immune cells, such as CAR or CAR-T cells, which can then be used to treat and/or prevent cancer. This is also described in International Patent Publication No. WO 2015/161276, the disclosure of which is hereby incorporated by reference and described below.

The compositions, systems and components thereof described herein can be used to modify cells for adoptive cell therapy. One aspect of the present disclosure relates to methods and compositions for editing a target nucleic acid sequence or modulating its expression, and their application in combination with cancer immunotherapy, as realized by adapting the compositions and systems of the present disclosure. In some instances, the compositions, systems and methods can be used to modify stem cells (e.g., induced pluripotent stem cells) to derive modified natural killer cells, γδ T cells and αβ T cells, which can be used for adoptive cell therapy. In certain instances, the compositions, systems and methods can be used to further modify already-modified natural killer cells, γδ T cells and αβ T cells.

As used herein, "ACT", "adoptive cell therapy" and "adoptive cell transfer" are used interchangeably. In one embodiment, adoptive cell therapy (ACT) may refer to the transfer of cells to a patient with the goal of transferring functionality and characteristics into the new host through engraftment of the cells (see, e.g., Mettananda et al., Editing an α-globin enhancer in primary human hematopoietic stem cells as a treatment for β-thalassemia, Nat Commun. 2017 Sep 4;8(1):424). As used herein, the term "engraft" or "engraftment" refers to the process of incorporating cells into a target tissue in vivo through contact with existing cells of the tissue. Adoptive cell therapy (ACT) may also refer to the transfer of cells, most commonly immune-derived cells, back into the same patient or into a new recipient host with the goal of transferring immune functionality and characteristics into the new host. Where possible, the use of autologous cells helps the recipient by minimizing graft-versus-host disease (GVHD) issues.
Adoptive transfer of autologous tumor-infiltrating lymphocytes (TIL) (Zacharakis et al., (2018) Nat Med. 2018 Jun;24(6):724-730; Besser et al., (2010) Clin Cancer Res 16(9):2646-55; Dudley et al., (2002) Science 298(5594):850-4; and Dudley et al., (2005) Journal of Clinical Oncology 23(10):2346-57) or of genetically redirected peripheral blood mononuclear cells (Johnson et al., (2009) Blood 114(3):535-46; and Morgan et al., (2006) Science 314(5796):126-9) has been used to successfully treat patients with advanced solid tumors, including melanoma, metastatic breast cancer and colorectal cancer, as well as patients with CD19-expressing hematological malignancies (Kalos et al., (2011) Science Translational Medicine 3(95):95ra73). In one embodiment, allogeneic immune cells are transferred (see, e.g., Ren et al., (2017) Clin Cancer Res 23(9):2255-2266). As further described herein, allogeneic cells can be edited to reduce alloreactivity and prevent graft-versus-host disease. Thus, the use of allogeneic cells allows cells to be obtained from a healthy donor and prepared for use in a patient, rather than preparing autologous cells from the patient after diagnosis.

In one embodiment, the antigen (such as a tumor antigen) targeted in adoptive cell therapy (such as, in particular, CAR or TCR T cell therapy) of a disease (such as, in particular, a tumor or cancer) can be selected from the group consisting of: MR1 (see, e.g., Crowther et al., 2020, Genome-wide CRISPR-Cas9 screening reveals ubiquitous T cell cancer targeting via the monomorphic MHC class I-related protein MR1, Nature Immunology, Vol. 21, pp. 178-185); B cell maturation antigen (BCMA) (see, e.g., Friedman et al., Effective Targeting of Multiple BCMA-Expressing Hematological Malignancies by Anti-BCMA CAR T Cells, Hum Gene Ther. March 8, 2018; Berdeja JG et al., Durable clinical responses in heavily pretreated patients with relapsed/refractory multiple myeloma: updated results from a multicenter study of bb2121 anti-Bcma CAR T cell therapy. Blood.
2017;130:740; and Mouhieddine and Ghobrial, Immunotherapy in Multiple Myeloma: The Era of CAR T Cell Therapy, Hematologist, May-June 2018, Vol. 15, Issue 3); PSA (prostate-specific antigen); prostate-specific membrane antigen (PSMA); PSCA (prostate stem cell antigen); tyrosine-protein kinase transmembrane receptor ROR1; fibroblast activation protein (FAP); tumor-associated glycoprotein 72 (TAG72); carcinoembryonic antigen (CEA); epithelial cell adhesion molecule (EPCAM); mesothelin; human epidermal growth factor receptor 2 (ERBB2 (Her2/neu)); prostase; prostatic acid phosphatase (PAP); elongation factor 2 mutant (ELF2M); insulin-like growth factor 1 receptor (IGF-1R); gp100; BCR-ABL (breakpoint cluster region-Abelson); tyrosinase; New York esophageal squamous cell carcinoma 1 (NY-ESO-1); κ light chain; LAGE (L antigen); MAGE (melanoma antigen); melanoma-associated antigen 1 (MAGE-A1); MAGE A3; MAGE A6; legumain; human papillomavirus (HPV) E6; HPV E7; prostein; survivin; PCTA1 (galectin 8); Melan-A/MART-1; Ras mutant; TRP-1 (tyrosinase-related protein 1 or gp75); tyrosinase-related protein 2 (TRP2); TRP-2/INT2 (TRP-2/intron 2); RAGE (renal antigen); receptor for advanced glycation end products 1 (RAGE1); renal ubiquitous 1 and 2 (RU1, RU2); intestinal carboxylesterase (iCE); heat shock protein 70-2 (HSP70-2) mutant; thyroid stimulating hormone receptor (TSHR); CD123; CD171; CD19; CD20; CD22; CD26; CD30; CD33; CD44v7/8 (cluster of differentiation 44, exons 7/8); CD53; CD92; CD100; CD148; CD150; CD200; CD261; CD262; CD362; CS-1 (CD2 subset 1, CRACC, SLAMF7, CD319 and 19A24); C-type lectin-like molecule 1 (CLL-1); ganglioside GD3 (aNeu5Ac(2-8)aNeu5Ac(2-3)bDGalp(1-4)bDGlcp(1-1)Cer); Tn antigen (Tn Ag); Fms-like tyrosine kinase 3 (FLT3); CD38; CD138; CD44v6; B7H3 (CD276); KIT (CD117); interleukin 13 receptor subunit alpha-2 (IL-13Ra2); interleukin 11 receptor alpha (IL-11Ra); prostate stem cell antigen (PSCA); serine protease 21 (PRSS21);
vascular endothelial growth factor receptor 2 (VEGFR2); Lewis (Y) antigen (Lewis (Y) antigen); CD24; platelet-derived growth factor receptor PDGFR-β; stage-specific embryonic antigen 4 (SSEA-4); cell surface-associated mucin 1 (MUC1); mucin 16 (MUC16); epidermal growth factor receptor (EGFR); epidermal growth factor receptor variant III (EGFRvIII); neural cell adhesion molecule (NCAM); carbonic anhydrase IX (CAIX); proteasome (prosome, macropain) β subunit type 9 (LMP2); ephrin type A receptor 2 (EphA2); Ephrin B2; fucosyl GM1; sialyl Lewis adhesion molecule (sLe); ganglioside GM3 (aNeu5Ac(2-3)bDGalp(1-4)b DGlcp(1-1)Cer); TGS5; high molecular weight melanoma-associated antigen (HMWMAA); o-acetyl-GD2 ganglioside (OAcGD2); folate receptor alpha; folate receptor beta; tumor endothelial marker 1 (TEM1/CD248); tumor endothelial marker 7-related (TEM7R); tight junction protein (claudin) 6 (CLDN6); G protein-coupled receptor class C group 5 member D (GPRC5D); chromosome X open reading frame 61 (CXORF61); CD97; CD179a; anaplastic lymphoma kinase (ALK); polysialic acid; placenta-specific 1 (PLAC1); hexasaccharide moiety of globoH ceramide (GloboH ), breast differentiation antigen (NY-BR-1), uroplakin 2 (UPK2), hepatitis A virus cellular receptor 1 (HAVCR1), adrenergic receptor beta 3 (ADRB3), pan-nexin 3 (PANX3), G protein-coupled receptor 20 (GPR20), lymphocyte antigen 6 complex locus K9 (LY6K), olfactory receptor 51E2 (OR51E2), TCR gamma alternate reading frame protein (TARP), Wilms tumor protein (WT1), ETS translocation variant gene 6, located on chromosome 12p (ETV6-AML), sperm protein 17 (SPA17), X antigen family member 1A (XAGE1), angiopoietin binding cell surface Tie 2; CT (cancer/testis (antigen)); melanoma cancer testis antigen 1 (MAD-CT-1); melanoma cancer testis antigen 2 (MAD-CT-2); Fos-related antigen 1; p53; p53 mutant; human telomerase reverse transcriptase (hTERT); sarcoma translocation breakpoint; melanoma inhibitor of apoptosis (ML-IAP); ERG (transmembrane 
protease serine 2 (TMPRSS2) ETS fusion gene); N-acetylglucosaminyltransferase V (NA17); paired box protein Pax-3 (PAX3); androgen receptor; cyclin B1; cyclin D1; v-myc avian myelocytic tumor viral oncogene neuroblastoma homolog (MY CN); Ras homolog family member C (RhoC); cytochrome P450 1B1 (CYP1B1); CCCTC binding factor (zinc finger protein)-like (BORIS); squamous cell carcinoma antigen recognized by T cells 1 or 3 (SART1, SART3); paired box protein Pax-5 (PAX5); pre-acrosin binding protein sp32 (OY-TES1); lymphocyte-specific protein tyrosine kinase (LCK); A kinase anchoring protein 4 (AKAP-4); synovial sarcoma X breakpoint 1, 2, 3, or 4 (SSX1, SSX2, SSX3, SSX4); CD79a; CD79b; CD72; leukocyte-associated immunoglobulin-like receptor 1 (LAIR1); Ig Fc fragment of A receptor (FCAR); leukocyte immunoglobulin-like receptor subfamily A member 2 (LILRA2); CD300 molecule-like family member f (CD300LF); C-type lectin domain family 12 member A (CLEC12A); bone marrow stromal cell antigen 2 (BST2); EGF-like module-containing mucin-like hormone receptor-like 2 (EMR2); lymphocyte antigen 75 (LY75); glypican 3 (GPC3); Fc receptor-like 5 (FCRL5); mouse double minute 2 homolog (MDM2); livin; alpha-fetoprotein (AFP); transmembrane activator and CAML interactor (TACI); B cell activating factor receptor (BAFF-R) ;V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS); Immunoglobulin lambda-like polypeptide 1 (IGLL1); 707-AP (707 alanine proline); ART-4 (adenocarcinoma antigen recognized by T4 cells); BAGE (B antigen; b-catenin/m, b-catenin/mutant); CAMEL (melanoma antigen recognized by CTL); CAP1 (carcinoembryonic antigen peptide 1); CASP-8 (caspase 8); CDC27m (mutant cell division cycle 27); CDK4/m (mutant cell cycle protein-dependent kinase 4); Cyp-B (cyclophilin B); DAM (differentiation antigen melanoma); EGP-2 (epithelial glycoprotein 2); E GP-40 (epithelial glycoprotein 40); Erbb2, 3, 4 (erythroblastic leukemia viral oncogene homolog 2, 3, 4); FBP (folate binding protein); 
fAchR (fetal acetylcholine receptor); G250 (glycoprotein 250); GAGE (G antigen); GnT-V (N-acetylglucosaminyltransferase V); HAGE (helicase antigen); ULA-A (human leukocyte antigen A); HST2 (human signet ring tumor 2); KIAA0205; KDR (kinase insert domain receptor); LDLR/FUT (low-density lipid receptor/GDP L-fucose: b-D-galactosidase 2-a-L fucosyltransferase); L1CAM (L1 cell adhesion molecule); MC1R ( melanocortin 1 receptor); Myosin/m (mutant myosin); MUM-1, 2, 3 (melanoma ubiquitously mutated protein 1, 2, 3); NA88-A (NA cDNA clone of patient M88); KG2D (natural killer group 2 member D) ligand; carcinoembryonic antigen (h5T4); p190 small bcr-abl (190KD bcr-abl protein); Pml/RARa (promyelocytic leukemia/retinoic acid receptor alpha); PRAME (melanoma preferentially expressed antigen); SAGE (sarcoma antigen); TEL/AML1 (translocation Ets family leukemia/acute myeloid leukemia 1); TPI/m (mutant triosephosphate isomerase); CD70; and any combination thereof.

In some embodiments, the compositions, systems, or components thereof of the present disclosure can be used to treat and/or prevent genetic diseases or diseases with genetic and/or epigenetic aspects. The genes and diseases exemplified herein are not exhaustive. In one embodiment, a method for treating and/or preventing a genetic disease may comprise administering a composition, system, and/or one or more components thereof to a subject, wherein the composition, system, and/or one or more components thereof are capable of modifying, in one or more cells of the subject, one or more copies of one or more genes associated with a genetic disease or with a disease having genetic and/or epigenetic aspects. In one embodiment, modifying one or more copies of one or more such genes in a subject can eliminate the subject's genetic disease or its symptoms. In one embodiment, modifying one or more copies of one or more such genes in a subject can reduce the severity of the subject's genetic disease or its symptoms. In one embodiment, a composition, system, or component thereof can modify one or more genes or polynucleotides associated with one or more diseases, including genetic diseases and/or diseases with genetic and/or epigenetic aspects.

In one embodiment, the composition, system, or components thereof can be used to diagnose, predict, treat, and/or prevent infectious diseases caused by microorganisms such as bacteria, viruses, fungi, parasites, or combinations thereof. In one embodiment, the system or its components are capable of targeting specific microorganisms within a mixed population. Exemplary methods for such approaches are described, for example, in Gomaa AA, Klumpe HE, Luo ML, Selle K, Barrangou R, Beisel CL. 2014. Programmable removal of bacterial strains by use of genome-targeting CRISPR-Cas systems. mBio 5: e00928-13; and Citorik RJ, Mimee M, Lu TK. 2014. Sequence-specific antimicrobials using efficiently delivered RNA-guided nucleases. Nat Biotechnol 32: 1141-1145, the teachings of which can be adapted for use with the compositions, systems, and components described herein. In one embodiment, the composition, system, and/or its components can target pathogenic and/or drug-resistant microorganisms, such as bacteria, viruses, parasites, and fungi. In one embodiment, the composition, system, and/or its components can target and modify one or more polynucleotides in pathogenic microorganisms so that the microorganisms are rendered less virulent, killed, inhibited, or otherwise unable to cause disease and/or infection and/or to replicate in host cells.

In one aspect, some of the most challenging mitochondrial disorders are caused by mutations in mitochondrial DNA (mtDNA), a maternally inherited, high-copy-number genome. In one embodiment, the compositions and systems described herein can be used to modify mtDNA mutations. In one embodiment, the mitochondrial disease that can be diagnosed, predicted, treated, and/or prevented can be MELAS (mitochondrial myopathy, encephalopathy, lactic acidosis, and stroke-like episodes), CPEO/PEO (chronic progressive external ophthalmoplegia syndrome/progressive external ophthalmoplegia), KSS (Kearns-Sayre syndrome), MIDD (maternally inherited diabetes and deafness), MERRF (myoclonic epilepsy with ragged red fibers), NIDDM (non-insulin-dependent diabetes mellitus), LHON (Leber hereditary optic neuropathy), LS (Leigh syndrome), aminoglycoside-induced hearing impairment, NARP (neuropathy, ataxia, and retinitis pigmentosa), extrapyramidal disorder with akinesia-rigidity, psychosis and sensorineural hearing loss (SNHL), non-syndromic hearing loss, cardiomyopathy, encephalomyopathy, Pearson's syndrome, or a combination thereof.

In one embodiment, the subject's mtDNA can be modified in vivo or ex vivo. In one embodiment, when mtDNA is modified ex vivo, cells containing the modified mitochondria can be administered back to the subject after modification. In one embodiment, the composition, system, or component thereof is capable of correcting one or more mtDNA mutations.

In some embodiments, the compositions, systems, or components thereof disclosed herein can be used for microbiome modification. The microbiome plays an important role in health and disease. For example, the gut microbiome can contribute to health by aiding digestion and preventing the growth of pathogenic microorganisms, and is believed to affect mood and emotion. An unbalanced microbiome can promote disease and is believed to contribute to weight gain, poor blood sugar control, high cholesterol, cancer, and other conditions. A healthy microbiome has a set of combined features that distinguish it from the microbiome of unhealthy individuals, so detecting and identifying a disease-associated microbiome can be used to diagnose disease in an individual. The compositions, systems, and components thereof can be used to screen microbiome cell populations and to identify a disease-associated microbiome. Cell screening methods using the compositions, systems, and components thereof are described elsewhere herein and can be applied to screen a subject's microbiome, such as the gut, skin, vaginal, and/or oral microbiome.

In one embodiment, the composition, system, and/or components thereof described herein can be used to modify the microbial population of a subject's microbiome. In one embodiment, the composition, system, and/or components thereof can be used to identify and select one or more cell types in the microbiome and remove them from the microbiome population. Exemplary methods for selecting cells using the composition, system, and/or components thereof are described elsewhere herein. In this way, the composition or microbial characteristics of the microbiome can be altered. In one embodiment, the alteration shifts the microbiome from a diseased composition to a healthy composition. In this way, the ratio of one microbial type or species to another can be modified, for example from a diseased ratio to a healthy ratio. In one embodiment, the selected cells are pathogenic microorganisms.

In one embodiment, the compositions and systems described herein can be used to modify polynucleotides in a microorganism of a subject's microbiome. In one embodiment, the microorganism is a pathogenic microorganism. In one embodiment, the microorganism is a symbiotic, non-pathogenic microorganism. Methods for modifying polynucleotides in a subject's cells are described elsewhere herein and can be applied to these embodiments.

In one embodiment, the present disclosure provides a method of modeling a disease associated with a genomic locus in a eukaryotic or non-human organism, the method comprising manipulating a target sequence within a coding, non-coding, or regulatory element of the genomic locus, including delivering a non-naturally occurring or engineered composition comprising a viral vector system, the viral vector system comprising one or more viral vectors operably encoding the composition for expression thereof, wherein the composition comprises a particle delivery system, or a delivery system, viral particle, or cell as described in any of the above embodiments.

The present disclosure also contemplates the use of the compositions described herein (e.g., the Cas12o system) as RNA-guided DNA nucleases for producing modified tissues for transplantation. For example, RNA-guided DNA nucleases can be used to knock out, knock down, or disrupt selected genes in animals such as transgenic pigs (e.g., human heme oxygenase-1 transgenic pig lines), for example by disrupting the expression of genes encoding epitopes recognized by the human immune system, i.e., xenoantigen genes. Candidate pig genes for disruption may include, for example, the α(1,3)-galactosyltransferase and cytidine monophosphate-N-acetylneuraminic acid hydroxylase genes (see PCT Patent Publication WO 2014/066505). In addition, genes encoding endogenous retroviruses, such as the genes encoding all porcine endogenous retroviruses, may be disrupted (see Yang et al., 2015, Genome-wide inactivation of porcine endogenous retroviruses (PERVs), Science, 27 Nov 2015, Vol. 350, Issue 6264, pp. 1101-1104). RNA-guided DNA nucleases may also be used to target integration sites for other genes in xenotransplant donor animals, such as the human CD55 gene, to improve protection against hyperacute rejection.

Applications in plants and fungi

The compositions, systems, and methods described herein can be used for gene or genome interrogation, editing, or manipulation in plants and fungi. For example, applications include the investigation, selection, interrogation, comparison, manipulation, and/or transformation of plant genes or genomes, for example to create, identify, develop, optimize, or confer plant traits or characteristics, or to transform plant or fungal genomes. This can increase plant yield, or produce new plants with new combinations of traits or characteristics or with enhanced traits. The compositions, systems, and methods can be applied to plants in site-directed integration (SDI), gene editing (GE), or any near-reverse breeding (NRB) or reverse breeding (RB) technique.

The compositions, systems, and methods herein can be used to confer desired traits (e.g., enhanced nutritional quality, enhanced disease resistance, resistance to biotic and abiotic stresses, and increased production of commercially valuable plant products or heterologous compounds) on essentially any plant or fungus and their cells and tissues. The compositions, systems, and methods can be used to modify endogenous genes or their expression without permanently introducing any foreign gene into the genome.

Examples of plants

The compositions, systems, and methods herein can be used to confer desired traits on essentially any plant. A variety of plants and plant cell systems can be engineered to obtain desired physiological and agronomic characteristics. In general, the term "plant" refers to any of the various photosynthetic, eukaryotic, unicellular or multicellular organisms of the plant kingdom, characterized by growth by cell division, containing chloroplasts, and having cell walls composed of cellulose. The term plant encompasses monocots and dicots. In one embodiment, target plants and plant cells for engineering include monocots and dicots, such as crops including: cereal crops (e.g., wheat, corn, rice, millet, barley), fruit crops (e.g., tomato, apple, pear, strawberry, orange), forage crops (e.g., alfalfa), root vegetable crops (e.g., carrot, potato, beet, yam), and leafy vegetable crops (e.g., lettuce, spinach); flowering plants (e.g., petunia, rose, chrysanthemum); conifers and pine trees (e.g., pine, fir, spruce); plants used in phytoremediation (e.g., heavy metal accumulating plants); oil crops (e.g., sunflower, rapeseed); and plants used for experimental purposes (e.g., Arabidopsis thaliana). Specifically, plants are intended to include, but are not limited to, angiosperms and gymnosperms such as acacia, alfalfa, amaranth, apple, apricot, artichoke, ash, asparagus, avocado, banana, barley, beans, beet, birch, beech, blackberry, blueberry, broccoli, Brussels sprouts, cabbage, canola, cantaloupe, carrot, cassava, cauliflower, cedar, cereals, celery, chestnut, cherry, Chinese cabbage, citrus, clementine, clover, coffee, corn, cotton, cowpea, cucumber, cypress, eggplant, elm, endive, eucalyptus, fennel, fig, fir, geranium, grape, grapefruit, groundnut, ground cherry, gum hemlock, hickory, kale, kiwifruit, kohlrabi, larch, lettuce, leek, lemon, lime, locust, pine, maidenhair, maize, mango, maple, melon, millet, mushroom, mustard, nuts, oak, oats, oil palm, okra, onion, orange, ornamental plants, flowers or trees, papaya, palm, parsley, parsnip, pea, peach, peanut, pear, peat, pepper, persimmon, pigeon pea, pine, pineapple, plantain, plum, pomegranate, potato, pumpkin, radicchio, radish, rapeseed, raspberry, rice, rye, sorghum, safflower, sallow, soybean, spinach, spruce, squash, strawberry, sugar beet, sugarcane, sunflower, sweet potato, sweet corn, tangerine, tea, tobacco, tomato, trees, triticale, turf grass, turnip, vine, walnut, watercress, watermelon, wheat, yam, yew, and zucchini.

The term plant also encompasses algae, which are mainly photoautotrophs unified primarily by their lack of roots, leaves, and other organs characteristic of higher plants. The compositions, systems, and methods can be used for a wide range of "algae" or "algal cells". Examples of algae include the eukaryotic phyla Rhodophyta (red algae), Chlorophyta (green algae), Phaeophyta (brown algae), Bacillariophyta (diatoms), Eustigmatophyta, and dinoflagellates, as well as the prokaryotic phylum Cyanobacteria (blue-green algae). Examples of algal species include those of the genera Amphora, Anabaena, Ankistrodesmus, Botryococcus, Chaetoceros, Chlamydomonas, Chlorella, Chlorococcum, Cyclotella, Cylindrotheca, Dunaliella, Emiliania, Euglena, Haematococcus, Isochrysis, Monochrysis, Monoraphidium, Nannochloris, Nannochloropsis, Navicula, Nephrochloris, Nephroselmis, Nitzschia, Nodularia, Nostoc, Ochromonas, Oocystis, Oscillatoria, Pavlova, Phaeodactylum, Platymonas, Pleurochrysis, Porphyra, Pseudanabaena, Pyramimonas, Stichococcus, Synechococcus, Synechocystis, Tetraselmis, Thalassiosira, and Trichodesmium.

In one embodiment, polynucleotides encoding components of the composition and system may be introduced so that they stably integrate into the genome of a plant cell. In some cases, a vector or expression system may be used for such integration. The design of the vector or expression system may be adjusted according to when, where, and under what conditions the guide RNA and/or the Cas12o protein (or CRISPR-associated Cas12o protein) gene is to be expressed. In some cases, the polynucleotides may be integrated into plant organelles, such as plastids, mitochondria, or chloroplasts. The elements of the expression system may be located on one or more expression constructs, which may be circular, such as plasmids or transformation vectors, or non-circular, such as linear double-stranded DNA.

In one embodiment, the integration method generally comprises the following steps: selecting a suitable host cell or host tissue, introducing the construct into the host cell or host tissue, and regenerating plant cells or plants therefrom. In some examples, the expression system for stable integration into the plant cell genome may contain one or more of the following elements: a promoter element that can be used to express RNA and/or the Cas12o protein in plant cells; a 5' untranslated region for enhancing expression; an intron element for further enhancing expression in certain cells (such as monocot cells); a multiple cloning site providing convenient restriction sites for inserting guide RNA and/or Cas12o protein gene sequences and other desired elements; and a 3' untranslated region for efficient termination of the expressed transcripts.
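
The optional elements listed above can be treated as an ordered cassette. The Python sketch below is illustrative only: the element names (`promoter`, `5_utr`, `intron`, `mcs`, `3_utr`), the helper `is_valid_cassette`, and the assumption that the intron sits between the 5' UTR and the cloning site are hypothetical choices for the example, not requirements stated in this disclosure. It accepts any subset of the elements, provided each appears at most once and they follow the assumed 5'-to-3' order.

```python
from typing import Sequence

# Assumed 5'->3' order of the optional cassette elements (hypothetical, for illustration).
CANONICAL_ORDER = ("promoter", "5_utr", "intron", "mcs", "3_utr")


def is_valid_cassette(elements: Sequence[str]) -> bool:
    """Return True if every element is known, none is repeated, and the
    elements appear in the assumed 5'->3' order. Any subset is allowed."""
    if len(set(elements)) != len(elements):
        return False  # an element is repeated
    if any(e not in CANONICAL_ORDER for e in elements):
        return False  # unknown element name
    positions = [CANONICAL_ORDER.index(e) for e in elements]
    return positions == sorted(positions)  # must follow the assumed order


if __name__ == "__main__":
    print(is_valid_cassette(["promoter", "5_utr", "intron", "mcs", "3_utr"]))
    print(is_valid_cassette(["3_utr", "promoter"]))
```

Checking order rather than requiring every element mirrors the text's wording that the expression system "may contain one or more" of these elements.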

Transient expression in plants

In one embodiment, the components of the composition and system can be transiently expressed in plant cells. In some examples, the composition and system can modify the target nucleic acid only when the guide RNA and the Cas12o protein (or CRISPR-associated Cas12o protein) are both present in the cell, which allows further control over genome modification. Because expression of the Cas12o protein or CRISPR-associated Cas12o protein is transient, plants regenerated from such plant cells generally contain no foreign DNA. In some examples, the Cas12o protein or CRISPR-associated Cas12o protein is stably expressed and the guide sequence is transiently expressed.

DNA and/or RNA (e.g., mRNA) can be introduced into plant cells for transient expression. In such cases, the introduced nucleic acid can be provided in an amount sufficient to modify the cell, but it does not persist beyond an intended period of time or after one or more cell divisions.
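
Why transiently introduced DNA is lost over successive divisions can be illustrated with a toy dilution model. The sketch below assumes, purely for illustration, that the introduced DNA neither replicates nor degrades and is split equally between daughter cells, so the mean copy number per cell halves at each division; in reality partitioning is stochastic and mRNA is typically degraded much faster. The function `divisions_until_diluted` and its threshold are hypothetical.

```python
def divisions_until_diluted(initial_copies: float, threshold: float = 1.0) -> int:
    """Number of divisions before the mean copy number per cell drops below
    `threshold`, assuming no replication, no degradation, and equal
    partitioning (so the copy number halves at each division)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    divisions = 0
    copies = float(initial_copies)
    while copies >= threshold:
        copies /= 2  # equal split between the two daughter cells
        divisions += 1
    return divisions


if __name__ == "__main__":
    print(divisions_until_diluted(1000))
```

Under these assumptions, a cell starting with 1,000 copies falls below one copy per cell on average after 10 divisions.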

Transient expression can be achieved using a suitable vector. Exemplary vectors for transient expression include pEAQ vectors (which can be tailored for Agrobacterium-mediated transient expression) and Cabbage leaf curl virus (CaLCuV), as well as the vectors described in Sainsbury F. et al., Plant Biotechnol J. 2009 Sep; 7(7): 682-93; and Yin K et al., Scientific Reports Vol. 5, Article No. 14926 (2015).

Exemplary applications in plants

The compositions, systems, and methods can be used to generate genetic variation in plants of interest (e.g., crops). One or more ωRNAs targeting one or more positions in the genome, such as a library of ωRNAs, can be provided and introduced into plant cells together with the Cas12o protein nuclease. For example, a collection of genome-scale point mutations and gene knockouts can be generated. In some instances, the compositions, systems, and methods can be used to produce plant parts or plants from the cells so obtained, and the cells can be screened for traits of interest. Targets can include both coding and non-coding regions. In some cases, the trait is stress tolerance, and the method is a method for producing stress-tolerant crop varieties.
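
A library of ωRNAs targeting multiple genomic positions can be approximated, at the design stage, by tiling candidate spacer sequences across target regions. The Python sketch below is a simplified assumption, not the method of this disclosure: `tile_spacers`, the spacer length, and the step size are hypothetical; it scans only the given strand, skips windows containing non-ACGT characters, and deliberately ignores any target-adjacent motif or other sequence requirement of the Cas12o protein, which would have to be applied as a separate filter.

```python
from typing import Dict, List, Tuple


def tile_spacers(
    targets: Dict[str, str], spacer_len: int = 20, step: int = 10
) -> List[Tuple[str, int, str]]:
    """Return (target_name, start, spacer) tuples tiling each target sequence.

    Hypothetical parameters; no motif, strand, or off-target filtering is applied.
    """
    if spacer_len <= 0 or step <= 0:
        raise ValueError("spacer_len and step must be positive")
    valid = set("ACGT")
    library: List[Tuple[str, int, str]] = []
    for name, seq in targets.items():
        seq = seq.upper()
        for start in range(0, len(seq) - spacer_len + 1, step):
            spacer = seq[start:start + spacer_len]
            if set(spacer) <= valid:  # skip windows with N or other symbols
                library.append((name, start, spacer))
    return library


if __name__ == "__main__":
    for entry in tile_spacers({"geneA": "ACGT" * 10}):
        print(entry)
```

Keeping the target name and start position with each spacer makes it straightforward to trace a phenotype from a pooled screen back to the targeted region.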

In one embodiment, the compositions, systems, and methods are used to modify endogenous genes or their expression. Expression of the components can induce targeted modification of the genome, either through the direct activity of the Cas12o protein or CRISPR-associated Cas12o protein, optionally together with the introduction of a recombination template DNA, or through modification of the targeted gene. The different strategies described above allow targeted genome editing mediated by the Cas12o protein or CRISPR-associated Cas12o protein without requiring the components to be introduced into the plant genome.

In some cases, the modification can be made without permanently introducing any foreign gene (including genes encoding the components of the compositions herein) into the plant genome, thereby avoiding the presence of foreign DNA in the plant genome. This may be of interest because regulatory requirements for non-transgenic plants are less stringent. Components transiently introduced into plant cells are typically removed upon crossing.

For example, the modification can be made by transient expression of the components of the compositions and systems. Transient expression can be achieved by delivering the components of the compositions and systems with viral vectors, or by delivering them into protoplasts with the aid of particles such as nanoparticles or cell-penetrating peptides (CPPs).

Kits

In one aspect, the present disclosure provides a kit containing any one or more of the elements disclosed in the above methods and compositions. In one aspect, the present disclosure provides a kit comprising one or more components described herein. In one embodiment, the kit includes a composition herein and instructions for using the kit. In one embodiment, the kit includes a vector system and instructions for using the kit. In one embodiment, the kit includes a delivery system and instructions for using the kit. Elements may be provided individually or in combination, and in any suitable container, such as a vial, a bottle, or a tube. The kit may include a crRNA as described herein and, optionally, an unbound protective strand. The kit may include a crRNA in which a protective strand is at least partially bound to the reprogrammable spacer portion (i.e., the spacer sequence) of the crRNA. In one embodiment, the kit includes instructions in one or more languages, for example in more than one language. The instructions may be specific to the applications and methods described herein.

In one embodiment, the kit includes one or more reagents for use in a process that uses one or more of the elements described herein. Reagents may be provided in any suitable container. For example, the kit may provide one or more reaction or storage buffers. Reagents may be provided in a form usable in a particular assay, or in a form that requires one or more other components to be added before use (for example, as a concentrate or in lyophilized form). The buffer may be any buffer, including but not limited to sodium carbonate buffer, sodium bicarbonate buffer, borate buffer, Tris buffer, MOPS buffer, HEPES buffer and combinations thereof. In one embodiment, the buffer is alkaline. In one embodiment, the buffer has a pH of about 7 to about 10. In one embodiment, the kit includes a homologous recombination template polynucleotide. In one embodiment, the kit includes one or more of the vectors and/or one or more of the polynucleotides described herein. The kit can advantageously provide all elements of the disclosed system.

The main advantages of the present disclosure include:

The present disclosure identifies, for the first time, a novel Cas protein that comprises an OBD domain, a REC domain, a RuvC domain, a Helical domain and a Nuc domain, and that has the structure shown in Formula I or Formula II. The Cas protein of the present disclosure has high gene-editing activity, can efficiently edit or cleave a target gene, and can be used to treat conditions or diseases in subjects in need thereof.

Examples

The present disclosure is further described below with reference to specific examples. These examples only illustrate the present disclosure and do not limit its scope. Experimental methods in the following examples without specified conditions were generally performed under conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or under the conditions recommended by the manufacturer. Unless otherwise stated, percentages and parts are by weight.

Unless otherwise specified, the reagents and materials used in the examples of the present disclosure are commercially available.

Example 1 Identification of Cas12o proteins

Analysis of collected uncultured metagenomic sequences identified, in the samples, six Cas12o proteins or fragments associated with newly discovered CRISPR systems. They were named Cas12o1 to Cas12o6, and their amino acid sequences are shown in SEQ ID NO.1, 3, 5 and 7-9, respectively. The nucleotide coding sequences of Cas12o1, Cas12o2 and Cas12o3 are shown in SEQ ID NO.16, 40 and 46, respectively. The CRISPR loci in the samples containing Cas12o1, Cas12o2 and Cas12o3 were annotated with PILERCR. This yielded the corresponding direct repeat (DR) sequences, shown in SEQ ID NO.2, 4 and 6, respectively, and listed in the following table:

The RNA secondary structures of these three DR sequences were further analyzed with RNAfold. As shown in Figure 6, all three DR sequences have a highly conserved secondary structure:

Each of the three DR sequences contains a 5'-R1a-Ba-R2a-L-R2b-Bb-R1b-3' structure:

- Segments R1a and R1b are reverse complementary and form a first stem (R1) of 3 or 5 nucleotide pairs in Cas12o.
- Segments Ba and Bb are never both present; whichever one is present forms a bulge (B) of 2 or 3 nucleotides.
- Segments R2a and R2b are reverse complementary and form a second stem (R2) of 6 or 7 base pairs.
- L is a loop of 5 or 7 nucleotides that closes the second stem.

Specifically, the DR sequence of Cas12o1, shown in Figure 6A, contains 5'-R1a(ACA)-Ba(absent)-R2a(GGUAUCC)-L(UAAAC)-R2b(GGAUGCU)-Bb(GA)-R1b(UGU)-3'.

The DR sequence of Cas12o2, shown in Figure 6B, contains 5'-R1a(UUACA)-Ba(absent)-R2a(ACUAUUC)-L(UUGAAAC)-R2b(GAAUGGU)-Bb(GAU)-R1b(UGUAA)-3'.

The DR sequence of Cas12o3, shown in Figure 6C, contains 5'-R1a(UCAGU)-Ba(GUG)-R2a(GGUCUG)-L(AAACA)-R2b(CAGACC)-Bb(absent)-R1b(AUUGA)-3'.
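The segment assignments above can be checked mechanically. The Python sketch below is a hypothetical helper and is not part of the disclosure. It joins the listed segments into a structured DR core, which may be shorter than the full DR sequences in SEQ ID NO.2, 4 and 6. It then checks that each stem can pair antiparallel. It assumes G·U wobble pairs count as paired, because several of the stems listed above depend on them.

```python
# Watson-Crick pairs plus G·U wobble pairs (assumption: wobble pairs count as paired).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

# Segments (R1a, Ba, R2a, L, R2b, Bb, R1b) copied from the text; "" = absent segment.
SEGMENTS = {
    "Cas12o1": ("ACA", "", "GGUAUCC", "UAAAC", "GGAUGCU", "GA", "UGU"),
    "Cas12o2": ("UUACA", "", "ACUAUUC", "UUGAAAC", "GAAUGGU", "GAU", "UGUAA"),
    "Cas12o3": ("UCAGU", "GUG", "GGUCUG", "AAACA", "CAGACC", "", "AUUGA"),
}


def stem_pairs(five_prime: str, three_prime: str) -> bool:
    """Return True if the 5' stem half pairs antiparallel with the 3' stem half."""
    if len(five_prime) != len(three_prime):
        return False
    return all((a, b) in PAIRS for a, b in zip(five_prime, reversed(three_prime)))


def check_dr(r1a, ba, r2a, loop, r2b, bb, r1b):
    """Return (assembled core, R1 pairs?, R2 pairs?, exactly one bulge segment?)."""
    core = r1a + ba + r2a + loop + r2b + bb + r1b
    one_bulge = (ba == "") != (bb == "")  # Ba and Bb are never both present
    return core, stem_pairs(r1a, r1b), stem_pairs(r2a, r2b), one_bulge


if __name__ == "__main__":
    for name, segs in SEGMENTS.items():
        core, r1_ok, r2_ok, bulge_ok = check_dr(*segs)
        print(name, core, len(core), r1_ok, r2_ok, bulge_ok)
```

For all three DRs the script reports that both stems pair and that exactly one bulge segment is present. The assembled cores are 27, 34 and 30 nt long.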

Phylogenetic analysis of the RuvC domain showed that Cas12o is a new class 2, type V Cas subtype. On the RuvC-domain phylogenetic tree it lies close to the Cas12h, Cas12i and Cas12b subtypes (Figure 1). The three-dimensional conformation of Cas12o was predicted with the protein structure prediction software AlphaFold (Figure 2B). The prediction revealed the conserved domains REC-I, REC-II, OBD, RuvC-I, Helical, RuvC-II, Nuc-I, RuvC-III and Nuc-II, and a conformation that differs from those of the other nucleases of the Cas12 family. Annotation of the Cas12o1 and Cas12o2 proteins revealed the representative conserved motif distribution shown in Figure 2A. Annotation of the Cas12o3 protein revealed the conserved motif distribution shown in Figure 2C. The OBD domain is a bi-split domain, divided into the non-contiguous OBD-I and OBD-II domains.
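The annotated domain order can be compared with Formula III of the claims, with Z5 expanded according to Formula II. In the minimal sketch below, all names are illustrative. It treats an unsplit "OBD" as occupying the OBD-II position, which is an assumption consistent with the claim that the OBD domain optionally comprises an OBD-II domain. It also allows an optional N-terminal OBD-I (Z1).

```python
# Formula III (Z1-Z2-Z3-Z4-X-Z5) with Z5 = Y1-Y2-Y3-Y4 (Formula II); Z1 is optional.
FORMULA_III_CORE = ["REC", "OBD-II", "RuvC-I", "Helical",
                    "RuvC-II", "Nuc-I", "RuvC-III", "Nuc-II"]

# Domain order reported for Cas12o in Example 1, N- to C-terminus.
CAS12O_DOMAINS = ["REC-I", "REC-II", "OBD", "RuvC-I", "Helical",
                  "RuvC-II", "Nuc-I", "RuvC-III", "Nuc-II"]


def normalize(domains):
    """Merge REC-I/REC-II into one REC entry; map an unsplit OBD to OBD-II (assumption)."""
    out = []
    for d in domains:
        if d in ("REC-I", "REC-II"):
            d = "REC"
        elif d == "OBD":
            d = "OBD-II"
        if out and out[-1] == "REC" and d == "REC":
            continue  # REC-I followed by REC-II counts as a single REC domain
        out.append(d)
    return out


def matches_formula_iii(domains) -> bool:
    """True if the order fits Z1-Z2-Z3-Z4-X-Z5, where Z1 (OBD-I) may be absent."""
    norm = normalize(domains)
    if norm and norm[0] == "OBD-I":
        norm = norm[1:]  # optional N-terminal OBD-I (Z1)
    return norm == FORMULA_III_CORE


if __name__ == "__main__":
    print(matches_formula_iii(CAS12O_DOMAINS))
```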

Experimental testing showed that Cas12o1 strongly prefers target sequences flanked at their 5' end by a 5'-TN-3' PAM, where N is A, T, G or C (Figure 7).
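The sketch below illustrates what a 5'-TN PAM preference means for choosing target sites. It scans one DNA strand for 5'-PAM-protospacer-3' sites, with 'N' in the PAM pattern matching any base. Two assumptions apply: the 20-nt spacer length is a hypothetical default, since no spacer length is stated here, and only the given strand is scanned.

```python
def pam_matches(window: str, pam: str) -> bool:
    """Compare a DNA window with a PAM pattern in which 'N' matches any base."""
    return len(window) == len(pam) and all(p == "N" or p == b for p, b in zip(pam, window))


def find_targets(seq: str, pam: str = "TN", spacer_len: int = 20):
    """Return (start, PAM, protospacer) for every 5'-PAM-protospacer-3' site on this strand.

    spacer_len=20 is a hypothetical default, not a value taken from the disclosure.
    """
    seq = seq.upper()
    k = len(pam)
    hits = []
    for i in range(len(seq) - k - spacer_len + 1):
        window = seq[i:i + k]
        if pam_matches(window, pam):
            hits.append((i, window, seq[i + k:i + k + spacer_len]))
    return hits


if __name__ == "__main__":
    demo = "ATTTGCA"  # toy sequence
    print(len(find_targets(demo, "TN", 2)), len(find_targets(demo, "TTN", 2)))  # 3 2
```

On the toy sequence, the two-base TN pattern yields three candidate sites. The TTN pattern used for LbCpf1 in Example 2 yields two. This shows how a shorter PAM widens the range of sites that can be targeted.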

Example 2 Verification of the cleavage activity of Cas12o proteins

To test the cleavage activity of Cas12o1, vectors expressing Cas12o1 and its crRNA were constructed as follows:

The human TTR gene was selected as the target gene for cleavage. Based on the hTTR1 target sequence (SEQ ID NO.12), Cas12o1-hTTR1-crRNA (SEQ ID NO.14) was designed to target a site adjacent to a TN PAM.

LbCpf1 served as the control (amino acid sequence SEQ ID NO.18; nucleotide coding sequence SEQ ID NO.19). To compare the cleavage activity of Cas12o1 and LbCpf1 at the same target sequence, LbCpf1-hTTR1-crRNA (SEQ ID NO.21) was designed to target the site using a TTN PAM.

A T7 promoter was added to the 5' end, and an rrnB T2 terminator to the 3' end, of both the Cas12o1-hTTR1-crRNA and LbCpf1-hTTR1-crRNA sequences. This yielded the Cas12o1-hTTR1-crRNA and LbCpf1-hTTR1-crRNA expression sequences, annotated as follows:

- single-underlined: the Cas12o1/LbCpf1 DR sequence;
- double-underlined: the spacer sequence;
- italic: the T7 promoter;
- wavy-underlined: the rrnB T2 terminator (a linker sequence lies between the spacer and the terminator);
- dotted-underlined: the MfeI restriction site;
- bold: the MluI restriction site;
- CACCG: a linker.

To protect the integrity of each fragment, protective bases AGC were added at the 5' end of each expression sequence and ATA at the 3' end.

The nucleotide coding sequences of Cas12o1 and LbCpf1 were synthesized by Suzhou Hongxun Biotechnology Co., Ltd. and Beijing Qingke Biotechnology Co., Ltd. Each was cloned into positions 466-5160 of the ABE8e plasmid (Addgene, Plasmid #138489), giving the Cas12o1 expression vector (Figure 3A) and the LbCpf1 expression vector (Figure 3B).

The Cas12o1-hTTR1-crRNA and LbCpf1-hTTR1-crRNA expression sequence fragments were synthesized by Suzhou Hongxun Biotechnology Co., Ltd. and each double-digested with MfeI/MluI. They were then inserted into the Cas12o1 and LbCpf1 expression vector backbones, respectively, which had been double-digested with MfeI/MluI. This gave the Cas12o1-hTTR1-crRNA and LbCpf1-hTTR1-crRNA expression vectors.
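The fragments and backbones are joined by MfeI/MluI double digestion, so each fragment should carry each site only at its ends. The sketch below is a hypothetical pre-cloning check that uses the standard recognition sequences of MfeI (C^AATTG) and MluI (A^CGCGT). Both sites are palindromic, so scanning one strand is enough. For illustration only, it assumes the MfeI site lies upstream of the MluI site, and the toy insert is not a sequence from the disclosure.

```python
# Standard recognition sequences; both are palindromic, so scanning one strand suffices.
SITES = {"MfeI": "CAATTG", "MluI": "ACGCGT"}


def site_positions(seq: str) -> dict:
    """Return the 0-based start of every MfeI and MluI recognition site in seq."""
    seq = seq.upper()
    return {
        name: [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]
        for name, site in SITES.items()
    }


def ok_for_directional_cloning(insert: str) -> bool:
    """True if the insert has exactly one MfeI site upstream of exactly one MluI site.

    The MfeI-before-MluI orientation is an assumption made for this sketch.
    """
    pos = site_positions(insert)
    return (len(pos["MfeI"]) == 1 and len(pos["MluI"]) == 1
            and pos["MfeI"][0] < pos["MluI"][0])


if __name__ == "__main__":
    # Toy insert: AGC protective bases + MfeI + filler + MluI + ATA protective bases.
    toy = "AGC" + "CAATTG" + "GGGAAACCC" + "ACGCGT" + "ATA"
    print(site_positions(toy), ok_for_directional_cloning(toy))
```

An internal MfeI or MluI site in the DR, spacer or terminator would make the check fail. Such a site would also cause the fragment to be cut internally during the double digestion.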

An araC-pBAD-ccdB fragment (SEQ ID NO.17) carrying the hTTR1 target sequence (SEQ ID NO.12) was designed and synthesized by Suzhou Hongxun Biotechnology Co., Ltd. It was inserted at positions 1284-1300 of the pKESK2 plasmid (Addgene, Plasmid #64857) to obtain the Target plasmid (SEQ ID NO.11; map shown in Figure 4).

The Target plasmid carries the ccdB gene, which encodes the toxic CcdB protein. CcdB inhibits DNA gyrase: it locks gyrase in its complex with cleaved double-stranded DNA, so gyrase can no longer function and the cell eventually dies. Expression of ccdB is controlled by the L-arabinose-inducible pBAD promoter.

- If the hTTR1 target sequence on the Target plasmid is cleaved, the link between the pBAD promoter and ccdB expression is broken, no CcdB toxin is produced, and the host cell survives.
- If the target sequence is not cleaved, the pBAD promoter drives CcdB expression and the host cell dies.

The proportion of surviving bacteria therefore indicates cleavage activity.
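The survival readout can be summarized as a simple ratio. The sketch below makes an illustrative assumption about the plating scheme. Colonies are counted on plates containing L-arabinose, where ccdB is induced and only cells with a cleaved Target plasmid survive. They are also counted on non-inducing plates. Counts are converted to CFU/mL before dividing. The plating scheme, dilutions and counts are hypothetical and are not data from Figure 5.

```python
def cfu_per_ml(colonies: int, dilution: float, plated_ml: float) -> float:
    """Colony-forming units per mL of the undiluted culture."""
    if dilution <= 0 or plated_ml <= 0:
        raise ValueError("dilution and plated volume must be positive")
    return colonies / (dilution * plated_ml)


def survival_ratio(cfu_induced: float, cfu_uninduced: float) -> float:
    """Fraction of transformants surviving ccdB induction (higher = more target cleavage)."""
    if cfu_uninduced <= 0:
        raise ValueError("need colonies on the non-inducing plate")
    return cfu_induced / cfu_uninduced


if __name__ == "__main__":
    induced = cfu_per_ml(45, 1e-2, 0.1)    # hypothetical count on an arabinose plate
    uninduced = cfu_per_ml(90, 1e-2, 0.1)  # hypothetical count on a non-inducing plate
    print(round(survival_ratio(induced, uninduced), 3))  # 0.5
```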

The Target plasmid was transformed into DH5α competent cells. The Cas12o1-hTTR1-crRNA and LbCpf1-hTTR1-crRNA expression vectors were then each transformed into DH5α competent cells carrying the Target plasmid.

Both Cas12o1 and LbCpf1 showed clear cleavage activity, and Cas12o1 was much more active than LbCpf1 (Figure 5).

Example 3 Cleavage activity of Cas12o1, Cas12o2 and Cas12o3 in mammalian cells

To verify the cleavage activity of Cas12o1, Cas12o2 and Cas12o3 in mammalian cells, their nucleotide coding sequences were cloned into pcDNA3.1(+) (Invitrogen, V79020) to construct the Cas protein expression vectors. The coding sequences used were SEQ ID NO.16 (Cas12o1), SEQ ID NO.40 (Cas12o2) and SEQ ID NO.46 (Cas12o3). hTTR was selected as the target. The corresponding crRNAs containing hTTR spacer sequences were cloned into the pGL3-U6-sgRNA-EGFP plasmid (Plasmid #107721) to obtain the crRNA expression vectors.

The target sequences and crRNAs are shown in the following table:

The Cas protein expression vectors, crRNA expression vectors and Target plasmid were each transformed into DH5α competent cells for large-scale preparation. Their concentrations were measured, and the plasmids were stored for later use.

About 16 hours before transfection, HEK293T cells were seeded in 24-well plates at 2×10⁵ cells per well in 500 μL.

- Reagent A: the Cas protein expression vector, crRNA expression vector and EGFP-C1 plasmid were mixed and diluted in 25 μL of reduced-serum transfection medium (Yuanpei Bio, L530KJ). Then 2 μL of Lipofectamine 3000 reagent (Invitrogen, L3000015) was added, and the mixture was mixed by pipetting and left to stand for 5 minutes.
- Reagent B: at the same time, 2 μL of Lipofectamine 3000 transfection reagent (Invitrogen, L3000015) was diluted in 25 μL of the same reduced-serum medium (Yuanpei Bio, L530KJ), mixed and left to stand for 5 minutes.

Reagents A and B were combined, mixed evenly by pipetting and left to stand for 20 minutes. The mixture was then added dropwise to the cells to be transfected in the 24-well plate. Each well received 0.3 μg of Cas protein expression vector, 0.3 μg of crRNA expression vector and 0.3 μg of EGFP-C1 plasmid. The plate was returned to a 37 °C, 5% CO₂ incubator. Six hours after transfection, the medium was replaced with DMEM containing 10% FBS. At 48 hours after transfection, EGFP expression indicated successful transfection, and EGFP-positive cells were sorted for measurement of editing efficiency. Genomic DNA was extracted from the sorted cells (genomic DNA extraction kit, TIANGEN, DP304-03) and amplified by PCR. The PCR products were analyzed by high-throughput deep sequencing (Qingke Biotechnology Co., Ltd.) or by Sanger sequencing (Boshang Biotechnology (Shanghai) Co., Ltd.) to determine editing efficiency.
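The per-well amounts above scale linearly when several wells are transfected at once. The sketch below multiplies the reported per-well quantities by the number of wells plus an overage. The 10% overage is a common pipetting allowance assumed here, not a value from the protocol, and the dictionary keys are illustrative labels.

```python
# Per-well quantities reported in the protocol (24-well format).
PER_WELL = {
    "Cas expression vector (ug)": 0.3,
    "crRNA expression vector (ug)": 0.3,
    "EGFP-C1 plasmid (ug)": 0.3,
    "reduced-serum medium, tube A (uL)": 25,
    "Lipofectamine 3000 reagent, tube A (uL)": 2,
    "reduced-serum medium, tube B (uL)": 25,
    "Lipofectamine 3000 reagent, tube B (uL)": 2,
}


def master_mix(n_wells: int, overage: float = 0.10) -> dict:
    """Scale each per-well amount by n_wells * (1 + overage); the 10% overage is an assumption."""
    if n_wells < 1:
        raise ValueError("n_wells must be at least 1")
    factor = n_wells * (1 + overage)
    return {item: round(amount * factor, 3) for item, amount in PER_WELL.items()}


def cells_per_ml(cells_per_well: float = 2e5, volume_ml: float = 0.5) -> float:
    """Seeding density implied by 2x10^5 cells in 500 uL per well."""
    return cells_per_well / volume_ml


if __name__ == "__main__":
    for item, amount in master_mix(4).items():
        print(f"{item}: {amount}")
    print(cells_per_ml())  # 400000.0
```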

The average indel (insertion/deletion) frequencies at the hTTR target were 16.11% for Cas12o1, 5.37% for Cas12o2 and 11.89% for Cas12o3. This demonstrates that all three proteins, guided by their guide RNAs, can cleave effectively in mammalian cells.
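Indel frequencies like those above are commonly computed as the percentage of aligned amplicon reads that carry an insertion or deletion. The sketch below assumes the reads have already been aligned to the hTTR amplicon, and it reduces each read to its CIGAR string. It counts any I or D operation anywhere in the read. Dedicated tools such as CRISPResso2 instead restrict counting to a quantification window around the expected cut site, and background is usually assessed with unedited control samples. This sketch is not the analysis pipeline used for the reported results.

```python
import re

# One CIGAR operation: a length followed by an operation code, as in the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel(cigar: str) -> bool:
    """True if the alignment contains an insertion (I) or deletion (D)."""
    return any(op in ("I", "D") for _, op in CIGAR_OP.findall(cigar))


def indel_percent(cigars) -> float:
    """Percentage of aligned reads whose CIGAR string contains an indel."""
    cigars = list(cigars)
    if not cigars:
        raise ValueError("no aligned reads")
    return 100.0 * sum(has_indel(c) for c in cigars) / len(cigars)


if __name__ == "__main__":
    reads = ["150M", "70M3D80M", "60M2I88M", "150M"]  # toy alignments
    print(indel_percent(reads))  # 50.0
```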

Example 2 shows that, in Escherichia coli, Cas12o1 of the present disclosure cleaves much more efficiently than LbCpf1. Example 3 shows that the CRISPR-Cas12o system of the present disclosure achieves versatile and efficient genome editing in mammalian cells. The Cas12o proteins (Cas12o1, Cas12o2 and Cas12o3) have a small size, short crRNAs with a simple structure, and self-processing properties. These features make the system suitable for delivery methods including AAV and LNP, and it may in future be used for multiplex gene-editing applications in vivo or ex vivo.

Sequence information:

Various modifications and variations of the methods, pharmaceutical compositions and kits described in the present disclosure will be apparent to those skilled in the art without departing from the scope and spirit of the present disclosure. Although the present disclosure has been described in connection with specific embodiments, it is capable of further modification, and the claimed disclosure should not be unduly limited to those specific embodiments. Indeed, various modifications of the described modes for carrying out the present disclosure that are apparent to those skilled in the art are intended to fall within its scope. This application is intended to cover any variations, uses or adaptations that follow the general principles of the present disclosure. These include departures from the present disclosure that fall within common general knowledge or customary technical means in the art to which it pertains and that can be applied to the essential features set forth above.

Claims (34)

A Cas protein, characterized in that it comprises an OBD domain, a REC domain, a RuvC domain, a Helical domain and a Nuc domain; optionally, the RuvC domain comprises RuvC-I, RuvC-II and RuvC-III domains; optionally, the Cas protein does not comprise an HNH domain or a PI domain; optionally, the RuvC-III domain is located between the Nuc-I domain and the Nuc-II domain; optionally, the OBD domain is a bi-split domain comprising an OBD-I domain and an OBD-II domain, wherein the OBD-I domain is located at the N-terminus and the Nuc-II domain is located at the C-terminus; optionally, the Cas protein performs its nucleic acid cleavage function without the aid of a tracrRNA. The Cas protein according to claim 1, characterized in that the Cas protein has, in order from the N-terminus to the C-terminus, the following domains:
A1-A2-A3-A4-Z5(I),
wherein A1 is the REC domain; A2 is the OBD domain; A3 is the RuvC domain; A4 is the Helical domain; Z5 comprises a RuvC domain and a Nuc domain; and each "-" is independently a bond or a linker; optionally, the OBD domain comprises an OBD-II domain; optionally, the REC domain comprises a REC-I domain and a REC-II domain; optionally, the RuvC domain comprises a RuvC-I domain, a RuvC-II domain and a RuvC-III domain; optionally, Z5 has the structure shown in Formula II:
Y1-Y2-Y3-Y4(II),
wherein Y1 is the RuvC-II domain; Y2 is the Nuc-I domain; Y3 is the RuvC-III domain; Y4 is the Nuc-II domain; and each "-" is independently a bond or a linker.
The Cas protein according to claim 1, characterized in that the Cas protein has the following structure from the N-terminus to the C-terminus:
Z1-Z2-Z3-Z4-X-Z5(III),
wherein Z1 is absent or is an OBD-I domain; Z2 is the REC domain; Z3 is the OBD-II domain; Z4 is the RuvC-I domain; X is the Helical domain; Z5 comprises a RuvC domain and a Nuc domain; and each "-" is independently a bond or a linker; optionally, the REC domain comprises a REC-I domain and a REC-II domain; optionally, the RuvC domain is selected from the group consisting of a RuvC-II domain, a RuvC-III domain, and a combination thereof; optionally, the Nuc domain is selected from the group consisting of a Nuc-I domain, a Nuc-II domain, and a combination thereof; optionally, Z5 has the structure shown in Formula II:
Y1-Y2-Y3-Y4(II);
wherein Y1 is the RuvC-II domain; Y2 is the Nuc-I domain; Y3 is the RuvC-III domain; Y4 is the Nuc-II domain; and each "-" is independently a bond or a linker.
The Cas protein according to claim 1, characterized in that the OBD domain, REC domain, RuvC domain, Helical domain and Nuc domain have amino acid sequences with at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9% or 100%) sequence identity to the amino acid sequences of the OBD domain, REC domain, RuvC domain, Helical domain and Nuc domain, respectively, in Table 1; optionally, the Cas protein is about 500 to about 1200 amino acids in length; optionally, the Cas protein comprises between about 700 and about 1100 amino acids; more optionally, the Cas protein comprises between about 900 and about 1000 amino acids; optionally, the Cas protein is a class 2, type V Cas endonuclease; optionally, the Cas protein comprises a sequence having at least 70%, at least 75%, at least 80% or at least 90% sequence identity to any one of SEQ ID NO: 1, 3, 5 and 7-9.
A fusion protein, characterized in that it comprises the Cas protein according to claim 1 and one or more functional domains; optionally, the functional domain is selected from a localization signal, a reporter protein, a Cas protein targeting moiety, a DNA-binding domain, an epitope tag, a transcriptional activation domain, a transcriptional repression domain, a nuclease, a deamination domain, a methylase, a demethylase, a transcription release factor, an HDAC, a polypeptide with cleavage activity, a ligase, an integrase, a transposase, a recombinase, a polymerase and a base excision repair inhibitor (such as a uracil-DNA glycosylase inhibitor (UGI)). The fusion protein according to claim 5, characterized in that the functional domain has one or more of the following enzymatic activities on the target sequence: methylase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase) and deglycosylation activity; optionally, the functional domain is selected from an adenosine deaminase catalytic domain or a cytidine deaminase catalytic domain; optionally, the adenosine deaminase catalytic domain or cytidine deaminase catalytic domain comprises one or more of ADAR1, ADAR2, APOBEC, AID or TAD; optionally, the adenosine deaminase catalytic domain comprises an amino acid sequence that is at least 80%, 82%, 85%, 87%, 90%, 92%, 95%, 96%, 97%, 98%, 99% or 100% identical to the amino acid sequence shown in SEQ ID NO: 30 and retains the deamination activity of the amino acid sequence shown in SEQ ID NO: 30; optionally, the amino acid sequence of the adenosine deaminase catalytic domain contains amino acid additions, insertions, deletions and/or substitutions relative to the amino acid sequence shown in SEQ ID NO: 30; optionally, the adenosine deaminase catalytic domain comprises the E18K+F19S+N20L mutant of the amino acid sequence shown in SEQ ID NO: 30, designated adenosine deaminase 004V14 (see WO2023193536A1); optionally, the adenosine deaminase catalytic domain comprises an amino acid sequence that is at least 80%, 82%, 85%, 87%, 90%, 92%, 95%, 96%, 97%, 98%, 99% or 100% identical to the amino acid sequence shown in SEQ ID NO: 31 (the 005V1 deaminase of CN114634923A, in which its amino acid sequence is SEQ ID NO: 2) and retains the deamination activity of the amino acid sequence shown in SEQ ID NO: 31; optionally, the amino acid sequence of the adenosine deaminase catalytic domain contains amino acid additions, insertions, deletions and/or substitutions relative to the amino acid sequence shown in SEQ ID NO: 31; optionally, the adenosine deaminase catalytic domain comprises the Q148G+Q149M+P150R mutant of the amino acid sequence shown in SEQ ID NO: 31, designated deaminase 005V1-10-3; optionally, the functional domain is full-length TadA8e or a functional fragment thereof; optionally, the localization signal comprises a nuclear localization signal (NLS) and/or a nuclear export signal (NES); optionally, the sequence of the nuclear localization signal is shown in any one of SEQ ID NO: 22-29, 32-39 and 41-45; optionally, the nuclear localization signal is located at, near or close to a terminus (e.g., the N-terminus or C-terminus) of the Cas protein according to claim 1; optionally, the nuclear export signal comprises protein tyrosine kinase 2 (such as human protein tyrosine kinase 2); optionally, the reporter protein comprises glutathione S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), β-galactosidase, β-glucuronidase or an autofluorescent protein; optionally, the autofluorescent protein includes green fluorescent proteins (e.g., GFP, GFP-2, tagGFP, turboGFP, eGFP, CopGFP, AceGFP, etc.), HcRed, DsRed, cyan fluorescent proteins (e.g., eCFP, Cerulean, CyPet, AmCyan1, etc.), yellow fluorescent proteins (e.g., YFP, eYFP, Citrine, Venus, YPet, PhiYFP, etc.) and blue fluorescent proteins (e.g., eBFP, eBFP2, Azurite, mKalama1, GFPuv, Sapphire, T-Sapphire);
optionally, the DNA-binding domain comprises a methyl-binding protein, LexA DBD or Gal4 DBD; optionally, the epitope tag comprises a histidine tag, a V5 tag, a FLAG tag, an influenza virus hemagglutinin (HA) tag, a Myc tag, a VSV-G tag, a thioredoxin tag or a streptavidin tag; optionally, the transcriptional activation domain comprises VP64 and/or VPR; optionally, the transcriptional repression domain comprises KRAB and/or SID; optionally, the nuclease comprises FokI; optionally, the polypeptide with cleavage activity comprises a polypeptide with single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity or double-stranded DNA cleavage activity; optionally, the ligase comprises a DNA ligase and/or an RNA ligase; optionally, the functional domain is linked to the N-terminus and/or the C-terminus of the Cas protein; optionally, the functional domain is inserted between the N-terminus and the C-terminus of the Cas protein; optionally, the one or more functional domains are linked to the N-terminus and/or the C-terminus of the Cas protein via a linker; optionally, the functional domain is inserted between the N-terminus and the C-terminus of the Cas protein via a linker. The fusion protein according to claim 5, characterized in that the fusion protein has the following structure from the N-terminus to the C-terminus:
Z1-Z2(I'); or
Z2-Z1(II'); or
Z3-Z1-Z4(III');
其中,Z1为胞嘧啶脱氨酶或腺苷脱氨酶;wherein Z1 is cytosine deaminase or adenosine deaminase; Z2为权利要求1所述的Cas蛋白;Z2 is the Cas protein according to claim 1; Z3为权利要求1所述的Cas蛋白的N端片段;Z3 is the N-terminal fragment of the Cas protein according to claim 1; Z4为权利要求1所述的Cas蛋白的C端片段;Z4 is the C-terminal fragment of the Cas protein according to claim 1; 并且,各“-”独立地为键或接头。Furthermore, each "-" is independently a bond or a linker.
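The three layouts above are simple N-to-C concatenations of segments. The sketch below assembles them from one-letter amino acid strings; it is illustrative only and rests on stated assumptions: the function name, the toy sequences, and the `split_site` parameter (the number of Cas residues kept in the N-terminal fragment Z3) are hypothetical, and for brevity every "-" is rendered with the same linker, whereas the claim allows each "-" to be chosen independently.

```python
from typing import Optional


def assemble_fusion(
    layout: str,
    deaminase: str,
    cas: str,
    linker: str = "",
    split_site: Optional[int] = None,
) -> str:
    """Concatenate segments N- to C-terminus for layouts I', II' and III'.

    Each '-' is rendered as `linker` ("" means a direct peptide bond).
    For III', Z3 = cas[:split_site] and Z4 = cas[split_site:].
    """
    if layout == "I'":  # Z1-Z2
        segments = [deaminase, cas]
    elif layout == "II'":  # Z2-Z1
        segments = [cas, deaminase]
    elif layout == "III'":  # Z3-Z1-Z4: deaminase inserted inside the Cas protein
        if split_site is None or not 0 < split_site < len(cas):
            raise ValueError("III' needs a split_site strictly inside the Cas sequence")
        segments = [cas[:split_site], deaminase, cas[split_site:]]
    else:
        raise ValueError(f"unknown layout {layout!r}")
    return linker.join(segments)


print(assemble_fusion("III'", "DDD", "CCCC", linker="GS", split_site=2))
```

Layout III' is the only one that needs a split point, which is why `split_site` is optional and validated only in that branch.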
An isolated polynucleotide, characterized in that the polynucleotide encodes the Cas protein according to claim 1 or the fusion protein according to claim 5; optionally, the isolated polynucleotide comprises a humanization-optimized sequence; optionally, the polynucleotide further contains, flanking the ORF of the variant, auxiliary elements selected from the group consisting of a signal peptide, a secretory peptide, a tag sequence (such as 6His), or a combination thereof; optionally, the polynucleotide is selected from the group consisting of a genomic sequence, a cDNA sequence, an RNA sequence, or a combination thereof; optionally, the polynucleotide further comprises a promoter operably linked to the ORF sequence of the variant; optionally, the promoter is selected from the group consisting of a constitutive promoter, a tissue-specific promoter, an inducible promoter, or a strong promoter; optionally, the polynucleotide has been codon-optimized for expression in eukaryotic cells; optionally, the polynucleotide is codon-optimized according to the codon preference of the host cell; optionally, the host cell comprises a prokaryotic cell or a eukaryotic cell; optionally, the host cell is a eukaryotic cell, such as a yeast cell, a plant cell or a mammalian cell (including human and non-human mammalian cells); optionally, the host cell is a prokaryotic cell, such as Escherichia coli; optionally, the yeast cell is derived from one or more yeasts selected from the group consisting of Pichia pastoris, Kluyveromyces, or a combination thereof; optionally, the yeast cell comprises Kluyveromyces, more preferably Kluyveromyces marxianus and/or Kluyveromyces lactis; optionally, the host cell is selected from the group consisting of Escherichia coli, wheat germ cells, insect cells, Sf9, HeLa, HEK293, CHO, yeast cells, or a combination thereof; optionally, the polynucleotide comprises a sequence having at least 70%, at least 75%, at least 80% or at least 90% sequence identity to any one of SEQ ID NOs: 16, 40 and 46.
A guide RNA (gRNA), characterized in that the guide RNA comprises (i) a direct repeat (DR) sequence capable of binding to the Cas protein of claim 1 and (ii) a spacer sequence capable of targeting a target sequence of a target DNA, wherein the guide RNA is configured to form a complex with the Cas protein.
A vector, characterized in that it comprises the polynucleotide according to claim 9.
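Claims of the form "at least 70% sequence identity" depend on how identity is counted. As a hedged sketch, assuming the two sequences have already been aligned (for example by a global aligner) with "-" marking gaps, one common convention divides identical columns by all columns that are not gap in both sequences. Other conventions (e.g. dividing by the length of the shorter sequence) give different numbers; the function names here are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    '-' marks a gap. Columns where both sequences carry a gap are ignored;
    identity = identical non-gap columns / remaining columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the pair reaches the given identity threshold (e.g. 70, 75, 80, 90)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACG-T", "ACGAT"))
```

The alignment step itself is deliberately left out; identity values are only comparable when the same aligner and scoring parameters are used.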
A complex, characterized in that it comprises: (i) a protein component selected from the group consisting of the Cas protein of claim 1, the fusion protein of claim 5, or a combination thereof; and (ii) a nucleic acid component selected from the group consisting of the guide RNA of claim 9, a nucleic acid encoding the guide RNA of claim 9, a precursor RNA of the guide RNA of claim 9, a nucleic acid encoding the precursor RNA of the guide RNA of claim 9, or a combination thereof; wherein the protein component and the nucleic acid component bind to each other to form a complex; and wherein the guide RNA comprises: (iii) a direct repeat (DR) sequence capable of binding to the Cas protein of claim 1 and (iv) a spacer sequence capable of targeting a target sequence of a target DNA.
A CRISPR-Cas composition, characterized in that it comprises: (i) a first component selected from the group consisting of the Cas protein of claim 1, the fusion protein of claim 5, a nucleotide sequence encoding the Cas protein of claim 1 or the fusion protein of claim 5, and any combination thereof; and (ii) a second component, which comprises one or more guide RNAs according to claim 9, or a nucleotide sequence encoding the one or more guide RNAs according to claim 9; wherein the guide RNA comprises: (iii) a direct repeat (DR) sequence capable of binding to the Cas protein of claim 1 and (iv) a spacer sequence capable of targeting a target sequence of a target DNA, and the guide RNA is configured to form a complex with the Cas protein; optionally, the composition comprises a pharmaceutical composition; optionally, the dosage form of the composition is selected from the group consisting of a lyophilized preparation, a liquid preparation, or a combination thereof; optionally, the composition is in the form of a liquid preparation; optionally, the composition is in the form of an injection; optionally, the composition is a cell preparation.
A CRISPR-Cas system, characterized in that it comprises one or more vectors, wherein the one or more vectors comprise: (i) a first nucleic acid, which is a nucleotide sequence encoding the Cas protein of claim 1 or the fusion protein of claim 5, the first nucleic acid optionally being operably linked to a first regulatory element; and (ii) a second nucleic acid comprising a nucleotide sequence encoding the guide RNA according to claim 9, the second nucleic acid optionally being operably linked to a second regulatory element; wherein the guide RNA comprises: (iii) a direct repeat (DR) sequence capable of binding to the Cas protein of claim 1 and (iv) a spacer sequence capable of targeting a target sequence of a target DNA, and the guide RNA is configured to form a complex with the Cas protein; and wherein the first nucleic acid and the second nucleic acid are present on the same vector or on different vectors.
The complex of claim 11, the CRISPR-Cas composition of claim 12, or the CRISPR-Cas system of claim 13, wherein: the length of the spacer sequence is greater than 17 nucleotides, preferably 17 to 100 nucleotides, more preferably 16 to 50 nucleotides (e.g., 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 or 50 nucleotides), more preferably 17 to 50 nucleotides, more preferably 17 to 40 nucleotides, more preferably 18 to 39 nucleotides, and most preferably 18 to 37 nucleotides; optionally, the target DNA is selected from double-stranded DNA, single-stranded DNA, RNA, genomic DNA and extrachromosomal DNA; optionally, the spacer sequence is connected to the 3' end of the direct repeat (DR) sequence; optionally, the spacer sequence comprises a sequence complementary to the target sequence; optionally, the target sequence is located on the 3' side of the protospacer adjacent motif (PAM), and the PAM is 5'-TN, wherein N is A, T, G or C; optionally, the target sequence is DNA from a prokaryotic or eukaryotic cell, or a DNA sequence formed by reverse transcription of RNA; or the target sequence is non-naturally occurring DNA, or a DNA sequence formed by reverse transcription of RNA; optionally, the target DNA is located inside or outside a cell; optionally, the cell is a eukaryotic cell or a prokaryotic cell;
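The PAM, spacer and DR rules above can be combined into a small design helper. The sketch below is hypothetical throughout: it scans only the supplied strand for a 5'-TN PAM, takes the protospacer as the nucleotides immediately 3' of the PAM, and builds the guide as DR followed by the spacer (spacer on the DR's 3' end), enforcing the 18-37 nt "most preferred" range. Following the usual type V convention, it assumes the spacer matches the PAM-strand protospacer and therefore base-pairs with the opposite strand; the claim itself only states that the spacer comprises the complement of the target sequence. The DR used in the example is a placeholder, not SEQ ID NO: 2, 4 or 6.

```python
from typing import Iterator, Tuple


def find_protospacers(dna: str, spacer_len: int = 20) -> Iterator[Tuple[int, str]]:
    """Yield (pam_start, protospacer) for every 5'-TN PAM on the given strand.

    The protospacer is the `spacer_len` nucleotides immediately 3' of the
    two-nucleotide PAM; windows running past the sequence end are skipped.
    """
    seq = dna.upper()
    for i in range(len(seq) - 2 - spacer_len + 1):
        if seq[i] == "T" and seq[i + 1] in "ACGT":
            yield i, seq[i + 2 : i + 2 + spacer_len]


def build_guide_rna(direct_repeat: str, protospacer: str,
                    min_len: int = 18, max_len: int = 37) -> str:
    """Return 5'-DR-spacer-3' as RNA; the spacer is appended to the DR 3' end."""
    if not min_len <= len(protospacer) <= max_len:
        raise ValueError(f"spacer length {len(protospacer)} outside {min_len}-{max_len} nt")
    to_rna = lambda s: s.upper().replace("T", "U")  # DNA letters -> RNA letters
    return to_rna(direct_repeat) + to_rna(protospacer)


target = "CCT" + "A" * 20 + "GG"  # toy sequence: one TN PAM at index 2
for pam_start, protospacer in find_protospacers(target):
    print(pam_start, build_guide_rna("GGAAACC", protospacer))  # placeholder DR
```

A real design pipeline would also scan the reverse strand and filter off-target matches; both are omitted here to keep the sketch short.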
optionally, the cell is selected from animal cells, plant cells and fungal cells; optionally, the eukaryotic cell is a plant cell, a mammalian cell, an insect cell, an arthropod cell, a fungal cell, a bird cell, a reptile cell, an amphibian cell, an invertebrate cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, or a human cell; optionally, a DNA donor template is further included, which can be inserted at a locus of interest by homology-directed repair (HDR); optionally, the donor template nucleic acid has a length of 8-1000 nucleotides; optionally, the donor template nucleic acid has a length of 25-500 nucleotides; optionally, the vector comprises a plasmid or a viral vector; optionally, the guide RNA includes unmodified and modified guide RNAs; optionally, the modified guide RNA includes chemical modifications of bases; optionally, the chemical modification includes methylation, methoxy, fluorination or thio modification; optionally, the first regulatory element and/or the second regulatory element is a promoter, such as an inducible promoter, a constitutive promoter, a ubiquitous promoter, a cell type-specific promoter or a tissue-specific promoter; optionally, at least one component of the composition is non-naturally occurring or modified; optionally, the DR sequence comprises a structure as shown in Formula IV below:
5'-R1a-Ba-R2a-L-R2b-Bb-R1b-3' (IV),
wherein segments R1a and R1b are reverse complementary sequences and form a first stem (R1) having a plurality of (2, 3, 4, 5, 6, 7, 8, 9 or 10) nucleotide pairs; segments Ba and Bb do not base-pair with each other and form a bulge (B); segments R2a and R2b are reverse complementary sequences and form a second stem (R2) having a plurality of (2, 3, 4, 5, 6, 7, 8, 9 or 10) base pairs; and L is a loop of a plurality of (3, 4, 5, 6, 7, 8, 9 or 10) nucleotides formed at the second stem; optionally, the DR sequence has a secondary structure substantially identical to that of the DR sequence shown in any one of SEQ ID NOs: 2, 4 and 6; optionally, the DR sequence has, compared with the DR sequence shown in any one of SEQ ID NOs: 2, 4 and 6, nucleotide additions, insertions, deletions or substitutions that do not result in a substantial difference in secondary structure; optionally, the DR sequence comprises, or consists of, a sequence selected from: (i) the sequence shown in any one of SEQ ID NOs: 2, 4 and 6; (ii) a sequence having a substitution, deletion or addition of one or more bases (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 bases) compared with the sequence shown in any one of SEQ ID NOs: 2, 4 and 6; (iii) a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90% or at least 95% sequence identity to the sequence shown in any one of SEQ ID NOs: 2, 4 and 6; (iv) a sequence that hybridizes under stringent conditions to the sequence described in any one of (i) to (iii); or (v) the complement of the sequence described in any one of (i) to (iii); and the sequence described in any one of (ii) to (v) substantially retains the biological function of the sequence from which it is derived.
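Formula IV can be checked mechanically once a DR has been split into its seven segments. The hedged sketch below tests the stated constraints: R1a/R1b and R2a/R2b must be exact reverse complements, each stem has 2-10 pairs and the loop 3-10 nucleotides (reading the parenthetical lists as inclusive ranges), and Ba/Bb must form a bulge rather than pair. It counts Watson-Crick pairs only (no G-U wobble), which is a simplifying assumption; real secondary structure would be confirmed with a folding tool. All names and the example segments are hypothetical.

```python
_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick only


def _rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def reverse_complement_rna(seq: str) -> str:
    return "".join(_PAIR[base] for base in reversed(_rna(seq)))


def check_formula_iv(r1a: str, ba: str, r2a: str, loop: str,
                     r2b: str, bb: str, r1b: str) -> list[str]:
    """Return a list of violated Formula IV constraints (empty list = consistent)."""
    problems = []
    if reverse_complement_rna(r1a) != _rna(r1b):
        problems.append("R1a and R1b are not reverse complementary")
    if not 2 <= len(r1a) <= 10:
        problems.append("first stem R1 must have 2-10 pairs")
    if reverse_complement_rna(r2a) != _rna(r2b):
        problems.append("R2a and R2b are not reverse complementary")
    if not 2 <= len(r2a) <= 10:
        problems.append("second stem R2 must have 2-10 pairs")
    if not 3 <= len(loop) <= 10:
        problems.append("loop L must have 3-10 nucleotides")
    if not (ba or bb):
        problems.append("Ba and Bb are both empty, so no bulge forms")
    elif ba and bb and reverse_complement_rna(ba) == _rna(bb):
        problems.append("Ba and Bb pair with each other, so no bulge forms")
    return problems


segments = ("GCUA", "A", "CCG", "GAAA", "CGG", "", "UAGC")
print(check_formula_iv(*segments), "".join(segments))
```

Returning a list of problems, rather than a single boolean, makes it easier to see which part of a candidate DR variant departs from the claimed structure.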
A kit, characterized in that it comprises one or more components selected from: the Cas protein of claim 1, the fusion protein of claim 5, the polynucleotide of claim 8, the vector of claim 10, the complex of claim 11, the CRISPR-Cas composition of claim 12, or the CRISPR-Cas system of claim 13; optionally, the kit further comprises a label or instructions; optionally, the kit is used for one or more of gene or genome editing, disease treatment, targeting a target gene, and cleaving a target gene or a non-target gene.
A delivery composition, characterized in that it comprises a delivery vehicle or delivery medium, and one or more selected from: the Cas protein of claim 1, the fusion protein of claim 5, the polynucleotide of claim 8, the vector of claim 10, the complex of claim 11, the CRISPR-Cas composition of claim 12, or the CRISPR-Cas system of claim 13; optionally, the delivery vehicle is a particle; optionally, the delivery vehicle is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes, microvesicles, gene guns, or viral vectors (e.g., replication-defective retroviruses, lentiviruses, adenoviruses or adeno-associated viruses); optionally, the delivery medium comprises nanoparticles, liposomes, exosomes, microvesicles, an electroporation device or a gene gun.
A host cell comprising the Cas protein of claim 1, the fusion protein of claim 5, the polynucleotide of claim 8, the vector of claim 10, the complex of claim 11, the CRISPR-Cas composition of claim 12, the CRISPR-Cas system of claim 13, or the delivery composition of claim 16; optionally, the host cell is a eukaryotic cell, such as a yeast cell, a plant cell or a mammalian cell (including human and non-human mammalian cells); optionally, the host cell is a prokaryotic cell, such as Escherichia coli; optionally, the yeast cell is derived from one or more yeasts selected from the group consisting of Pichia pastoris, Kluyveromyces, or a combination thereof; preferably, the yeast cell comprises Kluyveromyces, more preferably Kluyveromyces marxianus and/or Kluyveromyces lactis; optionally, the host cell is selected from the group consisting of Escherichia coli, wheat germ cells, insect cells, Sf9, HeLa, HEK293, CHO, yeast cells, or a combination thereof.
An enzyme preparation, characterized in that the enzyme preparation comprises the Cas protein of claim 1, the fusion protein of claim 5, the complex of claim 11, the CRISPR-Cas composition of claim 12, the CRISPR-Cas system of claim 13, or the delivery composition of claim 16; optionally, the enzyme preparation includes an injection and/or a lyophilized preparation.
A kit, characterized in that it comprises: a first container, and, located in the first container, the CRISPR-Cas composition of claim 12 or the CRISPR-Cas system of claim 13, or a drug containing the CRISPR-Cas composition of claim 12 or the CRISPR-Cas system of claim 13; optionally, the first container contains the CRISPR-Cas composition of claim 12 or the CRISPR-Cas system of claim 13; optionally, the composition is a pharmaceutical composition; optionally, the dosage form of the pharmaceutical composition is selected from the group consisting of a lyophilized preparation, a liquid preparation, or a combination thereof; optionally, the pharmaceutical composition is in an oral or injectable dosage form; optionally, the kit further comprises instructions.
A kit, characterized in that it comprises: (a1) a first container, and, located in the first container, the Cas protein of claim 1, the fusion protein of claim 5, a gene encoding either of them, or an expression vector thereof, or a drug containing the Cas protein of claim 1, the fusion protein of claim 5, a gene encoding either of them, or an expression vector thereof; and (b1) an optional second container, and, located in the second container, the guide RNA of claim 9 or an expression vector thereof, or a drug containing the guide RNA of claim 9 or an expression vector thereof; optionally, the first container and the second container are different containers; optionally, the drug in the first container is a single-agent preparation containing the Cas protein of claim 1, the fusion protein of claim 5, a gene encoding either of them, or an expression vector thereof; optionally, the drug in the second container is a single-agent preparation containing the guide RNA of claim 9 or an expression vector thereof; optionally, the dosage form of the drug is selected from the group consisting of a lyophilized preparation, a liquid preparation, or a combination thereof; optionally, the drug is in an oral or injectable dosage form; optionally, the kit further comprises instructions.
A method for targeting and editing a target gene or cleaving a target gene, characterized in that it comprises: contacting the Cas protein of claim 1, the fusion protein of claim 5, the complex of claim 11, the CRISPR-Cas composition of claim 12, the CRISPR-Cas system of claim 13, the delivery composition of claim 16, the enzyme preparation of claim 18 or the kit of claim 19 with the target gene, or delivering the same into a cell containing the target gene, wherein the target sequence is present in the target gene; optionally, the target gene is present in a cell; optionally, the cell is a prokaryotic cell; optionally, the cell is a eukaryotic cell, such as a mammalian cell (e.g., a human cell) or a plant cell; optionally, the target gene is present in an in vitro nucleic acid molecule (e.g., a plasmid); optionally, editing or cleaving the target gene comprises creating a break in the target sequence, such as a double-strand break in DNA or a single-strand break in RNA, or inserting an exogenous nucleic acid into the break; optionally, the target gene comprises DNA; optionally, the DNA includes single-stranded DNA and double-stranded DNA; optionally, the method is a non-diagnostic and non-therapeutic method.
A method for inducing a change in cell state, characterized in that the method comprises contacting the Cas protein of claim 1, the fusion protein of claim 5, the complex of claim 11, the CRISPR-Cas composition of claim 12, the CRISPR-Cas system of claim 13, the delivery composition of claim 16, the enzyme preparation of claim 18 or the kit of claim 19 with a target gene in a cell.
A method for altering the expression of a gene product, characterized in that it comprises: contacting the Cas protein of claim 1, the fusion protein of claim 5, the complex of claim 11, the CRISPR-Cas composition of claim 12, the CRISPR-Cas system of claim 13, the delivery composition of claim 16, the enzyme preparation of claim 18 or the kit of claim 19 with a nucleic acid molecule encoding the gene product, or delivering the same into a cell containing the nucleic acid molecule, wherein the target sequence is present in the nucleic acid molecule; optionally, the nucleic acid molecule is an in vitro nucleic acid molecule (e.g., a plasmid); optionally, the expression of the gene product is altered (e.g., increased or decreased); optionally, the gene product is a protein; optionally, the protein, fusion protein, polynucleotide, isolated nucleic acid molecule, complex, vector or composition is contained in a delivery vehicle; optionally, the delivery vehicle is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes, and viral vectors (such as replication-defective retroviruses, lentiviruses, adenoviruses or adeno-associated viruses); optionally, the method is used to modify a cell, cell line or organism by altering one or more target sequences in a target gene or in a nucleic acid molecule encoding a target gene product.
A cell or progeny thereof obtained by the method of any one of claims 21 to 23, wherein the cell comprises a modification not present in its wild type.
A cell product of the cell of claim 24 or its progeny.
An in vitro, ex vivo or in vivo cell or cell line, or progeny thereof, characterized in that the cell or cell line or progeny thereof comprises: the Cas protein of claim 1, the fusion protein of claim 5, the polynucleotide of claim 8, the vector of claim 10, the complex of claim 11, the CRISPR-Cas composition of claim 12, the CRISPR-Cas system of claim 13 or the delivery composition of claim 16; optionally, the cell is a prokaryotic cell; optionally, the cell is a eukaryotic cell, such as a mammalian cell (e.g., a human cell) or a plant cell; optionally, the cell is a stem cell or a stem cell line.
A cell preparation, characterized in that it comprises the host cell of claim 17, the cell or progeny thereof of claim 24, the cell product of the cell or progeny thereof of claim 25, or the cell or cell line or progeny thereof of claim 26; optionally, the cell preparation further comprises a pharmaceutically acceptable carrier or excipient; optionally, the cell preparation includes an injection and/or a lyophilized preparation.
Use of the Cas protein of claim 1, the fusion protein of claim 5, the polynucleotide of claim 8, the vector of claim 10, the complex of claim 11, the CRISPR-Cas composition of claim 12, the CRISPR-Cas system of claim 13, the kit of claim 15, the delivery composition of claim 16, the enzyme preparation of claim 18 or the kit of claim 19, characterized in that it is used for preparing a drug or preparation, wherein the drug or preparation is used for nucleic acid editing; optionally, the nucleic acid editing comprises gene or genome editing; optionally, the gene or genome editing comprises modifying a gene, knocking out a gene, altering the expression of a gene product, repairing a mutation, and/or inserting a polynucleotide.
Use of the Cas protein of claim 1, the fusion protein of claim 5, the polynucleotide of claim 8, the vector of claim 10, the complex of claim 11, the CRISPR-Cas composition of claim 12, the CRISPR-Cas system of claim 13, the kit of claim 15, the delivery composition of claim 16, the enzyme preparation of claim 18 or the kit of claim 19, characterized in that it is used for preparing a drug or preparation, wherein the drug or preparation is used for one or more selected from the following group: (i) ex vivo gene or genome editing; (ii) ex vivo detection of single-stranded DNA; (iii) editing a target sequence in a target locus to modify an organism or non-human organism; (iv) treating a disorder caused by a defect in a target sequence in a target locus; (v) treating a condition or disease in a subject in need thereof.
A method for treating a condition or disease in a subject in need thereof, characterized in that it comprises administering to the subject the complex of claim 11, the CRISPR-Cas composition of claim 12, the CRISPR-Cas system of claim 13, the kit of claim 15, the delivery composition of claim 16, the enzyme preparation of claim 18, or the kit of claim 19.
The method of claim 30, wherein the condition or disease comprises metabolic diseases, cancers, neurological diseases, ophthalmic diseases and infectious diseases; optionally, the condition or disease comprises a genetic disease; optionally, the condition or disease is caused by a pathogenic point mutation; optionally, the condition or disease comprises familial hypercholesterolemia (FH), atherosclerotic cardiovascular disease (ASCVD), transthyretin amyloidosis (ATTR), alpha-1 antitrypsin deficiency (AATD), primary hyperoxaluria type 1 (PH1), hereditary angioedema (HAE) and hepatitis B; optionally, the condition or disease is a disease caused by a pathogenic point mutation.
A method for detecting whether a target nucleic acid molecule is present in a sample, characterized in that the method comprises contacting the sample with a non-target sequence and with the Cas protein of claim 1, the fusion protein of claim 5, the complex of claim 11, the CRISPR-Cas composition of claim 12, the CRISPR-Cas system of claim 13, the kit of claim 15, the delivery composition of claim 16 or the enzyme preparation of claim 18, and detecting a detectable signal generated by cleavage of the non-target sequence, thereby detecting the target nucleic acid molecule, wherein the non-target sequence does not hybridize with the guide RNA; optionally, cleavage of the non-target sequence by the protein in the complex, CRISPR-Cas composition, system or delivery composition indicates the presence of the target nucleic acid molecule in the sample, and absence of such cleavage indicates the absence of the target nucleic acid molecule in the sample; optionally, the target nucleic acid molecule is a target DNA; optionally, the target DNA includes DNA formed by reverse transcription of RNA; optionally, the target DNA comprises cDNA; optionally, the target DNA is selected from the group consisting of single-stranded DNA, double-stranded DNA, or a combination thereof.
A sterile container, characterized in that it contains the Cas protein of claim 1, the fusion protein of claim 5, the polynucleotide of claim 8, the vector of claim 10, the complex of claim 11, the CRISPR-Cas composition of claim 12, the CRISPR-Cas system of claim 13, the delivery composition of claim 16 or the enzyme preparation of claim 18; optionally, the sterile container is a syringe.
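The detection method turns cleavage of a non-target (reporter) sequence into a yes/no call. The claim does not specify how the signal is thresholded, so the sketch below assumes a common analysis choice: replicate readings (for example from a fluorescent reporter) are compared with no-target negative controls, and the target is called present when the mean sample signal exceeds the control mean plus k standard deviations. The function name, the fluorescence readout and the default k = 3 are assumptions, not values from this document.

```python
from statistics import mean, stdev
from typing import Sequence


def call_target_present(sample_signals: Sequence[float],
                        negative_control_signals: Sequence[float],
                        k: float = 3.0) -> bool:
    """Return True if the mean sample signal exceeds control mean + k * control SD.

    Signals are reporter readouts (e.g. fluorescence) after incubation;
    the negative controls lack target, so they reflect background cleavage.
    """
    if not sample_signals:
        raise ValueError("need at least one sample reading")
    if len(negative_control_signals) < 2:
        raise ValueError("need at least two negative-control replicates for an SD")
    cutoff = mean(negative_control_signals) + k * stdev(negative_control_signals)
    return mean(sample_signals) > cutoff


controls = [100, 102, 98]  # hypothetical background readings
print(call_target_present([500, 520], controls), call_target_present([101, 103], controls))
```

The mean-plus-k-SD cutoff is only one option; fold-change over control or kinetic (slope-based) calls are also used, and the appropriate rule depends on the reporter and instrument.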
An implantable device, characterized in that it comprises the Cas protein of claim 1, the fusion protein of claim 5, the polynucleotide of claim 8, the vector of claim 10, the complex of claim 11, the CRISPR-Cas composition of claim 12, the CRISPR-Cas system of claim 13, the delivery composition of claim 16 or the enzyme preparation of claim 18; optionally, the Cas protein, the fusion protein, the polynucleotide, the vector, the complex, the CRISPR-Cas composition, the CRISPR-Cas system, the delivery composition or the enzyme preparation is stored in a reservoir.
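The readout rule in the detection method claimed above (cleavage of the non-target sequence indicates that the target nucleic acid molecule is present; no cleavage indicates that it is absent) can be sketched as a simple signal call. This is a minimal illustration only. It assumes a fluorescent reporter as the non-target sequence, replicate readings from no-target control wells, and a fold-change cut-off. None of these, nor the hypothetical function name `call_target_present` or the default threshold of 3.0, come from the patent; any real threshold would have to be calibrated experimentally.

```python
from statistics import mean


def call_target_present(sample_reads, control_reads, fold_threshold=3.0):
    """Return True if reporter-cleavage signal suggests the target is present.

    Hypothetical sketch, not the patent's method.
    sample_reads: signal readings (arbitrary units) from wells containing the
        sample, the Cas/guide RNA complex and the non-target reporter.
    control_reads: readings from no-target control wells (background signal).
    fold_threshold: assumed cut-off; must be calibrated for a real assay.
    """
    if not sample_reads or not control_reads:
        raise ValueError("need at least one sample and one control reading")
    background = mean(control_reads)
    if background <= 0:
        raise ValueError("control signal must be positive")
    # Signal well above background is read as cleavage of the non-target
    # sequence, i.e. the target nucleic acid is called present.
    fold_change = mean(sample_reads) / background
    return fold_change >= fold_threshold
```

For example, `call_target_present([950, 1010, 980], [100, 110, 90])` computes a fold change of 9.8 and returns `True`, whereas readings of `[120, 130]` against the same controls (fold change 1.25) return `False`. Comparing against a no-target control rather than an absolute signal is one way to account for background reporter degradation; it is a design choice of this sketch, not a requirement of the claim.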

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202311664660.5A CN120330162A (en) 2023-12-06 2023-12-06 A Cas protein, a CRISPR-Cas system containing the same and applications thereof
CN202311664660.5 2023-12-06

Publications (1)

Publication Number Publication Date
WO2025119363A1 (en) 2025-06-12

Family

ID=95980608

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/137580 Pending WO2025119363A1 (en) 2023-12-06 2024-12-06 Cas protein, CRISPR-Cas system containing Cas protein, and use of Cas protein

Country Status (2)

Country Link
CN (1) CN120330162A (en)
WO (1) WO2025119363A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022120520A1 (en) * 2020-12-07 2022-06-16 Institute Of Zoology, Chinese Academy Of Sciences Engineered cas effector proteins and methods of use thereof
CN116254246A (en) * 2021-12-09 2023-06-13 北京干细胞与再生医学研究院 Engineered CAS12B effector proteins and methods of use thereof
US20230323322A1 (en) * 2020-08-25 2023-10-12 Institute Of Zoology, Chinese Academy Of Sciences Split cas12 systems and methods of use thereof
US20230383271A1 (en) * 2022-02-28 2023-11-30 Pairwise Plants Services, Inc. Engineered proteins and methods of use thereof

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
US20230323322A1 (en) * 2020-08-25 2023-10-12 Institute Of Zoology, Chinese Academy Of Sciences Split cas12 systems and methods of use thereof
CN117120602A (en) * 2020-08-25 2023-11-24 中国科学院动物研究所 Split CAS12 system and method of use
WO2022120520A1 (en) * 2020-12-07 2022-06-16 Institute Of Zoology, Chinese Academy Of Sciences Engineered cas effector proteins and methods of use thereof
CN116254246A (en) * 2021-12-09 2023-06-13 北京干细胞与再生医学研究院 Engineered CAS12B effector proteins and methods of use thereof
US20230383271A1 (en) * 2022-02-28 2023-11-30 Pairwise Plants Services, Inc. Engineered proteins and methods of use thereof

Non-Patent Citations (5)

Title
"China Master’s Theses Full-text Database", 12 July 2021, article LUO DIYIN: "Structural insights into CRISPR-Cas12i1 effector guided DNA cleavage", XP093321624, DOI: 10.27019/d.cnki.gfjsu.2021.002061 *
DAS ANUSKA; GOSWAMI HEMANT N.; WHYMS CHARLISA T.; SRIDHARA SAGAR; LI HONG: "Structural principles of CRISPR-Cas enzymes used in nucleic acid detection", JOURNAL OF STRUCTURAL BIOLOGY, ACADEMIC PRESS, UNITED STATES, vol. 214, no. 1, 2 February 2022 (2022-02-02), XP086982827, ISSN: 1047-8477, DOI: 10.1016/j.jsb.2022.107838 *
DATABASE PROTEIN 5 April 2020 (2020-04-05), XP093321627, Database accession no. NJL70927.1 *
YANG HUI; GAO PU; RAJASHANKAR KANAGALAGHATTA R.; PATEL DINSHAW J.: "PAM-Dependent Target DNA Recognition and Cleavage by C2c1 CRISPR-Cas Endonuclease", CELL, ELSEVIER, AMSTERDAM NL, vol. 167, no. 7, 15 December 2016 (2016-12-15), pages 1814, XP029850724, ISSN: 0092-8674, DOI: 10.1016/j.cell.2016.11.053 *
ZHANG BO, LUO DIYIN, LI YU, PERČULIJA VANJA, CHEN JING, LIN JINYING, YE YANGMIAO, OUYANG SONGYING: "Mechanistic insights into the R-loop formation and cleavage in CRISPR-Cas12i1", NATURE COMMUNICATIONS, vol. 12, no. 1, 9 June 2021 (2021-06-09), XP055968167, DOI: 10.1038/s41467-021-23876-5 *

Also Published As

Publication number Publication date
CN120330162A (en) 2025-07-18

Similar Documents

Publication Publication Date Title
US12435320B2 (en) CRISPR having or associated with destabilization domains
JP7280312B2 (en) Novel CRISPR enzymes and systems
EP4085141A1 (en) Genome editing using reverse transcriptase enabled and fully active crispr complexes
AU2019406778A1 (en) Crispr-associated transposase systems and methods of use thereof
CN111727247A (en) Systems, methods and compositions for targeted nucleic acid editing
CN116096879A (en) RNA-guided nucleases and active fragments and variants thereof and methods of use
WO2017106657A1 (en) Novel crispr enzymes and systems
WO2016094867A1 (en) Protected guide rnas (pgrnas)
EP3230452A1 (en) Dead guides for crispr transcription factors
EP4168540A2 (en) Crispr-associated transposase systems and methods of use thereof
WO2022247873A1 (en) Engineered cas12i nuclease, effector protein and use thereof
JP2023531384A (en) Novel OMNI-59, 61, 67, 76, 79, 80, 81 and 82 CRISPR Nucleases
EP4225928A1 (en) Helitron mediated genetic modification
WO2025119363A1 (en) Cas protein, crispr-cas system containing cas protein, and use of cas protein
JP2025528195A (en) RNA-guided nucleases and active fragments and variants thereof and methods of use
TW202536173A (en) Rna-guided nucleases and active fragments and variants thereof and methods of use

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24899996
Country of ref document: EP
Kind code of ref document: A1