WO2017215619A1 - Fusion protein producing point mutation in cell, and preparation and use thereof - Google Patents
- Publication number
- WO2017215619A1 (PCT/CN2017/088369)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- amino acid
- seq
- fusion protein
- sequence
- protein
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K19/00—Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/85—Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/78—Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y305/00—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
- C12Y305/04—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
- C12Y305/04001—Cytosine deaminase (3.5.4.1)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y306/00—Hydrolases acting on acid anhydrides (3.6)
- C12Y306/04—Hydrolases acting on acid anhydrides (3.6) acting on acid anhydrides; involved in cellular and subcellular movement (3.6.4)
- C12Y306/04012—DNA helicase (3.6.4.12)
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/01—Fusion polypeptide containing a localisation/targetting motif
- C07K2319/02—Fusion polypeptide containing a localisation/targetting motif containing a signal sequence
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/01—Fusion polypeptide containing a localisation/targetting motif
- C07K2319/09—Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/20—Fusion polypeptide containing a tag with affinity for a non-protein ligand
- C07K2319/21—Fusion polypeptide containing a tag with affinity for a non-protein ligand containing a His-tag
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/20—Fusion polypeptide containing a tag with affinity for a non-protein ligand
- C07K2319/22—Fusion polypeptide containing a tag with affinity for a non-protein ligand containing a Strep-tag
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/40—Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation
- C07K2319/41—Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation containing a Myc-tag
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/40—Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation
- C07K2319/42—Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation containing a HA(hemagglutinin)-tag
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/40—Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation
- C07K2319/43—Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation containing a FLAG-tag
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2800/00—Nucleic acids vectors
- C12N2800/10—Plasmid DNA
- C12N2800/106—Plasmid DNA for vertebrates
- C12N2800/107—Plasmid DNA for vertebrates for mammalian
Definitions
- the present invention relates to fusion proteins which produce point mutations in cells, their preparation and use.
- There is a close relationship between genotype and phenotype.
- spontaneous mutations cause genotypic changes that produce multiple phenotypes.
- mutations are still made to diversify genes, produce a variety of phenotypes, and thus screen out functional mutants, study the relationship between genes and functions, and obtain more functional proteins.
- the frequency of spontaneous mutations is extremely low.
- the spontaneous mutation rate of the human genome is 5.0 × 10⁻¹⁰
- the spontaneous mutation rate of the mouse genome is 1.8 × 10⁻¹⁰
- the spontaneous mutation rate of the E. coli genome is 5.4 × 10⁻¹⁰
- the spontaneous mutation rate of HIV is 3 × 10⁻⁵
- the spontaneous mutation frequency of an organism increases as its genome size decreases [Holmes E C.
- In vivo point mutation methods: 1. Physical method: ultraviolet radiation, with a mutation frequency of 1 × 10⁻¹⁰ [Packer M S, Liu D R. Methods for the directed evolution of proteins [J]. Nature Reviews Genetics, 2015]. 2. Chemical method: ENU is an alkylating agent that transfers ethyl groups to the oxygen and nitrogen atoms of DNA, causing mismatches, base substitutions or deletions, with a mutation frequency of 1-1.5 × 10⁻⁵ [FILBY. ZEBRAFISH: METHODS AND PROTOCOLS. METHODS IN MOLECULAR BIOLOGY. By GJ Lieschke, AC Oates and K.
- B cells in the germinal center can produce diverse antibodies through somatic hypermutation to resist the invasion of pathogens [Odegard VH, Schatz D G. Targeting of somatic hypermutation. [J]. Nature Reviews Immunology, 2006, 6(8): 573-583].
- Somatic hypermutation refers to non-templated point mutations in the variable regions of the immunoglobulin heavy and light chains, which are associated with B cell affinity maturation [Odegard V H et al., supra].
- the enzyme that mediates this process is activation-induced cytosine deaminase (AID).
- AID is a cytosine deaminase belonging to the APOBEC family, a family of RNA editing enzymes. It has an N-terminal nuclear localization signal and a C-terminal nuclear export signal, and its catalytic domain is shared by the APOBEC family [Zhenming X, Hong Z, Pone EJ, et al. Immunoglobulin class-switch DNA recombination: induction, targeting and beyond. [J]. Nature Reviews Immunology, 2012, 12(7): 517-31]. It is generally believed that the N-terminal structure is necessary for somatic hypermutation (SHM). The expression of AID is restricted to germinal center B cells, and its point mutation function is conditional: it must act on single-stranded DNA, and it has a sequence preference.
- the hotspot domain is RGYW [Kiyotsugu Y, Il-Mi O, Tomonori E, et al. AID Enzyme-Induced Hypermutation in an Actively Transcribed Gene in Fibroblasts [J]. Science, 2002, 296 (5575): 2033-2036].
- R stands for A/G
- Y stands for C/T
- W stands for A/T. It can be seen that the function of AID depends on the primary structure of DNA. First, cytosine on single-stranded DNA is deaminated to uracil (U), forming a U·G mismatch. If the U·G mismatch is not repaired, a C→T (G→A on the opposite strand) transition mutation is formed during DNA replication.
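The RGYW hotspot definition above can be turned into a simple motif scan. The following is a minimal sketch only: the function name `find_rgyw_hotspots` and the example sequence are illustrative assumptions, not part of the patent, and the scan looks at the given strand only (it does not examine the reverse complement).

```python
import re

# Degenerate codes as defined in the text: R = A/G, Y = C/T, W = A/T.
IUPAC = {"R": "[AG]", "Y": "[CT]", "W": "[AT]"}


def find_rgyw_hotspots(dna: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 4-mer) for every RGYW site on this strand.

    Hypothetical helper for illustration; overlapping sites are reported.
    """
    # Build the regular expression [AG]G[CT][AT] from the motif string.
    pattern = "".join(IUPAC.get(ch, ch) for ch in "RGYW")
    # A lookahead with a capture group lets overlapping motifs be found.
    return [(m.start(), m.group(1))
            for m in re.finditer(f"(?=({pattern}))", dna.upper())]


# Made-up example sequence, not taken from the patent.
hits = find_rgyw_hotspots("TTAGCTAAGGCATT")
print(hits)
```

For the made-up sequence above, the scan reports the sites `AGCT` (start 2) and `GGCA` (start 8).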
- U can also be excised by UNG (uracil-DNA glycosylase) to form an abasic site, at which any of the four bases is randomly incorporated [Odegard V H et al., supra].
- the point mutations produced by the above process are important for somatic hypermutation and can produce diverse antibodies.
- the frequency of point mutations caused in vivo is 1 × 10⁻⁴ to 1 × 10⁻³, and the sites are random [Masatoshi A, Nesreen H, Andre S, et al. Accumulation of the FACT complex, as well as Histone H3.3, serves as a target marker for somatic hypermutation. [J]. Proceedings of the National Academy of Sciences of the United States of America, 2013, 110 (19): 7784-7789], which still cannot meet the requirements of experimental screening for mutants.
- the first aspect herein provides a fusion protein comprising a cytosine deaminase and a Cas enzyme that has lost nuclease activity but retains helicase activity.
- the fusion protein is formed from a cytosine deaminase and a Cas enzyme that lacks nuclease activity but retains helicase activity.
- the Cas enzyme is selected from the group consisting of: Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof or modified forms thereof.
- the nuclease activity of the Cas enzyme is partially lost, such that the Cas enzyme causes only DNA single-strand breaks; or the nuclease activity of the Cas enzyme is completely lost, such that the Cas enzyme cannot cause DNA double-strand breaks.
- the Cas enzyme is a Cas9 enzyme selected from the group consisting of: Cas9 from S. pyogenes (SpCas9), Cas9 from S. aureus (SaCas9), and Cas9 from S. thermophilus (St1Cas9).
- the Cas enzyme is a Cas9 enzyme
- one or both of the two endonuclease catalytic domains of the enzyme, RuvC1 and HNH, are mutated, resulting in loss of the nuclease activity of the enzyme.
- both RuvC1 and HNH of the Cas9 enzyme are mutated, resulting in loss of the nuclease activity of the enzyme while retaining its helicase activity.
- the 10th amino acid asparagine of the Cas9 enzyme is mutated to alanine or other amino acid
- the amino acid histidine at position 841 is mutated to alanine or other amino acid.
- the amino acid sequence of the Cas9 enzyme is as shown in amino acid residues 42-1452 of SEQ ID NO: 2, or as shown in amino acid residues 42-1419 of SEQ ID NO: 72.
- the cytosine deaminase is a full-length cytosine deaminase or a fragment thereof, wherein the fragment comprises at least an NLS domain, a catalytic domain, and an APOBEC-like domain of the cytosine deaminase.
- the cytosine deaminase undergoes a substitution mutation at amino acid residues 10, 82, and 156.
- the substitution mutations are K10E, T82I, and E156G.
- the fragment comprises at least amino acid residues 9-182 of the AID, eg, at least amino acid residues 1-182 of the AID.
- the amino acid sequence of the cytosine deaminase is as shown in amino acid residues 1457-1654 of SEQ ID NO: 2, or as shown in amino acid residues 1447-1629 of SEQ ID NO: 68.
- the fragment comprises at least amino acid residues 1465-1638 of SEQ ID NO: 2, eg, at least amino acid residues 1457-1638 of SEQ ID NO: 2.
- the fragment consists of amino acid residues 1-182, consists of amino acid residues 1-164, or consists of amino acid residues 1-109.
- the fusion protein further comprises one or more of the following sequences: a linker, a nuclear localization sequence, and an amino acid residue or amino acid sequence introduced for the purpose of constructing a fusion protein, promoting expression of a recombinant protein, obtaining a recombinant protein that is automatically secreted outside the host cell, or facilitating purification of the recombinant protein.
- the amino acid sequence of the fusion protein is as shown in SEQ ID NO: 2, 4, 66, 68, 70 or 72, or as shown in amino acids 26-1654 of SEQ ID NO: 2, amino acids 26-1638 of SEQ ID NO: 4, amino acids 26-1629 of SEQ ID NO: 68, amino acids 26-1629 of SEQ ID NO: 70, or amino acids 26-1638 of SEQ ID NO: 72.
- a second aspect of the invention provides a polynucleotide sequence selected from the group consisting of:
- a third aspect of the invention provides a nucleic acid construct comprising the polynucleotide sequence of the second aspect herein.
- the nucleic acid construct is an expression vector for expression of a fusion protein described herein in a host cell.
- a fourth aspect of the invention provides a host cell comprising a fusion protein, a coding sequence thereof or a nucleic acid construct as described herein.
- a fifth aspect herein provides a method of producing a point mutation in a cell, the method comprising the step of expressing a fusion protein and sgRNA described herein in said cell.
- the methods comprise the step of transferring a fusion protein described herein, or an expression vector thereof, and sgRNA or an expression vector thereof into the cell, followed by screening to obtain the desired mutated nucleic acid sequence.
- the sgRNA comprises a target binding region and a Cas protein recognition region, the target binding region being capable of specifically binding to a nucleic acid sequence to be mutated, and the Cas protein recognition region being capable of being recognized and bound by the Cas enzyme.
- the target binding region of the sgRNA specifically binds to the template strand of a nucleic acid sequence to be mutated, and the region opposite the sgRNA binding region on the template strand is immediately adjacent to the protospacer adjacent motif (PAM) recognized by the Cas protein, or is separated from it by 10 or fewer bases.
- the gene to be mutated encodes a functional protein.
- the functional protein includes proteins involved in the development, progression, and metastasis of diseases, proteins involved in cell differentiation, proliferation, and apoptosis, proteins involved in metabolism, development-related proteins, drug targets, and so on.
- the functional protein is selected from the group consisting of antibodies, enzymes, lipoproteins, hormone proteins, transport and storage proteins, motor proteins, receptor proteins, and membrane proteins.
- a sixth aspect of the invention provides a kit comprising a fusion protein, polynucleotide sequence or nucleic acid construct as described herein.
- a seventh aspect of the invention provides the use of a fusion protein, polynucleotide sequence or nucleic acid construct as described herein for producing a point mutation in a cell, or in the preparation of a composition or kit for producing a point mutation in a cell.
- Figure 1 A and C are the PCR-amplified AID fragment (lane 1) and AIDX fragment (lane 1), respectively; B is an agarose gel of the pEntr11-dCas9-AID plasmid, in which lane 1 is the empty pEntr11 plasmid, lane 2 is the pEntr11-dCas9 plasmid, and lanes 3-7 are the pEntr11-dCas9-AID plasmid; D is the bacterial-culture PCR result for the pEntr11-dCas9-AIDX plasmid, in which the amplified fragment is AIDX. Lanes 1-5 in D represent 5 different positive clones, and lane 6 is the empty plasmid as a negative control.
- Figure 2 A, lanes 1 and 2 are the PCR-amplified dCas9-AID and dCas9-AIDX fragments, respectively; B, restriction digestion of the empty MO91 plasmid, in which lane 1 is the BglII single digest, lane 2 is the empty MO91 plasmid, and lane 3 is the BglII and XhoI double digest; C, bacterial-culture PCR result for the MO91-dCas9-AIDX plasmid, in which the amplified fragment is AIDX; D, bacterial-culture PCR result for the MO91-dCas9-AID plasmid, in which the amplified fragment is AID.
- Figure 3 A, lane 1 is the PCR-amplified 3*flag+NLS fragment, lanes 2 and 3 are the BglII single digests of the MO91-dCas9-AID and MO91-dCas9-AIDX plasmids, respectively, and lane 4 is the MO91-dCas9-AID plasmid control; B, lanes 1-4 are the MO91-dCas9 (3*flag, NLS)-AID plasmid, lane 5 is the MO91-dCas9-AID plasmid, and lanes 6-9 are the MO91-dCas9 (3*flag, NLS)-AIDX plasmid.
- Figure 4 Sequence of the EGFP reporter, the stop codon is shown in bold. The designed sgRNA is indicated by an arrow.
- Figure 5 Schematic representation of the pattern of the reporter plasmid.
- Figure 6 Flow cytometry of the reporter cell line. The three curves from left to right indicate the Thy1.1 expression levels of unstained controls, reporter-negative cells, and reporter-positive cells, respectively.
- Figure 7 Comparison of dCas9-AID, dCas9-AIDX, AID and AIDX point mutation efficiencies in reporter cells.
- Figure 8 Optimization of dCas9-AID point mutation efficiency in reporter cells.
- A dCas9-AID induces GFP expression;
- B a schematic of different AID variants and the efficiency of their induction of point mutations;
- C dCas9-AIDX induces point mutations requiring cytosine deaminase activity of AID.
- Figure 9 Point mutation frequency distribution of dCas9-AIDX and AID on EGFP and cMyc genes.
- Figure 10 dCas9-AIDX randomly mutates C and G bases to three other bases.
- A statistics of base mutation types;
- B dCas9-AIDX induces point mutation mechanism.
- Figure 12 dCas9-AIDX not only acts on exogenous genes, but also on endogenous genes.
- Figure 13 Structural functional domain of the AID.
- Figure 14 Experimental procedure (a) and results (b-d) of the application of dCas9-AIDX to the Gleevec resistance screening of the K562BCR-ABL gene.
- Figure 15 TAM (targeted AID-mediated mutagenesis, a cytosine deaminase AID-mediated gene mutation technique) mutates amino acids of the anti-HEL-IgG1 variable region.
- TAM induces base mutations in the anti-HEL-IgG1 variable region (top panel) and can repeatedly induce base mutations in the IgGl CDRs (bottom panel).
- Figure 17 The affinity of the mutated antibody for HEL is increased by more than 10 fold.
- Figure 18 Results of expression of nCas9-AIDX in bacteria.
- the boxed region is the band of the nCas9-AIDX fusion protein.
- Figure 19 Functional test results for different fusion proteins.
- the three bars from left to right represent the results of MO91-AIDX-XTEN-dCas9, MO91-dCas9-XTEN-AIDX, and MO91-dCas9-AIDX.
- Figure 20 Functional test results for different fusion proteins.
- the three bars from left to right represent the results of MO91-dCas9-AIDX, MO91-dCas9-XTEN-AIDX (K10E T82I E156G) and MO91-dCas9-XTEN-AIDX.
- Figure 21 Functional verification results of the nCas9-AIDX fusion protein.
- This document relates to a fusion protein of a Cas protein lacking nuclease activity and the cytosine deaminase AID or a mutant thereof.
- Under the guidance of the sgRNA, the fusion protein is recruited to a specific DNA sequence, where AID or its mutant deaminates cytosine to produce uracil, which is then randomly mutated into other bases during DNA repair. High mutation efficiency is thus obtained while achieving site-directed mutagenesis.
- CRISPR Clustered Regularly Interspaced Short Palindromic Repeats
- a complex of a Cas protein with endonuclease activity and its specifically recognized sgRNA pairs complementarily with the template strand of the target DNA through the pairing region of the sgRNA, and the double-stranded DNA is cleaved by Cas at a specific position.
- Cas protein and “Cas enzyme” are used interchangeably herein.
- The above-described property of Cas/sgRNA is used herein: specific binding of the sgRNA to its target positions Cas at a desired site, at which cytosine is deaminated by the AID or mutant thereof in the fusion protein.
- Cas enzymes with partial or complete loss of nuclease activity, in particular partial or complete loss of endonuclease activity, that retain helicase activity and are suitable for use in the present invention may be derived from various Cas proteins and variants thereof well known in the art.
- a Cas9 enzyme lacking nuclease activity and a single-stranded sgRNA specifically recognized by the same are used.
- the Cas9 enzyme may be a Cas9 enzyme from a different species including, but not limited to, Cas9 (SpCas9) from S. pyogenes, Cas9 (SaCas9) from S. aureus, and Cas9 (St1 Cas9) from Streptococcus thermophilus, and the like.
- Various variants of the Cas9 enzyme can be used as long as the Cas9 enzyme specifically recognizes its sgRNA and lacks nuclease activity.
- the Cas protein lacking nuclease activity can be prepared by methods well known in the art, including but not limited to deletion of the entire endonuclease catalytic domain of the Cas protein or mutation of one or several amino acids in the domain, thereby producing a Cas protein lacking nuclease activity.
- the mutation may be a deletion or substitution of one or several amino acid residues (for example, two or more, three or more, four or more, five or more, ten or more, up to the entire catalytic domain), or an insertion of one or several new amino acid residues (for example, one or more, two or more, three or more, four or more, five or more, ten or more, or 1 to 10, or 1 to 15).
- Deletion of the above domains or mutation of amino acid residues, and testing of whether the mutated Cas protein still has nuclease activity, can be carried out by methods conventional in the art.
- For Cas9, its two endonuclease catalytic domains, RuvC1 and HNH, can be mutated; for example, the 10th amino acid of the enzyme (in the RuvC1 domain) is mutated to alanine or another amino acid, and the histidine at position 841 (in the HNH domain) is mutated to alanine or another amino acid.
- the Cas enzyme is completely nuclease free.
- the amino acid sequence of the nuclease-free Cas9 enzyme used herein is as shown in amino acid residues 42-1452 of SEQ ID NO: 2.
- the Cas enzyme used herein partially lacks nuclease activity, i.e., the Cas enzyme can cause only DNA single-strand breaks.
- a representative example of such a Cas enzyme is shown as amino acid residues 42-1419 of SEQ ID NO: 72.
- the function of the Cas/sgRNA complex requires a protospacer adjacent motif (PAM) in the non-template strand (3' to 5') of the DNA.
- PAM protospacer adjacent motif
- Different Cas enzymes have different corresponding PAMs.
- the PAM for SpCas9 is typically NGG
- the PAM for the SaCas9 enzyme is typically NNGRR
- the PAM for the St1Cas9 enzyme is typically NNAGAA; wherein N is A, C, T or G and R is G or A.
- the PAM for the SaCas9 enzyme is NNGRRT. In certain preferred embodiments, the PAM for SpCas9 is TGG.
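The degenerate PAM patterns listed above (NGG, NNGRR, NNAGAA, with N = A/C/T/G and R = G/A) can be matched mechanically. The sketch below is an illustration under stated assumptions: the dictionary `PAMS`, the function `pam_positions`, and the test sequences are hypothetical names and data introduced here, and the scan covers only the strand passed in, read 5' to 3'.

```python
import re

# Degenerate codes as given in the text: N = A, C, T or G; R = G or A.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}

# Typical PAMs quoted in the text for three Cas9 enzymes.
PAMS = {"SpCas9": "NGG", "SaCas9": "NNGRR", "St1Cas9": "NNAGAA"}


def pam_positions(dna: str, cas: str) -> list[int]:
    """Return 0-based start positions of every PAM for `cas` on this strand.

    Hypothetical helper; overlapping matches are included via a lookahead.
    """
    pattern = "".join(IUPAC[ch] for ch in PAMS[cas])
    return [m.start() for m in re.finditer(f"(?=({pattern}))", dna.upper())]


# Made-up example sequence, not taken from the patent.
print(pam_positions("ACGTTGGAC", "SpCas9"))
```

Keeping the PAM table separate from the matching logic makes it straightforward to add another enzyme's motif without touching the scan itself.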
- sgRNA usually consists of two parts: a target binding region and a Cas protein recognition region.
- the target binding region and the Cas protein recognition region are usually joined in the 5' to 3' direction.
- the target binding region is typically 15 to 25 bases in length, more typically 18 to 22 bases, such as 20 bases.
- the target binding region specifically binds to the template strand of DNA, thereby recruiting the fusion protein to a predetermined site.
- the region opposite the sgRNA binding region on the DNA template strand is immediately adjacent to the PAM, or is separated from it by a few bases (e.g., within 10, within 8, or within 5). Therefore, when designing an sgRNA, the PAM is first determined according to the Cas enzyme used, and a site that can serve as a PAM is then located on the non-template strand of the DNA. A fragment 15 to 25 bases long, more usually 18 to 22 bases long, that lies on the non-template strand (3' to 5') immediately downstream of the PAM site, or within 10 bases (e.g., within 8 or within 5) of the PAM site, is then used as the sequence of the target binding region of the sgRNA.
- the Cas protein recognition region of sgRNA is determined based on the Cas protein used, which is well known to those skilled in the art.
- the sequence of the target binding region of the sgRNA herein is a fragment 15 to 25 bases long, usually 18 to 22 bases long, of the DNA strand containing the PAM site recognized by the selected Cas enzyme, located immediately downstream of the PAM site or within 10 bases (e.g., within 8 or within 5) of the PAM site; the Cas protein recognition region is specifically recognized by the selected Cas enzyme.
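The design procedure described above (locate a PAM, then take the adjacent 15 to 25 base fragment as the target binding region) can be sketched as follows. This is a simplified illustration, not the patent's own procedure: it assumes SpCas9 with an NGG PAM, assumes the PAM-containing strand is supplied written 5' to 3' so that the protospacer lies immediately 5' of the PAM (the same position the text describes as "downstream" when the strand is read 3' to 5'), and ignores the option of a gap of up to 10 bases. The function name `candidate_targets` and the example sequence are hypothetical.

```python
def candidate_targets(pam_strand: str, spacer_len: int = 20) -> list[tuple[str, str]]:
    """List (protospacer, PAM) pairs for NGG PAMs on the given strand.

    `pam_strand` is the PAM-containing strand written 5'->3' (an assumption
    of this sketch). `spacer_len` defaults to 20, the typical length quoted
    in the text (15 to 25 bases are allowed).
    """
    if not 15 <= spacer_len <= 25:
        raise ValueError("spacer_len should be between 15 and 25 bases")
    seq = pam_strand.upper()
    targets = []
    # i is the index of the 'N' of NGG; a full spacer must fit before it.
    for i in range(spacer_len, len(seq) - 2):
        if seq[i + 1:i + 3] == "GG":
            targets.append((seq[i - spacer_len:i], seq[i:i + 3]))
    return targets


# Made-up example: a 20-base protospacer followed by a TGG PAM.
example = "ACGTACGTACGTACGTACGT" + "TGG" + "A"
print(candidate_targets(example))
```

In practice, each candidate would still need to be checked for specificity against the rest of the genome, which this sketch does not attempt.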
- the sgRNA can be prepared by methods conventional in the art, for example, by conventional chemical synthesis methods.
- the sgRNA can also be transformed into cells via an expression vector to express the sgRNA in the cell.
- Expression vectors for sgRNA can be constructed using methods well known in the art.
- AID is a cytosine deaminase belonging to the APOBEC family, a family of RNA editing enzymes; it has a nuclear localization signal at the N-terminus and a nuclear export signal at the C-terminus.
- the catalytic domain is shared by the APOBEC family. It is generally believed that the N-terminal structure is required for somatic hypermutation (SHM).
- SHM somatic hypermutation
- AID functions by deaminating cytosine, converting it to uracil; subsequent DNA repair can then turn the uracil into other bases. It will be appreciated that cytosine deaminases, and fragments or mutants thereof that retain the biological activity of deaminating cytosine to uracil, are well known in the art.
- In AID, amino acids 9-26 form the nuclear localization signal (NLS) domain, within which amino acids 13-26 in particular are involved in DNA binding; amino acids 56-94 form the catalytic domain; amino acids 109-182 form the APOBEC-like domain; amino acids 193-198 form the nuclear export signal (NES) domain; amino acids 39-42 interact with catenin-like protein 1 (CTNNBL1); and amino acids 113-123 form the hotspot recognition loop.
- a full length sequence of AID (as indicated by amino acids 1457-1654 of SEQ ID NO: 2) can also be used herein, and fragments of AID can also be used.
- the fragment comprises at least an NLS domain, a catalytic domain and an APOBEC-like domain.
- the fragment comprises at least amino acid residues 9-182 of the AID (ie, amino acid residues 1465-1638 of SEQ ID NO: 2).
- the fragment comprises at least amino acid residues 1-182 of the AID (ie, amino acid residues 1457-1638 of SEQ ID NO: 2).
- an AID fragment as used herein consists of amino acid residues 1-182, consists of amino acid residues 1-164, or consists of amino acid residues 1-190.
- the AID fragment used herein consists of amino acid residues 1457-1638 of SEQ ID NO: 2, amino acid residues 1457-1642 of SEQ ID NO: 2, or amino acid residues 1457-1646 of SEQ ID NO: 2.
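The paired coordinates quoted above (AID residues 1-198 corresponding to residues 1457-1654 of SEQ ID NO: 2, and 9-182 corresponding to 1465-1638) imply a constant offset of 1456 between AID numbering and SEQ ID NO: 2 numbering. The sketch below makes that arithmetic explicit and checks whether a fragment spans the three domains the text requires. The offset is inferred from those pairs rather than stated in the source, and the names `to_seq2` and `covers_required_domains` are hypothetical.

```python
# Inferred from the text: AID residue 1 <-> residue 1457 of SEQ ID NO: 2.
AID_OFFSET_IN_SEQ2 = 1456

# Domain boundaries in AID numbering, as listed in the text.
AID_DOMAINS = {
    "NLS": (9, 26),
    "catalytic": (56, 94),
    "APOBEC-like": (109, 182),
    "NES": (193, 198),
}

# Domains a usable fragment must contain, according to the text.
REQUIRED = ("NLS", "catalytic", "APOBEC-like")


def to_seq2(aid_position: int) -> int:
    """Convert an AID residue number to SEQ ID NO: 2 numbering."""
    return aid_position + AID_OFFSET_IN_SEQ2


def covers_required_domains(start: int, end: int) -> bool:
    """True if the AID fragment start..end (inclusive) spans every required domain."""
    return all(start <= AID_DOMAINS[name][0] and AID_DOMAINS[name][1] <= end
               for name in REQUIRED)


print(to_seq2(9), to_seq2(182))         # minimal fragment in SEQ ID NO: 2 numbering
print(covers_required_domains(9, 182))  # fragment 9-182 spans all three domains
```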
- variants of AID that retain their cytosine deaminase activity can also be used herein.
- such variants may have 1 to 10, such as 1-8, 1-5 or 1-3, amino acid variations relative to the wild-type AID sequence, including deletions, substitutions and mutations of amino acids.
- these amino acid variations do not occur within the above-described NLS domain, catalytic domain and APOBEC-like domain or, if they do occur within these domains, do not affect the original biological function of the domains.
- it is preferred that these variations do not occur at positions 24, 27, 38, 56, 58, 87, 90, 112, 140, etc. of the AID amino acid sequence.
- these variations also do not occur within amino acids 39-42, amino acids 113-123.
- variation can occur among amino acids 1-8, amino acids 28-37, amino acids 43-55, and/or amino acids 183-198.
- the variation occurs at positions 10, 82, and 156.
- substitution mutations occur at positions 10, 82, and 156, and such substitution mutations can be K10E, T82I, and E156G.
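Substitution notation such as K10E (wild-type residue, 1-based position, new residue) can be applied to a sequence programmatically, with a check that the stated wild-type residue is really present. The sketch below is illustrative only: `apply_substitutions` is a hypothetical helper, and the sequence used is a placeholder built for the demonstration, not the AID sequence from the sequence listing.

```python
import re


def apply_substitutions(seq: str, substitutions: list[str]) -> str:
    """Apply substitutions written like 'K10E' to a protein sequence.

    Raises ValueError if the notation is malformed, the position is out of
    range, or the wild-type residue does not match the sequence.
    """
    residues = list(seq)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"malformed substitution: {sub}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {sub}")
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = new
    return "".join(residues)


# Placeholder 198-residue sequence (NOT the real AID sequence): alanines with
# K, T and E placed at positions 10, 82 and 156 so the example is consistent.
toy = list("A" * 198)
for pos, aa in ((10, "K"), (82, "T"), (156, "E")):
    toy[pos - 1] = aa
toy_sequence = "".join(toy)

mutant = apply_substitutions(toy_sequence, ["K10E", "T82I", "E156G"])
print(mutant[9], mutant[81], mutant[155])
```

Verifying the wild-type residue before substituting guards against off-by-one numbering errors, which are easy to make when a protein is embedded in a longer fusion sequence.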
- the amino acid sequence of an exemplary AID mutant comprises, or consists of, the amino acid residues at positions 1447-1629 of SEQ ID NO: 68.
- fusion proteins comprising Cas enzyme and AID.
- the Cas enzyme is usually at the N-terminus of the amino acid sequence of the fusion protein, and the AID is at the C-terminus.
- fusion proteins formed primarily of Cas enzyme and AID are provided herein.
- the expression "formed primarily by", or similar expressions, as applied to a fusion protein herein does not mean that the fusion protein includes only the Cas enzyme and AID; rather, it is to be understood that the fusion protein may include only the Cas enzyme and AID, or may also contain other components that do not affect the targeting function of the Cas enzyme in the fusion protein or the ability of AID to mutate the target sequence. Such components include, but are not limited to, various linker sequences, nuclear localization sequences, and, as described below, amino acid sequences introduced into the fusion protein as a result of gene cloning operations and/or for the purpose of constructing the fusion protein, promoting expression of a recombinant protein, obtaining a recombinant protein that is automatically secreted outside the host cell, or facilitating detection and/or purification of the recombinant protein.
- the Cas enzyme can be fused to the AID via a linker.
- the linker may be a peptide of 3 to 25 residues, for example, a peptide of 3 to 15, 5 to 15, 10 to 20 residues. Suitable examples of peptide linkers are well known in the art.
- the linker contains one or more tandemly repeated motifs, and the motif typically contains Gly and/or Ser.
- the motif can be SGGS, GSSGS, GGGS, GGGGS, SSSSG, GSGSA, and GGSGG.
- the motif is contiguous in the linker sequence with no amino acid residues inserted between the repeats.
- the linker sequence may comprise 1, 2, 3, 4 or 5 repeat motifs.
- the linker sequence is a polyglycine linker sequence.
- the amount of glycine in the linker sequence is not particularly limited, but is usually 2 to 20, for example, 2 to 15, 2 to 10, and 2 to 8.
- the linker may also contain other known amino acid residues such as alanine (A), leucine (L), threonine (T), glutamic acid (E), phenylalanine (F), arginine (R), glutamine (Q), and the like.
- the linker sequence is XTEN, the amino acid sequence of which is set forth in amino acid residues 183-198 of SEQ ID NO:66.
- a linker can be composed of any of the following amino acid sequences: G(SGGGG)2SGGGLGSTEF (SEQ ID NO: 21), RSTSGLGGGS(GGGGS)2G (SEQ ID NO: 22), QLTSGLGGGS(GGGGS)2G (SEQ ID NO: 23), GGGS (SEQ ID NO: 24), GGGGS (SEQ ID NO: 25), SSSSG (SEQ ID NO: 26), GSGSA (SEQ ID NO: 27), GGSGGGGGGSGGGGSGGGGS (SEQ ID NO: 28), SSSSGSSSSGSSSSG (SEQ ID NO: 29), GGSGAGSGSAGSGSA (SEQ ID NO: 30), GGSGGGGSGGGGSGG (SEQ ID NO: 31), amino acid residues 1420-1456 of SEQ ID NO: 72, and the like.
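The compact linker notation used above, in which a motif in parentheses is followed by its repeat count, can be expanded mechanically. The following is a minimal Python sketch of that expansion; the function names `build_linker` and `expand_compact` are illustrative assumptions and are not part of the source.

```python
import re


def build_linker(motif: str, repeats: int) -> str:
    """Concatenate `repeats` contiguous copies of `motif`, with no residues in between."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return motif * repeats


def expand_compact(notation: str) -> str:
    """Expand a compact form such as 'G(SGGGG)2SGGGLGSTEF' into a plain residue string."""
    compact = notation.replace(" ", "")  # tolerate spaces left by text extraction
    return re.sub(
        r"\(([A-Z]+)\)(\d+)",
        lambda m: build_linker(m.group(1), int(m.group(2))),
        compact,
    )


if __name__ == "__main__":
    seq21 = expand_compact("G(SGGGG)2SGGGLGSTEF")  # written form of SEQ ID NO: 21
    print(seq21, len(seq21))
    print(build_linker("GGGGS", 3))
```

Expanding the notation first makes it easy to check a candidate linker against the stated length range of 3 to 25 residues.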
- gene cloning operations often require the design of a suitable cleavage site, which necessarily introduces one or more irrelevant residues at the end of the expressed amino acid sequence without affecting the activity of the sequence of interest.
- to construct a fusion protein, promote expression of a recombinant protein, obtain a recombinant protein that is automatically secreted outside the host cell, or facilitate purification of the recombinant protein, it is often necessary to add some amino acids to the N-terminus or C-terminus of the recombinant protein, or to other suitable regions within the protein, including but not limited to suitable linker peptides, signal peptides, leader peptides, and terminal extensions.
- the amino terminus or carboxy terminus of the fusion protein herein may also contain one or more polypeptide fragments as a protein tag.
- Any suitable tag can be used herein.
- the tag may be FLAG (DYKDDDDK, SEQ ID NO: 32), HA, HA1, c-Myc, Poly-His, Poly-Arg, Strep-TagII, AU1, EE, T7, 4A6, ε, B, gE or Ty1. These tags can be used to purify proteins.
- the fusion proteins herein may also contain a nuclear localization sequence (NLS).
- Nuclear localization sequences of various sources and various amino acids well known in the art can be used.
- Such nuclear localization sequences include, but are not limited to: the NLS of the SV40 virus large T antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 33); an NLS from nucleoplasmin, for example the nucleoplasmin bipartite NLS having the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 34); an NLS from c-myc, having the amino acid sequence PAAKRVKLD (SEQ ID NO: 35) or RQRRNELRSRSP (SEQ ID NO: 36); the NLS from hRNPA1 M9, having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 37); the sequence from the IBB domain of importin-α, RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILK
- the sequence shown by amino acid residues 26-33 of SEQ ID NO: 2 is used herein as the NLS.
- the NLS may be located at the N-terminus and/or C-terminus of the fusion protein; it may also be located within the fusion protein sequence, for example at the N-terminus and/or C-terminus of the Cas9 enzyme in the fusion protein, or at the N-terminus and/or C-terminus of the AID in the fusion protein.
- the accumulation of the fusion protein of the invention in the nucleus can be detected by any suitable technique.
- a detection marker can be fused to the Cas enzyme such that the location of the fusion protein within the cell can be visualized when combined with means for detecting the location of the nucleus (eg, a dye specific for the nucleus, such as DAPI).
- 3*flag is used herein as a marker, and the peptide sequence can be as shown in amino acid residues 1-23 of SEQ ID NO:2. It will be understood that, in general, if a marker sequence is present, the marker sequence is typically at the N-terminus of the fusion protein.
- the tag sequence can be directly linked to the NLS or can be joined by a suitable linker sequence.
- the NLS sequence can be ligated directly to the Cas enzyme or AID, or it can be ligated to the Cas enzyme or AID by a suitable linker sequence.
- the fusion proteins herein consist of a Cas enzyme and an AID.
- the fusion protein herein is formed by the Cas enzyme linked to the AID via a linker.
- the fusion protein herein comprises an NLS, a Cas enzyme, an AID, and optionally a linker sequence between the Cas enzyme and the AID.
- the Cas enzyme in the fusion protein is a Cas9 enzyme as described hereinbefore.
- the amino acid sequence of the AID in the fusion protein is set forth in amino acid residues 1457-1654 of SEQ ID NO:2.
- the amino acid sequence of the AID in the fusion protein is set forth in amino acid residues 1457-1646 of SEQ ID NO: 4. In other specific embodiments, the amino acid sequence of the AID in the fusion protein is set forth in amino acid residues 1447-1629 of SEQ ID NO: 68.
- the amino acid sequence of the fusion protein herein is as set forth in SEQ ID NO: 2, 4, 66, 68, 70 or 72, or as set forth in amino acids 26-1654 of SEQ ID NO: 2, amino acids 26-1638 of SEQ ID NO: 4, amino acids 26-1629 of SEQ ID NO: 68, amino acids 26-1629 of SEQ ID NO: 70, or amino acids 26-1638 of SEQ ID NO: 72.
- polynucleotide sequences encoding the fusion proteins herein may be in the form of DNA or RNA.
- DNA forms include cDNA, genomic DNA or synthetic DNA.
- DNA can be single-stranded or double-stranded.
- the DNA can be a coding strand or a non-coding strand.
- the nucleotide sequences described herein can generally be obtained by PCR amplification.
- primers can be designed according to the nucleotide sequences disclosed herein, particularly the open reading frame sequences, and the relevant sequences can be amplified using a commercially available cDNA library, or a cDNA library prepared by conventional methods known to those skilled in the art, as a template. When the sequence is long, it is often necessary to perform two or more PCR amplifications and then splice the amplified fragments together in the correct order.
- the polynucleotide sequence encoding a fusion protein described herein is as set forth in SEQ ID NO: 1, 3, 65, 67, 79 or 71, or as set forth in bases 73-4965 of SEQ ID NO: 1, bases 73-4917 of SEQ ID NO: 3, bases 76-4890 of SEQ ID NO: 67, bases 76-4890 of SEQ ID NO: 70, or bases 76-4917 of SEQ ID NO: 72.
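The relationship between an amino acid range and the corresponding base range can be checked with simple codon arithmetic. The sketch below assumes, purely for illustration, that the open reading frame begins at base 1 of the nucleotide sequence and that the quoted base range includes the stop codon; neither assumption is stated in the source, and the function name `aa_to_nt_range` is hypothetical.

```python
def aa_to_nt_range(aa_start: int, aa_end: int, include_stop: bool = False) -> tuple:
    """Map a 1-based, inclusive amino acid range onto 1-based, inclusive base positions.

    Assumes codon 1 occupies bases 1-3 of the nucleotide sequence.
    """
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("invalid amino acid range")
    nt_start = 3 * (aa_start - 1) + 1  # first base of the first codon
    nt_end = 3 * aa_end                # last base of the last codon
    if include_stop:
        nt_end += 3                    # append one stop codon
    return nt_start, nt_end


if __name__ == "__main__":
    # Amino acids 26-1629, with a trailing stop codon included in the range.
    print(aa_to_nt_range(26, 1629, include_stop=True))
```

Under these assumptions, amino acids 26-1629 plus a stop codon map to bases 76-4890, which agrees with the base range quoted for SEQ ID NO: 67.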
- nucleic acid constructs comprising the polynucleotides.
- the nucleic acid construct contains the coding sequences for the fusion proteins described herein, as well as one or more regulatory sequences operably linked to the sequences.
- the coding sequences for the fusion proteins of the invention can be manipulated in a variety of ways to ensure expression of the proteins.
- the nucleic acid construct can be manipulated depending on the identity or requirements of the expression vector prior to insertion of the nucleic acid construct into the vector. Techniques for altering polynucleotide sequences using recombinant DNA methods are known in the art.
- the control sequence can be a suitable promoter sequence.
- the promoter sequence is typically operably linked to the coding sequence of the protein to be expressed.
- the promoter may be any nucleotide sequence that exhibits transcriptional activity in the host cell of choice, including mutated, truncated and hybrid promoters, and may be obtained from genes encoding extracellular or intracellular polypeptides that are homologous or heterologous to the host cell.
- the control sequence may also be a suitable transcription terminator sequence, a sequence recognized by the host cell to terminate transcription.
- the terminator sequence is operably linked to the 3' terminus of the nucleotide sequence encoding the polypeptide. Any terminator that is functional in the host cell of choice may be used in the present invention.
- the control sequence may also be a suitable leader sequence, an untranslated region of the mRNA that is important for translation by the host cell.
- the leader sequence is operably linked to the 5' terminus of the nucleotide sequence encoding the polypeptide. Any leader sequence that is functional in the host cell of choice may be used in the present invention.
- the nucleic acid construct is a vector.
- a polynucleotide sequence herein can be inserted into a recombinant expression vector.
- the term "recombinant expression vector" refers to bacterial plasmids, phages, yeast plasmids, plant cell viruses, mammalian cell viruses such as adenoviruses and retroviruses, or other vectors well known in the art. Any plasmid or vector can be used as long as it can replicate and remain stable in the host.
- An important feature of expression vectors is that they typically contain an origin of replication, a promoter, a marker gene, and a translational control element.
- the expression vector may also include a ribosome binding site for translation initiation and a transcription terminator.
- the polynucleotide sequences described herein are operably linked to a suitable promoter in an expression vector to direct mRNA synthesis via the promoter.
- representative examples of suitable promoters are: the lac or trp promoter of E. coli; the lambda phage PL promoter; and eukaryotic promoters, including the CMV immediate early promoter, the HSV thymidine kinase promoter, the early and late SV40 promoters, the LTRs of retroviruses, and other promoters known to control gene expression in prokaryotic or eukaryotic cells or their viruses.
- the marker gene can be used to provide phenotypic traits for selection of transformed host cells, including, but not limited to, dihydrofolate reductase, neomycin resistance, and green fluorescent protein (GFP) for eukaryotic cell culture, or tetracycline or ampicillin resistance for E. coli.
- when a polynucleotide described herein is expressed in a higher eukaryotic cell, transcription will be enhanced if an enhancer sequence is inserted into the vector.
- An enhancer is a cis-acting element of DNA, usually about 10 to 300 base pairs long, that acts on a promoter to enhance transcription of the gene.
- Expression vectors containing the polynucleotide sequences described herein and appropriate transcription/translation control signals can be constructed using methods well known to those of skill in the art. These methods include in vitro recombinant DNA techniques, DNA synthesis techniques, in vivo recombinant techniques, and the like.
- the vectors described herein can be transformed into a suitable host cell to enable expression of the fusion proteins described herein.
- the host cell can be a prokaryotic cell, such as a bacterial cell; a lower eukaryotic cell, such as a yeast cell; a filamentous fungal cell; or a higher eukaryotic cell, such as a mammalian cell.
- the host cell can also be a plant cell.
- representative examples of host cells are: bacterial cells such as Escherichia coli, Streptomyces, and Salmonella typhimurium; fungal cells such as yeast and filamentous fungi; plant cells; insect cells such as Drosophila S2 or Sf9; and animal cells such as CHO, COS, 293 cells, or Bowes melanoma cells.
- cells that comprise a polynucleotide sequence or vector described herein together with an sgRNA or an expression vector thereof, such as cells used for the preparation of a point-mutated protein, are also within the scope of the host cells described herein.
- Transformation of host cells with recombinant DNA can be carried out using conventional techniques well known to those skilled in the art.
- when the host is a prokaryote such as E. coli, competent cells capable of absorbing DNA can be harvested after the exponential growth phase and treated by the CaCl2 method, using procedures well known in the art.
- Another method is to use MgCl2.
- Transformation can also be carried out by electroporation if desired.
- when the host is a eukaryote, the following DNA transfection methods can be used: calcium phosphate coprecipitation, conventional mechanical methods such as microinjection, electroporation, liposome packaging, and the like.
- the obtained transformant can be cultured in a conventional manner to allow it to express the fusion protein described herein.
- the medium used in the culture may be selected from various conventional media depending on the host cell used.
- the recombinant fusion proteins herein can be isolated and purified using various separation methods known in the art.
- host cells comprising a fusion protein described herein, a coding sequence or expression vector thereof, and optionally sgRNA or an expression vector thereof, are also included herein.
- Such host cells can constitutively express the fusion proteins described herein, and can also express the fusion proteins described herein under certain conditions of induction.
- Methods for making a host cell express a fusion protein of the invention constitutively or under inducing conditions are well known in the art.
- an expression vector of the invention is constructed using an inducible promoter to effect inducible expression of the fusion protein.
- the fusion protein herein, its coding sequence or expression vector, and/or the sgRNA, its coding sequence or expression vector can be provided in the form of a composition.
- the composition may contain the fusion protein herein together with an sgRNA or an expression vector for the sgRNA, or an expression vector for the fusion protein herein together with an sgRNA or an expression vector for the sgRNA.
- the fusion protein or its expression vector, or sgRNA or its expression vector may be provided as a mixture, or may be packaged separately.
- the composition may be in the form of a solution or it may be in a lyophilized form.
- kits containing the compositions described herein are also provided, the kits comprising the fusion protein herein together with an sgRNA or an expression vector for the sgRNA, or an expression vector for the fusion protein herein together with an sgRNA or an expression vector for the sgRNA.
- the fusion protein or its expression vector, or sgRNA or its expression vector can be packaged separately or in the form of a mixture.
- the kit may also include reagents for transferring the fusion protein or its expression vector and/or the sgRNA or its expression vector into a cell, together with instructions directing the skilled person to perform the transfer.
- the kit can also include instructions for the skilled artisan to practice the various methods and uses described herein using the components contained in the kit. Other reagents, such as reagents for PCR, may also be included in the kit.
- a third aspect herein provides a method of producing a point mutation in a cell, the method comprising the step of expressing a fusion protein and sgRNA described herein in said cell.
- a fusion protein of the invention, or an expression vector thereof, and sgRNA or an expression vector thereof are introduced into the cell.
- when the cell constitutively expresses a fusion protein described herein, only the corresponding sgRNA or its expression vector needs to be transferred into the cell.
- when expression is inducible, the cells may also be incubated with the inducer after administration of the sgRNA, or the cells may be subjected to the corresponding inducing measures (e.g., illumination).
- the fusion protein or its expression vector and/or sgRNA or expression vector thereof can be transferred into a cell using conventional transfection methods.
- a plasmid DNA-liposome complex is first prepared, and then the plasmid DNA-liposome complex and the corresponding sgRNA are co-transfected into cells.
- the cell can be cultured under conditions suitable for the growth of the cell and expression of a desired protein, and the resulting mutant can be isolated and analyzed by various conventional methods such as a high-throughput method.
- the methods described herein for generating point mutations in cells can also be used to generate mutant libraries, and then the mutants in the library can be isolated and screened using conventional techniques to obtain mutants having the desired biological function. Accordingly, the invention also provides a method of constructing a mutant library, the method comprising the step of expressing a fusion protein and sgRNA described herein in said cell.
- One or more sgRNAs can be designed for the same site to be mutated.
- the target binding regions of the various sgRNAs designed are different, but have the same Cas protein recognition region.
- the one or more sgRNAs can then be transferred into the cell along with the corresponding fusion protein.
- the cell can be any cell of interest, including prokaryotic cells and eukaryotic cells, such as plant cells, animal cells, microbial cells, and the like. Particularly preferred are animal cells, for example cells of mammals and rodents, including humans, horses, cows, sheep, rats, rabbits, and the like.
- Microbial cells include cells from a variety of microbial species well known in the art, especially microbial species of value for medical research or for production (e.g., production of fuels such as ethanol, protein production, or production of lipids such as DHA).
- the cells may also be cells of various organ origin, such as cells from human liver, kidney, skin, and the like.
- the cells may also be various mature cell lines currently marketed, such as 293 cells, COS cells.
- the cell is a cell from a healthy individual; in other embodiments, the cell is a cell from a diseased tissue of a diseased individual, such as a cell from an inflammatory tissue, a tumor cell, an induced pluripotent stem cell, and the like.
- the cells may also be cells that have been genetically engineered to have a particular function (eg, to produce a protein of interest) or to produce a phenotype of interest.
- the gene or nucleic acid sequence to be mutated may be naturally present in the cell (endogenous).
- the gene or nucleic acid sequence may also be one transferred into the cell from outside (exogenous).
- the exogenously transferred gene or nucleic acid sequence can be integrated into the genomic sequence of the cell, or can exist independently of the genome and be stably expressed.
- expression vectors expressing the fusion proteins and sgRNAs herein can be designed using known techniques to render these expression vectors suitable for expression in such cells.
- a promoter that facilitates expression in the cell and other related regulatory sequences can be provided in an expression vector. These can be selected and implemented by the technician according to the actual situation.
- A nucleic acid sequence in which point mutations are to be produced can be any nucleic acid sequence of interest, such as a gene sequence, particularly one related to various diseases, to the production of various proteins of interest, or to various biological functions of interest.
- Such gene or nucleic acid sequences of interest include, but are not limited to, nucleic acid sequences encoding various functional proteins.
- a functional protein refers to a protein capable of performing physiological functions of an organism, including a catalytic protein, a transport protein, an immune protein, and a regulatory protein.
- the functional proteins include, but are not limited to, proteins involved in the development, progression, and metastasis of diseases, proteins involved in cell differentiation, proliferation, and apoptosis, proteins involved in metabolism, development-related proteins, and various drug targets.
- the functional protein may be an antibody, an enzyme, a lipoprotein, a hormone protein, a transport and storage protein, a motor protein, a receptor protein, a membrane protein, or the like.
- the materials and methods described herein, including the polynucleotides, nucleic acid constructs, and cells, can be used to construct mutant libraries and to further screen for proteins with new or improved functions, such as antibodies, enzymes, or other functional proteins.
- Mutations can be made on the nucleic acid sequence of interest using the methods described herein, or at specific sites in the nucleic acid sequence of interest.
- a PAM site recognized by the Cas enzyme used can be searched for on the template strand; a fragment of 15 to 25 bases in length, usually 18 to 22 bases, lying immediately downstream of the PAM site or within 10 bases of it (for example, within 8, 5 or 3 bases), is then taken as the target recognition region in order to design the sgRNA recognized by the Cas enzyme.
- alternatively, a site that can serve as a PAM can be found near the specific site, a Cas enzyme capable of recognizing that PAM can be selected accordingly, and the fusion protein of the present invention containing that Cas enzyme, together with the corresponding sgRNA, can be designed and prepared according to the description herein.
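The PAM-based search for candidate target regions can be sketched in a few lines of Python. The sketch assumes an NGG PAM (the PAM referred to later in this document for Cas9) and follows the common convention of reporting the 20 bases immediately 5' of the PAM on whichever strand carries it; the function names, the default spacer length, and the example input (the SEQ ID NO: 19 target followed by a hypothetical TGG PAM) are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_targets(seq: str, spacer_len: int = 20) -> list:
    """List (strand, start, spacer, pam) for every NGG PAM preceded by a full-length spacer.

    `start` is the 0-based spacer start on the scanned strand ('+' = input, '-' = reverse complement).
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq))):
        # i is the index of the N in NGG; the spacer occupies s[i - spacer_len:i]
        for i in range(spacer_len, len(s) - 2):
            if s[i + 1:i + 3] == "GG":
                hits.append((strand, i - spacer_len, s[i - spacer_len:i], s[i:i + 3]))
    return hits


if __name__ == "__main__":
    demo = "GTCCCCTCCACCCCACAGTG" + "TGG"  # target of SEQ ID NO: 19 plus a hypothetical PAM
    for hit in find_targets(demo):
        print(hit)
```

Scanning both strands matters here because the deaminase can act on C or G, so a usable PAM on either strand may place the site of interest inside the targeted window.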
- the method herein may be an in vitro method or an in vivo method.
- the fusion protein or expression vector thereof and the sgRNA or expression vector thereof can be transferred into a subject, such as the corresponding tissue cells, by means well known in the art, and a functional variant of interest can be screened by observing the phenotypic change of the animal. It should be understood that in in vivo experiments the subject may be any of a variety of non-human animals, particularly the non-human model organisms conventionally employed in the art. In vivo experiments should also meet ethical requirements.
- Example 1 Construction of pEntr11-dCas9-AID plasmid and pEntr11-dCas9-AIDX plasmid
- a dCas9 gene fragment was amplified from dCas9 plasmid (Addgene) by PCR;
- the TET1CD was cloned into the pEntr11-dCas9 plasmid by the Gibson Assembly method, completing the construction of the pEntr11-dCas9-TET1CD plasmid.
- the dCas9-AID fragment and the dCas9-AIDX fragment were amplified from the pEntr11-dCas9-AID plasmid and the pEntr11-dCas9-AIDX plasmid using the primers shown in SEQ ID NOS: 8 and 9 (Fig. 2, A);
- the MO91 plasmid (Addgene Plasmid #19755) and the AID and AIDX fragments were digested with restriction endonucleases BglII and XhoI, and then the vector, AID fragment and AIDX fragment were recovered (Fig. 2, B);
- the AID fragment and the AIDX fragment after digestion are ligated to the MO91 vector, and then the ligated product is transformed into Stbl3 competent cells;
- Example 4 Establishing an effective reporter system to indicate the efficiency of AID point mutations
- the level of point mutations at the genomic level needs to be detected by a simple and intuitive method.
- the present invention mainly uses flow cytometry to detect the level of point mutations indirectly, at the protein level.
- when a stop codon (TAG) is artificially inserted into the EGFP gene, EGFP cannot be expressed normally.
- when the fusion protein of this example acts on the stop codon in the EGFP gene, the stop codon is mutated and the mutated EGFP gene is expressed normally. Therefore, the higher the level of EGFP expression, the higher the efficiency of point mutation.
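The logic of the reporter can be illustrated by enumerating the single-base changes available at the C or G positions of the inserted stop codon and checking which of them remove the stop. This is a minimal sketch under the standard genetic code that considers only the three bases of the codon itself; the function name `cg_substitutions` is an illustrative assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def cg_substitutions(codon: str) -> dict:
    """Return {mutant_codon: still_a_stop} for each single-base change at a C or G position."""
    codon = codon.upper()
    outcomes = {}
    for i, base in enumerate(codon):
        if base not in "CG":  # the deaminase acts on C, or on G via the opposite strand
            continue
        for alt in "ACGT":
            if alt != base:
                mutant = codon[:i] + alt + codon[i + 1:]
                outcomes[mutant] = mutant in STOP_CODONS
    return outcomes


if __name__ == "__main__":
    for mutant, still_stop in cg_substitutions("TAG").items():
        print(mutant, "stop" if still_stop else "sense")
```

For TAG the only C or G position is the third base: G to A gives TAA, which is still a stop codon, whereas G to C or G to T gives TAC or TAT, both of which encode tyrosine.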
- the EGFP gene containing the stop codon was inserted into the MO405-thy1.1 plasmid (Addgene), with gene expression driven by MSCV. 293T cells were infected with this plasmid, specifically as follows:
- the virus packaging method is the same as the transfection method
- sgRNA was cloned into pLX (Addgene 50662) to obtain pLX sgRNA.
- the following four primers are required, wherein R1 and F2 are sgRNA specific:
- R1 rc (GN 19) GGTGTTTCGTCCTTTCC (SEQ ID NO: 11)
- R2 AAAGCTAGCTAATGCCAACTTTGTACAAGAAAGCTG (SEQ ID NO: 13)
- GN 19 new target sequence
- rc (GN 19 ) reverse complement of the new target sequence
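The construction of the sgRNA-specific primer R1 from a new target sequence can be sketched as follows. The sketch takes the R1 layout shown above, rc(GN19) followed by GGTGTTTCGTCCTTTCC, and reads GN19 as a 20-base target beginning with G; the function names and the example input (the target of SEQ ID NO: 19, used purely as a convenient 20-base sequence) are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
R1_CONSTANT = "GGTGTTTCGTCCTTTCC"  # constant 3' part of R1 (SEQ ID NO: 11)


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def r1_primer(target: str) -> str:
    """Build R1 as rc(GN19) + constant region for a 20-base target that starts with G."""
    target = target.upper()
    if len(target) != 20 or not target.startswith("G"):
        raise ValueError("expected a 20-base target of the form G followed by 19 bases")
    if set(target) - set("ACGT"):
        raise ValueError("target must contain only A, C, G, T")
    return revcomp(target) + R1_CONSTANT


if __name__ == "__main__":
    print(r1_primer("GTCCCCTCCACCCCACAGTG"))
```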
- the base sequences of the target binding regions of the four sgRNAs are as follows:
- Example 6 CRISPR-Cas9 improves AID point mutation efficiency
- Transfection was carried out by culturing the reporter cells constructed in Example 4 to a confluency of 70-90%.
- for transfection, a plasmid DNA-liposome complex is first prepared: a fourfold amount of the 2000 reagent is diluted in medium, and the MO91-dCas9 (3*flag, NLS)-AID plasmid or the MO91-dCas9 (3*flag, NLS)-AIDX plasmid is diluted separately.
- each diluted plasmid is then added to the diluted 2000 reagent (1:1) and incubated for 30 minutes.
- the plasmid DNA-liposome complex and the 4 sgRNAs against the EGFP stop codon prepared in Example 5 were then co-transfected into the reporter cells constructed in Example 4.
- as a control, the reporter cells constructed in Example 4 were transfected with the plasmid DNA-liposome complex only. Cells were selected for 3 days with 2 ug/ml puromycin and 20 ug/ml blasticidin, and the expression level of EGFP was analyzed on days 4 and 7 after transfection.
- the %EGFP+ of AID and AIDX were 0.14% and 0.30%, respectively.
- the %EGFP+ of dCas9-AID+sgRNA and dCas9-AIDX+sgRNA were 2.14% and 4.36%, respectively.
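The size of the improvement can be expressed as a fold change between the %EGFP+ values reported above. The pairing used here (AID against dCas9-AID+sgRNA, and AIDX against dCas9-AIDX+sgRNA) is an assumption made for illustration, as is the helper name `fold_change`.

```python
def fold_change(treated_percent: float, baseline_percent: float) -> float:
    """Ratio of two %EGFP+ readings; the baseline must be positive."""
    if baseline_percent <= 0:
        raise ValueError("baseline must be greater than zero")
    return treated_percent / baseline_percent


if __name__ == "__main__":
    readings = {
        "AID": (2.14, 0.14),   # dCas9-AID+sgRNA vs AID, in %EGFP+
        "AIDX": (4.36, 0.30),  # dCas9-AIDX+sgRNA vs AIDX, in %EGFP+
    }
    for name, (treated, baseline) in readings.items():
        print(name, round(fold_change(treated, baseline), 1))
```

With these pairings both constructs show roughly a fifteenfold increase in the EGFP-positive fraction.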
- Example 7 CRISPR-Cas9 improves AID point mutation efficiency and optimization
- the expression vector of sgRNA and dCas9-AID was co-transduced in the reporter cells constructed in Example 4 in the same manner as in Example 6.
- the sgRNA was divided into two groups, one of which was a control sgRNA against AAVS1, and the target binding regions thereof were as follows: GATTCCCAGGGCCGGTTAATG (SEQ ID NO: 18); GTCCCCTCCACCCCACAGTG (SEQ ID NO: 19); and GGGGCCACTAGGGACAGGAT (SEQ ID NO: 20).
- the other group is the sgRNA group against EGFP (SEQ ID NOS: 14-17).
- the control group consisted of reporter cells transfected with AID alone.
- An expression vector for the control sgRNA was constructed as described in Example 5.
- dCas9 was fused to different AID variants: AID-FL (full length), AID-CD (catalytic domain only), P182X (truncated from the amino acid residue at position 183), R186X (truncated from the amino acid residue at position 187), and R190X (truncated from the amino acid residue at position 191).
- the dCas9-AID expression vectors and sgRNA were co-transfected into the reporter cells; dCas9-R186X was the most efficient (Figure 8, B and C).
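The mutation names used in this document, substitutions such as K10E and truncations such as R186X, can be applied to a sequence programmatically. The sketch below follows the convention stated above, in which R186X keeps residues 1 to 186 and removes everything from position 187 onward. The function name and the short toy sequence are illustrative assumptions; the toy sequence is not the AID sequence.

```python
import re


def apply_mutation(seq: str, mutation: str) -> str:
    """Apply a substitution (e.g. 'K10E') or a truncation (e.g. 'R186X') to a protein sequence."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation name: {mutation}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if pos < 1 or pos > len(seq) or seq[pos - 1] != ref:
        raise ValueError(f"reference residue {ref}{pos} not found in sequence")
    if alt == "X":
        return seq[:pos]  # keep residues 1..pos, drop the rest
    return seq[:pos - 1] + alt + seq[pos:]


if __name__ == "__main__":
    toy = "MKTAYIAKQR"  # toy 10-residue sequence for illustration only
    print(apply_mutation(toy, "K2E"))
    print(apply_mutation(toy, "A4X"))
```

Checking the reference residue before editing guards against applying a mutation name to a sequence with different numbering.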
- the experiments of Examples 8-13 were therefore carried out using dCas9-R186X, and in these examples, dCas9-R186X was simply referred to as dCas9-AIDX.
- to verify that the entire system has a base substitution function, Cas9, dCas9, a loss-of-function mutant of dCas9-AIDX [R186X(E58Q)], or dCas9-AIDX was separately co-transfected with sgRNA into the reporter cells. Only the dCas9-AIDX plus sgRNA group showed EGFP+ cells, while the other groups were all 0 (Fig. 8, C). This demonstrates that it is indeed the fusion of AID and dCas9 that gives the entire system its base substitution function.
- Example 8 CRISPR-Cas9 limits AID point mutation to sgRNA targeting sites
- Example 9 dCas9-AIDX randomly mutates C and G bases to three other bases
- AIDX itself will mutate C to T and G to A. After the fusion of dCas9 and AIDX, the mutation direction of C and G became more uniform compared with the AIDX group.
- the activity of AID itself depends on the hotspot motif WRCY (W stands for A/T, R stands for A/G, Y stands for C/T), and the most preferred motif is AGCT.
- the inventors have proposed the following hypothesis. Under normal circumstances, AID deaminates cytosine to form uracil. If the uracil persists through DNA replication, the U-G mismatch is retained and C-to-T and G-to-A mutations occur; alternatively, the U base can be excised by base excision repair, after which any of the four bases may be inserted. The fusion of dCas9 and AID is therefore likely to inhibit the DNA replication pathway and promote base excision repair, making the mutation direction more uniform (Fig. 10, b).
- Example 10 UGI increases the base substitution frequency of the dCas9-AIDX system, reveals the trajectory of dCas9-AIDX on the gene, and makes the base mutation direction more singular.
- UGI is an inhibitor of UNG, a phage protein that protects its genome from host UNG when it invades E. coli (Fig. 11, a).
- Three plasmids were co-transfected into the reporter cells, expressing dCas9-AIDX, a single sgRNA (target binding region GCCTCGAACTTCACCTCGGCG, SEQ ID NO: 16) and UGI (protein sequence: UniProtKB-P14739), in order to enhance the mutation efficiency of a single sgRNA in the system. The results showed a 10-fold increase in the highest point mutation efficiency (Fig. 11, b).
- Figure 11 (c) is a statistic based on data for 4 sgRNAs designed for the EGFP site.
- N is the first base in NGG in the PAM sequence.
- positions upstream are denoted - and positions downstream +. The statistical results of the two sets of data are consistent: in both, mutations occur within the 20 bp upstream of the PAM, that is, within the protospacer region, and the mutation peak lies at positions -12/-13 relative to the PAM.
- UGI can increase the overall mutation frequency of AID, but it increases the proportion of transitions and reduces the proportion of transversions (Fig. 11, d).
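Comparisons of mutation direction can be made concrete by classifying each observed substitution as a transition (purine to purine or pyrimidine to pyrimidine, such as the C to T and G to A changes discussed above) or a transversion (purine to pyrimidine or the reverse). The following is a minimal sketch; the function names and the example substitution list are illustrative assumptions, not data from the source.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref: str, alt: str) -> str:
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in "ACGT" or alt not in "ACGT" or len(ref) != 1 or len(alt) != 1:
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def transition_fraction(substitutions: list) -> float:
    """Fraction of (ref, alt) pairs that are transitions."""
    if not substitutions:
        raise ValueError("no substitutions given")
    kinds = [classify(ref, alt) for ref, alt in substitutions]
    return kinds.count("transition") / len(kinds)


if __name__ == "__main__":
    observed = [("C", "T"), ("G", "A"), ("C", "A"), ("G", "T")]  # made-up example calls
    print(transition_fraction(observed))
```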
- Example 11 dCas9-AIDX can act not only on exogenous genes, but also on endogenous genes.
- the above experiments were all carried out in the reporter cells.
- the endogenous gene AAVS1 was selected as the target site and three sgRNAs (SEQ ID NO: 18-20) were designed; dCas9-AID and the vectors for the three AAVS1 sgRNAs (as described in Example 7) were co-transfected into 293T cells.
- the dCas9-AID system can also generate base substitutions to the endogenous gene AAVS1, and this mutation is also concentrated in the sgRNA target site.
- Example 12 Application of dCas9-AIDX to Gleevec resistance screening of K562BCR-ABL gene
- K562 is a leukemia cell line derived from human chronic myeloid leukemia. These cells carry a chromosome called the Philadelphia (Ph) chromosome, which is formed by a translocation between the long arms of chromosomes 9 and 22.
- the ABL gene on chromosome 9 contains a tyrosine kinase active center, which is in a low activity state under normal conditions, and has a high activity when translocated into the BCR locus.
- BCR-ABL is a proto-oncogene
- the commonly used drug is Gleevec (active ingredient: imatinib mesylate)
- its main mechanism of action is that Gleevec competes with ATP for binding to ABL, resulting in low ABL activity.
- point mutations in ABL, such as T315I, can cause Gleevec resistance.
- base substitutions at other sites can also cause Gleevec resistance.
- the dCas9-AIDX system can be used to screen for Gleevec resistance sites and specific mutation types as a basis for designing next generation inhibitors.
- the cells were cultured with 2 ml of culture medium, and virus was collected at 48 hours and 72 hours. Immediately after collection, the virus was centrifuged at 1000 rpm for 5 minutes to remove cell debris. 2 ul of 10 mg/ml Polybrene was added to the supernatant, which was used to infect 1 × 10^5 K562 cells by centrifuging the plate at 900 g and 37 °C for 90 minutes. The cells were centrifuged 4 hours after infection, and the pellet was resuspended and cultured in culture medium. After two consecutive days of infection, the K562 cells were cultured for another two days. Cells expressing the Thy1.1 surface molecule were labeled PE+ (antibody at 1:200 dilution) and single cells were sorted by flow cytometry.
- RNA of the cell population produced by each single cell clone was collected and subjected to RT-qPCR experiments.
- the cell line with the highest expression of dCas9-AIDX was used for subsequent screening of Gleevec resistance sites and mutation types.
- sgRNAs were designed against the genomic region of Exon6, the sixth exon of the ABL gene.
- a total of 16 sgRNAs were designed (target sequences are shown in SEQ ID NOs: 49-64, respectively), of which 6 target intron regions adjacent to Exon6 and 10 directly target the Exon6 region, covering 83% of the exon sequence. Since the T315I mutation is recognized as one of the most important mutations causing Gleevec resistance, exactly one of the sgRNAs designed covers the T315I mutation site (944C) and can serve as a positive control.
- sgRNAs against the genomic sequence of the AAVS1 gene, which is unrelated to Gleevec resistance, served as negative controls (target sequences are shown in SEQ ID NOs: 18-20). These sgRNA sequences were all chemically synthesized, digested with BamH1 and HindIII, and finally cloned into the pSUPER-sgRNA vector carrying the H1 promoter.
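A coverage figure such as the one quoted above can be computed by marking which exon positions fall inside at least one sgRNA target window. The sketch below uses made-up exon length and window coordinates purely to show the calculation; the function name `covered_fraction` and all coordinates are hypothetical and are not taken from the source.

```python
def covered_fraction(region_length: int, windows: list) -> float:
    """Fraction of a region covered by the union of 0-based, half-open (start, end) windows."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    covered = [False] * region_length
    for start, end in windows:
        # clip each window to the region so flanking intron targets are handled safely
        for i in range(max(start, 0), min(end, region_length)):
            covered[i] = True
    return sum(covered) / region_length


if __name__ == "__main__":
    # Hypothetical 100-base exon with three overlapping or separate target windows.
    print(covered_fraction(100, [(0, 20), (10, 40), (60, 83)]))
```

Taking the union rather than summing window lengths avoids double counting where neighbouring sgRNA targets overlap.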
- the K562 cell line stably expressing dCas9-AIDX was electroporated with ABL-Exon6 and AAVS1 mixed sgRNA libraries, respectively, and the instrument was used by the American Life Technology company Neoelectric transducer. 12-24 hours before electroporation, K562 cells were cultured in IMDM medium without anti-10% FBS. On the day of electroporation, two 1.2 ⁇ 10 6 K562 cells were transfected with 8ug respectively on the condition of 1000V voltage, single pulse and 50ms shock time. Equally mixed ABL-Exon6 or AAVS1 sgRNA.
- pSUPER-sgRNA plasmid vector carries the puromycin resistance gene
- cells expressing sgRNA were selected by adding 2 µg/ml puromycin 24 hours after transfection. After 48 hours of puromycin treatment, the K562 cells were expanded further.
- DNA and RNA from 2×10⁵ cells were collected for high-throughput sequencing and used as the input control. The remaining cells were split into two portions and treated with 10 µM Gleevec or an equal volume of DMSO, respectively. Dead cells were removed by Ficoll separation once every three days until the cell number fell far below 2×10⁴.
- Example 13 Application of dCas9-AIDX to increase the affinity and specificity of antibodies in vitro
- antibodies specifically recognize antigens and are used as protein drugs for the treatment of various diseases.
- the affinity of an antibody is directly proportional to the somatic mutations produced in the germinal center in vivo.
- high-affinity antibodies carry multiple somatic hypermutations. dCas9-AIDX can therefore be used to mutate antibody genes in order to screen for antibodies with higher affinity or other improved characteristics (such as better specificity).
- the protocol is as follows.
- the antibody molecule is stably expressed on the surface of 293T cells, sgRNAs are designed against the antibody gene, the 293T cells are co-transfected with dCas9-AIDX, and the cell surface is then stained. The more strongly a cell stains, the higher the affinity of its mutant antibody molecule.
- this example uses Flp-In™-293 cells from Invitrogen, which stably carry a lacZ-Zeocin™ fusion locus.
- the cDNA sequence of the protein, together with its transmembrane region sequence, was cloned into the pcDNA5/FRT/GOI vector (Life Science Technology, USA).
- the vector was introduced into Flp-In™-293 cells using the Flp-In™ system, and the IgG1 coding sequence was integrated by Flp recombinase at its target site in the lacZ-Zeocin™ fusion locus.
- cells without successful integration express the Zeocin resistance protein; after successful integration, the Zeocin resistance protein can no longer be expressed because its ATG start codon is lost, whereas the hygromycin resistance protein is expressed. Hygromycin was therefore used to select 293 cells expressing IgG1, each of which carries only one copy of the anti-HEL-IgG1 gene.
- the sgRNA sequence was then cloned into the pSUPER-puro plasmid vector (Addgene).
- the MO91-dCas9 (3*flag, NLS)-AIDX plasmid constructed in Example 3 and either the sgRNA library (i.e., 16 sgRNAs mixed in equal amounts) or the sgRNA of the control gene AAVS1 were co-transfected into the IgG1-expressing 293 cells obtained previously.
- on day 7 after transfection, the cells were stained with PE anti-mouse IgG and Alexa647-HEL, and flow sorting was used to isolate cells with unchanged IgG intensity and enhanced binding to the HEL antigen.
- mutant cells were detected by flow cytometry after surface staining with PE anti-mouse IgG1 and 647-HEL, and a small population of cells was found to have unchanged IgG1 expression and increased binding to HEL. This population was flow sorted and expanded; compared with the cells before mutation, the affinity of the mutant antibody for HEL was enhanced more than 10-fold (Fig. 17).
- MO91-AIDX-XTEN-dCas9, MO91-dCas9-XTEN-AIDX (K10E T82I E156G) and MO91-nCas9-AIDX can be constructed by referring to the above steps and the methods of Examples 1 and 2.
- the 3*flag and/or NLS fragment can be cloned into the above plasmid by the method of Example 3 to obtain SEQ ID NO: 66, 68, 70 and 72, respectively.
- the AIDX in these fusion proteins is an AID fragment truncated from amino acid residue 183 onward, or a mutant thereof.
- the resulting expression strain was grown overnight at 37 °C in LB medium containing 100 µg/ml kanamycin.
- the culture was cooled to 4 °C over 2 hours, IPTG was added to 0.5 mM, and protein expression was induced for ~16 h;
- His-tagged fusion protein was eluted in elution buffer and concentrated to a total volume of 1 ml by ultrafiltration (Amicon-Millipore, 100 kDa molecular weight cut-off);
- the protein was diluted to 20 ml in buffer A and loaded onto a Hi-Trap SP column (29051324, GE Healthcare) and eluted with a gradient of 100 mM-1 M NaCl;
Description
The present invention relates to fusion proteins that produce point mutations in cells, and to their preparation and use.
Genotype and phenotype are closely related. In nature, spontaneous mutations change the genotype and thereby give rise to multiple phenotypes. In the laboratory, mutation is likewise used to diversify genes and generate multiple phenotypes, so that functional mutants can be screened, the relationship between genes and function can be studied, and proteins with stronger function can be obtained. In nature, the frequency of spontaneous mutation is extremely low. Among common organisms, the spontaneous mutation rate is 5.0×10⁻¹⁰ for the human genome, 1.8×10⁻¹⁰ for the mouse genome, 5.4×10⁻¹⁰ for the E. coli genome and 3×10⁻⁵ for HIV; the spontaneous mutation frequency rises as the genome becomes smaller [Holmes E C. The comparative genomics of viral emergence [J]. Proceedings of the National Academy of Sciences, 2010, 107(4): 1742-1746]. Such a low mutation frequency, however, cannot generate enough phenotypes to study the relationship among genes, phenotypes and function.
To raise the mutation frequency, current laboratory approaches fall into in vivo and in vitro mutagenesis methods. In vivo point mutation methods: 1. Physical methods: ultraviolet radiation, with a mutation frequency of 1×10⁻¹⁰ [Packer M S, Liu D R. Methods for the directed evolution of proteins [J]. Nature Reviews Genetics, 2015]. 2. Chemical methods: ENU is an alkylating agent that transfers an ethyl group to oxygen and nitrogen atoms of DNA, causing mispairing, base substitution or deletion, with a mutation frequency of 1-1.5×10⁻⁵ [Filby. Zebrafish: Methods and Protocols. Methods in Molecular Biology, by G. J. Lieschke, A. C. Oates and K. Kawakami [J]. Journal of Fish Biology, 2010, 76(7): 1874-1876]. Although ENU is readily available, it is sensitive to light, heat and pH, which limits its use. The mutation frequency of both methods can be adjusted through the dose, but the point mutations they cause are random, the mutation frequency is low, the mutation spectrum is uneven, and they are harmful to the organism [Guénet J L. Chemical mutagenesis of the mouse genome: an overview [J]. Genetica, 2004, 122(1): 9-24]. 3. Biological methods: transposons, the basic units of chromosomal DNA capable of autonomous replication and transposition, can cause insertional mutations. Gene insertion can lead to gene knockout or gene activation, and different insertion sites can be selected by choosing different vectors. However, the mutation frequency is lower than that of ENU: only 3×10⁻⁵ insertion events occur per cell cycle, and the host must express a transposase at the same time to complete transposition [Kitada K, Ishishita S, Tosaka K, et al. Transposon-tagged mutagenesis in the rat [J]. Nature Methods, 2007, 4(2): 131-133].
In the immune system, B cells in the germinal center produce diverse antibodies through somatic hypermutation to resist invading pathogens [Odegard V H, Schatz D G. Targeting of somatic hypermutation [J]. Nature Reviews Immunology, 2006, 6(8): 573-583]. Somatic hypermutation (SHM) refers to non-templated point mutations in the variable regions of the immunoglobulin heavy and light chains and is associated with B cell affinity maturation [Odegard V H et al., supra]. The key enzyme mediating this process is activation-induced cytosine deaminase (AID). AID is a cytosine deaminase belonging to the APOBEC family, a family of RNA-editing enzymes. It has a nuclear localization signal at the N-terminus and a nuclear export signal at the C-terminus, and its catalytic domain is shared by the APOBEC family [Zhenming X, Hong Z, Pone E J, et al. Immunoglobulin class-switch DNA recombination: induction, targeting and beyond [J]. Nature Reviews Immunology, 2012, 12(7): 517-31]. The N-terminal structure is generally considered necessary for SHM. Expression of AID is restricted to germinal center B cells, and its point-mutation function is conditional: it must act on single-stranded DNA and it has a sequence preference, the hotspot motif being RGYW [Kiyotsugu Y, Il-Mi O, Tomonori E, et al. AID Enzyme-Induced Hypermutation in an Actively Transcribed Gene in Fibroblasts [J]. Science, 2002, 296(5575): 2033-2036], where R stands for A/G, Y for C/T and W for A/T. The function of AID is thus related to the primary structure of the DNA. AID first deaminates cytosine on single-stranded DNA to U, forming a U-G mismatch; if the U-G mismatch is not repaired, C→T and G→A transition mutations arise during DNA replication. In addition, U can be excised by UNG (uracil-DNA glycosylase) to form an apyrimidinic site, at which any of the four bases is incorporated at random [Odegard V H et al., supra]. The point mutations produced by these processes are central to somatic hypermutation and generate diverse antibodies. However, the point mutation frequency produced in vivo is 1×10⁻⁴ to 1×10⁻³ and the sites are random [Masatoshi A, Nesreen H, Andre S, et al. Accumulation of the FACT complex, as well as histone H3.3, serves as a target marker for somatic hypermutation [J]. Proceedings of the National Academy of Sciences of the United States of America, 2013, 110(19): 7784-7789], which still cannot meet the needs of experimental mutant screening.
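The RGYW sequence preference described above can be illustrated with a short search routine. The following is a minimal sketch in Python, not part of the disclosed method: the function names are hypothetical, and the WRCY option relies on the fact that WRCY is the reverse complement of RGYW, so it marks the same hotspots read from the opposite strand.

```python
import re

# Degenerate codes used in the text: R = A/G, Y = C/T, W = A/T.
IUPAC = {"R": "[AG]", "Y": "[CT]", "W": "[AT]", "G": "G", "C": "C"}


def motif_to_regex(motif: str) -> str:
    """Translate a degenerate motif such as 'RGYW' into a regular expression."""
    return "".join(IUPAC[ch] for ch in motif)


def find_hotspots(seq: str, motif: str = "RGYW"):
    """Return the 0-based start positions of every motif match, overlaps included."""
    # A lookahead is used so that overlapping matches are all reported.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]


# Illustrative sequence only, not taken from the patent.
print(find_hotspots("AGCAGGTT"))          # [0, 4]
print(find_hotspots("TTAGCTCC", "WRCY"))  # [2]
```

Scanning a target region in this way gives a rough picture of where AID would be expected to act preferentially, under the assumption that the hotspot motif is the only determinant.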
Summary of the invention
A first aspect herein provides a fusion protein comprising a cytosine deaminase and a Cas enzyme that lacks nuclease activity but retains helicase activity.
In one or more embodiments, the fusion protein is formed from a cytosine deaminase and a Cas enzyme that lacks nuclease activity but retains helicase activity.
In one or more embodiments, the Cas enzyme is selected from: Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof and modified forms thereof.
In one or more embodiments, the nuclease activity of the Cas enzyme is partially absent, so that the Cas enzyme can only produce DNA single-strand breaks; or the nuclease activity of the Cas enzyme is entirely absent, so that it cannot produce DNA double-strand breaks.
In one or more embodiments, the Cas enzyme is a Cas9 enzyme selected from: Cas9 from Streptococcus pyogenes (SpCas9), Cas9 from Staphylococcus aureus (SaCas9), and Cas9 from Streptococcus thermophilus (St1Cas9).
In one or more embodiments, the Cas enzyme is a Cas9 enzyme in which one or both of the two endonuclease catalytic domains, RuvC1 and HNH, are mutated, so that the enzyme lacks nuclease activity but retains helicase activity.
In one or more embodiments, both RuvC1 and HNH of the Cas9 enzyme are mutated, so that the enzyme lacks nuclease activity but retains helicase activity.
In one or more embodiments, the aspartic acid at position 10 of the Cas9 enzyme is mutated to alanine or another amino acid, and the histidine at position 841 is mutated to alanine or another amino acid.
In one or more embodiments, the amino acid sequence of the Cas9 enzyme is as shown at positions 42-1452 of SEQ ID NO: 2, or at amino acid residues 42-1419 of SEQ ID NO: 72.
In one or more embodiments, the cytosine deaminase is a full-length cytosine deaminase or a fragment thereof, wherein the fragment comprises at least the NLS domain, the catalytic domain and the APOBEC-like domain of the cytosine deaminase.
In one or more embodiments, the cytosine deaminase carries substitution mutations at amino acid residues 10, 82 and 156.
In one or more embodiments, the substitution mutations are K10E, T82I and E156G.
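Substitution notation such as K10E reads as "original residue, 1-based position, new residue". The sketch below shows one way to apply such substitutions to a protein string while checking that the stated original residue is actually present. It is an illustration only: `apply_substitutions` is a hypothetical helper, and the 198-residue toy sequence is a placeholder built so that positions 10, 82 and 156 hold K, T and E; it is not the AID sequence from the sequence listing.

```python
import re


def apply_substitutions(protein: str, substitutions):
    """Apply substitutions written like 'K10E' (positions are 1-based)."""
    residues = list(protein)
    for sub in substitutions:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if m is None:
            raise ValueError(f"cannot parse substitution {sub!r}")
        old, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != old:
            # Guards against numbering mistakes, e.g. an unexpected tag or offset.
            raise ValueError(f"expected {old} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


# Placeholder sequence (assumption): poly-alanine with K, T, E at 10, 82, 156.
toy = ["A"] * 198
toy[9], toy[81], toy[155] = "K", "T", "E"
toy_aid = "".join(toy)

mutant = apply_substitutions(toy_aid, ["K10E", "T82I", "E156G"])
print(mutant[9], mutant[81], mutant[155])  # E I G
```

The residue check matters in practice because the same helper would fail loudly if the numbering were taken from a fusion construct rather than from the isolated deaminase.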
In one or more embodiments, the fragment comprises at least amino acid residues 9-182 of AID, for example at least amino acid residues 1-182 of AID.
In one or more embodiments, the amino acid sequence of the cytosine deaminase is as shown at amino acids 1457-1654 of SEQ ID NO: 2, or at amino acid residues 1447-1629 of SEQ ID NO: 68.
In one or more embodiments, the fragment comprises at least amino acid residues 1465-1638 of SEQ ID NO: 2, for example at least amino acid residues 1457-1638 of SEQ ID NO: 2.
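The ranges quoted above are mutually consistent if AID residue 1 sits at position 1457 of SEQ ID NO: 2. The short sketch below makes that arithmetic explicit. The offset of 1456 is inferred from the ranges in the text (1-182 corresponding to 1457-1638), not read from the sequence listing, and the function names are hypothetical.

```python
# Inferred assumption: AID residue n corresponds to position n + 1456 of SEQ ID NO: 2.
AID_OFFSET = 1456


def span_length(first: int, last: int) -> int:
    """Number of residues in an inclusive, 1-based range."""
    return last - first + 1


def aid_to_fusion(residue: int) -> int:
    """Map an AID residue number to its position in SEQ ID NO: 2."""
    return residue + AID_OFFSET


print(span_length(1457, 1654))                 # 198 residues in the deaminase range
print(aid_to_fusion(9), aid_to_fusion(182))    # 1465 1638
print(span_length(9, 182) == span_length(1465, 1638))  # True
```

Under this mapping, the 9-182 fragment of AID and the 1465-1638 range of SEQ ID NO: 2 describe the same 174 residues.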
In one or more embodiments, the fragment consists of amino acid residues 1-182, of amino acid residues 1-186, or of amino acid residues 1-190.
In one or more embodiments, the fusion protein further comprises one or more of the following sequences: a linker, a nuclear localization sequence, and amino acid residues or amino acid sequences introduced in order to construct the fusion protein, to promote expression of the recombinant protein, to obtain a recombinant protein that is secreted out of the host cell, or to facilitate purification of the recombinant protein.
In one or more embodiments, the amino acid sequence of the fusion protein is as shown in SEQ ID NO: 2, 4, 66, 68, 70 or 72, or at amino acids 26-1654 of SEQ ID NO: 2, positions 26-1638 of SEQ ID NO: 4, amino acids 26-1629 of SEQ ID NO: 68, amino acids 26-1629 of SEQ ID NO: 70, or amino acids 26-1638 of SEQ ID NO: 72.
A second aspect herein provides a polynucleotide sequence selected from:
(1) a polynucleotide sequence encoding the fusion protein of the first aspect herein; and
(2) the complementary sequence of the sequence of (1).
A third aspect of the invention provides a nucleic acid construct comprising the polynucleotide sequence of the second aspect herein.
In one or more embodiments, the nucleic acid construct is an expression vector for expressing the fusion protein described herein in a host cell.
A fourth aspect of the invention provides a host cell comprising the fusion protein described herein, its coding sequence, or the nucleic acid construct.
A fifth aspect herein provides a method of producing point mutations in a cell, the method comprising the step of expressing the fusion protein described herein and an sgRNA in the cell.
In one or more embodiments, the method comprises the steps of introducing the fusion protein described herein or its expression vector, and the sgRNA or its expression vector, into the cell, and then screening for the desired mutated nucleic acid sequence.
In one or more embodiments, the sgRNA comprises a target-binding region and a Cas protein recognition region; the target-binding region specifically binds the nucleic acid sequence to be mutated, and the Cas protein recognition region is recognized and bound by the Cas enzyme in the fusion protein.
In one or more embodiments, the target-binding region of the sgRNA specifically binds the template strand of the nucleic acid sequence to be mutated, and the region opposite the sgRNA-binding region on the template strand is immediately adjacent to the protospacer adjacent motif (PAM) recognized by the Cas protein, or is separated from it by no more than 10 bases.
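As an illustration of the PAM-adjacency requirement, the sketch below lists candidate target sites on one strand of a DNA sequence. It assumes a 20-nucleotide protospacer followed directly by an NGG PAM, which is the commonly cited convention for SpCas9; neither value is specified in this passage, other Cas9 orthologs use different PAMs, and the sketch does not model the "within 10 bases" variant. The function names are hypothetical.

```python
import re


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_targets(seq: str, protospacer_len: int = 20, pam: str = "NGG"):
    """Return (start, protospacer, pam) for each protospacer followed directly by the PAM.

    Positions are 0-based on the strand that is passed in. Assumes SpCas9-style
    defaults (20 nt, NGG); both are assumptions, not values from the text.
    """
    pam_regex = pam.replace("N", "[ACGT]")
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq.upper())]


# Illustrative sequence only.
example = "A" * 20 + "TGG" + "C"
print(find_targets(example))  # [(0, 'AAAAAAAAAAAAAAAAAAAA', 'TGG')]
# Sites on the opposite strand can be listed by scanning the reverse complement.
print(find_targets(reverse_complement(example)))  # []
```

Scanning both strands and then filtering by position relative to the bases to be mutated is one plausible way to shortlist sgRNA targets before design.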
In one or more embodiments, the gene to be mutated encodes a functional protein.
In one or more embodiments, the functional protein includes proteins involved in the onset, progression and metastasis of disease, proteins involved in cell differentiation, proliferation and apoptosis, proteins involved in metabolism, development-related proteins, and various drug targets.
In one or more embodiments, the functional protein is selected from: antibodies, enzymes, lipoproteins, hormone proteins, transport and storage proteins, motor proteins, receptor proteins and membrane proteins.
A sixth aspect of the invention provides a kit comprising the fusion protein, polynucleotide sequence or nucleic acid construct described herein.
A seventh aspect of the invention provides the use of the fusion protein, polynucleotide sequence or nucleic acid construct described herein for producing point mutations in a cell, or in the preparation of a composition or kit for producing point mutations in a cell.
Figure 1: A and C show the PCR-amplified AID fragment (lane 1) and AIDX fragment (lane 1), respectively; B is an agarose gel of the pEntr11-dCas9-AID plasmid, in which lane 1 is the empty pEntr11 plasmid, lane 2 is the pEntr11-dCas9 plasmid, and lanes 3-7 are the pEntr11-dCas9-AID plasmid; D shows the colony PCR result for the pEntr11-dCas9-AIDX plasmid, the amplified fragment being AIDX. In D, lanes 1-5 represent 5 different positive clones, and lane 6 is the empty plasmid used as a negative control.
Figure 2: A, lanes 1 and 2 are the PCR-amplified dCas9-AID and dCas9-AIDX fragments, respectively; B, restriction digestion of the empty MO91 plasmid, in which lane 1 is the BglII single digest, lane 2 is the empty MO91 plasmid, and lane 3 is the BglII and XhoI double digest; C, colony PCR result for the MO91-dCas9-AIDX plasmid, the amplified fragment being AIDX; D, colony PCR result for the MO91-dCas9-AID plasmid, the amplified fragment being AID.
Figure 3: A, lane 1 is the PCR-amplified 3*flag+NLS fragment, lanes 2 and 3 are the MO91-dCas9-AID and MO91-dCas9-AIDX plasmids digested with BglII alone, and lane 4 is the MO91-dCas9-AID plasmid control; B, lanes 1-4 are the MO91-dCas9(3*flag, NLS)-AID plasmid, lane 5 is the MO91-dCas9-AID plasmid, and lanes 6-9 are the MO91-dCas9(3*flag, NLS)-AIDX plasmid.
Figure 4: Sequence of the EGFP reporter; the stop codon is shown in bold. The designed sgRNAs are indicated by arrows.
Figure 5: Schematic diagram of the reporter plasmid.
Figure 6: Flow cytometry analysis of the reporter cell line. From left to right, the three curves show the Thy1.1 expression level of the unstained control, reporter-negative cells and reporter-positive cells.
Figure 7: Comparison of the point mutation efficiency of dCas9-AID, dCas9-AIDX, AID and AIDX in reporter cells.
Figure 8: Optimization of dCas9-AID point mutation efficiency in reporter cells. A, dCas9-AID induces GFP expression; B, schematic of the different AID variants and their efficiency in inducing point mutations; C, induction of point mutations by dCas9-AIDX requires the cytosine deaminase activity of AID.
Figure 9: Frequency distribution of the point mutations caused by dCas9-AIDX and AID in the EGFP and cMyc genes.
Figure 10: dCas9-AIDX randomly mutates C and G bases to the other three bases. A, statistics of base mutation types; B, mechanism by which dCas9-AIDX induces point mutations.
Figure 11: UGI increases the base substitution frequency of the dCas9-AIDX system, reveals the footprint of dCas9-AIDX action on the gene, and makes the direction of base mutation more uniform.
Figure 12: dCas9-AIDX acts not only on exogenous genes but also on endogenous genes.
Figure 13: Structural and functional domains of AID.
Figure 14: Experimental procedure (a) and results (b-d) of applying dCas9-AIDX to Gleevec resistance screening of the BCR-ABL gene in K562 cells.
Figure 15: TAM (targeted cytosine deaminase AID-mediated gene mutation technology) mutates amino acids in the anti-HEL-IgG1 variable region.
Figure 16: TAM induces base mutations in the anti-HEL-IgG1 variable region (upper panel) and reproducibly induces base mutations in the IgG1 CDRs (lower panel).
Figure 17: The affinity of the mutated antibody for HEL is increased more than 10-fold.
Figure 18: Expression of nCas9-AIDX in bacteria. The boxed band is the nCas9-AIDX fusion protein.
Figure 19: Functional test results for different fusion proteins. For each data set, the three bars from left to right represent the results for MO91-AIDX-XTEN-dCas9, MO91-dCas9-XTEN-AIDX and MO91-dCas9-AIDX.
Figure 20: Functional test results for different fusion proteins. For each data set, the three bars from left to right represent the results for MO91-dCas9-AIDX, MO91-dCas9-XTEN-AIDX (K10E T82I E156G) and MO91-dCas9-XTEN-AIDX.
Figure 21: Functional verification results for the nCas9-AIDX fusion protein.
This document relates to a fusion protein of a Cas protein lacking nuclease activity and the cytosine deaminase AID or a mutant thereof. Guided by an sgRNA, the fusion protein is recruited to a specific DNA sequence, where AID or its mutant deaminates cytosine to produce uracil; the uracil is then randomly mutated to other bases during DNA repair, so that site-directed mutation is achieved together with a high mutation efficiency.
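The mechanism just described can be summarized as a small lookup of possible outcomes. The sketch below is a deliberately simplified model based on the background section and on the captions of Figures 10 and 11: when uracil is excised (UNG active), any other base may be incorporated; when excision is blocked (for example by UGI), replication across U gives only the transition. It ignores other repair pathways, and `mutation_outcomes` is a hypothetical name.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mutation_outcomes(base: str, ung_active: bool = True) -> set:
    """Bases that may replace `base` after AID-mediated deamination (simplified model).

    A 'G' is treated as the partner of a cytosine deaminated on the opposite strand.
    """
    base = base.upper()
    if base not in ("C", "G"):
        # Only cytosine is deaminated; effects at A/T positions are not modelled here.
        return set()
    # U excised -> abasic site -> any other base; U retained -> read as T at replication.
    c_outcomes = {"A", "G", "T"} if ung_active else {"T"}
    if base == "C":
        return c_outcomes
    return {COMPLEMENT[b] for b in c_outcomes}


print(sorted(mutation_outcomes("C")))                    # ['A', 'G', 'T']
print(sorted(mutation_outcomes("G")))                    # ['A', 'C', 'T']
print(sorted(mutation_outcomes("C", ung_active=False)))  # ['T']
print(sorted(mutation_outcomes("G", ung_active=False)))  # ['A']
```

The two-branch structure mirrors why inhibiting uracil excision would be expected to make the mutation direction more uniform, as the Figure 11 caption states.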
For further details on Cas/sgRNA, in addition to what is described below, see CN 201380049665.5 and CN 201380072752.2, the entire contents of which are incorporated herein by reference.
Cas protein
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a gene-editing system with which bacteria resist viral invasion or evade mammalian immune responses. After modification and optimization, the system is now widely used in in vitro biochemical reactions and in gene editing of cells and whole organisms.
Typically, a Cas protein with endonuclease activity forms a complex with the sgRNA it specifically recognizes; the pairing region of the sgRNA base-pairs with the complementary template strand of the target DNA, and Cas cuts the double-stranded DNA at a specific position. It should be understood that "Cas protein" and "Cas enzyme" are used interchangeably herein.
This document exploits the above property of Cas/sgRNA: the specific binding of the sgRNA to its target positions Cas at the desired location, where AID or its mutant in the fusion protein deaminates cytosine. Cas proteins suitable for the invention, in which nuclease activity (in particular endonuclease activity) is partially or completely absent but helicase activity is retained, can be derived from the various Cas proteins and variants well known in the art, including but not limited to Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof and modified forms thereof.
In some embodiments, a Cas9 enzyme lacking nuclease activity is used together with the single-stranded sgRNA it specifically recognizes. The Cas9 enzyme may come from different species, including but not limited to Cas9 from Streptococcus pyogenes (SpCas9), Cas9 from Staphylococcus aureus (SaCas9) and Cas9 from Streptococcus thermophilus (St1Cas9). Any variant of the Cas9 enzyme can be used, provided that it specifically recognizes its sgRNA and lacks nuclease activity.
Cas proteins lacking nuclease activity can be prepared by methods well known in the art, including but not limited to deleting the entire catalytic domain of the endonuclease in the Cas protein or mutating one or several amino acids in that domain, thereby producing a Cas protein lacking nuclease activity. The mutation may be a deletion or substitution of one or several amino acid residues (for example, 2 or more, 3 or more, 4 or more, 5 or more, 10 or more, up to the entire catalytic domain), or an insertion of one or several new amino acid residues (for example, 1 or more, 2 or more, 3 or more, 4 or more, 5 or more, 10 or more, or 1 to 10 or 1 to 15). The deletion of the above domain or the mutation of amino acid residues, and the detection of whether the mutated Cas protein still has nuclease activity, can be carried out by methods conventional in the art. For example, for Cas9, its two endonuclease catalytic domains, RuvC1 and HNH, can each be mutated, for example by mutating the 10th amino acid of the enzyme (located in the RuvC1 domain) to alanine or another amino acid, and mutating the histidine at position 841 (located in the HNH domain) to alanine or another amino acid. These two mutations cause Cas9 to lose endonuclease activity. Preferably, the Cas enzyme has no nuclease activity at all. In one or more embodiments, the amino acid sequence of the nuclease-inactive Cas9 enzyme used herein is as shown at positions 42-1452 of SEQ ID NO: 2. In other embodiments, the Cas enzyme used herein partially lacks nuclease activity, i.e., the Cas enzyme can cause DNA single-strand breaks. A representative example of such a Cas enzyme is shown by amino acid residues 42-1419 of SEQ ID NO: 72.
For the Cas/sgRNA complex to function, a protospacer adjacent motif (PAM) is required on the non-template strand (3' to 5') of the DNA. The PAMs of different Cas enzymes are not entirely identical. For example, the PAM for SpCas9 is typically NGG; the PAM for the SaCas9 enzyme is typically NNGRR; and the PAM for the St1Cas9 enzyme is typically NNAGAA; where N is A, C, T or G, and R is G or A.
In certain preferred embodiments, the PAM for the SaCas9 enzyme is NNGRRT. In certain preferred embodiments, the PAM for SpCas9 is TGG.
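The PAM definitions above use degenerate base codes (N, R), so checking whether a candidate site satisfies a given PAM amounts to a pattern match. The following is a minimal sketch of that check; the function names and the PAMS dictionary are illustrative assumptions, not part of the disclosure.

```python
import re

# Degenerate base codes as defined in the text: N is A, C, T or G; R is G or A.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}

# PAM motifs quoted in the text; the dictionary itself is only illustrative.
PAMS = {"SpCas9": "NGG", "SaCas9": "NNGRR", "St1Cas9": "NNAGAA"}


def pam_regex(pam):
    """Translate a degenerate PAM such as 'NNGRR' into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in pam))


def matches_pam(site, pam):
    """Return True if `site` (same length as `pam`) satisfies the PAM."""
    return pam_regex(pam).fullmatch(site.upper()) is not None


if __name__ == "__main__":
    print(matches_pam("TGG", PAMS["SpCas9"]))    # TGG is one instance of NGG
    print(matches_pam("ACGAGT", "NNGRRT"))       # preferred SaCas9 PAM
```

A preferred PAM such as TGG or NNGRRT is simply a narrower pattern than the general one, so the same function covers both cases.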
sgRNA
An sgRNA usually comprises two parts: a target binding region and a Cas protein recognition region. The target binding region and the Cas protein recognition region are usually joined in the 5' to 3' direction.
The target binding region is typically 15 to 25 bases in length, more typically 18 to 22 bases, such as 20 bases. The target binding region binds specifically to the template strand of the DNA, thereby recruiting the fusion protein to the predetermined site. Typically, the region opposite the sgRNA binding region on the DNA template strand is immediately adjacent to the PAM, or separated from it by a few bases (for example, within 10, within 8, or within 5). Therefore, when designing an sgRNA, the PAM is usually first determined from the Cas enzyme used; a site that can serve as a PAM is then located on the non-template strand of the DNA; and a fragment 15 to 25 bases long, more typically 18 to 22 bases long, that lies downstream of the PAM site on the non-template strand (3' to 5'), either immediately adjacent to the PAM site or within 10 bases of it (for example, within 8 or within 5), is taken as the sequence of the target binding region of the sgRNA.
The Cas protein recognition region of the sgRNA is determined by the Cas protein used, as is well understood by those skilled in the art.
Accordingly, the sequence of the target binding region of the sgRNA herein is a fragment 15 to 25 bases long, more typically 18 to 22 bases long, located on the DNA strand containing the PAM site recognized by the selected Cas enzyme, downstream of that PAM site and either immediately adjacent to it or within 10 bases of it (for example, within 8 or within 5); its Cas protein recognition region is specifically recognized by the selected Cas enzyme.
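The design procedure described above (choose the PAM for the Cas enzyme, locate PAM sites on the PAM-containing strand, take the adjacent 15 to 25 bases as the target binding region) can be sketched in a few lines. This is only an illustration under stated assumptions: the strand is supplied 5' to 3', and the target binding region is taken from the bases immediately 5' of the PAM on that strand, which is the convention commonly used for Cas9 and is assumed here to correspond to "downstream of the PAM site" when the strand is read 3' to 5'. The function names and the `max_gap` parameter are hypothetical.

```python
import re

# Degenerate base codes as defined in the text: N is A, C, T or G; R is G or A.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}


def find_targets(strand, pam="NGG", length=20, max_gap=0):
    """List candidate target binding regions on `strand` (given 5'->3').

    Returns (protospacer, gap, pam_site) tuples, where `gap` is the number
    of bases between the protospacer and the PAM (0 = immediately adjacent).
    """
    strand = strand.upper()
    # A lookahead is used so that overlapping PAM sites are all reported.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in pam) + "))")
    hits = []
    for match in pattern.finditer(strand):
        pam_start = match.start()
        for gap in range(max_gap + 1):
            start = pam_start - gap - length
            if start >= 0:
                hits.append((strand[start:start + length], gap, match.group(1)))
    return hits


def to_rna(dna):
    """Write a DNA sequence as RNA (T -> U) for the sgRNA target binding region."""
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    demo = "AAAA" + "ACGT" * 5 + "TGG" + "CC"   # toy sequence, not from the patent
    for protospacer, gap, pam_site in find_targets(demo):
        print(to_rna(protospacer), gap, pam_site)
```

Setting `max_gap` to a value up to 10 would also report fragments separated from the PAM by a few bases, as the text allows.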
The sgRNA can be prepared by methods conventional in the art, for example by conventional chemical synthesis. The sgRNA can also be introduced into cells via an expression vector, so that the sgRNA is expressed within the cells. Expression vectors for the sgRNA can be constructed by methods well known in the art.
Activation-induced cytosine deaminase (AID)
AID is a cytosine deaminase belonging to the APOBEC family, a family of RNA editing enzymes. It has a nuclear localization signal at the N-terminus and a nuclear export signal at the C-terminus, and its catalytic domain is shared by the APOBEC family. The N-terminal structure is generally considered necessary for somatic hypermutation (SHM). The function of AID is to deaminate cytosine, converting cytosine to uracil; subsequent DNA repair can then change the uracil into other bases. It should be understood that any cytosine deaminase well known in the art, or any fragment or mutant thereof that retains the biological activity of deaminating cytosine to uracil, can be used herein.
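As a purely illustrative model of the chemistry described above, the following sketch converts a cytosine to uracil at a chosen position and then enumerates the sequences that result if the uracil is subsequently replaced by each of the other bases. It is a string-level toy under stated assumptions, not a model of any repair pathway, and both function names are hypothetical.

```python
def deaminate(dna, positions):
    """Convert cytosine to uracil at the given 0-based positions."""
    bases = list(dna.upper())
    for i in positions:
        if bases[i] != "C":
            raise ValueError(f"position {i} is {bases[i]}, not C")
        bases[i] = "U"
    return "".join(bases)


def possible_outcomes(dna, position):
    """Sequences obtained if the uracil is later replaced by T, A or G."""
    edited = deaminate(dna, [position])
    return {edited[:position] + base + edited[position + 1:] for base in "TAG"}


if __name__ == "__main__":
    print(deaminate("ACGT", [1]))
    print(sorted(possible_outcomes("ACGT", 1)))
```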
Figure 14 shows the structural and functional domains of AID. Amino acids 9-26 form the nuclear localization signal (NLS) domain, of which amino acids 13-26 in particular participate in DNA binding; amino acids 56-94 form the catalytic domain; amino acids 109-182 form the APOBEC-like domain; amino acids 193-198 form the nuclear export signal (NES) domain; amino acids 39-42 interact with catenin-like protein 1 (CTNNBL1); and amino acids 113-123 form the hotspot recognition loop.
The full-length sequence of AID (as shown by amino acids 1457-1654 of SEQ ID NO: 2) can be used herein, as can fragments of AID. Preferably, the fragment comprises at least the NLS domain, the catalytic domain and the APOBEC-like domain. Thus, in certain embodiments, the fragment comprises at least amino acid residues 9-182 of AID (i.e., amino acid residues 1465-1638 of SEQ ID NO: 2). In other embodiments, the fragment comprises at least amino acid residues 1-182 of AID (i.e., amino acid residues 1457-1638 of SEQ ID NO: 2). For example, in certain embodiments, the AID fragment used herein consists of amino acid residues 1-182, of amino acid residues 1-186, or of amino acid residues 1-190. Accordingly, in certain embodiments, the AID fragment used herein consists of amino acid residues 1457-1638 of SEQ ID NO: 2, amino acid residues 1457-1642 of SEQ ID NO: 2, or amino acid residues 1457-1646 of SEQ ID NO: 2.
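The paragraph above quotes each AID fragment twice, once in AID numbering and once in the numbering of SEQ ID NO: 2. The two are related by a constant offset of 1456, since AID residue 1 corresponds to residue 1457 of SEQ ID NO: 2. The short sketch below reproduces that arithmetic; the function and constant names are illustrative.

```python
# AID residue 1 corresponds to residue 1457 of SEQ ID NO: 2 (per the text).
AID_OFFSET = 1456


def aid_to_fusion(start, end, offset=AID_OFFSET):
    """Map a 1-based inclusive AID residue range to SEQ ID NO: 2 numbering."""
    return start + offset, end + offset


if __name__ == "__main__":
    for name, span in [("full length", (1, 198)), ("9-182", (9, 182)),
                       ("1-182", (1, 182)), ("1-186", (1, 186)),
                       ("1-190", (1, 190))]:
        print(name, aid_to_fusion(*span))
```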
Variants of AID that retain its cytosine deaminase activity can also be used herein. For example, such a variant may have 1-10, such as 1-8, 1-5 or 1-3, amino acid variations relative to the wild-type sequence of AID, including amino acid deletions, substitutions and mutations. Preferably, these amino acid variations do not occur within the above NLS domain, catalytic domain and APOBEC-like domain, or, if they do occur within these domains, do not affect the original biological function of the domains. For example, it is preferred that these variations do not occur at positions 24, 27, 38, 56, 58, 87, 90, 112, 140, etc. of the AID amino acid sequence. In certain embodiments, these variations also do not occur within amino acids 39-42 or amino acids 113-123. Thus, for example, variations may occur within amino acids 1-8, amino acids 28-37, amino acids 43-55 and/or amino acids 183-198. In certain embodiments, variations occur at positions 10, 82 and 156. For example, substitution mutations occur at positions 10, 82 and 156, and these substitutions may be K10E, T82I and E156G. In these embodiments, the amino acid sequence of an exemplary AID mutant contains the amino acid sequence shown at positions 1447-1629 of SEQ ID NO: 68, or consists of the amino acid residues shown at positions 1447-1629 of SEQ ID NO: 68.
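Substitutions such as K10E, T82I and E156G follow the usual notation of wild-type residue, 1-based position, new residue. A minimal sketch of parsing and applying such codes to a protein sequence is given below. The toy peptide in the usage lines is hypothetical and is not the AID sequence; the function names are likewise assumptions.

```python
import re


def parse_substitution(code):
    """Split a code such as 'K10E' into (wild_type, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(sequence, codes):
    """Apply substitutions, checking the wild-type residue at each position."""
    residues = list(sequence)
    for code in codes:
        wild_type, position, replacement = parse_substitution(code)
        if position < 1 or position > len(residues):
            raise ValueError(f"{code}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{code}: found {residues[position - 1]} instead")
        residues[position - 1] = replacement
    return "".join(residues)


if __name__ == "__main__":
    toy = "M" + "A" * 8 + "K" + "AA"   # hypothetical peptide with K at position 10
    print(apply_substitutions(toy, ["K10E"]))
    print([parse_substitution(c) for c in ("K10E", "T82I", "E156G")])
```

Checking the wild-type residue before substituting guards against applying a code to a sequence that uses a different numbering.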
Fusion protein
Provided herein is a fusion protein comprising a Cas enzyme and AID. In the fusion protein herein, the Cas enzyme is usually at the N-terminus of the amino acid sequence of the fusion protein and AID is at the C-terminus. In certain embodiments, provided herein is a fusion protein formed mainly of a Cas enzyme and AID. It should be understood that a fusion protein "formed mainly of" these components, or a similar expression, does not mean that the fusion protein includes only the Cas enzyme and AID. The limitation should be understood to mean that the fusion protein may include only the Cas enzyme and AID, or may further contain other portions that do not affect the targeting function of the Cas enzyme or the function of AID in mutating the target sequence. Such portions include, but are not limited to, various linker sequences, nuclear localization sequences, and amino acid sequences introduced into the fusion protein, as described below, as a result of gene cloning operations and/or in order to construct the fusion protein, promote expression of the recombinant protein, obtain a recombinant protein that is automatically secreted outside the host cell, or facilitate detection and/or purification of the recombinant protein.
The Cas enzyme can be fused to AID via a linker. The linker may be a peptide of 3 to 25 residues, for example a peptide of 3 to 15, 5 to 15, or 10 to 20 residues. Suitable examples of peptide linkers are well known in the art. Typically, the linker contains one or more tandemly repeated motifs, which usually contain Gly and/or Ser. For example, the motif may be SGGS, GSSGS, GGGS, GGGGS, SSSSG, GSGSA or GGSGG. Preferably, the motifs are adjacent in the linker sequence, with no amino acid residues inserted between the repeats. The linker sequence may consist of 1, 2, 3, 4 or 5 repeated motifs. In certain embodiments, the linker sequence is a polyglycine linker sequence. The number of glycines in the linker sequence is not particularly limited and is usually 2 to 20, for example 2 to 15, 2 to 10, or 2 to 8. In addition to glycine and serine, the linker may contain other known amino acid residues, such as alanine (A), leucine (L), threonine (T), glutamic acid (E), phenylalanine (F), arginine (R) and glutamine (Q). In certain embodiments, the linker sequence is XTEN, whose amino acid sequence is shown by amino acid residues 183-198 of SEQ ID NO: 66.
By way of example, the linker may consist of one of the following amino acid sequences: G(SGGGG)2SGGGLGSTEF (SEQ ID NO: 21), RSTSGLGGGS(GGGGS)2G (SEQ ID NO: 22), QLTSGLGGGS(GGGGS)2G (SEQ ID NO: 23), GGGS (SEQ ID NO: 24), GGGGS (SEQ ID NO: 25), SSSSG (SEQ ID NO: 26), GSGSA (SEQ ID NO: 27), GGSGGGGGGSGGGGSGGGGS (SEQ ID NO: 28), SSSSGSSSSGSSSSG (SEQ ID NO: 29), GSGSAGSGSAGSGSA (SEQ ID NO: 30), GGSGGGGSGGGGSGG (SEQ ID NO: 31), amino acid residues 1420-1456 of SEQ ID NO: 72, and the like.
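Several of the example linkers are written in a shorthand in which a motif in parentheses is followed by its repeat count, as in G(SGGGG)2SGGGLGSTEF. The sketch below expands that shorthand into the full residue string, which makes it easy to check the linker length against the 3 to 25 residue range given above. The function name is an illustrative assumption.

```python
import re


def expand_linker(notation):
    """Expand repeat shorthand, e.g. 'G(SGGGG)2S' -> 'GSGGGGSGGGGS'."""
    return re.sub(r"\(([A-Z]+)\)(\d+)",
                  lambda m: m.group(1) * int(m.group(2)),
                  notation)


if __name__ == "__main__":
    for shorthand in ("G(SGGGG)2SGGGLGSTEF", "RSTSGLGGGS(GGGGS)2G"):
        full = expand_linker(shorthand)
        print(shorthand, "->", full, len(full))
```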
It should be understood that, in gene cloning operations, it is often necessary to design suitable restriction sites, which inevitably introduces one or more extraneous residues at the ends of the expressed amino acid sequence without affecting the activity of the sequence of interest. In order to construct a fusion protein, promote expression of a recombinant protein, obtain a recombinant protein that is automatically secreted outside the host cell, or facilitate purification of the recombinant protein, it is often necessary to add some amino acids to the N-terminus, the C-terminus or another suitable region of the recombinant protein, including but not limited to suitable linker peptides, signal peptides, leader peptides, terminal extensions and the like. Accordingly, the amino terminus or carboxy terminus of the fusion protein herein may further contain one or more polypeptide fragments serving as protein tags. Any suitable tag can be used herein. For example, the tag may be FLAG (DYKDDDDK, SEQ ID NO: 32), HA, HA1, c-Myc, Poly-His, Poly-Arg, Strep-TagII, AU1, EE, T7, 4A6, ε, B, gE or Ty1. These tags can be used to purify the protein.
The fusion protein herein may also contain a nuclear localization sequence (NLS). Nuclear localization sequences of various origins and amino acid compositions well known in the art can be used. Such nuclear localization sequences include, but are not limited to: the NLS of the SV40 virus large T antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 33); an NLS from nucleoplasmin, for example the nucleoplasmin bipartite NLS having the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 34); the NLS from c-myc, having the amino acid sequence PAAKRVKLD (SEQ ID NO: 35) or RQRRNELKRSP (SEQ ID NO: 36); the NLS from hRNPA1 M9, having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 37); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 38) of the IBB domain from importin-α; the sequences VSRKRPRP (SEQ ID NO: 39) and PPKKARED (SEQ ID NO: 40) of the myoma T protein; the sequence SALIKKKKKMAP (SEQ ID NO: 41) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO: 42) and PKQKKRK (SEQ ID NO: 43) of influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO: 44) of the hepatitis virus δ antigen; the sequence REKKKFLKRR (SEQ ID NO: 45) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 46) of human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 47) of the steroid hormone receptor (human) glucocorticoid; and the like. In certain specific embodiments, the sequence shown by amino acid residues 26-33 of SEQ ID NO: 2 is used herein as the NLS. The NLS may be located at the N-terminus or C-terminus of the fusion protein; it may also be located within the fusion protein sequence, for example at the N-terminus and/or C-terminus of the Cas9 enzyme in the fusion protein, or at the N-terminus and/or C-terminus of the AID in the fusion protein.
Accumulation of the fusion protein of the invention in the nucleus can be detected by any suitable technique. For example, a detection marker can be fused to the Cas enzyme, so that the location of the fusion protein within the cell can be visualized when combined with a means of detecting the location of the nucleus (for example, a nucleus-specific dye such as DAPI). In certain embodiments, 3*flag is used herein as the marker, and the sequence of this peptide may be as shown by amino acid residues 1-23 of SEQ ID NO: 2. It should be understood that, when a marker sequence is present, it is usually at the N-terminus of the fusion protein. The marker sequence may be linked to the NLS directly or via a suitable linker sequence. The NLS sequence may be linked to the Cas enzyme or AID directly, or via a suitable linker sequence.
Thus, in certain embodiments, the fusion protein herein consists of a Cas enzyme and AID. In other embodiments, the fusion protein herein is formed by a Cas enzyme linked to AID via a linker. In certain embodiments, the fusion protein herein consists of an NLS, a Cas enzyme, AID, and an optional linker sequence between the Cas enzyme and AID. In certain specific embodiments, the Cas enzyme in the fusion protein is the Cas9 enzyme described above. In certain specific embodiments, the amino acid sequence of the AID in the fusion protein is as shown by amino acid residues 1457-1654 of SEQ ID NO: 2. In other specific embodiments, the amino acid sequence of the AID in the fusion protein is as shown by amino acid residues 1457-1646 of SEQ ID NO: 4. In other specific embodiments, the amino acid sequence of the AID in the fusion protein is as shown by amino acid residues 1447-1629 of SEQ ID NO: 68.
In certain embodiments, the amino acid sequence of the fusion protein herein is as shown in SEQ ID NO: 2, 4, 66, 68, 70 or 72, or as shown by amino acids 26-1654 of SEQ ID NO: 2, or as shown by positions 26-1638 of SEQ ID NO: 4, or as shown by amino acids 26-1629 of SEQ ID NO: 68, or as shown by amino acids 26-1629 of SEQ ID NO: 70, or as shown by amino acids 26-1638 of SEQ ID NO: 72.
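The residue ranges quoted for SEQ ID NO: 2 in this description (3*flag at 1-23, NLS at 26-33, nuclease-inactive Cas9 at 42-1452, full-length AID at 1457-1654) can be collected into a simple layout table. The sketch below computes the length of each segment and the stretches not assigned to any named segment, which would correspond to linker or cloning-derived residues. It assumes that SEQ ID NO: 2 ends at residue 1654; the names used are illustrative.

```python
# 1-based inclusive residue ranges in SEQ ID NO: 2, as quoted in the text.
SEGMENTS = {
    "3*flag tag": (1, 23),
    "NLS": (26, 33),
    "Cas9 (nuclease-inactive)": (42, 1452),
    "AID (full length)": (1457, 1654),
}

TOTAL_LENGTH = 1654  # assumption: the sequence ends with the last AID residue


def span_length(span):
    """Number of residues in a 1-based inclusive range."""
    return span[1] - span[0] + 1


def unassigned(segments, total_length):
    """Return the residue ranges not covered by any named segment."""
    gaps, cursor = [], 1
    for start, end in sorted(segments.values()):
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= total_length:
        gaps.append((cursor, total_length))
    return gaps


if __name__ == "__main__":
    for name, span in SEGMENTS.items():
        print(name, span, span_length(span))
    print("unassigned:", unassigned(SEGMENTS, TOTAL_LENGTH))
```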
Polynucleotide sequences, hosts and protein expression
Included herein are polynucleotide sequences encoding the fusion proteins herein. The polynucleotides herein may be in DNA form or RNA form. DNA forms include cDNA, genomic DNA and synthetic DNA. The DNA may be single-stranded or double-stranded, and may be the coding strand or the non-coding strand.
The nucleotide sequences described herein can generally be obtained by PCR amplification. Specifically, primers can be designed according to the nucleotide sequences disclosed herein, in particular the open reading frame sequences, and the relevant sequences can be amplified using a commercially available cDNA library, or a cDNA library prepared by conventional methods known to those skilled in the art, as the template. When the sequence is long, two or more rounds of PCR amplification are often required, after which the amplified fragments are spliced together in the correct order. For example, in certain embodiments, the polynucleotide sequence encoding the fusion protein described herein is as shown in SEQ ID NO: 1, 3, 65, 67, 79 or 71, or as shown by bases 73-4965 of SEQ ID NO: 1, or as shown by bases 73-4917 of SEQ ID NO: 3, or as shown by bases 76-4890 of SEQ ID NO: 67, or as shown by bases 76-4890 of SEQ ID NO: 70, or as shown by bases 76-4917 of SEQ ID NO: 72.
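The base ranges above can be related to the amino acid ranges quoted for the fusion proteins by simple codon arithmetic. The sketch below maps a 1-based inclusive base range onto codon numbers, assuming only that the reading frame starts at base 1 of the listed sequence. It is an arithmetic aid, and no interpretation of the codons at either end of a range is implied; the function name is hypothetical.

```python
def codon_span(first_base, last_base):
    """Map a 1-based inclusive base range to 1-based inclusive codon numbers.

    Assumes the reading frame starts at base 1, so codon k covers bases
    3k-2 to 3k. The range must begin and end on codon boundaries.
    """
    if (first_base - 1) % 3 != 0 or last_base % 3 != 0:
        raise ValueError("range does not align with codon boundaries")
    return (first_base - 1) // 3 + 1, last_base // 3


if __name__ == "__main__":
    for span in [(73, 4965), (73, 4917), (76, 4890)]:
        print(span, "->", codon_span(*span))
```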
Also included herein are nucleic acid constructs comprising the polynucleotides. The nucleic acid construct contains the coding sequence of the fusion protein described herein and one or more regulatory sequences operably linked to that sequence. The coding sequence of the fusion protein of the invention can be manipulated in a variety of ways to ensure expression of the protein. Before the nucleic acid construct is inserted into a vector, it can be manipulated according to the nature or requirements of the expression vector. Techniques for altering polynucleotide sequences using recombinant DNA methods are known in the art.
The regulatory sequence may be a suitable promoter sequence. The promoter sequence is usually operably linked to the coding sequence of the protein to be expressed. The promoter may be any nucleotide sequence that shows transcriptional activity in the selected host cell, including mutated, truncated and hybrid promoters, and may be obtained from genes encoding extracellular or intracellular polypeptides that are homologous or heterologous to the host cell.
The regulatory sequence may also be a suitable transcription terminator sequence, i.e., a sequence recognized by the host cell to terminate transcription. The terminator sequence is operably linked to the 3' end of the nucleotide sequence encoding the polypeptide. Any terminator that is functional in the selected host cell may be used in the present invention.
The regulatory sequence may also be a suitable leader sequence, i.e., an untranslated region of the mRNA that is important for translation by the host cell. The leader sequence is operably linked to the 5' end of the nucleotide sequence encoding the polypeptide. Any terminator that is functional in the selected host cell may be used in the present invention.
In certain embodiments, the nucleic acid construct is a vector. For example, the polynucleotide sequence herein can be inserted into a recombinant expression vector. The term "recombinant expression vector" refers to bacterial plasmids, bacteriophages, yeast plasmids, plant cell viruses, mammalian cell viruses such as adenoviruses and retroviruses, or other vectors well known in the art. Any plasmid or vector can be used as long as it can replicate and remain stable in the host. An important feature of expression vectors is that they usually contain an origin of replication, a promoter, a marker gene and translation control elements. The expression vector may also include a ribosome binding site for translation initiation and a transcription terminator. The polynucleotide sequence described herein is operably linked to a suitable promoter in the expression vector so that mRNA synthesis is directed by that promoter. Representative examples of such promoters are: the lac or trp promoter of E. coli; the λ phage PL promoter; eukaryotic promoters, including the CMV immediate early promoter, the HSV thymidine kinase promoter, the early and late SV40 promoters and retroviral LTRs; and other promoters known to control gene expression in prokaryotic or eukaryotic cells or their viruses. The marker gene can be used to provide a phenotypic trait for selecting transformed host cells, including but not limited to dihydrofolate reductase, neomycin resistance and green fluorescent protein (GFP) for eukaryotic cell culture, or tetracycline or ampicillin resistance for E. coli. When the polynucleotide described herein is expressed in higher eukaryotic cells, transcription will be enhanced if an enhancer sequence is inserted into the vector. Enhancers are cis-acting elements of DNA, usually about 10 to 300 base pairs in length, that act on a promoter to enhance transcription of the gene.
Those of ordinary skill in the art know how to select appropriate vectors, promoters, enhancers and host cells. Expression vectors containing the polynucleotide sequences described herein and suitable transcription/translation control signals can be constructed by methods well known to those skilled in the art. These methods include in vitro recombinant DNA techniques, DNA synthesis techniques, in vivo recombination techniques and the like.
The vectors described herein can be used to transform suitable host cells so that they can express the fusion protein described herein. The host cell may be a prokaryotic cell, such as a bacterial cell; a lower eukaryotic cell, such as a yeast cell; a filamentous fungal cell; or a higher eukaryotic cell, such as a mammalian cell. The host cell may also be a plant cell. Representative examples of host cells are: bacterial cells such as Escherichia coli, Streptomyces and Salmonella typhimurium; fungal cells such as yeast and filamentous fungi; plant cells; insect cells such as Drosophila S2 or Sf9; and animal cells such as CHO, COS, 293 cells or Bowes melanoma cells. In addition to cells used to express the fusion protein, other cells containing the polynucleotide sequence or vector described herein together with an sgRNA or its expression vector, for example cells used to prepare point-mutated proteins, are also within the scope of the host cells described herein.
Transformation of host cells with recombinant DNA can be carried out by conventional techniques well known to those skilled in the art. When the host is a prokaryote such as E. coli, competent cells capable of taking up DNA can be harvested after the exponential growth phase and treated by the CaCl2 method, using procedures well known in the art. Another method is to use MgCl2. If desired, transformation can also be carried out by electroporation. When the host is a eukaryote, the following DNA transfection methods can be used: calcium phosphate co-precipitation, and conventional mechanical methods such as microinjection, electroporation and liposome encapsulation.
After transformation of the host cells, the resulting transformants can be cultured by conventional methods to allow them to express the fusion protein described herein. Depending on the host cell used, the medium used for culture may be selected from various conventional media. The recombinant fusion protein herein can be isolated and purified by various separation methods known in the art. These methods are well known to those skilled in the art and include, but are not limited to: conventional renaturation treatment, treatment with a protein precipitant (salting out), centrifugation, osmotic cell disruption, sonication, ultracentrifugation, molecular sieve chromatography (gel filtration), adsorption chromatography, ion exchange chromatography, high performance liquid chromatography (HPLC), various other liquid chromatography techniques, and combinations of these methods.
Accordingly, also included herein are host cells containing the fusion protein described herein, its coding sequence or expression vector, and optionally an sgRNA or its expression vector. Such host cells may express the fusion protein described herein constitutively, or may express it under certain inducing conditions. Methods for making host cells express the fusion protein of the invention constitutively or under inducing conditions are well known in the art. For example, in certain embodiments, the expression vector of the invention is constructed using an inducible promoter, thereby achieving inducible expression of the fusion protein.
Compositions and kits
The fusion protein herein, its coding sequence or expression vector, and/or the sgRNA, its coding sequence or expression vector can be provided in the form of a composition. For example, the composition may contain the fusion protein herein together with an sgRNA or an sgRNA expression vector, or may contain an expression vector for the fusion protein herein together with an sgRNA or an sgRNA expression vector. In the composition, the fusion protein or its expression vector and the sgRNA or its expression vector may be provided as a mixture, or may be packaged separately. The composition may be in the form of a solution or in lyophilized form.
组合物可提供在试剂盒中。因此,本文提供含有本文所述组合物的试剂盒。或者,本文也提供一种试剂盒,该试剂盒含有本文的融合蛋白和sgRNA或sgRNA的表达载体,或含有本文融合蛋白的表达载体和sgRNA或sgRNA的表达载体。试剂盒中,融合蛋白或其表达载体、或sgRNA或其表达载体可独立包装,或以混合物的形式提供。试剂盒中还可包括例如用于将所述融合蛋白或其表达载体和/或sgRNA或其表达载体转入细胞的试剂,以及指导技术人员进行所述转入的说明书。或者,试剂盒还可包括指导技术人员采用试剂盒所含成分实施本文所述的各种方法和用途的说明书。试剂盒 中还包括其它的试剂,例如用于PCR的试剂等。The composition can be provided in a kit. Accordingly, provided herein are kits containing the compositions described herein. Alternatively, a kit is provided herein, comprising the expression vector of the fusion protein and sgRNA or sgRNA herein, or an expression vector comprising the expression vector of the fusion protein herein and sgRNA or sgRNA. In the kit, the fusion protein or its expression vector, or sgRNA or its expression vector can be packaged separately or in the form of a mixture. Also included in the kit are reagents for transferring the fusion protein or its expression vector and/or sgRNA or expression vector thereof into a cell, and instructions for directing the skilled person to perform the transfer. Alternatively, the kit can also include instructions for the skilled artisan to practice the various methods and uses described herein using the components contained in the kit. Kit Other reagents, such as reagents for PCR, etc., are also included.
Methods and uses
A third aspect herein provides a method of producing a point mutation in a cell, the method comprising the step of expressing the fusion protein and the sgRNA described herein in the cell. In certain embodiments, the fusion protein of the invention or its expression vector, and the sgRNA or its expression vector, are introduced into the cell. Where the cell constitutively expresses the fusion protein described herein, only the corresponding sgRNA or its expression vector need be introduced. Where the cell expresses the fusion protein inducibly, the cell may additionally be incubated with an inducer, or subjected to the corresponding induction measure (for example, illumination), after the sgRNA has been introduced. The fusion protein or its expression vector and/or the sgRNA or its expression vector can be introduced into the cell by conventional transfection methods. For example, in certain embodiments, a plasmid DNA-liposome complex is first prepared, and the cell is then co-transfected with this complex and the corresponding sgRNA. Once cells carrying point mutations have been obtained, they can be cultured under conditions suitable for their growth and for expression of the desired protein, and the resulting mutants can be isolated and analyzed by various conventional methods (for example, high-throughput methods).
Accordingly, the method described herein for producing point mutations in a cell can also be used to generate a mutant library; the mutants in the library can then be isolated and screened by conventional techniques to obtain mutants with the desired biological function. The invention therefore also provides a method of constructing a mutant library, the method comprising the step of expressing the fusion protein and the sgRNA described herein in the cell.
One or more sgRNAs can be designed for the same site to be mutated. When several sgRNAs are designed, they differ in their target-binding regions but share the same Cas protein recognition region. The one or more sgRNAs can then be introduced into the cell together with the corresponding fusion protein.
The cell can be any cell of interest, including prokaryotic and eukaryotic cells, such as plant cells, animal cells and microbial cells. Animal cells are particularly preferred, for example mammalian cells and rodent cells, including cells of humans, horses, cattle, sheep, mice or rats, rabbits and the like. Microbial cells include cells from the various microbial species well known in the art, especially species of medical research value or production value (for example, for the production of fuels such as ethanol, of proteins, or of lipids such as DHA). The cells may also be derived from various organs, for example the human liver, kidney or skin. The cells may also be any of the established cell lines currently on the market, such as 293 cells and COS cells. In certain embodiments, the cell is from a healthy individual; in other embodiments, the cell is from diseased tissue of a diseased individual, for example a cell from inflamed tissue or a tumor cell, or is an induced pluripotent stem cell or the like. The cell may also have been genetically engineered to have a particular function (for example, production of a protein of interest) or to produce a phenotype of interest. In other words, the gene or nucleic acid sequence to be mutated may be one naturally present in the cell (endogenous) or one introduced from outside (exogenous). An introduced gene or nucleic acid sequence may be integrated into the genome of the cell or may be stably expressed independently of the genome.
For different cells, expression vectors for the fusion protein and sgRNA herein can be designed by known techniques so that they are suitable for expression in the cell in question. For example, the expression vector can be provided with a promoter that drives expression in that cell, together with other relevant regulatory sequences. These can be selected and implemented by the skilled person according to the actual situation.
The nucleic acid sequence in which point mutations are to be produced can be any nucleic acid sequence of interest, for example a gene sequence, in particular a gene or nucleic acid sequence associated with a disease, with the production of a protein of interest, or with a biological function of interest. Such genes or nucleic acid sequences of interest include, but are not limited to, nucleic acid sequences encoding various functional proteins. Herein, a functional protein is a protein capable of carrying out a physiological function of an organism, including catalytic proteins, transport proteins, immune proteins and regulatory proteins. In certain specific embodiments, the functional proteins include, but are not limited to, proteins involved in the onset, progression and metastasis of disease; proteins involved in cell differentiation, proliferation and apoptosis; proteins involved in metabolism; development-related proteins; and various drug targets. For example, the functional protein may be an antibody, an enzyme, a lipoprotein, a hormone protein, a transport or storage protein, a motor protein, a receptor protein, a membrane protein, or the like. Thus, the fusion proteins, polynucleotides, nucleic acid constructs, cells and methods described herein can be used to construct mutant libraries, which can then be screened for proteins with new or enhanced functions, such as antibodies, enzymes or other functional proteins.
The methods described herein can be used to produce random mutations in a nucleic acid sequence of interest or to produce mutations at a specific site in that sequence. In the former case, PAM sites on the template strand are located according to the Cas enzyme used, and an sgRNA recognized by that Cas enzyme is designed using, as its target recognition region, a fragment 15 to 25 bases long, more typically 18 to 22 bases long, that lies downstream of the PAM site, either immediately adjacent to it or separated from it by no more than 10 bases (for example, no more than 8, 5 or 3 bases). In the latter case, a site that can serve as a PAM is sought near the specific site, a Cas enzyme capable of recognizing that PAM is selected, and the fusion protein of the invention containing that Cas enzyme and the corresponding sgRNA are designed and prepared as described herein.
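The first design route can be illustrated with a minimal sketch. It assumes an SpCas9-type NGG PAM, a fixed 20-base target region, and the widely used convention in which the target region is read as the 20 bases immediately 5′ of the PAM on the scanned strand; none of these choices, nor the function name, comes from the description itself, and only one strand is scanned.

```python
import re

def find_candidate_targets(seq, spacer_len=20):
    """Return (target, pam, pam_start) for every NGG PAM that has a
    full-length target region immediately 5' of it on the given strand.

    Assumptions (illustrative only): SpCas9-style NGG PAM, fixed-length
    target, no gap between target and PAM, single strand scanned.
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start >= spacer_len:
            target = seq[pam_start - spacer_len:pam_start]
            hits.append((target, match.group(1), pam_start))
    return hits

# Arbitrary demonstration sequence, not taken from the patent.
demo = "ACGTACGTACGTACGTACGT" + "AGG" + "TT"
for target, pam, pos in find_candidate_targets(demo):
    print(pos, target, pam)
```

To cover both strands, the same function could be run on the reverse complement of the input; a different Cas enzyme would need its own PAM pattern.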
The methods herein may be carried out in vitro or in vivo. When carried out in vivo, the fusion protein herein or its expression vector and the sgRNA or its expression vector can be introduced into the experimental subject, for example into cells of the relevant tissue, by means well known in the art, and functional variants of interest can be identified by observing changes in the animal's phenotype. It should be understood that for in vivo experiments the subject may be any of various non-human animals, in particular the non-human model organisms conventionally used in the art. In vivo experiments should also meet ethical requirements.
The invention is illustrated below by way of specific examples. It should be understood that these examples are merely illustrative and do not limit the scope of the invention. Experimental methods for which no specific conditions are given in the following examples are generally carried out under conventional conditions, such as those described in Sambrook & Russell, Molecular Cloning: A Laboratory Manual (3rd edition), or under the conditions recommended by the manufacturer. Unless otherwise defined, all technical and scientific terms used herein have the meanings familiar to those skilled in the art. In addition, any methods and materials similar or equivalent to those described can be used in the invention. The preferred methods and materials described herein are for illustration only.
Example 1: Construction of the pEntr11-dCas9-AID and pEntr11-dCas9-AIDX plasmids
1. Using cDNA reverse-transcribed from RNA of the A20 cell line (purchased from the Cell Bank of the Type Culture Collection Committee of the Chinese Academy of Sciences) as template, the full-length AID sequence and the AIDX fragment (truncated from amino acid residue 183) were amplified with the primers shown in SEQ ID NO: 5 and 6 and the primers shown in SEQ ID NO: 5 and 7, respectively (see Figure 1, A and C);
2. Construction of the pEntr11-dCas9-TET1CD plasmid:
(1) The dCas9 target gene fragment was amplified by PCR from the dCas9 plasmid (Addgene);
(2) The dCas9 target gene fragment and the pEntr11 plasmid (Invitrogen) were digested with the restriction endonucleases BamHI and NcoI, and the fragments were recovered;
(3) The digested dCas9 fragment was ligated to the pEntr11 vector, and the ligation product was transformed into TOP10 competent cells;
(4) Positive clones were picked, and plasmids were extracted and verified by sequencing, completing construction of the pEntr11-dCas9 plasmid;
(5) The TET1CD target gene fragment was amplified by PCR;
(6) The pEntr11-dCas9 plasmid was digested with the restriction endonucleases BamHI and XhoI, and the fragment was recovered;
(7) TET1CD was cloned into the pEntr11-dCas9 plasmid by Gibson Assembly, completing construction of the pEntr11-dCas9-TET1CD plasmid;
3. The pEntr11-dCas9-TET1CD plasmid and the AID and AIDX fragments were digested with the restriction endonucleases BamHI and XhoI, and the pEntr11-dCas9 vector and the AID and AIDX fragments were then recovered;
4. The digested AID and AIDX fragments were each ligated to the pEntr11-dCas9 vector, and the ligation products were transformed into TOP10 competent cells;
5. Positive clones were picked, and plasmids were extracted and verified by sequencing, completing construction of the pEntr11-dCas9-AID and pEntr11-dCas9-AIDX plasmids (Figure 1, B and D).
Example 2: Construction of the MO91-dCas9-AID and MO91-dCas9-AIDX plasmids
1. The dCas9-AID and dCas9-AIDX fragments were amplified from the pEntr11-dCas9-AID and pEntr11-dCas9-AIDX plasmids, respectively, using the primers shown in SEQ ID NO: 8 and 9 (Figure 2, A);
2. The MO91 plasmid (Addgene Plasmid #19755) and the AID and AIDX fragments were digested with the restriction endonucleases BglII and XhoI, and the vector, AID fragment and AIDX fragment were then recovered (Figure 2, B);
3. The digested AID and AIDX fragments were each ligated to the MO91 vector, and the ligation products were transformed into Stbl3 competent cells;
4. Positive clones were picked, and plasmids were extracted and verified by sequencing, completing construction of the MO91-dCas9-AID and MO91-dCas9-AIDX plasmids (Figure 2, C and D).
Example 3: Construction of the MO91-dCas9(3*flag, NLS)-AID and MO91-dCas9(3*flag, NLS)-AIDX plasmids
Using the pCW-Cas9 plasmid (Wuhan Miaoling Biotechnology Co., Ltd.) as template, primers were designed to amplify the 3*flag+NLS fragment by PCR. The fragment was cloned by Gibson Assembly at the N-terminus of dCas9 in the MO91-dCas9-AID and MO91-dCas9-AIDX plasmids, giving the MO91-dCas9(3*flag, NLS)-AID and MO91-dCas9(3*flag, NLS)-AIDX plasmids (Figure 3).
Example 4: Establishing an effective reporter system indicating AID point-mutation efficiency
Point mutations produced at the genomic level need to be detected by a simple and intuitive method. The invention mainly uses flow cytometry to measure the level of point mutation indirectly at the protein level. A stop codon (TAG) is artificially inserted into the EGFP gene so that EGFP cannot be expressed normally. When the fusion protein herein acts on the stop codon in the EGFP gene, the stop codon is point-mutated and the mutated EGFP gene is expressed normally. Therefore, the higher the level of EGFP expression, the higher the point-mutation efficiency.
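The reporter logic can be made concrete by listing which single-base substitutions of the inserted TAG codon remove the stop. The sketch below relies only on the standard genetic code (stop codons TAA, TAG and TGA); the function name is illustrative, and whether a given sense codon yields fluorescent EGFP is not addressed here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def sense_substitutions(codon="TAG"):
    """List single-base substitutions of `codon` that no longer encode a stop.

    Each entry is (position, reference base, new base, new codon),
    with positions counted from 1.
    """
    results = []
    for index, ref in enumerate(codon):
        for alt in "ACGT":
            if alt == ref:
                continue
            mutated = codon[:index] + alt + codon[index + 1:]
            if mutated not in STOP_CODONS:
                results.append((index + 1, ref, alt, mutated))
    return results

for position, ref, alt, mutated in sense_substitutions():
    print(f"position {position}: {ref}>{alt} gives {mutated}")
```

Of the nine possible single-base changes, eight give a sense codon; the G>A change at the third position gives TAA, which is still a stop, so that substitution would not be scored by an EGFP readout.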
In this example, the EGFP gene containing the stop codon (sequence shown in Figure 4) was inserted into the MO405-thy1.1 plasmid (Addgene), with MSCV driving gene expression. This plasmid was used to package virus and infect 293T cells, as follows:
1. Plate 293T cells so that the cell density reaches 90% at the time of virus packaging;
2. After 24 h, package the virus; the packaging procedure is the same as for transfection;
3. Change the medium 24 h after packaging;
4. 24 h after packaging, harvest virus for the first time; add polybrene (1 µg/ml), centrifuge at 800 g for 90 min, and change the medium after 6-8 h;
5. 48 h after packaging, harvest virus for the second time; add polybrene (1 µg/ml), centrifuge at 800 g for 90 min, and change the medium after 6-8 h;
6. Once the cells have grown to a sufficient number, stain them for flow cytometry (PE-Thy1.1) and sort the Thy1.1-positive cells as reporter cells. The results are shown in Figure 6. A schematic of the reporter cells is shown in Figure 5.
Example 5: Preparation of sgRNA
1. Find a 20 bp target sequence. If the first base of the 20 bp target sequence is not G, a G must be added to its 5′ end so that it can be transcribed efficiently from the RNA polymerase III U6 promoter. Note that the target sequence must not contain an XhoI or NheI recognition site.
2. Clone the sgRNA into pLX (Addgene 50662) to obtain pLX sgRNA. The following four primers are required, of which R1 and F2 are sgRNA-specific:
F1: AAACTCGAGTGTACAAAAAAGCAGGCTTTAAAG (SEQ ID NO: 10)
R1: rc(GN19)GGTGTTTCGTCCTTTCC (SEQ ID NO: 11)
F2: GN19GTTTTAGAGCTAGAAATAGCAA (SEQ ID NO: 12)
R2: AAAGCTAGCTAATGCCAACTTTGTACAAGAAAGCTG (SEQ ID NO: 13)
where GN19 = the new target sequence and rc(GN19) = the reverse complement of the new target sequence.
3. Amplify pLX sgRNA with F1+R1 and with F2+R2 in separate reactions;
4. Gel-purify the products of the two amplifications, combine them, and use them as template for a third PCR with F1+R2;
5. Digest the PCR product obtained in step 4 with NheI and XhoI; and
6. Ligate and transform, thereby obtaining the sgRNA expression vector.
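The rules in steps 1 and 2 for deriving the two sgRNA-specific primers can be sketched as follows. The constant primer tails are those given for R1 and F2 (SEQ ID NO: 11 and 12); the function names and the error handling are illustrative assumptions, and the sketch checks only the target sequence itself for restriction sites, as step 1 requires.

```python
XHOI_SITE = "CTCGAG"
NHEI_SITE = "GCTAGC"
R1_TAIL = "GGTGTTTCGTCCTTTCC"       # constant 3' part of R1 (SEQ ID NO: 11)
F2_TAIL = "GTTTTAGAGCTAGAAATAGCAA"  # constant 3' part of F2 (SEQ ID NO: 12)

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def design_sgrna_primers(target):
    """Build the sgRNA-specific primers R1 and F2 for one target sequence.

    A 5' G is prepended when the target does not already start with G
    (U6 transcription), and targets containing an XhoI or NheI site are
    rejected. Both sites are palindromic, so one strand is enough to check.
    """
    target = target.upper()
    if not target.startswith("G"):
        target = "G" + target
    for name, site in (("XhoI", XHOI_SITE), ("NheI", NHEI_SITE)):
        if site in target:
            raise ValueError(f"target contains a {name} site ({site})")
    return {
        "target": target,
        "R1": reverse_complement(target) + R1_TAIL,
        "F2": target + F2_TAIL,
    }

# Arbitrary demonstration target, not one of the patent's sgRNAs.
primers = design_sgrna_primers("ACGTACGTACGTACGTACGT")
print(primers["R1"])
print(primers["F2"])
```

F1 and R2 do not depend on the target, so they are omitted from the sketch.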
The base sequences of the target-binding regions of the four sgRNAs are as follows:
Example 6: CRISPR-Cas9 increases AID point-mutation efficiency
The reporter cells constructed in Example 4 were cultured to 70-90% confluence and then transfected. For transfection, plasmid DNA-liposome complexes were first prepared: a four-fold amount of the 2000 transfection reagent was diluted in medium, the MO91-dCas9(3*flag, NLS)-AID plasmid or the MO91-dCas9(3*flag, NLS)-AIDX plasmid was separately diluted in medium, and each diluted plasmid was then added to the diluted 2000 reagent (1:1) and incubated for 30 minutes. The reporter cells constructed in Example 4 were then co-transfected with the plasmid DNA-liposome complex and the four sgRNAs against the EGFP stop codon prepared in Example 5. As a control, the reporter cells were transfected with the plasmid DNA-liposome complex alone. The cells were selected for 3 days with puromycin (2 µg/ml) and blasticidin (20 µg/ml), and EGFP expression was analyzed by flow cytometry on days 4 and 7 after transfection.
The results are shown in Figure 7. The %EGFP+ values for AID and AIDX were 0.14% and 0.30%, respectively, whereas those for dCas9-AID+sgRNA and dCas9-AIDX+sgRNA were 2.14% and 4.36%, respectively.
These results indicate that fusing AID or AIDX to dCas9 allows the sgRNA to direct AID to its target, confining the point-mutation activity of AID to a specific site while raising its local concentration and thereby its mutation efficiency.
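The size of the effect reported for Figure 7 can be expressed as a fold change. The short calculation below simply divides the %EGFP+ values quoted above; the dictionary keys and function name are illustrative, and no statistics beyond the four reported percentages are assumed.

```python
# %EGFP+ values as reported for Figure 7.
percent_egfp = {
    "AID": 0.14,
    "AIDX": 0.30,
    "dCas9-AID + sgRNA": 2.14,
    "dCas9-AIDX + sgRNA": 4.36,
}

def fold_change(targeted, untargeted):
    """Ratio of the targeted to the untargeted %EGFP+ value."""
    return targeted / untargeted

aid_fold = fold_change(percent_egfp["dCas9-AID + sgRNA"], percent_egfp["AID"])
aidx_fold = fold_change(percent_egfp["dCas9-AIDX + sgRNA"], percent_egfp["AIDX"])
print(f"AID:  {aid_fold:.1f}-fold")   # about 15.3-fold
print(f"AIDX: {aidx_fold:.1f}-fold")  # about 14.5-fold
```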
Example 7: CRISPR-Cas9 increases AID point-mutation efficiency, and optimization
Using the same method as in Example 6, the reporter cells constructed in Example 4 were co-transfected with expression vectors for sgRNA and dCas9-AID. The sgRNAs fell into two groups. One group consisted of control sgRNAs against AAVS1, with the following target-binding regions: GATTCCCAGGGCCGGTTAATG (SEQ ID NO: 18); GTCCCCTCCACCCCACAGTG (SEQ ID NO: 19); and GGGGCCACTAGGGACAGGAT (SEQ ID NO: 20). The other group consisted of the sgRNAs against EGFP (SEQ ID NO: 14-17). A control group in which the reporter cells were transfected with AID alone was also included. The expression vectors for the control sgRNAs were constructed as described in Example 5.
FACS analysis on day 8 after transfection showed that EGFP%+ was only 0.13% in the AID group but reached 2.1% in the dCas9-AID+sgRNA group (Figure 8, A), a 16-fold increase. To further optimize the efficiency of the dCas9-AID system, dCas9 was fused to different AID variants: AID-FL (full length), AID-CD (catalytic domain only), P182X (truncated from amino acid residue 183), R186X (truncated from amino acid residue 187) and R190X (truncated from amino acid residue 191). Reporter cells were co-transfected with each dCas9-AID expression vector and the sgRNAs; dCas9-R186X was the most efficient (Figure 8, B and C). dCas9-R186X was therefore used in the experiments of Examples 8-13, where it is referred to simply as dCas9-AIDX.
To show that the system acquires base-substitution activity only when AID is fused to dCas9, reporter cells were co-transfected with the sgRNAs together with Cas9, dCas9, a functional mutant of dCas9-AIDX [R186X(E58Q)], or dCas9-AIDX. Only the dCas9-AIDX + sgRNA group contained EGFP+ cells; all other groups were at 0 (Figure 8, C). This confirms that the base-substitution activity of the system depends on fusing AID to dCas9.
Example 8: CRISPR-Cas9 confines the point-mutation activity of AID to the sgRNA target site
To investigate whether CRISPR-Cas9 can confine the point-mutation activity of AID to the sgRNA target site, genomic DNA from the reporter system constructed in Example 4 was used as template to amplify the stop-codon-containing EGFP by PCR and construct a library, with cMyc as a control gene, followed by MiSeq sequencing. The results are shown in Figure 9. Sequencing of the reporter cells shows that, although MiSeq offers high throughput, a background mutation frequency remains even after low-quality reads are filtered out: 0.25% for EGFP and 0.15% for cMyc. Even with this background, the point-mutation frequency in the EGFP gene was clearly higher in the dCas9-AIDX+sgRNA group than in the AIDX group, again showing that CRISPR-Cas9 increases AID point-mutation efficiency. Moreover, the high-frequency mutation sites were concentrated mainly at the sgRNA target sites, and almost no point mutations occurred in the cMyc gene. This shows that once dCas9 is fused to AID, the sgRNA directs dCas9-AID to its target site, so that AID acts only there to produce point mutations without substantially altering other loci, and the point-mutation frequency is greatly increased.
Example 9: dCas9-AIDX randomly mutates C and G to the other three bases
AIDX by itself mutates C to T and G to A. After dCas9 is fused to AIDX, the direction of mutation at C and G becomes more uniform than in the AIDX group.
AID activity itself depends on the hotspot motif WRCY (W = A/T, R = A/G, Y = C/T), the most preferred motif being AGCT. After dCas9 is fused to AIDX, this motif preference largely disappears. The inventors therefore propose the following hypothesis. Normally, AID deaminates cytosine to form uracil. If the resulting U·G mismatch persists through DNA replication, C-to-T and G-to-A mutations result; alternatively, the U base can be excised by base excision repair, after which any of the four bases may be inserted. Fusion of dCas9 to AID may therefore suppress the replication route and favor base excision repair, making the direction of mutation more uniform (Figure 10, b).
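For readers who want to locate WRCY hotspots in a sequence, the motif can be written as a regular expression using the standard IUPAC codes (W = A/T, R = A/G, Y = C/T). The sketch below scans only the strand supplied; the function name and the demonstration sequence are illustrative and not taken from the patent.

```python
import re

# WRCY expanded with IUPAC codes: W = A/T, R = A/G, Y = C/T.
# The lookahead lets overlapping motifs be reported.
WRCY = re.compile(r"(?=([AT][AG]C[CT]))")

def find_wrcy(seq):
    """Return (start_index, motif) for each WRCY hotspot on the given strand."""
    return [(m.start(), m.group(1)) for m in WRCY.finditer(seq.upper())]

# Arbitrary demonstration sequence.
for start, motif in find_wrcy("TACCAGCT"):
    print(start, motif)
```

In this demonstration sequence the function reports TACC at index 0 and AGCT at index 4; hotspots on the opposite strand could be found by running the same function on the reverse complement.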
In addition, statistical analysis of the MiSeq data showed that the types of point mutation produced in EGFP in the AIDX and dCas9-AIDX+sgRNA groups were broadly consistent with previous reports: mutations at C and G predominated, and mutations at A and T were rare. G mutated mainly to T and C to A. In the dCas9-AIDX group, however, the proportion of G mutated to T or C increased, and the proportion of C mutated to G or A increased. dCas9-AIDX therefore produces a more uniform spectrum of mutation types (Figure 10, a).
Example 10: UGI increases the base-substitution frequency of the dCas9-AIDX system, reveals the trajectory of dCas9-AIDX activity on the gene, and makes the direction of base mutation more uniform
UGI is an inhibitor of UNG. It is a bacteriophage protein that protects the phage genome from repair by host UNG when the phage invades E. coli (Figure 11, a). Reporter cells were co-transfected with three plasmids expressing, respectively, dCas9-AIDX, a single sgRNA (target-binding region GCCTCGAACTTCACCTCGGCG, SEQ ID NO: 16) and UGI (protein sequence: UniProtKB-P14739), in order to increase the mutation efficiency obtained with a single sgRNA. The results show a 10-fold increase in the peak point-mutation efficiency (Figure 11, b).
In addition, after UGI is added the direction of mutation in the system becomes more uniform: C to T and G to A. The trajectory of dCas9-AIDX activity, that is, the frequency of mutations produced by the system upstream and downstream of the PAM sequence, was also analyzed. Figure 11(c) shows statistics based on data from the four sgRNAs designed against the EGFP site. In each case the N of the NGG PAM is taken as the first base, with upstream positions negative and downstream positions positive. The two data sets give consistent results: mutations occur in the 20 bp upstream of the PAM, that is, in the protospacer region, with the peak at positions -12/-13 relative to the PAM. UGI increases the overall mutation frequency of AID, but it increases the proportion of base substitutions and decreases the proportion of conversions (Figure 11, d).
Example 11: dCas9-AIDX acts on endogenous as well as exogenous genes
The experiments above were all carried out in reporter cells. In this example the endogenous gene AAVS1 was chosen as the target site, three sgRNAs were designed (SEQ ID NO: 18-20), and 293T cells were co-transfected with vectors expressing dCas9-AID and the three AAVS1-targeting sgRNAs (as described in Example 7).
The results are shown in Figure 12. The dCas9-AID system also produces base substitutions in the endogenous gene AAVS1, and these mutations are likewise concentrated at the sgRNA target sites.
实施例12:将dCas9-AIDX应用于K562BCR-ABL基因的Gleevec耐药性筛选Example 12: Application of dCas9-AIDX to Gleevec resistance screening of K562BCR-ABL gene
K562是来源于慢性髓样白血病人的白血病细胞系。在这种细胞中存在着一种染色体,叫做ph染色体。该染色体是由第9号和第22号染色体的长臂转座而成。第9号染色体上的ABL基因含有酪氨酸激酶活性中心,在正常状态下处于低活性状态,而当转座到BCR基因座中后,会具有很高的活性。会引起一系列信号转导,引发癌症,因此BCR-ABL是一种原癌基因,常用的药物就是Gleevec(格列卫,活性成分是甲磺酸依马替尼),其主要作用机制是gleevec可以竞争性与ABL结合ATP,从而使ABL基因处于低活性。但在病人样本中发现在酪氨酸激酶活性结构域中,会发生点突变,如T315I,使结构域失去结合gleevec的能力,产生gleevec耐药性。除此之外,其它位点的碱基置换也会导致Gleevec耐药性。可以使用dCas9-AIDX系统来筛选Gleevec耐药性位点及具体突变类型,作为设计下一代抑制剂的基础。K562 is a leukemia cell line derived from human chronic myeloid leukemia. There is a chromosome in this cell called the ph chromosome. The chromosome is transposed by the long arms of
First, to obtain K562 cells stably expressing dCas9-AIDX, we co-transfected 293T cells with the target plasmid MSCV-dCas9-AID-P182X-IRES-Thy1.1 and the viral packaging plasmid pcl-10A1. 1×10^6 293T cells were seeded in one well of a six-well plate 12-24 hours in advance and cultured overnight in 2 ml of antibiotic-free DMEM containing 10% FBS. The next day, when the cells had reached 80% confluence, they were transfected with 3 μg of the target plasmid, 1 μg of the viral packaging plasmid and 10 μl of the transfection reagent LIPO2000. 24 hours after transfection, the cells were cultured in 2 ml of antibiotic-containing medium, and virus was collected at 48 hours and 72 hours. The collected virus was immediately centrifuged at 1000 rpm for 5 minutes to remove cell debris; 2 μl of 10 mg/ml Polybrene was added to the supernatant, which was used to infect 1×10^5 K562 cells by spinning the plate at 900 g and 37°C for 90 minutes. 4 hours after infection, the cells were centrifuged and the pellet was cultured in antibiotic-containing medium. After two consecutive days of infection, the K562 cells were cultured for a further two days and then stained for flow cytometry: cells expressing the Thy1.1 surface molecule were labeled with PE (antibody diluted 1:200), and single-cell sorting was used to obtain two 96-well plates of PE-Thy1.1+ K562 single cells. After two weeks of culture, RNA was collected from the cell population derived from each single-cell clone and analyzed by RT-qPCR. The cell line with the highest dCas9-AIDX expression was used for the subsequent screening of Gleevec resistance sites and mutation types.
Meanwhile, to screen for Gleevec resistance sites, we designed sgRNAs against the genomic region containing exon 6 (Exon6) of the ABL gene. A total of 16 sgRNAs were designed (target region sequences shown in SEQ ID NO: 49-64, respectively), of which 6 target the intron regions adjacent to Exon6 and 10 target Exon6 directly, covering 83% of the exon sequence. Because T315I is recognized as one of the most important mutations causing Gleevec resistance, exactly one of the designed sgRNAs covers the T315I mutation site (944C) and serves as a positive control. We also designed 3 sgRNAs against the genomic sequence of the AAVS1 gene, which is unrelated to Gleevec resistance, as negative controls (target region sequences shown in SEQ ID NO: 18-20). All sgRNA sequences were chemically synthesized, double-digested with BamH1 and HindIII, and cloned into the pSUPER-sgRNA vector carrying the H1 promoter. The 16 Exon6 sgRNA plasmids, or the 3 AAVS1 sgRNA plasmids, were mixed in equal amounts and precipitated by the phenol-chloroform/ethanol precipitation method so that the final concentration of the mixed plasmids was above 1.5 μg/μl. The K562 cell line stably expressing dCas9-AIDX was then electroporated with the pooled ABL-Exon6 or AAVS1 sgRNA library using a Neo electroporator (Life Technology, USA). 12-24 hours before electroporation, the K562 cells were cultured in antibiotic-free IMDM containing 10% FBS. On the day of electroporation, two aliquots of 1.2×10^6 K562 cells were each transfected with 8 μg of the equally mixed ABL-Exon6 or AAVS1 sgRNAs at 1000 V, single pulse, 50 ms pulse time. Because the pSUPER-sgRNA plasmid vector carries a puromycin resistance gene, 2 μg/ml puromycin was added 24 hours after transfection to select for cells expressing sgRNA. Puromycin was withdrawn after 48 hours of treatment, and the K562 cells were expanded further. On the sixth day after transfection, DNA and RNA from 2×10^5 cells were collected for high-throughput sequencing as the Input control, and the remaining cells were split into two portions and treated with 10 μM Gleevec or an equal volume of DMSO. Ficoll separation was performed every three days to remove dead cells until the cell number fell below 2×10^4. Under Gleevec treatment, the control cells transfected with AAVS1 sgRNAs had essentially all died by about days 7-10, whereas the experimental cells transfected with ABL-Exon6 sgRNAs continued to proliferate. Around days 36-40 after transfection, the experimental cells had expanded to the order of 10^7 (Fig. 14, b). DNA and RNA were collected from both the Gleevec-treated and the DMSO-treated cells for high-throughput sequencing analysis. The sequencing results showed that 30% of the cells carried the T315I mutation, a known resistance mutation found in patients; in addition, several previously unreported point mutations were found (Fig. 14, c and d).
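The relationship between the nucleotide position 944C and the amino acid change T315I can be checked with a short Python sketch. It assumes that position 944 is a 1-based coordinate in the ABL coding sequence and that the wild-type codon 315 is ACT (the commonly reported ACT to ATT change; the codon itself is not stated in the text above). The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def locate_in_codon(cds_position: int) -> tuple[int, int]:
    """Return (1-based codon number, 0-based offset in codon) for a 1-based CDS position."""
    zero_based = cds_position - 1
    return zero_based // 3 + 1, zero_based % 3


def substitute(codon: str, offset: int, new_base: str) -> str:
    """Replace the base at the given offset of a codon."""
    return codon[:offset] + new_base + codon[offset + 1:]


codon_number, offset = locate_in_codon(944)
wild_type = "ACT"  # assumption: wild-type codon 315, not given in the source text
mutant = substitute(wild_type, offset, "T")  # C-to-T deamination product
print(codon_number, offset, CODON_TABLE[wild_type], CODON_TABLE[mutant])  # 315 1 T I
```

Position 944 is the middle base of codon 315, so a single C-to-T change there converts threonine to isoleucine, which is the kind of edit a cytidine deaminase is expected to produce.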
Example 13: Application of dCas9-AIDX to improving the affinity and specificity of antibodies in vitro
Antibodies specifically recognize antigens and are used as protein drugs for treating many diseases. The affinity of an antibody is proportional to the somatic mutations it acquires in germinal centers in vivo; in general, high-affinity antibodies carry multiple somatic hypermutations. dCas9-AIDX can therefore be used to mutate antibody genes and to screen for antibodies with higher affinity or other improved properties (such as better specificity).
The scheme is as follows: antibody molecules are stably expressed on the surface of 293T cells; sgRNAs are then designed against the antibody gene and co-transfected with dCas9-AIDX into the 293T cells; finally, the cell surface is stained. The more strongly a cell stains, the higher the affinity of its mutated antibody molecule.
This example used Flp-In™-293 cells from Invitrogen, which stably express a lacZ-Zeocin™ fusion locus. First, the cDNA sequence of a low-affinity mouse IgG1 antibody against hen egg lysozyme (HEL) (KD = 2.78E-09 M) was synthesized and ligated to the coding sequence of the transmembrane region of the H2Kk protein, so that the H2Kk transmembrane region was added at the end of the antibody. The resulting DNA sequence was cloned into the pcDNA5/FRT/GOI vector (Life Science Technology, USA). This vector was transferred into the Flp-In™-293 cells, and the Flp-In™ system of these cells was used to integrate the IgG1 coding sequence, which contains an Flp recombination target site, into the lacZ-Zeocin™ fusion locus via Flp recombinase. Cells without successful integration express the Zeocin resistance protein; after successful integration, the Zeocin resistance protein can no longer be expressed because it lacks the start codon ATG, while the hygromycin resistance protein is expressed. Hygromycin was therefore used to select 293 cells with successfully integrated IgG1; each of these cells expresses only one copy of the anti-HEL IgG1 gene.
Next, 16 suitable PAM sequences were selected across the 3 CDRs of each of the IgG1 heavy and light chains, and the sgRNAs shown below (SEQ ID NO: 73-88) were designed so that each heavy-chain or light-chain CDR is covered by at least 2 sgRNAs:
IgH
IgL
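Selecting PAM sequences in a target region, as described above, can be sketched as a scan for NGG motifs on both strands with a 20-nt protospacer immediately upstream. The Python example below is a minimal illustration under those assumptions; the function names and the toy sequence are hypothetical and are not the sequences of SEQ ID NO: 73-88.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(seq: str, spacer_len: int = 20):
    """List (strand, start, protospacer, pam) for each NGG PAM with a full protospacer.

    `start` is the 0-based protospacer start on the scanned strand; for the
    "-" strand it refers to the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # i is the index of the N in NGG; the protospacer is the spacer_len bases before it.
        for i in range(spacer_len, len(s) - 2):
            if s[i + 1:i + 3] == "GG":
                hits.append((strand, i - spacer_len, s[i - spacer_len:i], s[i:i + 3]))
    return hits


# Toy target (hypothetical): a 20-nt protospacer followed by the PAM "AGG".
toy_target = "ACGTACGTACGTACGTACGT" + "AGG" + "TTT"
for hit in find_protospacers(toy_target):
    print(hit)  # ('+', 0, 'ACGTACGTACGTACGTACGT', 'AGG')
```

In practice the candidate list would then be filtered so that each CDR is overlapped by at least two protospacers, which is the coverage criterion stated above.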
The sgRNA sequences were then cloned into the pSUPER-puro plasmid vector (Addgene). The MO91-dCas9(3*flag, NLS)-AIDX plasmid constructed in Example 3 was co-transfected, together with either the sgRNA library (the 16 sgRNAs mixed in equal amounts) or the sgRNA for the control gene AAVS1, into the IgG1-expressing 293 cells obtained above. After selection with puromycin and blasticidin, the cells were surface-stained with PE anti-mouse IgG and Alexa647-HEL on day 7 after transfection and sorted by flow cytometry for cells with unchanged IgG intensity but increased binding to the HEL antigen. After the sorted cells had been expanded in culture, the DNA mutations were first analyzed by high-throughput sequencing; the results were largely consistent with the mutations described herein for the ABL gene or the GFP gene (Fig. 15). dCas9-AIDX induced base mutations in the variable region of the anti-HEL IgG1 and reproducibly induced base mutations in the IgG1 CDRs (Fig. 16).
The mutated cells were then analyzed by flow cytometry after surface staining with PE anti-mouse IgG1 and 647-HEL, and a small population of cells was found with unchanged IgG1 expression but increased HEL binding. This population was sorted by flow cytometry and expanded; compared with the cells before mutation, the mutated antibody showed a more than 10-fold increase in affinity for HEL (Fig. 17).
An appropriate number of cells was then harvested and genomic DNA was extracted for sequencing. The increase in affinity was found to be mainly due to the mutation of glycine at position 52 of the light chain to aspartic acid (codon change GGT to GAT, Fig. 15).
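The GGT to GAT change reported above can be classified with a small Python sketch. It checks which base differs, whether the change is a transition, and whether it matches the signature expected from cytidine deamination (C to T on the reported strand, or G to A when the deaminated C lies on the complementary strand). The function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def base_changes(ref: str, alt: str):
    """Return (offset, ref_base, alt_base) for every position where two codons differ."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have the same length")
    return [(i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def is_transition(ref_base: str, alt_base: str) -> bool:
    """True for purine-to-purine or pyrimidine-to-pyrimidine substitutions."""
    pair = {ref_base, alt_base}
    return ref_base != alt_base and (pair <= PURINES or pair <= PYRIMIDINES)


def matches_deamination(ref_base: str, alt_base: str) -> bool:
    """True for C>T, or G>A (C>T on the complementary strand)."""
    return (ref_base, alt_base) in {("C", "T"), ("G", "A")}


for offset, ref_base, alt_base in base_changes("GGT", "GAT"):
    print(offset, ref_base, alt_base,
          is_transition(ref_base, alt_base),
          matches_deamination(ref_base, alt_base))  # 1 G A True True
```

The single G-to-A change at the second codon position is a transition and is consistent with deamination of the paired cytosine on the opposite strand.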
Example 14: Preparation of other fusion proteins
1. Construction of plasmids
(1) The XTEN linker sequence was synthesized by gene synthesis;
(2) The MO91-dCas9-AIDX plasmid constructed in Example 2 was digested with restriction endonucleases, and the vector, the AIDX fragment and the dCas9 fragment were recovered;
(3) The digested AIDX fragment, the dCas9 fragment and the XTEN linker sequence were ligated into the MO91 vector, and the ligation product was transformed into Stbl3 competent cells;
(4) Positive clones were picked, and the plasmid was extracted and verified by sequencing, completing the construction of the MO91-dCas9-XTEN-AIDX plasmid;
The plasmids MO91-AIDX-XTEN-dCas9, MO91-dCas9-XTEN-AIDX (K10E T82I E156G) and MO91-nCas9-AIDX can be constructed by following the steps above and the methods of Examples 1 and 2.
Where 3*flag and/or NLS fragments are required, they can be cloned into the above plasmids by the method of Example 3 to obtain plasmids expressing the fusion proteins shown in SEQ ID NO: 66, 68, 70 and 72, respectively. The AIDX in these fusion proteins is an AID fragment truncated from amino acid residue 183, or a mutant thereof.
2. Expression and purification of recombinant protein
(1) The plasmid pET-nCas9-AIDX-6His was constructed by conventional methods and used to transform Escherichia coli BL21 STAR competent cells;
(2) The resulting expression strain was grown overnight at 37°C in LB medium containing 100 μg/ml kanamycin. The cells were diluted 1:100 into 2xYT medium and grown at 37°C to OD600 = ~0.6. The culture was cooled to 4°C within 2 hours, IPTG was added to 0.5 mM, and protein expression was induced for ~16 h;
(3) The cells were collected by centrifugation at 4000 g for 15 minutes and resuspended in lysis buffer;
(4) The cells were lysed with a cell disruptor (Union) at 800 bar for 5 minutes, and the lysate supernatant was separated by centrifugation for 15 minutes;
(5) The lysate was incubated with Ni-NTA resin (1 ml slurry per L of bacterial culture) (DP101, TransGen) at 4°C for 1 hour to capture the His-tagged fusion protein; the resin was transferred to a column and washed extensively with cold wash buffer (until no color change could be observed with Coomassie G250);
(6) The His-tagged fusion protein was eluted in elution buffer and concentrated to a total volume of 1 ml by ultrafiltration (Amicon-Millipore, 100 kDa molecular weight cut-off);
(7) The protein was diluted to 20 ml in buffer A, loaded onto a Hi-Trap SP column (29051324, GE Healthcare) and eluted with a 100 mM-1 M NaCl gradient;
(8) The eluted fractions containing nCas9-AIDX were concentrated to about 1 ml and passed through a Superdex 200 10/300 GL column (17517501, GE Healthcare);
(9) The eluted protein was concentrated to about 3 mg/ml, flash-frozen in liquid nitrogen and stored at -80°C.
The electrophoresis pattern of nCas9-AIDX expression induced in bacteria is shown in Figure 18.
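The amounts implied by step (2) and step (5) can be worked out with the usual C1·V1 = C2·V2 relation. The Python sketch below is illustrative only: the 1 L culture volume and the 1 M IPTG stock concentration are assumptions that do not appear in the protocol, and the function names are hypothetical.

```python
def stock_volume(final_conc: float, final_volume: float, stock_conc: float) -> float:
    """Volume of stock needed, from C1 * V1 = C2 * V2 (concentrations in the same unit)."""
    return final_conc * final_volume / stock_conc


def inoculum_volume(culture_volume_ml: float, fold: float = 100) -> float:
    """Approximate overnight-culture volume for a 1:fold dilution into fresh medium."""
    return culture_volume_ml / fold


culture_ml = 1000.0      # assumption: 1 L expression culture
iptg_stock_mm = 1000.0   # assumption: 1 M IPTG stock

print(inoculum_volume(culture_ml))                  # ml of overnight culture for 1:100
print(stock_volume(0.5, culture_ml, iptg_stock_mm)) # ml of IPTG stock for 0.5 mM final
print(1.0 * culture_ml / 1000.0)                    # ml of Ni-NTA slurry at 1 ml per L
```

Under these assumptions, a 1 L culture would take about 10 ml of overnight culture, 0.5 ml of IPTG stock and 1 ml of Ni-NTA slurry; other culture volumes scale linearly.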
3. Functional testing of different fusion proteins
The functions of the different fusion proteins of this example were tested by the same method as in Example 10. The results are shown in Figures 19-21.
Claims (10)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610423512 | 2016-06-15 | ||
| CN201610423512.8 | 2016-06-15 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2017215619A1 (en) | 2017-12-21 |
Family
ID=60663317
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2017/088369 Ceased WO2017215619A1 (en) | 2016-06-15 | 2017-06-15 | Fusion protein producing point mutation in cell, and preparation and use thereof |
Country Status (2)
| Country | Link |
|---|---|
| CN (2) | CN114380922B (en) |
| WO (1) | WO2017215619A1 (en) |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108504676A (en) * | 2018-02-05 | 2018-09-07 | 上海科技大学 | A kind of pnCasSA-BEC plasmids and its application |
| CN109593781A (en) * | 2018-12-20 | 2019-04-09 | 华中农业大学 | The accurate efficient edit methods of upland cotton genome |
| CN112480262A (en) * | 2019-09-11 | 2021-03-12 | 中国科学院沈阳应用生态研究所 | Fusion protein and preparation and application thereof |
| WO2022047624A1 (en) * | 2020-09-01 | 2022-03-10 | Huigene Therapeutics Co., Ltd | Small cas proteins and uses thereof |
| CN115094127A (en) * | 2022-02-22 | 2022-09-23 | 中国科学院深圳先进技术研究院 | A method for in situ detection of protein-deoxyribonucleotide binding sites |
| CN115850385A (en) * | 2022-07-04 | 2023-03-28 | 北京惠之衡生物科技有限公司 | Expression promoting peptide and application thereof |
| WO2024069581A1 (en) * | 2022-09-30 | 2024-04-04 | Illumina Singapore Pte. Ltd. | Helicase-cytidine deaminase complexes and methods of use |
Families Citing this family (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110527697B (en) * | 2018-05-23 | 2023-07-07 | 中国科学院分子植物科学卓越创新中心 | RNA fixed-point editing technology based on CRISPR-Cas13a |
| CN110938658B (en) * | 2018-09-21 | 2023-02-07 | 中国科学院分子细胞科学卓越创新中心 | Antibody evolution method and application thereof |
| CN109402096B (en) * | 2018-11-20 | 2021-01-01 | 中国科学院生物物理研究所 | AID enzyme mutant and application thereof |
| CN111748546B (en) * | 2019-03-26 | 2023-05-09 | 复旦大学附属中山医院 | Fusion protein for generating gene point mutation and induction method of gene point mutation |
| BR112021021979A2 (en) * | 2019-05-03 | 2022-12-06 | Specific Biologics Inc | POLYPEPTIDES, CHIMERIC NUCLEASES, PHARMACEUTICALLY ACCEPTABLE FORMULATIONS, METHODS FOR EDITING GENOMIC DNA, DELETING DEFINED LENGTHS OF A DNA MOLECULE AND REPLACING SELECTED SEQUENCES OF A DNA MOLECULE, PROCESS, METHODS OF ADMINISTRATION OF THE FORMULATION, OF TREATMENT OF A DISEASE DISEASE LUNG AND TREATMENT OF MONOGENETIC AND INFECTIOUS DISEASES, LIGAND AND MODIFIED DONOR DNA MOLECULES |
| CN111304180B (en) * | 2019-06-04 | 2023-05-26 | 山东舜丰生物科技有限公司 | Novel DNA nucleic acid cutting enzyme and application thereof |
| US20230051661A1 (en) * | 2019-12-26 | 2023-02-16 | Agency For Science, Technology And Research | Nucleobase Editors |
| CN111518794B (en) * | 2020-04-13 | 2023-05-16 | 中山大学 | Preparation and application of inducible mutant protein based on activation-inducible cytidine deaminase |
| CN113773373B (en) * | 2021-10-12 | 2024-02-06 | 成都齐碳科技有限公司 | Mutant of porin monomer, protein hole and application thereof |
| CN113896776B (en) * | 2021-10-12 | 2024-02-06 | 成都齐碳科技有限公司 | Mutant of porin monomer, protein hole and application thereof |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2015089406A1 (en) * | 2013-12-12 | 2015-06-18 | President And Fellows Of Harvard College | Cas variants for gene editing |
| WO2015133554A1 (en) * | 2014-03-05 | 2015-09-11 | 国立大学法人神戸大学 | Genomic sequence modification method for specifically converting nucleic acid bases of targeted dna sequence, and molecular complex for use in same |
| WO2016022363A2 (en) * | 2014-07-30 | 2016-02-11 | President And Fellows Of Harvard College | Cas9 proteins including ligand-dependent inteins |
| CN105518146A (en) * | 2013-04-04 | 2016-04-20 | 哈佛学院校长同事会 | Therapeutic uses of genome editing with CRISPR/CAS systems |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| ES2375863T3 (en) * | 2006-01-03 | 2012-03-07 | F. Hoffmann-La Roche Ag | CHEMERIC FUSION PROTEIN WITH SUPERIOR CHAPERON AND FOLDING ACTIVITIES. |
| EP3322804B1 (en) * | 2015-07-15 | 2021-09-01 | Rutgers, The State University of New Jersey | Nuclease-independent targeted gene editing platform and uses thereof |
-
2017
- 2017-06-15 CN CN202210113683.6A patent/CN114380922B/en active Active
- 2017-06-15 WO PCT/CN2017/088369 patent/WO2017215619A1/en not_active Ceased
- 2017-06-15 CN CN201710451424.3A patent/CN107522787B/en active Active
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105518146A (en) * | 2013-04-04 | 2016-04-20 | 哈佛学院校长同事会 | Therapeutic uses of genome editing with CRISPR/CAS systems |
| WO2015089406A1 (en) * | 2013-12-12 | 2015-06-18 | President And Fellows Of Harvard College | Cas variants for gene editing |
| WO2015133554A1 (en) * | 2014-03-05 | 2015-09-11 | 国立大学法人神戸大学 | Genomic sequence modification method for specifically converting nucleic acid bases of targeted dna sequence, and molecular complex for use in same |
| WO2016022363A2 (en) * | 2014-07-30 | 2016-02-11 | President And Fellows Of Harvard College | Cas9 proteins including ligand-dependent inteins |
Non-Patent Citations (2)
| Title |
|---|
| LEE, C.M. ET AL.: "The Neisseria Meningitidis CRISPR-Cas9 System Enables Specific Genome Editing in Mammalian Cells", MOLECULAR THERAPY, vol. 24, no. 3, 16 February 2016 (2016-02-16), pages 645 - 654, XP055449590 * |
| MA, Y.Q. ET AL.: "Targeted AID-Mediated Mutagenesis (TAM) Enables Efficient Genomic Diversification in Mammalian Cells", NATURE METHODS, vol. 13, no. 12, 10 October 2016 (2016-10-10), pages 1029 - 1035, XP002778319 * |
Cited By (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108504676A (en) * | 2018-02-05 | 2018-09-07 | 上海科技大学 | A kind of pnCasSA-BEC plasmids and its application |
| CN108504676B (en) * | 2018-02-05 | 2021-12-10 | 上海科技大学 | pnCasSA-BEC plasmid and application thereof |
| CN109593781A (en) * | 2018-12-20 | 2019-04-09 | 华中农业大学 | The accurate efficient edit methods of upland cotton genome |
| CN112480262A (en) * | 2019-09-11 | 2021-03-12 | 中国科学院沈阳应用生态研究所 | Fusion protein and preparation and application thereof |
| CN112480262B (en) * | 2019-09-11 | 2022-10-28 | 中国科学院沈阳应用生态研究所 | Fusion protein and preparation and application thereof |
| WO2022047624A1 (en) * | 2020-09-01 | 2022-03-10 | Huigene Therapeutics Co., Ltd | Small cas proteins and uses thereof |
| US20240209396A1 (en) * | 2020-09-01 | 2024-06-27 | Huigene Therapeutics Co., Ltd. | Small cas proteins and uses thereof |
| CN115094127A (en) * | 2022-02-22 | 2022-09-23 | 中国科学院深圳先进技术研究院 | A method for in situ detection of protein-deoxyribonucleotide binding sites |
| WO2023160163A1 (en) * | 2022-02-22 | 2023-08-31 | 中国科学院深圳先进技术研究院 | Method for detecting binding position of protein and deoxyribonucleotide in situ |
| CN115850385A (en) * | 2022-07-04 | 2023-03-28 | 北京惠之衡生物科技有限公司 | Expression promoting peptide and application thereof |
| CN115850385B (en) * | 2022-07-04 | 2023-08-11 | 北京惠之衡生物科技有限公司 | Expression-promoting peptide and application thereof |
| WO2024069581A1 (en) * | 2022-09-30 | 2024-04-04 | Illumina Singapore Pte. Ltd. | Helicase-cytidine deaminase complexes and methods of use |
Also Published As
| Publication number | Publication date |
|---|---|
| CN107522787A (en) | 2017-12-29 |
| CN114380922A (en) | 2022-04-22 |
| CN114380922B (en) | 2025-02-28 |
| CN107522787B (en) | 2025-09-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2017215619A1 (en) | Fusion protein producing point mutation in cell, and preparation and use thereof | |
| US12037611B2 (en) | Enhanced hAT family transposon-mediated gene transfer and associated compositions, systems, and methods | |
| JP6892642B2 (en) | A set of polypeptides that exhibit nuclease or nickase activity photodependently or in the presence of a drug, or suppress or activate the expression of a target gene. | |
| CN107794272B (en) | High-specificity CRISPR genome editing system | |
| CN114729368A (en) | Compositions and methods for immunotherapy | |
| CN111902536A (en) | Engineered CAS9 system for eukaryotic genome modification | |
| JP2016523084A (en) | Target integration | |
| JP2009017884A (en) | Chromosome-based platform | |
| JPH04505104A (en) | Production of proteins using homologous recombination | |
| JP6956416B2 (en) | Transposon system, kits containing it and their use | |
| US20240425830A1 (en) | Engineered cas12i nuclease, effector protein and use thereof | |
| CN109295053A (en) | Methods for regulating RNA splicing by inducing base mutation at the splice site or base substitution in the polypyrimidine region | |
| JP3844656B2 (en) | Methods for transformation of animal cells | |
| JP2024133642A (en) | Active DNA transposon system and methods of use thereof | |
| CN109929839A (en) | Detatching single base gene editing system and its application | |
| JP2019511235A (en) | Method of making synthetic chromosomes expressing biosynthetic pathways and use thereof | |
| JP2009538144A (en) | Protein production using eukaryotic cell lines | |
| CN117904208A (en) | Methods of selecting cells based on integration of a detectable label with CRISPR/Cas control of a target protein | |
| JP7026304B2 (en) | Targeted in-situ protein diversification through site-specific DNA cleavage and repair | |
| JP2011504741A (en) | New recombinant sequence | |
| CN111051509A (en) | Compositions containing C2CL endonuclease for dielectric calibration and methods of using the same for dielectric calibration | |
| TW200411052A (en) | Method of transferring mutation into target nucleic acid | |
| Qi et al. | Construction of a TAT-Cas9-EGFP Site-Specific Integration Eukaryotic Cell Line Using Efficient PEG10 Modification | |
| HK40075530A (en) | High fidelity spcas9 nucleases for genome modification | |
| CN118726266A (en) | A BRAF V600E point mutation lung epithelial cell and its construction method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 17812730; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 17812730; Country of ref document: EP; Kind code of ref document: A1 |