WO2022266849A1

WO2022266849A1 - Screening of novel crispr-cas13 protein and use thereof

Info

Publication number: WO2022266849A1
Application number: PCT/CN2021/101596
Authority: WO
Inventors: 周海波; 许争争
Original assignee: Center for Excellence in Brain Science and Intelligence Technology Chinese Academy of Sciences
Current assignee: Center for Excellence in Brain Science and Intelligence Technology Chinese Academy of Sciences
Priority date: 2021-06-22
Filing date: 2021-06-22
Publication date: 2022-12-29
Anticipated expiration: 2023-12-22
Also published as: CN117480251A; WO2022268135A1

Abstract

Screening of novel CRISPR-Cas13 proteins and uses thereof. Provided are novel Cas13 family proteins, a method for screening such proteins, corresponding RNA detection systems, RNA editing systems and other systems, and uses thereof. The novel Cas13 proteins contain a larger number of extended HEPN domains. The method for rapidly searching for Cas13 proteins yielded a variety of novel Cas13 proteins with broad application prospects and considerable market value.

Description

Screening and application of novel CRISPR-Cas13 proteins

Technical field

The present disclosure relates to the fields of biotechnology and medicine. More specifically, it relates to novel Cas13 family proteins, methods for screening novel Cas13 family proteins, and corresponding RNA detection and editing systems and their applications. In particular, it relates to Cas13 proteins and related RNA detection systems.

Background art

The CRISPR-Cas system is an adaptive immune system of prokaryotes (mainly bacteria and archaea) that defends against invading foreign viruses such as phages. CRISPR-Cas systems fall into two broad classes according to their targets: those that target DNA, such as CRISPR-Cas9, and those that target RNA, such as the Cas13 family, which includes Cas13a, Cas13b, Cas13c, Cas13d, Cas13X and Cas13Y. In contrast to DNA-editing CRISPR-Cas systems, CRISPR-Cas13 systems cleave and defend against invading nucleic acids mainly through HEPN (higher eukaryotes and prokaryotes nucleotide-binding) domains. RNA-targeting CRISPR-Cas systems therefore act more mildly and directly, regulating RNA transcripts without altering the genome, which improves the safety of gene editing. Previous studies have also found that the currently known Cas13 proteins/systems undergo a conformational change when activated by sgRNA-guided recognition of the target sequence, and then display non-specific RNase activity (called bystander RNase cleavage activity) in addition to specific RNase activity. This bystander activity indiscriminately cleaves RNA molecules near the target RNA; Cas13a and Cas13b in particular show very strong bystander RNase cleavage activity. Researchers have exploited this property for RNA virus detection, disease treatment and other fields. In 2017, for example, Zhang Feng et al. combined the Cas13a protein with RPA isothermal amplification and reverse transcription to develop SHERLOCK, a new method for detecting trace amounts of DNA and RNA. In 2018, Jennifer et al. developed DETECTR, a new nucleic acid detection method that combines the Cas12a protein with LAMP isothermal amplification and is sensitive enough to detect samples at the aM level.

However, the sgRNAs of different Cas13 proteins show a strong sequence preference at the target RNA region, known as the protospacer flanking site (PFS) and analogous to the PAM of the Cas9 system. This limits the range of applications to some extent, because when the nucleic acid to be targeted lacks the required PFS, the RNase activity of the Cas13 protein is greatly reduced or cannot be activated at all. There is therefore an urgent need to find, in nature, Cas13 proteins with a variety of different PFS properties, in order to broaden the use of Cas13 proteins in detection, clinical diagnosis and treatment, and related areas.

Summary of the invention

In view of the shortcomings of existing techniques for screening novel CRISPR-Cas proteins and of practical needs, the present disclosure provides a method for rapidly finding novel CRISPR-Cas13 orthologous proteins that contain a larger number of extended HEPN domains (at least 2), and verifies the RNA editing activity of the candidate proteins both by bioinformatic analysis (for example, sequence alignment and protein structure prediction) and by experiment. These proteins have potential applications in the regulation, editing and detection of RNA, and have broad academic and commercial value.

The technical problems addressed by the present disclosure are, first, how to rapidly find candidate CRISPR-Cas13 proteins and their systems that have a larger number of novel RNA-cleaving active domains (extended HEPN domains) and, second, how to verify the activity of the candidate CRISPR-Cas13 proteins and their systems. A variety of novel Cas13 proteins were ultimately obtained.

The present disclosure achieves the following technical effects:

(1) An analysis method for rapidly screening novel Cas13 family proteins was developed. The method can analyze CRISPR arrays and screen the associated effector proteins in newly updated prokaryotic microbial DNA sequences and metagenomic sequences;

(2) The screened Cas13 family members extend the range of applications of CRISPR-Cas13. Combining several Cas13 proteins with different PFS properties can enhance the sensitivity of detection of multiple viruses or of a single virus. Packaged in delivery vectors such as adeno-associated virus, they can also be used for the diagnosis and treatment of related diseases, such as neurodegenerative diseases. In plants, they can be used for research on breeding and stress responses; in microorganisms, they can be used to modify engineered bacterial strains;

(3) In addition to screening with the known HEPN domains of Cas13 proteins, the method also includes conserved domains with RNA cleavage activity from other kinds of proteins, which makes it possible to find new Cas13 proteins. The identification of these new functional domains in the new Cas13 proteins also provides new ideas and possibilities for further engineering of Cas13 proteins.
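
As an illustration of the domain-count criterion described above (at least 2 HEPN-like domains), the following minimal Python sketch counts candidate HEPN catalytic motifs in a protein sequence. It assumes the commonly cited R-X4-6-H consensus (an arginine, four to six arbitrary residues, then a histidine) as a stand-in for a HEPN domain. The disclosure does not state which motif definition or software its screening used, and the function names and the toy sequence are hypothetical.

```python
import re

# ASSUMPTION: a HEPN catalytic motif is approximated by R, then 4-6 arbitrary
# residues, then H. A lookahead is used so overlapping candidates are reported.
HEPN_MOTIF = re.compile(r"(?=(R[A-Z]{4,6}H))")


def find_hepn_motifs(protein):
    """Return (start_index, matched_text) for each candidate motif."""
    return [(m.start(), m.group(1)) for m in HEPN_MOTIF.finditer(protein.upper())]


def passes_hepn_filter(protein, min_motifs=2):
    """True if the sequence carries at least `min_motifs` candidate motifs."""
    return len(find_hepn_motifs(protein)) >= min_motifs


# Hypothetical toy sequence with two motifs; not a sequence from the disclosure.
toy = "MKT" + "RNAAAH" + "GGSGGS" + "RKLLLLH" + "PPP"
print(find_hepn_motifs(toy))
print(passes_hepn_filter(toy))
```

A motif count of this kind can only shortlist candidates; a real screen would still need domain-level evidence such as alignment or structure prediction, as the text notes.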

In one aspect of the present disclosure, a Cas13 protein is provided.

In a preferred embodiment, the Cas13 protein comprises the amino acid sequence of any one of SEQ ID NO: 1-204, or the amino acid sequence of any one of SEQ ID NO: 1-198 with conservative amino acid substitutions of one or more residues.

In a preferred embodiment, the RNA cleavage activity of the Cas13 protein is retained.

In a preferred embodiment, the HEPN domain or RNA cleavage domain of the Cas13 protein is further modified or engineered so that its RNA cleavage activity is reduced or eliminated, yielding a dCas13 with reduced or eliminated RNA cleavage activity.

In a preferred embodiment, the Cas13 protein is fused to one or more heterologous functional domains.

In a preferred embodiment, the fusion is at the N-terminus, at the C-terminus or within the Cas13 protein.

In a preferred embodiment, the one or more heterologous functional domains have one of the following activities: deaminase (such as cytidine deaminase or deoxyadenosine deaminase), methylase, demethylase, transcriptional activation, transcriptional repression, nuclease, single-stranded RNA cleavage, double-stranded RNA cleavage, single-stranded DNA cleavage, double-stranded DNA cleavage, DNA or RNA ligase, reporter protein, detection protein, localization signal, or any combination thereof.

In another aspect of the present disclosure, a nucleic acid molecule is provided that comprises a nucleotide sequence encoding the above Cas13 protein.

In a preferred embodiment, the nucleic acid molecule is codon-optimized for expression in a particular host cell.

In a preferred embodiment, the host cell is a prokaryotic or eukaryotic cell, preferably a human cell.

In a preferred embodiment, the nucleic acid molecule comprises a promoter operably linked to the nucleotide sequence encoding Cas13; the promoter is a constitutive promoter, an inducible promoter, a tissue-specific promoter, a chimeric promoter or a development-specific promoter.

In another aspect of the present disclosure, an expression vector is provided that comprises the above nucleic acid molecule and expresses the above amino acid sequence or nucleotide sequence in the form of DNA, RNA, protein or the like.

In a preferred embodiment, the expression vector is an adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, herpes simplex virus or oncolytic virus.

In another aspect of the present disclosure, a delivery system is provided that comprises (1) the above expression vector or the above Cas13 protein; and (2) a delivery vehicle.

In a preferred embodiment, the delivery vehicle is a nanoparticle, liposome, exosome, microvesicle or gene gun.

In another aspect of the present disclosure, a CRISPR-Cas system is provided that comprises: (1) the above Cas13 protein or nucleic acid molecule, or a derivative or functional fragment thereof; and (2) a gRNA sequence for targeting a target RNA.

In a preferred embodiment, the gRNA sequence comprises a direct repeat (DR) sequence and a spacer sequence that targets a portion of the target RNA.

In a preferred embodiment, the DR sequence is a sequence shown in Table 1, and the spacer sequence is 15-60 nucleotides long, preferably 25-50 nucleotides, more preferably 30 nucleotides.
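
The guide layout and spacer length ranges described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the spacer is taken as the reverse complement of the chosen target RNA site, the DR is placed 5' of the spacer by default (the actual orientation depends on the Cas13 subtype and is not specified here), and `TOY_DR` is a hypothetical placeholder rather than a Table 1 sequence.

```python
# Watson-Crick complement for RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")

# Hypothetical placeholder DR, NOT one of the Table 1 sequences.
TOY_DR = "GGACUAAAAC"


def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def spacer_tier(length):
    """Classify a spacer length against the ranges given in the text."""
    if length == 30:
        return "more preferred"
    if 25 <= length <= 50:
        return "preferred"
    if 15 <= length <= 60:
        return "allowed"
    return "out of range"


def build_grna(dr, target_rna, start, spacer_len=30, dr_first=True):
    """Join a DR with a spacer complementary to target_rna[start:start+spacer_len]."""
    if spacer_tier(spacer_len) == "out of range":
        raise ValueError("spacer length must be 15-60 nt")
    site = target_rna[start:start + spacer_len].upper()
    if len(site) != spacer_len:
        raise ValueError("target site runs past the end of the target RNA")
    spacer = reverse_complement_rna(site)
    # ASSUMPTION: DR-then-spacer order unless the caller says otherwise.
    return dr + spacer if dr_first else spacer + dr
```

For example, `build_grna(TOY_DR, "AUGC" * 10, 0)` returns the placeholder DR followed by a 30-nt spacer that base-pairs with the first 30 nucleotides of the toy target.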

In a preferred embodiment, the DR sequence may be a derivative of any one of the sequences shown in Table 1, wherein the derivative (i) has an addition, deletion or substitution of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10) nucleotides compared with any one of the sequences shown in Table 1; (ii) has at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or 97% sequence identity to any one of the sequences shown in Table 1; (iii) hybridizes under stringent conditions to any one of the sequences shown in Table 1, or to any one of (i) and (ii); or (iv) is the complement of any one of (i)-(iii), provided that the derivative is not any one of the sequences shown in Table 1, and that the derivative encodes, or is itself, an RNA that maintains substantially the same secondary structure as any RNA encoded by SEQ ID NO: 199-397.
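
Criteria (i) and (ii) above can be checked computationally. The sketch below is one possible reading, not the method of the disclosure: it counts additions, deletions and substitutions as the Levenshtein edit distance, and it assumes that percent identity is defined as 1 minus the edit distance divided by the longer sequence length (other identity definitions exist). It does not test hybridization or secondary structure, and all function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion from a
                           cur[j - 1] + 1,             # insertion into a
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def percent_identity(a, b):
    """ASSUMED definition: 100 * (1 - edit_distance / longer length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)


def is_candidate_derivative(reference, candidate, max_edits=10, min_identity=90.0):
    """True if candidate differs from reference and meets criterion (i) or (ii)."""
    distance = edit_distance(reference, candidate)
    if distance == 0:
        return False  # identical to the reference, so not a derivative
    return distance <= max_edits or percent_identity(reference, candidate) >= min_identity
```

The `max_edits` and `min_identity` defaults correspond to the upper end of criterion (i) and one of the thresholds listed in criterion (ii); a caller would pick whichever threshold is relevant.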

In a preferred embodiment, the CRISPR-Cas system further comprises: (3) a target RNA.

In a preferred embodiment, the CRISPR-Cas system causes degradation, cleavage or sequence alteration of the target RNA.

In a preferred embodiment, the target RNA is an mRNA or an ncRNA, the latter including non-coding RNAs selected from lncRNA, miRNA, misc_RNA, Mt_rRNA, Mt_tRNA, rRNA, scaRNA, scRNA, snoRNA, snRNA and sRNA.

In another aspect of the present disclosure, a cell is provided that comprises the above Cas13 protein, nucleic acid molecule, expression vector, delivery system or CRISPR-Cas system.

In a preferred embodiment, the cell is a prokaryotic or eukaryotic cell, preferably a human cell.

In another aspect of the present disclosure, a method is provided for degrading or cleaving a target RNA in a cell of interest, or for modifying the sequence of a target RNA in a cell of interest, the method comprising using the above Cas13 protein, nucleic acid molecule, expression vector, delivery vehicle or CRISPR-Cas system.

In a preferred embodiment, the cell of interest is a prokaryotic or eukaryotic cell, preferably a human cell.

In a preferred embodiment, the cell of interest is an ex vivo, in vitro or in vivo cell.

In another aspect of the present disclosure, use of the above Cas13 protein, nucleic acid molecule or CRISPR-Cas system for detecting nucleic acid molecules is provided.

In a preferred embodiment, the detection target is RNA or DNA, wherein the RNA or DNA is RNA or DNA in a prokaryotic microorganism or a eukaryotic organism.

In another aspect of the present disclosure, the prokaryotic microorganism is a DNA virus or its nucleic acid, or an RNA virus or its nucleic acid.

In another aspect of the present disclosure, the eukaryotic organism includes animals and plants, preferably humans; the RNA or DNA in the organism includes RNA or DNA in cells or in body fluids.

In another aspect of the present disclosure, the body fluid includes blood, urine, lymph and the like.

Description of drawings

Figure 1 shows the predicted RNA secondary structure of the DR sequences of the candidate Cas13 protein DZ109. DZ109a denotes DR1 and DZ109b denotes DR2 (see Table 1 for the sequences).

Figure 2A shows the results of verifying the RNase activity of the candidate protein DZ109 at the cellular level, by testing the cleavage activity of the candidate protein in a mammalian cell line. The upper panel shows fluorescence images taken at 10× magnification 24 h after co-transfection of the 293T cell line with a plasmid encoding the DZ109 protein (and carrying the corresponding sgRNA targeting mCherry) and a plasmid encoding the mCherry protein. Compared with the negative control group, the candidate protein DZ109 shows strong RNase activity and also strong collateral cleavage activity, with both green and red fluorescence greatly reduced. "White light" denotes the cells in the bright field; "Green light" denotes the image under green fluorescence; "Red light" denotes the image under red fluorescence; Negative-R1 is the negative control group; PS394 to PS396 are the identifiers of the sgRNAs targeting different regions of mCherry, and the DR used is DZ109a. R1 and R2 denote two experimental replicates.

Figure 2B shows the results of verifying the RNase activity of the candidate protein DZ109 at the cellular level, by testing the cleavage activity of the candidate protein in a mammalian cell line. The upper panel shows fluorescence images taken at 10× magnification 24 h after co-transfection of the 293T cell line with a plasmid encoding the DZ109 protein (and carrying the corresponding sgRNA targeting mCherry) and a plasmid encoding the mCherry protein. Compared with the negative control group, when the DZ109b sequence is used as the DR, DZ109 shows strong RNase and collateral cleavage activity only with sgRNA PS397, with both green and red fluorescence greatly reduced; the other sgRNAs have no effect. "White light" denotes the cells in the bright field; "Green light" denotes the image under green fluorescence; "Red light" denotes the image under red fluorescence; Negative-R1 is the negative control group; PS397 to PS399 are the identifiers of the sgRNAs targeting different regions of mCherry, and the DR used is DZ109b. R1 and R2 denote two experimental replicates.

Figure 2C shows flow cytometry results verifying the RNase activity of DZ109 at the cellular level, by testing the cleavage activity of the candidate protein in a mammalian cell line. The upper panel shows the flow cytometry results 48 h after co-transfection of the 293T cell line with a plasmid encoding the DZ109 protein (and carrying the corresponding sgRNA targeting mCherry) and a plasmid encoding the mCherry protein. Compared with the negative control group, the candidate protein DZ109 shows strong RNase activity: the main red/green double-positive population shifts markedly, corresponding to a strong knockdown of red fluorescence. The negative control is the control group expressing only the mCherry protein (red fluorescence) and the DZ109 protein (green fluorescence); PS394 to PS396 are the experimental groups containing sgRNAs targeting different regions of mCherry, and the DR used is DZ109a. R1 and R2 denote two experimental replicates.

Figure 2D shows flow cytometry results verifying the RNase activity of DZ109 at the cellular level, by testing the cleavage activity of the candidate protein in a mammalian cell line. The upper panel shows the flow cytometry results 48 h after co-transfection of the 293T cell line with a plasmid encoding the DZ109 protein (and carrying the corresponding sgRNA targeting mCherry) and a plasmid encoding the mCherry protein. Compared with the negative control group, the candidate protein DZ109 shows strong RNase activity with sgRNA PS397: the main red/green double-positive population shifts very markedly, corresponding to a strong knockdown of both red and green fluorescence. The negative control is the control group expressing only the mCherry protein (red fluorescence) and the DZ109 protein (green fluorescence); PS397 to PS399 are the experimental groups containing sgRNAs targeting different regions of mCherry, and the DR used is DZ109b. R1 and R2 denote two experimental replicates.

Figure 3 shows the results for the Cas13d[xdz9] positive control, obtained by testing cleavage activity in a mammalian cell line. The upper panel shows fluorescence images taken at 10× magnification 24 h after co-transfection of the 293T cell line with a plasmid encoding the Cas13d protein (and carrying the corresponding sgRNA targeting mCherry) and a plasmid encoding the mCherry protein. "White light" denotes the cells in the bright field; "Green light" denotes the image under green fluorescence; "Red light" denotes the image under red fluorescence; px262 is the negative control group and px261 is the experimental group.

Figure 4 shows schematic diagrams of the candidate protein DZ109: Figure 4A shows the positions of the extended HEPN domains, and Figure 4B shows the candidate protein and its adjacent CRISPR array.

Detailed description

Embodiments of the present invention are described in detail below with reference to examples, but those skilled in the art will understand that the following examples only illustrate the invention and should not be regarded as limiting its scope. Where specific conditions are not indicated in the examples, conventional conditions or the conditions recommended by the manufacturer were used. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products.

As used in the specification, a noun without a numeral modifier may mean one or more. As used in the claims, when used in conjunction with the word "comprising/including", a noun without a numeral modifier may mean one or more than one.

The term "or" is used in the claims to mean "and/or" unless it is explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the present disclosure supports a definition that refers to alternatives only as well as to "and/or". As used herein, "another" may mean at least a second or more.

Throughout this application, the term "about" is used to indicate that a value includes the error of the device, the inherent variation of the method used to determine the value, or the inherent variation that exists among study subjects. Such inherent variation may be a variation of ±10% of the stated value.
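
As a small worked example of the ±10% reading of "about", the following Python sketch computes the interval around a stated value. The function names are hypothetical, and the 10% default is only the variation mentioned above as one possibility.

```python
def about_range(stated, tolerance=0.10):
    """Return the (low, high) interval covered by 'about <stated>'."""
    delta = abs(stated) * tolerance
    return stated - delta, stated + delta


def is_about(measured, stated, tolerance=0.10):
    """True if `measured` lies within the tolerance band around `stated`."""
    low, high = about_range(stated, tolerance)
    return low <= measured <= high
```

Under this reading, "about 30 nucleotides" would cover lengths of 27 to 33 nucleotides.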

Throughout this application, unless otherwise stated, nucleotide sequences are listed in the 5' to 3' direction and amino acid sequences are listed in the N-terminal to C-terminal direction.

Other objects, features and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating some preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.

Definitions

NCBI (https://www.ncbi.nlm.nih.gov/) is the US National Center for Biotechnology Information, which hosts public databases open to the whole world. Those skilled in the art can use its nucleic acid databases to download prokaryotic genome and proteome data, and can use the BLAST alignment software it provides for sequence alignment analysis.

IMG (https://img.jgi.doe.gov/) is the Integrated Microbial Genomes database, a representative of the new generation of genome databases. It not only incorporates the complete contents of existing databases but also provides more comprehensive data upload, annotation and analysis services, storing sequencing data in the IMG/M database. From it, one can download data on sequenced genomes of pure bacterial cultures, metagenomes, metagenome-assembled genomes and single-cell sequenced genomes.

CRISPR (clustered regularly interspaced short palindromic repeats) is a stretch of DNA sequence found in prokaryotes, mainly bacteria and archaea, comprising direct repeat (DR) regions and non-repetitive spacer regions. Besides the CRISPR array, a CRISPR system also includes the associated Cas proteins. Together they constitute the immune system with which bacteria resist invasion by foreign viruses.

The Cas13 family is the currently known family of CRISPR enzymes capable of targeting RNA; its members include the Cas13a, Cas13b, Cas13c, Cas13d, Cas13X and Cas13Y families. Unlike CRISPR/Cas9, which cleaves DNA, CRISPR/Cas13 can be used to cleave specific RNA sequences in bacterial cells.

Collateral cleavage, also called bystander cleavage activity, in the CRISPR-Cas13 family usually refers to the non-specific cleavage activity of the CRISPR-Cas system: when the CRISPR-Cas13 protein, bound to its sgRNA, acts on the target region, it undergoes a conformational change and becomes a non-specific RNase that can cleave not only the target nucleic acid but also nearby nucleic acid molecules. Reported examples such as Cas13a and Cas13b show very strong bystander RNase activity.

The HEPN domain (short for higher eukaryotes and prokaryotes nucleotide-binding domain) is the key domain through which the Cas13 protein of the CRISPR-Cas13 system cleaves and defends against invading foreign nucleic acids.

ABE is short for adenine base editor, a purine base conversion technology that achieves single-base changes from A/T to G/C. The most commonly used enzyme is ADAR (adenosine deaminase acting on RNA). It works mainly by deaminating adenosine to inosine, which is read as G in DNA or RNA, thereby achieving the A/T to G/C mutation. Because cells are insensitive to excision repair of inosine, this kind of mutation maintains high product purity.
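
The consequence of an A-to-I edit for how an RNA is read can be illustrated with a short Python sketch. It only models the decoding step (inosine read as guanosine) using the standard genetic code; it does not model the deaminase, guide design or editing efficiency. The function names and the toy transcript are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
_BASES = "UCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def edit_a_to_i(rna, position):
    """Replace the adenosine at `position` (0-based) with inosine, written 'I'."""
    if rna[position] != "A":
        raise ValueError("A-to-I editing requires an adenosine at the position")
    return rna[:position] + "I" + rna[position + 1:]


def translate(rna):
    """Translate complete codons, reading inosine as guanosine."""
    read = rna.replace("I", "G")
    usable = len(read) - len(read) % 3
    return "".join(CODON_TABLE[read[i:i + 3]] for i in range(0, usable, 3))


# Toy transcript: AUG UAG GCU, with a premature stop codon (UAG) in the middle.
toy_rna = "AUGUAGGCU"
edited = edit_a_to_i(toy_rna, 4)  # UAG becomes UIG, which is read as UGG (Trp)
print(translate(toy_rna), translate(edited))
```

In this toy case the edit turns a stop codon into a tryptophan codon, which is the kind of single-base change the definition above describes.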

CBE is short for cytidine base editor, a pyrimidine base conversion technology. The currently available tools are BE1, BE2 and BE3, of which BE3 is the most efficient and is therefore widely used in gene therapy, animal model generation, functional gene screening and other fields.

Eukaryotic cells include, for example, mammalian cells, including human cells (human primary cells or established human cell lines). The cells may be non-human mammalian cells, for example from non-human primates (e.g., monkeys), cows/bulls/cattle, sheep, goats, pigs, horses, dogs, cats, rodents (e.g., rabbits, mice, rats, hamsters) and the like. The cells may be from fish (e.g., salmon), birds (e.g., poultry, including chickens, ducks and geese), reptiles, shellfish (e.g., oysters, clams, lobsters, shrimps), insects, worms, yeast and the like. The cells may be from a plant, such as a monocot or a dicot. The plant may be a food crop, such as barley, cassava, cotton, peanut, corn, millet, oil palm fruit, potato, beans, rapeseed or canola, rice, rye, sorghum, soybean, sugar cane, sugar beet, sunflower or wheat. The plant may be a cereal (e.g., barley, corn, millet, rice, rye, sorghum or wheat). The plant may be a tuber crop (e.g., cassava or potato). In some embodiments, the plant may be a sugar crop (e.g., sugar beet or sugar cane). The plant may be an oil crop (e.g., soybean, peanut, rapeseed or canola, sunflower or oil palm fruit). The plant may be a fiber crop (e.g., cotton). The plant may be a tree, such as a peach or nectarine tree, an apple tree, a pear tree, an apricot tree, a walnut tree, a pistachio tree or a citrus tree (e.g., an orange, grapefruit or lemon tree), or a grass, a vegetable, a fruit or an alga. The plant may be a plant of the genus Solanum, Brassica, Lactuca, Spinacia or Capsicum, or cotton, tobacco, asparagus, carrot, cabbage, broccoli, cauliflower, tomato, eggplant, pepper, lettuce, spinach, strawberry, blueberry, raspberry, blackberry, grape, coffee, cocoa and the like.

CRISPR system

CRISPR (clustered regularly interspaced short palindromic repeats)/Cas13 (CRISPR-associated protein 13)-mediated RNA editing is becoming a promising tool for disease diagnosis and treatment, plant breeding and other applications.

CRISPRs are DNA loci containing short repeats of base sequences. Each repeat is followed by a short segment of "spacer DNA" derived from a previous exposure to a virus. CRISPRs are found in approximately 40% of sequenced eubacterial genomes and 90% of sequenced archaea. CRISPRs are often associated with Cas genes, which encode CRISPR-associated proteins. The CRISPR/Cas system is a prokaryotic immune system that confers resistance to foreign genetic elements such as plasmids and phages and provides a form of acquired immunity. CRISPR spacers recognize and silence these exogenous genetic elements in a manner analogous to RNAi in eukaryotic organisms.

CRISPR repeats range from 24 to 48 base pairs in size. They usually show some dyad symmetry, which implies the formation of a secondary structure such as a hairpin, but they are not truly palindromic. Repeats are separated by spacers of similar length. Some CRISPR spacer sequences exactly match sequences from plasmids and phages, although some spacers match the prokaryote's own genome. New spacers can be added rapidly in response to phage infection.
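The distinction between dyad symmetry and a true palindrome can be illustrated with a short sketch. The helper names below are hypothetical and are not part of the disclosed method; counting the bases at the 5' end that can pair with the 3' end is only a crude proxy for a hairpin stem, not a structure prediction. The example sequence is the DZ109a repeat listed in Table 1.

```python
# Illustrative sketch only: helper names are assumptions, not part of the disclosure.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_true_palindrome(seq: str) -> bool:
    """A true (reverse-complement) palindrome equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def terminal_stem_length(seq: str) -> int:
    """Count 5'-terminal bases that can base-pair with the 3' end.

    Base i pairs with base -1-i exactly when seq[i] equals the i-th base
    of the reverse complement, so the two strings are compared position
    by position from the start, up to half the sequence length.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    stem = 0
    while stem < len(seq) // 2 and seq[stem] == rc[stem]:
        stem += 1
    return stem


# DZ109a direct repeat as listed in Table 1 (SEQ ID No. 199).
DZ109A = "GTTGTGTATGCCCTATATTTGTAGGGTTGAAACAAC"

if __name__ == "__main__":
    print(reverse_complement(DZ109A))
    print(is_true_palindrome(DZ109A), terminal_stem_length(DZ109A))
```

Under these assumptions the repeat has complementary ends without being identical to its own reverse complement, which is the situation the paragraph above describes. The reverse complement of DZ109a also coincides with the DZ109b entry of Table 1, so the two entries appear to be the same repeat read on opposite strands.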

Guide RNA (gRNA). As an RNA-guided protein, Cas13 requires a short RNA to direct recognition of its RNA target.

Nuclease

Cas nucleases. CRISPR-associated (Cas) genes are often associated with CRISPR repeat-spacer arrays. As of 2013, more than forty different Cas protein families had been described. Of these protein families, Cas1 appears to be ubiquitous among different CRISPR/Cas systems. Particular combinations of Cas genes and repeat structures have been used to define 8 CRISPR subtypes (Ecoli, Ypest, Nmeni, Dvulg, Tneap, Hmari, Apern and Mtube), some of which are associated with an additional gene module encoding repeat-associated mysterious proteins (RAMPs). More than one CRISPR subtype may occur in a single genome. The sporadic distribution of the CRISPR/Cas subtypes suggests that the system has undergone horizontal gene transfer during microbial evolution.

Foreign DNA is apparently processed by proteins encoded by Cas genes into small elements (about 30 base pairs in length), which are then somehow inserted into the CRISPR locus near the leader sequence. RNAs from the CRISPR loci are constitutively expressed and are processed by Cas proteins into small RNAs composed of individual exogenously derived sequence elements with a flanking repeat sequence. The RNAs guide other Cas proteins to silence exogenous genetic elements at the RNA or DNA level. Evidence suggests functional diversity among CRISPR subtypes. The Cse (Cas subtype Ecoli) proteins (called CasA-E in E. coli) form a functional complex, Cascade, that processes CRISPR RNA transcripts into spacer-repeat units that Cascade retains. In other prokaryotes, Cas6 processes the CRISPR transcripts. Interestingly, CRISPR-based phage inactivation in E. coli requires Cascade and Cas3, but not Cas1 and Cas2. The Cmr (Cas RAMP module) proteins found in Pyrococcus furiosus and other prokaryotes form a functional complex with small CRISPR RNAs that recognizes and cleaves complementary target RNAs. RNA-guided CRISPR enzymes are classified as type V restriction enzymes.

Examples

Example 1: De novo screening of novel Cas13 proteins

We also carried out a de novo search for additional members of the CRISPR-Cas13 family. Briefly, the analysis pipeline has two parts. The first is the identification of CRISPR array regions: we downloaded all bacterial and archaeal genome sequences and metagenome sequences available from NCBI and IMG as of February 2021 and identified CRISPR array regions with CRISPR array identification software (such as Pilercr). The second is the search for Cas-related proteins near each array: the 6 proteins immediately upstream and the 6 proteins immediately downstream of the region, 12 proteins in total, were subjected to target domain analysis. See Table 2 for the amino acid sequence numbers, extended HEPN domains, coordinates and other information for the final candidate proteins.
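The second step, collecting the 6 proteins on each side of a CRISPR array, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Gene` record, the coordinate convention and the function name are hypothetical, genes overlapping the array are simply skipped, and the disclosure does not specify how the authors implemented this step.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical record for one predicted protein-coding gene on a contig."""
    name: str
    start: int  # 1-based start coordinate on the contig
    end: int    # inclusive end coordinate on the contig


def flanking_proteins(genes: List[Gene], array_start: int, array_end: int,
                      k: int = 6) -> List[Gene]:
    """Return up to k genes upstream and up to k genes downstream of an array.

    Genes overlapping the array are ignored (an assumption of this sketch).
    Fewer than 2*k genes are returned when the array lies near a contig end.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    upstream = [g for g in ordered if g.end < array_start][-k:]
    downstream = [g for g in ordered if g.start > array_end][:k]
    return upstream + downstream


if __name__ == "__main__":
    # Toy contig: 20 genes of 91 bp spaced every 100 bp, array at 1095..1195.
    contig = [Gene(f"p{i}", i * 100, i * 100 + 90) for i in range(20)]
    for gene in flanking_proteins(contig, 1095, 1195):
        print(gene.name, gene.start, gene.end)
```

Each protein returned by such a window would then be passed to the domain analysis described in the next paragraph.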

In this screening system, the HEPN domain definition covers not only the RxxxxH (R4xH) motif found in previously known Cas13 family members (which we call the early HEPN domain) but is also extended to include other domains with RNA cleavage activity, RxxxxxH (R5xH) and RxxxxxxH (R6xH). We refer to RxxxxH (R4xH), RxxxxxH (R5xH) and RxxxxxxH (R6xH) collectively as the extended HEPN domain. The conserved amino acid adjacent to R is preferably N, Q, H or D, giving combinations such as R[NDQH]xxxH, R[NDQH]xxxxH and R[NDQH]xxxxxH. R4xH is preferably R[NQH]xxxH, while R5xH and R6xH are preferably R[NDQH]xxxxH and R[NQDH]xxxxxH. Here x represents any amino acid, and N, D, Q and H in square brackets are the preferred conserved amino acids.
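The preferred motif definitions above translate directly into regular expressions. The sketch below is an illustration only: the dictionary, function name and toy protein are hypothetical, and only the preferred forms R[NQH]xxxH, R[NDQH]xxxxH and R[NDQH]xxxxxH are encoded. A lookahead is used so that overlapping motifs are all reported.

```python
import re
from typing import List, Tuple

# Preferred extended HEPN motifs from the text; x (any amino acid) becomes ".".
# The lookahead (?=(...)) lets overlapping occurrences all be found.
EXTENDED_HEPN = {
    "R4xH": re.compile(r"(?=(R[NQH].{3}H))"),
    "R5xH": re.compile(r"(?=(R[NDQH].{4}H))"),
    "R6xH": re.compile(r"(?=(R[NDQH].{5}H))"),
}


def find_hepn_motifs(protein: str) -> List[Tuple[str, int, str]]:
    """Return (motif class, 1-based start position, matched residues),
    sorted by position along the protein."""
    protein = protein.upper()
    hits = []
    for name, pattern in EXTENDED_HEPN.items():
        for match in pattern.finditer(protein):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy sequence (not a real Cas13) containing one motif of each class.
    toy = "MKRNAAAHGGRDAAAAHLLRQAAAAAHK"
    for hit in find_hepn_motifs(toy):
        print(hit)
```

A candidate would typically be expected to carry more than one such motif, so a downstream filter could require a minimum number of hits per protein; that threshold is not stated in the text and would be a further assumption.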

Example 2: Functional validation of novel candidate Cas13 proteins

Candidate proteins identified in the screen are then functionally validated. Briefly, the nucleic acid sequence encoding the candidate protein, the DR sequence and the target spacer sequence are first synthesized commercially and cloned into an expression plasmid to construct the corresponding plasmid. The plasmid is transformed into DH5α E. coli competent cells for amplification, extracted, and then transfected into a human 293T cell line (which expresses red fluorescence). Corresponding negative and positive control groups are also included to further confirm the cleavage activity of the candidate protein. 48 h after co-transfection of the plasmids, flow cytometry and other assays are performed to determine the RNA cleavage activity of the candidate protein.

Following this strategy, we randomly selected DZ109 for validation. As shown in Figures 2 and 3, the fluorescence images and flow cytometry results indicate that DZ109 has strong RNase activity.

Example 3: RNA detection function of novel candidate Cas13 proteins

Given the very strong non-specific bystander RNase activity of the candidate Cas13 proteins, they have potential application in RNA detection, for example of RNA viruses or tumor signal RNA molecules. Briefly, a CRISPR-Cas system capable of cleaving the target nucleic acid to be detected is constructed (it may, for example, be provided as a test strip or coated on a delivery vehicle). The system comprises the candidate CRISPR-Cas13 protein, an sgRNA (targeting the viral RNA to be detected) and a reporter molecule (such as a fluorescent RNA reporter). When the system binds the target RNA, the bystander (collateral) RNase activity of the candidate Cas13 protein is activated and goes on to cleave the reporter molecule, which then emits a signal such as fluorescence. The signal can be received by a detection instrument, converted into an electrical signal and read out, thereby achieving detection of the target nucleic acid; further integration of a machine learning model would also allow quantification and prediction of the target nucleic acid. The system can therefore be widely applied to virus detection, such as detection of the novel coronavirus, and to non-invasive diagnosis of diseases (such as tumors), for example by liquid biopsy.

Example 4: Validation of the base editing function of novel candidate Cas13 proteins

Two main systems are currently used for single-base editing: the ABE system and the CBE system. Briefly, the cleavage domain (extended HEPN domain) of a candidate Cas13 protein is mutated to obtain a candidate dCas13 protein that binds RNA but has no cleavage activity. The dCas13 is then fused to an ADAR enzyme sequence to construct the plasmid of an ABE single-base editing system. An sgRNA directing site-specific base conversion in a chosen sequence, such as a transcript of the TP53 gene, is designed and the corresponding plasmid vector is constructed. The plasmids are co-transfected into the human 293T cell line, and 48 hours later the co-transfected cells are obtained by flow cytometric sorting. RNA transcripts are then extracted, libraries are constructed and deep sequencing is performed. After sequencing, the mutation status of the TP53 transcripts is analyzed bioinformatically to evaluate the single-base editing efficiency of the ABE system. The sgRNA can then be optimized iteratively to build the best single-base editing system for the target region.
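The final bioinformatic step can be illustrated with a minimal sketch. ADAR deaminates adenosine to inosine, which is read as G during sequencing, so one simple measure of editing efficiency is the fraction of G among the A and G calls at the targeted position. The function name, the input format (reads already aligned so that a fixed index corresponds to the target site) and the choice of denominator are assumptions of this sketch; the text does not state which metric or software was used.

```python
from collections import Counter
from typing import Iterable


def a_to_i_editing_rate(aligned_reads: Iterable[str], target_index: int) -> float:
    """Fraction of reads showing G (edited) among reads showing A or G
    at the target position; returns 0.0 when no informative read exists.

    aligned_reads: read sequences already placed in transcript coordinates,
    so that aligned_reads[i][target_index] is the base at the target site.
    """
    calls = Counter(
        read[target_index].upper()
        for read in aligned_reads
        if len(read) > target_index
    )
    unedited, edited = calls["A"], calls["G"]
    total = unedited + edited
    return edited / total if total else 0.0


if __name__ == "__main__":
    # Toy data: five reads covering the site, three of them edited (A read as G).
    reads = ["CCAGT", "CCGGT", "CCAGT", "CCGGT", "CCGGT"]
    print(a_to_i_editing_rate(reads, 2))
```

Comparing this rate between sgRNA designs is one way the iterative sgRNA optimization described above could be scored.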

Example 5: Homology analysis of candidate Cas13 proteins and known Cas13 proteins

The analysis rests on the principle that the higher the coverage of an unknown protein on a known protein and the higher the similarity, the closer the homology between the two. For the candidate proteins obtained in the screen, we first downloaded the protein sequences of Cas13a, b, c, d, x(e) and y(f) from the NCBI database and the patent literature, merged them with our own data and built a local blastp index. The candidate protein sequences were then aligned against this local blastp index. Where the identity between proteins was below 20%, or no alignment to the local index was found, the identity was uniformly recorded as 20%; similarly, where the coverage was below 5%, or no alignment was found, the coverage was recorded as 1%. The novel Cas13 proteins identified by the method of the present invention show extremely low homology to the known Cas13 proteins of every family. For example, DZ109, DZ110, DZ140, DZ159, DZ163, DZ183, DZ264, DZ280 and others each share less than 20% homology with all currently known Cas13 classes.
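The recording rule for weak or missing alignments can be written down compactly. The sketch below only encodes the thresholds stated above (identity below 20% or no hit recorded as 20%; coverage below 5% or no hit recorded as 1%); the function name and the use of `None` for "no alignment" are hypothetical choices, and parsing of the actual blastp output is left out.

```python
from typing import Optional, Tuple

IDENTITY_FLOOR = 20.0     # percent; recorded value when identity < 20% or no hit
COVERAGE_CUTOFF = 5.0     # percent; below this the coverage is replaced
COVERAGE_RECORDED = 1.0   # percent; recorded value when coverage < 5% or no hit


def record_hit(identity: Optional[float],
               coverage: Optional[float]) -> Tuple[float, float]:
    """Apply the recording rule to one candidate-versus-known-Cas13 comparison.

    identity, coverage: percentages from the best alignment, or None when
    the candidate could not be aligned to the local index at all.
    """
    if identity is None or identity < IDENTITY_FLOOR:
        identity = IDENTITY_FLOOR
    if coverage is None or coverage < COVERAGE_CUTOFF:
        coverage = COVERAGE_RECORDED
    return identity, coverage


if __name__ == "__main__":
    print(record_hit(None, None))   # candidate with no alignment
    print(record_hit(12.0, 3.0))    # weak alignment
    print(record_hit(35.5, 80.0))   # ordinary alignment is left unchanged
```

One consequence of this rule is that a recorded identity of 20% is an upper bound rather than a measurement, which is consistent with the statement that the listed candidates share less than 20% homology with known Cas13 classes.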

The DR sequences of the candidate Cas13 proteins are shown in Table 1 below.

Table 1. DR sequences of candidate Cas13 proteins

SEQ ID No.SEQ ID No. DR-IDDR-ID DR-SEQDR-SEQ 199199 DZ109aDZ109a GTTGTGTATGCCCTATATTTGTAGGGTTGAAACAACGTTGTGTATGCCCTATATTTGTAGGGTTGAAACAAC 200200 DZ109bDZ109b GTTGTTTCAACCCTACAAATATAGGGCATACACAACGTTGTTTCAACCCTACAAATAGGGCATACACAAC

201201 DZ110DZ110 GGTGTGAATGCCCTTGTTTTGAAGGGTTAATACATCGGTGTGAATGCCCTTGTTTTGAAGGGTTAATACATC 202202 DZ111DZ111 GCTGGATTCATCTGTATTTTTGCAGGTATTCACAGCGCTGGATTCATCTGTATTTTTGCAGGTATTCACAGC 203203 DZ112DZ112 GCTGTGTTTACTTACCAAAATGCAAGTGTTACCAGCGCTGTGTTTACTTACCAAAATGCAAGTGTTACCAGC 204204 DZ113DZ113 GTTGTAACTGCCTTTGTTTTGAAAGGTAAAAACAACGTTGTAACTGCCTTTGTTTTGAAAGGTAAAAACAAC 205205 DZ114DZ114 GTTGTAACTGCTATTACTTTGAATAGAGAAAACAACGTTGTAACTGCTATTACTTTGAATAGAGAAAAACAAC 206206 DZ119DZ119 GTTGTGATTGCTCTAATTTTATAGGGCGCCAACAACGTTGTGATTGCTCTAATTTTAGGGCGCCAACAAC 207207 DZ120DZ120 GCTGGAGCAGCCCTCGATTTGCTGGGTAATCACAGCGCTGGAGCAGCCCTCGATTTGCTGGGTAATCACAGC 208208 DZ121DZ121 GCTGGAGCAGCCCTCGATTTGCAGGGTTATCACAGCGCTGGAGCAGCCCTCGATTTGCAGGGTTATCACAGC 209209 DZ123DZ123 GAGATAGACCCTTGTTAACTCGTAAGGTTCTGTGACTGAGATAGACCCTTGTTAACTCGTAAGGTTCTGTGACT 210210 DZ124DZ124 GAACTATAATCCTGTAACTGAACAGGATTCTGAAAGAACTATAATCCTGTAACTGAACAGGATTCTGAAA 211211 DZ126DZ126 TGTTTCTACCTTTCAAAACAAAGGCTTGCACACCTGTTTCTACCTTTTCAAAACAAAGGCTTGCACACC 212212 DZ127DZ127 GTGTCAGTCCGCCGTGAAACAGGCGGTCATGCAGAACGTGTCAGTCCGCCGTGAAACAGGCGGTCATGCAGAAC 213213 DZ128DZ128 GCTCTACAGCCCTCTGAAACATGAGGGTTCTGAAACGCTCTACAGCCCTCTGAAACATGAGGGTTCTGAAAC 214214 DZ129DZ129 GCACAAGATAACCCGAAAACAGTAGGGTTCTAAAACGCACAAGATAACCCGAAAACAGTAGGGTTCTAAAAC 215215 DZ130DZ130 GATTGAAAGGATTGTAAATTTACAAGGTCTTAAAACGATTGAAAGGATTGTAAAATTTACAAGGTCTTAAAAC 216216 DZ131DZ131 GTACTAATGCCCTACAGAAATGCAGGGTTCTAAAACGTACTAATGCCCTACAGAAATGCAGGGTTCTAAAAC 217217 DZ132DZ132 GAACTACTACCTCAATGAATGTTGGGGTTCAGAAACTGAACTACTACCTCAATGAATGTTGGGGTTCAGAAACT 218218 DZ133DZ133 GTGGAGAACCCGATATAGTGGGTACTAGAGGTGGAGAACCCGATATAGTGGGTACTAGAG 219219 DZ134DZ134 ACTTAAATCCCCCTGTAAATGCGGGGGTTCTAAAACACTTAAAATCCCCCTGTAAATGCGGGGGTTCTAAAAC 220220 DZ135DZ135 GGTGGAAAAGCCTTTGATTTGAAAGGTAAAAGCACCGGTGGAAAAGCCTTTGATTTGAAAGGTAAAAGCACC 221221 DZ136DZ136 GATTGAAAGCTATGCGAATTTGCACAGTCTTAAAACGATTGAAAGCTATGCGAATTTGCACAGTCTTAAAAC 222222 DZ137DZ137 CAAGTAAACCCCTACCAACTGGTAGGGGTCTGAAACCAAGTAAACCCCTACCAACTGGTAGGGGTCTGAAAC 223223 
DZ138DZ138 GGTGTAGTGCTCCCTTATTTGGAGCTCATCTCCGGCGGTGTAGTGCTCCCTTATTTGGAGCTCATCTCCGGC 224224 DZ139DZ139 GAACTACACCCGTGCAAAAATGCAGGGGTCTAAAACGAACTACACCCGTGCAAAAATGCAGGGGTCTAAAAC 225225 DZ140DZ140 GCTGTAAATGCCTTTCAAAATGAGGCCTTTGCCAGCGCTGTAAATGCCTTTCAAAATGAGGCCTTTGCCAGC 226226 DZ141DZ141 GTACAATAGCCCGTTGAAATAGATGGGTTCTAAGACTGTACAATAGCCCGTTGAAATAGATGGGTTCTAAGACT 227227 DZ142DZ142 CTTGAAGATGACCTCCAATTAAGGGAGGACTGCACTCTTGAAGATGACCTCCAATTAAGGGAGGACTGCACT 228228 DZ143DZ143 GTTGTGGGAGCCTTTATTTTGTAAGGTATAAACAACGTTGTGGGAGCCTTTATTTTGTAAGGTATAAACAAC

229229 DZ144DZ144 GGTTCAGTCGCCTGTGAAAAGAGGCGTCGTCCTCAACGGTTCAGTCGCCTGTGAAAAGAGGCGTCGTCCTCAAC 230230 DZ145DZ145 GTTGTGAATGCCTTAATTTTGGAAGGTGAGAACAACGTTGTGAATGCCTTAATTTTGGAAGGTGAGAACAAC 231231 DZ146DZ146 GTTGTAAAAGCCTTTAGTTTGTAAGGTAAAAACAACGTTGTAAAAGCCTTTAGTTTGTAAGGTAAAAACAAC 232232 DZ147DZ147 GTTGTGGATGCCTTAACTTTGAAAGGTGAAAACAACGTTGTGGATGCCTTAACTTTGAAAGGTGAAAACAAC 233233 DZ148DZ148 GTCTCAGACCCCGTGTTTTTCAACGGGTTGTTGTTCGTTCCAGACCCCGTGTTTTTCAACGGGTTGTTGTTC 234234 DZ149DZ149 GTTGTGGAAGCCTGACTTTTATTAGGTAAGCACAACGTTGTGGAAGCCTGACTTTTATTAGGTAAGCACAAC 235235 DZ150DZ150 GTTGTAAATGCCTTATATTTGCAAGGTGAAAACAACGTTGTAAATGCCTTATATTTGCAAGGTGAAAACAAC 236236 DZ151DZ151 GTTGTAAATGCCTTATATTTGCAAGGTGAAAACAACGTTGTAAATGCCTTATATTTGCAAGGTGAAAACAAC 237237 DZ152DZ152 GTTGTAAATACCCTATATTTGAAGGGTAATAACAAGTTGTAAATACCCTATATTTGAAGGGTAATAACAA 238238 DZ153DZ153 GTTGTGTTTCCACTTCAAATATAGGGTATTCACAACGTTGTGTTTCCACTTCAAATATAGGGTATTCACAAC 239239 DZ154DZ154 AGCTGGTAACACTAGTATTTTTGCTGGTAATCACAGCAGCTGGTAACACTAGTATTTTTGCTGGTAATCACAGC 240240 DZ155DZ155 GTTGTAGAAGCCCCTTATTTGAAGGGGTAGTTGTAGAAGCCCCTTATTTGAAGGGGTA 241241 DZ156DZ156 GCTGTAGAAGCCTTCGTTTTGGGAGGTAAGTACAGCGCTGTAGAAGCCTTCGTTTTGGGAGGTAAGTACAGC 242242 DZ157DZ157 GTTGTAGCAACCTATATTTTGTTAGGAAAAGACAACGTTGTAGCAACCTATATTTTGTTAGGAAAAGACAAC 243243 DZ158DZ158 GAAGCCCCTATTTTGTGGGGTAGTTACAGCGAAGCCCCTATTTTGTGGGGTAGTTACAGC 244244 DZ159DZ159 TGCTGTAACTACCTGCTGTTTAAGAGGTACATACAGTTGCTGTAACTACCTGCTGTTTAAGAGGTACATACAGT 245245 DZ160DZ160 GTTGTAGAGGCCCTCGATTTGCAGGGTAGGTACAACGTTGTAGAGGCCCTCGATTTGCAGGGTAGGTACAAC 246246 DZ161DZ161 GTTGTGAAAGCCTTATATTTGAAGGGTAAATACAACGTTGTGAAAGCCTTATATTTGAAGGGTAAATACAAC 247247 DZ162DZ162 TTGTAATTGCTACCCAAAATGCAGCATTCAACAACTTGTAATTGCTACCCAAAATGCAGCATTCAACAAC 248248 DZ163DZ163 GGTGTGAATGCCCTTGTTTTGAAGGGTTAATACATCTGGTGTGAATGCCCTTGTTTTGAAGGGTTAATACATCT 249249 DZ164DZ164 GCTGTGGAAGCCTTGACTTTGAAAGGTAGTTACAGCGCTGTGGAAGCCTTGACTTTGAAAGGTAGTTACAGC 250250 DZ165DZ165 GGTGGAAAAGCCTTTGATTTGAAAGGTAAAAGCACCGGTGGAAAAGCCTTTGATTTGAAAGGTAAAAGCACC 251251 DZ166DZ166 
GCTGTTGCAGCCTGCTATTTGGTGGGTATTTACAGCGCTGTTGCAGCCTGCTATTTGGTGGGTATTTACAGC 252252 DZ167DZ167 GTTGTGTTTACCTTTCAAATTAAGGGCAGCCACAACGTTGTGTTTACCTTTCAAATTAAGGGCAGCCACAAC 253253 DZ168DZ168 GTAGAAATGAAGACAAAGCGATAGAGTGCTTAATAACGTAGAAATGAAGACAAAGCGATAGAGTGCTTAATAAC 254254 DZ169DZ169 GTTGTAACTGCTCTTAGTTTGGTTGTAACTGCTCTTAGTTTG 255255 DZ170DZ170 GTTGTCGAAAGCCTTATTTTGAAGGGCTATTACAACGTTGTCGAAAGCCTTATTTTGAAGGGCTATTACAAC 256256 DZ171DZ171 GCTGTTTTTACCTTTCAAATTCAAGGCATTCACAGCGCTGTTTTTACCTTTCAAATTCAAGGCATTCACAGC

257257 DZ172DZ172 GCTGTAGAAGCCGGCACTTTGGCTGGTAATTACAGTGCTGTAGAAGCCGGCACTTTGGCTGGTAATTACAGT 258258 DZ173DZ173 GTTGCATCTGCCTTCTATTTGAGAGGCACAAACAACGTTGCATCTGCCTTTCTATTTGAGAGGCACAAACAAC 259259 DZ174DZ174 GTTTTAGTCCCCTTCGATATTGGGGTGGTCTATATCGTTTTAGTCCCCTTCGATATTGGGGTGGTCTATATC 260260 DZ175DZ175 GTTGTTGAAGACCAAATTTTGAATGGTTATAACAACGTTGTTGAAGACCAAATTTTGAATGGTTATAACAAC 261261 DZ176DZ176 TTGTTTCTTCCTACTAAAGTGCAGGTCTTTACAACTTGTTTCTTCCTACTAAAAGTGCAGGTCTTTACAAC 262262 DZ177DZ177 TTGTAACTGCTCTTATTTTGAAGGGTAAAAACAACTTGTAACTGCCTCTTATTTTGAAGGGTAAAAACAAC 263263 DZ178DZ178 TGCTGTAATTACCCTTCAAAAAGAAGGCTTCCACAGCTGCTGTAATTACCCCTTCAAAAAGAAGGCTTCCACAGC 264264 DZ179DZ179 GTTGTAAGTGCTCTTAATTTGAAGGGTAAACACAACGTTGTAAGTGCTCTTAATTTGAAGGGTAAACACAAC 265265 DZ180DZ180 GGTGTGGAAGCCTTCTTTTTGAAAGGGGTGTGGAAGCCTTCTTTTTGAAAGG 266266 DZ181DZ181 ACTGTAGATGATCCCAAAAGTGAAGGGAACTACTGTAGATGATCCCAAAAGTGAAGGGAACT 267267 DZ182DZ182 GTTGTGTTTACCTTTCAAATTAAGGGCAGCCACAACGTTGTGTTTACCTTTCAAATTAAGGGCAGCCACAAC 268268 DZ183DZ183 TTAGTTGTAACTGCCCTTATTTTGAAGGGTAAACACAACTTTTAGTTGTAACTGCCCTTATTTTGAAGGGTAAACACAACTT 269269 DZ184DZ184 GTTGTCTCTTCCCTCTCAAATGAGGGCTTTTACAACGTTGTCTCTTCCCTCTCAAATGAGGGCTTTTACAAC 270270 DZ185DZ185 GTTGTAACTGCTCTTAGTTTGAAGGGTAAAAACAACGTTGTAACTGCTCTTAGTTTGAAGGGTAAAAACAAC 271271 DZ186DZ186 GTTGTAACTGCTCTTAATTAGAAGGGTAAAAACAACGTTGTAACTGCTCTTAATTAGAAGGGTAAAAACAAC 272272 DZ187DZ187 GTTGTAACAAGCCTAAGTTTGAAAGGTAAAAACAACGTTGTAACAAGCCTAAGTTTGAAAGGTAAAAACAAC 273273 DZ188DZ188 GGTGTAATCAGTTTCTTTTTGAAAGCTATTAACACCAGGTGTAATCAGTTTCTTTTTGAAAGCTATTAACACCA 274274 DZ189DZ189 TTGTAACTGCTCTTATTTTGAATTGTAACTGCCTCTTATTTTGAA 275275 DZ191DZ191 CTATGTTGGGACATACCTGTTTTTGAAAGGTATTTACAACCTATGTTGGGACATACCTGTTTTTGAAAGGTATTTACAAC 276276 DZ192DZ192 TTGAGTTGTTACAGCCTTTGTTTTGAAAGGTATTTACAACTTGAGTTGTTACAGCCTTTGTTTTGAAAGGTATTTACAAC 277277 DZ193DZ193 AAGTTGTCATACCCACTCATATGTGGGCTTCTGCAACAAGTTGTCATACCCACTCATATGTGGGCTTCTGCAAC 278278 DZ194DZ194 GCTGTAACTACATCCCCAAACGGGAGCCTCTACAGCGCTGTAACTACATCCCCAAACGGGAGCCTCTACAGC 279279 DZ195DZ195 
GTTGTAGCTGCCCTTAATTTGAAGGGTAAAAACAACGTTGTAGCTGCCCTTAATTTGAAGGGTAAAAACAAC 280280 DZ196DZ196 GCTGTAAGTACCTTTCAAAATCAAGGCTTCAACAGCGCTGTAAGTACCTTTCAAAAATCAAGGCTTCAACAGC 281281 DZ197DZ197 GTTGTAACTGCCCTTATTTTGATGGGTAAAAACAACGTTGTAACTGCCCCTTATTTTGATGGGTAAAAACAAC 282282 DZ198DZ198 TGATATAGACCACCCCAATATCAAAGGGGACTAAAACTGATATAGACCACCCCAATATCAAAGGGGACTAAAAC 283283 DZ199DZ199 GCTGTAGAGGCCCTGCGTTTGGCAGGTAGGTACAGCGCTGTAGAGGCCCTGCGTTTGGCAGGTAGGTACAGC 284284 DZ200DZ200 GTTGCAGAAGCCCACATGTGAGTGGGTATGACAACGTTGCAGAAGCCCACATGTGAGTGGGTATGACAAC

285285 DZ201DZ201 GTTGTAACTGCCCTTGATTTGAAGGGTAAAAGTTGTAACTGCCCTTGATTTGAAGGGTAAAA 286286 DZ202DZ202 TATTTTGAAGGGTATAAACAACTATTTTGAAGGGTATAAACAAC 287287 DZ203DZ203 GTTGTAACCGTGCTTGATTTTGAAGCGCAATTCCAACGTTGTAACCGTGCTTGATTTTGAAGCGCAATTCCAAC 288288 DZ204DZ204 AGTATTATCTAACCCCTAAATAACGGGGGGCTATAACTAGTATTATCTAACCCCTAAATAACGGGGGGCTATAACT 289289 DZ205DZ205 GGTGTGTCCGCCTTTGATTTGAAAGGTATGTGCACCGGTGTGTCCGCCTTTGATTTGAAAGGTATGTGCACC 290290 DZ206DZ206 GCTGTAACTGCCTTATTTTTGAGAGGTAAATACAGCGCTGTAACTGCCTTATTTTTGAGAGGTAAATACAGC 291291 DZ207DZ207 GTTGTAAATGCCCTTGTTTGAAGGGTAAAAACAACGTTGTAAATGCCCTTGTTTGAAGGGTAAAAACAAC 292292 DZ208DZ208 GTTGTGAATGCTCTTAGTTTGTGGAGTAAAGACAACGTTGTGAATGCTCTTAGTTTGTGGAGTAAAGACAAC 293293 DZ209DZ209 GTCGAAGAAGCCTCCAGTTTGAGGGGTGAGTTTGACGTCGAAGAAGCCTCCAGTTTGAGGGGTGAGTTTGAC 294294 DZ210DZ210 GTTGTAAGTGCCCTTAGTTTGAAAGGTAGAAACAACGTTGTAAGTGCCCCTTAGTTTGAAAGGTAGAAACAAC 295295 DZ211DZ211 GTTGTAACTGCCCTTATTTTAAAGGGTACAAACAACGTTGTAACTGCCCTTATTTTAAAGGGTACAAACAAC 296296 DZ212DZ212 GTTGTAACTGCCCTCGGTTTGGGGGGTGAACACAACGTTGTAACTGCCCTCGGTTTGGGGGGTGAACACAAC 297297 DZ213DZ213 GTTGTAACTGCTCTTGTTTTGAAGGGTAAACACAACGTTGTAACTGCTCTTGTTTTGAAGGGTAAACACAAC 298298 DZ215DZ215 GCTGTTATTACCTTTCAAATCAAAGGCATACACAGCGCTGTTATTACCTTTTCAAATCAAAGGCATACACAGC 299299 DZ216DZ216 GTTGTAAATGCCTTATATTTGTAAGGTGAAAACAACGTTGTAAATGCCTTATATTTGTAAGGTGAAAACAAC 300300 DZ217DZ217 GTTGTAGCTGCCCTTATTTTGAAGGGTAAACACAACGTTGTAGCTGCCCCTTATTTTGAAGGGTAAACACAAC 301301 DZ218DZ218 AGGGTTTTGAAGCCCTCTATGCTGAGAGGGTTGTAAACAGGGTTTTGAAGCCCTCTATGCTGAGAGGGTTGTAAAC 302302 DZ219DZ219 TTGTTGTAGCTGCTCTTTGTTTGGAGGGTAAAAACAACTTGTTGTAGCTGCTCTTTGTTTGGAGGGTAAAAACAAC 303303 DZ220DZ220 GATATAGACCACCCCAATATCGAAGGGGACTAAAACGATATAGACCACCCCAATATCGAAGGGGACTAAAAC 304304 DZ221DZ221 GTTGGGACGTATCACAATTTGAAAGGTACTCACAACGTTGGGACGTATCACAATTTGAAAGGTACTCACAAC 305305 DZ222DZ222 GCTGTGTATGCCTTTGATTTGAAAGGTAATAACAGCGCTGTGTATGCCTTTGATTTGAAAGGTAATAACAGC 306306 DZ223DZ223 GTTGTTACAGCCCTATATTTGAAGGGTACTCACAACGTTGTTACAGCCCTATATTTGAAGGGTACTCACAAC 307307 DZ224DZ224 
ATTACAACCCCTATATTCACGGGGACTATAACATTACAACCCCTATATTCACGGGGACTATAAC 308308 DZ225DZ225 GTTGTTACTGCCCTTATTTTGAAGGGTAAACACAACGTTGTTACTGCCCCTTATTTTGAAGGGTAAACACAAC 309309 DZ226DZ226 GTTGTAGCTGCCTTTATTTTGAAAGGTAAAAACAACGTTGTAGCTGCCCTTTATTTTGAAAGGTAAAAACAAC 310310 DZ227DZ227 GCTGTAGAATCCCTTATTTTGAAGGGTAGGAACAGCGCTGTAGAATCCCTTATTTTGAAGGGTAGGAACAGC 311311 DZ228DZ228 GTTGTAACTGCCCTTATTTTGAAGGGTATAAACAACAGTTGTAACTGCCCCTTATTTTGAAGGGTATAAACAACA 312312 DZ229DZ229 GGTGTAGTGCTCCCTCATTTGGAGCTCATCTTTAGCGGTGTAGTGCTCCCTCATTTGGAGCTCATCTTTAGC

313313 DZ230DZ230 GTTGTAACTGCCCTTATTTTGAAGGGTAAACACAACGTTGTAACTGCCCCTTATTTTGAAGGGTAAACACAAC 314314 DZ231DZ231 GATATAGACTACCCCAATATCGAAGGGGACTAAAACGATATAGACTACCCCAATATCGAAGGGGACTAAAAC 315315 DZ232DZ232 GAGAAACATCACCCCCAAATGGAGGGGGACTGCACCGAGAAACATCACCCCCAAATGGAGGGGGACTGCACC 316316 DZ233DZ233 GTTGTAAGAACCTTGCAAATATAAGGCATTTACAACGTTGTAAGAACCTTGCAAATATAAGGCATTTACAAC 317317 DZ234DZ234 GTTGTTACAGTCCTTAATTTGAAGGGTAAACACAACGTTGTTACAGTCCTTAATTTGAAGGGTAAACACAAC 318318 DZ235DZ235 GTTGTTATTACCCTCCAAACTAAGAGCCTTTACAACGTTGTTATTACCTCTCCAAACTAAGAGCCTTTACAAC 319319 DZ236DZ236 GTTGTAAAGGCTCTTAGTTTGGAGGGTAATAACAACGTTGTAAAGGCTCTTAGTTTGGAGGGTAATAACAAC 320320 DZ237DZ237 GTTGTAAATACCCTATATTTGAAGGGTAATAACAACGTTGTAAATACCCTATATTTGAAGGGTAATAACAAC 321321 DZ238DZ238 GTTGTAGATGCCTTGATTTTGCAGGGTAAACACAACGTTGTAGATGCCTTGATTTTGCAGGGTAAACACAAC 322322 DZ239DZ239 GTTGGGACGTATCACAATTTGAAGGGTACTCACAACGTTGGGACGTATCACAATTTGAAGGGTACTCACAAC 323323 DZ240DZ240 GATATAGACCACCCCAATATCGAAGGGGACTAAAACGATATAGACCACCCCAATATCGAAGGGGACTAAAAC 324324 DZ241DZ241 GTCACAATCCCTGAACGCATGGGAACTGAAACGTCACAATCCCTGAACGCATGGGAACTGAAAC 325325 DZ242DZ242 GTTGTTTTTACCCTACAAAATGAGGCCAGCTACAACGTTGTTTTTACCTACAAAATGAGGCCAGCTACAAC 326326 DZ243DZ243 GTTGTAAATACTCTATATTTGAAGGGTGTTGTAAATACTCTATATTTGAAGGGT 327327 DZ244DZ244 GCTGTGGAAGCCTTTCGTTTGAATGGTAATTACAGCGCTGTGGAAGCCTTTCGTTTGAATGGTAATTACAGC 328328 DZ245DZ245 GATATAGACCACCCCAATATCGAAGGGGACTAAAACGATATAGACCACCCCAATATCGAAGGGGACTAAAAC 329329 DZ246DZ246 GTTGTAGATGCCCCGATTTTGCAGGGTAAACACAACGTTGTAGATGCCCCGATTTTGCAGGGTAAACACAAC 330330 DZ247DZ247 GGTGTGAATGATCCCAAAATGAAAGGGAACTACAACGGTGTGAATGATCCCAAAATGAAAGGGAACTACAAC 331331 DZ248DZ248 GATATAGATAACCCCAAAAACGAAGGGGTCTAAAACGATATAGATAACCCCCAAAAACGAAGGGGTCTAAAAC 332332 DZ249DZ249 GTTGTCATACCCTCTCATGTGAGGGCGTTAGCAACGTTGTCATACCTCTCATGTGAGGGCGTTAGCAAC 333333 DZ250DZ250 GATATAGATAACCCCAAAAACGAAGGGGACTAAAACGATATAGATAACCCCCAAAAACGAAGGGGACTAAAAC 334334 DZ251DZ251 GGTGTGGAAGCCCTCTGTTTGAAGGGTAGATACACCGGTGTGGAAGCCCTCTGTTTGAAGGGTAGATACACC 335335 DZ252DZ252 
GTTGTAACTGCCCTCAGTTTGAAGGGTAAAAACAACGTTGTAACTGCCCTCAGTTTGAAGGGTAAAAACAAC 336336 DZ253DZ253 GTTGTAGAAGCCTGTATTTGAGCAGGTATGACAACGTTGTAGAAGCCTGTATTTGAGCAGGTATGACAAC 337337 DZ254DZ254 GATATAGATAACCCCAAAAACGAAGGGGACTAAAACGATATAGATAACCCCCAAAAACGAAGGGGACTAAAAC 338338 DZ255DZ255 GTTGTGAAAGGCCTCAGTTTGATGGGTACTAACAACGTTGTGAAAGGCCTCAGTTTGATGGGTACTAACAAC 339339 DZ256DZ256 GTCACAACTCCCATGTAGGCGGAGACTGCAACGTCACAACTCCCATGTAGGCGGAGACTGCAAC 340340 DZ257DZ257 GTTTTAGTCCCCTTTGATATTGGGGTGGTCTATATCGTTTTAGTCCCCTTTGATATTGGGGTGGTCTATATC

341341 DZ258DZ258 ATTACAATCCCCATATACAGGGGAACTGAAACATTACAATCCCCATATACAGGGGAACTGAAAC 342342 DZ259DZ259 GGGGGTGATCTCCAACGGGGGTGATCTCCAAC 343343 DZ260DZ260 ACTGTAGATAATCCCAATAGTGAAGGGAACTACAACACTGTAGATAATCCCAATAGTGAAGGGAACTACAAC 344344 DZ261DZ261 TTTGTGGGGTAGAAACAACTTTGTGGGGTAGAAACAAC 345345 DZ262DZ262 GGTGTGAAAGCCATCTTTTTGTATGGTAGGGACACCGGTGTGAAAGCCATCTTTTTGTATGGTAGGGACACC 346346 DZ263DZ263 GGTGCAGTCGCCCTTCAGTGGGCGTGGTCGGTGCAACGGTGCAGTCGCCCTTCAGTGGGCGTGGTCGGTGCAAC 347347 DZ264DZ264 TATGTTGTAACTGCCCTTATTTTGAAGGGTAAACACAACTATGTTGTAACTGCCCCTTATTTTGAAGGGTAAACACAAC 348348 DZ265DZ265 GTTTGAATACCACCCCCACATGACGGGGGACTGCAACGTTTGAATACCACCCCCACATGACGGGGGACTGCAAC 349349 DZ266DZ266 ACTGTAGACTATCCCAATAGTGAAGGGAACTACAACACTGTAGACTATCCCAATAGTGAAGGGAACTACAAC 350350 DZ267DZ267 GCTGTTTAAGCCCTTCGTTTGAAGGGTATTGACAGCGCTGTTTAAGCCCTTCGTTTGAAGGGTATTGACAGC 351351 DZ268DZ268 GTTGTAGAAGCCGTTCATTTGGAATGGTATGACAACGTTGTAGAAGCCGTTCATTTGGAATGGTATGACAAC 352352 DZ269DZ269 GGTGCAGTCGCCCTTCATTTGGGCGTGGTCTGCAGGGGTGCAGTCGCCCTTCATTTGGGCGTGGTCTGCAGG 353353 DZ270DZ270 CCCTGCACACCACGCCCAAGTTGAGGGCGACTGCACCTCCCTGCACACCACGCCCAAGTTGAGGGCGACTGCACCT 354354 DZ271DZ271 GTTGGATAAGCCTGCTATTTGCAAGGTGAAGACAACAGTTGGATAAGCCTGCTATTTGCAAGGTGAAGACAACA 355355 DZ272DZ272 GTTTTAGTCCCCTTCGATATTGGGGTGGTCTATATCGTTTTAGTCCCCTTCGATATTGGGGTGGTCTATATC 356356 DZ273DZ273 GTTGTATCCCCCTTTCAAATTGGGGTTATCCACATCGTTGTATCCCCTTTCAAATTGGGGTTATTCCACATC 357357 DZ274DZ274 GGTTCGGAACCACGCCCAATCACGGGCGACTGCACCGGTTCGGAACCACGCCCAATCACGGGCGACTGCACC 358358 DZ275DZ275 CGTAACAATCCCCGTGTGTAGGGGAACTGCAACTTCGTAACAATCCCCGTGTGTAGGGGAACTGCAACTT 359359 DZ276DZ276 GGTGCAGTCGCCCTCCATTCGGGCGTGATGTGCAGGGGTGCAGTCGCCCTCCATTCGGGCGTGATGTGCAGG 360360 DZ277DZ277 GAATGAAACTATCGTCACATAGCGACGAACTACACCGAATGAAACTATCGTCACATAGCGACGAACTACACC 361361 DZ278DZ278 GGTGTAGTCGCCTCTTAATTAGGCGTCATGTCCGGTGTAGTCGCCTCTTAATTAGGCGTCATGTCC 362362 DZ279DZ279 GATTTAGATAACCCCAATAATGAAGGGGACTAAAACGATTTAGATAACCCCAATAATGAAGGGGACTAAAAC 363363 DZ280DZ280 
GTTGAATTAACCTTTCAAAAATAGGGCTACTTCAACGTTGAATTAACCTTTCAAAATAGGGCTACTTCAAC 364364 DZ281DZ281 GTTTCAGTTCCCCTGCCTACGGGGATTGTTACGTTTCAGTTCCCCTGCCTACGGGGATTGTTAC 365365 DZ282DZ282 GTCACAACTCCCGAGAGGTCACAACTCCCGAGAG 366366 DZ283DZ283 GTTGTCATACCCATCCAAACGACAGGCTTCTACAACAGTTGTCATACCCATCCAAACGACAGGCTTCTACAACA 367367 DZ284DZ284 GTTGTAATACCATCTCAAATGATGGCTTCGGCAACGTTGTAATACCATTCTCAAATGATGGCTTCGGCAAC 368368 DZ285DZ285 GGTGTGAACGCGCTTGTTTTGAAGCGTAAATACACCGGTGTGAACGCGCTTGTTTTGAAGCGTAAATACACC

369  DZ286  GTTTCAGTTCCCCTGTATGCGGGGATTGTTAC
370  DZ287  TTGTCATACCCCTCCTGACGAGAGGCTTCTACAAC
371  DZ288  GAAGGAGATCACCGCCACATGACGGCGGACTGCACC
372  DZ289  GTTGTCATACCCCTCCAAACGAGAGGCTTCTACAAC
373  DZ290  GCCTCACATCACCGCCAAAACGACGGCGGACTACACC
374  DZ292  GTTGAAATTATCCCCACATAAAGGGGAACTAAGAC
375  DZ293  GTTTTGAGAATAGCCCGACATAGAGGGCAATAGAC
376  DZ294  GTAGAAATGAGTACAAAGCGATAGAGAGCTTAATAAC
377  DZ296  GCTATATATCACCCCACTATGTAAGGGGACTAGAAC
378  DZ297  ATTTAAGATGACTGCTTCTTTAACAGCAGACTGAACC
379  DZ298  GGTCCAGTCCGCTGTGAAAGGAGCGGTCATCCAGAAC
380  DZ299  GTGAATACAGCTCGATATAGTGAGCAATAAGATT
381  DZ300  GTTGTAATACCATCTCAAATGATGGCTTCGGCAAC
382  DZ301  CTGTTCAGGAGGACCGCTCATTTCACAGCGGACTGACCC
383  DZ302  GAATACAGCTCGATATAGCGAGTAATAAC
384  DZ303  TGTGAAAGTAGCCCGATATAGAGGGCAATAA
385  DZ304  GTGAATACAGCTCGATATAGTGAGCAATAAG
386  DZ305  GTTTCAGCATGACCCCTTCTTTCACAGGGGACTGAAC
387  DZ306  GTTAAAAGAAAACAGCCCGACATAGCGGGCAATAAC
388  DZ307  ACTTATTGCTCACTATATCGAGCTTTTCTCAC
389  DZ308  GATATAGACCACCCCAATATCGAAGGGGACTAAAAC
390  DZ309  TGTTCTGCATGACCGCCTGTTTCACGGCGGACTGACAC
391  DZ310  GATATAGACCACCCCAATATCAAAGGGGACTAAAAC
392  DZ311  GATATAGACCACCCCAATATCGAAGGGGACTAAAAC
393  DZ312  GTTTTAGTCCCCTTCGATATTGGGGTGGTCTATATC
394  DZ313  GGTGCAGGCCGCTGTGAAAGAAGCGGTCATGCGGAAC
395  DZ314  GTGAAAGTAGCCCGATATAGAGGGCAATAAC
396  DZ315  GATATAGACCACCCCAATATCAAAGGGGACTAAAACT
397  DZ316  GTTGTAGAAGCCTATCGTTTGGATAGGTATGACAAC
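The DR sequences in Table 1 are listed in DNA form. The following is a minimal Python sketch, under stated assumptions, of how such entries might be compared with one another. The helper names (`reverse_complement`, `to_rna`, `compare_entries`) are illustrative and do not come from the disclosure, and only three rows copied from Table 1 are included.

```python
from itertools import combinations

# Three DR sequences copied from Table 1 (DNA form, written 5'->3').
DR_TABLE = {
    "DZ308": "GATATAGACCACCCCAATATCGAAGGGGACTAAAAC",
    "DZ311": "GATATAGACCACCCCAATATCGAAGGGGACTAAAAC",
    "DZ312": "GTTTTAGTCCCCTTCGATATTGGGGTGGTCTATATC",
}

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Return the RNA form of a DNA sequence by replacing T with U."""
    return dna.upper().replace("T", "U")


def compare_entries(table: dict) -> tuple:
    """Return (identical_pairs, reverse_complement_pairs) of entry names."""
    identical, revcomp = [], []
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(table.items()), 2):
        if seq_a == seq_b:
            identical.append((name_a, name_b))
        elif reverse_complement(seq_a) == seq_b:
            revcomp.append((name_a, name_b))
    return identical, revcomp


identical_pairs, revcomp_pairs = compare_entries(DR_TABLE)
```

For these three rows, the DZ308 and DZ311 entries come out identical, and the DZ312 entry is the reverse complement of both. The sketch compares strings only and says nothing about which strand is functionally relevant.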

See Table 2 for the amino acid sequence numbers, extended HEPN domains, and coordinates of the final candidate Cas13 proteins.

Table 2. Summary of candidate Cas13 proteins

Claims

A Cas13 protein comprising the amino acid sequence set forth in any one of SEQ ID NOs: 1 to 198, or an amino acid sequence of any one of SEQ ID NOs: 1 to 198 having conservative amino acid substitutions of one or more residues.

The Cas13 protein according to claim 1, wherein its RNA cleavage activity is retained.

The Cas13 protein according to claim 1, wherein its HEPN domain or RNA cleavage domain is further modified or engineered to reduce or eliminate its RNA cleavage activity, yielding a dCas13 with reduced or eliminated RNA cleavage activity.

The Cas13 protein according to any one of claims 1 to 3, wherein the Cas13 protein is fused with one or more heterologous functional domains, wherein the fusion is at the N-terminus and/or C-terminus of the Cas13 protein, or at an internal position.

The Cas13 protein according to claim 4, wherein the one or more heterologous functional domains have one of the following activities: deaminase (such as cytidine deaminase or deoxyadenosine deaminase), methylase, demethylase, transcriptional activation, transcriptional repression, nuclease, single-stranded RNA cleavage, double-stranded RNA cleavage, single-stranded DNA cleavage, double-stranded DNA cleavage, DNA or RNA ligase, reporter protein, detection protein, localization signal, or any combination thereof.

A nucleic acid molecule comprising a nucleotide sequence encoding the Cas13 protein according to any one of claims 1 to 5.

The nucleic acid molecule according to claim 6, which is codon-optimized for expression in a specific host cell.

The nucleic acid molecule according to claim 7, wherein said host cell is a prokaryotic or eukaryotic cell, preferably a human cell.

The nucleic acid molecule according to any one of claims 6 to 8, comprising a promoter operably linked to the nucleotide sequence encoding Cas13, wherein the promoter is a constitutive promoter, an inducible promoter, a tissue-specific promoter, a chimeric promoter, or a development-specific promoter.

An expression vector comprising the nucleic acid molecule of any one of claims 6 to 9, which expresses the amino acid sequence of claim 1 or the nucleotide sequence of any one of claims 6 to 9 in the form of DNA, RNA or protein.

The expression vector according to claim 10, which is an adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, herpes simplex virus, or oncolytic virus.

A delivery system comprising (1) the expression vector according to claim 10, or the Cas13 protein according to any one of claims 6 to 11; and (2) a delivery vehicle.

The delivery system according to claim 12, wherein the delivery vehicle is a nanoparticle, liposome, exosome, microvesicle or gene gun.

A CRISPR-Cas system comprising: (1) the Cas13 protein according to any one of claims 1 to 5, or a derivative or functional fragment thereof, or the nucleic acid molecule according to any one of claims 6 to 9; and (2) a gRNA sequence for targeting a target RNA.

The CRISPR-Cas system according to claim 14, wherein the functional fragment of the Cas13 protein refers to any one of SEQ ID NOs: 1 to 204 containing additions, deletions and/or substitutions (e.g., conservative substitutions) of one or more (such as 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10) amino acid residues.

The CRISPR-Cas system according to claim 15, wherein the derivative of the Cas13 protein has at least 70% amino acid sequence identity (such as 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity) with any protein fragment in SEQ ID NOs: 1 to 198.

The CRISPR-Cas system according to claim 14, wherein the gRNA sequence comprises a direct repeat (DR) sequence and a spacer sequence targeting a portion of the target RNA.

The CRISPR-Cas system according to claim 17, wherein the DR sequence is a sequence shown in Table 1, and wherein the spacer sequence is 15-60 nucleotides, preferably 25-50 nucleotides, more preferably 30 nucleotides, in length.

The CRISPR-Cas system according to claim 18, wherein the DR sequence may be a derivative of any one of the sequences shown in Table 1, wherein the derivative (i) has one or more (such as 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10) nucleotide additions, deletions, or substitutions compared with any one of the sequences shown in Table 1; (ii) has at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or 97% sequence identity with any one of the sequences shown in Table 1; (iii) hybridizes under stringent conditions to any one of the sequences shown in Table 1, or to any of (i) and (ii); or (iv) is the complement of any of (i)-(iii); provided that the derivative is not any one of the sequences shown in Table 1, and that the derivative encodes an RNA, or is itself an RNA, that substantially maintains the same secondary structure as any RNA encoded by SEQ ID NOs: 199-397.

The CRISPR-Cas system according to claim 14, further comprising: (3) a target RNA.

The CRISPR-Cas system according to claim 14, which causes degradation, cleavage or sequence change of the target RNA sequence.

The CRISPR-Cas system according to claim 20, wherein the target RNA is an mRNA or an ncRNA, the ncRNA including non-coding RNA selected from lncRNA, miRNA, misc_RNA, Mt_rRNA, Mt_tRNA, rRNA, scaRNA, scRNA, snoRNA, snRNA, and sRNA.

A cell comprising the Cas13 protein according to any one of claims 1 to 5, the nucleic acid molecule according to any one of claims 6 to 9, the expression vector according to claim 10 or 11, the delivery system according to claim 12 or 13, or the CRISPR-Cas system according to any one of claims 14 to 22.

The cell according to claim 23, which is a prokaryotic cell or a eukaryotic cell, preferably a human cell.

A method for degrading or cleaving a target RNA in a target cell, or modifying the sequence of a target RNA in a target cell, comprising using the Cas13 protein according to any one of claims 1 to 5, the nucleic acid molecule according to any one of claims 6 to 9, the expression vector according to claim 10 or 11, the delivery system according to claim 12 or 13, or the CRISPR-Cas system according to any one of claims 14 to 22.

The method according to claim 25, wherein the target cells are prokaryotic cells or eukaryotic cells, preferably human cells.

The method according to claim 25, wherein the target cells are ex vivo cells, in vitro cells or in vivo cells.

Use of the Cas13 protein according to any one of claims 1 to 5, the nucleic acid molecule according to any one of claims 6 to 9, or the CRISPR-Cas system according to any one of claims 14 to 22 for detecting nucleic acid molecules.

The use according to claim 28, wherein the detected target is RNA or DNA, wherein the RNA or DNA is RNA or DNA in prokaryotic microorganisms or eukaryotic organisms.

The use according to claim 29, wherein the prokaryotic microorganism is a DNA virus or a nucleic acid thereof, or an RNA virus or a nucleic acid thereof.

The use according to claim 29, wherein the eukaryotic organisms include animals and plants, preferably humans, and wherein the RNA or DNA in the organism includes RNA or DNA in cells or body fluids.

The use according to claim 31, wherein said body fluid comprises body fluids such as blood, urine or lymph.
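The gRNA layout recited in the claims above (a DR sequence from Table 1 joined to a spacer of 15-60 nucleotides that targets part of the target RNA) can be illustrated with a minimal Python sketch. Everything beyond those stated facts is an assumption made for illustration: the function names, the example target sequence, and in particular the `dr_at_5prime` flag, since the claims do not state on which side of the spacer the DR sits.

```python
SPACER_MIN, SPACER_MAX = 15, 60  # spacer length range recited in the claims

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """Upper-case a DNA or RNA sequence and replace T with U."""
    return seq.upper().replace("T", "U")


def spacer_for_target(target_rna: str) -> str:
    """Return a spacer complementary to a target RNA window (reverse complement)."""
    return to_rna(target_rna).translate(_RNA_COMPLEMENT)[::-1]


def assemble_grna(dr_dna: str, spacer: str, dr_at_5prime: bool = True) -> str:
    """Join a DR sequence and a spacer into one gRNA string (RNA alphabet).

    dr_at_5prime is an illustrative assumption; the orientation is not
    specified in the claims.
    """
    dr = to_rna(dr_dna)
    sp = to_rna(spacer)
    if set(dr + sp) - set("ACGU"):
        raise ValueError("sequences must contain only A, C, G, T or U")
    if not SPACER_MIN <= len(sp) <= SPACER_MAX:
        raise ValueError(f"spacer length {len(sp)} is outside {SPACER_MIN}-{SPACER_MAX} nt")
    return dr + sp if dr_at_5prime else sp + dr


# DR sequence for DZ316 copied from Table 1 (SEQ ID NO: 397).
DR_DZ316 = "GTTGTAGAAGCCTATCGTTTGGATAGGTATGACAAC"

# Hypothetical 30-nt target window, used only to exercise the functions.
EXAMPLE_TARGET = "AUGC" * 7 + "AU"

example_spacer = spacer_for_target(EXAMPLE_TARGET)
example_grna = assemble_grna(DR_DZ316, example_spacer)
```

The length check mirrors only the broadest range in the claims; the preferred 25-50 and 30 nucleotide values could be enforced by tightening `SPACER_MIN` and `SPACER_MAX`.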