
WO2024138680A1 - Protein expression analysis method and related device - Google Patents

Protein expression analysis method and related device

Info

Publication number
WO2024138680A1
Authority
WO
WIPO (PCT)
Prior art keywords
protein
sequence value
cell
sequencing data
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2022/144141
Other languages
French (fr)
Chinese (zh)
Inventor
刘伟庆
李兆勋
刘杏
李美
黎宇翔
张勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BGI Shenzhen Co Ltd
Original Assignee
BGI Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BGI Shenzhen Co Ltd
Priority to PCT/CN2022/144141 (WO2024138680A1)
Priority to CN202280102034.4A (CN120266210A)
Publication of WO2024138680A1
Anticipated expiration
Legal status: Ceased (current)

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48: Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53: Immunoassay; Biospecific binding assay; Materials therefor
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • comparing the sequence value in the sequencing data inside the cell with the sequence value of the protein in a preset protein library to obtain a comparison result includes:
  • a comparison result is determined that the sequence value in the sequencing data inside the cell matches the sequence value of the protein in the preset protein library.
  • the present application provides a protein expression analysis device, the device comprising:
  • sequence value in the sequencing data includes a coupled initial base sequence and a marked base sequence, and the marked base sequence is used to determine the type of protein corresponding to the sequence value.
  • the device further includes a display module, and the analysis module is further used to generate a protein expression heat map according to the analysis results of the protein expression, wherein the protein expression heat map is used to describe the type, location and expression abundance of the protein inside the cell;
  • the present application provides a device, the device comprising: a processor and a memory;
  • the present application provides a computer storage medium, which includes computer instructions.
  • When the computer instructions are executed on an electronic device, the electronic device executes the method as described in any one of the first aspects.
  • the present application provides a computer program product.
  • When the computer program product runs on a computer, the computer executes the method as described in any one of the first aspects.
  • the present application provides a protein expression analysis method and related equipment, the method comprising: obtaining sequencing data of a biological sample sequenced by a sequencing chip, the sequencing data comprising a sequence value and the position of the sequence value; determining the sequencing data belonging to the inside of the cell according to the single-stranded DNA staining image of the biological sample; comparing the sequence value in the sequencing data inside the cell with the sequence value of the protein in a pre-set protein library to obtain a comparison result; if the comparison result indicates that the sequence value in the sequencing data matches the sequence value of the protein in the protein library, then obtaining the analysis result of protein expression, the analysis result comprising the type of matched protein and the position inside the cell.
  • the method comprehensively analyzes the protein expression inside the cell by sequencing, staining, and comparison with the protein library, and can obtain the protein expression situation inside the cell, such as the type, position, and expression abundance of the protein inside the cell.
  • FIG1 is a flow chart of a protein expression analysis method provided in an embodiment of the present application.
  • FIG2 is a schematic diagram of a single-stranded DNA staining image provided in an embodiment of the present application.
  • FIG3 is a schematic diagram of a binary image provided in an embodiment of the present application.
  • FIG4 is a schematic diagram of a protein expression heat map provided in an embodiment of the present application.
  • FIG5 is a schematic diagram of a spatiotemporal proteomics analysis method provided in an embodiment of the present application.
  • FIG6 is a basic statistical diagram of data provided in an embodiment of the present application.
  • FIG7 is a schematic diagram of a protein data capture situation provided in an embodiment of the present application.
  • FIG8 is a saturation statistical diagram provided in an embodiment of the present application.
  • FIG9 is a schematic diagram of the proportion of protein types provided in an embodiment of the present application.
  • words such as “exemplary” or “for example” are used to indicate examples, illustrations or descriptions. Any embodiment or design described as “exemplary” or “for example” in the embodiments of the present application should not be interpreted as being more preferred or more advantageous than other embodiments or designs. Specifically, the use of words such as “exemplary” or “for example” is intended to present related concepts in a specific way.
  • RNA will be translated into protein, and protein is the direct embodiment of physiological function.
  • immunohistochemistry (immunofluorescence)
  • immunohistochemistry can detect only a limited number of proteins at the same time, the detection area is small, and the resolution is low.
  • although protein detection can be achieved, relevant analysis methods are still lacking, so the expression of proteins cannot be analyzed.
  • the present application provides a protein expression analysis method, which aims to obtain the types of proteins inside cells, the locations of proteins, and the expression abundance of proteins.
  • the method can be performed by an analysis device, and the method includes: the analysis device obtains sequencing data of a biological sample sequenced by a sequencing chip, and the sequencing data includes a sequence value and the position of the sequence value; according to the single-stranded DNA staining image of the biological sample, the sequencing data belonging to the inside of the cell is determined; the sequence value in the sequencing data inside the cell is compared with the sequence value of the protein in the pre-set protein library to obtain a comparison result; if the comparison result indicates that the sequence value in the sequencing data matches the sequence value of the protein in the protein library, the analysis result of the protein expression is obtained, and the analysis result includes the type of the matched protein and the position inside the cell.
  • the method comprehensively analyzes the protein expression inside the cell by sequencing, staining, and comparison with the protein library, and can obtain the protein expression inside the cell, such as the type, position, and expression abundance of the protein inside the cell.
  • FIG. 1 is a flow chart of a protein expression analysis method provided in an embodiment of the present application, the method comprising:
  • An analysis device obtains sequencing data of a biological sample sequenced by a sequencing chip, wherein the sequencing data includes a sequence value and a location of a protein corresponding to the sequence value.
  • the sequencing chip can be a spatiotemporal sequencing chip (e.g., a BGI spatiotemporal sequencing chip), and the biological sample is sequenced based on the sequencing chip to obtain sequencing data.
  • the sequencing data includes a sequence value and the location of the protein corresponding to the sequence value.
  • the location of the protein can be represented by coordinates, such as abscissa and ordinate.
  • the sequence value can be a base sequence, such as "GCCGCCCCAG".
  • An embodiment of the present application provides an example of sequencing data, as shown in Table 1.
  • barcode represents the sequence value
  • x represents the horizontal coordinate of the corresponding sequence value
  • y represents the vertical coordinate of the corresponding sequence value
  • the tissue sections of the biological sample can be pre-stained with single-stranded DNA, and then the biological sample can be microphotographed to obtain a single-stranded DNA staining image.
  • Figure 2 is a schematic diagram of a single-stranded DNA staining image provided in an embodiment of the present application.
  • the analysis device can perform binarization processing on the single-stranded DNA staining image to obtain a binarized image.
  • FIG. 3 is a schematic diagram of a binarized image provided in an embodiment of the present application. The white part of the binarized image represents the inside of the cell, and the black part represents the outside of the cell.
  • the sequencing data that does not belong to the interior of the cell can be filtered out, thereby obtaining the sequencing data that belongs to the interior of the cell.
  • the area inside the cell can be determined from the binarized image. Because the sequencing data obtained in S101 includes positions, the records whose positions fall outside the cell interior are filtered out of the sequencing data, leaving the sequencing data that belongs to the interior of the cell.
  • the analysis device compares the sequence value in the sequencing data inside the cell with the sequence value of the protein in the preset protein library to obtain a comparison result.
  • the pre-set protein library can be a library of detectable proteins screened using spatiotemporal proteomics technology.
  • the sequence values in Table 1 above can be compared with the sequence values of proteins in the pre-set protein library to obtain a comparison result.
  • the comparison result can include two situations: matching and mismatching. In the case of matching, the type of protein corresponding to the sequence value in the sequencing data can be determined based on the protein library. In the case of mismatching, the protein corresponding to the sequence value is characterized as not existing.
  • the analysis device can match the sequence value in the sequencing data inside the cell with the sequence value of the protein in the pre-set protein library to obtain a matching degree. If the matching degree is greater than a preset threshold, the comparison result that the sequence value in the sequencing data inside the cell matches the sequence value of the protein in the pre-set protein library is determined.
  • the matching degree can be characterized by a percentage or other means.
  • the preset threshold can be an empirical value or a value determined based on an error range, for example, it can be 95%, 90%, etc.
  • the sequence value in the sequencing data includes a coupled initial base sequence and a marker base sequence, and the marker base sequence is used to determine the type of protein corresponding to the sequence value.
  • the marker base sequence may be a 15 bp base sequence, as shown in Table 2 below.
  • Table 2 is an example of an ADT tag library provided in an embodiment of the present application.
  • the analysis device can couple the above-mentioned marked base sequence on the basis of the initial base sequence, so as to facilitate the comparison and subsequent processing.
  • a unique molecular identifier is an identifier used to mark a protein. If a protein is obtained by replication, the same unique molecular identifier will exist. If two proteins are the same but not obtained by replication, the unique molecular identifiers of the two proteins are different. Therefore, based on the unique molecular identifier, the duplicate proteins generated by the PCR process can be removed to obtain the true number of proteins inside the cell. Based on the true number of proteins, the expression abundance of the protein can be obtained. Based on the above process, the analysis equipment can obtain a statistical table of protein expression, as shown in Table 3 below.
  • GeneID represents the type of protein
  • x represents the horizontal coordinate of the protein
  • y represents the vertical coordinate of the protein
  • MIDCount represents the expression abundance of the protein
  • the sequencing chip can be a 1cm*1cm chip.
  • the sequencing chip can include many points, each point corresponds to a probe, the sequence of the probe is random, and the probe at each position corresponds to a sequence.
  • spatiotemporal proteomics technology is mainly divided into three parts.
  • the spatiotemporal sequencing chip is sequenced for the first time to obtain the chip coordinate mask file (as shown in Table 1 above), in which the first column is the 25bp barcode sequence, the second column is the sequencing chip x coordinate, and the third column is the sequencing chip y coordinate;
  • the tissue sections are stained with ssDNA and then photographed microscopically.
  • the analysis equipment can obtain the nucleus imaging of cells on the tissue, also known as the ssDNA map (as shown in Figure 2), and then the cells are segmented according to the position of the nucleus shown on the ssDNA map.
  • the embodiment of the present application provides an experimental example, which is introduced below.
  • FIG. 6 is a basic statistical diagram of data provided by the embodiment of the present application.
  • the English explanation in the figure can be found in Table 5 below.
  • FIG. 8 is a saturation statistical graph provided in an embodiment of the present application.
  • the horizontal axis of Figure 8 (a) represents the amount of sequencing data, and the vertical axis represents the saturation;
  • the horizontal axis of Figure 8 (b) represents the amount of sequencing data, and the vertical axis represents the number of unique reads under each bin.
  • the embodiment of the present application provides a protein expression analysis method, which includes: obtaining sequencing data of a biological sample sequenced by a sequencing chip, the sequencing data including a sequence value and the position of the sequence value; determining the sequencing data belonging to the inside of the cell according to the single-stranded DNA staining image of the biological sample; comparing the sequence value in the sequencing data inside the cell with the sequence value of the protein in a pre-set protein library to obtain a comparison result; if the comparison result indicates that the sequence value in the sequencing data matches the sequence value of the protein in the protein library, then obtaining the analysis result of protein expression, the analysis result including the type of matched protein and the position inside the cell.
  • the present application also provides a protein expression analysis device. FIG. 10 is a schematic diagram of a protein expression analysis device provided in the present application, the device comprising:
  • An acquisition module 1001 is used to acquire sequencing data of a biological sample sequenced by a sequencing chip, wherein the sequencing data includes a sequence value and a position of the sequence value;
  • a determination module 1002 is used to determine the sequencing data belonging to the interior of the cell according to the single-stranded DNA staining image of the biological sample;
  • the analysis module 1004 is used to obtain the analysis result of protein expression if the comparison result indicates that the sequence value in the sequencing data matches the sequence value of the protein in the protein library, and the analysis result includes the type of the matched protein and the location inside the cell.
  • sequence value in the sequencing data includes a coupled initial base sequence and a marked base sequence, and the marked base sequence is used to determine the type of protein corresponding to the sequence value.
  • the device further includes a display module, and the analysis module 1004 is further used to generate a protein expression heat map according to the analysis results of the protein expression, wherein the protein expression heat map is used to describe the type, location and expression abundance of the protein inside the cell;
  • the analysis module 1004 is further used to generate a protein in situ expression level report according to the analysis results; the display module is further used to display the protein in situ expression level report.
  • the embodiment of the present application also provides a device, which includes: a processor and a memory;
  • One or more computer programs are stored in the memory, and the one or more computer programs include instructions; when the instructions are executed by the processor, the electronic device executes a method as described in any one of the method embodiments.
  • An embodiment of the present application further provides a computer storage medium, which includes computer instructions.
  • When the computer instructions are executed on an electronic device, the electronic device executes a method as described in any one of the method embodiments.
  • An embodiment of the present application further provides a computer program product.
  • When the computer program product is run on a computer, the computer executes a method as described in any one of the method embodiments.
  • the device embodiments described above are merely schematic, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
  • the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product, which is stored in a readable storage medium, such as a computer floppy disk, a U disk, a mobile hard disk, a ROM, a RAM, a magnetic disk or an optical disk, and which includes a number of instructions that enable a computer device (which can be a personal computer, a training device, or a network device, etc.) to execute the methods described in each embodiment of the present application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, training equipment, or data center to another website, computer, training equipment, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means.
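
The binarization and filtering steps described in the list above (binarize the single-stranded DNA staining image, then keep only the sequencing records whose positions lie inside a cell) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the fixed threshold of 128, the toy image, and the assumption that image pixels and chip coordinates are already registered one-to-one are all hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, int, int]  # (barcode, x, y), mirroring the Table 1 columns


def binarize(image: List[List[int]], threshold: int) -> List[List[bool]]:
    """Return a mask that is True (white) where a pixel is brighter than the threshold."""
    return [[pixel > threshold for pixel in row] for row in image]


def filter_intracellular(records: List[Record], mask: List[List[bool]]) -> List[Record]:
    """Keep only records whose (x, y) position lies on a True (white) mask pixel."""
    kept = []
    for barcode, x, y in records:
        inside_bounds = 0 <= y < len(mask) and 0 <= x < len(mask[y])
        if inside_bounds and mask[y][x]:
            kept.append((barcode, x, y))
    return kept


# Toy 3x3 staining image (assumed data): bright pixels stand for cell interior.
image = [
    [10, 200, 220],
    [12, 210, 15],
    [8, 9, 11],
]
mask = binarize(image, threshold=128)  # the threshold value is an assumption
records = [
    ("GCCGCCCCAG", 1, 0),
    ("ATATATATAT", 0, 0),
    ("CCGGCCGGAA", 1, 1),
    ("TTTTTTTTTT", 2, 2),
]
intracellular = filter_intracellular(records, mask)
print(intracellular)
```

A real staining image would more likely need an adaptive threshold and an explicit registration step between image and chip coordinates; both are outside what the text specifies.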

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Organic Chemistry (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biomedical Technology (AREA)
  • Wood Science & Technology (AREA)
  • Microbiology (AREA)
  • Urology & Nephrology (AREA)
  • Hematology (AREA)
  • Zoology (AREA)
  • Biochemistry (AREA)
  • Food Science & Technology (AREA)
  • Pathology (AREA)
  • Cell Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Medicinal Chemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

A protein expression analysis method and a related device, relating to the technical field of biology. The method comprises: acquiring sequencing data obtained by sequencing a biological sample with a sequencing chip, the sequencing data comprising a sequence value and the position of a protein corresponding to the sequence value; determining, according to a single-stranded DNA staining image of the biological sample, the sequencing data belonging to the interior of a cell; comparing the sequence value in the sequencing data of the interior of the cell with a sequence value of a protein in a preset protein library to obtain a comparison result; and, if the comparison result shows that the sequence value in the sequencing data matches the sequence value of the protein in the protein library, acquiring an analysis result of protein expression, the analysis result comprising the type of the matched protein and its position in the interior of the cell. The method enables protein expression to be analyzed.

Description

A protein expression analysis method and related equipment

Technical Field

The present application relates to the field of biotechnology, and in particular to a protein expression analysis method and related equipment.

Background Art

Protein is the material basis of life: it is an organic macromolecule, the basic organic matter that constitutes cells, and the main bearer of life activities. Generally, after RNA is translated into protein, the protein directly reflects physiological function.

Currently, protein detection relies on immunohistochemistry (immunofluorescence) technology. However, the number of proteins that this technology can detect simultaneously is limited to only about 40.

Although the existing technology can detect proteins, relevant analysis methods are still lacking, so protein expression cannot be analyzed.

Summary of the Invention

The present application provides a protein expression analysis method and related equipment, which can analyze protein expression.

In a first aspect, the present application provides a method for analyzing protein expression, the method comprising:

acquiring sequencing data of a biological sample sequenced by a sequencing chip, wherein the sequencing data includes a sequence value and the location of the protein corresponding to the sequence value;

determining the sequencing data belonging to the interior of the cell according to the single-stranded DNA staining image of the biological sample;

comparing the sequence value in the sequencing data inside the cell with the sequence value of the protein in a preset protein library to obtain a comparison result;

if the comparison result indicates that the sequence value in the sequencing data matches the sequence value of the protein in the protein library, obtaining the analysis result of protein expression, wherein the analysis result includes the type of the matched protein and its location inside the cell.
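
The comparison and result steps of the first aspect can be illustrated with a minimal sketch. It assumes the simplest possible comparison, an exact lookup of each sequence value in the library, and it assumes the records have already been restricted to the cell interior. The names `ProteinHit`, `analyze`, `ProteinA` and `ProteinB`, and the example sequences, are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class ProteinHit(NamedTuple):
    """One entry of the analysis result: protein type plus position in the cell."""
    protein: str
    x: int
    y: int


def analyze(records: List[Tuple[str, int, int]], library: Dict[str, str]) -> List[ProteinHit]:
    """Look up each intracellular sequence value in the protein library.

    Matching records yield a hit; non-matching records are skipped, which
    corresponds to treating the protein as not present.
    """
    hits = []
    for sequence_value, x, y in records:
        protein = library.get(sequence_value)  # exact match is an assumption
        if protein is not None:
            hits.append(ProteinHit(protein, x, y))
    return hits


# Hypothetical library: sequence value -> protein type.
library = {"GCCGCCCCAG": "ProteinA", "TTGACCATGA": "ProteinB"}
intracellular = [("GCCGCCCCAG", 12, 40), ("AAAAAAAAAA", 13, 40), ("TTGACCATGA", 20, 7)]
results = analyze(intracellular, library)
for hit in results:
    print(hit.protein, hit.x, hit.y)
```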

Optionally, the analysis result further includes the expression abundance corresponding to each protein, and the method further includes:

obtaining the unique molecular identifiers of the proteins inside the cell;

deduplicating the proteins inside the cell according to their unique molecular identifiers to obtain the expression abundance corresponding to each protein inside the cell.
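
As a rough sketch of the deduplication step, reads that share the same protein type, position and unique molecular identifier can be collapsed into one, and the surviving reads counted per protein and position. Keying the deduplication on the triple (protein, position, identifier) is an assumption made for illustration, as are the function name and the example reads.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Read = Tuple[str, int, int, str]  # (protein type, x, y, unique molecular identifier)


def expression_abundance(reads: Iterable[Read]) -> Dict[Tuple[str, int, int], int]:
    """Count distinct identifiers per (protein, x, y) after removing duplicates."""
    unique_reads = set(reads)  # identical reads are treated as amplification copies
    counts = Counter((protein, x, y) for protein, x, y, _umi in unique_reads)
    return dict(counts)


# Hypothetical reads: the first two are duplicates of the same molecule.
reads = [
    ("ProteinA", 3, 5, "AACG"),
    ("ProteinA", 3, 5, "AACG"),
    ("ProteinA", 3, 5, "TTGC"),
    ("ProteinB", 7, 2, "GGAT"),
]
abundance = expression_abundance(reads)
print(abundance[("ProteinA", 3, 5)])
```

The resulting mapping has the same shape as a table with protein type, x, y and count columns.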

Optionally, comparing the sequence value in the sequencing data inside the cell with the sequence value of the protein in a preset protein library to obtain a comparison result includes:

if the degree of match between the sequence value in the sequencing data inside the cell and the sequence value of the protein in the preset protein library is greater than a preset threshold, determining, as the comparison result, that the sequence value in the sequencing data inside the cell matches the sequence value of the protein in the preset protein library.
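
One way to picture the threshold comparison is to define the degree of match as the fraction of identical bases between two equal-length sequences and to accept the best library entry only if that fraction exceeds the threshold. The text does not fix how the degree of match is computed, so this per-position identity, the 0.9 default threshold, and all names and sequences below are assumptions.

```python
from typing import Dict, Optional


def matching_degree(a: str, b: str) -> float:
    """Fraction of positions at which two sequences agree (0.0 if lengths differ)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(1 for p, q in zip(a, b) if p == q) / len(a)


def best_match(seq: str, library: Dict[str, str], threshold: float = 0.9) -> Optional[str]:
    """Return the best-matching protein if its degree of match exceeds the threshold."""
    best_protein, best_score = None, 0.0
    for library_seq, protein in library.items():
        score = matching_degree(seq, library_seq)
        if score > best_score:
            best_protein, best_score = protein, score
    # "Greater than" the threshold, following the wording of the method.
    return best_protein if best_score > threshold else None


# Hypothetical 20-base library entries.
library = {
    "ACGTACGTACGTACGTACGT": "ProteinA",
    "TTTTGGGGCCCCAAAATTTT": "ProteinB",
}
print(best_match("ACGTACGTACGTACGTACGA", library))  # one mismatch in 20 bases
print(best_match("GGGGGGGGGGGGGGGGGGGG", library))  # far below the threshold
```

A production comparison would probably tolerate insertions and deletions as well, for example through an edit distance; the identity measure here only handles substitutions.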

Optionally, the sequence value in the sequencing data includes a coupled initial base sequence and marker base sequence, and the marker base sequence is used to determine the type of protein corresponding to the sequence value.
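
A small sketch of how a coupled sequence value might be split back into its two parts and the marker part used to look up the protein type. It assumes the marker base sequence is a fixed 15 bases long and is appended after the initial base sequence; the layout, the tag library contents and the function names are illustrative assumptions only.

```python
from typing import Dict, Optional, Tuple

MARKER_LENGTH = 15  # assumed marker length in bases


def split_sequence_value(seq: str, marker_length: int = MARKER_LENGTH) -> Tuple[str, str]:
    """Split a coupled sequence value into (initial base sequence, marker base sequence)."""
    if len(seq) <= marker_length:
        raise ValueError("sequence value is too short to contain a marker")
    return seq[:-marker_length], seq[-marker_length:]


def protein_type(seq: str, tag_library: Dict[str, str]) -> Optional[str]:
    """Return the protein type named by the marker, or None if the marker is unknown."""
    _initial, marker = split_sequence_value(seq)
    return tag_library.get(marker)


# Hypothetical tag library: marker base sequence -> protein type.
tag_library = {"AAACCCGGGTTTAAA": "ProteinA"}
coupled = "GCCGCCCCAG" + "AAACCCGGGTTTAAA"
print(split_sequence_value(coupled))
print(protein_type(coupled, tag_library))
```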

Optionally, the method further includes:

generating a protein expression heat map according to the analysis result of the protein expression, wherein the protein expression heat map describes the type, location and expression abundance of the proteins inside the cell;

displaying the protein expression heat map.
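
To make the heat map idea concrete, the sketch below places the abundance of one protein on a grid indexed by position and renders it as text, with denser characters standing for higher abundance. The row layout (protein type, x, y, count), the grid size, the character ramp and the function names are assumptions; an actual implementation would more likely use a plotting library.

```python
from typing import List, Tuple

Row = Tuple[str, int, int, int]  # (protein type, x, y, abundance count)


def heatmap_grid(rows: List[Row], protein: str, width: int, height: int) -> List[List[int]]:
    """Accumulate the abundance of one protein into a height x width grid."""
    grid = [[0] * width for _ in range(height)]
    for gene_id, x, y, count in rows:
        if gene_id == protein and 0 <= x < width and 0 <= y < height:
            grid[y][x] += count
    return grid


def render(grid: List[List[int]], shades: str = " .:*#") -> str:
    """Map each cell to a character, scaled so the peak value uses the last shade."""
    peak = max((value for row in grid for value in row), default=0)
    lines = []
    for row in grid:
        line = ""
        for value in row:
            index = 0 if peak == 0 else (value * (len(shades) - 1)) // peak
            line += shades[index]
        lines.append(line)
    return "\n".join(lines)


# Hypothetical analysis rows.
rows = [("ProteinA", 0, 0, 4), ("ProteinA", 2, 1, 2), ("ProteinB", 1, 1, 9)]
grid = heatmap_grid(rows, "ProteinA", width=3, height=2)
print(render(grid))
```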

Optionally, the method further includes:

generating a protein in situ expression level report according to the analysis result;

displaying the protein in situ expression level report.
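
The content and layout of the in situ expression level report are not specified here, so the following is only one plausible shape: a tab-separated summary listing, per protein type, the number of distinct positions and the total abundance. The column names, the row layout and the function name are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

Row = Tuple[str, int, int, int]  # (protein type, x, y, abundance count)


def in_situ_report(rows: List[Row]) -> str:
    """Summarize positions and total abundance per protein as tab-separated text."""
    totals = defaultdict(int)
    positions = defaultdict(set)
    for gene_id, x, y, count in rows:
        totals[gene_id] += count
        positions[gene_id].add((x, y))
    lines = ["GeneID\tPositions\tTotalMIDCount"]  # assumed header
    for gene_id in sorted(totals):
        lines.append(f"{gene_id}\t{len(positions[gene_id])}\t{totals[gene_id]}")
    return "\n".join(lines)


# Hypothetical analysis rows.
rows = [("ProteinA", 0, 0, 4), ("ProteinA", 2, 1, 2), ("ProteinB", 1, 1, 9)]
report = in_situ_report(rows)
print(report)
```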

第二方面,本申请提供了一种蛋白表达分析装置,该装置包括:In a second aspect, the present application provides a protein expression analysis device, the device comprising:

获取模块,用于获取测序芯片对生物样品进行测序的测序数据,所述测序数据包括序列值和所述序列值对应蛋白质的所在位置;An acquisition module, used to acquire sequencing data of a biological sample sequenced by a sequencing chip, wherein the sequencing data includes a sequence value and a location of a protein corresponding to the sequence value;

确定模块,用于根据所述生物样品的单链DNA染色图像,确定属于细胞内部的测序数据;A determination module, used to determine the sequencing data belonging to the interior of the cell according to the single-stranded DNA staining image of the biological sample;

比对模块,用于将所述细胞内部的测序数据中的序列值与预先设定的蛋白库中蛋白质的序列值进行比对,得到比对结果;A comparison module, used to compare the sequence value in the sequencing data inside the cell with the sequence value of the protein in the preset protein library to obtain a comparison result;

分析模块,用于若所述比对结果表征所述测序数据中的序列值与所述蛋白库中的蛋白质的序列值匹配,则获取蛋白表达的分析结果,所述分析结果包括相匹配的蛋白质的种类以及在所述细胞内部的位置。通过计算该种蛋白的测序数据的数量,确定每种蛋白质的表达丰度。The analysis module is used to obtain the analysis result of protein expression if the comparison result indicates that the sequence value in the sequencing data matches the sequence value of the protein in the protein library, and the analysis result includes the type of the matched protein and the location inside the cell. The expression abundance of each protein is determined by calculating the number of sequencing data of the protein.

Optionally, the analysis result further includes the expression abundance corresponding to each protein; the acquisition module is further configured to acquire the unique molecular identifiers of the proteins inside the cell; and the analysis module is further configured to deduplicate the proteins inside the cell according to their unique molecular identifiers, to obtain the expression abundance corresponding to each protein inside the cell.

Optionally, the comparison module is specifically configured to determine that the sequence value in the intracellular sequencing data matches the sequence value of a protein in the preset protein library if the degree of match between the two is greater than a preset threshold.

Optionally, the sequence value in the sequencing data includes a coupled initial base sequence and marker base sequence, and the marker base sequence is used to determine the type of protein corresponding to the sequence value.

Optionally, the device further includes a display module; the analysis module is further configured to generate a protein expression heat map according to the analysis result of the protein expression, wherein the protein expression heat map describes the type, location and expression abundance of the proteins inside the cell;

the display module is configured to display the protein expression heat map.

Optionally, the analysis module is further configured to generate a protein in situ expression level report based on the analysis result; the display module is further configured to display the protein in situ expression level report.

In a third aspect, the present application provides a device, the device comprising: a processor and a memory;

wherein one or more computer programs are stored in the memory, the one or more computer programs including instructions; when the instructions are executed by the processor, the electronic device is caused to perform the method according to any implementation of the first aspect.

In a fourth aspect, the present application provides a computer storage medium including computer instructions; when the computer instructions are run on an electronic device, the electronic device performs the method according to any implementation of the first aspect.

In a fifth aspect, the present application provides a computer program product; when the computer program product is run on a computer, the computer performs the method according to any implementation of the first aspect.

The technical solution of the present application has the following beneficial effects:

The present application provides a protein expression analysis method and related devices. The method includes: acquiring sequencing data obtained by sequencing a biological sample with a sequencing chip, the sequencing data including a sequence value and the position of the sequence value; determining, according to a single-stranded DNA staining image of the biological sample, the sequencing data belonging to the interior of a cell; comparing the sequence value in the intracellular sequencing data with the sequence values of proteins in a preset protein library to obtain a comparison result; and, if the comparison result indicates that the sequence value in the sequencing data matches the sequence value of a protein in the protein library, obtaining an analysis result of protein expression, the analysis result including the type of the matched protein and its location inside the cell. By combining sequencing, staining and comparison against a protein library, the method analyzes intracellular protein expression comprehensively and yields the protein expression profile inside the cell, such as the type, location and expression abundance of intracellular proteins.

It should be understood that descriptions of technical features, technical solutions, beneficial effects or similar language in the present application do not imply that all features and advantages can be realized in any single embodiment. Rather, a description of a feature or beneficial effect means that the specific technical feature, technical solution or beneficial effect is included in at least one embodiment. Therefore, descriptions of technical features, technical solutions or beneficial effects in this specification do not necessarily refer to the same embodiment. Furthermore, the technical features, technical solutions and beneficial effects described in the embodiments may be combined in any suitable manner. Those skilled in the art will understand that an embodiment can be implemented without one or more of the specific technical features, technical solutions or beneficial effects of a particular embodiment. In other embodiments, additional technical features and beneficial effects may be identified in particular embodiments that do not appear in all embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart of a protein expression analysis method provided in an embodiment of the present application;

FIG. 2 is a schematic diagram of a single-stranded DNA staining image provided in an embodiment of the present application;

FIG. 3 is a schematic diagram of a binarized image provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of a protein expression heat map provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of a spatiotemporal proteomics analysis method provided in an embodiment of the present application;

FIG. 6 is a schematic diagram of basic data statistics provided in an embodiment of the present application;

FIG. 7 is a schematic diagram of protein data capture provided in an embodiment of the present application;

FIG. 8 is a saturation statistics chart provided in an embodiment of the present application;

FIG. 9 is a schematic diagram of the proportions of protein types provided in an embodiment of the present application;

FIG. 10 is a schematic diagram of a protein expression analysis device provided in an embodiment of the present application.

DETAILED DESCRIPTION

The terms "first", "second", "third" and the like in the specification, claims and description of the drawings of the present application are used to distinguish different objects, not to define a particular order.

In the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate an example, illustration or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be construed as preferred over, or more advantageous than, other embodiments or designs. Rather, such words are intended to present the related concepts in a concrete manner.

To keep the description of the following embodiments clear and concise, a brief introduction to the related technology is given first:

Spatiotemporal omics technology, used in life science research, was once named technology of the year. Since then, a growing number of studies have used spatiotemporal omics technology, advancing research in fields such as species evolution, embryonic development and disease mechanisms. RNA is translated into protein, and proteins are what directly carry out physiological functions.

At present, proteins can be detected using immunohistochemistry (immunofluorescence). However, immunohistochemistry can detect only a limited number of proteins at the same time, covers a small detection area and has low resolution. Although it can detect proteins, a corresponding analysis method is still lacking, so protein expression cannot be analyzed.

In view of this, the present application provides a protein expression analysis method, which aims to obtain the types, locations and expression abundance of the proteins inside cells.

Specifically, the method may be performed by an analysis device and includes: the analysis device acquires sequencing data obtained by sequencing a biological sample with a sequencing chip, the sequencing data including a sequence value and the position of the sequence value; determines, according to a single-stranded DNA staining image of the biological sample, the sequencing data belonging to the interior of a cell; compares the sequence value in the intracellular sequencing data with the sequence values of proteins in a preset protein library to obtain a comparison result; and, if the comparison result indicates that the sequence value in the sequencing data matches the sequence value of a protein in the protein library, obtains an analysis result of protein expression, the analysis result including the type of the matched protein and its location inside the cell. By combining sequencing, staining and comparison against a protein library, the method analyzes intracellular protein expression comprehensively and yields the protein expression profile inside the cell, such as the type, location and expression abundance of intracellular proteins.

To make the technical solution of the present application clearer and easier to understand, it is described below from the perspective of the analysis device, with reference to the accompanying drawings.

FIG. 1 is a flow chart of a protein expression analysis method provided in an embodiment of the present application. The method includes:

S101. The analysis device acquires sequencing data obtained by sequencing a biological sample with a sequencing chip, the sequencing data including a sequence value and the location of the protein corresponding to the sequence value.

The sequencing chip may be a spatiotemporal sequencing chip (for example, a BGI spatiotemporal sequencing chip). The biological sample is sequenced on the chip to obtain the sequencing data, which includes sequence values and, for each sequence value, the location of the corresponding protein. In some examples, the location of the protein may be expressed as coordinates, such as a horizontal and a vertical coordinate. The sequence value may be a base sequence, for example "GCCGCCCCAG". An example of sequencing data provided in an embodiment of the present application is shown in Table 1.

Table 1:

Barcode                      x        y
GCCCCCCCGCACCCCGCCGGAGAAG    67615    33826
CCTCCCCCGCCGCCCCTCGCGAAAG    67615    33827
CCTTCCCCCGACCCCCTCACATCGG    67615    33829
CTGTCCGCTCCCCTCTCCTCGTTCA    67615    33831

Here, Barcode is the sequence value, x is the horizontal coordinate of the corresponding sequence value, and y is its vertical coordinate.

S102. The analysis device determines, according to a single-stranded DNA staining image of the biological sample, the sequencing data belonging to the interior of a cell.

In some examples, a tissue section of the biological sample may first be stained for single-stranded DNA and then photographed under a microscope to obtain the single-stranded DNA staining image. FIG. 2 is a schematic diagram of a single-stranded DNA staining image provided in an embodiment of the present application.

The analysis device may binarize the single-stranded DNA staining image to obtain a binarized image. FIG. 3 is a schematic diagram of a binarized image provided in an embodiment of the present application. In the binarized image, the white regions represent the interior of cells and the black regions represent the exterior of cells.

Next, the sequencing data that does not belong to the interior of a cell can be filtered out, leaving the sequencing data that does. In some examples, the binarized image identifies the regions inside cells, and the sequencing data obtained in S101 already includes positions; the records whose positions do not fall inside a cell are therefore removed from the sequencing data, and the remaining records are the intracellular sequencing data.
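The filtering step described above can be sketched as follows. This is a minimal illustration, not the implementation used in the application: it assumes the binarized image is a 2D list indexed as `mask[y][x]` with 1 for the interior of a cell and 0 for the exterior, and that chip coordinates have already been registered to image pixels. The function name `filter_intracellular` and the toy data are hypothetical.

```python
# Minimal sketch: keep only sequencing records whose (x, y) falls inside a cell.
# Assumptions (not from the source): mask[y][x] is 1 inside a cell (white) and
# 0 outside (black); chip coordinates are already registered to image pixels.

def filter_intracellular(records, mask):
    """Return the records whose coordinates land on a mask pixel equal to 1."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    kept = []
    for barcode, x, y in records:
        inside_image = 0 <= x < width and 0 <= y < height
        if inside_image and mask[y][x] == 1:
            kept.append((barcode, x, y))
    return kept


# Toy 4x3 mask with a 2x2 "cell" in the middle (illustrative values only).
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
records = [("AAAC", 1, 1), ("CCGT", 3, 0), ("GGTA", 2, 2), ("TTAG", 9, 9)]
intracellular = filter_intracellular(records, mask)
```

Records that fall outside the image bounds are dropped here as well; that is a design choice of this sketch rather than something the text specifies.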

S103. The analysis device compares the sequence value in the intracellular sequencing data with the sequence values of proteins in a preset protein library to obtain a comparison result.

The preset protein library may be a library of detectable proteins selected using spatiotemporal proteomics technology. In some embodiments, the sequence values in Table 1 above may be compared with the sequence values of the proteins in the preset protein library to obtain the comparison result. The comparison result is either a match or a mismatch. In the case of a match, the type of protein corresponding to the sequence value in the sequencing data can be determined from the protein library; in the case of a mismatch, no protein corresponding to the sequence value exists.

In some embodiments, the analysis device may match the sequence value in the intracellular sequencing data against the sequence values of the proteins in the preset protein library to obtain a degree of match. If the degree of match is greater than a preset threshold, the comparison result is that the sequence value in the intracellular sequencing data matches the sequence value of a protein in the preset protein library. The degree of match may be expressed as a percentage or in another form. The preset threshold may be an empirical value or a value determined from an error range, for example 95% or 90%.
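As an illustration of the threshold comparison, the sketch below computes a degree of match as the fraction of identical positions between two equal-length sequences. The source does not say how the degree of match is computed, so positional identity is an assumption, as are the function names and the default threshold of 0.90 (one of the example values in the text). The two library entries are taken from Table 2.

```python
# Minimal sketch of threshold-based matching. The scoring rule (fraction of
# identical positions) is an assumption; the source only requires that the
# degree of match exceed a preset threshold.

def match_degree(query, reference):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(query) != len(reference) or not reference:
        return 0.0
    same = sum(1 for q, r in zip(query, reference) if q == r)
    return same / len(reference)


def best_match(query, library, threshold=0.90):
    """Return the best-scoring protein name if its score exceeds the threshold."""
    best_name, best_score = None, 0.0
    for tag, name in library.items():
        score = match_degree(query, tag)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score > threshold else None


library = {
    "CTCATTGTAACTCCT": "Human-CD3",
    "GCGCAACTTGATGAT": "Human-CD8",
}
exact = best_match("CTCATTGTAACTCCT", library)         # 15/15 positions agree
one_mismatch = best_match("CTCATTGTAACTCCA", library)  # 14/15 positions agree
two_mismatch = best_match("ATCATTGTAACTCCA", library)  # 13/15 positions agree
```

With 15 bp tags and a 0.90 threshold, a single mismatch (14/15, about 0.93) still counts as a match while two mismatches (13/15, about 0.87) do not.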

In some embodiments, the sequence value in the sequencing data includes a coupled initial base sequence and marker base sequence, and the marker base sequence is used to determine the type of protein corresponding to the sequence value.

The marker base sequence may be a 15 bp base sequence, as shown in Table 2 below.

Table 2:

Base sequence      Type
CTCATTGTAACTCCT    Human-CD3
GCGCAACTTGATGAT    Human-CD8
CTGGGCAATTACTCG    Human-CD19
ACAGCGCCGTATTTA    Human-PD-1
GTTGTCCGACAATAC    Human-PD-L1
TTGCTTATTTCCGCA    Mouse-P2RY12

Table 2 is an example of an ADT tag library provided in an embodiment of the present application.

Before the comparison, the analysis device may couple the above marker base sequence to the initial base sequence, which facilitates the comparison and subsequent processing.

S104. If the comparison result indicates that the sequence value in the sequencing data matches the sequence value of a protein in the protein library, the analysis device obtains an analysis result of protein expression, the analysis result including the type of the matched protein and its location inside the cell.

In some embodiments, the analysis result may further include the expression abundance corresponding to each protein. To this end, the analysis device may also acquire the unique molecular identifiers of the proteins inside the cell and then deduplicate those proteins based on their unique molecular identifiers, to obtain the expression abundance corresponding to each protein inside the cell.

A unique molecular identifier (UMI) is an identifier used to label a protein. Copies of a protein produced by replication carry the same UMI, whereas two proteins of the same type that did not arise by replication carry different UMIs. The UMI therefore makes it possible to remove the duplicates generated during PCR and obtain the true number of proteins inside the cell, from which the expression abundance of the protein is derived. Through this process, the analysis device can obtain a statistical table of protein expression, as shown in Table 3 below.

Table 3:

x        y        ADT tag number    UMI (unique molecular identifier)    Reads
6045     18653    17                253298                               3
20103    16220    22                686750                               3
7052     17017    26                386504                               3
12546    18367    31                699050                               3
19839    15003    17                931045                               3
9840     13305    17                643739                               3

Here, x is the horizontal coordinate, y is the vertical coordinate, the ADT tag number is a numeric code identifying the protein type, UMI is the unique molecular identifier sequence, and Reads is the number of reads. Each row can be read as: at the position given by the x and y coordinates, how many reads a given protein has under the corresponding UMI sequence.
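The deduplication idea can be sketched as counting distinct UMIs per position and protein type, so that several reads sharing one UMI contribute a single molecule. This is a hedged illustration under the assumption that rows have the Table 3 layout `(x, y, tag_id, umi, reads)`; the function name `count_unique_umis` and the two rows marked hypothetical are not from the source.

```python
from collections import defaultdict

# Minimal sketch: collapse reads that share a UMI so each molecule counts once.
# Row layout follows Table 3: (x, y, ADT tag number, UMI, reads).

def count_unique_umis(rows):
    """Map (x, y, tag_id) -> number of distinct UMIs seen at that position."""
    umis = defaultdict(set)
    for x, y, tag_id, umi, _reads in rows:
        umis[(x, y, tag_id)].add(umi)
    return {key: len(values) for key, values in umis.items()}


rows = [
    (6045, 18653, 17, 253298, 3),   # first row of Table 3
    (6045, 18653, 17, 253298, 2),   # hypothetical: same UMI again (PCR duplicate)
    (6045, 18653, 17, 111111, 1),   # hypothetical: second molecule, different UMI
    (20103, 16220, 22, 686750, 3),  # second row of Table 3
]
abundance = count_unique_umis(rows)
```

The read counts themselves are ignored on purpose: after deduplication, abundance depends only on how many distinct UMIs were observed, which corresponds to the MIDCount column of the result table.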

After the analysis device completes the above deduplication, an accurate analysis result is obtained, as shown in Table 4 below.

Table 4:

GeneID         x        y        MIDCount
Mouse-CD8a     13607    14272    1
Mouse-CD4      12516    7941     1
Mouse-IgD      15646    10044    1
Mouse-CD19     12711    6853     1
Mouse-IgG2a    16100    12754    2

Here, GeneID is the type of protein, x is the horizontal coordinate of the protein, y is the vertical coordinate of the protein, and MIDCount is the expression abundance of the protein.

In some embodiments, the analysis device may generate a protein in situ expression level report based on the above analysis result and then present the report on a display device for the user's reference.

In some embodiments, the analysis device may also generate a protein expression heat map based on the analysis result of protein expression; the heat map describes the type, location and expression abundance of the proteins inside the cell. In some examples, the analysis device may submit the heat map generation task through a web page and thereby obtain the protein expression heat map, which is then presented on a display device for the user's reference. FIG. 4 is a schematic diagram of a protein expression heat map provided in an embodiment of the present application.

For ease of understanding, the spatiotemporal proteomics analysis method provided in the embodiments of the present application is described below. FIG. 5 is a schematic diagram of a spatiotemporal proteomics analysis method provided in an embodiment of the present application.

"1" denotes the spatiotemporal sequencing chip sequence and position coordinate file obtained from the sequencing chip. The sequencing chip may be a 1 cm × 1 cm chip containing many spots; each spot corresponds to a probe, the probe sequences are random, and the probe at each position has its own sequence.

"2" denotes the image obtained by microscope photography after cell nucleus staining. The tissue section may be a section of the biological sample; it is stained for single-stranded DNA and then photographed under a microscope.

"3" denotes the sequencing data files of the antibody-coupled sequences. Protein and mRNA each correspond to one .fastq file, and the probes can capture mRNA and protein sequences at the same time.

"4" denotes the antibody type library in which each antibody is coupled to a 15 bp base sequence; it is used in the comparison to determine the types of protein and mRNA.

"5" denotes the statistics file of protein expression; see Table 3 above.

"6" denotes the microscope image after registration with the expression matrix.

"7" denotes the binarized image generated by tissue segmentation based on the ssDNA image. The interior of cells is white and the exterior of cells is black.

"8" denotes the expression level file after cell segmentation.

"9" denotes the image file produced by cell segmentation on the registered microscope image, filled with 0 and 1: the interior of cells has the value 1 and the exterior of cells has the value 0.

"10" denotes the expression level file for the interior of the tissue; see Table 4 above.

It should be noted that the spatiotemporal proteomics technology consists of three main parts.

First, the spatiotemporal sequencing chip is sequenced a first time to obtain the chip coordinate mask file (see Table 1 above), in which the first column is the 25 bp barcode sequence, the second column is the x coordinate on the sequencing chip, and the third column is the y coordinate on the sequencing chip.

Second, microscope imaging. The tissue section is stained for ssDNA and then photographed under a microscope. The analysis device thereby obtains an image of the cell nuclei in the tissue, also called the ssDNA image (see FIG. 2), and the cells are subsequently segmented according to the nucleus positions shown in the ssDNA image.

Third, the mRNA and ADT tags captured by the spatiotemporal sequencing chip are sequenced. An ADT tag is a sequence used to label a protein, namely a 15 bp base sequence. Each protein is coupled to one 15 bp base sequence; the spatiotemporal proteomics technology has selected a library of detectable proteins and recorded each protein together with its coupled 15 bp base sequence (see Table 2). In the ADT tag sequencing data fastq files, Read1 is 35 bp long, of which positions 1-25 are the chip barcode sequence and positions 26-35 are the UMI sequence; Read2 is 15 bp long and is the ADT tag sequence.
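The read layout described above (Read1: positions 1-25 chip barcode, positions 26-35 UMI) maps directly onto string slicing. The sketch below is illustrative only; the function name `split_read1` and the UMI value are made up, while the barcode is the first row of Table 1.

```python
# Minimal sketch of the Read1 layout: 1-based positions 1-25 are the chip
# barcode and 26-35 are the UMI, i.e. 0-based slices [0:25] and [25:35].

def split_read1(read1):
    """Split a 35 bp Read1 into (barcode, umi)."""
    if len(read1) != 35:
        raise ValueError("Read1 is expected to be 35 bp long")
    return read1[:25], read1[25:35]


# Barcode from Table 1 followed by a made-up 10 bp UMI.
read1 = "GCCCCCCCGCACCCCGCCGGAGAAG" + "ACGTACGTAC"
barcode, umi = split_read1(read1)
```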

The three parts above yield the chip coordinate mask file, the single-stranded DNA image and the ADT tag fastq files, which are then used to analyze protein expression.

S201: The analysis device compares bases 1-25 of Read1 in the ADT tag fastq file with the chip barcode sequences in the first column of the chip coordinate mask file, and thereby obtains the in situ coordinates of the ADT tag sequence on the chip. The comparison allows a 1 bp mismatch.
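A barcode lookup that tolerates one mismatch can be sketched with a Hamming distance check. This is a simple illustration rather than the method used in the application: the linear scan is chosen for clarity, not speed, and discarding reads that fit more than one chip barcode is an assumption of this sketch. The function names are hypothetical; the two chip barcodes come from Table 1.

```python
# Minimal sketch of S201: map a read barcode to chip coordinates, allowing at
# most one mismatched base. Ambiguous reads are discarded (an assumption).

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for p, q in zip(a, b) if p != q)


def locate(read_barcode, chip_coords, max_mismatch=1):
    """Return (x, y) for the read barcode, or None if no unique barcode fits."""
    if read_barcode in chip_coords:
        return chip_coords[read_barcode]
    hits = [
        xy
        for chip_barcode, xy in chip_coords.items()
        if len(chip_barcode) == len(read_barcode)
        and hamming(chip_barcode, read_barcode) <= max_mismatch
    ]
    return hits[0] if len(hits) == 1 else None


chip_coords = {
    "GCCCCCCCGCACCCCGCCGGAGAAG": (67615, 33826),
    "CCTCCCCCGCCGCCCCTCGCGAAAG": (67615, 33827),
}
exact_hit = locate("GCCCCCCCGCACCCCGCCGGAGAAG", chip_coords)
near_hit = locate("ACCCCCCCGCACCCCGCCGGAGAAG", chip_coords)  # first base changed
no_hit = locate("AACCCCCCGCACCCCGCCGGAGAAG", chip_coords)    # two bases changed
```

For a full-size chip, a precomputed index of one-mismatch neighbors would likely be preferable to scanning every barcode per read.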

S202: The analysis device compares the 15 bp base sequence of Read2 in the ADT tag fastq file with the selected library of detectable proteins to obtain the type of protein; combined with the chip coordinate mask file, this gives the position of a given protein on the chip.

S203: The analysis device performs deduplication (ADT counting) and calculates the in situ expression abundance of each protein. That is, for the detected ADT tag molecules, the duplicates caused by PCR are removed (deduplication can be done using the UMI) to obtain the expression abundance of the protein at each specific location.

An embodiment of the present application provides an experimental example, which is described below.

The biological sample is a mouse thymus sample. Following the steps in the above embodiment, Read1 of the ADT tag fastq file is compared with the chip mask file to obtain Total Reads (the total number of raw sequencing reads) and Valid CID Reads. The 1-25 bp barcode sequence of Read1 is called the CID; Valid CID Reads is the number of reads in Total Reads whose barcode matches a barcode on the chip, that is, the number of reads for which a corresponding position can be found on the chip.

Read2 of the ADT tag fastq file is compared with the ADT tag database to obtain Valid PID Reads and the basic data quality. The 1-15 bp sequence of Read2 is the PID sequence; Valid PID Reads is the number of reads that both have a corresponding position on the chip and align to the library of antibody types coupled to 15 bp nucleic acid sequences. In this example, 72.15% of the protein data passed quality control and was used for subsequent qualitative and quantitative analysis.

FIG. 6 is a schematic diagram of basic data statistics provided in an embodiment of the present application. The English terms in the figure are explained in Table 5 below.

Table 5:

Figure PCTCN2022144141-appb-000001

Figure PCTCN2022144141-appb-000002

Then, tissue segmentation and cell segmentation are performed based on the single-stranded DNA image taken under the microscope; for the single-stranded DNA image, see FIG. 2 or FIG. 3. Next, the capture of protein data is summarized at the level of the spatiotemporal chip's bin size and at the level of individual cells after cell segmentation, as shown in FIG. 7, which is a schematic diagram of protein data capture provided in an embodiment of the present application. The English terms are explained in Table 6 below.

Table 6:

Figure PCTCN2022144141-appb-000003

The captured reads are deduplicated based on the UMI, the number of distinct UMIs is counted to obtain the protein expression level, and the saturation is computed. FIG. 8 is a saturation statistics chart provided in an embodiment of the present application. In FIG. 8(a), the horizontal axis is the amount of sequencing data and the vertical axis is the saturation; in FIG. 8(b), the horizontal axis is the amount of sequencing data and the vertical axis is the number of unique reads in each bin.
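The source does not give a formula for saturation. One commonly used definition of sequencing saturation is 1 minus the ratio of unique molecules to total reads, and the sketch below uses that definition purely as an assumption to show how such a statistic could be derived from UMI-level read counts. The function name and the dictionary layout are hypothetical; the read counts are those of Table 3.

```python
# Minimal sketch, assuming saturation = 1 - (distinct UMIs / total reads).
# This definition is an assumption; the source does not state its formula.

def saturation(umi_reads):
    """umi_reads maps a molecule key (x, y, tag_id, umi) to its read count."""
    total_reads = sum(umi_reads.values())
    if total_reads == 0:
        return 0.0
    return 1.0 - len(umi_reads) / total_reads


# The six rows of Table 3, each observed with 3 reads.
umi_reads = {
    (6045, 18653, 17, 253298): 3,
    (20103, 16220, 22, 686750): 3,
    (7052, 17017, 26, 386504): 3,
    (12546, 18367, 31, 699050): 3,
    (19839, 15003, 17, 931045): 3,
    (9840, 13305, 17, 643739): 3,
}
sat = saturation(umi_reads)  # 6 distinct UMIs over 18 reads
```

Under this definition, a value near 1 means most additional reads are duplicates of molecules already seen, which is the behavior a saturation curve against sequencing depth is meant to reveal.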

统计蛋白种类以及整体UMI的占比情况,如图9所示,该图为本申请实施例提供的一种蛋白质种类的占比示意图。The proportion of protein types and overall UMIs is statistically analyzed, as shown in FIG9 , which is a schematic diagram of the proportion of protein types provided in an embodiment of the present application.

Based on the above description, an embodiment of the present application provides a protein expression analysis method. The method includes: acquiring sequencing data obtained by sequencing a biological sample with a sequencing chip, the sequencing data including sequence values and the positions of the sequence values; determining, according to a single-stranded DNA staining image of the biological sample, the sequencing data that belongs to the interior of a cell; comparing the sequence values in the intracellular sequencing data with the sequence values of the proteins in a preset protein library to obtain a comparison result; and, if the comparison result indicates that a sequence value in the sequencing data matches the sequence value of a protein in the protein library, obtaining an analysis result of protein expression, the analysis result including the type of the matched protein and its position inside the cell. By combining sequencing, staining, and comparison against the protein library, the method analyzes protein expression inside the cell comprehensively and yields the intracellular protein expression profile, such as the type, position, and expression abundance of the proteins inside the cell.
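The four steps of the method can be strung together in a short Python sketch. Everything about the data layout is an assumption for illustration: reads are dictionaries with `seq`, `x`, and `y` keys, the segmentation result is a `cell_mask` dictionary, the protein library is a plain mapping from sequence value to protein name, and matching is exact lookup.

```python
def analyze(reads, cell_mask, protein_library):
    """Sketch of the method: keep intracellular reads, compare, report matches.

    reads: list of {"seq": str, "x": int, "y": int} (hypothetical layout).
    cell_mask: {(x, y): cell_id}, derived from the stained image in practice.
    protein_library: {sequence value: protein name}.
    """
    results = []
    for read in reads:
        # Step 2: keep only sequencing data located inside a segmented cell.
        cell_id = cell_mask.get((read["x"], read["y"]))
        if cell_id is None:
            continue
        # Step 3: compare against the preset library (exact match here).
        protein = protein_library.get(read["seq"])
        if protein is None:
            continue
        # Step 4: record the matched protein type and its position.
        results.append(
            {"protein": protein, "cell": cell_id, "x": read["x"], "y": read["y"]}
        )
    return results

# Hypothetical example data.
reads = [
    {"seq": "ACGT", "x": 0, "y": 0},   # inside cell 1, matches
    {"seq": "ACGT", "x": 9, "y": 9},   # outside every cell
    {"seq": "TTTT", "x": 1, "y": 1},   # inside cell 1, no library match
]
analysis = analyze(reads, {(0, 0): 1, (1, 1): 1}, {"ACGT": "CD3"})
```

Exact lookup is used only to keep the sketch short; a thresholded comparison could be substituted in step 3 without changing the rest of the flow.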

An embodiment of the present application further provides a protein expression analysis apparatus. Figure 10 is a schematic diagram of the protein expression analysis apparatus provided by an embodiment of the present application. The apparatus includes:

an acquisition module 1001, configured to acquire sequencing data obtained by sequencing a biological sample with a sequencing chip, the sequencing data including sequence values and the positions of the sequence values;

a determination module 1002, configured to determine, according to a single-stranded DNA staining image of the biological sample, the sequencing data that belongs to the interior of a cell;

a comparison module 1003, configured to compare the sequence values in the intracellular sequencing data with the sequence values of the proteins in a preset protein library to obtain a comparison result;

an analysis module 1004, configured to obtain an analysis result of protein expression if the comparison result indicates that a sequence value in the sequencing data matches the sequence value of a protein in the protein library, the analysis result including the type of the matched protein and its position inside the cell.

Optionally, the analysis result further includes the expression abundance of each protein. The acquisition module 1001 is further configured to acquire the unique molecular identifiers (UMIs) of the proteins inside the cell; the analysis module is further configured to deduplicate the proteins inside the cell according to their unique molecular identifiers, to obtain the expression abundance of each protein inside the cell.

Optionally, the comparison module 1003 is specifically configured to determine that the comparison result is a match if the degree of matching between a sequence value in the intracellular sequencing data and the sequence value of a protein in the preset protein library is greater than a preset threshold.
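A thresholded comparison of this kind might look like the following Python sketch. The application does not define how the degree of matching is measured; the sketch assumes, purely for illustration, the fraction of identical positions between two equal-length sequences, a hypothetical default threshold of 0.9, and made-up library entries.

```python
def match_fraction(seq, ref):
    """Fraction of positions at which two equal-length sequences agree."""
    if not seq or len(seq) != len(ref):
        return 0.0  # assumption: sequences of different length never match
    return sum(a == b for a, b in zip(seq, ref)) / len(seq)

def best_match(seq, protein_library, threshold=0.9):
    """Return the best-matching protein name, or None if no score exceeds
    the threshold (strictly greater than, as in the text above)."""
    best_name, best_score = None, 0.0
    for ref, name in protein_library.items():
        score = match_fraction(seq, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score > threshold else None

# Hypothetical library of sequence values.
library = {"ACGTACGTAC": "CD3", "TTTTGGGGCC": "CD4"}

exact = best_match("ACGTACGTAC", library)                     # identical sequence
strict = best_match("ACGTACGTTT", library)                    # 8 of 10 positions agree
relaxed = best_match("ACGTACGTTT", library, threshold=0.7)    # same read, lower threshold
```

Tolerating a small number of mismatches lets reads with sequencing errors still be assigned to a protein, at the cost of a higher risk of false assignments as the threshold is lowered.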

Optionally, a sequence value in the sequencing data includes an initial base sequence coupled to a tag base sequence, and the tag base sequence is used to determine the type of protein to which the sequence value corresponds.
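One way to picture a coupled sequence value is as a concatenation that is split before lookup. The Python sketch below assumes, hypothetically, that the initial base sequence has a fixed known length and precedes the tag base sequence; the application does not state the order or lengths, and the tag-to-protein table is invented for the example.

```python
def split_sequence_value(seq, initial_len):
    """Split a sequence value into (initial base sequence, tag base sequence).

    Assumes the initial part comes first and has a fixed length, which is an
    illustrative assumption rather than something stated in the application.
    """
    if len(seq) <= initial_len:
        raise ValueError("sequence value is too short to contain a tag")
    return seq[:initial_len], seq[initial_len:]

def protein_type(seq, initial_len, tag_to_protein):
    """Look up the protein type from the tag part; None if the tag is unknown."""
    _initial, tag = split_sequence_value(seq, initial_len)
    return tag_to_protein.get(tag)

# Hypothetical tag table and sequence value.
kind = protein_type("AAAACCGG", initial_len=4, tag_to_protein={"CCGG": "CD3"})
```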

Optionally, the apparatus further includes a display module. The analysis module 1004 is further configured to generate a protein expression heat map according to the analysis result of the protein expression, the protein expression heat map describing the type, position, and expression abundance of the proteins inside the cell;

the display module is configured to display the protein expression heat map.
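The data behind such a heat map can be sketched as a two-dimensional grid of abundances for one protein. The input layout `((bin_x, bin_y), protein) -> abundance` and the example values are assumptions for illustration; the sketch only builds the matrix and leaves rendering to whatever plotting tool is available.

```python
def expression_grid(expression, protein, width, height):
    """Build a height x width grid of expression abundance for one protein.

    expression: mapping of ((bin_x, bin_y), protein) -> abundance
    (hypothetical layout). Bins outside the grid are ignored.
    """
    grid = [[0] * width for _ in range(height)]
    for ((bx, by), name), abundance in expression.items():
        if name == protein and 0 <= bx < width and 0 <= by < height:
            grid[by][bx] = abundance  # rows index y, columns index x
    return grid

# Hypothetical abundances per bin and protein.
expression = {
    ((0, 0), "CD3"): 5,
    ((2, 1), "CD3"): 2,
    ((1, 1), "CD4"): 7,
}
cd3_grid = expression_grid(expression, "CD3", width=3, height=2)
```

A grid like `cd3_grid` could then be handed to an image-style plotting function, for example matplotlib's `imshow`, to render one heat map per protein type.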

Optionally, the analysis module 1004 is further configured to generate a protein in situ expression level report according to the analysis result; the display module is further configured to display the protein in situ expression level report.

An embodiment of the present application further provides a device, which includes a processor and a memory;

wherein one or more computer programs are stored in the memory, the one or more computer programs including instructions; when the instructions are executed by the processor, the electronic device is caused to perform the method described in any one of the method embodiments.

An embodiment of the present application further provides a computer storage medium that includes computer instructions. When the computer instructions are run on an electronic device, the electronic device performs the method described in any one of the method embodiments.

An embodiment of the present application further provides a computer program product. When the computer program product is run on a computer, the computer performs the method described in any one of the method embodiments.

It should also be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the present application, a connection between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines.

From the description of the above implementations, those skilled in the art can clearly understand that the present application may be implemented by software plus the necessary general-purpose hardware, or by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can readily be implemented by corresponding hardware, and the specific hardware structure used to implement a given function may take many forms, such as analog circuits, digital circuits, or dedicated circuits. For the present application, however, a software implementation is the preferred implementation in most cases. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of the present application.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium on which a computer can store data, or a data storage device, such as a training device or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)).

Claims (10)

1. A protein expression analysis method, characterized by comprising:
acquiring sequencing data obtained by sequencing a biological sample with a sequencing chip, the sequencing data including a sequence value and the location of the protein corresponding to the sequence value;
determining, according to a single-stranded DNA staining image of the biological sample, the sequencing data that belongs to the interior of a cell;
comparing the sequence value in the intracellular sequencing data with the sequence values of the proteins in a preset protein library to obtain a comparison result;
if the comparison result indicates that the sequence value in the sequencing data matches the sequence value of a protein in the protein library, obtaining an analysis result of protein expression, the analysis result including the type of the matched protein and its position inside the cell.

2. The method according to claim 1, characterized in that the analysis result further includes the expression abundance of each protein, and the method further comprises:
acquiring the unique molecular identifiers of the proteins inside the cell;
deduplicating the proteins inside the cell according to the unique molecular identifiers of the proteins inside the cell, to obtain the expression abundance of each protein inside the cell.

3. The method according to claim 1 or 2, characterized in that comparing the sequence value in the intracellular sequencing data with the sequence values of the proteins in the preset protein library to obtain the comparison result comprises:
if the degree of matching between the sequence value in the intracellular sequencing data and the sequence value of a protein in the preset protein library is greater than a preset threshold, determining that the comparison result is that the sequence value in the intracellular sequencing data matches the sequence value of the protein in the preset protein library.

4. The method according to claim 1, characterized in that the sequence value in the sequencing data includes an initial base sequence coupled to a tag base sequence, the tag base sequence being used to determine the type of protein to which the sequence value corresponds.

5. The method according to claim 2, characterized in that the method further comprises:
generating a protein expression heat map according to the analysis result of the protein expression, the protein expression heat map describing the type, position, and expression abundance of the proteins inside the cell;
displaying the protein expression heat map.

6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
generating a protein in situ expression level report according to the analysis result;
displaying the protein in situ expression level report.

7. A protein expression analysis apparatus, characterized by comprising:
an acquisition module, configured to acquire sequencing data obtained by sequencing a biological sample with a sequencing chip, the sequencing data including a sequence value and the location of the protein corresponding to the sequence value;
a determination module, configured to determine, according to a single-stranded DNA staining image of the biological sample, the sequencing data that belongs to the interior of a cell;
a comparison module, configured to compare the sequence value in the intracellular sequencing data with the sequence values of the proteins in a preset protein library to obtain a comparison result;
an analysis module, configured to obtain an analysis result of protein expression if the comparison result indicates that the sequence value in the sequencing data matches the sequence value of a protein in the protein library, the analysis result including the type of the matched protein and its position inside the cell.

8. A device, characterized by comprising a processor and a memory;
wherein one or more computer programs are stored in the memory, the one or more computer programs including instructions; when the instructions are executed by the processor, the electronic device is caused to perform the method according to any one of claims 1 to 6.

9. A computer storage medium, characterized by including computer instructions, wherein when the computer instructions are run on an electronic device, the electronic device performs the method according to any one of claims 1 to 6.

10. A computer program product, characterized in that when the computer program product is run on a computer, the computer performs the method according to any one of claims 1 to 6.
PCT/CN2022/144141 2022-12-30 2022-12-30 Protein expression analysis method and related device Ceased WO2024138680A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2022/144141 WO2024138680A1 (en) 2022-12-30 2022-12-30 Protein expression analysis method and related device
CN202280102034.4A CN120266210A (en) 2022-12-30 2022-12-30 A protein expression analysis method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/144141 WO2024138680A1 (en) 2022-12-30 2022-12-30 Protein expression analysis method and related device

Publications (1)

Publication Number Publication Date
WO2024138680A1 true WO2024138680A1 (en) 2024-07-04

Family

ID=91716238

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/144141 Ceased WO2024138680A1 (en) 2022-12-30 2022-12-30 Protein expression analysis method and related device

Country Status (2)

Country Link
CN (1) CN120266210A (en)
WO (1) WO2024138680A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006294014A (en) * 2005-03-16 2006-10-26 Kumamoto Technology & Industry Foundation Analysis program, protein chip, method for manufacturing protein chip and antibody cocktail
CN113470743A (en) * 2021-07-16 2021-10-01 哈尔滨星云医学检验所有限公司 Differential gene analysis method based on BD single cell transcriptome and proteome sequencing data
US20220010367A1 (en) * 2019-02-28 2022-01-13 10X Genomics, Inc. Profiling of biological analytes with spatially barcoded oligonucleotide arrays
US20220205035A1 (en) * 2019-04-05 2022-06-30 Board Of Regents, The University Of Texas System Methods and applications for cell barcoding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN AO; LIAO SHA; CHENG MENGNAN; MA KAILONG; WU LIANG; LAI YIWEI; QIU XIAOJIE; YANG JIN; XU JIANGSHAN; HAO SHIJIE; WANG XIN; LU H: "Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays", CELL, ELSEVIER, AMSTERDAM NL, vol. 185, no. 10, 4 May 2022 (2022-05-04), Amsterdam NL , pages 1777, XP087052552, ISSN: 0092-8674, DOI: 10.1016/j.cell.2022.04.003 *
HSU JUSTINE, JARROUX JULIEN, JOGLEKAR ANOUSHKA, ROMERO JUAN P., NEMEC COREY, REYES DANIEL, ROYALL ARIEL, HE YI, BELCHIKOV NATAN, L: "Comparing 10x Genomics single-cell 3’ and 5’ assay in short-and long-read sequencing", BIORXIV, 28 October 2022 (2022-10-28), XP093185577, DOI: 10.1101/2022.10.27.514084 *

Also Published As

Publication number Publication date
CN120266210A (en) 2025-07-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22969822

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202280102034.4

Country of ref document: CN

WWP Wipo information: published in national office

Ref document number: 202280102034.4

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE