CN111778336A - Gene marker combination and application for comprehensive quantitative assessment of tumor microenvironment - Google Patents
- Publication number
- CN111778336A
- Authority
- CN
- China
- Prior art keywords
- weight value
- gene
- tumor microenvironment
- value
- tumor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/20—Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/118—Prognosis of disease development
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Genetics & Genomics (AREA)
- Biotechnology (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Medical Informatics (AREA)
- Zoology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Molecular Biology (AREA)
- Analytical Chemistry (AREA)
- Wood Science & Technology (AREA)
- Pathology (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Evolutionary Biology (AREA)
- Immunology (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Public Health (AREA)
- Microbiology (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Hospice & Palliative Care (AREA)
- Bioethics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Oncology (AREA)
- Biochemistry (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Description
Technical Field
The invention belongs to the field of molecular biomedicine and relates in particular to comprehensive quantitative assessment of the tumor microenvironment.
Background
At present, pathological research on tumors and research on anti-tumor drugs and treatment regimens all target the tumor cells themselves: pathological studies stain tissue sections and map tumor gene mutations, and treatment relies on radiotherapy, chemotherapy, and targeted therapy. Research on the tumor microenvironment on which tumor cells depend for survival lags clearly behind. The British surgeon Stephen Paget proposed the "seed and soil" hypothesis of tumor metastasis in 1889, which can be regarded as the origin of the tumor microenvironment concept. The hypothesis long went without scientific confirmation. Only with the widespread clinical use of PD-1/PD-L1 immune checkpoint inhibitors, together with a fresh look at anti-angiogenic targeted therapy, did research on the tumor microenvironment regain attention. Even now, such research remains unsystematic. In practice it stays largely at the conceptual level and cannot describe the tumor microenvironment accurately and concretely or assess it quantitatively. Some studies touch only piecemeal on individual factors related to the tumor microenvironment, for example by inferring the composition of tumor immune cells from the expression of genes related to the tumor immune response.
Such studies are too narrow and do not reflect the real state of the tumor microenvironment as a whole, so their results are inconsistent or even contradictory, which makes it difficult to use the tumor microenvironment as a target or indicator in clinical practice. The tumor microenvironment is a complex system (its molecular and cellular composition is complex, and many kinds of factors are involved). Whether it can be assessed comprehensively and accurately, and how, determines whether it can deliver value in clinical research and clinical application. No method or scheme for comprehensively assessing the state of the tumor microenvironment has yet appeared. The present invention proposes decomposing the tumor microenvironment into four constituent factors and screening out a gene marker combination with which to assess it comprehensively and quantitatively, so that the tumor microenvironment can be understood more concretely, measured more precisely, and applied in clinical practice to bring clinical value.
Summary of the Invention
The invention addresses the comprehensive and quantitative assessment of the tumor microenvironment, so that the tumor microenvironment can serve more effectively in clinical practice as a measurable biomarker.
According to one aspect of the invention, a gene marker combination for comprehensive quantitative assessment of the tumor microenvironment is provided. The combination comprises angiogenesis index gene markers, chemokine expression index gene markers, immune cell infiltration index gene markers, and tumor growth and invasion index gene markers.
According to one aspect of the invention, a gene marker combination for comprehensive quantitative assessment of the tumor microenvironment is also provided, in which the angiogenesis index gene markers comprise 16 genes: ANG, ANGPT1, ANGPT2, ANGPTL4, DLL4, EDN1, FGF1, FGF2, FLT1, HIF1A, PDGFB, SERPINB5, TYMP, VEGFA, VEGFB, and VEGFC;
the chemokine expression index gene markers comprise 18 genes: CCL1, CCL2, CCL3, CCL4, CCL5, CCL8, CCL18, CCL19, CCL21, CXCL1, CXCL2, CXCL3, CXCL8, CXCL9, CXCL10, CXCL11, CXCL12, and CXCL13;
the immune cell infiltration index gene markers comprise 37 genes: IDO1, HLA-DRA, STAT1, IFNG, PRF1, GZMA, GZMB, NKG7, GZMH, KLRK1, KLRB1, KLRD1, CTSW, GNLY, CD14, CD15, CD19, CD68, CD163, CD33, CEACAM8, CD80, CD86, BATF3, TNFRSF17, CD20, TNFRSF4, CD4, TNFRSF9, CD8A, CD8B, LAG3, CD39, CXCR5, TBX21, FOXP3, and CD45RO;
the tumor growth and invasion index gene markers comprise 25 genes: CDH1, CTNNB1, EPCAM, ITGAM, ITGAV, ITGAX, MACC1, MMP1, MMP2, MMP3, MMP9, MMP11, MMP13, MMP14, MKI67, MYC, PLAU, RAN, SNAI1, SNAI2, TIMP1, TNC, TWIST1, ZEB1, and ZEB2.
According to one aspect of the invention, a use of the above gene marker combination is also provided: the combination is applied to comprehensive quantitative assessment of the tumor microenvironment.
According to one aspect of the invention, a kit for predicting tumor metastasis and/or tumor recurrence is also provided. The kit contains probes for detecting the above gene marker combination.
According to one aspect of the invention, a method for establishing a model for comprehensive quantitative assessment of the tumor microenvironment is also provided. The method uses the above gene marker combination, comprising angiogenesis index gene markers, chemokine expression index gene markers, immune cell infiltration index gene markers, and tumor growth and invasion index gene markers, as the markers for comprehensive quantitative assessment of the tumor microenvironment.
According to one aspect of the invention, in the above method for establishing the assessment model, gene expression data and clinical data (stage/recurrence/metastasis) in the GEO database that cover the above gene marker combination are analyzed. The first 150 samples of GSE62254 [HG-U133_Plus_2] are taken as the training set; the MAS5 method is used for background correction and normalization to obtain expression values for the 96 genes; where a gene has multiple probes, the maximum is taken as the gene's expression value; the expression values are log2-transformed; and logistic regression is used to calculate the weight values.
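The probe-level steps described above (take the maximum across a gene's probes, then apply a log2 transform) can be sketched as follows. This is a minimal illustration under stated assumptions, not the inventors' code: the function name `collapse_probes`, the probe identifiers, and the intensity values are hypothetical, and the input is assumed to be MAS5-normalized, strictly positive signal for a single sample.

```python
import math

def collapse_probes(probe_values, probe_to_gene):
    """Collapse probe-level signal to one log2 value per gene for one sample.

    probe_values: {probe_id: normalized intensity (assumed > 0)}
    probe_to_gene: {probe_id: gene symbol}; probes outside the panel are absent
    """
    max_per_gene = {}
    for probe_id, intensity in probe_values.items():
        gene = probe_to_gene.get(probe_id)
        if gene is None:
            continue  # probe does not map to a panel gene
        # Keep the largest intensity seen so far for this gene.
        if gene not in max_per_gene or intensity > max_per_gene[gene]:
            max_per_gene[gene] = intensity
    # The log2 transform is applied after the maximum has been taken.
    return {gene: math.log2(value) for gene, value in max_per_gene.items()}

# Hypothetical probe IDs and intensities, for illustration only.
probe_to_gene = {"probe_1": "VEGFA", "probe_2": "VEGFA", "probe_3": "HIF1A"}
sample = {"probe_1": 256.0, "probe_2": 1024.0, "probe_3": 64.0, "probe_9": 8.0}
expression = collapse_probes(sample, probe_to_gene)
```

Because log2 is monotonic, taking the maximum before or after the transform selects the same probe; the sketch follows the order given in the text.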
According to one aspect of the invention, in the above method for establishing the assessment model, the weight values of the gene markers are as follows:
| Gene | Weight value |
|---|---|
| ANG | -20.1306 |
| ANGPT1 | -34.0557 |
| ANGPT2 | 28.6868 |
| ANGPTL4 | -7.5101 |
| BATF3 | -31.0749 |
| CCL1 | 13.6034 |
| CCL18 | 10.4873 |
| CCL19 | -20.9738 |
| CCL2 | 11.7576 |
| CCL21 | -5.1514 |
| CCL3 | 54.0877 |
| CCL4 | -53.5021 |
| CCL5 | 27.601 |
| CCL8 | -7.8325 |
| CD14 | 49.8048 |
| CD163 | 20.646 |
| CD19 | 21.3692 |
| CD33 | 4.0713 |
| CD4 | 9.214 |
| CD68 | -24.3628 |
| CD80 | 30.8515 |
| CD86 | 70.5589 |
| CD8A | 5.4726 |
| CD8B | 7.7914 |
| CDH1 | -5.3397 |
| CEACAM8 | 0.21 |
| CTNNB1 | -20.1479 |
| CTSW | -6.4395 |
| CXCL1 | -20.0591 |
| CXCL10 | 29.703 |
| CXCL11 | -13.3524 |
| CXCL12 | 19.7961 |
| CXCL13 | -20.5527 |
| CXCL2 | 13.4255 |
| CXCL3 | 23.3393 |
| CXCL8 | -43.1623 |
| CXCL9 | 9.0169 |
| CXCR5 | 5.4856 |
| DLL4 | 17.0118 |
| EDN1 | -4.8511 |
| ENTPD1 | 3.4117 |
| EPCAM | -33.8795 |
| FGF1 | 25.9813 |
| FGF2 | -12.3332 |
| FLT1 | -31.7469 |
| FOXP3 | 13.9705 |
| FUT4 | 13.9706 |
| GNLY | -10.0842 |
| GZMA | 35.9977 |
| GZMB | -9.9711 |
| GZMH | 15.305 |
| HIF1A | -125.904 |
| HLA-DRA | -51.5725 |
| IDO1 | -7.3857 |
| IFNG | -3.9568 |
| ITGAM | -20.4378 |
| ITGAV | 91.4138 |
| ITGAX | -17.7399 |
| KLRB1 | 6.4642 |
| KLRD1 | -27.2995 |
| KLRK1 | -31.2776 |
| LAG3 | 5.9364 |
| MACC1 | 4.823 |
| MKI67 | -28.3144 |
| MMP1 | -19.0372 |
| MMP11 | 9.8452 |
| MMP13 | 3.3756 |
| MMP14 | 29.6593 |
| MMP2 | -25.6822 |
| MMP3 | 16.0266 |
| MMP9 | 8.0522 |
| MS4A1 | 1.1736 |
| MYC | 20.084 |
| NKG7 | -31.6593 |
| PDGFB | 35.2784 |
| PLAU | 20.9008 |
| PRF1 | 46.0077 |
| PTPRC | -108.985 |
| RAN | 5.157 |
| SERPINB5 | 18.3407 |
| SNAI1 | -10.2615 |
| SNAI2 | 7.6711 |
| STAT1 | 18.7689 |
| TBX21 | 7.2855 |
| TIMP1 | 8.1053 |
| TNC | -27.6527 |
| TNFRSF17 | 28.0124 |
| TNFRSF4 | -19.4462 |
| TNFRSF9 | 7.2481 |
| TWIST1 | 12.0057 |
| TYMP | -15.5035 |
| VEGFA | 15.2587 |
| VEGFB | -15.0916 |
| VEGFC | 1.8606 |
| ZEB1 | 47.2097 |
| ZEB2 | -15.7898 |
According to one aspect of the invention, the above method for establishing the assessment model uses a logistic classifier, a classifier based on the generalized linear model, whose linear discriminant is $\mathrm{TMEscore} = \sum_i \omega_i x_i$, where $\omega_i$ is the weight value of gene $i$, $x_i$ is the expression value of that gene, and TMEscore is the quantitative assessment value of the tumor microenvironment.
According to one aspect of the invention, in the above method for establishing the assessment model, a sample whose TMEscore exceeds the threshold is classified as metastatic; otherwise it is classified as non-metastatic.
According to one aspect of the invention, in the above method for establishing the assessment model, the threshold of the logistic classifier is -427.9891.
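The scoring rule above is a weighted sum followed by a threshold comparison, which can be sketched as below. The function names `tme_score` and `classify` are hypothetical, and the expression values are invented for the arithmetic. The three weights are copied from the weight values listed above; a real TMEscore needs all 96 genes, so the three-gene demo value must not be compared with the reported threshold.

```python
# Threshold reported in the text for the full 96-gene model.
PATENT_THRESHOLD = -427.9891

def tme_score(weights, expression):
    """Weighted sum of log2 expression values over the genes in `weights`."""
    missing = [gene for gene in weights if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(weight * expression[gene] for gene, weight in weights.items())

def classify(score, threshold=PATENT_THRESHOLD):
    """Scores strictly above the threshold are called metastatic."""
    return "metastatic" if score > threshold else "non-metastatic"

# Three weights copied from the weight list; illustration of the arithmetic only.
demo_weights = {"ANG": -20.1306, "ANGPT1": -34.0557, "ANGPT2": 28.6868}
# Hypothetical log2 expression values.
demo_expression = {"ANG": 8.0, "ANGPT1": 6.0, "ANGPT2": 7.0}
demo_score = tme_score(demo_weights, demo_expression)
```

Raising an error on missing genes is a design choice of this sketch: silently skipping a gene would shift the score relative to a threshold that was derived for the complete panel.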
The gene marker combination disclosed in the invention allows comprehensive quantitative assessment of the tumor microenvironment. The assessment results predict tumor metastasis with 100% accuracy and tumor recurrence with more than 80% accuracy.
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.
Description of the Drawings
Figure 1 shows the ROC curve and AUC in Example 2;
Figure 2 shows the ROC curve and AUC in Example 3;
Figure 3 shows another ROC curve and AUC in Example 3;
Figure 4 shows the ROC curve for metastasis in Example 4;
Figure 5 shows another ROC curve in Example 4;
Figure 6 shows the ROC curve for GSE57303 in Example 4;
Figure 7 shows the ROC curve for GSE8167 in Example 4.
Detailed Description
Specific embodiments of the invention are described in further detail below with reference to the drawings and examples. The following examples illustrate the invention and do not limit its scope.
Example 1: Selection of gene markers
The invention defines four constituent factors of the tumor microenvironment (TME). The microenvironment on which tumor cells depend has an influence on them that is hard to overstate: for example, the oxygen supplied through the vascular system affects tumor cell metabolism, the body's immune cells surveil and tolerate tumor cells, and stromal cells and the extracellular matrix restrict tumor growth and metastasis. To capture the tumor microenvironment more fully, it was broken down and distilled into four factors, and the genes related to each factor were searched in NCBI. These genes were then pruned and refined by function to form a gene marker combination for quantitative assessment of the tumor microenvironment. The four constituent factors are an angiogenesis index, a chemokine expression index, an immune cell infiltration index, and a tumor growth and invasion index. Together they yield a gene marker combination for comprehensive quantitative assessment of the tumor microenvironment, comprising the following gene markers:
I. Angiogenesis index gene markers (16 genes): ANG, ANGPT1, ANGPT2, ANGPTL4, DLL4, EDN1, FGF1, FGF2, FLT1, HIF1A, PDGFB, SERPINB5, TYMP, VEGFA, VEGFB, VEGFC
II. Chemokine expression index gene markers (18 genes): CCL1, CCL2, CCL3, CCL4, CCL5, CCL8, CCL18, CCL19, CCL21, CXCL1, CXCL2, CXCL3, CXCL8, CXCL9, CXCL10, CXCL11, CXCL12, CXCL13
III. Immune cell infiltration index gene markers (37 genes): IDO1, HLA-DRA, STAT1, IFNG, PRF1, GZMA, GZMB, NKG7, GZMH, KLRK1, KLRB1, KLRD1, CTSW, GNLY, CD14, CD15, CD19, CD68, CD163, CD33, CEACAM8, CD80, CD86, BATF3, TNFRSF17, CD20, TNFRSF4, CD4, TNFRSF9, CD8A, CD8B, LAG3, CD39, CXCR5, TBX21, FOXP3, CD45RO
IV. Tumor growth and invasion index gene markers (25 genes): CDH1, CTNNB1, EPCAM, ITGAM, ITGAV, ITGAX, MACC1, MMP1, MMP2, MMP3, MMP9, MMP11, MMP13, MMP14, MKI67, MYC, PLAU, RAN, SNAI1, SNAI2, TIMP1, TNC, TWIST1, ZEB1, ZEB2
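The four lists above can be held in a small data structure and checked against the stated total of 96 genes. The sketch below also maps four CD names from the panel to the gene symbols that appear in the weight list of the Summary (FUT4, MS4A1, ENTPD1, PTPRC). That mapping follows the usual correspondence between these names (CD45RO is an isoform of the PTPRC product); the patent text does not spell it out, so it is an assumption here, as are the names `PANEL`, `ALIAS_TO_SYMBOL`, and `normalized_panel`.

```python
PANEL = {
    "angiogenesis": [
        "ANG", "ANGPT1", "ANGPT2", "ANGPTL4", "DLL4", "EDN1", "FGF1", "FGF2",
        "FLT1", "HIF1A", "PDGFB", "SERPINB5", "TYMP", "VEGFA", "VEGFB", "VEGFC",
    ],
    "chemokine_expression": [
        "CCL1", "CCL2", "CCL3", "CCL4", "CCL5", "CCL8", "CCL18", "CCL19", "CCL21",
        "CXCL1", "CXCL2", "CXCL3", "CXCL8", "CXCL9", "CXCL10", "CXCL11", "CXCL12",
        "CXCL13",
    ],
    "immune_cell_infiltration": [
        "IDO1", "HLA-DRA", "STAT1", "IFNG", "PRF1", "GZMA", "GZMB", "NKG7", "GZMH",
        "KLRK1", "KLRB1", "KLRD1", "CTSW", "GNLY", "CD14", "CD15", "CD19", "CD68",
        "CD163", "CD33", "CEACAM8", "CD80", "CD86", "BATF3", "TNFRSF17", "CD20",
        "TNFRSF4", "CD4", "TNFRSF9", "CD8A", "CD8B", "LAG3", "CD39", "CXCR5",
        "TBX21", "FOXP3", "CD45RO",
    ],
    "tumor_growth_and_invasion": [
        "CDH1", "CTNNB1", "EPCAM", "ITGAM", "ITGAV", "ITGAX", "MACC1", "MMP1",
        "MMP2", "MMP3", "MMP9", "MMP11", "MMP13", "MMP14", "MKI67", "MYC", "PLAU",
        "RAN", "SNAI1", "SNAI2", "TIMP1", "TNC", "TWIST1", "ZEB1", "ZEB2",
    ],
}

# Assumed mapping from CD names in the panel to symbols used in the weight list.
ALIAS_TO_SYMBOL = {"CD15": "FUT4", "CD20": "MS4A1", "CD39": "ENTPD1", "CD45RO": "PTPRC"}

def normalized_panel(panel, aliases):
    """Return the panel with alias names replaced by their mapped gene symbols."""
    return {
        factor: [aliases.get(gene, gene) for gene in genes]
        for factor, genes in panel.items()
    }

counts = {factor: len(genes) for factor, genes in PANEL.items()}
all_symbols = [
    gene
    for genes in normalized_panel(PANEL, ALIAS_TO_SYMBOL).values()
    for gene in genes
]
```

Normalizing names before joining the panel to expression data or to the weight list avoids silently dropping the four genes whose panel name differs from the symbol used in the weights.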
Example 2: Construction of the classifier model
The gene marker combination for the microenvironment factors described above (hereinafter, the 96 genes) was applied as follows:
① Gene expression data and clinical data (stage/recurrence/metastasis) in the GEO database that cover the above gene marker combination were analyzed. The first 150 samples of GSE62254 [Affymetrix HG-U133_Plus_2] were taken as the training set, and the MAS5 method was used for background correction and normalization to obtain expression values for the 96 genes. Where a gene had multiple probes, the maximum was taken as the gene's expression value, and the expression values were log2-transformed. Logistic regression was then used to calculate the weight values.
② Clinical outcomes such as recurrence versus no recurrence, or metastasis versus no metastasis, are binary and follow a Bernoulli distribution, so a binary classifier, the logistic classifier, was used. It is a classifier based on the generalized linear model, with linear discriminant $\mathrm{TMEscore} = \sum_i \omega_i x_i$, where $\omega_i$ is the weight value of gene $i$, $x_i$ is the expression value of that gene, and TMEscore is the quantitative assessment value of the tumor microenvironment. A sample whose TMEscore exceeds the threshold is classified as metastatic; otherwise it is classified as non-metastatic. The threshold of the logistic classifier, obtained by maximum likelihood estimation (MLE), is -427.9891.
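Steps ① and ② fit a logistic model by maximum likelihood and then threshold a linear score. The text does not say which software or optimizer was used, so the following is only a sketch of the general technique: plain gradient ascent on the Bernoulli log-likelihood, written without third-party libraries. The toy data, the learning rate, the iteration count, and all function names are hypothetical. The sketch also relies on one reading of the threshold that the text does not confirm: for a logistic model with intercept $b$, a predicted probability above 0.5 is equivalent to $\sum_i \omega_i x_i > -b$, so the threshold on the linear score would be the negated intercept.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit weights and intercept by gradient ascent on the Bernoulli log-likelihood."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = yi - p  # gradient of the log-likelihood w.r.t. the linear score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / len(X)
    return w, b

def predict(X, w, b):
    """Label 1 when the linear score exceeds -b (equivalent to p > 0.5)."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) > -b else 0 for xi in X]

# Hypothetical, centered two-feature toy data (not expression data).
X_toy = [(-2.0, -1.0), (-1.0, -2.0), (-1.5, -0.5), (-0.5, -1.5),
         (0.5, 1.5), (1.5, 0.5), (1.0, 2.0), (2.0, 1.0)]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(X_toy, y_toy)
predictions = predict(X_toy, w, b)
```

On perfectly separable toy data such as this, the likelihood has no finite maximum, so the weights keep growing with more iterations; the fixed iteration count simply stops the ascent.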
The accuracy of the classifier on the training set is 100%.
The sensitivity and specificity obtained by cross-validation are:
Explanation of the table above, taking metastasis as the example:
Sensitivity is the proportion of metastatic samples that are correctly classified as metastatic.
Specificity is the proportion of non-metastatic samples that are correctly classified as non-metastatic.
PPV (Positive Predictive Value) is the probability that a sample classified as metastatic is truly metastatic.
NPV (Negative Predictive Value) is the probability that a sample classified as non-metastatic is truly non-metastatic.
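The four definitions above reduce to ratios of confusion-matrix counts. A minimal sketch follows, with metastatic coded as 1 and non-metastatic as 0; the function name `diagnostic_metrics` and the label vectors are hypothetical and are not results from the patent.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV for binary labels (1 = metastatic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # metastatic, called metastatic
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # metastatic, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-metastatic, called so
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-metastatic, called metastatic

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Hypothetical labels: 4 metastatic and 4 non-metastatic samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = diagnostic_metrics(y_true, y_pred)
```

Sensitivity and specificity are conditioned on the true status, whereas PPV and NPV are conditioned on the call, so the latter two also depend on how common metastasis is in the cohort.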
The ROC AUC of the corresponding logistic classifier reaches 1.0, as shown in Figure 1.
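The ROC AUC equals the probability that a randomly chosen positive (metastatic) sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that pairwise definition, which is adequate for cohorts of this size. The function name `roc_auc` and the score values are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5  # ties count as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))

# Hypothetical scores: every metastatic sample scores above every non-metastatic one.
separated = roc_auc([-500.0, -450.0, -300.0, -100.0], [0, 0, 1, 1])
# Hypothetical scores with one positive ranked below one negative.
overlapping = roc_auc([-500.0, -300.0, -450.0, -100.0], [0, 0, 1, 1])
```

An AUC of 1.0 therefore means that some threshold separates the two groups perfectly on the data being scored.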
Example 3: Validation of the classifier model
To test the stability of the 96 gene markers used to build the classifier, and to guard against overfitting, another part of the GSE62254 [HG-U133_Plus_2] dataset was used to validate the markers' predictions.
The accuracy on the prediction set from GSE62254 is 100%.
The sensitivity and specificity obtained by cross-validation are:
The ROC AUC of the corresponding logistic classifier reaches 1.0, as shown in Figure 2.
To further test the stability of the markers used to build the classifier, and to guard against overfitting, the markers were validated on the GSE57303 dataset and on 22 in-house tumor samples (clinically diagnosed gastric or gastroesophageal cancer; 9 with metastasis and 13 without).
The accuracy on the prediction set formed by GSE57303 and the 22 in-house tumor samples is 100%.
The sensitivity and specificity obtained by cross-validation are:
The ROC AUC of the corresponding logistic classifier reaches 1.0, as shown in Figure 3.
实施例4Example 4
作为对比的发明人前期所做25基因肿瘤微环境综合量化评估模型As a comparison, the 25-gene tumor microenvironment comprehensive quantitative assessment model made by the inventors earlier
同样将GSE62254的前150个样本作为训练集,运用MAS5方法作背景矫正和标准化,获取25基因的基因表达值。如果基因具有多个探针,取最大值作为其基因的表达值,对基因表达值进行log2对数变换。运用logistic回归,计算权重值,建立模型。通过建立的模型对其他的预测集进行预测,得到预测准确率。通过以上验证,得到分组的ROC图,评估其诊断灵敏度及特异性。The first 150 samples of GSE62254 were also used as the training set, and the MAS5 method was used for background correction and normalization to obtain the gene expression values of 25 genes. If a gene has multiple probes, take the maximum value as the expression value of its gene, and perform log2 log transformation on the gene expression value. Use logistic regression to calculate weights and build a model. Predict other prediction sets through the established model to obtain the prediction accuracy. Through the above verification, a grouped ROC chart was obtained to evaluate its diagnostic sensitivity and specificity.
转移:Transfer:
训练集marker的筛选Screening of training set markers
根据GSE62254[HG-U133_Plus_2]样本中的表达值以及分组情况,运用logistic,对训练集进行转移与非转移两分组的分类,并得到其权重值系数:According to the expression value and grouping situation in the GSE62254[HG-U133_Plus_2] sample, using logistic, the training set is classified into two groups of transfer and non-transfer, and its weight value coefficient is obtained:
设定每个基因的权重值为ωi,基因的表达值为xi,那么一个样本的∑iωixi>阈值,则被分为转移。分类器的阈值为-0.735。Set the weight value of each gene as ω i , and the expression value of the gene as xi , then a sample whose ∑ i ω i x i > the threshold value is classified as transfer. The threshold for the classifier is -0.735.
其中,这些marker对训练集的判断准确率为:Among them, the judgment accuracy of these markers on the training set is:
通过交叉十字验证的灵敏度和特异性为:The sensitivity and specificity by cross-validation are:
上表解释:比如对于转移来讲,Explanation from the above table: For example, for transfer,
Sensitivity是转移样本被判断为转移的正确率。Sensitivity is the correct rate at which metastatic samples are judged to be metastatic.
Specificity是非转移样本被判断为非转移的正确率。Specificity is the correct rate at which non-transfer samples are judged as non-transfer.
PPV(Positive Predictive Value)是在被判断为转移样本中真正为转移样本的概率。PPV (Positive Predictive Value) is the probability that the samples that are judged to be transfer samples are actually transfer samples.
NPV(Negative Predictive Value)是在被判断为非转移样本中真正为非转移样本的概率。NPV (Negative Predictive Value) is the probability that the samples judged to be non-metastatic are truly non-metastatic.
相应的logistic分类器ROC图如图4。The corresponding logistic classifier ROC diagram is shown in Figure 4.
Validation of the markers on the prediction sets
To test the stability of the selected markers and to guard against overfitting, the markers were validated on a separate dataset.
For the prediction set GSE62254, the classification accuracy is:
The sensitivity and specificity from cross-validation are:
The ROC curve of the corresponding logistic classifier is shown in Figure 5.
To test the stability of the selected markers and to guard against overfitting, the markers were validated on the GSE57303 dataset.
For the prediction set GSE57303, the classification accuracy is:
The sensitivity and specificity from cross-validation are:
The ROC curve of the corresponding logistic classifier is shown in Figure 6.
GSE8167
To test the stability of the selected markers and to guard against overfitting, the markers were validated on the GSE8167 dataset.
For the prediction set GSE8167, the classification accuracy is:
The sensitivity and specificity from cross-validation are:
The ROC curve of the corresponding logistic classifier is shown in Figure 7.
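Validating on independent datasets amounts to applying the already fitted weights and threshold, without refitting, to samples that played no part in marker selection. The Python sketch below illustrates that idea under stated assumptions: the weights, cohort names, expression values, and labels are all hypothetical, and only the threshold of -0.735 is taken from the text.

```python
THRESHOLD = -0.735  # stated in the text


def predict(weights, sample, threshold=THRESHOLD):
    """Return 1 (metastatic) if the weighted score exceeds the threshold, else 0."""
    score = sum(w * sample[gene] for gene, w in weights.items())
    return 1 if score > threshold else 0


def accuracy(weights, samples, labels):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(weights, s) == y for s, y in zip(samples, labels))
    return hits / len(labels)


# Hypothetical frozen weights; they are not refitted on the validation cohorts.
frozen_weights = {"GENE_A": 1.0, "GENE_B": -1.0}

# Hypothetical validation cohorts: (expression profiles, true labels).
cohorts = {
    "cohort_1": (
        [{"GENE_A": 2.0, "GENE_B": 0.5}, {"GENE_A": 0.0, "GENE_B": 2.0}],
        [1, 0],
    ),
    "cohort_2": (
        [
            {"GENE_A": 1.0, "GENE_B": 1.0},
            {"GENE_A": 0.5, "GENE_B": 3.0},
            {"GENE_A": 3.0, "GENE_B": 1.0},
            {"GENE_A": 0.0, "GENE_B": 1.0},
        ],
        [1, 0, 1, 1],
    ),
}

for name, (samples, labels) in cohorts.items():
    print(name, accuracy(frozen_weights, samples, labels))
```

Keeping the weights and threshold fixed across cohorts is what makes the resulting accuracy a check on overfitting rather than a second round of training.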
Examples 1 to 4 show that evaluation with the 96 genes of the present invention predicted tumor metastasis with 100% accuracy and tumor recurrence with more than 80% accuracy. Accurate quantitative assessment cannot be achieved simply by selecting an arbitrary set of related genes.
The above embodiments are only preferred embodiments of the present invention and are not intended to limit it. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the technical principles of the present invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010718484.9A CN111778336B (en) | 2020-07-23 | 2020-07-23 | Gene marker combination and application for comprehensive quantitative assessment of tumor microenvironment |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111778336A (en) | 2020-10-16 |
| CN111778336B (en) | 2021-02-26 |
Family
ID=72763960
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010718484.9A Active CN111778336B (en) | 2020-07-23 | 2020-07-23 | Gene marker combination and application for comprehensive quantitative assessment of tumor microenvironment |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111778336B (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2012166700A2 (en) * | 2011-05-29 | 2012-12-06 | Lisanti Michael P | Molecular profiling of a lethal tumor microenvironment |
| CN110456054A (en) * | 2019-08-13 | 2019-11-15 | 臻悦生物科技江苏有限公司 | Cancer of pancreas detection reagent, kit, device and application |
| CN110621790A (en) * | 2017-05-10 | 2019-12-27 | 南托米克斯有限责任公司 | Circulating RNA for detecting, predicting and monitoring cancer |
| CN111235273A (en) * | 2020-01-16 | 2020-06-05 | 臻悦生物科技江苏有限公司 | Colorectal cancer tumor microenvironment detection reagent, kit, device and application |
Non-Patent Citations (2)
| Title |
|---|
| QIANQIAN DUAN ET AL: "Turning Cold into Hot: Firing up the Tumor Microenvironment", 《TRENDS IN CANCER》 * |
| 张如奎等: "浅论基因检测对肿瘤精准医疗的意义", 《中国医药生物技术》 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111778336B (en) | 2021-02-26 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112133365B (en) | Gene set for evaluating tumor microenvironment, scoring model and application of gene set | |
| JP4906505B2 (en) | Expression profile algorithms and tests for cancer diagnosis | |
| US8632980B2 (en) | Gene expression markers for prediction of patient response to chemotherapy | |
| EP3831964B1 (en) | Method to use gene expression to determine likelihood of clinical outcome of renal cancer | |
| AU2010242792B2 (en) | Gene expression profile algorithm and test for likelihood of recurrence of colorectal cancer and response to chemotherapy | |
| US20100305058A1 (en) | Individualized cancer treatments | |
| US20080233573A1 (en) | Gene expression profiling for identification, monitoring and treatment of transplant rejection | |
| US20170073763A1 (en) | Methods and Compositions for Assessing Patients with Non-small Cell Lung Cancer | |
| US20240401134A1 (en) | Methods and systems for measuring cell states | |
| Kwon | Emerging immune gene signatures as prognostic or predictive biomarkers in breast cancer | |
| EP2788535A1 (en) | Predicting prognosis in classic hodgkin lymphoma | |
| CN113234829B (en) | Colon cancer prognosis evaluation gene set and construction method thereof | |
| CN116230081A (en) | A biomarker, application and model building method for prognosis prediction of lung adenocarcinoma | |
| JP2016073287A (en) | Methods for identifying tumor characteristics and marker sets, tumor classifications, and cancer marker sets | |
| Lim et al. | A genomic‐augmented multivariate prognostic model for the survival of natural‐killer/T‐cell lymphoma patients from an international cohort | |
| CN111778336B (en) | Gene marker combination and application for comprehensive quantitative assessment of tumor microenvironment | |
| CN115161398A (en) | Marker combination for colon cancer diagnosis or prognosis evaluation | |
| CN115992229A (en) | lncRNA marker and model for pancreatic cancer prognosis risk assessment and application thereof | |
| CN117766024B (en) | Ovarian cancer CD8+T cell related prognosis evaluation method, system and application thereof | |
| CN118222713A (en) | Application of biomarker in detection of brain glioma-related TLS | |
| CN117925835A (en) | Colorectal cancer liver metastasis marker model and application thereof in prognosis and immunotherapy response prediction | |
| CA3246341A1 (en) | Methods for diagnosing myocardial infarction | |
| Huang et al. | Construction and validation of a TAMRGs prognostic signature for gliomas by integrated analysis of scRNA and bulk RNA sequencing data | |
| CA2612492A1 (en) | Gene expression profiling for identification and monitoring of multiple sclerosis | |
| Zhou et al. | The Prognostic Value of m6A-related LncRNAs in Patients with HNSCC |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CP03 | Change of name, title or address | Address after: 201205 Shanghai Pudong New Area, No. 2218 Huanan Road, West Building, 18th Floor; Patentee after: Yihua Bo Ao (Shanghai) Intelligent Technology Co.,Ltd.; Country or region after: China; Address before: Jiangsu Province Suzhou City Industrial Park Xinghu Street 218 Bio-Nano Park South Building A1 Room 302-5; Patentee before: SUZHOU BANGKAI GENE TECHNOLOGY CO.,LTD.; Country or region before: China |