
CN118703626A - A method and kit for multiplex PCR targeted methylation sequencing - Google Patents

Info

Publication number
CN118703626A
Authority
CN
China
Prior art keywords
colorectal cancer
gene
seq
chromosome coordinates
methylation
Prior art date
Legal status
Pending
Application number
CN202410724631.1A
Other languages
Chinese (zh)
Inventor
张道允
巩子英
虞洪杰
孙永华
Current Assignee
Jiaxing Yunying Medical Inspection Co ltd
Original Assignee
Jiaxing Yunying Medical Inspection Co ltd
Priority date
Filing date
Publication date
Application filed by Jiaxing Yunying Medical Inspection Co ltd
Priority to CN202410724631.1A
Publication of CN118703626A
Current legal status: Pending

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/106 Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
    • C12Q2600/136 Screening for pharmacological compounds
    • C12Q2600/154 Methylation markers
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Zoology (AREA)
  • Molecular Biology (AREA)
  • Wood Science & Technology (AREA)
  • Pathology (AREA)
  • Immunology (AREA)
  • Hospice & Palliative Care (AREA)
  • Theoretical Computer Science (AREA)
  • Oncology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Microbiology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The embodiments of this specification provide a method and kit for multiplex PCR targeted methylation sequencing. The kit includes a DNA methylation site combination for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs. The embodiments also provide the DNA methylation site combination itself, which has good sensitivity and specificity, can effectively detect or screen for colorectal cancer patients, and shows significantly different methylation levels between known colorectal cancer patients and individuals without colorectal cancer. It can serve as a marker for early colorectal cancer screening, risk prediction, and related applications, and can also be used to design diagnostic reagents or kits. The embodiments further provide a device for colorectal cancer screening or colorectal cancer risk prediction.

Description

A method and kit for multiplex PCR targeted methylation sequencing

Technical Field

This specification relates to the field of biotechnology, and in particular to a method and kit for multiplex PCR targeted methylation sequencing.

Background Art

Colorectal cancer is one of the five major cancers in China, and among digestive tract tumors its incidence is second only to that of gastric cancer. Deaths from colorectal cancer account for 7.8% of all deaths from malignant tumors in China, and both its incidence and its mortality rank in the top five. In China, colorectal cancer occurs mostly in men of middle age and older: the peak incidence is between 40 and 60 years of age, and the average age at onset is 48.3 years. Much work remains to be done in the prevention and treatment of colorectal cancer.

DNA methylation is an important mechanism driving tumor development, and disruption of DNA methylation to varying degrees is observed across different tumor types. At the mechanistic level, genome-wide methylation reveals gene-regulatory information across different biological processes. Body-fluid samples carry tumor-specific DNA methylation signals and can therefore serve as a sample source for identifying colorectal cancer biomarkers. Targeted bisulfite sequencing allows deep NGS of selected genomic regions: besides showing whether a target site is methylated, it yields a statistically meaningful sequencing depth and the corresponding methylation rate for each site.
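Bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines remain C. A site's methylation rate (β value) is therefore commonly estimated as the fraction of C reads among all C and T reads covering it. The sketch below illustrates that calculation under this assumption; the function names, the pseudocount `alpha`, and the M-value transform (log2 ratio of methylated to unmethylated counts) are general conventions assumed here, not procedures stated in this specification.

```python
import math


def methylation_rate(methylated: int, unmethylated: int) -> float:
    """Beta value: fraction of reads at a CpG that still read C after bisulfite."""
    depth = methylated + unmethylated
    if depth == 0:
        raise ValueError("no reads cover this site")
    return methylated / depth


def m_value(methylated: int, unmethylated: int, alpha: float = 1.0) -> float:
    """M value: log2 ratio of methylated to unmethylated counts, with a pseudocount."""
    return math.log2((methylated + alpha) / (unmethylated + alpha))


# Hypothetical read counts for one CpG site (not data from this specification).
c_reads, t_reads = 300, 700
print(methylation_rate(c_reads, t_reads))
print(round(m_value(c_reads, t_reads), 3))
```

The pseudocount keeps the M value finite when one of the counts is zero; a deeper site gives a more stable rate estimate, which is the practical benefit of deep targeted sequencing.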

Therefore, it is desirable to provide a method and kit for multiplex PCR targeted methylation sequencing that can screen for colorectal cancer accurately and rapidly.

Summary of the Invention

One or more embodiments of this specification provide the use of a DNA methylation site combination as a biomarker, or of a detection reagent for the DNA methylation site combination, in the preparation of a kit for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs. The DNA methylation site combination includes one or more of the following: site SOCS-1_24, located on the SOCS-1 gene at chromosome coordinate chr16:11348913; and site cg24403845_103, located on the cg24403845 locus at chromosome coordinate chr10:108924288.

In some embodiments, the DNA methylation site combination further includes one or more of the following sites: HOXD3_26, located on the HOXD3 gene at chr2:177027898; SDC2-2_46, located on the SDC2-2 gene at chr8:97505775; SFRP1_48, located on the SFRP1 gene at chr8:41167015; SP9_25, located on the SP9 gene at chr2:175199694; TAC1-2_135, located on the TAC1-2 gene at chr7:97361597; and TBR1_161, located on the TBR1 gene at chr2:162283730.

In some embodiments, the DNA methylation site combination includes the combination consisting of SOCS-1_24, cg24403845_103, HOXD3_26, SDC2-2_46, SFRP1_48, SP9_25, TAC1-2_135, and TBR1_161.

In some embodiments, the detection reagent includes a set of primer pairs for amplifying the DNA methylation site combination, wherein the primer pair for amplifying SOCS-1_24 is shown in SEQ ID NO: 1 and SEQ ID NO: 2; for cg24403845_103, in SEQ ID NO: 3 and SEQ ID NO: 4; for HOXD3_26, in SEQ ID NO: 5 and SEQ ID NO: 6; for SDC2-2_46, in SEQ ID NO: 7 and SEQ ID NO: 8; for SFRP1_48, in SEQ ID NO: 9 and SEQ ID NO: 10; for SP9_25, in SEQ ID NO: 11 and SEQ ID NO: 12; for TAC1-2_135, in SEQ ID NO: 13 and SEQ ID NO: 14; and for TBR1_161, in SEQ ID NO: 15 and SEQ ID NO: 16.
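The eight-site panel and its primer pairs can be held in a small lookup table, for example to generate target intervals for a downstream read-counting step. This is a hypothetical sketch: the names `PANEL` and `to_bed` are illustrative, the chromosome coordinates are assumed to be 1-based (the genome build is not stated here), and each site is converted to a one-base BED interval, whose start is 0-based and whose end is exclusive.

```python
# Site -> (chromosome, position, (SEQ ID NO, SEQ ID NO) of its primer pair).
# Coordinates and SEQ ID numbers are from this specification; treating the
# positions as 1-based is an assumption of this sketch.
PANEL = {
    "SOCS-1_24": ("chr16", 11348913, (1, 2)),
    "cg24403845_103": ("chr10", 108924288, (3, 4)),
    "HOXD3_26": ("chr2", 177027898, (5, 6)),
    "SDC2-2_46": ("chr8", 97505775, (7, 8)),
    "SFRP1_48": ("chr8", 41167015, (9, 10)),
    "SP9_25": ("chr2", 175199694, (11, 12)),
    "TAC1-2_135": ("chr7", 97361597, (13, 14)),
    "TBR1_161": ("chr2", 162283730, (15, 16)),
}


def to_bed(panel):
    """Return sorted BED-style (chrom, start, end, name) tuples, one base per site."""
    records = []
    for name, (chrom, pos, _primer_pair) in panel.items():
        records.append((chrom, pos - 1, pos, name))  # BED start is 0-based
    return sorted(records)


for rec in to_bed(PANEL):
    print("\t".join(map(str, rec)))
```

Keeping coordinates and primer identifiers in one table makes it easy to check that every site has exactly one primer pair and that no SEQ ID is reused.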

One or more embodiments of this specification provide a device for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs. The device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the following method:

Obtaining the methylation levels of a DNA methylation site combination in a biological sample from a subject, wherein the DNA methylation site combination includes site SOCS-1_24, located on the SOCS-1 gene at chromosome coordinate chr16:11348913, and/or site cg24403845_103, located on the cg24403845 locus at chromosome coordinate chr10:108924288; and

Based on the methylation levels of the DNA methylation site combination, using a prediction model to assess whether the subject has colorectal cancer, predict the subject's risk of developing colorectal cancer, evaluate the efficacy of the subject's colorectal cancer treatment, and/or evaluate the efficacy of a colorectal cancer therapeutic drug.

In some embodiments, the DNA methylation site combination further includes one or more of the following sites: HOXD3_26, located on the HOXD3 gene at chr2:177027898; SDC2-2_46, located on the SDC2-2 gene at chr8:97505775; SFRP1_48, located on the SFRP1 gene at chr8:41167015; SP9_25, located on the SP9 gene at chr2:175199694; TAC1-2_135, located on the TAC1-2 gene at chr7:97361597; and TBR1_161, located on the TBR1 gene at chr2:162283730.

One or more embodiments of this specification provide a prediction model that uses methylation sites for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs. The methylation sites include site SOCS-1_24, located on the SOCS-1 gene at chromosome coordinate chr16:11348913, and site cg24403845_103, located on the cg24403845 locus at chromosome coordinate chr10:108924288. The prediction model is given by:

Risk score = -4.38 + 5.16*cg24403845_103 + 1.18*SOCS-1_24

where each site name stands for the methylation level of that site, and Risk score expresses the subject's risk of having or developing colorectal cancer, or the subject's colorectal cancer risk level after treatment and/or drug administration.
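As a worked illustration of the two-site formula, assume hypothetical methylation levels expressed as fractions (β values) of 0.5 at both sites, written here as x. The specification does not state the scale of the methylation levels, so these inputs are assumptions, not data from the study.

```latex
\begin{aligned}
\text{Risk score} &= -4.38 + 5.16\,x_{\text{cg24403845\_103}} + 1.18\,x_{\text{SOCS-1\_24}} \\
                  &= -4.38 + 5.16(0.5) + 1.18(0.5) \\
                  &= -4.38 + 2.58 + 0.59 = -1.21
\end{aligned}
```

Because both coefficients are positive, higher methylation at either site raises the score, with cg24403845_103 contributing about four times as much per unit of methylation as SOCS-1_24. The cutoff separating positive from negative calls is not given in this section.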

In some embodiments, the methylation sites further include: HOXD3_26, located on the HOXD3 gene at chr2:177027898; SDC2-2_46, located on the SDC2-2 gene at chr8:97505775; SFRP1_48, located on the SFRP1 gene at chr8:41167015; SP9_25, located on the SP9 gene at chr2:175199694; TAC1-2_135, located on the TAC1-2 gene at chr7:97361597; and TBR1_161, located on the TBR1 gene at chr2:162283730. In this case, the prediction model may instead be given by:

Risk score = -7.11 + 2.89*cg24403845_103 + 3.61*SOCS-1_24 + 1.05*TAC1-2_135 + 0.92*SDC2-2_46 + 0.77*SFRP1_48 + 0.72*TBR1_161 + 0.67*SP9_25 + 0.19*HOXD3_26

where each site name stands for the methylation level of that site, and Risk score expresses the subject's risk of having or developing colorectal cancer, or the subject's colorectal cancer risk level after treatment and/or drug administration.
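Both formulas are linear in the site methylation levels, so a single helper can evaluate either one. The sketch below copies the intercepts and coefficients from this specification; the function name `risk_score`, the dictionary input format, and the example β values of 0.5 are hypothetical. No decision threshold is stated here, so the helper returns the raw score only.

```python
# (intercept, {site: coefficient}) as stated in this specification.
TWO_SITE_MODEL = (-4.38, {"cg24403845_103": 5.16, "SOCS-1_24": 1.18})
EIGHT_SITE_MODEL = (-7.11, {
    "cg24403845_103": 2.89, "SOCS-1_24": 3.61, "TAC1-2_135": 1.05,
    "SDC2-2_46": 0.92, "SFRP1_48": 0.77, "TBR1_161": 0.72,
    "SP9_25": 0.67, "HOXD3_26": 0.19,
})


def risk_score(model, levels):
    """Linear risk score: intercept + sum(coefficient * methylation level)."""
    intercept, weights = model
    missing = set(weights) - set(levels)
    if missing:
        raise KeyError(f"missing methylation levels for: {sorted(missing)}")
    return intercept + sum(w * levels[site] for site, w in weights.items())


# Hypothetical methylation levels (beta values); not measurements from the study.
sample = {site: 0.5 for site in EIGHT_SITE_MODEL[1]}
print(round(risk_score(TWO_SITE_MODEL, sample), 2))
print(round(risk_score(EIGHT_SITE_MODEL, sample), 2))
```

Raising an error for a missing site, rather than silently treating it as zero, avoids producing an artificially low score when an amplicon fails.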

One or more embodiments of this specification provide a methylation site combination for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs. The DNA methylation site combination includes site SOCS-1_24, located on the SOCS-1 gene at chromosome coordinate chr16:11348913, and/or site cg24403845_103, located on the cg24403845 locus at chromosome coordinate chr10:108924288.

In some embodiments, the DNA methylation site combination further includes one or more of the following sites: HOXD3_26, located on the HOXD3 gene at chr2:177027898; SDC2-2_46, located on the SDC2-2 gene at chr8:97505775; SFRP1_48, located on the SFRP1 gene at chr8:41167015; SP9_25, located on the SP9 gene at chr2:175199694; TAC1-2_135, located on the TAC1-2 gene at chr7:97361597; and TBR1_161, located on the TBR1 gene at chr2:162283730.

One embodiment of this specification provides a detection reagent for the DNA methylation site combination described above. The detection reagent includes a set of primer pairs for amplifying the combination, wherein the primer pair for amplifying SOCS-1_24 is shown in SEQ ID NO: 1 and SEQ ID NO: 2, and the primer pair for amplifying cg24403845_103 is shown in SEQ ID NO: 3 and SEQ ID NO: 4.

One or more embodiments of this specification provide a kit for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs, wherein the kit comprises the detection reagent described above.

BRIEF DESCRIPTION OF THE DRAWINGS

This specification is further described by way of exemplary embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are not limiting; in these embodiments, the same reference numeral denotes the same structure.

FIG. 1 is an application scenario diagram of a system for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs according to some embodiments of this specification;

FIG. 2 is a schematic diagram of the architecture of a computing device according to some embodiments of this specification;

FIG. 3 is a module diagram of a system for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs according to some embodiments of this specification;

FIG. 4 is a flow diagram of a process for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs according to some embodiments of this specification;

FIG. 5A and FIG. 5B show the correlations between the methylation rates of different DNA methylation sites in normal samples and in colorectal cancer samples according to some embodiments of this specification;

FIG. 6 is an analysis flowchart for building the prediction model according to some embodiments of this specification;

FIG. 7A and FIG. 7B show the distributions of the β values and M values of the methylation sites in the prediction model in normal samples and in colorectal cancer samples according to some embodiments of this specification;

FIG. 8A and FIG. 8B are, respectively, the ROC curve for risk prediction on the training sample set using the prediction model based on two methylation sites, and a waterfall plot of the predicted risk scores for the training sample set, according to some embodiments of this specification;

FIG. 9A and FIG. 9B are, respectively, the ROC curve for risk prediction on the training sample set using the prediction model based on eight methylation sites, and a waterfall plot of the predicted risk scores for the training sample set, according to some embodiments of this specification;

FIG. 10A and FIG. 10B are, respectively, a waterfall plot of the risk scores predicted for the validation sample set using the prediction model based on two methylation sites, and the ROC curve of those predicted risk scores, according to some embodiments of this specification;

FIG. 11A and FIG. 11B are, respectively, a waterfall plot of the risk scores predicted for the validation sample set using the prediction model based on eight methylation sites, and the ROC curve of those predicted risk scores, according to some embodiments of this specification.

DETAILED DESCRIPTION

To illustrate the technical solutions of the embodiments of this specification more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below are merely some examples or embodiments of this specification; a person of ordinary skill in the art can, without creative effort, apply this specification to other similar scenarios based on these drawings. Unless obvious from the context or otherwise stated, the same reference numeral in the drawings denotes the same structure or operation.

It should be understood that the terms "system", "device", "unit", and/or "module" as used herein are a way of distinguishing different components, elements, parts, portions, or assemblies at different levels. These terms may be replaced by other expressions that serve the same purpose.

As used in this specification and the claims, unless the context clearly indicates otherwise, the words "a", "an", "one", and/or "the" do not refer only to the singular and may also include the plural. In general, the terms "comprise" and "include" indicate only that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or device may also include other steps or elements.

Flowcharts are used in this specification to illustrate the operations performed by systems according to its embodiments. It should be understood that the operations are not necessarily performed exactly in the order shown; steps may instead be performed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more operations may be removed from them.

DNA methylation is a form of chemical modification of DNA in which DNA methyltransferases (DNMTs) covalently attach a methyl group (CH3) to the fifth carbon atom of the cytosine in a CpG dinucleotide. It occurs frequently in the CpG island regions of gene promoters and is an important epigenetic mark. Existing studies have shown that aberrant DNA methylation is an important factor in the development of many types of cancer. For example, hypermethylation of the promoter regions of some tumor-related genes suppresses the expression of those genes, whereas hypomethylation promotes it. In this specification, 105 specific methylation sites were analyzed by high-throughput detection in samples from patients with early colorectal cancer and from healthy individuals, and a specific algorithm was used to efficiently identify a combination of methylation sites that accurately distinguishes early colorectal cancer patients from healthy individuals. A colorectal cancer prediction model was built from 154 training samples from colorectal cancer patients and healthy individuals, and the analysis of 84 validation samples indicates that the model enables accurate, rapid, and non-invasive clinical screening for colorectal cancer.

In addition, the DNA methylation sites can be used as colorectal cancer markers for colorectal cancer screening, risk prediction, evaluation of colorectal cancer treatment efficacy, and screening of colorectal cancer therapeutic drugs. Samples for testing this DNA methylation site combination may be derived from a wide range of the subject's body fluids, cells, tissues, and organs, in particular the subject's colorectal lavage fluid, enabling accurate, rapid, and non-invasive colorectal cancer screening, risk prediction, prognosis prediction, and drug evaluation.
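Performance of a model built on training samples and checked on validation samples is usually summarized by the area under the ROC curve (AUC). The sketch below computes AUC from predicted risk scores using the rank-based (Mann-Whitney) formulation, with ties counted as one half. This is a generic evaluation method assumed for illustration; the specification does not state how its ROC results were computed, and the scores and labels below are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC = probability that a random case scores higher than a random control.

    labels: 1 for colorectal cancer, 0 for normal. Tied pairs contribute 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical validation-set risk scores and labels (not study data).
scores = [-2.1, -1.4, -0.3, 0.2, 0.9, 1.7]
labels = [0, 0, 1, 0, 1, 1]
print(roc_auc(scores, labels))
```

The pairwise loop is quadratic in sample size, which is negligible for a validation set on the order of 84 samples; larger cohorts would normally use a sort-based rank computation instead.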

This specification provides a system and device for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs. The system and device assess the likelihood that a subject has colorectal cancer, or the subject's risk of developing colorectal cancer, based on the methylation levels of the aforementioned DNA methylation site combination.

This specification also provides a detection reagent for the DNA methylation site combination, comprising a set of primer pairs for amplifying the aforementioned combination. The reagent can be widely used in applications including colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs.

This specification also provides a kit for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs.

This specification also provides uses of the DNA methylation site combination as a biomarker and uses of detection reagents for the DNA methylation site combination. These uses include, but are not limited to, use in preparing kits for colorectal cancer screening, for colorectal cancer risk prediction, for evaluating colorectal cancer treatment efficacy, and for screening colorectal cancer therapeutic drugs, and they can improve both the sensitivity and the specificity of screening, prediction, and drug selection.

According to one aspect of this specification, a system for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs is provided. FIG. 1 is an application scenario diagram of such a system according to some embodiments of this specification. As shown in FIG. 1, scenario 100 may include a processing device 110 and a storage device 120.

处理设备110可以处理数据和/或信息。在一些实施例中,处理设备110可以从存储设备120或场景100的其他组件(例如,用户终端140、检测设备160)处获得数据和/或信息,并基于这些信息和/或数据执行程序指令,以执行一个或多个本说明书中描述的功能。例如,处理设备110可以从存储设备120处获取训练样本集,并基于训练样本集构建预测模型。又例如,处理设备110可以获取检测设备160测得的受试者生物样本150的DNA甲基化位点组合的甲基化水平相关信息,并调用存储在存储设备120处的预测模型处理该甲基化水平相关信息,以评估受试者患有结直肠癌的可能性或发展结直肠癌的风险。在一些实施例中,处理设备110可以为服务器或中央处理器。The processing device 110 can process data and/or information. In some embodiments, the processing device 110 can obtain data and/or information from the storage device 120 or other components of the scene 100 (e.g., the user terminal 140, the detection device 160), and execute program instructions based on this information and/or data to perform one or more functions described in this specification. For example, the processing device 110 can obtain a training sample set from the storage device 120 and build a prediction model based on the training sample set. For another example, the processing device 110 can obtain methylation level related information of the DNA methylation site combination of the subject's biological sample 150 measured by the detection device 160, and call the prediction model stored in the storage device 120 to process the methylation level related information to assess the possibility of the subject suffering from colorectal cancer or the risk of developing colorectal cancer. In some embodiments, the processing device 110 can be a server or a central processing unit.

The storage device 120 may store data and/or information. In some embodiments, the storage device 120 may store data and/or information obtained from the processing device 110 or from other components of scenario 100 (e.g., the user terminal 140 or the detection device 160). For example, the storage device 120 may store the prediction model for use by the processing device 110. As another example, the storage device 120 may receive from the detection device 160, and store, information on the methylation levels of the DNA methylation site combination in a subject's biological sample 150. As a further example, the storage device 120 may receive and store information uploaded from the user terminal 140, such as the subject's identity information.

In some embodiments, scenario 100 further includes a network 130, which provides a channel for exchanging information. In some embodiments, the processing device 110 and other components of scenario 100 (e.g., the storage device 120, the user terminal 140, or the detection device 160) may exchange information through the network 130. For example, the processing device 110 may receive data from the storage device 120 through the network 130. As another example, methylation-level information for the DNA methylation site combination in a subject's biological sample 150, measured by the detection device 160, may be transmitted to the processing device 110 through the network 130. In some embodiments, the network 130 may be any one or more wired or wireless networks, for example a cable network or an optical fiber network. In some embodiments, the network 130 may have a point-to-point, shared, centralized, or other topology, or a combination of topologies. In some embodiments, the network 130 may include one or more network access points, such as base stations and/or network exchange points, through which one or more components of scenario 100 connect to the network 130 to exchange data and/or information.

In some embodiments, scenario 100 further includes a user terminal 140, through which scenario 100 provides its services to users. For example, a user may send methylation-level information for the DNA methylation site combination in a subject's biological sample to the processing device 110 through the user terminal 140. As another example, a user may receive the subject's assessment result from the processing device 110 through the user terminal 140. As a further example, a user may send the subject's clinical test results to the processing device 110 through the user terminal 140, so that the processing device 110 can update the training sample set with those results and iterate (retrain) the prediction model. In some embodiments, the user terminal 140 may include a smartphone 140-1, a tablet computer 140-2, a laptop computer 140-3, or any other device with input and/or output functions, or any combination thereof.

In some embodiments, scenario 100 further includes a detection device 160 for measuring the methylation levels of the DNA methylation site combination in a subject's biological sample 150. As examples, the detection device may include an apparatus implementing one or more of the following methods: WGBS, RRBS, oxBS-seq, MethylCap-seq, MBD-seq, MeDIP-seq, HPLC, MSRF, MASP, methylation microarray (chip) analysis, pyrosequencing, dPCR, and MS-PCR.

According to another aspect of this specification, a computing device is provided. FIG. 2 is a schematic diagram of the architecture of a computing device according to some embodiments of this specification. As shown in FIG. 2, the computing device 200 includes a processor 210, a memory 220, an input/output interface 230, and a communication port 240. In some embodiments, the computing device 200 may implement the processing device 110 and/or the storage device 120. For example, the processing device 110 may be implemented on the computing device 200, which is then configured to perform the functions of the processing device 110 described in this specification. In some embodiments, an apparatus for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs may be implemented in the computing device 200.

The processor 210 may execute computing instructions (program code) to perform the functions of the processing device 110 described in this specification. The computing instructions may include programs, objects, components, data structures, procedures, modules, and functions (here, the specific functions described in this application). For example, the processor 210 may process a user's request for colorectal cancer screening or colorectal cancer risk prediction. Specifically, the processor 210 may obtain the methylation levels of a DNA methylation site combination (e.g., one or more of SOCS-1_24, cg24403845_103, HOXD3_26, SDC2-2_46, SFRP1_48, SP9_25, TAC1-2_135, and TBR1_161) in a subject's biological sample and, based on those methylation levels, use a prediction model to detect whether the subject has colorectal cancer or to predict the subject's risk of colorectal cancer. In some embodiments, the computing device 200 may include one or more processors 210; a processor 210 may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), any other circuit or processor capable of performing one or more functions, or any combination thereof.

The memory 220 may store data/information obtained from any component of scenario 100. In some embodiments, the memory 220 may include random access memory (RAM), read-only memory (ROM), or the like, or any combination thereof.

The input/output interface 230 may be used to input or output signals, data, or information. In some embodiments, the input/output interface 230 enables interaction between a user (e.g., a subject or an operator) and the processor 210. In some embodiments, the user may enter information about the subject through the input/output interface 230 (e.g., methylation-level information for the DNA methylation site combination, together with basic identity information such as name and age). In some embodiments, the input/output interface 230 may include input and output devices, such as a keyboard, a mouse, a display, a microphone, and a speaker.

The communication port 240 may be connected to the network 130 for data communication. The connection may be wired, wireless, or a combination of both, for example via an electrical cable, an optical cable, a mobile network, Wi-Fi, WLAN, or Bluetooth. In some embodiments, the communication port 240 may be a standardized port, such as RS232 or RS485. In some embodiments, the communication port 240 may be a specially designed port.

FIG. 3 is a block diagram of a system for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs according to some embodiments of this specification. As shown in FIG. 3, the system 300 includes an acquisition module 310 and an analysis module 320. In some embodiments, these modules may be included in the processing device 110 or the processor 210.

The acquisition module 310 may be used to obtain the methylation levels of a DNA methylation site combination in a subject's biological sample. For example, the DNA methylation site combination may include one or more of the sites SOCS-1_24, cg24403845_103, HOXD3_26, SDC2-2_46, SFRP1_48, SP9_25, TAC1-2_135, and TBR1_161.

In some embodiments, the acquisition module 310 may include a detection unit and an information processing unit. The detection unit may be used to perform DNA methylation detection on the subject's biological sample. For example, the detection unit may include an apparatus implementing one or more of the following methods: WGBS, RRBS, oxBS-seq, MethylCap-seq, MBD-seq, MeDIP-seq, HPLC, MSRF, MASP, methylation microarray (chip) analysis, pyrosequencing, dPCR, and MS-PCR. The information processing unit may be used to process the data produced by the detection unit to obtain methylation-level information for the DNA methylation site combination in the subject's biological sample.

The analysis module 320 may be used to apply a prediction model to the methylation levels of the DNA methylation site combination in the subject's biological sample in order to assess whether the subject may have colorectal cancer or is at risk of developing it, to evaluate the efficacy of colorectal cancer treatment, and/or to screen colorectal cancer therapeutic drugs. In some embodiments, the analysis module 320 may perform the assessment using a model based on methylation thresholds for the DNA methylation site combination. In some embodiments, the analysis module 320 may perform the assessment using a model built with a machine learning or deep learning algorithm.
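One simple form a threshold-based model might take is sketched below, under stated assumptions: each site gets its own M-value cutoff, and a sample is called positive when at least a chosen number of sites exceed their cutoffs. The function name `threshold_call`, the cutoff values, and the "at least k sites" rule are all hypothetical illustrations, not the model disclosed in this specification.

```python
from typing import Dict


def threshold_call(m_values: Dict[str, float],
                   cutoffs: Dict[str, float],
                   min_positive_sites: int = 1) -> bool:
    """Hypothetical rule: positive if at least `min_positive_sites` sites
    have an M value strictly above their per-site cutoff."""
    positives = sum(
        1 for site, cutoff in cutoffs.items()
        if site in m_values and m_values[site] > cutoff
    )
    return positives >= min_positive_sites


# Hypothetical per-site cutoffs; these are NOT values from the specification.
cutoffs = {"SDC2-2_46": 0.0, "SFRP1_48": -1.0, "TBR1_161": 0.5}
sample = {"SDC2-2_46": 1.2, "SFRP1_48": -2.0, "TBR1_161": 0.9}
print(threshold_call(sample, cutoffs, min_positive_sites=2))  # True
```

Here two of the three sites exceed their cutoffs, so the sample is called positive when two sites are required but would not be if all three were required; in practice the cutoffs would come from training data, for example via ROC analysis.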

In some embodiments, the system 300 further includes a determination module 330. The determination module 330 may be used to obtain a training sample set containing the methylation levels of DNA methylation sites in known colorectal cancer patients and in individuals without colorectal cancer (healthy individuals), and to analyze the training sample set with ROC curves to determine a prediction model that distinguishes colorectal cancer patients from healthy individuals.
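A common way to turn ROC analysis of a training set into a concrete decision rule is to pick the cutoff that maximizes Youden's J statistic (sensitivity + specificity - 1). The sketch below assumes that criterion and a single score per sample (for example an M value or a model output); the specification does not state which criterion the determination module 330 uses, so `youden_cutoff` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def youden_cutoff(scores: List[float],
                  labels: List[int]) -> Tuple[Optional[float], float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A sample is called positive when score >= cutoff.
    labels: 1 = colorectal cancer case, 0 = healthy control.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy training scores, illustrative only.
scores = [-2.1, -1.5, -0.3, 0.4, 0.8, 1.9]
labels = [0, 0, 0, 1, 1, 1]
print(youden_cutoff(scores, labels))  # (0.4, 1.0)
```

Other criteria, such as fixing specificity at a target level and maximizing sensitivity, are equally valid for screening applications.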

The term "ROC curve" (receiver operating characteristic curve) refers to a curve plotted with sensitivity (the true positive rate) on the vertical axis and 1 - specificity (the false positive rate) on the horizontal axis. ROC curves can be used to evaluate model performance, commonly summarized by the area under the curve (AUC).
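The definition above can be made concrete with a short sketch: sweep the decision cutoff from high to low, record (false positive rate, true positive rate) at each step, and integrate with the trapezoidal rule to get the AUC. The function names are illustrative; libraries such as scikit-learn provide equivalent routines.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (FPR, TPR)


def roc_points(scores: List[float], labels: List[int]) -> List[Point]:
    """ROC points from (0, 0) to (1, 1); positive when score >= cutoff."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy predicted scores; 1 = case, 0 = control.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(round(auc(pts), 3))  # 0.889
```

The result equals 8/9: of the nine case/control pairs, eight rank the case above the control, which is the probabilistic interpretation of AUC.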

More information on how each module of the system 300 implements its functions can be found elsewhere in this specification (e.g., FIG. 4 and its description).

It should be understood that the system 300 shown in FIG. 3 and its modules may be implemented in various ways. For example, in some embodiments, the system 300 and its modules may be implemented in hardware, software, or a combination of software and hardware. The hardware part may be implemented with dedicated logic; the software part may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will understand that the methods and systems described above may be implemented with computer-executable instructions and/or processor control code, provided, for example, on a carrier medium such as a magnetic disk, CD, or DVD-ROM, in programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The system and modules of this specification may be implemented by hardware circuits, such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of such hardware circuits and software (e.g., firmware).

It should be noted that the above description of the system 300 and its modules is provided only for convenience and does not limit this specification to the scope of the illustrated embodiments. Those skilled in the art, once they understand the principle of the system, may combine the modules arbitrarily or form subsystems connected to other modules without departing from that principle. In some embodiments, the acquisition module, the analysis module, and the determination (training) module of FIG. 3 may be separate modules of one system, or a single module may implement the functions of two or more of these modules. For example, the modules may share one storage module, or each module may have its own storage module. All such variations fall within the scope of protection of this specification.

According to another aspect of this specification, a method for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs is provided. FIG. 4 is a flowchart of this method according to some embodiments of this specification. As shown in FIG. 4, process 400 includes step 401 and step 403. In some embodiments, at least some steps of process 400 (e.g., step 401 and step 403) may be performed by a computing device (e.g., the computing device 200 shown in FIG. 2 or the processing device 110 shown in FIG. 1). For example, at least some steps of process 400 may be implemented as instructions (e.g., an application program) stored in the storage device 120 or the memory 220. The processing device 110 of FIG. 1, the processor 210 of FIG. 2, and/or the modules may execute these instructions and, when doing so, be configured to perform process 400. The operations of the process shown below are for illustration only. In some embodiments, process 400 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. In addition, the order of the operations shown in FIG. 4 and described below is not intended to be limiting.
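The two-step flow of process 400 can be sketched as follows, under explicit assumptions: step 401 is represented by a pluggable `acquire` callable that returns per-site M values, and step 403 is assumed to score those values with a logistic prediction model. `run_process_400`, the weights, and the intercept are hypothetical; the specification does not disclose model coefficients here.

```python
import math
from typing import Callable, Dict

Levels = Dict[str, float]  # site name -> M value


def run_process_400(sample_id: str,
                    acquire: Callable[[str], Levels],
                    weights: Dict[str, float],
                    intercept: float) -> float:
    """Hypothetical sketch mirroring steps 401 and 403 of process 400."""
    # Step 401: obtain methylation levels for the site combination.
    levels = acquire(sample_id)
    missing = [site for site in weights if site not in levels]
    if missing:
        raise KeyError(f"missing methylation levels for: {missing}")
    # Step 403 (assumed form): logistic model over the site M values.
    z = intercept + sum(w * levels[site] for site, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative values only; the weights are not from this specification.
stored = {"S001": {"SOCS-1_24": 0.5, "SFRP1_48": 1.0}}
risk = run_process_400("S001", stored.__getitem__,
                       weights={"SOCS-1_24": 2.0, "SFRP1_48": 1.0},
                       intercept=-2.0)
print(risk)  # 0.5
```

Keeping acquisition behind a callable lets the same scoring code run whether the levels come from storage, a detection device, or user input.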

Step 401: obtain the methylation levels of the DNA methylation site combination in the subject's biological sample. In some embodiments, step 401 may be performed by a computing device (e.g., the processing device 110 of FIG. 1 or the acquisition module 310 of FIG. 3).

In some embodiments, the methylation levels of the DNA methylation site combination in biological samples from subjects with colorectal cancer are distinguishable from those in biological samples from subjects without colorectal cancer (normal subjects).

As used herein, the term "subject" (or "individual") refers to an organism that is observed, tested, or experimented on. In some embodiments, the subject may be a mammal. Mammals include, but are not limited to, primates (including humans and non-human primates) and rodents (e.g., mice and rats). In some embodiments, the mammal is a human.

The term "biological sample" (or "sample") refers to material isolated from an organ, tissue, cell, and/or body fluid of a subject that contains one or more target analytes (e.g., nucleic acids or metabolites). In some embodiments, the biological sample is derived from a body fluid of the subject. Body fluids include, but are not limited to, whole blood, plasma, serum, tissue fluid, saliva, urine, and lavage fluid (e.g., colorectal lavage fluid), or combinations thereof. In some embodiments, the biological sample is derived from the subject's colorectal lavage fluid, in particular its formed (particulate) components, which may include one or more of circulating cell-free nucleic acids (e.g., circulating cell-free DNA (cfDNA) derived from the colorectum), circulating tumor cells (CTCs) (e.g., tumor cells released by colorectal tumors), and exfoliated cells (e.g., cells shed from colorectal tissue).

The term "methylation level" refers to a measure of the methylation state of a DNA methylation site. In some embodiments, the methylation level may be quantified by the M value of the site.

In some embodiments, the M value of a DNA methylation site may be obtained from its β value by a logarithmic transformation. The β value and the M value of a DNA methylation site can be calculated by formula (1) and formula (2):

$$\beta = \frac{\text{Methylated}}{\text{Methylated} + \text{Unmethylated} + \text{offset}} \tag{1}$$

$$M = \log_2\left(\frac{\text{Methylated} + \text{offset}}{\text{Unmethylated} + \text{offset}}\right) \tag{2}$$

where Methylated and Unmethylated are the numbers of methylated and unmethylated sequences (reads) at the site being tested, and offset is a small constant that stabilizes the ratio when signals are low and prevents the denominator from being zero.
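A minimal sketch of computing both measures from read counts, assuming the widely used forms β = Methylated / (Methylated + Unmethylated + offset) and M = log2((Methylated + offset) / (Unmethylated + offset)). The default offset of 1 is an assumption for illustration; the specification describes offset only as a small constant.

```python
import math


def beta_value(methylated: int, unmethylated: int, offset: float = 1.0) -> float:
    """Fraction of methylated reads; offset keeps the denominator non-zero."""
    return methylated / (methylated + unmethylated + offset)


def m_value(methylated: int, unmethylated: int, offset: float = 1.0) -> float:
    """Offset-stabilized log2 ratio of methylated to unmethylated reads."""
    return math.log2((methylated + offset) / (unmethylated + offset))


print(beta_value(0, 0))    # 0.0 (no division by zero at zero coverage)
print(beta_value(90, 9))   # 0.9
print(m_value(31, 7))      # 2.0, since log2(32 / 8) = 2
```

With the offset in place, a site with no coverage yields β = 0 rather than an error, which is the stabilizing role the text describes.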

The β value reflects the proportion of detected oligonucleotides that match the methylated form of a given sequence; that is, it is the methylation rate at the detected site, ranging from 0 (completely unmethylated) to 1 (completely methylated). The β value has an intuitive biological meaning: generally, a β value ≥ 0.6 is regarded as fully methylated, a β value ≤ 0.2 as fully unmethylated, and values in between as partially methylated.
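The rule of thumb above translates directly into a small classifier. This sketch uses the 0.2 and 0.6 boundaries stated in the text as defaults and treats both boundaries as inclusive, as the "≥" and "≤" wording indicates; the function name is illustrative.

```python
def methylation_state(beta: float, low: float = 0.2, high: float = 0.6) -> str:
    """Classify a beta value using the inclusive cutoffs stated in the text."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if beta >= high:
        return "methylated"
    if beta <= low:
        return "unmethylated"
    return "partially methylated"


for b in (0.05, 0.2, 0.4, 0.6, 0.95):
    print(b, methylation_state(b))
```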

The M value is essentially a logarithmic transformation of the ratio of methylated to unmethylated signal intensity, which facilitates subsequent statistical analysis. Compared with β values, M values are distributed more closely to a normal distribution, which makes them particularly useful in differential methylation analysis and especially suitable for parametric tests (such as t-tests and ANOVA) and linear models. The variance of β values depends on the methylation level and shrinks markedly at the extremes, near 0 (hypomethylation) or near 1 (hypermethylation); this heteroscedasticity can reduce the sensitivity and specificity of statistical analyses. In contrast, the variance of M values is relatively stable across methylation levels, which makes M values more reliable for comparing methylation differences between samples or groups.
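Ignoring the offset, the two measures are related by M = log2(β / (1 - β)), a base-2 logit. The sketch below shows the conversion in both directions and why it counteracts the compression of β near 0 and 1: the same 0.01 step in β becomes a much larger step in M near the extremes than near 0.5.

```python
import math


def beta_to_m(beta: float) -> float:
    """Base-2 logit: M = log2(beta / (1 - beta)), valid for 0 < beta < 1."""
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse transform: beta = 2**M / (2**M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# The same 0.01 change in beta, near the extreme and near the midpoint.
step_low = beta_to_m(0.02) - beta_to_m(0.01)
step_mid = beta_to_m(0.51) - beta_to_m(0.50)
print(step_low, step_mid)
```

Here the step near the extreme is about 1.01 on the M scale while the step at the midpoint is about 0.06, which is the stretching that evens out the variance across methylation levels.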

In some embodiments, β values may be used for the preliminary screening of differentially methylated sites, and M values may be used to train a logistic regression model.
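To illustrate the second step, here is a minimal logistic regression fitted to per-site M values by batch gradient descent on the mean log-loss, written in pure Python so the mechanics are visible. It is a sketch only: the toy data are invented, the learning rate and epoch count are arbitrary assumptions, and a production model would more likely use an established library with regularization and cross-validation.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability of the case class for one sample; w = [bias, w1, ..., wk]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic(X: List[Sequence[float]], y: List[int],
                   lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit [bias, w1, ..., wk] by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/dz for one sample
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Toy M values at two sites (columns); 1 = case, 0 = control. Not real data.
X = [[-2.0, -1.5], [-1.0, -2.0], [-1.5, -0.5],
     [1.5, 2.0], [2.0, 1.0], [0.5, 1.5]]
y = [0, 0, 0, 1, 1, 1]
w = train_logistic(X, y)
print([round(predict_proba(w, x), 2) for x in X])
```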

In some embodiments, the DNA methylation site combination is suitable for detecting colorectal cancer at different stages, such as early stage (e.g., stage I or II) and late stage (e.g., stage III or IV). In some preferred embodiments, the DNA methylation site combination is suitable for distinguishing individuals with early-stage colorectal cancer from normal individuals.

The DNA methylation site combination includes one or more DNA methylation sites. As used herein, the term "DNA methylation site" (or "methylation site") refers to a cytosine in a CpG dinucleotide of genomic DNA at whose carbon-5 (C5) position a methyl group can be covalently attached, forming 5-methylcytosine (5mC). In some embodiments, the methylation state of each DNA methylation site in the combination may be associated with the occurrence and progression of colorectal cancer, and the sites may be located in colorectal cancer-related genes or loci (e.g., known or candidate colorectal cancer tumor suppressor genes). Non-limiting examples of colorectal cancer-related genes or loci include SOCS-1, cg24403845, HOXD3, SDC2-2, SFRP1, SP9, TAC1-2, and TBR1.
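Because a methylation site is the cytosine of a CpG dinucleotide, two basic operations recur in bisulfite-based methods such as WGBS and RRBS: locating CpG cytosines in a reference sequence, and predicting how a read will look after bisulfite conversion, where unmethylated C reads as T and 5mC remains C. The sketch below works on a single strand and assumes all non-CpG cytosines are unmethylated; the sequence and function names are illustrative.

```python
from typing import List, Set


def cpg_positions(seq: str) -> List[int]:
    """0-based positions of the C in each CpG dinucleotide on this strand."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """In silico bisulfite + PCR on one strand: unmethylated C -> T,
    cytosines whose positions are in `methylated` (5mC) stay C."""
    seq = seq.upper()
    return "".join(
        ("C" if i in methylated else "T") if base == "C" else base
        for i, base in enumerate(seq)
    )


ref = "ACGTTCGACCA"
print(cpg_positions(ref))            # [1, 5]
print(bisulfite_convert(ref, {1}))   # ACGTTTGATTA
```

Comparing the C/T call at each CpG position across reads is what yields the methylated and unmethylated counts used in the β and M values.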

In some embodiments, the DNA methylation site combination may include one or more DNA methylation sites located in SOCS-1, cg24403845, HOXD3, SDC2-2, SFRP1, SP9, TAC1-2, and/or TBR1.

The methylation level of each DNA methylation site in the combination is significantly associated with colorectal cancer; that is, for each site, its methylation level differs significantly between a known colorectal cancer population and a normal population.

In some embodiments, the DNA methylation site combination may include at least 1, 2, 3, 4, 5, 6, 7, or 8 of the following sites: SOCS-1_24, located in the SOCS-1 gene at chromosome coordinate chr16:11348913; cg24403845_103, located at the cg24403845 locus at chromosome coordinate chr10:108924288; HOXD3_26, located in the HOXD3 gene at chromosome coordinate chr2:177027898; SDC2-2_46, located in the SDC2-2 gene at chromosome coordinate chr8:97505775; SFRP1_48, located in the SFRP1 gene at chromosome coordinate chr8:41167015; SP9_25, located in the SP9 gene at chromosome coordinate chr2:175199694; TAC1-2_135, located in the TAC1-2 gene at chromosome coordinate chr7:97361597; and TBR1_161, located in the TBR1 gene at chromosome coordinate chr2:162283730.
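For downstream tooling (amplicon design, read extraction, reporting), the eight sites above might be kept as a small coordinate table. This sketch stores the hg19 coordinates exactly as listed, parses them, and groups sites by chromosome. The `Site` type and grouping step are illustrative, and treating positions as 1-based is an assumption the specification does not state.

```python
from typing import Dict, List, NamedTuple


class Site(NamedTuple):
    chrom: str
    pos: int  # assumed 1-based position on hg19 (GRCh37)


# Coordinates as listed in the specification (hg19).
PANEL_HG19: Dict[str, str] = {
    "SOCS-1_24": "chr16:11348913",
    "cg24403845_103": "chr10:108924288",
    "HOXD3_26": "chr2:177027898",
    "SDC2-2_46": "chr8:97505775",
    "SFRP1_48": "chr8:41167015",
    "SP9_25": "chr2:175199694",
    "TAC1-2_135": "chr7:97361597",
    "TBR1_161": "chr2:162283730",
}


def parse_coord(text: str) -> Site:
    chrom, pos = text.split(":")
    return Site(chrom, int(pos))


sites = {name: parse_coord(c) for name, c in PANEL_HG19.items()}

# Group site names by chromosome, e.g. when planning multiplex primer pools.
by_chrom: Dict[str, List[str]] = {}
for name, site in sites.items():
    by_chrom.setdefault(site.chrom, []).append(name)
print(sorted(by_chrom["chr2"]))  # ['HOXD3_26', 'SP9_25', 'TBR1_161']
```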

Note that the chromosome coordinates used herein are based on the human reference genome hg19 (GRCh37). These coordinates may also be converted to the human reference genome hg38 (GRCh38); this specification is not limited in this respect.

In some preferred embodiments, the DNA methylation site combination may include SOCS-1_24, cg24403845_103, HOXD3_26, SDC2-2_46, SFRP1_48, SP9_25, TAC1-2_135, and TBR1_161. Optionally, the combination may further include DNA methylation sites in one or more other colorectal cancer-related genes. In some embodiments, the combination may include one or more of SOCS-1_24, cg24403845_103, HOXD3_26, SDC2-2_46, SFRP1_48, SP9_25, TAC1-2_135, and TBR1_161, together with one or more of the other methylation sites listed in Table 1 of the Examples.

In some embodiments, the DNA methylation site combination may include one, two, three, four, five, six, seven, or all eight of SOCS-1_24, cg24403845_103, HOXD3_26, SDC2-2_46, SFRP1_48, SP9_25, TAC1-2_135, and TBR1_161. In some embodiments, the combination includes at least SOCS-1_24 and cg24403845_103. For example, the combination may include the two sites SOCS-1_24 and cg24403845_103. As another example, it may include these two sites plus 1, 2, 3, 4, or 5 of the remaining six sites.
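The last example describes a concrete, countable family of panels: the two core sites plus any 1 to 5 of the other six. Enumerating them with `itertools.combinations` shows there are 6 + 15 + 20 + 15 + 6 = 62 such combinations, which is convenient when evaluating every candidate panel on a training set. The helper name is illustrative.

```python
from itertools import combinations
from typing import Iterator, Tuple

SITES = ["SOCS-1_24", "cg24403845_103", "HOXD3_26", "SDC2-2_46",
         "SFRP1_48", "SP9_25", "TAC1-2_135", "TBR1_161"]
CORE = ("SOCS-1_24", "cg24403845_103")


def core_plus_combinations(min_extra: int = 1,
                           max_extra: int = 5) -> Iterator[Tuple[str, ...]]:
    """Yield CORE plus every choice of min_extra..max_extra remaining sites."""
    rest = [s for s in SITES if s not in CORE]
    for k in range(min_extra, max_extra + 1):
        for extra in combinations(rest, k):
            yield CORE + extra


combos = list(core_plus_combinations())
print(len(combos))  # 62
```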

In some more preferred embodiments, the DNA methylation site combination may consist of SOCS-1_24, cg24403845_103, HOXD3_26, SDC2-2_46, SFRP1_48, SP9_25, TAC1-2_135, and TBR1_161. In some embodiments, the combination may consist of these eight sites together with one or more of the other methylation sites listed in Table 1 of the Examples.

The methylation level of the DNA methylation site combination provided in some embodiments of this specification is significantly correlated with colorectal cancer. The methylation state of the combination can be quantified and used as a measure of its methylation level. Samples containing the combination can be collected from a wide range of subject sources, including organs, tissues, cells, and body fluids, and in particular from colorectal lavage fluid. The combination can serve as a colorectal cancer marker in screening, risk prediction, evaluation of treatment efficacy, and screening of therapeutic drugs. Used this way, it can improve the sensitivity and specificity of screening/diagnosis, prediction, and evaluation.

In some embodiments, the methylation level of the DNA methylation site combination can be obtained by testing a biological sample from the subject with a detection reagent for the combination. This detection reagent is used to measure the methylation level of the combination.

More information about detection reagents for the DNA methylation site combination can be found elsewhere in this specification.

The computing device (e.g., the processing device 110 in FIG. 1 or the acquisition module 310 in FIG. 3) can perform step 401 in a variety of ways. In some embodiments, the processing device 110 retrieves methylation-level information for the DNA methylation site combination of the subject's biological sample from the storage device 120. For example, the user terminal 140 uploads this information to the storage device 120 via the network 130, and the processing device 110 retrieves it for further analysis and evaluation. In some embodiments, the processing device 110 receives the methylation-level information measured by the detection device 160. For example, the processing device 110 sends a detection instruction to the detection device 160 (e.g., a PCR instrument and/or an NGS sequencer). The detection device 160 then measures the methylation-level information of the subject's biological sample according to the instruction and sends it to the processing device 110. In some embodiments, the processing device 110 obtains the methylation-level information from user input.

Step 403: based on the methylation level of the DNA methylation site combination, a prediction model is used for one or more of the following:
- evaluating whether the subject has colorectal cancer;
- predicting the subject's risk of developing colorectal cancer;
- evaluating the efficacy of the subject's colorectal cancer treatment;
- evaluating the efficacy of a colorectal cancer therapeutic drug.

In some embodiments, step 403 may be performed by a computing device (e.g., the processing device 110 in FIG. 1 or the analysis module 320 in FIG. 3).

One or more embodiments of this specification provide a prediction model that uses methylation sites for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs.

In some embodiments, the prediction model may be a logistic regression model based on the DNA methylation site combination. The model computes a risk score. The score reflects whether the subject has, or is at risk of developing, colorectal cancer. It also reflects the subject's colorectal cancer risk level after treatment and/or drug administration. From this score the model evaluates the likelihood of cancer, the risk of developing cancer, the efficacy of the subject's treatment, and/or the efficacy of a therapeutic drug. In some embodiments, the prediction model may be expressed as formula (3):

Risk score = -4.38 + 5.16*cg24403845_103 + 1.18*SOCS-1_24    (3)

Here, Risk score is the subject's risk score for having or developing colorectal cancer, or the subject's colorectal cancer risk level after treatment and/or drug administration. Each site name denotes the methylation M value at that site.

In some embodiments, the risk score may instead be calculated with formula (4):

Risk score = -7.11 + 2.89*cg24403845_103 + 3.61*SOCS-1_24 + 1.05*TAC1-2_135 + 0.92*SDC2-2_46 + 0.77*SFRP1_48 + 0.72*TBR1_161 + 0.67*SP9_25 + 0.19*HOXD3_26    (4)

Here, Risk score and the site names are defined as in formula (3): each site name denotes the methylation M value at that site.
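Formulas (3) and (4) are both linear predictors: an intercept plus a weighted sum of per-site M values. The minimal sketch below evaluates either formula from a dictionary of M values. The coefficients are transcribed from the text. The sample M values are hypothetical and serve only to show the arithmetic.

```python
# Coefficients transcribed from formulas (3) and (4).
FORMULA_3 = {"intercept": -4.38, "cg24403845_103": 5.16, "SOCS-1_24": 1.18}
FORMULA_4 = {"intercept": -7.11, "cg24403845_103": 2.89, "SOCS-1_24": 3.61,
             "TAC1-2_135": 1.05, "SDC2-2_46": 0.92, "SFRP1_48": 0.77,
             "TBR1_161": 0.72, "SP9_25": 0.67, "HOXD3_26": 0.19}

def risk_score(m_values, formula):
    """Intercept plus sum of coefficient * M value over the formula's sites."""
    score = formula["intercept"]
    for site, coef in formula.items():
        if site != "intercept":
            score += coef * m_values[site]  # KeyError if a required site is missing
    return score

# Hypothetical M values for one sample (illustrative only).
sample = {"cg24403845_103": 1.0, "SOCS-1_24": 0.5}
print(round(risk_score(sample, FORMULA_3), 2))  # 1.37
```

A missing site raises `KeyError` rather than being treated as zero. That way an incomplete panel is not silently scored.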

The risk score can be used to predict whether a subject's biological sample is colorectal cancer positive. In some embodiments, the sample is predicted to be colorectal cancer positive if its risk score, as computed by the prediction model, is greater than a risk score threshold. In some embodiments, the risk score threshold may be 0.

In some embodiments, the input to the logistic regression prediction model may be the methylation M values of the DNA methylation site combination in the subject's biological sample. The output may be the subject's colorectal cancer risk score. The prediction model may be obtained by training an initial model on a training sample set. The training sample set may include the methylation M values of the DNA methylation site combination in samples from one or more known colorectal cancer patients and from non-colorectal cancer subjects (e.g., healthy individuals). Each sample also carries a label indicating whether its subject has colorectal cancer. The term "known colorectal cancer patient" refers to a subject or individual who has clinical symptoms of colorectal cancer and a clinically verified diagnosis (e.g., the type and nature of the disease confirmed by biopsy). The term "non-colorectal cancer subject" refers to a subject or individual who does not have colorectal cancer and has no impairment in daily life. In some embodiments, the training set may be the methylation levels of the colorectal lavage fluid sample group obtained in the Examples. In some embodiments, the prediction model may be validated on the methylation levels of the colorectal lavage fluid validation group obtained in the Examples.
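The training step can be sketched as plain, unregularized logistic regression fitted by batch gradient descent on the mean log-loss. The source does not state which fitting method, software, or regularization was used. The learning rate, epoch count, and toy two-site training set below are therefore all hypothetical. The sketch only illustrates how labeled M values (1 = colorectal cancer, 0 = non-colorectal cancer) yield an intercept and coefficients of the kind seen in formulas (3) and (4).

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=5000):
    """Fit intercept b and weights w by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w

# Toy training set (hypothetical): columns are M values of cg24403845_103 and SOCS-1_24.
X = [[1.5, 1.0], [2.0, 1.2], [1.8, 0.8], [-2.0, -1.5], [-1.5, -2.0], [-2.5, -1.0]]
y = [1, 1, 1, 0, 0, 0]
b, w = train_logistic(X, y)
scores = [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]
print(round(b, 2), [round(v, 2) for v in w])
```

The fitted `b` plays the role of the intercept and `w` the per-site coefficients. In practice a library implementation, typically with regularization, would normally be used.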

For example, in the training sample set, colorectal cancer patient samples may be labeled 1 and non-colorectal cancer samples labeled 0. The methylation levels of the DNA methylation site combination in a subject's sample are the model input, and the model computes a risk score from them. The sigmoid function converts this score into a probability, between 0 and 1, that the sample comes from a colorectal cancer patient (label 1). The larger the risk score, the closer the probability is to 1 and the more likely the sample is from a colorectal cancer patient. The smaller the risk score, the closer the probability is to 0 and the less likely this is. A risk score of 0 corresponds to a probability of exactly 50%, which is why 0 serves as a natural risk score threshold.
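The following is a minimal sketch of the score-to-probability conversion and the threshold rule described above. The sigmoid is written in a numerically stable form. The threshold defaults to 0, which corresponds to a probability of 0.5. The strict "greater than" comparison follows the text. The sample scores are illustrative.

```python
import math

def risk_to_probability(score):
    """Sigmoid: maps a risk score (log-odds) to P(label = 1) in (0, 1)."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)

def is_positive(score, threshold=0.0):
    """Colorectal cancer positive when the risk score exceeds the threshold."""
    return score > threshold

for s in (-3.0, 0.0, 3.0):
    print(s, round(risk_to_probability(s), 3), is_positive(s))
```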

For further details of the prediction model, see Example 2 and its description.

The computing device (e.g., the processing device 110 in FIG. 1 or the analysis module 320 in FIG. 3) can perform step 403 in a variety of ways. In some embodiments, the processing device 110 retrieves the prediction model stored in the storage device 120. It then applies the model to the methylation-level information of the subject's biological sample to obtain an evaluation result. In other embodiments, the processing device 110 updates the stored prediction model according to user instructions and uses the updated model to obtain the evaluation result. To update the model, the processing device 110 may use the network 130 to collect methylation-level information for the relevant DNA methylation site combination in colorectal cancer and normal populations. This information may come from public or non-public databases and is used to update the training sample set and optimize the prediction model. The processing device 110 may also update the training sample set based on user input or on data/information uploaded from the user terminal 140, and then optimize the prediction model.

In some embodiments, the AUC of the prediction model provided in this specification may be greater than 0.9, 0.93, or 0.95. Its sensitivity may be greater than 90%, 92%, 94%, 95%, 96%, or 97%. Its specificity may be greater than 90%, 92%, 94%, or 95%.
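The AUC, sensitivity, and specificity quoted above can be computed from risk scores and true labels, as sketched below. The sketch assumes threshold 0, as in one embodiment above. It computes AUC as the Mann-Whitney probability that a randomly chosen positive sample outscores a randomly chosen negative one, with ties counted as one half. The scores and labels are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold=0.0):
    """Positive call = score > threshold; labels are 1 (cancer) or 0 (non-cancer)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, labels):
    """Mann-Whitney AUC: P(random positive outscores random negative), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

scores = [2.1, 0.4, -0.3, 1.2, -1.5, -0.8]   # hypothetical risk scores
labels = [1, 1, 1, 0, 0, 0]
print(sensitivity_specificity(scores, labels))  # sensitivity 2/3, specificity 2/3
print(auc(scores, labels))                      # 7/9
```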

It should be noted that the above description of process 400 is provided only for example and illustration and does not limit the scope of this specification. Those skilled in the art may make various modifications and changes to process 400 under the guidance of this specification. Such modifications and changes remain within the scope of this specification.

According to another aspect of this specification, a device is provided for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs. The device may include a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the corresponding method shown in some embodiments of this specification.

More information about the methods for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs can be found elsewhere in this specification (e.g., FIG. 4 and its description).

According to another aspect of this specification, a detection reagent for a DNA methylation site combination is provided. The combination can serve as a biomarker for detecting colorectal cancer. It includes one or more (e.g., 2, 5, or 8) of the sites SOCS-1_24, cg24403845_103, HOXD3_26, SDC2-2_46, SFRP1_48, SP9_25, TAC1-2_135, and TBR1_161.

In some embodiments, the detection reagent for the DNA methylation site combination includes a primer set for amplifying the combination. The primer set generates specific amplicons that contain the sites of the combination, which amplifies the detection signal.

In some embodiments, the primer set for amplifying the DNA methylation site combination includes primer pairs for amplifying one or more of the sites SOCS-1_24, cg24403845_103, HOXD3_26, SDC2-2_46, SFRP1_48, SP9_25, TAC1-2_135, and TBR1_161. Optionally, the primer pairs are as follows:
- SOCS-1_24: SEQ ID NO: 1 and SEQ ID NO: 2;
- cg24403845_103: SEQ ID NO: 3 and SEQ ID NO: 4;
- HOXD3_26: SEQ ID NO: 5 and SEQ ID NO: 6;
- SDC2-2_46: SEQ ID NO: 7 and SEQ ID NO: 8;
- SFRP1_48: SEQ ID NO: 9 and SEQ ID NO: 10;
- SP9_25: SEQ ID NO: 11 and SEQ ID NO: 12;
- TAC1-2_135: SEQ ID NO: 13 and SEQ ID NO: 14;
- TBR1_161: SEQ ID NO: 15 and SEQ ID NO: 16.

Alternatively, for each pair, the two primer sequences may have at least 95%, 96%, 97%, 98%, or 99% similarity to the respective sequences listed above.
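The text does not define how "similarity" is measured. As an assumption only, the sketch below treats it as ungapped, position-by-position identity between equal-length sequences. It also computes how many mismatches a primer of a given length can carry while staying at or above a given identity percentage. The SEQ ID NO sequences are not reproduced in this text, so the 20-mers are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-by-position identity (%) of two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)

def max_mismatches(length: int, min_identity_pct: int = 95) -> int:
    """Largest k with (length - k) / length >= min_identity_pct / 100.
    Integer arithmetic avoids floating-point rounding at the boundary."""
    return (length * (100 - min_identity_pct)) // 100

# Hypothetical 20-mers differing at the last base (not the SEQ ID NO sequences).
ref = "ACGTACGTACGTACGTACGT"
var = "ACGTACGTACGTACGTACGA"
print(percent_identity(ref, var))                              # 95.0
print(max_mismatches(20), max_mismatches(25), max_mismatches(18))  # 1 1 0
```

Under this reading, a 95% threshold allows a single mismatch for a 20-25 nt primer and none for an 18-mer.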

In some embodiments, the detection reagent for the DNA methylation site combination may further include other reagents for detecting methylation levels, such as methylation conversion reagents and/or sequencing reagents. Methods for detecting methylation levels include, but are not limited to, WGBS, RRBS, oxBS-seq, MethylCap-seq, MBD-seq, MeDIP-seq, HPLC, MSRF, MASP, methylation arrays, pyrosequencing, dPCR, MS-PCR, or combinations thereof. In some preferred embodiments, the other reagents may include reagents for implementing one or more of these methods. In some more preferred embodiments, the other reagents may include reagents for implementing WGBS or RRBS.

According to another aspect of this specification, a kit is provided for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs. The kit comprises the detection reagent for the DNA methylation site combination shown in some embodiments of this specification.

According to another aspect of this specification, a use of a DNA methylation site combination as a biomarker is provided. The use is in the preparation of a kit for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy, and/or screening of colorectal cancer therapeutic drugs. The combination is the DNA methylation site combination shown in some embodiments of this specification.

Unless otherwise specified, the experimental methods in the following examples are conventional methods, and the experimental materials were purchased from conventional biochemical reagent suppliers. All quantitative tests in the following examples were performed in triplicate, and the results were averaged.

Examples

Example 1 Determination of methylation sites and methylation levels

1. Collection of colorectal lavage fluid samples

Intestinal lavage fluid samples were collected from 208 colorectal cancer patients and 30 healthy individuals, 238 samples in total. Each sample was stored in a 50 mL lavage fluid DNA preservation tube containing 7 mL of additive. The tube was centrifuged at 4000 rpm for 10 min, the supernatant was discarded, and the pellet was washed with 1× PBS.

2. Compilation of candidate methylation sites

A total of 117 methylation sites were compiled. Of these, 105 lie on known and/or potential colorectal cancer-related genes, and 12 lie on the ACTB gene and serve as quality controls. Details are given in Table 1.

Table 1 Methylation site information

3. DNA extraction from the lavage fluid sample group

DNA was extracted from the lavage fluid sample group as follows. Add 180 μL of Buffer GTL to the pellet and resuspend. Add 20 μL of proteinase K and vortex to mix. Incubate in a 56 °C water bath for 1 h until the sample is completely lysed, then in a 90 °C water bath for a further 1 h. Centrifuge briefly to collect liquid from the tube wall. Add 200 μL of Buffer GL and vortex thoroughly. Add 200 μL of absolute ethanol and vortex thoroughly. Centrifuge briefly again to collect liquid from the tube wall.

Transfer all of the solution onto the silica membrane column seated in a collection tube. Add 500 μL of Buffer GW1 (with absolute ethanol already added) to the membrane and centrifuge at 12,000 rpm for 1 min. Discard the flow-through and return the column to the collection tube. Add 500 μL of Buffer GW2 (with absolute ethanol already added) and centrifuge at 12,000 rpm for 1 min. Discard the flow-through and return the column to the collection tube. Centrifuge at 12,000 rpm for 2 min and discard the flow-through. Leave the membrane at room temperature for several minutes to dry thoroughly.

Place the silica membrane column in a new centrifuge tube and add 50-200 μL of Buffer GE. Leave at room temperature for 2-5 min, then centrifuge at 12,000 rpm for 1 min to collect the DNA solution. Store the solution at -20 °C until use. Measure the DNA concentration with a Nano-300 micro-spectrophotometer and a Qubit fluorometer; the concentration must be at least 1 ng/μL.

4. Bisulfite conversion of DNA from the lavage fluid sample group

Bisulfite treatment: add the following to a PCR tube:
- 50 μL of lavage-pellet DNA sample;
- 150 μL of Bisulfite Mix;
- 25 μL of MBuffer B (protection buffer).

Centrifuge briefly and place the tube in a PCR instrument. Incubate at 85 °C for 50 min, cool to room temperature, and centrifuge briefly. The lavage-pellet DNA is taken from the DNA solution above, and the 50 μL sample contains 50-500 ng of DNA. To prepare the Bisulfite Mix, add 1.2 mL of MBuffer A (conversion buffer) to a tube of dry sodium bisulfite powder. Shake until the powder is completely dissolved.

Purification of bisulfite-treated DNA:
1) Transfer all of the solution from the PCR tube into a 1.5 mL centrifuge tube.
2) Add 285 μL of MBuffer C (binding buffer), 115 μL of isopropanol, and 10 μL of magnetic bead suspension (mix the beads thoroughly before use), and shake for 10 min.
3) Centrifuge briefly, place on a magnetic rack for 2 min, and discard the supernatant.
4) Without removing the tube from the magnetic rack, add 1000 μL of MBuffer D (wash buffer), incubate for 30 s, and discard the supernatant.
5) Add 1000 μL of MBuffer E (incubation buffer) and incubate at room temperature for 15 min. Centrifuge briefly, place on the magnetic rack for 2 min, and discard the supernatant.
6) Without removing the tube from the magnetic rack, add 1000 μL of MBuffer D (wash buffer), incubate for 30 s, and discard the supernatant. Repeat this wash once.
7) Remove all residual wash buffer and air-dry the tube in a clean bench for 5 min.

Elution and recovery: add 50 μL of MBuffer F (elution buffer), warmed to 56 °C to improve elution efficiency, to the tube. Vortex to mix thoroughly and let stand for 5 min. Centrifuge briefly and place on the magnetic rack for 2 min. Transfer the supernatant to a clean new centrifuge tube; this DNA solution is the converted DNA sample. Store at -20 °C until use.

5. Multiplex PCR-NGS detection

First-round PCR: eight pairs of methylation-specific primers were designed for eight colorectal cancer-specific genes (primer sequences in Table 2), and PCR was performed on nucleic acids from all 238 samples. An ACTB internal reference gene control was included to monitor the bisulfite conversion and NGS steps. The reaction mixture contained:
- 10× ACE buffer, 3 μL;
- dNTP Mix (10 mM), 1 μL;
- primer mix, 5 μL;
- TMAC (600 mM), 2.5 μL;
- 50% glycerol, 6 μL;
- 5× Enhancer, 2 μL;
- sterile water, 5 μL;
- Ace DNA Taq polymerase, 0.5 μL;
- converted DNA sample (purified bisulfite-treated DNA), 5 μL.
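As a worked check of the first-round reaction mix, the final concentration of each stock is C_final = C_stock × V_component / V_total. The text does not state a total volume, so the sketch assumes it equals the sum of the listed components (30 μL). The text also does not say whether "10 mM" refers to each dNTP or to the combined mix, so that figure is carried through as labelled.

```python
# Round-1 components with a stock concentration: (name, stock, unit, volume in uL).
CONCENTRATED = [
    ("ACE buffer", 10.0, "x", 3.0),
    ("dNTP Mix", 10.0, "mM", 1.0),
    ("TMAC", 600.0, "mM", 2.5),
    ("glycerol", 50.0, "%", 6.0),
    ("Enhancer", 5.0, "x", 2.0),
]
# Remaining round-1 components (volume in uL only).
OTHER = {"primer mix": 5.0, "sterile water": 5.0, "Ace DNA Taq": 0.5, "converted DNA": 5.0}

# Assumption: total reaction volume = sum of all listed volumes.
total_ul = sum(vol for *_, vol in CONCENTRATED) + sum(OTHER.values())

def final_concentration(stock, volume_ul, total):
    """C_final = C_stock * V_component / V_total (simple dilution)."""
    return stock * volume_ul / total

finals = {name: (final_concentration(stock, vol, total_ul), unit)
          for name, stock, unit, vol in CONCENTRATED}

print(total_ul)  # 30.0
for name, (conc, unit) in finals.items():
    print(f"{name}: {conc:.3g} {unit}")
```

Under this assumption the buffer ends at 1×, TMAC at 50 mM, and glycerol at 10%. The listed 2 μL of 5× Enhancer corresponds to about 0.33×.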

Table 2 Methylation-specific primer sequences

First-round PCR cycling conditions:
1) 1 cycle of 95 °C for 5 min;
2) 35 cycles of 95 °C for 30 s, 52 °C for 1 min, and 72 °C for 30 s;
3) 1 cycle of 72 °C for 5 min.

The second-round PCR reaction mixture contained:
- 10× ACE buffer, 3 μL;
- dNTP Mix (10 mM), 1 μL;
- primer AP5 (5 μM), 2 μL;
- index primer (5 μM), 2 μL;
- 50% glycerol, 6 μL;
- sterile water, 10.5 μL;
- Ace DNA Taq polymerase, 0.5 μL;
- first-round PCR product, 5 μL.

The primer sequences are:

AP5: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 19)

Index primer: CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 20)

Here, "NNNNNNNN" denotes the index used to distinguish different samples.
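The index primer is a template in which the placeholder NNNNNNNN is replaced with a sample-specific 8-nt index. The minimal sketch below fills the template and checks that indices in one run are unique. The validation rule (exactly 8 bases of A/C/G/T) is an assumption based on the 8-N placeholder, and the index shown is hypothetical.

```python
AP5 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # SEQ ID NO: 19
INDEX_TEMPLATE = "CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"  # SEQ ID NO: 20
PLACEHOLDER = "N" * 8

def index_primer(index: str) -> str:
    """Return the index primer with the 8-N placeholder replaced by `index`."""
    index = index.upper()
    if len(index) != 8 or not set(index) <= set("ACGT"):
        raise ValueError("index must be exactly 8 nt of A/C/G/T")
    return INDEX_TEMPLATE.replace(PLACEHOLDER, index)

def check_unique(indices):
    """Raise if two samples in one run share an index (they could not be demultiplexed)."""
    seen = set()
    for idx in indices:
        key = idx.upper()
        if key in seen:
            raise ValueError(f"duplicate index: {idx}")
        seen.add(key)

print(index_primer("acgtacgt"))
```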

Second-round PCR cycling conditions:
1) 1 cycle of 95 °C for 10 min;
2) 21 cycles of 95 °C for 30 s, 57 °C for 30 s, and 72 °C for 30 s;
3) 1 cycle of 72 °C for 5 min.
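Summing the hold times of the two cycling programs gives a rough run-time estimate. The sketch ignores ramp times between temperatures and any instrument-specific lid heating, so actual run times will be longer.

```python
# Each step: (number of cycles, [(temperature_C, hold_seconds), ...]) from the text.
ROUND1 = [(1, [(95, 300)]),
          (35, [(95, 30), (52, 60), (72, 30)]),
          (1, [(72, 300)])]
ROUND2 = [(1, [(95, 600)]),
          (21, [(95, 30), (57, 30), (72, 30)]),
          (1, [(72, 300)])]

def hold_time_minutes(program):
    """Sum of hold times only; ramping between temperatures is ignored."""
    seconds = sum(cycles * sum(hold for _, hold in holds) for cycles, holds in program)
    return seconds / 60

print(hold_time_minutes(ROUND1), hold_time_minutes(ROUND2))  # 80.0 46.5
```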

The amplification products were purified with a nucleic acid purification reagent to obtain a sequencing library. The library was sequenced on a MiniSeq sequencer (Illumina) using the MiniSeq™ Mid Output Reagent Cartridge (Illumina).

6. Calculation of methylation levels (β and M values) at each site

The NGS results for the 117 sites were analyzed statistically, with a sequencing depth of at least 500× at each site. The methylation level (β value and M value) of each site can be calculated with formulas (1) and (2):

Here, Methylated and Unmethylated are the numbers of methylated and unmethylated reads at the site being measured. The offset is a small constant that stabilizes the ratio for low-count signals and prevents a zero denominator. For more on β and M values, see the related description above.
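Formulas (1) and (2) themselves are not reproduced in this text. As an assumption only, the sketch below uses the widely used definitions β = Methylated / (Methylated + Unmethylated + offset) and M = log2((Methylated + offset) / (Unmethylated + offset)), together with the ≥500× depth requirement. The patent's exact formulas and offset may differ; offset = 1 here is a placeholder.

```python
import math

MIN_DEPTH = 500   # sequencing depth required at each site (from the text)
OFFSET = 1.0      # placeholder value; the source does not state the offset

def beta_value(methylated, unmethylated, offset=OFFSET):
    """Assumed form: fraction of methylated reads, with offset in the denominator."""
    return methylated / (methylated + unmethylated + offset)

def m_value(methylated, unmethylated, offset=OFFSET):
    """Assumed form: log2 ratio of methylated to unmethylated reads."""
    return math.log2((methylated + offset) / (unmethylated + offset))

def site_levels(methylated, unmethylated):
    """Return (beta, M) for a site, or None if depth is below MIN_DEPTH."""
    if methylated + unmethylated < MIN_DEPTH:
        return None
    return beta_value(methylated, unmethylated), m_value(methylated, unmethylated)

print(site_levels(300, 700))   # beta ~0.300, M ~-1.22
print(site_levels(10, 20))     # None (depth 30 < 500)
```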

7. Methylation rate difference test and correlation analysis

The methylation β values at the 105 methylation sites (see Table 1) were tested in the 154 training samples. Differential sites were first screened on their β values: the mean β value in normal samples had to be below 0.2, and the mean β value in cancer samples had to exceed that in normal samples. A t-test was then used to compare β values between normal and cancer samples, with p < 0.05 taken as statistically significant. Of the 105 sites, most had β values in cancer samples significantly higher than in normal samples.

The correlation of methylation rates (β values) between sites was analyzed separately in the 30 normal samples and the 208 colorectal cancer samples using the Pearson correlation coefficient. The correlation heat maps are shown in Figure 5A (30 normal samples) and Figure 5B (208 colorectal cancer samples). Both axes of each heat map are methylation sites, listed in Table 3; the sites run left to right on the horizontal axis and top to bottom on the vertical axis in the order of Table 3.
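A minimal sketch of the pairwise Pearson matrix behind such a heat map, assuming each site's β values are stored as a list aligned by sample. `site_order` stands in for the Table 3 ordering, which is not reproduced here; a site with constant β values would make the coefficient undefined.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(beta_by_site, site_order):
    """Square matrix of pairwise correlations; rows and columns follow site_order."""
    return [[pearson(beta_by_site[a], beta_by_site[b]) for b in site_order]
            for a in site_order]
```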

Table 3 Methylation site information for the correlation heat maps

In normal samples, methylation rates at all tested sites were relatively low; the sites on the SOCS-1 gene were highly correlated with one another, and the remaining sites also showed some correlation. In colorectal cancer samples, the correlations followed a clearer pattern: the different sites of the SOCS-1 gene were highly correlated, and the methylation rates at the SP9, TAC1-2 and cg24403845 sites were all highly correlated with one another. For the remaining four genes, sites within each gene remained highly correlated, with weak correlations between genes.

Example 2 Procedure for obtaining the prediction model

In the examples of this specification, lavage fluid samples were collected from 208 colorectal cancer patients and 30 normal individuals. A literature search identified 105 methylation sites in 8 genes associated with colorectal cancer; together with 12 sites on the ACTB gene used for quality control, a total of 117 methylation sites were tested. Using machine learning, logistic regression models based on 2 and on 8 methylation sites were built as prediction models, and both performed well in the training set and in an independent validation set.

The examples of this specification use the analysis workflow shown in Figure 6; the prediction model is built according to this workflow, which mainly comprises the following steps:

1. Data quality control

A next-generation sequencing methylation analysis pipeline typically reports the methylation rate (β value) at each tested site of a sample together with the sequencing depth at that site. Problems during sample collection, transport or extraction (for example, failing to collect enough suitable material, sample degradation during storage and transport, or a failed extraction) can leave the final sample unfit for downstream analysis, so quality control must be applied both to individual sites and to each sample as a whole.

In the examples of this specification, each site must have a sequencing depth of at least 500; otherwise its value is set to missing. For each sample, the number of missing sites must be less than 5% of all tested sites; otherwise the sample is judged unqualified and excluded from subsequent analysis.
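The two quality-control rules translate directly into code. A minimal sketch, assuming per-sample dictionaries keyed by site name (a hypothetical data layout) and using None to mark a missing value:

```python
MIN_DEPTH = 500              # minimum sequencing depth per site
MAX_MISSING_FRACTION = 0.05  # a sample fails at 5% or more missing sites


def mask_low_depth(betas: dict, depths: dict, min_depth: int = MIN_DEPTH) -> dict:
    """Set a site's beta value to None when its depth is below min_depth."""
    return {
        site: (beta if depths.get(site, 0) >= min_depth else None)
        for site, beta in betas.items()
    }


def sample_passes(masked: dict, max_missing_fraction: float = MAX_MISSING_FRACTION) -> bool:
    """A sample passes only if fewer than 5% of its tested sites are missing."""
    n_missing = sum(value is None for value in masked.values())
    return n_missing < max_missing_fraction * len(masked)
```

With 117 tested sites the cut-off is 5.85, so a sample may have at most 5 missing sites.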

2. Splitting into training and validation sets

After quality control, methylation sequencing data from 238 lavage fluid samples were included in downstream analysis: 208 colorectal cancer positive samples and 30 negative control samples. The data were randomly split into a training set (139 colorectal cancer positive and 15 negative control samples) and a validation set (69 colorectal cancer positive and 15 negative control samples). Training-set samples are used to extract features and train the models; validation-set samples are completely independent of the training set and are used to evaluate the predictions of the model obtained from the training set. The bisulfite conversion rate of the internal reference gene ACTB exceeded 99% in all of these samples, confirming that bisulfite treatment worked normally at every site of every gene.
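The source states only that the data were split at random with the class counts above. A minimal sketch that reproduces those counts by shuffling each class separately; stratifying by class and the fixed seed are assumptions made here for reproducibility.

```python
import random


def stratified_split(cancer_ids, normal_ids, n_train_cancer=139, n_train_normal=15, seed=0):
    """Shuffle each class separately and cut it at the given training counts."""
    rng = random.Random(seed)
    cancer, normal = list(cancer_ids), list(normal_ids)
    rng.shuffle(cancer)
    rng.shuffle(normal)
    train = cancer[:n_train_cancer] + normal[:n_train_normal]
    validation = cancer[n_train_cancer:] + normal[n_train_normal:]
    return train, validation
```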

3. Conversion between β and M values, and normalization

The β value has a more intuitive biological meaning but generally follows a beta distribution, whereas the M value is close to normally distributed and is therefore better suited to general model training and application. The formulas for converting between M and β values are as follows:
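The conversion formulas are not reproduced in this text. A minimal sketch, assuming the standard logit relation M = log2(β/(1 - β)) and its inverse β = 2^M/(2^M + 1); clipping β away from 0 and 1 is an added assumption that keeps the logarithm finite.

```python
import math


def beta_to_m(beta: float, eps: float = 1e-6) -> float:
    """M = log2(beta / (1 - beta)); beta is clipped to [eps, 1 - eps]."""
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m: float) -> float:
    """beta = 2**M / (2**M + 1), the inverse of beta_to_m."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

Under this relation, β = 0.5 maps to M = 0 and β = 0.8 maps to M = 2.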

The M value is normalized by normalization formula (6):

where Mnormal is the normalized M value, and μnormal and σnormal are, respectively, the mean and standard deviation of the M values at that methylation site in the normal samples of the training set. The same formula and parameters are used to normalize every sample, in both the training and validation sets and for both normal and cancer samples.

That is, the β value is converted to an M value, which is then standardized with the mean and standard deviation of the training-set normal samples; after this conversion and normalization, the M values of normal samples follow a standard normal distribution. Cancer samples are transformed in the same way, still using the normal-sample mean and standard deviation. Figures 7A and 7B show the distributions of methylation at the 8 sites in normal and colorectal cancer samples in the examples of this specification, as β values (Figure 7A) and as normalized M values (Figure 7B).
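A minimal sketch of this normalization: the reference mean and standard deviation are fitted per site on training-set normal samples only and then reused unchanged for every sample. The use of the sample standard deviation (`statistics.stdev`) is an assumption; the source does not say which estimator was used.

```python
from statistics import mean, stdev


def fit_normal_reference(train_normal_m: dict) -> dict:
    """Per-site (mean, SD) of M values in training-set normal samples only."""
    return {site: (mean(vals), stdev(vals)) for site, vals in train_normal_m.items()}


def normalize(sample_m: dict, reference: dict) -> dict:
    """M_normal = (M - mu_normal) / sigma_normal, using the fixed reference."""
    return {site: (m - reference[site][0]) / reference[site][1]
            for site, m in sample_m.items()}
```

Fitting the reference once and applying it to validation samples keeps the validation data out of the normalization parameters.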

4. Feature screening and model training

The examples of this specification measured methylation rates at 105 sites in 8 genes (see Table 1), all reported in the literature as associated with colorectal cancer, plus 12 sites on the ACTB gene for quality control. Methylation rates within the same gene region are generally highly correlated (see Figures 5A and 5B), so the methylation information at these sites is highly redundant; methylation rates in some different gene regions may also be highly correlated.

Using the training-set data, a suitable model was trained and selected by increasing the number of features step by step, as follows: (1) To reduce computation, only the 5 best sites in each gene region, i.e. those with the most significant t-test p-values, were retained; the remaining sites were excluded from model training. (2) Logistic regression models were trained on combinations of m sites, starting from single sites (m = 1), with no two sites in a combination coming from the same gene region. Each candidate combination was trained on the training-set data, and its quality was measured by the leave-one-out AUC: with k samples in the training set, one sample is held out, a logistic regression model is trained on the remaining k-1 samples, and that model predicts the held-out sample; every sample is held out and predicted exactly once, and the AUC is computed from these k predictions against the true labels. (3) One site is then added to the m sites, i.e. all combinations of m+1 sites are listed, again excluding combinations with two sites from the same gene region, and model training, prediction and AUC evaluation are repeated. (4) The optimal model contained 8 methylation sites, of which cg24403845_103 and SOCS-1_24 contributed substantially more than the others and were taken as the main sites; a logistic regression model was also trained on these two sites alone to check how well the main sites discriminate between sample classes.
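The selection procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the logistic regression is unregularized and fitted by plain gradient descent with an arbitrary learning rate and epoch count, the AUC is the rank-based (Mann-Whitney) estimate, and `columns` and `regions` are a hypothetical layout mapping each site to its per-sample normalized M values and to its gene region. The per-region top-5 prefilter of step (1) is assumed to have been applied already.

```python
import math
from itertools import combinations


def _sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(w, x):
    """Risk score: intercept w[0] plus weighted sum of site values."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Unregularized logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = _sigmoid(linear_score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; ties count one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def loo_auc(X, y):
    """Hold out each sample once, train on the other k-1, score the held-out one."""
    scores = []
    for i in range(len(X)):
        w = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(linear_score(w, X[i]))
    return auc(scores, y)


def best_combination(columns, regions, y, m):
    """Best m-site combination by LOO AUC, using at most one site per gene region."""
    best_auc, best_sites = -1.0, None
    for sites in combinations(sorted(columns), m):
        if len({regions[s] for s in sites}) < m:
            continue  # two sites from the same region
        X = [[columns[s][i] for s in sites] for i in range(len(y))]
        score = loo_auc(X, y)
        if score > best_auc:
            best_auc, best_sites = score, sites
    return best_auc, best_sites


def forward_search(columns, regions, y, max_m):
    """Best combination at each size m = 1, ..., max_m."""
    return [best_combination(columns, regions, y, m) for m in range(1, max_m + 1)]
```

Every candidate combination costs k model fits, so the search grows quickly with m; this is why step (1) limits each region to 5 sites.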

(1) Prediction model 1: a logistic regression model based on the two sites cg24403845_103 and SOCS-1_24, whose risk score is calculated as follows:

Risk score = -4.38 + 5.16*cg24403845_103 + 1.18*SOCS-1_24    (3)

where Risk score expresses the subject's risk of having or developing colorectal cancer, and the subject's colorectal cancer risk level after treatment and/or medication; each site name stands for the methylation M value at that site.
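Formula (3) can be applied as below. A minimal sketch, assuming the inputs are the normalized M values from step 3; the function names are illustrative. A risk score of 0 corresponds to a logistic probability of exactly 0.5, which is why score > 0 and probability > 0.5 are used interchangeably in the results that follow.

```python
import math

# Coefficients of prediction model 1, formula (3).
MODEL_1 = {"intercept": -4.38, "cg24403845_103": 5.16, "SOCS-1_24": 1.18}


def risk_score(m_values: dict, model: dict = MODEL_1) -> float:
    """Intercept plus coefficient times M value for each site in the model."""
    return model["intercept"] + sum(
        coef * m_values[site] for site, coef in model.items() if site != "intercept"
    )


def probability(score: float) -> float:
    """Logistic probability; score > 0 corresponds to probability > 0.5."""
    return 1.0 / (1.0 + math.exp(-score))


def predict(m_values: dict, model: dict = MODEL_1) -> str:
    return "positive" if risk_score(m_values, model) > 0 else "negative"
```

For example, a sample with cg24403845_103 = 1.0 and SOCS-1_24 = 0.0 scores -4.38 + 5.16 = 0.78 and is called positive.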

The logistic regression model based on the M values of the two sites cg24403845_103 and SOCS-1_24 distinguished colorectal cancer samples from normal samples well. The leave-one-out cross-validation result is shown in Figure 8A, with an area under the ROC curve of 0.980. Figure 8B shows the model's predictions for the training-set samples. Among cancer samples, only one had a risk score < 0 (logistic regression probability < 0.5) and was predicted negative, i.e. misclassified; all other cancer samples were predicted correctly. Among normal samples, 3 had a risk score > 0 (probability > 0.5) and were predicted positive, i.e. misclassified; all other negative samples were predicted correctly.

(2) Prediction model 2: a logistic regression model based on the eight sites cg24403845_103, SOCS-1_24, TAC1-2_135, SDC2-2_46, SFRP1_48, TBR1_161, SP9_25 and HOXD3_268, whose risk score is calculated as follows:

Risk score = -7.11 + 2.89*cg24403845_103 + 3.61*SOCS-1_24 + 1.05*TAC1-2_135 + 0.92*SDC2-2_46 + 0.77*SFRP1_48 + 0.72*TBR1_161 + 0.67*SP9_25 + 0.19*HOXD3_26    (4)

where Risk score expresses the subject's risk of having or developing colorectal cancer, and the subject's colorectal cancer risk level after treatment and/or medication; each site name stands for the methylation M value at that site.
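For the 8-site model of formula (4), the same scoring can be applied to a batch of samples. A minimal sketch, again assuming normalized M values as inputs and a hypothetical dictionary layout keyed by sample ID; it refuses to score a sample with any of the 8 sites missing.

```python
import math

# Coefficients of prediction model 2, formula (4).
MODEL_2 = {
    "cg24403845_103": 2.89, "SOCS-1_24": 3.61, "TAC1-2_135": 1.05,
    "SDC2-2_46": 0.92, "SFRP1_48": 0.77, "TBR1_161": 0.72,
    "SP9_25": 0.67, "HOXD3_26": 0.19,
}
INTERCEPT_2 = -7.11


def score_samples(samples: dict) -> dict:
    """Map sample ID -> (risk score, logistic probability, predicted label)."""
    results = {}
    for sample_id, m_values in samples.items():
        missing = set(MODEL_2) - set(m_values)
        if missing:
            raise KeyError(f"{sample_id}: missing sites {sorted(missing)}")
        score = INTERCEPT_2 + sum(c * m_values[s] for s, c in MODEL_2.items())
        prob = 1.0 / (1.0 + math.exp(-score))
        results[sample_id] = (score, prob, "positive" if score > 0 else "negative")
    return results
```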

Univariate and multivariate logistic regression analyses were performed on the above 8 sites; the results are shown in Table 4. In the univariate analysis, the M values of all 8 sites were significantly associated with the colorectal cancer phenotype; in the multivariate analysis, only cg24403845_103 and SOCS-1_24 were statistically significant, indicating that the 8 sites are not fully independent. These two sites contribute most to the regression model, while the remaining sites still improve its accuracy. Using the M values of the 8 methylation sites, the model distinguished colorectal cancer samples from normal samples well. The leave-one-out cross-validation result is shown in Figure 9A, with an area under the ROC curve of 0.996. Figure 9B shows the model's predictions for the training-set samples. Among cancer samples, only one had a risk score < 0 (probability < 0.5) and was predicted negative, i.e. misclassified; all other cancer samples were predicted correctly. All normal samples had risk scores < 0 (probability < 0.5) and were predicted negative, matching their true labels exactly.

Table 4 Results of univariate and multivariate logistic regression analysis

5. Sample prediction and model evaluation

The optimal logistic regression model obtained from the training set is applied to the test-set data to compute each sample's risk score and the corresponding logistic regression class probability; comparing these with the true phenotypes gives the model's predictive performance on the independent test set, expressed as the AUC. The AUC (area under the curve) is the area between the ROC (receiver operating characteristic) curve and the horizontal axis, with a maximum value of 1. A classifier that performs no better than chance has an AUC of 0.5, so a useful model's AUC lies between 0.5 and 1; the closer the AUC is to 1.0, the more accurate the detection method and the greater its practical value.
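To make the definition concrete, the sketch below builds the empirical ROC curve by sweeping the decision threshold over the observed scores (tied scores are treated as one threshold) and integrates it with the trapezoidal rule. It is an illustrative implementation, not the software used in the source.

```python
def roc_points(scores, labels):
    """Empirical ROC points (FPR, TPR), sweeping the threshold from high to low."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this threshold before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A perfect ranking gives 1.0, a fully tied ranking 0.5, and a fully reversed ranking 0.0.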

The test set consists of samples independent of the training set and is used to evaluate the reliability of the model trained on the training-set data. Test-set samples are subject to the same quality control criteria as the training set. Because a test sample's true phenotype cannot be used by the model when predicting it, and may only be used afterwards for comparison with the prediction, a k-means approach is used: the test-set samples are clustered on their normalized M values, the distance from each test sample to each cluster is computed, and the sample's missing measurements are filled with the mean of the nearest cluster before the sample is predicted.
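A minimal sketch of the k-means fill-in step. The number of clusters, the random initialization, the fixed iteration count and the root-mean-square distance over sites observed in both vectors are all assumptions; the source does not specify them. Missing values are marked with None, and only those entries are replaced.

```python
import math
import random


def _distance(a, b):
    """RMS difference over sites observed (not None) in both vectors."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def _column_means(rows):
    """Per-site mean ignoring None; None if a site is missing in every row."""
    means = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return means


def kmeans_impute(rows, k=2, iters=20, seed=0):
    """Cluster samples, then fill each None with its cluster's mean for that site."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    assign = [0] * len(rows)
    for _ in range(iters):
        # Assign every sample to its nearest centroid.
        assign = [min(range(k), key=lambda c: _distance(row, centroids[c]))
                  for row in rows]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, a in zip(rows, assign) if a == c]
            if members:
                new = _column_means(members)
                centroids[c] = [n if n is not None else old
                                for n, old in zip(new, centroids[c])]
    return [[v if v is not None else centroids[a][j] for j, v in enumerate(row)]
            for row, a in zip(rows, assign)]
```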

The test samples were predicted using the 2-site model trained above. As Figure 10A shows, among cancer samples only one had a risk score < 0 (probability < 0.5) and was predicted negative, i.e. misclassified; all other cancer samples were predicted correctly. Among the normal control samples in the test set, 2 had risk scores > 0 (probability > 0.5) and were predicted positive, i.e. misclassified; all other negative samples were predicted correctly. The model thus performs well on independent test samples, with an area under the ROC curve of 0.990 on the independent test set (Figure 10B).

The test samples were also predicted using the 8-site model trained above. As Figure 11A shows, all cancer samples had risk scores > 0 (probability > 0.5) and were predicted positive, exactly matching their true labels. Among the normal control samples in the test set, only one had a risk score > 0 (probability > 0.5) and was predicted positive, i.e. misclassified; all other negative samples were predicted correctly. The model performs well on independent test samples: on the independent test set the area under the ROC curve is 0.994 (Figure 11B), with a sensitivity of 69/69 = 100% and a specificity of 14/15 = 93.33%.
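Sensitivity and specificity follow from thresholding the risk score at 0. A minimal sketch; the synthetic scores in the check below are made up solely to reproduce the 69/69 and 14/15 counts reported above.

```python
def sensitivity_specificity(scores, labels, threshold=0.0):
    """Call a sample positive when its risk score exceeds the threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)
```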

In summary, the present invention included methylation data from 208 colorectal cancer cases and 30 normal controls, randomly divided into training and validation sets. Logistic regression models based on 2 and on 8 methylation sites were built on the training set and performed very well on the validation set, accurately distinguishing colorectal cancer samples from normal samples.

The basic concepts have been described above. For those skilled in the art, the above detailed disclosure is clearly intended only as an example and does not limit this specification. Although not explicitly stated here, those skilled in the art may make various modifications, improvements and corrections to this specification; such modifications, improvements and corrections are suggested by this specification and therefore remain within the spirit and scope of its exemplary embodiments.

This specification also uses specific terms to describe its embodiments. For example, "one embodiment", "an embodiment" and/or "some embodiments" refer to a feature, structure or characteristic related to at least one embodiment of this specification. It should therefore be emphasized that two or more references to "an embodiment", "one embodiment" or "an alternative embodiment" in different places in this specification do not necessarily refer to the same embodiment. In addition, certain features, structures or characteristics of one or more embodiments of this specification may be combined as appropriate.

Some embodiments use numbers to describe quantities of components and attributes. It should be understood that such numbers are, in some examples, qualified by the modifiers "about", "approximately" or "substantially". Unless otherwise stated, "about", "approximately" or "substantially" indicates that the stated number may vary by ±20%. Accordingly, in some embodiments the numerical parameters used in the specification and claims are approximations that may change depending on the features required by individual embodiments. In some embodiments, numerical parameters should take the specified significant digits into account and be rounded in the usual way. Although the numerical ranges and parameters used to define the breadth of scope in some embodiments of this specification are approximations, in specific embodiments such values are set as precisely as is practicable.

Each patent, patent application, patent application publication and other material, such as articles, books, specifications, publications and documents, cited in this specification is hereby incorporated by reference in its entirety, except for application history documents that are inconsistent or conflict with the content of this specification, and documents (currently or later appended to this specification) that limit the broadest scope of its claims. If the descriptions, definitions and/or use of terms in materials appended to this specification are inconsistent or conflict with the content of this specification, the descriptions, definitions and/or use of terms in this specification shall prevail.

Finally, it should be understood that the embodiments described in this specification only illustrate the principles of its embodiments. Other variations may also fall within the scope of this specification. Thus, by way of example and not limitation, alternative configurations of the embodiments may be regarded as consistent with the teachings of this specification. Accordingly, the embodiments of this specification are not limited to those explicitly introduced and described herein.

Claims (12)

  1. Use of a DNA methylation site combination as a biomarker, or of a detection reagent for the DNA methylation site combination, in the preparation of a kit for colorectal cancer screening, colorectal cancer risk prediction, colorectal cancer treatment efficacy assessment and/or colorectal cancer treatment drug screening, characterized in that the DNA methylation site combination comprises one or more of the following:
    chromosome coordinates located on the SOCS-1 gene are chr16:11348913, position SOCS-1_24;
    chromosome coordinates located on the cg24403845 gene are chr10:108924288, position cg24403845_103.
  2. The use of claim 1, wherein the DNA methylation site combination further comprises one or more of the following:
    chromosome coordinates located on the HOXD3 gene are chr2:177027898, position HOXD3_26;
    chromosome coordinates located on the SDC2-2 gene are chr8:97505775, position SDC2-2_46;
    chromosome coordinates located on the SFRP1 gene are chr8:41167015, position SFRP1_48;
    chromosome coordinates located on the SP9 gene are chr2:175199694, position SP9_25;
    chromosome coordinates located on the TAC1-2 gene are chr7:97361597, position TAC1-2_135;
    chromosome coordinates located on the TBR1 gene are chr2:162283730, position TBR1_161.
  3. The use of claim 2, wherein the DNA methylation site combination comprises a combination consisting of SOCS-1_24, cg24403845_103, HOXD3_26, SDC2-2_46, SFRP1_48, SP9_25, TAC1-2_135 and TBR1_161.
  4. The use of claim 1 or 2, wherein the detection reagent comprises a primer pair set for amplifying the DNA methylation site combination; wherein
    the primer pair for amplifying SOCS-1_24 is shown as SEQ ID NO: 1 and SEQ ID NO: 2;
    the primer pair for amplifying cg24403845_103 is shown as SEQ ID NO: 3 and SEQ ID NO: 4;
    the primer pair for amplifying HOXD3_26 is shown as SEQ ID NO: 5 and SEQ ID NO: 6;
    the primer pair for amplifying SDC2-2_46 is shown as SEQ ID NO: 7 and SEQ ID NO: 8;
    the primer pair for amplifying SFRP1_48 is shown as SEQ ID NO: 9 and SEQ ID NO: 10;
    the primer pair for amplifying SP9_25 is shown as SEQ ID NO: 11 and SEQ ID NO: 12;
    the primer pair for amplifying TAC1-2_135 is shown as SEQ ID NO: 13 and SEQ ID NO: 14;
    the primer pair for amplifying TBR1_161 is shown as SEQ ID NO: 15 and SEQ ID NO: 16.
  5. A device for colorectal cancer screening, colorectal cancer risk prediction, colorectal cancer treatment efficacy assessment and/or colorectal cancer treatment drug screening, the device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the following method:
    obtaining the methylation level of a DNA methylation site combination in a biological sample of a subject, wherein the DNA methylation site combination comprises:
    chromosome coordinates located on the SOCS-1 gene are chr16:11348913, position SOCS-1_24; and/or
    chromosome coordinates located on the cg24403845 gene are chr10:108924288, position cg24403845_103;
    based on the methylation level of the DNA methylation site combination, using a predictive model to assess whether the subject has colorectal cancer, predict the subject's risk of developing colorectal cancer, assess the efficacy of the subject's colorectal cancer treatment, and/or assess the efficacy of colorectal cancer treatment drugs.
  6. The device of claim 5, wherein the DNA methylation site combination further comprises one or more of the following sites:
    chromosome coordinates located on the HOXD3 gene are chr2:177027898, position HOXD3_26;
    chromosome coordinates located on the SDC2-2 gene are chr8:97505775, position SDC2-2_46;
    chromosome coordinates located on the SFRP1 gene are chr8:41167015, position SFRP1_48;
    chromosome coordinates located on the SP9 gene are chr2:175199694, position SP9_25;
    chromosome coordinates located on the TAC1-2 gene are chr7:97361597, position TAC1-2_135;
    chromosome coordinates located on the TBR1 gene are chr2:162283730, position TBR1_161.
  7. A predictive model using methylation sites for colorectal cancer screening, colorectal cancer risk prediction, colorectal cancer treatment efficacy assessment and/or colorectal cancer treatment drug screening, characterized in that the methylation sites comprise the chromosome coordinate chr16:11348913 on the SOCS-1 gene, position SOCS-1_24, and the chromosome coordinate chr10:108924288 on the cg24403845 gene, position cg24403845_103; the predictive model is represented by the formula
    Risk score = -4.38 + 5.16*cg24403845_103 + 1.18*SOCS-1_24, wherein Risk score expresses the subject's risk of having or developing colorectal cancer and the subject's colorectal cancer risk level after treatment and/or administration, and each gene locus represents the methylation level of that locus.
  8. 8. The predictive model of claim 7, wherein the methylation site further comprises:
    Chromosome coordinates located on HOXD genes are chr2:177027898, position HOXD3_26;
    Chromosome coordinates located on the SDC2-2_gene are chr8:97505775 position SDC2-2_46;
    Chromosome coordinates located on the SFRP1 gene are chr8:41167015, position SFRP 1-48;
    Chromosome coordinates located on the SP9 gene are chr2:175199694, position SP9_25;
    chromosome coordinates located on the TAC1-2 gene are chr7:97361597 position TAC1-2_135; and
    Chromosome coordinates located on the TBR1 gene are chr2:162283730 at position TBR1_161, the predictive model may also be represented by the formula
    Risk score = -7.11 + 2.89*cg24403845_103 + 3.61*SOCS-1_24 + 1.05*TAC1-2_135 + 0.92*SDC2-2_46 + 0.77*SFRP1_48 + 0.72*TBR1_161 + 0.67*SP9_25 + 0.19*HOXD3_26, wherein the Risk score represents the risk that the subject has colorectal cancer, the risk that the subject will develop colorectal cancer, and/or the risk level of colorectal cancer after the subject receives treatment and/or medication, and each site term represents the methylation level at that site.
  9. A DNA methylation site combination for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy and/or screening of colorectal cancer treatment drugs, the DNA methylation site combination comprising:
    Chromosome coordinates located on the SOCS-1 gene are chr16:11348913, position SOCS-1_24; and/or
    Chromosome coordinates located on the cg24403845 gene are chr10:108924288, position cg24403845_103.
  10. The DNA methylation site combination of claim 9, further comprising one or more of the following sites:
    Chromosome coordinates located on the HOXD3 gene are chr2:177027898, position HOXD3_26;
    Chromosome coordinates located on the SDC2-2 gene are chr8:97505775, position SDC2-2_46;
    Chromosome coordinates located on the SFRP1 gene are chr8:41167015, position SFRP1_48;
    Chromosome coordinates located on the SP9 gene are chr2:175199694, position SP9_25;
    Chromosome coordinates located on the TAC1-2 gene are chr7:97361597, position TAC1-2_135;
    Chromosome coordinates located on the TBR1 gene are chr2:162283730, position TBR1_161.
  11. A detection reagent for the DNA methylation site combination according to claim 9, characterized in that the detection reagent comprises a set of primer pairs for amplifying the DNA methylation site combination, wherein:
    the primer pair for amplifying SOCS-1_24 is shown as SEQ ID NO. 1 and SEQ ID NO. 2;
    the primer pair for amplifying cg24403845_103 is shown as SEQ ID NO. 3 and SEQ ID NO. 4.
  12. A kit for colorectal cancer screening, colorectal cancer risk prediction, evaluation of colorectal cancer treatment efficacy and/or screening of colorectal cancer treatment drugs, comprising the detection reagent according to claim 11.
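The formulas in claims 7 and 8 are plain weighted sums of per-site methylation levels, so they can be evaluated directly. The sketch below is a minimal illustration, not the applicant's software: the function name `risk_score`, the dictionary layout and the assumption that methylation levels are given as fractions between 0 and 1 are ours. The claims supply only the intercepts and coefficients; they do not state the scale of the methylation level or any decision threshold.

```python
"""Sketch of the linear risk scores stated in claims 7 and 8 (hypothetical helper)."""

# Intercepts and per-site coefficients copied from the claim formulas.
MODEL_CLAIM_7 = {
    "intercept": -4.38,
    "weights": {"cg24403845_103": 5.16, "SOCS-1_24": 1.18},
}
MODEL_CLAIM_8 = {
    "intercept": -7.11,
    "weights": {
        "cg24403845_103": 2.89,
        "SOCS-1_24": 3.61,
        "TAC1-2_135": 1.05,
        "SDC2-2_46": 0.92,
        "SFRP1_48": 0.77,
        "TBR1_161": 0.72,
        "SP9_25": 0.67,
        "HOXD3_26": 0.19,
    },
}


def risk_score(model: dict, methylation: dict[str, float]) -> float:
    """Return intercept + sum(coefficient * methylation level) over the model's sites.

    Sites in `methylation` that the model does not use are ignored;
    a site the model needs but the input lacks raises KeyError.
    """
    weights = model["weights"]
    missing = sorted(site for site in weights if site not in methylation)
    if missing:
        raise KeyError(f"missing methylation levels for: {missing}")
    return model["intercept"] + sum(w * methylation[site] for site, w in weights.items())


if __name__ == "__main__":
    # Illustrative input only: every site at a methylation level of 0.5 (assumed 0-1 scale).
    sample = {site: 0.5 for site in MODEL_CLAIM_8["weights"]}
    print(round(risk_score(MODEL_CLAIM_7, sample), 2))  # -4.38 + 0.5 * (5.16 + 1.18)
    print(round(risk_score(MODEL_CLAIM_8, sample), 2))  # -7.11 + 0.5 * 10.82
```

Because the claims give no cut-off, the sketch returns the raw score and leaves any classification into risk levels to the caller.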
CN202410724631.1A 2024-06-05 2024-06-05 A method and kit for multiplex PCR targeted methylation sequencing Pending CN118703626A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410724631.1A CN118703626A (en) 2024-06-05 2024-06-05 A method and kit for multiplex PCR targeted methylation sequencing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410724631.1A CN118703626A (en) 2024-06-05 2024-06-05 A method and kit for multiplex PCR targeted methylation sequencing

Publications (1)

Publication Number Publication Date
CN118703626A 2024-09-27

Family

ID=92807009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410724631.1A Pending CN118703626A (en) 2024-06-05 2024-06-05 A method and kit for multiplex PCR targeted methylation sequencing

Country Status (1)

Country Link
CN (1) CN118703626A (en)

Similar Documents

Publication Publication Date Title
US20230042332A1 (en) Disease Detection in Liquid Biopsies
EP3658684B1 (en) Enhancement of cancer screening using cell-free viral nucleic acids
WO2024183507A1 (en) Dna methylation site combination as marker of prostate cancer and use thereof
CN108676879A (en) Application of specific methylation sites as diagnostic markers for molecular typing of breast cancer
US11783912B2 (en) Methods and systems for analyzing nucleic acid molecules
JP2022536180A (en) Methods for detecting and predicting breast cancer
CN108300787A (en) Special application of the methylation sites as early diagnosing mammary cancer marker
EP4294938A1 (en) Cell-free dna methylation test
CN106399304A (en) Breast cancer related SNP marker
CN117551762A (en) DNA methylation site combination as colorectal tumor marker and application thereof
CN115896281B (en) Methylation biomarker, kit and application
KR102729181B1 (en) DNA methylation biomarkers for diagnosis or predicting prognosis of nontuberculous mycobacterium infection and use thereof
JP2024530154A (en) Co-occurrence of somatic mutations and aberrantly methylated fragments
CN118745464B (en) Liver cancer methylation markers and their applications
WO2025213323A1 (en) Dna methylation site combination as bladder cancer marker and use thereof
WO2023226939A1 (en) Methylation biomarker for detecting colorectal cancer lymph node metastasis and use thereof
CN118703626A (en) A method and kit for multiplex PCR targeted methylation sequencing
CN116987788B (en) Method and kit for detecting early lung cancer by using flushing liquid
CN115772565A (en) Methylation site for auxiliary detection of lung cancer somatic cell EGFR gene mutation and application thereof
CN120485365B (en) Application of leukocyte DNA methylation markers in colorectal cancer risk prediction
CN115772566B (en) Methylation biomarker for auxiliary detection of lung cancer somatic ERBB2 gene mutation and application thereof
CN116987787B (en) Device and computer-readable storage medium for detecting whether bladder cancer recurs
US20250163514A1 (en) Epigenetic biomarkers for the diagnosis of thyroid cancer
CN113005198B (en) Kit for detecting 15 gene mutation sites related to sensitivity of radiotherapy and chemotherapy of rectal cancer and application thereof
KR20240059529A (en) Methylation markers for diagnosing lung cancer and combinations thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination