Disclosure of Invention
To address the deficiencies of the prior art, the inventors screened a large number of markers and found that the markers (or targets) of the present invention can be used to diagnose gastric cancer, predict the risk of developing gastric cancer, or determine the status of gastric cancer with high sensitivity and specificity. Based on the markers (or targets) of the invention, cancerous and non-cancerous tissues can be effectively distinguished.
More specifically, the invention provides 15 specific markers (or targets) and establishes diagnostic models relating single markers (or targets), combinations of any two markers (or targets), and combinations of three or more markers (or targets) to gastric cancer. The invention has the advantages of noninvasive, safe and convenient detection, high throughput and high detection accuracy.
In one aspect, the invention relates to the use of a reagent for the preparation of a kit for diagnosing gastric cancer, predicting the risk of developing gastric cancer or determining the status of gastric cancer in a subject, characterized in that the reagent is used for detecting the methylation level of at least one marker selected from the group consisting of SEPTIN9, SEPTIN9_2, IRF4, TJP2, KCNA3, RNF180, LOC645323, VWC2, PRDM14, HOXB3, FGF14, ADCY1, HOXB6, DLX4, FGF2 and any combination thereof in a sample isolated from the subject.
In some embodiments, the reagent is a reagent selected from the group consisting of:
i) a substance, such as an oligonucleotide primer or probe, which hybridizes to or amplifies at least one target region of the marker and which is preferably complementary or identical to a fragment at least 9 bases long of the at least one target region of the marker; and
ii) a bisulfite reagent or a methylation-sensitive restriction enzyme reagent that distinguishes between methylated and unmethylated dinucleotides, such as methylated and unmethylated CpG dinucleotides, within at least one target region of the marker.
In some embodiments, the at least one marker is a marker combination selected from the group consisting of:
i) DLX4 and PRDM14;
ii) DLX4, PRDM14, TJP2, HOXB3 and LOC645323;
iii) DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, and HOXB6;
iv) DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, HOXB6, IRF4, SEPTIN9_2 and VWC2; or
v) DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, HOXB6, IRF4, SEPTIN9_2, VWC2, SEPTIN9, FGF2, ADCY1, and RNF180.
In some embodiments, the sample is selected from the group consisting of cell lines, histological sections, tissue biopsies, paraffin-embedded tissues, body fluids, and combinations thereof, preferably the sample is selected from the group consisting of stomach tissue, plasma, serum, whole blood, isolated blood cells, and combinations thereof.
In some embodiments, the reagent is used to detect the methylation level of at least one target region of the at least one marker, the target region being selected from the group consisting of regions chr17:75369558-75369622, chr17:75369603-75369691, chr6:392282-392377, chr9:71736209-71870124, chr1:111217074-111217181, chr5:63461942-63462020, chr7:25896423-25896507, chr7:49813254-49813323, chr8:70982125-70982184, chr17:46673901-46674018, chr13:103046952-103047051, chr7:45613861-45613949, chr17:46671415-46671501, chr17:48042492-48042581 and chr4:123748602-123748663, or their complement or a treated sequence, or a treated sequence of the complement, or any combination of the foregoing sequences and/or regions.
In some embodiments, the oligonucleotide primer is selected from any one or more of SEQ ID NOS: 1-30.
In some embodiments, the oligonucleotide probe is selected from any one or more of SEQ ID NOS: 33-47.
In another aspect, the invention relates to a kit for diagnosing gastric cancer, predicting the risk of developing gastric cancer or determining the status of gastric cancer in an individual, characterized in that the kit comprises reagents for detecting the methylation level of at least one target region of at least one marker selected from the group consisting of SEPTIN9, SEPTIN9_2, IRF4, TJP2, KCNA3, RNF180, LOC645323, VWC2, PRDM14, HOXB3, FGF14, ADCY1, HOXB6, DLX4, FGF2 and any combination thereof in a sample isolated from the individual.
In some embodiments, the reagent is a reagent selected from the group consisting of:
i) a substance, such as an oligonucleotide primer or probe, which hybridizes to or amplifies at least one target region of the marker and which is preferably complementary or identical to a fragment at least 9 bases long of the at least one target region of the marker; and
ii) a bisulfite reagent or a methylation-sensitive restriction enzyme reagent that distinguishes between methylated and unmethylated dinucleotides, such as methylated and unmethylated CpG dinucleotides, within at least one target region of the marker.
In some embodiments, the at least one marker is a marker combination selected from the group consisting of:
i) DLX4 and PRDM14;
ii) DLX4, PRDM14, TJP2, HOXB3 and LOC645323;
iii) DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, and HOXB6;
iv) DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, HOXB6, IRF4, SEPTIN9_2 and VWC2; or
v) DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, HOXB6, IRF4, SEPTIN9_2, VWC2, SEPTIN9, FGF2, ADCY1, and RNF180.
In some embodiments, the sample is selected from the group consisting of cell lines, histological sections, tissue biopsies, paraffin-embedded tissues, body fluids, and combinations thereof, preferably the sample is selected from the group consisting of stomach tissue, plasma, serum, whole blood, isolated blood cells, and combinations thereof.
In some embodiments, the reagent is used to detect the methylation level of at least one target region of the at least one marker, the target region being selected from the group consisting of regions chr17:75369558-75369622, chr17:75369603-75369691, chr6:392282-392377, chr9:71736209-71870124, chr1:111217074-111217181, chr5:63461942-63462020, chr7:25896423-25896507, chr7:49813254-49813323, chr8:70982125-70982184, chr17:46673901-46674018, chr13:103046952-103047051, chr7:45613861-45613949, chr17:46671415-46671501, chr17:48042492-48042581 and chr4:123748602-123748663, or their complement or a treated sequence, or a treated sequence of the complement, or any combination of the foregoing sequences and/or regions.
In some embodiments, the oligonucleotide primer is selected from any one or more of SEQ ID NOS: 1-30.
In some embodiments, the oligonucleotide probe is selected from any one or more of SEQ ID NOS: 33-47.
In yet another aspect, the invention relates to a method for diagnosing gastric cancer, predicting risk of developing gastric cancer or determining gastric cancer status in an individual, the method comprising the steps of:
(a) Obtaining a biological sample containing DNA from said individual;
(b) Treating the DNA in the biological sample obtained in step (a) with a reagent capable of distinguishing between methylated and unmethylated sites, such as CpG sites, in the DNA, thereby obtaining treated DNA;
(c) Optionally pre-amplifying at least one target region of at least one target marker in the treated DNA obtained from step (b) with a pre-amplification primer pool, wherein at least one target region of each target marker is pre-amplified to obtain at least one pre-amplified product, and the at least one target marker comprises one or more markers selected from the group consisting of SEPTIN9, SEPTIN9_2, IRF4, TJP2, KCNA3, RNF180, LOC645323, VWC2, PRDM14, HOXB3, FGF14, ADCY1, HOXB6, DLX4, FGF2 and any combination thereof, and wherein the target region comprises at least one CpG dinucleotide sequence, and
(d) Detecting a methylation template of at least one target region of at least one target marker in step (b) or in step (c), said at least one target marker comprising one or more markers selected from the group consisting of SEPTIN9, SEPTIN9_2, IRF4, TJP2, KCNA3, RNF180, LOC645323, VWC2, PRDM14, HOXB3, FGF14, ADCY1, HOXB6, DLX4, FGF2, and any combination thereof.
In some embodiments, the reagent is a bisulfite reagent or a methylation-sensitive restriction enzyme reagent that distinguishes between methylated and unmethylated dinucleotides, such as methylated and unmethylated CpG dinucleotides, within at least one target region of the marker.
In some embodiments, in step (c), amplification is performed using a substance that amplifies at least one target region of the marker, such as an oligonucleotide primer. In some embodiments, the oligonucleotide primer is complementary to or identical to a fragment at least 9 bases long of at least one target region of the marker. In some embodiments, the oligonucleotide primer is selected from any one or more of SEQ ID NOS: 1-30. In some embodiments, in step (d), detection is performed using a substance, such as a probe, that hybridizes to at least one target region of the marker. In some embodiments, the probe is complementary to or identical to a fragment at least 9 bases long of at least one target region of the marker. In some embodiments, the oligonucleotide probe is selected from any one or more of SEQ ID NOS: 33-47.
In some embodiments, the at least one marker is a marker combination selected from the group consisting of:
i) DLX4 and PRDM14;
ii) DLX4, PRDM14, TJP2, HOXB3 and LOC645323;
iii) DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, and HOXB6;
iv) DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, HOXB6, IRF4, SEPTIN9_2 and VWC2; or
v) DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, HOXB6, IRF4, SEPTIN9_2, VWC2, SEPTIN9, FGF2, ADCY1, and RNF180.
In some embodiments, the sample is selected from the group consisting of cell lines, histological sections, tissue biopsies, paraffin-embedded tissues, body fluids, and combinations thereof, preferably the sample is selected from the group consisting of stomach tissue, plasma, serum, whole blood, isolated blood cells, and combinations thereof.
In some embodiments, the reagent is used to detect the methylation level of at least one target region of the at least one marker, the target region being selected from the group consisting of regions chr17:75369558-75369622, chr17:75369603-75369691, chr6:392282-392377, chr9:71736209-71870124, chr1:111217074-111217181, chr5:63461942-63462020, chr7:25896423-25896507, chr7:49813254-49813323, chr8:70982125-70982184, chr17:46673901-46674018, chr13:103046952-103047051, chr7:45613861-45613949, chr17:46671415-46671501, chr17:48042492-48042581 and chr4:123748602-123748663, or their complement or a treated sequence, or a treated sequence of the complement, or any combination of the foregoing sequences and/or regions.
Detailed Description
Several aspects of the invention are described below with reference to example applications for illustration. It should be understood that numerous specific details, relationships, and methods are set forth to provide a full understanding of the invention. One of ordinary skill in the relevant art, however, will readily recognize that the invention can be practiced without one or more of the specific details or with other methods.
The present invention relates to the relationship between the methylation level of newly discovered markers and gastric cancer. The markers described herein provide methods for diagnosing gastric cancer or assessing the risk of gastric cancer in a subject. Thus, one embodiment of the invention represents an improvement of markers suitable for diagnosing or assessing the risk of gastric cancer. In yet another embodiment, the newly discovered markers of the invention may be used in combination with one or more other gastric cancer markers known in the art (e.g., CEA, CA125, CA199, CA724, CA242, etc.), for example for diagnosing gastric cancer or assessing the risk of gastric cancer in an individual, or for preparing kits and/or microarrays for this purpose.
The term "sample" means a material that is known or suspected to express or contain a marker as described herein. The sample may be derived from biological sources ("biological samples"), such as tissues (e.g., biopsy samples), extracts or cell cultures including cells (e.g., tumor cells), cell lysates, and biological or physiological fluids, such as whole blood, plasma, serum, saliva, cerebrospinal fluid, sweat, urine, milk, peritoneal fluid, and the like. Samples may be used directly as obtained from the source, or after pretreatment to improve sample characteristics (e.g., preparing plasma from blood). In certain aspects of the invention, the sample is a human physiological fluid, such as human plasma. In certain aspects of the invention, the sample is a biopsy sample, such as tumor tissue or cells obtained by biopsy.
In certain specific aspects of the invention, the sample is plasma or stomach tissue.
The target polynucleotide, or substances (e.g., oligonucleotide primers or probes) that hybridize to or amplify the target polynucleotide, may be detectably labeled on one or more nucleotides using methods known in the art. The detectable label can be, but is not limited to, a luminescent label, a fluorescent label, a bioluminescent label, a chemiluminescent label, a radioactive label, and a colorimetric label.
As used herein, the term "marker" refers to a nucleic acid, gene region, or methylation site of interest whose methylation level, or whose score based on a computational model of methylation level (e.g., AUC of the ROC curve in the case where a machine learning model such as a logistic regression model is used), is indicative of a gastric cancer diagnosis or a high risk of gastric cancer. A gene should be considered to include all of its transcript variants and all of its promoter and regulatory elements. As will be appreciated by those skilled in the art, certain genes are known to exhibit allelic variation or single nucleotide polymorphisms ("SNPs") between individuals. SNPs include insertions and deletions of simple repeat sequences of different lengths (e.g., dinucleotide and trinucleotide repeats). Thus, the present application should be understood to extend to all forms of markers/genes arising from any other mutation, polymorphism or allelic variation. In addition, it is understood that the term "marker" shall include both the sense strand sequence of a marker or gene and the antisense strand sequence of a marker or gene.
The term "marker" as used herein is to be construed broadly to include both 1) the original marker found in a biological sample or genomic DNA (in a specific methylation state) and 2) its treated sequence (e.g., the corresponding region after bisulfite conversion or the corresponding region after MSRE treatment). The corresponding region after bisulfite conversion differs from the target marker in the genomic sequence in that one or more unmethylated cytosine residues are converted to uracil bases, thymine bases, or other bases that differ from cytosine in hybridization behavior. The MSRE treated corresponding region differs from the target marker in the genomic sequence in that the sequence is cleaved at one or more MSRE cleavage sites.
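The bisulfite conversion described above can be illustrated with a minimal in silico sketch. It assumes an idealized, complete conversion followed by amplification, so that every unmethylated cytosine reads as thymine and every methylated cytosine remains cytosine. The function name `bisulfite_convert`, the example sequence and the choice of methylated positions are hypothetical and are not taken from the sequence listing.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as read after idealized bisulfite conversion.

    seq                  -- genomic DNA sequence (one strand, 5' to 3')
    methylated_positions -- set of 0-based indices of methylated cytosines

    Unmethylated C is converted (via uracil) and read as T after
    amplification; methylated C is protected and stays C.
    """
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical 8-base fragment with cytosines at indices 1, 4 and 5.
genomic = "ACGTCCGA"

# Only the cytosine at index 1 is methylated in this illustration.
partially_methylated = bisulfite_convert(genomic, {1})

# No cytosine is methylated.
unmethylated = bisulfite_convert(genomic, set())

print(partially_methylated)
print(unmethylated)
```

After treatment the methylated and unmethylated templates differ in sequence, which is what allows primers or probes designed against the treated sequence to distinguish them.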
As used herein, "methylation state" refers to the presence, absence, and/or amount of one or more methylated nucleotide bases in a nucleic acid molecule. For example, a nucleic acid molecule containing a methylated cytosine is considered methylated (i.e., the methylation state of the nucleic acid molecule is methylated). A nucleic acid molecule that does not contain any methylated cytosine is considered unmethylated (i.e., the methylation state of the nucleic acid molecule is unmethylated). In some embodiments, a nucleic acid may be characterized as "unmethylated" if it is not methylated at a particular locus (e.g., the locus of a particular single CpG dinucleotide) or a particular combination of loci, even if it is methylated at other loci of the same gene or molecule.
Thus, methylation state describes the state of methylation of a nucleic acid (e.g., a genomic sequence). In addition, methylation state refers to the characteristics of a nucleic acid segment at a particular genomic locus that are associated with methylation. Such characteristics include, but are not limited to, whether any cytosine (C) residues within the DNA sequence are methylated, the position of one or more methylated C residues, the frequency or percentage of methylated C throughout any particular region of the nucleic acid, and allelic differences in methylation due to, for example, differences in the origin of the alleles. "Methylation state" also refers to the relative concentration, absolute concentration, or pattern of methylated C or unmethylated C throughout any particular region of a nucleic acid in a biological sample. For example, one or more cytosine (C) residues within a nucleic acid sequence may be referred to as "hypermethylated" or as having "increased methylation" if they are methylated, and one or more cytosine (C) residues within a DNA sequence may be referred to as "hypomethylated" or as having "decreased methylation" if they are unmethylated. Likewise, if one or more cytosine (C) residues within a nucleic acid sequence are methylated compared to another nucleic acid sequence (e.g., from a different region or from a different individual, etc.), the sequence is considered hypermethylated or as having increased methylation compared to the other nucleic acid sequence. Alternatively, if one or more cytosine (C) residues within a DNA sequence are unmethylated compared to another nucleic acid sequence (e.g., from a different region or from a different individual, etc.), the sequence is considered hypomethylated or as having decreased methylation compared to the other nucleic acid sequence.
In the present invention, methylation level represents the proportion of one or more sites that are in the methylated state. The methylation level of a region (or group of sites) is the average of the methylation levels of all sites in the region (or all sites in the group). Thus, an increase or decrease in the methylation level of a region does not indicate an increase or decrease in the methylation level of every methylation site in the region. The process of converting the results obtained by methods for detecting DNA methylation (e.g., reduced representation bisulfite sequencing, fluorescent quantitative PCR) to methylation levels is known in the art.
As used herein, "methylation level" includes any relationship among the methylation states of any number of CpGs at any positions in the sequence of interest. The relationship may be the addition or subtraction of methylation state parameters (e.g., 0 or 1) or the result of a mathematical algorithm (e.g., a mean, percentage, copy number, ratio, degree, or a calculation using a mathematical model), including but not limited to methylation level values, methylation haplotype ratios, methylation haplotype loads, or the AUC of the ROC curve in the case where a machine learning model such as a logistic regression model is used.
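The region-level average described above can be sketched as follows. This is an illustrative calculation only: the function names and the per-site values are hypothetical, and the per-site level is assumed here to be the fraction of reads (or molecules) that are methylated at that site.

```python
def site_methylation_level(methylated_count, total_count):
    """Fraction of observations at one site that are methylated (0.0 to 1.0)."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return methylated_count / total_count


def region_methylation_level(site_levels):
    """Methylation level of a region: the mean of its per-site levels."""
    if not site_levels:
        raise ValueError("a region needs at least one site")
    return sum(site_levels) / len(site_levels)


# Hypothetical per-site levels for four CpG sites in one region,
# measured in two samples.
sample_a = [0.25, 0.25, 0.5, 1.0]
sample_b = [0.5, 0.5, 0.75, 0.75]

level_a = region_methylation_level(sample_a)
level_b = region_methylation_level(sample_b)

# The region-level value is higher in sample_b even though the fourth
# site is lower in sample_b than in sample_a.
print(level_a, level_b)
```

The example makes the point of the definition concrete: a higher regional level in one sample does not require every individual site in the region to be higher.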
Genes used as markers in the present invention are intended to include naturally occurring variants of the gene, their complementary sequences, all of their promoters and regulatory elements (e.g., nucleic acid sequences within 5 kb (e.g., 4 kb, 3 kb, 2 kb, or 1 kb) upstream of the gene annotation start site and within 5 kb downstream of the gene annotation end site), and fragments, particularly molecular biologically detectable fragments, of the gene or the variant. In the present invention, the terms "molecular biologically detectable fragment", "target region" and "target gene region" are used interchangeably. The molecular biologically detectable fragment preferably comprises at least 16, 17, 18, 19, 20, 22, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300 or more consecutive nucleotides of the marker. In some embodiments, the consecutive nucleotides comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, or more CpG dinucleotide sequences. In some embodiments, it is preferred that the target gene region is rich in CpG dinucleotides.
In the present invention, the term "target region" or "target gene region" refers to any molecular biologically detectable fragment within a nucleic acid region consisting of the marker gene itself, 5 kb (e.g., 4 kb, 3 kb, 2 kb, or 1 kb) upstream of its gene annotation start site, and 5 kb (e.g., 4 kb, 3 kb, 2 kb, or 1 kb) downstream of its gene annotation end site, or a complementary sequence or a treated sequence thereof (e.g., a bisulfite-converted corresponding sequence or an MSRE-treated corresponding sequence), or a treated sequence of the complementary sequence (e.g., a bisulfite-converted corresponding sequence or an MSRE-treated corresponding sequence). For example, the target gene region of a target marker in Table 1 below includes its Hg19 coordinates and any molecular biologically detectable fragment within 5 kb (e.g., 4 kb, 3 kb, 2 kb, or 1 kb) upstream and downstream of the coordinates, its complementary sequence or a treated sequence (e.g., a bisulfite-converted corresponding sequence or an MSRE-treated corresponding sequence), and the treated sequence of the complementary sequence (e.g., a bisulfite-converted corresponding sequence or an MSRE-treated corresponding sequence). More preferably, the target gene region of a target marker in Table 1 below comprises its Hg19 coordinates and any molecular biologically detectable fragment within 5 kb (e.g., 4 kb, 3 kb, 2 kb, or 1 kb) upstream of the coordinates, its complementary sequence or a treated sequence (e.g., a bisulfite-converted corresponding sequence or an MSRE-treated corresponding sequence), and a treated sequence of the complementary sequence (e.g., a bisulfite-converted corresponding sequence or an MSRE-treated corresponding sequence).
In some embodiments, it is preferred to use and detect a target marker selected from Table 1 below and a target region thereof, or any combination thereof (according to Hg19 coordinates):
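As an illustration of how a coordinate-defined target region can be extended by the flanking distances mentioned above, the following sketch parses a region string and widens it. The helper names `parse_region` and `with_flanks` are hypothetical. The sketch assumes plus-strand orientation, so "upstream" means smaller coordinates; for a feature on the minus strand the two flanks would be swapped.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_region(text):
    """Parse a 'chrN:start-end' string into a Region."""
    chrom, span = text.split(":")
    start, end = span.split("-")
    return Region(chrom, int(start), int(end))


def with_flanks(region, upstream=5000, downstream=5000):
    """Extend a region by flanking distances given in bases.

    Assumption: plus-strand orientation (upstream = lower coordinate).
    The start is clamped so it never falls below position 1.
    """
    return Region(
        region.chrom,
        max(1, region.start - upstream),
        region.end + downstream,
    )


# One of the target regions listed in this document (Hg19 coordinates).
core = parse_region("chr6:392282-392377")

both_flanks = with_flanks(core)                   # 5 kb on each side
upstream_only = with_flanks(core, 5000, 0)        # 5 kb upstream only
narrow = with_flanks(core, 1000, 1000)            # 1 kb on each side

print(both_flanks, upstream_only, narrow)
```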
TABLE 1 target markers and target regions
SEPTIN9 and SEPTIN9_2 as described herein refer to a protein-encoding gene, also known as septin 9, which is a member of the septin family and is involved in cell division and cell cycle regulation. This gene is a candidate ovarian tumor suppressor gene. Mutations in this gene can lead to hereditary neuralgic amyotrophy, also known as neuritis with brachial predilection. Chromosomal translocations involving this gene on chromosome 17 and the MLL gene on chromosome 11 can lead to acute myelomonocytic leukemia.
IRF4 as described herein is a protein-encoding gene, also known as interferon regulatory factor 4. The protein encoded by this gene belongs to the IRF (interferon regulatory factor) family of transcription factors, which is characterized by a unique tryptophan pentad repeat DNA-binding domain. IRFs play an important role in the regulation of interferons and of interferon-inducible genes in response to viral infections.
TJP2 as described herein is a protein-encoding gene, also known as tight junction protein 2, which encodes a tight junction protein that is a member of the membrane-associated guanylate kinase homolog family. The encoded protein plays a role in the tight junction barrier of epithelial and endothelial cells, and is necessary for proper assembly of tight junctions.
KCNA3 as described herein is a protein-encoding gene, also known as potassium voltage-gated channel subfamily A member 3, which encodes a member of the voltage-gated, shaker-related subfamily of potassium channels. This member comprises six transmembrane domains, and the fourth segment contains a shaker-type repeat sequence. It belongs to the delayed rectifier class, whose members allow nerve cells to repolarize efficiently after an action potential. It plays an important role in T cell proliferation and activation.
RNF180 as described herein is a protein-encoding gene, also known as ring finger protein 180, which encodes a protein predicted to have ubiquitin-conjugating enzyme binding activity and ubiquitin-protein ligase activity. It is predicted to be involved in the norepinephrine metabolic process, positive regulation of ubiquitin-dependent protein degradation, and the tryptophan metabolic process. It is predicted to function in a number of processes, including adult behavior, upregulation of protein ubiquitination, and protein polyubiquitination. It is predicted to be located in the nuclear membrane, and is predicted to be an intrinsic component of the membrane located in the endoplasmic reticulum membrane.
LOC645323 as described herein is a region between two genes, located on chromosome 7.
VWC2 as described herein is a protein-encoding gene, also known as von Willebrand factor C domain containing 2, which encodes a secreted bone morphogenic protein antagonist. The encoded protein may be involved in neurological function and development and may play a role in cell adhesion.
PRDM14 as described herein is a protein-encoding gene, also known as PR/SET domain 14, which encodes a member of the PRDI-BF1 and RIZ homology domain containing family of transcriptional regulators. The encoded protein may have histone methyltransferase activity and plays a key role in cellular pluripotency by inhibiting the expression of differentiation marker genes.
HOXB3 as described herein is a protein-encoding gene, also known as homeobox B3, which is a member of the Antp homeobox family and encodes a nuclear protein with a homeobox DNA-binding domain. It is included in a cluster of homeobox B genes located on chromosome 17. The encoded protein is a sequence-specific transcription factor that participates in development. Increased expression of this gene is associated with a specific biological subtype of acute myelogenous leukemia.
FGF14 as described herein is a protein-encoding gene, also known as fibroblast growth factor 14, which encodes a protein that is a member of the fibroblast growth factor (FGF) family. FGF family members have broad mitogenic and cell survival activities and are involved in a variety of biological processes, including embryonic development, cell growth, morphogenesis, tissue repair, tumor growth, and invasion.
ADCY1 as described herein is a protein-encoding gene, also known as adenylate cyclase 1, which encodes a member of the adenylate cyclase gene family that is expressed predominantly in the brain. The protein is regulated by calcium/calmodulin concentration and may be involved in brain development. Multiple transcript variants are produced by alternative splicing.
HOXB6 as described herein is a protein-encoding gene, also known as homeobox B6, which is a member of the Antp homeobox family and encodes a protein having a homeobox DNA-binding domain. It is included in a cluster of homeobox B genes located on chromosome 17. The encoded protein is a sequence-specific transcription factor involved in development, including lung and skin development, and has been localized to both the nucleus and the cytoplasm. Altered expression of this gene or altered subcellular localization of its protein has been associated with some cases of acute myelogenous leukemia and colorectal cancer.
DLX4 as described herein is a protein-encoding gene, also known as distal-less homeobox 4, which is presumed to play a role in forebrain and craniofacial development.
FGF2 as described herein is a protein-encoding gene, also known as fibroblast growth factor 2, which encodes a protein that is a member of the fibroblast growth factor (FGF) family. FGF family members are capable of binding heparin and have broad mitogenic and angiogenic activity. This protein has been implicated in a variety of biological processes, such as limb and nervous system development, wound healing and tumor growth.
In some embodiments, the at least one marker comprises SEPTIN9 and any one or more selected from SEPTIN9_2, IRF4, TJP2, KCNA3, RNF180, LOC645323, VWC2, PRDM14, HOXB3, FGF14, ADCY1, HOXB6, DLX4, and FGF2.
In some embodiments, the at least one marker comprises SEPTIN9_2 and any one or more selected from SEPTIN9, IRF4, TJP2, KCNA3, RNF180, LOC645323, VWC2, PRDM14, HOXB3, FGF14, ADCY1, HOXB6, DLX4, and FGF2.
In some embodiments, the at least one marker comprises IRF4 and any one or more selected from SEPTIN9_2, SEPTIN9, TJP2, KCNA3, RNF180, LOC645323, VWC2, PRDM14, HOXB3, FGF14, ADCY1, HOXB6, DLX4, and FGF2.
In some embodiments, the at least one marker comprises TJP2 and any one or more selected from SEPTIN9_2, SEPTIN9, IRF4, KCNA3, RNF180, LOC645323, VWC2, PRDM14, HOXB3, FGF14, ADCY1, HOXB6, DLX4, and FGF2.
In some embodiments, the at least one marker comprises KCNA3 and any one or more selected from SEPTIN9_2, SEPTIN9, IRF4, TJP2, RNF180, LOC645323, VWC2, PRDM14, HOXB3, FGF14, ADCY1, HOXB6, DLX4, and FGF2.
In some embodiments, the at least one marker comprises RNF180 and any one or more selected from SEPTIN9_2, SEPTIN9, IRF4, TJP2, KCNA3, LOC645323, VWC2, PRDM14, HOXB3, FGF14, ADCY1, HOXB6, DLX4, and FGF2.
In some embodiments, the at least one marker comprises LOC645323 and any one or more selected from SEPTIN9_2, SEPTIN9, IRF4, TJP2, KCNA3, RNF180, VWC2, PRDM14, HOXB3, FGF14, ADCY1, HOXB6, DLX4, and FGF2.
In some embodiments, the at least one marker comprises VWC2 and any one or more selected from SEPTIN9_2, SEPTIN9, IRF4, TJP2, KCNA3, RNF180, LOC645323, PRDM14, HOXB3, FGF14, ADCY1, HOXB6, DLX4, and FGF2.
In some embodiments, the at least one marker comprises PRDM14 and any one or more selected from SEPTIN9_2, SEPTIN9, IRF4, TJP2, KCNA3, RNF180, LOC645323, VWC2, HOXB3, FGF14, ADCY1, HOXB6, DLX4, and FGF2.
In some embodiments, the at least one marker comprises HOXB3 and any one or more selected from SEPTIN9_2, SEPTIN9, IRF4, TJP2, KCNA3, RNF180, LOC645323, VWC2, PRDM14, FGF14, ADCY1, HOXB6, DLX4, and FGF2.
In some embodiments, the at least one marker comprises FGF14 and any one or more selected from SEPTIN9_2, SEPTIN9, IRF4, TJP2, KCNA3, RNF180, LOC645323, VWC2, PRDM14, HOXB3, ADCY1, HOXB6, DLX4, and FGF2.
In some embodiments, the at least one marker comprises ADCY1 and any one or more selected from SEPTIN9_2, SEPTIN9, IRF4, TJP2, KCNA3, RNF180, LOC645323, VWC2, PRDM14, HOXB3, FGF14, HOXB6, DLX4, and FGF2.
In some embodiments, the at least one marker comprises HOXB6 and any one or more selected from SEPTIN9_2, SEPTIN9, IRF4, TJP2, KCNA3, RNF180, LOC645323, VWC2, PRDM14, HOXB3, FGF14, ADCY1, DLX4, and FGF2.
In some embodiments, the at least one marker comprises DLX4 and any one or more selected from SEPTIN9_2, SEPTIN9, IRF4, TJP2, KCNA3, RNF180, LOC645323, VWC2, PRDM14, HOXB3, FGF14, ADCY1, HOXB6, and FGF2.
In some embodiments, the at least one marker comprises FGF2 and any one or more selected from SEPTIN9_2, SEPTIN9, IRF4, TJP2, KCNA3, RNF180, LOC645323, VWC2, PRDM14, HOXB3, FGF14, ADCY1, HOXB6, and DLX4.
In some embodiments, it is preferred to use and detect a combination of two or more of the target markers and their target regions in Table 1. In some embodiments, the following combinations of target markers and target regions thereof in Table 1 are preferably used and detected:
i) DLX4 and PRDM14;
ii) DLX4, PRDM14, TJP2, HOXB3 and LOC645323;
iii) DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, and HOXB6;
iv) DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, HOXB6, IRF4, SEPTIN9_2 and VWC2; or
v) DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, HOXB6, IRF4, SEPTIN9_2, VWC2, SEPTIN9, FGF2, ADCY1, and RNF180.
In some embodiments, the use of the subject targets and target gene regions thereof, or combinations thereof, is capable of achieving a sensitivity of at least 25%, e.g., at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 81%, at least 82%, or at least 83% with a specificity of greater than 80%, e.g., greater than 85%, or greater than 90%.
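For reference, sensitivity and specificity as used above follow the standard definitions: sensitivity is the fraction of true gastric cancer samples that test positive, and specificity is the fraction of non-cancer samples that test negative. The sketch below computes both; the function name and the label lists are invented purely for illustration and are not results from this disclosure.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for binary labels.

    Labels are booleans: True = gastric cancer, False = non-cancer.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth and predicted:
            tp += 1          # cancer correctly called positive
        elif truth and not predicted:
            fn += 1          # cancer missed
        elif not truth and not predicted:
            tn += 1          # non-cancer correctly called negative
        else:
            fp += 1          # non-cancer wrongly called positive
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one cancer and one non-cancer sample")
    return tp / (tp + fn), tn / (tn + fp)


# Invented example: 6 cancer samples (5 detected) and
# 10 non-cancer samples (9 correctly negative).
truth = [True] * 6 + [False] * 10
calls = [True] * 5 + [False] + [False] * 9 + [True]

sens, spec = sensitivity_specificity(truth, calls)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```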
The terms "subject," "patient," and "individual" are used interchangeably herein to refer to a warm-blooded animal, such as a mammal. The term includes, but is not limited to, domestic animals, rodents (e.g., rats and mice), primates, and humans. Preferably the term refers to a human.
The term "methylation assay" refers to any assay that determines the methylation state of one or more dinucleotide (e.g., CpG) sequences within a DNA sequence.
In this context, the term "threshold" should be understood according to the general understanding of the person skilled in the art and represents any useful reference for reflecting the level of DNA methylation.
In one or more embodiments, the methylation level (e.g., Ct value) of the target marker is increased or decreased when compared to a reference level. When the methylation marker level (e.g., Ct value) meets a certain threshold, the subject is identified as having gastric cancer, being at risk of developing gastric cancer, or having progressing gastric cancer. Illustratively, the cut-off (threshold) Ct values for each target in plasma described herein are, respectively: SEPTIN9, 44.23; SEPTIN9_2, 44.50; IRF4, 28.53; TJP2, 27.68; KCNA3, 44.50; RNF180, 30.74; LOC645323, 26.11; VWC2, 24.03; PRDM14, 25.45; HOXB3, 25.40; FGF14, 44.53; ADCY1, 29.56; HOXB6, 44.93; DLX4, 25.97; and FGF2, 26.87. A target Ct value below the respective cut-off value indicates that gastric cancer is present, that there is a risk of developing gastric cancer, or that gastric cancer is progressing; a target Ct value above the respective cut-off value indicates that gastric cancer is not present, that there is a low risk of developing gastric cancer, or that gastric cancer is remitting.
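As an illustration of the single-target decision rule described above, the following minimal Python sketch stores the quoted plasma cut-off values and applies the "Ct below cut-off" rule. The names `CT_CUTOFFS` and `single_target_positive` are hypothetical and are not part of the described method.

```python
# Single-target plasma Ct cut-off values as quoted in the text above.
CT_CUTOFFS = {
    "SEPTIN9": 44.23, "SEPTIN9_2": 44.50, "IRF4": 28.53, "TJP2": 27.68,
    "KCNA3": 44.50, "RNF180": 30.74, "LOC645323": 26.11, "VWC2": 24.03,
    "PRDM14": 25.45, "HOXB3": 25.40, "FGF14": 44.53, "ADCY1": 29.56,
    "HOXB6": 44.93, "DLX4": 25.97, "FGF2": 26.87,
}


def single_target_positive(target: str, ct: float) -> bool:
    """Return True when the measured Ct is below the target's cut-off.

    A lower Ct corresponds to a stronger methylation signal, so a value
    below the cut-off is read as a positive result for that target.
    """
    return ct < CT_CUTOFFS[target]


if __name__ == "__main__":
    print(single_target_positive("DLX4", 24.80))  # Ct below 25.97
    print(single_target_positive("DLX4", 50.00))  # no amplification signal
```

A Ct of 50 (the value assigned in Example 1 when no amplification signal is detected) lies above every cut-off and is therefore always read as negative under this rule.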
Conventional mathematical analysis methods and procedures for determining thresholds, such as machine learning, are known in the art; the stability of a model is assessed by fitting the model on a training set and evaluating it on a test set. For differential analysis, the differences in methylation signal between the gastric cancer group and the non-gastric cancer group were examined using a non-parametric rank-sum test. For example, Lasso regression may be used, in which the regression coefficients of the sites are penalized so that important sites are selected as features. An exemplary method is a binary logistic regression mathematical model. For example, for differentially methylated markers, a binary logistic regression model is constructed on the training set samples, prediction scores are calculated for the test set samples, and the detection results are evaluated by the accuracy, sensitivity, and specificity of the model and by the area under the receiver operating characteristic (ROC) curve (AUC). The method for constructing the binary logistic regression mathematical model is conventional in the art, and the logistic regression mathematical model for a combination of n targets is as follows:
Logit=a0+a1*Ct1+a2*Ct2+…+an*Ctn
When n targets are combined, a Logit value can be obtained from the Ct value of each target and the logistic regression formula. The Ct value (for a single target) or the Logit value (for a multi-target combination) at the point corresponding to the Youden index in the training set is used as the cut-off value, so that the presence of a gastric tumor can be confirmed and the risk of gastric tumor formation can be assessed.
Any combination of the targets described herein, used together with a binary logistic regression mathematical model, can confirm the presence of gastric tumors and assess the risk of gastric neoplasia, with an AUC higher than that of the single targets, as shown in Tables 5 and 6. Regression models and cut-off values for some exemplary target combinations are shown in Table 7. A Logit value higher than the corresponding cut-off value indicates that gastric cancer is present, that there is a risk of developing gastric cancer, or that gastric cancer is progressing; a Logit value below the corresponding cut-off value indicates that gastric cancer is not present, that there is a low risk of developing gastric cancer, or that gastric cancer is remitting.
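The choice of a cut-off at the Youden index can be sketched as follows. This is a minimal illustration on made-up scores, assuming that a sample is called positive when its score is strictly greater than the threshold (the Logit convention; for single-target Ct values the direction would be reversed). The function name `youden_cutoff` is hypothetical.

```python
def youden_cutoff(case_scores, control_scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    Candidate thresholds are the observed scores. A sample is called
    positive when its score is strictly greater than the threshold.
    """
    best_threshold, best_j = None, -1.0
    for t in sorted(set(case_scores) | set(control_scores)):
        sensitivity = sum(s > t for s in case_scores) / len(case_scores)
        specificity = sum(s <= t for s in control_scores) / len(control_scores)
        j = sensitivity + specificity - 1.0
        if j > best_j:  # keep the first threshold reaching the maximum
            best_threshold, best_j = t, j
    return best_threshold, best_j


if __name__ == "__main__":
    # Made-up scores for illustration only, not data from the study.
    cases = [0.9, 0.8, 0.4]
    controls = [0.5, 0.3, 0.2, 0.1]
    print(youden_cutoff(cases, controls))
```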
The logistic regression model for the DLX4 and PRDM14 targets is: Logit=3.822812+(-0.058743)*CtT14+(-0.048108)*CtT9, with a cut-off value of 0.888, where CtTk denotes the Ct value of target Tk.
The logistic regression model for the DLX4, PRDM14, TJP2, HOXB3, and LOC645323 targets is: Logit=7.412310+(-0.041919)*CtT14+(-0.023873)*CtT9+(-0.064603)*CtT4+(-0.037648)*CtT10+(-0.023907)*CtT7, with a cut-off value of 0.418.
The logistic regression model for the DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, and HOXB6 targets is: Logit=13.176031+(-0.029628)*CtT14+(-0.017306)*CtT9+(-0.047626)*CtT4+(-0.035422)*CtT10+(-0.018088)*CtT7+(-0.033752)*CtT11+(-0.074550)*CtT5+(-0.048253)*CtT13, with a cut-off value of 0.255.
The logistic regression model for the DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, HOXB6, IRF4, SEPTIN9_2, and VWC2 targets is: Logit=14.355564+(-0.032103)*CtT14+(-0.021949)*CtT9+(-0.048843)*CtT4+(-0.043056)*CtT10+(-0.021502)*CtT7+(-0.033408)*CtT11+(-0.073693)*CtT5+(-0.046266)*CtT13+(0.007920)*CtT3+(-0.045817)*CtT2+(0.041211)*CtT8, with a cut-off value of 0.265.
The logistic regression model for the DLX4, PRDM14, TJP2, HOXB3, LOC645323, FGF14, KCNA3, HOXB6, IRF4, SEPTIN9_2, VWC2, SEPTIN9, FGF2, ADCY1, and RNF180 targets is: Logit=16.764824+(-0.027859)*CtT14+(-0.022618)*CtT9+(-0.052860)*CtT4+(-0.045713)*CtT10+(-0.023478)*CtT7+(-0.035244)*CtT11+(-0.065107)*CtT5+(-0.046192)*CtT13+(0.009818)*CtT3+(-0.039743)*CtT2+(0.011837)*CtT8+(-0.094328)*CtT1+(0.017130)*CtT15+(0.015845)*CtT12+(0.025051)*CtT6, with a cut-off value of 0.04.
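To make the use of these formulas concrete, the sketch below evaluates the 2-target and 5-target models quoted above for a set of Ct values and compares the result with the stated cut-off. The coefficients are copied from the text; the dictionary layout and the names `MODELS`, `logit_score`, and `combination_positive` are hypothetical, and the Ct inputs are invented for illustration.

```python
# Coefficients and cut-offs copied from the formulas above.
# Keys T4, T7, ... follow the target numbering (T14 = DLX4, T9 = PRDM14, ...).
MODELS = {
    "2-target": {
        "intercept": 3.822812,
        "coef": {"T14": -0.058743, "T9": -0.048108},
        "cutoff": 0.888,
    },
    "5-target": {
        "intercept": 7.412310,
        "coef": {"T14": -0.041919, "T9": -0.023873, "T4": -0.064603,
                 "T10": -0.037648, "T7": -0.023907},
        "cutoff": 0.418,
    },
}


def logit_score(model_name, ct_values):
    """Logit = a0 + sum(ai * Cti) over the targets of the chosen model."""
    model = MODELS[model_name]
    return model["intercept"] + sum(
        a * ct_values[target] for target, a in model["coef"].items()
    )


def combination_positive(model_name, ct_values):
    """Positive when the Logit value is higher than the model's cut-off."""
    return logit_score(model_name, ct_values) > MODELS[model_name]["cutoff"]


if __name__ == "__main__":
    strong_signal = {"T14": 25.0, "T9": 25.0}   # invented Ct values
    no_signal = {"T14": 50.0, "T9": 50.0}       # Ct set to 50 when undetected
    print(logit_score("2-target", strong_signal))
    print(combination_positive("2-target", strong_signal))
    print(combination_positive("2-target", no_signal))
```

Because most coefficients are negative, lower Ct values (stronger methylation signal) raise the Logit value, which is consistent with the rule that a Logit above the cut-off is read as positive.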
The term "oligonucleotide" refers to a multimeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. The term includes double- and single-stranded DNA and RNA, as well as modified forms (e.g., methylated or capped) and unmodified forms of polynucleotides. The terms "polynucleotide" and "oligonucleotide" are used interchangeably herein. The oligonucleotide may, but need not, include other coding or non-coding sequences, or it may, but need not, be linked to other molecules and/or carriers or support materials. Oligonucleotides used in the methods or kits of the invention can be of any length suitable for the particular method. In certain applications, the term refers to an antisense nucleic acid molecule (e.g., an mRNA or DNA strand in the opposite direction to the sense polynucleotide encoding a marker of the invention).
Oligonucleotides for use in the present invention include complementary nucleic acid sequences and nucleic acids that are substantially identical to those sequences, and also include sequences that differ from the nucleic acid sequence by the degeneracy of the genetic code. Oligonucleotides useful in the present invention also include nucleic acids that hybridize under stringent conditions, preferably high stringency conditions, to oligonucleotide cancer marker nucleic acid sequences.
As used herein, "primer" generally refers to a linear oligonucleotide that is complementary to and anneals to a target sequence. The lower limit of primer length is determined by hybridization ability, since very short primers (e.g., less than 5 nucleotides) do not form thermodynamically stable duplex under most hybridization conditions. The primer length typically varies from 8 to 50 nucleotides. In certain embodiments, the primer is between about 15-25 nucleotides. Naturally occurring nucleotides (especially guanine, adenine, cytosine and thymine, hereinafter referred to as "G", "A", "C" and "T") and nucleotide analogs are useful in the primers of the invention.
The term "nucleotide analog" as used herein refers to a compound that is structurally similar to a naturally occurring nucleotide. The nucleotide analogs can have altered phosphate backbones, sugar moieties, nucleobases, or combinations thereof. Nucleotide analogs that have altered nucleobases generally impart, inter alia, different base pairing and base stacking properties. Nucleotide analogs with altered phospho-sugar backbones (e.g., peptide nucleic acids (PNAs), locked nucleic acids (LNAs)) generally alter, inter alia, strand properties, e.g., secondary structure formation.
Exemplary primers and probes for use in the present invention are shown in Tables 2 and 3, and the target gene regions at which they are directed are shown in Table 1.
The nucleotide sequences of the primers and probes of the invention also include modified forms thereof, as long as the amplification or detection effect of the primers is not significantly affected. The modification may be, for example, the addition of one or more nucleotide residues within the nucleotide sequence or at either end, the deletion of one or more nucleotide residues in the nucleotide sequence, or the replacement of one or more nucleotide residues in the sequence with another nucleotide residue, e.g., the replacement of A with T, the replacement of C with G, etc. It is clear to a person skilled in the art that modified forms of the primers are also encompassed within the scope of the invention, in particular the claims. In one embodiment, the modified form of the nucleotide sequence of the primer is a chemically amplified primer as disclosed in CN103270174A.
The individual nucleotides in the primers of the invention can be chemically synthesized using, for example, a universal DNA synthesizer (e.g., model 394, manufactured by Applied Biosystems). Any other method well known in the art may also be used to synthesize oligonucleotides.
The target marker is amplified using DNA extracted from the sample as a template and PCR primers to obtain an amplified product. Amplification reactions include, but are not limited to, polymerase chain reaction (PCR), ligase chain reaction (LCR), self-sustained sequence replication (3SR), nucleic acid sequence-based amplification (NASBA), strand displacement amplification (SDA), multiple displacement amplification (MDA), and rolling circle amplification (RCA), which are disclosed in U.S. Pat. No. 4,683,195, U.S. Pat. No. 4,965,188, U.S. Pat. No. 4,683,202, and U.S. Pat. No. 4,800,159 (PCR); Gelfand et al., U.S. Pat. No. 5,210,015 (real-time PCR with "Taqman" or "Taq" [registered trademark] probes); Wittwer et al., U.S. Pat. No. 6,174,670; Kacian et al., U.S. Pat. No. 5,399,491 ("NASBA"); Lizardi, U.S. Pat. No. 5,854,033; Aono et al., Japanese Patent Publication No. JP 4-262799 (rolling circle amplification); and the like.
The target markers are preferably amplified using PCR methods. PCR methods per se are well known in the art. The term "PCR" includes derivative forms of the reaction including, but not limited to, reverse transcription PCR, real-time PCR, nested PCR, multiplex PCR, fluorescent quantitative PCR, and the like. The target nucleotide is preferably quantitatively amplified using a fluorescent quantitative PCR method.
PCR is performed by repeating a cycle of denaturation, annealing, and extension steps about 30 to 60 times (e.g., 50 times) in the presence of template DNA, a thermostable DNA polymerase, and a primer pair consisting of a primer that hybridizes to the sense strand (reverse primer) and a primer that hybridizes to the antisense strand (forward primer). In one embodiment, the PCR is fluorescent quantitative PCR. In one embodiment, the PCR uses primers as shown in Table 2. It will be appreciated by those skilled in the art that other PCR methods and primers may be used as long as the target fragment can be amplified.
In the PCR of the present invention, amplification may be performed using various conventional thermostable DNA polymerases, including, but not limited to, FastStart Taq DNA polymerase (Roche), Ex Taq (registered trademark, Takara), Z-Taq, AccuPrime Taq DNA polymerase, and HotStarTaq Plus DNA polymerase.
Methods for selecting appropriate PCR reaction conditions based on the Tm values of the primers are well known in the art, and one of ordinary skill in the art can select the optimum conditions depending on the primer length, GC content, target specificity and sensitivity, the nature of the polymerase used, and the like. For example, a fluorescent quantitative PCR reaction can be performed under the following conditions: 95℃ for 5 minutes, followed by 50 cycles of 95℃ for 15 seconds and 56℃ for 40 seconds. The reaction volume is 25 μL.
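As a small worked example of the exemplary cycling program above, the following sketch adds up the programmed hold times. It deliberately ignores temperature ramping, which depends on the instrument, so the result is a lower bound rather than an actual run time. The function name `total_hold_seconds` is hypothetical.

```python
def total_hold_seconds(initial_hold_s, cycle_steps_s, cycles):
    """Sum of programmed hold times: one initial hold plus repeated cycle steps."""
    return initial_hold_s + cycles * sum(cycle_steps_s)


if __name__ == "__main__":
    # 95 C for 5 min, then 50 cycles of 95 C for 15 s and 56 C for 40 s.
    seconds = total_hold_seconds(5 * 60, [15, 40], 50)
    minutes, rest = divmod(seconds, 60)
    print(f"{seconds} s of hold time ({minutes} min {rest} s), ramping excluded")
```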
Reagents useful for detecting the methylation level of a target marker of the invention are well known in the art. Such reagents suitable for use in the present invention, for example bisulphite reagents or methylation sensitive restriction enzymes, are commercially available or are routinely prepared by methods well known to those skilled in the art.
The term "bisulfite reagent" refers to the bisulfite salt used to distinguish between methylated and unmethylated CpG dinucleotide sequences.
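The way a bisulfite reagent makes methylation readable as a sequence difference can be sketched in silico. The example assumes the standard chemistry, in which unmethylated cytosine is converted to uracil (read as thymine after amplification) while methylated cytosine is left unchanged, and it assumes complete conversion. The function name `bisulfite_convert` and the input sequence are hypothetical.

```python
def bisulfite_convert(sequence, methylated_positions=frozenset()):
    """Return the sequence as read after bisulfite conversion and PCR.

    Unmethylated C is reported as T; C at a 0-based index listed in
    methylated_positions is protected and stays C. Complete conversion
    is assumed, which real reactions only approximate.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


if __name__ == "__main__":
    fragment = "ACGTCCGA"  # invented fragment; CpG cytosines at indices 1 and 5
    print(bisulfite_convert(fragment, {1, 5}))  # both CpG sites methylated
    print(bisulfite_convert(fragment))          # fully unmethylated
```

Primers and probes designed against the methylated version of the converted sequence then amplify preferentially from methylated templates, which is why a lower Ct is read as a stronger methylation signal.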
The term "methylation sensitive restriction enzyme" is understood to mean an enzyme that selectively digests nucleic acid according to the methylation state of its recognition site. For restriction enzymes that cleave specifically only when the recognition site is unmethylated or hemimethylated, cleavage does not occur or occurs with significantly reduced efficiency when the recognition site is methylated. For restriction enzymes that cleave specifically only when the recognition site is methylated, cleavage does not occur, or occurs with significantly reduced efficiency, when the recognition site is unmethylated. Preferably, the recognition sequence of the methylation sensitive restriction enzyme contains a CG dinucleotide (e.g., CGCG or CCCGGG). In some embodiments, further preferred are restriction enzymes that do not cleave when the cytosine in the dinucleotide is methylated at the C5 carbon atom.
Kits of the invention may be prepared by methods conventional in the art. The kit may comprise materials or reagents (including reagents for detecting each target marker) used in practicing the methods of the invention. The kit may include reaction reagents (e.g., primers, dNTPs, enzymes, etc. in suitable containers) and/or support materials (e.g., buffers, instructions for performing the assay, etc.). For example, the kit may comprise one or more containers (e.g., boxes) containing the respective reagents and/or support materials. Such contents may be delivered together or separately to the intended recipient. As an example, the kit may contain reagents for detecting each target marker, buffers, and instructions for use. The kit may further contain a polymerase, dNTPs, etc. The kit may also contain internal standards for quality control, positive and negative controls, and the like. The kit may also comprise reagents for preparing nucleic acids, such as DNA, from the sample. The above examples are not to be construed as limiting the kits and their contents suitable for use in the present invention.
A microarray refers to a solid support having a planar surface with an array of nucleic acids, each member of the array comprising identical copies of an oligonucleotide or polynucleotide immobilized on a spatially defined region or site that does not overlap with the regions or sites of other members of the array; that is, the regions or sites are spatially discrete. Furthermore, a spatially defined hybridization site may be "addressable" in that its location and the identity of its immobilized oligonucleotide are known or predetermined (e.g., known or predetermined prior to its use). The oligonucleotide or polynucleotide is typically single stranded and is typically covalently attached to the solid support at either the 5'-end or the 3'-end. The density of non-overlapping regions containing nucleic acids in the microarray is typically greater than 100/cm², more preferably greater than 1000/cm². Microarray technology is disclosed, for example, in Schena et al., Microarrays: A Practical Approach (IRL Press, Oxford, 2000); Southern, Current Opin. Chem. Biol., 2:404-410, 1998, the entire contents of which are incorporated herein by reference.
The invention discloses the use of the markers in diagnosing gastric cancer and predicting the risk of gastric cancer, and a person skilled in the art can, with reference to the content herein, suitably modify the process parameters. It is expressly noted that all such similar substitutions and modifications will be apparent to those skilled in the art and are deemed to be included in the present invention. While the use of the present invention has been described in terms of preferred embodiments, it will be apparent to those skilled in the relevant art that the invention can be implemented and applied with modifications, alterations, and combinations of the uses described herein without departing from the spirit and scope of the invention.
Examples
For a clearer understanding of the present invention, reference will be made to the following detailed description taken in conjunction with the accompanying drawings and examples.
Example 1 Comparison of methylation abundance in gastric cancer tissue, paracancerous tissue, and buffy coat DNA samples
DNA samples were obtained from the buffy coat of healthy persons without gastric abnormalities and from the cancer tissue and paracancerous tissue of gastric cancer patients (19 buffy coat samples, 17 cancer tissue samples, and 17 paracancerous tissue samples). Buffy coat DNA was selected as the reference sample because plasma cell-free DNA is mostly derived from DNA released after the rupture of buffy coat cells, so its background can serve as the basal background signal for the detection sites in plasma cell-free DNA.
The buffy coat DNA was extracted with the QIAGEN QIAamp DNA Mini Kit and the tissue DNA was extracted with the QIAGEN QIAamp DNA FFPE Tissue Kit, in each case according to the manufacturer's instructions.
A 20 ng sample of the DNA obtained in the above step was treated with a bisulfite reagent (D5031, Zymo Research) to obtain converted DNA. The primer sequences of the respective methylation markers and internal references are shown in Table 2.
A pre-amplification PCR reaction was performed, in which the final concentration of each primer in the reaction system was 200 nM. The PCR reaction system contained 10 μL of the converted DNA, 2.5 μL of a primer pool covering all detection sites, and 12.5 μL of KAPA2G Fast Multiplex Mix (KAPA Biosystems, KK5802). The PCR reaction conditions were 95℃ for 3 minutes, followed by 40 cycles of 95℃ for 30 seconds and 56℃ for 60 seconds. Amplification was performed on an Applied Biosystems ProFlex PCR instrument.
The fluorescence Ct value of each marker was obtained by fluorescent PCR detection. In the fluorescent PCR reaction system, the final concentration of each primer was 500 nM and the final concentration of each detection probe was 200 nM. The PCR reaction system comprised 10 μL of diluted pre-amplification product, 2.5 μL of a primer and probe premix for the detection sites, and 12.5 μL of the PCR reagent Universal Probe qPCR Master Mix (NEB). The primer sequences of each methylation marker and internal reference are shown in Table 2, and the probe sequences are shown in Table 3. The PCR reaction conditions were 95℃ for 5 minutes, followed by 50 cycles of 95℃ for 30 seconds and 56℃ for 60 seconds (fluorescence collected). The different fluorescent signals were detected in the corresponding fluorescence channels using an ABI 7500 Real-Time PCR System. The Ct values of the samples obtained from the buffy coat, the paracancerous tissue, and the cancer tissue were calculated and compared; the Ct value of a target for which no amplification signal was detected was set to 50.
TABLE 2 detection primer sequences
TABLE 3 detection of probe sequences
TABLE 4 summary of sample test results
The results in Table 4 above show that the average Ct values detected in cancer tissue are lower, indicating a stronger methylation signal. The detection rate of methylation signals in cancer tissue is far higher than in the buffy coat, which likewise indicates a strong methylation signal in cancer tissue. In most buffy coat samples, no target methylation signal was detected. All of the targets therefore have potential for blood-based detection of gastric cancer, which demonstrates the feasibility of the selected target markers and their specificity for tumor tissue.
Example 2 Comparison of plasma methylation signals in gastric cancer patients and a control population
Plasma from 276 non-gastric cancer individuals (as controls) and preoperative plasma from 262 gastric cancer patients (stages I to IV accounting for 24.8%, 21%, 34.4%, and 18.3%, respectively, with a further 1.5% of patients of unknown stage) were selected as the training set, and methylation detection was performed on these samples.
The plasma DNA extraction method, bisulfite conversion, pre-amplification, and PCR reaction were the same as in Example 1.
The area under the receiver operating characteristic (ROC) curve (AUC) for each detection site is shown in Table 5 below:
TABLE 5 Area under the ROC curve (AUC) for each detection site
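The two summary quantities discussed here, the average Ct and the detection rate, can be computed as in the following sketch, which applies the convention from Example 1 that a target with no amplification signal is assigned a Ct of 50. The input values are invented, and the names `NO_SIGNAL_CT` and `summarize_ct` are hypothetical.

```python
NO_SIGNAL_CT = 50.0  # Ct assigned when no amplification signal is detected


def summarize_ct(raw_cts):
    """Return mean Ct (undetected set to 50) and detection rate for one target.

    raw_cts is a list of Ct values in which None marks a sample with no
    amplification signal.
    """
    filled = [NO_SIGNAL_CT if ct is None else ct for ct in raw_cts]
    detected = sum(ct is not None for ct in raw_cts)
    return {
        "mean_ct": sum(filled) / len(filled),
        "detection_rate": detected / len(raw_cts),
    }


if __name__ == "__main__":
    # Invented values for illustration; not taken from Table 4.
    print(summarize_ct([30.0, None, 34.0, None]))
```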
| Target number | Target | AUC |
| --- | --- | --- |
| T1 | SEPTIN9 | 0.626 |
| T2 | SEPTIN9_2 | 0.645 |
| T3 | IRF4 | 0.647 |
| T4 | TJP2 | 0.717 |
| T5 | KCNA3 | 0.66 |
| T6 | RNF180 | 0.527 |
| T7 | LOC645323 | 0.671 |
| T8 | VWC2 | 0.629 |
| T9 | PRDM14 | 0.721 |
| T10 | HOXB3 | 0.676 |
| T11 | FGF14 | 0.668 |
| T12 | ADCY1 | 0.581 |
| T13 | HOXB6 | 0.658 |
| T14 | DLX4 | 0.748 |
| T15 | FGF2 | 0.596 |
The results in Table 5 show that the target markers are able to distinguish blood samples of gastric cancer patients from those of controls.
The targets were ranked by AUC. The two targets with the largest AUC were first combined using binary logistic regression (2-target combination); three further targets were then added at each step in descending order of AUC to form the 5-target, 8-target, and 11-target combinations, and all remaining targets were included in the 15-target combination. The AUCs of the combined logistic regressions are shown in Table 6 and Figure 1.
TABLE 6 Area under the ROC curve (AUC) for target combinations
The results in Table 6 show that the AUC of combined detection is larger than that of any single target, demonstrating that combining targets increases the ability of the target markers to distinguish blood samples of gastric cancer patients.
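For readers unfamiliar with the AUC, the following sketch computes it for a single target directly from Ct values, using the rank-based definition: the probability that a randomly chosen case has a lower Ct (stronger methylation signal) than a randomly chosen control, with ties counted as one half. The data are invented and the function name `auc_from_ct` is hypothetical; statistical packages compute the same quantity more efficiently.

```python
def auc_from_ct(case_cts, control_cts):
    """AUC for a marker where a LOWER Ct indicates a case.

    Counts, over all case/control pairs, how often the case has the lower
    Ct; ties contribute 0.5. This equals the Mann-Whitney U statistic
    divided by the number of pairs.
    """
    wins = 0.0
    for case in case_cts:
        for control in control_cts:
            if case < control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_cts) * len(control_cts))


if __name__ == "__main__":
    # Invented Ct values; 50 marks samples without amplification signal.
    cases = [25.0, 27.0, 50.0]
    controls = [50.0, 50.0, 30.0]
    print(round(auc_from_ct(cases, controls), 3))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which gives a scale for reading the values in Table 5.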
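The construction of the nested target combinations can be reproduced from the single-target AUCs in Table 5. The sketch below ranks the targets by AUC and takes the top 2, 5, 8, 11, and 15; the resulting sets match combinations i) to v) listed earlier. The names `TRAINING_AUC` and `nested_combinations` are hypothetical.

```python
# Single-target AUCs in the training set, copied from Table 5.
TRAINING_AUC = {
    "SEPTIN9": 0.626, "SEPTIN9_2": 0.645, "IRF4": 0.647, "TJP2": 0.717,
    "KCNA3": 0.66, "RNF180": 0.527, "LOC645323": 0.671, "VWC2": 0.629,
    "PRDM14": 0.721, "HOXB3": 0.676, "FGF14": 0.668, "ADCY1": 0.581,
    "HOXB6": 0.658, "DLX4": 0.748, "FGF2": 0.596,
}


def nested_combinations(auc_by_target, sizes=(2, 5, 8, 11, 15)):
    """Rank targets by descending AUC and return the top-n list for each size."""
    ranked = sorted(auc_by_target, key=auc_by_target.get, reverse=True)
    return [ranked[:n] for n in sizes]


if __name__ == "__main__":
    for combination in nested_combinations(TRAINING_AUC):
        print(len(combination), ", ".join(combination))
```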
For each combination, a Logit value can be obtained from the Ct value of each target and the logistic regression formula. The Ct value (for a single target) or the Logit value (for a multi-target combination) at the point corresponding to the Youden index in the training set is used as the cut-off value, so that the presence of a gastric tumor can be confirmed and the risk of gastric tumor formation can be assessed. When the Ct value of a single target is smaller than its cut-off value, a gastric tumor is considered likely to be present or the risk of gastric tumor formation is considered high. When the Logit value of a combination is greater than its cut-off value, a gastric tumor is considered likely to be present or the risk of gastric neoplasia is considered high. The cut-off values for the single targets and the different combinations are shown in Table 7, and the corresponding performance in the training set is shown in Table 8.
TABLE 7 Cut-off values for single targets and multi-target combinations
TABLE 8 Performance in training set
| Target or combination | Sensitivity | Specificity |
| --- | --- | --- |
| T1 | 26.2% | 99.0% |
| T2 | 33.7% | 94.4% |
| T3 | 34.5% | 90.2% |
| T4 | 46.0% | 90.2% |
| T5 | 32.9% | 99.0% |
| T6 | 17.5% | 90.2% |
| T7 | 32.5% | 90.2% |
| T8 | 25.8% | 90.2% |
| T9 | 37.3% | 90.2% |
| T10 | 26.2% | 90.2% |
| T11 | 41.7% | 90.6% |
| T12 | 23.8% | 90.2% |
| T13 | 34.1% | 96.9% |
| T14 | 48.4% | 90.2% |
| T15 | 28.2% | 90.2% |
| 2 targets | 49.6% | 90.2% |
| 5 targets | 57.5% | 90.2% |
| 8 targets | 59.5% | 90.2% |
| 11 targets | 63.1% | 90.2% |
| 15 targets | 67.9% | 90.2% |
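The sensitivity and specificity figures in Table 8 follow from comparing the positive or negative call for each sample with its true group. A minimal sketch of that calculation is given below on invented data; the function name `sensitivity_specificity` is hypothetical, and the calls are assumed to have been produced beforehand by the cut-off rules described above.

```python
def sensitivity_specificity(calls, is_case):
    """Return (sensitivity, specificity) from per-sample boolean calls.

    calls[i] is True when sample i was called positive; is_case[i] is True
    when sample i belongs to the gastric cancer group.
    """
    true_pos = sum(c and t for c, t in zip(calls, is_case))
    false_neg = sum((not c) and t for c, t in zip(calls, is_case))
    true_neg = sum((not c) and (not t) for c, t in zip(calls, is_case))
    false_pos = sum(c and (not t) for c, t in zip(calls, is_case))
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


if __name__ == "__main__":
    # Invented example: 4 cases followed by 5 controls.
    is_case = [True] * 4 + [False] * 5
    calls = [True, True, False, False, False, False, False, False, True]
    print(sensitivity_specificity(calls, is_case))
```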
Example 3 Comparison of detection performance in plasma samples from gastric cancer patients and a control population
Plasma from 120 non-gastric cancer individuals (as controls) and preoperative plasma from 110 gastric cancer patients (stages I to IV accounting for 27.3%, 18.2%, 30.9%, and 20.9%, respectively, with a further 2.7% of patients of unknown stage) were selected as the validation set, and methylation detection was performed on these samples.
The plasma DNA extraction method, bisulfite conversion, pre-amplification, and PCR reaction were the same as in Example 1.
The area under the receiver operating characteristic (ROC) curve (AUC) for each site was determined, and the results are shown in Table 9 below.
TABLE 9 Area under the ROC curve (AUC) for each detection site in the validation set
| Target number | Target | AUC |
| --- | --- | --- |
| T1 | SEPTIN9 | 0.617 |
| T2 | SEPTIN9_2 | 0.631 |
| T3 | IRF4 | 0.644 |
| T4 | TJP2 | 0.727 |
| T5 | KCNA3 | 0.643 |
| T6 | RNF180 | 0.547 |
| T7 | LOC645323 | 0.706 |
| T8 | VWC2 | 0.699 |
| T9 | PRDM14 | 0.756 |
| T10 | HOXB3 | 0.658 |
| T11 | FGF14 | 0.684 |
| T12 | ADCY1 | 0.602 |
| T13 | HOXB6 | 0.65 |
| T14 | DLX4 | 0.667 |
| T15 | FGF2 | 0.688 |
Prediction for the population in the validation set was performed using the logistic regression models of the multi-target combinations of Example 2. The AUCs of the ROC curves in the validation set (see Table 10) further confirm that combining targets can increase the ability of the target markers to distinguish blood samples of gastric cancer patients.
The risk of gastric neoplasia in the validation set was assessed using the regression formulas and cut-off values obtained from the training set. The corresponding performance of the single targets and the different combinations in the test (validation) set is shown in Table 11.
TABLE 10 Area under the ROC curve (AUC) for target combinations in the validation set
TABLE 11 Performance in test set
| Target or combination | Sensitivity | Specificity |
| --- | --- | --- |
| T1 | 23.3% | 100.0% |
| T2 | 33.3% | 90.8% |
| T3 | 31.7% | 93.6% |
| T4 | 46.7% | 97.2% |
| T5 | 29.2% | 99.1% |
| T6 | 20.0% | 88.1% |
| T7 | 35.0% | 93.6% |
| T8 | 25.0% | 94.5% |
| T9 | 33.3% | 93.6% |
| T10 | 24.2% | 86.2% |
| T11 | 43.3% | 91.7% |
| T12 | 23.3% | 89.0% |
| T13 | 30.0% | 100.0% |
| T14 | 40.8% | 88.1% |
| T15 | 40.0% | 90.8% |
| 2 targets | 47.2% | 90.4% |
| 5 targets | 54.2% | 92.7% |
| 8 targets | 58.3% | 93.6% |
| 11 targets | 60.8% | 89.0% |
| 15 targets | 61.2% | 89.0% |
In the present application, 15 methylation markers for gastric cancer were screened, and the machine learning diagnostic model constructed from the methylation levels of these markers can better distinguish gastric cancer patients from healthy people, which is of great significance for the early screening of gastric cancer.
The foregoing detailed description is provided by way of explanation and example and is not intended to limit the scope of the appended claims. Numerous variations of the presently illustrated embodiments of the application will be apparent to those of ordinary skill in the art and are intended to be within the scope of the appended claims and equivalents thereof.