
CN118186057A - Screening method of free DNA marker, DNA marker and application thereof - Google Patents


Info

Publication number
CN118186057A
Authority
CN
China
Prior art keywords
preset threshold
samples
segment
methylation
candidate
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311712915.0A
Other languages
Chinese (zh)
Inventor
孙坤
郭腾飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Bay Laboratory
Original Assignee
Shenzhen Bay Laboratory
Application filed by Shenzhen Bay Laboratory
Publication of CN118186057A
Priority to PCT/CN2024/124681 (published as WO2025123912A1)
Current legal status: Pending


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809 Methods for determination or identification of nucleic acids involving differential detection
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/154 Methylation markers

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Wood Science & Technology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A method for screening free DNA markers, the resulting DNA markers, and their applications. The screening method comprises the following steps: collecting verified positive experimental samples and verified control samples; sequencing the positive and control samples with a methylation sequencing method; dividing CpG sites into segments; and determining whether the mean methylation levels of each CpG segment in the positive and control samples meet specified conditions, in which case the segment becomes a candidate marker. Samples are obtained non-invasively. Detection with the screened markers is highly tolerant of noise, and the control-group signal is clean. Detection sensitivity is high, interference resistance is strong, and no single marker needs to be highly accurate. The method is reproducible and suitable for wide adoption.

Description

Screening method for free DNA markers, DNA markers and their application

Technical Field

The present invention relates to the technical field of gene detection, and in particular to a method for screening free DNA markers, DNA markers and applications thereof.

Background Art

Dementia is a major threat to the brain health of elderly people in China and directly endangers public health and well-being. Alzheimer's disease (AD) is the leading cause of dementia in the elderly, accounting for more than 65% of cases. By the end of 2021, China had 267 million people aged 60 and above, 18.9% of the total population. This number is projected to exceed 400 million around 2035, more than 30% of the total population, at which point China will enter a stage of severe population aging. China currently has about 47 million people with preclinical AD, 39 million with mild cognitive impairment (MCI), and 15 million with dementia. Cognitive impairment cannot be reversed once it develops, so early detection, early diagnosis and early treatment are the key to AD prevention and treatment.

Extracellular β-amyloid (Aβ) plaques and intracellular neurofibrillary tau tangles are the two main pathological hallmarks of AD. They are also the basis for distinguishing AD dementia from non-AD dementia. The more mature AD diagnostic methods take one of two routes. They either detect Aβ plaques and tau tangles directly in the living brain by positron emission tomography (PET), or detect them indirectly by measuring the concentrations of AD-related pathological proteins such as Aβ42 and p-Tau in cerebrospinal fluid (CSF). However, PET imaging involves radiation, is expensive, and is not available in all hospitals, while CSF collection is cumbersome and painful. In recent years, blood-based early diagnosis has begun to emerge. It is minimally invasive and can be repeated, which gives it great practical advantages. Blood markers of disease have important clinical value, yet many diseases still lack effective diagnostic and monitoring markers, especially for early diagnosis. Marker identification draws on the mining of biological principles, biochemical experiments, and bioinformatic computation, and there is strong demand for sensitive, flexible methods of identifying disease markers.
In the AD field, current blood testing focuses almost entirely on plasma protein markers. Detecting these markers mostly requires ultra-sensitive biomarker detection systems (such as SIMOA) or mass spectrometry platforms, whose consumables and costs are high. Their detection sensitivity and accuracy are also not ideal. Together, these shortcomings prevent plasma protein biomarkers from meeting the need for large-scale early AD screening in communities across China.

Peripheral blood free DNA consists of naturally occurring DNA fragments, most of which are released into the peripheral blood when cells in the body die. In healthy people, free DNA comes mainly from blood cells and the liver. In patients, however, diseased tissues release free DNA and the disease triggers immune responses, and both effects alter the free DNA profile. Free DNA analysis can therefore be used for disease diagnosis and detection. In addition, the metabolic half-life of free DNA is about 6 hours, so free DNA analysis allows real-time monitoring of the body. Free DNA analysis has so far focused mainly on cancer, pregnancy, and infectious diseases. DNA methylation is an important epigenetic modification of DNA. A common type is 5-methylcytosine (5mC), in which a methyl group is added to cytosine (C). In humans, 5mC occurs mostly on the cytosine of CpG dinucleotides (a cytosine followed by a guanine, linked by a phosphate). It can affect genome stability and regulate gene expression, and it is highly cell- and tissue-specific. The human body contains tissues and organs with different functions, such as the liver, lung, kidney, and brain.
Their genomes are nearly identical, but their DNA methylation differs considerably. When a tissue is in a pathological state, its cells die and release DNA into the peripheral blood. The pathological state can then be detected and diagnosed by identifying the tissue-specific DNA methylation patterns or sites. The immune response to a pathological state can also cause abnormal changes in free DNA methylation. Free DNA methylation testing is widely used in prenatal diagnosis and cancer diagnosis, with great success. However, identifying most free DNA methylation markers of disease requires methylation data from disease-related tissues, either obtained from existing data or generated experimentally from collected tissue. Markers are identified by comparison with peripheral blood or normal tissue and then validated in peripheral blood. How the methylation data are obtained strongly affects the subsequent marker identification.

For example, DNA methylation arrays cover a fixed number of CpG sites with specific sequences. Common ones include the Illumina HumanMethylation450 BeadChip and the Illumina Infinium MethylationEPIC BeadChip. Each CpG site is measured at high depth, and adjacent probed CpG sites are far apart, so each CpG site is usually analyzed individually. Affinity-enrichment methods such as MeDIP-seq have low resolution and are rarely used in disease diagnosis. The current mainstream is sequencing-based methods such as WGBS, RRBS, EM-seq and TAPS, which can cover almost all CpG sites. Because sequencing depth is limited, however, the methylation level of a single CpG site is estimated with large error. The methylation states of adjacent CpGs are also correlated, so several adjacent CpG sites are usually combined into one marker, which makes the way CpG sites are segmented very important. Common approaches treat a known regulatory element, such as a CpG island, a CpG island shore or a gene promoter, or part of its sequence, as one segment, but these approaches still have shortcomings.
In addition, the concentration of free DNA is very low, only about 7 ng per milliliter of plasma. The bisulfite used in conventional WGBS also severely damages DNA, which makes free DNA methylation experiments very difficult. Moreover, disease-related CpG sites make up only a tiny fraction of the genome, so detection is costly. In summary, developing early diagnostic technology based on free DNA methylation requires optimization of both the experimental and the computational methods.

Summary of the Invention

In view of this, the main purpose of the present invention is to provide a method for screening free DNA markers, DNA markers and applications thereof, in order to at least partially solve the above technical problems.

In order to achieve the above object, as a first aspect of the present invention, a method for screening free DNA markers is provided, comprising the following steps:

Select any stretch of human genome sequence and take its first CpG site as a candidate segment. For each candidate segment, examine the next adjacent CpG site. If the distance between that CpG site and the last CpG in the candidate segment is less than or equal to a first preset threshold Di, merge the CpG site into the current candidate segment. Continue examining subsequent CpG sites until the distance condition is no longer met;

Examine the current candidate segment. If the number of CpG sites it contains reaches a second preset threshold Num, mark it as a formal segment; otherwise discard it;

After a candidate segment has been examined, start a new candidate segment at the next adjacent CpG site and repeat the above process until a preset end condition is reached;

Collect verified positive experimental samples and control group samples;

Sequence the positive experimental samples and the control group samples using a DNA methylation assay, and then:

For every sample and every formal segment determined above, average the methylation values of all CpG sites in the segment to obtain the methylation level of that formal segment;

For each formal segment, check two conditions. The first is that the number or proportion of control samples whose methylation level is below a preset threshold a is not less than a preset threshold x. The second is that the number or proportion of positive experimental samples whose methylation level is above another preset threshold b is greater than a preset threshold y. If both hold, take the formal segment as a candidate marker; or

For each formal segment, check two conditions. The first is that the number or proportion of control samples whose methylation level is above a preset threshold aa is not less than a preset threshold xx. The second is that the number or proportion of positive experimental samples whose methylation level is below another preset threshold bb is greater than a preset threshold yy. If both hold, also take the formal segment as a candidate marker.
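The segmentation and selection steps of the first aspect can be sketched in Python. This is a minimal illustration under stated assumptions, not the inventors' implementation. It assumes that the distance between two CpG sites is the difference of their coordinates. It also assumes that CpGs with no methylation data in a sample are skipped when averaging, and that the selection rules are applied to proportions of samples rather than counts. The function names (`segment_cpgs`, `segment_level`, `is_candidate_marker`) are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence


def segment_cpgs(positions: Sequence[int], di: int, num: int) -> List[List[int]]:
    """Greedy CpG segmentation; positions are sorted CpG coordinates on one chromosome."""
    formal: List[List[int]] = []
    i, n = 0, len(positions)
    while i < n:
        candidate = [positions[i]]  # a new candidate segment starts at this CpG
        j = i + 1
        # merge the next CpG while its gap to the segment's last CpG is <= di
        while j < n and positions[j] - candidate[-1] <= di:
            candidate.append(positions[j])
            j += 1
        if len(candidate) >= num:  # enough CpGs: keep it as a formal segment
            formal.append(candidate)
        i = j  # the CpG that broke the gap condition starts the next candidate
    return formal


def segment_level(segment: Sequence[int], cpg_meth: Dict[int, float]) -> Optional[float]:
    """Mean methylation of one formal segment in one sample (assumption: skip uncovered CpGs)."""
    values = [cpg_meth[p] for p in segment if p in cpg_meth]
    return mean(values) if values else None


def _fraction(levels: Sequence[float], pred: Callable[[float], bool]) -> float:
    """Proportion of samples whose segment level satisfies pred (levels must be non-empty)."""
    return sum(1 for v in levels if pred(v)) / len(levels)


def is_candidate_marker(control: Sequence[float], positive: Sequence[float],
                        a: float, b: float, x: float, y: float,
                        aa: float, bb: float, xx: float, yy: float) -> bool:
    """Apply both selection rules to the segment levels of control and positive samples."""
    # rule 1: low in most controls, high in a sufficient fraction of positive samples
    rule1 = (_fraction(control, lambda v: v < a) >= x
             and _fraction(positive, lambda v: v > b) > y)
    # rule 2: high in most controls, low in a sufficient fraction of positive samples
    rule2 = (_fraction(control, lambda v: v > aa) >= xx
             and _fraction(positive, lambda v: v < bb) > yy)
    return rule1 or rule2
```

Take `di=10` and `num=3` as an example. CpG sites at positions 100, 105, 110, 200, 300, 303, 306 and 309 yield two formal segments, [100, 105, 110] and [300, 303, 306, 309]. The isolated CpG at 200 forms a one-site candidate and is discarded. This section gives no concrete values for Di, Num, a, b, x, y, aa, bb, xx or yy, so any values passed to the sketch are placeholders.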

As a second aspect of the present invention, a candidate marker obtained by the screening method described above is also provided;

Preferably, the candidate markers comprise the nucleotide sequences shown in SEQ ID No. 1 to SEQ ID No. 513.

As a third aspect of the present invention, a primer, a primer amplification strand, or a sample composition is also provided, obtained by PCR amplification, replication, conversion and/or transformation based on the candidate markers described above.

As a fourth aspect of the present invention, a machine-based methylation sequencing system is also provided, which comprises a nucleic acid library of the candidate markers described above.

Based on the above technical solutions, the marker screening method, DNA markers and applications of the present invention have at least one of the following advantages over the prior art:

1. The present invention does not require tissue samples. It thereby avoids the difficulty of obtaining brain tissue, which is almost impossible to obtain from patients with early AD;

2. The screening method uses whole-genome sequencing, and segmenting the genome by CpG sites offers a high degree of freedom. A valid segment can be as short as 10 bp (10 bases) and contain only 3 CpG sites;

3. Detection using the marker combination screened by the present invention is highly tolerant of noise, and the control-group signal is clean;

4. Detection using the screened marker combination is highly sensitive. The method does not require any single marker to be highly accurate; instead, it selects every marker with potential;

5. The detection method only requires peripheral blood, so it is non-invasive and sampling can be repeated;

6. Studies in cancer diagnosis have shown that disease-related free DNA markers appear in the blood earlier than protein markers, so free DNA markers are more promising for early diagnosis. Moreover, free DNA is mostly double-stranded, which makes it stable and gives highly reproducible results;

7. Free DNA has a short metabolic half-life of only about 6 hours, so disease can be monitored in real time. Free DNA methylation testing is a mature technology that can detect many sites at once, which makes it suitable for wide adoption;

8. The marker identification criteria of the present invention are not based on statistical tests, because sites that differ only in a statistical sense often perform poorly as markers. Nor does the invention aim for any single marker to perform well. Instead, it identifies a set of promising markers with strong resistance to interference. Using all or some of these markers together, for example by averaging them or by combining them with machine learning, achieves better diagnostic performance.
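Item 8 mentions combining markers by averaging. Below is a minimal sketch of one such combination. It assumes each sample has a methylation level per marker, with None where the marker is not covered. It also assumes that markers selected by the second rule (lower in positive samples) are flipped to 1 - level, so that a higher score always points toward the positive group. Both the flipping convention and the name `combined_score` are hypothetical and do not come from the source.

```python
from statistics import mean
from typing import AbstractSet, Dict, Optional


def combined_score(levels: Dict[str, Optional[float]],
                   hypo_markers: AbstractSet[str] = frozenset()) -> Optional[float]:
    """Average one sample's marker levels into a single score (hypothetical helper)."""
    values = []
    for marker, level in levels.items():
        if level is None:
            continue  # marker not covered in this sample
        # flip hypomethylated markers so that higher always means "more like the positive group"
        values.append(1.0 - level if marker in hypo_markers else level)
    return mean(values) if values else None
```

A plain mean weights every marker equally. A machine-learning model, the other option item 8 mentions, would instead learn per-marker weights from training data.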

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a dot-and-line plot comparing the free DNA methylation rate (methylation level) at the Seq.21 marker in an early-stage AD patient (asterisks) and a healthy elderly control (crosses); each point represents one CpG site;

FIGS. 2A and 2B show, for two independent data sets, the distribution of the arithmetic mean methylation level across all markers in healthy elderly people (left) and AD patients (right), with groups defined by Aβ PET imaging; each point represents one subject;

FIG. 3 shows ROC curves comparing the markers of the present invention with other AD markers (Aβ42/Aβ40, p-Tau181, NfL, GFAP) in classifying healthy controls and AD patients;

FIG. 4 compares the free DNA markers of the present invention with other AD markers (Aβ42/Aβ40, p-Tau181, NfL, GFAP) in predicting the intensity of the Aβ accumulation signal in the patient's brain.
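FIG. 3 summarizes classification performance with ROC curves. The area under the ROC curve (AUC) can be computed without a plotting library, using its equivalence with the Mann-Whitney statistic. The AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The sketch below is a generic illustration, not the evaluation code behind FIG. 3. The per-sample scores could be, for example, the mean marker methylation levels plotted in FIGS. 2A and 2B. The name `roc_auc` is hypothetical.

```python
from typing import Sequence


def roc_auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """AUC as P(case score > control score), counting ties as one half."""
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

The pairwise loop is O(n·m), which is adequate for cohort-sized inputs. For large data sets, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` is faster.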

DETAILED DESCRIPTION

In order to make the objectives, technical solutions and advantages of the present invention more clearly understood, the present invention is further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

Unless otherwise limited herein, all technical and scientific terms used herein have the same meanings as commonly understood by those of ordinary skill in the art to which the invention belongs. Any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, but preferred methods and materials are described.

All patents and publications mentioned herein, including all sequences disclosed within such patents and publications, are expressly incorporated by reference.

Numerical ranges are inclusive of the numbers defining the range. Unless otherwise indicated, nucleic acids are written left to right in the 5' to 3' orientation and amino acids in the amino to carboxyl orientation.

The headings provided herein are not limitations of the various aspects or embodiments of the invention. Accordingly, the terms defined below are more fully defined by reference to the specification as a whole.

For clarity and ease of reference, certain terms are defined below.

As used herein, the term "sample" refers to a material or mixture of materials, usually, although not necessarily, in liquid form, which contains one or more analytes of interest.

As used herein, the term "nucleic acid sample" refers to a sample comprising nucleic acids. Nucleic acid samples as used herein can be complex, in that they contain many different sequence-bearing molecules. Mammalian (e.g., mouse or human) genomic DNA is a typical complex sample. Complex samples can contain more than 10^4, 10^5, 10^6 or 10^7 different nucleic acid molecules. DNA targets can be derived from any source, such as genomic DNA or artificial DNA constructs.

The term "CpG" as used herein refers to 5'-C-phosphate-G-3', i.e., two adjacent nucleotides consisting of a cytosine (C) followed by a guanine (G).
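The segmentation method starts from CpG coordinates, so a small helper that lists CpG positions in a sequence is a natural first step. The sketch below assumes 0-based positions and scans only the given strand. This is sufficient because a CpG on one strand always pairs with a CpG on the complementary strand. `cpg_positions` is a hypothetical name.

```python
from typing import List


def cpg_positions(seq: str) -> List[int]:
    """Return 0-based positions of the C in every CpG dinucleotide on the given strand."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]
```

For GRCh38-scale input, the same scan would run per chromosome, and its output would feed the segmentation step directly.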

The term "human genome GRCh38" as used herein refers to the 38th version of the human genome reference sequence, released by the Genome Reference Consortium (GRC) in 2013.

The term "primer" as used herein refers to a natural or synthetic oligonucleotide that, when it forms a duplex with a polynucleotide template, can serve as a starting point for nucleic acid synthesis and be extended from its 3' end along the template to form an extended duplex. The nucleotide sequence added during extension is determined by the template polynucleotide sequence. Primers are usually extended by a DNA polymerase. Primer length is usually adapted to the primer's use in synthesizing primer extension products, typically in the range of 8 to 100 nucleotides, such as 10 to 75, 15 to 60, 15 to 40, 18 to 30, 20 to 40, 21 to 50, 22 to 45, or 25 to 40. Typical primers range from 10 to 50 nucleotides in length, such as 15-45, 18-40, 20-30 or 21-25, or any length within these ranges. In some embodiments, a primer is no more than about 10, 12, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, or 70 nucleotides in length.

As used herein, the term "plurality" means at least 2. In some cases, the plurality can be at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, or at least 10^6, 10^7, 10^8, 10^9 or more.

As used herein, the term "sequencing" refers to a method by which the identity of at least 10 consecutive nucleotides of a polynucleotide is obtained (e.g., the identity of at least 20, at least 50, at least 100, or at least 200 or more consecutive nucleotides).

The term "tagging" as used herein refers to attaching a sequence tag, which contains an identifier sequence, to a nucleic acid molecule. The tag can be added to the 5' end, the 3' end, or both ends of the molecule, for example by ligating an adapter to the fragment with T4 DNA ligase or another ligase.

As used herein, the term "marker" refers to a parameter that can distinguish between two states.

The term "free DNA", also known as circulating cell-free DNA (cfDNA), as used herein refers to DNA circulating in a patient's peripheral blood. The DNA molecules in cfDNA may have a median size below 1 kb (e.g., in the range of 50 bp to 500 bp, 80 bp to 400 bp, or 100 bp to 1,000 bp), although fragments with median sizes outside this range may also be present. cfDNA may include circulating tumor DNA (ctDNA), i.e., tumor DNA circulating freely in the blood of cancer patients, or circulating fetal DNA if the subject is a pregnant female. cfDNA can be highly fragmented, with a median fragment size of about 166 bp in some cases. cfDNA can be obtained by centrifuging whole blood to remove all cells and then isolating DNA from the remaining plasma or serum. cfDNA is mostly double-stranded, but it can be denatured to obtain single strands.

As used herein, the term "amplifying" refers to producing one or more copies of a target nucleic acid using the target nucleic acid as a template.

As used herein, the term "copy of a fragment" refers to the product of amplification, where the copy can be the reverse complement of the fragment strand or have the same sequence as the fragment strand.

The present invention involves the following methods for detecting DNA methylation:

1. Whole Genome Bisulfite Sequencing (WGBS)

WGBS treats DNA with sodium bisulfite, which converts unmethylated C to U. The U is read as T after PCR and sequencing, whereas methylated C is unaffected, so methylated sites can be identified. Bisulfite treatment readily damages DNA, and because CpG islands contain many unmethylated C, coverage of these regions tends to be low. WGBS libraries can be built in two ways. One fragments the DNA and ligates adapters first, then performs bisulfite C->T conversion. The other performs C->T conversion first and then ligates adapters. The second approach requires less input DNA. Note that WGBS cannot distinguish 5hmC from 5mC.
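The conversion logic above can be illustrated by simulating what the sequencer reports for one strand after bisulfite treatment and PCR. This toy sketch assumes complete conversion of unmethylated C and ignores DNA damage and the complementary strand. Like WGBS itself, it does not distinguish 5hmC from 5mC. `bisulfite_convert` and its `methylated` position set are hypothetical.

```python
from typing import Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Sequence read from one strand after bisulfite + PCR (toy model, full conversion)."""
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")  # unmethylated C -> U -> read as T
        else:
            out.append(base)  # methylated C (and every other base) is unchanged
    return "".join(out)
```

Comparing such a read with the reference shows the rule in action: a C retained in the read marks a methylated site, and a T opposite a reference C marks an unmethylated one.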

Bismark is a commonly used WGBS aligner. It pre-converts the reference genome into two versions, one with C->T and one with G->A conversion. During alignment, each read is likewise converted C->T and G->A, so that each read is aligned in four combinations. The best of these four alignments identifies the strand of origin and the candidate methylation sites.
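The four-way conversion scheme can be illustrated with a minimal Python sketch. It is not Bismark's implementation: it assumes short, equal-length, uppercase sequences and an ungapped comparison, scores each of the four read/reference combinations by mismatch count, and then calls top-strand CpG methylation by checking whether the read still shows C at each reference CpG. The function names and example sequences are hypothetical.

```python
# Illustrative sketch of Bismark-style in-silico conversion (not Bismark itself).
# Assumes equal-length, uppercase A/C/G/T strings and an ungapped comparison.


def c_to_t(seq: str) -> str:
    """Bisulfite view of the top strand: every C becomes T."""
    return seq.replace("C", "T")


def g_to_a(seq: str) -> str:
    """Bisulfite view of the bottom strand, written in top-strand orientation: every G becomes A."""
    return seq.replace("G", "A")


def count_mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def best_combination(read: str, reference: str) -> tuple[str, int]:
    """Score all four read/reference conversion pairs and return the best one."""
    read_versions = {"C->T": c_to_t(read), "G->A": g_to_a(read)}
    ref_versions = {"C->T": c_to_t(reference), "G->A": g_to_a(reference)}
    scores = {
        f"read {r_label} vs ref {g_label}": count_mismatches(r_seq, g_seq)
        for r_label, r_seq in read_versions.items()
        for g_label, g_seq in ref_versions.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]


def call_top_strand_cpgs(read: str, reference: str) -> dict[int, bool]:
    """For each reference CpG: True if the read kept C (methylated), False if it shows T."""
    return {
        i: read[i] == "C"
        for i in range(len(reference) - 1)
        if reference[i:i + 2] == "CG"
    }


reference = "ACGTTCGACG"  # CpGs start at positions 1, 5 and 8
read = "ACGTTTGATG"       # CpG at 1 kept C; CpGs at 5 and 8 were converted to T
print(best_combination(read, reference))      # ('read C->T vs ref C->T', 0)
print(call_top_strand_cpgs(read, reference))  # {1: True, 5: False, 8: False}
```

Real aligners perform gapped alignment of millions of reads against converted genome indexes; the point here is only why four comparisons are needed and how the strand and methylation calls fall out of the best one.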

2. Reduced Representation Bisulfite Sequencing (RRBS)

RRBS enriches CCGG-rich fragments of genomic DNA by restriction enzyme digestion, then uses bisulfite treatment and high-throughput sequencing to obtain single-base-resolution methylation data for the CpG-rich regions of the genome. Compared with WGBS, RRBS is a cost-effective methylation sequencing approach that requires far less sequencing, which makes it widely applicable in large-scale clinical sample studies.

RRBS relies on bisulfite converting unmethylated cytosine (C) into thymine (T). After the genome is bisulfite-treated and sequenced, the methylation rate of a single C site is calculated as the number of covering reads in which that C remains unconverted (read as C) divided by the total number of reads covering the site. This technology is valuable for studying the epigenetic mechanisms of embryonic development, aging, and disease onset and progression, and for screening disease-related epigenetic marker sites.
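The per-site calculation can be stated directly in code. This Python sketch assumes the read counts for a site (C versus T at that position, on the appropriate strand) have already been tallied; the minimum-depth cut-off is a hypothetical addition, included only to show how low-coverage sites can be reported as missing, like the "NA" values mentioned in the examples of this document.

```python
from typing import Optional


def methylation_rate(c_reads: int, t_reads: int, min_depth: int = 1) -> Optional[float]:
    """Methylation rate of one C site from bisulfite reads.

    c_reads:   reads showing C at the site (unconverted, i.e. methylated)
    t_reads:   reads showing T at the site (converted, i.e. unmethylated)
    min_depth: hypothetical coverage cut-off; below it the rate is reported as None
    """
    depth = c_reads + t_reads
    if depth == 0 or depth < min_depth:
        return None
    return c_reads / depth


print(methylation_rate(8, 12))               # 0.4
print(methylation_rate(3, 1, min_depth=10))  # None (too few reads)
```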

The RRBS workflow is as follows. (1) DNA sample QC, mainly by two methods: Qubit for accurate quantification of DNA concentration, and agarose gel electrophoresis to assess DNA degradation and contamination. (2) Library construction: high-quality genomic DNA is obtained with an extraction protocol suited to each sample type; Qubit measures DNA concentration and agarose gel electrophoresis checks DNA integrity; CpG-containing fragments are enriched by enzyme digestion; adapter and index sequences are added to both ends of the DNA molecules; CpG-rich fragments of 250-500 bp are recovered by gel electrophoresis; λ DNA is added to the recovered genomic DNA, after which bisulfite treatment converts unmethylated C to U; finally, the library is enriched by PCR amplification and the PCR product is purified to give the final library. (3) Library QC: the finished library is first quantified with a Qubit 2.0 and diluted to 1 ng/μl; its insert size is then checked with an Agilent 2100, and if it meets expectations, the effective library concentration is quantified accurately by qPCR to ensure library quality. (4) Sequencing: libraries that pass QC are pooled according to their effective concentrations and target data yields and sequenced on the Illumina NovaSeq platform.
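The digestion and size-selection steps can be approximated in silico. This sketch assumes MspI, which cuts C^CGG and is the enzyme commonly used for RRBS (the text above says only "enzyme digestion"). It considers only the top strand, ignores overhangs, and keeps fragments of 250-500 bp as in step (2). The sequence and function names are illustrative.

```python
import re


def mspi_fragments(seq: str) -> list[tuple[int, int]]:
    """Cut a top-strand sequence at every C^CGG site; return half-open (start, end) fragments."""
    cuts = [m.start() + 1 for m in re.finditer("CCGG", seq)]  # cut after the first C
    bounds = [0] + cuts + [len(seq)]
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]


def size_select(fragments: list[tuple[int, int]], low: int = 250,
                high: int = 500) -> list[tuple[int, int]]:
    """Keep fragments whose length lies within [low, high] bp."""
    return [(s, e) for s, e in fragments if low <= e - s <= high]


# Toy sequence: three CCGG sites separated by 300 bp and 50 bp of filler.
seq = "CCGG" + "A" * 300 + "CCGG" + "T" * 50 + "CCGG"
fragments = mspi_fragments(seq)
print(fragments)               # [(0, 1), (1, 305), (305, 359), (359, 362)]
print(size_select(fragments))  # [(1, 305)]
```

Because every retained fragment begins and ends at a CCGG site, the selected fraction is biased towards CpG-dense regions, which is why RRBS covers those regions at much lower sequencing cost than WGBS.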

3. Enzymatic conversion methylation detection: EM-seq

EM-seq combines enzymatic-conversion methylation library preparation with a targeted methylation capture system to detect the methylation status of 3.98 million CpG sites within a 123 Mb region of the human genome. It is well suited to exploring methylation levels in applications such as cancer metastasis, human development, and functional genomics. The method offers broad coverage and high detection sensitivity at reduced sequencing cost, making it suitable for large-cohort studies and especially for methylation studies of cfDNA samples.

Enzymatic conversion requires relatively little starting material; 10-200 ng of DNA is generally sufficient. The DNA is sheared (e.g., by sonication; cfDNA is already naturally fragmented and needs no further shearing) and then converted in two enzymatic steps that distinguish unmethylated cytosine from 5mC and 5hmC. Enzymatic treatment is gentler and minimizes DNA damage, so the converted DNA is more intact, the library contains more and longer inserts, and the result is longer reads and higher alignment rates.

The present invention discloses a method for screening free DNA markers, comprising the following steps:

Collect verified positive experimental samples (positive samples) and control group samples (negative samples).

Extract the positions (genomic coordinates) of all CpG sites in the human genome (in principle any version and patch can be used; this project uses GRCh38 patch 13, i.e., GRCh38.p13). The first CpG site is taken as a candidate segment. For a candidate segment, examine the next adjacent CpG site: if its distance from the last CpG in the candidate segment is within a specified range (the first preset threshold, e.g., 10 bases), merge it into the current candidate segment and continue with the following CpG sites; if the distance exceeds the specified range, evaluate the current candidate segment: if the number of CpG sites it contains reaches the second preset threshold (e.g., 3, 4, or 5), mark it as a formal segment, otherwise discard it. After a candidate segment has been evaluated, the next adjacent CpG site starts a new candidate segment, and the process repeats until all CpG sites have been examined.
The CpG segmentation method of the present invention achieves the following: 99% of segments are shorter than 63 bp, with a median length of 17 bp, which is well suited to experimental validation and kit design in free DNA; each segment contains 3 or more CpG sites, which compensates for limited sequencing depth and improves marker reliability; and because the method does not rely on known functional annotations such as CpG islands, it can identify potential markers in enhancer regions (which have better tissue specificity than gene promoters).
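The segmentation rule can be written as a single pass over sorted CpG coordinates. This Python sketch is an illustration under assumptions, not the inventors' code: it processes one chromosome at a time, treats the "interval" as the difference between consecutive CpG coordinates, and uses the example thresholds (10 bases, 3 CpGs). The coordinates in the usage example are made up.

```python
def segment_cpgs(positions: list[int], max_gap: int = 10,
                 min_cpgs: int = 3) -> list[list[int]]:
    """Greedy CpG segmentation as described above.

    positions: sorted CpG coordinates on one chromosome
    max_gap:   first preset threshold (Di); assumed to be the coordinate
               difference between consecutive CpGs
    min_cpgs:  second preset threshold (Num)
    """
    segments: list[list[int]] = []
    if not positions:
        return segments
    current = [positions[0]]              # the first CpG opens a candidate segment
    for pos in positions[1:]:
        if pos - current[-1] <= max_gap:  # close enough: merge into the candidate
            current.append(pos)
            continue
        if len(current) >= min_cpgs:      # gap too large: keep or discard the candidate
            segments.append(current)
        current = [pos]                   # this CpG opens a new candidate segment
    if len(current) >= min_cpgs:          # evaluate the last candidate at the end
        segments.append(current)
    return segments


cpgs = [100, 105, 112, 150, 200, 204, 209, 213, 300]
for seg in segment_cpgs(cpgs):
    print(seg[0], seg[-1], len(seg))
# 100 112 3
# 200 213 4
```

The isolated CpGs at 150 and 300 each form a one-CpG candidate and are discarded, which is how the method keeps only short, CpG-dense segments.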

Here, the first preset threshold Di is, for example, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90 or 100 (bases), and the second preset threshold Num is, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 (CpG sites).

For every experimental sample and every CpG segment determined above, the quantified methylation values of all CpG sites in the segment are averaged to give the segment's methylation level. Several averaging methods are possible, such as the arithmetic mean. Because sequencing depth differs between methylation sites, this project uses a depth-weighted average, which is the most commonly used averaging method in methylation analysis. Other options include the arithmetic mean or median of the methylation levels of all CpG sites.
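The averaging options can be compared in a short sketch. It assumes per-CpG counts of methylated reads and total reads are available for one segment in one sample; weighting each site's level by its depth then reduces to pooled methylated reads over pooled depth. The function name and the `method` values are hypothetical.

```python
from statistics import mean, median
from typing import Optional, Sequence


def segment_methylation(meth_reads: Sequence[int], depths: Sequence[int],
                        method: str = "weighted") -> Optional[float]:
    """Methylation level of one segment from per-CpG read counts.

    meth_reads[i]: reads supporting methylation at CpG i
    depths[i]:     total reads covering CpG i (sites with zero depth are skipped)
    method:        "weighted" (depth-weighted), "mean" or "median" of per-site levels
    """
    covered = [(m, d) for m, d in zip(meth_reads, depths) if d > 0]
    if not covered:
        return None
    if method == "weighted":
        # Weighting each site's level m/d by its depth d reduces to sum(m) / sum(d).
        return sum(m for m, _ in covered) / sum(d for _, d in covered)
    per_site = [m / d for m, d in covered]
    if method == "mean":
        return mean(per_site)
    if method == "median":
        return median(per_site)
    raise ValueError(f"unknown method: {method}")


meth, depth = [0, 10, 30], [10, 20, 30]          # per-site levels 0.0, 0.5, 1.0
print(segment_methylation(meth, depth))          # 0.666... (40 / 60)
print(segment_methylation(meth, depth, "mean"))  # 0.5
```

The deeper, fully methylated third site pulls the weighted value above the arithmetic mean, which is the intended effect: better-supported sites count for more.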

A CpG segment is taken as a candidate marker if the number or proportion of control samples whose methylation level is below a preset threshold a (e.g., 10%) is not less than x (e.g., 90%), and the number or proportion of disease-group samples whose methylation level is above another preset threshold b (e.g., 20%; may equal a) is greater than a preset threshold y (e.g., 25%). Alternatively, it is taken as a candidate marker if the number or proportion of control samples whose methylation level is above a preset threshold aa (e.g., 90%) is not less than a preset threshold xx (e.g., 90%), and the number or proportion of disease-group samples whose methylation level is below another preset threshold bb (e.g., 80%; may equal aa) is greater than a preset threshold yy (e.g., 25%).
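The two selection rules can be expressed as one predicate per segment. This Python sketch assumes methylation levels on a 0-1 scale and treats x, y, xx and yy as proportions; a count threshold such as N-1 controls can be passed as (N - 1) / N. Dropping missing values (None) is an assumption, since the text does not say how they enter the selection. The data are invented.

```python
from typing import Optional, Sequence


def _fraction(values: Sequence[float], predicate) -> float:
    """Share of values satisfying the predicate."""
    return sum(1 for v in values if predicate(v)) / len(values)


def is_candidate_marker(control: Sequence[Optional[float]],
                        disease: Sequence[Optional[float]],
                        a=0.10, x=0.90, b=0.20, y=0.25,
                        aa=0.90, xx=0.90, bb=0.80, yy=0.25) -> bool:
    """Apply both threshold rules to one segment's methylation levels."""
    control = [v for v in control if v is not None]  # assumption: ignore missing values
    disease = [v for v in disease if v is not None]
    if not control or not disease:
        return False
    # Rule 1: low in (almost) all controls, raised in enough disease samples.
    gained = (_fraction(control, lambda v: v < a) >= x
              and _fraction(disease, lambda v: v > b) > y)
    # Rule 2: high in (almost) all controls, lowered in enough disease samples.
    lost = (_fraction(control, lambda v: v > aa) >= xx
            and _fraction(disease, lambda v: v < bb) > yy)
    return gained or lost


controls = [0.02, 0.05, 0.01, 0.03, 0.00, 0.04, 0.06, 0.02, 0.01, 0.15]
cases = [0.35, 0.05, 0.40, 0.02]
print(is_candidate_marker(controls, cases))        # True: 9/10 controls < 10%, 2/4 cases > 20%
print(is_candidate_marker([0.5] * 10, [0.7] * 4))  # False: a 50% vs 70% shift fails both rules
```

The second call mirrors the point made later in this section: a segment can differ clearly between group averages and still be rejected because the controls are not "clean".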

Thus, by choosing the preset thresholds, CpG segment screening results of different stringency can be obtained, and their detection sensitivity varies with the chosen thresholds.

For example, the preset thresholds are a = 10%, x = 90%, b = 20%, y = 25%, aa = 90%, xx = 90%, bb = 80%, and yy = 25%.

Alternatively, for example, a = 5%, x = N-2 (where N is the number of control group samples), b = 20%, y = 33%, aa = 95%, xx = N-1, bb = 80%, and yy = 33%. Preferably, the present invention uses a = 5%, x = N-1 (where N is the number of control group samples), b = 5%, y = 25%, aa = 95%, xx = N-1, bb = 95%, and yy = 25%.

In the above scheme, the marker identification parameters of the present invention are not based on statistical tests. This reflects the different priorities of disease diagnosis and biological research: statistically different sites often perform poorly as markers, especially when the control group is not "clean". For example, a site averaging 50% methylation in controls and 70% in the disease group may be statistically significant yet nearly useless for diagnosis, particularly when the disease signal is weak and its source unclear (as in early-stage disease). Nor does the present invention aim for a single marker that performs well on its own (which often leads to overfitting); instead, it identifies a set of promising markers to improve robustness, since a single marker is easily affected by primer bias, individual SNPs, and the like. All or some of these markers can further be combined using data processing techniques, either machine learning methods such as random forests, support vector machines, neural networks, and gradient boosted decision trees, or simpler methods such as the arithmetic mean, to achieve better diagnostic performance.

In the above scheme, the positive experimental sample is, for example, a blood or body fluid sample (such as peripheral blood) from a subject whose cerebral cortical Aβ plaque pathology was determined positive by Aβ PET (AD patient group); the control sample can be a blood or body fluid sample from a subject whose cerebral cortical Aβ plaque pathology was determined negative by Aβ PET. Furthermore, the blood or body fluid sample is preferably obtained non-invasively and preferably tested within 6 hours of collection to preserve the integrity of the free DNA in the sample.

In the above scheme, the methylation measurement method can be, for example, a well-known sequencing method such as WGBS, RRBS, EM-seq, or TAPS; alternatively, the degree of methylation can be measured by digital PCR, real-time fluorescence quantitative PCR, or methylated-DNA-specific recognition enzyme binding techniques.

In the above scheme, the data processing techniques may also include, for example:

The mean methylation level of all markers included in the analysis;

A summary statistic of all markers included in the analysis, such as Q1, the median, or Q3;

The proportion of non-zero values among all markers included in the analysis;

Classification methods developed with machine learning techniques, including but not limited to decision trees, random forests, support vector machines, neural networks, and gradient boosted decision trees.
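The simpler aggregation options in this list can be computed per sample in a few lines. This sketch assumes one sample's marker methylation levels are held in a list, with None or NaN standing for the "NA" values that the examples say are ignored; the feature names are hypothetical, and the machine-learning option is not sketched here.

```python
import math
from statistics import mean, median, quantiles
from typing import Optional, Sequence


def sample_features(levels: Sequence[Optional[float]]) -> dict[str, float]:
    """Summarise one sample's marker methylation levels, ignoring missing (NA) values."""
    vals = [v for v in levels if v is not None and not math.isnan(v)]
    if len(vals) < 2:
        raise ValueError("need at least two non-missing marker values")
    q1, _, q3 = quantiles(vals, n=4)
    return {
        "mean": mean(vals),
        "q1": q1,
        "median": median(vals),
        "q3": q3,
        "nonzero_fraction": sum(v != 0 for v in vals) / len(vals),
    }


levels = [0.0, 0.2, None, 0.4, 0.0, float("nan"), 0.6]  # None and NaN stand for "NA"
features = sample_features(levels)
print(features["median"], features["nonzero_fraction"])  # 0.2 0.6
```

Any one of these features can serve as a single predictive index, or several can be fed together into a classifier.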

The screening method of the present invention can screen all CpG site positions in the human genome GRCh38.p13 or only a subset of them.

Through screening, the present invention obtains a series of candidate markers. Each candidate marker is an independent gene fragment that can serve as a free DNA marker for accurately determining whether the corresponding fragment is methylated; each therefore has its own use value and protection value, and the present invention also specifically claims them as a composition of gene fragments.

Such a gene fragment composition includes, for example, the nucleotide sequences shown in Table 1.

Table 1 Composition of some of the gene fragments obtained by screening with the method of the present invention

When used as free DNA markers in the methylation sequencing methods described above, these markers offer high resolution and good accuracy. Combined with data processing techniques, the cfDNA methylation markers screened by the present invention can effectively distinguish healthy controls from early AD patients with an accuracy above 92%, significantly better than currently commercialized plasma protein markers.

The present invention also discloses a marker obtained by the screening method described above, and a primer, primer amplification strand, or enriched cell-free DNA sample composition obtained by PCR amplification, replication, conversion, and/or transformation based on that marker. The cell-free DNA sample composition includes, for example, adapter-ligated cell-free DNA molecules, an internal standard control composition comprising amplicons, and a streptavidin carrier, and contains the candidate markers described above.

The present invention also discloses a methylation sequencing system comprising a nucleic acid library of the candidate markers described above.

In summary, the present invention discloses and claims: a composition of gene fragments obtained by the above screening method for free DNA markers; primers, primer amplification strands, and sample compositions formed by PCR amplification, replication, conversion, and transformation of that composition; a nucleic acid library with unique markers formed by incorporating the composition into a library of nucleic acids; and nucleic acid sequencing systems using such a library, for example an Illumina nucleic acid library incorporating the composition and an Illumina sequencing system using that library.

The present invention is further described below through specific examples. Note that the following examples are for illustration only and are not intended to limit the present invention.

Example

1. Patient recruitment and blood sample processing

For every patient included in the analysis, Aβ PET was used to determine whether cerebral cortical Aβ plaque pathology was negative (control group) or positive (AD patient group). Blood was processed routinely: peripheral venous blood (e.g., 6 ml) was collected in tubes containing an anticoagulant (e.g., EDTA) and centrifuged at low temperature (e.g., 4°C) and medium speed (e.g., 1,600 × g) for a period of time (e.g., 15 minutes); the supernatant was transferred to a new centrifuge tube and centrifuged again at low temperature (e.g., 4°C) and high speed (e.g., 16,000 × g) for a period of time (e.g., 15 minutes); the resulting supernatant (plasma) was then transferred to a new centrifuge tube or cryovial. Plasma can be stored in an ultra-low-temperature freezer (e.g., -80°C) until use.

2. EM-seq data analysis and marker identification

Free DNA is extracted with a commercial free DNA extraction kit, processed and made into libraries using EM-seq, and the libraries are sequenced by next-generation sequencing (e.g., on an Illumina NovaSeq 6000). The sequencing data are analyzed by bioinformatics methods, including quality control, alignment, and methylation calling, to obtain the methylation level of each CpG site. The present invention uses the Msuite2 software for data analysis; it includes a quality control module that removes low-quality sequencing data and allows part of each read to be excluded from the analysis, reducing the impact of free DNA end defects (such as jagged ends) and improving the accuracy of the results.

Extract the positions of all CpG sites in the human genome (any major or patch version is acceptable; the present invention uses GRCh38 patch 13, i.e., GRCh38.p13, but older versions such as GRCh 37 and GRCh36 or the latest versions such as T2T CHM13v2.0/hs1 and Han1 can also be used). The first CpG site is taken as a candidate segment. For a candidate segment, if the distance between the next adjacent CpG and the last CpG in the segment is within a specified range (e.g., 10 bases), that CpG is merged into the current candidate segment and the following CpG sites are examined; if the distance exceeds the specified range, the current candidate segment is evaluated: if the number of CpG sites it contains reaches a preset threshold (e.g., 3), it is marked as a formal segment, otherwise it is discarded. The next CpG site then starts a new candidate segment, and the process repeats until all CpG sites have been examined.

For every experimental sample and every CpG segment determined above, the quantified methylation values of all CpG sites in the segment are averaged to give the segment's methylation level. Several averaging methods are possible, such as the arithmetic mean. Because sequencing depth differs between methylation sites, this project uses a depth-weighted average, which is also the most commonly used averaging method in methylation analysis.

A CpG segment is taken as a candidate marker if its methylation level is below a preset threshold a (= 10%) in all control samples and the proportion of disease-group samples above another preset threshold b (= 20%) is greater than a preset threshold c (= 25%); or if its methylation level is above a preset threshold aa (= 90%) in all control samples and the proportion of disease-group samples below another preset threshold bb (= 80%) is greater than a preset threshold cc (= 25%).

3. AD-related biomarkers

Experimental screening provisionally yielded the 513 markers shown in Table 1, for which the sequences and positions in GRCh 38 are provided. Note that the markers vary in length and in the number of CpGs they contain (at least 3 each). Each CpG site within a marker can also be regarded as a marker in its own right.

4. The 513 markers obtained were then applied to the control group (healthy elderly) and the AD patient group (early AD patients) described above, and the marker sites of the present invention were measured by whole-genome methylation sequencing to validate the results. Two sample sets were collected. Sample set 1, the discovery set, comprised 19 healthy elderly controls and 24 early AD patients tested on the 513 markers (not shown owing to space limitations). Sample set 2, the validation set shown in Appendix Table 1, comprised 17 healthy elderly controls and 13 early AD patients tested on the 513 markers (owing to space limitations, only 8 healthy elderly controls and 8 early AD patients are shown).

5. Graphical presentation of some of the above test results

Figure 1 is a dot-and-line plot comparing the free DNA methylation rate (methylation level) at the marker SEQ ID No. 21 between an early AD patient (asterisks) and a healthy elderly control (crosses); each dot represents one CpG site. The figure shows that this marker clearly distinguishes the healthy elderly control from the early AD patient, so it can serve for early detection and warning, improve recognition at initial diagnosis, and support early intervention, treatment, and rehabilitation.

Figures 2A and 2B show, for two independent data sets, the distribution of the arithmetic mean methylation level across all markers in healthy elderly people (left) and AD patients (right), with groups defined by Aβ PET imaging (each dot represents one patient). Sample sets 1 and 2 are as defined above. In both sets, the average free DNA methylation level across all sites, used as a single predictive index, distinguishes healthy elderly people from early AD patients well, with the following accuracy:

Sample set 1: sensitivity = 100%, specificity = 100%;

Sample set 2: sensitivity = 92.3%, specificity = 94.1%.
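Sensitivity and specificity follow directly from the confusion counts. With the group sizes given for sample set 2 (13 early AD patients, 17 healthy elderly controls), the reported 92.3% and 94.1% correspond to 12/13 and 16/17, i.e., one misclassified sample in each group. The sketch below reproduces that arithmetic; the per-sample calls are constructed to match those counts and are not the actual data.

```python
def sensitivity_specificity(labels: list[bool], predicted: list[bool]) -> tuple[float, float]:
    """labels/predicted: True = AD (positive), False = healthy control."""
    pairs = list(zip(labels, predicted))
    tp = sum(1 for truth, pred in pairs if truth and pred)
    fn = sum(1 for truth, pred in pairs if truth and not pred)
    tn = sum(1 for truth, pred in pairs if not truth and not pred)
    fp = sum(1 for truth, pred in pairs if not truth and pred)
    return tp / (tp + fn), tn / (tn + fp)


# Sample set 2 group sizes: 13 early AD patients, 17 healthy elderly controls.
# Constructed calls: one AD patient missed, one control called positive.
labels = [True] * 13 + [False] * 17
predicted = [True] * 12 + [False] + [False] * 16 + [True]
sens, spec = sensitivity_specificity(labels, predicted)
print(f"{sens:.1%} {spec:.1%}")  # 92.3% 94.1%
```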

The present invention can also use any number of markers for prediction: Figure 1 uses a single marker, Figure 2 uses all markers, and any subset of 2 to 512 markers can also be selected and averaged.

Note also that some values in the two data sets are "NA", meaning that the sequencing depth of that sample at that site is low; such sites are ignored when computing the average and similar indices.

Figure 3 compares the performance (ROC curves) of the markers of the present invention with other AD markers (Aβ42/Aβ40, p-Tau181, NfL, GFAP) in classifying healthy controls versus AD patients: DNA methylation, data set 1 (AUC = 1); DNA methylation, data set 2 (AUC = 0.96); plasma p-Tau181 (AUC = 0.89); plasma NfL (AUC = 0.86); plasma GFAP (AUC = 0.76); plasma Aβ42/Aβ40 (AUC = 0.80). The ROC analysis shows that the schemes of the present invention (① and ②) predict early AD more accurately than existing plasma protein markers (③ to ⑥).
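The AUC values can be read as a pairwise ranking probability: the chance that a randomly chosen AD patient has a higher index than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). Under that reading, the AUC of 1 for data set 1 means the index separated the two groups completely. This Python sketch computes AUC that way for small, made-up score lists; it assumes higher scores indicate AD.

```python
def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as P(random positive outscores random negative); ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


ad = [0.30, 0.25, 0.22]        # hypothetical index values for AD patients
controls = [0.10, 0.12, 0.24]  # hypothetical index values for controls
print(roc_auc(ad, controls))            # 0.888... (8 of 9 pairs ordered correctly)
print(roc_auc([0.3, 0.4], [0.1, 0.2]))  # 1.0 (perfect separation)
```

The double loop is quadratic, which is harmless at cohort sizes like those reported here; for large data a rank-based computation gives the same value.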

Figure 4 compares the free DNA markers of the present invention with other AD markers (Aβ42/Aβ40, p-Tau181, NfL, GFAP) in predicting the intensity of the Aβ accumulation signal in the patient's brain. Comparing panel A with panels B to E shows that, relative to the plasma protein markers, the free DNA methylation markers of the present invention correlate better with brain Aβ signal intensity, indicating better accuracy for dynamic monitoring.

The specific embodiments described above further illustrate the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that they are only specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for screening cell-free DNA markers, characterized in that it comprises the following steps:
for a gene sequence, taking its first CpG site as a candidate segment; for any candidate segment, examining the next adjacent CpG site and, if the distance between that CpG site and the last CpG site in the candidate segment is less than or equal to a first preset threshold, merging that CpG site into the current candidate segment and continuing to examine the subsequent CpG sites until the distance condition is no longer met;
examining the current candidate segment and, if the number of CpG sites it contains reaches a second preset threshold, marking it as a formal segment, otherwise discarding it;
after the candidate segment has been examined, taking the next adjacent CpG site as the start of a new candidate segment and repeating the above process until a preset end condition is reached;
collecting verified positive experimental samples and control group samples;
measuring the positive experimental samples and the control group samples by a DNA methylation assay;
for every sample and every formal segment determined above, averaging the methylation quantification values of all CpG sites contained in the formal segment to obtain the methylation level of that formal segment;
for each formal segment, if the number or proportion of control group samples whose methylation level is less than a preset threshold a is not less than a preset threshold x, and the number or proportion of positive experimental samples whose methylation level is greater than another preset threshold b is greater than a preset threshold y, taking the formal segment as a candidate marker; or
for each formal segment, if the number or proportion of control group samples whose methylation level is greater than a preset threshold aa is not less than a preset threshold xx, and the number or proportion of positive experimental samples whose methylation level is less than another preset threshold bb is greater than a preset threshold yy, likewise taking the formal segment as a candidate marker.
2. The screening method according to claim 1, characterized in that the positive experimental samples are blood or body fluid samples from subjects whose cerebral cortical Aβ plaque pathology has been determined to be positive by Aβ PET, preferably peripheral blood samples;
the control group samples are blood or body fluid samples from subjects whose cerebral cortical Aβ plaque pathology has been determined to be negative by Aβ PET;
preferably, the blood or body fluid samples are obtained non-invasively; further preferably, the samples are non-invasively obtained peripheral blood samples that have been ex vivo for no more than 6 hours.
3. The screening method according to claim 1, characterized in that the DNA methylation assay is selected from WGBS, RRBS, EM-seq, TAPS, digital PCR, real-time fluorescence quantitative PCR, and methylated-DNA-specific recognition enzyme binding techniques;
the first preset threshold is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90 or 100;
the second preset threshold is 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10.
4. The screening method according to claim 1, characterized in that the averaging method is selected from arithmetic averaging, median averaging, and weighted averaging.
5. The screening method according to claim 1, characterized in that:
the preset threshold a is 10%, x is 90%, b is 20%, y is 25%, aa is 90%, xx is 90%, bb is 80%, and yy is 25%; or
the preset threshold a is 5%, x is N-2, b is 20%, y is 33%, aa is 95%, xx is N-1, bb is 80%, and yy is 33%, where N is the number of control group samples; or
the preset threshold a is 5%, x is N-1, b is 5%, y is 25%, aa is 95%, xx is N-1, bb is 95%, and yy is 25%, where N is the number of control group samples.
6. The screening method according to claim 1, characterized in that, in the step of determining candidate markers from the formal segments, machine learning techniques are further used to optimize the screening results, preferably random forest, support vector machine, neural network, or gradient boosting decision tree methods.
7. The screening method according to claim 1, characterized in that the positions of all or some of the CpG sites on all chromosomes of the human genome are screened;
the human genome data are from GRCh38, GRCh37, GRCh36, T2T-CHM13v2.0/hs1 or Han1.
8. A candidate marker obtained by the screening method according to any one of claims 1 to 7;
preferably, the candidate marker comprises the nucleotide sequences set forth in SEQ ID No. 1 to SEQ ID No. 513.
9. A primer, primer amplification strand, or sample composition obtained by PCR amplification, replication, conversion and/or transformation based on the candidate marker according to claim 8.
10. A machine-based methylation sequencing system comprising a nucleic acid library of the candidate markers according to claim 8.
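The CpG segmentation steps of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation. The sketch makes three assumptions:

- the CpG sites of one chromosome are given as sorted integer coordinates;
- the "interval" is the difference between adjacent coordinates;
- the "preset end condition" is reaching the last CpG site.

The names `segment_cpgs`, `max_gap` (first preset threshold) and `min_cpgs` (second preset threshold) are hypothetical.

```python
from typing import List


def segment_cpgs(positions: List[int], max_gap: int, min_cpgs: int) -> List[List[int]]:
    """Greedy CpG segmentation sketched from claim 1 (hypothetical helper).

    positions: sorted CpG coordinates on one chromosome (assumed).
    max_gap:   first preset threshold, max distance to the segment's last CpG.
    min_cpgs:  second preset threshold, min CpG count for a formal segment.
    """
    formal: List[List[int]] = []
    i, n = 0, len(positions)
    while i < n:  # assumed end condition: every CpG site has been consumed
        segment = [positions[i]]  # current CpG starts a new candidate segment
        j = i + 1
        # Merge following CpGs while each lies within max_gap of the last merged CpG.
        while j < n and positions[j] - segment[-1] <= max_gap:
            segment.append(positions[j])
            j += 1
        # Promote to a formal segment only if it holds enough CpG sites.
        if len(segment) >= min_cpgs:
            formal.append(segment)
        i = j  # the next adjacent CpG starts the next candidate segment
    return formal
```

Here is a worked example with `max_gap=10` and `min_cpgs=2`. Sites at 100, 103, 110, 200, 204 and 500 yield two formal segments, [100, 103, 110] and [200, 204]. The isolated site at 500 forms a one-CpG candidate and is discarded.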
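The per-segment averaging and the two-directional selection rule of claims 1 and 5 can be sketched like this. The sketch rests on these assumptions:

- methylation values are fractions in [0, 1], so 10% becomes 0.10;
- the segment level is the arithmetic mean, one of the options in claim 4;
- the "number or proportion" tests are evaluated as proportions;
- the default thresholds are the first parameter set of claim 5.

The function names and the dictionary layout are hypothetical, not taken from the patent.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence


def segment_level(cpg_values: Sequence[float]) -> float:
    """Methylation level of a formal segment: arithmetic mean of its CpG values."""
    return mean(cpg_values)


def frac(levels: Sequence[float], pred: Callable[[float], bool]) -> float:
    """Proportion of samples whose segment level satisfies pred (levels must be non-empty)."""
    return sum(1 for v in levels if pred(v)) / len(levels)


def is_candidate(control: Sequence[float], positive: Sequence[float],
                 a=0.10, x=0.90, b=0.20, y=0.25,
                 aa=0.90, xx=0.90, bb=0.80, yy=0.25) -> bool:
    """Apply both selection rules of claim 1 (defaults: first set of claim 5)."""
    # Rule 1: mostly unmethylated in controls, methylated in enough positives.
    hyper = frac(control, lambda v: v < a) >= x and frac(positive, lambda v: v > b) > y
    # Rule 2 (mirror image): mostly methylated in controls, demethylated in enough positives.
    hypo = frac(control, lambda v: v > aa) >= xx and frac(positive, lambda v: v < bb) > yy
    return hyper or hypo


def select_markers(control: Dict[str, List[List[float]]],
                   positive: Dict[str, List[List[float]]]) -> List[str]:
    """Map segment ID -> per-sample CpG values; return IDs that pass either rule."""
    selected = []
    for seg_id, ctrl_samples in control.items():
        ctrl_levels = [segment_level(s) for s in ctrl_samples]
        pos_levels = [segment_level(s) for s in positive[seg_id]]
        if is_candidate(ctrl_levels, pos_levels):
            selected.append(seg_id)
    return selected
```

The control-side tests use `>=` ("not less than") while the positive-side tests use a strict `>` ("greater than"). This follows the wording of claim 1. The count-based variants of claim 5 (for example, x = N-2) would compare raw counts instead of the proportions returned by `frac`.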
CN202311712915.0A 2022-12-13 2023-12-13 Screening method of free DNA marker, DNA marker and application thereof Pending CN118186057A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2024/124681 WO2025123912A1 (en) 2022-12-13 2024-10-14 Screening method for cell-free dna marker, dna marker and use thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2022116014102 2022-12-13
CN202211601410 2022-12-13

Publications (1)

Publication Number Publication Date
CN118186057A true CN118186057A (en) 2024-06-14

Family

ID=91411382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311712915.0A Pending CN118186057A (en) 2022-12-13 2023-12-13 Screening method of free DNA marker, DNA marker and application thereof

Country Status (2)

Country Link
CN (1) CN118186057A (en)
WO (1) WO2025123912A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2025123912A1 (en) * 2022-12-13 2025-06-19 Shenzhen Bay Laboratory Screening method for cell-free dna marker, dna marker and use thereof

Citations (3)

Publication number Priority date Publication date Assignee Title
US20050130172A1 (en) * 2003-12-16 2005-06-16 Bayer Corporation Identification and verification of methylation marker sequences
CN102796808A (en) * 2011-05-23 2012-11-28 深圳华大基因科技有限公司 Methylation high-flux detection method
CN109852672A (en) * 2017-11-30 2019-06-07 深圳豪石生物科技有限公司 A method for screening prognostic markers of DNA methylation in acute myeloid leukemia

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US7718364B2 (en) * 2003-03-25 2010-05-18 John Wayne Cancer Institute DNA markers for management of cancer
MX2021003164A (en) * 2018-09-19 2021-06-23 Bluestar Genomics Inc Cell-free dna hydroxymethylation profiles in the evaluation of pancreatic lesions.
CN116312739A (en) * 2021-12-20 2023-06-23 博尔诚(北京)科技有限公司 Marker screening method, cancer detection method and device based on methylation sequencing
CN117059163A (en) * 2022-05-06 2023-11-14 博尔诚(北京)科技有限公司 System and method for screening large fragment methylation markers
CN118186057A (en) * 2022-12-13 2024-06-14 Shenzhen Bay Laboratory Screening method of free DNA marker, DNA marker and application thereof

Non-Patent Citations (2)

Title
RAY O. BAHADO-SINGH et al.: "Artificial Intelligence and Circulating Cell-Free DNA Methylation Profiling: Mechanism and Detection of Alzheimer's Disease", Cells, vol. 11, no. 1744, 25 May 2022 (2022-05-25), pages 1-19 *
林苏扬 et al.: "Research progress on the regulation of Alzheimer's disease by epigenetic modifications", Progress in Biochemistry and Biophysics, 25 October 2021 (2021-10-25), pages 1-21 *

Also Published As

Publication number Publication date
WO2025123912A1 (en) 2025-06-19

Similar Documents

Publication Publication Date Title
CN104254618B (en) Size-based analysis of fetal DNA fraction in maternal plasma
US20150376691A1 (en) Rapid aneuploidy detection
CN112941180A (en) Group of lung cancer DNA methylation molecular markers and application thereof in preparation of lung cancer early diagnosis kit
US11312999B2 (en) Set of genes for molecular classifying of medulloblastoma and use thereof
CN105483229B (en) Method and system for detecting fetal chromosome aneuploidy
CN110760579B (en) Reagent for amplifying free DNA and amplification method
TW201840853A (en) Diagnostic applications using nucleic acid fragments
HUE030510T2 (en) Diagnosing fetal chromosomal aneuploidy using genomic sequencing
CN110648722B (en) Device for risk assessment of neonatal genetic diseases
CN112176057B (en) Use of CpG site methylation levels to detect markers of pancreatic ductal adenocarcinoma and its application
WO2015035555A1 (en) Method, system, and computer readable medium for determining whether fetus has abnormal number of sex chromosomes
CN115537462B (en) A sequencing method for simultaneous detection of pathogenic bacteria and host gene expression and its application in the diagnosis and prognosis of bacterial meningitis
CN113362893A (en) Construction method and application of tumor screening model
CN103571847A (en) FOXC1 gene mutant and its application
WO2025123912A1 (en) Screening method for cell-free dna marker, dna marker and use thereof
CN119546781A (en) Epigenetic analysis of cell-free DNA
CN103911439A (en) Analyzing method and application of differential expression gene of systemic lupus erythematosus hydroxymethylation status
TW202237856A (en) Methods using characteristics of urinary and other dna
CN116144669B (en) TCOF1 gene mutants and their applications
CN105838720B (en) PTPRQ gene mutant and its application
US11718880B2 (en) Marker and diagnosis method for noninvasive diagnosis of myocardial infarction
CN115992219A (en) TOMM6 mutant gene, primers, kit and method for detecting it, and use thereof
CN110628898B (en) BAZ1B susceptibility SNP locus detection reagent and kit prepared by same
CN117059163A (en) System and method for screening large fragment methylation markers
CN114875131A (en) Detection method of targeting membrane protein methylation as imprinted gene syndrome marker

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40110215

Country of ref document: HK